ISOM2500 BUSINESS STATISTICS
Module 10b: Curved Patterns
Contents
➢ Dealing with Curved Patterns
Anscombe’s Quartet
Different data values, identical statistical measures!
Different Data, But Identical Fits
• It is important to look at a set of data graphically before analyzing it according to a particular type of relationship
• Basic statistical properties are inadequate for describing realistic datasets
• The same LS equation is fitted to all four datasets!
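The point can be checked numerically. The sketch below fits a least-squares line to each of the four Anscombe datasets using plain Python; the data values are typed in from the standard published quartet, and the helper name `least_squares` is only illustrative.

```python
# Anscombe's quartet: four datasets with nearly identical summary statistics.
# Values are the standard published ones; sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(x, y):
    """Return (intercept, slope, r_squared) of the LS line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, sxy ** 2 / (sxx * syy)


fits = {name: least_squares(x, y) for name, (x, y) in quartet.items()}
for name, (b0, b1, r2) in fits.items():
    print(f"Set {name}: y-hat = {b0:.2f} + {b1:.2f} x, R2 = {r2:.2f}")
```

All four fits agree to two decimal places, which is why only a scatterplot reveals how different the datasets are.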
Procedure in Applying SRM (Recall)
1. Produce a scatterplot of y on x
• check for linearity
2. Find the LS equation, and obtain residuals
3. Test the slope parameter
4. Perform regression diagnostics
5. Interpret the LS equation
• intercept and slope estimates, R²
6. Make predictions for new observations
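The numerical parts of this procedure (steps 2, 3, 5 and 6) can be sketched in plain Python. The five data points are made up for illustration and are not from the course, and the function name `fit_srm` is hypothetical. Step 3 is shown only as a t-statistic; a p-value would need a t-distribution table or a statistics library.

```python
import math


def fit_srm(x, y):
    """Fit the LS line of y on x and return the usual summary quantities."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - my) ** 2 for yi in y)
    s_e = math.sqrt(sse / (n - 2))            # SD of residuals (RMSE)
    t_slope = slope / (s_e / math.sqrt(sxx))  # t-statistic for H0: slope = 0
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "r_squared": 1 - sse / sst,
        "rmse": s_e,
        "t_slope": t_slope,
    }


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = fit_srm(x, y)
prediction = fit["intercept"] + fit["slope"] * 6  # step 6: predict y at x = 6
print(fit["intercept"], fit["slope"], fit["r_squared"], fit["t_slope"], prediction)
```

Steps 1 and 4 (the scatterplot and the residual diagnostics) remain visual checks and are not replaced by these numbers.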
What if x and y are NOT linearly related?
Detecting Nonlinear Patterns
➢ Sometimes, two variables may NOT be linearly related
• A change in X of a given size does not come with a change of the same magnitude in the response at different values of X
➢ Consider the “diamond” example:
• Based on diamonds of less than ½ carat, an increase of 1/10 of a carat increases the estimated cost by $270, regardless of the size of the diamond
• The difference in average cost between 1.0-carat and 1.1-carat diamonds is probably more than $270 (as diamonds get larger, they become scarcer, and increments in size/weight command ever-larger increments in cost!)
• For larger diamonds (e.g., >1 carat), the linear relationship between weight and cost may NOT hold anymore
Example 1: Curved But Not Linear Pattern
➢ The following scatterplot graphs mileage (in miles per gallon, MPG) versus weight (in thousands of pounds) for 303 gasoline-powered, non-hybrid passenger vehicles, with an orange fitted LS line; both the data and the line show a negative association between weight and MPG
• R² = 70%
• SD of residuals (RMSE) = 2.9 MPG
➢ The violet downward-bending curve seems to describe the relationship better!
7
Lack of Fit From Residual Plots by the LS Line
➢ Compare the original residual plot (left below) with the plots of the scrambled residuals:
• A visual distinction implies that there is a pattern in the residuals
• A straight line is then inadequate to summarize the association in the data
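The scrambling idea can be sketched as follows. The data here are synthetic and noise-free (a hypothetical curve, not the 303 vehicles), and counting sign runs is only a crude numeric stand-in, chosen for this sketch, for the by-eye comparison of plots.

```python
import random


def least_squares(x, y):
    """Return (intercept, slope) of the LS line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def sign_runs(values):
    """Count runs of consecutive values that share the same sign."""
    signs = [v > 0 for v in values]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


# Synthetic curved data (hypothetical): weight in thousands of pounds, MPG.
weights = [2.0 + 0.25 * i for i in range(13)]
mpg = [80.0 / w for w in weights]

b0, b1 = least_squares(weights, mpg)
residuals = [y - (b0 + b1 * w) for w, y in zip(weights, mpg)]

# Scramble: randomly reassign the residuals to the x-values.
rng = random.Random(0)
scrambled = residuals[:]
rng.shuffle(scrambled)

print("original  runs:", sign_runs(residuals))  # few long runs: a pattern
print("scrambled runs:", sign_runs(scrambled))  # typically more, shorter runs
```

In the original order the residuals are positive at both ends and negative in the middle, the signature of a line fitted to a bending pattern; scrambling destroys that ordering while keeping the same residual values.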
Transformation
➢ Transformations allow us to use regression analysis to find an equation that describes a curved pattern
• Find a transformation of X or Y (or both) so that the association between the transformed variables is linear (even though the association between the original variables is not)
➢ Two useful nonlinear transformations:
• Reciprocals (convert observed data d into 1/d)
  – Useful when the data are already in the form of a ratio
• Logarithms (convert observed data d into log d)
  – Useful when the data are meaningful on a percentage scale
Tukey’s Bulging Rule
Tukey’s bulging rule suggests when to use logs, reciprocals, and squares to convert a bending pattern into a linear pattern
➢ Match the pattern in the scatterplot of Y on X to one of the shapes
➢ E.g., the violet downward-bending curve in the scatterplot of Example 1 (Slide 7) resembles the lower left corner. We can try:
• Reciprocal of MPG (the reciprocal of a ratio variable is easily interpreted)
• Log of MPG
• Log of weight
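As a rough aid, the rule can be written as a lookup table. The sketch below encodes the usual textbook form of the bulging rule (go down the ladder of powers with logs or reciprocals, or up with squares, in the direction the curve bulges); the dictionary layout and the function name `suggest_transformations` are assumptions made for this example, and the suggestions are candidates to try, not guarantees.

```python
# Candidate transformations keyed by the corner the curve resembles:
# (horizontal, vertical) with horizontal in {"left", "right"} and
# vertical in {"lower", "upper"}.
BULGING_RULE = {
    ("left", "lower"): {"x": ["log x", "1/x"], "y": ["log y", "1/y"]},
    ("left", "upper"): {"x": ["log x", "1/x"], "y": ["y^2"]},
    ("right", "lower"): {"x": ["x^2"], "y": ["log y", "1/y"]},
    ("right", "upper"): {"x": ["x^2"], "y": ["y^2"]},
}


def suggest_transformations(horizontal, vertical):
    """Return candidate transformations of x and y for the given corner."""
    return BULGING_RULE[(horizontal, vertical)]


# The MPG-versus-weight curve resembles the lower left corner.
print(suggest_transformations("left", "lower"))
```

For the lower left corner this returns the options listed above: reciprocal or log of the response (MPG), or log of the predictor (weight).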
Example 2: Reciprocal Transformation
➢ Take the reciprocal of MPG and scale it:
Rec-MPG = 100/MPG
➢ The next scatterplot reveals a positive linear relationship
• The association between fuel consumption (in gallons per hundred miles) and weight is positive
➢ The fitted line is Estimated Rec-MPG = -0.11 + 1.2 × Weight (weight in thousands of pounds)
• R² = 0.713
• SD of residuals = 0.667 (in gallons per hundred miles)
Interpretation of the Fitted Line
➢ The fitted line is given by Estimated Rec-MPG = -0.11 + 1.2 × Weight, where Rec-MPG = 100/MPG and Weight is in thousands of pounds
➢ After the transformation, both the intercept and the slope change drastically (previously they were 43.3 and -5.19)
➢ The slope 1.2 estimates that the amount of gasoline needed to drive 100 miles grows by 1.2 gallons on average for each additional 1,000 pounds of weight
➢ The intercept -0.11 is meaningless in this context, as it refers to weightless cars
• One may interpret the intercept as the fuel burned regardless of weight, for example, by air conditioning and sound systems
• But, in this case, the value remains an extrapolation!
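A small sketch of the transformed scale may help with the interpretation. It uses the intercept (-0.11) and slope (1.2) quoted on this slide; the function names are illustrative, and weight is assumed to be in thousands of pounds.

```python
INTERCEPT = -0.11  # gallons per 100 miles at zero weight (an extrapolation)
SLOPE = 1.2        # extra gallons per 100 miles per additional 1,000 lb


def gallons_per_100_miles(mpg):
    """Reciprocal transformation used in the example: Rec-MPG = 100 / MPG."""
    return 100.0 / mpg


def estimated_rec_mpg(weight_klb):
    """Estimated fuel consumption (gallons per 100 miles) from the fitted line."""
    return INTERCEPT + SLOPE * weight_klb


# A car rated at 25 MPG burns 4 gallons per 100 miles.
print(gallons_per_100_miles(25))

# Slope interpretation: 1,000 lb more weight adds about 1.2 gallons per 100 miles.
print(estimated_rec_mpg(4.0) - estimated_rec_mpg(3.0))
```

The difference between any two cars 1,000 lb apart is the same on this scale, which is exactly what a linear fit in Rec-MPG means.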
Outliers in Residual Plot
➢ In general, the residual plot shows no curved pattern, except for some relatively large positive residuals
• 2 cars of average weight (i.e., they consume more fuel than expected for their weight)
• 2 sports cars: an Audi R8 and an Aston Martin
  – The Audi R8 has the largest positive residual, 2.6 (i.e., it needs 2.6 more gallons to go 100 miles than expected for cars of its weight)
Comparing Linear and Nonlinear Equations
➢ It is tempting to choose the equation with the larger R-squared
• BUT this works only when both equations/models have the same response and are based on the same observations
➢ Judging from a scatterplot, the “reciprocal model” (green) produces much more reasonable estimates of mileage
• MPG does not continue to decline at the same rate as vehicles become heavier
Different Effects/Rates of Change
➢ The linear equation (orange): a fixed drop of 5.19 miles per gallon on average per 1,000-pound increase in the weight of a car
• The effect of a change in weight on mileage is identical for light and heavy cars
➢ The nonlinear equation (i.e., the reciprocal model) yields the estimated MPG as Estimated MPG = 100 / (-0.11 + 1.2 × Weight)
➢ Differences in weight matter more for small/light cars (see next slide)
• The same increase in weight reduces mileage more for light cars than for heavy cars
Difference in Estimated MPGs Between 2 Light Cars and 2 Heavy Cars
Weight (in lb)    Estimated MPG    Difference With the “Next” Heavier Car
2,000             43.7             15.0
3,000             28.7             7.4
4,000             21.3             4.3
5,000             17.0             -
This diminishing effect of an increase in weight on mileage makes more sense than a constant reduction
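The table values can be reproduced by back-transforming the reciprocal fit, Estimated MPG = 100 / (-0.11 + 1.2 × Weight), with weight in thousands of pounds. The sketch below also evaluates the linear fit (intercept 43.3, slope -5.19) for contrast. The function names are illustrative, and taking differences between the rounded estimates is an assumption made here to match the table.

```python
def linear_mpg(weight_klb):
    """Linear fit: a constant drop of 5.19 MPG per additional 1,000 lb."""
    return 43.3 - 5.19 * weight_klb


def reciprocal_mpg(weight_klb):
    """Reciprocal model back-transformed to the MPG scale."""
    return 100.0 / (-0.11 + 1.2 * weight_klb)


weights_klb = [2.0, 3.0, 4.0, 5.0]

# Estimated MPG under the reciprocal model, rounded to one decimal place.
estimates = [round(reciprocal_mpg(w), 1) for w in weights_klb]

# Drop in estimated MPG to the next heavier car, computed from the rounded
# estimates (unrounded values give 7.3 rather than 7.4 for 3,000 -> 4,000 lb).
differences = [round(a - b, 1) for a, b in zip(estimates, estimates[1:])]

# The linear fit gives the same drop for every 1,000 lb step.
linear_drops = [round(linear_mpg(a) - linear_mpg(b), 2)
                for a, b in zip(weights_klb, weights_klb[1:])]

print(estimates)     # reciprocal-model estimates
print(differences)   # shrinking drops
print(linear_drops)  # constant drops
```

The shrinking differences under the reciprocal model, against the constant 5.19 under the linear fit, are the diminishing effect described on the slide.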
Example 3: Transforming The Predictor
➢ Let’s consider how advertising is associated with the sales of grocery items
➢ Predictor: number of pages showing grocery items in the advertising circular distributed in the local newspaper and online
➢ The scatterplot of 64 pairs of observations shows a curved pattern (green) matching the top left corner of Tukey’s bulging rule
Reciprocal of The Predictor
➢ Without transforming the predictor, Advertising Pages, the linear fit (orange) implies:
• An additional page corresponds to an estimated increase in mean daily sales of $246, irrespective of the number of pages
➢ Using the reciprocal of the predictor gives the nonlinear fit (green)
➢ R² (orange) = 0.60 vs. R² (green) = 0.87:
• The nonlinear fit is much better!
Diminishing Marginal Return
➢ Diminishing marginal return on the number of advertising pages:
• An increase of 1 page, from 1 to 2 pages:
  – estimated sales increase by $680
• An increase of 1 page, from 4 to 5 pages:
  – estimated sales increase by $68
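The two increments are consistent with a fit of the form Estimated Sales = b0 + b1 × (1/Pages). The sketch below assumes that form. The coefficient b1 = -1360 is not quoted in the text: it is backed out here from the two stated increments (-b1/2 = 680 and -b1/20 = 68). The intercept b0 is unknown, but it cancels when taking differences.

```python
# Coefficient on 1/pages implied by the two increments above (a derivation
# for this sketch, not a value printed in the source).
B1 = -1360.0


def sales_increase(pages_from, pages_to, b1=B1):
    """Change in estimated sales under Sales-hat = b0 + b1 / pages (b0 cancels)."""
    return b1 * (1.0 / pages_to - 1.0 / pages_from)


for start in range(1, 6):
    gain = sales_increase(start, start + 1)
    print(f"{start} -> {start + 1} pages: about ${gain:.0f} more in estimated sales")
```

Each extra page adds less than the one before it, in contrast to the constant $246 per page implied by the linear fit.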