Estimation of Regression Function
Regression Modelling
Estimation of Regression Function (Ch 1.6)
Estimated Regression Function
Regression Model: Y = β0 + β1X + ε.
Regression Function: E(Y) = β0 + β1X.
Use least squares estimation to estimate β0 and β1.
Estimated regression function:
$$\hat{Y} = b_0 + b_1 X,$$
where
$$b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{Y} - b_1 \bar{X},$$
and we call Ŷ the value of the estimated regression function at the
level X of the predictor variable.
Estimated Regression Function
• We call the value of Y a response and E(Y) the mean
response. So Ŷ is a point estimator of the mean response
E(Y) at the level X of the predictor variable.
• It can be shown that Ŷ is an unbiased point estimator of the
mean response E(Y) (see the simulation sketch below).
• Given a new level X, we can use Ŷ as a (point) prediction of
the response.
• For the i-th observation in the data, Yi is the observed value,
and we call
$$\hat{Y}_i = b_0 + b_1 X_i$$
the fitted value for the i-th observation.
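The unbiasedness of Ŷ can be illustrated by a small simulation. This is a sketch, not from the textbook; the true parameter values (beta0, beta1, sigma) and the design points are assumptions, chosen to resemble the Toluca example that follows.
set.seed(1)
beta0 <- 62; beta1 <- 3.5; sigma <- 49   # assumed true parameters
Xsim <- seq(20, 120, by = 10)            # fixed design points
Yhat85 <- replicate(5000, {
  Ysim <- beta0 + beta1*Xsim + rnorm(length(Xsim), 0, sigma)
  fit <- lm(Ysim ~ Xsim)
  unname(coef(fit)[1] + coef(fit)[2]*85) # estimated mean response at X = 85
})
mean(Yhat85)      # close to the true E(Y) = beta0 + beta1*85 = 359.5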
Residuals
• The i-th residual is the difference between the observed value
Yi and the corresponding fitted value Ŷi, denoted by ei:
$$e_i = Y_i - \hat{Y}_i.$$
• For our model, the i-th residual becomes
$$e_i = Y_i - (b_0 + b_1 X_i).$$
Residuals
• Do not confuse
• εi = Yi − E(Yi), the "model error", with
• ei = Yi − Ŷi, the "residual".
• εi is the deviation from the unknown true regression line; it is
unobservable.
• ei is the deviation from the estimated regression line; it is
known (it can be computed from the data).
• Residuals are highly useful for studying whether a given
regression model is appropriate for the data at hand.
Residuals
[Figures: two scatter plots of Y against X illustrating residuals.]
Toluca Company Example
The Toluca Company manufactures refrigeration equipment as well
as many replacement parts. In the past, one of the replacement
parts has been produced periodically in lots of varying sizes. When a
cost improvement program was undertaken, company officials
wished to determine the optimum lot size for producing this part.
The production of this part involves setting up the production
process (which must be done regardless of the lot size) and
machining and assembly operations. One key input for the model to
ascertain the optimum lot size was the relationship between lot size
and labor hours required to produce the lot. To determine this
relationship, data on lot size and work hours for 25 recent
production runs were utilized.
• Page 19, Chapter 1 of the textbook.
Load Data
• Use the “TolucaCompany” dataset from the R package “ALSM”.
# install.packages("ALSM")
library("ALSM")
mydata <- TolucaCompany
dim(mydata)
[1] 25 2
head(mydata, 2)
x y
1 80 399
2 30 121
X <- mydata[,1] # or X <- mydata$x
Y <- mydata[,2] # or Y <- mydata$y
• X = “Lot Size” and Y = “Work Hours”
Load Data
• Or download “Kutner Textbook Datasets” from Wattle, file
named “CH01TA01.txt”.
mydata <- read.table("CH01TA01.txt")
head(mydata, 2)
V1 V2
1 80 399
2 30 121
names(mydata) <- c("x","y")
head(mydata, 2)
x y
1 80 399
2 30 121
Summary Statistics
summary(mydata)
x y
Min. : 20 Min. :113.0
1st Qu.: 50 1st Qu.:224.0
Median : 70 Median :342.0
Mean : 70 Mean :312.3
3rd Qu.: 90 3rd Qu.:389.0
Max. :120 Max. :546.0
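As a small supplement (not in the original slides), the spread and linear association can be summarised numerically as well:
sd(X); sd(Y)  # sample standard deviations of lot size and work hours
cor(X, Y)     # sample correlation, used again later when computing b1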
Summary Statistics
boxplot(mydata) # or boxplot(X); boxplot(Y)
[Figure: side-by-side boxplots of x and y, on a common scale from 0 to 500.]
Scatter Plot
plot(X, Y, pch = 16, xlab = "Lot size", ylab = "Work hours",
main = "Toluca Company")
[Figure: “Toluca Company” scatter plot of Work hours (100–500) against Lot size (20–120).]
Fitting Model Manually
• Recall we have
$$b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}.$$
Xbar <- mean(X)
Ybar <- mean(Y)
Xcenter <- X - Xbar
Ycenter <- Y - Ybar
Sxy <- sum(Xcenter*Ycenter) # or sum(X*Y) - length(X)*mean(X)*mean(Y)
Sxx <- sum(Xcenter^2)       # or sum(X^2) - length(X)*mean(X)^2
Sxy
[1] 70690
Sxx
[1] 19800
Fitting Model Manually
b1 <- Sxy/Sxx
b1
[1] 3.570202
b0 <- Ybar - b1*Xbar
b0
[1] 62.36586
Fitting Model Manually
• Another way to calculate b1,
$$b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{r_{xy}\, s_y s_x}{s_x^2} = r_{xy} \frac{s_y}{s_x},$$
where rxy is the sample correlation between X and Y, and sx and
sy are the sample standard deviations of X and Y, respectively.
Equivalently, b1 = cov(X, Y)/var(X), since Sxy = (n − 1)·cov(X, Y)
and Sxx = (n − 1)·var(X), which is what the code below uses.
Fitting Model Manually
b1 <- cov(X, Y)/var(X)
b1
[1] 3.570202
# or
b1 <- cor(X, Y)*sd(Y)/sd(X)
b1
[1] 3.570202
Fitting Model with “lm” Function
mymodel <- lm(Y ~ X)
class(mymodel)
[1] "lm"
# or
# mymodel <- lm(y ~ x, data = mydata)
View(mymodel) # inspect the components of the fitted lm object (opens a viewer in RStudio)
Fitting Model with “lm” Function
mymodel$coefficients # or use coef(mymodel)
(Intercept) X
62.365859 3.570202
Fitting Model with “lm” Function
summary(mymodel)
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-83.876 -34.088 -5.982 38.826 103.528
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 62.366 26.177 2.382 0.0259 *
X 3.570 0.347 10.290 4.45e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48.82 on 23 degrees of freedom
Multiple R-squared: 0.8215, Adjusted R-squared: 0.8138
F-statistic: 105.9 on 1 and 23 DF, p-value: 4.449e-10
Estimated Regression Line
• Ŷ = 62.366 + 3.570X
plot(X, Y, pch = 16)
abline(mymodel, col = "purple", lty = 2, lwd = 2)
[Figure: scatter plot of Y against X with the fitted regression line drawn as a dashed purple line.]
Fitted Values
Yhat <- b0 + b1*X # manually compute
Yfit <- mymodel$fitted.values # get it from model output
# or Yfit <- fitted(mymodel)
Yhat
[1] 347.9820 169.4719 240.8760 383.6840 312.2800 276.5780 490.7901 347.9820
[9] 419.3861 240.8760 205.1739 312.2800 383.6840 133.7699 455.0881 419.3861
[17] 169.4719 240.8760 383.6840 455.0881 169.4719 383.6840 205.1739 347.9820
[25] 312.2800
Yfit
1 2 3 4 5 6 7 8
347.9820 169.4719 240.8760 383.6840 312.2800 276.5780 490.7901 347.9820
9 10 11 12 13 14 15 16
419.3861 240.8760 205.1739 312.2800 383.6840 133.7699 455.0881 419.3861
17 18 19 20 21 22 23 24
169.4719 240.8760 383.6840 455.0881 169.4719 383.6840 205.1739 347.9820
25
312.2800
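A quick consistency check (a supplement, not in the original slides): the manually computed fitted values match those stored in the lm object, apart from the names attribute.
all.equal(as.numeric(Yfit), Yhat)  # TRUE; as.numeric drops the names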
Predict Y for a New Observation
# Predict Y at the new level X = 85
Xnew <- 85
Ypredict <- b0 + b1*Xnew
# or Ypredict <- mymodel$coefficients[1] + mymodel$coefficients[2]*Xnew
Ypredict
[1] 365.833
# Or use matrix multiplication
Xnew <- c(1, 85)
Ypredict <- Xnew %*% mymodel$coefficients
Ypredict
[,1]
[1,] 365.833
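Equivalently, R's built-in predict function can be used. This sketch assumes the model was fitted as lm(Y ~ X), so the new data frame must name its column X.
predict(mymodel, newdata = data.frame(X = 85))  # 365.833, as above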
Residuals
Res <- Y - Yhat # manually compute
Res
[1] 51.0179798 -48.4719192 -19.8759596 -7.6840404 48.7200000 -52.5779798
[7] 55.2098990 4.0179798 -66.3860606 -83.8759596 -45.1739394 -60.2800000
[13] 5.3159596 -20.7698990 -20.0880808 0.6139394 42.5280808 27.1240404
[19] -6.6840404 -34.0880808 103.5280808 84.3159596 38.8260606 -5.9820202
[25] 10.7200000
Res <- mymodel$residuals # get it from model output
# or Res <- residuals(mymodel)
Res
1 2 3 4 5 6
51.0179798 -48.4719192 -19.8759596 -7.6840404 48.7200000 -52.5779798
7 8 9 10 11 12
55.2098990 4.0179798 -66.3860606 -83.8759596 -45.1739394 -60.2800000
13 14 15 16 17 18
5.3159596 -20.7698990 -20.0880808 0.6139394 42.5280808 27.1240404
19 20 21 22 23 24
-6.6840404 -34.0880808 103.5280808 84.3159596 38.8260606 -5.9820202
25
10.7200000
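Residual plots (a supplementary sketch, not in the original slides) are the standard way to inspect these values; later chapters of the textbook use them for model diagnostics.
plot(X, Res, pch = 16, xlab = "Lot size", ylab = "Residual",
     main = "Residuals vs predictor")
abline(h = 0, lty = 2)  # residuals should scatter evenly around zero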
Properties of Fitted Regression Line (Ch 1.6)
Properties of Fitted Regression Line
1. The sum of the residuals is zero: $\sum_{i=1}^{n} e_i = 0$.
Follows from the first of the normal equations. (The value
computed below is zero up to floating-point rounding.)
sum(Res)
[1] 1.29674e-13
Properties of Fitted Regression Line
2. The sum of the squared residuals, $\sum_{i=1}^{n} e_i^2$, is a minimum:
the least squares estimates b0 and b1 make it smaller than any
other choice of intercept and slope would (see the check below).
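A minimal numerical check (not in the slides): evaluate the residual sum of squares at the least squares estimates and at nearby perturbed values.
sse <- function(beta0, beta1) sum((Y - beta0 - beta1*X)^2)
sse(b0, b1)        # minimum value (the SSE of the fitted line)
sse(b0, b1 + 0.1)  # perturbing the slope increases it
sse(b0 + 5, b1)    # perturbing the intercept increases it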
Properties of Fitted Regression Line
3. The sum of the observed values Yi equals the sum of the fitted
values Yˆi :
$$\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i.$$
It follows that the mean of the fitted values Ŷi is the same as
the mean of the observed values Yi, namely, Ȳ.
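This can be verified directly (a supplementary check, not in the slides):
sum(Y) - sum(Yhat)  # numerically zero
mean(Yhat)          # equals Ybar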
Properties of Fitted Regression Line
4. The sum of the weighted residuals is zero when the residual in
the ith trial is weighted by the level of the predictor variable in
the ith trial:
$$\sum_{i=1}^{n} X_i e_i = 0.$$
Follows from the second normal equation.
sum(X*Res)
[1] 5.570655e-12
Properties of Fitted Regression Line
5. The sum of the weighted residuals is zero when the residual in
the ith trial is weighted by the fitted value of the response
variable for the ith trial:
$$\sum_{i=1}^{n} \hat{Y}_i e_i = 0.$$
This follows from properties 1 and 4, since
$\sum_i \hat{Y}_i e_i = b_0 \sum_i e_i + b_1 \sum_i X_i e_i = 0$.
sum(Yhat*Res)
[1] 4.638423e-11
Properties of Fitted Regression Line
6. The fitted regression line always goes through the point (X̄, Ȳ).
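A direct verification (not in the slides): evaluating the fitted line at X̄ returns Ȳ.
b0 + b1*Xbar  # equals Ybar
Ybar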
Estimation of Error Terms Variance σ² (Ch 1.7)
Estimation of Error Terms Variance σ²
• The variance σ² of the error terms εi needs to be estimated, as
it measures the variability of the probability distribution of Y.
• In addition, many statistical inferences about the regression
model, as well as prediction intervals for Y, require an
estimator of σ².
Estimation of Error Terms Variance σ²
Recall: to estimate σ² for a single population, we use
$$s^2 = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n - 1}.$$
• $\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is called a sum of squares, where Yi − Ȳ is
the deviation of Yi from the estimated mean Ȳ.
• The degrees of freedom (DF) is n − 1, since we lose one degree
of freedom by using Ȳ as an estimator of the unknown
population mean.
• The estimator s² can be regarded as a mean square (sum of
squares/DF).
• It is unbiased: E(s²) = σ².
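As a quick sketch (not in the slides), R's var function computes exactly this mean square:
var(Y)                                # built-in sample variance
sum((Y - mean(Y))^2)/(length(Y) - 1)  # same value, from the definition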
Estimation of Error Terms Variance σ²
For the regression model, to estimate σ² (i.e., Var(Y)) we also need
to calculate a sum of squared deviations, but the deviation of an
observation Yi must be calculated around its own estimated mean
Ŷi, i.e., ei = Yi − Ŷi.
• The appropriate sum of squares, denoted by SSE, is
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2.$$
• SSE stands for error sum of squares or residual sum of
squares. It has n − 2 degrees of freedom, as two degrees of
freedom are lost because we must estimate two parameters to
obtain Ŷi.
Estimation of Error Terms Variance σ²
Our estimator of σ² is a mean square:
$$s_e^2 = MSE = \frac{SSE}{DF} = \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n - 2}.$$
• MSE stands for error mean square or residual mean square.
• An estimator of the standard deviation σ is $s_e = \sqrt{MSE}$.
• It can be shown that MSE is an unbiased point estimator of σ²,
i.e., $E(MSE) = \sigma^2$ (see the simulation sketch below).
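The unbiasedness of MSE can be illustrated by simulation. This is a minimal sketch, not from the textbook; the true parameter values (beta0, beta1, sigma) are assumptions chosen to resemble the Toluca fit.
set.seed(2)
beta0 <- 62; beta1 <- 3.5; sigma <- 49      # assumed true parameters
Xsim <- seq(20, 120, by = 10)               # fixed design points
MSEs <- replicate(10000, {
  Ysim <- beta0 + beta1*Xsim + rnorm(length(Xsim), 0, sigma)
  fit <- lm(Ysim ~ Xsim)
  sum(residuals(fit)^2)/(length(Ysim) - 2)  # MSE for this simulated dataset
})
mean(MSEs)  # close to sigma^2 = 2401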
Estimation of Error Terms Variance σ²
SSE <- sum(Res^2)
SSE
[1] 54825.46
n <- length(Y) # or n <- dim(mydata)[1]
MSE <- SSE/(n-2)
MSE
[1] 2383.716
# Estimator of the standard deviation sigma
sigma_hat <- sqrt(MSE)
sigma_hat
[1] 48.82331
# This is also called "residual standard error"
Estimation of Error Terms Variance σ²
summary(mymodel)
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-83.876 -34.088 -5.982 38.826 103.528
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 62.366 26.177 2.382 0.0259 *
X 3.570 0.347 10.290 4.45e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48.82 on 23 degrees of freedom
Multiple R-squared: 0.8215, Adjusted R-squared: 0.8138
F-statistic: 105.9 on 1 and 23 DF, p-value: 4.449e-10
Normal Error Regression Model (Ch 1.8)
Normal Error Regression Model
The normal error regression model is
Yi = β0 + β1Xi + εi ,
where the error terms εi are independent and identically
distributed (iid) N(0, σ²).
• This model implies that the Yi are independent, with
Yi ∼ N(β0 + β1Xi, σ²).
• Throughout this course, unless otherwise stated, we assume the
normal error regression model is applicable. This assumption
simplifies the theory and helps us set up the procedures for
statistical inference.
Regardless of the form of the distribution of the εi, the least
squares method provides unbiased estimators of β0 and β1 that have
minimum variance among all unbiased linear estimators (the
Gauss–Markov theorem).
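A quick graphical check of the normality assumption for the Toluca residuals (a supplementary sketch, not part of the slides):
qqnorm(Res)  # points near a straight line support normal errors
qqline(Res)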
• Read Ch 1.6-1.8 of the textbook.