ECMT1020 Introduction to Econometrics
Lecture 3: Simple Regression Analysis
Please read Chapter 1 of the textbook.
Contents
1 The simple linear regression model
2 The fitted regression model
2.1 Criteria to fit the model
2.2 Ordinary least squares (OLS) regression
2.3 Two algebraic results
2.4 The goodness of fit: $R^2$
3 Interpretation of a regression equation
3.1 Changes in the units of measurement
3.2 Demeaning
4 Exercises
1 The simple linear regression model
Let X and Y be two random variables. We hypothesize the relationship between Y and X
is of the form
Y = β1 + β2X + u (1)
where
• Y is called the dependent variable or regressand;
• X is called the independent variable, explanatory variable, or regressor;
• β1 and β2 are fixed numbers which are unknown;
• u is called the disturbance term or error term, and it is also a random variable. The
reasons why a disturbance term exists include:
- omission of other explanatory variables;
- aggregation of variables;
- model misspecification;
- functional misspecification;
- measurement error.
The hypothesized mathematical relationship between Y and X given in (1) is known as the
regression model, and β1 and β2 are regarded as the (unknown) parameters of the regression
model.
Note that in the regression model, Y and X are observable, while the disturbance term
u is not observable. Assume that we collect a random sample of n observations for both
random variables X and Y . We denote our observations as
a sample of X : X1, X2, . . . , Xn,
a sample of Y : Y1, Y2, . . . , Yn.
Then the regression model (1) written in terms of (pre-)sampled random variables is
Yi = β1 + β2Xi + ui, i = 1, . . . , n. (2)
In fact, more assumptions on the distributional properties of the random variables in the
regression model (1) or (2) are necessary to make sure that the parameters β1 and β2 (i)
exist, (ii) can be uniquely identified, and (iii) have meaningful interpretations. This discussion
will come in the next lecture.
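To make the setup concrete, here is a minimal simulation sketch of model (2) in Python. The sample size, the parameter values β1 = 2 and β2 = 0.5, and the normal distribution of the disturbances are all illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Simulate the simple regression model Y_i = beta1 + beta2 * X_i + u_i.
# All numbers below (n, beta1, beta2, the distributions) are illustrative choices.
rng = np.random.default_rng(0)

n = 100
beta1, beta2 = 2.0, 0.5              # 'true' parameters, unknown to the researcher in practice

X = rng.uniform(0, 10, size=n)       # observable regressor
u = rng.normal(0, 1, size=n)         # unobservable disturbance term
Y = beta1 + beta2 * X + u            # observable dependent variable

# The researcher observes only (X_i, Y_i); beta1, beta2 and u_i are never observed.
print(X[:3], Y[:3])
```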
2 The fitted regression model
Given n observations of Y and X, a researcher is then asked to fit the relationship between
Y and X specified in a regression model (1) or (2). Practically, this means we need to
‘estimate’ the values of the unknown parameters β1 and β2 using our data. Suppose we
(based on certain rules which will be made clear soon) decide that we will
• use some number b1 as an estimate for β1, and
• use some number b2 as an estimate for β2.
Then the fitted regression model is written as
Yˆ = b1 + b2X or Yˆi = b1 + b2Xi. (3)
Note that a variable with a “hat” and the same variable without one are completely different objects!
Be careful to follow the notation and to understand when a hat belongs on a variable and when it does not.
Please compare the ‘fitted model’ (3) with the ‘true model’ in (1) or (2), and notice
1. The difference between $Y_i$ and the fitted value $\hat{Y}_i$:
\[
\begin{aligned}
Y_i &= \beta_1 + \beta_2 X_i + u_i \\
\hat{Y}_i &= b_1 + b_2 X_i
\end{aligned}
\]
2. The difference between the disturbance term $u_i$ and the so-called residual $\hat{u}_i := Y_i - \hat{Y}_i$:
\[
\begin{aligned}
u_i &= Y_i - \beta_1 - \beta_2 X_i \\
\hat{u}_i &= Y_i - b_1 - b_2 X_i
\end{aligned}
\]
3. The difference between the two decompositions of the dependent variable:
\[
\begin{aligned}
\text{theoretical:}\quad & Y_i = \beta_1 + \beta_2 X_i + u_i \\
\text{operational:}\quad & Y_i = \underbrace{b_1 + b_2 X_i}_{\hat{Y}_i} + \hat{u}_i
\end{aligned}
\]
2.1 Criteria to fit the model
A natural question you may now ask is: how to find b1 and b2? Figure 1 gives a visual
illustration of how the realized values of a sample may suggest a “fitted line” where
• b1 is the intercept of the line, and
• b2 is the slope of the line.
Given such a fitted line, the realized residual uˆi is clearly the deviation of the realized Yi
from the value on the line corresponding to the realized value of Xi.
Figure 1: A fitted line (Figure 1.2 in the textbook)
The next natural question is: what values of b1 and b2 give the “best” fitted line? Well,
to answer this question, we need to first devise a criterion by which we can judge how good a
potential fitted line is. In general, we may want the overall distance between the observations
of Y and the fitted line to be as small as possible. What quantity should we use to measure
such overall distance?
• Does it make sense to look at the sum of the residuals, i.e., $\sum_{i=1}^{n} \hat{u}_i$?
• How about the sum of the absolute values of the residuals, i.e., $\sum_{i=1}^{n} |\hat{u}_i|$?
• How about the sum of the squared residuals, i.e., $\sum_{i=1}^{n} \hat{u}_i^2$?
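To see numerically why the plain sum of residuals is a poor measure of overall distance, the following sketch (with made-up toy data, not taken from the lecture) evaluates all three candidate criteria for two very different candidate lines; both lines have a residual sum of exactly zero, yet their sums of squared residuals differ sharply.

```python
import numpy as np

def criteria(Y, X, b1, b2):
    """Return (sum of residuals, sum of |residuals|, sum of squared residuals)."""
    resid = Y - b1 - b2 * X
    return resid.sum(), np.abs(resid).sum(), (resid ** 2).sum()

# Toy data, chosen purely for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 2.5, 4.5, 5.0])

# A reasonable line and a flat line through the mean of Y: both make the plain
# sum of residuals zero (positive and negative residuals cancel), so only the
# absolute or squared criteria can tell them apart.
print(criteria(Y, X, b1=0.75, b2=1.1))   # small sum of squared residuals
print(criteria(Y, X, b1=3.5, b2=0.0))    # much larger sum of squared residuals
```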
2.2 Ordinary least squares (OLS) regression
The most commonly used criterion for determining the best fitted line is known as the ordinary
least squares (OLS) criterion, which requires choosing b1 and b2 to minimize the RSS,
where RSS is the abbreviation of residual sum of squares (or, equivalently, sum of squared
residuals):
\[
RSS = \sum_{i=1}^{n} \hat{u}_i^2.
\]
The regression analysis using the OLS criterion to estimate the unknown parameters in the
regression model is known as the OLS regression.
In the following, we show how to estimate the unknown parameters β1 and β2 based
on the OLS criterion, or, in other words, how to obtain the OLS estimators of β1 and β2,
denoted $\hat{\beta}_1$ and $\hat{\beta}_2$. (We might want to write $\hat{\beta}_1^{\text{OLS}}$ and $\hat{\beta}_2^{\text{OLS}}$ with the superscript ‘OLS’ to distinguish the OLS
estimators from estimators obtained using other criteria. Since the OLS estimators are the only estimators
we are concerned with at present, we suppress the superscript for expositional simplicity; when we
consider other estimators later, we shall add the superscript back.)
First, note that given the observations Yi and Xi and some tentative choices b1 and b2,
\[
\begin{aligned}
RSS &= \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i)^2 \\
&= \sum_{i=1}^{n} \left( Y_i^2 + b_1^2 + b_2^2 X_i^2 - 2 b_1 Y_i - 2 b_2 X_i Y_i + 2 b_1 b_2 X_i \right) \\
&= \sum_{i=1}^{n} Y_i^2 + n b_1^2 + b_2^2 \sum_{i=1}^{n} X_i^2 - 2 b_1 \sum_{i=1}^{n} Y_i - 2 b_2 \sum_{i=1}^{n} X_i Y_i + 2 b_1 b_2 \sum_{i=1}^{n} X_i.
\end{aligned}
\]
Next, since the choice variables here are b1 and b2, let’s consider RSS as a function of b1
and b2, and write
\[
RSS(b_1, b_2) = n b_1^2 - 2 b_1 \sum_{i=1}^{n} Y_i + 2 b_1 b_2 \sum_{i=1}^{n} X_i - 2 b_2 \sum_{i=1}^{n} X_i Y_i + b_2^2 \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} Y_i^2. \tag{4}
\]
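As a quick numeric check that the expansion in (4) matches the direct definition of RSS, here is a small sketch; the data and the trial values of b1 and b2 are arbitrary illustrative choices.

```python
import numpy as np

def rss_direct(b1, b2, X, Y):
    """RSS computed directly as the sum of squared residuals."""
    return np.sum((Y - b1 - b2 * X) ** 2)

def rss_expanded(b1, b2, X, Y):
    """RSS computed from the expanded expression in equation (4)."""
    n = len(Y)
    return (n * b1**2 - 2 * b1 * Y.sum() + 2 * b1 * b2 * X.sum()
            - 2 * b2 * (X * Y).sum() + b2**2 * (X**2).sum() + (Y**2).sum())

# Arbitrary simulated data and trial values, for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 50)

print(rss_direct(1.0, 0.3, X, Y))
print(rss_expanded(1.0, 0.3, X, Y))   # should agree up to floating-point error
```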
Now our problem boils down to a typical problem of minimizing a function of two arguments
(input variables): we want to find the particular values of b1 and b2 at which RSS(b1, b2)
defined in (4) attains its minimum. (Note that this minimization is not affected by X1, . . . , Xn
and Y1, . . . , Yn, as they are taken as given.) That is, the OLS estimators for β1 and β2 are
given by
\[
(\hat{\beta}_1, \hat{\beta}_2) = \arg\min_{(b_1, b_2)} RSS(b_1, b_2).
\]
The following “first-order conditions” allow us to solve for the ‘optimal’ values of b1 and b2
which give the minimum of RSS(b1, b2):
\[
\left. \frac{\partial RSS(b_1, b_2)}{\partial b_1} \right|_{b_1 = \hat{\beta}_1,\, b_2 = \hat{\beta}_2} = 0
\quad \text{and} \quad
\left. \frac{\partial RSS(b_1, b_2)}{\partial b_2} \right|_{b_1 = \hat{\beta}_1,\, b_2 = \hat{\beta}_2} = 0. \tag{5}
\]
To derive the explicit form of these conditions, we need to take the partial derivatives of
RSS(b1, b2) with respect to b1 and b2 separately:
\[
\begin{aligned}
\frac{\partial RSS(b_1, b_2)}{\partial b_1} &= 2 n b_1 - 2 \sum_{i=1}^{n} Y_i + 2 b_2 \sum_{i=1}^{n} X_i, \\
\frac{\partial RSS(b_1, b_2)}{\partial b_2} &= 2 b_1 \sum_{i=1}^{n} X_i - 2 \sum_{i=1}^{n} X_i Y_i + 2 b_2 \sum_{i=1}^{n} X_i^2.
\end{aligned}
\]
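If you want to confirm these derivatives without redoing the algebra, the sketch below compares them with central finite differences of RSS at an arbitrary point (b1, b2); the data are simulated purely for illustration.

```python
import numpy as np

def rss(b1, b2, X, Y):
    return np.sum((Y - b1 - b2 * X) ** 2)

def drss_db1(b1, b2, X, Y):
    # Analytic partial derivative with respect to b1, as derived above.
    return 2 * len(Y) * b1 - 2 * Y.sum() + 2 * b2 * X.sum()

def drss_db2(b1, b2, X, Y):
    # Analytic partial derivative with respect to b2, as derived above.
    return 2 * b1 * X.sum() - 2 * (X * Y).sum() + 2 * b2 * (X**2).sum()

# Simulated data (illustrative) and a central finite-difference comparison.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, 50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 50)

b1, b2, h = 1.0, 0.3, 1e-6
print(drss_db1(b1, b2, X, Y), (rss(b1 + h, b2, X, Y) - rss(b1 - h, b2, X, Y)) / (2 * h))
print(drss_db2(b1, b2, X, Y), (rss(b1, b2 + h, X, Y) - rss(b1, b2 - h, X, Y)) / (2 * h))
```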
Then the first-order conditions in (5) imply
\[
n \hat{\beta}_1 - \sum_{i=1}^{n} Y_i + \hat{\beta}_2 \sum_{i=1}^{n} X_i = 0 \tag{6}
\]
\[
\hat{\beta}_1 \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} X_i Y_i + \hat{\beta}_2 \sum_{i=1}^{n} X_i^2 = 0 \tag{7}
\]
which form a system of two equations in two unknowns ($\hat{\beta}_1$ and $\hat{\beta}_2$). Solving this system
of equations yields
\[
\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} \tag{8}
\]
\[
\hat{\beta}_2 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}
= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \tag{9}
\]
where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$. (The full derivation can be found on pages 93–94 of the
textbook. Also, although we omit the process here, to verify that $\hat{\beta}_1$ and $\hat{\beta}_2$ indeed minimize
$RSS(b_1, b_2)$ we also need to check the second-order conditions.)
Given the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$, the fitted regression model is written as
\[
\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i,
\]
and the fitted residuals are
\[
\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i.
\]
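The closed forms (8) and (9) are easy to compute directly. The sketch below does so on simulated data (the ‘true’ values 2.0 and 0.5 are assumptions for the simulation) and cross-checks the result against numpy's built-in degree-1 least-squares fit.

```python
import numpy as np

def ols_simple(X, Y):
    """OLS estimates of (beta1, beta2) via the closed forms (8) and (9)."""
    b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b1 = Y.mean() - b2 * X.mean()
    return b1, b2

# Simulated data; the 'true' parameters are illustrative assumptions.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 200)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 200)

b1_hat, b2_hat = ols_simple(X, Y)
print(b1_hat, b2_hat)

# Cross-check: np.polyfit returns coefficients highest power first, i.e. (slope, intercept).
b2_np, b1_np = np.polyfit(X, Y, 1)
print(b1_np, b2_np)                  # should match the closed-form estimates
```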
Lastly, as practice with the above procedure for solving for the OLS estimator(s), consider the
simpler case in which there is no intercept term (β1 = 0) in the regression model:
\[
Y_i = \beta_2 X_i + u_i.
\]
In this case, write RSS as a function of b2 only and solve for the OLS estimator $\hat{\beta}_2$ that
minimizes the RSS; a numerical sketch follows below. See pages 96–97 of the textbook.
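If you want to check your answer to this exercise numerically rather than algebraically, the sketch below minimizes RSS(b2) for the no-intercept model over a grid of candidate slopes; the data are simulated with an assumed true β2 = 0.5, and the grid minimizer should agree with the closed-form estimator you derive.

```python
import numpy as np

# No-intercept model Y_i = beta2 * X_i + u_i; beta2 = 0.5 is an illustrative assumption.
rng = np.random.default_rng(7)
X = rng.uniform(0, 10, 200)
Y = 0.5 * X + rng.normal(0, 1, 200)

def rss_no_intercept(b2):
    return np.sum((Y - b2 * X) ** 2)

# Crude grid search over candidate slopes; the minimizer should be close to the
# estimator obtained by solving dRSS/db2 = 0 by hand.
grid = np.linspace(-2, 2, 40001)
rss_values = np.array([rss_no_intercept(b) for b in grid])
print(grid[rss_values.argmin()])
```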
2.3 Two algebraic results
Just to repeat: our true/population model, with unknown population parameters β1 and β2,
is
\[
Y_i = \beta_1 + \beta_2 X_i + u_i,
\]
and the fitted model with OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ is
\[
\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i,
\]
with the fitted residuals given by
\[
\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i. \tag{10}
\]
We can prove two purely mechanical results using simple algebra:
1. The sample mean of the residuals is always zero:
\[
\bar{\hat{u}} := \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0, \tag{11}
\]
which immediately implies that the sum of the residuals is always zero:
\[
\sum_{i=1}^{n} \hat{u}_i = 0, \tag{12}
\]
and
\[
\bar{\hat{Y}} := \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i
\overset{(11)}{=} \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i + \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i
= \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i + \hat{u}_i)
\overset{(10)}{=} \frac{1}{n} \sum_{i=1}^{n} Y_i =: \bar{Y}. \tag{13}
\]
2. The sum of the products of $X_i$ and $\hat{u}_i$ is always zero:
\[
\sum_{i=1}^{n} X_i \hat{u}_i = 0, \tag{14}
\]
which, together with (11), implies that the sample covariance of X and $\hat{u}$ is also always
zero:
\[
\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(\hat{u}_i - \bar{\hat{u}}) = 0. \tag{15}
\]
This also implies that the sample correlation coefficient for X and $\hat{u}$ is zero (assuming
the denominator is nonzero).
Let’s prove (11) by first looking at
\[
\sum_{i=1}^{n} \hat{u}_i \overset{(10)}{=} \sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)
= \sum_{i=1}^{n} Y_i - n \hat{\beta}_1 - \hat{\beta}_2 \sum_{i=1}^{n} X_i,
\]
and then dividing by n on both sides to get
\[
\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i
= \underbrace{\frac{1}{n} \sum_{i=1}^{n} Y_i}_{\bar{Y}} - \hat{\beta}_1 - \hat{\beta}_2 \underbrace{\frac{1}{n} \sum_{i=1}^{n} X_i}_{\bar{X}}
= \bar{Y} - \hat{\beta}_2 \bar{X} - \hat{\beta}_1 \overset{(8)}{=} \hat{\beta}_1 - \hat{\beta}_1 = 0. \quad \text{Done.}
\]
Next, we prove (14):
\[
\sum_{i=1}^{n} X_i \hat{u}_i \overset{(10)}{=} \sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)
= \sum_{i=1}^{n} X_i Y_i - \hat{\beta}_1 \sum_{i=1}^{n} X_i - \hat{\beta}_2 \sum_{i=1}^{n} X_i^2
\overset{(7)}{=} 0. \quad \text{Done.}
\]
Lastly, to see how we derive (15) from (14) and (11), note that
\[
\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(\hat{u}_i - \bar{\hat{u}})
\overset{(11)}{=} \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}) \hat{u}_i
= \frac{1}{n-1} \sum_{i=1}^{n} X_i \hat{u}_i - \bar{X} \underbrace{\frac{1}{n-1} \sum_{i=1}^{n} \hat{u}_i}_{=0}
= \frac{1}{n-1} \sum_{i=1}^{n} X_i \hat{u}_i \overset{(14)}{=} 0. \quad \text{Done.}
\]
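These mechanical results are easy to confirm numerically. The sketch below runs OLS on simulated data (illustrative parameter values) and checks (11), (14) and (15); all three printed quantities should be zero up to floating-point error.

```python
import numpy as np

# Simulated data; the 'true' parameters 2.0 and 0.5 are illustrative assumptions.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 100)

# OLS estimates via the closed forms (8) and (9), then the fitted residuals.
b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()
u_hat = Y - b1 - b2 * X

print(u_hat.mean())                     # (11): sample mean of residuals ~ 0
print(np.sum(X * u_hat))                # (14): sum of X_i * u_hat_i ~ 0
print(np.cov(X, u_hat, ddof=1)[0, 1])   # (15): sample covariance of X and u_hat ~ 0
```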
Exercise: Can you prove
\[
\sum_{i=1}^{n} \hat{Y}_i \hat{u}_i = 0 \tag{16}
\]
as a consequence of (14), and then demonstrate that the sample covariance of $\hat{Y}$ and $\hat{u}$ is
zero?
2.4 The goodness of fit: $R^2$
We have seen that regression analysis decomposes the dependent variable into a fitted-value
component and a residual component:
\[
Y_i = \hat{Y}_i + \hat{u}_i. \tag{17}
\]
How do we quantify the so-called “goodness of fit”?
• Recalling the discussion in Sections 2.1 and 2.2, the residual sum of squares (RSS) can
certainly be considered a measure of the goodness of fit: the smaller the RSS, the better
the model fit.
• Here we introduce a ‘variance analysis’ of regression models by looking at the variance
decomposition of the fitted model. The goal is to measure how much of the variance/variation
in the observations of the dependent variable can be explained by the fitted model
$\hat{Y}_i = b_1 + b_2 X_i$. The more variation explained by the fitted model, the better the model
fit.
We first look at the left-hand side of (17), and define
\[
TSS := \sum_{i=1}^{n} (Y_i - \bar{Y})^2,
\]
which is (n − 1) times the sample variance of Y. We call it the total sum of squares
(TSS), as it is the sum of the squared deviations of the sample observations Yi about the
sample mean. We sometimes say that TSS characterizes the ‘total variation’ in the sampled
dependent variable.
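As a small check on the definition, the sketch below (with arbitrary simulated observations) verifies that TSS equals (n − 1) times the sample variance of Y as computed by numpy.

```python
import numpy as np

# Arbitrary simulated observations of Y, for illustration only.
rng = np.random.default_rng(4)
Y = rng.normal(5, 2, 100)
n = len(Y)

tss = np.sum((Y - Y.mean()) ** 2)
print(tss)
print((n - 1) * np.var(Y, ddof=1))   # should coincide with TSS
```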