Creation date:2024-05-08 15:28:20
ECMT1020 Introduction to Econometrics
Lecture 8: Specification of Regression Variables
Please read Chapter 6 of the textbook.
1 Omitting a relevant variable
1.1 Omitted variable bias
1.1.1 Review: If simple regression (2) were true
1.1.2 Now: True model is multiple regression (1)
1.2 Effects on the statistical tests and R2
2 Including a redundant variable
2.1 No bias
2.2 Efficiency loss
3 Proxy variable
4 Testing for linear restrictions
4.1 F test for linear restrictions
4.1.1 F test for one linear restriction
4.1.2 F test for multiple linear restrictions
4.2 t test for a linear restriction
Figure 1: Four cases of model specification
1 Omitting a relevant variable
Recall the example that we used at the beginning of Lecture 5 (Multiple Regression Analysis)
to introduce the multiple regression:
Y = β1 + β2X2 + β3X3 + u (1)
• Y = EARNINGS, the hourly earnings measured in dollars;
• X2 = S, years of schooling (highest grade completed);
• X3 = EXP, years spent working after leaving full-time education (experience);
• u is the disturbance term.
When we discussed the interpretation of coefficient β2 in Section 1.3 of Lecture 5, we em-
phasized the difference between the interpretation of β2 from a simple regression
Y = β1 + β2X2 + v, (2)
and the interpretation of β2 from the multiple regression (1) where an additional regressor
X3 is present. Note that I use v to denote the disturbance term in regression (2), because
I want to make it clear that the two disturbance terms in these two regressions are very
different (see footnote 1).
In this section, we will analyze in more detail the consequences of not including the
variable X3 in our regression, in particular when it ought to be included. In other words,
what will happen if (1) is the ‘true model’ but we run regression (2)? It turns out that our
estimator of the coefficient of X2 will suffer from the so-called omitted variable bias; and the
statistical tests will be invalid.
The analysis here is similar to what we did in Section 1.3 of Lecture 5. Again, we
• denote βˆ2 as the OLS estimator for β2 in the multiple regression (1);
• denote β˜2 as the OLS estimator for β2 in the simple regression (2).
A note before we proceed: we are still in the classical linear regression model (CLRM)
world, and all the assumptions for CLRM apply to the true models under our discussion.
1.1 Omitted variable bias
In Section 1.4.1 of Lecture 5, we showed that βˆ2 is an unbiased estimator for β2 in model (1)
under the assumptions of CLRM. Moreover, we explained intuitively and by example (the earnings
regression) that β˜2 and βˆ2 can be very different, and that this is not just due to random
chance (see footnote 2).
1 Oftentimes we use the same generic notation u to denote the disturbance term. But this is one of the
cases where we want to make the difference more explicit.
2 By checking the formulas of βˆ2 and β˜2, you can see that they differ in their values. In fact, in our
earnings regression example, both βˆ2 and β˜2 are positive and β˜2 < βˆ2. This can be seen from the comparison
of the two slopes estimated from the simple regression without X3 and from the multiple regression with X3,
shown in Figure 3.2 in the textbook, or from the Stata outputs. Again, omitting work experience as
an explanatory variable for hourly earnings will lead us to underestimate the effect of schooling on hourly
earnings.
Here, we follow up with the previous discussion, and will formally show that β˜2 is a biased
estimator for the effect of X2 on Y , if model (1) is the true model. This addresses a very
common scenario in practice: suppose model (1) is the true model (X3 has explanatory
power for the dependent variable and hence ought to be included in the model), but unfortunately
we did not know it. Instead, we fit a regression omitting X3, like in (2), and get
the OLS estimate β˜2 for the slope coefficient of X2. Then we use β˜2 to interpret the effect
of X2 on Y . The question is:
HOW WRONG can we be?
To answer this question, we begin with writing down the OLS formula for β˜2 given n
observations of Y and X2:
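$$\tilde\beta_2 = \frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}, \tag{3}$$
where $\bar X_2 = \frac{1}{n}\sum_{i=1}^{n} X_{2i}$ and $\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$. In what follows, we analyze the property of β˜2
in two different scenarios: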
1. The simple regression (2) is indeed the true model; −→ This is a review of simple
regression analysis
2. The simple regression (2) is NOT the true model, and the true model is the multiple
regression (1). −→ This is new because we now deal with ‘model misspecification’
1.1.1 Review: If simple regression (2) were true
Recall that if model (2) were the true model where the assumptions of CLRM hold (including
E(vi) = 0 for all i = 1, . . . , n), then β˜2 is an unbiased estimator for β2 in model (2). This
is what we learned in Lecture 4 (Chapter 3 in the textbook) on simple regression analysis.
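The way we showed it was to first note that under the true model (2),
$$Y_i = \beta_1 + \beta_2 X_{2i} + v_i \quad\text{and}\quad \bar Y = \beta_1 + \beta_2 \bar X_2 + \bar v,$$
which implies that for each i = 1, . . . , n,
$$Y_i - \bar Y = \beta_2 (X_{2i} - \bar X_2) + (v_i - \bar v), \tag{4}$$
where $\bar v = \frac{1}{n}\sum_{i=1}^{n} v_i$. Then we plug (4) into the OLS formula (3) to get
$$\begin{aligned}
\tilde\beta_2 &= \frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)\left[\beta_2(X_{2i}-\bar X_2) + (v_i-\bar v)\right]}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2} \\
&= \frac{\beta_2\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2 + \sum_{i=1}^{n}(X_{2i}-\bar X_2)(v_i-\bar v)}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2} \\
&= \beta_2\,\frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2} + \frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(v_i-\bar v)}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2} \\
&= \beta_2 + \frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(v_i-\bar v)}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}.
\end{aligned} \tag{5}$$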
Lastly, we denote
$$d_i := \frac{X_{2i}-\bar X_2}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}, \quad i = 1, \dots, n, \tag{6}$$
which allows us to write (5) as
$$\tilde\beta_2 = \beta_2 + \sum_{i=1}^{n} d_i (v_i - \bar v). \tag{7}$$
The unbiasedness can then be deduced by taking expectations on both sides of (7), noting
that the di's are all deterministic, which yields
$$E(\tilde\beta_2) = \beta_2 + \sum_{i=1}^{n} d_i E(v_i - \bar v) = \beta_2 + 0 = \beta_2,$$
using the assumption that E(vi) = 0 for i = 1, . . . , n. The interpretation is that β˜2 is an
unbiased estimator for β2, the population parameter in regression (2).
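The unbiasedness result can also be illustrated numerically. The following is a minimal sketch, not part of the lecture's derivation: the parameter values, the sample size, the range of X2 and the normal disturbances are all hypothetical choices. It holds the X2 observations fixed, redraws the disturbances many times, and compares the average slope with β2. It also checks decomposition (7) on one simulated sample.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x, as in formula (3)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)                        # fixed seed for reproducibility
beta1, beta2 = 1.0, 0.5                       # hypothetical parameters of model (2)
n, reps = 50, 2000
x2 = [rng.uniform(8, 20) for _ in range(n)]   # regressor values, held fixed

# Weights d_i from (6); they depend on the X2 observations only.
x2_bar = sum(x2) / n
sxx = sum((xi - x2_bar) ** 2 for xi in x2)
d = [(xi - x2_bar) / sxx for xi in x2]

slopes = []
for _ in range(reps):
    v = [rng.gauss(0, 2) for _ in range(n)]   # disturbances with E(v_i) = 0
    y = [beta1 + beta2 * xi + vi for xi, vi in zip(x2, v)]
    slopes.append(ols_slope(x2, y))

# Decomposition (7), checked on the last simulated sample.
v_bar = sum(v) / n
rhs_of_7 = beta2 + sum(di * (vi - v_bar) for di, vi in zip(d, v))

mean_slope = sum(slopes) / reps
print(f"last-sample slope: {slopes[-1]:.6f}, right-hand side of (7): {rhs_of_7:.6f}")
print(f"average slope over {reps} samples: {mean_slope:.4f} (true beta2 = {beta2})")
```

Any single estimate differs from β2 because of the disturbance term in (7), but the average over many samples should be close to β2.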
1.1.2 Now: True model is multiple regression (1)
The estimator here is still β˜2 given by (3). But now suppose the true model is actually (1),
which means the partial effect of X2 on Y is characterized by the population parameter β2 in (1).
Again, we impose the usual assumptions of CLRM on the true model, which include that
E(ui) = 0 for i = 1, . . . , n.
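Note that under the true model (1), what we have in (4) is no longer true. Instead, we have
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \quad\text{and}\quad \bar Y = \beta_1 + \beta_2 \bar X_2 + \beta_3 \bar X_3 + \bar u,$$
which implies that
$$Y_i - \bar Y = \beta_2 (X_{2i}-\bar X_2) + \beta_3 (X_{3i}-\bar X_3) + (u_i - \bar u), \tag{8}$$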
for each i = 1, . . . , n. Next, we plug (8) into the OLS formula (3) to understand the properties
of β˜2 under the true model (1). It's a pain in the neck (see footnote 3), but we can do it in the same way as we
derived (5). Omitting some lines of tedious (but simple) derivations, we have
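$$\begin{aligned}
\tilde\beta_2 &= \frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)\left[\beta_2(X_{2i}-\bar X_2) + \beta_3(X_{3i}-\bar X_3) + (u_i-\bar u)\right]}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2} \\
&= \underbrace{\beta_2}_{\text{Term 1}} + \underbrace{\beta_3\,\frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(X_{3i}-\bar X_3)}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}}_{\text{Term 2}} + \underbrace{\frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(u_i-\bar u)}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}}_{\text{Term 3}}.
\end{aligned} \tag{9}$$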
Let’s pause here a bit. We should have a close look at the three terms on the right-hand
side of (9), and compare them with the two terms on the right-hand side of (5).
3 Especially because I did not assume away the intercept term in the regression, as I did in Lecture 4 to
simplify the derivation.
1. Term 1 in (9) is β2, the population parameter in the true model (1). Note that β2
here is different from the first term in (5), which is also denoted 'β2' but was the
population parameter in model (2).
2. Term 3 in (9) is similar to the second term in (5) except that we have a different
disturbance term u here. The zero mean assumption of the disturbance term now
applies to u because it is the disturbance term of the true model. So, using the
assumption E(ui) = 0 for i = 1, . . . , n, we have $E(\bar u) = \frac{1}{n}\sum_{i=1}^{n} E(u_i) = 0$ and hence
$E(u_i - \bar u) = E(u_i) - E(\bar u) = 0$ for i = 1, . . . , n. Then it follows that
$$E(\text{Term 3}) = E\left[\sum_{i=1}^{n} d_i (u_i - \bar u)\right] = \sum_{i=1}^{n} d_i E(u_i - \bar u) = 0, \tag{10}$$
where the deterministic di's are the same as defined in (6).
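3. Term 2 in (9) is completely new. In particular,
$$\text{Term 2} = \beta_3 \cdot \gamma_{X_2,X_3}, \tag{11}$$
where
$$\gamma_{X_2,X_3} := \frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(X_{3i}-\bar X_3)}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}. \tag{12}$$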
The first observation is that γX2,X3 is deterministic because it depends solely on the
observations of the regressors. Therefore,
E(Term 2) = Term 2 = β3γX2,X3 , (13)
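which is nonzero in general. The second observation (see footnote 4) is that γX2,X3 is closely related
to the sample correlation coefficient between X2 and X3:
$$r_{X_2,X_3} = \frac{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(X_{3i}-\bar X_3)}{\sqrt{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2 \sum_{i=1}^{n}(X_{3i}-\bar X_3)^2}}.$$
In fact, we have
$$\gamma_{X_2,X_3} = r_{X_2,X_3} \cdot \underbrace{\sqrt{\frac{\sum_{i=1}^{n}(X_{3i}-\bar X_3)^2}{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}}}_{>0} = r_{X_2,X_3} \cdot \sqrt{\frac{\mathrm{MSD}(X_3)}{\mathrm{MSD}(X_2)}},$$
and hence γX2,X3 and rX2,X3 have the same sign: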
• If X2 and X3 are positively correlated, then γX2,X3 is positive;
• If X2 and X3 are negatively correlated, then γX2,X3 is negative;
• If X2 and X3 are not correlated, i.e. rX2,X3 = 0, then γX2,X3 = 0 and Term 2 in (9) is zero too.
4 You might have also recognized that γX2,X3 defined in (12) is actually the estimated slope coefficient if
you fit a simple regression of X3 (the dependent variable) on X2 (the regressor)! Fitting a regression between
two regressors in a multiple regression should not sound unfamiliar to you, because this appeared in the
‘purged regression’ we discussed in Lecture 5 when presenting the Frisch-Waugh-Lovell theorem.
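The relationship between γX2,X3 and rX2,X3 is easy to check by hand on a small data set. The sketch below uses made-up observations of schooling and work experience (hypothetical values, chosen only so that the two regressors move in opposite directions). It computes γ from definition (12), which is also the slope of a simple regression of X3 on X2, and compares it with r multiplied by the positive scale factor.

```python
import math

def demean(x):
    """Deviations of the observations from their sample mean."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]

# Hypothetical data: years of schooling (X2) and work experience (X3).
x2 = [10, 12, 12, 14, 16, 16, 18, 20]
x3 = [12, 9, 11, 8, 6, 7, 4, 2]

d2, d3 = demean(x2), demean(x3)
s22 = sum(a * a for a in d2)              # sum of squared deviations of X2
s33 = sum(b * b for b in d3)              # sum of squared deviations of X3
s23 = sum(a * b for a, b in zip(d2, d3))  # sum of cross products

gamma = s23 / s22                         # definition (12)
r = s23 / math.sqrt(s22 * s33)            # sample correlation coefficient
gamma_from_r = r * math.sqrt(s33 / s22)   # r times a strictly positive factor

print(f"gamma = {gamma:.4f}, r = {r:.4f}, r * scale = {gamma_from_r:.4f}")
```

Because the scale factor is a ratio of sums of squares, it cannot change the sign, so γ and r are negative together in this example.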
Now, we have reached the point that we can deduce from (9), (13) and (10) that
E(β˜2) = β2 + β3γX2,X3 .
Clearly, β˜2 is a biased estimator for β2, the parameter in the true model (1) interpreted as
the partial effect of X2 on Y , unless β3 = 0 or γX2,X3 = 0. When β˜2 is a biased estimator,
its bias is
E(β˜2)− β2 = β3 · γX2,X3 , (14)
which is often called ‘omitted variable bias’ because the variable X3 is omitted when estimating
the model. We say
• β˜2 has upward bias, or β˜2 overestimates the true parameter, if E(β˜2)− β2 > 0;
• β˜2 has downward bias, or β˜2 underestimates the true parameter, if E(β˜2)− β2 < 0.
Therefore, whether β˜2 overestimates or underestimates the partial effect of
X2 on Y is determined by the signs of β3 and γX2,X3 (we have noted that the sign of γX2,X3 is the same
as the sign of the sample correlation coefficient between X2 and X3). Depending on your
application, the omitted variable bias can be either positive or negative.
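The bias formula (14) can be illustrated with a small simulation. This is only a sketch under assumed values: the parameters, the way X3 is generated from X2, and the normal disturbances are hypothetical choices, not taken from the earnings data. The regressors are held fixed, Y is generated from the true model (1), and the misspecified regression (2) is fitted repeatedly. The last lines also verify decomposition (9) on one sample.

```python
import random

def slope(x, y):
    """Simple-regression slope of y on x (formula (3) with generic x and y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)                  # fixed seed for reproducibility
beta1, beta2, beta3 = 2.0, 1.0, 0.5     # hypothetical parameters of true model (1)
n, reps = 40, 3000

# Fixed regressors; X3 is constructed to be negatively correlated with X2.
x2 = [rng.uniform(8, 20) for _ in range(n)]
x3 = [25 - a + rng.gauss(0, 2) for a in x2]
gamma = slope(x2, x3)                   # gamma in (12): slope of X3 on X2

estimates = []
for _ in range(reps):
    u = [rng.gauss(0, 3) for _ in range(n)]
    y = [beta1 + beta2 * a + beta3 * b + e for a, b, e in zip(x2, x3, u)]
    estimates.append(slope(x2, y))      # regression (2): X3 is omitted

mean_estimate = sum(estimates) / reps
predicted_mean = beta2 + beta3 * gamma  # expectation implied by (14)

# Decomposition (9) on the last sample: Term 1 + Term 2 + Term 3.
x2_bar, u_bar = sum(x2) / n, sum(u) / n
sxx = sum((a - x2_bar) ** 2 for a in x2)
term3 = sum((a - x2_bar) * (e - u_bar) for a, e in zip(x2, u)) / sxx
rhs_of_9 = beta2 + beta3 * gamma + term3

print(f"gamma = {gamma:.3f}")
print(f"simulated mean = {mean_estimate:.3f}, beta2 + beta3*gamma = {predicted_mean:.3f}")
```

With β3 > 0 and γ < 0, as assumed here, the simulated mean falls below β2, which is the downward-bias case.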
For example, in the earnings regression example considered in Lecture 5, X2 (years
of schooling) and X3 (work experience) are negatively correlated. This makes γX2,X3 negative.
On the other hand, β3 is very likely to be positive because it captures the partial
effect of work experience on hourly earnings. Altogether, E(β˜2)−β2 < 0 by (14), and hence
β˜2 underestimates, on average, the effect of schooling on hourly earnings when the work
experience variable is omitted.
The intuitive understanding is shown in Figure 2 which is also given in the textbook.
Figure 2: Direct and indirect effects of X2 when X3 is omitted.
1.2 Effects on the statistical tests and R2
Another consequence of omitting a variable that is in the true model is that the standard
errors of the coefficients and the test statistics are in general invalidated. In principle, this
means that we are unable to test any hypotheses based on the regression results.
In general, it is impossible to determine the contribution to R2 of each explanatory
variable in multiple regression analysis. We can see why in the context of an omitted variable.
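One way to see the difficulty with R2 is to compare the fit of the two simple regressions with the fit of the multiple regression. The sketch below uses a small made-up data set (all numbers are hypothetical, not the textbook's earnings data). When the regressors are correlated, the two simple-regression R2 values need not add up to the multiple-regression R2, so there is no unique way to split R2 between X2 and X3.

```python
def demean(x):
    """Deviations of the observations from their sample mean."""
    m = sum(x) / len(x)
    return [a - m for a in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def r2_simple(x, y):
    """R^2 of a simple regression of y on x (squared sample correlation)."""
    dx, dy = demean(x), demean(y)
    return dot(dx, dy) ** 2 / (dot(dx, dx) * dot(dy, dy))

def r2_multiple(x2, x3, y):
    """R^2 of a regression of y on x2 and x3 with an intercept."""
    d2, d3, dy = demean(x2), demean(x3), demean(y)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (s33 * dot(d2, dy) - s23 * dot(d3, dy)) / den
    b3 = (s22 * dot(d3, dy) - s23 * dot(d2, dy)) / den
    rss = sum((c - b2 * a - b3 * b) ** 2 for a, b, c in zip(d2, d3, dy))
    return 1 - rss / dot(dy, dy)

# Hypothetical data: schooling, work experience, hourly earnings.
schooling = [10, 12, 12, 14, 16, 16, 18, 20]
experience = [12, 9, 11, 8, 6, 7, 4, 2]
earnings = [14.1, 15.2, 17.0, 16.9, 19.5, 20.8, 21.0, 22.6]

r2_s = r2_simple(schooling, earnings)
r2_exp = r2_simple(experience, earnings)
r2_both = r2_multiple(schooling, experience, earnings)

print(f"R2, schooling only:  {r2_s:.3f}")
print(f"R2, experience only: {r2_exp:.3f}")
print(f"R2, both regressors: {r2_both:.3f}")
```

In this example the two regressors are strongly negatively correlated, so each one alone picks up much of the explanatory power of the other.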
2 Including a redundant variable
Next, we analyze the flip side of the coin: What if the simple regression model (2) is true,
but we opt for a more complicated model – the multiple regression (1) with a redundant
variable X3?
First of all, there is an intuitive explanation. Note that the simple regression without
variable X3 is nested in the multiple regression with variable X3, in the sense that the former
can be considered as the latter subject to the parameter restriction that β3 = 0. Specifically,
the true model (2) can be written as
$$Y = \beta_1 + \beta_2 X_2 + \underbrace{0}_{=\beta_3} \cdot X_3 + v. \tag{15}$$
Provided that this is the true model, then the OLS estimator for β2 will be an unbiased
estimator of β2 and the OLS estimator for β3 will be an unbiased estimator for β3 = 0 under
the assumptions of CLRM.
But if you realized beforehand that in the true model we have β3 = 0, then you would
be able to exploit this information to exclude X3 from your regression, which will yield an
efficiency gain in the estimation. Conversely, if you did not realize this but included X3 in
your model specification, then you will face an efficiency loss.
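The two claims, no bias but an efficiency loss, can be previewed with a simulation before the formal analysis. This is a minimal sketch under assumed values: the parameters, the construction of X3 and the normal disturbances are hypothetical. The function `slope_multiple` uses the standard OLS formula for the coefficient on X2 in a two-regressor model with an intercept, written in terms of sums of deviations from the means.

```python
import random

def demean(x):
    m = sum(x) / len(x)
    return [a - m for a in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_simple(x2, y):
    """Coefficient on X2 when Y is regressed on X2 only."""
    d2, dy = demean(x2), demean(y)
    return dot(d2, dy) / dot(d2, d2)

def slope_multiple(x2, x3, y):
    """Coefficient on X2 when Y is regressed on X2 and X3 (with intercept)."""
    d2, d3, dy = demean(x2), demean(x3), demean(y)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    return (s33 * dot(d2, dy) - s23 * dot(d3, dy)) / (s22 * s33 - s23 ** 2)

def variance(z):
    m = sum(z) / len(z)
    return sum((a - m) ** 2 for a in z) / len(z)

rng = random.Random(2)                  # fixed seed for reproducibility
beta1, beta2 = 2.0, 1.0                 # hypothetical parameters of true model (2)
n, reps = 40, 3000
x2 = [rng.uniform(8, 20) for _ in range(n)]
x3 = [25 - a + rng.gauss(0, 2) for a in x2]   # redundant: it never enters y

simple, multiple = [], []
for _ in range(reps):
    v = [rng.gauss(0, 3) for _ in range(n)]
    y = [beta1 + beta2 * a + e for a, e in zip(x2, v)]
    simple.append(slope_simple(x2, y))
    multiple.append(slope_multiple(x2, x3, y))

mean_simple, mean_multiple = sum(simple) / reps, sum(multiple) / reps
var_simple, var_multiple = variance(simple), variance(multiple)
print(f"mean of estimates: simple {mean_simple:.3f}, multiple {mean_multiple:.3f}")
print(f"variance of estimates: simple {var_simple:.4f}, multiple {var_multiple:.4f}")
```

Both averages should be close to β2, while the estimates from the regression that includes the redundant X3 are more dispersed. The gap grows with the correlation between X2 and X3.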
Below we give a more formal analysis on the properties of the OLS estimator for the
effect of X2 on Y when a redundant variable X3 is included in the regression specification.
2.1 No bias
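We first denote
$$s_2^2 = \sum_{i=1}^{n}(X_{2i}-\bar X_2)^2, \quad s_3^2 = \sum_{i=1}^{n}(X_{3i}-\bar X_3)^2, \quad s_{23} = \sum_{i=1}^{n}(X_{2i}-\bar X_2)(X_{3i}-\bar X_3),$$
where $s_2^2$ is essentially $n \cdot \mathrm{MSD}(X_2)$ and $s_3^2$ is essentially $n \cdot \mathrm{MSD}(X_3)$. Then the formula for
βˆ2 in the multiple regression (1) can be written as
$$\hat\beta_2 = \frac{\text{numerator}}{\text{denominator}}, \tag{16}$$
where
$$\text{numerator} = s_3^2 \sum_{i=1}^{n}(X_{2i}-\bar X_2)(Y_i-\bar Y) - s_{23} \sum_{i=1}^{n}(X_{3i}-\bar X_3)(Y_i-\bar Y),$$
$$\text{denominator} = s_2^2 s_3^2 - s_{23}^2 = s_2^2 s_3^2 \left(1 - \frac{s_{23}^2}{s_2^2 s_3^2}\right) = s_2^2 s_3^2 \left(1 - r_{X_2,X_3}^2\right). \tag{17}$$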
Note that the denominator of βˆ2 depends solely on the regressors.
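The formula for βˆ2 and the two forms of its denominator can be checked on a small data set. The sketch below uses made-up numbers (hypothetical observations, used only to exercise the formulas). As an independent cross-check it also computes the coefficient through the ‘purged regression’ route of the Frisch-Waugh-Lovell theorem: remove from X2 the part explained by X3, then regress Y on what is left.

```python
import math

def demean(x):
    m = sum(x) / len(x)
    return [a - m for a in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Hypothetical data, used only to exercise the formulas.
x2 = [10, 12, 12, 14, 16, 16, 18, 20]
x3 = [12, 9, 11, 8, 6, 7, 4, 2]
y = [14.1, 15.2, 17.0, 16.9, 19.5, 20.8, 21.0, 22.6]

d2, d3, dy = demean(x2), demean(x3), demean(y)
s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)

numerator = s33 * dot(d2, dy) - s23 * dot(d3, dy)
denominator = s22 * s33 - s23 ** 2
beta2_hat = numerator / denominator           # formula (16)

# (17): the denominator rewritten with the sample correlation coefficient.
r = s23 / math.sqrt(s22 * s33)
denominator_via_r = s22 * s33 * (1 - r ** 2)

# Cross-check: purge X3 from X2, then regress Y on the purged X2.
purged_x2 = [a - (s23 / s33) * b for a, b in zip(d2, d3)]
beta2_fwl = dot(purged_x2, dy) / dot(purged_x2, purged_x2)

print(f"beta2_hat via (16): {beta2_hat:.6f}, via purged regression: {beta2_fwl:.6f}")
print(f"denominator: {denominator:.6f}, via r: {denominator_via_r:.6f}")
```

The denominator shrinks toward zero as the squared correlation between X2 and X3 approaches one, which is why highly correlated regressors make βˆ2 imprecise.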
Under the true model (2), we have (4). To analyze the behavior of βˆ2 under the true
model (2), we need to plug (4) into the OLS formula (16). Since the denominator does not
depend on Y , all that matters is to see that
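$$\begin{aligned}
\text{numerator} &= s_3^2 \sum_{i=1}^{n}(X_{2i}-\bar X_2)\left[\beta_2(X_{2i}-\bar X_2) + (v_i-\bar v)\right] \\
&\quad - s_{23} \sum_{i=1}^{n}(X_{3i}-\bar X_3)\left[\beta_2(X_{2i}-\bar X_2) + (v_i-\bar v)\right] \\
&= s_3^2 \beta_2 \underbrace{\sum_{i=1}^{n}(X_{2i}-\bar X_2)^2}_{s_2^2} + s_3^2 \sum_{i=1}^{n}(X_{2i}-\bar X_2)(v_i-\bar v) \\
&\quad - s_{23} \beta_2 \underbrace{\sum_{i=1}^{n}(X_{2i}-\bar X_2)(X_{3i}-\bar X_3)}_{s_{23}} - s_{23} \sum_{i=1}^{n}(X_{3i}-\bar X_3)(v_i-\bar v) \\
&= \underbrace{(s_2^2 s_3^2 - s_{23}^2)}_{\text{denominator}} \beta_2 + \sum_{i=1}^{n} s_3^2 (X_{2i}-\bar X_2)(v_i-\bar v) - \sum_{i=1}^{n} s_{23} (X_{3i}-\bar X_3)(v_i-\bar v) \\
&= \beta_2 \cdot \text{denominator} + \sum_{i=1}^{n} \left[s_3^2 (X_{2i}-\bar X_2) - s_{23} (X_{3i}-\bar X_3)\right](v_i-\bar v).
\end{aligned} \tag{18}$$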
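Notice that the denominator given in (17) is simply a number calculated from the observations
of the regressors. For each i = 1, . . . , n, we define
$$d_i^\dagger := \frac{s_3^2 (X_{2i}-\bar X_2) - s_{23} (X_{3i}-\bar X_3)}{\text{denominator}}.$$
Then, dividing both sides of (18) by the ‘denominator’, we have
$$\hat\beta_2 = \beta_2 + \sum_{i=1}^{n} d_i^\dagger (v_i - \bar v), \tag{19}$$
the usual form!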
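Decomposition (19) is an exact algebraic identity, so it can be verified on a single simulated sample. In this sketch the parameter values and the data-generating choices are hypothetical. The weights are computed as the deterministic combination s3²(X2i − X̄2) − s23(X3i − X̄3) divided by the denominator in (17).

```python
import random

def demean(x):
    m = sum(x) / len(x)
    return [a - m for a in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

rng = random.Random(3)                  # fixed seed for reproducibility
beta1, beta2 = 2.0, 1.0                 # hypothetical parameters of true model (2)
n = 30
x2 = [rng.uniform(8, 20) for _ in range(n)]
x3 = [25 - a + rng.gauss(0, 2) for a in x2]   # redundant regressor
v = [rng.gauss(0, 3) for _ in range(n)]
y = [beta1 + beta2 * a + e for a, e in zip(x2, v)]

d2, d3, dy, dv = demean(x2), demean(x3), demean(y), demean(v)
s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
denominator = s22 * s33 - s23 ** 2

# OLS coefficient on X2 in the regression of Y on X2 and X3.
beta2_hat = (s33 * dot(d2, dy) - s23 * dot(d3, dy)) / denominator

# Weights d-dagger_i: they depend on the regressors only, not on Y or v.
d_dagger = [(s33 * a - s23 * b) / denominator for a, b in zip(d2, d3)]
rhs_of_19 = beta2 + dot(d_dagger, dv)

print(f"beta2_hat = {beta2_hat:.6f}, right-hand side of (19) = {rhs_of_19:.6f}")
```

Since the weights are deterministic and E(vi − v̄) = 0, taking expectations of the right-hand side gives β2, exactly as in the argument following (7).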
We should compare the right-hand side of equation (19) and that of equation (7):
1. β2 in (19) is the same as β2 in (7), because they are both the population parameter in the
(assumed) true model (2).
2. The difference between the second term in (19) and that in (7) lies in the difference
between di and d†i. It is clear that di ≠ d†i: the former depends only on the observations
of X2, while the latter depends on observations of both X2 and X3. But they are both
deterministic as in the usual structure.