ECMT2150 – Lecture 5
Asymptotic Properties of OLS
Topics Today
Week 5
Asymptotic Properties of OLS – Chp 5
Review using indicator variables – Chp 7, 7.1-7.4
Linear Probability Model – Chp 7, 7.5
Reference: Chapter 5 & 7
Asymptotic Properties of OLS
Where are we at?
In previous lectures we focused on properties of OLS that hold for any
sample
Properties of OLS that hold for any sample/sample size
– Expected values/unbiasedness under MLR.1 – MLR.4
– Variance formulas under MLR.1 – MLR.5
– Gauss-Markov Theorem under MLR.1 – MLR.5
– Exact sampling distributions/tests under MLR.1 – MLR.6
Properties of OLS that hold in large samples
– Consistency under MLR.1 – MLR.4
– Asymptotic normality/tests under MLR.1 – MLR.5
Asymptotic Theory = Large Sample Theory
Note: the large-sample results hold without assuming normality of the error term (MLR.6 = normality of the error term!)
Consistency
• Interpretation:
– Consistency means that the probability that the estimate is arbitrarily close to the true population value can be made arbitrarily high by increasing the sample size
– Consistency is a minimum requirement for sensible estimators (why?)
An estimator θ̂n is consistent for a population parameter θ if:
P(|θ̂n − θ| > ε) → 0 as n → ∞, for arbitrary ε > 0 and all θ
Alternative notation: plim θ̂n = θ
(the estimate converges in probability to the true population value)
Consistency
– consistency means that as the sample size grows
large, it is more and more unlikely for an estimator to
be far away from the true values.
– with larger sample size, we have more information,
and the estimator should get closer and closer (in
probability sense) to its true value
– Clive Granger (Nobel prize winning econometrician):
“If you can’t get it right as n goes to infinity, you shouldn’t be
in this business”
Sampling Distributions as n ↑
[Figure: sampling distributions of β̂1 for sample sizes n1 < n2 < n3; the distributions tighten around the true β1 as n grows]
Compare Consistency to Unbiasedness
• Unbiasedness is a feature of an estimator for a given
sample size
• Consistency involves the behaviour of the sampling
distribution of the estimator as the sample size gets large
– If an estimator is not consistent then it does not help us learn
about the parameter of interest, even with an unlimited amount
of data!
Confused about unbiasedness vs. consistency?
See Ben Lambert’s YouTube video:
https://www.youtube.com/watch?v=6i7mqDJICzQ
Compare Consistency to Unbiasedness
An Example
We have a random sample y1, y2, …, yn from a population with mean μ and variance σ²
Consider two estimators for the population mean μ:
1. y1, the first observation from this sample
2. ȳ, the sample mean
Both are unbiased: E(y1) = E(ȳ) = μ
Compare Consistency to Unbiasedness
An Example continued
But only one – the sample mean – is consistent
Unbiased estimators are not necessarily consistent, but those whose variances shrink to zero as the sample size grows are consistent
Formally:
If Wn is an unbiased estimator of θ and Var(Wn) → 0 as n → ∞, then plim Wn = θ
As the sample size grows, Var(ȳ) = σ²/n → 0, but Var(y1) = σ² is unchanged
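A small simulation (a NumPy sketch; the values μ = 5 and σ = 2 are illustrative) makes the contrast concrete: across many replications both estimators average out to μ, but only the variance of the sample mean shrinks toward zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0
reps, n = 5_000, 500

samples = rng.normal(mu, sigma, size=(reps, n))
first_obs = samples[:, 0]            # estimator 1: the first observation, y1
sample_mean = samples.mean(axis=1)   # estimator 2: the sample mean, y-bar

# Both estimators are unbiased: their averages across replications are near mu
print(round(first_obs.mean(), 2), round(sample_mean.mean(), 3))

# But only the sample mean is consistent: Var(y-bar) = sigma^2 / n -> 0,
# while Var(y1) = sigma^2 no matter how large n gets
print(round(first_obs.var(), 2), round(sample_mean.var(), 4))
```

Re-running with a larger n leaves the first line essentially unchanged while the variance of the sample mean shrinks further, which is exactly the plim statement above.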
Consistency
Theorem (Consistency of OLS)
Under assumptions MLR.1 – MLR.4, plim β̂j = βj as n → ∞, for j = 0, 1, …, k
– consistency means that as the sample size grows large, it is more and more unlikely for the OLS estimators to be far away from the true values.
– with larger sample size, we have more information, and the estimator should get closer and closer (in probability sense) to its true value.
Consistency
• Theorem – Consistency of OLS
• Special case: the simple regression model
• We can use Assumption MLR.4 to obtain that the OLS estimator for β1 is consistent:
E(u | x1, x2, …, xk) = 0 ⇒ Cov(xj, u) = 0
• But we actually only need a weaker set of assumptions for OLS to be consistent: MLR.4′
E(u) = 0 and Cov(xj, u) = 0, for j = 1, 2, …, k
All explanatory variables must be uncorrelated with the error term. This assumption is weaker than the zero conditional mean assumption MLR.4
One can see that the slope estimate is consistent if the explanatory variable is exogenous, i.e. uncorrelated with the error term:
plim β̂1 = β1 + Cov(x1, u)/Var(x1) = β1 if Cov(x1, u) = 0
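The plim result can be illustrated with a short simulation (a sketch; the data-generating process and the values β0 = 1, β1 = 0.5 are made up for illustration). The slope is computed as Cov(x, y)/Var(x), and because x is exogenous the estimate collapses onto the true value as n grows, even with a non-normal error:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 0.5

def ols_slope(n):
    # Simple regression y = beta0 + beta1*x + u with an exogenous regressor:
    # Cov(x, u) = 0 holds by construction, and u is deliberately non-normal
    x = rng.normal(size=n)
    u = rng.exponential(1.0, size=n) - 1.0   # skewed error with E(u) = 0
    y = beta0 + beta1 * x + u
    return np.cov(x, y, bias=True)[0, 1] / x.var()  # beta1-hat = Cov(x,y)/Var(x)

for n in (50, 5_000, 500_000):
    print(n, round(ols_slope(n), 4))  # estimates collapse toward beta1 = 0.5
```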
Compare Consistency to Unbiasedness
Theorem 2.1 (Unbiasedness of OLS)
Under assumptions MLR.1 – MLR.4, E(β̂j) = βj, for j = 0, 1, …, k
• Interpretation of unbiasedness
– The estimated coefficients may be smaller or larger, depending on
the sample that is the result of a random draw
– However, on average, they will be equal to the values that
characterize the true relationship between y and x in the population
– On average means if the sampling were repeated (i.e., draw the
random sample and do the estimation over and over again)
Normality
Recall that if we make the classical linear model assumptions (MLR.1 – MLR.6), we get:
Theorem 4.1 (Normal sampling distributions)
β̂j ~ Normal(βj, Var(β̂j)): the estimators are normally distributed around the true parameters, with the variance that was derived earlier (wk3)
(β̂j − βj)/sd(β̂j) ~ Normal(0, 1): the standardised estimators follow a standard normal distribution (wk4)
Asymptotic Normality
The problem is: in practice, the normality assumption MLR.6 is often questionable
– The good news is: OLS estimates are normal in large samples even without MLR.6
Theorem – Asymptotic normality of OLS
Under assumptions MLR.1 – MLR.5:
(β̂j − βj)/sd(β̂j) ~ᵃ Normal(0, 1)
also plim σ̂² = σ²
In large samples, the standardised estimates are normally distributed
Asymptotic Normality
Moreover, under assumptions MLR.1 – MLR.5, as n → ∞:
(β̂j − βj)/se(β̂j) ~ᵃ Normal(0, 1)
where se(β̂j) is the usual OLS standard error
Bottom Line: The theorem shows that, as sample size increases, the OLS estimator follows a normal distribution
• Using this, we can construct confidence intervals and conduct hypothesis testing as before
• We need to estimate the variance of β̂j, i.e. estimate σ²
Remember that σ̂² = SSR/(n − k − 1) is a consistent estimator of σ²
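A simulation sketch (illustrative values, not from the lecture's data) shows the theorem at work: even with a clearly skewed error term, the standardised slope estimates are approximately standard normal in a moderately large sample.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, beta1 = 500, 2_000, 0.7
t_stats = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)
    u = rng.exponential(1.0, size=n) - 1.0      # clearly non-normal error, E(u) = 0
    y = beta1 * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2_hat = resid @ resid / (n - 2)        # sigma^2-hat = SSR / (n - k - 1)
    se = np.sqrt(sigma2_hat / ((x - x.mean()) ** 2).sum())
    t_stats[r] = (b[1] - beta1) / se            # standardised slope estimate

# Even though u is skewed, the standardised estimates behave like N(0, 1):
# mean near 0, standard deviation near 1
print(round(t_stats.mean(), 2), round(t_stats.std(), 2))
```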
Summary
• Inference in Linear Regression
– Hypothesis tests – one-sided, two-sided t-tests
– p-values
– Confidence intervals
– Testing more general alternatives
• An estimate is equal to a constant
• One estimate is equal to another (linear combo)
• Testing multiple linear restrictions (exclusion restrictions)
• Asymptotic Properties of OLS
– Consistency
– Asymptotic normality
Now what?
• Review/extend using indicator variables for qualitative data
– Add this to quadratics/polynomials, logs
=> a flexible toolkit for model specification
• Week 6:
– Specification Errors I – omitted variables & irrelevant variables
– Endogeneity, and one solution
• Week 7:
– Specification Errors II – Functional form misspecification,
measurement error, outliers, missing data
Later:
• Specification Errors III: Heteroskedasticity. Linear
Probability Model
• Instrumental variables
• Panel Data models
What about qualitative
information?
Indicator Variables
(aka ‘Dummy’ Variables)
Wooldridge 7.1 – 7.4
Categorical and Discrete Variables
Often a variable is not continuous (numerical), but rather is either:
Binary: takes on two values (e.g. yes/no; male/female); or
Categorical: takes on a limited number of values (e.g. states)
Examples:
– gender
– highest qualification
– nationality
– transport method
Indicator (or Dummy) Variables
Indicator (or Dummy) Variables are qualitative measures
indicating the presence or absence of an attribute or category
– Binary (or 0-1) variable
• Typically, 1 indicates attribute is present (e.g. 1 = male)
and 0 indicates attribute is absent (0 = female)
• Formally, for dummy variable d
– d = 1 if some attribute or category is present
– d = 0 otherwise (i.e. absent)
Dummy variables are used extensively with both cross-sectional
and time-series data
Examples
• Let d be a dummy variable
– d = 1 if a person is female, 0 otherwise
– d = 1 if a person has a postgraduate degree, 0 otherwise
– d = 1 if a person belongs to a union, 0 otherwise
– d = 1 if the time is after 11/09/2001, 0 otherwise
– d = 1 if the firm is in the manufacturing industry, 0
otherwise
Dummy variables in regressions
Key difference between two cases:
– Dummy variables as explanatory variables
• As control variables
• For treatment effects
• Also useful in panel data models
– Dummy variables as dependent variables
• Linear probability models (LPM)
• Discrete choice models
• Logit / probit
Regression with Indicator
Variables
Regression with indicator variables
Consider a regression with a single indicator variable d and a
quantitative x variable:
yi = β0 + β1xi + δ0di + ui
– model when di = 0
yi = β0 + β1xi + ui
– model when di = 1
yi = (β0 + δ0) + β1xi + ui
• The inclusion of di allows us to estimate separate
intercepts – but the same slope – for different groups
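As a sketch (simulated data with made-up coefficients, not a dataset from the unit), fitting this specification by least squares recovers a separate intercept for each group but one common slope:

```python
import numpy as np

# Simulated data (illustrative): true model y = 2 + 0.8*x + 1.5*d + u
rng = np.random.default_rng(2)
n = 2_000
x = rng.uniform(0, 10, n)
d = rng.integers(0, 2, n)                  # 0/1 indicator variable
y = 2.0 + 0.8 * x + 1.5 * d + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x, d])    # [constant, x, dummy]
b0_hat, b1_hat, d0_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"intercept for d = 0: {b0_hat:.2f}")            # estimates beta0
print(f"intercept for d = 1: {b0_hat + d0_hat:.2f}")   # beta0 + delta0
print(f"common slope:        {b1_hat:.2f}")            # beta1, same for both groups
```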
Example when δ0 > 0
[Figure: two parallel lines with common slope β1; the d = 1 line, y = (β0 + δ0) + β1x, lies above the d = 0 line, y = β0 + β1x, by the intercept shift δ0]
Coefficient on the Indicator Variable
• The intercept depends on whether d = 0 or d = 1
– β0: intercept for the category assigned to 0 (the base category)
– (β0 + δ0): intercept for the category assigned to 1
– the dummy variable coefficient δ0 measures the difference in the intercept between the two groups
• e.g., a wage regression with female = 1 for a female, 0 for non-female:
wage = β0 + β1 educ + δ0 female + u
• δ0 measures the wage difference between males and females controlling for the influence of education (i.e. the gender wage gap)
[Figure: wage equation with a gender dummy – an intercept shift with δ0 < 0]
Inference and Interpretation in
Models with Indicator Variables
Dummy variable hypothesis tests
To test for a difference between the two categories we can use a t-test:
H0: δ0 = 0 H1: δ0 ≠ 0
(This is just a standard t-test)
Note that the slope β1 is the same for both categories
– i.e., the slope coefficient for x is not affected by d
• e.g., in the wage example β1 measures the effect of another year of education on salary
– an average of the effects for males and females
Example 1: WAMs
A dataset has information on the determinants of uni WAMs
and we can use the data to illustrate the interpretation of
dummy variables:
Consider the model
WAM = β0 + δ0 male + β1 hsWAM + u
• where
WAM = weighted average of university marks
male = 1 for males
= 0 for females (the base category)
hsWAM = high-school WAM
Example 1 continued
• predicted WAM for a female student
WÂM = 75.03 + 0.0785 hsWAM
• predicted WAM for a male student
WÂM = (75.03+ 0.023) + 0.0785 hsWAM
= 75.053 + 0.0785 hsWAM
Controlling for high-school grades, males and females receive very similar university WAMs on average:
– use the t-statistic to test whether there is a significant difference between males and females
– p-value = 0.9003
=> we don’t reject H0: δ0 = 0 in this case
Dummy Variable Trap
Why not include a dummy variable for both males and
females?
– if the model has a constant term, this will lead to
perfect collinearity between the explanatory variables
– the constant term is an x variable which takes the value
1 for all observations
– if we include dM = 1 (males), dF = 1 (females), and a
constant in our regression equation
constant = dM + dF
• perfect collinearity → violate MLR.3 → can’t estimate the
model
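The trap is easy to see numerically. In this sketch, including both dummies alongside a constant makes the design matrix rank-deficient, so OLS cannot be computed; dropping one dummy fixes it:

```python
import numpy as np

d_m = np.array([1, 1, 0, 0, 1])        # dummy: male
d_f = 1 - d_m                          # dummy: female
const = np.ones(5)                     # the constant term "regressor"

# const = d_m + d_f for every observation -> perfect collinearity (violates MLR.3)
X_trap = np.column_stack([const, d_m, d_f])
print(np.linalg.matrix_rank(X_trap))   # 2, not 3: X'X is singular

# Dropping one dummy (the base category) removes the collinearity
X_ok = np.column_stack([const, d_m])
print(np.linalg.matrix_rank(X_ok))     # 2 = number of columns: full column rank
```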
Avoiding the Dummy Variable Trap
Usual Solution: Omit one category
– e.g., for gender, which has 2 categories, either male or
female must be omitted
If we have a categorical variable with m categories:
=> include (m − 1) dummy variables to avoid DVT
– the category for which no dummy variable is included is
then the base category (as before)
– e.g., for state, which has 6 categories, only include 5
dummy variables
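A minimal sketch of this rule (the state labels are illustrative): for a categorical variable with m categories, build m − 1 dummy columns and leave the base category out:

```python
import numpy as np

states = np.array(["NSW", "VIC", "QLD", "NSW", "SA", "VIC"])
categories = np.unique(states)     # m = 4 categories here
base = "NSW"                       # chosen base (omitted) category

# m - 1 dummy columns: one for each category except the base
dummies = np.column_stack(
    [(states == c).astype(int) for c in categories if c != base]
)
print(dummies.shape)               # (6, 3): m - 1 = 3 dummies for 6 observations
```

Observations in the base category get a row of all zeros, so the regression constant plays the role of the base-category intercept, exactly as in the two-category case.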
Example 1 continued
Let’s see how uni WAM depends on the year of study and
high-school WAM:
WAM = β0 + δ2 2ndYr + δ3 3rdYr + β1 hsWAM + u
2ndYr = 1 if student is in 2nd year, 0 otherwise
3rdYr = 1 if student is in 3rd year, 0 otherwise
By excluding 1st-year students we assign them to be the base
category
– one category must be omitted to avoid the DVT
– Interpretation will depend on the category chosen
Example 1 continued
WAM = β0 + δ2 2ndYr + δ3 3rdYr + β1 hsWAM + u
– β0 is the intercept for 1st-year students (base category)
– δ2 and δ3 tell us by how much the intercepts of the other categories differ from 1st-year students
This means, for:
– 1st-years: E(WAM | hsWAM, 2ndYr=0, 3rdYr=0) = β0 + β1 hsWAM
– 2nd-years: E(WAM | hsWAM, 2ndYr=1, 3rdYr=0) = (β0 + δ2) + β1 hsWAM
– 3rd-years: E(WAM | hsWAM, 2ndYr=0, 3rdYr=1) = (β0 + δ3) + β1 hsWAM
Multiple Qualitative Variables
Regression analysis can easily be extended to handle more
than one qualitative variable:
– e.g., a salary equation with gender, occupation, and
level of education dummy variables
– no limit to the number of dummy variable categories
– for each qualitative variable with m categories, we
include m − 1 dummy variables
– one base category for each qualitative variable, or set of
dummy variables
In the following example, we investigate qualitative variables for both gender and year at uni
Example 1 continued
Estimated regression equation:
WÂM = 75.03 + 0.0785 hsWAM + 1.075 2ndYr
− 2.010 3rdYr+ 0.023 male
Estimated mean for male 3rd yrs
WÂM |m3 = (75.03 + 0.023 – 2.010 ) + 0.0785 hsWAM
Estimated mean for female 2nd yrs
WÂM |f2 = (75.03 + 1.075 ) + 0.0785 hsWAM
Example 1 continued
Interpreting the estimated coefficients:
constant – female students in their first year (the base category) have an average WAM = 75.03 + 0.0785 hsWAM
2ndYr – all else equal, 2nd-years have a WAM that is 1.075 points higher than 1st-years
male – all else equal, male students have a WAM that is 0.023 points higher
hsWAM – all else equal, a 1-point increase in high-school WAM leads to a 0.0785-point increase in uni WAM
Interactive indicator variables
So far, we have only allowed the intercept to change across
categories, but slopes may also differ across categories:
Consider
y = β0 + δ0d + β1x + δ1(d x) + u
– We interact the dummy variable d with the continuous
variable x
In this case
E(y|x, d=0) = β0 + β1x
E(y|x, d=1) = (β0 + δ0) + (β1 + δ1) x
Dummy variable interaction with a continuous variable
y = β0 + δ0d + β1x + δ1(d x) + u
For the d = 1 group, relative to the d = 0 group:
– the intercept increases by δ0 (as before)
– the slope increases by δ1
Another way to write the same model:
y = (β0 + δ0d )+ (β1+ δ1d )x + u
• the coefficient of x is not constant; it depends on d
– slope changes depending on the category measured by d
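A short simulated sketch (coefficients chosen for illustration) shows both shifts being recovered at once: the fitted line for d = 1 has a different intercept and a different slope from the d = 0 line:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
x = rng.uniform(0, 10, n)
d = rng.integers(0, 2, n)
# True model: intercept shifts by delta0 = 2.0 and slope by delta1 = -0.3 when d = 1
y = 1.0 + 2.0 * d + 0.5 * x - 0.3 * d * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), d, x, d * x])     # interaction column is d*x
b0, dlt0, b1, dlt1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"d = 0 line: intercept {b0:.2f}, slope {b1:.2f}")
print(f"d = 1 line: intercept {b0 + dlt0:.2f}, slope {b1 + dlt1:.2f}")
```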
Example of δ0 > 0 and δ1 < 0
[Figure: the d = 0 line y = β0 + β1x and the d = 1 line y = (β0 + δ0) + (β1 + δ1)x; the d = 1 line starts higher (δ0 > 0) but is flatter (δ1 < 0), so the two lines cross]
Dummy variable interacted with another dummy
• Consider: y = β0 + δ1d1 + δ2d2 + δ3(d1 d2) + u
• Thus:
– δ1 is the increase when d1 = 1
– δ2 is the increase when d2 = 1
– δ3 is the additional increase when d1 = d2 = 1
Interactive dummy variables: Example
• Estimated wage equation with interaction term: the coefficient on female is −.227 and the coefficient on female·educ is −.0056
– The interaction coefficient is small and insignificant: no evidence against the hypothesis that the return to education is the same for men and women
– Does this imply there is no significant evidence of lower pay for women at the same levels of educ, exper, and tenure? No: −.227 is only the gender gap at educ = 0. We need to re-centre the interaction term, e.g. at the average educ = 12.5:
Δlog(wage) = −.227 − (.0056 × 12.5) = −.297
Interactive dummy variables: Example
[Two estimated specifications shown side by side: Case 1 and Case 2]
Interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women
Testing for differences across groups
Suppose we want to test whether one group is different
from another in a regression:
– e.g., females are different from males in the model
Estimate:
GPA = β0 + δ0 female + β1 sat + δ1 female·sat + β2 hsperc + δ2 female·hsperc + β3 tothrs + δ3 female·tothrs + u
where: sat = standardised aptitude test score
hsperc = high-school rank percentile
tothrs = total hours spent in college courses
Testing for differences across groups
• Unrestricted model (contains the full set of interactions):
GPA = β0 + δ0 female + β1 sat + δ1 female·sat + β2 hsperc + δ2 female·hsperc + β3 tothrs + δ3 female·tothrs + u
• Restricted model (same regression for both groups):
GPA = β0 + β1 sat + β2 hsperc + β3 tothrs + u
Testing for differences across groups
• Null hypothesis: H0: δ0 = δ1 = δ2 = δ3 = 0
– all interaction effects are zero, i.e. the same regression coefficients apply to men and women
• Estimation of the unrestricted model
– tested individually, the hypothesis that the interaction effects are zero cannot be rejected
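The joint test is the familiar F test of exclusion restrictions, F = [(SSR_r − SSR_ur)/q] / [SSR_ur / (n − k − 1)]. A sketch on simulated data (the data-generating process and variable names are made up for illustration, not the textbook's GPA dataset) computes it directly:

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(4)
n = 400
female = rng.integers(0, 2, n)
sat = rng.normal(1000, 100, n)
# Illustrative DGP in which the groups genuinely differ (intercept shift of 1.0)
y = 1.0 + 0.002 * sat + 1.0 * female + rng.normal(0, 0.4, n)

const = np.ones(n)
X_r = np.column_stack([const, sat])                          # restricted: no female terms
X_ur = np.column_stack([const, sat, female, female * sat])   # + dummy and interaction
q = 2                                                        # number of restrictions tested
dof = n - X_ur.shape[1]                                      # n - k - 1

F = ((ssr(X_r, y) - ssr(X_ur, y)) / q) / (ssr(X_ur, y) / dof)
print(round(F, 1))   # far above any conventional F critical value -> groups differ
```

Note that the restricted model is nested in the unrestricted one, so SSR_r ≥ SSR_ur always holds and F is never negative.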