ECONOMETRICS
LECTURE SLIDES #7A
TOPICS
Adjusted R-Squared
Qualitative Information
Dummy Variable and Multiple Groups
Key references: 6.3, 7.1, 7.2 and 7.3
MORE ON GOODNESS OF FIT
General remarks on R-squared
High R-squared does not imply there is a causal interpretation
Low R-squared does not preclude precise estimation of
marginal effects
R-squared will always increase (at least never decrease) when we
add an extra variable
How can we construct a version of R-squared that takes this fact
into account?
MORE ON GOODNESS OF FIT...
Adjusted R-squared accounts for degrees of freedom
$\bar{R}^2 = 1 - \dfrac{SSR/(n-k-1)}{SST/(n-1)}$
Adjusted R-squared imposes a penalty for adding new
regressors
Adjusted R-squared may increase or decrease when a variable
is added
Potentially useful in comparing models with alternative
numbers of regressors
Adjusted R-squared may be negative
$\bar{R}^2 = 1 - (1 - R^2)(n-1)/(n-k-1)$
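The two expressions for adjusted R-squared are the same quantity. A short derivation, assuming the standard definitions ($SSR$ is the residual sum of squares, $SST$ the total sum of squares, $n$ the number of observations, $k$ the number of regressors excluding the intercept) and $R^2 = 1 - SSR/SST$:

```latex
\begin{align*}
\bar{R}^2 &= 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} \\
          &= 1 - \frac{SSR}{SST}\cdot\frac{n-1}{n-k-1} \\
          &= 1 - (1 - R^2)\,\frac{n-1}{n-k-1}
\end{align*}
```

Since $(n-1)/(n-k-1) > 1$ whenever $k \geq 1$, the last line also shows that $\bar{R}^2$ turns negative when $R^2 < k/(n-1)$.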
ADJUSTED R-SQUARED IN STATA
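. reg lsalary lsales

      Source |       SS       df       MS              Number of obs =     209
-------------+------------------------------           F(1, 207)     =   55.30
       Model |  14.0661688     1  14.0661688           Prob > F      =  0.0000
    Residual |  52.6559944   207  .254376785           R-squared     =  0.2108
-------------+------------------------------           Adj R-squared =  0.2070
       Total |  66.7221632   208  .320779631           Root MSE      =  .50436

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lsales |   .2566717   .0345167     7.44   0.000     .1886224    .3247209
       _cons |   4.821997   .2883396    16.72   0.000     4.253538    5.390455
------------------------------------------------------------------------------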
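The reported R-squared and adjusted R-squared can be checked by hand from the sums of squares in the Stata output. The sketch below is in Python rather than Stata, purely for illustration. The function names are made up for this example, and the final "negative" case uses hypothetical numbers that are not taken from any data set.

```python
# Values copied from the Stata output for: reg lsalary lsales
ssr = 52.6559944   # Residual SS
sst = 66.7221632   # Total SS
n = 209            # Number of obs
k = 1              # one regressor (lsales), intercept not counted


def adjusted_r2_from_ss(ssr, sst, n, k):
    """Adjusted R-squared as 1 - (SSR/(n-k-1)) / (SST/(n-1))."""
    return 1.0 - (ssr / (n - k - 1)) / (sst / (n - 1))


def adjusted_r2_from_r2(r2, n, k):
    """Adjusted R-squared as 1 - (1 - R^2)(n-1)/(n-k-1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2 = 1.0 - ssr / sst
adj_from_ss = adjusted_r2_from_ss(ssr, sst, n, k)
adj_from_r2 = adjusted_r2_from_r2(r2, n, k)

print(f"R-squared     = {r2:.4f}")           # 0.2108
print(f"Adj R-squared = {adj_from_ss:.4f}")  # 0.2070
print(f"Adj R-squared = {adj_from_r2:.4f}")  # 0.2070 (same quantity)

# Hypothetical illustration: a weak fit with several regressors and few
# observations gives a negative adjusted R-squared.
adj_negative = adjusted_r2_from_r2(0.02, 20, 3)
print(f"Hypothetical  = {adj_negative:.4f}")  # -0.1638
```

With 207 residual degrees of freedom the penalty is tiny, which is why the two measures differ only in the third decimal place here.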
USING FIT TO CHOOSE BETWEEN MODELS
If models are nested – one is a special case of other
$y = \beta_0 + \beta_1 x + u$
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$
First model is nested within second: it is the special case $\beta_2 = 0$
Could choose between models on basis of test of $H_0: \beta_2 = 0$
Could also decide on basis of fit using $\bar{R}^2$
Implicitly selecting a specific critical value
$\bar{R}^2$ for second model exceeds that of first iff t-statistic for
estimate of $\beta_2$ is greater than one in absolute value
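The critical value of one can be derived in a few lines. A sketch, assuming the first model has one regressor ($n-2$ residual degrees of freedom) and the second has two ($n-3$), writing $SSR_1$ and $SSR_2$ for their residual sums of squares and $t$ for the t-statistic on the added term, and using the fact that $F = t^2$ for a single restriction. Both models share the same $SST/(n-1)$, so comparing $\bar{R}^2$ is the same as comparing $SSR/df$:

```latex
\begin{align*}
t^2 = F &= \frac{SSR_1 - SSR_2}{SSR_2/(n-3)}
  \quad\Longrightarrow\quad
  SSR_1 = SSR_2\left(1 + \frac{t^2}{n-3}\right) \\
\bar{R}^2_2 > \bar{R}^2_1
  &\iff \frac{SSR_2}{n-3} < \frac{SSR_1}{n-2} \\
  &\iff \frac{n-2}{n-3} < 1 + \frac{t^2}{n-3} \\
  &\iff t^2 > 1
\end{align*}
```

Selecting by adjusted R-squared is therefore much more lenient toward the extra regressor than a conventional 5% test, whose critical value is roughly two.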
USING FIT TO CHOOSE BETWEEN MODELS...
Models are nonnested if neither model is special case of other
$rdintens = \beta_0 + \beta_1 \log(sales) + u$
$rdintens = \beta_0 + \beta_1 sales + \beta_2 sales^2 + u$
Can’t impose restrictions on $\beta_1$ & $\beta_2$ to move to log model
Testing option not available but can compare fit
Using RDCHEM data, log model has $R^2 = .061$ while quadratic model
yields $R^2 = .148$, but comparison unfair to first model
$\bar{R}^2 = 0.030$ for log & $\bar{R}^2 = 0.090$ for quadratic model
Even after adjusting for difference in degrees of freedom quadratic
model is preferred
USING FIT TO CHOOSE BETWEEN MODELS...
Models with different dependent variables will typically
be non-nested
Here neither R-squared nor adjusted R-squared should be
used for comparison
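Continuing the previous example, what if the comparison is between
$\log(rdintens) = \beta_0 + \beta_1 sales + \beta_2 sales^2 + u$
$rdintens = \beta_0 + \beta_1 sales + \beta_2 sales^2 + u$
Now not possible to compare fit
Comparing how well variation in $\log(rdintens)$ is explained
versus how well variation in $rdintens$ is explained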
Extent of variation in these two could be very different
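To see why the two fits are not comparable, recall that R-squared is one minus SSR divided by SST, and SST is computed from whichever dependent variable the model uses. The sketch below uses a small hypothetical set of positive values (illustrative only, not the RDCHEM data) to show how different the total variation in a variable and in its log can be.

```python
import math


def total_sum_of_squares(values):
    """SST: sum of squared deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical positive outcome values, chosen only for illustration.
y = [1.0, 2.0, 4.0, 8.0, 16.0]
log_y = [math.log(v) for v in y]

sst_level = total_sum_of_squares(y)
sst_log = total_sum_of_squares(log_y)

print(f"SST of y      = {sst_level:.3f}")
print(f"SST of log(y) = {sst_log:.3f}")
```

Each R-squared is a share of a different total, so a higher value in one model says nothing about which model fits better.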
QUALITATIVE INFORMATION
Thus far variables have been quantitative – number of
bedrooms, years of education, hourly wage, …
Many features likely to appear in analyses are
qualitative
Gender of individual, their occupation, whether they are
employed or not, …
Industry classification of firm, its credit rating, whether or not it
paid a dividend last quarter, …
One way to incorporate qualitative information is to use
dummy (binary, indicator) variables
Equals 1 or 0 representing presence or absence of feature
May appear as dependent or as independent variables
DUMMY EXPLANATORY VARIABLE
Single dummy independent variable
$wage = \beta_0 + \delta_0 female + u$
$female = 1$ if person is a woman & $female = 0$ otherwise
Choice of which group gets the value 1 is arbitrary
Using zero/one also arbitrary but useful for
interpretation
In our example, being a woman is the category for which the dummy
variable equals 1. By using the binary variable female,
we have chosen men to be the base/benchmark group.
DUMMY VARIABLE
$wage = \beta_0 + \delta_0 female + u$
Specified model is regression representation of
conditional means:
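$E(wage \mid female = 0) = \beta_0$
$E(wage \mid female = 1) = \beta_0 + \delta_0$
$E(wage \mid female = 1) - E(wage \mid female = 0) = \delta_0$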
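A small numerical check of the conditional-mean interpretation: in a regression of wage on a single dummy, the OLS intercept equals the mean wage of the base group (men) and the dummy coefficient equals the difference in group means. The data below are hypothetical and the helper name `ols_simple` is made up for this sketch.

```python
def ols_simple(x, y):
    """OLS intercept and slope for a regression of y on one regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: female = 1 for women, 0 for men; wage in arbitrary units.
female = [0, 0, 0, 1, 1]
wage = [10.0, 12.0, 14.0, 9.0, 11.0]

beta0_hat, delta0_hat = ols_simple(female, wage)

# Group means computed directly from the data.
men = [w for f, w in zip(female, wage) if f == 0]
women = [w for f, w in zip(female, wage) if f == 1]
mean_men = sum(men) / len(men)
mean_women = sum(women) / len(women)

print(beta0_hat, mean_men)                  # intercept vs. mean wage of men
print(delta0_hat, mean_women - mean_men)    # dummy coefficient vs. gap in means
```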
DUMMY EXPLANATORY VARIABLE
Have relied on zero conditional mean (ZCM) assumption
To better estimate gender effect need to control for other
factors
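$wage = \beta_0 + \delta_0 female + \beta_1 educ + u$
$\delta_0 = E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ)$
$\delta_0$ represents difference in mean wage between men & women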
with the same education
DUMMY EXPLANATORY VARIABLE
Implication of this particular model
Difference does not depend on level of education
Data determine this difference
Graphically, model specifies an intercept shift according to gender
DUMMY VARIABLE TRAP
What happened to the male dummy?
Why can’t we estimate
$wage = \beta_0 + \gamma_0 male + \delta_0 female + \beta_1 educ + u$?
Answer: There is a perfect multicollinearity problem (MLR.3 not
satisfied)
$male + female = 1$
Male & female dummy variables are perfectly collinear with
the intercept
An example of the dummy variable trap
More later when we talk about multiple groups
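The trap can be seen numerically: when an intercept, a male dummy and a female dummy all enter the regression, the matrix X'X is singular, so the OLS normal equations have no unique solution. The sketch below uses hypothetical data and leaves out the education variable for brevity; the collinearity among the constant and the two dummies is present with or without it. The helper names are made up for this example.

```python
def gram(columns):
    """X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical sample of five people.
female = [1, 0, 1, 0, 0]
male = [1 - f for f in female]   # male + female = 1 for every observation
const = [1] * len(female)        # the intercept column

det_trap = det3(gram([const, male, female]))  # intercept + both dummies
det_ok = det2(gram([const, female]))          # intercept + one dummy

print(det_trap)  # 0: X'X is singular, perfect multicollinearity
print(det_ok)    # 6: nonzero, so the model with one dummy is estimable
```

Dropping either dummy (or the intercept) removes the exact linear dependence. The omitted category then becomes the base group.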