Final Exam Review Problems
These questions resemble how you would apply skills learned in PP208 as a policy researcher or
consumer of policy analysis, and for that reason they resemble questions you will see on your final
exam. With a firm grasp of course material, you should be able to complete this review set using only
your notes, the tables at the back, and a scientific calculator, in far less than 80 minutes.
1. You’re interested in how a person’s birth order and number of siblings affect wages. You regress log
of wages on birth order (1 = first born, 2 = second born, etc.), the number of siblings, years of work
experience, age and father’s educational attainment, for a random sample of working adults:
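(1) $\ln(wage_i) = \beta_0 + \beta_1 BirthOrder_i + \beta_2 Siblings_i + \beta_3 Experience_i + \beta_4 Age_i + \beta_5 FatherEduc_i + u_i$.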
The results of this regression are reported in column (1) of the table below. You also run additional
specifications excluding some of the control variables, reported in columns (2)-(4).
Dependent Variable: ln(wage)
                                (1)        (2)        (3)        (4)
Birth Order                    -0.017     -0.027                -0.028
                               (0.012)    (0.010)               (0.013)
Number of Siblings             -0.007     -0.011                -0.013
                               (0.008)    (0.006)               (0.007)
Years of Work Experience        0.000     -0.006      0.000
                               (0.004)    (0.004)    (0.004)
Age (years)                     0.028      0.030      0.029
                               (0.006)    (0.006)    (0.006)
Father's Education (years)      0.029                 0.031
                               (0.005)               (0.005)
Constant                        5.635      5.973      5.526      6.911
                               (0.179)    (0.174)    (0.174)    (0.029)
Observations                    680        680        680        680
R-squared                       0.1106     0.0650     0.1030     0.0240
SSR                             102.04     107.30     102.91     112.03
Standard errors are reported in parentheses.
a. (10 points) Interpret the coefficient estimate on birth order in column (1).
Being born one position later in the birth order (e.g. second rather than first) is associated, on average, with about 1.7% lower wages, holding the other regressors constant.
b. (10 points) In column (1), can you reject a null hypothesis that Number of Siblings has no effect on
wages (i.e. $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$) at the 95% level? (For the purposes of determining the
critical t value, assume the sample size is close to infinity.)
The t statistic for the test equals $|t| = \left| \hat\beta_2 / se(\hat\beta_2) \right| = 0.007/0.008 = 0.875$, which is lower than the two-sided 5% critical value of 1.96. So we
fail to reject the null.
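The arithmetic in part b can be checked with a short sketch. It only uses the column (1) estimate and standard error from the table and the large-sample critical value of 1.96; the variable names are illustrative.

```python
# t test of H0: beta_2 = 0 against a two-sided alternative,
# using the column (1) estimates for Number of Siblings.
coef = -0.007     # coefficient estimate from the table
se = 0.008        # standard error from the table
critical = 1.96   # two-sided 5% critical value, assuming a very large sample

t_stat = abs(coef / se)
reject = t_stat > critical

print(f"|t| = {t_stat:.3f}, reject H0 at the 5% level: {reject}")
```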
c. (10 points) At what level of confidence would you have rejected the two-sided hypothesis test in
part b? (While you won’t be able to calculate this exactly, you can use the tables at the end of
the exam to give a range).
Assuming the sample size is near infinity, the two-tailed critical values in the attached tables
closest to our test statistic $|t| = 0.875$ are $t_{0.60} = 0.842$ and $t_{0.70} = 1.036$. So the highest confidence level at which you could reject lies between 0.6 and 0.7, but closer to 0.6. You could have rejected at the 60% level (but not the 70%
level), which is quite a low level of confidence. (Using an online resource, you can find that this confidence level is
about 0.62, which corresponds to a two-sided p-value of about 0.38.)
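As a rough check on the range read from the tables, the sketch below computes the highest confidence level at which the null would be rejected, using the standard normal distribution as the large-sample approximation. The confidence level is one minus the two-sided p-value; the helper name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

t_stat = 0.875  # |t| from part b

# Probability mass between -|t| and +|t|: the highest confidence level
# at which the two-sided test still rejects.
confidence = 2.0 * normal_cdf(t_stat) - 1.0
p_value = 1.0 - confidence

print(f"confidence level = {confidence:.2f}, two-sided p-value = {p_value:.2f}")
```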
d. (10 points) The difference between columns (1) and (2) is the inclusion of Father’s Education. By
including Father’s Education, the coefficient on the number of siblings changes from -0.011 to
-0.007 (less negative). Offer a potential explanation for why the coefficient on number of
siblings changed.
Households where the father has more education tend to have fewer children
($\text{Corr}(Siblings, FatherEduc) < 0$), a common pattern where higher-SES families invest more in
fewer children. And father's education is typically positively correlated with
children's earnings ($\beta_5 > 0$) for any number of reasons (social networks, access to better
schools, greater at-home investments that are valued in the market, etc.). So by not controlling for
father's education, the measured effect of number of siblings also captures the negative
effect on wages of the associated lower education/SES of the household.
Another way to look at this is by examining the algebra. The relationship between the coefficients on the
number of siblings in column (1) ($\hat\beta_2$) and in column (2) ($\tilde\beta_2$) is:
$\tilde\beta_2 = \hat\beta_2 + \hat\beta_5 \tilde\delta_1$, where $\tilde\delta_1$ captures $\text{Cov}(Siblings, FatherEduc)$, so that
$\hat\beta_5 \tilde\delta_1 < 0$. So $\tilde\beta_2 < \hat\beta_2$.
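To make the omitted-variable algebra concrete, the sketch below backs out the association between father's education and number of siblings implied by the table. It assumes the usual omitted-variable identity (short coefficient = long coefficient + coefficient on the omitted variable times the slope from regressing the omitted variable on the included one, with the other controls held fixed). The table values are rounded, so the result is only approximate.

```python
# Coefficients on Number of Siblings and Father's Education from the table.
beta_sib_long = -0.007    # column (1), controls for father's education
beta_sib_short = -0.011   # column (2), omits father's education
beta_father = 0.029       # column (1), coefficient on father's education

# Omitted-variable bias: difference between the short and long coefficients.
bias = beta_sib_short - beta_sib_long

# Implied slope of father's education on number of siblings
# (approximate because the table entries are rounded).
implied_delta = bias / beta_father

print(f"bias = {bias:.3f}, implied delta = {implied_delta:.3f}")
```

A negative implied slope is consistent with the explanation that more-educated fathers tend to have fewer children.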
e. (10 points) Comparing columns (1) and (2), note that the standard error on the coefficient on
Number of Siblings is slightly higher in column (1). Explain why this happened (factoring in your
response to part d).
$se(\hat\beta_j) = \sqrt{\dfrac{\hat\sigma^2}{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2 \cdot (1-R_j^2)}}$. We know that including father's education must have
lowered $\hat\sigma^2$, because it's a strong explanatory variable in column (1). So for the standard error in column (1) to
be higher than the standard error in column (2), it must mean that the increase in $R_j^2$ for number of siblings (the
degree to which the other Xs explain the number of siblings) from including father's education
(which raises the standard error) must more than offset the downward pressure on the standard
error from the decrease in $\hat\sigma^2$.
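The two opposing forces can be illustrated numerically. In the sketch below, the error variance estimates come from the SSR row of the table (SSR divided by n - k - 1), while the total variation in siblings (`sst_j`) and the two auxiliary R-squared values are hypothetical numbers chosen only to show the mechanism; they are not reported anywhere in the table.

```python
import math

def se_beta(sigma2, sst_j, r2_j):
    """Standard error of an OLS slope: sqrt(sigma^2 / (SST_j * (1 - R_j^2)))."""
    return math.sqrt(sigma2 / (sst_j * (1.0 - r2_j)))

n = 680
sigma2_col1 = 102.04 / (n - 5 - 1)   # column (1): 5 regressors
sigma2_col2 = 107.30 / (n - 4 - 1)   # column (2): 4 regressors

# Hypothetical values, for illustration only.
sst_j = 3000.0      # total variation in number of siblings
r2_j_col2 = 0.10    # R_j^2 without father's education
r2_j_col1 = 0.30    # R_j^2 with father's education (more collinearity)

se_col1 = se_beta(sigma2_col1, sst_j, r2_j_col1)
se_col2 = se_beta(sigma2_col2, sst_j, r2_j_col2)

print(f"sigma^2: col (1) = {sigma2_col1:.4f}, col (2) = {sigma2_col2:.4f}")
print(f"se:      col (1) = {se_col1:.4f}, col (2) = {se_col2:.4f}")
```

With these illustrative inputs the error variance falls when father's education is added, yet the standard error rises because the rise in R_j^2 dominates.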
f. (10 points) How would you test the joint hypothesis that neither Birth Order nor Number of
Siblings affect ln(wages)? Please explain the logic behind the test, and the test statistic used.
You would test $H_0: \beta_1 = 0, \beta_2 = 0$ (the coefficients on Birth Order and Number of Siblings) using an F test. The logic is the following: We
compare the model fit (as measured by the sum of squared residuals, or the $R^2$) of two
models: a) when the model is unconstrained (you just run the regression in column (1) and let the
data give you the best-fitting coefficient estimates for $\beta_1$ and $\beta_2$); and b) when you
impose the hypothesis on the model, in this case that $\beta_1 = 0, \beta_2 = 0$, which is
essentially column (3), where you drop birth order and number of siblings. If the true model is
such that birth order and number of siblings have small effects on wages, then there won't be
much of a difference between the model fits of the constrained and unconstrained models. In
contrast, if birth order and number of siblings have sizeable effects on wages, then the difference in fit
between the constrained and unconstrained models will be large. The F statistic is a scaled measure of the difference in the two
models' fit, where the larger the difference, the more likely you are to reject the
null.
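As an illustration, the F statistic can be computed from the SSR and R-squared rows of the table, treating column (1) as the unconstrained model and column (3) as the constrained model. The sketch uses the standard formulas F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)] and its R-squared equivalent; the two versions differ slightly because the table entries are rounded. The large-sample 5% critical value for two restrictions is about 3.00.

```python
n = 680   # observations
k = 5     # regressors in the unconstrained model, column (1)
q = 2     # restrictions: coefficients on birth order and siblings both zero

ssr_ur, ssr_r = 102.04, 102.91   # SSR: column (1) and column (3)
r2_ur, r2_r = 0.1106, 0.1030     # R-squared: column (1) and column (3)

# SSR form of the F statistic.
f_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# Equivalent R-squared form (same dependent variable in both models).
f_r2 = ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k - 1))

critical_5pct = 3.00   # F(2, infinity) critical value at the 5% level
reject = f_ssr > critical_5pct

print(f"F (SSR form) = {f_ssr:.2f}, F (R2 form) = {f_r2:.2f}, reject at 5%: {reject}")
```

Under these numbers the statistic falls just short of the 5% critical value, so the joint null would not be rejected at that level.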
2. (The following question is loosely based on a well-known Card and Krueger 1994 study of the effects
of the minimum wage on employment.) In 1992, New Jersey raised its minimum wage from $4.25 to
$5.05, higher than the Federal minimum wage at the time. Pennsylvania, which neighbors New
Jersey, maintained the Federal minimum wage at $4.25. Assume the model determining
employment is:
(1) $\ln(Emp_i) = \beta_0 + \beta_1 NJ_i + \beta_2 Poverty_i + u_i$
where $\ln(Emp_i)$ is the log of the number of employees at store i, $NJ_i$ is a binary
variable that equals 1 if store i is located in New Jersey, $Poverty_i$ is the poverty rate of store i's
neighborhood and $u_i$ is the error term.
a. (10pts) You have cross sectional data on 500 fast food restaurants on the border of New Jersey
and Pennsylvania, from 1992, 6 months after New Jersey raised its minimum wage. You use this
data to estimate equation (1). Explain how your estimate of $\beta_1$ might be biased.
The economic conditions of New Jersey could differ from those of Pennsylvania on a number of dimensions
not captured by poverty rates. New Jersey could have a stronger (weaker) economy with more
(less) economic activity and more (less) consumer demand, i.e.
$\text{Cov}(NJ_i, EconomicStrength_i) > 0$ ($< 0$). Since the independent effect of economic
strength on employment is positive, omitting it gives rise to a positive (negative) bias in the estimate of $\beta_1$.
b. (10pts) Now assume that you have panel data on the 500 fast food restaurants. The panel data
includes information on all 500 restaurants in two periods: 1991 and 1992 (i.e. 6 months before
and 6 months after New Jersey raised its minimum wage). With this panel data, write down:
(1) The difference-in-differences regression equation that allows you to estimate the effect of
raising the minimum wage on employment, using New Jersey as the treatment state, and
Pennsylvania as a comparison group, to control for the common trend in employment;
(2) The corresponding hypothesis test for whether the minimum wage law in NJ has an impact on
employment.
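(1) $\ln(Emp_{it}) = \beta_0 + \beta_1 NJ_i + \beta_2 Post_t + \beta_3 (NJ_i \times Post_t) + \beta_4 Poverty_{it} + u_{it}$, where $Post_t$ equals 1 in 1992 and 0 in 1991.
(2) $H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$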
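To see what the interaction coefficient measures, consider a sketch of the two-group, two-period case without additional controls, where the regression coefficients can be read directly off four group means. The means below are made-up numbers for illustration only; they are not estimates from the study.

```python
# Hypothetical mean ln(employment) by state and period (illustrative only).
# Period 0 = before the minimum wage increase, period 1 = after.
means = {
    ("NJ", 0): 3.00, ("NJ", 1): 3.02,
    ("PA", 0): 3.10, ("PA", 1): 3.05,
}

change_nj = means[("NJ", 1)] - means[("NJ", 0)]   # change in the treated state
change_pa = means[("PA", 1)] - means[("PA", 0)]   # common trend from the control state

# Difference-in-differences estimate: the interaction coefficient.
did = change_nj - change_pa

# The same four means pin down the other regression coefficients.
intercept = means[("PA", 0)]                       # PA, before
nj_effect = means[("NJ", 0)] - means[("PA", 0)]    # level gap between states
post_effect = change_pa                            # time effect shared by both states

print(f"difference-in-differences estimate = {did:.2f}")
```

In this illustrative case the estimate says employment in the treated state rose by about 7 log points relative to the change in the comparison state.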
c. (10pts) Consider your difference-in-differences panel regression specification from part b).
(1) For your panel regression estimate of the effect of the higher minimum wage to be unbiased,
what assumption needs to hold?
(2) And what is an example of an omitted variable in this regression that might violate this
assumption?
(1) Common trends: the changes in the control group over time capture what would have happened to the
treatment group had the policy not been implemented.
(2) Unobserved economic conditions that affect employment changed differentially for treatment
(NJ) and control (PA) stores at the same time as the minimum wage law was implemented.
d. With the same panel data described above, you’re now interested in the causal effect of ln(wages)
on employment in the fast food industry, but you worry about the endogeneity of ln(wages).
(1) Write down the regression equation(s) that allow you to exploit the minimum wage law as an
instrument for ln(wage) changes, to estimate the effect of ln(wages) on employment.
(2) What assumptions need to hold for your instrument to be valid?