ECO 480
Computer Assignment 2
ECO 480 Econometrics 1
Computer Assignment 2
Instruction: You have approximately three weeks to complete this assignment. No late work will be
accepted and all the files must be submitted via UBLearns. Make sure you upload it to the correct
submission slot because no credit will be given for incorrect submissions. I do not accept email
submissions.
Important: It is extremely important to write a clean, well-commented program for transparency and
replication purposes in all empirical work. You should always be able to reproduce your results from raw
data to support your claims.
There are 3 items to hand in: (1) Typed write-up (i.e., word-file) answering the assigned questions,
reporting your results, and interpreting your findings; if the question asks for graphs or tables, these must
be in the word-file in an organized manner with your interpretation, (2) do-file (i.e., program file), and (3)
log-file (i.e., output file that shows the results). You MUST use Stata. For questions involving data
analysis, you will NOT get any credit if you do not provide a program code and the output. You may not
use Excel. Do not submit any undigested log-file that contains errors. Put all your answers in the word file
and do NOT say “please see log-file (or do-file) for answers.” You will not receive any credit for answers
that are not stated in the word file.
1. Do citizens demand more democracy and political freedom as their incomes grow? That is, is
democracy a normal good? We will use Income_Democracy to explore this question. This data set
contains panel data from 195 countries for the years 1960, 1965, ... , 2000. A detailed description is
given in Income_Democracy_Description. The data set contains an index of political
freedom/democracy for each country in each year, together with data on each country’s income and
various demographic controls. (The income and demographic controls are lagged five years relative
to the democracy index to allow time for democracy to adjust to changes in these variables.)
a. Is the data set a balanced panel? Explain.
b. The index of political freedom/democracy is labeled Dem_ind.
i. What are the minimum and maximum values of Dem_ind in the data set? What are the
mean and standard deviation of Dem_ind in the data set? What are the 10th, 25th, 50th, 75th,
and 90th percentiles of its distribution?
ii. What is the value of Dem_ind for the United States in 2000? Averaged over all years in the
data set?
iii. What is the value of Dem_ind for Libya in 2000? Averaged over all years in the data set?
iv. List five countries with an average value of Dem_ind greater than 0.95; less than 0.10; and
between 0.3 and 0.7.
c. The logarithm of per capita income is labeled Log_GDPPC. Regress Dem_ind on Log_GDPPC.
Use standard errors that are clustered by country.
i. How large is the estimated coefficient on Log_GDPPC? Is the coefficient statistically
significant?
ii. If per capita income in a country increases by 20%, by how much is Dem_ind predicted to
increase? What is a 95% confidence interval for the prediction? Is the predicted increase in
Dem_ind large or small? (Explain what you mean by large or small.) [Hint: The descriptive
statistics you computed are helpful for judging whether the increase is large or small.]
d. Suggest a variable that varies across countries but plausibly does not vary (or varies little) over
time that could cause omitted variable bias in the regression in (c). Write down the model that
addresses this omitted variable bias concern.
e. Suggest a variable that varies over time but plausibly does not vary (or varies little) across
countries that could cause omitted variable bias in the regression in (c). Write down the model
that addresses this omitted variable bias concern.
f. Write down your preferred model and estimate it. How do your answers to (c)(i) and
(c)(ii) change?
g. What is the omitted variable bias concern in (f)? There are additional demographic controls in the
data set. Should these variables be included in the regression? If so, how do the results change
when they are included?
h. Based on your analysis, what conclusion do you draw about the effects of income on democracy?
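The mechanics behind parts (a), (c), and (f) can be sketched on made-up data. The block below is only an illustration under stated assumptions: the panel is simulated, and the names cid, yr, alpha, toy_lninc, and toy_dem are hypothetical stand-ins, not variables from Income_Democracy. It shows one possible way to check whether a panel is balanced, to cluster standard errors by unit, to turn a 20% increase in a logged regressor into a predicted change with a 95% confidence interval, and to add unit and time fixed effects.

```stata
* Illustration on simulated data; every variable name here is made up.
clear
set seed 480
set obs 50                           // 50 hypothetical countries
generate cid = _n
generate alpha = rnormal(0, 0.2)     // unobserved, time-invariant country effect
expand 9                             // 9 five-year waves per country
bysort cid: generate yr = 1960 + 5 * (_n - 1)
generate toy_lninc = 8 + alpha + rnormal()
generate toy_dem   = 0.1 * toy_lninc + alpha + rnormal(0, 0.1)

* Declare the panel; xtset reports whether it is strongly balanced,
* and xtdescribe lists the participation patterns.
xtset cid yr, delta(5)
xtdescribe

* Pooled OLS with standard errors clustered by country.
regress toy_dem toy_lninc, vce(cluster cid)

* A 20% rise in income raises log income by ln(1.2), about 0.182, so the
* predicted change is ln(1.2) times the slope; lincom also reports a 95% CI.
local d = ln(1.2)
lincom `d' * toy_lninc

* Country fixed effects (within estimator) plus year dummies.
xtreg toy_dem toy_lninc i.yr, fe vce(cluster cid)
lincom `d' * toy_lninc
```

In this simulated design alpha enters both the regressor and the outcome, so the pooled slope picks up part of the country effect while the within estimator does not. Whether anything similar holds in the actual data is for the analysis to establish.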
2. In 1993, Georgia initiated a HOPE scholarship program to let state residents who had at least a B
average in high school attend public college in Georgia for free. The program is not need based. Did
the program increase college enrollment? Or did it simply transfer funds to families who would have
sent their children to college anyway? We will use the HOPE data set for this problem, which comes from
Dynarski (2000). She used data on young people in Georgia and neighboring states to assess this
question.
The definition of each variable is the following:
• InCollege = A dummy variable equal to 1 if the individual is in college
• AfterGeogia = A dummy variable equal to 1 for Georgia residents after 1992
• Georgia = A dummy variable equal to 1 if the individual is a Georgia resident
• After = A dummy variable equal to 1 for observations after 1992
• Age = Age
• Age18 = A dummy variable equal to 1 if the individual is 18 years old
• Black = A dummy variable equal to 1 if the individual is African-American
• StateCode = State codes
• Year = Year of observation
• Weight = Weight used in Dynarski (2000)
a. Run a basic difference-in-difference model (i.e., treatment, control, and before and after without
any additional controls). What is the effect of the program?
b. Calculate the percent of people in the sample in college from the following four groups: (i)
Before 1993/non-Georgia, (ii) Before 1993/Georgia, (iii) After 1992/non-Georgia, and (iv) After
1992/Georgia. First use the mean function (Hint: use mean Y if X1 == 0 & X2 == 0). Second, use
the coefficients from the OLS output in part (a).
c. Graph the fitted lines for the Georgia and non-Georgia samples.
d. Use the panel data formulation of the difference-in-difference model to control for all year and state
fixed effects.
e. Add covariates for 18-year-olds and African-Americans to the panel data formulation. What is the
effect of the HOPE program?
f. The way the program was designed, Georgia high school graduates with a B or higher average
and annual family income over $50,000 could qualify for HOPE by filling out a simple one-page
form. Those with lower income were required to apply for federal aid with a complex four-page
form and had any federal aid deducted from their HOPE scholarship. Run separate basic
difference-in-difference models for these two groups, which are indicated by LowIncome.
Comment on the substantive implication of the results.
g. Re-do part (f) in one regression. What is the main advantage of re-doing part (f) in one
regression?
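The link between the regression coefficients and the four group means in parts (a) and (b) can be illustrated on simulated data. The sketch below rests on assumptions: the data are made up, the names treat, post, treat_post, p, and y are hypothetical, and the "true" effect of 0.08 is an arbitrary choice for the illustration, not a result from Dynarski (2000).

```stata
* Simulated data only; the variable names and the effect size are hypothetical.
clear
set seed 1993
set obs 4000
generate treat = runiform() < 0.3          // 1 = treated group
generate post  = runiform() < 0.5          // 1 = after the policy change
generate treat_post = treat * post         // interaction term
* Linear probability DGP with an assumed treatment effect of 0.08.
generate p = 0.30 + 0.05 * treat + 0.04 * post + 0.08 * treat_post
generate y = runiform() < p

* Basic difference-in-difference regression; the coefficient on
* treat_post is the difference-in-difference estimate.
regress y treat post treat_post

* Group means implied by the coefficients:
*   control, before : _cons
*   treated, before : _cons + treat
*   control, after  : _cons + post
*   treated, after  : _cons + treat + post + treat_post
lincom _cons + treat + post + treat_post

* The raw mean for the same cell, for comparison.
mean y if treat == 1 & post == 1
```

Because the model has one parameter for each of the four cells, the sums of coefficients reproduce the raw cell means; the regression adds standard errors for the differences between them.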
3. Researchers have long been interested in the relationship between economic factors and presidential
elections. The PresApproval.dta data set includes data on presidential approval polls and
unemployment rates by state over a number of years.
The definition of each variable is the following:
• State = state name
• StCode = state numeric ID
• Year = year
• PresApprov = Percent positive presidential approval
• UnemPct = state unemployment rate
• South = binary variable (1 if Southern state, 0 otherwise)
a. Use pooled data for all years to estimate a pooled OLS regression explaining presidential
approval as a function of state unemployment rate (i.e., ignore the panel data structure). Report
the estimated regression equation, and interpret the result.
b. What is the omitted variable concern in part (a)?
c. Many political observers believe politics in the South are different. Add South as an additional
independent variable, and re-estimate the model from part (a). Report the estimated regression
equation. Do the results change?
d. Re-estimate the model from part (a), controlling for state fixed effects. [Hint: Instead of creating
dummy variables for each state, you can easily do this by adding i.StCode to your
regressors.] How does this approach affect the results? What is the omitted variable bias concern
in this new model?
e. Re-estimate the model from (d) and also add the South variable. What happened to the UnemPct and
South variables, and state fixed effects in the model? Why do you think this happens? Does this
model control for differences between southern and other states? [Hint: There are two different
ways to think about this: (1) estimating time-invariant characteristics, or (2) perfect
multicollinearity.]
f. Re-estimate the model from (d) controlling for both state and year fixed effects (i.e., two-way
fixed effects). How does this model affect the results? What is the omitted variable bias concern
in this new model? Out of all the regressions you ran for this question, which model do you prefer
the most? Briefly explain.
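The interaction between unit fixed effects and a time-invariant dummy, as in parts (d) through (f), can be seen on a simulated state-year panel. This is a sketch under stated assumptions: the data are made up and the names sid, region, a, yr, x, and y are hypothetical, not variables from PresApproval.dta.

```stata
* Simulated state-year panel; all names are hypothetical.
clear
set seed 2024
set obs 20                           // 20 made-up states
generate sid = _n
generate region = sid <= 6           // time-invariant dummy
generate a = rnormal()               // unobserved state effect
expand 10                            // 10 years per state
bysort sid: generate yr = 2000 + _n
generate x = rnormal() + 0.5 * a
generate y = 50 - 2 * x + 5 * region + 3 * a + rnormal()

* State dummies plus a time-invariant dummy: region is an exact linear
* combination of the constant and the state dummies, so Stata flags one
* redundant term as omitted (which term can depend on variable order).
regress y x region i.sid

* Two-way fixed effects: state and year dummies together.
regress y x i.sid i.yr
```

Checking which term the output marks as omitted, and whether the slope on x moves when region is added, is a quick way to see what the state dummies already absorb.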
4. [Extra Credit] Simulation is a powerful methodology for investigating the properties of econometric
estimators and tests. The power of the method derives from being able to define and control the
statistical environment in which the investigator specifies the data-generating process and generates the data used to investigate the properties. We are going to use this simulation method to
examine the OLS estimator properties we learned in class. [Hint: Understand the Stata code I posted
for this question.]
Suppose we are interested in the effect of education on salary as expressed in the following
model:
\[ \text{Salary}_i = \beta_0 + \beta_1 \, \text{Education}_i + \epsilon_i \]
For this problem, we are going to assume that the true model is
\[ \text{Salary}_i = 12000 + 1000 \, \text{Education}_i + \epsilon_i \]
The model indicates that the salary for each person is $12,000 plus $1,000 times the number of
years of education plus the error term for the individual. Our goal is to explore how much our
estimate of \(\hat{\beta}_1\), the coefficient on education, varies.
I posted a code that will simulate a data set with 100 observations. Values of education for each
observation are between 0 and 16 years. The error term will be a normally distributed error term
with a standard deviation of 10,000. [Hint: Understand the OLS properties.]
a. (5) Explain why the means of the estimated coefficients across the multiple simulations
are what they are.
b. (5) What are the minimum and maximum values of the estimated coefficients on
education? Explain whether these values are inconsistent with our statement that OLS
estimates are unbiased.
c. (5) Rerun the simulation with a larger sample size in each simulation. Specifically, set the
sample size to 1,000 in each simulation. Compare the mean, minimum, and maximum of
the estimated coefficients on education to the original results above. Briefly explain.
d. (5) Rerun the simulation with a smaller sample size in each simulation. Specifically, set
the sample size to 20 in each simulation. Compare the mean, minimum, and maximum of
the estimated coefficients on education to the original results above. Briefly explain.
e. (5) Reset the sample size to 100 for each simulation, and rerun the simulation with a
smaller standard deviation (equal to 500) for each simulation. Compare the mean,
minimum, and maximum of the estimated coefficients on education to the original results
above. Briefly explain.
f. (5) Keeping the sample size at 100 for each simulation, rerun the simulation with a larger
standard deviation for each simulation. Specifically, set the standard deviation to 50,000
for each simulation. Compare the mean, minimum, and maximum of the estimated
coefficients on education to the original results above. Briefly explain.
g. (5) Revert to the original model (sample size at 100 and standard deviation at 10,000). Now
run 500 simulations. Summarize the distribution of the \(\hat{\beta}_1\) estimates as you’ve
done so far, but now also plot the distribution of these coefficients using code provided.
Describe the density plot in your own words.
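The general shape of such a simulation can be sketched as follows. This is not the code posted for the course; it is a minimal illustration under stated assumptions. The program name olssim, its options n() and sd(), the variable names, the use of a continuous uniform draw for education, and the use of kdensity for the plot are all hypothetical choices.

```stata
* Hypothetical sketch; this is NOT the code posted for the assignment.
capture program drop olssim
program define olssim, rclass
    // One replication: draw a sample, run OLS, return the slope.
    syntax [, n(integer 100) sd(real 10000)]
    drop _all
    set obs `n'
    generate educ   = 16 * runiform()       // assumed: uniform on (0, 16)
    generate salary = 12000 + 1000 * educ + rnormal(0, `sd')
    regress salary educ
    return scalar b1 = _b[educ]
end

set seed 480
* Repeat the experiment 500 times and keep the slope from each run.
simulate b1 = r(b1), reps(500) nodots: olssim, n(100) sd(10000)

* Mean, minimum, and maximum of the simulated slope estimates.
summarize b1

* Kernel density of the estimates across replications.
kdensity b1
```

Changing the values passed to n() and sd() in the simulate line is enough to rerun the experiment with a different sample size or error standard deviation.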