ECO 480
Computer Assignment 2
ECO 480 Econometrics 1 Computer Assignment 2
Instruction: You have approximately three weeks to complete this assignment. No late work will be accepted and all the files must be submitted via UBLearns. Make sure you upload it to the correct submission slot because no credit will be given for incorrect submissions. I do not accept email submissions.
Important: It is extremely important to write a clean, well-commented program for transparency and replication purposes in all empirical work. You should always be able to reproduce your results from the raw data to support your claims.
There are 3 items to hand in: (1) a typed write-up (i.e., a Word file) answering the assigned questions, reporting your results, and interpreting your findings; if a question asks for graphs or tables, these must appear in the Word file in an organized manner with your interpretation; (2) a do-file (i.e., program file); and (3) a log-file (i.e., output file that shows the results). You MUST use Stata. For questions involving data analysis, you will NOT get any credit if you do not provide the program code and the output. You may not use Excel. Do not submit an undigested log-file that contains errors. Put all your answers in the Word file and do NOT say “please see log-file (or do-file) for answers.” You will not receive any credit for answers that are not stated in the Word file.
1. Do citizens demand more democracy and political freedom as their incomes grow? That is, is democracy a normal good? We will use Income_Democracy to explore this question. The data set contains panel data on 195 countries for the years 1960, 1965, ... , 2000. A detailed description is given in Income_Democracy_Description. It contains an index of political freedom/democracy for each country in each year, together with data on each country’s income and various demographic controls. (The income and demographic controls are lagged five years relative to the democracy index to allow time for democracy to adjust to changes in these variables.)
a. Is the data set a balanced panel? Explain.
b. The index of political freedom/democracy is labeled Dem_ind.
i. What are the minimum and maximum values of Dem_ind in the data set? What are the mean and standard deviation of Dem_ind in the data set? What are the 10th, 25th, 50th, 75th, and 90th percentiles of its distribution?
ii. What is the value of Dem_ind for the United States in 2000? Averaged over all years in the data set?
iii. What is the value of Dem_ind for Libya in 2000? Averaged over all years in the data set?
iv. List five countries with an average value of Dem_ind greater than 0.95; less than 0.10; and between 0.3 and 0.7.
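The descriptive questions in (a) and (b) could be approached along the following lines. This is only a minimal sketch: the file name `Income_Democracy.dta` and the identifier names `country` (assumed to be a string) and `year` are assumptions, so check them, including capitalization, against Income_Democracy_Description.

```stata
* Assumed file and variable names; adjust to match the description file
use "Income_Democracy.dta", clear

* xtset needs a numeric panel identifier, so build one from the country name
egen country_id = group(country)
xtset country_id year            // output states whether the panel is balanced

* Min, max, mean, sd, and the 10/25/50/75/90th percentiles in one table
summarize Dem_ind, detail

* Value in 2000 for one country (the spelling of the name is an assumption)
list country year Dem_ind if country == "United States" & year == 2000

* Country averages over all years, then filter by range
preserve
collapse (mean) avg_dem = Dem_ind, by(country)
list country avg_dem if avg_dem > 0.95 & !missing(avg_dem)
restore
```

Note that `xtset` judges balance by the rows present in the file; missing values of Dem_ind within those rows need a separate check, for example with `misstable summarize Dem_ind`.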
c. The logarithm of per capita income is labeled Log_GDPPC. Regress Dem_ind on Log_GDPPC. Use standard errors that are clustered by country.
i. How large is the estimated coefficient on Log_GDPPC? Is the coefficient statistically significant?
ii. If per capita income in a country increases by 20%, by how much is Dem_ind predicted to increase? What is a 95% confidence interval for the prediction? Is the predicted increase in Dem_ind large or small? (Explain what you mean by large or small.) [Hint: The descriptive statistics you computed are helpful for judging whether the increase is large or small.]
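For part (c), a sketch of the clustered regression and the 20% calculation might look like this. It relies on the fact that a 20% rise in income raises Log_GDPPC by ln(1.2), roughly 0.1823, so the predicted change in Dem_ind is 0.1823 times the slope. The file name and the cluster variable `country` are assumptions.

```stata
* Assumed file and variable names; adjust to match the description file
use "Income_Democracy.dta", clear

* Pooled OLS with standard errors clustered by country
regress Dem_ind Log_GDPPC, vce(cluster country)

* ln(1.2) is about 0.1823; lincom scales the slope by that amount and
* reports the implied change in Dem_ind with its 95% confidence interval
lincom 0.1823*Log_GDPPC
```

Comparing the `lincom` estimate with the standard deviation of Dem_ind from the descriptive statistics is one way to judge whether the change is large or small.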
d. Suggest a variable that varies across countries but plausibly does not vary (or varies little) over time that could cause omitted variable bias in the regression in (c). Write down the model that addresses this omitted variable bias concern.
e. Suggest a variable that varies over time but plausibly does not vary (or varies little) across countries that could cause omitted variable bias in the regression in (c). Write down the model that addresses this omitted variable bias concern.
f. Write down your preferred model and estimate it. How do your answers to (c)(i) and (c)(ii) change?
g. What is the omitted variable bias concern in (f)? There are additional demographic controls in the data set. Should these variables be included in the regression? If so, how do the results change when they are included?
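One candidate for the model in (f) combines the ideas in (d) and (e) by including both country and year fixed effects. The sketch below shows how such a specification could be estimated. It is an illustration under assumed file and variable names, not necessarily the preferred model.

```stata
* Assumed file and variable names; adjust to match the description file
use "Income_Democracy.dta", clear
egen country_id = group(country)   // numeric panel identifier for xtset
xtset country_id year

* Country fixed effects via the within estimator, year effects via dummies,
* standard errors clustered by country
xtreg Dem_ind Log_GDPPC i.year, fe vce(cluster country_id)

* Joint test of whether the year effects belong in the model
testparm i.year
```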
h. Based on your analysis, what conclusion do you draw about the effects of income on democracy?
2. In 1993, Georgia initiated the HOPE scholarship program to let state residents who had at least a B average in high school attend public college in Georgia for free. The program is not need-based. Did the program increase college enrollment? Or did it simply transfer funds to families who would have sent their children to college anyway? For this problem we will use the HOPE data set, which comes from Dynarski (2000). She used data on young people in Georgia and neighboring states to assess this question.
The definition of each variable is the following:
• InCollege = A dummy variable equal to 1 if the individual is in college
• AfterGeogia = A dummy variable equal to 1 for Georgia residents after 1992
• Georgia = A dummy variable equal to 1 if the individual is a Georgia resident
• After = A dummy variable equal to 1 for observations after 1992
• Age = Age
• Age18 = A dummy variable equal to 1 if the individual is 18 years old
• Black = A dummy variable equal to 1 if the individual is African-American
• StateCode = State codes
• Year = Year of observation
• Weight = Weight used in Dynarski (2000)
a. Run a basic difference-in-difference model (i.e., treatment, control, and before and after without any additional controls). What is the effect of the program?
b. Calculate the percent of people in the sample in college from the following four groups: (i) Before 1993/non-Georgia, (ii) Before 1993/Georgia, (iii) After 1992/non-Georgia, and (iv) After 1992/Georgia. First use the mean function (Hint: use mean Y if X1 == 0 & X2 == 0). Second, use the coefficients from the OLS output in part (a).
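Parts (a) and (b) can be sketched as follows, assuming the data are saved as `HOPE.dta` (the file name is an assumption). The interaction is built with factor-variable notation from Georgia and After, so the sketch does not depend on the exact spelling of the pre-built interaction dummy.

```stata
* Assumed file name; variable names follow the definitions above
use "HOPE.dta", clear

* Basic difference-in-differences: the coefficient on 1.Georgia#1.After
* is the estimated effect of the program
regress InCollege i.Georgia##i.After

* Raw share in college for one cell (before 1993, non-Georgia), as in the hint;
* repeat with the other three combinations of Georgia and After
mean InCollege if Georgia == 0 & After == 0

* The same four cell means rebuilt from the regression coefficients:
* _cons, _cons + Georgia, _cons + After, and the sum of all four terms
margins Georgia#After
```

Because the model is saturated in the two dummies, the `margins` output should match the four raw means exactly.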
c. Graph the fitted lines for the Georgia and non-Georgia samples.
d. Use the panel data formulation of the difference-in-difference model to control for year and state fixed effects.
e. Add covariates for 18-year-olds and African-Americans to the panel data formulation. What is the effect of the HOPE program?
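The panel data formulation in (d) and (e) replaces the Georgia and After main effects with full sets of state and year dummies. A minimal sketch, assuming the file name `HOPE.dta` and that StateCode and Year are stored as non-negative integers; the variable `treat` is a hypothetical name introduced here:

```stata
* Assumed file name; treat is a hypothetical helper variable
use "HOPE.dta", clear
generate byte treat = Georgia * After   // 1 for Georgia residents after 1992

* (d) State and year fixed effects absorb the Georgia and After main effects
regress InCollege treat i.StateCode i.Year

* (e) Same model with the two individual-level covariates added
regress InCollege treat Age18 Black i.StateCode i.Year
```

In both regressions the coefficient on `treat` is the difference-in-difference estimate.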
f. The way the program was designed, Georgia high school graduates with a B or higher average and annual family income over $50,000 could qualify for HOPE by filling out a simple one-page form. Those with lower income were required to apply for federal aid with a complex four-page form and had any federal aid deducted from their HOPE scholarship. Run separate basic difference-in-difference models for these two groups, which are identified by the variable LowIncome. Comment on the substantive implications of the results.
g. Re-do part (f) in one regression. What is the main advantage of re-doing part (f) in one regression?
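One way to combine the two regressions from (f) is to interact every term of the basic model with LowIncome. The sketch below assumes the file name `HOPE.dta` and that LowIncome is a 0/1 dummy.

```stata
* Assumed file name; LowIncome is assumed to be coded 0/1
use "HOPE.dta", clear

* Fully interacted model: reproduces both group-specific regressions at once
regress InCollege i.Georgia##i.After##i.LowIncome

* 1.Georgia#1.After is the program effect when LowIncome == 0;
* the triple interaction is the difference in effects between the groups,
* and it comes with its own standard error and t-test
lincom 1.Georgia#1.After + 1.Georgia#1.After#1.LowIncome   // effect when LowIncome == 1
```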
3. Researchers have long been interested in the relationship between economic factors and presidential elections. The PresApproval.dta data set includes data on presidential approval polls and unemployment rates by state over a number of years.
The definition of each variable is the following:
• State = state name
• StCode = state numeric ID
• Year = year
• PresApprov = Percent positive presidential approval
• UnemPct = state unemployment rate
• South = binary variable (1 if Southern state, 0 otherwise)
a. Use pooled data for all years to estimate a pooled OLS regression explaining presidential approval as a function of state unemployment rate (i.e., ignore the panel data structure). Report the estimated regression equation, and interpret the result.
b. What is the omitted variable concern in part (a)?
c. Many political observers believe politics in the South are different. Add South as an additional independent variable, and re-estimate the model from part (a). Report the estimated regression equation. Do the results change?
d. Re-estimate the model from part (a), controlling for state fixed effects. [Hint: Instead of creating a dummy variable for each state, you can easily do this by adding i.StCode to your list of regressors.] How does this approach affect the results? What is the omitted variable bias concern in this new model?
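The hint in (d) can be sketched as follows, with the pooled regression from (a) stored alongside it for comparison. The stored-estimate names `pooled` and `state_fe` are arbitrary labels chosen for this illustration.

```stata
use "PresApproval.dta", clear

* (a) Pooled OLS, ignoring the panel structure
regress PresApprov UnemPct
estimates store pooled

* (d) State fixed effects through factor-variable notation
regress PresApprov UnemPct i.StCode
estimates store state_fe

* Side-by-side coefficients and standard errors on UnemPct
estimates table pooled state_fe, keep(UnemPct) se
```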
e. Re-estimate the model from (d) and also add the South variable. What happened to the coefficients on UnemPct and South and to the state fixed effects in the model? Why do you think this happens? Does this model control for differences between southern and other states? [Hint: There are two different ways to think about this: (1) estimating time-invariant characteristics, or (2) perfect multicollinearity.]
f. Re-estimate the model from (d) controlling for both state and year fixed effects (i.e., two-way fixed effects). How does this model affect the results? What is the omitted variable bias concern in this new model? Out of all the regressions you ran for this question, which model do you prefer the most? Briefly explain.
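A two-way fixed effects version of the model in (f) could be written as below. Clustering the standard errors by state is a choice made for this sketch, not something the question specifies.

```stata
use "PresApproval.dta", clear

* State and year fixed effects; clustering by state is an added assumption
regress PresApprov UnemPct i.StCode i.Year, vce(cluster StCode)

* Joint test of whether the year effects matter
testparm i.Year
```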
4. [Extra Credit] Simulation is a powerful methodology for investigating the properties of econometric estimators and tests. The power of the method derives from being able to define and control the statistical environment: the investigator specifies the data-generating process and generates the data used to investigate the properties. We are going to use this simulation method to examine the properties of the OLS estimator that we learned in class. [Hint: Understand the Stata code I posted for this question.]
Suppose we are interested in the effect of education on salary as expressed in the following model:
$$\text{Salary}_i = \beta_0 + \beta_1 \text{Education}_i + \epsilon_i$$
For this problem, we are going to assume that the true model is
$$\text{Salary}_i = 12000 + 1000\,\text{Education}_i + \epsilon_i$$
The model indicates that the salary for each person is $12,000 plus $1,000 times the number of years of education plus the error term for the individual. Our goal is to explore how much our estimate $\hat{\beta}_1$ varies.
I posted code that simulates a data set with 100 observations. Values of education for each observation are between 0 and 16 years. The error term is normally distributed with a standard deviation of 10,000. [Hint: Understand the OLS properties.]
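The posted code is not reproduced here, but a simulation of this kind can be sketched as follows. The program name `sim_ols`, the seed, the number of replications (50), and the choice to draw education from a continuous uniform distribution on [0, 16) are all assumptions made for illustration and may differ from the posted code.

```stata
clear all
set seed 480                       // arbitrary seed so the run is reproducible

* One simulated sample: draw education and errors, build salary, run OLS.
* sim_ols is a hypothetical program name; arguments are sample size and error sd.
program define sim_ols, rclass
    args n sd
    drop _all
    set obs `n'
    generate education = 16*runiform()                      // years in [0, 16)
    generate salary = 12000 + 1000*education + rnormal(0, `sd')
    regress salary education
    return scalar b0 = _b[_cons]
    return scalar b1 = _b[education]
end

* Repeat the experiment and keep the two estimated coefficients from each run
simulate b0 = r(b0) b1 = r(b1), reps(50) nodots: sim_ols 100 10000

* Mean, min, and max of the estimates across simulations
summarize b0 b1
```

Parts (c) through (f) then amount to changing the two arguments, for example `sim_ols 1000 10000` for the larger sample or `sim_ols 100 500` for the smaller error standard deviation.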
a. (5) Explain why the means of the estimated coefficients across the multiple simulations are what they are.
b. (5) What are the minimum and maximum values of the estimated coefficients on education? Explain whether these values are inconsistent with our statement that OLS estimates are unbiased.
c. (5) Rerun the simulation with a larger sample size in each simulation. Specifically, set the sample size to 1,000 in each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.
d. (5) Rerun the simulation with a smaller sample size in each simulation. Specifically, set the sample size to 20 in each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.
e. (5) Reset the sample size to 100 for each simulation, and rerun the simulation with a smaller standard deviation (equal to 500) for each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.
f. (5) Keeping the sample size at 100 for each simulation, rerun the simulation with a larger standard deviation for each simulation. Specifically, set the standard deviation to 50,000 for each simulation. Compare the mean, minimum, and maximum of the estimated coefficients on education to the original results above. Briefly explain.
g. (5) Revert to the original model (sample size of 100 and standard deviation of 10,000). Now run 500 simulations. Summarize the distribution of the $\hat{\beta}_1$ estimates as you’ve done so far, but now also plot the distribution of these coefficients using the code provided. Describe the density plot in your own words.
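For part (g), a density plot of the simulated slopes might be produced along these lines. As before, this is a sketch rather than the provided code: the program name `sim_slope`, the seed, and the uniform draw for education are assumptions.

```stata
clear all
set seed 480                       // arbitrary seed so the run is reproducible

* Hypothetical helper: one simulated sample, returning the OLS slope
program define sim_slope, rclass
    args n sd
    drop _all
    set obs `n'
    generate education = 16*runiform()
    generate salary = 12000 + 1000*education + rnormal(0, `sd')
    regress salary education
    return scalar b1 = _b[education]
end

* 500 replications of the original design (n = 100, error sd = 10,000)
simulate b1 = r(b1), reps(500) nodots: sim_slope 100 10000

summarize b1, detail

* Kernel density of the estimates, with a reference line at the true slope
kdensity b1, xline(1000) title("Distribution of simulated slope estimates")
```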