EMET8002 Case Studies in Applied Economic Analysis and Econometrics
Question 1: Simple Linear Regression
Download the “states” data from Wattle and open it in Stata. In this question we explore
the relationship between SAT (Scholastic Assessment Test) scores and per-pupil expenditure
in primary and secondary school, using state-level data for the U.S.
(a) Describe the variables of interest (the SAT score, coded as “csat”, and education
expenditure, coded as “expense”) individually, then examine their correlation and a
scatterplot of the two. Are there any outliers?
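In Stata, `summarize csat expense, detail`, `correlate csat expense` and `scatter csat expense` are natural starting points for part (a). As a language-neutral illustration, here is a minimal Python sketch of the same univariate summaries, assuming the standard library only and Python 3.10+. The helper names are hypothetical, and flagging outliers with Tukey's 1.5 × IQR fences is one common convention assumed here, not a rule the question prescribes.

```python
import statistics


def describe(values):
    """Univariate summary of one variable (count, mean, sd, quartiles, range)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Indices of values outside the fences [Q1 - k*IQR, Q3 + k*IQR].

    Tukey's k = 1.5 is an assumed convention, not part of the assignment."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]
```

With `csat` and `expense` loaded as lists of state values, `describe(csat)`, `iqr_outliers(expense)` and `statistics.correlation(expense, csat)` give the summaries, candidate outliers and the Pearson correlation. `statistics.quantiles` uses its own interpolation rule, so quartiles may differ slightly from Stata's percentiles.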
(b) Run a simple linear regression model where “csat” is the dependent (outcome)
variable and “expense” is the independent (explanatory) variable. Do this with and
without accounting for outliers. What changes? Which model do you prefer?
(c) Test whether the distribution of the residuals from your regressions in part (b)
follows a normal distribution. Does the normality assumption hold?
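One way to check the normality of residuals is the Jarque-Bera test, sketched below in plain Python. This is a swapped-in illustration: after `predict r, residuals`, the usual Stata commands are `swilk r` (Shapiro-Wilk) and `sktest r` (a skewness and kurtosis test), which use different statistics, so their p-values will not match this one. The chi-squared(2) reference distribution is only asymptotic, so with a state-level sample the p-value should be read as approximate. The helper names are hypothetical.

```python
import math


def simple_residuals(x, y, intercept, slope):
    """Residuals e_i = y_i - (a + b*x_i) from a fitted simple regression."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def jarque_bera(residuals):
    """Jarque-Bera statistic and asymptotic p-value.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S and K the moment-based sample
    skewness and kurtosis. Under normality JB is asymptotically chi-squared
    with 2 df, whose survival function is exp(-x / 2)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)
```

A small p-value is evidence against normal residuals; a large one means the test found no such evidence, not that normality is proven.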
Question 2: Multiple Linear Regression and Quantile Regression
We continue working with the “states” dataset. In this question we explore the relationship
between SAT (Scholastic Assessment Test) scores and the following four variables:
(1) per-pupil expenditure in primary and secondary school ("expense"), (2) % of high school
graduates taking the SAT ("percent"), (3) median household income in $1,000 ("income"),
and (4) % of adults with a college degree ("college"). The data are provided at the state
level for the U.S.
(a) Describe the five variables of interest individually as well as their correlations.
(b) Run a multiple linear regression model where “csat” is the dependent (outcome)
variable and the other four variables are the independent (explanatory) variables.
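A minimal pure-Python sketch of the multiple regression, solving the normal equations (X'X)β = X'y with Gauss-Jordan elimination. Up to rounding, the coefficients are the OLS estimates that `regress csat expense percent income college` reports in Stata; standard errors are not computed here. Forming X'X is numerically less stable than QR-based solvers, which is acceptable for a five-coefficient illustration. The helper names `solve` and `ols` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination
    with partial pivoting. The inputs are copied, not modified."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS with an intercept; rows[i] holds the regressors of observation i.

    Returns (coefficients, residuals), with coefficients[0] the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(beta, xi)) for xi, yi in zip(X, y)]
    return beta, resid
```

Assuming each variable is loaded as a list, `ols(list(zip(expense, percent, income, college)), csat)` returns the coefficients and the residuals needed for part (c).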
(c) Test whether the distribution of the residuals from your regressions in part (b)
follows a normal distribution. Does the normality assumption hold?
(d) Instead of a multiple linear regression, which models the conditional mean of test
scores as in part (b), run quantile regressions to estimate the conditional median
(0.5 quantile), 10th percentile (0.1 quantile) and 90th percentile (0.9 quantile) of the mean
test scores. Use the same dependent and independent variables as in your model from part (b).
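Quantile regression replaces squared-error loss with the check loss ρ_q(r) = r(q - 1[r < 0]), so q = 0.5 gives median (least absolute deviations) regression. In Stata this is `qreg csat expense percent income college, quantile(0.5)`, and likewise with `quantile(0.1)` and `quantile(0.9)`. Solving the problem exactly calls for linear programming, so the pure-Python sketch below instead uses an iteratively reweighted least-squares (majorize-minimize) approximation to a slightly smoothed check loss, with an assumed smoothing constant `eps`. Its estimates approximate, rather than exactly reproduce, those from an exact solver, and all helper names are hypothetical.

```python
def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a @ x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_ls(X, y, w, shift):
    """Solve (X'WX) beta = X'Wy + shift * X'1 for diagonal weights w."""
    k = len(X[0])
    lhs = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(k)]
           for a in range(k)]
    rhs = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y))
           + shift * sum(xi[a] for xi in X) for a in range(k)]
    return solve(lhs, rhs)


def quantile_regression(rows, y, q, n_iter=500, eps=1e-6, tol=1e-10):
    """Approximate linear quantile regression (with intercept) at quantile q.

    Uses rho_q(r) = |r|/2 + (q - 1/2) r and bounds each |r_i| by a quadratic
    at the current residual, so every step is a weighted least-squares solve
    with weights 1 / (eps + |r_i|). eps smooths the kink at zero."""
    X = [[1.0] + list(r) for r in rows]
    beta = weighted_ls(X, y, [1.0] * len(y), 0.0)  # OLS starting values
    for _ in range(n_iter):
        resid = [yi - sum(c * v for c, v in zip(beta, xi)) for xi, yi in zip(X, y)]
        w = [1.0 / (eps + abs(r)) for r in resid]
        new = weighted_ls(X, y, w, 2 * q - 1)
        change = max(abs(u - v) for u, v in zip(new, beta))
        beta = new
        if change < tol:
            break
    return beta
```

Calling `quantile_regression(rows, csat, q)` with q = 0.1, 0.5 and 0.9 on the regressors from part (b) gives three coefficient vectors; comparing them with the OLS estimates shows whether a regressor's association with test scores differs across the distribution.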
Question 3: Preparation for the Research Report [not required for the problem set]
Last week we discussed some aspects of the research report (worth 45% of your final mark);
we now continue preparing for both the report and the research proposal. We strongly
recommend starting work on the project as soon as possible.
(a) Look at the research report section on Wattle and discuss the structure of the final
research report.
(b) What data are required to replicate the papers? What are the data sources? If you need
to apply for data through the Australian Data Archive, we recommend starting the process
now.
(c) As part of the project you are required to replicate and extend one of the papers. First
of all, explain in your own words what is meant by replicating the main findings of a
paper.
(d) Now explain in your own words what is meant by extending the results of a paper.
(e) As an example, consider the possible extension of updating the data (e.g., using new
waves of data). Discuss some ideas on how this could be turned into a research question
and supported by economic theory and/or the academic literature.