ECON7310: Elements of Econometrics
Instructions
Answer all questions in a format similar to the answers to your tutorial questions. When
you use R to conduct empirical analysis, you should show your R script(s) and outputs (e.g.,
screenshots for commands, tables, and figures, etc.). You will lose 2 points whenever you fail
to provide R commands and outputs. When you are asked to explain or discuss something,
your response should be brief and compact. To facilitate our grading work, please clearly label
all your answers. You should upload your research report (in PDF or Word format) via the
“Turnitin” submission link (in the “Research Project 1” folder under “Assessment”) by 11:59
AM on the due date April 24, 2022. Do not hand in a hard copy. You are allowed to work
on this assignment in groups; that is, you can discuss how to answer these questions with your
group members. However, this is not a group assignment, which means that you must answer
all the questions in your own words and submit your report separately. The marking system will
check the similarity, and UQ’s student integrity and misconduct policies on plagiarism apply.
Background
Use the cps09mar.csv dataset to estimate the effect of education on earnings. Data description
and variable definitions can be found in the document cps09mar description.pdf. For all
questions below, use the sub-sample of non-Hispanic women at least 23 years old.
Research Questions
1. (20 points) Load this dataset in R (2 points). Create a new variable
wage = earnings/(hours × week).
Obtain summary statistics (mean, standard deviation, and the 25th, 50th (median), and 75th percentiles)
for wage and education (5 points). Plot histograms for these two variables to explore
their distributions. Make your histograms reader-friendly; that is, give informative titles
and variable names instead of just using the default titles and variable names (6
points). For example, you could use Years of Schooling in place of education. Create
a new variable ln(wage)¹ and draw a scatter plot of ln(wage) versus education (5 points).
Comment on the correlation between these two variables (2 points).
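The steps in Question 1 could be organised along the following lines. This is a minimal sketch on simulated data, not a model answer: the data frame `cps`, the helper `describe`, the column `lnwage`, the seed, and every simulated value are assumptions made so that the script runs on its own. With the real file, `read.csv("cps09mar.csv")` followed by the required sub-sample filter would replace the simulation.

```r
# Simulated stand-in for cps09mar.csv; every value here is made up.
# With the real file: cps <- read.csv("cps09mar.csv"), then subset as required.
set.seed(1)
n <- 500
cps <- data.frame(
  education = sample(8:20, n, replace = TRUE),
  hours     = sample(20:60, n, replace = TRUE),
  week      = sample(40:52, n, replace = TRUE)
)
# Simulated annual earnings that rise with education
cps$earnings <- exp(1.5 + 0.1 * cps$education + rnorm(n, sd = 0.5)) *
  cps$hours * cps$week

# Hourly wage as defined in Question 1
cps$wage <- cps$earnings / (cps$hours * cps$week)

# Mean, standard deviation, and the 25th, 50th, and 75th percentiles
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), quantile(x, probs = c(0.25, 0.50, 0.75)))
}
stats <- sapply(cps[, c("wage", "education")], describe)
print(round(stats, 2))

# Histograms with informative titles and axis labels
hist(cps$wage, main = "Distribution of Hourly Wage",
     xlab = "Hourly Wage (dollars)")
hist(cps$education, main = "Distribution of Years of Schooling",
     xlab = "Years of Schooling")

# Log wage, scatter plot, and sample correlation
cps$lnwage <- log(cps$wage)
plot(cps$education, cps$lnwage,
     main = "Log Hourly Wage versus Years of Schooling",
     xlab = "Years of Schooling", ylab = "ln(Hourly Wage)")
r <- cor(cps$lnwage, cps$education)
print(r)
```

The sample correlation `r` gives a number to cite alongside the visual impression from the scatter plot.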
2. (25 points) Take the sub-sample of non-Hispanic women to estimate the simple linear
regression model:
$$\ln(\text{wage}_i) = \beta_0 + \beta_1\,\text{education}_i + e_i, \tag{1}$$
where $e_i$ is the error term and $\beta_0$ and $\beta_1$ are the unknown population coefficients.
¹ In R, the function log() computes logarithms, by default natural logarithms.
(a) (3 points) Report the estimation results in a standard form as introduced in Lecture
5. For example, see page 5, where the estimates are presented in an equation form,
along with standard errors (SE) and some measure for goodness of fit.
(b) (3 points) Plot the estimated regression line you obtained in (a) on the scatter plot
you constructed in Question 1.
(c) (6 points) Interpret the estimated coefficient on education (3 points) and test
whether or not the population coefficient $\beta_1$ is zero at the 1% significance level (3
points).
(d) (6 points) The hourly wage could also depend on one’s work experience. Under what
condition(s) would the estimates in (a) be biased and inconsistent due to the omission
of work experience (4 points)? Explain whether the coefficient on education in
(a) would be overestimated or underestimated (2 points). Hint: Review pages 4 and
5 of Lecture 4.
(e) (7 points) Create a new variable experience = age − education − 6 to measure
one’s work experience. You want to include experience in regression (1) and regress
ln(wage) on education and experience. However, you are not sure whether to also
add a quadratic term, such as $\text{experience}^2$, to the regression equation along with
experience. Use a hypothesis test to help you choose the more appropriate model
(4 points). Estimate your selected model and report the results in a standard form
(3 points).
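The regression steps in Question 2 might be sketched as follows. The data are simulated and the coefficients used to generate them are invented, so the output says nothing about the real sample. The sketch also assumes the default (homoskedasticity-only) standard errors that `summary()` reports for `lm` objects; if the lectures use heteroskedasticity-robust standard errors, the standard errors and test statistics would need to be replaced accordingly.

```r
# Simulated data; the coefficients below are invented for illustration only.
set.seed(2)
n <- 1000
education  <- sample(8:20, n, replace = TRUE)
age        <- sample(23:65, n, replace = TRUE)
experience <- age - education - 6          # as defined in 2(e)
lnwage <- 1 + 0.10 * education + 0.03 * experience -
  0.0005 * experience^2 + rnorm(n, sd = 0.5)

# 2(a): simple regression of ln(wage) on education
m1 <- lm(lnwage ~ education)
print(summary(m1)$coefficients)            # estimates, SEs, t statistics
print(summary(m1)$r.squared)               # one measure of goodness of fit

# 2(b): scatter plot with the fitted regression line
plot(education, lnwage, xlab = "Years of Schooling", ylab = "ln(Hourly Wage)")
abline(m1, col = "red", lwd = 2)

# 2(c): two-sided t test of H0: beta_1 = 0 at the 1% level
t_b1 <- summary(m1)$coefficients["education", "t value"]
crit <- qnorm(0.995)                       # large-sample critical value
reject_b1 <- abs(t_b1) > crit

# 2(e): test H0: the coefficient on experience^2 is zero
m2 <- lm(lnwage ~ education + experience + I(experience^2))
p_quad <- summary(m2)$coefficients["I(experience^2)", "Pr(>|t|)"]
print(p_quad)
```

A small `p_quad` would favour keeping the quadratic term; a large one would favour the model that is linear in experience.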
3. (43 points) With the regression model that you selected in 2(e), you are still concerned
about omitted variable bias. For that reason, you decide to include more control variables
in the regression.
(a) (11 points) Include a set of dummy variables for regions and marital status and
estimate the extended model (4 points). For regions, create dummy variables for
Northeast, South, and West so that Midwest is the excluded group. For marital
status, create variables for married (marital ≤ 3), widowed or divorced, and separated,
so that single (never married) is the excluded group. Report a 95% confidence
interval for the slope coefficient on education (2 points), explain the relationship
between the confidence interval and hypothesis testing (2 points), and test the hypothesis
that one year of additional education would increase hourly wage by 12%
(3 points).
(b) (5 points) Using the estimation results, test the hypothesis that the hourly wage
is not affected by the geographic location (3 points). Explain how you reach your
conclusion (2 points).
(c) (8 points) Include a dummy variable black for black workers (race = 2) in the
model you considered in 3(b) and run OLS estimation. Explain what the estimated
coefficient on black implies for the hourly wage (3 points), compare the effect of being a
black worker and the effect of one year of additional education (2 points), and test
whether these two effects are of the same magnitude (3 points).
(d) (7 points) How would you modify the model to test whether the effect on hourly wage
of one additional year of education differs between black and non-black workers (4
points)? Implement your proposed test and report the results (3 points). Hint: See
pages 27–39 of Lecture 6.
(e) (7 points) Kate has 31 years of work experience. Using the regression model of
3(d), test if one additional year of work experience has significant effects on her
hourly wage (5 points). Provide a formula for calculating this effect (2 points). Hint:
Read pp. 9–17 of Lecture 6.
(f) (5 points) Betty is a married, white woman, working in Boston. After she obtained
her college degree (= 16 years of schooling), she got a job and started working instead
of pursuing further education. She now has five years of experience in the industry.
Predict Betty’s hourly wage.²
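Several of the mechanics in Question 3 can be illustrated together. The sketch below runs on simulated data and rests on assumptions that should be checked against the data description: the region codes (1 = Northeast, 2 = Midwest, 3 = South, 4 = West) and the marital codes for widowed or divorced (4 and 5) and separated (6) are assumed, and the names `exp2`, `widdiv`, and `black_edu` are made up for illustration. It uses the default `lm` standard errors, codes Boston as Northeast, takes the effect of one more year of experience as the discrete change from 31 to 32 years, and back-transforms the predicted ln(wage) with a plain `exp()`, which is only one possible way to move from the log scale to the wage level.

```r
# Simulated data; all numbers are invented. ASSUMED codes (not stated in the
# assignment): region 1 = Northeast, 2 = Midwest, 3 = South, 4 = West;
# marital 4-5 = widowed/divorced, 6 = separated, 7 = never married.
set.seed(3)
n <- 2000
d <- data.frame(
  education = sample(8:20, n, replace = TRUE),
  age       = sample(23:65, n, replace = TRUE),
  region    = sample(1:4, n, replace = TRUE),
  marital   = sample(1:7, n, replace = TRUE),
  race      = sample(1:3, n, replace = TRUE, prob = c(0.8, 0.1, 0.1))
)
d$experience <- d$age - d$education - 6
d$exp2       <- d$experience^2
d$northeast  <- as.numeric(d$region == 1)
d$south      <- as.numeric(d$region == 3)
d$west       <- as.numeric(d$region == 4)
d$married    <- as.numeric(d$marital <= 3)        # stated in 3(a)
d$widdiv     <- as.numeric(d$marital %in% 4:5)    # assumed codes
d$separated  <- as.numeric(d$marital == 6)        # assumed code
d$black      <- as.numeric(d$race == 2)           # stated in 3(c)
d$black_edu  <- d$black * d$education             # interaction for 3(d)
d$lnwage <- 1 + 0.10 * d$education + 0.03 * d$experience - 0.0005 * d$exp2 +
  0.05 * d$married - 0.08 * d$black + rnorm(n, sd = 0.5)

# 3(a): extended model; Midwest and never married are the omitted groups
m_a <- lm(lnwage ~ education + experience + exp2 + northeast + south + west +
            married + widdiv + separated, data = d)
ci_edu <- confint(m_a, "education", level = 0.95)

# t statistic for H0: coefficient on education = 0.12
b  <- coef(summary(m_a))["education", "Estimate"]
se <- coef(summary(m_a))["education", "Std. Error"]
t_012 <- (b - 0.12) / se

# 3(b): joint F test that the three region coefficients are all zero
m_noreg  <- update(m_a, . ~ . - northeast - south - west)
f_region <- anova(m_noreg, m_a)
print(f_region)

# 3(d): allow the return to education to differ for black workers
m_d <- update(m_a, . ~ . + black + black_edu)

# 3(e): change in ln(wage) when experience goes from 31 to 32:
# beta_exp * 1 + beta_exp2 * (32^2 - 31^2) = beta_exp + 63 * beta_exp2
cf <- coef(m_d)
V  <- vcov(m_d)
me31    <- cf["experience"] + 63 * cf["exp2"]
se_me31 <- sqrt(V["experience", "experience"] + 63^2 * V["exp2", "exp2"] +
                2 * 63 * V["experience", "exp2"])
t_me31  <- me31 / se_me31

# 3(f): predicted ln(wage) for the stated profile, then a plain exp()
new <- data.frame(education = 16, experience = 5, exp2 = 25, northeast = 1,
                  south = 0, west = 0, married = 1, widdiv = 0,
                  separated = 0, black = 0, black_edu = 0)
lnw_hat  <- predict(m_d, newdata = new)
wage_hat <- exp(lnw_hat)
print(c(ci_edu, t_012, t_me31, wage_hat))
```

The 95% confidence interval in `ci_edu` contains exactly those hypothesised values of the education coefficient that a two-sided 5% t test would not reject, which is the link that 3(a) asks about.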
4. (12 points) It may be more useful to estimate the effect on earnings of education by using
the highest diploma/degree rather than years of schooling. Define four dummy variables
to indicate educational achievements:
lt_hs = 1 if education < 12
hs = 1 if education = 12
col = 1 if education ≥ 16
some_col = 1 for all other values of education.
(a) (4 points) Create the dummy variables lt_hs, hs, col, and some_col as defined
above and compute the sample means of hourly wage for each of the four education
categories.
(b) (5 points) Replace education in the regression model of 3(d) with these dummies
and estimate their coefficients. Can you obtain OLS estimates for all four
dummies? Explain your answer (3 points). Interpret the coefficient on hs (2
points).
(c) (3 points) Report estimation results of regressions in 2(e), 3(a), 3(c), 3(d), and 4(b)
using a table similar to those presented in your Tutorials 5–6. Hint: If you are not
familiar with LaTeX, you can use the screenreg() function instead of texreg().
² Be careful: the left-hand-side variable is ln(wage), but you are asked to predict Betty’s wage.
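The dummy-variable construction in Question 4 can be sketched in a few lines. The data are simulated, the regression contains only the education dummies rather than the full set of controls from 3(d), and the choice of lt_hs as the omitted category in the second regression is arbitrary; all three are simplifying assumptions for illustration.

```r
# Simulated data; the wage equation is invented for illustration only.
set.seed(4)
n <- 1000
education <- sample(8:20, n, replace = TRUE)
wage <- exp(1 + 0.1 * education + rnorm(n, sd = 0.5))

# Education categories as defined in Question 4
lt_hs    <- as.numeric(education < 12)
hs       <- as.numeric(education == 12)
col      <- as.numeric(education >= 16)
some_col <- 1 - lt_hs - hs - col

# The four categories are mutually exclusive and exhaustive
stopifnot(all(lt_hs + hs + some_col + col == 1))

# 4(a): mean hourly wage by category
category <- ifelse(lt_hs == 1, "lt_hs",
            ifelse(hs == 1, "hs",
            ifelse(col == 1, "col", "some_col")))
means <- tapply(wage, category, mean)
print(means)

# 4(b): with an intercept, the four dummies sum to the constant regressor,
# so lm() cannot estimate all of them and reports NA for one coefficient
m_all <- lm(log(wage) ~ lt_hs + hs + some_col + col)
print(coef(m_all))

# Dropping one category (here lt_hs) removes the perfect collinearity;
# each remaining coefficient is then measured relative to lt_hs
m_ref <- lm(log(wage) ~ hs + some_col + col)
print(coef(m_ref))
```

In this stripped-down regression the coefficient on hs equals the difference in mean ln(wage) between the hs and lt_hs groups; once other controls are included it becomes a difference holding those controls fixed.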