ECON7310: Elements of Econometrics
Research Project 1
Instructions
Please answer all questions following a format similar to the answers to your tutorial questions.
When you use R to conduct empirical analysis, you should show your R script(s) and outputs
(e.g., screenshots of commands, tables, and figures). You will lose 2 points whenever you fail
to provide R commands and outputs. Please clearly label all your answers and keep your
response brief and concise. You should upload your research report (in PDF or Word format)
via the Turnitin submission link (in the “Research Project 1” folder under “Assessment”) by
11:59 AM on the due date, September 18, 2023. You are allowed to work on this assignment
in groups; however, you must answer all the questions in your own words and submit your
report separately. The marking system will check for similarity, and UQ’s student integrity
and misconduct policies on plagiarism will apply.
Background
Use the cps09mar.csv dataset to estimate the effect of education on earnings. The data
description and variable definitions can be found in the document cps09mar description.pdf.
For all questions below, use the sub-sample of individuals who are non-Hispanic and at
least 22 years old.
Research Questions
1. (10 points) Create two new variables: wage = earnings/(hours × week) and ln(wage).¹
Plot histograms for these two variables to explore their distributions.
¹ In R, the function log() by default computes natural logarithms.
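The sample restriction and the two variables in Question 1 could be set up roughly as follows. This is only a sketch on simulated data, not the real file: the column names hisp and age used in the filter, and the coding hisp = 1 for Hispanic, are assumptions that should be checked against the data description, and all values are made up.

```r
# Toy stand-in for cps09mar.csv. With the real file one would start from
# read.csv("cps09mar.csv"); the hisp/age names and coding are assumptions.
set.seed(1)
n <- 200
cps <- data.frame(
  age      = sample(18:65, n, replace = TRUE),
  hisp     = rbinom(n, 1, 0.2),                 # 1 = Hispanic (assumed coding)
  earnings = runif(n, 10000, 90000),            # annual earnings
  hours    = sample(20:50, n, replace = TRUE),  # hours worked per week
  week     = sample(30:52, n, replace = TRUE)   # weeks worked per year
)

# Keep non-Hispanic individuals aged 22 or older
sub <- subset(cps, hisp == 0 & age >= 22)

# Hourly wage and its natural logarithm (log() is base e by default)
sub$wage  <- sub$earnings / (sub$hours * sub$week)
sub$lwage <- log(sub$wage)

# One histogram per variable
hist(sub$wage,  main = "wage",     xlab = "Hourly wage")
hist(sub$lwage, main = "ln(wage)", xlab = "Log hourly wage")
```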
2. (15 points) You have read in the news that women make 70 cents for every dollar earned
by men. To investigate this phenomenon, you first regress ln(wage) on a constant and
a binary variable, which takes on a value of 1 for females and is 0 otherwise.
(a) (3 points) Report the estimation result in the standard equation form as introduced
in Lecture 5, where the estimates are presented along with standard errors and
some measure of goodness of fit.
(b) (9 points) Based on the estimation result in part (a), calculate the female earnings
as a percentage of the male earnings. Indicate whether or not the percentage
difference in the mean wages is statistically significant.
(c) (1 point) How would you test whether or not women earn less than men on average
(in percentage terms)?
(d) (2 points) Are these results enough to argue that there is discrimination against
females in the labor market? Why or why not?
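As a rough illustration of the mechanics behind Question 2, the sketch below fits a log-wage regression on a female dummy and converts the coefficient into a percentage. The data are simulated, the variable name female is an assumption, and lm() reports homoskedasticity-only standard errors, so the course's preferred (possibly robust) standard errors may differ.

```r
# Simulated data: the true gap of -0.25 log points is made up
set.seed(2)
n <- 500
female <- rbinom(n, 1, 0.5)
lwage  <- 3 - 0.25 * female + rnorm(n, sd = 0.5)

fit2 <- lm(lwage ~ female)
b <- coef(summary(fit2))["female", ]  # Estimate, Std. Error, t value, p-value

# exp(beta) is the female-to-male ratio of (geometric mean) wages
ratio <- exp(b[["Estimate"]])

# Percentage gap implied by the log-point estimate
pct_gap <- 100 * (ratio - 1)

# One-sided test of H0: beta >= 0 against H1: beta < 0
p_one_sided <- pt(b[["t value"]], df = df.residual(fit2))
```

For small coefficients, 100 × beta is close to the percentage gap, but the exp() conversion avoids the approximation error when the gap is large.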
3. (15 points) You recall from your textbook that additional years of education are
supposed to result in higher earnings. For that reason, you decide to include the
education variable in the regression in question 2.
(a) (8 points) Report the estimation results. What is the effect of an additional year
of education on earnings (“returns to education”) for men? For women?
(b) (5 points) For a given level of education, how much less do females earn on average?
Does this result represent stronger evidence of discrimination against females?
(c) (2 points) To investigate whether or not there is discrimination against females,
you regress the log of earnings on determining variables, such as education, and a
binary variable for females. You consider two possible specifications. First, you run
two separate regressions, one for females and one for the others. Second, you run a
single regression but allow for the binary variable to appear in the regression. Your
professor suggests that the latter option is better for the task at hand, as long as
you allow for a shift in both the intercepts and the slopes. Explain her reasoning.
4. (15 points) You read in the literature that there should also be returns to on-the-job
training. To approximate on-the-job training, researchers often use the so-called Mincer
or potential experience variable, which is defined as exper = age - education - 6.
(a) (4 points) Under what condition(s) would the estimates in Question 3 be biased
and inconsistent due to the omission of the work experience?
(b) (8 points) You incorporate the experience variable into your regression in Question
3. Report the estimation results and interpret the estimated coefficients.
(c) (3 points) Draw scatter plots of ln(wage) versus exper for female workers with at
least a Bachelor’s degree or equivalent.
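The potential experience variable and the scatter plot in Question 4 might be constructed along these lines. This is a sketch on simulated data: the name female and the cutoff education >= 16 for "at least a Bachelor's degree" are assumptions to be verified against the data description.

```r
# Simulated data; coefficients in the data-generating line are made up
set.seed(3)
n <- 300
d <- data.frame(
  age       = sample(22:65, n, replace = TRUE),
  education = sample(c(12, 14, 16, 18), n, replace = TRUE),  # years of schooling
  female    = rbinom(n, 1, 0.5)
)

# Mincer potential experience, as defined in the question
d$exper <- d$age - d$education - 6
d$lwage <- 2 + 0.1 * d$education + 0.01 * d$exper - 0.2 * d$female +
  rnorm(n, sd = 0.4)

# Log-wage regression with gender, education, and experience
fit4 <- lm(lwage ~ female + education + exper, data = d)

# Female workers with at least a Bachelor's degree (assumed: 16+ years)
grads <- subset(d, female == 1 & education >= 16)
plot(grads$exper, grads$lwage, xlab = "exper", ylab = "ln(wage)")
```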
5. (20 points) You suspect the relationship between ln(wage) and exper is not linear. To
test this idea, you add the square of experience to your log-linear regression in Question
4.
(a) (6 points) Test for the significance of the coefficient of the quadratic term. Is it
meaningful? Are there strong reasons to assume that this specification is superior
to the previous one?
(b) (2 points) Has the coefficient on education changed much compared to the
estimation results in Questions 3 and 4? Why or why not?
(c) (6 points) Bob is a 40-year-old male high school graduate. What is the effect of an
additional year of experience on his hourly wages? Predict his hourly wages.²
(d) (6 points) What is the effect of an additional year of experience on the hourly wage
of a person who has 20 years of work experience, holding constant the gender and
the education variables? Calculate the 95% confidence interval of the estimated
effect. Is it a significant effect?
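With a quadratic in experience, the effect of one more year depends on the level of experience. The sketch below, on simulated data, computes the derivative-based effect at exper = 20 with a 95% confidence interval from the coefficient covariance matrix, and turns a predicted ln(wage) into a wage level. The variable names, the coding of a high school graduate as 12 years of education, and the use of lm()'s default standard errors are all assumptions.

```r
# Simulated data; coefficients in the data-generating line are made up
set.seed(4)
n <- 400
d <- data.frame(
  female    = rbinom(n, 1, 0.5),
  education = sample(c(12, 14, 16, 18), n, replace = TRUE),
  age       = sample(22:65, n, replace = TRUE)
)
d$exper <- d$age - d$education - 6
d$lwage <- 1.5 + 0.1 * d$education + 0.04 * d$exper - 0.0007 * d$exper^2 -
  0.2 * d$female + rnorm(n, sd = 0.4)

fit5 <- lm(lwage ~ female + education + exper + I(exper^2), data = d)
b <- coef(fit5)
V <- vcov(fit5)

# Marginal effect at exper = x0: b_exper + 2 * b_exper2 * x0
x0 <- 20
w  <- c(0, 0, 0, 1, 2 * x0)          # weights on the five coefficients
me <- sum(w * b)
se <- sqrt(drop(t(w) %*% V %*% w))   # SE of the linear combination
ci <- me + c(-1, 1) * qt(0.975, df.residual(fit5)) * se

# Hypothetical 40-year-old male with 12 years of education
bob <- data.frame(female = 0, education = 12, exper = 40 - 12 - 6)
bob_lwage <- predict(fit5, newdata = bob)
bob_wage  <- exp(bob_lwage)          # back-transform from logs to levels
```

The derivative is an approximation to the one-year change; the exact discrete change from 20 to 21 years uses the weight 41 instead of 40 on the quadratic coefficient. The plain exp() back-transformation is the simplest option, and some treatments apply a further correction for the regression error.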
6. (12 points) With the regression model in Question 5, you are still concerned about
omitted variable bias. For that reason, you decide to include one more control variable in
the regression, and you want to find the effect of introducing marital status. Accordingly,
you specify a binary variable, Married, that takes on the value of one if the worker is
married (marital ≤ 3) and zero otherwise.
(a) (6 points) Compare the effect of being married and the effect of one year of
additional education, and test whether these two effects are of the same magnitude.
² Be careful! The left-hand side variable is ln(wage), but you are asked to predict Bob’s wage.
(b) (6 points) What is the percentage difference in earnings between a single male and
a married female, controlling for education and experience? What about between
single males and females? Between married males and females?
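The Married dummy and a test that two coefficients are equal could be sketched as below, again on simulated data. The marital codes 1 to 7 and the variable names are assumptions, and the t-test uses the default covariance matrix from lm() rather than any particular robust estimator.

```r
# Simulated data; coefficients in the data-generating line are made up
set.seed(6)
n <- 500
d <- data.frame(
  female    = rbinom(n, 1, 0.5),
  education = sample(c(12, 14, 16, 18), n, replace = TRUE),
  age       = sample(22:65, n, replace = TRUE),
  marital   = sample(1:7, n, replace = TRUE)   # codes 1..7 are assumed
)
d$exper   <- d$age - d$education - 6
d$married <- as.integer(d$marital <= 3)        # 1 if married, 0 otherwise
d$lwage <- 1.5 + 0.1 * d$education + 0.03 * d$exper - 0.0005 * d$exper^2 -
  0.2 * d$female + 0.1 * d$married + rnorm(n, sd = 0.4)

fit6 <- lm(lwage ~ female + education + exper + I(exper^2) + married, data = d)
b <- coef(fit6)
V <- vcov(fit6)

# H0: b_married = b_education, tested via the difference of the estimates
gap   <- b[["married"]] - b[["education"]]
se    <- sqrt(V["married", "married"] + V["education", "education"] -
              2 * V["married", "education"])
tstat <- gap / se
pval  <- 2 * pt(-abs(tstat), df = df.residual(fit6))
```

The covariance term matters here: ignoring it would misstate the standard error of the difference whenever the two estimates are correlated.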
7. (10 points) In your final specification, you allow for the two binary variables, gender
and marital status, to interact, by adding the interaction term to the regression in
Question 6. Repeat the exercise in 6(b) of calculating the various percentage differences
between gender and marital status. Do you think the approach in this question is more
general than the one in Question 6?
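To see how an interaction term changes the group comparisons, the following sketch fits a model with female, married, and their product on simulated data, then adds up the relevant coefficients for each comparison. The education and experience controls are left out only to keep the example short, and all names and values are assumptions.

```r
# Simulated data; coefficients in the data-generating line are made up
set.seed(7)
n <- 600
d <- data.frame(
  female  = rbinom(n, 1, 0.5),
  married = rbinom(n, 1, 0.6)
)
d$lwage <- 3 - 0.10 * d$female + 0.15 * d$married -
  0.12 * d$female * d$married + rnorm(n, sd = 0.4)

# female * married expands to female + married + female:married
fit7 <- lm(lwage ~ female * married, data = d)
b <- coef(fit7)

# Log-point gaps between groups, built from the coefficients
gaps <- c(
  married_female_vs_single_male  = b[["female"]] + b[["married"]] +
                                   b[["female:married"]],
  single_female_vs_single_male   = b[["female"]],
  married_female_vs_married_male = b[["female"]] + b[["female:married"]]
)

# Convert log points to percentage differences
pct <- 100 * (exp(gaps) - 1)
```

Without the interaction, the female gap is forced to be the same for single and married workers; the interaction coefficient lets it differ.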
8. (3 points) Report estimation results of all regressions in Questions 2 to 7 using a table
similar to those presented in your Tutorials 5–6.³
³ If you are not familiar with LaTeX, you can use the screenreg() function instead of texreg().