EMET8005 Assignment
Instructions. The assignment is due 12 noon on Tuesday 16 May 2023. Your report should
be uploaded to Wattle using the Turnitin link provided. Late submissions will receive a mark
of 0 unless an extension has been granted before the deadline, as per the course outline.
Your report must be all your own original work. Your report should be typed and the
file should be in either Word or pdf format. Part of the assignment is to present results
‘professionally’. This means that there should be no Stata commands or Stata output in the
main text. Extract the information you need from the Stata output, and create nice tables and
figures similar to those you see in textbooks and journal articles. Attach your Stata do and log
files as appendices. The do file must be annotated with explanatory comments, so that it is
clear what results are sought, and it must run without syntax errors (assuming the data file is
in the current working directory).
There is no strict word limit but, everything else equal, a clear and concise writing style
may attract higher marks. We anticipate most reports will be between 600 and 1400 words
(excluding tables).
If you have any questions about the assignment, please email [email protected]. There
is no penalty for clarification questions.
Student beauty and grades. Before working on this question, read sections 1–3 of Adrian
Mehic’s study ‘Student beauty and grades under in-person and remote teaching’, published in
Economics Letters 219 (2022), article number 110782. (You can find it through the ANU
library.) In this question, we will analyse Mehic’s data. We will take a different approach, so
you don’t need to read section 4 in his paper.
Download the file lundbeauty2023.dta from Wattle. The data are described in Mehic’s
article. The dataset has brief variable labels in English, so hopefully you can understand what
each variable represents. The dataset is a panel, where the observational entity is a student
and the course code plays the role of time (cf eg a state-year panel dataset).
Note the data are arranged in ‘long form’: there is a row for each examination result for
each student.
Mehic analysed standardised log grades. We are not going to bother here, and we will
just use grades directly as the regressand. We will use the standardised beauty measure
however.
Beware there are 15 courses in the program, and one course is missing from Table 1 in
the paper.
Keep in mind that the cohorts starting in each year may have different
(observed and unobserved) characteristics on average (eg ability, ambition, beauty).
Also keep in mind that average grades vary across courses; in particular, grades
are lower in the advanced courses 13–15 compared to courses 1–12.
Unfortunately, the standard errors will be relatively large, so we will not be able to make
any firm conclusions in this analysis.
Questions follow below.
(a) To get familiar with the data, examine the properties and write a short description of
each variable. Report the unit of measurement, the mean and standard deviation for
variables where these quantities make sense. For categorical variables, describe how
many categories there are and what the distribution is. Check if there are any missing
values in any variables. Include histograms for standardised beauty and for grades.
Note: When you compute and discuss summary statistics, beware of possible ‘double-
counting’. There are multiple exam results per student, multiple students in each cohort,
multiple students enrolled in each course, etc.
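The double-counting point can be illustrated with a minimal sketch on simulated data (not the assignment data). The variable names `id`, `stdbeauty`, `course` and `grade` are assumptions chosen for illustration and may not match the names in lundbeauty2023.dta. The idea is to flag one row per student before summarising a student-level variable.

```stata
* Toy long-form panel: 50 students x 4 exam results each (simulated)
clear
set seed 12345
set obs 50
gen int id = _n                      // hypothetical student identifier
gen double stdbeauty = rnormal()     // one beauty score per student
expand 4                             // now 4 exam rows per student
bysort id: gen int course = _n       // toy course code 1..4
gen double grade = 3 + rnormal()     // toy exam-level grade

* Flag exactly one row per student, so student-level variables count once
egen byte onerow = tag(id)

summarize stdbeauty if onerow        // student level: 50 observations
summarize grade                      // exam level: all 200 rows
histogram stdbeauty if onerow, name(h_beauty, replace)
histogram grade, name(h_grade, replace)
misstable summarize                  // lists variables with missing values, if any
```

Without the `if onerow` restriction, each student's beauty score would enter the mean and standard deviation once per exam result.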
(b) The dataset concerns students who started in each of the years 2015–2019. So there are
5 student cohorts. During the first two years of study, these students have to take 15
mandatory courses. A different instance of each course is offered for each cohort. We
need a word to refer to that, and I am proposing to use ‘unit’. So in total there are data
for 75 different units (15 courses for 5 cohorts).
We can think of the difference between being taught online vs being taught in person
on campus as the treatment vs the control. To understand the pattern of treated and
untreated units, compute the proportion treated for each unit. Present the proportions
for the cohorts and courses in a two-way table. Comment on the pattern (eg which cohort
is treated when, how many treated vs untreated units).
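One possible way to build such a two-way table is sketched below on simulated data. The treatment pattern here is arbitrary and purely illustrative (it is not the pattern in the real data), and the coding of `cohort` as 15 to 19 and the variable name `covid` are assumptions.

```stata
* Toy data: 75 units (5 cohorts x 15 courses), 10 students per unit
clear
set obs 75
gen int cohort = 15 + mod(_n - 1, 5)            // assumed coding 15..19
gen int course = 1 + floor((_n - 1) / 5)        // 1..15
expand 10
gen byte covid = (cohort == 19 & course >= 10)  // arbitrary toy pattern

* Proportion treated in each course-by-cohort cell
tabulate course cohort, summarize(covid) means

* Count treated vs untreated units, one row per unit
egen int unit = group(cohort course)
bysort unit: egen double ptreated = mean(covid)
egen byte unittag = tag(unit)
tabulate ptreated if unittag
```

Because the mean of a 0/1 variable is a proportion, the cell means are the proportions treated.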
(c) To begin, let’s investigate the relationship between beauty and grades in normal times, ie
in units that are taught in person and on campus. Since the cohorts are different and the
courses are different, we need to allow for different average grades across units, but for
simplicity let’s estimate a common coefficient for standardised beauty. (It should represent
some weighted average of the unit-specific effects.)
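We can use subscript $i$ for the student and subscript $c$ for the course. To state the models,
we need notation for all the category dummies. Define the 5 cohort dummies
$\mathit{cohort}_{ji} = 1(\mathit{cohort}_i = j)$, the 15 course dummies
$\mathit{course}_{kc} = 1(\mathit{course}_c = k)$, the 75 unit dummies
$\mathit{unit}_{cji} = 1(\mathit{unit}_{ci} = j)$, and the age dummies
$\mathit{age}_{ki} = 1(\mathit{age}_i = k)$. Then a basic short model is
$$\mathit{grade}_{ci} = \beta_1 \mathit{stdbeauty}_i + \sum_{j=16}^{19} \beta_{2j} \mathit{cohort}_{ji} + \sum_{k=2}^{15} \beta_{3k} \mathit{course}_{kc} + \beta_0 + U_i .$$
A long model with more detailed controls is
$$\mathit{grade}_{ci} = \beta_1 \mathit{stdbeauty}_i + \sum_{j=2}^{75} \beta_{2j} \mathit{unit}_{cji} + \sum_{k=19}^{25} \beta_{3k} \mathit{age}_{ki} + \beta_0 + U_i .$$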
Estimate these models using the data for the subsample of units not affected by
pandemic-related restrictions. Cluster the standard errors at the student level. (Check that you
have estimated 20 coefficients in the short model and 69 in the long model.)
Present the key estimates and discuss the implications (eg the magnitude and the uncer-
tainty of the estimates). Here and elsewhere, all estimates should be accompanied by a
standard error and a confidence interval.
Note: You will need to create the unit variable, $\mathit{unit}_{ci}$, from $\mathit{cohort}_i$ and $\mathit{course}_c$. Anything
that assigns a unique code to each cohort-course combination will do. For example, gen
int unit=cohort*100+course. Or use egen if you prefer values like 1, 2, ....
Note: The grades are clearly not independent across courses for the same student, so at
least we should cluster the standard errors at the student level. Grades are probably not
independent within courses either, since there is usually a single teacher lecturing and
writing the exam, but hopefully we can capture most of that dependence by including
course or unit dummies in the regressions.
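As a rough sketch of how the short and long models in part (c) might be set up, the following runs on simulated data. All variable names (`id`, `grade`, `stdbeauty`, `cohort`, `course`, `age`, `covid`), their codings, and the toy treatment pattern are assumptions for illustration, so the coefficient counts will not match those of the real data.

```stata
* Toy panel: 200 students x 15 courses (simulated, not the real data)
clear
set seed 2023
set obs 200
gen int id = _n                               // hypothetical student id
gen int cohort = 15 + mod(_n, 5)              // assumed coding 15..19
gen int age = 18 + floor(8 * runiform())      // toy ages 18..25
gen double stdbeauty = rnormal()
expand 15
bysort id: gen int course = _n                // courses 1..15
gen byte covid = (cohort == 19 & course >= 10)  // arbitrary toy pattern
gen double grade = 3 + 0.1 * stdbeauty + rnormal()

* One code per cohort-course combination, numbered 1, 2, ...
egen int unit = group(cohort course)

* Short model: cohort and course dummies, untreated units only
regress grade stdbeauty i.cohort i.course if covid == 0, vce(cluster id)

* Long model: unit and age dummies, untreated units only
regress grade stdbeauty i.unit i.age if covid == 0, vce(cluster id)
```

The `vce(cluster id)` option requests standard errors clustered at the student level, and the `i.` prefix lets Stata create the dummies and drop a base category automatically.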
(d) Next, write the equations for extended models that allow for both different levels of
average grades and different beauty coefficients across the 4 combinations of male/female
gender and quantitative/non-quantitative courses, but keep the other controls (cohort,
course, unit, age) the same for all 4 combinations.
Estimate the extended models using only untreated units. Present the key estimates in a
table and discuss the implications.
Note: Beware that $\mathit{quant}_c$ is collinear with the set of $\mathit{unit}_{cji}$ dummies, so Stata will
probably omit one of the latter dummies to avoid the dummy variable trap. The key estimates
should be the same no matter which dummy Stata omits.
(e) Now on to comparing online vs on-campus courses. Remember only certain courses for
certain cohorts were online, so we have to think about what is a good comparison group.
Suppose we focus on the cohorts which were treated in some but not all courses. Discuss
the merit of comparing grades in online and on-campus courses for the specific cohorts
which had both.
Suppose we focus on courses which were taught on-campus for some cohorts and online
for other cohorts. Discuss the merit of comparing grades in online and on-campus courses
for those specific courses.
(f) Let’s try a DD approach to investigate how the relationship was affected when teaching
moved online. Let’s first see if there is any treatment effect on average grades. Here is
a basic DD model
$$\mathit{grade}_{ci} = \beta_1 \mathit{covid}_{ci} + \sum_{j=16}^{19} \beta_{2j} \mathit{cohort}_{ji} + \sum_{k=2}^{15} \beta_{3k} \mathit{course}_{kc} + \beta_0 + U_i .$$
Estimate the model using all units (ie the full sample), present the key result, and com-
ment.
(g) Now, let’s see if the relationship between grades and facial attractiveness is different in
treated and untreated units. This is an extension of the DD methodology we’ve discussed
previously, since we are looking at differences in ‘slope’ as opposed to differences in ‘level’,
but the idea is the same. Here is an extended DD model
$$\mathit{grade}_{ci} = \beta_1 \mathit{stdbeauty}_i + \beta_2 \mathit{covid}_{ci} \mathit{stdbeauty}_i + \beta_3 \mathit{covid}_{ci} + \sum_{j=16}^{19} \beta_{4j} \mathit{cohort}_{ji} + \sum_{k=2}^{15} \beta_{5k} \mathit{course}_{kc} + \beta_0 + U_i .$$
You can verify (using the ‘plug-in’ method) that $\beta_1$ represents the effect (‘slope’) of
$\mathit{stdbeauty}_i$ in untreated units and $\beta_2$ is the change in that slope for treated units
(so the slope in treated units is $\beta_1 + \beta_2$).
Estimate the model (using all units), present the key results, and comment.
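A minimal sketch of one way to fit a slope-interaction model of this kind with Stata's factor-variable notation is given below, again on simulated data. The variable names, codings and treatment pattern are assumptions for illustration only.

```stata
* Toy panel: 200 students x 15 courses (simulated, not the real data)
clear
set seed 2023
set obs 200
gen int id = _n                               // hypothetical student id
gen int cohort = 15 + mod(_n, 5)              // assumed coding 15..19
gen double stdbeauty = rnormal()
expand 15
bysort id: gen int course = _n
gen byte covid = (cohort == 19 & course >= 10)  // arbitrary toy pattern
gen double grade = 3 + 0.1 * stdbeauty + rnormal()

* c.stdbeauty##i.covid expands to stdbeauty, covid and their product
regress grade c.stdbeauty##i.covid i.cohort i.course, vce(cluster id)

* Slope of stdbeauty when covid = 1 (main slope plus interaction),
* reported with a standard error and confidence interval
lincom stdbeauty + 1.covid#c.stdbeauty
```

Setting covid to 0 and then to 1 in the fitted equation (the ‘plug-in’ method) shows why the slope in treated units is the sum of the two coefficients that `lincom` combines.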
(h) Now, let’s further extend the model in part (g) to see if there are gender differences. (For
simplicity, let’s ignore differences across quantitative and non-quantitative courses.)
Write the DD equation extending the model in part (g) to allow for both the mean
level of grades and the coefficient on beauty to vary by both $\mathit{female}_i$ and $\mathit{covid}_{ci}$, but
keep the cohort and course part of the model unchanged. (Probably you should have 26
coefficients.) Estimate the model. Tabulate and discuss the results.
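One way such a specification could be coded is with a full three-way interaction, sketched here on simulated data. The variable names (including `female`), codings and treatment pattern are assumptions, and this is only one possible parameterisation of the model described above.

```stata
* Toy panel: 200 students x 15 courses (simulated, not the real data)
clear
set seed 2023
set obs 200
gen int id = _n                               // hypothetical student id
gen int cohort = 15 + mod(_n, 5)              // assumed coding 15..19
gen byte female = runiform() < 0.5            // toy gender indicator
gen double stdbeauty = rnormal()
expand 15
bysort id: gen int course = _n
gen byte covid = (cohort == 19 & course >= 10)  // arbitrary toy pattern
gen double grade = 3 + 0.1 * stdbeauty + rnormal()

* Level and beauty slope both vary by female and covid:
* 7 interaction-block terms, plus cohort and course dummies and a constant
regress grade c.stdbeauty##i.female##i.covid i.cohort i.course, ///
    vce(cluster id)

* Example: beauty slope for women in treated units
lincom stdbeauty + 1.female#c.stdbeauty + 1.covid#c.stdbeauty ///
    + 1.female#1.covid#c.stdbeauty
```

In this sketch the seven terms from the interaction block, 4 cohort dummies, 14 course dummies and the constant add up to 26 coefficients.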
(i) Explain the conditions that must hold if we are going to interpret the estimated treatment
coefficient(s) in part (h) as causal.
Optional: Suggest a graphical way of examining the main condition. Present and discuss
the graph. (You may be disappointed, the dataset is small and the estimates are very
‘noisy’, so your graph may not yield clear conclusions.)