MOEC0021 Empirical Methods
Problem Set 1 – The Classical Linear Regression Model

Solutions are to be handed in as a single PDF file on OLAT. Please name your file Lastname1Lastname2Lastname3_PS1.pdf (alphabetical order). Include any code you wrote to answer the questions in the problem set. One of your goals is to communicate efficiently. Please keep your answers succinct. Lengthy answers will be marked down.

1. Theory – Using the CLRM to Make Predictions

Consider the following regression model

$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, 2, \ldots, n, n+1$$

where $x_i$ and $\beta$ are vectors of dimension $K \times 1$, and $y_i$ and $\varepsilon_i$ are scalars. Suppose that you observe the regressors for all observations, $x_1, x_2, \ldots, x_n, x_{n+1}$, and the outcome variable only for the first $n$ observations, $y_1, y_2, \ldots, y_n$. You want to use your model to predict the unobserved outcome value $y_{n+1}$. Let your prediction be $\hat{y}_{n+1} = x_{n+1}'\hat{\beta}_n$, where $\hat{\beta}_n$ is the OLS estimator computed using the $n$ observations for which $y_i$ is observed.

(a) This may feel very abstract. Can you think of an economic context where this framework may be applied? Provide an example, specifying what $i$, $y_i$, $x_i$ and $\varepsilon_i$ would be in this context.

Assume for the rest of this exercise that the CLRM assumptions hold. In particular, $\varepsilon \mid X \sim N(0, \sigma^2 I_{n+1})$, where we define $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{n+1})'$ and $X' = (x_1, x_2, \ldots, x_{n+1})$.

(b) Derive the conditional expectation function of $y_i$ given $x_i$, $E(y_i \mid x_i)$, and the conditional variance of $y_i$ given $x_i$, $\mathrm{Var}(y_i \mid x_i)$.

(c) Suppose the CLRM assumptions hold in the example you provided in part (a). Briefly interpret your results for $E(y_i \mid x_i)$ and $\mathrm{Var}(y_i \mid x_i)$ in this context.

(d) Define your prediction error for observation $n+1$ as $\hat{e}_{n+1} = y_{n+1} - \hat{y}_{n+1}$. We say that a prediction is unbiased if $E(\hat{e}_{n+1} \mid X) = 0$. Is your prediction $\hat{y}_{n+1}$ unbiased? Explain why in your own terms.

(e) What is the conditional variance of your prediction error for observation $n+1$, $\mathrm{Var}(\hat{e}_{n+1} \mid X)$? Is it larger or smaller than $\mathrm{Var}(y_i \mid x_i)$ derived in part (b)?
Explain.

(f) What happens to $\mathrm{Var}(y_i \mid x_i)$ and $\mathrm{Var}(\hat{e}_{n+1} \mid X)$ as $n$, the size of the estimating sample, increases? Comment briefly.

2. Empirical Application – The Beauty and the Student. Interpreting Regressions in the CLRM

The goal of this question, besides learning some cool econometrics, is to investigate how university students evaluate the teaching performance of their professors. In particular, we will ask whether professors’ looks have an effect on their overall teaching evaluation score. After calculating standard summary statistics and plotting the data, we will run some regressions to better understand which instructor characteristics are associated with high course evaluation ratings.¹

(a) Install the R package AER. We will use the data set TeachingRatings that is included in the AER package. There should be 12 variables and 463 observations. Read the R Documentation of TeachingRatings and explain in one sentence: what is the unit of observation in this data set?

(b) Provide a table with the mean, standard deviation, minimum and maximum value for the variables: eval, beauty, age, allstudents. Furthermore, cross-tabulate the variables gender and minority to find out how many courses in our sample were taught by professors who are female and belong to a minority group. Finally, we want to make sure our data set is complete: count the number of missing values in each variable: eval, beauty, age, allstudents, gender, minority.

(c) Depict the distribution of eval in a histogram. Additionally, plot the joint distribution of eval and beauty using a scatterplot. Explain in one sentence: why is it a good idea to always plot your data?

(d) We want to study the relationship between a course’s overall teaching evaluation score and its instructor’s physical appearance. What is your prior regarding the sign of the coefficient? Explain your reasoning in one sentence.
To check, we consider the following regression model:

$$\text{eval}_i = \beta_1 + \beta_2\,\text{beauty}_i + \varepsilon_i \tag{1}$$

• Compute the OLS coefficient estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ using only the following functions from the R 'base' package: mean(), sum(), var() and cov(). (Hint: use the formulas from the lecture notes Topic 1b.)
• Compute the same OLS coefficient estimates using your favorite estimation command in R (or Stata) and present the results in a clean table (like those in published papers). How do the computer’s results compare to your own calculations?
• Suppose the CLRM Assumption 2 holds. How do you interpret the coefficient of instructor’s physical appearance?

¹ We use data from Hamermesh and Parker (2005): https://doi.org/10.1016/j.econedurev.2004.07.013.

(e) Do you believe it is important to include a constant in the above regressions? Why or why not? Please explain your reasoning in 80 words or less.

(f) Let us consider a more elaborate regression model:

$$\text{eval}_i = \beta_1 + \beta_2\,\text{beauty}_i + \beta_3\,\text{age}_i + \beta_4\,\text{age}_i^2 + \beta_5 \ln(\text{allstudents}_i) + \beta_6\,\text{gender}_i + \beta_7\,\text{minority}_i + \beta_8\,\text{female\_minority}_i + \varepsilon_i \tag{2}$$

where $\text{female\_minority}_i$ is 1 if the instructor of course $i$ is female and belongs to a minority group, and 0 otherwise. Please run the regression, present your results in a clean table, and calculate the marginal effects on a course’s teaching evaluation score of

• an increase of the instructor’s beauty rating by one standard deviation (you calculated this in question (b))
• the instructor being a non-minority male as opposed to a non-minority female
• the course having 10% more students
• the instructor being 60 as opposed to 50 years old.

(g) What is the predicted course teaching evaluation score when all explanatory variables are at their mean? Why is this not an informative number to look at?

(h) You might also consider including a dummy variable $\text{male\_minority}_i$ in the above specified multivariate regression model. Why is this a good or bad idea?
(i) Suppose the CLRM Assumption 2 holds in the case of the regression model estimated in question (f). What do you conclude about the importance of professors’ looks on their overall teaching evaluation score? (Hint: How large is the effect? Is the effect causal?)

(j) Provide a concrete scenario in which the CLRM Assumption 2 does not hold in the case of the regression model estimated in question (f).

(k) Using the model specified in question (f), calculate the predicted residuals $\hat{\varepsilon}_i$.

• Calculate the covariance between $\text{beauty}_i$ and $\hat{\varepsilon}_i$ (round your answer to 6 decimal places). What does this tell you about the validity of CLRM Assumption 2?
• Construct a scatter plot of the residuals against $\ln(\text{allstudents}_i)$. What does this tell you about the validity of CLRM Assumption 3?
• Plot the density of the residuals over the density of a normal distribution. What does this tell you about the validity of CLRM Assumption 5?
  Hint: You can randomly draw N=463 observations from a standard normal distribution and plot that series against the residuals using a stacked data set.
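
The stacked data set mentioned in the hint can be sketched as follows. This is only an illustration on simulated numbers, not a solution: `resid_hat` is a hypothetical stand-in for the residuals from model (2), and all object names are made up for this example. Standardising the residuals before comparing them with standard normal draws is an assumption of this sketch, not something the problem set prescribes.

```r
# Illustration on simulated data only; these are NOT the TeachingRatings residuals.
set.seed(1)
n <- 463

# Hypothetical stand-in for regression residuals (simulated t-distributed noise).
resid_hat <- rt(n, df = 5)

# Assumption of this sketch: rescale to mean 0 and sd 1 so the series is
# comparable with draws from N(0, 1).
resid_std <- (resid_hat - mean(resid_hat)) / sd(resid_hat)

# Stacked (long) data set: one column of values, one column labelling the series.
stacked <- rbind(
  data.frame(value = resid_std, series = "residuals"),
  data.frame(value = rnorm(n),  series = "standard normal")
)

# Estimate one kernel density per series and overlay both in a base-R plot.
dens <- lapply(split(stacked$value, stacked$series), density)
plot(dens[["residuals"]], main = "Residuals vs. standard normal", xlab = "value",
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))))
lines(dens[["standard normal"]], lty = 2)
legend("topright", legend = names(dens), lty = c(1, 2))
```

With real residuals, `resid_hat` would be replaced by the residual vector of the fitted model. The long format also works with plotting packages that group by a label column, should you prefer those over base graphics.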