REGRESSION MODELLING
STAT7038
INSTRUCTIONS:
• This assignment is worth 15% of your overall marks for this course.
• You must complete this assignment by yourself. If you copy someone else’s work or allow your work to be copied, you will receive a mark of zero for the assignment and risk very severe academic consequences.
• Your report should be submitted to Turnitin on Wattle as a single PDF document (less than 25 MB) containing the following:
1. The assignment cover sheet (available to download from Wattle).
2. Your assignment (no more than 10 pages).
3. An appendix containing the R code you used. Failure to upload the R code will result in a penalty.
Question 1 [100 Marks]
You decide to work as an academic staff member at a university. Besides research ability, academic administrators take teaching quality into account when setting salaries. You would like to know how ascriptive characteristics, such as beauty, affect students’ ratings of an instructor. You are given a dataset of professor characteristics for 463 courses over the academic years 2000–2002 at the University of Texas at Austin. The response variable is the teaching evaluation score (eval), and the predictors are a score rating the instructor’s physical appearance (beauty), age (age), the number of students who participated in the evaluation (student), the number of students enrolled in the course (allstudents), whether the instructor is male or female (gender), whether the instructor is from a minority group (minority), whether the instructor is on the tenure track (tenure), and whether the instructor is a native English speaker (native).
In this assignment, we would like to use some of these variables to build a multiple regression model with eval as the response variable. Use R to analyse the “teach” data (available on Wattle) and answer the following questions:
(a) [6 marks] First, identify which variables in this dataset are numeric, and fit a multiple linear regression (MLR) model with eval as the response variable and all other numeric variables as predictors. Present the main residual plot (the residuals against the fitted values) for this model. Are there any obvious problems with the underlying assumptions?
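For reference, here is a sketch of the quantities involved in part (a), written in generic notation. The symbols $x_{ij}$, $p$, $X$ and $\hat{\boldsymbol\beta}$ are not from the original. The sketch assumes $n = 463$ observations and $p$ numeric predictors. The residual plot graphs each residual $e_i$ against its fitted value $\hat{y}_i$:

```latex
\begin{aligned}
\text{eval}_i &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,
  \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n, \\
\hat{\boldsymbol\beta} &= (X^\top X)^{-1} X^\top \mathbf{y}, \\
\hat{y}_i &= \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}, \\
e_i &= y_i - \hat{y}_i .
\end{aligned}
```

If the model assumptions hold, the points $(\hat{y}_i, e_i)$ should scatter randomly around zero with roughly constant spread. A funnel shape suggests non-constant variance, and a curved pattern suggests the mean structure is misspecified.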
(b) [10 marks] Since eval is always positive (it ranges from 0 to 5), it is worth trying a transformation of the response, such as the log transformation. Now fit an MLR model with ln(eval) as the response variable, still using all the other numeric variables (not log-transformed) as explanatory variables. Again, present the main residual plot of the residuals against the fitted values for this new model, and comment on it. Then test whether this model is significant.
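Testing whether an MLR model "is significant" usually refers to the overall F-test. Below is a sketch of that test for the log-response model in generic notation. The symbols $x_{ij}$, $p$, $n$, SSR and SSE are not from the original, and the sketch assumes $n$ observations and $p$ numeric predictors, with $y_i = \ln(\text{eval}_i)$:

```latex
\begin{aligned}
y_i = \ln(\text{eval}_i) &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \\
H_0 &: \beta_1 = \beta_2 = \cdots = \beta_p = 0
  \quad \text{vs} \quad H_1 : \beta_j \neq 0 \text{ for at least one } j, \\
\text{SSR} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \\
F &= \frac{\text{SSR}/p}{\text{SSE}/(n - p - 1)} \sim F_{p,\, n-p-1} \quad \text{under } H_0 .
\end{aligned}
```

$H_0$ is rejected at level $\alpha$ when $F > F_{\alpha;\, p,\, n-p-1}$. In R, `summary()` applied to an `lm` fit reports this F-statistic together with its degrees of freedom and p-value.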