In this assignment, we will continue developing a suitable regression model for your CEO dataset from Assignment #2, starting from your fitted model in 2e) of that assignment (i.e. the model fit without the BACKGRD variate).
1) Plot the residuals vs the fitted values, as well as a QQ plot. Comment on the adequacy of the fitted model, in terms of the model assumptions.
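As a rough illustration of the quantities behind these two plots, here is a minimal Python sketch using NumPy. The data are synthetic (not the CEO dataset), and all variable names are hypothetical. The residuals-vs-fitted plot uses `fitted` and `resid`; the QQ plot pairs the sorted residuals with standard normal quantiles.

```python
import numpy as np
from statistics import NormalDist

# Synthetic stand-in data (NOT the CEO dataset): a right-skewed response.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(1, 10, size=n)
y = np.exp(0.3 * x + rng.normal(0, 0.5, size=n))

# Least-squares fit of y on an intercept and x.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# QQ plot coordinates: sorted residuals against standard normal quantiles
# evaluated at the plotting positions (i - 0.5) / n.
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
sample = np.sort(resid)

# Optional plotting (assumes matplotlib is installed):
# import matplotlib.pyplot as plt
# fig, (ax1, ax2) = plt.subplots(1, 2)
# ax1.scatter(fitted, resid); ax1.axhline(0)
# ax2.scatter(theoretical, sample)
# plt.show()
```

A funnel shape in the first plot suggests non-constant variance, and systematic curvature away from a straight line in the second suggests non-normal residuals.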
2) One approach to stabilize the variance of the residuals and/or more adequately describe the relationship between a response variate and the explanatory variates is with an appropriate transformation of the response variate.
a) Create a histogram of CEO compensation. What characteristic of this variate might lead you to suspect that a log transformation may be suitable?
b) Refit the data using the (natural) log transformation of compensation.
c) Compare the overall fit of the model and significance of the individual parameters with that of the original (untransformed) model.
d) Replot the two residual plots in 1). Has the transformation helped to address the issues with the adequacy of the (untransformed) model?
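The steps in 2) can be sketched in Python as follows. This is only an illustration on synthetic, right-skewed data (not the CEO dataset); the helper names `skewness` and `r_squared` are hypothetical. Note that R^2 values computed on different response scales are not directly comparable, so they are printed here only as a descriptive summary.

```python
import numpy as np

# Synthetic stand-in data (NOT the CEO dataset): the response is generated
# to be right-skewed, as compensation-type variates often are.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(1, 10, size=n)
y = np.exp(0.3 * x + rng.normal(0, 0.5, size=n))

def skewness(v):
    """Sample skewness: the third standardized moment."""
    z = (v - v.mean()) / v.std()
    return float((z ** 3).mean())

def r_squared(X, resp):
    """R^2 of a least-squares fit of resp on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
    sse = ((resp - X @ beta) ** 2).sum()
    sst = ((resp - resp.mean()) ** 2).sum()
    return float(1 - sse / sst)

X = np.column_stack([np.ones(n), x])
counts, edges = np.histogram(y, bins=10)  # bin counts behind a histogram

skew_raw, skew_log = skewness(y), skewness(np.log(y))
r2_raw, r2_log = r_squared(X, y), r_squared(X, np.log(y))
print("histogram counts:", counts)
print("skewness raw / log:", round(skew_raw, 2), round(skew_log, 2))
print("R^2 raw / log:", round(r2_raw, 3), round(r2_log, 3))
```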
3) We can also investigate the suitability of transformations of one or more of the explanatory variates by looking at scatterplots of the variates vs the response (log(COMP), in this case).
a) Create a scatterplot of SALES vs log(COMP). Does a linear model seem appropriate for these two variates?
b) Create a scatterplot of log(SALES) vs log(COMP). Comment.
c) Refit the model once again, this time taking the log transformation of compensation as well as of the variates SALES, VAL, PCNTOWN and PROF. We will use this model going forward. Comment on the effect these transformations have on the overall fit of the model, and on the p-values of the associated variates.
4) Plot the residuals vs the fitted values and the QQ plot for the model in 3). Comment on the effect of the transformations on the model assumptions.
5) Replot the plots in 4) using the studentized residuals. Do you notice any major changes in these plots? Are there any outliers present?
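For reference, a minimal Python sketch of how studentized residuals can be computed from the hat values. The data are synthetic with one planted outlier, all names are hypothetical, and the cutoff of 3 is only a common rule of thumb. Two conventions exist (internal and external studentization); which one is intended here is an assumption you should check against the course notes.

```python
import numpy as np

# Synthetic stand-in data (NOT the CEO dataset) with one planted outlier.
rng = np.random.default_rng(2)
n, p = 60, 3  # p counts the intercept as well
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, size=n)
y[10] += 5.0  # planted outlier at index 10

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Hat values: diagonal of H = X (X'X)^{-1} X', computed from a QR factorization.
Q, _ = np.linalg.qr(X)
h = (Q ** 2).sum(axis=1)

# Internally studentized residuals: e_i / (s * sqrt(1 - h_ii)).
s2 = (resid ** 2).sum() / (n - p)
r_int = resid / np.sqrt(s2 * (1 - h))

# Externally studentized residuals: s is re-estimated with case i deleted.
t_ext = r_int * np.sqrt((n - p - 1) / (n - p - r_int ** 2))

# Rule-of-thumb flag (an assumption, not a fixed standard): |t| > 3.
flagged = np.flatnonzero(np.abs(t_ext) > 3)
print("flagged observations:", flagged)
```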
6) Plot the hat values vs index (observation number). Are there any high leverage points?
7) Investigate the observation with the highest leverage for a possible cause.
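The hat values in 6) and the lookup in 7) can be sketched as below. This is a hedged illustration on synthetic data with one planted extreme value, not the CEO dataset; the names are hypothetical, and the cutoff 2p/n (with p counting the intercept) is one commonly quoted rule of thumb rather than a fixed standard.

```python
import numpy as np

# Synthetic stand-in data (NOT the CEO dataset) with one extreme x1 value.
rng = np.random.default_rng(3)
n = 50
x1 = rng.uniform(1, 10, size=n)
x2 = rng.normal(size=n)
x1[7] = 30.0  # planted extreme explanatory value at index 7

X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]

# Hat values: diagonal of H = X (X'X)^{-1} X', via a QR factorization.
Q, _ = np.linalg.qr(X)
h = (Q ** 2).sum(axis=1)

# Rule-of-thumb cutoff (an assumption): flag h_ii > 2p/n.
cutoff = 2 * p / n
high = np.flatnonzero(h > cutoff)

# Inspect the explanatory values of the highest-leverage observation.
top = int(np.argmax(h))
print("cutoff:", cutoff, "high leverage:", high)
print("highest leverage index:", top, "row:", X[top, 1:])
```

Plotting `h` against `np.arange(n)` gives the index plot; looking at the row for `top` alongside the column ranges usually shows which explanatory variate is responsible.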
8) Plot the Cook’s Distance values. Are there any influential cases?
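A minimal Python sketch of Cook's Distance, computed from the residuals and hat values as D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2). The data are synthetic with one planted influential case, the names are hypothetical, and the threshold D > 1 is only a commonly quoted guideline.

```python
import numpy as np

# Synthetic stand-in data (NOT the CEO dataset) with one influential case.
rng = np.random.default_rng(4)
n = 50
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, size=n)
x[5], y[5] = 25.0, 2.0  # planted: high leverage and far from the trend

X = np.column_stack([np.ones(n), x])
p = X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Hat values from a QR factorization of X.
Q, _ = np.linalg.qr(X)
h = (Q ** 2).sum(axis=1)

# Cook's Distance for every observation.
s2 = (resid ** 2).sum() / (n - p)
cooks = resid ** 2 * h / (p * s2 * (1 - h) ** 2)

# Guideline (an assumption, not a fixed standard): D_i > 1 merits a closer look.
influential = np.flatnonzero(cooks > 1)
print("influential cases:", influential)
```

An equivalent definition is the squared change in all fitted values when case i is deleted, scaled by p s^2, which is why a large value signals that a single case is driving the fit.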
9) Now that we have obtained a more adequate model through transformation of the response and some of the explanatory variables, we can further improve the model by using model selection methods to select which subset of variables to include.
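The text above does not say which selection method is intended, so the following Python sketch is only one possibility: an exhaustive search over all subsets, scored by AIC. The data are synthetic (not the CEO dataset), the function name `aic` is hypothetical, and the AIC is written up to an additive constant that is the same for every model.

```python
import numpy as np
from itertools import combinations

# Synthetic stand-in data (NOT the CEO dataset): only the first two of the
# four candidate variates actually affect the response.
rng = np.random.default_rng(5)
n, k = 80, 4
Z = rng.normal(size=(n, k))
y = 1.0 + 2.0 * Z[:, 0] - 1.5 * Z[:, 1] + rng.normal(size=n)

def aic(cols):
    """Gaussian AIC (up to a constant) for intercept plus the given columns."""
    X = np.column_stack([np.ones(n)] + [Z[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = ((y - X @ beta) ** 2).sum()
    return float(n * np.log(sse / n) + 2 * X.shape[1])

# Score every subset of the k candidates (2^k models, feasible for small k).
subsets = [c for r in range(k + 1) for c in combinations(range(k), r)]
scores = {c: aic(c) for c in subsets}
best = min(scores, key=scores.get)
print("selected columns:", best, "AIC:", round(scores[best], 2))
```

With many candidate variates an exhaustive search becomes expensive, and stepwise procedures (forward, backward, or both) are a common alternative.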