STAT2008 Regression Modelling
Examination/Writing Time Duration: 180 minutes
Reading Time: 15 minutes
Exam Conditions:
Central Examination. This examination paper is not available to the ANU Library archives.
Students must return the examination paper at the end of the examination.
Materials permitted in the exam venue: (No electronic aids are permitted e.g. laptops, phones)
Unannotated paper-based dictionary (no approval required),
One A4 page with notes on both sides, Calculator
Materials to be supplied to Students:
Scribble Paper
Instructions to Students:
1. This examination paper comprises a total of twenty (20) pages and there is a separate handout of
R output which also has a total of twenty (20) pages. During the reading time preceding the exam,
please check that both documents have the correct number of pages.
2. All answers are to be written on this exam paper, which is to be handed in at the end of the exam.
You may make notes on scribble paper (or on the R handout) during the reading time, but
do NOT write on this exam paper until after the start of the writing time. If you need additional
space, use the rear of the previous page and clearly indicate the part of the question that your
answer refers to. The R handout and any scribble paper will be collected at the end of the
examination and destroyed; they will not be marked.
3. There are a total of four questions, which are worth 15 marks each, for a total of 60 marks.
The parts of each question are of unequal value, with the marks indicated for each part.
You should attempt to answer each and every part of all four questions. This examination
counts towards 60% of your final assessment.
4. Please write your student number in the space provided at the top of this page.
5. Include a clear statement of the formulae you use to answer each question.
6. Statistical tables (generated using R) are provided on pages 19 and 20 at the end of the handout of
R output. Unless otherwise indicated, use a significance level of 5% and note that log x refers to the
natural logarithm of x.
         Q1        Q2         Q3          Q4          Total
Pages    2 to 6    7 to 11    12 to 15    16 to 20
Marks    15        15         15          15          60
Score
Venue _________________________________________
STUDENT
NUMBER U
Final Examination, Semester 1, 2017 STAT2008 Regression Modelling
Page 2 of 20
Question 1 (15 marks)
The faraway library includes a data frame called cheddar, which contains data from a study of
cheddar cheese from the La Trobe Valley in Victoria. The concentration of Lactic acid, along
with the concentrations (on a log scale) of both Acetic acid and H2S (hydrogen sulphide) were
measured from 30 samples of cheese, which were then subjected to taste tests. Overall taste
scores were obtained by combining the scores from several tasters.
(a) A multiple regression model (cheddar.lm) has been fitted to these data and the summary
output from this model is given at the top of page 2 of the R output, but the analysis of
variance (ANOVA) table is not shown. Fill in the details of the ANOVA table in the
spaces shown below:
             Df    Sum Sq    Mean Sq    F value    Pr(>F)
H2S
Lactic
Residuals
(3 marks – 1 for each row of the ANOVA table)
[Hint: rounding errors will accumulate as you derive entries in this table from other
values shown in the R output, so do NOT round the results of intermediate
calculations. DO round all your final answers in the above table to 2 decimal places.
You may also have to use the statistical tables to estimate one or more of the
p-values, or you can receive the marks for showing appropriate critical values.]
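The arithmetic behind this hint can be sketched as follows. This is a minimal Python sketch and not part of the R output: the function name sequential_anova and all numeric inputs are hypothetical placeholders. It assumes the summary output reports the residual standard error, the residual degrees of freedom, the overall F statistic and the t statistic of the term fitted last (Lactic), and that each term uses one degree of freedom. P-values are left out because they need the F tables.

```python
def sequential_anova(resid_se, resid_df, overall_f, last_t):
    """Rebuild a two-term sequential ANOVA table from lm summary quantities.

    Assumes two 1-df terms and that last_t is the t statistic of the
    term fitted last (here, Lactic).
    """
    ms_resid = resid_se ** 2             # residual mean square = s^2
    ss_resid = ms_resid * resid_df       # residual sum of squares
    ss_model = overall_f * 2 * ms_resid  # regression SS from the overall F (2 df)
    f_last = last_t ** 2                 # F for the last term equals t^2
    ss_last = f_last * ms_resid          # 1 df, so Sum Sq equals Mean Sq
    ss_first = ss_model - ss_last        # remainder belongs to the first term
    f_first = ss_first / ms_resid
    # Rows are (Df, Sum Sq, Mean Sq, F value); Residuals has no F value.
    return {
        "H2S": (1, ss_first, ss_first, f_first),
        "Lactic": (1, ss_last, ss_last, f_last),
        "Residuals": (resid_df, ss_resid, ms_resid),
    }

# Hypothetical inputs (NOT the values in the R output).
table = sequential_anova(resid_se=10.0, resid_df=27, overall_f=30.0, last_t=2.0)
for name, row in table.items():
    print(name, [round(v, 2) for v in row])
```

Keeping full precision inside the function and rounding only when printing mirrors the instruction not to round intermediate calculations.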
Working
Question 1 continued
(b) Residual plots for the model in part (a) are shown on pages 2 and 3 of the R output.
Do these plots suggest any problems with the underlying assumptions?
What is your overall assessment? (select just ONE of the following options)
□ Residuals are not independent (obvious pattern)
□ Residuals do not have constant variance (heteroscedasticity)
□ Residuals are not normally distributed
□ There are possible outliers and/or influential observations
□ More than one of the above problems
□ No obvious problems
Are there any problem(s) shown on the “Residuals vs Fitted” plot on page 2?
If so describe the problem(s):
Are there any problem(s) shown on the “Cook’s distance” plot on page 3?
If so describe the problem(s):
Are there any problem(s) shown on the “Normal Q-Q” plot on page 3?
If so describe the problem(s):
(2 marks – 0.5 for each section)
Question 1 continued
(c) For each of the following five diagnostic measures shown on page 4 of the R output,
calculate the relevant cut-off value suggested in the lecture notes and discuss whether or
not this cut-off is appropriate in this instance. Which observations, if any, exceed each
of the cut-off values?
(see the next page for more answer spaces for part (c) of Question 1)
The leverage or hat values (h_ii)
DFFITS
The externally studentised residuals (t_i)
Question 1, part (c) continued
COVRATIO
DFBETAS
Given your answers above and considering the residual plots in part (b), are there
any observations that are vertical outliers and/or highly influential observations?
Should some observations be removed and the model re-fit to the remaining data?
(7 marks – 1 for each of the first 5 sections and 2 for the last summary section)
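The cut-off calculations asked for in part (c) can be sketched as below. This is a minimal Python sketch using common textbook rules of thumb, which are an assumption here and may differ from the cut-offs suggested in the lecture notes. The function name rule_of_thumb_cutoffs is hypothetical. The sample size n = 30 comes from the question; p = 3 assumes the model has an intercept plus H2S and Lactic.

```python
import math

def rule_of_thumb_cutoffs(n, p):
    """Common textbook cut-offs for n observations and p coefficients.

    p counts the intercept. These are rough guides only and may not
    match the cut-offs given in the lecture notes.
    """
    return {
        "leverage": 2 * p / n,                       # flag h_ii > 2p/n
        "dffits": 2 * math.sqrt(p / n),              # flag |DFFITS| > 2*sqrt(p/n)
        "studentised_residual": 2.0,                 # flag |t_i| > 2 (rough guide)
        "covratio": (1 - 3 * p / n, 1 + 3 * p / n),  # flag values outside 1 +/- 3p/n
        "dfbetas": 2 / math.sqrt(n),                 # flag |DFBETAS| > 2/sqrt(n)
    }

# n = 30 cheese samples; p = 3 is an assumption (intercept, H2S, Lactic).
cutoffs = rule_of_thumb_cutoffs(n=30, p=3)
for name, value in cutoffs.items():
    print(name, value)
```

Whether such cut-offs are appropriate for a sample of only 30 observations is part of what the question asks you to discuss.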
Question 1 continued
(d) Output for a second model (cheddar.lm2) is shown on page 5 of the R output, which
includes an additional term added to the initial model described in the earlier parts of
this question. Is the term involving Acetic a significant addition to a model which
already includes H2S and Lactic? Give full details of an appropriate hypothesis test.
(3 marks)
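One way to frame the test in part (d) is as a nested-model (partial) F test, sketched below in Python. This is a sketch under stated assumptions, not the required answer: the function name partial_f and the residual sums of squares are hypothetical, and it assumes the term involving Acetic adds a single coefficient, so the larger model has 4 coefficients and 30 - 4 = 26 residual degrees of freedom.

```python
def partial_f(rss_reduced, rss_full, df_extra, df_resid_full):
    """F statistic for comparing nested linear models.

    rss_reduced:   residual sum of squares of the smaller model
    rss_full:      residual sum of squares of the larger model
    df_extra:      number of extra coefficients in the larger model
    df_resid_full: residual degrees of freedom of the larger model
    """
    numerator = (rss_reduced - rss_full) / df_extra
    denominator = rss_full / df_resid_full
    return numerator / denominator

# Hypothetical residual sums of squares (NOT the values in the R output).
f_stat = partial_f(rss_reduced=2700.0, rss_full=2600.0,
                   df_extra=1, df_resid_full=26)
print(round(f_stat, 2))
```

Under these assumptions the statistic would be compared with the 5% critical value of the F distribution on 1 and 26 degrees of freedom from the statistical tables. When only one coefficient is added, this F statistic equals the square of that coefficient's t statistic in the larger model.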