RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS
REGRESSION MODELLING
STAT2008 / STAT4038 / STAT6038
Final Examination for Semester 1, 2020
Reading Time: 0 minutes
Writing Time: 240 minutes
Exam Conditions: Centrally Scheduled Examination
Permitted Materials: Any

Instructions:
• This examination paper comprises a total of 21 pages, and there is a separate file of R Output which has a total of 15 pages.
• You can write your answers on this question sheet or on blank sheets of paper. Please include the cover sheet as the first page when you submit.
• Please write your student number and course code in the space provided at the top of this page.
• There is one part (Q2c) which is only to be attempted by students enrolled in STAT4038 and STAT6038.
• There are 5 questions worth a total of 103 marks for STAT2008 students, excluding Q2c.
• There are 5 questions worth a total of 113 marks for students enrolled in STAT4038 and STAT6038.
• Statistical tables (generated using R) are provided on the Wattle site.
• Unless otherwise indicated, use a significance level (α) of 5% and 4 decimal places.

Question              1    2    3    4    5    Total
STAT2008/4038/6038    8   15   16   36   28    103
STAT4038/6038 only    0   10    0    0    0     10
Score:

Final Examination - Semester 1, 2020 Page 1 of 21

Question 1 [8 Marks]
You want to conduct some simulation studies. You have written out an algorithm as follows:
1. Generate Xi = i, i = 1, . . . , 10.
2. Generate Yi = 4 − 2Xi + εi, i = 1, . . . , 10, where the εi are i.i.d. and follow a normal distribution with mean 0 and variance 5.
3. Fit a simple linear regression of Yi on Xi using least squares, Ŷ = b0 + b1X, and record the parameter estimates b0 and b1.
4. Construct a 99% confidence interval for the intercept parameter.
5. Repeat steps 2, 3 and 4 1000 times, so that you have 1000 b0 values and 1000 confidence intervals at hand.

(a) 6 marks What distribution do you expect best describes these 1000 b0 values? You should state the shape, mean, and variance of this distribution.

(b) 2 marks How many of these 1000 confidence intervals do you expect to include the number 4?
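The simulation algorithm in Question 1 can be sketched in code. The following is a minimal illustration in Python using only the standard library, not part of the examination paper or its R Output. The function names, the seed, and the hard-coded critical value t(0.995; 8) = 3.3554 (taken from standard t tables for n − 2 = 8 degrees of freedom) are assumptions made for this sketch.

```python
import random
import statistics

# Assumption: two-sided 99% critical value of the t distribution with 8 df,
# taken from standard tables rather than computed here.
T_CRIT_99_DF8 = 3.3554


def fit_simple_ols(x, y):
    """Least squares fit of y on x; returns (b0, b1, standard error of b0)."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # estimate of the error variance
    se_b0 = (mse * (1 / n + x_bar ** 2 / sxx)) ** 0.5
    return b0, b1, se_b0


def simulate(n_reps=1000, seed=2020):
    """Run steps 1 to 5; the seed is arbitrary and only fixes the random draws."""
    rng = random.Random(seed)
    x = list(range(1, 11))   # step 1: Xi = i, i = 1, ..., 10
    sigma = 5 ** 0.5         # error variance 5, so standard deviation sqrt(5)
    b0_values, intervals = [], []
    for _ in range(n_reps):  # step 5: repeat steps 2 to 4
        y = [4 - 2 * xi + rng.gauss(0, sigma) for xi in x]  # step 2
        b0, _, se_b0 = fit_simple_ols(x, y)                 # step 3
        half_width = T_CRIT_99_DF8 * se_b0                  # step 4
        b0_values.append(b0)
        intervals.append((b0 - half_width, b0 + half_width))
    return b0_values, intervals


b0_values, intervals = simulate()
n_covering = sum(lo <= 4 <= hi for lo, hi in intervals)
print("mean of b0 values:    ", statistics.fmean(b0_values))
print("variance of b0 values:", statistics.variance(b0_values))
print("intervals containing 4:", n_covering)
```

Running the sketch prints the empirical mean and variance of the 1000 b0 values and the number of intervals that contain 4, which can be compared with the values derived from theory in parts (a) and (b).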
Question 2 [15 Marks]
For STAT4038 and STAT6038 students: [25 Marks]
The Fnord Motor Company is about to embark on a major marketing campaign for their newly released vehicle, the Imposta II, under the slogan “Have more left in the tank after a ride in an Imposta II”. Before they can make such a statement, Fnord’s crack legal department suggested that the company might conduct a scientific study to support their claim. Fnord’s slick marketing team decided to put 30 Fnord Imposta IIs through their paces at Fnord’s top-secret test facilities in Oodnadatta. There, Fnord’s expert drivers were each given an Imposta II full of petrol (40 litres). Each car was driven for a different number of kilometres, denoted X, and on its return, the amount of petrol remaining in the tank, Y, was measured. In this question, we will attempt to model the amount of petrol left in terms of the distance driven. The attached R printout gives details of the modelling exercise attempted. From the attached printout, answer the following questions:

(a) 3 marks State the true model as in model1 and then write down the fitted model.

(b) 4 marks Find a 95% confidence interval for β0. Is the value 40 contained in your confidence interval? Explain why you might expect it to be.

(c) 10 marks [For 4038 and 6038 students only] The manager argued that all Imposta IIs are given a full tank of petrol, so the intercept is known to be 40, and thus the model should be Yi = 40 + β1Xi + εi. Using least squares, you find that the estimator for β1 in this model is
b1 = (∑ XiYi − 40 ∑ Xi) / ∑ Xi².
Find the variance of this estimator and then develop a formula for a 95% confidence interval for the mean response given X = x0 under this model. You do not need to use the actual numbers, but make sure you specify the appropriate degrees of freedom used.
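The least squares estimator quoted in part (c) can be checked numerically. The following is a minimal Python sketch under stated assumptions: the data are simulated, and the distances, the slope of −0.08 litres per kilometre, the error standard deviation of 1, and the seed are all hypothetical values chosen for illustration, not taken from the R Output. It evaluates the closed-form slope for the known-intercept model and confirms that nearby slopes give a larger sum of squared errors.

```python
import random


def fit_known_intercept(x, y, intercept=40.0):
    """Slope estimate for the model Yi = intercept + beta1 * Xi + error,
    using b1 = (sum XiYi - intercept * sum Xi) / sum Xi^2."""
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x = sum(x)
    sum_xx = sum(xi ** 2 for xi in x)
    return (sum_xy - intercept * sum_x) / sum_xx


def sse_known_intercept(slope, x, y, intercept=40.0):
    """Sum of squared errors for a candidate slope with the intercept fixed."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: 30 cars, distances in km, petrol remaining in litres.
rng = random.Random(1)
x = [rng.uniform(50, 400) for _ in range(30)]
y = [40 - 0.08 * xi + rng.gauss(0, 1) for xi in x]

b1 = fit_known_intercept(x, y)
sse_at_b1 = sse_known_intercept(b1, x, y)

# The closed-form slope should not be beaten by slightly perturbed slopes.
is_minimum = all(
    sse_at_b1 <= sse_known_intercept(b1 + delta, x, y)
    for delta in (-1e-3, 1e-3)
)
print("b1 =", b1)
print("closed form minimises SSE locally:", is_minimum)
```

The sketch only illustrates the estimator itself; the variance and the confidence interval formula asked for in part (c) still have to be derived by hand.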
(d) 8 marks The manager claims that the new Imposta II is more energy efficient than the old model, the Imposta I. He asked the same drivers to also drive 30 old Imposta Is, each for a different distance, W, and measured the amount of petrol remaining in the tank, denoted Z. He fitted a second model as model2. Assume that the true error variances of model1 and model2 are the same. Conduct a test of the manager's claim.

Question 3 [16 Marks]
You are given a response variable Y and 4 covariates X1, X2, X3 and X4. A number of different models are fitted in R. Use the printout to answer the following questions.

(a) 2 marks Write down the model fitted as model1, making sure you write down the values estimated for the parameters in the model.

(b) 2 marks Write down the parameter estimates for the model fitted as model2.

(c) 2 marks What proportion of the variability in Y is explained by the linear model in X1, X2, X3 and X4 fitted as model2?

(d) 4 marks Test whether the linear model fitted in model2 plausibly passes through the origin. Use a 5% test.

(e) 6 marks You want to see whether X1 and X4 are given equal weighting in the formula for Y and whether X2 and X3 are given equal weighting in the formula for Y. Test the hypothesis H0 : β1 = β4, β2 = β3. Use information contained in the printout to test this hypothesis at the 5% level.

Question 4 [36 Marks]
Drought, flood, super-cell storms... climate change is upon us! One of the indicators of the progress of climate change is the onset and nature of the so-called El Niño and La Niña effects used to describe ocean temperatures.
Scientists think that these effects are instrumental parts of our weather patterns, and that these effects dictate the prevalence of floods and drought in south-eastern Australia and other parts of the world. The data for this question concerns the number of tropical storms and hurricanes in the Atlantic Basin between 1950 and 1997. Several variables were recorded: the year (Year); a record of whether the year was a cold, warm or neutral El Niño year (elnino); a record of whether West Africa was wet, dry or normal that year (wa); the number of tropical storms each year; the number of hurricanes each year; and a storm index (called NTC) which is a composite variable measuring the overall intensity of the hurricane season (an average of the number of tropical storms, the number of hurricanes, the number of days of tropical storms, the number of days of hurricanes, the total number of intense hurricanes, and the number of days they last). In this question, we will focus on the storm index NTC and its relationship with time and the variables elnino and wa.

(a) 3 marks The data set contains 48 years' worth of data. How many of the 48 years were cold El Niño years? Warm El Niño years? Neutral El Niño years? How many of the 48 years were wet years in West Africa?