ACTUARIAL ANALYTICS AND DATA
ACTL30008 ACTUARIAL ANALYTICS AND DATA
Number of pages:
Authorised materials: R; lecture material
Instructions to students:
This is a practice exam only. The types of questions are consistent with the questions in the final assessment.
The length of this practice paper is shorter.
You need to submit a single PDF file named ACTL30008_xxxxxx.pdf, where xxxxxx is your student ID, by
the due time of this test.
You will need to use R to complete this exam when needed. Creating the final submission with R Markdown is
the preferred way of preparing your answers. You may use an R script file to produce your final submission, but
you will then need to include the required outputs and/or handwritten answers manually.
Question One (2+4+4 = 10 marks)
You are given a data set which is named “ACTL30008_practice_2021.csv”. You are asked to build statistical
models that help to predict the response variable Y given new predictor observations.
(a)
Load the data set and build a multiple linear regression model using the data. Describe how well the obtained
model fits the given data and give at least two pieces of numerical evidence.
(b)
You suspect that there might be a non-linear relationship between the response Y and the two
predictors.
• Present a polynomial regression model using the data. The degree of each polynomial term in your
model should be optimised; explain how the best degrees are selected.
• Is this non-linear model a better fit to the given data than the MLR obtained in (a)? Why?
(c)
• Use the LOOCV method to estimate the test MSE of the models you obtained in (a) and (b).
• Which model is likely to give a more accurate prediction for any new observations? Why?
• Can the estimated test MSE accurately represent the true test MSE values? Why?
Question One Solutions:
(a)
data <- read.csv("ACTL30008_practice_2022.csv", header = TRUE)
str(data)
## 'data.frame': 30 obs. of 3 variables:
## $ X1: num 2.78 2.63 2.29 2.57 2.71 ...
## $ X2: num 113 119 101 188 20 ...
## $ Y : num 99.9 92.6 34.7 78.7 68 ...
attach(data)
lm.fit=lm(Y~., data)
summary(lm.fit)
##
## Call:
## lm(formula = Y ~ ., data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7900 -5.5297 -0.3682 4.1799 19.5911
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -226.8882 17.2725 -13.136 3.05e-13 ***
## X1 108.2979 6.9420 15.600 4.96e-15 ***
## X2 0.1736 0.0257 6.753 2.99e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.566 on 27 degrees of freedom
## Multiple R-squared: 0.927, Adjusted R-squared: 0.9216
## F-statistic: 171.3 on 2 and 27 DF, p-value: 4.547e-16
The MLR obtained above is a good fit to the given data. First, the overall F-test for the relationship between
the response variable and the two predictors is highly significant, with a p-value of 4.55 × 10^-16. Second,
R^2 = 92.7%, so the model explains about 92.7% of the variation in Y, which also indicates that the MLR is a strong fit.
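The numerical evidence cited above can also be pulled directly from the summary object rather than read off the printed output. The following is a minimal sketch only: it uses simulated stand-in data (the data frame `sim` and its generating coefficients are assumptions chosen for illustration, not the exam data set), and the variable names `r2`, `adj_r2`, `f_pval` and `rse` are hypothetical.

```r
# Simulated stand-in data; an assumption for illustration, NOT the exam CSV.
set.seed(1)
n  <- 30
X1 <- runif(n, 2, 3)
X2 <- runif(n, 20, 200)
Y  <- -200 + 100 * X1 + 0.2 * X2 + rnorm(n, sd = 7)
sim <- data.frame(X1, X2, Y)

# Fit the MLR and store its summary
fit <- lm(Y ~ ., data = sim)
s   <- summary(fit)

# R-squared and adjusted R-squared
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared

# Overall F-test: s$fstatistic holds the statistic and its degrees of freedom
f      <- s$fstatistic   # named vector: value, numdf, dendf
f_pval <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Residual standard error
rse <- s$sigma

print(c(R2 = r2, adjR2 = adj_r2, RSE = rse))
print(unname(f_pval))
```

With the exam data, the same components of `summary(lm.fit)` should reproduce the R-squared, F-test p-value and residual standard error shown in the printed summary.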
(b)
poly.fit=lm(Y~poly(X1,5)+poly(X2,5),data=data)
summary(poly.fit)
##
## Call:
## lm(formula = Y ~ poly(X1, 5) + poly(X2, 5), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4999 -1.7411 0.0192 2.6680 7.3079
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.745 0.846 74.167 < 2e-16 ***
## poly(X1, 5)1 120.097 5.372 22.355 4.17e-15 ***
## poly(X1, 5)2 13.413 4.758 2.819 0.010959 *
## poly(X1, 5)3 -21.217 5.338 -3.975 0.000812 ***
## poly(X1, 5)4 -3.592 4.833 -0.743 0.466439
## poly(X1, 5)5 6.229 4.836 1.288 0.213251
## poly(X2, 5)1 43.778 5.508 7.948 1.85e-07 ***
## poly(X2, 5)2 -11.919 4.701 -2.535 0.020182 *
## poly(X2, 5)3 -9.746 4.877 -1.998 0.060212 .
## poly(X2, 5)4 8.163 4.964 1.644 0.116554
## poly(X2, 5)5 10.088 5.086 1.983 0.061961 .
## ---