Statistical Methods in Insurance
STAT3010/6075 Statistical Methods in Insurance
Assignment 1
• This assignment consists of two questions and is worth 10% of the overall mark for STAT3010/6075.
• The deadline for submission is 16.00 on Thursday 2 March 2023.
• Standard University policies and procedures will be followed for late submission, extensions and
academic integrity (see the Module Outline for details).
• Submission is via Blackboard. You must submit a report of at most six pages (in pdf format),
containing your answers, and a separate R script, containing the code that you used to obtain
your results.
– You should submit your report via TurnitinUK on Blackboard (see Module Outline for
details) in a file called report-ID.pdf, where ID is your student ID number, for example
report-12345678.pdf. In the Assignments folder, click on Assignment 1 report submission
to submit your report. Please enter this file name as the Submission Title.
– You should not include R code used in your analysis in your report, but you must submit
a separate R script via Blackboard containing your code called code-ID.R, for example
code-12345678.R. Please rename and use the R template code-xxx.R provided. In the
Assignments folder, click on Assignment 1 code submission to submit your code.
– Please start your R script with the command set.seed(ID), for example set.seed(12345678).
– Whenever you are asked to fit a model you should present in your report the estimate,
standard error and p-value corresponding to each parameter in the model.
– Whenever you are asked to perform a formal test, you should present in your report the test
statistic, the p-value or critical value, and your conclusion from the test.
• The page limit is strict and is easily sufficient to receive full credit. If your report is more than
six pages of A4, only the first six pages will be marked.
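As a small illustration of why the script should start with set.seed, the sketch below shows that resetting the seed reproduces the same random draws. The value of id is the example ID from the text above, not a real student ID, and the variable names are hypothetical.

```r
# Example ID taken from the assignment text; replace with your own student ID.
id <- 12345678

set.seed(id)
first_draw <- runif(3)

# Resetting the seed restarts the random number stream from the same state.
set.seed(id)
second_draw <- runif(3)

# TRUE when both runs produced exactly the same numbers.
same_draws <- identical(first_draw, second_draw)
print(same_draws)
```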
Question 1
A health insurance company is developing a model to assess the risk of its policyholders having diabetes
based on the following data from the file diabetes.csv:
Diabetes   Binary variable indicating diabetes diagnosis, either positive (pos) or negative (neg)
Age        Age of individual, recorded in years
BMI        Body mass index (weight in kg/(height in m)^2)
Glucose    Plasma glucose concentration
Pressure   Diastolic blood pressure (mm Hg)
Pregnant   Number of times pregnant
1. Produce and briefly discuss appropriate tables or plots to assess the relationship between Diabetes
and Age, BMI, Glucose, Pressure and Pregnant.
[7 marks]
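One possible way to explore a binary response against continuous covariates is with side-by-side boxplots and group summaries. The sketch below uses simulated data, because diabetes.csv is not reproduced here; the data frame sim, its coefficients and its sample size are assumptions made purely for illustration, and only three of the five covariates are simulated.

```r
# Simulated stand-in for diabetes.csv (all values are made up).
set.seed(12345678)
n <- 300
sim <- data.frame(
  Age     = round(runif(n, 21, 70)),
  BMI     = rnorm(n, mean = 30, sd = 5),
  Glucose = rnorm(n, mean = 120, sd = 30)
)
eta <- -8 + 0.03 * sim$Age + 0.08 * sim$BMI + 0.03 * sim$Glucose
sim$Diabetes <- factor(ifelse(rbinom(n, 1, plogis(eta)) == 1, "pos", "neg"),
                       levels = c("neg", "pos"))

# Boxplots of each covariate split by diagnosis.
old_par <- par(mfrow = c(1, 3))
boxplot(Age ~ Diabetes, data = sim, main = "Age")
boxplot(BMI ~ Diabetes, data = sim, main = "BMI")
boxplot(Glucose ~ Diabetes, data = sim, main = "Glucose")
par(old_par)

# Mean of each covariate within each diagnosis group.
group_means <- aggregate(cbind(Age, BMI, Glucose) ~ Diabetes,
                         data = sim, FUN = mean)
print(group_means)
```

For a count variable such as Pregnant, a table of proportions (for example prop.table(table(x, y), margin = 1)) may be easier to read than a boxplot.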
2. Fit an appropriate generalised linear model to estimate the probability of having diabetes using
Age, BMI, Glucose, Pressure and Pregnant. Explain your choice of distribution and link function.
[4 marks]
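A minimal sketch of fitting a binomial generalised linear model with a logit link in R is given below. It is one common choice for a binary response, not necessarily the intended answer, and it again uses simulated data with made-up coefficients (the data frame sim and the object names are hypothetical).

```r
# Simulated stand-in for diabetes.csv (all values are made up).
set.seed(12345678)
n <- 300
sim <- data.frame(
  Age     = round(runif(n, 21, 70)),
  BMI     = rnorm(n, mean = 30, sd = 5),
  Glucose = rnorm(n, mean = 120, sd = 30)
)
eta <- -8 + 0.03 * sim$Age + 0.08 * sim$BMI + 0.03 * sim$Glucose
sim$Diabetes <- factor(ifelse(rbinom(n, 1, plogis(eta)) == 1, "pos", "neg"),
                       levels = c("neg", "pos"))

# With a factor response, glm treats the first level ("neg") as failure,
# so the model describes the probability of "pos".
fit <- glm(Diabetes ~ Age + BMI + Glucose,
           family = binomial(link = "logit"), data = sim)

# Estimate, standard error, z value and p-value for each parameter.
coef_tab <- coef(summary(fit))
print(round(coef_tab, 4))
```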
3. Formally test if the square of Age improves the fit of the model from Part 2.
[2 marks]
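For nested generalised linear models, a likelihood ratio test compares the drop in deviance with a chi-squared distribution. The sketch below shows the mechanics on simulated data in which the true effect of Age is linear; the data, coefficients and object names are assumptions for illustration only.

```r
# Simulated stand-in for diabetes.csv (all values are made up).
set.seed(12345678)
n <- 300
sim <- data.frame(
  Age = round(runif(n, 21, 70)),
  BMI = rnorm(n, mean = 30, sd = 5)
)
eta <- -4.5 + 0.03 * sim$Age + 0.08 * sim$BMI
sim$Diabetes <- factor(ifelse(rbinom(n, 1, plogis(eta)) == 1, "pos", "neg"),
                       levels = c("neg", "pos"))

fit_lin <- glm(Diabetes ~ Age + BMI, family = binomial, data = sim)
fit_sq  <- glm(Diabetes ~ Age + I(Age^2) + BMI, family = binomial, data = sim)

# Likelihood ratio test computed by hand: deviance difference on 1 df.
lr_stat <- deviance(fit_lin) - deviance(fit_sq)
df_diff <- df.residual(fit_lin) - df.residual(fit_sq)
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# The same test via anova().
lrt <- anova(fit_lin, fit_sq, test = "Chisq")
print(lrt)
```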
4. Based on the test in Part 3 and any further tests you think appropriate, select a model for the
probability of having diabetes.
[8 marks]
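One possible way to organise further tests is to check each term with a likelihood ratio test against the model without it, using drop1. This is only one of several reasonable strategies and is not prescribed by the question. The sketch uses simulated data in which Pressure has no true effect; all values and names are hypothetical.

```r
# Simulated stand-in for diabetes.csv (all values are made up).
set.seed(12345678)
n <- 300
sim <- data.frame(
  Age      = round(runif(n, 21, 70)),
  BMI      = rnorm(n, mean = 30, sd = 5),
  Pressure = rnorm(n, mean = 70, sd = 12)  # unrelated to the response here
)
eta <- -4.5 + 0.03 * sim$Age + 0.08 * sim$BMI
sim$Diabetes <- factor(ifelse(rbinom(n, 1, plogis(eta)) == 1, "pos", "neg"),
                       levels = c("neg", "pos"))

fit_full <- glm(Diabetes ~ Age + BMI + Pressure, family = binomial, data = sim)

# Likelihood ratio test for dropping each term in turn from the full model.
drop_tab <- drop1(fit_full, test = "Chisq")
print(drop_tab)
```

A term with a large p-value in this table is a candidate for removal; the reduced model can then be refitted and the check repeated.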
5. Present and interpret the estimated coefficients for the model you selected in Part 4.
[4 marks]
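Under a logit link, exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that covariate, holding the others fixed. The sketch below computes these odds ratios with Wald confidence intervals on simulated data; the data, coefficients and object names are assumptions for illustration only.

```r
# Simulated stand-in for diabetes.csv (all values are made up).
set.seed(12345678)
n <- 300
sim <- data.frame(
  Age = round(runif(n, 21, 70)),
  BMI = rnorm(n, mean = 30, sd = 5)
)
eta <- -4.5 + 0.03 * sim$Age + 0.08 * sim$BMI
sim$Diabetes <- factor(ifelse(rbinom(n, 1, plogis(eta)) == 1, "pos", "neg"),
                       levels = c("neg", "pos"))

fit <- glm(Diabetes ~ Age + BMI, family = binomial, data = sim)

# Odds ratios with 95% Wald intervals (confint.default uses estimate +/- z * SE).
or_tab <- cbind(OR = exp(coef(fit)), exp(confint.default(fit)))
print(round(or_tab, 3))
```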