MATH38172 Generalised Linear Models
Coursework 2023
Instructions. Attempt the questions below and submit your work online via Blackboard by the deadline of
3pm on Friday 28th April 2023. Your submission must be a single file. It may contain any sensible mix of
word-processed and (scanned) handwritten parts, for example using LaTeX, RMarkdown or Microsoft Word.
You should include any R code used. A complete solution is possible in 5 pages of 10-point type; please limit
your response to 10 pages at most. The coursework may take up to 10 hours to complete. The submitted work
MUST be your own. Plagiarism will not be tolerated and will result in serious consequences if discovered.
Background. The file smokingdata.csv on Blackboard contains data on the relationship between smoking
and health for 1314 women in Northern England. The women are grouped according to the combination of
levels of two variables:
• Age: the women’s age in the original survey in the 1970s (categories: 18-25, 25-34, 35-44, 45-54, 55-64,
65-74, 75+);
• Smoking: the women’s smoking status in the original survey (categories: NonSmoker, Smoker).
For example, one group is the set of women who were non-smokers aged 25-34. For each group, the following
two variables have been collected:
• Alive: the number of women in the group that were still alive 20 years after the original survey;
• Dead: the number of women in the group that were dead 20 years after the original survey.
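The grouped layout described above can be pictured with a minimal R sketch. It assumes the CSV has one row per Age and Smoking combination and that its columns are named Age, Smoking, Alive and Dead; the actual column names in smokingdata.csv may differ. The counts below are hypothetical toy values for illustration only and are NOT taken from the dataset.

```r
# Hypothetical toy counts for illustration; these are NOT the values in smokingdata.csv.
# Column names (Age, Smoking, Alive, Dead) are assumed from the variable descriptions.
toy <- data.frame(
  Age     = rep(c("18-25", "25-34"), each = 2),
  Smoking = rep(c("NonSmoker", "Smoker"), times = 2),
  Alive   = c(10, 8, 12, 9),
  Dead    = c(1, 2, 1, 3)
)

# Round-trip through a temporary CSV to mimic reading a file of this shape.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
dat <- read.csv(path, stringsAsFactors = TRUE)

# Each row is one group; the group size is Alive + Dead.
dat$Total    <- dat$Alive + dat$Dead
# Observed proportion of each group that had died by follow-up.
dat$PropDead <- dat$Dead / dat$Total

str(dat)
print(dat)
```

With data in this form, each row is a set of counts for a whole group rather than a record for an individual woman, so the group sizes vary from row to row.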
Questions
1. Read the dataset into R. (1 mark)
2. (a) Fit a logistic regression model to explain the probability of death within 20 years, using smoking
status as the ONLY explanatory variable, and present the summary for the fitted model. (1 mark)
(b) Write down the fitted model in equation form and interpret its parameters, including the parameter
values. Do you notice anything unusual about the parameter estimates? (3 marks)
3. (a) Fit a logistic regression model to explain the probability of death within 20 years, using BOTH
age and smoking status as explanatory variables, and present the summary for the fitted model.
(1 mark)
(b) Write down the fitted model in equation form and interpret its parameters, including the parameter
values. (3 marks)
(c) Compare your answers in 2(b) and 3(b). What do you notice? What is the reason for any
differences? (3 marks)
(d) Assess whether there is significant evidence that the probability of death depends on (i) smoking
status, or (ii) age. Give details of which tests are used, equations for the test statistic, critical
value, etc. (4 marks)
4. (a) Using the model you fitted in Question 3, estimate the probability of death within 20 years for a
woman aged 55-64 who does not smoke. (1 mark)
(b) Find a 95% confidence interval for the probability in 4(a). Explain your working. (3 marks)