MATH363: Group Project part 2
The group project contributes 60% to your mark for MATH363.
Part 1 = 15 marks; part 2 = 85 marks
You will work on all parts of this project as a group and submit your answers as a group. In
addition to the project you need to submit minutes from your meetings and provide the peer
assessment of everyone’s contribution through Buddycheck. The final mark for each member of the
group will be adjusted according to these peer scores.
The project needs to be submitted on Canvas and will be checked for potential plagiarism using
Turnitin. A high similarity score might result in the project being investigated for suspected
plagiarism; see Code of Practice on Assessment Appendix L.
DATA SETS FOR EACH GROUP ARE DIFFERENT AND THEREFORE
YOUR RESULTS AND ANSWERS SHOULD BE DIFFERENT
Please only discuss your project with your group members. Any similarity between different
project submissions might be investigated for suspected plagiarism.
You must include your minutes [10 marks] and include all relevant code in the appendix [10
marks].
Part II
1. [25 marks] A six-minute walking test (6MWT) is used to assess functional exercise performance.
In this test the subject is asked to walk for 6 minutes on a level course and the distance
covered is recorded in meters (m). The pace is set by the subject and breaks are allowed if
needed. The data were collected on healthy subjects and include information about:
sex (Female = 1; Male = 0 );
age (in years);
height (cm);
weight (kg);
BMI (= body mass index);
resting heart rate (beats per minute, bpm);
heart rate at the end of the 6 minutes;
current smoking status (1 if smoker);
whether the person has ever smoked;
usual activity level: 0 = sedentary (less than 30 min physical activity a day), 1 = moderately
active (30-60 min physical activity a day), 2 = active (more than 60 min physical
activity a day).
You are asked to find a model for the dependence of the distance travelled in the 6MWT on the
explanatory variables provided. The model will be used to predict the average distance for a
person and should only include variables available before the test is taken (that is, not their
heart rate at the end of the test).
(a) Use appropriate plots to help suggest possible models. You should consider here which
of the explanatory variables are covariates and which are factors.
(b) Fit different linear models as suggested by the plots in (a) and decide which model is most
appropriate. Examples of things that you could consider here are models with quadratic
or cubic terms and interactions; other transformations of variables are also possible.
(c) For the model chosen in (b), perform residual analysis and decide if the model fits well.
If it does not, suggest changes that can be made to the model to address these issues.
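As a hedged illustration of what a candidate model in (b) might look like, the equation below combines a quadratic term, an interaction, and a factor coded through indicator variables. The choice of terms and all symbols ($D_i$, $\beta$, $\gamma$, $\delta$, $\alpha$) are assumptions made for illustration only; they are not a recommended model, and the plots for your own data may suggest something quite different.

```latex
\begin{aligned}
D_i ={}& \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{age}_i^2
        + \beta_3\,\mathrm{height}_i
        + \gamma\,\mathrm{sex}_i
        + \delta\,(\mathrm{age}_i \times \mathrm{sex}_i) \\
      &+ \alpha_1\,\mathbb{1}\{\mathrm{activity}_i = 1\}
        + \alpha_2\,\mathbb{1}\{\mathrm{activity}_i = 2\}
        + \varepsilon_i ,
\qquad \varepsilon_i \sim N(0, \sigma^2).
\end{aligned}
```

Here age and height enter as covariates, while sex and activity level enter as factors; the sedentary level (activity = 0) acts as the baseline, so $\alpha_1$ and $\alpha_2$ are differences from that baseline.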
2. [10 marks] Based on the analysis in Q1, recommend a model which should be used to predict
the distance travelled in the 6MWT and interpret the parameters of your model. Explain any
limitations of your model. Discuss what information about the patient is needed to use your model
and provide examples of predicted values including confidence and prediction intervals. You
can use some of the subjects included in your data set for these examples. Indicative word
count: 500
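For reference, the standard forms of the two intervals for a normal linear model are sketched below. The notation is an assumption introduced here: $x_0$ is the vector of explanatory values for the new subject, $X$ is the design matrix with $n$ rows and $p$ columns, $\hat{y}_0 = x_0^\top \hat{\beta}$ is the fitted value, and $s^2 = \mathrm{RSS}/(n-p)$.

```latex
\begin{aligned}
\text{Confidence interval for the mean response:}\quad
  & \hat{y}_0 \pm t_{n-p,\,1-\alpha/2}\; s \sqrt{x_0^\top (X^\top X)^{-1} x_0} \\[4pt]
\text{Prediction interval for a new observation:}\quad
  & \hat{y}_0 \pm t_{n-p,\,1-\alpha/2}\; s \sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}
\end{aligned}
```

The extra 1 under the square root accounts for the variability of an individual subject around the mean, so the prediction interval is always the wider of the two.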
3. [15 marks] Choosing a model for large data sets can be a very complex task, but there are
a few different methods which help researchers to decide on the best set of variables to be
included in a model in a systematic way. Three commonly used methods are forward selection,
backward elimination and stepwise regression. Research one of these methods and write
a short explanation in your own words of how this method works. Apply this method to your
data on the 6MWT. Discuss similarities and differences between the model obtained here and the
model you chose in 1(b). You should use published resources in this part and cite any
books or articles used. A good source here is Applied Regression Analysis by N.R. Draper and
H. Smith (on reading lists). Indicative word count: 750
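To make the idea of forward selection concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method as any particular textbook defines it: the entry criterion used here is AIC, whereas many presentations use a partial F-test instead. The helper names (`ols_rss`, `aic`, `forward_selection`), the variable `filler`, and all the numbers are made up for the example and are not taken from any project data set.

```python
import math


def ols_rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(k + 1, p):
            factor = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= factor * A[k][c]
    # Back substitution for the coefficient estimates.
    beta = [0.0] * p
    for k in reversed(range(p)):
        tail = sum(A[k][c] * beta[c] for c in range(k + 1, p))
        beta[k] = (A[k][p] - tail) / A[k][k]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(rss, n, n_params):
    """AIC for a normal linear model, up to an additive constant."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * n_params


def forward_selection(data, y):
    """Start from the intercept-only model and add one variable at a time."""
    n = len(y)
    selected = []
    best = aic(ols_rss([], y), n, 1)
    while len(selected) < len(data):
        scores = {}
        for name in data:
            if name not in selected:
                cols = [data[v] for v in selected + [name]]
                scores[name] = aic(ols_rss(cols, y), n, len(cols) + 1)
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining variable improves the criterion
        selected.append(candidate)
        best = scores[candidate]
    return selected


# Made-up toy data: distance depends on age and height only.
toy = {
    "age": [25, 32, 41, 47, 53, 60, 66, 72],
    "height": [178, 165, 172, 160, 181, 169, 158, 175],
    "filler": [3, 1, 4, 1, 5, 9, 2, 6],
}
noise = [1, -1, 2, -2, 1, -1, 0, 0]
distance = [300 - 2 * a + 2 * h + e
            for a, h, e in zip(toy["age"], toy["height"], noise)]

selected = forward_selection(toy, distance)
print(selected)
```

On this toy data the procedure picks up age first and height second, because those two variables generate the response; whether the unrelated `filler` column is also admitted depends on the criterion, which is one reason automatic selection should be compared against a model chosen by judgement.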
4. [15 marks] It is also of interest to model the probability that a subject reached the threshold
heart rate (HR) for vigorous-intensity physical activity (VIPA), that is, a heart rate of at
least 77% of maximum HR. Maximum HR is calculated as 220 minus age. For example,
for a person who is 35, maximum HR is 185 bpm and VIPA is when their HR is at least
0.77(220 − 35) = 142.45 bpm, that is, 143 bpm once rounded up to a whole beat. The researchers
want to use the data set used in Q1 to estimate the probability of VIPA during the 6MWT. The
distance travelled should be one of the explanatory variables in this case.
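The threshold arithmetic described above can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this illustration; the 220-minus-age rule and the 77% fraction come from the text.

```python
def max_heart_rate(age_years):
    """Maximum HR as defined in the text: 220 minus age (bpm)."""
    return 220 - age_years


def vipa_threshold(age_years, fraction=0.77):
    """Heart rate (bpm) at which activity counts as vigorous-intensity."""
    return fraction * max_heart_rate(age_years)


def reached_vipa(age_years, end_heart_rate):
    """True if the heart rate at the end of the test is at least the threshold."""
    return end_heart_rate >= vipa_threshold(age_years)


threshold_35 = vipa_threshold(35)
print(max_heart_rate(35), round(threshold_35, 2))
```

For a 35-year-old this gives a maximum HR of 185 bpm and a threshold of 142.45 bpm, so an end-of-test heart rate of 143 bpm reaches VIPA while 142 bpm does not.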
(a) Create a new variable which is 1 if a person’s HR reached VIPA, and 0 otherwise (you
might want to first create a variable with maximum HR for each person).
(b) Propose a model for the probability of VIPA. You do not need to consider interactions
or higher-order terms in this case, but you need to consider which link function is most
appropriate.
(c) Interpret the parameters of your model, and discuss its fit and any limitations.
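As background for the choice of link function in (b), the sketch below compares three links commonly used for binary responses: logit, probit and complementary log-log. It only evaluates the inverse links on a few values of the linear predictor; it does not fit a model, and the function names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link


def logit_inverse(eta):
    """Logistic link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_inverse(eta):
    """Probit link: p is the standard normal CDF at eta."""
    return _STD_NORMAL.cdf(eta)


def cloglog_inverse(eta):
    """Complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


# Probability implied by each link for the same linear predictor value.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta,
          round(logit_inverse(eta), 3),
          round(probit_inverse(eta), 3),
          round(cloglog_inverse(eta), 3))
```

The logit and probit links are symmetric around a probability of 0.5, whereas the complementary log-log link is not, which matters when the event is rare or very common. With the logit link, each coefficient is a change in the log-odds of VIPA per unit change in the explanatory variable, which often makes it the easiest to interpret in (c).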