Rubric for STAT3888 Disciplinary Assignment
You may create the report for the Disciplinary Assignment based solely on the biomedical
data (you may exclude the food and nutrition datasets). However, you should aim to
familiarise yourself with the rest of the datasets for Interdisciplinary Project 1.
1. Executive summary (total of 5 marks)
a. Summary of what you did and found (in parts 2, 3 and 4). (2 marks)
b. State at least three research questions that you could answer with the data
provided. (1 mark)
c. A table or a one-paragraph summary of missingness totals for each variable,
including percentages of missingness. (1 mark)
d. Summary of the size of the complete dataset (that is, after missing values are
dealt with). (1 mark)
2. EDA (total of 5 marks)
Report any problems that you find in the data, including:
a. low variance variables (and whether you decided to remove them; how low the
variance must be before a variable is removed is up to you),
b. mismatches between the data and the data dictionary (there are some),
c. logical inconsistencies that you find.
I haven’t assigned specific marks to (a) to (c) above because I am not sure whether
there are logical inconsistencies in the dataset. If you do find any, please report them
on ED so that everyone knows about them.
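As an illustration of check (a) only, the following minimal sketch flags low variance variables. Everything in it is an assumption made for the example: the toy columns, the function name low_variance_columns, and both thresholds are hypothetical, and the rubric leaves the actual cut-off to you. It is written in plain Python so that it is self-contained; the same logic carries over to R.

```python
from collections import Counter
from statistics import variance

# Hypothetical toy columns; none of these names come from the real data.
columns = {
    "age": [34, 51, 47, 29, 62],
    "sex": ["F", "M", "F", "F", "M"],
    "fasting": ["yes", "yes", "yes", "yes", "yes"],
    "dose": [1.0, 1.0, 1.0, 1.0, 1.01],
}

def low_variance_columns(columns, min_variance=0.01, max_dominant_share=0.95):
    """Return the names of columns that carry almost no variation.

    Numeric columns are flagged when their sample variance is below
    min_variance. Other columns are flagged when a single level makes up
    at least max_dominant_share of the observed values.
    Both thresholds are illustrative assumptions, not rubric requirements.
    """
    flagged = []
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if len(observed) < 2:
            # Too few observed values to measure any variation.
            flagged.append(name)
        elif all(isinstance(v, (int, float)) for v in observed):
            if variance(observed) < min_variance:
                flagged.append(name)
        else:
            top_count = Counter(observed).most_common(1)[0][1]
            if top_count / len(observed) >= max_dominant_share:
                flagged.append(name)
    return flagged

flagged = low_variance_columns(columns)
print(flagged)
```

Note that a raw variance threshold depends on the units of each variable, so a cut-off that suits one measurement may be meaningless for another. Whatever rule you choose, state it in the report.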
3. Missing values (total of 3 marks)
a. Produce a visualisation to determine missingness amounts and patterns. (1
mark)
b. Produce a table summarising the prevalence of missingness in the dataset.
(1 mark)
c. Produce a version of the biomedical dataset that is complete by sensibly
excluding a combination of rows and columns containing missing data. (1 mark)
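Parts (b) and (c) can be sketched as follows. This is only one possible approach under stated assumptions: the toy records, the function names missingness_table and complete_subset, and the 50% column threshold are all hypothetical, and other sensible choices will not lose marks. The sketch uses plain Python so that it is self-contained; the same steps translate directly to R.

```python
# Hypothetical toy records standing in for the biomedical data.
# None marks a missing value.
rows = [
    {"id": 1, "bmi": 24.1, "chol": 5.2, "hdl": None},
    {"id": 2, "bmi": None, "chol": 4.8, "hdl": None},
    {"id": 3, "bmi": 27.5, "chol": None, "hdl": None},
    {"id": 4, "bmi": 22.0, "chol": 5.9, "hdl": 1.3},
]

def missingness_table(rows):
    """Return {variable: (missing count, missing percentage)}."""
    n = len(rows)
    table = {}
    for var in rows[0]:
        missing = sum(1 for r in rows if r[var] is None)
        table[var] = (missing, 100.0 * missing / n)
    return table

def complete_subset(rows, max_col_missing_pct=50.0):
    """Drop columns above the threshold, then drop rows still incomplete.

    The 50% default is an illustrative assumption, not a required cut-off.
    """
    table = missingness_table(rows)
    keep = [v for v, (_, pct) in table.items() if pct <= max_col_missing_pct]
    reduced = [{v: r[v] for v in keep} for r in rows]
    return [r for r in reduced if all(val is not None for val in r.values())]

table = missingness_table(rows)
complete = complete_subset(rows)

# Print a simple formatted table rather than the raw dictionary.
print(f"{'variable':<10}{'missing':>8}{'percent':>9}")
for var, (count, pct) in table.items():
    print(f"{var:<10}{count:>8}{pct:>8.1f}%")
print(f"complete data: {len(complete)} rows x {len(complete[0])} columns")
```

Removing the mostly missing column first keeps more rows than dropping every incomplete row outright: here two rows and three columns survive, whereas row-wise deletion alone would leave a single row.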
4. Save your cleaned datasets (total of 1 mark)
5. Appearance of the report (total of 6 marks)
a. Explain what you did using sentences. (3 marks)
b. Do not include uninformative plots or tables. Include a plot or table only if it
supports a claim made in the text. (3 marks)
Maximum mark: 20
In this exercise, you will need to make decisions about how to clean the data, e.g., what
cut-offs to use, what categories to exclude, etc. You will not have marks deducted simply
because you made different choices than I might make. (See next page for mark deductions.)
Additionally,
• 1 mark will be deducted for each instance of raw R output that could have been
displayed as a table.
• 1 mark will be deducted for each table or figure without a caption.
• 0.5 marks will be deducted for each spelling mistake.
• 2 marks will be deducted if you submit the raw data.