DATA3888 (2024): Assignment
Instructions
1. Your assignment submission needs to be an HTML document that you have compiled using R Markdown
or Quarto. Name your file "SIDXXX_Assignment.html", where XXX is your Student ID.
2. Put your Student ID (NOT your name) in the author field at the top of the Rmd file.
3. For your assignment, please use set.seed(3888) at the start of each chunk (where required).
4. Do not upload the code file (i.e. the Rmd or qmd file).
5. You must use code folding so that the marker can inspect your code where required.
6. Your assignment should make sense and provide all the relevant information in the text when the code
is hidden. Don’t rely on the marker to understand your code.
7. Any output that you include needs to be explained in the text of the document. If your code chunk
generates unnecessary output, please suppress it by specifying chunk options like message = FALSE.
8. Start each of the 3 questions in a separate section. The parts of each question should be in the same
section.
9. You may be penalised for excessive or poorly formatted output.
Question 1: Reef
Between 2014 and 2017, marine scientists recorded an unprecedented global coral bleaching event. Your
friend Farhan is a marine science expert who wants to study the environmental variables that
may have triggered this event. To do this, we will use a public dataset curated by Sully and
colleagues. This dataset records coral bleaching events at 3351 locations in 81 countries from
1998 to 2017, together with a suite of environmental and temperature metrics. The data is in the file
Reef_Check_with_cortad_variables_with_annual_rate_of_SST_change.csv, and a full description
of the variables can be found in the supplementary table of the study.
Part (a)
Farhan has noticed that, on average, the North of Australia experienced higher levels of coral bleaching than
the South during the 2014-2017 global bleaching event. In the paper, the authors find that the
following variables are associated with the probability of coral bleaching:
• TSA_Frequency_Standard_Deviation
• Temperature_Mean
• TSA_Frequency
• Temperature_Kelvin_Standard_Deviation
• TSA_DHW_Standard_Deviation
• SSTA_Frequency_Standard_Deviation
Create one informative graphic to visualise how these six variables differ between the North
and South of Australia during the 2014-2017 global coral bleaching event. Explain any data filtering or
transformation that you perform. Comment on the visualisation and suggest at least one variable that
appears to differ between the North and the South and thus may be associated with the higher levels
of bleaching observed in the North.
Note: the midpoint of Australia is located at -23 degrees latitude. Observations with a latitude greater than
-23 degrees are considered to be in North Australia. Your graphic can have multiple panels.
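A minimal sketch of the filtering and reshaping described in part (a), written in Python for illustration (the same steps carry over directly to R). It assumes the CSV has columns named `Country`, `Date_Year` and `Latitude.Degrees`; these names are hypothetical and should be checked against the supplementary table. Rows are kept if they are Australian surveys from 2014-2017, each row is tagged North when its latitude is strictly greater than -23, and the six variables are then stacked into long format so that each variable can get its own panel.

```python
import csv

# Assumed column names (check them against the supplementary table):
# "Country", "Date_Year" and "Latitude.Degrees" are hypothetical here.
NORTH_SOUTH_SPLIT = -23.0  # midpoint latitude of Australia, as given in the task

SIX_VARIABLES = [
    "TSA_Frequency_Standard_Deviation",
    "Temperature_Mean",
    "TSA_Frequency",
    "Temperature_Kelvin_Standard_Deviation",
    "TSA_DHW_Standard_Deviation",
    "SSTA_Frequency_Standard_Deviation",
]


def load_rows(path):
    """Read the CSV into a list of dicts (one dict per survey row)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def australia_2014_2017(rows):
    """Keep Australian surveys from 2014-2017 and tag each as North or South."""
    kept = []
    for row in rows:
        if row["Country"] != "Australia":
            continue
        if not 2014 <= int(row["Date_Year"]) <= 2017:
            continue
        # Latitude strictly greater than -23 counts as North; -23 or below is South.
        lat = float(row["Latitude.Degrees"])
        kept.append({**row, "Region": "North" if lat > NORTH_SOUTH_SPLIT else "South"})
    return kept


def to_long(rows, variables):
    """Reshape to one (Region, Variable, Value) row per measurement, for faceted plots."""
    long_rows = []
    for row in rows:
        for var in variables:
            raw = row.get(var, "")
            if raw in ("", "NA"):
                continue  # skip missing measurements rather than coercing them to 0
            long_rows.append({"Region": row["Region"], "Variable": var, "Value": float(raw)})
    return long_rows
```

With the data in long format, one panel per variable (for example a boxplot of `Value` by `Region`) with independent y-axes is a reasonable choice, since the six variables are on different scales.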
Part (b)
Farhan wants to explore which reefs across the globe were most affected by the 2014-2017 global bleaching
event. Create an interactive map visualisation showing the average proportion of coral bleaching between
2014 and 2017. The map should allow a marine scientist to identify the names of the most affected coral
reefs, their region (recorded as State.Province.Island) and the values of the associated environmental
variables identified in part (a). Justify your choice of visualisation and comment on the result.
List 4 regions that were severely bleached in this time period.
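The per-reef summary behind the map in part (b) could be computed along these lines. This Python sketch assumes columns named `Reef.Name`, `Date_Year` and `Average_bleaching` (hypothetical names); `State.Province.Island` is the region column named in the task. Each reef's measurements from 2014-2017 are averaged and reefs are ranked by their mean bleaching, giving the values to display in each marker's popup.

```python
from collections import defaultdict
from statistics import mean

# Assumed column names: "Reef.Name", "Date_Year" and "Average_bleaching" are
# hypothetical; "State.Province.Island" is the region column named in the task.


def summarise_reefs(rows, value_cols, start=2014, end=2017):
    """Average each column in value_cols per (reef, region) over start..end.

    The first entry of value_cols is used to rank reefs, most affected first.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if not start <= int(row["Date_Year"]) <= end:
            continue
        key = (row["Reef.Name"], row["State.Province.Island"])
        for col in value_cols:
            raw = row.get(col, "")
            if raw not in ("", "NA"):
                groups[key][col].append(float(raw))

    summary = []
    for (reef, region), cols in groups.items():
        entry = {"Reef.Name": reef, "State.Province.Island": region}
        for col in value_cols:
            vals = cols.get(col)
            entry[col] = mean(vals) if vals else None  # None when never measured
        summary.append(entry)

    rank_col = value_cols[0]
    summary.sort(
        key=lambda e: e[rank_col] if e[rank_col] is not None else float("-inf"),
        reverse=True,
    )
    return summary
```

Passing the latitude and longitude columns in `value_cols` as well would give one coordinate pair per reef for placing markers; the map itself would then be drawn with an interactive mapping package such as leaflet in R.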
Part (c)
Farhan wants to explore the impact of environmental variables on coral bleaching in the most affected regions.
For the regions identified in part (b), create one informative visualisation to show how the average
bleaching has changed over time (not restricted to 2014-2017), and its relationship with one of the associated
environmental variables identified in part (a). Comment on the visualisation.
Note: your graphic can have multiple panels.
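For part (c), a yearly summary per region is a natural input for a multi-panel time-series plot. The sketch below, again in Python with hypothetical `Date_Year` and `Average_bleaching` column names, averages bleaching and one chosen environmental variable per (region, year) across all years, restricted to the regions picked in part (b).

```python
from collections import defaultdict
from statistics import mean

# Assumed column names: "Date_Year" and "Average_bleaching" are hypothetical;
# "State.Province.Island" is the region column named in the task.


def yearly_region_means(rows, regions, value_cols):
    """Mean of each column in value_cols per (region, year), for the chosen regions only."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        region = row["State.Province.Island"]
        if region not in regions:
            continue
        key = (region, int(row["Date_Year"]))
        for col in value_cols:
            raw = row.get(col, "")
            if raw not in ("", "NA"):
                groups[key][col].append(float(raw))

    table = []
    for region, year in sorted(groups):  # ordered by region, then year
        cols = groups[(region, year)]
        entry = {"Region": region, "Year": year}
        for col in value_cols:
            vals = cols.get(col)
            entry[col] = mean(vals) if vals else None  # None when not measured that year
        table.append(entry)
    return table
```

One option is to plot bleaching and the environmental variable in separate, vertically aligned panels for each region, which avoids the arbitrary scale matching that a dual y-axis requires.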
Question 2: Kidney
Your friend Harry is a nephrologist (kidney specialist) who wants to build an accurate classifier to
detect graft rejection in his kidney transplant patients. He would also like to know which genes may
be affecting graft rejection. In this problem, we will build a classification model using the public dataset
GSE138043. We will perform feature selection, build a classifier and estimate its accuracy on unseen data.
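Because feature selection is part of building the classifier, it has to be repeated inside each cross-validation training fold; selecting genes on the full dataset first and then cross-validating lets the test samples influence the model and tends to overstate accuracy. The Python sketch below shows one way to structure this and is not necessarily the approach expected in the assignment. It assumes the expression data is a list of sample rows (one value per gene) with labels coded 1 for rejection and 0 for no rejection (a hypothetical encoding), and it uses a simple mean-difference score and a nearest-centroid classifier as stand-ins for whatever selection method and model you choose. The seed 3888 mirrors the `set.seed(3888)` instruction.

```python
import random
from statistics import mean, pstdev


def stratified_folds(y, k, rng):
    """Split sample indices into k folds, keeping class proportions roughly equal."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def top_features(X, y, n):
    """Rank genes by |mean(class 1) - mean(class 0)| / overall SD and keep the top n."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, lab in zip(X, y) if lab == 1]
        b = [x[j] for x, lab in zip(X, y) if lab == 0]
        sd = pstdev(a + b) or 1.0  # guard against constant genes
        scores.append((abs(mean(a) - mean(b)) / sd, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:n]]


def fit_centroids(X, y, feats):
    """Nearest-centroid model: per-class mean of each selected gene."""
    return {
        label: [mean([x[j] for x, lab in zip(X, y) if lab == label]) for j in feats]
        for label in set(y)
    }


def predict(centroids, x, feats):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    vals = [x[j] for j in feats]
    return min(
        centroids,
        key=lambda lab: sum((v - c) ** 2 for v, c in zip(vals, centroids[lab])),
    )


def cv_accuracy(X, y, k_folds=5, n_features=10, seed=3888):
    """k-fold CV accuracy with feature selection redone inside every training fold."""
    rng = random.Random(seed)  # analogous to set.seed(3888) in the R instructions
    correct = 0
    for test_idx in stratified_folds(y, k_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        feats = top_features(X_tr, y_tr, n_features)  # test fold never seen here
        centroids = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(centroids, X[i], feats) == y[i] for i in test_idx)
    return correct / len(y)
```

Stratifying the folds keeps both classes present in every training fold, which the mean-difference score needs in order to be defined.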