MATH253 Week 8 Tutorial
R Tutorial
This tutorial sheet is related to material covered in chapters 10 and 11 (how to do tests/calculations/plots
from these chapters in R).
Solutions will be available on Canvas on Friday.
Part A
For each of five different types of pasta (A, B, C, D, E), seven 100g servings were prepared, and the amount of
salt absorbed was recorded for each of the 35 servings. The aim is to determine whether there are significant
differences in mean salt absorption for the five types of pasta.
The data can be found on Canvas – file Tutorial8 salt.xlsx.
Download the file Tutorial8 salt.xlsx to your computer into a folder dedicated to R. Make sure
that this folder is set up as your working directory in RStudio.
In RStudio open a new R script.
Load the file Tutorial8 salt.xlsx using the readxl package, creating the variable called salt. (See
Tutorial 2 for details on how to load data using the readxl package.)
Make sure you save your R script in the folder dedicated to R and it is a good idea to keep saving it
after each task you complete.
1. Perform the ANOVA F-test and report your conclusions.
If the data are in separate columns, as in our case, we have to combine them into one column
while preserving the groups. This can be done by using the command cbind (this ensures that responses
will be assigned to their groups A, B, C, D, E), loading the result into a data frame and then using the
command stack (this creates two columns of data: one column called values, which consists
of all responses, and the other called ind, which says to which group each response belongs). Run the
following code to do this:
A <- salt$A
B <- salt$B
C <- salt$C
D <- salt$D
E <- salt$E
combined <- data.frame(cbind(A, B, C, D, E))
stacked <- stack(combined)
stacked
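To see what stack does before applying it to the salt data, here is a minimal sketch on a made-up toy data frame (the name toy and its numbers are illustrative assumptions, not the tutorial data):

```r
# Made-up toy data: three groups with two observations each (NOT the salt data)
toy <- data.frame(A = c(1.2, 1.5), B = c(2.1, 2.4), C = c(3.0, 3.3))

# stack() puts all responses into `values` and the original column name into `ind`
toy_stacked <- stack(toy)
print(toy_stacked)

# `ind` is a factor, which is what aov() needs for the grouping variable
str(toy_stacked)
```

The six responses end up in a single column, and each row still records which group it came from.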
Now we perform the ANOVA test using the command aov. To print the results we use the command
summary. So, we run the following code:
anovaresults <- aov(values ~ ind, data = stacked)
summary(anovaresults)
Note that in aov we first define what the responses and groups are, which in our case are values
and ind, and they are separated by the tilde symbol ~. The expression values ~ ind tells R that
our responses in values depend on the groups in ind. The order is important here, so take
care that you put the column of responses on the left-hand side of ~ and the column of groups on
the right-hand side of ~.
Using data = stacked, we tell R which data set to work with, which in our case is stacked.
R produces an ANOVA table, including the p-value in the column Pr(>F).
Give the value of the estimate of the error variance σ².
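The estimate of the error variance is the residual mean square, i.e. the Mean Sq entry in the Residuals row of the ANOVA table. As a cross-check, it can also be computed directly from the fitted object. The sketch below uses simulated toy data (the names toy, fit and mse are illustrative assumptions), so its numbers have nothing to do with the salt data:

```r
# Simulated toy data (NOT the salt data): three groups of five observations
set.seed(1)
toy <- data.frame(
  values = c(rnorm(5, mean = 10), rnorm(5, mean = 12), rnorm(5, mean = 11)),
  ind    = factor(rep(c("A", "B", "C"), each = 5))
)

fit <- aov(values ~ ind, data = toy)
print(summary(fit))  # ANOVA table; see the Mean Sq entry in the Residuals row

# Residual mean square = residual sum of squares / residual degrees of freedom
mse <- deviance(fit) / df.residual(fit)
print(mse)
```

The value printed for mse should agree with the residual Mean Sq shown in the table.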
2. Using the normality test for residuals, the histogram of residuals, and the normal probability plot of
residuals, decide if the assumption of a normal distribution is reasonable here.
First we need to find the residuals by running the command residuals(anovaresults).
Now use these residuals to perform the normality test, to construct the histogram and the normal
probability plot, using the commands discussed in Tutorial 6.
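As a reminder of how these three checks fit together, here is a rough sketch on simulated toy data. It assumes that the normality test is the Shapiro-Wilk test (shapiro.test) and that the plots are drawn with hist, qqnorm and qqline; if Tutorial 6 used different commands, follow Tutorial 6 instead.

```r
# Simulated toy data (NOT the salt data): three groups of seven observations
set.seed(2)
toy <- data.frame(
  values = c(rnorm(7, mean = 10), rnorm(7, mean = 12), rnorm(7, mean = 11)),
  ind    = factor(rep(c("A", "B", "C"), each = 7))
)
fit <- aov(values ~ ind, data = toy)

# Residuals = observed responses minus their group means
res <- residuals(fit)

# Assumed normality test: Shapiro-Wilk (a small p-value suggests non-normality)
sw <- shapiro.test(res)
print(sw)

# Histogram and normal probability (Q-Q) plot of the residuals
hist(res, main = "Histogram of residuals", xlab = "Residual")
qqnorm(res)
qqline(res)
```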
3. Is the assumption of equal variances justified for these data? Explain your answer, using appropriate tests
and boxplots.
We perform Bartlett’s test (based on the normal distribution) by running the command:
bartlett.test(values ~ ind, data = stacked)
To perform Levene’s test, we first have to install the package car (see Tutorial 2 for details on how
to install a package), and then run the commands:
library(car)
leveneTest(values ~ ind, data = stacked)
Without carrying out any formal tests, what is a rough rule for deciding whether it is OK to assume equal
variances? For the Salt Absorption data, does this rough rule suggest that assuming equal variances is
OK?
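The boxplots and an informal comparison of spreads can be sketched as follows, again on simulated toy data. The ratio of the largest to the smallest group standard deviation is one commonly quoted informal check (often compared against 2); this is an assumption here, and the rule given in the lectures may be stated differently.

```r
# Simulated toy data (NOT the salt data): three groups of seven observations
set.seed(3)
toy <- data.frame(
  values = c(rnorm(7, 10, 1), rnorm(7, 12, 1.5), rnorm(7, 11, 1)),
  ind    = factor(rep(c("A", "B", "C"), each = 7))
)

# Side-by-side boxplots: compare the spread of the groups by eye
boxplot(values ~ ind, data = toy, xlab = "Group", ylab = "Response")

# Bartlett's test on the same data, for comparison with the informal check
bt <- bartlett.test(values ~ ind, data = toy)
print(bt)

# Sample standard deviation of each group, and the largest-to-smallest ratio
group_sd <- tapply(toy$values, toy$ind, sd)
sd_ratio <- max(group_sd) / min(group_sd)
print(group_sd)
print(sd_ratio)
```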
4. Which groups have significantly different means? Perform post-hoc tests to answer this.
We perform Tukey’s HSD test by running the following command:
TukeyHSD(anovaresults)
We obtain the output of p-values for the differences between all pairs of groups.
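The object returned by TukeyHSD can also be inspected piece by piece. The sketch below uses simulated toy data (the names toy, fit and tk are illustrative assumptions) to show where the adjusted p-values and confidence intervals are stored:

```r
# Simulated toy data (NOT the salt data): three groups of seven observations
set.seed(4)
toy <- data.frame(
  values = c(rnorm(7, mean = 10), rnorm(7, mean = 12), rnorm(7, mean = 11)),
  ind    = factor(rep(c("A", "B", "C"), each = 7))
)
fit <- aov(values ~ ind, data = toy)

tk <- TukeyHSD(fit)

# One row per pair of groups; columns: diff, lwr, upr, p adj
print(tk$ind)

# Adjusted p-values only
print(tk$ind[, "p adj"])

# Plot of the confidence intervals for all pairwise differences;
# an interval that excludes 0 corresponds to a significant difference
plot(tk)
```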
To perform Fisher’s LSD test, we first have to install the package PMCMRplus (see Tutorial 2
for details on how to install a package), and then run the commands:
library(PMCMRplus)
summary(lsdTest(anovaresults))
We obtain the output of test statistics and p-values for the differences between all pairs of groups.
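If PMCMRplus cannot be installed, unadjusted pairwise t-tests with a pooled standard deviation give the same kind of comparison as Fisher's LSD using base R only. Note that this is a swapped-in alternative to lsdTest, not the command used above, and the toy data below are simulated:

```r
# Simulated toy data (NOT the salt data): three groups of seven observations
set.seed(5)
toy <- data.frame(
  values = c(rnorm(7, mean = 10), rnorm(7, mean = 12), rnorm(7, mean = 11)),
  ind    = factor(rep(c("A", "B", "C"), each = 7))
)
fit <- aov(values ~ ind, data = toy)

# Pairwise t-tests using the pooled SD from all groups and no p-value
# adjustment (base-R alternative to lsdTest)
lsd <- pairwise.t.test(toy$values, toy$ind,
                       p.adjust.method = "none", pool.sd = TRUE)

# Lower-triangular matrix of p-values, one entry per pair of groups
print(lsd$p.value)
```

Because no adjustment is made for multiple comparisons, these p-values are typically smaller than the Tukey-adjusted ones for the same pairs.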
Finally, save your work.
Part B
Diabetic retinopathy is a disease of the retina (the back of the eye which is important for our vision). A
clinician identifies three stages of diabetic retinopathy as: No Diabetic Retinopathy (Group 1), Early Diabetic
Retinopathy (Group 2) and Late Diabetic Retinopathy (Group 3). The clinician wants to find out if the
visual acuity of patients at different stages of diabetic retinopathy differs. The clinician collected data for 60
randomly chosen patients. Visual acuity (VA) was measured as the number of letters correctly read from a
standardised vision chart. Therefore the larger the VA number, the better the vision. You may assume that
the groups are independent. The data set is available on Canvas – file Tutorial8 vision.xlsx.
1. Perform Analysis of Variance for these data. State conclusions clearly, and provide practical interpretation.
2. Decide whether the assumptions of normality and equal variances are justified here. Explain your answer
carefully, using appropriate tests.
3. Use post-hoc tests to determine which disease groups differ in visual acuity. Report your conclusions
clearly.
4. Using the results of the Post-Hoc tests and a box plot, discuss whether we can use visual acuity to
differentiate between the early and late stage of diabetic retinopathy.