PSYM201
ADVANCED STATISTICS
Duration: 3 hours
This is an Open Book exam
Materials to be supplied:
Data files potent-potions.csv, soothing-pets.csv and thirsty-bees.csv
Additional materials:
Statistical software R, RStudio, rstudio.cloud
R packages psych, car, pwr, ggplot2, multcomp, emmeans, ppcor, afex, GPArotation, corpcor, lavaan,
rockchalk, lme4, lattice, ResourceSelection, mlogit, arm, MASS, brms
Part A (35 marks)
Evil mastermind Professor Leigh Hogwarts is busy in his underground laboratory, concocting potions that he
believes will boost his cognitive powers and enable him to take over the world. He has developed four different
types of potion—WombleJuice, GutterSlime, BurpleGoo and FibbleJelly—and now wishes to test their
effectiveness alongside an inert placebo solution. He buys 100 guinea-pigs and measures their performance in
a radial arm maze (maze1), with higher numbers representing better performance. He then randomly allocates
the guinea-pigs to five groups of 20, gives each group a different potion or the placebo, and tests them in the
radial arm maze a second time (maze2). His data are in the file potent-potions.csv.
A1. (2 marks) Report the mean and standard error for the change in performance between the first and second
maze attempts.
mean = 6.78 (1), s.e. = 0.53 (1)
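potions <- read.csv("potent-potions.csv")   # data file supplied with the exam
# Change in performance; NA wherever maze2 is missing
potions$change <- potions$maze2 - potions$maze1
library(psych)
describe(potions$change)   # se = sd / sqrt(n), with n = number of non-missing values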
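For reference, the s.e. reported by describe() is the standard deviation divided by the square root of the number of non-missing values, so the nine missing maze2 scores reduce n for the change variable. The sketch below uses a hypothetical helper, mean_se(), and a toy vector rather than the exam data to reproduce that calculation in base R. Applied as mean_se(potions$change), it should agree with describe().

```r
# Hypothetical helper: mean and standard error of a numeric vector,
# dropping missing values first (as psych::describe does by default)
mean_se <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  c(n = n, mean = mean(x), se = sd(x) / sqrt(n))
}

# Toy vector (not the exam data): the missing value is ignored
mean_se(c(4, 6, 8, NA))
```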
A2. (5 marks) Nine guinea-pigs sadly died between their first and second maze tests (indicated as ‘NA’ for
maze2). Is there evidence that this pattern was non-random with respect to the potion they received? Report
the results of a statistical test to support your answer, and remember to check the assumptions.
Pattern is significantly non-random (1) (Fisher’s exact test: P < 0.001) (3). There were excess deaths in the
GutterSlime group (1).
Binomial logistic regression is also acceptable if the model is compared with the null model (χ²(4) = 25.6, P < 0.001), but max. 1 mark if only the parameter estimates are reported (because their standard errors are unreliable).
Max. 3 marks for a chi-squared test, since its assumptions are not met (half of the expected counts are < 5): χ²(4) = 29.8, P < 0.001.
Max. 1 mark for a Kruskal–Wallis test on the dead/alive DV: χ²(4) = 29.5, P < 0.001.
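potions$died <- ifelse(is.na(potions$maze2), 1, 0)   # 1 = died before the maze2 test
table(potions$died, potions$potion)                  # deaths per treatment group
fisher.test(potions$died, potions$potion)            # exact test, suited to small expected counts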
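The chi-squared penalty can be checked from the design alone: expected counts under independence depend only on the table margins, E[i, j] = R[i] × C[j] / N. The sketch below (expected_counts() is a hypothetical helper) uses the margins stated in Part A, 9 deaths among 100 guinea-pigs in five groups of 20. On the real data, chisq.test(table(potions$died, potions$potion))$expected should return the same matrix.

```r
# Expected cell counts under independence: E[i, j] = R[i] * C[j] / N.
# Only the margins are needed, not the observed cell counts.
expected_counts <- function(row_totals, col_totals) {
  stopifnot(sum(row_totals) == sum(col_totals))
  outer(row_totals, col_totals) / sum(row_totals)
}

# Margins from the design in Part A: 9 deaths, 91 survivors,
# five treatment groups of 20 guinea-pigs
E <- expected_counts(c(died = 9, alive = 91), rep(20, 5))
print(E)      # 1.8 in every 'died' cell, 18.2 in every 'alive' cell
mean(E < 5)   # proportion of cells with expected count below 5: 0.5
```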
A3. (4 marks) Professor Hogwarts notices that the variables maze1 and maze2 are not normally distributed.
Identify suitable transformations for each variable and report whether these have corrected the problem, using
either plots or a statistical test.
Plots indicate strong positive skew (1), so a log transformation (or square-root transformation) (1) is appropriate. Q-Q plots or Shapiro–Wilk tests indicate that the log-transformed variables no longer depart significantly from normality (2).
library(car)
qqPlot(potions$maze1)
hist(potions$maze1)
shapiro.test(potions$maze1)
qqPlot(potions$maze2)
shapiro.test(potions$maze2)
hist(potions$maze2)
potions$log.maze1 <- log10(potions$maze1 + 1)   # +1 offset keeps the log defined for zero scores
qqPlot(potions$log.maze1)
shapiro.test(potions$log.maze1)
potions$log.maze2 <- log10(potions$maze2+1)
qqPlot(potions$log.maze2)
shapiro.test(potions$log.maze2)
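To compare the candidate transformations side by side, the sketch below runs a Shapiro–Wilk test on a variable before and after the log10(x + 1) and square-root transforms. The helper compare_transforms() is hypothetical and the scores are simulated right-skewed values, not the exam data; on the real data one would call compare_transforms(potions$maze1) and compare_transforms(potions$maze2), still pairing the result with a Q-Q plot, since a non-significant test is absence of evidence rather than proof of normality.

```r
# Hypothetical helper: Shapiro-Wilk test on a variable before and after
# two common transformations for positive skew
compare_transforms <- function(x) {
  x <- x[!is.na(x)]
  candidates <- list(
    raw   = x,
    log10 = log10(x + 1),  # +1 keeps the transform defined at zero
    sqrt  = sqrt(x)
  )
  W <- vapply(candidates, function(v) unname(shapiro.test(v)$statistic), numeric(1))
  p <- vapply(candidates, function(v) shapiro.test(v)$p.value, numeric(1))
  data.frame(transform = names(candidates), W = unname(W), p = unname(p))
}

# Simulated positively skewed scores (not the exam data)
set.seed(1)
scores <- round(rlnorm(100, meanlog = 2, sdlog = 0.8))
compare_transforms(scores)
```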
A4. (4 marks) Report Pearson’s correlation coefficient between the maze1 and maze2 scores and test its
significance, using the transformed variables if appropriate. What is the percentage of shared variance?
r = 0.933 (1), t(89) = 24.46, P < 0.001 (2). Shared variance = 100 × 0.933² = 87% (1).
Max. 1 mark if using untransformed data or a non-parametric alternative.
with(potions,cor.test(log.maze1,log.maze2))
100*(0.9330068^2)   # percentage of shared variance = 100 * r^2
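As a cross-check on the reported figures, the t statistic from cor.test() follows directly from r and its degrees of freedom, t = r √df / √(1 − r²) with df = n − 2, and the shared variance is 100 r². The sketch below uses two small hypothetical helpers and takes n = 91 on the assumption that the nine guinea-pigs with missing maze2 scores are dropped.

```r
# Relationship between Pearson's r and the t statistic from cor.test():
# t = r * sqrt(df) / sqrt(1 - r^2), with df = n - 2
t_from_r <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)

# Percentage of shared variance is 100 * r^2
shared_variance <- function(r) 100 * r^2

r <- 0.9330068   # correlation reported above
n <- 91          # 100 guinea-pigs minus the 9 with missing maze2
t_from_r(r, n - 2)          # about 24.46, matching t(89) above
round(shared_variance(r))   # 87
```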
A5. (4 marks) Professor Hogwarts is excited to discover that the 11 guinea-pigs with the lowest maze1 scores
all performed better in their second maze attempt. Give two reasons why this does not provide convincing
evidence that his potions improve maze performance.
Performing the same task a second time may lead to improvement, even if the potions are ineffective (2). And
even with no overall improvement, the lowest scores are statistically likely to increase on the second
measurement because of regression towards the mean (2).
A6. (3 marks) The bumbling professor wants to analyse his data using a repeated-measures ANOVA, to test
whether the repeated measurements of maze performance (maze1, maze2) differ between the potion
treatments. Explain why instead it would be more logical to use an ANCOVA to test whether maze2 differs
between the potion treatments, with maze1 as a covariate.
maze1 is measured before the guinea-pigs receive the potion treatment, so the treatment cannot conceivably have affected it; maze1 is therefore a baseline to control for, not an outcome of the treatment (3) [or something else along these lines].
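One way to see the difference between the two analyses: with only two time points, the potion × time interaction in a repeated-measures ANOVA is equivalent to a one-way ANOVA on change scores, which implicitly fixes the slope of the second score on the first at exactly 1. The ANCOVA instead estimates that slope from the data. The sketch below fits both on simulated data (group labels, means and slope are assumptions, not the exam data).

```r
# Simulated data (not the exam data): five groups, baseline and follow-up
set.seed(7)
d <- data.frame(group = factor(rep(c("A", "B", "C", "D", "E"), each = 20)))
d$pre  <- rnorm(100, mean = 10, sd = 3)
d$post <- 2 + 0.8 * d$pre + rnorm(100, sd = 1)

# Change-score analysis: equivalent to the group x time interaction of a
# two-time-point repeated-measures ANOVA; the slope on pre is fixed at 1
change_fit <- lm(I(post - pre) ~ group, data = d)

# ANCOVA: pre is a covariate whose slope is estimated from the data
ancova_fit <- lm(post ~ pre + group, data = d)

anova(change_fit)
anova(ancova_fit)
coef(ancova_fit)[["pre"]]   # estimated baseline slope
```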
A7. (8 marks) Use an ANCOVA to compare the maze performance (maze2) of the guinea-pigs after their
potion treatment, including maze1 as a covariate to control for pre-existing differences in cognitive ability and
using the transformed versions of these variables if appropriate. For both the fixed factor and the covariate,
report the results of a statistical test and a brief interpretation of the effect.
Using log10(maze2+1) ~ log10(maze1+1) + potion:
Maze performance after the potion treatment was significantly positively associated with pre-treatment maze performance (1) (F(1, 85) = 619.9, P < 0.001) (3), but was not significantly affected by potion treatment (1) (F(4, 85) = 1.8, P = 0.136) (3).
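potions$potion <- factor(potions$potion)
potions$potion <- relevel(potions$potion, "placebo")   # placebo as the reference level
model1 <- lm(log.maze2 ~ log.maze1 + potion, data = potions)
anova(model1)     # sequential (Type I) SS: potion is tested after adjusting for log.maze1
summary(model1)   # coefficient for each potion relative to placebo
par(mfrow = c(2, 2))
plot(model1)      # residual diagnostics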
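ANCOVA also assumes that the covariate slope is the same in every treatment group. A common check, sketched below on simulated data (group names and effect sizes are assumptions, not the exam data), compares the common-slope model with one that adds a covariate × group interaction; a non-significant comparison supports the assumption. On the real data this would be anova(model1, lm(log.maze2 ~ log.maze1 * potion, data = potions)).

```r
# Homogeneity of regression slopes: compare the ANCOVA with a model that
# lets the covariate slope differ between groups
set.seed(3)
d <- data.frame(group = factor(rep(c("placebo", "A", "B", "C", "D"), each = 20)))
d$group <- relevel(d$group, "placebo")
d$x <- rnorm(100)
d$y <- 0.9 * d$x + rnorm(100, sd = 0.4)

common_slope   <- lm(y ~ x + group, data = d)
separate_slope <- lm(y ~ x * group, data = d)

# F test for the four extra slope parameters
slopes_test <- anova(common_slope, separate_slope)
slopes_test
```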
A8. (5 marks) Test whether pre-treatment maze performance is independent of potion type, remembering to
check the assumptions of your analysis. Report the test statistic, degrees of freedom, P value and your
conclusion.
Need to use log-transformed data (1).
Pre-treatment maze performance does not differ significantly between potion groups, so there is no evidence that it depends on potion type (1) (F(4, 95) = 0.677, P = 0.610) (3).
model2 <- aov(log.maze1~potion,data=potions)
summary(model2)
par(mfrow=c(2,2))
plot(model2)
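The diagnostic plots can be supplemented with numerical checks: a Shapiro–Wilk test on the residuals and tests for equal variances across groups. The sketch below uses a hypothetical helper, check_oneway(), with base R's bartlett.test() (sensitive to non-normality) and fligner.test() (more robust); car::leveneTest() is an alternative from the listed packages. The data are simulated, not the exam data; on the real data one would call check_oneway(log.maze1 ~ potion, potions).

```r
# Hypothetical helper: numerical assumption checks for a one-way ANOVA
check_oneway <- function(formula, data) {
  fit <- aov(formula, data = data)
  c(
    shapiro_p  = shapiro.test(residuals(fit))$p.value,         # normality of residuals
    bartlett_p = bartlett.test(formula, data = data)$p.value,  # equal variances
    fligner_p  = fligner.test(formula, data = data)$p.value    # equal variances, robust
  )
}

# Simulated data (not the exam data): five groups of 20 with no true effect
set.seed(11)
d <- data.frame(
  group = factor(rep(c("placebo", "A", "B", "C", "D"), each = 20)),
  score = rnorm(100, mean = 1, sd = 0.2)
)
check_oneway(score ~ group, d)
```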
A9. UPLOAD YOUR R SCRIPT
Part B (40 marks)