EMAT30007 Applied Statistics
General information:
Attempt to answer all questions. This coursework has three parts and four questions. The questions
are released successively as the material is covered in the course.
Submit a Matlab script (.m file) with your answers. The script should run when copied into the same
folder as the data files (see below) and should use only commands and Matlab packages that appear in
the worksheets. Annotate your code clearly and include the required discussion of your findings
directly in the script.
The limit for your submission is 900 lines of standard Matlab script with at most 100 characters per
line, in addition to a restriction on the number of figures/plots for each question (see below). As a
rough guide, the model solution is 600 lines long, including many empty lines.
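The limits above can be checked mechanically before submission. The following is a minimal sketch, not part of the coursework requirements: the file name is a placeholder, and a small stand-in file is written to a temporary location only so that the snippet runs on its own. Point scriptName at your own script instead.

```matlab
% Sketch: check a script against the 900-line / 100-character limits.
% scriptName is a hypothetical placeholder; replace it with your own .m file.
scriptName = [tempname '.m'];

% Write a tiny stand-in script so that this sketch is self-contained.
fid = fopen(scriptName, 'w');
fprintf(fid, '%% example script\nx = 1;\n');
fclose(fid);

% Read the whole file and split it into lines (handles \n and \r\n endings).
txt   = fileread(scriptName);
lines = regexp(txt, '\r?\n', 'split');
if isempty(lines{end})
    lines(end) = [];              % drop the empty piece after a trailing newline
end

nLines = numel(lines);                    % total number of lines
maxLen = max(cellfun(@numel, lines));     % length of the longest line
withinLimits = (nLines <= 900) && (maxLen <= 100);

fprintf('%d lines, longest line %d characters, within limits: %d\n', ...
        nLines, maxLen, withinLimits);

delete(scriptName);               % remove the stand-in file again
```

Note that empty lines and comment lines are counted here; that is an assumption about how the limit is applied.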
There are two additional files available on Blackboard for this piece of coursework:
coursework_data1.txt and coursework_data2.txt. The contents of the files are
described in more detail below.
The only way I will answer questions about the coursework is via the dedicated coursework
discussion forum on Blackboard. This is to ensure that the entire class has access to the same
information.
The coursework is worth 60 marks in total, with 20 marks for each of the three parts.
PART I – (≤300 lines of code)
Question 1 (20 marks):
The file coursework_data1.txt contains a data set of response times in seconds from an
experiment on visual attention. The data points in the file are listed in the order in which they were
collected. When they collected the data, the scientists conducting the experiment were excited
about the data and keen to analyse it. They hypothesised that the mean of their data was larger than
1.5 seconds and tested this at a significance level of 5%. The scientists performed a standard
statistical test on their data after they had collected 10 data points and again after they had
collected a total of 30 data points. Then they stopped the experiment and published their findings
even though they had originally planned to collect 100 data points.
(The data set is hypothetical, although it is inspired by real data.)
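To make the set-up concrete, the sketch below runs a right-tailed test of the mean against 1.5 seconds at the 5% level after 10 and after 30 observations. It rests on several assumptions: the "standard statistical test" is taken to be a one-sample t-test, the functions ttest and gamrnd (Statistics and Machine Learning Toolbox) are assumed to be available, and the data are synthetic stand-ins rather than the contents of coursework_data1.txt.

```matlab
% Sketch only: synthetic stand-in data, NOT the coursework data.
% With the real file one would presumably load the values instead, e.g.
%   rt = load('coursework_data1.txt');   % assumes a plain numeric text file
rng(1);                            % fix the seed so the sketch is reproducible
rt = gamrnd(4, 0.35, 100, 1);      % hypothetical response times in seconds

mu0   = 1.5;                       % mean under the null hypothesis (seconds)
alpha = 0.05;                      % significance level
nTest = [10 30];                   % sample sizes at which the test is run

pvals = zeros(size(nTest));
hvals = zeros(size(nTest));
for k = 1:numel(nTest)
    n = nTest(k);
    % Right-tailed one-sample t-test: H0 mean = mu0 against H1 mean > mu0,
    % using only the first n observations in collection order.
    [hvals(k), pvals(k)] = ttest(rt(1:n), mu0, 'Alpha', alpha, 'Tail', 'right');
    fprintf('n = %3d: p = %.4f, reject H0 = %d\n', n, pvals(k), hvals(k));
end
```

The output h equals 1 when the p-value falls below alpha, so testing repeatedly on a growing sample gives the analyst more than one chance to see h = 1.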
(a) Conduct the two statistical tests the scientists performed. Why did they stop the
experiment?
(b) The lab technician was not told the experiment had been stopped and continued to collect
data. Perform the correct statistical test to examine the scientists’ hypothesis on all data.
How do you interpret the outcome of this test compared to the result the scientists
published?
(c) Use a bootstrap procedure to estimate the probability that the scientists find a significant
p-value when performing a test on the first 30 data points.
(d) By what factor does this probability change if you perform significance tests starting at the
20th data point and then continue to perform significance tests until you either run out of
data or find a significant p-value? Does the probability change again if you start at the 10th
data point?
(e) Explain, referring to your answers to questions 1(a)-(d), why the scientists’ approach is a
form of statistical malpractice (in bullet points, not continuous text).
(f) Fit a Gamma distribution to the data via maximum likelihood estimation. Show qualitatively
that the fit is good and explain why your estimates are further evidence that the scientists’
hypothesis is wrong (1 figure).
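The techniques named in parts (c), (d) and (f) can be sketched as follows. This is one possible reading of the tasks under stated assumptions, not a model solution: the data are synthetic stand-ins, the test is assumed to be a right-tailed one-sample t-test, the bootstrap resamples the full data set with replacement, the number of resamples nBoot is an arbitrary choice, and ttest, gamfit, gampdf and gamrnd are assumed to be available from the Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: synthetic stand-in data, NOT the coursework data.
rng(2);                            % reproducible random numbers
rt    = gamrnd(4, 0.35, 100, 1);   % hypothetical response times in seconds
mu0   = 1.5;                       % mean under the null hypothesis
alpha = 0.05;                      % significance level
N     = numel(rt);
nBoot = 200;                       % number of bootstrap resamples (arbitrary)

sigFixed = false(nBoot, 1);        % significant when tested once, at n = 30
sigSeq   = false(nBoot, 1);        % significant at some n = 20, 21, ..., N
for b = 1:nBoot
    xb = rt(randi(N, N, 1));       % resample N points with replacement

    % Single test on the first 30 points of the resample.
    [~, p30] = ttest(xb(1:30), mu0, 'Tail', 'right');
    sigFixed(b) = p30 < alpha;

    % Repeated testing with optional stopping, starting at the 20th point.
    for n = 20:N
        [~, pn] = ttest(xb(1:n), mu0, 'Tail', 'right');
        if pn < alpha
            sigSeq(b) = true;
            break                  % stop as soon as a test is significant
        end
    end
end
pFixed = mean(sigFixed);           % estimated probability, single test
pSeq   = mean(sigSeq);             % estimated probability, repeated tests
fprintf('single test: %.3f, repeated tests: %.3f\n', pFixed, pSeq);

% Maximum likelihood fit of a Gamma distribution.
phat    = gamfit(rt);              % phat(1) = shape a, phat(2) = scale b
fitMean = phat(1) * phat(2);       % mean of a Gamma(a, b) distribution is a*b

% Qualitative check of the fit: normalised histogram against the fitted pdf.
figure;
histogram(rt, 'Normalization', 'pdf');
hold on;
xs = linspace(0, max(rt), 200);
plot(xs, gampdf(xs, phat(1), phat(2)), 'LineWidth', 1.5);
hold off;
xlabel('response time (s)');
ylabel('probability density');
legend('data', 'fitted Gamma pdf');
```

Because the repeated-testing loop includes n = 30, every resample that is significant in the single test is also counted as significant under repeated testing, so pSeq can never be smaller than pFixed in this sketch. Starting the inner loop at 10 instead of 20 adapts it to the second half of part (d).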