Introduction to Statistical Modelling
ST117 Introduction to Statistical Modelling
Please read the guidance on the expected answer format on Moodle in Log-4 week
1. Create the cohort
(a) Simulate 288 synthetic students with realistic-sounding names (e.g. using the randomNames R
package) and assign them to lab groups of 18 students each. Within each lab group, randomly
assign them to Homework Pods of 3, denoted by A, B, ..., F. Sampling from the whole cohort,
independently of lab groups, randomly assign the students to Report Pods of 4 students each.
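The allocation in (a) can be sketched as follows. This is a minimal illustration in Python rather than R, under stated assumptions: the names are placeholders of the form Student001 instead of realistic-sounding names, the function name make_cohort and the seed are hypothetical, and random shuffling followed by slicing is just one of several ways to obtain a uniformly random allocation.

```python
import random

N_STUDENTS, LAB_SIZE, HW_POD_SIZE, REPORT_POD_SIZE = 288, 18, 3, 4
HW_LABELS = "ABCDEF"  # 18 / 3 = 6 Homework Pods per lab group


def make_cohort(rng):
    """Return one record per student with lab, Homework Pod and Report Pod."""
    # Placeholder names (assumption); swap in a name generator if desired.
    names = [f"Student{i:03d}" for i in range(1, N_STUDENTS + 1)]
    rng.shuffle(names)  # random allocation to lab groups
    students = []
    for lab in range(N_STUDENTS // LAB_SIZE):
        members = names[lab * LAB_SIZE:(lab + 1) * LAB_SIZE]
        rng.shuffle(members)  # random Homework Pods within the lab group
        for pos, name in enumerate(members):
            students.append({"name": name, "lab": lab + 1,
                             "hw_pod": HW_LABELS[pos // HW_POD_SIZE]})
    # Report Pods: shuffle the whole cohort, ignoring labs, and cut into fours.
    order = list(range(N_STUDENTS))
    rng.shuffle(order)
    for pos, idx in enumerate(order):
        students[idx]["report_pod"] = pos // REPORT_POD_SIZE + 1
    return students


cohort = make_cohort(random.Random(117))  # seed chosen arbitrarily
```

With these sizes the sketch yields 16 lab groups, 6 Homework Pods per lab group and 72 Report Pods.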
(b) Calculate the observed relative frequencies p_k of Report Pods containing exactly k ∈ {0, 1, 2, 3, 4}
students who were in a Homework Pod labelled A (in any of the labs). Define q_k in a similar
way to p_k, but with the additional constraint that no two students in those Report Pods come
from the same lab group. Plot tables of both relative frequency distributions.
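One possible way to compute the two frequency distributions in (b) is sketched below in Python. It rests on assumptions that are not part of the task statement: students are represented only by a (lab, Homework Pod label) pair, since names do not affect the counts; q_k is read as the relative frequency among those Report Pods whose four students all come from different lab groups; and the helper names simulate_report_pods and rel_freq are hypothetical.

```python
import random
from collections import Counter


def simulate_report_pods(rng):
    """One realisation: 72 Report Pods, each a list of (lab, hw_label) pairs."""
    students = []
    for lab in range(16):                  # 16 lab groups of 18 students
        labels = list("ABCDEF" * 3)        # three students per Homework Pod
        rng.shuffle(labels)
        students += [(lab, label) for label in labels]
    rng.shuffle(students)                  # Report Pods ignore lab groups
    return [students[i:i + 4] for i in range(0, len(students), 4)]


def rel_freq(pods):
    """Relative frequency of pods with k = 0..4 students from a Homework Pod A."""
    counts = Counter(sum(label == "A" for _, label in pod) for pod in pods)
    return [counts[k] / len(pods) for k in range(5)]


pods = simulate_report_pods(random.Random(117))  # seed chosen arbitrarily
p = rel_freq(pods)

# Assumed reading of q_k: keep only pods with four distinct lab groups.
distinct_lab_pods = [pod for pod in pods if len({lab for lab, _ in pod}) == 4]
q = rel_freq(distinct_lab_pods)

for k in range(5):
    print(f"k = {k}: p_k = {p[k]:.4f}, q_k = {q[k]:.4f}")
```

If q_k is instead meant as a frequency relative to all 72 Report Pods, the denominator in the last call would need to change accordingly.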
(c) Run 100 realisations of (a) to obtain, for each k ∈ {0, 1, 2, 3, 4}, 100 samples p_k(i) and
q_k(i) (i = 1, 2, ..., 100) of the relative frequencies p_k and q_k, respectively.
Generate a series of 5 boxplots of p_k (k = 0, 1, 2, 3, 4), each of them visualising the distribution
of relative frequencies based on the 100 simulated values p_k(i) (i = 1, 2, ..., 100). For each of the five
simulated distributions, calculate the mean and SD and display them all in a table. Describe the
shape of the distributions. Carry out the same for q_k (k = 0, 1, 2, 3, 4).
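The repeated simulation in (c) might be organised as in the following Python sketch. It is an illustration under assumptions: students are reduced to (lab, Homework Pod label) pairs, q_k is taken relative to the Report Pods with four distinct lab groups, the SD is the sample standard deviation (divisor n - 1), and the function name one_run is hypothetical. Boxplots are left out because they need a plotting library.

```python
import random
import statistics
from collections import Counter


def one_run(rng):
    """Return (p, q) for one realisation; each is a list indexed by k = 0..4."""
    students = []
    for lab in range(16):
        labels = list("ABCDEF" * 3)
        rng.shuffle(labels)
        students += [(lab, label) for label in labels]
    rng.shuffle(students)
    pods = [students[i:i + 4] for i in range(0, 288, 4)]
    # Assumed reading of q_k: only pods with four distinct lab groups.
    distinct = [pod for pod in pods if len({lab for lab, _ in pod}) == 4]

    def rel_freq(subset):
        counts = Counter(sum(label == "A" for _, label in pod) for pod in subset)
        return [counts[k] / len(subset) for k in range(5)]

    return rel_freq(pods), rel_freq(distinct)


rng = random.Random(117)  # seed chosen arbitrarily
runs = [one_run(rng) for _ in range(100)]

summary = {}
for name, idx in (("p", 0), ("q", 1)):
    for k in range(5):
        sample = [run[idx][k] for run in runs]
        # statistics.stdev is the sample SD with divisor n - 1.
        summary[(name, k)] = (statistics.mean(sample), statistics.stdev(sample))

for (name, k), (mean, sd) in summary.items():
    print(f"{name}_{k}: mean = {mean:.4f}, SD = {sd:.4f}")
```

The list `[run[0][k] for run in runs]` is the sample one would pass to a boxplot routine for p_k, and likewise with index 1 for q_k.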
2. Pod combinatorics (theoretical)
You only receive credit if you prove your statements step by step, with explanations that are as
concise as possible while including all necessary reasoning.
(a) What is the probability mass function of the number of students in a randomly selected Report
Pod who were in a Homework Pod labelled A?
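A numerical cross-check for a candidate answer can be set up as below. The sketch assumes the cohort of question 1 (288 students, of whom 16 × 3 = 48 were in a Homework Pod labelled A) and assumes that a randomly selected Report Pod behaves like a uniformly random subset of 4 students, which would make the count hypergeometric. Both assumptions still have to be justified step by step in a written answer; the code only evaluates the resulting formula exactly.

```python
from fractions import Fraction
from math import comb

# Assumed parameters: cohort size, students from a Homework Pod A, Report Pod size.
N, K, n = 288, 48, 4

# Hypergeometric pmf: P(X = k) = C(K, k) * C(N - K, n - k) / C(N, n).
pmf = {k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
       for k in range(n + 1)}

for k, prob in pmf.items():
    print(f"P(X = {k}) = {float(prob):.6f}")
```

Using exact fractions makes it easy to confirm that the probabilities sum to 1 and that the mean equals n × K / N = 2/3.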