STAB57: Assignment
Late penalty: 10% per day.
Instructions on creating documents for submission
• We will use Crowdmark for submission and grading, which accepts only PDF, JPG and
PNG files.
• I recommend using R Markdown (if you are familiar with it). If you are not familiar
with R Markdown, you can write your answers in Microsoft Word and then save them
as PDFs. Fully handwritten answers will not be accepted.
• The numerical calculations involved in this assignment are simple and you are already
familiar with them (hopefully). Calculations are mostly repetitive in nature!
• If you are a Python user, feel free to use Python in place of R to answer any of the
questions. You can also use Microsoft Excel if you want.
• For each answer, make sure you provide your code and outputs. If you use Excel,
take screenshots of the worksheet showing the formulas used and the outputs, and submit them
as part of your answer (as an appendix, for example).
• Make sure your answers are easy to read and nicely presented.
Academic Integrity
Each student will work alone. You are not allowed to ask anyone for help on any platform.
Don't ask anyone for solutions. Do not share your code or answers. If you need
clarification on any of these questions, you may ask on Ed or during office hours
(please do not email us). And please do not post your solution on Ed and ask
"does it look ok?".
When submitting your assignment on Crowdmark, there will be a space for an academic
integrity statement. Write the following statement on paper/iPad/Surface and upload a
screenshot of it.
Statement:
I am attesting to the fact that I, [name] (write your full name here), [stnum] (write your
student number here), have abided fully by the Code of Behaviour on Academic Matters.
I have not committed academic misconduct, and am aware of the penalties that may be
imposed if I have committed an academic offence.
Question 1 (7 points)
Suppose you have a population of size 7 [i.e. N=7]. You measure some quantity (X) and the
corresponding numbers are:
11, 13, 15, 17, 19, 21, 23
a) Calculate the population mean ($\mu$) and print/show the value.
b) Calculate the population variance ($\sigma^2$) using the formula
$\sigma^2 = \frac{1}{N}\sum_{j=1}^{N}(X_j - \mu)^2$ and print/show the value.
c) Imagine you are taking samples (of size n = 3) from this population with replacement.
Imagine every possible way that you could obtain a sample of size 3 with replacement
from this population. (Hint: there will be 7 × 7 × 7 = 343 possible samples.)
R code to generate all possible samples:
X=c(11, 13, 15, 17, 19, 21, 23)
d=expand.grid(X,X,X) # one row per possible sample; continue your calculations using "d"
# For Excel users, the following line creates a CSV file for you.
write.csv(d,file="Question1.csv",row.names = FALSE)
d) For each of these samples of size 3, calculate the sample mean and record it (either
as a new object in R or as a new column if you are using Excel; no need to print all
the values). Let's call this new column X_bar. You should have 343 values in this
column.
e) You should have noticed that the values in the X_bar column repeat. Construct
a frequency table based on the column X_bar (i.e. write down which values showed up
and how many times). Then, using the frequencies (also known as counts), calculate the
proportion of each of those repeated values.
f) Plot these proportions against the values and connect the points to form a curve. Does
the shape of this plot look like any known distribution? Name the distribution.
g) Using the table of proportions (from part (e)) or otherwise, calculate the mean of these
343 numbers (the values under X_bar) and compare it to your answer to 1(a).
h) Using the table of proportions (from part (e)) or otherwise, calculate the variance of these
343 numbers. Use the population variance formula (i.e. divide by 343, not 342). What
is the relationship of this answer to your answer to 1(b)?
i) Which theorem did you demonstrate empirically in parts (f), (g) and (h)?
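The workflow in parts (a), (b) and (d) to (h) can be sketched in R as below. This is a minimal sketch on a hypothetical population `Y = c(2, 4, 6, 8)` with samples of size n = 2 (so 4 × 4 = 16 possible samples), not on the Question 1 data; the names `Y`, `d_small`, `Y_bar`, `props`, `values` and `p` are illustrative only. Note that R's built-in `var()` divides by N - 1, so the divide-by-N formula is written out by hand.

```r
# Hypothetical population of size 4 (NOT the Question 1 data); samples of size n = 2
Y <- c(2, 4, 6, 8)
N <- length(Y)

# Population mean and variance (divide by N; R's var() would divide by N - 1)
mu_Y     <- sum(Y) / N
sigma2_Y <- sum((Y - mu_Y)^2) / N

# All 4 * 4 = 16 samples with replacement, one per row, and the mean of each row
d_small <- expand.grid(Y, Y)
Y_bar   <- rowMeans(d_small)

# Frequency table (counts) and proportions of the repeated sample means
counts <- table(Y_bar)
props  <- prop.table(counts)          # counts divided by the 16 samples
values <- as.numeric(names(props))    # the distinct sample-mean values
p      <- as.numeric(props)

# Proportions plotted against the values, with the points joined by lines
plot(values, p, type = "b", xlab = "sample mean", ylab = "proportion")

# Mean and variance of the 16 sample means, weighted by the proportions
# (equivalent to dividing by 16 rather than 15)
mean_Y_bar <- sum(values * p)
var_Y_bar  <- sum((values - mean_Y_bar)^2 * p)

print(c(mu = mu_Y, sigma2 = sigma2_Y, mean_Y_bar = mean_Y_bar, var_Y_bar = var_Y_bar))
```

For this toy population the script gives μ = 5, σ² = 5, a mean of 5 for the sample means and a variance of 2.5 for the sample means.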
Question 2 (4 points)
This question continues from Question 1(c). For each of these samples of size 3, calculate the
sample variance using the following two formulas:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
and
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
Assume the population variance is $\sigma^2 = 16$. (You should get 343 values of $S^2$ and
343 values of $\hat{\sigma}^2$.)
a. By calculating Bias[$S^2$] and Bias[$\hat{\sigma}^2$] numerically (using the 343 values of each),
check whether these two estimators are unbiased.
b. By calculating all three components separately (make sure to print them), show that the
following identity holds:
$$\mathrm{MSE}[\hat{\sigma}^2] = \mathrm{Var}[\hat{\sigma}^2] + \left(\mathrm{Bias}[\hat{\sigma}^2]\right)^2$$
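A sketch of the bias and MSE calculations, again on the hypothetical population `Y = c(2, 4, 6, 8)` (population variance 5, samples of size n = 2) rather than the assignment's data. `apply(d_small, 1, var)` uses R's `var()`, which divides by n - 1 and so gives S² for each row; multiplying by (n - 1)/n converts it to σ̂². Averages over all 16 possible samples play the role of expectations, so the variance of the estimates also divides by 16.

```r
# Hypothetical population (NOT the Question 1 data); its variance, dividing by N, is 5
Y <- c(2, 4, 6, 8)
sigma2 <- 5
d_small <- expand.grid(Y, Y)   # all 16 samples of size n = 2, one per row
n <- ncol(d_small)

# S^2 divides by n - 1 (R's var()); sigma-hat^2 divides by n
S2 <- apply(d_small, 1, var)
sigma2_hat <- S2 * (n - 1) / n

# Bias: average of the estimates over all samples minus the true value
bias_S2  <- mean(S2) - sigma2
bias_hat <- mean(sigma2_hat) - sigma2

# Variance of the estimates (divide by 16) and MSE around the true value
var_hat <- mean((sigma2_hat - mean(sigma2_hat))^2)
mse_hat <- mean((sigma2_hat - sigma2)^2)

print(c(bias_S2 = bias_S2, bias_hat = bias_hat, var_hat = var_hat,
        mse_hat = mse_hat, var_plus_bias2 = var_hat + bias_hat^2))
```

For this toy case the script gives Bias[S²] = 0, Bias[σ̂²] = -2.5, Var[σ̂²] = 8.25 and MSE[σ̂²] = 14.5, and the last two printed values agree.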
Question 3 (4 points)
In lecture, we demonstrated R code that simulates the sampling distribution of $\bar{X}$. Here
is the code that was used in the lecture.
sample_4m_normal=function(){
s=rnorm(30,mean=10,sd=2)
return(mean(s))
}
X_bar=replicate(10000,sample_4m_normal())
plot(density(X_bar))
Simply change the distribution and the sample size (30 in the code above) to answer this question.
Produce the density of $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
a) when n = 2, $X \sim \mathrm{Unif}[0, 1]$
b) when n = 5, $X \sim \mathrm{Unif}[0, 1]$
c) when n = 5, $X \sim \chi^2_{df=2}$
d) when n = 30, $X \sim \chi^2_{df=2}$
e) when n = 5, $X \sim \chi^2_{df=50}$
f) The CLT says that for large n, $\bar{X}$ converges (in distribution) to a Normal distribution. By
comparing your graphs from parts (a) to (e), can you comment on how large n has to
be (this question is not asking for a fixed value of n) in order for $\bar{X}$ to converge to a
Normal distribution? What role does the skewness of the original distributions
($\mathrm{Unif}[0, 1]$, $\chi^2_{df=2}$ and $\chi^2_{df=50}$) play here?
Note: For parts (a)-(e), search online for the R commands that generate random numbers from the
Uniform and Chi-square distributions.
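One way to avoid editing the function body for every part is to pass the generator and the sample size as arguments. The sketch below is a hypothetical generalization of the lecture's `sample_4m_normal()` (the name `sample_mean` is illustrative), demonstrated with `rexp()` (Exponential with rate 1), which is not one of the distributions asked for in parts (a)-(e). Any extra arguments after `generator` are forwarded to it through `...`.

```r
# Hypothetical generalization of sample_4m_normal(): the random-number
# generator and the sample size n are arguments instead of being hard-coded
sample_mean <- function(n, generator, ...) {
  s <- generator(n, ...)   # draw n values, forwarding extra arguments (e.g. rate)
  mean(s)
}

set.seed(1)  # makes the simulation reproducible
# Exponential(rate = 1) is used only as an illustration
X_bar_exp <- replicate(10000, sample_mean(n = 10, generator = rexp, rate = 1))
plot(density(X_bar_exp), main = "Density of sample means, Exp(1), n = 10")
```

`set.seed()` only makes the run repeatable; it does not change the shape of the estimated density.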