STAT 2011 Probability and Estimation Theory
Probability and Estimation Theory
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
STAT 2011 Probability and Estimation Theory
Computer Practical Sheet Week 4
If you want the same (pseudo-)random numbers generated each time, place a command like
set.seed(36784)
in your first chunk so that the (pseudo-)random number generator always starts from the same
“point”.
Consider taking random samples with replacement of size n from the population
S = {1, 2, 3, 4, 5, 6}.
Before you start the actual computer exercise it will help if you do some quick calculations,
namely if X denotes a single random draw from S, determine
µX = E(X) and σ
2
X = Var(X) = E(X
2)− [E(X)]2.
Furthermore note that the expected value µY = E(Y ) and variance σ
2
Y = E(Y
2) − [E(Y )]2 of
the total Y =
∑n
i=1Xi (as functions of µ, σ
2 and n) where X1, X2, . . . , Xn denote a random
sample of size n taken from S with replacement are given by µY = nµX and σ
2
Y = nσ
2
X .
We shall be looking at ways to evaluate and/or approximate probabilities of the form P (Y ≥ y)
for various values of n and t, in particular
P (Y ≥ 15) when n = 3 and
P (Y ≥ 45) when n = 10.
We shall be making use of the rep() command in R, which repeats numbers in various ways.
There are two main forms of usage:
rep(vec,int), where vec is a vector and int is a positive integer, simply creates a new
vector where vec is repeated int times, e.g.
> x = 1:6
> x
[1] 1 2 3 4 5 6
> rep(x,3)
[1] 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
rep(vec1,vec2), where vec1 is an arbitrary vector and vec2 is a vector of non-negative
integers the same length as vec1 ; this creates a new vector in which vec1[i] (the i-th
element of vec1) is repeated vec2[i] times, e.g.
> x = 1:6
> x
[1] 1 2 3 4 5 6
> rep(x,c(1,2,3,1,2,3))
[1] 1 2 2 3 3 3 4 5 5 6 6 6
1
These two forms can be combined together in rather interesting ways:
> x = 1:6
> rep(rep(x,rep(2,6)),3)
[1] 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6
The above is the same as the following sequence of steps:
> a = rep(2,6)
> a
[1] 2 2 2 2 2 2
> b = rep(x,a)
> b
[1] 1 1 2 2 3 3 4 4 5 5 6 6
> rep(b,3)
[1] 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6
Computer Problems for Week 4
For this week’s lab report: Please submit your code, output, and any comment
required, for Q7 and Q8 (a), (b), and (c) ONLY. An assignment item has been set
up on Canvas; file upload format is limited to pdf and html.
1. Define x=1:6. By applying rep() to x, create a vector X3 which simply repeats x 36
times.
2. Now create another vector X1 of length 216 with 36 1’s, 36 2’s,. . . ,36 6’s (hint: use a
command of the form rep(...,rep(...))).
3. Finally create a vector X2 of length 216 which takes a vector consisting of 6 1’s, 6 2’s,. . . ,6
6’s and then repeats it 6 times (hint: rep(rep(...,rep(...)),...)).
4. The matrix outcomes=cbind(X1,X2,X3) gives a representation of all outcomes in the
sample space when sampling 3 times from S with replacement. Print the first 13 lines
of it using outcomes[1:13,]. Obtain a vector sums of row sums of this matrix (see last
week’s exercise if you forget how to do this).
5. Using the vector sums, determine (for the case n = 3)
(a) µY = E(Y ),
(b) σ2Y = E(Y
2)− [E(Y )]2 and
(c) P (Y ≥ 15)
(note sum(vec>=x) counts the number of elements of the vector vec that are greater than
or equal to x; also mean(vec) is the same as sum(vec)/length(vec)).
6. A nice way to visualise the probability distribution of Y is to plot an ordinate diagram
(as opposed to a histogram, since the random variable is discrete):
‘‘‘{r Q6}
propns=table(sums)/length(sums)
plot(propns,type="h")
‘‘‘
2
7. We now turn our attention to the case n = 10. In this case there are 610 > 60 million
different possible outcomes, and so constructing an exhaustive matrix of outcomes is a
daunting exercise. However we can use Monte Carlo methods: we can simulate a large
number y1, . . . , yB of realisations of this random variable Y , and then for any function g(·)
we can approximate E[g(Y )] simply by determining the long-run average 1B
∑B
j=1 g(yj).
Set n=10, B=10,000 and popn=1:6. Define a vector of n*B simulated random draws with
replacement from popn using
draws=sample(size=n*B,x=popn,replace=TRUE)
and form the vector draws into a B-by-n matrix called samples (see previous computer
labs if you forgot how to do this). Each row then represents a sample of size n with
replacement from popn.
8. Obtain a vector sim.sums of (simulated) sample sums. Use this to compute Monte Carlo
approximations to
(a) µY = E(Y ),
(b) σ2Y = E(Y
2)− [E(Y )]2 and
(c) P (Y ≥ 45)
Compare the approximations, using the R functions mean and var to calculate the mean
(sample expectation) and sample variance, respectively, to the theoretical values computed
earlier and comment. (You can state the theoretical value of the expectation without
showing derivation.)
9. Produce an ordinate diagram of the relative frequencies/proportions of each value in
sim.sums.