MAST20005/MAST90058 Assignment
1. Robert inherits an ancient coin from his granddad, which is said to be the fairest coin ever minted in
history; he is determined to test whether the coin is indeed fair, using the number of heads Y that turn
up in 36 independent tosses. Formally, if p is the probability of the coin turning up a head, he tests
the hypothesis
H0 : p = 0.5 vs H1 : p ≠ 0.5.
His test is such that he would reject H0 whenever |Y − 18| ≥ 4.
(a) (R) Find the size of the test. You should first explicitly identify the particular distribution involved
for its computation, and then show how you can find it using appropriate function(s) in R. [2]
(b) (R) Plot the power function of the test over the range p ∈ [0.5, 1]. Label your y-axis as “power in p”
and your x-axis as “p”. [1]
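For reference, binomial tail probabilities of this kind can be evaluated in R with `pbinom`; the sketch below (one possible approach, not necessarily the intended write-up) encodes the rejection region |Y − 18| ≥ 4, i.e. Y ≤ 14 or Y ≥ 22, with Y ∼ Binomial(36, p):

```r
# Probability of rejecting H0 when the true head probability is p:
# reject iff Y <= 14 or Y >= 22, where Y ~ Binomial(36, p).
reject_prob <- function(p) {
  pbinom(14, size = 36, prob = p) + (1 - pbinom(21, size = 36, prob = p))
}

reject_prob(0.5)  # size of the test: the power evaluated at p = 0.5

# Power curve for part (b); run interactively to see the plot:
# curve(reject_prob(x), from = 0.5, to = 1, xlab = "p", ylab = "power in p")
```

Evaluating `reject_prob` over a grid of p values and plotting gives the power function requested in (b).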
2. Let θ̂ be a statistic that estimates a parameter θ. Assume that se(θ̂) is a standard error of θ̂ and that
(θ̂ − θ)/se(θ̂) is approximately distributed as N(0, 1) for any θ ∈ R. Let α ∈ (0, 1).
(a) Give an approximate two-sided (1− α)-confidence interval for θ. [1]
(b) Give a test with approximate size α for the null hypothesis H0 : θ = θ0, for a given value θ0 ∈ R.
You have to specify the rejection region and the test statistic used. [1]
(c) Show that H0 is rejected by your test in (b) if and only if θ0 falls outside of your confidence interval
in (a). [1]
(Moral: any value in a confidence interval can be considered an “acceptable” value for the parameter
θ, and hence there are generally many acceptable values for θ. This gives another reason why modern
hypothesis testing abstains from “accepting a null hypothesis”.)
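For orientation, with the standard Wald-type construction (a textbook sketch, writing z_{α/2} := Φ^{−1}(1 − α/2); your own answer may be phrased differently), the duality in (c) amounts to a short chain of equivalences:

```latex
\left|\frac{\hat\theta - \theta_0}{\operatorname{se}(\hat\theta)}\right| > z_{\alpha/2}
\;\Longleftrightarrow\;
\theta_0 < \hat\theta - z_{\alpha/2}\operatorname{se}(\hat\theta)
\ \text{ or }\
\theta_0 > \hat\theta + z_{\alpha/2}\operatorname{se}(\hat\theta)
\;\Longleftrightarrow\;
\theta_0 \notin \left[\hat\theta - z_{\alpha/2}\operatorname{se}(\hat\theta),\;
\hat\theta + z_{\alpha/2}\operatorname{se}(\hat\theta)\right].
```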
3. Let X1, . . . , Xn be a random sample of X with a population density fθ(x) parametrized by θ ∈ R. Recall
that the Fisher information is defined as
I_n(θ) = − Σ_{i=1}^{n} E_θ[ ∂²/∂θ² ln f_θ(X_i) ] = − n E_θ[ ∂²/∂θ² ln f_θ(X) ],
where the expectation is taken with respect to the distribution under the parameter value θ.
(a) Suppose g : R → R is a smooth invertible function, and by defining φ = g(θ), let fφ(x) be the
population density of X1, . . . , Xn re-parametrized in φ. Show that the Fisher information with
respect to φ has the form
I_n(φ) = I_n(g(θ)) = I_n(θ) / (g′(θ))²,
where g′(·) is the derivative of the function g(·).
Hint: You can use the facts that (i) E_θ[ (d/dθ) ln f_θ(X_i) ] = 0 and (ii) if h(φ) = g^{−1}(φ) and h′(·) is the
derivative of h(·), then h′(φ) = (g′(g^{−1}(φ)))^{−1} = (g′(θ))^{−1}. [3]
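The first step of the calculation in (a) is the chain rule applied to the re-parametrized log-density (a sketch only, with θ = h(φ) throughout):

```latex
\frac{\partial}{\partial \phi} \ln f_\phi(x)
  = h'(\phi)\,\frac{\partial}{\partial \theta} \ln f_\theta(x)
\quad\Longrightarrow\quad
\frac{\partial^2}{\partial \phi^2} \ln f_\phi(x)
  = h''(\phi)\,\frac{\partial}{\partial \theta} \ln f_\theta(x)
  + \bigl(h'(\phi)\bigr)^2 \frac{\partial^2}{\partial \theta^2} \ln f_\theta(x).
```

Taking expectations, the first term on the right vanishes by fact (i), and fact (ii) converts h′(φ) into (g′(θ))^{−1}.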
(b) Let Y1, . . . , Yn denote a random sample of size n from a Poisson distribution with mean λ. For large
n, find an approximate one-sided 100(1 − α)% confidence interval for g(λ) = e^{−λ} = P(Y = 0) that
is an upper bound, based on normal quantiles.
Hint: Use the fact that the MLE for λ is Ȳ = n^{−1} Σ_{i=1}^{n} Y_i, that I_n(λ) = n/λ, and part (a). [3]
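As a numerical sanity check of the machinery in the hint (a simulation sketch with illustrative values n = 200 and λ = 2, which are not part of the question), the delta-method standard deviation of e^{−Ȳ} implied by part (a) can be compared against simulation:

```r
# Compare the simulated sd of exp(-Ybar) with the delta-method value
# |g'(lambda)| * sqrt(lambda / n) = exp(-lambda) * sqrt(lambda / n).
set.seed(1)
n <- 200; lambda <- 2; reps <- 20000
ybar <- replicate(reps, mean(rpois(n, lambda)))
empirical_sd   <- sd(exp(-ybar))
theoretical_sd <- exp(-lambda) * sqrt(lambda / n)
c(empirical_sd, theoretical_sd)  # the two values should nearly agree
```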
MAST20005/90058 Assignment 2 Semester 2, 2023
4. Let Y1, . . . , Ym be m random variables, whose means and covariances are denoted by
µi = E[Yi] for 1 ≤ i ≤ m and σij = E[YiYj ]− E[Yi]E[Yj ] for 1 ≤ i, j ≤ m;
in particular, σii = var(Yi). Moreover, define the mean vector and covariance matrix,
µ = (µ1, . . . , µm)^T and Σ = (σ_{ij})_{1 ≤ i, j ≤ m},
for the random vector Y = (Y1, . . . , Ym)^T. The variables Y1, . . . , Ym are said to have a joint normal
distribution if the random vector Y has a probability density function of the form
f(y) = 1 / ( (2π)^{m/2} √det(Σ) ) · exp( −(y − µ)^T Σ^{−1} (y − µ) / 2 ) for y ∈ R^m.
This is denoted as Y ∼ Nm(µ,Σ) and generalizes the bivariate normal distribution learnt in MAST20004;
in particular, any component Yi of Y is distributed as N(µi, σii). Moreover, if Y ∼ Nm(µ,Σ), the following
properties are well known (so you can take them for granted):
• For any k-by-m constant matrix B ∈ Rk×m, BY ∼ Nk(Bµ,BΣBT ).
• Y1, . . . , Ym are mutually independent if and only if all the off-diagonal entries of Σ are zero.
The following parts assume X1, . . . , Xn is a random sample from a (univariate) normal distribution with
mean µ and variance σ2. In other words, X ∼ Nn(µ1n, σ2In), where X = (X1, . . . , Xn)T , 1n is a column
vector of length n filled with 1’s and In is an n-by-n identity matrix. Let A be an n-by-n matrix defined
by
A =
[ 1/√n          1/√n          1/√n         · · ·   1/√n          1/√n              ]
[ 1/√2         −1/√2          0            · · ·   0             0                 ]
[ 1/√(2·3)      1/√(2·3)     −2/√(2·3)     · · ·   0             0                 ]
[   ⋮             ⋮             ⋮                    ⋮             ⋮               ]
[ 1/√((n−1)n)   1/√((n−1)n)   · · ·        · · ·   1/√((n−1)n)  −(n−1)/√((n−1)n)  ] ;
in particular, if we use a1, . . . , an to respectively denote the column vectors that are the 1st to n-th rows of
A, so that A = [a1| · · · |an]^T, we have a1 = (1/√n, . . . , 1/√n)^T, and for 2 ≤ k ≤ n,

a_k = ( 1/√((k−1)k), · · · , 1/√((k−1)k), −(k−1)/√((k−1)k), 0, · · · , 0 )^T,

where the first k entries are nonzero (the k-th entry being −(k−1)/√((k−1)k)) and the remaining n − k
entries are zero.
It is easy to see that

AX = ( X̄√n, U1, U2, · · · , U_{n−1} )^T    (1)

where U1, U2, . . . , U_{n−1} are linear combinations of X1, . . . , Xn with coefficients from the rows of A, and
X̄ = n^{−1} Σ_{i=1}^{n} Xi.
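For intuition (and as a quick numerical companion to part (a)), the matrix A can be built explicitly in R; the function name `helmert` below is our own label for this construction, not something from the question:

```r
# Build the n-by-n matrix A defined above: the first row is constant 1/sqrt(n);
# the k-th row (k >= 2) has k-1 entries equal to 1/sqrt((k-1)k), then
# -(k-1)/sqrt((k-1)k) in position k, then zeros.
helmert <- function(n) {
  A <- matrix(0, n, n)
  A[1, ] <- 1 / sqrt(n)
  for (k in 2:n) {
    A[k, 1:(k - 1)] <- 1 / sqrt((k - 1) * k)
    A[k, k] <- -(k - 1) / sqrt((k - 1) * k)
  }
  A
}

A <- helmert(5)
max(abs(A %*% t(A) - diag(5)))  # numerically zero, consistent with A A^T = I_n
```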
(a) Check that AAT = In. (Show your steps) [3]
(b) Using (a), conclude that the components in the vector AX are mutually independent normal random
variables with variance σ2. [1]
(c) Using (a), conclude that Σ_{i=1}^{n−1} U_i² = Σ_{i=1}^{n} (Xi − X̄)².
Hint: Note Σ_{i=1}^{n} (Xi − X̄)² = Σ_{i=1}^{n} Xi² − nX̄², and that AA^T = In also implies A^T A = In. [1]
(d) Using (b) and (c), conclude the independence between X̄ and S² = (n − 1)^{−1} Σ_{i=1}^{n} (Xi − X̄)². [1]
(e) Using (b) and (c), conclude that

Σ_{i=1}^{n} (Xi − X̄)² / σ² = (n − 1)S² / σ²

is chi-square distributed with (n − 1) degrees of freedom. (This is an alternative way of finding the
distribution of (n − 1)S²/σ², which you have done using mgfs in Tutorial Week 3, Question 11.) [1]
5. A random number generator (RNG) can generate numbers according to a normal distribution N(µ, 1),
and it purports that the mean µ of its generated numbers is exactly zero. Robert suspects this claim,
and considers the hypotheses

H0 : µ = 0 vs H1 : µ ≠ 0.

Robert decides to reject H0 if |X| ≥ Φ^{−1}(0.975), where Φ(·) denotes the standard normal cumulative
distribution function and Φ^{−1}(·) denotes its inverse, using a single number X requested from the RNG;
he expects this test to have a size of 0.05, given what he has learnt in MAST20005.
It turns out that the RNG is glitchy, but not in the way Robert has suspected: while it can indeed generate
numbers according to the N(0, 1) distribution, a number won't be reported to Robert until the RNG produces
its first number with absolute value larger than 0.4. Precisely, the RNG will respond to Robert's request
and generate a first number, called Y1, which is distributed according to N(0, 1). If |Y1| > 0.4, the
RNG will report Y1 to Robert; if |Y1| ≤ 0.4, the RNG will automatically generate another independent
number Y2 ∼ N(0, 1) and only report Y2 to Robert if |Y2| > 0.4. If not (i.e. |Y2| ≤ 0.4), this process
repeats itself until the RNG produces its first number with absolute value larger than 0.4. Since Robert
is oblivious to this internal mechanism of the RNG, he will take as X the first such Yi with |Yi| > 0.4.
Mathematically, letting (Y_i)_{i=1}^{∞} be a hypothetical sequence of independent N(0, 1) random numbers
being generated, X = Y_t, where t ≡ min{ i : |Y_i| > 0.4 }.
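The reporting mechanism above can be simulated directly (a sketch; the question asks for 10^6 instances, while 10^4 is used here to keep the example fast):

```r
# One reported number X from the glitchy RNG: keep drawing N(0,1)
# values until the first one with absolute value larger than 0.4.
glitchy_rng <- function() {
  repeat {
    y <- rnorm(1)
    if (abs(y) > 0.4) return(y)
  }
}

set.seed(42)
x <- replicate(1e4, glitchy_rng())
mean(abs(x) >= qnorm(0.975))  # approximate actual rejection rate of Robert's test
```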
(a) (R) Use a simulation experiment with 1 million generated instances of X to numerically approximate
the actual size of Robert’s test. Remember to show your R code and result. [2]
(b) Give an exact mathematical expression for the actual size of Robert’s test. (The evaluated value of
your mathematical expression should be very close to your numerically computed approximate value
in (a).) [2]
(Moral: selection bias can produce an invalid test size.)
Total marks = 23