QTM 220 HW5
Problem 1
In this problem, we're looking into the relationship between the models in our
linear and predictive decompositions of $Y_i$:
$$Y_i = \beta_0 X_{i0} + \dots + \beta_k X_{ik} + U_i, \quad \text{where } E[U_i X_{ij}] = 0 \text{ for } j = 0, \dots, k, \qquad (1)$$
$$\phantom{Y_i} = m(X_{i0}, \dots, X_{ik}) + \tilde{U}_i, \quad \text{where } E[\tilde{U}_i \mid X_{i0}, \dots, X_{ik}] = 0. \qquad (2)$$
1. Let $\dot{U}_i := m(X_{i0}, \dots, X_{ik}) - (\beta_0 X_{i0} + \dots + \beta_k X_{ik})$. Show that it satisfies
$E[\dot{U}_i X_{ij}] = 0$ for $j = 0, \dots, k$. (Hint: compare the two decompositions of $Y_i$.)
Why, when we take $X_{i0}$ to be the constant 1, does this mean that $\dot{U}_i$ has mean zero
and is uncorrelated with $X_{i1}, \dots, X_{ik}$? (Hint: two random variables $X$ and $Y$ are
uncorrelated when their covariance, $\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y]$, is zero.)
2. Show that $\beta_0 X_{i0} + \dots + \beta_k X_{ik}$ is the best linear approximation to $m(X_{i0}, \dots, X_{ik})$
in the sense that $\beta$ minimizes
$$\ell(b) = E\left[\{b_0 X_{i0} + \dots + b_k X_{ik} - m(X_{i0}, \dots, X_{ik})\}^2\right].$$
Hint: See Part I, Slide 5 of the Chapters 5-8 notes and use the result from
the first part of the problem.
Problem 2a
In this problem, we're getting a sense of what the covariance matrix of a random
vector is. Here's the definition. The covariance matrix of a random vector
$X \in \mathbb{R}^k$ is the matrix $\Sigma$ with elements $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$. In particular,
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \quad \text{has covariance matrix} \quad
\Sigma = \begin{pmatrix} \mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Cov}(X_2, X_2) \end{pmatrix}.$$
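To make the definition concrete, here is a quick illustration in R (not part of the
assignment): it estimates each entry $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$ by its sample analog and
compares the result with R's built-in cov(); the particular distribution of $(X_1, X_2)$ is an
arbitrary choice.
# Simulate n draws of a 2-dimensional random vector (X1, X2)
n  <- 100000
X1 <- rnorm(n)
X2 <- 0.5 * X1 + rnorm(n)   # X2 is correlated with X1
# Estimate each entry Cov(Xi, Xj) by its sample analog
Sigma_hat <- matrix(c(cov(X1, X1), cov(X1, X2),
                      cov(X2, X1), cov(X2, X2)), nrow = 2)
Sigma_hat
cov(cbind(X1, X2))          # built-in version; should match up to sampling noise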
1. Show that the covariance matrix $\Sigma$ of a 2-dimensional vector $X \in \mathbb{R}^2$ can
be written, in matrix/vector notation, as $E[XX^T] - (EX)(EX)^T$. If you
like, do it for a $k$-dimensional vector $X \in \mathbb{R}^k$.
2. The covariance matrix $\Sigma$ of a random vector is always symmetric. That
is, $\Sigma_{ij} = \Sigma_{ji}$, or equivalently, $\Sigma = \Sigma^T$. Why?
3. If $X \in \mathbb{R}^2$ is a random vector with covariance matrix $\Sigma$ and $v \in \mathbb{R}^2$ is a
constant vector, then $v^T X \in \mathbb{R}$ is a random variable. In terms of $\Sigma$ and $v$, what
is its variance $\mathrm{Var}(v^T X)$? That is, what is the variance of $v_1 X_1 + v_2 X_2$?
Try to write it in matrix/vector notation. If you like, do it for $X \in \mathbb{R}^k$.
4. If $X \in \mathbb{R}^2$ is a random vector with covariance matrix $\Sigma$ and $A$ is a $2 \times 2$
constant matrix, then $AX \in \mathbb{R}^2$ is a random vector. What is the covariance
matrix of $AX$? Again, try to write it in matrix/vector notation, and if
you like, do it for the more general case of $X \in \mathbb{R}^k$ and an $m \times k$ matrix $A$.
Problem 2b
In this problem, we're getting a sense of what it means for a random vector to
be multivariate normal. Here's the definition. We say $X \in \mathbb{R}^k$ is normal with
mean $\mu$ and covariance matrix $\Sigma$, abbreviated $X \sim N(\mu, \Sigma)$, if for every $v \in \mathbb{R}^k$,
$v^T(X - \mu) \sim N(0, v^T \Sigma v)$. In particular,
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$
when, for all numbers $v_1$ and $v_2$,
$$v_1(X_1 - \mu_1) + v_2(X_2 - \mu_2) \sim N(0, \sigma_v^2) \quad \text{for } \sigma_v^2 = v_1 \Sigma_{11} v_1 + v_1 \Sigma_{12} v_2 + v_2 \Sigma_{21} v_1 + v_2 \Sigma_{22} v_2.$$
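As a quick sanity check of this definition (not part of the assignment), the following R
sketch simulates one particular bivariate normal pair, picks an arbitrary $v$, and compares the
sample variance of $v_1(X_1 - \mu_1) + v_2(X_2 - \mu_2)$ with the $\sigma_v^2$ formula above; the
choices of distribution and of $v$ are arbitrary.
# Simulate a bivariate normal pair with mean mu and covariance matrix Sigma
n  <- 100000
X1 <- rnorm(n)                        # Var(X1) = 1
X2 <- 0.5 * X1 + rnorm(n)             # Cov(X1, X2) = 0.5, Var(X2) = 1.25
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1.25), nrow = 2)
v1 <- 0.3; v2 <- -1.5                 # an arbitrary vector v
# Sample variance of v1 (X1 - mu1) + v2 (X2 - mu2) ...
var(v1 * (X1 - mu[1]) + v2 * (X2 - mu[2]))
# ... compared with sigma_v^2 from the definition
v1 * Sigma[1, 1] * v1 + v1 * Sigma[1, 2] * v2 + v2 * Sigma[2, 1] * v1 + v2 * Sigma[2, 2] * v2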
1. Let's start by getting some geometric intuition. Let's think about what
we call a standard normal vector in $\mathbb{R}^2$,
$$Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \quad \text{where } Z_1, Z_2 \sim N(0, 1) \text{ are independent}.$$
• What is Z’s mean and covariance matrix?
• Show, using the definition above, that Z is multivariate normal.
• In R, sample a bunch of these vectors and plot them. You can use
this code.
Z1 = rnorm(500)
Z2 = rnorm(500)
plot(Z1,Z2)
Now take any unit vector $v \in \mathbb{R}^2$, that is,
$$v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \quad \text{for any } v_1, v_2 \text{ satisfying } v_1^2 + v_2^2 = 1.$$
Plot a histogram of the distribution of $v^T Z = v_1 Z_1 + v_2 Z_2$, for example with $v_1 = 0.6$, $v_2 = 0.8$:
v1 <- 0.6; v2 <- 0.8   # any unit vector: v1^2 + v2^2 = 1
hist(v1 * Z1 + v2 * Z2)
Do it a few times for different unit vectors. Observe that the distribution
doesn't seem to depend on the vector $v$. We say a standard normal vector is
isotropic, or looks the same in all directions. This visualization exercise
was just to build intuition; no need to turn anything in for this part.
2. Now we'll think about how multivariate normals transform when we multiply
them by matrices. We do this, for instance, in the proof of the approximate
distribution of $\hat{\beta} - \beta$ on Part III, Slide 5 of the Chapters 5-8
notes. So let $A$ be a $2 \times 2$ matrix. If $X \in \mathbb{R}^2$ is multivariate normal, with
$X \sim N(\mu, \Sigma)$, what is the distribution of $AX$?
3. Now we'll show that every multivariate normal distribution is, in fact,
a linearly-transformed standard normal. That is, show that $X = \mu + \Sigma^{1/2} Z \sim N(\mu, \Sigma)$
for standard normal $Z \in \mathbb{R}^2$ if $\Sigma^{1/2}$ satisfies $\Sigma = \Sigma^{1/2}(\Sigma^{1/2})^T$.
Use the previous part. (A small numerical illustration of this construction, not part of the
assignment, is sketched just below.)
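Here is that illustration: a minimal R sketch, not a proof, which builds one matrix square
root and checks by simulation that $\mu + \Sigma^{1/2} Z$ has roughly the right mean and
covariance. It assumes the lower-triangular Cholesky factor t(chol(Sigma)) as the square root,
which satisfies $\Sigma = \Sigma^{1/2}(\Sigma^{1/2})^T$; the particular $\mu$ and $\Sigma$ are arbitrary.
mu     <- c(1, -2)
Sigma  <- matrix(c(2, 0.8, 0.8, 1), nrow = 2)
S_half <- t(chol(Sigma))          # lower-triangular square root: S_half %*% t(S_half) equals Sigma
n <- 100000
Z <- rbind(rnorm(n), rnorm(n))    # standard normal vectors, one per column
X <- mu + S_half %*% Z            # each column is mu + Sigma^{1/2} Z
rowMeans(X)                       # should be close to mu
cov(t(X))                         # sample covariance; should be close to Sigma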
Problem 3
Suppose we want to estimate $f(\beta) = E[r(X_i)\, m_\beta(X_i)]$ for $m_\beta(x) = \beta_0 x_0$.
1. Characterize the approximate distribution of $f(\hat{\beta}) - f(\beta)$ using the delta
method (see Part II of the Chapters 5-8 notes).
2. Do so specifically for the case $r(x) = \exp\{(x - 1)^2/2\}/\exp\{x^2/2\} - 1$.
3. Simulate some observations $(X_i, Y_i)$ from a distribution of your choice
(see, for example, recent labs), form a confidence interval for this case of
$r$, and report whether the confidence interval contains the 'true parameter'
$E[r(X_i)\, m(X_i)]$.
4. Repeat 1,000 or 10,000 times and report an estimate of the interval's coverage
probability: the probability that it contains the true parameter. Please
include your code when you submit the assignment. Any language is fine. (A
generic skeleton for this kind of coverage simulation is sketched after this problem.)
5. Think about how this would change if we used a GLM $m_\beta(x) = g(\beta_0 x_0)$.
No need to turn anything in for this part.
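For parts 3 and 4, here is the generic shape of such a coverage simulation, sketched in R.
It is only a skeleton, not the solution: the data-generating process, the point estimate
theta_hat, and the standard error se_hat below are placeholders that you would replace with
your own simulation design and the delta-method standard error from parts 1 and 2.
# Generic coverage-simulation skeleton; all estimators below are placeholders.
coverage_sim <- function(n_reps = 1000, n = 500, theta_true = 0) {
  covered <- logical(n_reps)
  for (r in 1:n_reps) {
    # --- placeholder data-generating process: replace with your own ---
    X <- rnorm(n)
    Y <- 2 * X + rnorm(n)
    # --- placeholder estimate and standard error: replace with f(beta-hat)
    #     and its delta-method standard error ---
    theta_hat <- mean(Y)
    se_hat    <- sd(Y) / sqrt(n)
    ci <- c(theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat)
    covered[r] <- (ci[1] <= theta_true) && (theta_true <= ci[2])
  }
  mean(covered)   # estimated coverage probability
}
coverage_sim()    # with these placeholders, E[Y] = 0, so coverage should be near 0.95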
Problem 4
Prove that in the linear model (1), the least squares estimator $\hat{\beta}$ satisfies
$\sqrt{n}(\hat{\beta} - \beta) \to_d N(0, S)$ where $S = \Sigma^{-1} \Sigma_U \Sigma^{-1}$ for $\Sigma = E[X_i X_i^T]$ and $\Sigma_U = E[U_i^2 X_i X_i^T]$.
You may do this in the one-dimensional case ($Y_i = \beta_0 X_{i0} + U_i$), but I encourage
you to try either the two-dimensional case or the general case.
Suggestion: use Part II of the Chapters 5-8 notes. Formalize the argument
on Slides 3-5 using the ‘rules’ of convergence in probability and distribution
on Slide 16 and the law of large numbers and central limit theorem stated as
follows: Let $X_1, X_2, \ldots \in \mathbb{R}^k$ be independent and identically distributed with
mean zero and covariance matrix $\Sigma = E[X_i X_i^T]$. Then
$$\frac{1}{n} \sum_{i=1}^n X_i \to_p 0 \qquad \text{and} \qquad \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \to_d Z \quad \text{for } Z \sim N(0, \Sigma).$$
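As with the earlier problems, here is a small R illustration of this last statement (not the
requested proof, and with an arbitrarily chosen non-normal distribution for the $X_i$): the
sample covariance of many independent draws of $\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i$ should be
close to $\Sigma$.
# Illustration of the stated central limit theorem, not the requested proof.
# Here Sigma = E[X_i X_i^T] works out to matrix(c(1, 0.5, 0.5, 1.25), 2, 2).
clt_demo <- function(n = 2000, n_reps = 5000) {
  sums <- replicate(n_reps, {
    X1 <- rexp(n) - 1                 # mean-zero but non-normal coordinate
    X2 <- 0.5 * X1 + (rexp(n) - 1)    # a correlated mean-zero coordinate
    c(sum(X1), sum(X2)) / sqrt(n)     # one draw of (1 / sqrt(n)) * sum of the X_i
  })
  cov(t(sums))                        # should be close to Sigma
}
clt_demo()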