MH4501 Multivariate analysis
Outline
Sample statistics
Definition of the multivariate normal distribution
Sample statistics
Sample mean
• Sample mean $\bar{x}_j \in \mathbb{R}$ of variable $j$ is the average of $\{x_{1j}, \ldots, x_{nj}\}$:
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad j = 1, \ldots, p$$
• Sample mean vector $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \in \mathbb{R}^p$:
$$\bar{x} = (\bar{x}_1, \ldots, \bar{x}_p)^\top = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Matrix form of the sample mean vector:
$$\bar{x} = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} x_{i1} \\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n} x_{ip} \end{pmatrix} = \frac{1}{n}\begin{pmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1p} & x_{2p} & \cdots & x_{np} \end{pmatrix}\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \frac{1}{n}X^\top 1_n,$$
where $1_n = (1, \ldots, 1)^\top \in \mathbb{R}^n$.
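A minimal numpy sketch of the matrix form (data and variable names here are illustrative): computing $\bar{x}$ as $\frac{1}{n}X^\top 1_n$ agrees with column-wise averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.normal(size=(n, p))          # data matrix; row i is the observation x_i

ones_n = np.ones(n)                  # the vector 1_n
xbar_matrix = X.T @ ones_n / n       # matrix form: (1/n) X^T 1_n
xbar_direct = X.mean(axis=0)         # column-wise sample means

assert np.allclose(xbar_matrix, xbar_direct)
```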
Sample mean
Define an estimate $\bar{x}$ of the population mean vector $\mu$: suppose the sample data matrix $X = (x_1, \ldots, x_n)^\top$ is collected from a multivariate population of dimension $p$, i.e.
$$x_i = (x_{i1}, \ldots, x_{ip})^\top \sim f,$$
where $f$ denotes the population pdf. Let $\mu_{p \times 1}$ be the population mean vector and $\Sigma_{p \times p}$ be the population covariance matrix.

Theorem
$\bar{x}$ is an unbiased estimator for $\mu$, i.e. $E[\bar{x}] = \mu$.

Proof
Note that $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and, for each $x_i$, $E[x_i] = \mu$; thus
$$E[\bar{x}] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu.$$
Sample Covariance Matrix
• Sample variance of variable $j$ is $s_{jj} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$;
• Sample covariance between variables $j$ and $k$ is $s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$;
• These $s_{jk}$'s are put together to form a (symmetric) sample covariance matrix $S$:
$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$
• Compute $S$ by matrices, via the sum of squares and cross-products matrix:
$$S = \frac{1}{n-1}\begin{pmatrix} \sum_{i=1}^{n}(x_{i1}-\bar{x}_1)^2 & \cdots & \sum_{i=1}^{n}(x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}(x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \cdots & \sum_{i=1}^{n}(x_{ip}-\bar{x}_p)^2 \end{pmatrix}$$
$$= \frac{1}{n-1}\begin{pmatrix} x_{11}-\bar{x}_1 & \cdots & x_{n1}-\bar{x}_1 \\ \vdots & \ddots & \vdots \\ x_{1p}-\bar{x}_p & \cdots & x_{np}-\bar{x}_p \end{pmatrix}\begin{pmatrix} x_{11}-\bar{x}_1 & \cdots & x_{1p}-\bar{x}_p \\ \vdots & \ddots & \vdots \\ x_{n1}-\bar{x}_1 & \cdots & x_{np}-\bar{x}_p \end{pmatrix}.$$
Sample Covariance Matrix
Notice that
$$\begin{pmatrix} x_{11}-\bar{x}_1 & \cdots & x_{1p}-\bar{x}_p \\ \vdots & \ddots & \vdots \\ x_{n1}-\bar{x}_1 & \cdots & x_{np}-\bar{x}_p \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} - \begin{pmatrix} \bar{x}_1 & \cdots & \bar{x}_p \\ \vdots & \ddots & \vdots \\ \bar{x}_1 & \cdots & \bar{x}_p \end{pmatrix} = X - 1\bar{x}^\top = X - \tfrac{1}{n}11^\top X = \left(I - \tfrac{1}{n}11^\top\right)X,$$
then we have two expressions for $S$:
$$S = \frac{1}{n-1}\left(X - 1\bar{x}^\top\right)^\top\left(X - 1\bar{x}^\top\right) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^\top.$$
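Both expressions can be verified numerically; the following sketch (with simulated data) compares them against numpy's built-in np.cov, which also uses the $n-1$ divisor.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
xbar = X.mean(axis=0)

# Centering matrix I - (1/n) 1 1^T, applied to X
H = np.eye(n) - np.ones((n, n)) / n
Xc = H @ X                                    # same as X - xbar

S_matrix = Xc.T @ Xc / (n - 1)                # first expression
S_sum = sum(np.outer(x - xbar, x - xbar) for x in X) / (n - 1)  # second expression

assert np.allclose(S_matrix, S_sum)
assert np.allclose(S_matrix, np.cov(X, rowvar=False))  # numpy also divides by n - 1
```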
Sample Covariance Matrix
Properties:
• $S$ is symmetric, as $S^\top = S$.
• $S$ is positive semi-definite, as $S = B^\top B$ with
$$B = \frac{1}{\sqrt{n-1}}\left(X - 1\bar{x}^\top\right).$$

Theorem
The sample covariance matrix $S$ provides an unbiased estimator of the population covariance matrix $\Sigma$, i.e. $E[S] = \Sigma$.

Proof
Hint:
$$S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^\top = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i x_i^\top - n\bar{x}\bar{x}^\top\right].$$
Sample Correlation Matrix
Sample correlation coefficient between variables $j$ and $k$ is $r_{jk}$:
$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}}\sqrt{s_{kk}}} = \frac{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k)}{\sqrt{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2}\sqrt{\sum_{i=1}^{n}(x_{ik}-\bar{x}_k)^2}},$$
where $s_{jk}$ is the sample covariance between variables $j$ and $k$, and $s_{jj}$ is the sample variance of variable $j$.
• Clearly, $r_{jj} = 1$, i.e. the sample correlation of variable $j$ with itself is 1.
• Collecting all the $r_{jk}$'s together, we have the sample correlation matrix $R$:
$$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$
It is easy to check that $r_{jk} = r_{kj}$, so $R$ is symmetric.
Sample Correlation Matrix
Compute $R$ using the sample covariance matrix $S$: let $D = \mathrm{diag}(s_{11}, \ldots, s_{pp})$; then
$$R = D^{-1/2} S D^{-1/2}, \qquad S = D^{1/2} R D^{1/2}.$$
Comments:
• $R$ is positive semi-definite.
• The quantity $\det(S)$ is called the generalised sample variance. $\det(S)$ and $\det(R)$ are related via
$$\det(S) = s_{11} \cdots s_{pp} \det(R).$$
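A quick numerical check of both relations (a sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))

S = np.cov(X, rowvar=False)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))    # D^{-1/2}, D = diag(s_11, ..., s_pp)

R = D_inv_sqrt @ S @ D_inv_sqrt
assert np.allclose(R, np.corrcoef(X, rowvar=False))

# Generalised sample variance relation: det(S) = s_11 ... s_pp det(R)
assert np.isclose(np.linalg.det(S), np.prod(np.diag(S)) * np.linalg.det(R))
```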
Sample Correlation Matrix
• Define the standardisation of $x_{ij}$ by
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}.$$
Now define vectors $z_i = (z_{i1}, \ldots, z_{ip})^\top$, $i = 1, \ldots, n$. We want to investigate the covariance matrix of the standardised values, i.e. $S_z$. As $\bar{z}_j = 0$, the $(j,k)$-th element of $S_z$ is
$$S_z(j,k) = \frac{1}{n-1}\sum_{i=1}^{n} z_{ij} z_{ik} = \frac{1}{n-1}\sum_{i=1}^{n} \frac{x_{ij}-\bar{x}_j}{\sqrt{s_{jj}}} \cdot \frac{x_{ik}-\bar{x}_k}{\sqrt{s_{kk}}} = \frac{1}{n-1} \cdot \frac{1}{\sqrt{s_{jj}}\sqrt{s_{kk}}}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k) = \frac{s_{jk}}{\sqrt{s_{jj}}\sqrt{s_{kk}}} = R_x(j,k),$$
i.e. the correlation matrix of $X$ is equal to the covariance matrix of the standardised $X$, as the sketch below illustrates numerically.
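A minimal sketch on simulated data: the covariance matrix of the column-standardised data equals the correlation matrix of the original data.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4))

# Standardise each column: z_ij = (x_ij - xbar_j) / sqrt(s_jj)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

S_z = np.cov(Z, rowvar=False)            # covariance of the standardised values
R_x = np.corrcoef(X, rowvar=False)       # correlation of the original values
assert np.allclose(S_z, R_x)
```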
Sample Correlation Matrix
• Many multivariate methods use the concept of generalised distance:
$$d^2 = (x - \mu)^\top \Sigma^{-1}(x - \mu) \quad \text{(distance between } x \text{ and } \mu\text{)}$$
or
$$d^2 = (x - \bar{x})^\top S^{-1}(x - \bar{x}) \quad \text{(distance between } x \text{ and } \bar{x}\text{)}.$$
Notes: From the graph we see that the variances along the $x_1$ and $x_2$ directions are different. When considering a point $(x_1^*, x_2^*)$ and its distance to $(\bar{x}_1, \bar{x}_2)$, the matrix $S$ accounts for this difference in variation.
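A sketch of the sample version of this distance (the query point x_star is illustrative); solving a linear system avoids forming $S^{-1}$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

x_star = np.array([1.0, -0.5])           # an illustrative query point
diff = x_star - xbar
d2 = diff @ np.linalg.solve(S, diff)     # d^2 = (x - xbar)^T S^{-1} (x - xbar)
print("generalised squared distance:", d2)
```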
Linear Combinations
Suppose $X$ is a $p \times 1$ random vector with mean $\mu_{p \times 1}$ and covariance matrix $\Sigma_{p \times p}$. Let $A_{k_1 \times p}$ and $C_{k_2 \times p}$ be matrices, and $b_{k_1 \times 1}$ and $d_{k_2 \times 1}$ be vectors. Define $Y = AX + b$ and $Z = CX + d$. Then the sample means, variances, and covariances of $Y$ and $Z$ are related to $\bar{x}$ and $S$ by:
• Sample mean vector of $Y$: $A\bar{x} + b$;
• Sample covariance matrix of $Y$: $ASA^\top$;
• Sample mean vector of $Z$: $C\bar{x} + d$;
• Sample covariance matrix of $Z$: $CSC^\top$;
• Sample covariance matrix between $Y$ and $Z$: $ASC^\top$.
It is easy to see that all these sample estimates are unbiased, and the relations can be checked numerically as in the sketch below.
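A numerical sanity check of these relations, with illustrative shapes $k_1 = 2$ and $k_2 = 4$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 3
X = rng.normal(size=(n, p))

A = rng.normal(size=(2, p))   # k1 x p
b = rng.normal(size=2)
C = rng.normal(size=(4, p))   # k2 x p
d = rng.normal(size=4)

Y = X @ A.T + b               # rows are y_i = A x_i + b
Z = X @ C.T + d               # rows are z_i = C x_i + d

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

assert np.allclose(Y.mean(axis=0), A @ xbar + b)
assert np.allclose(np.cov(Y, rowvar=False), A @ S @ A.T)
assert np.allclose(Z.mean(axis=0), C @ xbar + d)
assert np.allclose(np.cov(Z, rowvar=False), C @ S @ C.T)

# Sample cross-covariance between Y and Z equals A S C^T
Yc, Zc = Y - Y.mean(axis=0), Z - Z.mean(axis=0)
assert np.allclose(Yc.T @ Zc / (n - 1), A @ S @ C.T)
```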
Definition of the multivariate normal distribution
Definition
The univariate normal distribution $N(\mu, \sigma^2)$ has the density function
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
[Graph of $f(x)$]
Note that $(x-\mu)^2/\sigma^2$ in the exponent is a general distance measurement:
$$(x-\mu)^2/\sigma^2 = (x-\mu)^\top\left(\sigma^2\right)^{-1}(x-\mu).$$
Definition
Thus we can extend it to the multivariate situation by using $(x - \mu)^\top_{1 \times p}\Sigma^{-1}_{p \times p}(x - \mu)_{p \times 1}$:
$$f_x(x_{p \times 1}) = c\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right),$$
where $\mu = E[X_{p \times 1}]$, $\Sigma = \mathrm{Var}(X_{p \times 1})$, and $c$ is a constant. From the requirement that $\int f(x)\,dx = 1$, we have
$$c = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}},$$
where $|\Sigma|$ denotes the determinant.

Definition
A random vector $x_{p \times 1}$ is said to follow a $N_p(\mu, \Sigma)$ ($p$-dimensional multivariate normal) distribution if its density function is
$$f_x(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right).$$
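A direct implementation of this density (a sketch; mvn_pdf is a hypothetical helper name). Assuming scipy is available, it agrees with scipy.stats.multivariate_normal, which implements the same formula.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """N_p(mu, Sigma) density at x, following the formula above."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)      # (x - mu)^T Sigma^{-1} (x - mu)
    const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / const

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])

assert np.isclose(mvn_pdf(x, mu, Sigma),
                  multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```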
Examples
Example 2.1: When $p = 1$, the $p$-dimensional multivariate normal pdf reduces to that of the univariate normal distribution.

Example 2.2: When $p = 2$, we have $x = (x_1, x_2)^\top$, $\mu = (\mu_1, \mu_2)^\top$, and
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
where $\rho = \mathrm{Corr}(x_1, x_2)$ and $\sigma_j^2 = \mathrm{Var}(x_j)$. As
$$|\Sigma| = \sigma_1^2\sigma_2^2 - \rho^2\sigma_1^2\sigma_2^2 = (1-\rho^2)\sigma_1^2\sigma_2^2$$
and
$$\Sigma^{-1} = \frac{1}{(1-\rho^2)\sigma_1^2\sigma_2^2}\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix} = \frac{1}{1-\rho^2}\begin{pmatrix} \sigma_1^{-2} & -\rho\sigma_1^{-1}\sigma_2^{-1} \\ -\rho\sigma_1^{-1}\sigma_2^{-1} & \sigma_2^{-2} \end{pmatrix},$$
we have
$$(x-\mu)^\top\Sigma^{-1}(x-\mu) = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right].$$
Examples
Hence the bivariate normal has the density function
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}.$$
[Graph of bivariate normal density $f(x_1, x_2)$]
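As a consistency sketch, the bivariate formula above can be checked against the general $p$-dimensional density at $p = 2$ (parameter values here are illustrative):

```python
import numpy as np

def bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density written in terms of mu_j, sigma_j, rho."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    quad = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return np.exp(-0.5 * quad) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.5, 0.8, 0.6
Sigma = np.array([[s1**2,         rho * s1 * s2],
                  [rho * s1 * s2, s2**2        ]])
x = np.array([0.4, 1.2])

# General p = 2 density from the previous slide
diff = x - np.array([mu1, mu2])
quad = diff @ np.linalg.solve(Sigma, diff)
general = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

assert np.isclose(bivariate_pdf(x[0], x[1], mu1, mu2, s1, s2, rho), general)
```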
Contours of the multivariate normal pdf
The contour at a fixed height $h$ is determined by $f(x) = h$, i.e.
$$\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right) = h \iff (x-\mu)^\top\Sigma^{-1}(x-\mu) = k$$
for some constant $k > 0$.

Question: What does $(x-\mu)^\top\Sigma^{-1}(x-\mu) = k$ look like?

From the eigenvalue decomposition of $\Sigma$, we have
$$\Sigma^{-1} = U\Lambda^{-1}U^\top,$$
where $\Lambda_{p \times p} = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ is the diagonal matrix of eigenvalues and $U_{p \times p} = (u_1, \ldots, u_p)$ is the eigenvector matrix.
Since the eigenvalues are positive, $(x-\mu)^\top\Sigma^{-1}(x-\mu) = k$ is an ellipsoid.
Contours of the multivariate normal pdf
Example 2.3: When $p = 2$, $(x-\mu)^\top\Sigma^{-1}(x-\mu) = k$ gives an ellipse, as shown in the graph. If observations $(x_1, x_2)^\top$ are from a bivariate normal distribution, then the scatter plot of $x_1$ vs. $x_2$ usually shows an elliptical shape.
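A sketch of the ellipse geometry under an illustrative $\Sigma$: the eigendecomposition $\Sigma = U\Lambda U^\top$ gives principal axes along the eigenvectors $u_i$ with half-lengths $\sqrt{k\lambda_i}$, and points generated on the contour indeed satisfy $(x-\mu)^\top\Sigma^{-1}(x-\mu) = k$.

```python
import numpy as np

Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
mu = np.array([0.0, 0.0])
k = 1.0                              # contour level: (x - mu)^T Sigma^{-1} (x - mu) = k

lam, U = np.linalg.eigh(Sigma)       # eigenvalues lam, eigenvectors in the columns of U
assert np.allclose(U @ np.diag(lam) @ U.T, Sigma)

half_lengths = np.sqrt(k * lam)      # half-lengths of the ellipse's principal axes
print("axis directions (columns):\n", U)
print("half-lengths:", half_lengths)

# Parametrise the contour: x(t) = mu + U @ (half_lengths * (cos t, sin t))
t = np.linspace(0, 2 * np.pi, 9)
pts = mu[:, None] + (U * half_lengths) @ np.vstack([np.cos(t), np.sin(t)])

# Every generated point satisfies (x - mu)^T Sigma^{-1} (x - mu) = k
for x in pts.T:
    assert np.isclose((x - mu) @ np.linalg.solve(Sigma, x - mu), k)
```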