STAT2001 Discrete random variables
Chapter 3 (part b): Discrete random variables
STAT2001/STAT2013/STAT6013/STAT6039 - Introductory Mathematical Statistics (for Actuarial Studies)/Principles of Mathematical Statistics (for Actuarial Studies)
• Calculate summary measures for discrete distributions (e.g.
distribution mean/average).
• Describe the shape, location and spread of distributions.
• Apply properties of discrete probability distributions.
Expectation
Two coins are tossed. How many heads can we expect to come up?
Let Y = number of heads. Then
$$p(y) = \begin{cases} 1/4, & y = 0 \\ 1/2, & y = 1 \\ 1/4, & y = 2 \end{cases}$$
The answer seems to be 1 (the middle value).
But what exactly do we mean by “expect”?
A mental experiment: Suppose we toss the two coins 1000 times, and
each time record the number of heads, y.
The result would be something like 1,1,2,0,1,2,0,1,1,1,...,1,0.
We’d get about 250 zeros, 500 ones and 250 twos.
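This mental experiment is easy to run. Below is a minimal Python sketch, added here for illustration (the seed and sample size are arbitrary choices, not from the text):

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility

# Toss two fair coins 1000 times, recording the number of heads each time.
tosses = [random.randint(0, 1) + random.randint(0, 1) for _ in range(1000)]

# Counts of 0, 1 and 2 heads should be roughly 250, 500 and 250.
print([tosses.count(k) for k in (0, 1, 2)])
print(sum(tosses) / len(tosses))  # sample average, close to 1
```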
Expectation continued
So the average of the 1000 values of Y would be approximately
$$\frac{250(0) + 500(1) + 250(2)}{1000} = 0(1/4) + 1(1/2) + 2(1/4) = 1.$$
This agrees with our intuitive answer.
Observe that
$$0(1/4) + 1(1/2) + 2(1/4) = 0\,p(0) + 1\,p(1) + 2\,p(2) = \sum_{y=0}^{2} y\,p(y).$$
This leads to the following definition.
Suppose Y is a discrete random variable with pmf p(y). Then the
expected value (or mean) of Y is
$$E(Y) = \sum_{y} y\,p(y).$$
(The sum is over all possible values y of the rv Y.)
We may also write Y’s mean as $\mu_Y$ or $\mu$.
µ is a measure of central tendency, in the sense that it represents the average of a hypothetically infinite number of independent realisations of Y.
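In code, this definition is just a weighted sum. Here is a hypothetical helper (the name expected_value is our own, not from the text), checked against the two-coin pmf:

```python
def expected_value(pmf):
    """E(Y) = sum over y of y * p(y), for a pmf given as {value: probability}."""
    return sum(y * p for y, p in pmf.items())

# The two-coin example: p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
print(expected_value({0: 0.25, 1: 0.5, 2: 0.25}))  # 1.0
```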
Example 10
Suppose that Y is a random variable which equals 5 with probability
0.2 and 7 with probability 0.8. Find the expected value of Y.
$$E(Y) = \sum_{y} y\,p(y) = 5(0.2) + 7(0.8) = 6.6.$$
This means that if we were to generate many independent realisations
of Y, so as to get a sequence like 7, 7, 5, 7, 5, 7, ..., the average of these numbers would be close to 6.6.
As the sequence got longer, the average would converge to 6.6. More
on this later.
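A sketch of that convergence, using Python's random.choices to draw the realisations (the seed and sample sizes are arbitrary choices):

```python
import random

random.seed(2)  # arbitrary seed

# Independent realisations of Y: 5 with probability 0.2, 7 with probability 0.8.
sample = random.choices([5, 7], weights=[0.2, 0.8], k=100_000)

# Running averages approach E(Y) = 6.6 as the sample grows.
for n in (10, 1_000, 100_000):
    print(n, sum(sample[:n]) / n)
```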
Example 11
Find the mean of the Bernoulli distribution.
Let Y ~ Bern(p). Then
$$p(y) = \begin{cases} p, & y = 1 \\ 1 - p, & y = 0. \end{cases}$$
So Y has mean
$$\mu = \sum_{y=0}^{1} y\,p(y) = 0\,p(0) + 1\,p(1) = 0(1-p) + 1(p) = p.$$
Thus for example, if we toss a fair coin thousands of times, and each
time write 1 when a head comes up and 0 otherwise, we will get a
sequence like 0,0,1,0,1,1,1,0,... The average of these 1s and 0s will
be about 1/2, corresponding to the fact that each such number has a
Bernoulli distribution with parameter 1/2 and thus a mean of 1/2.
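A one-line numerical check of µ = p from the pmf (here p = 1/2, matching the fair coin; a sketch, not from the text):

```python
# Mean of Bern(p) computed directly from its pmf.
p = 0.5
pmf = {0: 1 - p, 1: p}
print(sum(y * prob for y, prob in pmf.items()))  # 0.5
```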
Example 12
Find the mean of the binomial distribution.
Let Y ~ Bin(n, p). Then Y has mean
$$\begin{aligned}
\mu &= \sum_{y=0}^{n} y \binom{n}{y} p^{y} (1-p)^{n-y} \\
&= \sum_{y=1}^{n} y\,\frac{n!}{y!\,(n-y)!}\,p^{y}(1-p)^{n-y} \quad \text{(the first term is zero)} \\
&= np \sum_{y=1}^{n} \frac{(n-1)!}{(y-1)!\,(n-1-(y-1))!}\,p^{y-1}(1-p)^{n-1-(y-1)} \\
&= np \sum_{x=0}^{m} \frac{m!}{x!\,(m-x)!}\,p^{x}(1-p)^{m-x} \quad (x = y-1 \text{ and } m = n-1) \\
&= np \quad \text{(since the sum equals 1, by the binomial theorem).}
\end{aligned}$$
This makes sense. For example, if we roll a die 60 times, we can
expect 60(1/6) = 10 sixes.
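The algebra can also be sanity-checked numerically. A sketch using Python's math.comb to evaluate the defining sum directly (the function name binom_mean and the test values are our own choices):

```python
from math import comb

def binom_mean(n, p):
    """Evaluate E(Y) = sum of y * C(n, y) * p^y * (1-p)^(n-y) term by term."""
    return sum(y * comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1))

# 60 die rolls, counting sixes: the sum should agree with np = 60 * (1/6) = 10.
print(binom_mean(60, 1/6))  # approximately 10.0 (up to floating-point error)
```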
Expectations of functions of random variables
Suppose that Y is a discrete random variable with pmf p(y), and g(t) is
a function. Then the expected value (or mean) of g(Y) is defined to be
$$E(g(Y)) = \sum_{y} g(y)\,p(y).$$
The text presents this equation as Theorem 3.2 and provides a proof
for it. We have instead defined the expected value of a function of a
rv, with no need for a proof.
Example 13 Suppose that Y ~ Bern(p). Find $E(Y^2)$.
$$E(Y^2) = \sum_{y} y^2 p(y) = 0^2(1-p) + 1^2 p = p.$$
(same as E(Y); in fact, $E(Y^k) = p$ for all $k \ge 1$)
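The same weighted-sum idea extends directly to E(g(Y)). A hypothetical helper (our naming, not the text's), checked against Example 13 with an arbitrary p = 0.3:

```python
def expected_g(pmf, g):
    """E(g(Y)) = sum over y of g(y) * p(y)."""
    return sum(g(y) * p for y, p in pmf.items())

p = 0.3  # arbitrary choice
print(expected_g({0: 1 - p, 1: p}, lambda y: y**2))  # 0.3, i.e. equals p
```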
Laws of expectation
1. If c is a constant, then E(c) = c.
2. $E\{c\,g(Y)\} = c\,E\{g(Y)\}$.
3. $E\{g_1(Y) + g_2(Y) + \cdots + g_k(Y)\} = E\{g_1(Y)\} + E\{g_2(Y)\} + \cdots + E\{g_k(Y)\}$.
Proof of 1st law: $E(c) = \sum_y c\,p(y) = c \sum_y p(y) = c(1) = c$.
Example 14 Suppose that Y ~ Bern(p). Find $E(3Y^2 + Y - 2)$.
$$E(3Y^2 + Y - 2) = 3E(Y^2) + E(Y) - 2 = 3p + p - 2 = 4p - 2.$$
(recall from Example 13 that $E(Y^k) = p$ for all $k \ge 1$)
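These laws can be verified numerically for Example 14; a sketch with an arbitrary p = 0.3:

```python
p = 0.3  # arbitrary choice
pmf = {0: 1 - p, 1: p}

def E(g):
    """E(g(Y)) for the Bernoulli pmf above."""
    return sum(g(y) * prob for y, prob in pmf.items())

lhs = E(lambda y: 3 * y**2 + y - 2)                # direct computation
rhs = 3 * E(lambda y: y**2) + E(lambda y: y) - 2   # via the three laws
print(lhs, rhs, 4 * p - 2)                         # all equal: -0.8
```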
Special expectations
1. The kth raw moment of Y is $\mu'_k = E(Y^k)$.
2. The kth central moment of Y is $\mu_k = E\left((Y - \mu)^k\right)$.
3. The variance of Y is $\mathrm{Var}(Y) = \sigma^2 = \mu_2 = E\left((Y - \mu)^2\right)$.
4. The standard deviation of Y is $\mathrm{SD}(Y) = \sigma = \sqrt{\mathrm{Var}(Y)}$.
We can also write Var(Y) as V(Y) or $\sigma^2_Y$.
Note that $\mu'_1 = \mu$.
Also, $\mu_1 = E\left((Y - \mu)^1\right) = E(Y) - \mu = \mu - \mu = 0$.
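These definitions translate directly into code. The helper names below are our own (a sketch, not from the text):

```python
from math import sqrt

def raw_moment(pmf, k):
    """k-th raw moment: E(Y^k)."""
    return sum(y**k * p for y, p in pmf.items())

def central_moment(pmf, k):
    """k-th central moment: E((Y - mu)^k)."""
    mu = raw_moment(pmf, 1)
    return sum((y - mu)**k * p for y, p in pmf.items())

def sd(pmf):
    """Standard deviation: square root of the variance (2nd central moment)."""
    return sqrt(central_moment(pmf, 2))

# The first central moment is always 0, e.g. for Bern(0.3):
print(central_moment({0: 0.7, 1: 0.3}, 1))  # 0.0 (up to rounding)
```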
Example 15
Suppose that p(y) = y/3, y = 1, 2. Find $\mu'_3$ and $\sigma$.
$$\mu'_3 = E(Y^3) = \sum_y y^3 p(y) = 1^3 \cdot \tfrac{1}{3} + 2^3 \cdot \tfrac{2}{3} = \tfrac{17}{3}.$$
$$\mu = E(Y) = \sum_y y\,p(y) = 1(1/3) + 2(2/3) = 5/3.$$
$$\sigma^2 = \mu_2 = E\left((Y - \mu)^2\right) = \sum_y (y - \mu)^2 p(y) = \left(1 - \tfrac{5}{3}\right)^2 \tfrac{1}{3} + \left(2 - \tfrac{5}{3}\right)^2 \tfrac{2}{3} = \tfrac{2}{9}.$$
Hence $\sigma = \sqrt{2/9} = \sqrt{2}/3 \approx 0.4714$.
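A direct numerical check of Example 15 (a sketch; the fractions appear as floats):

```python
pmf = {1: 1/3, 2: 2/3}  # p(y) = y/3 for y = 1, 2

mu3 = sum(y**3 * p for y, p in pmf.items())         # E(Y^3) = 17/3
mu = sum(y * p for y, p in pmf.items())             # 5/3
var = sum((y - mu)**2 * p for y, p in pmf.items())  # 2/9
print(mu3, mu, var, var**0.5)  # 5.667..., 1.667..., 0.222..., 0.4714...
```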
The various moments provide information about the nature of a
distribution.
We have already seen that the mean provides a measure of central
tendency.
The variance and standard deviation provide measures of dispersion.
Distributions that are highly dispersed have a large variance.
Variance example
Example: Suppose X has pmf p(x) = 1/2, x = 1, 3 and Y has pmf p(y) = 1/2, y = 0, 4. Find Var(X) and Var(Y). Which distribution is the more dispersed?
Both distributions have a mean of 2 (= average of 1 and 3 = average of 0 and 4).
$$\mathrm{Var}(X) = (1-2)^2(0.5) + (3-2)^2(0.5) = 1 \quad \text{and} \quad \mathrm{Var}(Y) = (0-2)^2(0.5) + (4-2)^2(0.5) = 4.$$
We see that Var(Y) > Var(X). This corresponds to the fact that Y’s distribution is the more dispersed of the two.
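A quick check of this example (the variance helper is our own sketch, not from the text):

```python
def variance(pmf):
    """Var(Y) = E((Y - mu)^2) for a pmf given as {value: probability}."""
    mu = sum(y * p for y, p in pmf.items())
    return sum((y - mu)**2 * p for y, p in pmf.items())

print(variance({1: 0.5, 3: 0.5}))  # 1.0, Var(X)
print(variance({0: 0.5, 4: 0.5}))  # 4.0, Var(Y)
```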