MA20224 Probability
1 Events and Probabilities
In this chapter, we review some of the basic constructions from your first year probability
course, but place more emphasis on some of the formal subtleties. The main goal is to give
meaning to the construct of a probability space
(Ω,F ,P).
This is the basic construction that we use to model a random experiment. The ingredients
are:
• the sample space Ω. Each ω ∈ Ω corresponds to a realization of your experiment.
• The σ-algebra F is a collection of subsets of Ω, also known as events, that we would
like to assign probabilities to.
• The probability measure P is a mapping that associates to each event F ∈ F a
probability, i.e. a number between 0 and 1 that tells us how likely the particular
event is.
In the following sections, we will look more closely at each of these ingredients. We will
look at the rules they have to obey and derive some easy consequences.
1.1 Events and the Event Space
The first thing to write down when constructing a model for a random experiment is the
sample space Ω. Each ω ∈ Ω corresponds to a particular realisation of our experiment.
Examples 1.1.
(i) Experiment C2; toss a coin twice, so ΩC2 = {H,T}^2 = {HH, HT, TH, TT}.
(ii) Experiment C∞; toss a coin infinitely many times, so ΩC∞ = {H,T}^N. Then
ω ∈ ΩC∞ is a sequence ω = (ω1, ω2, . . . ) with ωi ∈ {H,T} and ωi is the outcome
of the i-th toss for all i ∈ N.
(iii) Experiment S1; imagine we want to model the service time of the first customer in
a queue, then we take ΩS1 = R+ = (0,∞).
(iv) Experiment Q; all arrival times of customers at a queue. We can take
ΩQ = {ω = (ω1, ω2, . . . ) | ωi ∈ R+ ∀ i, ω1 < ω2 < · · · } ⊆ (R+)^N,
where ωi represents the arrival time of the i-th customer.
We will be interested in events F , that is, certain kinds of subsets of Ω.
Examples 1.2.
(i) Experiment C2: If F is the event of getting at least one head, then F = {HH,HT, TH} ⊂
ΩC2 .
(ii) Experiment S1: Let A be the event the service time takes between α and β units
of time, so A = (α, β) ⊂ ΩS1 .
(iii) Experiment C∞; if Gp is the event the proportion of heads converges to p, then
Gp = { ω = (ω1, ω2, . . . ) ∈ ΩC∞ | (1/n) ∑_{i=1}^n 1l{H}(ωi) → p as n → ∞ }.
To formulate the last event, we have used the following notation:
Definition 1.3 (Indicator function). For an event E ⊂ Ω, the indicator function 1lE is
defined as
1lE(ω) := 1 if ω ∈ E, and 0 if ω ∉ E.
So in the above example,
∑_{i=1}^n 1l{H}(ωi),
counts how many of the ωi are equal to H in the first n tosses of the coin. To simplify
notation, we will also often write
1lH := 1l{H},
and similarly for any other singleton sets.
Examples 1.4 (Combination of events). We also frequently use that operations on
events translate directly into operations on indicator functions.
Event | Description | Indicator function
F | F occurs | 1lF
∅ | ‘Impossible event’, nothing occurs | 1l∅ = 0
Ω | ‘Certain event’, something occurs | 1lΩ = 1
F^c | complement: F does not occur | 1l{F^c} = 1 − 1lF
⋂_{n=1}^∞ Fn | all Fn occur simultaneously | 1l{⋂ Fn} = ∏_{n=1}^∞ 1lFn
⋃_{n=1}^∞ Fn = (⋂_{n=1}^∞ Fn^c)^c | at least one Fn occurs | 1l{⋃ Fn} = 1 − ∏_{n=1}^∞ (1 − 1lFn)
To see, for example, the result for the intersection of events: note that 1l{⋂_{n=1}^∞ Fn}(ω) = 1 if and only if ω ∈ ⋂_{n=1}^∞ Fn, which holds if and only if ω ∈ Fn for all n, i.e. 1lFn(ω) = 1 for all n. This is equivalent to (∏_{n=1}^∞ 1lFn)(ω) = 1.
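The correspondence between set operations and indicator functions can be checked directly on a finite sample space. The following sketch is only an illustration (the events F1, F2 and the helper `indicator` are hypothetical names, not part of the notes):

```python
# Finite illustration of the event/indicator correspondence in Examples 1.4,
# on the sample space of experiment C2; F1, F2 are illustrative events.
Omega = {(a, b) for a in "HT" for b in "HT"}

def indicator(E):
    """Return the indicator function 1l_E of an event E."""
    return lambda w: 1 if w in E else 0

F1 = {w for w in Omega if w[0] == "H"}   # head on the first toss
F2 = {w for w in Omega if w[1] == "H"}   # head on the second toss
i1, i2 = indicator(F1), indicator(F2)

for w in Omega:
    assert indicator(Omega - F1)(w) == 1 - i1(w)                    # complement
    assert indicator(F1 & F2)(w) == i1(w) * i2(w)                   # intersection
    assert indicator(F1 | F2)(w) == 1 - (1 - i1(w)) * (1 - i2(w))   # union
```

The infinite intersection and union in the table work in exactly the same way, with the finite products replaced by infinite ones.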
In the following we will denote by F the collection of all events of interest for a particular
experiment. If Ω is finite (or countably infinite), we usually take
F := P(Ω) = {A : A ⊂ Ω},
the power set of Ω. Note that if |Ω| = N , then |P(Ω)| = 2^N . Exercise: Why?
Example 1.5. In the coin tossing experiment C2, we take
F = P({H,T}2) = {∅, {HH}, {HT}, {HH,HT}, . . . ,Ω},
Note that here |F| = 2^4 = 16.
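The count |P(Ω)| = 2^N can be verified by enumeration: each of the N elements is either in a subset or not, an independent two-way choice. A minimal sketch (the helper `power_set` is an illustrative name, not from the notes):

```python
from itertools import chain, combinations

def power_set(Omega):
    """All subsets of Omega: each element is either in or out of a subset,
    giving 2^N subsets when |Omega| = N."""
    return [set(c) for c in chain.from_iterable(
        combinations(Omega, r) for r in range(len(Omega) + 1))]

Omega_C2 = ["HH", "HT", "TH", "TT"]
F = power_set(Omega_C2)
assert len(F) == 2 ** len(Omega_C2)   # |P(Omega)| = 2^4 = 16
```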
However, if Ω is uncountable, we cannot simply assign probabilities to all subsets of
Ω. If we tried, this would lead to contradictions (look up ‘non-measurable sets’ or see the
unit ‘Measure theory’; also check out the Banach–Tarski paradox for some surprising
consequences). The solution is not to assign probabilities to all subsets of Ω, but only
to a large enough collection that contains all the events of interest. For consistency, this
collection needs to satisfy the following rules:
Definition 1.6 (σ-algebra). F ⊆ P(Ω) is a σ-algebra (or σ−field, or event space) on Ω
if
(i) ∅ ∈ F ,
(ii) if F ∈ F , then F c ∈ F and
(iii) if (Fn : n ∈ N) is a sequence of sets in F (i.e. Fn ∈ F for all n), then ⋃_{n∈N} Fn ∈ F .
Note that these conditions imply that Ω = ∅^c ∈ F by (i) and (ii). Also, ⋂_n Fn = (⋃_n Fn^c)^c ∈ F by (ii) and (iii).
Example 1.7. As an example of a very small σ-algebra, consider the following: for any
subset E ⊂ Ω, the collection F := {∅, E, E^c, Ω} is a σ-algebra on Ω. Exercise: Check!
Definition 1.8. If A ⊆ P(Ω) is a collection of subsets of Ω, then σ(A) is known as the
σ-algebra generated by A and is the smallest σ-algebra on Ω containing A.
Note that σ(A) is the intersection of all σ-algebras on Ω that contain A. This intersection
is over a non-empty collection, since P(Ω) is always a σ-algebra that contains A.
Example 1.9. Coming back to the previous example, we can check that if E ⊂ Ω, then
σ(E) = {∅, E, E^c, Ω}.
Exercise: Can you find σ({E,F})? It is already a bit more complicated!
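For a finite Ω, one way to explore this exercise is to compute σ(A) mechanically, by repeatedly closing under complements and unions until nothing new appears. The sketch below is an illustration (the function name and the sets E, F2 are hypothetical), not part of the course material:

```python
def generated_sigma_algebra(Omega, A):
    """Close A under complement and union; on a finite Omega this
    produces the sigma-algebra sigma(A) generated by A."""
    Omega = frozenset(Omega)
    F = {frozenset(), Omega} | {frozenset(E) for E in A}
    while True:
        new = {Omega - E for E in F} | {E | G for E in F for G in F}
        if new <= F:            # no new sets: F is closed
            return F
        F |= new

Omega = "abcd"
E, F2 = frozenset("ab"), frozenset("bc")
# sigma({E}) = {empty, E, E^c, Omega}, as in Example 1.9
assert generated_sigma_algebra(Omega, [E]) == {
    frozenset(), E, frozenset("cd"), frozenset(Omega)}
# sigma({E, F}) is already larger: here it is the full power set, with 16 sets
assert len(generated_sigma_algebra(Omega, [E, F2])) == 16
```

Closure under complement and union also gives closure under intersection, by De Morgan's laws, which is why this loop suffices.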
Examples 1.10 (Important examples).
(i) Let Ω = R. We cannot take F := P(Ω), since that is badly behaved. So instead
we let
G = {(−∞, x] : x ∈ R}
and take F = σ(G). This σ-algebra is known as the Borel σ-algebra on R and is
denoted by B(R). It contains all subsets of R that are relevant for us and to which we
need to assign probabilities. For example, note
(a) (−∞, x) = ⋃_{n=1}^∞ (−∞, x − 1/n], so (−∞, x) ∈ B(R) by property (iii) of a
σ-algebra.
(b) We could take H := {(x,∞) : x ∈ R} or I = {(−∞, x) : x ∈ R} and get the
same Borel σ-algebra, i.e. B(R) = σ(G) = σ(H) = σ(I).
(ii) Infinite coin tossing: for experiment C∞, note that ΩC∞ = {H,T}^N has the same
cardinality as 2^N, which is uncountable. If Fn := {ω ∈ ΩC∞ | ωn = H} is the event of
getting a head on the n-th toss, then F = σ(Fn : n ∈ N) contains all of the events we
need (including the event Gp from Examples 1.2).
1.2 Probability Measures
Definition 1.11 (Kolmogorov’s axioms). Let Ω be a sample space and F a σ-algebra
on Ω. Then a map P : F → [0, 1] is called a probability measure if
(i) P(Ω) = 1;
(ii) σ-additivity: If (Fn : n ∈ N) is a sequence of disjoint events in F , i.e. Fn ∈ F ∀n
and Fi ∩ Fj = ∅ for all i ≠ j, then
P(⋃_{n∈N} Fn) = ∑_{n∈N} P(Fn).
Note a consequence of the axioms is that P(∅) = 0. We interpret P(F ) as the probability
of the event F occurring. Also, we call the triple (Ω,F ,P) a probability space (or a
probability triple).
These axioms lead to the natural question of whether probability spaces even exist. This
question is answered by the following construction of the fundamental model. Let Ω =
[0, 1], and let B([0, 1]), the Borel σ-algebra on [0, 1], be the σ-algebra σ(G) generated by
G := {[0, x] : x ∈ [0, 1]}.
Theorem 1.12 (Lebesgue, Borel). There exists a unique probability measure P on
([0, 1], B([0, 1])) such that P([0, x]) = x for all x ∈ [0, 1].
The fundamental model allows us to choose a single number in [0, 1] according to
the uniform distribution. It is quite an amazing fact that every experiment, no matter
how complicated, can be constructed out of this single experiment (as in the above
theorem). [For a hint at how this might work, see the problem sheets.]
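One flavour of this idea: the binary digits of a uniformly chosen number in [0, 1) behave like independent fair coin tosses, so a single draw from the fundamental model already encodes experiment C∞. The sketch below uses Python's `random.random` as a stand-in for the fundamental model; it is an illustration under that assumption, not the construction used in the course:

```python
import random

def coin_tosses_from_uniform(u, n):
    """Read off the first n binary digits of u in [0, 1) as coin tosses:
    digit 1 -> 'H', digit 0 -> 'T'."""
    tosses = []
    for _ in range(n):
        u *= 2                          # shift the binary expansion left
        digit, u = int(u), u - int(u)   # peel off the leading digit
        tosses.append("H" if digit == 1 else "T")
    return tosses

u = random.random()                     # one sample from the fundamental model
print(coin_tosses_from_uniform(u, 10))  # ten coin tosses from a single uniform draw
```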
In the following we collect some of the properties of a probability measure. Some of these
you have already seen in the first year probability course (and some will be proved on
the problem sheet).
Proposition 1.13 (Properties of P). Let (Ω,F ,P) be a probability triple, then
(i) If A ∈ F , then P(Ac) = 1− P(A);
(ii) If A,B ∈ F and A ⊆ B, then P(A) ≤ P(B);
(iii) If A1, . . . , An ∈ F are disjoint events, i.e. Ai ∩ Aj = ∅ for all i ≠ j, then
P(⋃_{i=1}^n Ai) = ∑_{i=1}^n P(Ai).
(iv) Boole’s inequality: For any A1, . . . , An ∈ F ,
P(⋃_{i=1}^n Ai) ≤ ∑_{i=1}^n P(Ai).
(v) For any A,B ∈ F , we have P(A ∪B) = P(A) + P(B)− P(A ∩B);
Note that the last statement also generalizes to three sets
P(A∪B ∪C) = P(A) +P(B) +P(C)−P(A∩B)−P(B ∩C)−P(A∩C) +P(A∩B ∩C).
Moreover, going one step further we obtain in full generality:
Proposition 1.14 (Inclusion-exclusion principle). For any F1, . . . , Fn ∈ F ,
P(⋃_{i=1}^n Fi) = ∑_i P(Fi) − ∑_{i<j} P(Fi ∩ Fj) + ∑_{i<j<k} P(Fi ∩ Fj ∩ Fk) − · · · + (−1)^{n−1} P(F1 ∩ · · · ∩ Fn)
= ∑_{r=1}^n (−1)^{r−1} ∑_{i1<···<ir} P(Fi1 ∩ · · · ∩ Fir).
Proof. We will give an elegant proof later on in Chapter 2.
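In the meantime, the formula can be sanity-checked on a finite sample space with the uniform measure. The following sketch (helper names `prob` and `inclusion_exclusion` are illustrative) computes both sides for three concrete events:

```python
from itertools import combinations

def prob(E, Omega):
    """Uniform probability of an event E on a finite sample space."""
    return len(E) / len(Omega)

def inclusion_exclusion(events, Omega):
    """Right-hand side of the inclusion-exclusion formula:
    sum over r of (-1)^(r-1) times the probabilities of r-fold intersections."""
    total = 0.0
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r - 1)
        for subset in combinations(events, r):
            total += sign * prob(frozenset.intersection(*subset), Omega)
    return total

Omega = frozenset(range(12))
F1, F2, F3 = frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5}), frozenset({3, 5, 7, 11})
lhs = prob(F1 | F2 | F3, Omega)   # P(F1 u F2 u F3) computed directly
assert abs(lhs - inclusion_exclusion([F1, F2, F3], Omega)) < 1e-12
```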
Proposition 1.15 (Continuity of P).
(a) Continuity from below: Consider events A1 ⊆ A2 ⊆ · · · and A := ⋃_{n∈N} An; then
P(An) ↑ P(A). That is, the sequence (P(An) : n ∈ N) is an increasing sequence with
limit P(A).
(b) Continuity from above: Suppose B1 ⊇ B2 ⊇ · · · and B := ⋂_{n∈N} Bn; then P(Bn) ↓ P(B).
Example 1.16. Consider the event An := {a population of animals is extinct by time n};
then An ⊆ An+1 and
A = ⋃_n An = {the population eventually becomes extinct}.
Hence, by the continuity of P, we have that
P(An) ↑ P(A).
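Continuity from below can also be seen exactly in the fundamental model: the events An = [0, 1 − 1/n] increase to [0, 1), and P(An) = 1 − 1/n increases to P([0, 1)) = 1. A small exact-arithmetic check of this (purely illustrative):

```python
from fractions import Fraction

# In the fundamental model, P([0, x]) = x, so the increasing events
# A_n = [0, 1 - 1/n] have P(A_n) = 1 - 1/n, which increases to 1 = P([0, 1)).
probs = [1 - Fraction(1, n) for n in range(1, 101)]
assert probs == sorted(probs)             # P(A_n) is an increasing sequence
assert 1 - probs[-1] == Fraction(1, 100)  # P(A_100) = 99/100, approaching 1
```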
2 Random Variables
When conducting a random experiment, we are often not only directly interested in the
outcome, but rather in various ‘observables’ that can be read off from the experiment.
E.g. when rolling a die several times, we might only be interested in the sum of all the
rolls. That is, we want to understand mappings from Ω to R, since we are typically
making a real-valued observation. This naturally leads to the concept of a random
variable. Indeed, working on the same probability space we might also be interested in
understanding different observables (e.g. also the difference of the first two rolls of the
die). Later on in the course, we will often ‘forget’ about the underlying probability
space and write our assumptions in terms of random variables only.
Definition 2.1. A random variable X on R is a mapping X : Ω→ R such that
{X ≤ x} := {ω ∈ Ω : X(ω) ≤ x} ∈ F ∀x ∈ R (1)
Remark 2.2. (i) Note that since B(R) = σ({(−∞, x] : x ∈ R}) and F is a σ-algebra,
then one can show that
X−1(A) := {X ∈ A} := {ω ∈ Ω |X(ω) ∈ A} ∈ F for any A ∈ B(R) (2)
We say that X is an F-measurable function from Ω to R.
(ii) Often we want random variables that can take values on R ∪ {+∞,−∞} or R+ ∪
{+∞}. The definition above extends simply by replacing R with these.
(iii) Note: everything that you are likely to meet in probability will be suitably mea-
surable! So you should be aware of these problems, but you don’t have to worry
too much about them.
Example 2.3. Consider Experiment Q from Example 1.1, where we model the arrival
times of customers in a queue. So if ω = (ω1, ω2, . . . ) ∈ ΩQ, and we are interested in
T1, the time of the first arrival at the queue, then we can formally define T1 : Ω → R
as T1(ω) = ω1. Another example might be the gap between the arrival of the first and
second customers: G1 : Ω→ R, is given by G1(ω) := ω2 − ω1.
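These observables can be written down directly as functions of an outcome ω; a minimal sketch with an illustrative outcome (the tuple below is a made-up element of ΩQ):

```python
def T1(omega):
    """Time of the first arrival: T1(omega) = omega_1."""
    return omega[0]

def G1(omega):
    """Gap between the first and second arrivals: G1(omega) = omega_2 - omega_1."""
    return omega[1] - omega[0]

omega = (1.5, 2.25, 4.0)   # an illustrative outcome: strictly increasing arrival times
assert T1(omega) == 1.5
assert G1(omega) == 0.75
```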
In order to describe a random variable, we introduce the concept of its distribution.
Definition 2.4. The distribution of a random variable X is the probability measure PX
defined on (R,B(R)) via
PX(A) = P(X ∈ A) for any A ∈ B(R).
The distribution function FX of a random variable X on (Ω,F ,P) is the map FX : R→
[0, 1] defined by
FX(x) := P(X ≤ x) = P({ω ∈ Ω : X(ω) ≤ x})
Notice that in order for these quantities to make sense, we are really using that X is a
random variable, so that (1) and (2) hold. As in Remark 2.2, one can show that the
distribution function completely describes the distribution of a random variable.
On the problem sheet, we will see that any distribution function F is an increasing, right-
continuous function such that limx→−∞ F (x) = 0 and limx→∞ F (x) = 1. Conversely,
one can show that any function F with these properties is the distribution function of a
real-valued random variable.
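For a concrete random variable these properties are easy to check by hand. The sketch below computes FX for the number of heads in two fair coin tosses (an illustrative example, not taken from the notes):

```python
from fractions import Fraction

# X = number of heads in two fair coin tosses, on Omega = {H,T}^2
Omega = [(a, b) for a in "HT" for b in "HT"]
X = lambda w: sum(1 for t in w if t == "H")

def F(x):
    """Distribution function F_X(x) = P(X <= x) under the uniform measure."""
    return Fraction(sum(1 for w in Omega if X(w) <= x), len(Omega))

assert F(-1) == 0                 # F(x) -> 0 as x -> -infinity
assert F(0) == Fraction(1, 4)     # P(X = 0) = 1/4
assert F(1) == Fraction(3, 4)     # jump of P(X = 1) = 1/2 at x = 1
assert F(2) == 1                  # F(x) -> 1 as x -> +infinity
assert all(F(a) <= F(b) for a, b in [(-1, 0), (0, 0.5), (0.5, 1), (1, 2)])  # increasing
```

Here FX is a step function, increasing and right-continuous, with jumps at the possible values of X; the jump sizes are exactly the probabilities P(X = x).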