ECON 2070: INTRODUCTION TO STRATEGIC THINKING
PROBABILITY REVIEW
METIN UYANIK
This is a review of some basic probability concepts used in this course.
1. Random Variables
A random variable, quite commonly denoted as X (or other capital letters), is a variable
that will eventually take exactly one of many possible values. When we find out which
one of the possible values it has taken, we say it is realized and the value taken is known
as the realization or outcome. It is important to distinguish between a random variable
and its realizations. For instance, suppose we are going to toss a coin. Before the coin
is tossed, the outcome of this coin toss is a random variable — it can take the values of
either “head” or “tail”. Once the coin is tossed (and we have looked at it), the outcome
of the coin toss is realized.
1.1. State Space. The set of all possible realizations associated with a random variable
is called the state space. Each of its elements, a possible realization, is sometimes known
as a state. Here are some examples of state spaces:
Example 1.1 (Coin Toss). Consider a coin toss. Then the state space is:
{Head,Tail} .
What if we toss two coins? The state space becomes:
{(Head,Head), (Head,Tail), (Tail,Head), (Tail,Tail)} .
where each state (or outcome) is a pair (coin 1 outcome, coin 2 outcome).
Example 1.2 (Dice Roll). Suppose we roll a (normal, six-faced) dice. Then the state space
is:
{1, 2, 3, 4, 5, 6} .
In both examples above, the state space is finite — that is, there are only finitely many
possible realizations. There are infinite state spaces as well, as in the next two examples.1
1For the maths geeks among you: The state spaces in the two examples are both uncountably infinite.
The first one is bounded and the second one is not. There are countably infinite state spaces as well
— such as the support of the Poisson distribution — but I don’t think we will use any of those in this
course.
Example 1.3. Suppose we draw a 1 metre line from point A to B. Next someone will
close his/her eyes and drop a needle on the line. The random variable we are interested in
is the length (in metres) from point A to the point where the needle touches the line. (For
simplicity assume that there is no smallest unit of length measurement — we can measure
any length exactly and perfectly.) Then the state space is the set of all real numbers between 0 and
1 (inclusive). In mathematical jargon, it is the closed interval between 0 and 1, denoted
as [0, 1].
Example 1.4. Suppose we are going to randomly pick a person in Australia and find
out his/her income. For simplicity assume that there is no smallest unit of money (i.e.,
there can be any real amount of money). Then the random variable is the income of the
person to be picked, and the state space is the set of all (weakly) positive real numbers.
1.2. Events. An event is a collection of possible realizations. Mathematically, it is a
subset2 of the state space. Take the dice example (Example 1.2). Let E1 be the event
that we “get an even number” and E2 be the event that we “get a number smaller than
or equal to 4”. Then
E1 = {2, 4, 6}
E2 = {1, 2, 3, 4} ,
both of which are subsets of the state space.
For any event E, the event “not E” is the set of all states not in E, typically denoted
as E^c. Mathematically, it is the complement of E relative to the state space. For any
two events E1 and E2, the event “E1 and E2” is the set of all states that are in both E1
and E2. Mathematically, it is the intersection of E1 and E2. The event “E1 or E2” is the
set of all states that are in either E1 or E2, or both. Mathematically, it is the union of
E1 and E2. How about the event “E1 but not E2” (i.e., the set of all states that are in
E1 but not in E2)? Notice that it is the intersection of E1 and E2^c, so it can already
be defined. As a shorthand, though, we can write it as E1 \ E2.
Using our dice example above,
not E1 = E1^c = {1, 3, 5}
not E2 = E2^c = {5, 6}
E1 and E2 = E1 ∩ E2 = {2, 4}
E1 or E2 = E1 ∪ E2 = {1, 2, 3, 4, 6}
E1 but not E2 = E1 \ E2 = {6}
E2 but not E1 = E2 \ E1 = {1, 3} .
2Again, if you are a maths geek: strictly speaking, the subset needs to be an element of the sigma algebra
which defines the probability space — but if you know what this footnote means, perhaps you should
take a more advanced course (e.g., ECON8030) rather than this one.
[Figure 1 here: five Venn diagrams showing (a) the complement E^c of E within the state space Ω, (b) the intersection E1 ∩ E2, (c) the union E1 ∪ E2, (d) the difference E1 \ E2, and (e) the difference E2 \ E1.]
Figure 1. Complement, Intersection, Union and Differences of Sets
See Figure 1 for a graphical (Venn Diagram) illustration.
Two events E1 and E2 are said to be mutually exclusive if they have no common
elements, that is, if “E1 and E2” is an empty set.
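The set operations above map directly onto Python's built-in set type. A minimal sketch (the states and events are those of the dice example; the code is illustrative, not part of the original notes):

```python
# Event operations for the dice example, using Python sets.
omega = {1, 2, 3, 4, 5, 6}      # state space of the dice roll
E1 = {2, 4, 6}                  # "get an even number"
E2 = {1, 2, 3, 4}               # "get a number smaller than or equal to 4"

print(omega - E1)   # complement (not E1): {1, 3, 5}
print(E1 & E2)      # intersection (E1 and E2): {2, 4}
print(E1 | E2)      # union (E1 or E2): {1, 2, 3, 4, 6}
print(E1 - E2)      # difference (E1 but not E2): {6}
print(E2 - E1)      # difference (E2 but not E1): {1, 3}

# Mutually exclusive means the intersection is empty:
print(E1.isdisjoint({1, 3, 5}))   # True: "even" and "odd" cannot both occur
```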
2. Probability
Each random variable is characterized by a probability distribution which assigns a
number to each of the possible events such that:
(1) The number assigned to each event is between 0 and 1 (inclusive);
(2) The number assigned to the whole state space is 1; and
(3) For any collection of events E1, E2, . . . where every pair of events in the collection
is mutually exclusive, the number assigned to the event “E1 or E2 or . . .” is the
sum of the numbers assigned to each of these individual events.
(These may sound complicated; we will see some examples in the next section.)
2.1. Probability Rules. The above properties of probability allow us to derive the
following rules:
Complement: For any event E,
Pr [Ec] = 1− Pr [E] .
This implies that if an event is an empty set, it must get zero probability — as
the empty set is the complement of the state space.
Union: For any pair of events E1 and E2,
Pr [E1 ∪ E2] = Pr [E1] + Pr [E2]− Pr [E1 ∩ E2] .
If E1 and E2 are mutually exclusive, then Pr [E1 and E2] = 0 (since the intersection
of E1 and E2 is empty, and an empty set must get zero probability, see above).
In this case, the last term of the above equation will drop out.
Set Difference: For any pair of events E1 and E2,
Pr [E1 \ E2] = Pr [E1]− Pr [E1 ∩ E2] .
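The three rules can be verified numerically. A minimal sketch, using the fair-dice distribution and exact arithmetic via `fractions.Fraction` (the helper `prob` is an illustrative name, not from the notes):

```python
# Checking the complement, union, and set-difference rules on a fair dice.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
p = {s: Fraction(1, 6) for s in omega}   # each state has probability 1/6

def prob(event):
    """Probability of an event = sum of the probabilities of its states."""
    return sum(p[s] for s in event)

E1 = {2, 4, 6}       # even number
E2 = {1, 2, 3, 4}    # at most 4

# Complement rule: Pr[E^c] = 1 - Pr[E]
assert prob(omega - E1) == 1 - prob(E1)
# Union rule: Pr[E1 or E2] = Pr[E1] + Pr[E2] - Pr[E1 and E2]
assert prob(E1 | E2) == prob(E1) + prob(E2) - prob(E1 & E2)
# Set-difference rule: Pr[E1 \ E2] = Pr[E1] - Pr[E1 and E2]
assert prob(E1 - E2) == prob(E1) - prob(E1 & E2)
print("all three rules hold")
```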
2.2. Conditional Probability. Sometimes (actually, quite often) we are interested in
the following question: What is the probability of an event E1 given that event E2 occurs?
This is known as conditional probability. The conditional probability of E1 given E2 is
given by
Pr [E1|E2] = Pr [E1 and E2] / Pr [E2] (1)
whenever Pr [E2] > 0. If Pr [E2] = 0, the conditional probability is undefined.
Two events E1 and E2 are independent if knowing the occurrence (or non-occurrence)
of one does not affect our probability evaluation of the other. Formally, E1 and E2 are
independent if and only if
Pr [E1 and E2] = Pr [E1] Pr [E2] . (2)
To see why this makes sense, rewrite Equation (1) to get
Pr [E1 and E2] = Pr [E1|E2] Pr [E2] ;
and if we switch the names of E1 and E2 in the above equations we will also get
Pr [E1 and E2] = Pr [E2|E1] Pr [E1] .
Thus Equation (2) essentially means
Pr [E1|E2] = Pr [E1]
Pr [E2|E1] = Pr [E2] .
This is what we said before — the probability of E1 (respectively E2) does not
change whether I condition on E2 (respectively, E1) or not.
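A minimal numerical sketch of Equation (1) and the independence test in Equation (2), again with the two dice events (the even-number event and the at-most-4 event happen to be independent for a fair dice):

```python
# Conditional probability and independence for the fair-dice events.
from fractions import Fraction

p = {s: Fraction(1, 6) for s in range(1, 7)}
prob = lambda E: sum(p[s] for s in E)

E1 = {2, 4, 6}       # even number
E2 = {1, 2, 3, 4}    # at most 4

# Equation (1): Pr[E1 | E2] = Pr[E1 and E2] / Pr[E2]
cond = prob(E1 & E2) / prob(E2)
print(cond)   # Fraction(1, 2): knowing "at most 4" leaves the chance of "even" at 1/2

# Equation (2): independent iff Pr[E1 and E2] == Pr[E1] * Pr[E2]
print(prob(E1 & E2) == prob(E1) * prob(E2))   # True
```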
3. Probability Distribution Functions
3.1. Finite State Space. The above may sound complicated, but for a random variable
with a finite state space (with n states), a probability distribution is really just a list of
n numbers (p1, p2, . . . , pn) such that
0 ≤ pi ≤ 1 for each i = 1, . . . , n, and
∑_{i=1}^{n} pi = 1;
and that the probability of any event is the sum of the probabilities of all the states
within that event.
For example, if X is a fair coin toss, then
Pr[X = x] =
0.5 if x = head
0.5 if x = tail .
Needless to say, the probabilities on different states need not be the same. For instance,
if the coin is loaded so that it comes up with head 4 times more often than tail, we would
have
Pr[X = x] =
0.8 if x = head
0.2 if x = tail .
When there are more than two states (but still finitely many of them), we will just
have a longer list. In the next example, X is the random variable associated with a
peculiarly loaded dice (I don’t know if it’s physically possible to load a dice to get these
probabilities):
Pr[X = x] =
0.05 if x = 1
0.1 if x = 2
0.15 if x = 3
0.2 if x = 4
0.1 if x = 5
0.4 if x = 6
.
If the state space is a subset of the real numbers, we can also generate a cumulative
distribution function (cdf) for the random variable, defined by
F (x) = Pr[X ≤ x].
In the peculiarly loaded dice example above, the cdf of X is
F (x) = Pr[X ≤ x] =
0 if x < 1
0.05 if 1 ≤ x < 2
0.15 if 2 ≤ x < 3
0.3 if 3 ≤ x < 4
0.5 if 4 ≤ x < 5
0.6 if 5 ≤ x < 6
1 if 6 ≤ x
.
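The cdf of a finite-state random variable is just a cumulative sum over the probability list. A minimal sketch for the peculiarly loaded dice (the dictionary `pmf` and function name `F` are illustrative choices, not from the notes):

```python
# Building the cdf of the peculiarly loaded dice from its probability list.
pmf = {1: 0.05, 2: 0.1, 3: 0.15, 4: 0.2, 5: 0.1, 6: 0.4}

def F(x):
    """F(x) = Pr[X <= x]: sum the probabilities of all states at or below x."""
    return sum(p for s, p in pmf.items() if s <= x)

print(round(F(3.5), 2))   # 0.3 -- every x in [3, 4) gives the same value
print(round(F(6), 2))     # 1.0
print(F(0.5))             # 0 -- no mass below the smallest state
```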
3.2. Infinite State Space. There are different kinds of infinite state spaces, but in this
course you will encounter only one kind — state spaces that are intervals
of the real number line. So from now on I am only going to talk about this kind of state
space.
The issue here is that, if I ask you to randomly pick a number on an interval of the real
number line (and you pick “smoothly” so that you don’t concentrate your picking at any
particular point), the chance that you pick exactly 3.14159265358979 . . . (or any other
number) is essentially zero. So we cannot speak of a probability mass function. The way
to get around this is to talk about the probability of picking something within certain
segments of the line. Next, notice that for any real number x, saying “X ≤ x” is the
same as saying that we pick something within the segment (−∞, x]. Since we can talk
about the probability of ending up within this segment, we can still define the cumulative
distribution function (cdf):
F (x) = Pr[X ≤ x].
I will use the Uniform distribution (on [0, 1]) as an example — it is a very special case,
but suffices for illustration. Think of randomly picking a number between 0 and 1 in a
“fair” way (i.e., no particular segment of the line is favoured). The resulting distribution
is known as the Uniform Distribution on [0, 1]. In this case, the probability of ending up
in any segment is the length of that segment. Hence the cdf of X (the randomly picked
number) is given by
F (x) = Pr[X ≤ x] =
0 if x < 0
x if 0 ≤ x ≤ 1
1 if x > 1 .
This is because the probability of ending up below a certain number x (between 0 and
1) is the length of the segment from 0 to x.
Once I have a cdf, I can also find out the probability of ending up in the segment
between a and b (a < b) since this amounts to the event “X ≤ b but not X ≤ a”. Now I
can apply the “E1 but not E2” rule above and get
Pr [a < X ≤ b] = Pr [X ≤ b]− Pr [X ≤ a] = F (b)− F (a).
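The F(b) − F(a) rule can be sketched in a few lines for the Uniform distribution on [0, 1] (the function names here are illustrative):

```python
# Pr[a < X <= b] = F(b) - F(a) for the Uniform distribution on [0, 1].
def F(x):
    """cdf of the Uniform distribution on [0, 1]."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x

def prob_segment(a, b):
    """Pr[a < X <= b], assuming a < b."""
    return F(b) - F(a)

print(prob_segment(0.25, 0.75))   # 0.5 -- the length of the segment
print(prob_segment(-2.0, 0.5))    # 0.5 -- there is no mass below 0
```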
If I pick a and b to be very close to each other, then I am almost getting the “prob-
ability” of ending up at a particular point. However, I cannot really squeeze a and b to
be the same point, for then I will get a probability of 0. Still, I can ask the following
question: suppose I start at point a and move slightly to the right (to get my b), how
fast will the cdf be changing? The answer will tell us how dense the “probability” is con-
centrated around a. Now if you have paid attention in the mathematics review tutorial
you may say, “Aha! Haven’t we learnt the mathematical way of saying ‘how fast is a
function changing when I move a little bit around a point’? It is called differentiation!”
Yes indeed. We are looking at
lim_{ε→0} [F (a + ε) − F (a)] / ε ,
which is by definition the derivative of F at a (if the limit exists — and for everything
you will see in this course it will). This derivative is known as the probability density
function (pdf). It is typically denoted as f (if the cdf is F ). In the uniform distribution
example above, the pdf is
f(x) =
1 if 0 ≤ x ≤ 1
0 otherwise .
In many cases, the pdf acts “as if” it is a probability function. However, you should
bear in mind that it is a density function. That is, f(x) is NOT the probability of X = x!
An easy way to see this is to realize that f(x) need not be between 0 and 1. For example,
if we have a uniform distribution over [0, 1/2], f(x) = 2 for every x between 0 and 1/2.
There is no way that it can be a probability.
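One way to see the density/probability distinction numerically: the density on [0, 1/2] is 2 everywhere, yet it still integrates to 1. A minimal sketch using a midpoint Riemann sum (the grid size is an arbitrary illustrative choice):

```python
# The pdf of the Uniform distribution on [0, 1/2] exceeds 1,
# but its integral over the state space is still 1.
def f(x):
    return 2.0 if 0 <= x <= 0.5 else 0.0   # density, NOT a probability

n = 100_000
width = 1.0 / n
# Midpoint rule over [0, 1]: sum f(midpoint) * width for each small slice.
total = sum(f((i + 0.5) * width) * width for i in range(n))
print(round(total, 6))   # 1.0
```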
4. Expectation
The expectation of a (real-valued) random variable is its weighted average, weighted
by the probabilities on the outcomes.
Finite State Space: Suppose X is a random variable with n possible realizations,
x1, x2, . . . , xn. And suppose X takes realization x1 with probability p1, x2 with
probability p2, . . . , and xn with probability pn. Then the expectation of X is
given by
E [X] = x1p1 + x2p2 + · · · + xnpn = ∑_{i=1}^{n} xi pi.
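The finite-state formula is a one-line weighted sum. A minimal sketch, reusing the peculiarly loaded dice from Section 3:

```python
# Expectation of the peculiarly loaded dice: E[X] = sum of x * p over all states.
pmf = {1: 0.05, 2: 0.1, 3: 0.15, 4: 0.2, 5: 0.1, 6: 0.4}

expectation = sum(x * p for x, p in pmf.items())
print(round(expectation, 2))   # 4.4 -- loading toward 6 pulls the mean above the fair dice's 3.5
```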
Continuous State Space: Suppose X is a random variable whose state space is
an interval of real number from a to b (where a may be −∞ and b may be ∞).
Let f be the pdf of X. Then the expectation of X is given by
E [X] =
∫ b
a
xf(x)dx.
Sometimes we would like to look for the expectation of a random variable conditional on
a certain event. The event that we are conditioning on may have given us more informa-
tion about the realization of the random variable, and therefore changes its expectation.
In such cases, we would like to look for the conditional expectation of a random variable.
Finite State Space: Suppose X is a random variable with n possible realizations,
x1, x2, . . . , xn, and that X takes realization x1 with probability p1, x2 with prob-
ability p2, . . . , and xn with probability pn. Let E1 be an event in the state
space such that Pr [E1] > 0. Without loss we can label the realizations such that
x1, . . . , xm (where m ≤ n) are in E1 and the rest are not. Then the conditional
expectation of X given E1 is:
E [X|E1] = (x1p1 + x2p2 + · · · + xmpm) / (p1 + p2 + · · · + pm).
Alternatively, if you have already figured out the conditional probabilities, you
can also use the following formula:
E [X|E1] = x1 Pr [x1|E1] + x2 Pr [x2|E1] + · · · + xn Pr [xn|E1] = ∑_{i=1}^{n} xi Pr [xi|E1].
You may ask why I sum to n instead of m — it really doesn’t matter as the
conditional probability of those states not in E1 is zero.
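Both finite-state formulas can be checked against each other. A minimal sketch for the loaded dice, conditioning on the event "even number" (variable names are illustrative):

```python
# Conditional expectation of the loaded dice given E1 = "even number",
# computed two ways (they must agree).
pmf = {1: 0.05, 2: 0.1, 3: 0.15, 4: 0.2, 5: 0.1, 6: 0.4}
E1 = {2, 4, 6}

# First formula: restrict to states in E1 and renormalize by Pr[E1].
num = sum(x * p for x, p in pmf.items() if x in E1)
den = sum(p for x, p in pmf.items() if x in E1)   # Pr[E1] = 0.7
cond_exp = num / den

# Second formula: sum over ALL states using conditional probabilities
# (states outside E1 get conditional probability zero).
cond_exp2 = sum(x * (pmf[x] / den if x in E1 else 0.0) for x in pmf)

print(round(cond_exp, 4))   # 4.8571 -- conditioning on "even" raises the mean above 4.4
```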
Continuous State Space: Suppose X is a random variable whose state space is
an interval of the real number line. Let f be the pdf of X. Let E1 be an event
in the state space such that Pr [E1] > 0. Then the conditional expectation of X
given E1 is:
E [X|E1] = ∫_{x∈E1} x · (f(x) / Pr [E1]) dx.
The function f(x)/Pr [E1] is known as the conditional pdf (of X conditional on
E1). Note that the denominator (Pr [E1]) is independent of x and can therefore
be pulled out of the integration operator.
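The continuous formula can be approximated with a Riemann sum. A minimal sketch for X Uniform on [0, 1] conditional on E1 = {X ≤ 1/2}; analytically E[X | X ≤ 1/2] = 1/4 (the grid size and variable names are illustrative choices):

```python
# Continuous conditional expectation via a midpoint Riemann sum:
# X is Uniform on [0, 1]; condition on E1 = [0, 1/2].
def f(x):
    return 1.0 if 0 <= x <= 1 else 0.0   # uniform pdf on [0, 1]

n = 100_000
width = 0.5 / n                          # slice width over E1 = [0, 1/2]
mids = [(i + 0.5) * width for i in range(n)]

pr_E1 = sum(f(x) * width for x in mids)                   # Pr[E1] = 1/2
cond_exp = sum(x * (f(x) / pr_E1) * width for x in mids)  # uses the conditional pdf f(x)/Pr[E1]

print(round(cond_exp, 4))   # 0.25
```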