ECMT2130 discrete random variables
ECMT2130 – Tutorial 3
Conceptual Questions
1) What does it mean for two events to be independent?
It means that the occurrence of one event contains no information about the other. Intuitively, if we are told
something about event A, this does not help us understand anything about event B. In terms of
probability, if two events are independent then P(A|B) = P(A), or equivalently P(A and B) = P(A)P(B), so the likelihood of observing
A isn’t affected by B. Recall that if two random variables are independent then their correlation is zero,
but zero correlation does not necessarily imply independence.
2) Provide two examples of discrete random variables and continuous ones.
Discrete: (1) number of times the daily price of a bond will fall in the next month; and (2) the
number of times a football player received a yellow card in a tournament.
Continuous: (1) the exchange rate between the British Pound (GBP) and the Australian Dollar
(AUD); and (2) the average amount of rain that falls in the Blue Mountains every month.
3) What is the difference between a probability mass function (PMF) and a probability
density function (PDF)? What is the difference between a PDF and a CDF?
The PMF applies to discrete distributions and gives the probability of each possible value, $\Pr(X = x)$.
The PDF applies to continuous distributions. It is not itself a probability, but integrating it over a
range of values gives the probability of that range (the probability of any single value is zero, so
that case is not interesting). The CDF gives the probability that the random variable is at or below a
point of interest, $\Pr(X \le x)$, accumulated from minus infinity (or from the minimum value the
random variable can take). For a continuous distribution the CDF is the integral of the PDF, so many
questions about the probability of a range can be answered with either one, depending on the
information you have.
You can find the mathematical definitions in the lecture slides.
4) Would one be able to find a derivative for the CDF of a discrete distribution?
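No, not in the usual sense. The CDF of a discrete distribution is a step function: it jumps at each
possible value of the random variable, so it is not differentiable at those points, and it is flat (with a
derivative of zero) everywhere else.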
5) Assuming share prices are a continuous variable, what is the probability that the Woodside
share price will be exactly $40 next month?
Zero. Recall that the probability of a particular value of a continuous random variable is zero.
Mathematically, for any value $a$:
$$\Pr(X = a) = \Pr(a \le X \le a) = \int_a^a f(x)\,dx = 0$$
So, if the distribution is continuous, we can write $\Pr(X < a) = \Pr(X \le a)$.
6) Define expected value, both mathematically and in simple words.
The expected value of a random variable (say, X) is the average of all possible values of X
weighted by the probabilities of observing each of these values. It is the first moment of the
distribution, which is a measure of central tendency of a distribution.
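Mathematically:
$$E(X) = \begin{cases} \sum_{i=1}^{n} x_i \Pr(X = x_i), & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x\, f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}$$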
7) What do we mean when we say a normal distribution is fully characterised by its mean
and variance?
We mean that we can fully “draw” the distribution if we know the exact values of the mean
and variance of a random variable. Notation-wise, if a random variable R follows a normal
distribution, we write $R \sim N(\mu, \sigma^2)$. This implicitly indicates we need $\mu$ and $\sigma^2$ to determine
the exact distribution of the random variable R.
8) What does the kurtosis measure?
The kurtosis measures the “fatness” of the tails of a distribution. Intuitively, it measures the
chance of extreme events happening in relation to the mean of the distribution. You can find
the formula for the kurtosis in the lecture slides.
9) What is the skewness and kurtosis of a normal distribution?
The skewness of a normal distribution is zero (indicating perfect symmetry around the
mean/median/mode), while its kurtosis is 3.
10) What is the difference between error term and residuals in regressions?
The error term ($u$) is a population concept, while the residual ($\hat{u}$) is an estimate of the
error term. Thus, the residual is a sample concept.
Recall that because we don’t have access to the population, the error term is unobservable. But
we can get an idea of how it behaves (if the OLS assumptions are satisfied, of course) by
observing the residuals.
11) What are the 5 essential assumptions for the OLS estimator to produce high-quality
estimates in large samples?
The assumptions are: (1) linearity in parameters, (2) random sampling, (3) no perfect
collinearity, (4) zero conditional mean and (5) homoskedasticity.
12) What does it mean for an estimator to be unbiased?
An estimator is unbiased if its expected value is equal to the true (unobservable) value we are
trying to estimate. Formally, E(estimator) = true value. In the context of regressions with the
OLS estimator: $E(\hat{\beta}) = \beta$.
There are many intuitive ways of understanding unbiasedness. For example, consider the case
of a simple linear regression between wage (dependent variable) and education (independent
variable). For every sample we have access to, we will obtain a different estimate (but all
produced with the same estimator – the OLS). If we obtain infinitely-many estimates, and take
the average of all of them, then we would observe the true/population relationship between
wage and education, if the estimator is unbiased. If the estimator is biased, it is likely that
the model is “missing” something (in this example, a likely culprit is the number of years of
experience a person has, which also affects their wage and is at least partially related to
educational attainment).
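The "average over many samples" intuition can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population intercept (10), slope (2), the range of education and the error spread are hypothetical numbers chosen for illustration, not values from the tutorial.

```python
import random

# Hypothetical population relationship: wage = 10 + 2 * education + error.
TRUE_INTERCEPT, TRUE_SLOPE = 10.0, 2.0


def ols_slope(x, y):
    """OLS slope of a simple regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def one_sample_slope(n, rng):
    """Draw one random sample of size n and return its slope estimate."""
    educ = [rng.uniform(8, 20) for _ in range(n)]
    wage = [TRUE_INTERCEPT + TRUE_SLOPE * e + rng.gauss(0, 5) for e in educ]
    return ols_slope(educ, wage)


rng = random.Random(42)  # fixed seed so the run is reproducible
slopes = [one_sample_slope(30, rng) for _ in range(2000)]
average_slope = sum(slopes) / len(slopes)

print(f"smallest estimate: {min(slopes):.2f}, largest: {max(slopes):.2f}")
print(f"average of all estimates: {average_slope:.3f} (true slope {TRUE_SLOPE})")
```

Each individual estimate differs from the true slope, but their average sits very close to it, which is what unbiasedness says.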
13) What does it mean for an estimator to be consistent?
An estimator is consistent if, as the sample becomes larger and larger (i.e., as we add more
observations to our existing sample), the estimates ($\hat{\beta}$) get closer and closer to the true value
($\beta$). Intuitively, a consistent estimator becomes better at guessing what the true value is the
more information we feed into it. Formally, for any $\varepsilon > 0$:
$$\lim_{n \to \infty} \Pr(|\hat{\beta} - \beta| > \varepsilon) = 0$$
The expression above states that as the number of observations ($n$) gets larger and larger, the
probability that the estimate differs from the true value by more than a very, very small
number ($\varepsilon$) gets closer to zero. The closer you want $\hat{\beta}$ to be to $\beta$ (i.e., the smaller $\varepsilon$), the more
observations you need.
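A quick simulation can make the definition concrete. The sketch below uses the sample mean of fair coin tosses as the estimator (true value 0.5) and a tolerance of 0.05. Both choices, along with the sample sizes and the number of replications, are illustrative assumptions rather than part of the tutorial.

```python
import random
from fractions import Fraction

TRUE_VALUE = Fraction(1, 2)  # probability of Heads for a fair coin
EPSILON = Fraction(1, 20)    # tolerance of 0.05, kept exact to avoid rounding at the boundary


def share_far_from_truth(n, replications, rng):
    """Share of samples of size n whose mean misses the truth by more than EPSILON."""
    far = 0
    for _ in range(replications):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if abs(Fraction(heads, n) - TRUE_VALUE) > EPSILON:
            far += 1
    return far / replications


rng = random.Random(0)  # fixed seed so the run is reproducible
shares = {n: share_far_from_truth(n, 500, rng) for n in (10, 100, 1000)}

for n, share in shares.items():
    print(f"n = {n:>4}: share of estimates further than 0.05 from 0.5 = {share:.3f}")
```

The estimated probability of a "large" error shrinks towards zero as n grows, which is the behaviour the limit describes.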
Numerical and Computation Questions
1) Consider a random variable Y equal to the sum of the outcomes of tossing three fair coins:
$$Y = X_1 + X_2 + X_3$$
where $X_1$ is the result of tossing the first coin, $X_2$ the second and $X_3$ the third. Assume the
outcome of each toss is coded as follows:
$$X_i = \begin{cases} 1, & \text{if Heads} \\ 0, & \text{if Tails} \end{cases} \qquad i = 1, 2, 3.$$
For example, if the second throw results in Heads, then $X_2 = 1$.
a) What possible combinations of values can $(X_1, X_2, X_3)$ assume in this experiment?
The sample space is S = {(H, H, H), (H, H, T), (H, T, H), (H, T, T), (T, H, H), (T, H, T), (T, T,
H), (T, T, T)}. Therefore, the combination of values is {(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0),
(0, 1, 1), (0, 1, 0), (0, 0, 1), (0, 0, 0)}.
The possible values of Y are therefore {0, 1, 2, 3}.
b) Find the distribution function of Y. This involves showing all the possible values for Y and
the chance of observing each value.
As calculated in (a), the possible values for Y are {0, 1, 2, 3}. The probabilities associated
with each of them are 1/8, 3/8, 3/8 and 1/8, respectively.
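The probabilities can be checked by brute-force enumeration. The sketch below lists the eight equally likely outcomes (coding Heads as 1 and Tails as 0, as above) and counts how many give each value of Y. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 outcomes of three tosses, with 1 = Heads and 0 = Tails.
outcomes = list(product([1, 0], repeat=3))

# Number of outcomes that produce each value of Y = X1 + X2 + X3.
counts = Counter(sum(outcome) for outcome in outcomes)

# Fair, independent coins: every outcome has probability 1/8.
pmf = {y: Fraction(counts[y], len(outcomes)) for y in sorted(counts)}

for y, p in pmf.items():
    print(f"Pr(Y = {y}) = {p}")
```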
c) Draw the probability mass function (PMF) and the cumulative distribution function (CDF) of the
random variable Y.
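As a starting point for the drawing, the heights of the PMF bars and the levels of the CDF steps can be tabulated. This is a minimal sketch that takes the probabilities 1/8, 3/8, 3/8 and 1/8 from part (b). The helper `cdf_at` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Running total of the PMF: the level of the CDF at each possible value of Y.
cdf = list(accumulate(pmf))


def cdf_at(x):
    """Pr(Y <= x): add up the PMF over all possible values not exceeding x."""
    return sum((p for v, p in zip(values, pmf) if v <= x), Fraction(0))


for v, p, c in zip(values, pmf, cdf):
    print(f"y = {v}: PMF = {p}, CDF = {c}")
```

The PMF is drawn as four bars at 0, 1, 2 and 3. The CDF is a step function that is 0 below y = 0, jumps at each possible value and stays flat in between, so for example `cdf_at(1.5)` equals the CDF level at y = 1.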
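d) Find the expected value of $Y$ and $Y^2$.
The expected values are given by:
$$E(Y) = \sum_{y} \Pr(Y = y) \cdot y = \frac{1}{8} \cdot 0 + \frac{3}{8} \cdot 1 + \frac{3}{8} \cdot 2 + \frac{1}{8} \cdot 3 = 1.5$$
$$E(Y^2) = \sum_{y} \Pr(Y = y) \cdot y^2 = \frac{1}{8} \cdot 0^2 + \frac{3}{8} \cdot 1^2 + \frac{3}{8} \cdot 2^2 + \frac{1}{8} \cdot 3^2 = 3.0$$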
e) Find the variance and standard deviation of Y.
The variance is given by:
$$\sigma^2 = \mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = 3 - 1.5^2 = 0.75$$
The standard deviation is given by:
$$\sigma = SD(Y) = \sqrt{\mathrm{Var}(Y)} = \sqrt{0.75} \approx 0.87$$
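The calculations in (d) and (e) can be reproduced in a few lines. The sketch below applies the weighted-sum definition of the expected value to the PMF found in part (b). The helper `expectation` is a hypothetical name introduced here.

```python
from fractions import Fraction
from math import sqrt

# PMF of Y from part (b).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def expectation(g, pmf):
    """E[g(Y)]: each g(y) weighted by the probability of observing y."""
    return sum(p * g(y) for y, p in pmf.items())


mean = expectation(lambda y: y, pmf)              # E(Y)
second_moment = expectation(lambda y: y**2, pmf)  # E(Y^2)
variance = second_moment - mean**2                # Var(Y) = E(Y^2) - [E(Y)]^2
std_dev = sqrt(variance)

print(f"E(Y) = {mean}, E(Y^2) = {second_moment}")
print(f"Var(Y) = {variance}, SD(Y) = {std_dev:.4f}")
```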
f) Find the skewness of Y. Hint: if you wish to calculate the skewness, use the fact that
$E[g(Y)] = \sum_{y} \Pr(Y = y) \cdot g(y)$ applied to $g(Y) = [Y - E(Y)]^3$. Then, use the formula for
the skewness from the lecture slides.
Does this skewness level make sense when analysing the PMF of Y?
We don’t really need to calculate the skewness because the distribution is perfectly
symmetric around its mean, so the skewness is zero. Consistent with this symmetry, the median
($\frac{1+2}{2} = 1.5$) is the same as the mean.
But we can use the formula for the skewness if you prefer the long way:
$$\mathrm{Skew}(Y) = \frac{E[(Y - E(Y))^3]}{\sigma^3}$$
Using the hint, we can calculate $E[(Y - E(Y))^3]$ by using $g(Y) = (Y - E(Y))^3$, so:
$$E[(Y - E(Y))^3] = \sum_{y} \Pr(Y = y) \cdot (y - E(Y))^3 = 0$$
Because $\sigma > 0$, it follows that $\mathrm{Skew}(Y) = 0$.
g) Find the kurtosis of Y. Compare it to the kurtosis of a normal distribution.
The kurtosis is given by:
$$\mathrm{Kurt}(Y) = \frac{E[(Y - E(Y))^4]}{\sigma^4}$$
Using the hint, we can calculate $E[(Y - E(Y))^4]$ by using $g(Y) = (Y - E(Y))^4$, so:
$$E[(Y - E(Y))^4] = \sum_{y} \Pr(Y = y) \cdot (y - E(Y))^4 \approx 1.31$$
We can obtain the denominator using the square of the variance:
$$\sigma^4 = (\sigma^2)^2 = 0.75^2 = 0.5625$$
Therefore:
$$\mathrm{Kurt}(Y) = \frac{1.31}{0.5625} \approx 2.33$$
The kurtosis is smaller than that of a normal distribution (3).
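Both the skewness and the kurtosis can be verified numerically. The sketch below computes the central moments directly from the PMF of Y, using exact fractions so that no rounding enters the calculation. The function name `central_moment` is illustrative.

```python
from fractions import Fraction

# PMF of Y from part (b).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mean = sum(p * y for y, p in pmf.items())


def central_moment(k):
    """E[(Y - E(Y))^k], computed as a probability-weighted sum."""
    return sum(p * (y - mean) ** k for y, p in pmf.items())


variance = central_moment(2)
# Skewness divides by sigma^3, which needs a square root, so convert to float here.
skewness = float(central_moment(3)) / float(variance) ** 1.5
# Kurtosis divides by sigma^4 = (variance)^2, which stays an exact fraction.
kurtosis = central_moment(4) / variance**2

print(f"third central moment  = {central_moment(3)}")
print(f"fourth central moment = {central_moment(4)}")
print(f"skewness = {skewness}, kurtosis = {kurtosis} (about {float(kurtosis):.2f})")
```

The exact fourth central moment is 21/16 = 1.3125, which the solution rounds to 1.31.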
2) Consider three return series with expected values and variances as in the table below:
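a) Calculate $E(R_A + R_B + R_C)$ and $\mathrm{Var}(R_A + R_B + R_C)$. Assume the returns are independent
of one another.
$$E(R_A + R_B + R_C) = E(R_A) + E(R_B) + E(R_C) = 9.0\%$$
$$\mathrm{Var}(R_A + R_B + R_C) = \mathrm{Var}(R_A) + \mathrm{Var}(R_B) + \mathrm{Var}(R_C) = (5\%)^2 + (10\%)^2 + (13\%)^2 = 2.94\%$$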
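b) Consider a portfolio with a 20% weight on company A, a 45% weight on company B and the
remaining 35% on company C. Assuming independence across the returns, calculate the mean and
standard deviation of the portfolio.
$$E(R_P) = E\left(\sum_{i=A}^{C} w_i R_i\right) = w_A E(R_A) + w_B E(R_B) + w_C E(R_C) = 3.15\%$$
$$\mathrm{Var}(R_P) = \mathrm{Var}\left(\sum_{i=A}^{C} w_i R_i\right) = w_A^2 \mathrm{Var}(R_A) + w_B^2 \mathrm{Var}(R_B) + w_C^2 \mathrm{Var}(R_C) \approx 0.42\%$$
$$SD(R_P) = \sqrt{\mathrm{Var}(R_P)} \approx 6.48\%$$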
c) How does your answer change if the weights were 30% for company A, 25% for company
B and the remaining 45% for company C? Which portfolio would you prefer to buy and why?
$$E(R_P) = E\left(\sum_{i=A}^{C} w_i R_i\right) = w_A E(R_A) + w_B E(R_B) + w_C E(R_C) = 3.15\%$$
$$SD(R_P) = \sqrt{\mathrm{Var}(R_P)} \approx 6.54\%$$
Strictly speaking, without knowing the risk preferences (utility function) of the specific
individual making the choice, we cannot tell which one is best (perhaps the individual likes
risk, so they would prefer the second portfolio). But if the individual is risk-averse then the
first portfolio is better because it offers the same (expected) reward for a lower risk (measured
by the standard deviation). A risk-neutral individual cares only about the expected return and
would be indifferent between the two.
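The two portfolios can be compared with a short calculation. Note the assumptions: the table of inputs is not reproduced in this text, so the standard deviations (5%, 10%, 13%) are taken from the variance sum in part (a), and the expected returns (2%, 3%, 4%) are inferred as the values consistent with the reported 9.0% and 3.15% answers. Treat both as assumptions, not as the original table.

```python
from math import sqrt

# ASSUMPTION: inputs inferred from the worked answers, not read from the table.
means = {"A": 0.02, "B": 0.03, "C": 0.04}
sds = {"A": 0.05, "B": 0.10, "C": 0.13}


def portfolio_mean(weights):
    """Expected portfolio return: weighted sum of the expected returns."""
    return sum(weights[k] * means[k] for k in weights)


def portfolio_sd_independent(weights):
    """Portfolio SD when returns are independent (all covariances are zero)."""
    variance = sum(weights[k] ** 2 * sds[k] ** 2 for k in weights)
    return sqrt(variance)


portfolio_b = {"A": 0.20, "B": 0.45, "C": 0.35}
portfolio_c = {"A": 0.30, "B": 0.25, "C": 0.45}

for name, w in (("b", portfolio_b), ("c", portfolio_c)):
    print(f"portfolio ({name}): mean = {portfolio_mean(w):.2%}, "
          f"SD = {portfolio_sd_independent(w):.2%}")
```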
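d) Assume the same portfolio as in (c) but now with $\mathrm{Cov}(R_A, R_B) = 0.03$, $\mathrm{Cov}(R_A, R_C) = 0.04$
and $\mathrm{Cov}(R_B, R_C) = -0.01$. Which portfolio would you prefer now (b, c or d)?
The expected return is not affected by the covariances, so it remains 3.15%; only the variance changes.
$$\mathrm{Var}(R_P) = \mathrm{Var}\left(\sum_{i=A}^{C} w_i R_i\right) = w_A^2 \mathrm{Var}(R_A) + w_B^2 \mathrm{Var}(R_B) + w_C^2 \mathrm{Var}(R_C) + 2 w_A w_B \mathrm{Cov}(R_A, R_B) + 2 w_A w_C \mathrm{Cov}(R_A, R_C) + 2 w_B w_C \mathrm{Cov}(R_B, R_C)$$
$$SD(R_P) \approx 13.16\%$$
Portfolio (d) offers the same expected return as (b) and (c) with a much higher standard deviation, so a
risk-averse individual would still prefer portfolio (b).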
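The variance formula with covariances can be evaluated as follows. As assumptions, the standard deviations (5%, 10%, 13%) come from part (a), and the three covariances are paired as (A, B), (A, C) and (B, C) in the order they are listed in the question. That pairing reproduces the reported 13.16%.

```python
from math import sqrt

# ASSUMPTION: variances taken from part (a); covariance pairing as described above.
variances = {"A": 0.05**2, "B": 0.10**2, "C": 0.13**2}
covariances = {("A", "B"): 0.03, ("A", "C"): 0.04, ("B", "C"): -0.01}
weights = {"A": 0.30, "B": 0.25, "C": 0.45}


def portfolio_variance(weights, variances, covariances):
    """Sum of w_i^2 Var(R_i) plus 2 w_i w_j Cov(R_i, R_j) over each distinct pair."""
    own_terms = sum(weights[k] ** 2 * variances[k] for k in weights)
    cross_terms = sum(
        2 * weights[i] * weights[j] * cov for (i, j), cov in covariances.items()
    )
    return own_terms + cross_terms


variance_d = portfolio_variance(weights, variances, covariances)
sd_d = sqrt(variance_d)

print(f"Var(R_P) = {variance_d:.6f}, SD(R_P) = {sd_d:.2%}")
```

Setting every covariance to zero in `covariances` recovers the independent case from part (c).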
3) A marketing manager of a leading firm believes that total sales for the firm next year can
be modelled by using a normal distribution with a mean of $2.5 million and a standard
deviation of $300,000. You may want to use Excel to calculate the probabilities.
a) What is the probability that the firm’s sales will exceed $3 million?
Let X be the level of sales in millions of dollars. We know that $X \sim N(2.5, 0.3^2)$.
$$\Pr(X > 3) = 1 - \Pr(X \le 3) = 4.78\%$$
This can be easily obtained in Excel by using the function “=1-NORM.DIST(3,2.5,0.3,1)”
(without the quotes). With its last argument set to 1 (TRUE), NORM.DIST returns the CDF of the
normal distribution evaluated at a specific point.
b) What is the probability that the firm’s sales will fall within $150,000 of the expected level
of sales?
$$\Pr(2.5 - 0.15 < X \le 2.5 + 0.15) = \Pr(2.35 < X \le 2.65) = \Pr(X \le 2.65) - \Pr(X \le 2.35) = 38.29\%$$
In Excel, use “=NORM.DIST(2.65,2.5,0.3,1)-NORM.DIST(2.35,2.5,0.3,1)”.
c) In order to cover fixed costs, the firm’s sales must exceed the breakeven level of $1.8
million. What is the probability that the sales will exceed the breakeven level?
$$\Pr(X > 1.8) = 1 - \Pr(X < 1.8) = 99.02\%$$
In Excel, use “=1-NORM.DIST(1.8,2.5,0.3,1)”.
d) Determine the sales level that has only 9% chance of being exceeded next year.
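Let $x$ be the sales level we are looking for:
$$\Pr(X > x) = 0.09$$
$$1 - \Pr(X < x) = 0.09 \iff \Pr(X < x) = 0.91$$
$$x = 2.9022$$
That is, sales of about $2.90 million.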
In Excel, use “=NORM.INV(0.91,2.5,0.3)”.
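If Excel is not at hand, the same four answers can be reproduced with Python's standard library. This is a sketch of an alternative to the Excel formulas above, with sales measured in millions of dollars. `NormalDist.cdf` plays the role of NORM.DIST with the cumulative flag on, and `NormalDist.inv_cdf` the role of NORM.INV.

```python
from statistics import NormalDist

# Sales in $ millions: mean 2.5, standard deviation 0.3.
sales = NormalDist(mu=2.5, sigma=0.3)

p_exceed_3 = 1 - sales.cdf(3.0)                      # (a) Pr(X > 3)
p_within_150k = sales.cdf(2.65) - sales.cdf(2.35)    # (b) Pr(2.35 < X <= 2.65)
p_above_breakeven = 1 - sales.cdf(1.8)               # (c) Pr(X > 1.8)
level_exceeded_9pct = sales.inv_cdf(0.91)            # (d) x with Pr(X < x) = 0.91

print(f"(a) {p_exceed_3:.2%}")
print(f"(b) {p_within_150k:.2%}")
print(f"(c) {p_above_breakeven:.2%}")
print(f"(d) {level_exceeded_9pct:.4f} million")
```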
4) Patrick wants to estimate the relationship between average returns and risk (measured
by the standard deviation of returns). He collects return data for several shares over time and
takes the average for each share; he does the same for the standard deviation. He obtains one
observation (one value for the average return and one for the standard deviation) for each company,
so he is dealing with a cross section (not a time series).
He applies the OLS estimator and obtains the following results:
$$\hat{R}_i = 5.87 + 1.12 \cdot SD_i$$
Both returns and standard deviations are measured in percent. So, $R_i = 4$ means company
$i$’s average return was 4%.
a) Interpret the estimate for the intercept coefficient. What does it suggest?
The intercept indicates the average return for a stock is 5.87% when the standard deviation is
0%. This suggests that a risk-free asset will on average pay around 5.87% as return.
b) Interpret the estimate for the slope coefficient.
For every one percentage point of extra risk (standard deviation), the model suggests the average
return will increase by 1.12 percentage points, on average.
Recall that percentage points measure changes in variables that are already expressed in percentages.
Because both returns and standard deviations are in %, their changes are in percentage points.
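The two interpretations can be checked by evaluating the fitted line at a few risk levels. This is a minimal sketch: the function name `fitted_return` and the chosen risk levels are illustrative, and both inputs and outputs are in percent.

```python
INTERCEPT = 5.87  # estimated intercept, in percent
SLOPE = 1.12      # estimated slope, in percentage points of return per point of SD


def fitted_return(sd_pct):
    """Fitted average return (in %) for a share with standard deviation sd_pct (in %)."""
    return INTERCEPT + SLOPE * sd_pct


# At zero risk the fitted return equals the intercept.
print(f"SD = 0%: fitted return = {fitted_return(0):.2f}%")

# One extra percentage point of SD adds the slope to the fitted return.
change = fitted_return(5) - fitted_return(4)
print(f"SD from 4% to 5%: fitted return rises by {change:.2f} percentage points")

# A few points on the fitted line.
line_points = [(sd, fitted_return(sd)) for sd in range(0, 21, 5)]
print(line_points)
```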
c) Plot the line of best fit.
I trust you can do this one on your own.