ECOM30001/ECOM90001: Basic Econometrics
This tutorial reviews some concepts using the econometrics software package R.
Specifically, the tutorial reviews:
- methods for estimating and interpreting the Linear Probability Model (LPM) in R
- methods for estimating and interpreting the Probit Model in R
- calculating marginal (partial) effects for the Probit Model in R
This tutorial requires one (1) data file:
- tut12 churn.csv
This file can be obtained from the Canvas subject page.
In addition, the R script file tut12.R provides the program code necessary to complete
the tutorial. This R script file uses the following package(s) which need to be installed
prior to running the R script file:
stargazer : for easily producing output tables in R
ggplot2 : for easily producing graphs in R
car : for easily conducting hypothesis tests in R
lmtest : for easily conducting the Ramsey RESET test in R
sandwich : for easily calculating robust (Huber-White) heteroskedasticity-consistent
standard errors in R
margins : for easily calculating marginal effects in R
These can be installed directly in RStudio from the packages tab or by using the
command install.packages() with the name of the package, in quotes, inside the brackets.
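As a rough sketch of the command-line route, the six packages listed above could be installed in one step. The object names pkgs and missing_pkgs are illustrative only, and the check for already-installed packages is an added convenience rather than part of the tutorial's instructions.

```r
# Packages named in the list above
pkgs <- c("stargazer", "ggplot2", "car", "lmtest", "sandwich", "margins")

# Keep only the packages that are not already installed (illustrative helper step)
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install whatever is missing; this does nothing if all six are present
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Once installed, each package still has to be loaded with library() in every new R session before its functions can be used.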
Question 1
Suppose you have been engaged as a consultant for a large telecommunications provider.
Your current role is to investigate the characteristics of customers who decide to switch
to another competitor.
Consider the following econometric model for customer churn in a clearly defined narrow
geographic segment of the market:
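\[
\begin{aligned}
\text{churn}^*_i ={}& \beta_0 + \beta_1\,\text{partner}_i + \beta_2\,\text{dependents}_i + \beta_3\,\text{tenure}_i + \beta_4\,\text{tenure}_i^2 \\
& + \beta_5\,\text{monthlycharges}_i + \beta_6\,\text{contract2}_i + \beta_7\,\text{contract3}_i + \beta_8\,\text{paperless}_i + \varepsilon_i
\end{aligned}
\tag{1}
\]
where $\varepsilon_i \mid X_i \sim N(0, \sigma^2_\varepsilon)$ and: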
churn∗ = latent variable determining whether a customer switched
providers in the last month
partner = 1 if customer has a partner, 0 otherwise
dependents = 1 if customer has dependents, 0 otherwise
tenure = number of months the customer has stayed with the provider
monthlycharges = monthly account charges, in dollars
contract1 = 1 if customer does not have a contract (‘month-to-month’), 0 otherwise
contract2 = 1 if customer has a 12-month contract, 0 otherwise
contract3 = 1 if customer has a 24-month contract, 0 otherwise
paperless = 1 if customer receives bills electronically, 0 otherwise
Note that contract1 is the omitted category.
a) Provide a brief interpretation of the latent variable churn∗
Solution: Suppose that, each month, customers weigh up the advantages and
disadvantages of remaining with the provider, trading off the cost of the service
against its quality and reliability. In principle, customers (implicitly)
calculate a score that determines their decision to remain with the provider. Let
this ‘customer dissatisfaction’ score be denoted by $\text{churn}^*_i$. Customers who
perceive that the benefits of remaining with the provider exceed the disadvantages
will choose to remain for another month ($\text{churn}^*_i$ sufficiently low). Similarly,
customers who perceive that the benefits do not exceed the disadvantages will
choose to switch providers ($\text{churn}^*_i$ sufficiently high). There will be some
implied threshold level at which customers are indifferent between remaining with
the provider and switching providers. For example, there might be a switching
rule as follows:
\[
\text{churn}_i =
\begin{cases}
1 & \text{if } \text{churn}^*_i \ge H \\
0 & \text{if } \text{churn}^*_i < H
\end{cases}
\]
where H represents the threshold ‘cut-off’ level for switching providers: a customer
switches once the ‘customer dissatisfaction’ variable $\text{churn}^*_i$ reaches or exceeds H.
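To make the switching rule concrete, the following is a minimal simulation sketch. Everything in it is hypothetical: the sample size n, the threshold value H = 0.5, and the standard normal draws standing in for the latent score are chosen purely for illustration and are not taken from the tutorial data.

```r
set.seed(1)                        # make the simulated draws reproducible

n <- 1000                          # hypothetical number of customers
churn_star <- rnorm(n)             # simulated latent 'dissatisfaction' scores (illustrative)
H <- 0.5                           # hypothetical threshold for switching

# Observed indicator: 1 if the latent score reaches the threshold, 0 otherwise
churn <- as.integer(churn_star >= H)

# Share of simulated customers who switch providers
mean(churn)
```

Only the 0/1 indicator churn would be observed in practice; the latent score churn_star is never recorded, which is why the model has to be estimated from the binary outcome.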
b) The data set tut12 churn contains 7,043 observations on the population of
customers for this large telecommunications provider in a specific geographic segment.
The data contain the following indicator variable:
\[
\text{churn}_i =
\begin{cases}
1 & \text{if } \text{churn}^*_i \ge 0 \\
0 & \text{if } \text{churn}^*_i < 0
\end{cases}
\]
This suggests the following econometric model:
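\[
\begin{aligned}
\text{churn}_i ={}& \beta_0 + \beta_1\,\text{partner}_i + \beta_2\,\text{dependents}_i + \beta_3\,\text{tenure}_i + \beta_4\,\text{tenure}_i^2 \\
& + \beta_5\,\text{monthlycharges}_i + \beta_6\,\text{contract2}_i + \beta_7\,\text{contract3}_i + \beta_8\,\text{paperless}_i + \varepsilon_i
\end{aligned}
\tag{2}
\]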
Estimate model (2) by Ordinary Least Squares (OLS).
Solution: The estimation results are provided in Figure 1:
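The estimation step itself can be sketched with lm(). The block below is an illustration under stated assumptions, not the tutorial's script: the data frame sim is simulated so that the code runs on its own, its column names are assumed to match the variable names in model (2), and the outcome is a random placeholder, so the resulting estimates carry no meaning and will not match Figure 1.

```r
set.seed(42)                                   # reproducible simulated data
n <- 500                                       # hypothetical sample size

# Simulated stand-in for the tutorial data (column names are assumptions)
sim <- data.frame(
  partner        = rbinom(n, 1, 0.5),
  dependents     = rbinom(n, 1, 0.3),
  tenure         = sample(0:72, n, replace = TRUE),
  monthlycharges = round(runif(n, 20, 120), 2),
  contract       = sample(1:3, n, replace = TRUE),
  paperless      = rbinom(n, 1, 0.6)
)
sim$contract2 <- as.integer(sim$contract == 2)  # 12-month contract indicator
sim$contract3 <- as.integer(sim$contract == 3)  # 24-month contract indicator
sim$churn     <- rbinom(n, 1, 0.25)             # placeholder binary outcome

# Linear Probability Model: OLS with the binary indicator as dependent variable.
# I(tenure^2) adds the squared tenure term; contract1 is the omitted category.
lpm <- lm(churn ~ partner + dependents + tenure + I(tenure^2) +
            monthlycharges + contract2 + contract3 + paperless,
          data = sim)

summary(lpm)
```

With the tutorial data, the same formula would be applied to the data frame read from the CSV file in place of sim, provided its columns carry these names.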
i) What is the interpretation of your estimate for β5? What is your
interpretation of your estimate of β8?
Solution: The econometric model (2) is a linear model. Recall also that for
a binary dependent variable:
\[
E[\text{churn}_i \mid X_i] = \Pr[\text{churn}_i = 1 \mid X_i]
\]
so β5 is the effect on $\Pr[\text{churn}_i = 1 \mid X_i]$ of a one dollar increase in
monthly charges, holding all other variables constant. The magnitude of this
marginal effect is measured in percentage points: a one dollar increase in
monthly charges, holding all other variables constant, changes the probability
of churn by 100 × β5 percentage points. The results in column (1) of Figure 1
give the estimate b5 = 0.0032, so a one dollar increase in monthly charges,
holding all other variables constant, is associated with an increase in the
probability of churn of 0.32 percentage points.
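The identity between the conditional mean and the churn probability follows directly from the definition of an expectation for a 0/1 variable. The short derivation below spells this out. The second line additionally assumes $E[\varepsilon_i \mid X_i] = 0$ in model (2), which is the standard Linear Probability Model assumption and is stated here as an assumption rather than taken from the text.

```latex
\begin{aligned}
E[\text{churn}_i \mid X_i]
  &= 1 \cdot \Pr[\text{churn}_i = 1 \mid X_i] + 0 \cdot \Pr[\text{churn}_i = 0 \mid X_i]
   = \Pr[\text{churn}_i = 1 \mid X_i] \\
\Pr[\text{churn}_i = 1 \mid X_i]
  &= \beta_0 + \beta_1\,\text{partner}_i + \dots + \beta_5\,\text{monthlycharges}_i + \dots + \beta_8\,\text{paperless}_i \\
\frac{\partial \Pr[\text{churn}_i = 1 \mid X_i]}{\partial\,\text{monthlycharges}_i}
  &= \beta_5
\end{aligned}
```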
The parameter β8 is the coefficient attached to the paperless indicator
variable. Consequently, it has the ‘usual’ interpretation for a linear model:
\[
\beta_8 = \Pr[\text{churn}_i = 1 \mid \text{paperless}_i = 1, X_i] - \Pr[\text{churn}_i = 1 \mid \text{paperless}_i = 0, X_i]
\]
so β8 is the difference in $\Pr[\text{churn}_i = 1 \mid X_i]$ for customers who receive
electronic bills, relative to customers who do not receive electronic bills,
holding all other variables constant. Again, the magnitude of the marginal
effect is measured in percentage points: holding all other variables constant,
the probability of churn for customers who receive electronic bills differs by
100 × β8 percentage points from that for customers who do not. The results in
column (1) of Figure 1 give the estimate b8 = 0.0717 so, holding all other
variables constant, customers who receive electronic bills have a probability
of churn that is 7.17 percentage points higher than that of customers who
receive paper bills.
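The conversion from a coefficient to percentage points is simple arithmetic, sketched below using the two estimates reported above. The object names and the ten dollar scenario are hypothetical additions for illustration. Because the model is linear in monthly charges, the effect of a larger change is just the per-dollar effect scaled up.

```r
b5 <- 0.0032   # reported estimate on monthlycharges, column (1) of Figure 1
b8 <- 0.0717   # reported estimate on paperless, column (1) of Figure 1

# Multiply by 100 to express a change in probability in percentage points
pp_per_dollar <- 100 * b5        # effect of one extra dollar of monthly charges
pp_paperless  <- 100 * b8        # electronic versus paper billing

# Hypothetical scenario: a ten dollar increase in monthly charges
pp_ten_dollars <- 100 * b5 * 10

c(per_dollar = pp_per_dollar, paperless = pp_paperless, ten_dollars = pp_ten_dollars)
```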
ii) At the 5% level, test for the presence of heteroskedasticity using White’s test.
Since there are numerous indicator variables in model (2), use the ‘no cross
products’ form of White’s test. Is there any evidence of heteroskedasticity?
Solution: The OLS estimation results are presented in column (1) of Figure 1.
The null hypothesis is that the errors are homoskedastic; the alternative
hypothesis is that the errors are heteroskedastic. The test statistic is calculated
from the auxiliary regression:
eˆ2i = α0 + α1 partneri + α2 dependentsi + α3 tenurei + α4 tenure
2
i
+ α5 monthlychargesi + α6 contract2i + α7 contract3i + α8 paperlessi
+ α9 tenure
4
i + α10 monthlycharges