ECON3203 Econometric Theory and Methods
Linear Regression
Table of contents
1. Simple Linear Regression
Recommended reading
• Chapter 3, An Introduction to Statistical Learning with
Applications in R by James et al.: easy to read, sloppy
discussions sometimes, comes with R/Python code for
practice.
• Chapter 3, The Elements of Statistical Learning by Hastie et
al.: well-written, deep in theory, suitable for students with a
sound maths background.
Spending Data
The Spending Data contains information on 1000 customers in a
customer database for the company Direct Marketing (DM). The
data consist of
• AmountSpent ($): the amount spent by each customer in one
year on DM products.
• Salary ($)
• Catalogs: number of shopping catalogs sent to each
customer per year
• Children: number of children each customer has
• and many others
The company would like to build a model to explain/predict the
amount each customer spends on its products, based on the
other variables.
Spending Data
Spending Data
Questions that might be asked:
• Is there a relationship between the spending amount and the
number of catalogs, for example?
• How is the salary associated with the spending amount?
• How are the salary and the number of children interactively
associated with the spending amount?
• What is the best subset of the explanatory variables in terms
of predicting the spending amount?
• Given a particular customer, how much would the company
expect him/her to spend?
• ...
Linear Regression
Y = β0 + β1X1 + ...+ βpXp + ϵ
• Y : the response variable or dependent variable to be predicted
• X1, ..., Xp: potential predictors or covariates or independent
variables
• the intercept β0 and slope coefficients β1, ..., βp are unknown
population parameters to be estimated
• ϵ: error term or something that can’t be explained by the
model. Assume E(ϵ) = 0 and V(ϵ) = σ2, σ2 is also unknown
Caution: the model imposes many assumptions that need to be
checked/tested! Y should be continuous; the model is not suitable
for categorical/binary response variables. The Xj can be
continuous, discrete, or categorical.
Linear Regression
Y = β0 + β1X1 + ... + βpXp + ϵ
• Linear regression might sound too simplistic: it is hardly
ever true that Y depends exactly linearly on the Xj ’s
• But linear regression is extremely useful and important. It
forms a basic framework for non-linear regression and many
advanced regression models
• The linearity is in terms of the coefficients βj ’s, not the Xj ’s.
E.g.,
Y = β0 + β1X1 + β2e^{X2} + β3X1³ + β4X1X2 + ϵ
is still a linear regression model!
• Precisely, X1, X2 are called covariates. In this model, X1, X1³,
e^{X2} and X1X2 are called predictors or features
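A minimal sketch of this point in Python (synthetic data, illustrative coefficient values): even after applying non-linear transformations to the covariates, ordinary least squares still applies, because the model remains linear in the β's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(0, 2, n)
x2 = rng.uniform(0, 1, n)

# Illustrative "true" coefficients beta0, ..., beta4 (made up for this sketch)
beta = np.array([1.0, 2.0, -1.0, 0.5, 3.0])

# Design matrix: intercept plus the transformed predictors from the slide
X = np.column_stack([np.ones(n), x1, np.exp(x2), x1**3, x1 * x2])
y = X @ beta + rng.normal(0, 0.05, n)

# Ordinary least squares recovers the betas: the model is linear in them
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With low noise, `beta_hat` lands close to the coefficients used to generate the data, even though Y is highly non-linear in X1 and X2.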
Simple Linear Regression
Simple Linear Regression
Y = β0 + β1X + ϵ, E(ϵ) = 0, V(ϵ) = σ2
So the conditional mean of Y given X = x is a linear function of x:
µ_{Y|X=x} = E(Y |X = x) = β0 + β1x
Our working example
AmountSpent = β0 + β1·Catalogs + ϵ
Simple Linear Regression
Our working example
AmountSpent = β0 + β1·Catalogs + ϵ
Let β̂0 and β̂1 be estimates of β0 and β1 respectively. Then
µ̂Y |X=x = β̂0 + β̂1x
is a point estimate of E(Y |X = x). This is an estimate of the
average spending amount among all customers who are sent x
catalogs per year.
We also use ŷ = β̂0 + β̂1x to predict the spending amount of an
individual customer who is sent x catalogs.
Estimating the coefficients by the least squares method
• ŷi = β̂0 + β̂1xi is the prediction of the observation yi when
X = xi. So ei = yi − ŷi represents the ith residual
• We define the residual sum of squares (RSS) as
RSS(β̂0, β̂1) = ∑ᵢ₌₁ⁿ eᵢ² = ∑ᵢ₌₁ⁿ (yᵢ − (β̂0 + β̂1xᵢ))².
Note that {yi, xi, i = 1, ..., n} is the training dataset.
• The best β̂0 and β̂1 will be the ones that minimise
RSS(β̂0, β̂1).
• This is known as the least squares method. No probability
distribution of ϵ is needed.
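The least squares idea can be sketched numerically (synthetic data loosely mimicking the Spending example; the "catalogs" variable and coefficient values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(1, 20, n)                  # e.g. catalogs sent (synthetic)
y = 50 + 20 * x + rng.normal(0, 30, n)     # synthetic AmountSpent

def rss(b0, b1):
    """Residual sum of squares of the line b0 + b1*x on the training data."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Closed-form least squares estimates
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Any other line has a strictly larger RSS
print(rss(b0_hat, b1_hat) < rss(b0_hat + 1, b1_hat))      # True
print(rss(b0_hat, b1_hat) < rss(b0_hat, b1_hat + 0.5))    # True
```

The last two checks illustrate that (β̂0, β̂1) is the unique minimiser of RSS: perturbing either coefficient away from the least squares solution can only increase the criterion.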
Estimating the coefficients by the least squares method
• β̂0 = ...
• β̂1 = ...
Estimating the coefficients by the least squares method
β̂0 = ȳ − β̂1x̄
β̂1 = ∑ᵢ(xᵢ − x̄)(yᵢ − ȳ) / ∑ᵢ(xᵢ − x̄)² = ((1/n)∑ᵢ xᵢyᵢ − x̄ȳ) / ((1/n)∑ᵢ xᵢ² − x̄²)
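A quick numerical check of these closed-form formulas on synthetic data, including the equivalent "moment" form and a cross-check against NumPy's built-in degree-1 least squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50)
y = 3 + 2 * x + rng.normal(0, 1, 50)   # synthetic data for illustration

# Closed-form estimates from the slide
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Equivalent "moment" form: (mean(xy) - mean(x)mean(y)) / (mean(x^2) - mean(x)^2)
b1_alt = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean() ** 2)

# Cross-check against NumPy's built-in degree-1 least squares fit
b1_np, b0_np = np.polyfit(x, y, deg=1)
```

All three routes give the same slope (up to floating-point error), which is a useful sanity check when implementing the formulas by hand.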
Estimation by the maximum likelihood method
• Assume that ϵi ∼ N (0, σ2), i = 1, ..., n. Note that
yi = β0 + β1xi + ϵi
Therefore yi ∼ N (β0 + β1xi, σ2).
• E.g., the amounts spent by customers who are sent x catalogs
are normally distributed with mean β0 + β1x and variance σ2.
Seems a reasonable assumption!
• The likelihood function is
p(y|β0, β1, σ²) = ∏ᵢ₌₁ⁿ (1/√(2πσ²)) exp(−(yᵢ − β0 − β1xᵢ)² / (2σ²))
• Maximising this likelihood with respect to β0 and β1 leads to
exactly the same least squares estimates β̂0 and β̂1
• Estimate of σ²: σ̂² = (1/n)∑ᵢ₌₁ⁿ (yᵢ − (β̂0 + β̂1xᵢ))².
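A sketch verifying these claims on simulated data: the least squares estimates, together with σ̂² computed with the 1/n factor, maximise the Gaussian log-likelihood (all data and parameter values below are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.normal(0, 2, n)   # synthetic data with N(0, sigma^2) errors

# Least squares estimates and the MLE of sigma^2 (note the 1/n, not 1/(n-2))
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
sigma2_hat = np.mean((y - (b0 + b1 * x)) ** 2)

def loglik(b0_, b1_, s2):
    """Gaussian log-likelihood of the sample under the simple linear model."""
    resid = y - (b0_ + b1_ * x)
    return -0.5 * n * np.log(2 * np.pi * s2) - np.sum(resid**2) / (2 * s2)

# The least squares fit (with sigma2_hat) beats any perturbed parameter values
best = loglik(b0, b1, sigma2_hat)
print(best > loglik(b0 + 0.5, b1, sigma2_hat))   # True
print(best > loglik(b0, b1 + 0.1, sigma2_hat))   # True
print(best > loglik(b0, b1, 1.5 * sigma2_hat))   # True
```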
Brief introduction to Maximum Likelihood Estimation
Given data y = {y1, ..., yn}, in almost all areas of application we
assume the data come from a statistical model that depends on a
vector of unknown parameters θ. This model allows us to write
down the density p(yi|θ) of yi.
Example. yᵢ iid∼ N(µ, 1), i = 1, 2, ..., n. The joint density of y is
p(y|µ) = ∏ᵢ₌₁ⁿ p(yᵢ|µ) = ∏ᵢ₌₁ⁿ (1/√(2π)) e^{−(yᵢ−µ)²/2} = (1/√(2π))ⁿ exp[−(1/2)∑ᵢ₌₁ⁿ (yᵢ − µ)²]
This function, considered as a function of µ, measures how likely a
value of µ is as the underlying parameter that generated the data
{y1, . . . , yn}
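A small sketch on simulated data: maximising this function over a fine grid of µ values lands on the sample mean, which is the MLE in this model (the sample size and true µ below are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(5.0, 1.0, 100)   # sample from N(mu = 5, variance 1)

def loglik(mu):
    """Log of the joint density p(y | mu) under the N(mu, 1) model."""
    return -0.5 * len(y) * np.log(2 * np.pi) - 0.5 * np.sum((y - mu) ** 2)

# Maximise over a fine grid: the maximiser sits at the sample mean
mu_grid = np.linspace(3, 7, 4001)
mu_mle = mu_grid[np.argmax([loglik(m) for m in mu_grid])]
print(abs(mu_mle - y.mean()) < 1e-3)   # True
```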
Brief introduction to Maximum Likelihood Estimation
Let y = {y1, . . . , yn} be a random sample from a distribution with
pdf p(y|θ). The likelihood function, as a function of θ, is defined as
p(y|θ) = ∏ᵢ₌₁ⁿ p(yᵢ|θ).
• This likelihood function reflects the probability of observing
the data y if θ is the true parameter
• We wish to estimate the (unknown) true value of θ that
generated the data y by those values that maximise p(y|θ).
• The maximum likelihood estimator (MLE) of θ is a value that
maximises p(y|θ).
• MLE is one of the most popular estimation methods.
Maximum Likelihood Estimator
Exercise. Suppose that {x1 = 5, 0, 1, 1, 0, 3, 2, 3, 4, x10 = 1} are
n = 10 observations from the Poisson distribution with pdf
f(y|θ) = e^{−θ}θ^y / y!.
• Write down the log-likelihood function for the sample.
• Find the MLE of θ
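One way to check your derivation numerically: the log-likelihood is ∑ᵢ(−θ + yᵢ log θ − log yᵢ!), and setting its derivative ∑ᵢ(−1 + yᵢ/θ) to zero gives θ̂ = ȳ. A sketch in Python (this reveals the answer, so attempt the exercise first):

```python
import numpy as np
from math import lgamma

# The ten observations from the exercise
y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])
log_fact = np.array([lgamma(k + 1) for k in y])   # log(y_i!)

def loglik(theta):
    """Poisson log-likelihood: sum_i ( -theta + y_i*log(theta) - log(y_i!) )."""
    return np.sum(-theta + y * np.log(theta) - log_fact)

# The first-order condition gives theta_hat = sample mean
theta_hat = y.mean()
print(theta_hat)   # 2.0
print(loglik(theta_hat) > loglik(1.5) and loglik(theta_hat) > loglik(2.5))   # True
```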
Spending data example