Two variable linear regression analysis
EC226 (Term 1: Handout 1) 1 INTRODUCTION
1 Introduction
Econometrics literally means economic measurement. Hendry1 described econometrics as "An analysis of the relationship between economic variables ... by abstracting the main phenomena of interest and stating theories thereof in mathematical form", and Samuelson et al.2 stated that "Econometrics may be defined as the quantitative analysis of actual economic phenomena". Other authors have been less complimentary: Leamer3 believes that econometrics "is practised at the computer terminal (and) involves fitting many, perhaps thousands, of statistical models. One or several that the researcher finds pleasing are selected for reporting purposes".
Econometrics has three major uses:
1. Describing economic reality.
2. Testing hypotheses about economic theory.
3. Forecasting future economic activity.
Econometricians attempt to quantify economic relationships that had previously been only theoretical. Doing so requires three steps:
1. Specifying/identifying theoretical economic relationship between the variables.
2. Collecting the data on those variables identified by the theoretical model.
3. Obtaining estimates of the parameters in the theoretical relationship.
1D. F. Hendry (1980). Econometrics-Alchemy or Science? Economica, 47(188), 387–406.
2Samuelson, P. A., T. C. Koopmans, and J. R. N. Stone (1954). Report of the Evaluative Committee for Econometrica. Econometrica, 22(2), 141–46.
3Leamer, E. E. (1983). Let's Take the Con Out of Econometrics. The American Economic Review, 73(1), 31–43.
2 Correlation vs Regression analysis
2.1 Correlation
In Economics we are interested in the relationship between two or more random variables, for example:
• Sales and advertising expenditure
• Personal consumption and disposable income
• Investment and interest rates
• Earnings and schooling
While there are many ways in which these pairs of variables might be related, a linear relationship is often a useful first approximation, and it can be detected via a scatter plot of one variable against the other.
A measure of linear association between two random variables x and y is the covariance, which
for a sample of n pairs of observations (x1, y1) . . . (xn, yn) is calculated as:
cov(x, y) = [Σⁿᵢ₌₁ (xi − x̄)(yi − ȳ)] / (n − 1)
The covariance measures the average cross product of deviations of x around its mean with deviations of y around its mean. If high (low) values of x relative to its mean are associated with high (low) values of y relative to its mean, then we get a large positive covariance (see Figure 1). Conversely, if high (low) values of x are associated with low (high) values of y, we get a negative covariance (see Figure 2). A zero covariance occurs when there is no predominant association between the x and y values (see Figure 3). The covariance measures only linear association between the x and y values and would be approximately zero for a quadratic association (see Figure 4).
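The sign pattern just described can be checked numerically. The sketch below (pure Python, with illustrative data chosen for this handout, not taken from it) computes the sample covariance with the n − 1 divisor and reproduces the positive, negative, and zero (quadratic) cases:

```python
# Sample covariance with the (n - 1) divisor used in the formula above.
def sample_cov(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]

y_pos = [2 * xi for xi in x]     # moves with x     -> positive covariance
y_neg = [-2 * xi for xi in x]    # moves against x  -> negative covariance
y_quad = [xi ** 2 for xi in x]   # quadratic in x   -> zero covariance

print(sample_cov(x, y_pos))    # 5.0
print(sample_cov(x, y_neg))    # -5.0
print(sample_cov(x, y_quad))   # 0.0
```

The quadratic case illustrates the point about Figure 4: y depends strongly on x, yet the covariance is zero, because covariance only picks up linear association.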
The covariance statistic is not scale free: multiplying the x variable by 100 multiplies the covariance by 100. A scale-free measure is the correlation, defined as:

corr(x, y) ≡ ρ(x, y) = cov(x, y) / √(V(x)V(y))

since, for a constant a > 0,

corr(ax, y) = cov(ax, y) / √(V(ax)V(y)) = a·cov(x, y) / √(a²V(x)V(y)) = a·cov(x, y) / (a√(V(x)V(y))) = corr(x, y)
ρ is a population parameter of association between the random variables x and y, and:
1. −1 ≤ ρ(x, y) ≤ 1
2. ρ(x, y) = −1 ⇒ perfect negative linear association
3. ρ(x, y) = 1 ⇒ perfect positive linear association
4. ρ(x, y) = 0 ⇒ no linear association
5. As |ρ(x, y)| increases, the linear association becomes stronger.
6. ρ(x, y) = ρ(y, x)
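The scale-freeness and symmetry properties can be verified directly. The sketch below (illustrative data, assumed here rather than taken from the handout) builds the sample correlation from the sample covariance, using V(x) = cov(x, x):

```python
import math

def sample_cov(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    # corr(x, y) = cov(x, y) / sqrt(V(x) * V(y)), with V(x) = cov(x, x)
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 3.0, 9.0, 12.0, 20.0]

r = sample_corr(x, y)
r_scaled = sample_corr([100 * xi for xi in x], y)  # rescale x by 100

print(-1 <= r <= 1)                # True: bounded in [-1, 1]
print(abs(r - r_scaled) < 1e-12)   # True: scale free, unlike the covariance
print(abs(sample_corr(x, y) - sample_corr(y, x)) < 1e-12)  # True: symmetric
```

Rescaling x by 100 multiplies the covariance by 100 but leaves the correlation unchanged, exactly as the derivation above shows.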
2.2 Regression
By contrast, linear regression looks at the linear causal association between the random variables x and y. In particular, we consider the variable x taking a specific value and we are interested in the response of y to a change in that value of x. So in our examples above we might be interested in:
• Changes in sales caused by increased advertising expenditure
• Changes in personal consumption caused by increased disposable income
• Changes in investment caused by increased interest rates
• Changes in earnings caused by increased schooling
In the simplest type of linear regression analysis we model the relationship between two variables, y and x, and the relationship is assumed to be linear. In particular, we are interested in the expected value of the random variable y given a specific value for x. Given linearity this is4:
E(y|x) = α+ βx
where

E(y|x = 0) = α ⇒ the expected value of y when x = 0 (in practice this intercept is rarely given a direct interpretation)

E(y|x + 1) = α + β(x + 1)

therefore,

β = E(y|x + 1) − E(y|x) ⇒ the change in the expected value of y for a unit increase in x.
y – is known as the dependent variable (endogenous variable or regressand)
x – is known as the independent variable (exogenous variable, explanatory variable or
regressor).
The actual values of the dependent variable, y, will not be the same as the expected value and we
denote the discrepancy (error or disturbance) between the actual and expected value by εi, where:
εi = yi − E(yi|xi) = yi − α − βxi
Rearranging we have
yi = α + βxi + εi,   i = 1, 2, . . . , n   (1)
and this is the TRUE (but unknown) relationship between y and x and is made up of two components:
1. α + βxi - the systematic part
2. εi - the random (non-systematic) component.
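The two-component decomposition can be illustrated with a small simulation. In the sketch below the values of α and β and the error distribution are illustrative assumptions, not taken from the handout:

```python
import random

random.seed(1)

alpha, beta = 2.0, 0.5   # illustrative "true" parameters
n = 5
x = [float(i) for i in range(1, n + 1)]
eps = [random.gauss(0.0, 1.0) for _ in range(n)]  # random (non-systematic) part

# y_i = alpha + beta * x_i + eps_i : systematic part plus random error
y = [alpha + beta * xi + ei for xi, ei in zip(x, eps)]

# The error is exactly the gap between the actual and the expected value:
# eps_i = y_i - E(y_i | x_i) = y_i - alpha - beta * x_i
recovered = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
print(all(abs(r - e) < 1e-12 for r, e in zip(recovered, eps)))  # True

# beta is the change in E(y|x) for a unit increase in x:
print((alpha + beta * 4) - (alpha + beta * 3))  # 0.5
```

Subtracting the systematic part α + βxi from each observed yi recovers the disturbance εi exactly, which is just equation (1) rearranged.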
4Appendix 1 has some rules on expectations and variances.