MT2300 Linear statistical model
Terminology
We deal with measurements of several variables for each of n experimental units or
individuals. The variables are of two types (though the distinction between them is not
always rigid in applications): those of primary interest to the investigator and those
which might provide supplementary or background information. The variables of the
former type are called response, outcome or dependent variables, while those of the latter
type are called explanatory, independent or predictor variables. Econometricians also
use the terms endogenous and exogenous to distinguish the two types of variables. The
explanatory variables are used to predict or to understand the response variables.
Relation between variables, models
We distinguish between a functional relation and a statistical relation. The functional
relation between the independent variable X and the dependent variable Y is often ex-
pressed as a mathematical formula
Y = f(X)
and the main feature of this relation is that the observations (xi, yi) (i = 1, . . . , n) fall
directly on the “curve” of the relationship, that is, on the curve y = f(x).
A statistical relation, unlike a functional relation, is not a “perfect” one. Very often
explanatory variables are thought of as fixed, and response variables are thought of as
random variables with a distribution depending on the explanatory variables. Therefore,
for each value of an explanatory variable x the response Y may be supposed to be a
random variable with expectation (mean value) f(x) = E(Y | X = x), or E(Y | x) for short.
A statistician may then wish to determine the function f using sample data consisting of
pairs (xi, yi) (i = 1, . . . , n). The function f is called the regression function for regressing Y
on X, and X is called the regressor.
The regression function f(x) represents the systematic component of the model. The
systematic component of the model is concerned with overall population features such as
expected values. To emphasize the existence of the random component of the model, the
response is often written in the form
Y = f(x) + ε
where f is the regression function, that is, the systematic component, and ε is the random
component. In most applications ε is a normal random variable with mean zero and
variance σ2 (ε ∼ N (0, σ2)).
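As a quick illustration of the difference between a functional and a statistical relation, the short Python sketch below simulates responses from Y = f(x) + ε; the regression function f(x) = 2 + 0.5x, the sample size and the noise level are illustrative choices made for the sketch, not values taken from these notes.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative regression function (the systematic component of the model).
def f(x):
    return 2.0 + 0.5 * x

n = 20
x = np.linspace(1, 10, n)            # fixed explanatory variable
eps = rng.normal(0.0, 1.0, size=n)   # random component, eps ~ N(0, sigma^2) with sigma = 1
y = f(x) + eps                       # responses scatter around the curve y = f(x)

# Unlike a functional relation, the observed pairs (x_i, y_i) do not fall
# exactly on the curve: the differences y - f(x) are the random errors.
print(np.round(y - f(x), 3))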
The systematic component f is often expressed in terms of explanatory variables
through a parametric equation. If, for example, it is supposed that
f(x) = A + Bx + Cx²
or
f(x) = A·2^x + B
or
f(x) = A log x + B,
then the problem is reduced to one of identifying a few parameters, here labeled A, B, C.
In each of these three forms for f given above, f is linear in these parameters.
For example, A·2^x + B can be written as f(x, β) = g(x)^T β, where g(x)^T = (1, 2^x) is
known as the transformed input and β^T = (B, A) is the vector of model parameters. Similarly,
A log(x) + B can be written as g(x)^T β, where the transformed input is g(x)^T = (1, log(x))
and the vector of model parameters is again β^T = (B, A).
It is the linearity in the parameters that makes the model a linear statistical model!
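To make the role of the transformed input concrete, here is a small Python sketch that fits the form f(x) = A log x + B by least squares on the transformed input g(x)^T = (1, log x); the simulated data, the true parameter values and the use of numpy.linalg.lstsq are assumptions made for the sketch rather than part of the notes.

import numpy as np

rng = np.random.default_rng(1)

# Simulated data from f(x) = A*log(x) + B with illustrative values A = 3, B = 1.
A_true, B_true, sigma = 3.0, 1.0, 0.5
x = np.linspace(1, 20, 40)
y = A_true * np.log(x) + B_true + rng.normal(0.0, sigma, size=x.size)

# Transformed input g(x)^T = (1, log x); each row of G is g(x_i)^T.
G = np.column_stack([np.ones_like(x), np.log(x)])

# The model is linear in beta^T = (B, A), so ordinary least squares applies
# even though f is not linear in x itself.
beta_hat, *_ = np.linalg.lstsq(G, y, rcond=None)
print("estimated (B, A):", np.round(beta_hat, 3))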
1.2 The model of measurements, revision of Year 1 Statistics.
Let µ be an unknown quantity of interest which can be measured with some error. A
mathematical (statistical) model for this experiment is specified by the following equation
(the model equation) Y = µ+ ε, where Y is the available measurement (observation) and
ε is a random error modelled as a random variable with zero mean, say, with normal
distribution with variance σ2, i.e. ε ∼ N(0, σ2). By properties of the normal distribution,
we have that Y ∼ N(µ, σ²). Suppose that we have n measurements Yi = µ + εi, where the
εi ∼ N(0, σ²) (i = 1, ..., n) are independent. It follows that Y1, ..., Yn are then also
independent random variables with Yi ∼ N(µ, σ²). In other words, Y1, ..., Yn is a random
sample from a normally distributed population with mean µ and variance σ², so that the
problem of estimating an unknown quantity µ is the well known (from 1st Year statistics)
problem of estimating the mean of a normal population. The sample mean
Ȳ = (1/n)(Y1 + ... + Yn) = (1/n) ∑_{i=1}^n Yi
is usually used as a point estimator of µ.
In MT1300 we briefly stated that there are several general methods of obtaining point
estimators. In this course we are going to use one of these methods, namely, the method
of least squares (LS). To demonstrate the main idea of this method, let us consider
the case of the model of measurements. Given observations Y1, ..., Yn, define the following
function

S(µ) = ∑_{i=1}^n (Yi − µ)².
The value of µ that minimises S(µ) is called the least squares estimator of µ. We can find
the point of minimum of S(µ) by equating the first derivative of S(µ) to zero:

S′(µ) = −2 ∑_{i=1}^n (Yi − µ) = −2 (∑_{i=1}^n Yi − nµ) = 0.

Now it is easy to see that the solution of the above equation is Ȳ = (1/n) ∑_{i=1}^n Yi, the
sample mean, and this is the point of minimum, as the second derivative of S(µ) at µ = Ȳ is
2n > 0. There is also a direct way to see that the sample mean is the point of minimum,
and, hence, the least squares estimator of µ. Indeed,
S(µ) = ∑_{i=1}^n (Yi² − 2Yiµ + µ²) = ∑_{i=1}^n Yi² − 2nµȲ + nµ²
     = −2nµȲ + nµ² + nȲ² + ∑_{i=1}^n Yi² − nȲ²
     = n(µ − Ȳ)² + ∑_{i=1}^n Yi² − nȲ² ≥ ∑_{i=1}^n Yi² − nȲ²,
where the inequality becomes an equality if and only if µ = Ȳ. Note finally that

S(Ȳ) = ∑_{i=1}^n Yi² − nȲ² = (n − 1)s²,

where s² is the sample variance, which is the point estimator of the other model parameter,
σ² (see the next section).
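The short Python sketch below checks both facts numerically on simulated measurements (the values of µ, σ and the sample size are illustrative assumptions): S(µ) evaluated over a grid of candidate values is smallest at the sample mean, and S(Ȳ) equals (n − 1)s².

import numpy as np

rng = np.random.default_rng(2)

# Illustrative measurements Y_i = mu + eps_i with mu = 10, sigma = 2.
y = 10.0 + rng.normal(0.0, 2.0, size=25)

def S(mu):
    return np.sum((y - mu) ** 2)

ybar = y.mean()

# S over a grid of candidate values of mu is smallest near the sample mean ...
grid = np.linspace(ybar - 3, ybar + 3, 601)
print("grid minimiser:", grid[np.argmin([S(m) for m in grid])], " sample mean:", ybar)

# ... and the minimum value S(ybar) equals (n - 1) s^2, where s^2 is the sample variance.
print(S(ybar), (y.size - 1) * y.var(ddof=1))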
1.3 Parametric statistical inference, brief revision of Year 1 background
Reading
Krzanowski: Chapter 2
Kleinbaum et al: Chapter 3
Frees: Chapter 2
Mendenhall et al: Chapter 1
Newbold: Chapter 9
The process of making statements about population characteristics/parameters given
only information from samples is known as parametric statistical inference.
Example 1 A mechanical jar filler for filling jars with coffee does not fill every jar with
the same quantity. The weight of coffee Y filled in a jar is a random variable which can be
assumed to be normally distributed with mean value µ and variance σ2 (Y ∼ N (µ, σ2)).
Suppose that we have a sample of n independent measurements on Y and wish to
“identify” the parameters of the population (µ, σ2).
The sort of statements that we wish to make about parameters will often fall into one
of the following three categories:
• Point estimation;
• Interval estimation;
• Hypothesis testing.
1.3.1 Point estimation
Point estimation is the aspect of statistical inference in which we wish to find the “best
guess” of the true value of a population parameter.
Suppose that Y1, Y2, · · · , Yn is a random sample from the population of interest.
Then an estimator of an unknown parameter θ is some function of the observations
Y1, Y2, · · · , Yn, that is
θˆ = θˆ(Y1, Y2, · · · , Yn)
(which is in some sense a “good approximation” to the unknown parameter θ).
Example 1 (continued) A point estimator of µ in a N (µ, σ2) population is provided by
the sample mean Y¯ , which is defined by
Ȳ = (1/n) ∑_{i=1}^n Yi = (1/n)(Y1 + Y2 + · · · + Yn).
To estimate σ² in a N(µ, σ²) population we generally use as its estimator the sample
variance s², defined by

s² = (1/(n − 1)) ∑_{i=1}^n (Yi − Ȳ)².

It is easy to see that s² = (1/(n − 1)) (∑_{i=1}^n Yi² − nȲ²).
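As a small numerical illustration of these point estimators in the setting of Example 1, the Python sketch below computes Ȳ and s² for simulated jar weights and verifies the short-cut formula for s²; the values of µ, σ and n are made-up assumptions for the sketch.

import numpy as np

rng = np.random.default_rng(3)

# Illustrative jar-filler weights: Y ~ N(mu, sigma^2) with mu = 200 g, sigma = 4 g.
y = rng.normal(200.0, 4.0, size=30)
n = y.size

ybar = y.sum() / n                                      # point estimate of mu
s2 = np.sum((y - ybar) ** 2) / (n - 1)                  # sample variance, estimate of sigma^2
s2_shortcut = (np.sum(y ** 2) - n * ybar ** 2) / (n - 1)

print(round(ybar, 3), round(s2, 3), round(s2_shortcut, 3))   # s2 and s2_shortcut agree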
Properties of Estimators
Let θ̂ = θ̂(Y1, Y2, · · · , Yn) be an estimator of an unknown parameter θ. To clarify in
what sense θ̂ is a “good approximation” to θ, we consider estimators which are (1) unbiased
and (2) mean square consistent.
(1) θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ.
Example 1 (continued) Ȳ is an unbiased estimator of µ and s² is an unbiased estimator
of σ².
To check whether we have a sensible estimator we need to ensure that θ̂ is increasingly
likely to yield the right answer θ as the sample size n gets bigger. The mean square error
(MSE) of θ̂ is defined to be E(θ̂ − θ)². Since the MSE of θ̂ is the average squared distance
of θ̂ from the true value θ, a good estimator is one with a small MSE.
(2) θ̂ is said to be a mean square consistent estimator of θ if
MSE(θ̂) → 0 as n → ∞.
Note that MSE(θ̂) = Var(θ̂) + (E(θ̂) − θ)², so if θ̂ is unbiased then it is mean square
consistent provided Var(θ̂) → 0 as n → ∞.
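A brief Monte Carlo sketch of these two properties for the sample mean (the parameter values, sample sizes and number of replications below are illustrative choices): the simulated bias of Ȳ stays near zero, and its MSE tracks σ²/n, which tends to 0 as n grows.

import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 5.0, 2.0      # illustrative true parameter values
reps = 20000              # number of simulated samples per sample size

for n in (5, 50, 500):
    samples = rng.normal(mu, sigma, size=(reps, n))
    ybar = samples.mean(axis=1)
    bias = ybar.mean() - mu
    mse = np.mean((ybar - mu) ** 2)
    # Bias stays near 0 (unbiasedness); MSE is close to sigma^2 / n and shrinks
    # towards 0 as n grows (mean square consistency).
    print(n, round(bias, 4), round(mse, 4), round(sigma ** 2 / n, 4))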
1.3.2 Interval estimation
Point estimation is often not sufficiently informative as it does not say anything about
the error of the estimation procedure. Naturally, if the error is large, then we are less
confident in our estimate. Replacing a point estimator by an interval estimator allows
us to quantify the uncertainty of estimation by specifying a desired level of confidence,
which is the probability of the interval capturing the true value of the parameter. Such
interval estimators are known as confidence intervals (C.I.).
Example 1 (continued) To construct a confidence interval for µ we recall that Ȳ is a linear
combination of independent N(µ, σ²) random variables (Ȳ = ∑_{i=1}^n Yi/n) and therefore is
normally distributed with mean µ (unbiased) and variance σ²/n, that is, Ȳ ∼ N(µ, σ²/n).
It therefore follows that if σ² is known, then

Z = (Ȳ − µ)/(σ/√n) ∼ N(0, 1)

and so

P(Ȳ − z_{α/2} σ/√n ≤ µ ≤ Ȳ + z_{α/2} σ/√n) = 1 − α,

that is, the (1 − α)100% confidence interval for µ is given by

(Ȳ − z_{α/2} σ/√n, Ȳ + z_{α/2} σ/√n).
If σ² is unknown, then we construct our CI based on the following T-variable

T = (Ȳ − µ)/(s/√n) ∼ t_{n−1},

where t_{n−1} is the t-distribution with n − 1 degrees of freedom, and

P(Ȳ − t_{n−1,α/2} s/√n ≤ µ ≤ Ȳ + t_{n−1,α/2} s/√n) = 1 − α,

so that the (1 − α)100% confidence interval for µ is given by

(Ȳ − t_{n−1,α/2} s/√n, Ȳ + t_{n−1,α/2} s/√n).
Note that while both intervals are centered at Ȳ, the margin of error t_{n−1,α/2} s/√n
is a random variable, unlike the margin of error z_{α/2} σ/√n, which is a fixed number.
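The Python sketch below computes both intervals for one simulated sample, using scipy.stats for the z and t quantiles; the values of µ, σ, n and α are illustrative assumptions, not taken from the notes.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, alpha = 200.0, 4.0, 30, 0.05    # illustrative values
y = rng.normal(mu, sigma, size=n)

ybar, s = y.mean(), y.std(ddof=1)

# Known sigma: the margin of error z_{alpha/2} * sigma / sqrt(n) is a fixed number.
z = stats.norm.ppf(1 - alpha / 2)
ci_z = (ybar - z * sigma / np.sqrt(n), ybar + z * sigma / np.sqrt(n))

# Unknown sigma: s replaces sigma and the t quantile replaces z, so the margin
# of error t_{n-1, alpha/2} * s / sqrt(n) varies from sample to sample.
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_t = (ybar - t * s / np.sqrt(n), ybar + t * s / np.sqrt(n))

print("95% CI, sigma known:  ", np.round(ci_z, 3))
print("95% CI, sigma unknown:", np.round(ci_t, 3))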