Quantitative Methods
ECON 20003
WEEK 2
DESIRABLE PROPERTIES OF POINT ESTIMATORS
PARAMETRIC AND NONPARAMETRIC TECHNIQUES
THE ASSUMPTION OF NORMALITY
References:
S: § 10.1
W: 3.7
Notes prepared by:
Dr László Kónya
DESIRABLE PROPERTIES OF POINT ESTIMATORS
• We consider four properties that can make point estimators easier to
work with and are possessed by ‘good’ point estimators.
Suppose that we are interested in a parameter β (e.g. a
population mean, a population proportion or a slope parameter of a
population regression model) and we estimate it with the
estimator β-hat.
a) β-hat is said to be a linear estimator of β if it is a linear function of the
sample observations.
For example, the sample mean X-bar is a linear estimator of the
population mean µ.
However, the sample variance s² is a quadratic function of the Xi
sample observations, so it is a non-linear estimator of the
population variance σ².
UoM, ECON 20003, Week 2 2
b) β-hat is said to be an unbiased estimator of β if
E(β-hat) = β,
i.e. if the expected value of β-hat is equal to β and thus the sampling
distribution of β-hat is centered around β.
Otherwise, β-hat is referred to as a biased estimator and
Bias(β-hat) = E(β-hat) − β.
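For example, the sample mean is an unbiased estimator of the
population mean because
E(X-bar) = E((1/n) Σ Xi) = (1/n) Σ E(Xi) = (1/n) · nµ = µ.
Similarly, the sample variance is an unbiased estimator of the
population variance because
E(s²) = σ², where s² = Σ (Xi − X-bar)² / (n − 1).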
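However, the alternative estimator of σ² that divides by n instead of n − 1,
Σ (Xi − X-bar)² / n, is biased since its expected value is
((n − 1)/n) σ², which differs from σ².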
Suppose that β1-hat and β2-hat denote two different (normally distributed)
estimators of β.
The sampling distribution of β1-hat is
centered around β, while the sampling
distribution of β2-hat is not.
β1-hat is an unbiased estimator, whereas
β2-hat is a biased estimator.
β1-hat is expected to estimate β more accurately than β2-hat.
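The bias of a variance estimator can be checked by simulation. The sketch below is only an illustration in Python (not the R workflow used elsewhere in these notes). It assumes that the biased alternative is the estimator that divides the sum of squared deviations by n rather than n − 1, and that the population is normal with a hypothetical σ = 2; the function names and simulation settings are made up for this example.

```python
import random

def variance_estimates(sample):
    """Return (divide-by-(n-1), divide-by-n) variance estimates of a sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    return ss / (n - 1), ss / n

def average_estimates(n=5, sigma=2.0, replications=20000, seed=1):
    """Average both estimators over many simulated normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_s2 = total_s2_n = 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2, s2_n = variance_estimates(sample)
        total_s2 += s2
        total_s2_n += s2_n
    return total_s2 / replications, total_s2_n / replications

# Hypothetical setting: sigma^2 = 4 and n = 5, so (n - 1)/n * sigma^2 = 3.2
avg_s2, avg_s2_n = average_estimates()
print("average of s^2 (divide by n - 1):", avg_s2)
print("average with divisor n:          ", avg_s2_n)
```

Under these assumptions the first average should settle near σ² = 4 and the second near 3.2, which is the pattern that the definition of bias predicts.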
c) β-hat is an efficient estimator of β within some well-defined class of
estimators (e.g. in the class of linear unbiased estimators) if its
variance is smaller, or at least not greater, than that of any other
estimator of β in the same class of estimators.
β3-hat and β4-hat are both unbiased
estimators of β, but the sampling
distribution of β3-hat has a smaller
variance than the sampling distribution
of β4-hat.
β3-hat is the more efficient estimator, so it is likely to produce a more
accurate estimate of β than β4-hat.
Note: In the case of random sampling the sample mean is the best linear unbiased
estimator (BLUE) of the population mean. “Best” means that X-bar has
the smallest variance in the class of linear unbiased estimators of µ,
hence it is an efficient estimator.
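The BLUE property can be illustrated with a small calculation. For independent observations with common variance σ², a linear estimator Σ wi·Xi is unbiased for µ when the weights sum to 1, and its variance is σ² Σ wi². The Python sketch below compares the equal weights of the sample mean with one arbitrary alternative; the weights, σ² = 9 and the function names are hypothetical choices for illustration only.

```python
def linear_estimator_variance(weights, sigma2):
    """Variance of sum(w_i * X_i) for independent X_i with common variance sigma2."""
    return sigma2 * sum(w ** 2 for w in weights)

def is_unbiased_for_mean(weights, tol=1e-12):
    """sum(w_i * X_i) is unbiased for mu exactly when the weights sum to 1."""
    return abs(sum(weights) - 1.0) < tol

n = 4
sigma2 = 9.0                            # hypothetical population variance
equal_weights = [1 / n] * n             # the sample mean
unequal_weights = [0.4, 0.3, 0.2, 0.1]  # another linear unbiased estimator

var_mean = linear_estimator_variance(equal_weights, sigma2)
var_other = linear_estimator_variance(unequal_weights, sigma2)
print("unbiased:", is_unbiased_for_mean(equal_weights), is_unbiased_for_mean(unequal_weights))
print("variance of sample mean:     ", var_mean)
print("variance of weighted average:", var_other)
```

Both estimators are linear and unbiased, but the equal-weight version has the smaller variance. This is a single comparison, not a proof that no other set of weights does better.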
d) β-hat is called a consistent estimator of β if its sampling distribution
collapses into a vertical straight line at the point β when the sample
size n goes to infinity.
Let f1(β-hat), f2(β-hat) and f3(β-hat)
denote the sampling distributions of
the same β-hat estimator generated
by three different sample sizes, n1 < n2 < n3.
These sampling distributions are
centered around β, and as the
sample size increases they become
narrower.
Provided that this pattern continues for even larger sample sizes, β-hat is a
consistent estimator of β.
If β-hat is an unbiased estimator then consistency requires the
variance of its sampling distribution to go to zero for increasing n.
For example, X-bar is a consistent estimator of µ.
However, if β-hat is a biased estimator then consistency requires both
its variance and the bias to go to zero for increasing n.
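The narrowing of the sampling distribution of X-bar can be seen in a simulation. The following Python sketch draws repeated samples for three increasing sample sizes and reports the mean and variance of the simulated sample means. The population (normal with µ = 10 and σ = 5), the sample sizes and the number of replications are hypothetical settings chosen for illustration.

```python
import random
import statistics

def simulate_sample_means(n, replications=1000, mu=10.0, sigma=5.0, seed=0):
    """Draw `replications` samples of size n and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(replications)
    ]

sample_sizes = [10, 40, 160]  # n1 < n2 < n3
spread = {}
for n in sample_sizes:
    means = simulate_sample_means(n)
    # Centre and variance of the simulated sampling distribution of X-bar
    spread[n] = (statistics.fmean(means), statistics.variance(means))
    print(n, spread[n])
```

In each case the simulated means should be centered near µ, while their variance shrinks roughly in line with σ²/n as n grows.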
PARAMETRIC AND NONPARAMETRIC TECHNIQUES
• Many statistical procedures for interval estimation and hypothesis testing
a) are concerned with population parameters, and
b) are based on certain assumptions about the sampled population
or about the sampling distribution of some point estimator.
These procedures are usually referred to as parametric procedures.
For example, the confidence interval estimation and hypothesis
testing of a population mean based on the t distribution are parametric
procedures as they are concerned with the population mean and
assume that
i. The sample has been randomly selected.
Otherwise it might not represent the population accurately.
ii. The variable of interest is quantitative …
iii. … and is measured on an interval or a ratio scale.
Otherwise the population mean would not exist and the
central location could be measured only with the mode and
the median (if the measurement scale is at least ordinal).
iv. The population standard deviation is unknown, but the population
is normally distributed, at least approximately.
Procedures that are either not concerned with some population
parameter or are based on relatively weaker assumptions than their
parametric counterparts, and hence require less information about the
sampled population, are called nonparametric procedures.
Note: Nonparametric techniques are sometimes referred to as distribution-free
procedures. This is a bit deceptive as they also rely on some, though
fewer and less stringent, assumptions about the sampled population.
Parametric and nonparametric procedures alike can be misleading when
any of their assumptions is violated, so it is crucial to be familiar with
these assumptions and to learn how to check them in practice.
Never run any inferential statistical procedure without
performing a thorough exploratory data analysis first.
THE ASSUMPTION OF NORMALITY
• A crucial assumption behind most parametric procedures is normality,
namely that the relevant sampling distribution is normal.
For example, in the case of testing a population mean with a
parametric test, either σ should be known and the sample mean
should be normally distributed (Z-test),
or if σ is unknown, the sampled population itself should be
normally distributed (t-test), implying that the sample mean is
also normally distributed.
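The two test statistics differ only in which standard deviation they use. The Python sketch below computes both for a small made-up sample: the Z statistic uses a known σ, while the t statistic replaces it with the sample standard deviation s. The data, µ0 = 10 and σ = 1.2 are hypothetical. Only the normal critical value is computed, because Python's standard library does not include the t distribution.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """Z = (X-bar - mu0) / (sigma / sqrt(n)), for a known population sigma."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """t = (X-bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # divides by n - 1
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))

sample = [12.1, 9.8, 11.4, 10.9, 13.2, 10.3, 11.7, 12.5]  # hypothetical data
mu0 = 10.0                                                 # hypothesised mean

z = z_statistic(sample, mu0, sigma=1.2)         # sigma assumed known
t = t_statistic(sample, mu0)                    # sigma unknown
z_crit = statistics.NormalDist().inv_cdf(0.95)  # right-tail test at the 5% level

print("z =", z, "critical value =", z_crit)
print("t =", t, "(compare with a t critical value on n - 1 degrees of freedom)")
```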
• How can we find out with reasonable certainty whether a population is
normally distributed, at least approximately?
In practice, populations are hardly ever observed entirely. Hence,
we look at the sample data to see if they are more or less normally
distributed, and if they are, then we have grounds to believe that the
sampled population is also normal (or at least not extremely non-normal).
Normality can be checked in a number of ways, relying on
(i) graphs, (ii) sample statistics and (iii) formal hypothesis tests.
i. Checking normality visually
We can use two types of graphs to study whether a data set is
characterised by a normal distribution: histogram and QQ-plot.
The QQ (quantile-quantile) plot is a scatter plot that depicts the
quantiles of the sample data against the corresponding quantiles of
some known probability distribution.
When it is used for checking normality, the reference distribution is a
(standard) normal distribution and if the sample data are normally
distributed, the points on the scatter plot lie close to a straight line.
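The construction of a normal QQ-plot can be sketched without any plotting library. The Python example below pairs each ordered observation with a standard normal quantile and then summarises how closely the points follow a straight line with their correlation coefficient. The plotting positions (i − 0.5)/n are one common convention and are an assumption here, as are the two simulated samples and the function names.

```python
import math
import random
import statistics

def normal_qq_points(sample):
    """Pair standard normal quantiles with the ordered sample values."""
    n = len(sample)
    std_normal = statistics.NormalDist()
    ordered = sorted(sample)
    # Plotting positions (i - 0.5) / n: one common convention, assumed here
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def qq_correlation(sample):
    """Pearson correlation of the QQ points; near 1 when they lie close to a line."""
    points = normal_qq_points(sample)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is reproducible
normal_sample = [rng.gauss(50, 10) for _ in range(300)]      # simulated normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(300)]   # simulated right-skewed data

r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print("normal sample:", r_normal)
print("skewed sample:", r_skewed)
```

The correlation is only a rough numerical summary of the picture. It should be close to 1 for the normal sample and visibly lower for the right-skewed one, whose QQ points bend away from the line.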
Ex 1: (Week 1, Ex 2)
Last week we performed a t-test to find out whether there was sufficient
evidence at the 5% level of significance to establish that the average Australian
is more than 10 kg overweight.
The sample size was large enough (n = 100) to rely on the CLT, so the sampling
distribution of the sample mean could be assumed approximately normal.
However, σ was unknown, so we had to assume that the sampled population
was not extremely non-normal in order to be able to rely on the t-test.
a) Develop a histogram and a QQ-plot of Diff with R to see whether the
sampled population might be normally distributed.
[Figures: (1) Histogram of diff with a normal curve that has the same mean and
standard deviation as the sample of diff; (2) QQ-plot of diff]
The histogram is skewed to the right and on the QQ-plot the points are
scattered around the straight line rather than lying on it. Hence, both
graphs suggest that diff is unlikely to be normally distributed.