REGRESSION MODELLING
WEEK
REVISION OF BASIC STATISTICS
POPULATION AND SAMPLE
Population (True world)
• A collection of the whole of something you are interested in.
• Parameters: true values describing the population, for example
µ, σ². They are usually unknown.
Sample (Your subjective world)
• A set of individuals randomly drawn from a population.
• Statistics: quantities calculated from the sample that serve as
estimators of the parameters, for example X̄, S².
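The distinction between parameters and statistics can be illustrated with a small sketch. It assumes a hypothetical finite population of 100,000 simulated values (the mean of 50, standard deviation of 10, sample size of 200, and all variable names are illustrative choices, not taken from the course material). In practice the population is not fully observed, so µ and σ² cannot be computed this way.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (illustrative values only)
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]

# Parameters: computed from the whole population (usually unknown in practice)
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population, mu)

# Sample: n individuals drawn at random, without replacement
n = 200
sample = random.sample(population, n)

# Statistics: computed from the sample, used as estimators of the parameters
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)  # sample variance, divides by n - 1

print(f"mu     = {mu:8.2f}   x_bar = {x_bar:8.2f}")
print(f"sigma2 = {sigma2:8.2f}   s2    = {s2:8.2f}")
```

The sample values are close to, but not equal to, the population values. A different random sample would give different values of X̄ and S².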
PROPERTIES OF ESTIMATORS
• An estimator is a random variable, e.g., X̄.
• It has a probability distribution, often called the sampling distribution.
• For a random sample of n independent observations,
E(X̄) = µ and Var(X̄) = σ²/n.
• Central Limit Theorem (CLT): X̄ is asymptotically normally
distributed as n → ∞.
• It is used to make statistical inferences: confidence intervals and
hypothesis testing.
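The properties above can be checked by simulation. The following is a minimal sketch, assuming an exponential population with µ = 1 and σ² = 1 (chosen here only because it is clearly non-normal), a sample size of n = 30, and 20,000 repeated samples. None of these settings come from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

mu, sigma2 = 1.0, 1.0   # mean and variance of an Exponential(rate=1) population
n = 30                  # size of each sample (assumed)
reps = 20_000           # number of repeated samples (assumed)

# Draw many samples and record the sample mean of each one;
# the collection of means approximates the sampling distribution of X-bar.
means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(means)   # should be close to mu
var_of_means = statistics.variance(means) # should be close to sigma2 / n

# CLT check: if X-bar is roughly normal, about 95% of the means
# should fall within 1.96 standard errors of mu.
se = (sigma2 / n) ** 0.5
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(f"mean of sample means: {mean_of_means:.4f} (mu = {mu})")
print(f"variance of sample means: {var_of_means:.4f} (sigma2/n = {sigma2 / n:.4f})")
print(f"share within 1.96 standard errors: {coverage:.3f}")
```

Even though the individual observations are strongly skewed, the sample means cluster around µ with a spread close to σ²/n, which is what the CLT leads us to expect for moderately large n.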
REGRESSION
WHAT IS REGRESSION?
• Statistical methodology that describes the relation between two
or more variables so that a response or outcome variable can be
estimated from the other explanatory variables.
• This methodology is widely used in business, the social and
behavioural sciences, the biological sciences, and many other
disciplines.
WHAT IS REGRESSION?
Examples
• Predict the sales of a product using the relationship between sales
and the amount spent on advertising (SLR).
• Predict the performance of an employee using the relationship between
performance and aptitude test score (SLR).
• Predict the size of a child's vocabulary using the relationship
between vocabulary size, the age of the child, and the amount of
education of the parents (MLR).
• Does the price of a house increase with an increase in living area?
(MLR)
RELATION BETWEEN VARIABLES
We should distinguish between a functional relation and a statistical
relation between variables.
• A functional relation between two variables is expressed as a
mathematical formula,
Y = f(X).
A functional relation is a “perfect” mapping from X to Y.
• A statistical relation is not perfect. The observations do not fall
directly on the curve of relationship and they are typically
scattered around this curve.
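The contrast between the two kinds of relation can be sketched in a few lines. The curve f(X) = 2 + 0.5X and the normally distributed scatter with standard deviation 1 are hypothetical choices made only for illustration.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible


def f(x):
    """Hypothetical curve of relationship (chosen for illustration)."""
    return 2.0 + 0.5 * x


xs = [float(x) for x in range(1, 11)]

# Functional relation: every Y lies exactly on the curve
y_functional = [f(x) for x in xs]

# Statistical relation: each Y is scattered around the curve
y_statistical = [f(x) + random.gauss(0.0, 1.0) for x in xs]

for x, yf, ys in zip(xs, y_functional, y_statistical):
    print(f"x={x:4.1f}  functional={yf:5.2f}  "
          f"statistical={ys:5.2f}  deviation={ys - yf:+.2f}")
```

In the functional column the deviation from the curve is always zero. In the statistical column the deviations vary from observation to observation.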
RELATIONSHIP BETWEEN VARIABLES
REGRESSION MODELS
Historical Origins
• The term regression was first used by Francis Galton in the late
19th century to explain a biological phenomenon he observed:
“regression towards the mean”.
• The height of children of both tall and short parents appeared to
“revert” or “regress” to the mean of the group.
GALTON’S DATASET
This data set lists the individual observations for 934 children in
205 families on which Galton (1886) based his cross-tabulation.
How to formally describe the relationship?