REGRESSION MODELLING
WEEK REVISION OF BASIC STATISTICS

POPULATION AND SAMPLE

Population (True world)
• A collection of the whole of something you are interested in.
• Parameters: true values describing the population, for example µ, σ². They are usually unknown.

Sample (Your subjective world)
• A set of individuals randomly drawn from a population.
• Statistics: quantities calculated from the sample that serve as estimators of the parameters, for example X̄, S².

PROPERTIES OF ESTIMATORS
• An estimator is a random variable, e.g., X̄.
• It has a probability distribution, often called the sampling distribution.
• E(X̄) = µ, Var(X̄) = σ²/n.
• Central Limit Theorem (CLT): X̄ is asymptotically normally distributed.
• These properties let us make statistical inferences: confidence intervals and hypothesis tests.

REGRESSION

WHAT IS REGRESSION?
• A statistical methodology that describes the relation between two or more variables so that a response (outcome) variable can be estimated from the other, explanatory variables.
• This methodology is widely used in business, the social and behavioural sciences, the biological sciences, and many other disciplines.

Examples
• Predict sales of a product using the relationship between sales and the amount spent on advertising (SLR).
• Predict the performance of an employee using the relationship between performance and an aptitude test (SLR).
• Predict the size of a child's vocabulary using the relationship between vocabulary size, the age of the child, and the amount of education of the parents (MLR).
• Does the price of a house increase with an increase in living area? (MLR)

RELATION BETWEEN VARIABLES
We should distinguish between a functional relation and a statistical relation between variables.
• A functional relation between two variables is expressed as a mathematical formula, Y = f(X). A functional relation is a "perfect" mapping from X to Y.
• A statistical relation is not perfect. The observations do not fall directly on the curve of relationship; they are typically scattered around it.

REGRESSION MODELS
Historical Origins
• The term regression was first used by Francis Galton in the late 19th century to explain a biological phenomenon he observed: "regression towards the mean".
• The heights of children of both tall and short parents appeared to "revert" or "regress" to the mean of the group.

GALTON'S DATASET
This data set lists the individual observations for 934 children in 205 families on which Galton (1886) based his cross-tabulation. How can we formally describe the relationship?
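
One way to make the closing question concrete is to fit a straight line, child ≈ b0 + b1 × parent, by least squares. The sketch below is only an illustration under stated assumptions: it does not use Galton's data. The heights are simulated, and the mean (68), spread (2) and true slope (0.6) are made-up values chosen so that "regression towards the mean" is visible. The helper name fit_line is hypothetical. Only the sample size, 934, is borrowed from the slide.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y ≈ b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared deviations of x.
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Synthetic heights (NOT Galton's observations); all numbers are assumptions.
rng = np.random.default_rng(0)
n = 934
parent = rng.normal(loc=68.0, scale=2.0, size=n)
# A statistical relation: a linear trend plus random scatter around it.
child = 68.0 + 0.6 * (parent - 68.0) + rng.normal(loc=0.0, scale=2.0, size=n)

b0, b1 = fit_line(parent, child)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

With a fitted slope between 0 and 1, a parent one unit above the mean is paired, on average, with a child less than one unit above the mean, which is the pattern Galton called regression towards the mean. The scatter term is what separates this statistical relation from a functional one.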