STA303H1: Methods of Data Analysis
To summarize
• In categorical data analysis our outcome (response) is categorical or discrete
• So far we have assumed the covariate (independent variable) is also categorical
• We can also deal with a third variable and investigate whether it is a confounder or an interaction variable
• Interaction means that the third variable modifies the effect of our exposure
• What about a fourth, fifth, or sixth variable?
• Real-life data almost always contain a large number of variables
• We learned how to measure association. But what about prediction?
• What about continuous independent variables?
Regression of Binary Variable
• What is regression?
• Let Y be a continuous response and X a continuous covariate
• Then we can assume the relationship E(Y|X) = β0 + β1X
• The residuals Y − E(Y|X) have mean 0 and equal variance (homoscedasticity)
• Now assume Y is a binary variable. Can we still have E(Y|X) = β0 + β1X?
• Since Y = 0, 1, then 0 < E(Y|X) < 1 ⇒ 0 < β0 + β1X < 1
• This constraint may not hold, since a linear predictor is unbounded
• Recall E(Y|X) = π is a probability for a binary variable
• What if we take the log? That is, log(E(Y|X)) = β0 + β1X
• Since 0 < E(Y|X) < 1, then −∞ < β0 + β1X < 0
• Then what should be the approach?
Regression of Binary Variable
• Recall E(Y|X) can be defined as a risk, i.e., E(Y|X) = P(Y = 1|X)
• Due to the restricted range (−∞, 0) on the log scale, it is difficult to model the risk this way
• But what about the odds, Ω = E(Y|X) / (1 − E(Y|X))
• Then log odds will have a range (−∞,∞)
• Thus, we can model,
  log( E(Y|X) / (1 − E(Y|X)) ) = β0 + β1X
• This is the form of the famous logistic regression
• The link log-odds is also called the ‘logit’ link (What is a link function?)
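To make this concrete, here is a minimal sketch of fitting a logistic regression in Python with statsmodels, assuming simulated data (the coefficient values −0.5 and 1.2 are purely illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(303)
    n = 500
    x = rng.normal(size=n)                  # continuous covariate
    eta = -0.5 + 1.2 * x                    # linear predictor (illustrative betas)
    pi = 1 / (1 + np.exp(-eta))             # inverse logit gives P(Y = 1 | X)
    y = rng.binomial(1, pi)                 # binary response

    X = sm.add_constant(x)                  # design matrix with intercept
    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()  # logit is the default link
    print(fit.params)                       # estimates of (beta0, beta1)
    print(np.exp(fit.params[1]))            # odds ratio per one-unit increase in X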
Logistic Regression
• Continuing our previous discussion. Let Y = 0, 1 be a binary outcome, X = 0, 1 our exposure of interest, and Z = 0, 1 a third variable
• Assume the model,
  log( E(Y|X,Z) / (1 − E(Y|X,Z)) ) = β0 + β1X + β2Z
• When Z = 0, the odds ratio is θZ=0 = exp(β1)
• When Z = 1, the odds ratio is θZ=1 = exp(β1)
• Thus the interpretation of exp(β1) is that when Z is fixed at any constant value, the odds ratio between X = 1 and X = 0 is exp(β1); a numeric sketch follows below
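A quick numeric illustration (the coefficients below are hypothetical, not from any fitted model): under the no-interaction model, the odds ratio comparing X = 1 to X = 0 is exp(β1) at either level of Z.

    import numpy as np

    b0, b1, b2 = -1.0, 0.8, 0.5        # hypothetical coefficients

    def odds(x, z):
        # odds of Y = 1 under the no-interaction model
        return np.exp(b0 + b1 * x + b2 * z)

    print(odds(1, 0) / odds(0, 0))     # OR at Z = 0: exp(b1) ~ 2.23
    print(odds(1, 1) / odds(0, 1))     # OR at Z = 1: the same exp(b1)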
Logistic Regression
• However, when interaction exists, the model is,
  log( E(Y|X,Z) / (1 − E(Y|X,Z)) ) = β0 + β1X + β2Z + β12XZ
• When Z = 0, the odds ratio is θZ=0 = exp(β1)
• When Z = 1, the odds ratio is θZ=1 = exp(β1 + β12)
• The odds ratios have to be interpreted separately for the levels of Z
• The interpretation of exp(β12) is how much the odds ratio changes with the level of Z
• Often referred to as the ratio of odds ratios (ratio-in-ratio parameter); see the sketch below
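Extending the same hypothetical numbers with an interaction term shows how the two odds ratios separate, and how their ratio recovers exp(β12):

    import numpy as np

    b0, b1, b2, b12 = -1.0, 0.8, 0.5, 0.6   # hypothetical coefficients

    def odds(x, z):
        # odds of Y = 1 under the interaction model
        return np.exp(b0 + b1 * x + b2 * z + b12 * x * z)

    or_z0 = odds(1, 0) / odds(0, 0)    # exp(b1)
    or_z1 = odds(1, 1) / odds(0, 1)    # exp(b1 + b12)
    print(or_z0, or_z1)
    print(or_z1 / or_z0)               # ratio of odds ratios: exp(b12)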
• How do we estimate the βs?
• First, start with linear models
Linear Models
• Assume the following linear regression model,
Y = Xβ + ϵ
• Here, Y is an n × 1 vector, X is an n × p matrix of covariates, and β is a p × 1 vector of regression coefficients
• To estimate β, the target is to minimize ϵᵀϵ w.r.t. β, i.e.,
  βˆ = argmin_β (Y − Xβ)ᵀ(Y − Xβ)
  This is called the ordinary least squares (OLS) criterion
• The least squares estimates are βˆ = (XᵀX)⁻¹ XᵀY
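A minimal numpy sketch of the closed-form OLS solution, with simulated data (the true betas are illustrative); np.linalg.solve is used rather than explicitly inverting XᵀX:

    import numpy as np

    rng = np.random.default_rng(303)
    n, p = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # n x p design with intercept
    beta_true = np.array([1.0, 2.0, -0.5])                          # illustrative values
    y = X @ beta_true + rng.normal(scale=0.5, size=n)

    # Closed-form OLS: solve (X'X) beta = X'y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)                                                 # close to beta_true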
Linear Models
• Recall the OLS method produces the same estimates as the MLE assuming Y ∼ N(Xβ, σ²I), equivalent to assuming ϵ ∼ N(0, σ²I), where I is the n × n identity matrix. The variances of the estimates differ (Gauss–Markov assumptions)
• However, OLS cannot be used for Generalized Linear Models (GLMs), since in most cases Y is not continuous
• Thus estimation in GLMs is conducted via MLE
• But before understanding the estimation procedures we need to understand a few related topics, such as the link function and exponential families
• That is the goal for this lecture
Exponential Families
• Let Y ∼ fY(y; θ, ϕ). If fY belongs to the exponential family, then it can be written as,
  fY(y; θ, ϕ) = exp[ (yθ − b(θ)) / a(ϕ) + c(y, ϕ) ]
• Here,
• θ is a canonical (natural) parameter
• a(.), b(.) and c(.) are known functions
• ϕ is a dispersion parameter
Normal Distribution
For the normal distribution we know,

  fY(y; µ, σ²) = (1/√(2πσ²)) exp[ −(y − µ)² / (2σ²) ]
               = exp[ (yµ − µ²/2) / σ² − (1/2)( y²/σ² + log(2πσ²) ) ]
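As a sanity check, the exponential-family form above can be evaluated numerically and compared against scipy's normal density (the test values for y, µ, σ² are arbitrary):

    import numpy as np
    from scipy.stats import norm

    y, mu, sigma2 = 1.3, 0.5, 2.0    # arbitrary test values

    # Exponential-family form: theta = mu, b(theta) = theta^2 / 2, a(phi) = sigma^2
    theta = mu
    ef = np.exp((y * theta - theta**2 / 2) / sigma2
                - 0.5 * (y**2 / sigma2 + np.log(2 * np.pi * sigma2)))

    print(ef, norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))  # the two values agree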
Exponential Families (Example)
Normal Distribution
• Here, θ = µ, and,
• b(θ) = θ²/2, a(ϕ) = σ², c(y, ϕ) = −(1/2)( y²/σ² + log(2πσ²) )
• Now assume for the general case the log-likelihood is
  L = log(fY(y; θ, ϕ)) = (yθ − b(θ)) / a(ϕ) + c(y, ϕ)
• According to Bartlett’s first identity for the exponential family, E(∂L/∂θ) = 0
• According to Bartlett’s second identity,
  E(∂²L/∂θ²) + E[(∂L/∂θ)²] = 0
Exponential Families
• Here,
  E(∂L/∂θ) = (µ − b′(θ)) / a(ϕ) = 0
  ⇒ µ = b′(θ)
• Using the second identity,
  E(∂²L/∂θ²) + E[(∂L/∂θ)²] = 0
  ⇒ −b′′(θ)/a(ϕ) + Var(Y)/a²(ϕ) = 0
  ⇒ Var(Y) = b′′(θ) a(ϕ)
• Often Var(Y) is written as V(µ) = b′′(θ) a(ϕ). V(µ) is a function of µ, which can be constant w.r.t. µ, e.g., for the normal (where V(µ) = σ²)
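These identities can be checked symbolically; a small sympy sketch for the normal case (b(θ) = θ²/2, a(ϕ) = σ²):

    import sympy as sp

    theta, sigma2 = sp.symbols('theta sigma2', positive=True)

    b = theta**2 / 2                      # normal: b(theta) = theta^2 / 2
    print(sp.diff(b, theta))              # b'(theta) = theta = mu
    print(sp.diff(b, theta, 2) * sigma2)  # b''(theta) a(phi) = sigma^2 = Var(Y)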
Exponential Families
• For Y = 0, 1, 2, ... having a Poisson(µ) distribution,
• fY(y, µ) = exp(−µ) µ^y / y! = exp( −µ + y log(µ) − log(y!) )
• Here θ = log(µ)
• b(θ) = µ = exp(log(µ)) = exp(θ)
• a(ϕ) = 1
• E (Y ) = b′(θ) = exp(θ) = µ
• V (µ) = b′′(θ)a(ϕ) = µ
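A quick numeric check against scipy's Poisson moments (µ = 3.5 is an arbitrary test value):

    import numpy as np
    from scipy.stats import poisson

    mu = 3.5
    theta = np.log(mu)                       # canonical parameter theta = log(mu)
    print(np.exp(theta), poisson.mean(mu))   # E(Y) = b'(theta) = exp(theta) = mu
    print(np.exp(theta), poisson.var(mu))    # Var(Y) = b''(theta) a(phi) = mu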
Exponential Families
• For Y = 0, 1 having a Bernoulli(π) distribution,
• fY(y, π) = π^y (1 − π)^(1−y) = exp( y log(π/(1−π)) + log(1 − π) )
• Here θ = log( π / (1 − π) )
• b(θ) = log(1 + exp(θ)) (HOW?? Since π = exp(θ)/(1 + exp(θ)), we have log(1 − π) = −log(1 + exp(θ)) = −b(θ))
• a(ϕ) = 1
• E(Y) = b′(θ) = exp(θ) / (1 + exp(θ)) = π
• V(µ) = b′′(θ) = exp(θ) / (1 + exp(θ))² = π(1 − π)
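The Bernoulli derivatives can likewise be verified symbolically with sympy (both differences simplify to zero):

    import sympy as sp

    theta = sp.symbols('theta')
    pi = sp.exp(theta) / (1 + sp.exp(theta))     # inverse logit
    b = sp.log(1 + sp.exp(theta))                # Bernoulli: b(theta)

    print(sp.simplify(sp.diff(b, theta) - pi))                # 0: b'(theta) = pi
    print(sp.simplify(sp.diff(b, theta, 2) - pi * (1 - pi)))  # 0: b''(theta) = pi(1 - pi)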
Link Function
• Going back to regression models, we need to model Y on covariates X
• As we have discussed before, logistic regression has a logit link and linear regression has an identity link
• What is a link?
• Let’s assume the linear predictor is η = Xβ
Link Function
The link function relates the linear predictor η to the expected value µ
• Let’s define a function g(.) for which g(µ) = η
• For canonical parameters we can have θ = g(µ) = η = Xβ. Thus, this specific
function is called a Canonical Link
• For the Binomial model θ = log( π / (1 − π) ), so the logit is the canonical link
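In statsmodels, the default link for each GLM family is the canonical one; a short sketch printing them (the printed objects are link instances):

    import statsmodels.api as sm

    # Default (canonical) links for common GLM families
    print(sm.families.Binomial().link)   # logit link: log(pi / (1 - pi))
    print(sm.families.Poisson().link)    # log link: log(mu)
    print(sm.families.Gaussian().link)   # identity link: mu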