STATS330: Advanced Statistical Modelling
1. (a) If the Poisson model was appropriate, under certain conditions we would expect
the residual deviance to come from an approximate Chi-squared distribution with
30 degrees of freedom. This approximation is reasonable as long as the expected counts under the model are reasonably large (greater than about 5). The Poisson model has a residual deviance (97.05)
that is much larger than the residual degrees of freedom (30) which indicates lack
of fit. A possible explanation is that the data exhibits more variability than can be
accounted for by the Poisson model, i.e., over-dispersion.
The residual deviance for the negative binomial model is 40.15, again with 30 degrees
of freedom. We do not have evidence against the null hypothesis that the negative
binomial model is appropriate.
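These goodness-of-fit comparisons amount to chi-squared tail probabilities, which can be checked directly in R using the deviances and degrees of freedom quoted above:

pchisq(97.05, df = 30, lower.tail = FALSE)   # Poisson: essentially zero, clear lack of fit
pchisq(40.15, df = 30, lower.tail = FALSE)   # negative binomial: about 0.10, no evidence against the model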
The fact that the negative binomial model has a smaller residual deviance indicates
that it fits the data better than the Poisson model. This is also reflected in the fact
that the negative binomial model has a smaller AIC than the Poisson model (176.1
compared with 196.4) and a difference of more than 10 suggests that the model with
the smaller AIC (negative binomial model) is considerably better supported than
the model with the higher AIC (Poisson model).
We now need to choose between the negative binomial model and the quasi-Poisson
model. If we consider the residual plots, we can see that there doesn’t appear to be a
discernible pattern in the Pearson and deviance residual plots for the Poisson model,
yet we are seeing more variability than we would like. We would expect the bulk of
the residuals to range between -3 and +3, but there are a few outside this range. We
can cope with a wider spread in the plotted Pearson residuals for the quasi-Poisson
model because we are accounting for the dispersion parameter which is not taken
into account in the plots in R. However, smaller residuals are better than larger
residuals so we prefer the negative binomial since the residuals all lie between -3 and
+3. Furthermore, if we consider the Cook’s distance plot, observation 1 is clearly
influential for the quasi-Poisson model, but not for the negative binomial model.
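As a rough sketch of how the three competing fits and their diagnostics could be obtained in R (the data frame name apprentice, the response app and the right-hand side of the formula are assumptions here, not taken from the question):

library(MASS)   # glm.nb for the negative binomial fit
pois.fit  <- glm(app ~ log(dist) + urban, family = poisson, data = apprentice)
quasi.fit <- glm(app ~ log(dist) + urban, family = quasipoisson, data = apprentice)
nb.fit    <- glm.nb(app ~ log(dist) + urban, data = apprentice)
c(deviance(pois.fit), df.residual(pois.fit))   # residual deviance against its degrees of freedom
AIC(pois.fit, nb.fit)                          # AIC comparison (AIC is not meaningful for quasi.fit)
plot(pois.fit); plot(nb.fit)                   # residual and leverage/Cook's distance diagnostics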
(b) i. We would expect that, everything else being equal, more apprentices would
migrate to Edinburgh from counties with large populations than would from
counties with small populations. Therefore, we know (or at least have a very
good idea about) the relationship between population size and the number of
apprentices and do not need to estimate it. The offset is taken as the log of the population in thousands, log(Population/1000); the log is needed because the Poisson model uses a log link.
ii. The fitted model now represents the number of apprentices migrating to Edin-
burgh as a count per 1000 population, i.e., a rate rather than a count.
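A hedged sketch of the corresponding model call (again the object, data frame and variable names are assumptions; only the form of the offset is the point here):

# the offset enters on the log scale, so the model describes counts per 1000 population
pois.rate.fit <- glm(app ~ log(dist) * urban + offset(log(Population / 1000)),
                     family = poisson, data = apprentice)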
(c) Fitted model for the expected count:
log(µ) = β0 + β1 × log(dist) + β2 × urban + β3 × log(dist) × urban,
where µ is the expected number of apprentices per 1000 population.
Filling in dist=60 and urban=15 and exponentiating gives an estimated count of
0.3469 per 1000 population. Because the population for the hypothetical county
is 28000, we multiply this number by 28 to give 9.71. So the expected number of
apprentices to migrate to Edinburgh from a hypothetical Scottish county situated
60km away with 15% living in an urban area is 9.71.
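The calculation can be reproduced from the fitted coefficients (a sketch only: the coefficient order in negbin2.fit is assumed to be intercept, log(dist), urban, interaction):

b <- coef(negbin2.fit)
rate <- exp(b[1] + b[2] * log(60) + b[3] * 15 + b[4] * log(60) * 15)   # about 0.3469 per 1000 population
rate * 28000 / 1000                                                    # about 9.71 apprentices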
(d) In order to answer this question, we need to take into account the impact of the
interaction because the impact of distance depends on the amount of urbanization,
and vice versa. First, suppose we fix the value of urban at A. Then the coefficient for log(dist) will be −1.269 − 0.03484 × A. Because this coefficient is negative for any value of urban in the data (note that urban must be ≥ 0), log(µ) will always decrease as log(dist) increases under our model. Consequently, µ decreases as dist increases for any fixed value of urban. This is consistent with the statement in the article. Next, suppose we fix the value of dist at D. Then the coefficient for urban will be 0.11953 − 0.03484 × log(D). In this case, we need to try out some different values of dist. Let's try the minimum (21.0), the mean (131.8) and the maximum (491.0) values from the data:
Min 21.0: 0.11953 − 0.03484 × log(21) = 0.01346
Mean 131.8: 0.11953 − 0.03484 × log(131.8) = −0.0505
Max 491.0: 0.11953 − 0.03484 × log(491) = −0.09635
So, according to the model negbin2.fit, there is a positive relationship between the
expected number of apprentices and urban for small values of dist. However, as
the value of dist increases, this coefficient decreases and is approximately equal to
0 when dist = 30.9 (since exp(0.11953/0.03484) = 30.9). Thus for values of dist >
30.9, the coefficient will be negative. In summary, we can say that the statement in
the article is supported when dist > 30.9, but not when dist < 30.9.
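These coefficient values and the crossover distance can be checked with a couple of lines of R, using the estimates quoted above:

0.11953 - 0.03484 * log(c(21, 131.8, 491))   # 0.0135, -0.0505, -0.0963
exp(0.11953 / 0.03484)                       # about 30.9, where the coefficient of urban changes sign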
2. (a) Holding all other explanatory variables constant, for each additional child under 7 years old the odds of a married woman participating in the labour force are multiplied by exp(−1.1857) = 0.31.
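The quoted factor is just the exponentiated coefficient for ykids (a sketch; the fitted-model name lf.fit is a hypothetical object name, not given in the question):

exp(-1.1857)                 # = 0.31, multiplicative change in the odds per extra child under 7
exp(coef(lf.fit)["ykids"])   # the same quantity taken from the fitted object (hypothetical name)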
(b) Note that this question should read:
LFpart=yes when inc=10, age=40, educ=15, ykids=0, okids=0, foreign=yes.
The predict function performs the back-transform from the logit scale for us, so the
confidence interval is given by:
0.8280919 ± 1.96 × 0.05412191 = (0.722, 0.934)
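A sketch of how these numbers arise from predict (the fitted-model name lf.fit is an assumption; the point estimate 0.8280919 and standard error 0.05412191 are those quoted from Appendix 2.2):

new <- data.frame(inc = 10, age = 40, educ = 15, ykids = 0, okids = 0, foreign = "yes")
pr  <- predict(lf.fit, newdata = new, type = "response", se.fit = TRUE)
pr$fit + c(-1, 1) * 1.96 * pr$se.fit   # 0.828 +/- 1.96 * 0.0541 = (0.722, 0.934)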
(c) i. A non-parametric bootstrap was used given that we are resampling from the
original data with replacement.
samp <- sample(1:872, replace = TRUE)
ii. Note that this question should read:
LFpart=yes when inc=10, age=40, educ=15, ykids=0, okids=0, foreign=yes.
Our estimate for the probability was 0.8280919 from appendix 2.2 and we need
to invert the confidence interval.
2 × 0.8280919 − 0.9171 = 0.739 and 2 × 0.8280919 − 0.6911 = 0.965
So the 95% confidence interval is (.739, .965).
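A minimal non-parametric bootstrap sketch consistent with the resampling line in (c)i (the data frame name lfdata, the model formula and the newdata row are assumptions; 0.9171 and 0.6911 stand in for the bootstrap quantiles quoted from the appendix):

new <- data.frame(inc = 10, age = 40, educ = 15, ykids = 0, okids = 0, foreign = "yes")
boot.p <- replicate(1000, {
  samp <- sample(1:872, replace = TRUE)                      # resample rows with replacement
  fit  <- glm(LFpart ~ inc + age + educ + ykids + okids + foreign,
              family = binomial, data = lfdata[samp, ])
  predict(fit, newdata = new, type = "response")             # bootstrap estimate of the probability
})
q <- quantile(boot.p, c(0.975, 0.025))                       # e.g. 0.9171 and 0.6911
2 * 0.8280919 - q                                            # inverted interval: (0.739, 0.965)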
(d) i. Sensitivity is the probability of a true positive. For this data, sensitivity is the probability of predicting that a woman participated in the labour force when she actually did participate.
Specificity is the probability of a true negative. For this data, specificity is the probability of predicting that a woman did not participate in the labour force when she actually did not participate.
ii. Calculations of the estimates:
sensitivity = 255/(146 + 255) = 0.636
specificity = 337/(337 + 134) = 0.715
error rate = (146 + 134)/(146 + 255 + 337 + 134) = 0.321
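The same numbers fall out of the confusion matrix directly; the arrangement of the counts below is inferred from the calculations above:

conf <- matrix(c(337, 134,    # actual no : predicted no, predicted yes
                 146, 255),   # actual yes: predicted no, predicted yes
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("no", "yes"), predicted = c("no", "yes")))
conf["yes", "yes"] / sum(conf["yes", ])              # sensitivity = 255/401 = 0.636
conf["no", "no"]   / sum(conf["no", ])               # specificity = 337/471 = 0.715
(conf["yes", "no"] + conf["no", "yes"]) / sum(conf)  # error rate  = 280/872 = 0.321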
iii. In this case, we are using the same data both to fit the model and to estimate the quantities above, so the estimates will be optimistic: they overstate how well the model would classify new data. We could gather new data and assess the predictions on it; split the data into a training set and a test set (although our sample might not be big enough for this); or use cross-validation, as sketched below.
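A sketch of the training/test idea just mentioned (the data frame name lfdata and the formula are assumptions; cross-validation repeats the same idea over several splits):

set.seed(1)
train  <- sample(1:872, size = round(0.7 * 872))   # 70% of the rows for training
fit    <- glm(LFpart ~ inc + age + educ + ykids + okids + foreign,
              family = binomial, data = lfdata[train, ])
p.test <- predict(fit, newdata = lfdata[-train, ], type = "response")
pred   <- ifelse(p.test > 0.5, "yes", "no")
mean(pred != lfdata$LFpart[-train])                # out-of-sample error rate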
(e) i. 0.467 is the threshold level c (predict “yes” if the estimated probability is >
c) which maximises the sum of the sensitivity and specificity. The estimated
specificity for this c = .467 is 0.677 and the estimated sensitivity is 0.713.
ii. The true positive rate is the sensitivity and the false positive rate is 1 − specificity. So in this case we need sensitivity ≥ 0.7 and specificity ≥ 0.8. The point (0.8, 0.7) lies above the ROC curve, which indicates this is not possible.
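The threshold in (e)i can be found with a simple grid search over candidate cut-offs (a sketch: lf.fit and lfdata$LFpart are assumed names; the value 0.467 is the one quoted in the question):

p.hat <- fitted(lf.fit)                      # fitted probabilities from the model
cands <- sort(unique(p.hat))                 # candidate thresholds
sens  <- sapply(cands, function(thr) mean(p.hat[lfdata$LFpart == "yes"] > thr))
spec  <- sapply(cands, function(thr) mean(p.hat[lfdata$LFpart == "no"] <= thr))
cands[which.max(sens + spec)]                # cut-off maximising sensitivity + specificity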
3. (a) D, E and F have direct causal effects on G.
(b) A, C and D should be used as explanatory variables.
(c) A and C should be used as explanatory variables.
(d) i. A and C are confounders.
ii. G is the only collider.