Generalized Linear Models MATH 523
Q1 Consider a binomial GLM with an arbitrary link function g and n responses that have
been entered in a grouped format. Using the same notation as in the lecture notes,
show that:
(1) The maximum likelihood estimates of β do not depend on whether the data have
been entered in a grouped or ungrouped format.
(2) The Fisher information matrix does not depend on whether the data have been
entered in a grouped or ungrouped format. Conclude that the asymptotic covariance
matrix of $\hat{\beta}$ (and consequently the standard errors of $\hat{\beta}_j$, $j = 1, \dots, p$) does
not depend on the data entry format. Hint: It is easiest if you verify the entry
at position (j, k) of the Fisher information for arbitrary j, k, rather than doing
the matrix multiplication.
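A numerical sanity check of both claims (not a proof) can be sketched in R on simulated data. Everything in the sketch is hypothetical: the covariate values, the seed, the true coefficients, and the object names. The probit link is used only to illustrate a non-canonical choice of g.

```r
# Simulated illustration; all names and values are assumptions, not course data.
set.seed(523)
x <- rep(c(0.5, 1.0, 1.5, 2.0), times = c(20, 25, 30, 25))   # 4 distinct covariate values
y <- rbinom(length(x), size = 1, prob = pnorm(-1 + 0.8 * x)) # binary responses

# Ungrouped format: one row per Bernoulli trial.
fit_ungrouped <- glm(y ~ x, family = binomial(link = "probit"))

# Grouped format: one row per distinct x, with success counts and trial counts.
grouped <- data.frame(
  x         = sort(unique(x)),
  successes = as.vector(tapply(y, x, sum)),
  trials    = as.vector(tapply(y, x, length))
)
fit_grouped <- glm(cbind(successes, trials - successes) ~ x,
                   family = binomial(link = "probit"), data = grouped)

# Largest absolute differences in the estimates and in their standard errors.
coef_diff <- max(abs(coef(fit_ungrouped) - coef(fit_grouped)))
se_diff   <- max(abs(sqrt(diag(vcov(fit_ungrouped))) -
                     sqrt(diag(vcov(fit_grouped)))))
print(c(coef_diff = coef_diff, se_diff = se_diff))
```

Both differences should be zero up to the convergence tolerance of the fitting algorithm. The deviances of the two fits will generally differ, which is not part of the claim.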
Q2 Suppose that $m_i Y_i$ is binomial $(m_i, \pi_i)$, where $g(\pi_i) = X_i \beta$ and $i = 1, \dots, n$. Consider
the null model, for which $\pi_1 = \dots = \pi_n$. Show that
$$\hat{\pi} = \frac{\sum_{i=1}^{n} m_i y_i}{\sum_{i=1}^{n} m_i}.$$
When $m_i = 1$ for all $i \in \{1, \dots, n\}$, show that the Pearson $X^2$ statistic,
which is defined as the sum of the squared Pearson residuals, equals $n$. Decide whether
or not $X^2$ is useful for testing whether a binomial GLM fits the data well when
the response is binary.
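The two identities in Q2 can be checked numerically before attempting the derivation. The sketch below assumes simulated binary data (so every m_i equals 1) and the logit link. The sample size, the success probability, the seed, and the object names are hypothetical.

```r
# Simulated binary responses; all values here are assumptions for illustration.
set.seed(1)
n <- 200
y <- rbinom(n, size = 1, prob = 0.3)

# Null model: intercept only, so all fitted probabilities coincide.
fit_null <- glm(y ~ 1, family = binomial)
pi_hat   <- unname(fitted(fit_null)[1])

# Pearson statistic: sum of the squared Pearson residuals.
X2 <- sum(residuals(fit_null, type = "pearson")^2)

# With m_i = 1, the formula for the fitted probability reduces to mean(y).
print(c(pi_hat = pi_hat, ybar = mean(y), X2 = X2, n = n))
```

Rerunning with a different seed or a different value of `prob` changes `y` and `pi_hat`, which makes it easy to see how `X2` responds to the data.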
Q3 R exercise
Consider the following data on home-well contamination in 3020 households in Araihazar
upazila, Bangladesh. The response variable is switch (a binary variable indicating whether
or not the household switched to another well from an unsafe well). Other variables
collected for each household were arsenic (the level of arsenic contamination in the
household’s original well, in hundreds of micrograms per liter), dist100 (distance in
100-meter units to the closest known safe well), educ (years of education of the head of
the household) and assoc (whether or not any members of the household participated
in any community organizations: no or yes). The data are available in MyCourses under
Datasets. Load the data and compute dist100 as follows.
wells <- read.table("../Datasets/wells.dat")
attach(wells)
dist100 <- dist/100
Johanna G. Nešlehová
McGill University, Winter Term 2022
Assignment 2 due on March 25 at noon.
(1) Report whether the data have been entered in a grouped or ungrouped form, and
which explanatory variables are continuous and which are factors.
(2) Fit a logistic regression model with the intercept and arsenic. Assess the fit
of this model graphically as follows: divide arsenic into 30 approximately equally filled
categories, group the data accordingly, and display the empirical logits of switching
to a safe well for each category together with the fitted regression line. Do you
think the model is adequate? Perform an approximate goodness-of-fit test of the
model using the above binning and Pearson’s $X^2$ statistic; conclude at the 5%
level.
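One possible workflow for part (2) is sketched below on a simulated stand-in, so that it runs without the course file. The data frame `d`, its generating coefficients, and all object names are hypothetical. The sketch also makes three choices that the assignment does not prescribe: empirical quantiles as bin boundaries, a 1/2 correction in the empirical logit, and the bin mean of arsenic as the covariate value of each group.

```r
# Hypothetical stand-in for the wells data; replace with the real columns.
set.seed(2022)
n <- 3020
d <- data.frame(arsenic = 0.5 + rexp(n, rate = 1))
d$switch <- rbinom(n, size = 1, prob = plogis(-0.3 + 0.4 * d$arsenic))

# Ungrouped logistic fit with intercept and arsenic.
fit <- glm(switch ~ arsenic, family = binomial, data = d)

# 30 bins of roughly equal size, cut at empirical quantiles.
# With heavily tied real data the breaks may not be unique and need adjusting.
breaks <- quantile(d$arsenic, probs = seq(0, 1, length.out = 31))
bin    <- cut(d$arsenic, breaks = breaks, include.lowest = TRUE)

m     <- as.vector(table(bin))                     # households per bin
s     <- as.vector(tapply(d$switch, bin, sum))     # switchers per bin
x_mid <- as.vector(tapply(d$arsenic, bin, mean))   # mean arsenic per bin

# Empirical logits; the 1/2 correction guards against log(0).
emp_logit <- log((s + 0.5) / (m - s + 0.5))

plot(x_mid, emp_logit, xlab = "arsenic (bin mean)", ylab = "empirical logit")
abline(a = coef(fit)[1], b = coef(fit)[2])         # fitted line on the logit scale

# Approximate goodness-of-fit test on the grouped data.
grouped_fit <- glm(cbind(s, m - s) ~ x_mid, family = binomial)
X2       <- sum(residuals(grouped_fit, type = "pearson")^2)
df_resid <- length(m) - length(coef(grouped_fit))  # 30 bins minus 2 parameters
p_value  <- pchisq(X2, df = df_resid, lower.tail = FALSE)
print(c(X2 = X2, df = df_resid, p_value = p_value))
```

With the real data, `d` would be replaced by the `wells` data frame. The reference distribution is only approximate, since grouping replaces each household's arsenic level by a bin-level value.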