Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
MAST90084: Statistical Modelling Assignment
1. Suppose the data consist of repeated observations (yit,x T it), t = 1, · · · , T , for each individual i = 1, · · · , n. Here yit is the response and xit is a covariate vector. A linear mixed-effects model for analysing the population-averaged and subject-specific effects of xit is of the fol- lowing form yi = Ziβ +Wibi + εi, where yi = (yi1, · · · , yiT )T ; Zi is a T × p design matrix built from {xit} for the fixed effects β; Wi is a T × q design matrix for the random effects bi; {bi} are i.i.d. MVN(0, Q) random vectors with Q > 0; {εi} are i.i.d. MVN(0, σ2I) random vectors with σ2 > 0 and I being an identity matrix; and {bi} and {εi} are mutually uncorrelated. We will assume all Zi,Wi are of full rank, and for now the variance-covariance components θ = (σ2, Q) are also assumed to be fixed. When β is treated as fixed, Harville (1976) has derived the formulae βˆ = ( n∑ i=1 ZTi V −1 i Zi )−1 n∑ i=1 ZTi V −1 i yi (1) and bˆi = QW T i V −1 i (yi − Ziβˆ), i = 1, . . . , n, (2) as the best linear unbiased estimator (BLUE) for δ := (βT ,bT1 , . . . ,b T n ) T , where Vi = Iσ 2 + WiQW T i . Alternatively, consider a Bayesian formulation where β is considered random and indepen- dent of all bi’s, with an improper “flat” prior imposed on it (i.e., the coordinates of β are i.i.d. and each uniformly distributed on the whole real line ). Hence, the posterior density of δ satisfies the proportionality relationship f(δ|Y ; θ) ∝ n∏ i=1 f(yi|bi,β;σ2) n∏ i=1 p(bi;Q), (3) where Y = {y1, . . . ,yn} is the full dataset, f(yi|bi,β;σ2) is the density of yi given (bi,β) and p(bi;Q) is the MVN(0, Q) prior on bi. This problem aims to derive that the posterior mode δ˜ := (β˜, b˜1, . . . , b˜n) ≡ arg max δ f(δ|Y ; θ) is given by (1) and (2) as well. In the following problems, this general fact may be useful: If X is a random vector with an invertible covariance matrix Σ and has a density f(X) satisfying the proportionaliity relationship f(X) ∼ exp ( −1 2 XTΣ−1X +XT c ) , for some vector c with the same dimension as X, then X must be multivariate normal with mean Σc and covariance Σ. MAST90084 Statistical Modelling Assignment 4 Semester 1, 2021 (a) Using the proportionality in (3), show that the posterior density f(δ|Y ;Q, σ) is a mul- tivariate normal; in particular, demonstrate that the posterior precision matrix (i.e. the inverse of the covariance matrix) of δ has the form Cov[δ|Y ; θ]−1 = σ−2 ∑n i=1 Z T i Zi σ −2ZT1 W1 · · · σ−2ZTnWn σ−2W T1 Z1 σ −2W T1 W1 +Q −1 · · · 0 ... ... . . . ... σ−2W Tn Zn 0 · · · σ−2W TnWn +Q−1 , where the lower right submatrix is block-diagonal. From the posterior normality of δ, one can deduce that δ˜ must be the same as the posterior mean. [5] (b) Show that β˜ has the form in (1). [5] Hint: Consider the marginalized version of the linear mixed model, i.e. for a given β, yi’s are independent and each distributed as yi ∼MVN(Ziβ, Vi) (c) Show that b˜i has the form in (2) for i = 1, . . . , n. Hint: You showed in (a) that p(b1, . . . ,bn,β|Y ; θ) is multivariate normal. This implies that p(b1, . . . ,bn|Y,β; θ), the conditional density of b1, . . . ,bn given (Y,β), is also multivariate normal. You may find the following Woodbury matrix identity useful: (σ−2W Ti Wi +Q −1)−1 = Q−QW Ti (σ2I +WiQW Ti )−1WiQ. [6] 2 MAST90084 Statistical Modelling Assignment 4 Semester 1, 2021 2. This question is a continuation of the previous one. Now suppose θ = (σ2, Q) is unknown. Let y = (yT1 , . . . ,y T n ) T , Z = (ZT1 , . . . , Z T n ) T and V be a block diagonal matrix such that V1, . . . Vn are on its diagonal. For a fixed value of β, the marginalized version of the linear mixed effect model takes the form y ∼MVN(Zβ, V ) after integrating out the bi’s. Since Z is of rank p, there exists a matrix A of dimensions (nT − p) × (nT ) such that AZ = 0 , AAT = I and ATA = I − Z(ZTZ)−1ZT . Precisely, the rows of A comprise a set of orthonormal vectors spanning col(Z)⊥, i.e., the orthogonal complement of the column space of Z. Hence, the distribution of αˆ := Ay, conditional on β or not, is normal and doesn’t depend on β, and we denote this density as f(αˆ|θ) to show its dependence on θ alone. It is also not hard to see that βˆ can be equivalently expressed as βˆ = (ZTV −1Z)−1ZTV −1︸ ︷︷ ︸ B y = By (a) Show that given a fixed value of β, βˆ is independent of αˆ := Ay. [3] (b) Show that up to additive constants not involving θ, log f(αˆ|θ) has the REML form 1 2 { − log |V | − log |ZTV −1Z| − (y − Zβˆ)TV −1(y − Zβˆ) } , where | · | denotes the determinant of a matrix. [7] Hint: Let C := [AT |BT ]T , a square matrix stacking A on top of B. The map y 7→ (αˆ, βˆ) is a one-to-one transformation involving the Jacobian determinant |C|. Use the result from (a), as well as the facts (y − Zβ)TV −1(y − Zβ) = (y − Zβˆ)TV −1(y − Zβˆ) + (β − βˆ)T (ZTV −1Z)(β − βˆ). and |C| = |ZTZ|−1/2. (The equality for |C| can be obtained by a trick that uses the Schur complement formula for determinant, but its derivation is irrelevant to solving this problem.) 3 MAST90084 Statistical Modelling Assignment 4 Semester 1, 2021 3. This question concerns parametric Weibull regression for survival analysis. Let T be Weibull- distributed and suppose the proportional hazard (PH) model describes its dependence on covariates as follows: For a given covariate vector x, the hazard function of T has the form h(t|x) = h0(t)φ(x) for the baseline hazard h0(t) = λα(λt) α−1 and some positive function φ(·). (a) Show that there exists a function ψ(·) so that h(t|x) = ψ(x)h0(ψ(x)t), i.e. T ’s dependence on x also follows a general accelerated failure time (AFT) model. [4] (b) Consider the gehan dataset that assesses the ability of 6-mercaptopurine (6MP) in maintaining remission time (in weeks) among leukemia patients, available in the MASS package. library(MASS) help(gehan) head(gehan) ## pair time cens treat ## 1 1 1 1 control ## 2 1 10 1 6-MP ## 3 2 22 1 control ## 4 2 7 1 6-MP ## 5 3 3 1 control ## 6 3 32 0 6-MP i. Read the documentation pages for survreg() from the package survival, as well as Modern Applied Statistics with S by Venables and Ripley, 4th Edition, p.359- 361 (ebook available from unimelb library). Use survreg() to fit the Weibull PH model with φ(x) = exp(γx), where x = 1 for the treatment group and x = 0 for the control group. Report the estimates for γ, α, and λ. [7] Hint: You may find the following useful: a standard exponential distribution has the density f(x) = e−x. and a standard Gumbel distribution has the density e−(x+e −x). Moreover, If X is a standard exponential random variable, − log(X) has a standard Gumbel random variable. 4 MAST90084 Statistical Modelling Assignment 4 Semester 1, 2021 ii. For each treatment group, obtain an estimate of the probability of having remission time more than 18 weeks. [4] iii. Form a 95% confidence interval for the PH regression coefficient γ. [6] iv. Form a 95% confidence interval for the hazard ratio h(t|x=1) h(t|x=0) . [3] Total marks = 50 5