STAT3006 Assignment
Weighting: 30%

Instructions

• The assignment consists of three (3) problems, each problem is worth 10 marks, and each mark is equally weighted.
• The mathematical elements of the assignment can be completed by hand, in LaTeX (preferably), or in Word (or other typesetting software). The mathematical derivations and manipulations should be accompanied by clear explanations in English regarding the information required to interpret the mathematical exposition.
• Computation problems can be answered using your programming language of choice, although R is generally recommended, or Python if you are uncomfortable with R. As with the mathematical exposition, you may typeset your answers to the problems in whatever authoring or word processing software you wish. You should also retain a copy of any code that you have produced.
• Computer-generated plots and hand-drawn graphs should be included together with the text where the problems are answered.
• The assignment requires six (6) files containing data, which you can download from the Assignment 3 section on Blackboard. These files are: p2_1ts.csv, p2_1cl.csv, p2_2ts.csv, p3_1x.csv, p3_1y.csv, and data_bank_authentification.txt.
• Submission files should include the following (whichever applies to you):
  – Scans of handwritten mathematical exposition.
  – Typeset mathematical exposition, output as a pdf file.
  – Typeset answers to computational problems, output as a pdf file.
  – Program code/scripts that you wish to submit, output as a txt file.
• All submission files should be labeled with your name and student number, archived together in a zip file, and submitted at the TurnItIn link on Blackboard. We suggest naming using the convention: FirstName_LastName_STAT3006A3_[Problem_XX/Problem_XX_Part_YY].[FileExtension].
• As per my.uq.edu.au/information-and-services/manage-my-program/student-integrity-and-conduct/academic-integrity-and-student-conduct, what you submit should be your own work. Even where working from sources, you should endeavour to write in your own words. You should use consistent notation throughout your assignment and define whatever is required.

Problem 1 [10 Marks]

Let X ∈ 𝕏 = [0, 1] and Y ∈ {0, 1}. Further, suppose that π_y = P(Y = y) = 1/2 for both y ∈ {0, 1}, and that the conditional distributions of [X | Y = y] are characterized by the probability density functions (PDFs)

f(x | Y = 0) = 2 − 2x and f(x | Y = 1) = 2x.

Part a [2 Marks]

Consider the Bayes' classifier for Y ∈ {0, 1}:

r*(x) = 1 if τ_1(x) > 1/2, and r*(x) = 0 otherwise,

where τ_1(x) = P(Y = 1 | X = x). Derive the explicit form of τ_1(x) in the current scenario and plot τ_1(x) as a function of x.

Part b [2 Marks]

Define the classification loss function for a generic classifier r : 𝕏 → {0, 1} as

ℓ(x, y, r(x)) = 1{r(x) ≠ y},

where ℓ : 𝕏 × {0, 1} × {0, 1} → {0, 1}, and consider the associated risk

L(r) = E(1{r(X) ≠ Y}).

It is known that the Bayes' classifier is optimal in the sense that it minimizes the classification risk; that is, L(r*) ≤ L(r). In the binary classification case,

L(r*) = E(min{τ_1(X), 1 − τ_1(X)}) = 1/2 − (1/2) E(|2τ_1(X) − 1|).

Calculate L(r*) for the current scenario.

Part c [2 Marks]

Assume now that π_1 ∈ [0, 1] is unknown. Derive an expression for L(r*) that depends on π_1.

Part d [2 Marks]

Assume again that π_1 ∈ [0, 1] is unknown. Argue that we can write

L(r*) = ∫_𝕏 min{(1 − π_1) f(x | Y = 0), π_1 f(x | Y = 1)} dx.

Then, assuming that π_0 = π_1 = 1/2, argue that we can further write

L(r*) = 1/2 − (1/4) ∫_𝕏 |f(x | Y = 1) − f(x | Y = 0)| dx.

Part e [2 Marks]

Consider now that π_1 ∈ [0, 1] is unknown, as are f(x | Y = 0) and f(x | Y = 1).
That is, we only know that f(· | Y = y) : 𝕏 → ℝ is a density function on 𝕏 = [0, 1], for each y ∈ {0, 1}, in the sense that f(x | Y = y) ≥ 0 for all x ∈ 𝕏 and that ∫_𝕏 f(x | Y = y) dx = 1. Using the expressions from Part d, deduce the minimum and maximum values of L(r*) and provide conditions on π_1, f(· | Y = 0), and f(· | Y = 1) that yield these values.

Problem 2 [10 Marks]

Suppose that we observe an independent and identically distributed sample of n = 300 random pairs (X_i, Y_i), for i ∈ [n], where X_i = (X_{i1}, …, X_{id}) is a mean-zero time series of length d = 100 and Y_i ∈ {1, 2, 3} is a class label. Here, X_{it} is the observation of time series i ∈ [n] at time t ∈ [d], and we may say that X_i ∈ 𝕏 = ℝ^d. We assume that the label Y_i, for i ∈ [n], is such that each class occurs in the general population with unknown probability π_y = P(Y_i = y), for each y ∈ {1, 2, 3}, where ∑_{y=1}^{3} π_y = 1. Further, we know that X_{it} is first-order autoregressive, in the sense that the distribution of [X_i | Y_i = y] can be characterized by the conditional probability densities

f(x_{i1} | Y_i = y) = φ(x_{i1}; 0, σ_y²)

and, for each t ≥ 2,

f(x_{it} | X_{i1} = x_{i1}, X_{i2} = x_{i2}, …, X_{i,t−1} = x_{i,t−1}, Y_i = y) = φ(x_{it}; β_y x_{i,t−1}, σ_y²),

where x_i = (x_{i1}, …, x_{id}) is a realization of X_i, and, for each y ∈ {1, 2, 3}, σ_y² ∈ (0, ∞) and β_y ∈ [−1, 1]. Here,

φ(x; μ, σ²) = (1/√(2πσ²)) exp{−(x − μ)² / (2σ²)}

is the univariate normal probability density function with mean μ ∈ ℝ and variance σ² ∈ (0, ∞).

Part a [2 Marks]

Let (X, Y) arise from the same population distribution as (X_1, Y_1). Using the information above, derive expressions for the a posteriori probabilities

τ_y(x; θ) = P(Y = y | X = x),

for each y ∈ {1, 2, 3}, as functions of the parameter vector θ = (π_1, π_2, π_3, β_1, β_2, β_3, σ_1², σ_2², σ_3²). Further, use the forms of the a posteriori probabilities to produce an explicit form of the Bayes classifier (i.e., a form that is written in terms of the parameters θ).
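Though the derivation in Part a is to be done by hand, the structure of the a posteriori probabilities can be sanity-checked numerically. The sketch below (Python with NumPy; the parameter values and the simulated series are illustrative assumptions, not estimates from the assignment data) evaluates the AR(1) class-conditional log-density implied by the model above and combines it with the class probabilities via Bayes' theorem:

```python
import numpy as np

def ar1_loglik(x, beta, sigma2):
    """Log-density of a mean-zero AR(1) series under the model above:
    x_1 ~ N(0, sigma2) and x_t | x_{t-1} ~ N(beta * x_{t-1}, sigma2)."""
    resid = np.concatenate(([x[0]], x[1:] - beta * x[:-1]))
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid**2 / sigma2)

def posteriors(x, pis, betas, sigma2s):
    """tau_y(x; theta) via Bayes' theorem, computed on the log scale
    for numerical stability before normalizing."""
    logs = np.array([np.log(p) + ar1_loglik(x, b, s2)
                     for p, b, s2 in zip(pis, betas, sigma2s)])
    logs -= logs.max()          # guard against underflow in exp
    w = np.exp(logs)
    return w / w.sum()

# Illustrative (assumed) parameter values, not fitted to the data:
rng = np.random.default_rng(0)
x = rng.normal(size=100)        # stand-in for one observed series
tau = posteriors(x, pis=[1/3, 1/3, 1/3],
                 betas=[0.9, 0.0, -0.9], sigma2s=[1.0, 1.0, 1.0])
print(tau)                      # three non-negative values summing to one
```

Working on the log scale before exponentiating matters here, since each class density is a product of d = 100 normal terms and would underflow if formed directly.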
Part b [1 Mark]

Using the information above, construct the likelihood function

L(θ; Z_n) = ∏_{i=1}^{n} f(z_i; θ)

based on the random sample Z_n = (Z_1, …, Z_n), where Z_i = (X_i, Y_i) for i ∈ [n], and write the log-likelihood function log L(θ; Z_n). Here, f(z_i; θ) is the joint density of Z_i, deduced from the problem description, and θ is defined in Part a.

Part c [2 Marks]

Using the form of the log-likelihood function from the problem above, derive closed-form expressions for the maximum likelihood estimator

θ̂ = arg max_{θ ∈ Θ} log L(θ; Z_n), where Θ = {(π_1, π_2, π_3) : π_y ≥ 0, ∑_{y=1}^{3} π_y = 1} × [−1, 1]³ × (0, ∞)³.

Part d [1 Mark]

The data set p2_1ts.csv (each row of the CSV file is a time series and each column is a time point) contains a realization x_n = (x_1, …, x_n) of the n = 300 time series X_n = (X_1, …, X_n), and the data set p2_1cl.csv contains a realization y_n = (y_1, …, y_n) of the associated n = 300 class labels Y_n = (Y_1, …, Y_n). Using the notion of the m-th order auto-covariances of a time series X = (X_1, …, X_d):

ρ_m = E{[X_t − E(X_t)][X_{t+m} − E(X_{t+m})]} for m ≥ 0,

and appropriate sample estimators, attempt to visualize these data in a manner that demonstrates the differences between the three class-specific distributions.

Part e [2 Marks]

For the data sets from Part d, using the maximum likelihood estimator from Part c, derive the expressions for the estimate τ_y(x; θ̂) of τ_y(x; θ), for each y ∈ {1, 2, 3}. Furthermore, provide an explicit form of the estimated Bayes' classifier (i.e., a classifier r(x; θ̂), dependent on θ̂). Finally, use the estimated Bayes' classifier to compute the so-called in-sample empirical risk

L̄_n(r(·; θ̂)) = (1/n) ∑_{i=1}^{n} 1{r(X_i; θ̂) ≠ Y_i},

where the averaging is over the same sample Z_n that is used to compute θ̂.

Part f [2 Marks]

The data set p2_2ts.csv (again, each row of the CSV file is a time series and each column is a time point) contains a realization x′_{n′} = (x′_1, …, x′_{n′}) of n′ = 20 partially observed time series X′_i = (X_{i1}, …, X_{i,50}), where X′_i contains the first 50 time points of a fully observed time series X″_i = (X_{i1}, …, X_{i,100}), for each i ∈ [n′]. Under the assumption that X″_i has the same distribution as X_1, as described at the start of the problem, argue that you can use the maximum likelihood estimates from Part e to produce a Bayes' classifier for the partially observed time series X′_i, and produce classifications for each of the n′ = 20 time series.

Problem 3 [10 Marks]

Let Z_n = (Z_1, …, Z_n) be an independent and identically distributed sample of n pairs Z_i = (X_i, Y_i) of features X_i ∈ 𝕏 = ℝ^d and labels Y_i ∈ {−1, 1}, where i ∈ [n]. Further, let

ρ(x; θ) = α + β⊤x

be a linear classification rule, and let r_ρ(x; θ) = sign(ρ(x; θ)) be the classifier based on ρ(·; θ) : 𝕏 → ℝ. Here θ = (α, β⊤)⊤ ∈ ℝ^{d+1} is a parameter vector and

sign(r) = −1 if r ≤ 0, and sign(r) = 1 otherwise.

Consider the least-squares loss function

ℓ(x, y, ρ(x)) = [1 − y ρ(x)]²

and define the estimator

θ̂ = arg min_{θ ∈ ℝ^{d+1}} L̄_n(ρ(·; θ)) + λ‖β‖₂²,

where λ > 0 is a fixed penalty constant and

L̄_n(ρ(·; θ)) = (1/n) ∑_{i=1}^{n} ℓ(X_i, Y_i, ρ(X_i))

is the empirical risk. We say that the classifier r_ρ(x; θ̂) = sign(ρ(x; θ̂)) is the so-called linear least-squares support vector machine.

Part a [2 Marks]

Using the information from the problem description, for any fixed λ > 0, provide a closed-form expression for the estimator θ̂.

Part b [2 Marks]

A realization of a random sample of n = 1000 observations Z_i = (X_i, Y_i), for i ∈ [n], is contained in the files p3_1x.csv and p3_1y.csv. Here the feature data X_n = (X_1, …, X_n) are contained in p3_1x.csv (each row of the CSV file is a feature vector of dimension 2), and the label data Y_n = (Y_1, …, Y_n) are contained in p3_1y.csv (note that the label data are not in the appropriate form for use within the large-margin framework). For λ = 1, using the estimator from Part a, provide an explicit form of the linear least-squares support vector machine classifier based on the provided data and plot the decision boundary.
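As a hedged numerical companion to Parts a and b: because y_i ∈ {−1, +1} implies (1 − y_i ρ)² = (y_i − ρ)², the penalized criterion is a ridge-type least-squares problem with an unpenalized intercept, which the sketch below (Python; the simulated data and function names are illustrative assumptions) solves directly. Whatever closed form you derive in Part a can be checked against it on small examples.

```python
import numpy as np

def ls_svm_fit(X, y, lam):
    """Minimizer of (1/n) * sum_i (1 - y_i * (a + b @ x_i))**2 + lam * ||b||^2.
    Since y_i is in {-1, +1}, (1 - y*rho)^2 = (y - rho)^2, so this is
    ridge regression of y on (1, x) with the intercept left unpenalized."""
    n, d = X.shape
    A = np.hstack([np.ones((n, 1)), X])       # design matrix [1, X]
    D = np.diag([0.0] + [1.0] * d)            # no penalty on the intercept
    theta = np.linalg.solve(A.T @ A / n + lam * D, A.T @ y / n)
    return theta[0], theta[1:]                # alpha, beta

def ls_svm_predict(X, alpha, beta):
    """sign(alpha + X @ beta), with sign(0) = -1 as in the problem."""
    return np.where(alpha + X @ beta > 0, 1, -1)

# Toy illustration on simulated (assumed) data, not the assignment files:
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) + np.outer(np.repeat([1, -1], 100), [1.5, 1.5])
y = np.repeat([1, -1], 100)
alpha, beta = ls_svm_fit(X, y, lam=1.0)
print((ls_svm_predict(X, alpha, beta) == y).mean())  # in-sample accuracy
```

For d = 2, the decision boundary α + β⊤x = 0 is a line in the plane, so it can be drawn directly once α and β are in hand.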
Explore whether different values of λ > 0 change the decision boundary, and propose a strategy for choosing the value of λ using Z_n.

Part c [2 Marks]

A realization z_n = (z_1, …, z_n) of a random sample of n = 1372 observations Z_i = (X_i, Y_i), for i ∈ [n], is contained in the file data_bank_authentification.txt. The data set consists of features extracted from genuine and forged banknote-like documents that were digitized into gray-scale images. Features of each image are extracted to form the feature vector (i.e., x_i) of dimension d = 4, which is stored in the first four columns of the data set. The features are the variance (variance), skewness (skewness), and kurtosis (kurtosis) of a wavelet transformation of the image, and the entropy (entropy) of the image. All of the features can be considered real-valued. The final column of the data set contains the class label, where a label of 0 indicates a genuine banknote and a label of 1 indicates a forgery. (Note that you will have to transform the label data to the appropriate form for use within the large-margin framework.)
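As a starting point for Part c, the sketch below (Python with pandas; the column order is taken from the description above, and the helper name is an assumption) reads the banknote file and maps the {0, 1} labels to the {−1, +1} form required by the large-margin framework:

```python
import numpy as np
import pandas as pd

COLS = ["variance", "skewness", "kurtosis", "entropy", "label"]

def load_banknote(path):
    """Read the banknote file (no header row; columns ordered as in the
    problem description) and return the feature matrix X together with
    labels y mapped from {0, 1} to {-1, +1}."""
    df = pd.read_csv(path, header=None, names=COLS)
    X = df[COLS[:4]].to_numpy(dtype=float)
    # 0 (genuine) -> -1, 1 (forgery) -> +1 for the large-margin framework
    y = np.where(df["label"].to_numpy() == 1, 1, -1)
    return X, y

# Usage (assuming the file sits in the working directory):
# X, y = load_banknote("data_bank_authentification.txt")
# X.shape should be (1372, 4) and y should take values in {-1, +1}.
```

Keeping the transformation in one helper makes it harder to accidentally fit the classifier on the raw {0, 1} labels.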