STA 135
Homework

Book Homework (does not require R)

Note: This may be handwritten or typed. Answers should be clearly marked.

Please do 8.1, 8.2, 8.4, 8.6

R Homework (requires some use of R)

Note: You do not have to use R Markdown to turn in the homework, but the homework must be turned in in a reasonable format. The answers to the questions should be in the body of the homework, and the code used to obtain those answers should be in an appendix. There should be no code in the body of the homework. You can accomplish this in R, Word, LaTeX, Google Docs, etc.

I. The purpose of this problem is to examine the effect that different correlations have on the outcome of the PCA. To make this easier, suppose x has a bivariate normal distribution with µ = (0, 0)^T and σ11 = σ22 = 1. For σ12 = −0.99, −0.9, −0.5, 0, 0.5, 0.9 and 0.99 (remember that σ12 = ρ12 because the variances are equal to 1), complete the following:
  (a) Simulate 1,000 observations from the bivariate normal, setting a seed number of 8128 right before each data simulation. In R, use the command "set.seed(8128)" to set your seed, then "rmvnorm(n = N, mean = mu, sigma = sigma)" to generate the random normal variables.
  (b) Use "princomp()" with "cor = TRUE" to find the estimated eigenvalues and eigenvectors from the correlation matrix.
  (c) Interpret the PCs.
  (d) How many PCs are necessary?
  (e) Create separate scatter plots of the data and the PC scores, but use one overall set of x-axis and y-axis limits. Describe the relationship between these plots for each ρ12.
  (f) Relate your answers in (c) through (e) to the value of σ12.

II. This problem uses the weekly rates of return for five stocks listed on the New York Stock Exchange. Online you will find the file "Stock-Data.txt". The txt file has the following columns:
  Column 1. JP Morgan
  Column 2. Citibank
  Column 3. Wells Fargo
  Column 4. Royal Dutch Shell
  Column 5. Exxon Mobil
  (a) Construct the sample covariance matrix S, and find the sample principal components.
  (b) Interpret the first two PCs.
  (c) Determine the proportion of the total sample variance explained by the first three principal components. Interpret these components.
  (d) Generate the scree plot and interpret the plot.
  (e) Plot the first two PCs and interpret your plot.
  (f) Given the results from the previous parts, do you feel that the stock rates-of-return data can be summarized in fewer than five dimensions? Explain.

III. Online you will find the "Goblet.csv" file. Below is the picture of the measurements for the goblet. Subject-matter researchers are interested in grouping goblets that have the same shape although they may have different sizes. One way suggested by Manly (1994) to adjust the data is to divide each measurement by X3 (height). This can easily be done in R. Create the following variables in R; your analysis will be based on these variables.
    w1 = goblet$x1/goblet$x3,
    w2 = goblet$x2/goblet$x3,
    w4 = goblet$x4/goblet$x3,
    w5 = goblet$x5/goblet$x3,
    w6 = goblet$x6/goblet$x3
  (a) Generate the star plot using the following R commands.
    win.graph(width = 11, height = 7)
    stars(x = goblet[,-1], draw.segments = TRUE, key.loc = c(14,10), main = "Goblet star plot", labels = goblet$ID)
  (b) Which goblets appear to stand out? Can you make any generalizations about groups of goblets?
  (c) Generate the parallel coordinates plot using the following R command.
    parcoord(x = goblet2[,-1], col = col.w5, main = "Goblet parallel coordinate plot")
  (d) Interpret the parallel coordinates plot.
  (e) Run the principal component analysis using "cor = TRUE".
  (f) Interpret the first three principal components.
  (g) How many PCs would you suggest for the analysis of the goblet data? Justify your answer.
  (h) Produce the scree plot for the goblet data and interpret your plot.
  (i) Plot the first two PCs and interpret the plot.
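
For Problem I, the mechanics of parts (a) and (b) can be sketched roughly as follows. This is only an illustration under stated assumptions: the assignment does not say which package provides rmvnorm(), so the sketch assumes it is the one in the mvtnorm package, and the helper name fit_pca and the list name fits are hypothetical, not part of the assignment.

```r
# Sketch only. Assumption: rmvnorm() is taken from the mvtnorm package.
library(mvtnorm)

rhos <- c(-0.99, -0.9, -0.5, 0, 0.5, 0.9, 0.99)  # values of sigma12 from Problem I
N    <- 1000
mu   <- c(0, 0)

# Hypothetical helper: simulate the data for one sigma12 and run a
# correlation-based PCA on it.
fit_pca <- function(rho) {
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  set.seed(8128)  # reset right before each simulation, as part (a) asks
  x <- rmvnorm(n = N, mean = mu, sigma = sigma)
  princomp(x, cor = TRUE)
}

fits <- lapply(rhos, fit_pca)
names(fits) <- paste0("rho = ", rhos)

# Estimated eigenvalues (squared sdev) and eigenvectors (loadings) for one case
fit <- fits[["rho = 0.9"]]
fit$sdev^2
unclass(fit$loadings)
```

Because "cor = TRUE" works from the 2 x 2 sample correlation matrix, the two estimated eigenvalues always sum to 2, which gives a quick sanity check on the output before interpreting it.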