Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
STAT 206 Homework
Homework must be submitted as pdf file, and be sure to include your name in the file. Give the commands to answer each question in its own code block, which will also produce plots that will be automatically embedded in the output file. Each answer must be supported by written statements as well as any code used. (Examining your various objects in the “Environment” section of RStudio is insufficient – you must use scripted commands.) In lecture, we fit a gamma distribution to the weight of cat’s hearts. We did this by adjusting the parameters so that the theoretical values of the mean and variance matched the observed, sample mean and variance. Since the mean and variance are the first two moments of the distribution, this is an example of the method of moments for estimation. The method of moments gives a point estimate θˆ of the parameters θ. To use a point estimate, we need to know how precise it is, i.e., how different it would be if we repeated the experiment with new data from the same population. We often measure imprecision by the standard error, which is the standard deviation of the point estimates θˆ. (You saw the standard error of the mean in your introductory statistics classes, but we are not computing the standard error of the mean here.) If we actually did the experiment many times, getting many values of θˆ, we could take their standard deviation as the standard error. With only one data set, we need to do something else. There is usually no simple formula for standard errors of most estimates, the way there is for the standard error of the mean. Instead, we will see how to approximate the standard error of for our estimate of the gamma distribution computationally. We can draw random values from a gamma distribution using the rgamma() function. For example, rgamma(n=35,shape=0.57,scale=15) would generate a vector of 35 random values, drawn from the gamma distribution with “shape” parameter a = 0.57 and “scale” s = 15. By applying the estimator to random samples drawn from the distribution, we can see how much the estimates will change purely due to noise. Part I - Estimates and standard errors 1. Write a function, gamma.est, which takes as input a vector of data values, and returns a vector containing the two estimated parameters of the gamma distribution, with components named shape and scale as appropriate. MY_ANSWER input <- rgamma(n = 35, shape = 0.57, scale = 15) gamma.est <- function(x){ gmean <- mean(x) gvar <- var(x) shape <- gmeanˆ2/gvar scale <- gvar/gmean rate <- 1/scale return (list(shape, scale, rate)) } 1 gamma.est(input) ## [[1]] ## [1] 0.4102008 ## ## [[2]] ## [1] 25.96766 ## ## [[3]] ## [1] 0.03850944 2. Verify that your function implements the appropriate formulas by showing that it matches the results from lecture for the cat heart data. MY_ANSWER #Obtain parameter by R built-in function library(MASS) data(cats) fitdistr(cats$Hwt, densfun = "gamma") ## shape rate ## 20.2998092 1.9095724 ## ( 2.3729250) ( 0.2259942) #Test gamma.est() gamma.est(cats$Hwt) ## [[1]] ## [1] 19.06531 ## ## [[2]] ## [1] 0.5575862 ## ## [[3]] ## [1] 1.793445 The shape & rate from R built-in is (20.30, 1.91), from gamma.est() is (19.07, 1.80), close! 3. Generate a vector containing ten thousand random values from the gamma distribution with a = 19 and s = 0.56. What are the theoretical values of the mean and of the variance? What are their sample values? MY_ANSWER random10000 <- rgamma(n = 10000, shape = 19, scale = 0.56) mean(random10000) ## [1] 10.60984 var(random10000) ## [1] 5.919709 sample_mean <- 19/(1/0.56) sample_var <- 19/(1/0.56)ˆ2 sample_mean ## [1] 10.64 sample_var ## [1] 5.9584 Theoretical mean and variance is (10.64, 5.88), sample values are (10.64, 5.96). 4. Plot the histogram of the random values, and add the curve of the theoretical probability density function. 2 MY_ANSWER hist(random10000, probability = TRUE, freq = NULL, main = "Histogram of The Random Values", xlab = "Random Values", ylim = c(0,0.2)) curve(dgamma(x, shape = 19, scale = 0.56), col = "blue", add = TRUE) Histogram of The Random Values Random Values D en si ty 5 10 15 20 0. 00 0. 05 0. 10 0. 15 0. 20 5. Apply your gamma.est function to your random sample. Report the estimated parameters and how far they are from the true values. MY_ANSWER gamma.est(random10000) ## [[1]] ## [1] 19.01593 ## ## [[2]] ## [1] 0.5579451 ## ## [[3]] ## [1] 1.792291 Values from gamma.est() are (19.22, 0.55), pretty close to gamma(19, 0.56). 6. Write a function, gamma.est.se, to calculate the standard error of your estimates of the gamma parameters, on simulated data drawn from the gamma distribution. It should take the following arguments: true shape parameter shape (or a), true scale parameter scale (or s), size of each sample n, and number of repetitions at that sample size B. It should return two standard errors, one for the shape parameter a and one for the scale parameter s. (These can be either in a vector or in a list, but should be named clearly.) It should call a function gamma.est.sim which takes the same arguments as gamma.est.se, and returns an array with two rows and B columns, one row holding shape estimates and the other row scale estimates. Your gamma.est.se function should not, itself, estimate any parameters or generate any random values. MY_ANSWER 3 #In Q1 I have estimated rate as well, here I remove rate estimation. gamma.est1 <- function(x){ gmean <- mean(x) gvar <- var(x) shape <- gmeanˆ2/gvar scale <- gvar/gmean return (list(shape, scale)) } gamma.est.sim <- function(a, s, n, B){ gamma.rd <- rgamma(n = n, shape = a, scale = s) est.parameter <- gamma.est1(gamma.rd) #output <- apply(gamma.est.sim, B, mean) return(est.parameter) for (i in 1:B){ output[[i]] <- gamma.est.sim(output[[i-1]]) } } gamma.est.sim(19, 0.56,100, 10) ## [[1]] ## [1] 21.10119 ## ## [[2]] ## [1] 0.5083013 #shape.list <- output[1,] #scale.list <- output[2,] Part II - Testing with a stub To check that gamma.est.se works properly, write a stub or dummy version of gamma.est.sim, which takes the correct arguments and returns an array of the proper size, but whose entries are fixed so that it’s easy for us to calculate what gamma.est.se ought to do. 7. Write gamma.est.sim so that the entries in the first row of the returned array alternate between shape and shape+1, and those in the second row alternate between scale and scale+n. For example gamma.est.sim(2,1,10,10) should return (row names are optional) ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] ## shapes 2 3 2 3 2 3 2 3 2 3 ## scales 1 11 1 11 1 11 1 11 1 11 and gamma.est.sim(2,8,5,7) should return ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 2 3 2 3 2 3 2 ## [2,] 8 13 8 13 8 13 8 MY_ANSWER 8. Calculate the standard deviations of each row in the two arrays above. MY_ANSWER 9. Run your gamma.est.se, with this version of gamma.est.sim. Do its standard errors match the standard deviations you just calculated? Should they? 4 MY_ANSWER Part III - Replacing the stub 10. Write the actual gamma.est.sim. Each of the B columns in its output should be the result of applying gamma.est to a vector of n random numbers generated by a different call to rgamma, all with the same shape and scale parameters. MY_ANSWER 11. Run gamma.est.se, calling your new gamma.est.sim, with shape=2, scale=1, n=10 and B=1e5. Check that the standard error for shape is approximately 1.6 and that for scale approximately 0.54. Explain why your answers are not exactly 1.6 and 0.54. MY_ANSWER 5