STA304H1F/1003HF FALL MIDTERM TEST
Aids: Two-sided handwritten notes (8 1/2 x 11) and a non-programmable calculator.
Instructions: This test consists of 4 questions on 7 pages. Please answer all questions on the question
paper, showing all your work and using proper English. The maximum mark for this test is 50.
1. (9 marks) An auditor is confronted with a long list of accounts receivable for a firm. She must verify the
amounts on 10% of these accounts and estimate the average difference between the audited and book
values.
(a) (3 marks) Suppose the accounts are arranged chronologically (according to their dates), with the
older accounts tending to have smaller values. Would systematic or random sampling be preferred?
Explain briefly.
In this case systematic sampling would be preferred, as the population is ordered. [2]
Thus, the variance of an estimate from a systematic sample would be expected to be
smaller. [1]
OR: A systematic sample would give a better representation of the population...
OR: (Any other sound reason)
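The variance claim in (a) can be checked on a toy example. The sketch below is only an illustration: the population of 100 steadily increasing account values is a hypothetical assumption, not exam data. It compares the exact variance of the sample mean under 1-in-10 systematic sampling with the simple random sampling formula.

```python
from statistics import mean, pvariance, variance

# Hypothetical population (an assumption, not exam data): 100 accounts in
# chronological order whose values rise steadily with position in the list.
population = [100 + 5 * i for i in range(100)]
N, k = len(population), 10        # 1-in-10 sampling, i.e. a 10% sample
n = N // k

# Systematic sampling: each of the k possible random starts is equally likely,
# so the exact variance of the sample mean is the variance over those k means.
systematic_means = [mean(population[start::k]) for start in range(k)]
var_systematic = pvariance(systematic_means)

# Simple random sampling without replacement: V(ybar) = (1 - n/N) * S^2 / n.
var_srs = (1 - n / N) * variance(population) / n

print(f"systematic: {var_systematic:.2f}, SRS: {var_srs:.2f}")
```

For this ordered population the systematic variance is much smaller, because every systematic sample spreads evenly across old and new accounts.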
(b) (3 marks) Suppose the accounts are grouped by department, and then listed chronologically within
departments. The older accounts again tend to have smaller values. Would systematic or random
sampling be preferred? Explain briefly.
In this case (simple) random sampling would be preferred. [1]
Because the accounts are ordered within departments, the population behaves more
like a periodic population. [2]
OR: The population will have a cycle (large to small to large) along the list, so systematic
sampling could be biased and collect all large or all small accounts.
OR: Use stratified random sampling, with departments as strata. Within each stratum,
we can use simple random sampling, or systematic sampling to take advantage
of the chronology. [3]
OR: Use repeated systematic sampling to overcome the periodicity. [3]
OR: (Any other sound reason)
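The periodicity problem in (b) can be illustrated with a toy list. The sketch below assumes a hypothetical population of 10 departments with 10 accounts each, listed from oldest (smallest) to newest (largest) within each department. The sampling interval is deliberately set equal to the cycle length, which is the worst case.

```python
from statistics import mean, pvariance, variance

# Hypothetical periodic population (an assumption, not exam data):
# 10 departments, each listing 10 accounts from smallest to largest.
population = [100 + 50 * (i % 10) for i in range(100)]
N, k = len(population), 10        # sampling interval equals the cycle length
n = N // k

# Every systematic sample picks the same position in each department,
# so each sample consists entirely of small or entirely of large accounts.
systematic_means = [mean(population[start::k]) for start in range(k)]
var_systematic = pvariance(systematic_means)

# Simple random sampling without replacement: V(ybar) = (1 - n/N) * S^2 / n.
var_srs = (1 - n / N) * variance(population) / n

print(f"systematic: {var_systematic:.1f}, SRS: {var_srs:.1f}")
```

Here the ordering works against systematic sampling: the sample mean varies far more from start to start than it would under simple random sampling.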
(c) (3 marks) Which of the following three estimation methods do you think is most appropriate to
estimate the desired population mean: ratio estimation, regression estimation, or difference
estimation? Explain.
In this case, difference estimation is most appropriate, [1]
since audited and book values are highly correlated and both are measured on the
same scale. [1]
It is easier than regression estimation since the regression coefficient is set to one. [1]
AND/OR: Compared with ratio estimation, we would not necessarily have a regression line
through the origin, and the aim is to estimate a difference rather than a ratio. [1]
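As a rough illustration of difference estimation, the sketch below estimates the mean difference between audited and book values and places a bound on the error. All figures (the ten account pairs and the list size N = 100) are hypothetical assumptions, and the bound uses the usual simple random sampling formula B = 2 sqrt((1 - n/N) s_d^2 / n).

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical audit sample (an assumption, not exam data):
# n = 10 accounts drawn from a list of N = 100.
book = [120, 250, 310, 95, 410, 180, 275, 330, 150, 220]
audited = [118, 250, 305, 95, 402, 180, 270, 331, 150, 214]
N, n = 100, len(book)

# Differences d_i = audited_i - book_i; the regression slope is fixed at one.
diffs = [a - b for a, b in zip(audited, book)]
d_bar = mean(diffs)               # estimated mean difference per account

# Bound on the error of estimation for the mean difference.
bound_diff = 2 * sqrt((1 - n / N) * variance(diffs) / n)

print(f"mean difference = {d_bar}, bound = {bound_diff:.2f}")
```

Only the differences enter the calculation, which is why no slope has to be estimated.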
2. (16 marks) A forest resource manager is interested in estimating the number of dead fir trees in a
300-acre area of heavy infestation. Using an aerial photo, he divides the area into 200 plots, each of 1.5
acres. Let x denote the photo count of dead firs and y the actual ground count for a simple random
sample of n = 10 plots. The total number of dead fir trees obtained from the photo count is τx = 4200.
The sample data is shown in the table and plotted in the figure below.
Plot sampled    1   2   3   4   5   6   7   8   9  10
Photo count    12  30  24  24  18  30  12   6  36  42
Ground count   18  42  24  36  24  36  14  10  48  54
(Note: allowance was made in marking for the typo, now corrected in the table above, in the ground
count of the 8th plot sampled.)
[Figure: scatterplot of ground count (vertical axis, 10 to 50) against photo count (horizontal axis, 5 to 40) for the 10 sampled plots.]
(a) (4 marks) Construct a ratio estimate of the total number of dead firs in the 300-acre area.
(Omitted: Place a bound on the error of estimation.)
r = ȳ/x̄ = 30.6/23.4 = 1.307692
τ̂y = r τx = 1.307692(4200) = 5492.31
Hence, a ratio estimate of the number of dead firs in the 300-acre area is 5492.31 trees.
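The ratio calculation can be reproduced from the table data with a short script. The bound computed at the end is not part of the solution above. It is an assumption based on the usual textbook approximation B = 2 sqrt(N^2 (1 - n/N) s_r^2 / n), where s_r^2 = Σ(yi − r xi)^2 / (n − 1).

```python
from math import sqrt

# Sample data from the table in Question 2 (n = 10 plots out of N = 200).
photo = [12, 30, 24, 24, 18, 30, 12, 6, 36, 42]
ground = [18, 42, 24, 36, 24, 36, 14, 10, 48, 54]
N, n = 200, len(photo)
tau_x = 4200                      # known total photo count

# Ratio estimate of the total: r = ybar / xbar = sum(y) / sum(x).
r = sum(ground) / sum(photo)
tau_hat_ratio = r * tau_x

# Assumed bound (not shown in the solution): uses the residuals y_i - r * x_i.
s_r2 = sum((y - r * x) ** 2 for x, y in zip(photo, ground)) / (n - 1)
bound_ratio = 2 * sqrt(N**2 * (1 - n / N) * s_r2 / n)

print(f"r = {r:.6f}, total = {tau_hat_ratio:.2f}, bound = {bound_ratio:.1f}")
```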
(This question #2 continues on the next page.)
(b) (4 marks) The model yi = α + β(xi − x̄) was fitted to this data and some related R output appears
below. The estimates of α and β were 30.60 and 1.26 respectively, to 2 decimal places. Construct
a regression estimate for the total number of dead firs. Place a bound on the error of estimation.
> reg_model= lm(ground~ I(photo - mean(photo)))
> summary(reg_model)
Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             30.6000     1.1507   26.59 4.30e-09 ***
I(photo - mean(photo))   1.2594     0.1057   11.91 2.27e-06 ***
Residual standard error: 3.639 on 8 degrees of freedom
Multiple R-squared: 0.9466, Adjusted R-squared: 0.94
F-statistic: 141.9 on 1 and 8 DF, p-value: 2.269e-06
> mean(ground)
[1] 30.6
> mean(photo)
[1] 23.4
> sum(residuals(reg_model)^2)/8
[1] 13.24012
Answer (b)
µ̂yL = ȳ + b(µx − x̄) = 30.6 + (1.2594)(4200/200 − 23.4) = 27.57744
τ̂yL = N µ̂yL = 200(27.57744) ≈ 5515.50 (using the unrounded slope)
A regression estimate of the total number of dead fir trees is 5515.5 trees.
A bound on the error of estimation is found by B = 2(200) √[(1 − 10/200)(13.24012/10)] = 448.6
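As a cross-check on the R output and the hand calculation in (b), the sketch below recomputes the slope, the regression estimate of the total, the mean squared error, and the bound directly from the table data. The variable names are illustrative only.

```python
from math import sqrt

# Sample data from the table in Question 2 (n = 10 plots out of N = 200).
photo = [12, 30, 24, 24, 18, 30, 12, 6, 36, 42]
ground = [18, 42, 24, 36, 24, 36, 14, 10, 48, 54]
N, n = 200, len(photo)
mu_x = 4200 / N                   # known mean photo count per plot

x_bar = sum(photo) / n
y_bar = sum(ground) / n
s_xx = sum((x - x_bar) ** 2 for x in photo)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(photo, ground))
s_yy = sum((y - y_bar) ** 2 for y in ground)

# Least-squares slope and the regression estimates of the mean and total.
b = s_xy / s_xx
mu_hat_reg = y_bar + b * (mu_x - x_bar)
tau_hat_reg = N * mu_hat_reg

# MSE on n - 2 degrees of freedom, then B = 2N * sqrt((1 - n/N) * MSE / n).
mse = (s_yy - b * s_xy) / (n - 2)
bound_reg = 2 * N * sqrt((1 - n / N) * mse / n)

print(f"b = {b:.4f}, total = {tau_hat_reg:.2f}, MSE = {mse:.5f}, B = {bound_reg:.1f}")
```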
(c) (3 marks) Do you think that regression estimation is better than ratio estimation for this problem?
Explain.