DATA 311 Midterm Practice Problems
1. Describe what the “Bayes classifier” is. Will it result in 0 misclassifications?
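A small simulation can make the second part of question 1 concrete. The setup is an assumption chosen only for illustration: two equally likely classes with X | Y = 0 ~ N(0, 1) and X | Y = 1 ~ N(2, 1). Because the true conditional distributions are known here, the Bayes classifier can be computed exactly: it predicts the class with the larger posterior probability P(Y = k | X = x). The two densities overlap, so even this optimal rule misclassifies some points; its theoretical error rate in this setup is pnorm(-1), about 0.159.

```r
set.seed(311)
n <- 10000

# Assumed setup (illustration only): equal priors,
# X | Y = 0 ~ N(0, 1) and X | Y = 1 ~ N(2, 1).
y <- rbinom(n, size = 1, prob = 0.5)
x <- rnorm(n, mean = 2 * y, sd = 1)

# Bayes classifier: posterior probability of class 1 given x,
# then predict whichever class has the larger posterior.
post1 <- 0.5 * dnorm(x, mean = 2) /
  (0.5 * dnorm(x, mean = 2) + 0.5 * dnorm(x, mean = 0))
y_hat <- as.integer(post1 > 0.5)

# The optimal rule still makes mistakes where the classes overlap.
empirical_error <- mean(y_hat != y)
bayes_error <- pnorm(-1)  # theoretical Bayes error rate for this setup
empirical_error
bayes_error
```

With equal priors and equal variances, the rule reduces to "predict class 1 when x > 1", the midpoint of the two means.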
2. What is a p-value?
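A sketch of the definition in action, using a hypothetical sample: the p-value is the probability, computed assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually observed. Below it is computed three ways for a one-sample t-test of H0: mu = 0: by hand from the t distribution, with t.test(), and by simulating the statistic many times under H0.

```r
set.seed(311)
n <- 20
x <- rnorm(n, mean = 0.5, sd = 1)  # hypothetical sample
mu0 <- 0                           # null hypothesis H0: mu = 0

# Observed test statistic
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value: P(|T| >= |t_obs|) when H0 holds, T ~ t with n - 1 df
p_manual <- 2 * pt(-abs(t_obs), df = n - 1)

# Same quantity from the built-in test
p_builtin <- t.test(x, mu = mu0)$p.value

# Monte Carlo version: simulate the statistic with H0 true and count
# how often it is at least as extreme as the observed value.
null_t <- replicate(20000, {
  z <- rnorm(n, mean = mu0)
  (mean(z) - mu0) / (sd(z) / sqrt(n))
})
p_sim <- mean(abs(null_t) >= abs(t_obs))

c(manual = p_manual, builtin = p_builtin, simulated = p_sim)
```

A small p-value says the observed data would be unusual if H0 were true; it is not the probability that H0 is true.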
3. What is wrong with the following analysis, and how would you fix it?
The data give the chemical composition of ancient pottery found at four sites in Great Britain. We will fit
a linear model with Calcium (Ca) as the response using Site and Magnesium (Mg) as predictors.
> ###Load data
> library(car)
> data(Pottery)
> ###Recode Site
> Pottery$Site <- as.numeric(Pottery$Site)
> Pottery$Site
[1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 3 3 3 3 3 1 1 1 1 1
> plm <- lm(Pottery$Ca~Pottery$Site+Pottery$Mg)
> summary(plm)
Call:
lm(formula = Pottery$Ca ~ Pottery$Site + Pottery$Mg)
Residuals:
     Min       1Q   Median       3Q      Max
-0.08619 -0.02989 -0.01557  0.02959  0.09908
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.070299   0.029655   2.371   0.0265 *
Pottery$Site -0.025603   0.012809  -1.999   0.0576 .
Pottery$Mg    0.049344   0.007037   7.012 3.81e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.05256 on 23 degrees of freedom
Multiple R-squared: 0.752, Adjusted R-squared: 0.7304
F-statistic: 34.87 on 2 and 23 DF, p-value: 1.088e-07
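For comparison, the sketch below uses a small hypothetical toy data set (not the real Pottery values) to show how the two codings of a four-level Site variable enter the model. lm() builds its design matrix with model.matrix(); as.numeric() collapses Site into one column, so the fit has a single slope that treats the sites as ordered and equally spaced, while leaving Site as a factor gives one indicator column per non-baseline site.

```r
# Hypothetical toy data (NOT the Pottery measurements):
# a four-level site factor and a numeric predictor.
site <- factor(c("A", "A", "B", "B", "C", "C", "D", "D"))
mg   <- c(1.0, 2.0, 1.5, 2.5, 3.0, 4.0, 0.5, 1.0)

# Numeric recoding: one column, so site 1 -> 2 is forced to have the
# same effect as site 3 -> 4, even though the labels have no order.
X_numeric <- model.matrix(~ as.numeric(site) + mg)

# Factor coding (default treatment contrasts): site A is the baseline,
# and siteB, siteC, siteD each get their own indicator column.
X_factor <- model.matrix(~ site + mg)

colnames(X_numeric)
colnames(X_factor)
```

With the actual data, the analogous fit would skip the recoding and keep Site as a factor, e.g. lm(Ca ~ Site + Mg, data = Pottery); the as.numeric() output in the transcript shows that Site arrives with four levels whose integer codes carry no meaning.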
4. What are the inferential assumptions for simple linear regression? Suppose we knew that the predictor X was
uniformly distributed; would that violate the assumptions?
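A minimal simulation, under assumed parameter values (beta0 = 1, beta1 = 3, sigma = 2), in which X is drawn uniformly on [0, 10]. The usual inferential assumptions (a linear mean function, independent errors, constant error variance, normally distributed errors) describe the errors conditional on X, so the sketch checks the residuals rather than the distribution of x itself.

```r
set.seed(311)
n <- 200

# Assumed data-generating process (illustration only)
x   <- runif(n, min = 0, max = 10)  # predictor drawn uniformly
eps <- rnorm(n, mean = 0, sd = 2)   # iid N(0, sigma^2) errors, independent of x
y   <- 1 + 3 * x + eps              # true line: beta0 = 1, beta1 = 3

fit <- lm(y ~ x)
coef(fit)

# Normality is an assumption about the errors, so it is checked on the
# residuals; x is clearly not normal here, and that is not a problem.
shapiro.test(residuals(fit))$p.value
```

The estimated slope and intercept land close to the assumed values even though x is far from normal.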
5. Describe one nonparametric method we have introduced to perform regression.
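One example of a nonparametric regression method is k-nearest-neighbours (KNN) regression; whether it matches the method introduced in the course is an assumption. The estimate at a point x0 is the average response of the k training points whose x-values are closest to x0, so no parametric form is assumed for the regression function. The helper knn_reg() is a hypothetical name, written in base R for a single predictor.

```r
# Hypothetical helper: KNN regression for one numeric predictor.
# For each new point, average the y-values of the k nearest training x's.
knn_reg <- function(x_train, y_train, x_new, k = 5) {
  sapply(x_new, function(x0) {
    nearest <- order(abs(x_train - x0))[1:k]  # indices of the k closest points
    mean(y_train[nearest])
  })
}

# Simulated data with a nonlinear true curve (illustration only)
set.seed(311)
x <- runif(100, 0, 2 * pi)
y <- sin(x) + rnorm(100, sd = 0.2)

grid <- seq(0.5, 5.5, by = 0.5)
pred <- knn_reg(x, y, grid, k = 10)
round(cbind(grid, pred, truth = sin(grid)), 3)
```

Smaller k gives a more flexible, higher-variance fit; larger k gives a smoother, more biased one.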
6. Why should we not use R² to compare the performance of linear regression models with differing numbers