(a) (2 marks) Report the resulting tree.
(b) (2 marks) Based on this output, predict the credit rating of a hypothetical “median” customer, i.e., one with the attributes listed in Table 1, showing the steps involved.
(c) (2 marks) Produce the confusion matrix for predicting the credit rating from this tree on the test set, and also report the overall accuracy rate.
(d) (5 marks) What is the numerical value of the gain in entropy corresponding to the first split at the top of the tree? (Use logarithms to base 2, and show the details of the calculation rather than just providing a final answer.)
(e) (2 marks) Fit a random forest model to the training set to try to improve prediction. Report the R output.
(f) (2 marks) Produce the confusion matrix for predicting the credit rating from this forest on the test set, and also report the overall accuracy rate.
(a) (2 marks) Predict the credit rating of a hypothetical “median” customer, i.e., one with the attributes listed in Table 1. Report decision values as well.
(b) (2 marks) Produce the confusion matrix for predicting the credit rating from this SVM on the test set, and also report the overall accuracy rate.
(c) (2 marks) Automatically or manually tune the SVM to improve prediction over that found in 3b. Report the resulting SVM settings and the resulting confusion matrix for predicting the test set. (Any amount of improvement is acceptable.)
(a) (2 marks) Predict the credit rating of a hypothetical “median” customer, i.e., one with the attributes listed in Table 1. Report predicted probabilities as well.
(b) (2 marks) Reproduce the first 20 or so lines of the R output for the Naive Bayes fit, and use them to explain the steps involved in making this prediction.
(c) (2 marks) Produce the confusion matrix for predicting the credit rating using Naive Bayes on the test set, and also report the overall accuracy rate.
(a) (2 marks) Which of the classifiers looks to be the best? (Be specific, and specify the figures you used to answer this question.)
(b) (2 marks) Which looks to be the worst? (Be specific, and specify the figures you used to answer this question.)
(c) (2 marks) Are there any categories that all classifiers seem to have trouble with?
(a) (2 marks) Fit a logistic regression model to predict whether a customer gets a credit rating of A using all of the other variables in the dataset, with no interactions.
(b) (2 marks) Report the summary table of the logistic regression model fit.
(c) (2 marks) Which predictors of credit rating appear to be significant at the 5% significance level?
(d) (2 marks) Fit an SVM model of your choice to the training set.
(e) (3 marks) Produce an ROC chart comparing the logistic regression and the SVM results of predicting the test set. Comment on any differences in their performance.