BUSA8001 Applied Predictive Analytics – Programming Task 2
Put all your work into a file titled BUSA8001_programming_task2_MQ_ID.ipynb, where MQ_ID is your Macquarie University student ID number (e.g. if MQ_ID == 12345678 then you need to submit BUSA8001_programming_task2_12345678.ipynb).
•Failure to submit a correctly named file will result in a loss of 30 points.
•Failure to supply solutions in the cells provided below each question will result in a loss of 30 points.
•Follow all instructions closely and do not print your variables to screen unless explicitly asked to do so. Failure to do so will result in additional point deductions.
Perform the following tasks in Python, writing your code in the cells provided underneath each question.
Q1. Import the credit card data from https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default of credit card clients.xls directly into a pandas DataFrame named `df`, making sure you skip the top row when reading the dataset. Delete the ‘ID’ column after importing the data. (5 points)
Q2. Rename the column ‘PAY_0’ to ‘PAY_1’ and the column ‘default payment next month’ to ‘payment_default’ (5 points)
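A minimal sketch of the pattern behind Q1 and Q2, run on a small invented table rather than the real spreadsheet. The names `raw` and `toy`, the toy values, and the use of `pd.read_csv` on in-memory text are assumptions made for illustration; `pd.read_excel` accepts the same `header` argument when reading an .xls file.

```python
import io
import pandas as pd

# Invented stand-in for the spreadsheet: an extra top row sits above the real header.
raw = io.StringIO(
    ",X1,X6,Y\n"
    "ID,LIMIT_BAL,PAY_0,default payment next month\n"
    "1,20000,2,1\n"
    "2,120000,-1,1\n"
    "3,90000,0,0\n"
)

# header=1 takes the second row as the column names and discards the row above it.
toy = pd.read_csv(raw, header=1)

# Drop the identifier column, then rename two columns with a mapping.
toy = toy.drop(columns="ID")
toy = toy.rename(columns={
    "PAY_0": "PAY_1",
    "default payment next month": "payment_default",
})
```

Passing a dictionary to `rename` leaves every column not named in the mapping unchanged.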
Q3. Create a one-dimensional NumPy array named `y` by exporting the first 12,500 observations of the ‘payment_default’ column from `df` (hint: see the `ravel` NumPy method). Similarly, create a two-dimensional NumPy array named `X` by exporting the first 12,500 observations of the ‘PAY_1’, ‘PAY_2’, ‘AGE’, ‘SEX’, ‘MARRIAGE’, ‘EDUCATION’ and ‘BILL_AMT1’ columns. (10 points)
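The difference between a one-dimensional and a two-dimensional export can be sketched on a small invented frame. The names `toy`, `n`, `y_toy` and `X_toy` and all values are hypothetical; `n = 3` stands in for 12,500.

```python
import numpy as np
import pandas as pd

# Invented toy frame used only to show the resulting array shapes.
toy = pd.DataFrame({
    "PAY_1": [2, -1, 0, 0, 1],
    "AGE": [24, 26, 34, 37, 57],
    "payment_default": [1, 1, 0, 0, 0],
})

n = 3  # stands in for 12,500

# A one-column selection is still two-dimensional, shape (n, 1);
# ravel flattens it to shape (n,).
y_toy = toy[["payment_default"]].iloc[:n].to_numpy().ravel()

# Selecting several columns gives a two-dimensional array, shape (n, n_features).
X_toy = toy[["PAY_1", "AGE"]].iloc[:n].to_numpy()
```

Checking `y_toy.shape` and `X_toy.shape` is a quick way to confirm the dimensions before moving on.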
Q4. Use an appropriate `scikit-learn` library we learned in class to create the following NumPy arrays: `y_train`, `y_test`, `X_train` and `X_test` by splitting the data into 68% train and 32% test datasets. Set `random_state` to 3 and stratify subsamples so that train and test datasets have roughly equal proportions of the target’s class labels. (5 points)
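A minimal sketch of a stratified split, assuming the intended tool is `train_test_split` from `sklearn.model_selection` (the task does not name it). The toy arrays `X_toy` and `y_toy` are invented; only the 68/32 proportions, `random_state=3` and the stratification come from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented data: 100 rows, 3 features, 25% of labels in class 1.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = np.array([0] * 75 + [1] * 25)

# test_size=0.32 leaves 68% for training; stratify=y_toy keeps the
# class proportions roughly equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.32, random_state=3, stratify=y_toy
)
```

Comparing `y_train.mean()` with `y_test.mean()` shows whether the class shares match after the split.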
Q5. Use an appropriate `scikit-learn` library we learned in class to standardize features from train and test datasets to mean zero and variance one, as discussed in class. (5 points)
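A minimal sketch of standardization, assuming the intended tool is `StandardScaler` from `sklearn.preprocessing` and that the scaler is fitted on the training data only (the task says only "as discussed in class"). The arrays and the names `sc`, `X_train_std` and `X_test_std` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented feature matrices with a non-zero mean and non-unit variance.
rng = np.random.default_rng(0)
X_train_toy = rng.normal(loc=50.0, scale=10.0, size=(68, 3))
X_test_toy = rng.normal(loc=50.0, scale=10.0, size=(32, 3))

sc = StandardScaler()
# Learn the mean and standard deviation from the training data only...
X_train_std = sc.fit_transform(X_train_toy)
# ...and reuse those same statistics to transform the test data.
X_test_std = sc.transform(X_test_toy)
```

Under this assumption the training columns have mean zero and variance one exactly, while the test columns are only approximately standardized because they reuse the training statistics.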
Q6. Use appropriate `scikit-learn` libraries we learned in class to fit the following classifiers to the training dataset constructed in Problem 1.
Q7. Using a method built into each of the above classifiers, compute prediction accuracy on training data for each classifier and store it into variables named according to the following pattern: `classifier_name_accuracy_train`; for instance, you should have `lr_accuracy_train`. (10 points)
Q8. Using a method built into each of the above classifiers, compute prediction accuracy on test data for each classifier and store it into variables named according to the following pattern: `classifier_name_accuracy_test`; for instance, you should have `lr_accuracy_test`. (10 points)
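A minimal sketch of the accuracy pattern in Q7 and Q8, assuming that the `lr` prefix refers to `LogisticRegression` and that the built-in method meant is `score`, which returns mean accuracy for scikit-learn classifiers. The toy data and default hyperparameters are invented for illustration; the other classifiers are not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: the label depends only on the sign of the first feature.
rng = np.random.default_rng(0)
X_train_toy = rng.normal(size=(80, 2))
y_train_toy = (X_train_toy[:, 0] > 0).astype(int)
X_test_toy = rng.normal(size=(20, 2))
y_test_toy = (X_test_toy[:, 0] > 0).astype(int)

lr = LogisticRegression()
lr.fit(X_train_toy, y_train_toy)

# score(X, y) predicts labels for X and returns the share that match y.
lr_accuracy_train = lr.score(X_train_toy, y_train_toy)
lr_accuracy_test = lr.score(X_test_toy, y_test_toy)
```

The same two `score` calls apply to any other fitted scikit-learn classifier, with the variable prefix changed to match.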
Q9. Explain which methods rank in the first two places according to their ability to accurately classify the training data, and which two methods perform worst on the training dataset. (10 points)
Q10.