Please submit your solution via LEARNONLINE. Submission instructions are given at the end of this assignment.
This assessment is worth 20% of the total marks. This assessment consists of 6 questions.
In this assignment you will aim to predict whether it will rain on each day, given weather observations from the preceding day. You will perform a number of machine learning tasks, including training a classifier, assessing its output, and optimising its performance. You will document your findings in a written report. Write concise explanations; approximately one paragraph per task will be sufficient.
Download the data file for this assignment from the course website (file weather.zip). The archive contains the data file in CSV format and some Python code that you may use to visualise a decision tree model.
Before starting this assignment, ensure that you have a good understanding of the Python programming language and the Jupyter Notebook environment, and an overall understanding of machine learning training and evaluation methods using the scikit-learn Python library (Practical 3). You will need a working Python 3.x system with the Jupyter Notebook environment and the scikit-learn package (imported in Python as ‘sklearn’) installed.
Create a Jupyter notebook and load the data. Use
import numpy as np
# skiprows=1 skips the header row of column names; every remaining value is a 0/1 integer
data = np.loadtxt('weather.csv', skiprows=1, delimiter=',', dtype=int)
to load the data. Type this code into the notebook. You will get syntax errors if you copy and paste from this document. (Students familiar with the Pandas library may use that to load and explore the data instead.)
Familiarise yourself with the data. There are 44 columns and 2716 rows. All values are binary (0/1) where 0 indicates false and 1 indicates true.
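Because skiprows=1 discards the header, the column names are lost. One way to keep them is to read the header line yourself and pass the rest of the open file to np.loadtxt. This is a sketch only: load_weather is a hypothetical helper name, and the three-row CSV below is a synthetic stand-in, not the real weather.csv. With the real file you would call it as load_weather(open('weather.csv')) and expect a shape of (2716, 44).

```python
import io

import numpy as np


def load_weather(source):
    """Read a header row of column names, then the 0/1 values below it.

    `source` is any open text file, e.g. open("weather.csv").
    """
    names = source.readline().strip().split(",")
    # The header line is already consumed, so no skiprows is needed here;
    # ndmin=2 keeps the result 2-D even if there is only one data row.
    data = np.loadtxt(source, delimiter=",", dtype=int, ndmin=2)
    return names, data


# Tiny synthetic stand-in for weather.csv (NOT the real data).
sample = io.StringIO("RainToday,RainTomorrow\n0,0\n1,1\n1,0\n")
names, data = load_weather(sample)
print(data.shape)                          # (3, 2)
print(bool(np.isin(data, (0, 1)).all()))   # True: every value is 0 or 1
```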
Categorical variables were encoded using “One Hot” coding, where a separate column is used to indicate the presence or absence of each possible value of the variable. For example, the three binary-valued columns “MinTemp_Low”, “MinTemp_Moderate”, and “MinTemp_High” correspond to the three possible values “Low”, “Moderate”, and “High” of the variable “MinTemp”. A 1 in column “MinTemp_Low” means that the value of MinTemp was “Low”; the cells for the other two values must be 0 in this case.
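The one-hot rule above implies that, within each group of columns sharing a variable prefix, every row contains exactly one 1. A minimal sketch for checking this, assuming the column names follow the “Variable_Value” pattern shown; onehot_groups and check_onehot are hypothetical helper names and the small array is synthetic.

```python
import numpy as np


def onehot_groups(names):
    """Map each variable prefix (e.g. 'MinTemp') to its column indices.

    Assumes one-hot columns are named '<Variable>_<Value>'; columns without
    an underscore (e.g. 'RainToday') are stand-alone binary columns and skipped.
    """
    groups = {}
    for i, name in enumerate(names):
        if "_" in name:
            prefix = name.rsplit("_", 1)[0]
            groups.setdefault(prefix, []).append(i)
    return groups


def check_onehot(names, data):
    """Return {variable: bool}: True if every row has exactly one 1 in the group."""
    return {
        var: bool((data[:, cols].sum(axis=1) == 1).all())
        for var, cols in onehot_groups(names).items()
    }


# Synthetic example (NOT the real data).
names = ["MinTemp_Low", "MinTemp_Moderate", "MinTemp_High", "RainToday"]
data = np.array([[1, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0]])
print(check_onehot(names, data))  # {'MinTemp': True}
```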
Explore the distribution of data in each column.
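Since every column is 0/1, the column mean is simply the fraction of rows in which that value is present, which gives a compact view of each column's distribution (including how often RainTomorrow is 1). A sketch, where column_summary is a hypothetical helper and the four-row array is synthetic:

```python
import numpy as np


def column_summary(names, data):
    """Return (name, count of 1s, fraction of 1s) for each column."""
    ones = data.sum(axis=0)    # number of rows where the column is 1
    frac = data.mean(axis=0)   # fraction of rows where the column is 1
    return [(n, int(c), float(f)) for n, c, f in zip(names, ones, frac)]


# Synthetic example (NOT the real data): 4 rows, 2 columns.
names = ["RainToday", "RainTomorrow"]
data = np.array([[0, 0], [1, 1], [0, 0], [0, 1]])
for name, count, frac in column_summary(names, data):
    print(f"{name:>14}: {count} ones ({frac:.0%})")
```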
The last column contains the prediction target (RainTomorrow). The meaning of the columns is as follows:
· MinTemp_{Low,Moderate,High}: 1 if the minimum temperature on the day was low/moderate/high
· MaxTemp_{Low,Moderate,High}: 1 if the maximum temperature on the day was low/moderate/high
· Evaporation_{Low,Moderate,High}: 1 if the measured evaporation on the day was low/moderate/high
· Sunshine_{Low,Moderate,High}: 1 if the aggregated periods of sunshine on the day were low/moderate/high
· WindSpeed9am_{Low,Moderate,High}: 1 if the measured wind speed at 9am on the day was low/moderate/high
· WindSpeed3pm_{Low,Moderate,High}: 1 if the measured wind speed at 3pm on the day was low/moderate/high
· Humidity9am_{Low,Moderate,High}: 1 if the humidity at 9am on the day was low/moderate/high
· Humidity3pm_{Low,Moderate,High}: 1 if the humidity at 3pm on the day was low/moderate/high
· Pressure9am_{Low,Moderate,High}: 1 if the barometric pressure at 9am on the day was low/moderate/high
· Pressure3pm_{Low,Moderate,High}: 1 if the barometric pressure at 3pm on the day was low/moderate/high
· Cloud9am_{Low,Moderate,High}: 1 if the cloud cover at 9am on the day was low/moderate/high
· Cloud3pm_{Low,Moderate,High}: 1 if the cloud cover at 3pm on the day was low/moderate/high
· Temp9am_{Low,Moderate,High}: 1 if the temperature at 9am on the day was low/moderate/high
· Temp3pm_{Low,Moderate,High}: 1 if the temperature at 3pm on the day was low/moderate/high
· RainToday: 1 if it rained on the day
· RainTomorrow: 1 if it rained on the following day. This is the target we wish to predict.
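With the column meanings above, the data can be split into a feature matrix and the RainTomorrow target. The sketch below looks the target up by name rather than by position, so it does not depend on column order; it assumes the CSV header uses exactly the names listed. split_features_target is a hypothetical helper and the array is synthetic.

```python
import numpy as np


def split_features_target(names, data, target="RainTomorrow"):
    """Split a 0/1 matrix into features X and target y, found by column name."""
    t = names.index(target)          # expected to be the last column
    X = np.delete(data, t, axis=1)   # every column except the target
    y = data[:, t]
    feature_names = [n for i, n in enumerate(names) if i != t]
    return X, y, feature_names


# Synthetic example (NOT the real data).
names = ["Humidity3pm_High", "RainToday", "RainTomorrow"]
data = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
X, y, feature_names = split_features_target(names, data)
print(X.shape, y)  # (3, 2) [1 0 1]
```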
A simple model for predicting rain tomorrow is to use today’s weather (RainToday) as an indicator of tomorrow’s weather (RainTomorrow).
What performance can we expect from this simple model?
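The simple model amounts to using the RainToday column as the prediction, and its errors can be tallied into the four confusion-matrix counts. A minimal sketch, assuming the header names match the list above; baseline_predict and confusion_counts are hypothetical helper names and the six-row array is synthetic. scikit-learn's sklearn.metrics.confusion_matrix(y_true, y_pred) should give the same counts, laid out as [[TN, FP], [FN, TP]] for labels 0 and 1.

```python
import numpy as np


def baseline_predict(names, data):
    """Predict RainTomorrow by copying RainToday (the simple model)."""
    return data[:, names.index("RainToday")]


def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for binary 0/1 labels, with 1 = rain."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tp = int(np.sum(y_true & y_pred))
    return tn, fp, fn, tp


# Synthetic example (NOT the real data).
names = ["RainToday", "RainTomorrow"]
data = np.array([[0, 0], [0, 0], [1, 1], [1, 0], [0, 1], [0, 0]])
y_pred = baseline_predict(names, data)
y_true = data[:, names.index("RainTomorrow")]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 1)
```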
Choose an appropriate measure to evaluate the classifier. Select among Accuracy, F1-measure, Precision, and Recall.
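For reference, the four candidate measures can be written in terms of the confusion-matrix counts, taking rain (RainTomorrow = 1) as the positive class:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```

Accuracy pools both classes, so if rain days turn out to be a minority, a classifier that never predicts rain can still reach high accuracy while its recall for rain is zero; how much this matters here depends on the class balance you find when exploring the RainTomorrow column.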
Use a confusion matrix and/or classification report to support your analysis.
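scikit-learn's sklearn.metrics.classification_report(y_true, y_pred) prints per-class precision, recall, F1-score and support, plus overall accuracy. The plain-NumPy sketch below computes the per-class figures and accuracy so the numbers can be cross-checked by hand; binary_report is a hypothetical helper, zero denominators are reported as 0.0 here, and the labels are synthetic.

```python
import numpy as np


def binary_report(y_true, y_pred):
    """Per-class precision, recall, F1 and support for 0/1 labels, plus accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    report = {}
    for label in (0, 1):
        tp = np.sum((y_pred == label) & (y_true == label))
        predicted = np.sum(y_pred == label)   # rows predicted as this label
        actual = np.sum(y_true == label)      # rows truly of this label (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": float(precision), "recall": float(recall),
                         "f1": float(f1), "support": int(actual)}
    report["accuracy"] = float(np.mean(y_true == y_pred))
    return report


# Synthetic labels (NOT the real data).
y_true = [0, 0, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
rep = binary_report(y_true, y_pred)
for label in (0, 1):
    r = rep[label]
    print(label, r["precision"], r["recall"], r["f1"], r["support"])
print("accuracy", round(rep["accuracy"], 3))
```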