COMP9414
Artificial Intelligence
Assignment 1 - Artificial neural networks
1 Problem context
Time Series Air Quality Prediction with Neural Networks: In this
assignment, you will delve into the realm of time series prediction using neural
network architectures. You will explore both classification and estimation
tasks using a publicly available dataset.
You will be provided with a dataset named “Air Quality” [1], available
on the UCI Machine Learning Repository. We have modified this dataset for
this assignment. Therefore, please use only the attached dataset.
The given dataset contains 8,358 instances of hourly averaged responses
from an array of five metal oxide chemical sensors embedded in an air
quality chemical multisensor device. The device was located in the field in a
significantly polluted area at road level within an Italian city. Data were
recorded from March 2004 to February 2005 (one year), representing the
longest freely available recordings of on-field deployed air quality chemical
sensor device responses. Ground truth hourly averaged concentrations for
carbon monoxide, non-methane hydrocarbons, benzene, total nitrogen
oxides, and nitrogen dioxide, among other variables, were provided by a
co-located reference-certified analyser.
The variables within the dataset are listed in Table 1. Missing values
within the dataset are tagged with the value -200.
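As an illustration of the missing-value tag, the following minimal sketch converts every -200 entry to NaN so that later statistics can skip it. It assumes the data are held in a pandas DataFrame; the helper name `mark_missing` and the tiny demo frame are hypothetical and only stand in for the attached dataset.

```python
import numpy as np
import pandas as pd

MISSING_TAG = -200  # sentinel the dataset uses for missing values


def mark_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every -200 entry is replaced by NaN."""
    return df.replace(MISSING_TAG, np.nan)


# Made-up values for illustration only; they are not taken from the dataset.
demo = pd.DataFrame({"CO(GT)": [2.6, -200, 2.2], "T": [13.6, 13.3, -200]})
clean = mark_missing(demo)
print(clean.isna().sum())  # number of missing entries per column
```

Marking the entries as NaN is only a first step; how they are then handled (dropped, imputed, or masked) remains a design decision.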
Table 1: Variables within the dataset.
Variable        Meaning
CO(GT)          True hourly averaged concentration of carbon monoxide
PT08.S1(CO)     Hourly averaged sensor response
NMHC(GT)        True hourly averaged overall non-methane hydrocarbons concentration
C6H6(GT)        True hourly averaged benzene concentration
PT08.S2(NMHC)   Hourly averaged sensor response
NOx(GT)         True hourly averaged NOx concentration
PT08.S3(NOx)    Hourly averaged sensor response
NO2(GT)         True hourly averaged NO2 concentration
PT08.S4(NO2)    Hourly averaged sensor response
PT08.S5(O3)     Hourly averaged sensor response
T               Temperature
RH              Relative humidity
AH              Absolute humidity
2 Activities
This assignment focuses on two main objectives:
• Classification Task: You should develop a neural network that can
predict whether the concentration of Carbon Monoxide (CO) exceeds
a certain threshold – the mean of CO(GT) values – based on historical
air quality data. This task involves binary classification, where your
model learns to classify instances into two categories: above or below
the threshold. To determine the threshold, you must first calculate
the mean value for CO(GT), excluding unknown data (missing values).
Then, use this threshold to decide whether the value predicted by your
network is above or below it. You are free to choose and design your
own network, and there are no limitations on its structure. However,
your network should be capable of handling missing values.
• Regression Task: You should develop a neural network that can
predict the concentration of Nitrogen Oxides (NOx) based on other air
quality features. This task involves estimating a continuous numerical
value (NOx concentration) from the input features using regression
techniques. You are free to choose and design your own network, and
there are no limitations on its structure. However, your model should
be able to deal with missing values.
In summary, the classification task aims to divide instances into two
categories (exceeding or not exceeding the CO(GT) threshold), while the
regression task aims to predict a continuous numerical value (NOx concentration).
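The threshold described for the classification task can be sketched as follows. This is a minimal example, assuming the CO(GT) column is available as a pandas Series; the function name `co_threshold_labels` and the demo values are hypothetical, and leaving the label undefined where CO(GT) is missing is one possible choice, not a requirement of the assignment.

```python
import numpy as np
import pandas as pd


def co_threshold_labels(co: pd.Series):
    """Return (threshold, labels) for the binary CO(GT) task.

    The threshold is the mean of CO(GT) with missing values (-200) excluded.
    Labels are 1.0 above the threshold, 0.0 otherwise, and NaN where the
    target itself is missing.
    """
    valid = co.replace(-200, np.nan)
    threshold = valid.mean()  # pandas skips NaN by default
    labels = (valid > threshold).astype(float).where(valid.notna())
    return threshold, labels


# Made-up values for illustration only.
co = pd.Series([1.0, -200, 3.0, 2.0, 4.0])
threshold, labels = co_threshold_labels(co)
print(threshold)        # mean of 1.0, 3.0, 2.0 and 4.0
print(labels.tolist())
```

Including the -200 entries in the mean would pull the threshold far below any real concentration, which is why they are converted to NaN first.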
2.1 Data preprocessing
You are expected to analyse the provided data and perform any required
preprocessing. Some of the tasks during preprocessing might include the ones
shown below; however, not all of them are necessary and you should evaluate
each of them against the results obtained.
(a) Identify variation range for input and output variables.
(b) Plot each variable to observe the overall behaviour of the process.
(c) In case outliers or missing data are detected, correct the data
accordingly.
(d) Split the data for training and testing.
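The steps above can be sketched as follows, assuming the -200 tags have already been converted to NaN. The helper names, the 80/20 split, linear interpolation, and min-max scaling are all illustrative assumptions; other choices may suit the data better. The split is chronological (no shuffling) because the data form a time series.

```python
import numpy as np
import pandas as pd


def chronological_split(df, test_fraction=0.2):
    """Split without shuffling, so the test set is the most recent part."""
    n_test = int(round(len(df) * test_fraction))
    cut = len(df) - n_test
    return df.iloc[:cut], df.iloc[cut:]


def minmax_scale(train, test):
    """Scale both sets using the range observed in the training data only."""
    lo, hi = train.min(), train.max()  # pandas skips NaN here by default
    return (train - lo) / (hi - lo), (test - lo) / (hi - lo)


# Made-up frame for illustration only, with one missing entry.
demo = pd.DataFrame({"T": np.arange(10.0), "RH": np.arange(10.0) * 2 + 40})
demo.loc[3, "RH"] = np.nan

print(demo.agg(["min", "max"]))                   # (a) variation range
filled = demo.interpolate(limit_direction="both")  # (c) one way to fill gaps
train, test = chronological_split(filled)          # (d) train/test split
train_s, test_s = minmax_scale(train, test)
```

Because the scaling range comes from the training set alone, scaled test values can fall outside [0, 1]; this avoids leaking test-set information into preprocessing.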
2.2 Design of the neural network
You should select and design neural architectures for addressing both the
classification and regression problem described above. In each case, consider
the following steps:
(a) Design the network and decide the number of layers, units, and their
respective activation functions.
(b) Remember that it is recommended to keep the number of trainable
parameters Nw within the bound Nw < (number of samples)/10.
(c) Create the neural network using Keras and TensorFlow.
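The design steps above can be sketched as follows. This is a minimal example under stated assumptions: a fully connected network, 12 input features (the remaining columns of Table 1), ReLU hidden layers, and the 8,358 instances used as the sample count. The helper names `count_dense_params`, `within_budget`, and `build_mlp` are hypothetical, and the layer sizes are not a recommended architecture.

```python
def count_dense_params(n_inputs, hidden, n_outputs):
    """Number of weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden, n_outputs]
    return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))


def within_budget(n_params, n_samples):
    """Check the rule of thumb Nw < (number of samples) / 10."""
    return n_params < n_samples / 10


def build_mlp(n_inputs, hidden, n_outputs, output_activation):
    """Build the same network in Keras.

    TensorFlow is imported here so the parameter check above can be run
    without it. Use output_activation="sigmoid" for the classification
    task and "linear" for the regression task.
    """
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for units in hidden:
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(n_outputs, activation=output_activation))
    return model


n_params = count_dense_params(12, [16, 8], 1)
print(n_params, within_budget(n_params, 8358))
```

For a model built this way, `model.count_params()` in Keras should report the same total as `count_dense_params`.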
2.3 Training
In this section, you have to train your proposed neural network. Consider
the following steps:
(a) Decide the training parameters such as loss function, optimizer, batch
size, learning rate, and number of epochs.
(b) Train the neural model and verify the loss values during the process.
(c) Verify possible overfitting problems.
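The training steps above can be sketched as follows. The loss functions, the Adam optimizer, the hyperparameter values, and early stopping are assumptions chosen for illustration, not settings prescribed by the assignment; the helper names `train` and `overfitting_check` are hypothetical. Note that `validation_split` in Keras takes the last fraction of the training data before shuffling, so with chronologically ordered data the validation set is the most recent part.

```python
def train(model, x_train, y_train, task="classification",
          epochs=100, batch_size=32, learning_rate=1e-3):
    """Compile and fit a Keras model; return the per-epoch history dict."""
    from tensorflow import keras  # imported here so the check below runs without it

    classification = task == "classification"
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy" if classification else "mse",
        metrics=["accuracy"] if classification else ["mae"],
    )
    stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, restore_best_weights=True)
    history = model.fit(x_train, y_train, validation_split=0.2,
                        epochs=epochs, batch_size=batch_size,
                        callbacks=[stop], verbose=0)
    return history.history


def overfitting_check(history):
    """Return (best_epoch, diverged).

    best_epoch is the 0-based epoch with the lowest validation loss.
    diverged is True when, after that epoch, the training loss kept
    falling while the validation loss ended higher.
    """
    val, trn = history["val_loss"], history["loss"]
    best = min(range(len(val)), key=val.__getitem__)
    diverged = val[-1] > val[best] and trn[-1] < trn[best]
    return best, diverged


# Made-up loss curves for illustration only.
demo_history = {"loss": [0.9, 0.6, 0.4, 0.3, 0.2],
                "val_loss": [1.0, 0.7, 0.6, 0.65, 0.8]}
print(overfitting_check(demo_history))
```

A widening gap between training and validation loss is a common sign of overfitting, but it should be confirmed by inspecting the curves rather than relying on a single flag.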
2.4 Validating the neural model
Assess your results by plotting the training results and the network response
for the test inputs against the test targets. Compute error indexes to
complement the visual analysis.
(a) For the classification task, draw two different plots to illustrate your
results over different epochs. In the first plot, show the training and
validation loss over the epochs. In the second plot, show the training
and validation accuracy over the epochs. For example, Figure 1 and
Figure 2 show loss and classification accuracy plots for 100 epochs,
respectively.
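The two plots for the classification task can be sketched with matplotlib as follows. The example assumes a history dictionary with the keys `loss`, `val_loss`, `accuracy`, and `val_accuracy`, as produced by Keras when the model is compiled with `metrics=["accuracy"]`; the helper name `plot_curve` and the demo values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open plot windows
import matplotlib.pyplot as plt


def plot_curve(history, key, ylabel):
    """Plot a training metric and its validation counterpart per epoch."""
    fig, ax = plt.subplots()
    epochs = range(1, len(history[key]) + 1)
    ax.plot(epochs, history[key], label="training")
    ax.plot(epochs, history["val_" + key], label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig


# Made-up curves for illustration only.
history = {"loss": [0.7, 0.5, 0.4], "val_loss": [0.75, 0.6, 0.55],
           "accuracy": [0.6, 0.75, 0.8], "val_accuracy": [0.58, 0.7, 0.74]}
loss_fig = plot_curve(history, "loss", "loss")
acc_fig = plot_curve(history, "accuracy", "accuracy")
# loss_fig.savefig("loss.png") and acc_fig.savefig("accuracy.png") write the plots to disk.
```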