Assignment 3 – CNNs
Table 1 below provides the details of each layer. Note that you need to flatten the output of the maxpooling layer (using the view function in PyTorch) before connecting the fc layers. Also note that the output of fc2 is fed to a softmax layer; since PyTorch's CrossEntropyLoss has a built-in softmax, the softmax layer is not shown here. A sketch of this network appears after Table 1.
Table 1: Q1 – network structure

Name     Type     in    out  kernel size  padding  stride
conv1    Conv2D   3     96   7 × 7        0        2
conv2    Conv2D   96    64   5 × 5        0        2
conv3    Conv2D   64    128  3 × 3        0        2
fc1      Linear   1152  128  NA           NA       NA
fc2      Linear   128   10   NA           NA       NA
maxpool  Pooling  NA    NA   3 × 3        0        3
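For reference, a minimal PyTorch sketch of the Table 1 network is shown below. The ReLU placement (after each convolutional layer and after fc1) is an assumption, as the table only specifies layer shapes; with 3 × 96 × 96 STL inputs, the flattened maxpool output has 128 × 3 × 3 = 1152 features, which matches the fc1 input size.

```python
import torch.nn as nn

class Q1Net(nn.Module):
    """Sketch of the Table 1 network (ReLU placement is an assumption)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=0)
        self.conv2 = nn.Conv2d(96, 64, kernel_size=5, stride=2, padding=0)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=0)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=3, padding=0)
        self.fc1 = nn.Linear(1152, 128)   # 128 x 3 x 3 = 1152 for 96 x 96 inputs
        self.fc2 = nn.Linear(128, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.maxpool(x)
        x = x.view(x.size(0), -1)         # flatten before the fc layers
        x = self.relu(self.fc1(x))
        return self.fc2(x)                # raw logits; CrossEntropyLoss applies softmax
```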
1. (25 marks) Train the network described above. In your report, plot the training and validation loss (per epoch). Also plot the training, validation, and test accuracy per epoch. Report the hyperparameter values, the optimizer used, and all other relevant information.
2. (5 marks) Once training is done, pass all your validation images through the network and plot the top five images correctly classified per class. That is, for each class, pick five images that are correctly classified by your network and have the maximum softmax scores (a selection sketch appears after this list).
3. (5 marks) Repeat the above but this time plot the top five images that are misclassified for each class
(i.e., your network is very confident about its decision but the decision is totally wrong).
4. (10 marks) A confusion matrix is a table used to describe the performance of a classifier. It allows easy identification of class confusions (e.g., one class might be mislabeled as another more often than the rest). Read more about the confusion matrix in the Encyclopedia of Machine Learning (see the PDF file Ting2010 ConfusionMatrix.pdf). Compute the confusion matrix of your training, validation, and test data. Do they follow a similar pattern? (A sketch for computing the confusion matrix appears after this list.)
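For questions 2 and 3, one way to rank validation images by softmax confidence is sketched below. This is a minimal example assuming a trained model and a validation DataLoader; the function name and the choice to group misclassified images by their true class are illustrative, not prescribed by the assignment.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def top5_per_class(model, loader, num_classes=10, device="cpu", correct=True):
    """For each class, return the 5 images with the highest softmax score among
    correctly (correct=True) or incorrectly (correct=False) classified images."""
    model.eval()
    best = {c: [] for c in range(num_classes)}          # class -> [(score, image), ...]
    for images, labels in loader:
        probs = F.softmax(model(images.to(device)), dim=1).cpu()
        scores, preds = probs.max(dim=1)
        for img, label, pred, score in zip(images, labels, preds, scores):
            if (pred == label).item() == correct:
                best[label.item()].append((score.item(), img))
    for c in best:
        best[c] = sorted(best[c], key=lambda t: t[0], reverse=True)[:5]
    return best
```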
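For question 4, a confusion matrix can be built by counting (true class, predicted class) pairs over a dataset. Below is a minimal sketch under the same assumptions (trained model, DataLoader); sklearn.metrics.confusion_matrix would work equally well.

```python
import numpy as np
import torch

@torch.no_grad()
def confusion_matrix(model, loader, num_classes=10, device="cpu"):
    """Rows index the true class, columns the predicted class."""
    model.eval()
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        for t, p in zip(labels.tolist(), preds.tolist()):
            cm[t, p] += 1
    return cm
```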
Q2. Deep CNN. Use “hw3.ipynb” as starter code for this question. In this question, you will develop a deep CNN to classify STL images. Your network has 4 convolutional blocks. We denote a convolutional block by conv-blk hereafter. The structure of a conv-blk is as follows:
Figure 1: Structure of the conv-blk.
The details of the layers inside the conv-blk are depicted in Table 2. In essence, a convolutional block receives an input x of size ci × Hi × Wi and processes it with 3 convolutional layers, each followed by a ReLU non-linearity. The first convolutional layer has co filters of size 3 × 3. With a stride of two and padding of one, the first convolutional layer creates a feature map of size co × Hi/2 × Wi/2. This is further processed by the 1 × 1 and 3 × 3 convolutions (and non-linearities).
Our deep CNN uses a stack of four of the aforementioned blocks, which reduces the spatial resolution to 6 × 6 (96 → 48 → 24 → 12 → 6 after blk1, blk2, blk3, and blk4, respectively). The network then uses a Global Average Pooling (GAP) layer followed by a linear layer. Put all together, the structure of the network reads as:
image → conv-blk1 → conv-blk2 → conv-blk3 → conv-blk4 → GAP → fc.
The details of the conv-blks are depicted in Table 3.

Table 2: Q2 – Details of the convolutional block

Name   Type    in  out  kernel size  padding  stride
Conv1  Conv2D  ci  co   3 × 3        1        2
Conv2  Conv2D  co  co   1 × 1        0        1
Conv3  Conv2D  co  co   3 × 3        1        1
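A minimal PyTorch sketch of the conv-blk (Table 2) and of the overall network is given below. The per-block output widths passed to DeepCNN are placeholders, since the actual values are specified in Table 3; the ReLU after every convolution follows the block description above.

```python
import torch.nn as nn

class ConvBlk(nn.Module):
    """conv-blk from Table 2: 3x3/stride-2 conv, then 1x1 and 3x3 convs at
    stride 1, each followed by ReLU."""
    def __init__(self, ci, co):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ci, co, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(co, co, kernel_size=1, stride=1, padding=0), nn.ReLU(),
            nn.Conv2d(co, co, kernel_size=3, stride=1, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)


class DeepCNN(nn.Module):
    """image -> conv-blk1..4 -> GAP -> fc. The widths below are placeholders;
    use the channel counts specified in Table 3."""
    def __init__(self, widths=(64, 128, 256, 512), num_classes=10):
        super().__init__()
        chans = (3,) + tuple(widths)
        self.blocks = nn.Sequential(
            *[ConvBlk(ci, co) for ci, co in zip(chans[:-1], chans[1:])]
        )
        self.gap = nn.AdaptiveAvgPool2d(1)       # Global Average Pooling
        self.fc = nn.Linear(widths[-1], num_classes)

    def forward(self, x):                        # x: (N, 3, 96, 96)
        x = self.blocks(x)                       # spatial: 96 -> 48 -> 24 -> 12 -> 6
        x = self.gap(x).flatten(1)
        return self.fc(x)
```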