4 Task 1: Identifying training problems of a deep CNN
Time budget: This section should take about 20% of the time you have allocated for your coursework.
This part of the coursework involves debugging and tuning deep CNNs using PyTorch and the Google Compute
Engine.
Identifying and resolving any optimization-related problems that might prevent your model from fitting the
training set are critical skills when working with deep neural networks. Being able to work around such
problems is key to building high-performance deep neural networks.
4.1 Introducing our broken CNN
Note: This section’s objective is to test your knowledge, understanding and research skills. It is not meant
to be an implementation-oriented section; the solutions only require you to add at most 4-5 lines of code.
Using the PyTorch-based research framework in the pytorch_mlp_framework folder, we have built, trained
and evaluated two deep CNNs: one consisting of 7 convolutional layers plus one fully connected layer, and
another consisting of 37 convolutional layers plus one fully connected layer.
Figure 1 shows the training/val loss performance of the two models. One can clearly see that the 37-layer
CNN (VGG_38) was unable to minimize its loss, unlike the healthy 7-layer CNN (VGG_08), which converges
to a low error. Given that extra layers mean more abstraction power, parameters and capacity, one would
expect the deeper model to learn better than the shallow one; however, this is simply not the case.
Identifying the problem. Construct a hypothesis as to what is causing the issue in Figure 1. You can reproduce
the figures by running the notebooks/Plot_Results.ipynb notebook, which takes as input the metrics
collected in the folders VGG_08 and VGG_38.
Quantitative Analysis of the problem. Support your hypothesis with arguments based on quantitative
observations. For instance, Figure 2 shows how gradients flow across layers in the healthy network, VGG_08.
These are the gradients with respect to the weights of the model. Visualize the gradient flow for the broken
network and discuss how it affects (or does not affect) the loss curves, training and convergence of the
37-layer CNN.
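For intuition about why gradient magnitudes matter here, consider a toy numeric sketch (hypothetical numbers, not the coursework model): the gradient reaching an early layer of a deep chain is a product of per-layer factors, so if each factor has magnitude below 1, early layers receive exponentially smaller gradients than late layers.

```python
import numpy as np

# Toy illustration: the gradient reaching layer k of a depth-D chain
# is (roughly) a product of the D-k per-layer scaling factors above it.
# With factors below 1, early layers see vanishingly small gradients.
rng = np.random.default_rng(0)
depth = 37
factors = rng.uniform(0.5, 0.9, size=depth)  # assumed per-layer scales
grad_at_layer = [np.prod(factors[k:]) for k in range(depth)]
print(f"layer 0: {grad_at_layer[0]:.2e}, layer {depth - 1}: {grad_at_layer[-1]:.2e}")
```

Plotting `grad_at_layer` produces the characteristic shape of a vanishing-gradient curve: near-zero at the input end, rising toward the output.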
4.1.1 Implementation Guidelines
The curves shown in Figure 1 can be reproduced by running the bash scripts to train each model from scratch
with the default settings given in run_vgg_08_default.sh and run_vgg_38_default.sh.
To reproduce Figure 2 and visualize the gradient flow for the broken network, implement the function
plot_grad_flows() within pytorch_mlp_framework/experiment_builder.py. This function takes as
input the model parameters during training, accumulates the absolute mean of the gradients in all_grads and
the corresponding layer names in layers, plots the gradient values for each layer with matplotlib, and
returns the resulting figure.
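As a starting point, a minimal sketch of such a function is given below. It assumes it is passed the iterable returned by `model.named_parameters()`, and it skips bias terms and parameters that have not yet received a gradient; the exact filtering and styling are up to you.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line if plotting interactively
import matplotlib.pyplot as plt

def plot_grad_flows(named_parameters):
    """Plot the mean absolute gradient of each weight layer.

    `named_parameters` is the iterable from `model.named_parameters()`.
    Returns the matplotlib figure so the caller can save or display it.
    """
    all_grads = []
    layers = []
    for name, param in named_parameters:
        # keep weight parameters that actually received a gradient
        if param.requires_grad and "bias" not in name and param.grad is not None:
            layers.append(name)
            all_grads.append(param.grad.abs().mean().item())
    fig = plt.figure(figsize=(12, 4))
    plt.plot(all_grads, alpha=0.6, color="b")
    plt.xticks(range(len(all_grads)), layers, rotation="vertical")
    plt.xlabel("Layers")
    plt.ylabel("Average gradient magnitude")
    plt.title("Gradient flow")
    plt.tight_layout()
    return fig
```

Calling this once per epoch (after `loss.backward()`, before the optimizer step) and saving the returned figure is one way to reproduce a Figure 2-style plot for both models.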
5 Task 2: Background Literature
Time budget: This section should take about 20% of the time you have allocated for your coursework.
Several methods can improve the training performance of the broken 37-layer CNN introduced in Section 4.1.
Discuss, in your own words, any three of the four research papers given below.
6 Task 3: Solution and Experiments
Time budget: This section should take about 30% of the time you have allocated for your coursework.
Solution Overview. The recommended solutions are in the four papers listed in Section 5. Pick any one
solution and discuss it in detail in this section. Explain, both theoretically and intuitively, why you chose
this solution and how it addresses or improves the training performance of the VGG_38 model.
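For intuition about one of the recommended solutions: the residual connection of He et al. [2016] reformulates a block to compute F(x) + x rather than F(x). A minimal numpy sketch follows (a toy stand-in with linear layers, not the framework's actual convolutional block classes):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def plain_block(x, w1, w2):
    # two transformations in sequence, standing in for conv layers
    return relu(w2 @ relu(w1 @ x))

def residual_block(x, w1, w2):
    # identity shortcut: the block learns a residual F(x), and the
    # "+ x" path lets gradients flow to earlier layers unattenuated
    return relu(w2 @ relu(w1 @ x)) + x
```

Note that even if the learned transformation collapses to zero, the residual block still passes its input through unchanged, which is one intuition for why very deep residual networks remain trainable.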
Experiments. The recommended solutions are described in Ioffe and Szegedy [2015] and He et al. [2016]. You
can also find alternatives in Huang et al. [2017] and Lee et al. [2015], which can be more challenging to
implement. We have written the PyTorch framework in a way that allows you to implement these solutions with
minimal effort; each requires approximately 8-10 lines of code.
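For intuition about the other recommended solution: batch normalization (Ioffe and Szegedy [2015]) normalizes each feature over the batch before a learnable rescale and shift. A minimal numpy sketch of the training-time forward pass (not the framework's or PyTorch's implementation, which also tracks running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize a batch of activations (rows = examples) to zero mean
    # and unit variance per feature, then rescale and shift with the
    # learnable parameters gamma and beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

Keeping each layer's input distribution stable in this way is one intuition for why batch normalization lets deeper networks train with larger learning rates.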