ENGG2112 Coding Assignment
Instructions
• This is an individual assignment and the submitted work must be your
original work. You are allowed to discuss the method of solution with
others; however, the submitted code must be written entirely by you.
• Submit your work as a Python notebook in the template provided.
• Submissions must be made through Canvas only, and not by e-mail. The
deadline will be strictly enforced: 11:59pm on 8 April 2022. (Students
with disability adjustments will be contacted separately.)
• It is expected that a student of average ability will take 4 hours to
complete the assignment. Please plan your time accordingly, seek help from
the teaching team and peers in a timely fashion, and do not ask for
deadline extensions.
• A video demonstrating how to download the CSV files required has
been posted on Canvas in case you need help with that.
Problem 1
The first column “mpg” indicates the fuel efficiency in
miles per gallon, with larger values indicating greater efficiency. Delete the
final column “name” and do not use it as a feature; it is a text label rather
than a numeric measurement, and including it will distort your results.
1. Write a function that performs a linear regression to estimate the mpg
from some of the features provided – you need to decide which features
to use and explain your decision. Your function should read the CSV file,
perform systematic k-fold cross-validation, and retain only the
coefficients of the best model. A feature vector x, the number of folds k,
and the file name are inputs to the function; the coefficient vector and
the predicted mpg for x are the outputs.
2. Write a function that uses the K nearest neighbours method to classify a
vehicle according to its number of cylinders, using all of the remaining
features. Use the first 300 records as training data and the remainder as
test/validation data. K and the file name are inputs to the function; the
predicted number of cylinders of the test data and the accuracy are its
outputs.
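The two tasks above can be sketched as follows. This is a minimal illustration under stated assumptions, not a model solution: it runs on synthetic arrays rather than the assignment's CSV file (reading and cleaning the file is omitted), it interprets "best model" as the fold with the lowest validation mean squared error, and it uses unscaled Euclidean distance for the nearest-neighbour vote. The function names kfold_linear_regression and knn_classify are hypothetical.

```python
import numpy as np
from collections import Counter

def kfold_linear_regression(X, y, k, x_new):
    """Fit least squares on each of k train/validation splits and keep the
    coefficients of the fold with the lowest validation MSE (an assumption
    about what 'best model' means)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # prepend an intercept column
    folds = np.array_split(np.arange(n), k)       # contiguous folds, no shuffling
    best_mse, best_coef = np.inf, None
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        coef, *_ = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)
        mse = np.mean((A[val_idx] @ coef - y[val_idx]) ** 2)
        if mse < best_mse:
            best_mse, best_coef = mse, coef
    prediction = float(np.concatenate([[1.0], x_new]) @ best_coef)
    return best_coef, prediction

def knn_classify(X_train, y_train, X_test, K):
    """Label each test row by majority vote among its K nearest training rows."""
    preds = []
    for row in X_test:
        dist = np.linalg.norm(X_train - row, axis=1)   # Euclidean distances
        nearest = np.argsort(dist)[:K]
        votes = Counter(y_train[nearest].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

# Synthetic, noise-free regression data (NOT the assignment data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
coef, pred = kfold_linear_regression(X, y, k=5, x_new=np.array([1.0, 1.0]))

# Tiny synthetic classification set with two well-separated classes.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
y_train = np.array([4, 4, 8, 8])
X_test = np.array([[0.2, 0.1], [9.0, 10.0]])
y_test = np.array([4, 8])
knn_pred = knn_classify(X_train, y_train, X_test, K=3)
accuracy = float(np.mean(knn_pred == y_test))
```

On real data the features have very different scales, so standardising them before computing distances is usually worth considering; whether to shuffle rows before forming folds is likewise a design choice to justify.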
Problem 2
This is data collected from penguins of various
species found on various islands.
1. Write a function that uses linear regression to predict body mass from
Bill Length, Bill Depth and Flipper Length only. Use the first 300 data
records for training and the remaining data for testing. The function
input is the file name and its outputs are the predicted body mass of the
test data and the residual sum of squares of the test data.
2. Repeat the above exercise with Sex as an additional (categorical)
variable. Compare the RSS obtained here with the one from part 1, and
comment on your observation.
3. Use logistic regression to output the probability of a penguin being Male
or Female given the other features. Use the first 300 records for training
and the remainder for testing. The function must output the predicted Sex
of the test data, and the accuracy of the classifier.
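The three parts above can be sketched as follows. This is a hedged illustration only: the data are synthetic stand-ins generated in the code (not the penguin file), Sex is encoded as a single 0/1 indicator, the logistic model is fitted by plain gradient descent on three standardised measurements, and the helper names fit_predict_rss, sigmoid and fit_logistic are hypothetical. The 80/40 split stands in for the first-300/remainder split.

```python
import numpy as np

def fit_predict_rss(X_train, y_train, X_test, y_test):
    """Least-squares fit on the training rows; return test predictions and test RSS."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])   # intercept column
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    pred = A_test @ coef
    return pred, float(np.sum((y_test - pred) ** 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Logistic regression weights by batch gradient descent on the mean log-loss."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        w -= lr * A.T @ (sigmoid(A @ w) - y) / len(y)
    return w

# Synthetic stand-in for the penguin table (NOT the real data).
rng = np.random.default_rng(1)
n, n_train = 120, 80
is_male = rng.integers(0, 2, size=n).astype(float)    # Sex as a 0/1 indicator
X_num = rng.normal(size=(n, 3))                       # three numeric measurements
X_num[:, 2] += 4.0 * is_male                          # one measurement differs by sex
mass = (10.0 + X_num @ np.array([1.0, -2.0, 0.5])
        + 4.0 * is_male + rng.normal(scale=0.1, size=n))
train, test = slice(0, n_train), slice(n_train, n)

# Parts 1 and 2: test RSS without and with the Sex indicator as a feature.
_, rss_without = fit_predict_rss(X_num[train], mass[train], X_num[test], mass[test])
X_with = np.column_stack([X_num, is_male])
_, rss_with = fit_predict_rss(X_with[train], mass[train], X_with[test], mass[test])

# Part 3: probability of Male, thresholded at 0.5 to give a predicted Sex.
mu, sd = X_num[train].mean(axis=0), X_num[train].std(axis=0)   # training statistics only
Z = (X_num - mu) / sd
w = fit_logistic(Z[train], is_male[train])
prob_male = sigmoid(np.column_stack([np.ones(n - n_train), Z[test]]) @ w)
pred_sex = np.where(prob_male >= 0.5, "Male", "Female")
true_sex = np.where(is_male[test] == 1.0, "Male", "Female")
accuracy = float(np.mean(pred_sex == true_sex))
```

In this synthetic setup the simulated body mass depends directly on sex, so adding the indicator lowers the test RSS by construction; whether the same holds on the real data is the observation the assignment asks you to make.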