COSC 2670/2738 Practical Data Science (with Python)
Project Assignment
Marks: This assignment is worth 30% of the overall assessment for this course.
Late penalties apply. A penalty of 10% of the total project score will be deducted
for each day a submission is late. No submissions will be accepted more than 5 days
after the due date.
Objective
The key objective of this assignment is to learn how to compare and contrast
several recommendation system algorithms. There are three major components of the
assignment: a completed Jupyter notebook used to run your experiments, a written
report, and a short video presentation in which you describe what you did and your
key findings.
The dataset you will use is a sample of the Netflix Prize data. The problem is
movie recommendation, and the data is already split into a training set and a
validation set that you can use to run all of your experiments.
Provided files
The following template files are provided:
SXXXXX-A3.ipynb : The primer Jupyter notebook file you should use to stage
and run all of your experiments.
netflix-5k.movie-titles.feather : The movie title dataframe that can be used to
map a movieID to a title, as well as a list of genres.
netflix-5k.train.feather : The training tuples for 5,000 users, where each tuple
is (userID, movieID, rating).
netflix-5k.validation.feather : A predefined set of validation tuples for the
same users that can be used by you to benchmark the performance of various
algorithms.
A3.pdf : This specification file.
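As a starting point, the three data files listed above could be loaded along the following lines. This is only a minimal sketch, not the method used in the provided notebook: the helper names data_paths and load_frames are hypothetical, the column names inside each file are not assumed, and pandas.read_feather relies on the pyarrow package, which does not appear in the install commands in this specification and may need to be installed separately.

```python
from pathlib import Path

# File names exactly as listed in this specification.
DATA_FILES = {
    "titles": "netflix-5k.movie-titles.feather",
    "train": "netflix-5k.train.feather",
    "validation": "netflix-5k.validation.feather",
}


def data_paths(data_dir="."):
    """Map each short name to the full path of its feather file."""
    return {name: Path(data_dir) / fname for name, fname in DATA_FILES.items()}


def load_frames(data_dir="."):
    """Read every provided feather file into a pandas DataFrame."""
    paths = data_paths(data_dir)

    # Fail early with a clear message if any file is not where we expect it.
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing data files: {missing}")

    # Imported here so data_paths() still works without pandas installed.
    # Assumption: pyarrow is available, since read_feather depends on it.
    import pandas as pd

    return {name: pd.read_feather(path) for name, path in paths.items()}


# Example use inside the notebook (assumes the files sit next to it):
#   frames = load_frames(".")
#   for name, df in frames.items():
#       print(name, df.shape, list(df.columns))
```

Printing the shape and column list of each dataframe first is a cheap way to confirm what the files contain before building any experiments on them.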
Creating Your Workspace
Once again, you should rename the file SXXXXX-A3.ipynb appropriately based on
your student ID.
Creating Your Anaconda Environment
To create your Anaconda environment for this project, you should run the
following commands in a terminal shell:
conda create -n PDSA3 python=3.8
conda activate PDSA3
pip install jupyterhub notebook numpy pandas
pip install matplotlib scikit-learn seaborn
pip install kneed scikit-surprise
Note that both kneed and scikit-surprise can be finicky on some systems. For
example, on my machine (MacBook Pro M1), an error was thrown while compiling
scikit-surprise, but the install still succeeded when pip tried a fallback install
method. So if you find that it really fails for you using pip, then and only then
resort to conda. This would break requirements.txt, so it is not very reproducible,
but it should work reliably for everyone. The magic commands would be:
conda install -c conda-forge kneed
conda install -c conda-forge scikit-surprise
You can type “pip freeze” to see a list of the packages that are installed in your
environment. You can also install scikit-learn-intelex and/or psutil if you want to
use the Intel-based optimisations or debug memory management as shown in the
sample Jupyter notebook.
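As a complement to pip freeze, the environment can also be checked from inside Python. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8). The REQUIRED list is simply the package names from the pip commands above, and installed_versions is a hypothetical helper name. It is assumed here that packages installed through conda register under the same distribution names, which may not hold on every system.

```python
from importlib import metadata

# Distribution names taken from the pip commands in this specification.
REQUIRED = [
    "jupyterhub", "notebook", "numpy", "pandas",
    "matplotlib", "scikit-learn", "seaborn",
    "kneed", "scikit-surprise",
]


def installed_versions(names=REQUIRED):
    """Return {name: version string, or None if the package is not found}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so missing ones are easy to spot.
    for name, version in installed_versions().items():
        print(f"{name:16s} {version or 'MISSING'}")
```

Running this in the first cell of the notebook confirms that the kernel is using the PDSA3 environment rather than some other Python installation.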