Film Database Application
Objectives
Within this assignment you will be performing some experimental analysis on a historical movie
review dataset. The goal of the assignment is to determine which algorithm (user-based or item-
based) and parameter combination (top-K neighbours or similarity threshold, and associated
values) produce the most accurate rating predictions on the given dataset.
Assignment Requirements
The main goal of the assignment will be to perform experimental analysis of the prediction
accuracy achieved by the recommender system algorithms we have discussed in the course.
The assignment page contains a text file of historical movie review data, which follows the same
format as the data used in lab #8. You will be required to submit a short report (~5 pages)
analyzing and discussing your experimental results in the context of this dataset. Some
questions you should aim to answer in your report:
1) Is user-based or item-based nearest neighbour recommendation more accurate for this
data?
2) Is top-K (i.e., selecting the K most similar users/items) or threshold-based (i.e., selecting
all users/items with similarity above some threshold X) more accurate for this data?
3) Which parameter values produce the most accurate results for this data (e.g., are 2
neighbours best? 10? 100? A threshold value of 0? 0.5?)? How does the prediction
accuracy change as the parameter values change?
4) How long does prediction take for each algorithm/parameter combination? Is one solution
faster than the other? Is this expected based on the algorithms or is it specific to your
implementation?
5) Based on your analysis and knowledge of the algorithms, which algorithm/parameter
combination would you use for a real-time online movie recommendation system? Provide
some arguments in favor of this conclusion based on your experimental results and the
computational requirements for the algorithm. You should also consider the
benefits/drawbacks of each algorithm in your comparison (e.g., what values can be
precomputed? how will this affect a real-world application?).
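To ground the comparisons above, here is a rough sketch of user-based top-K prediction in Python. It assumes a dense user-by-item NumPy matrix with 0 marking a missing rating; the function names, the Pearson similarity choice, and the fallback to the user's mean are all illustrative, not a prescribed implementation (the item-based variant is the same computation on the transposed matrix, and the threshold variant keeps all neighbours with similarity above X instead of the top K).

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation computed over co-rated items (0 = missing)."""
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2:          # too few co-rated items to correlate
        return 0.0
    ca = a[mask] - a[mask].mean()
    cb = b[mask] - b[mask].mean()
    denom = np.sqrt((ca ** 2).sum() * (cb ** 2).sum())
    return float((ca * cb).sum() / denom) if denom else 0.0

def predict_user_based(ratings, user, item, k=10):
    """Predict ratings[user, item] as a similarity-weighted average of the
    k most similar users who have actually rated the item."""
    sims = np.array([pearson_sim(ratings[user], ratings[v]) if v != user
                     else -np.inf for v in range(ratings.shape[0])])
    sims[ratings[:, item] == 0] = -np.inf   # exclude users who did not rate it
    neighbours = [v for v in np.argsort(sims)[::-1][:k] if sims[v] > 0]
    if not neighbours:                      # no usable neighbour: fall back
        return ratings[user][ratings[user] > 0].mean()
    num = sum(sims[v] * ratings[v, item] for v in neighbours)
    return num / sum(abs(sims[v]) for v in neighbours)
```

The fallback behaviour when no positive-similarity neighbour exists is itself a design choice worth reporting, since it affects the error measurements on sparse users.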
If you are looking for more data to include in the report, you can also consider additional
questions (e.g., do users with more or fewer reviews receive the most accurate
recommendations?). To
generate data for the report, you are expected to use the ‘leave one out’ cross validation
approach discussed in the Evaluating Recommender Systems lecture (Week #10). This will
allow you to compute the mean absolute error across the entire dataset for any single
algorithm/parameter combination.
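The leave-one-out loop itself is short; a minimal sketch (again assuming a dense NumPy ratings matrix with 0 for missing entries, and taking the prediction function as a parameter so any algorithm/parameter combination can be plugged in) might look like this:

```python
import numpy as np

def loo_mae(ratings, predict, **params):
    """Leave-one-out cross validation: hide each known rating in turn,
    predict it from the remaining data, and average the absolute errors."""
    errors = []
    for u, i in zip(*np.nonzero(ratings)):
        actual = ratings[u, i]
        ratings[u, i] = 0.0                  # temporarily remove the rating
        errors.append(abs(predict(ratings, u, i, **params) - actual))
        ratings[u, i] = actual               # restore it for later iterations
    return sum(errors) / len(errors)
```

Note that the loop runs one prediction per known rating, so the total experiment cost is (number of ratings) x (number of algorithm/parameter combinations); this is why the runtime advice below matters.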
Repeating the experiments for this assignment will involve quite a bit of computation. It may be
worth spending some time improving the runtime complexity of your implementation before
running the experiments. Look for values you can precompute and reuse to avoid unnecessary
computation.
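One concrete example of such a precomputation, sketched here under the same dense-matrix assumption as above, is building the full item-item similarity matrix once so that each prediction becomes a row lookup rather than a fresh similarity scan. Be aware that under strict leave-one-out, hiding a rating slightly perturbs the similarities; whether you recompute or accept the approximation is a trade-off worth discussing in your report.

```python
import numpy as np

def precompute_item_sims(ratings):
    """Build the full item-item cosine similarity matrix once, so every later
    prediction is a table lookup instead of an O(#users) similarity scan."""
    norms = np.linalg.norm(ratings, axis=0)   # per-item rating-vector norms
    norms[norms == 0] = 1.0                   # avoid division by zero
    unit = ratings / norms                    # unit-length item columns
    return unit.T @ unit                      # sims[i, j] in [-1, 1]
```

Mean ratings per user/item are another cheap value to cache, since many prediction formulas reuse them on every call.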