You must follow the code skeleton provided in the GitLab repository. Inline comments will help you
identify the parts of the code you need to fill in. Keep in mind that the assignment does not require writing
much code: the logic of each data operator can be implemented in fewer than 20 lines of code (LOC). Always
keep your code simple and well documented.
We will use a mix of real and synthetic data: the movie ratings come from a large Netflix dataset,
whereas the friendship relationships between users are synthetic. The input data are available in the
GitLab repository. Make sure you understand the data format first (cf. Section 1, "Data schema"). You
might also want to create a toy dataset in the same format to test your code easily.
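A toy dataset can be generated with a few lines of Python. The sketch below is only an illustration under stated assumptions: the rows are made up, the file names are hypothetical, and the space delimiter is inferred from the "12 3 4" example in Section 1, so check it against the real input files before relying on it.

```python
import csv
import os
import tempfile

# Made-up toy rows. Friends is symmetric: (1 2) and (2 1) are both present.
TOY_FRIENDS = [(1, 2), (2, 1), (1, 3), (3, 1)]       # UserID1 UserID2
TOY_RATINGS = [(2, 10, 5), (3, 10, 3), (1, 20, 4)]   # UserID MovieID Rating


def write_toy_file(path: str, rows, delimiter: str = " ") -> None:
    """Write rows to a delimited text file (space delimiter is an assumption)."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=delimiter, lineterminator="\n").writerows(rows)


# Hypothetical file names, written to a temporary directory.
out_dir = tempfile.mkdtemp()
friends_path = os.path.join(out_dir, "toy_friends.txt")
ratings_path = os.path.join(out_dir, "toy_ratings.txt")
write_toy_file(friends_path, TOY_FRIENDS)
write_toy_file(ratings_path, TOY_RATINGS)
```

Keeping the toy data this small makes it easy to work out the expected operator outputs by hand.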
1. Data schema
The data for this assignment consist of two CSV files: Friends and Ratings. The former
contains friendship relationships as tuples of the form UserID1 UserID2, denoting two users who are
friends. A user can have one or more friends, and the friendship relationship is symmetric: if A is a
friend of B, then B is also a friend of A, and both tuples (A B and B A) are present in the file. Ratings
contains user ratings as tuples of the form UserID MovieID Rating. For example, the tuple 12 3 4
means that "the user with ID 12 gave 4 stars to the movie with ID 3".
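Because every friendship must appear in both directions, a quick sanity check on a toy or real Friends file is to verify that the set of pairs is closed under swapping. The helper below is a hypothetical illustration, not part of the code skeleton.

```python
from typing import Iterable, Tuple


def is_symmetric(pairs: Iterable[Tuple[int, int]]) -> bool:
    """Return True if every friendship (A, B) also appears as (B, A)."""
    edges = set(pairs)
    return all((b, a) in edges for a, b in edges)


friends = [(1, 2), (2, 1), (1, 3), (3, 1)]
print(is_symmetric(friends))   # True
print(is_symmetric([(1, 2)]))  # False: (2, 1) is missing
```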
Hint #1: You can use the reader from Python's standard csv module (csv.reader) to parse the input files.
Hint #2: Consider encapsulating your Python tuples in ATuple objects (see code skeleton).
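The two hints can be combined as sketched below. The ATuple class here is a minimal hypothetical stand-in (the skeleton's real class may have more fields), and the sketch assumes space-delimited rows of integers with no header line. An in-memory io.StringIO replaces a real file so the example is self-contained.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO, Tuple


@dataclass
class ATuple:
    """Hypothetical stand-in for the skeleton's ATuple class."""
    tuple: Tuple


def read_tuples(f: TextIO, delimiter: str = " ") -> List[ATuple]:
    """Parse each non-empty row into a tuple of ints and wrap it in an ATuple."""
    reader = csv.reader(f, delimiter=delimiter)
    return [ATuple(tuple(int(v) for v in row)) for row in reader if row]


ratings = read_tuples(io.StringIO("12 3 4\n5 3 2\n"))
print([t.tuple for t in ratings])  # [(12, 3, 4), (5, 3, 2)]
```

With a real file, pass the object returned by open(path, newline="") instead of the StringIO.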
2. TASK I: Implement backward tracing (credits: 40/100)
The first task is to extend the operators you built in Assignment 1 with support for backward tracing. For
each operator, you will have to implement a new method (in Python 3 syntax):
lineage(tuples: List[ATuple]) -> List[List[ATuple]]
that returns the lineage of each tuple in the given list, i.e., the source tuples that contributed to it.
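One possible way to support backward tracing is eager lineage capture: while producing output, each operator remembers which input tuple(s) each output tuple came from, and lineage() follows those pointers upstream until it reaches base tuples. The sketch below is a simplified assumption, not the skeleton's design: ATuple, Scan, Select and Join are hypothetical stand-ins, and get_next() returns all tuples in one batch rather than following the Assignment 1 interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical ATuple stand-in; eq=False keeps identity-based hashing,
# so each output ATuple object can be used as a dict key.
@dataclass(eq=False)
class ATuple:
    tuple: Tuple
    operator: object = None


class Scan:
    """Source operator over an in-memory list of rows."""
    def __init__(self, rows: List[Tuple]):
        self.rows = rows

    def get_next(self) -> List[ATuple]:
        return [ATuple(row, self) for row in self.rows]

    def lineage(self, tuples: List[ATuple]) -> List[List[ATuple]]:
        # A base tuple is its own lineage.
        return [[t] for t in tuples]


class Select:
    def __init__(self, input_op, predicate: Callable[[ATuple], bool]):
        self.input, self.predicate = input_op, predicate
        self._parent: Dict[ATuple, ATuple] = {}  # output -> input tuple

    def get_next(self) -> List[ATuple]:
        out = []
        for t in self.input.get_next():
            if self.predicate(t):
                o = ATuple(t.tuple, self)
                self._parent[o] = t
                out.append(o)
        return out

    def lineage(self, tuples: List[ATuple]) -> List[List[ATuple]]:
        # Map each output back to its input, then keep tracing upstream.
        return self.input.lineage([self._parent[t] for t in tuples])


class Join:
    """Equi-join on left.tuple[left_key] == right.tuple[right_key]."""
    def __init__(self, left, right, left_key: int, right_key: int):
        self.left, self.right = left, right
        self.left_key, self.right_key = left_key, right_key
        self._parents: Dict[ATuple, Tuple[ATuple, ATuple]] = {}

    def get_next(self) -> List[ATuple]:
        # Build a hash table on the right input, then probe it with the left.
        table: Dict[object, List[ATuple]] = {}
        for rt in self.right.get_next():
            table.setdefault(rt.tuple[self.right_key], []).append(rt)
        out = []
        for lt in self.left.get_next():
            for rt in table.get(lt.tuple[self.left_key], []):
                o = ATuple(lt.tuple + rt.tuple, self)
                self._parents[o] = (lt, rt)
                out.append(o)
        return out

    def lineage(self, tuples: List[ATuple]) -> List[List[ATuple]]:
        # The lineage of a join output combines both inputs' lineages.
        result = []
        for t in tuples:
            lt, rt = self._parents[t]
            result.append(self.left.lineage([lt])[0] + self.right.lineage([rt])[0])
        return result


# Toy query: friends of user 1 joined with the ratings of those friends.
friends = Scan([(1, 2), (2, 1), (1, 3), (3, 1)])
ratings = Scan([(2, 10, 5), (3, 10, 3), (1, 20, 4)])
join = Join(Select(friends, lambda t: t.tuple[0] == 1), ratings, 1, 0)
results = join.get_next()
print([[a.tuple for a in lin] for lin in join.lineage(results)])
# [[(1, 2), (2, 10, 5)], [(1, 3), (3, 10, 3)]]
```

Capturing lineage eagerly costs extra memory per output tuple. The alternative, lazy tracing, recomputes the contributing inputs on demand (e.g., by re-evaluating the predicate or join condition) instead of storing pointers.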