It will take you quite some time to complete this project; therefore, we earnestly
recommend that you start working as early as possible. You should read the specs carefully at
least 2-3 times before you start coding.
Project Specification
Instructions
1. This notebook contains instructions for COMP6714-Project.
2. You are required to complete your implementation for part-1 in the file project.py
provided along with this notebook. Please DO NOT ALTER the name of the file.
3. Do not print unnecessary output. Anything printed to the screen will be ignored; all
results must be returned in the appropriate data structures by the corresponding functions.
4. You can submit your implementation for the project via give.
5. You may add other functions and/or import modules (you may have to for this project),
but you may not define global variables. Only functions are allowed in project.py.
6. Do not import unnecessary or non-standard modules/libraries. Loading such libraries at
test time will cause errors and hence a mark of 0 for your project. If you are not sure,
please ask on Piazza.
Allowed Libraries:
You are required to write your implementation for the project using Python 3.6.5. You may
use any Python standard library.
Note:
LATE PENALTY: 10% on day-1 and 30% on each subsequent day.
Part One - Group Varint Encoding
Input Format:
The function encode() should receive one argument:
posting_list, a list of integers, where each integer represents a document ID
(the document IDs are sorted in ascending order).
Output Format:
Your output should be a bytearray containing the group varint encoding of posting_list.
Toy Example for Illustration
Here, we provide a small toy example for this part:
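Let posting_list be [1, 16, 527, 131598]. Its group varint encoding, with each byte shown
as an 8-bit binary string, is:
['00000110',
'00000001',
'00001111',
'11111111',
'00000001',
'11111111',
'11111111',
'00000001']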
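A minimal sketch of encode(), assuming the conventions that the toy example's bytes appear to follow (they are inferred from the example, not stated in the spec): the sorted IDs are encoded as d-gaps (here 1, 15, 511, 131071) rather than raw IDs; values are grouped four at a time; each group starts with a header byte holding (byte length - 1) of each value in 2 bits, first value in the most significant bits; and each value's bytes are stored least-significant byte first. How a final group with fewer than four values should be handled is not specified in this section, so the sketch assumes unused header slots are set to 00 and nothing is emitted for them. Gaps are assumed to fit in 4 bytes.

```python
def encode(posting_list):
    """Group varint encoding of d-gaps (sketch; conventions inferred from the toy example)."""
    # Turn sorted document IDs into gaps; the first ID is stored as a gap from 0.
    gaps = []
    prev = 0
    for doc_id in posting_list:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("posting_list must be sorted in ascending order")
        gaps.append(gap)
        prev = doc_id

    out = bytearray()
    # Emit the gaps in groups of (at most) four values.
    for start in range(0, len(gaps), 4):
        group = gaps[start:start + 4]
        header = 0
        body = bytearray()
        for slot in range(4):
            if slot < len(group):
                value = group[slot]
                # Bytes needed for this value (a gap of 0 still takes one byte).
                length = max(1, (value.bit_length() + 7) // 8)
                if length > 4:
                    raise ValueError("gap does not fit in 4 bytes")
                body += value.to_bytes(length, "little")  # least-significant byte first
            else:
                length = 1  # assumed: unused slot in a final partial group -> bits 00
            # 2 bits per slot storing (length - 1); slot 0 ends up in the highest bits.
            header = (header << 2) | (length - 1)
        out.append(header)
        out += body
    return out
```

For the toy example, the header 00000110 encodes value lengths of 1, 1, 2 and 3 bytes, followed by the seven value bytes listed above.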
Part Two - Group Varint Decoding
Input Format:
The function decode() should receive one argument:
encoded_list, a bytearray containing the encoded binary sequence.
Output Format:
Your output should be a list of integers, where each integer represents a document ID that is
decoded from the encoded list.
Toy Example for Illustration
def encode(posting_list):
    pass  # to be implemented in project.py

posting_list = [1, 16, 527, 131598]
encoded_list = encode(posting_list)

# Show each encoded byte as an 8-bit binary string
[bin(code)[2:].zfill(8) for code in encoded_list]

def decode(encoded_list):
    pass  # to be implemented in project.py
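Here, we provide a small toy example for this part:
Let encoded_list be the bytearray produced by encode() for posting_list = [1, 16, 527, 131598]
in Part One. Decoding it should return the original posting list:
[1, 16, 527, 131598]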
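A matching sketch of decode(), under the same assumptions inferred from the Part One toy example: values are d-gaps, each group's header stores (length - 1) in 2 bits per value with the first value in the high bits, value bytes are little-endian, and a final partial group simply has no bytes for its unused slots. Because every encoded value occupies at least one byte, the decoder can end a group early once the input is exhausted.

```python
def decode(encoded_list):
    """Decode group varint d-gaps back into document IDs (sketch; assumed conventions)."""
    postings = []
    prev = 0  # running sum of gaps = current document ID
    pos = 0
    n = len(encoded_list)
    while pos < n:
        header = encoded_list[pos]  # one header byte per group of up to four values
        pos += 1
        for slot in range(4):
            if pos >= n:
                break  # final partial group: no more values follow
            # Extract this slot's 2-bit field; slot 0 sits in bits 7-6.
            length = ((header >> (6 - 2 * slot)) & 0b11) + 1
            gap = int.from_bytes(encoded_list[pos:pos + length], "little")
            pos += length
            prev += gap
            postings.append(prev)
    return postings
```

On the Part One toy bytes, the gaps 1, 15, 511 and 131071 accumulate back to [1, 16, 527, 131598].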
Part Three - Evaluation
In this part, you need to implement a function that computes the F1 score and MAP from the
given information.
Input Format:
The function evaluation() should receive two arguments:
rel_list, a list of 0s and 1s, where 0 indicates that the corresponding document is
irrelevant and 1 indicates that it is relevant; and total_rel_doc, an integer giving the
total number of documents relevant to the query.
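A minimal sketch of evaluation(). The output format is not given in this section, so returning the pair (f1, ap) is an assumption. The sketch treats rel_list as one ranked result list: precision is the fraction of retrieved documents that are relevant, recall is the number of relevant retrieved documents divided by total_rel_doc, and F1 is their harmonic mean. With a single query, MAP reduces to average precision, computed here with the common convention of summing the precision at each relevant rank and dividing by total_rel_doc, so relevant documents that are never retrieved contribute 0. Check the full spec for the exact convention expected.

```python
def evaluation(rel_list, total_rel_doc):
    """Return (f1, average_precision) for one ranked list (sketch; output format assumed)."""
    retrieved = len(rel_list)
    hits = 0              # relevant documents seen so far
    precision_sum = 0.0   # sum of precision@k over ranks k holding a relevant document
    for rank, rel in enumerate(rel_list, start=1):
        if rel == 1:
            hits += 1
            precision_sum += hits / rank

    precision = hits / retrieved if retrieved else 0.0
    recall = hits / total_rel_doc if total_rel_doc else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)

    # Average precision over all relevant documents (assumed convention).
    ap = precision_sum / total_rel_doc if total_rel_doc else 0.0
    return f1, ap
```

For rel_list = [1, 0, 1, 0] and total_rel_doc = 4, precision and recall are both 0.5, so F1 is 0.5, and average precision is (1/1 + 2/3) / 4, about 0.4167.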