CMT309 Computational Data Science
Assessment Number : 2
The Extenuating Circumstances submission deadline will be 1 week after the submission date
above.
Extenuating Circumstances marks and feedback will be returned 1 week after the feedback
return date above.
This assignment is the CMT309 Data Science Portfolio, which accounts for 70% of the total marks
available for this module. If coursework is submitted late (and where there are no extenuating
circumstances):
1.) If the assessment is submitted no later than 24 hours after the deadline, the mark for the
assessment will be capped at the minimum pass mark;
2.) If the assessment is submitted more than 24 hours after the deadline, a mark of 0 will be given
for the assessment.
Extensions to the coursework submission date can only be requested using the Extenuating
Circumstances procedure. Only students with approved extenuating circumstances may use the
extenuating circumstances submission deadline. Any coursework submitted after the initial
submission deadline without approved extenuating circumstances will be treated as late.
By submitting this assignment you are accepting the terms of the following declaration:
I hereby declare that my submission (or my contribution to it in the case of group submissions)
is all my own work, that it has not previously been submitted for assessment and that I have
not knowingly allowed it to be copied by another student. I understand that deceiving or
attempting to deceive examiners by passing off the work of another writer, as one’s own is
plagiarism. I also understand that plagiarising another’s work or knowingly allowing another
student to plagiarise from my work is against the University regulations and that doing so will
result in loss of marks and possible disciplinary proceedings.
Assessment
(1) You have to upload the files mentioned in the Submission Instructions section below.
(2) Failing to follow the required file names and file types (e.g. naming your file p1.py instead
of P1.py) will incur a penalty of 10 points from your total mark.
(3) The coursework includes different datasets, which are downloaded automatically. Since the
markers already have these files, students do not need to submit them.
(4) Changing the txt file names and developing your code with the changed names will cause
errors during marking, since the markers will use Python marking code developed with the
original file names.
(5) You can use any Python expression or package that was used in the lectures and practical
sessions. Additional packages are not allowed unless instructed in the question. Failing to
follow this rule may result in the loss of all marks for that specific part of the question(s).
(6) You are free to use any Python environment or version to develop your code. However, you
should fill in and test your notebook in Google Colab, since testing and marking will be
done via Google Colab.
(7) If any submitted code for any sub-question fails to run in Google Colab, that part of the code
will be marked as 0 without testing the code in Jupyter or any other environment.
(8) Using the input() function to ask the user to enter values is not allowed.
(9) If you are asked to develop a function, its name and input arguments must be exactly as
specified in the paper.
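Points (8) and (9) can be illustrated with a minimal sketch. The function name count_words and its argument text are hypothetical and do not come from any question in this paper; the sketch only shows values arriving as function arguments rather than through input(), so that the notebook can run unattended.

```python
# Hypothetical example: `count_words` and `text` are illustrative names,
# not taken from any coursework question.

def count_words(text):
    """Return the number of whitespace-separated words in `text`."""
    return len(text.split())


# The value is hard-coded and passed as an argument; input() is never called.
sample = "Reddit is divided into subreddits"
result = count_words(sample)
print(result)
```

When a question specifies a function name and argument list, those exact names would replace the hypothetical ones used here.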
Learning Outcomes Assessed
• Carry out data analysis and statistical testing using code
• Critically analyse and discuss methods of data collection, management and storage
• Extract textual and numeric data from a range of sources, including online
• Reflect upon the legal, ethical and social issues relating to data science and its applications
Criteria for assessment
Credit will be awarded against the following criteria. Different criteria are applied to pandas code
(using pandas outside of a function), function code, and figures obtained with matplotlib or
seaborn. pandas code is exclusively judged by its functionality. Functions are judged by their
functionality and additionally their quality will be assessed. Figures are judged by their quality
and completeness. The below tables explain the criteria.
Functions: Functionality (80%) and Quality (20%)
Distinction (70-100%)
    Functionality: Fully working application that demonstrates an excellent understanding of
    the assignment problem using a relevant Python approach.
    Quality: Excellent documentation with usage of docstrings and comments.
Merit (60-69%)
    Functionality: All required functionality is met, and the application works, possibly with
    some minor errors.
    Quality: Good documentation with a few missing comments.
Pass (50-59%)
    Functionality: Some of the functionality is developed, with incorrect output and major errors.
    Quality: Fair documentation.
Fail (0-50%)
    Functionality: Faulty application with wrong implementation and wrong output.
    Quality: No comments or documentation at all.
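As a rough illustration of what documentation "with usage of docstrings and comments" can look like, here is a minimal sketch. The function mean_post_length and its argument posts are hypothetical names chosen for this example, and the docstring layout shown is one common convention, not a format required by this paper.

```python
def mean_post_length(posts):
    """Compute the mean length, in characters, of a list of posts.

    Parameters
    ----------
    posts : list of str
        The post texts to measure.

    Returns
    -------
    float
        The average number of characters per post, or 0.0 if `posts` is empty.
    """
    # Guard against division by zero when no posts are supplied.
    if not posts:
        return 0.0
    # Sum the character counts and divide by the number of posts.
    return sum(len(p) for p in posts) / len(posts)


print(mean_post_length(["hello", "hi"]))
```

The docstring states what the function does, what it expects, and what it returns, while the inline comments explain the reasoning behind individual steps.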
Pandas Code: Functionality (100%)
Distinction (70-100%)
    Fully working application that demonstrates an excellent understanding of the assignment
    problem using a relevant Python approach.
Merit (60-69%)
    All required functionality is met, and the application works, possibly with some minor errors.
Pass (50-59%)
    Some of the functionality is developed, with incorrect output and major errors.
Fail (0-50%)
    Faulty application with wrong implementation and wrong output.
Figures: Quality and completeness (100%)
Distinction (70-100%)
    Excellent figures with complete and informative data and formatting, labels, titles, and
    legends if appropriate.
Merit (60-69%)
    Good figures with good formatting, labels, titles, and legends.
Pass (50-59%)
    Acceptable figures with missing information, bad formatting, labels, titles, or legends.
Fail (0-50%)
    Faulty or missing figures.
Ethics: Quality (100%)
Distinction (70-100%)
    In addition to the requirements for Merit, there is a scholarly approach, including
    references to external resources or types of biases not covered in class.
Merit (60-69%)
    Significant discussion is provided, with deep mapping between several or all sources of
    bias and the argumentation.
Pass (50-59%)
    Some discussion is provided, with shallow mapping between a few sources of bias and the
    discussion.
Fail (0-50%)
    Incomplete discussion; sources of bias not discussed, or discussed with major mistakes.
Feedback and suggestion for future learning
Feedback on your coursework will address the above criteria. Feedback and marks will be returned
within 4 weeks of your submission date via Learning Central. In case you require further details, you
are welcome to schedule a one-to-one meeting.
Submission Instructions
Start by downloading P1.ipynb and P2.ipynb from Learning Central, then answer the following
questions. You can use any Python expression or package that was used in the lectures and practical
sessions. Additional packages are not allowed unless instructed in the question. You answer the
questions by filling in the appropriate sections in the Jupyter Notebook.
Your coursework should be submitted via Learning Central by the above deadline. You have to upload
the following files:
Your solution to part 1 (compulsory): one Jupyter notebook (.ipynb) file named P1.ipynb
Your solution to part 2 (compulsory): one Jupyter notebook (.ipynb) file named P2.ipynb
Make sure to include your student number as a comment in all of the Python files! Any deviation
from the submission instructions (including the number and types of files submitted) may result in a
reduction of marks for the assessment or question part.
You can submit multiple times on Learning Central. ONLY files contained in the last attempt
will be marked, so make sure that you upload all files in the last attempt.
Staff reserve the right to invite students to a meeting to discuss the Coursework submissions.
Part 1 - Text Data and Ethics (45 marks)
This part covers the course content of weeks 1 to 4. Students are advised to complete this
part by the end of Week 5, to plan their time better and leave enough preparation time for the
second part of the assignment!
In this question you will write Python code for processing, analysing and understanding the social
network Reddit. Reddit is a platform that allows users to upload posts and comment on them, and
is divided into subreddits, often covering specific themes or areas of interest (for
example, world news, ukpolitics or nintendo). You are provided with a subset of Reddit with posts
from Covid-related subreddits (e.g., CoronavirusUK or NoNewNormal), as well as randomly selected
subreddits (e.g., donaldtrump or razer).