CMT224
Social Computing
Assessment Title: Social Computing Portfolio
This assignment is worth 100% of the total marks available for this module. If coursework is
submitted late (and where there are no extenuating circumstances):
1. If the assessment is submitted no later than 24 hours after the deadline,
the mark for the assessment will be capped at the minimum pass mark;
2. If the assessment is submitted more than 24 hours after the deadline, a
mark of 0 will be given for the assessment.
Your submission must include the official Coursework Submission Cover sheet, which can be
found here:
Submission Instructions
Description | Requirement | Type | Name
Cover sheet | Compulsory | One PDF (.pdf) file | [student_number].pdf
Part 1 Notebook (using the template provided on Learning Central) | Compulsory | One IPython Notebook file (.ipynb) | [student_number]-part-1.ipynb
Part 2 Notebook (using the template provided on Learning Central) | Compulsory | One IPython Notebook file (.ipynb) | [student_number]-part-2.ipynb
Part 3 Notebook (using the template provided on Learning Central) | Compulsory | One IPython Notebook file (.ipynb) | [student_number]-part-3.ipynb
Any code submitted will be run on a system equivalent to your University provided laptop
and must be submitted as stipulated in the instructions above.
Any deviation from the submission instructions above (including the number and types of
files submitted) will result in a mark of zero for the assessment or question part.
Staff reserve the right to invite students to a meeting to discuss coursework submissions.
Assignment
You are tasked with analysing various datasets representing different types of social and
communication behaviour. These datasets are provided as files and can be found alongside
this coursework pro-forma on Learning Central. You should ONLY use the files provided as
they are intentionally modified subsets of public datasets¹.
Alongside the dataset files, there are 3 (THREE) IPython notebooks, named part-1.ipynb,
part-2.ipynb, and part-3.ipynb, which you should solely use to complete the assignment and
submit these in line with the Submission Instructions section above. The cells in each
completed notebook will be run in the order that they appear. You do not need to resubmit
the dataset files.
You are required to address 16 total questions across the 3 parts. Each part is made up of 1
or 2 tasks containing multiple questions. These questions are also listed below for
convenience.
For EACH question in EACH notebook:
1. Complete the cell below each question marked with “#CODE:” with the Python code
needed to generate any new information you need for your answer. This information
should be output when the cell is run, and any floating-point values should be
presented to 2 decimal places unless they are less than 0.01.
2. Complete the cell below this marked with “ANSWER:” with your answer to the
question, referring to the information outputted above (as well as any previous cell if
needed). In doing so, briefly explain your approach and methods/measures used to
answer the question and justify any choices made. Each answer cell should (ideally)
be no more than 125 words.
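As an illustration of the floating-point presentation rule in step 1, a small helper along the following lines could be used. This is only a sketch: the name format_value is hypothetical, and because the rule does not say how values below 0.01 should be shown, falling back to 2 significant figures is an assumption.

```python
def format_value(value: float) -> str:
    """Format a float to 2 decimal places unless its magnitude is below 0.01."""
    if value != 0 and abs(value) < 0.01:
        # Assumption: very small values are shown to 2 significant figures
        # so that they are not rounded away to 0.00.
        return f"{value:.2g}"
    return f"{value:.2f}"


print(format_value(0.34567))   # 0.35
print(format_value(12.0))      # 12.00
print(format_value(0.004321))  # 0.0043
```

Routing every printed number through one helper keeps the output of all "#CODE:" cells consistent.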
Each question is worth 6 marks (making a total of 96/100 possible marks) and a further 4
marks (4/100) will be awarded for the overall usability and readability of the notebooks
submitted. Marks will be awarded using the criteria described in the Criteria for assessment
section below.
You may use any Python packages locally installed or installable via pip on your University
provided laptop. “%pip install ” commands should be placed in the cell
below “Install Python packages (pip only)” provided at the top of each notebook. “import
” lines for all packages required for the notebook to be run successfully
should be placed in the cell under “Import Python packages” provided at the top of each
notebook. You may add additional cells throughout the notebooks, but this should be
minimised.
Questions (Duplicated from the notebook files)
Part 1: Social media behaviour data
Task 1 of 1
Examine the Graph Modelling Language (gml) files
"socialmedia_cmt224r_reply_network.gml" (reply network) and
"socialmedia_cmt224r_social_network.gml" (social network) which represent Twitter data
between a sample of users over several days at the time of the Higgs boson particle discovery.
Both networks are directed and share the same ids for nodes (anonymised Twitter users).
However, the shared user ids are contained within the "label" attribute in the .gml files, not
the node "id" attribute of each individual .gml file.
In the reply network, an edge from one node to another indicates that the first user replied
to a Tweet made by the second user during the time period. Replies are also Tweets. Edges are
weighted, with the weight representing the number of times this happened over the time period.
In the social network, an edge from one node to another indicates that the first user follows
the second user on the social media platform.
Using these networks, answer the following questions:
Q1. How does the topological structure of the reply network differ from the social network
in terms of the fraction of mutual connections (i.e., users that follow each other) and
the number of connected groups of users?
Q2. Do the 20 users that follow the most other users also reply to the largest number of
users?
Q3. To what extent does the number of followers a user has in the social network correlate
with the number of users that have replied to them?
Q4. Do users typically ONLY reply to Tweets, are ONLY replied to, or BOTH?
Q5. Of the users that ONLY reply to Tweets, how many ONLY do so to those users they are
following?
Q6. How many users have ONLY mutual following connections AND ONLY mutual reply
connections with these SAME users?
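The note in Part 1 about shared user ids living in the "label" attribute can be sketched as follows. This is a loading illustration only, assuming the networkx package is used (the assignment does not mandate it); the GML text, the user ids and the variable names are made up for the example.

```python
import networkx as nx

# Tiny made-up GML document standing in for the provided files (hypothetical data).
SAMPLE_GML = """graph [
  directed 1
  node [ id 0 label "101" ]
  node [ id 1 label "202" ]
  node [ id 2 label "303" ]
  edge [ source 0 target 1 weight 3 ]
  edge [ source 1 target 0 weight 1 ]
  edge [ source 2 target 0 weight 2 ]
]"""

# label="label" (the networkx default) names each node by its "label" attribute
# rather than its file-local "id", so node names are comparable across files.
reply_graph = nx.parse_gml(SAMPLE_GML, label="label")

print(reply_graph.is_directed())                       # True
print(sorted(reply_graph.nodes()))                     # ['101', '202', '303']
print(reply_graph["101"]["202"]["weight"])             # 3
print(reply_graph.in_degree("101"))                    # 2 distinct repliers
print(reply_graph.in_degree("101", weight="weight"))   # 3 replies in total
```

For the files themselves, nx.read_gml(path, label="label") takes the same label argument. Note that node names read this way are strings.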
Part 2: Email behaviour data
Task 1 of 2
Examine the file "emails_cmt224r.edgelist" which represents email behaviour at an
organisation. Each line contains two numbers separated by a blank space. Consider
each number as an identifier for an individual in an organisation, with each line
representing that the first individual sent at least one email to the second individual at some
point. Model the data using an appropriate network representation and answer the following
questions:
Q1. How many individuals have a higher or lower ratio of mutual connections than the
ratio of mutual connections found in the overall network?
Q2. Are occurrences of induced, connected subgraphs of 3 individuals (triads) with only
mutual connections more abundant in the network than those with a mixture of
asymmetric and mutual edges?
Q3. Using the largest, strongly connected component (where at least one path exists
between each individual and all others), could the connectivity be suggested to be
reflective of a small-world phenomenon when compared with a comparable random
network?
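Because each line of the edge list records who emailed whom, a directed graph is one natural representation for Part 2, Task 1. The sketch below assumes networkx and uses a few made-up lines in place of the provided file; the variable names are hypothetical.

```python
import networkx as nx

# Made-up lines in the same "sender recipient" layout as the edge list file.
sample_lines = ["1 2", "2 1", "2 3", "3 4"]

# create_using=nx.DiGraph keeps the sender -> recipient direction;
# nodetype=int converts the identifiers from text to integers.
email_graph = nx.parse_edgelist(sample_lines, create_using=nx.DiGraph, nodetype=int)

print(email_graph.number_of_nodes())  # 4
print(email_graph.number_of_edges())  # 4
print(email_graph.has_edge(2, 3))     # True: 2 emailed 3
print(email_graph.has_edge(3, 2))     # False: 3 did not email 2
```

For the file itself, nx.read_edgelist(path, create_using=nx.DiGraph, nodetype=int) accepts the same arguments.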
Task 2 of 2
Examine the JSON file "emails_cmt224_departments.json" (departments file). Keys in the
departments file represent individuals using the same ids as in the
"emails_cmt224r.edgelist" file in Part 2, Task 1 and the values represent a department id
that the individual can be attributed to. Using the contents of the departments file in
combination with the network in Part 2, Task 1, answer the following questions:
Q1. Using the connections that individuals have in the network, are they more likely to
mix with others in their department or those with a similar number of connections?
Q2. Are all departments with 12 or more members more tightly connected amongst
themselves in comparison to all individuals across the overall network irrespective of
their department? In this context, 'more tightly connected' is defined as
having less sparsity in the connections among members AND more clustered
connections. In addition to answering the overall question as yes or no, provide a list
of departments this is true for (if any) and not true for (if any).
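Combining the departments file with the network from Part 2, Task 1 could look like the sketch below. It assumes networkx and integer node identifiers; the JSON text, graph and variable names are made up for the example. JSON object keys are always strings, so they are converted to integers here to match the nodes.

```python
import json
from collections import defaultdict

import networkx as nx

# Made-up stand-ins for the departments file and the email network.
sample_json = '{"1": 0, "2": 0, "3": 1}'
email_graph = nx.DiGraph([(1, 2), (2, 3)])

# Convert the string keys to integers so they line up with the node ids.
departments = {int(person): dept for person, dept in json.loads(sample_json).items()}

# Store the department id on each node as a "department" attribute.
nx.set_node_attributes(email_graph, departments, name="department")

# Group individuals by department, e.g. to filter departments by size later.
members = defaultdict(list)
for person, dept in departments.items():
    members[dept].append(person)

print(email_graph.nodes[3]["department"])  # 1
print(dict(members))                       # {0: [1, 2], 1: [3]}
```

If the nodes were loaded as strings instead, the key conversion step would be dropped.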
Part 3: Peer-to-peer message behaviour data
Task 1 of 2
Examine the file "p2p_msg_cmt224r.csv" which represents messaging behaviour between
users on a messaging platform. Each row has four columns, representing a single event where
a person (person_a) messaged another person (person_b) on some date (date) at some time
of day (time). From this, answer the following questions: