COMP90042 Rumour Detection and Analysis on Twitter
COMP90042 Project
Rumour Detection and Analysis on Twitter
Project type: Group
Codalab submission due date: 1pm Fri, 13th May 2022 (no extensions possible for this component)
The concept of rumour has a long history, and it is typically defined as an unverified statement or news circulating
from person to person. Rumours have the potential to spread quickly through social media and to bring about
significant economic and social impact. The figure below illustrates an example of a rumour propagating on
Twitter. The source message (green box) started a claim about the cause of Michael Brown’s shooting, and it was
published shortly after the shooting happened. It claimed that he was shot ten times by the police for stealing
candy. The message was retweeted by multiple users on Twitter, and within 24 hours there were about 900K
users involved, either by reposting, commenting, or questioning the original source message. From the replies
we can see some users (e.g. User 7; red box) questioned the truthfulness of the original message.
[Figure: a rumour propagating on Twitter, drawn as a tree of replies along a 24-hour timeline (marks at 0, 4, 8, 12, 16, 20 and 24 hours). Each box shows the tweet text, the user and the user's follower count.]
Source message (User 0, follower count 873021): "17 year old unarmed kid shot ten times by police for stealing candy. I didn't know that was punishable by death."
Replies:
User 1 (follower count 1222): "No excuse."
User 2 (follower count 3144): "This is unbelievable, or should be."
User 3 (follower count 6): "It applies to Black people."
User 4 (follower count 205): "apparently it is now."
User 5 (follower count 6632): "These days anything, especially with Stand Your Ground and even a sneeze is punishable by death."
User 6 (follower count 1141): "there has not been any proof that he stole candy. I guess skittles has become a reason to kill black teens."
User 7 (follower count 122): "He was 18. Nothing to do with stealing candy. He was walking in the street. Horrible situation. But stop spreading false facts."
User 8 (follower count 11197): "Anything is punishable by death if the youth is black."
User 0 (follower count 873021): "I was just going off what I read in the #ferguson tag early last night. Wasn't any real news out at that point."
The challenge of the project is to develop a rumour detection system and analyse the nature of rumours that are
being spread on Twitter. We will frame this using two tasks: rumour detection and rumour analysis.
Task 1: Rumour Detection
In this task, you will be provided with a set of tweet IDs for the source tweets (i.e. the first tweet that started
the story) and their replies (i.e. the comments we saw in the figure above), and each source tweet is labelled
as either a rumour or non-rumour. The task here is to use the Twitter API to crawl the tweet IDs to get
the tweet objects, and then build a binary classifier to classify whether a source tweet is a rumour or not. A
tweet object provides a variety of information, including the text of the tweet, information about the user who
made the tweet, when the post was created, etc. (see “Datasets” section below for more information).
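
As a rough illustration of the fields just mentioned, the sketch below pulls the text, some author details and the creation time out of one crawled tweet object. It assumes the object is a Python dict in the Twitter API v1.1 JSON layout (keys such as `id_str`, `text`, `user` and `created_at`); the helper name `summarise_tweet` and the example values are hypothetical, and the objects you crawl may be laid out differently.

```python
from datetime import datetime

# Timestamp layout assumed for v1.1 tweet objects,
# e.g. "Sat Aug 09 22:33:06 +0000 2014".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarise_tweet(tweet):
    """Pick out text, author info and creation time from one tweet dict."""
    user = tweet.get("user", {})
    return {
        "id": tweet["id_str"],
        # Extended-mode objects carry "full_text" instead of "text".
        "text": tweet.get("full_text", tweet.get("text", "")),
        "followers": user.get("followers_count", 0),
        "verified": user.get("verified", False),
        "created_at": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
    }


# Hypothetical example object, not taken from the project data.
example_tweet = {
    "id_str": "1001",
    "text": "But stop spreading false facts.",
    "created_at": "Sat Aug 09 22:33:06 +0000 2014",
    "user": {"followers_count": 122, "verified": False},
}

summary = summarise_tweet(example_tweet)
print(summary["text"], summary["followers"], summary["created_at"].isoformat())
```

Keeping only the fields a model needs in a small, flat record like this makes it easier to cache crawled tweets and to add user features later.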
You’re free to explore any methods or machine learning models for building the binary classifier. To give
some ideas, we could model the source tweet and replies as a sequence of tweets using recurrent networks.
Alternatively we could also model them based on their propagation structure (like the tree structure of
comments we saw earlier) using recursive networks or graph networks. We might want to consider incorporating
some user information, as it could provide hints to the trustworthiness of a user. While you are
permitted to use pretrained models or embeddings, you should only use the provided labelled tweets for
training the model, i.e. you should not find more labelled data beyond what we have provided. Whatever
methods or features you use, you must at least incorporate the tweet text in your model (we are doing an
NLP project, after all).
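
The two views of a thread described above (a sequence of tweets, or a propagation tree) can both be derived from the same crawled data. The following is a minimal sketch, assuming each tweet has already been reduced to a dict with the hypothetical keys `id`, `text`, `time` and `parent` (the ID of the tweet it replies to); these names are illustrative and are not part of the provided dataset.

```python
from collections import defaultdict


def build_thread(source, replies):
    """Return (sequence, children) for one source tweet and its replies.

    sequence: tweet texts with the source first, then replies by creation time.
    children: maps a tweet id to the ids of the tweets replying to it directly.
    """
    ordered = sorted(replies, key=lambda t: t["time"])
    sequence = [source["text"]] + [t["text"] for t in ordered]

    known_ids = {source["id"]} | {t["id"] for t in replies}
    children = defaultdict(list)
    for tweet in ordered:
        parent = tweet.get("parent")
        # Replies whose parent could not be crawled are attached to the source.
        if parent not in known_ids:
            parent = source["id"]
        children[parent].append(tweet["id"])
    return sequence, dict(children)


def depth(children, root):
    """Depth of the reply tree below root (0 when root has no replies)."""
    return max((1 + depth(children, c) for c in children.get(root, [])), default=0)


# Hypothetical thread; "time" is minutes after the source tweet.
source = {"id": "0", "text": "claim"}
replies = [
    {"id": "2", "text": "reply b", "time": 30, "parent": "1"},
    {"id": "1", "text": "reply a", "time": 10, "parent": "0"},
    {"id": "3", "text": "reply c", "time": 45, "parent": "99"},  # parent missing
]

sequence, children = build_thread(source, replies)
print(sequence)
print(children, depth(children, "0"))
```

The `sequence` list could feed a recurrent model, while `children` gives the adjacency information a recursive or graph network would need. Attaching orphaned replies to the source is only one possible design choice; dropping them is another.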
Task 2: Rumour Analysis
In this task, you will use your trained rumour classifier from the first task and apply it to unlabelled COVID-19
tweets to detect rumours. Given the predicted rumours and non-rumours, the aim here is to perform
some analyses to understand the nature of COVID-19 rumours and how they differ from their non-rumour
counterparts. We have provided a preliminary set of tweet IDs (of source tweets and their replies) related to
COVID-19 that you can use, but you’re free to search for more COVID-19 tweets to support your analyses
in this task.
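
One possible analysis, offered only as an illustration and not as a required method, is to compare which hashtags are relatively more frequent in predicted rumours than in predicted non-rumours. The sketch below assumes the classifier output has already been split into two lists of tweet texts; the function names and the example texts are invented for this illustration.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(texts):
    """Count lower-cased hashtags over a list of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def distinctive_hashtags(rumours, non_rumours, top_n=3):
    """Hashtags ranked by how much more often they appear in rumours.

    The score is the difference in relative frequency between the two groups.
    """
    r, n = hashtag_counts(rumours), hashtag_counts(non_rumours)
    r_total = sum(r.values()) or 1  # avoid division by zero on empty input
    n_total = sum(n.values()) or 1
    scores = {tag: r[tag] / r_total - n[tag] / n_total for tag in set(r) | set(n)}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:top_n]


# Invented example texts standing in for classifier predictions.
predicted_rumours = ["5G causes it #5g #covid19", "#5G towers #hoax"]
predicted_non_rumours = ["Wash hands #covid19 #staysafe", "#covid19 update"]

top_tags = distinctive_hashtags(predicted_rumours, predicted_non_rumours, top_n=2)
print(top_tags)
```

The same pattern (count a feature per group, then compare relative frequencies) extends to other features such as topics, sentiment labels or user characteristics.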