Explore, clean, summarise and analyse the data
COMP5310 Project Stage 1
This assignment is worth 15% of the final mark of the unit of study.
GROUPS
This assignment is done in groups of 2 or 3. All students in a group must be attending the
same lab session.
Note: there is work required from each member separately, but the project is handed in as a
combined effort, and it is marked as a whole: there will be individual and group components
to the marks, all based on the single submitted document.
Group formation procedure
In the Week 2 lab session there will be an opportunity to meet other students and form a group
with help from the tutor. Students must be in project groups with others who are all
timetabled in the same lab session.
In the Week 2 lab:
• Exchange names and contact information (e.g., which social media platforms you
prefer for coordinating).
• Arrange when to get together: at least one meeting per week (in addition to your
scheduled lab session) is vital, but more frequent coordination is even better.
Dispute resolution
If during the course of the assignment work there is a dispute among group members that
you can’t resolve or that will impact your group’s capacity to complete the task well, you
need to inform the unit coordinator
[email protected] or the TA
[email protected]. Make sure that your email specifies the lab session
and group name, and is explicit about the difficulty; also make sure this email is copied to all
group members (including anyone you are complaining about) and your lab tutor.
We need to know about problems in time to help fix them, so set early deadlines for group
members, and deal with non-performance promptly (don’t wait till a few days before the
work is due to complain that someone is not delivering on their tasks). If necessary, the
coordinator will split a group and leave anyone who didn’t participate effectively in a group
by themselves (they will need to achieve all the outcomes on their own). This option is only
available up until Friday Week 5, which is the last day with time to resolve the issue before
the due date. For any group issues that arise after this time, you will need to try to resolve
the problem on your own, and you will continue to be treated as a single group whose members
all get the same mark for this stage, based on whatever is submitted (though you should still
let the coordinator, TA and lab tutor know about the issues). In this case, groups may be
changed after stage 1 is finished.
PROJECT
Overview
The objective of stage 1 of the project is to acquire and meticulously clean a dataset,
analyse it comprehensively to derive meaningful insights about the data, and prepare the
data for building a predictive model in stage 2. Additionally, you will define your research
question, based on a research/business requirement, which you aim to answer in stage 2.
Identify the topic
Each member needs to choose a different dataset and different topic. The dataset each
member chooses must be relevant to the topic and research question they define. We
realise that you may not find data that completely resolves the problem you have defined,
but the data should at least have the potential to provide some insights. For example, if
your topic is “what influences the level of wealth in a community?”, you might look at
datasets that relate to the economy, climate, education, type of government, etc. Please
make sure that your question or issue is not simply a factual matter, but instead looks at
relationships where insights might be impactful for some stakeholder groups (for example, it
is not a good choice of question to ask just “which country has the highest level of
wealth?”).
Obtain the dataset and metadata
Each member needs to obtain a different dataset that can contribute to the exploration of
their own topic. We prefer that you use publicly available data (so we can check your work if
we need to) but it is OK for you to work on privately-owned data as long as you have
permission to use it, and permission to reveal it to the markers.
Each dataset must have a sufficient volume of data. For this assignment, a dataset is
considered sufficient volume if it contains at least 1000 rows/objects, and each
row/object has at least 15 attributes/columns. We recommend you choose a dataset that
is not already cleaned: you need to demonstrate that you can clean a dataset in Project
Stage 1 or prove that it is already cleaned (which may be harder). Consider your research
question when choosing your dataset, and make sure it has a range of attribute types and
data size that will help you answer your research question.
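As a quick self-check before submitting a dataset, the volume rule above (at least 1000 rows and at least 15 columns) can be tested with a few lines of Python. The following is only a minimal sketch: it assumes the dataset is a CSV file with a single header row, and the function names and the synthetic demo data are hypothetical, introduced purely for illustration.

```python
import csv
import io

MIN_ROWS = 1000     # minimum rows/objects stated in this brief
MIN_COLUMNS = 15    # minimum attributes/columns per row stated in this brief


def dataset_shape(csv_file):
    """Return (row_count, column_count) for a CSV file object with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    # Count data rows, skipping completely blank lines.
    row_count = sum(1 for row in reader if row)
    return row_count, len(header)


def has_sufficient_volume(row_count, column_count):
    """Apply the volume rule: at least MIN_ROWS rows and MIN_COLUMNS columns."""
    return row_count >= MIN_ROWS and column_count >= MIN_COLUMNS


def make_demo_csv(n_rows, n_columns):
    """Build a synthetic in-memory CSV (made-up data, for illustration only)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([f"col{i}" for i in range(n_columns)])
    for r in range(n_rows):
        writer.writerow([r] * n_columns)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    rows, cols = dataset_shape(make_demo_csv(1200, 16))
    print(rows, cols, has_sufficient_volume(rows, cols))
```

For a real dataset, the same check could be run with `dataset_shape(open("my_dataset.csv", newline=""))`, where `my_dataset.csv` is a placeholder file name. Passing this check only covers the size requirement; it says nothing about whether the data is clean or suited to the research question.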
We will keep track of the datasets chosen by every student to make sure there are no
repetitions within the group and tutorial session. You can submit your chosen dataset by
filling in this form, where you will be asked for your personal details, tutorial Activity code
(find it here), Group number (ask your tutor if unsure), a short description of your dataset
and a link to your dataset. If by any chance two students within the same group or same tutorial
session have chosen the same dataset, the student who entered their dataset first will
have priority and the other student will be contacted by their tutor to choose a different
dataset.