COMP6214 Open Data Innovation
Deliverable(s): Coursework 1, PDF report
Deadline: Module week 8
Marking scheme:
Task 1 Data cleaning: 8 marks
Task 2 Data modelling: 12 marks
Task 3 Data visualization: 20 marks
Total: 40 marks
Task
This coursework has three parts assessing your ability to clean and model an open dataset and then
visualize that same dataset. You will write a single PDF report which includes results for all three
tasks, which will then be marked according to the assignment marking scheme. You will not submit
your code, just a report with screenshots (i.e. evidence of visualizations produced by your code).
Task 1 - Data cleaning [8 marks]
You must identify errors within the provided assignment dataset and correct them. The assignment
dataset is an Excel spreadsheet obtained from a UK government open data website. It can be
downloaded from the module wiki page (see link to CW1-BusinessImpactsOfCovid19Data.xlsx).
A total of 10 errors have been introduced into this dataset for you to find and correct. You can use
techniques learnt from the lecture content to help you, or other tools you find on the web.
Your report must document how you found the errors (including tools used and justification for why
they were used), what they were, how you corrected them and what validation approach you used
to check the clean dataset was error free.
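As an illustration of the kind of validation approach a report might describe, the following is a minimal sketch of rule-based checks over rows that have already been read from the spreadsheet. The field names (industry, question, percentage), the 0 to 100 range rule and the sample rows are all hypothetical; they are not taken from the assignment dataset, and the checks are not a list of the errors it contains.

```python
from collections import Counter


def find_issues(rows, pct_field="percentage", key_fields=("industry", "question")):
    """Return (row_index, description) pairs for rows that look suspicious.

    All field names are hypothetical placeholders for illustration only.
    """
    issues = []
    seen = Counter()
    for i, row in enumerate(rows):
        # Normalise case and whitespace so near-duplicates share one key.
        key = tuple(str(row.get(f, "")).strip().lower() for f in key_fields)
        seen[key] += 1
        if seen[key] > 1:
            issues.append((i, "duplicate key"))

        raw = row.get(pct_field)
        if raw is None or str(raw).strip() == "":
            issues.append((i, "missing value"))
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            issues.append((i, "non-numeric value"))
            continue
        # Assumed rule: a percentage must lie between 0 and 100.
        if not 0.0 <= value <= 100.0:
            issues.append((i, "value out of range"))
    return issues


# Made-up sample rows, not real data from the spreadsheet.
sample = [
    {"industry": "Retail", "question": "Trading", "percentage": "85.2"},
    {"industry": "retail ", "question": "Trading", "percentage": "85.2"},
    {"industry": "Construction", "question": "Trading", "percentage": "185"},
    {"industry": "Education", "question": "Trading", "percentage": "n/a"},
    {"industry": "Health", "question": "Trading", "percentage": ""},
]
issues = find_issues(sample)
```

Running the same checks again on the corrected rows and confirming that the list of issues is empty is one simple way to evidence that the cleaned dataset passes the stated rules.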
Task 2 - Data modelling [12 marks]
You must model the dataset in the open data format RDF and populate the model using the data from
the dataset. You should export your RDF triples in Turtle (TTL) format.
You can use techniques learnt from the lecture content to help you. You can use the example code
package, which can be downloaded from the module wiki page (see link to java-rdf-example-code.zip),
or other tools you find on the web.
Your report must document the knowledge representation (i.e. ontology classes and predicates) you
chose to represent the knowledge extracted from the dataset as RDF. You should justify why you chose
this knowledge representation in the context of other choices, and why you think it has a good
balance between expressiveness and conciseness and delivers conceptual clarity. You should provide
a clear diagram showing the ontology used alongside instance frequency statistics (i.e. number of
instances of each class), and a small snippet from the TTL file you serialized (i.e. max half a page of
TTL). You should also explain your data ingest approach and RDF model construction and
serialization approach.
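To make the ingest, model construction and serialization steps concrete, here is a minimal sketch that turns cleaned rows into Turtle text and counts the instances of each class. It uses only the Python standard library rather than an RDF toolkit. The namespace, the classes ex:Industry and ex:Observation, the predicates ex:industry and ex:percentage, and the sample rows are all hypothetical; they are assumptions for illustration and not a recommended ontology for the assignment.

```python
from collections import Counter

EX = "http://example.org/covid-business#"  # hypothetical namespace


def slug(text):
    """Turn a label into a simple local name (non-alphanumerics become '_')."""
    return "".join(ch if ch.isalnum() else "_" for ch in text.strip())


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(rows):
    """Serialize rows as Turtle text and return it with per-class counts.

    Assumes rows are already cleaned, so each percentage is a valid decimal.
    """
    lines = [
        f"@prefix ex: <{EX}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    counts = Counter()
    industries = set()
    for i, row in enumerate(rows):
        ind = slug(row["industry"])
        if ind not in industries:
            # Emit each industry resource once, the first time it is seen.
            industries.add(ind)
            counts["Industry"] += 1
            label = escape(row["industry"])
            lines.append(f'ex:{ind} a ex:Industry ; rdfs:label "{label}" .')
        counts["Observation"] += 1
        lines.append(
            f"ex:obs{i} a ex:Observation ; ex:industry ex:{ind} ; "
            f'ex:percentage "{row["percentage"]}"^^xsd:decimal .'
        )
    return "\n".join(lines) + "\n", counts


# Made-up sample rows, not real data from the spreadsheet.
rows = [
    {"industry": "Retail", "percentage": "85.2"},
    {"industry": "Retail", "percentage": "60.0"},
    {"industry": "Construction", "percentage": "70.5"},
]
ttl, counts = to_turtle(rows)
```

The counts returned alongside the Turtle text correspond to the instance frequency statistics the report asks for. An RDF library would normally handle escaping and name validation more robustly than this hand-written serializer.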
Task 3 - Data visualization [20 marks]
You must create a multi-dimensional interactive visualisation of your RDF model for the assignment
dataset using a Linked Data Visualization tool. Your visualisation should have suitable interactivity
that allows for manipulation, filtering, and detailed analysis of the data.
You should aim to develop a multidimensional (greater than 2 dimensions) visualisation that enables
rich exploration of the data. Note that 'multidimensional' refers to the dimensions of the data, not
the visualisation (i.e. you are expected to use values from at least 3 columns from the provided
dataset to create your visualisation from one or more worksheets).
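The distinction between data dimensions and visual dimensions can be illustrated with a small sketch that filters on one column, groups by two others and averages a fourth; the resulting table is the sort of input a multi-dimensional chart would consume. The column names (industry, wave, status, percentage) and the records are hypothetical and are not taken from the assignment dataset.

```python
from collections import defaultdict


def mean_by(records, dims, measure, where=None):
    """Average `measure` for each combination of the columns in `dims`.

    `where` is an optional predicate used to filter records first.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if where is not None and not where(rec):
            continue
        key = tuple(rec[d] for d in dims)
        sums[key] += rec[measure]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up records, not real data from the spreadsheet.
records = [
    {"industry": "Retail", "wave": "1", "status": "Trading", "percentage": 80.0},
    {"industry": "Retail", "wave": "1", "status": "Paused", "percentage": 20.0},
    {"industry": "Retail", "wave": "2", "status": "Trading", "percentage": 60.0},
    {"industry": "Construction", "wave": "1", "status": "Trading", "percentage": 70.0},
]


def is_trading(rec):
    return rec["status"] == "Trading"


# Four data columns are in play: status (filter), industry and wave
# (grouping), and percentage (measure).
by_industry_wave = mean_by(records, ("industry", "wave"), "percentage", where=is_trading)
by_industry = mean_by(records, ("industry",), "percentage", where=is_trading)
```

Here the filter plays the role of an interactive control, while the grouping columns and the measure map onto axes, colour or size in whichever visualisation tool is chosen.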
Examples of tools you might use are in the resources section of this document and lecture content.
Your report must describe the visualisation tools and techniques used in enough detail to show a
deep understanding. You should justify your choice of visualisation tools and techniques in the
context of the problem and alternatives that are available. Your report should describe a
hypothetical scenario for which your multi-dimensional interactive visualisation could be used with
the assignment dataset. You should walk the reader through this scenario, using your interactive
system running on the assignment dataset to provide sufficient screenshots to show its rich features,
multi-dimensional capabilities and support for interactive data manipulation, filtering, and analysis.
Report structure
Your PDF report should not be longer than 20 pages (including all sections and screenshots) and use
font size 12. Include a title, your name and student number but no abstract or table of contents.
Failing to use the required structure or exceeding the page limit will be penalized.
Your assignment PDF report should have the following section headings:
Title, Student Name, Student Number
1 Data cleaning
1.1 Approach to data cleaning with justification
1.2 Errors identified and validation approach used to check cleaned dataset
2 Data modelling
2.1 Knowledge representation for RDF model with justification
2.2 Approach to data ingest
2.3 Ontology with instance frequency statistics and TTL snippet
3 Data visualization
3.1 Approach to multi-dimensional interactive visualisation with justification
3.2 Hypothetical scenario for multi-dimensional interactive visualisation with walk-through