Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
INFS3200 Advanced Database Systems
Prac 3: Data Linkage (5%)
Submission: Submit your work online (INFS3200 Course Website)
Introduction
Learning objectives:
? Learn how to use JDBC/cx_Oracle to interact with Oracle DBMS, including the operations
of create table, update table, query table, etc.
? Learn how to measure the performance of data linkage.
? Understand various string similarity measures, e.g., edit distance and Jaccard coefficient, for
field-level data linkage.
Marking Scheme:
This Prac carries 5 marks for 5% of this course.
1. 2 marks: Complete two missions in Task 1, each mission is accounted for 1 mark.
2. 1 mark: Complete Task 2, the correct answer for each bullet point is worth 0.5 marks.
3. 1 mark: Complete Task 3.
4. 1 mark: Complete Task 4.
Please include your screenshots of the data import and data linkage results (precision, recall, f-
measure and time) in a word/pdf document, as well as your answers to the task questions. It is
not mandatary to include your student ID in every screenshots as we can check the originality
through your code submission.
The submission should be a zip/rar file, which includes both the answer document and your
source code files. Please format your document and code nicely to help tutor’s marking process.
A poorly formatted document may receive a reduced mark. Submit your work to the Blackboard
website of the course by 16:00 (i.e., 4pm), Friday Oct 15th 2021.
Late Penalties (from ECP):
“Where an assessment item is submitted after the deadline, without an approved extension, a
late penalty will apply. The late penalty shall be 10% of the maximum possible mark for the
assessment item will be deducted per calendar day (or part thereof), up to a maximum of seven
(7) days. After seven days, no marks will be awarded for the item. A day is considered to be a 24
hour block from the assessment item due time. Negative marks will not be awarded.”
2
Part 1: Preparation of the Restaurant Data Set
In this part, our objective is to prepare data for subsequent tasks. There are two options for data
storage: (1) import data into Oracle database, or (2) store data as CSV files. Both solutions are
implemented and available in either Java or Python. Although both options are available, we
recommend you to try the database option, as you can learn more about how to interact with
database using Java/Python code.
Option 1: Import data to Oracle database through JDBC/cx_Oracle
This option requires the following software:
(1) Oracle platform used in Prac1 & Prac2,
(2) Java JDK/Python library (Java 8/Python 3.8 recommended), and
(3) Java/Python IDE.
Here we give an example of Java with Eclipse, but others, like Intellij IDEA for Java or PyCharm
for Python, will work the same. You need to go through the following 4 steps.
Step 1: Log in and Create Users
In this part, we first use “SQL Plus” to create a database user, then we need to connect user to
“SQL Developer” and interact with an Oracle database. In SQL Plus Command Line window,
login to Oracle with a default username “SYS AS SYSDBA” and password “Password1!”, as
shown below.
Follow the commands below to create a user:
/*Enable user creation*/
ALTER SESSION SET "_ORACLE_SCRIPT"=TRUE;
/* Create a user named “S1234567” (student id) with password “w” */
CREATE USER S1234567 IDENTIFIED BY w ACCOUNT UNLOCK DEFAULT
TABLESPACE "USERS" TEMPORARY TABLESPACE "TEMP" PROFILE
"DEFAULT";
/* Grant DBA privilege to “S1234567” */
3
GRANT DBA TO S1234567;
Same as that of the previous Prac1 & Prac2, please change “S1234567” to your student ID.
Step 2: Use Oracle SQL Developer
Open SQL Developer and connect the user that we have just created.