DATA MINING
ITS61504
Instruction to Candidates:
1. Answer ALL questions.
2. This is an open-book examination. Students are not allowed to transcribe (cut and
paste) any material directly from another source into their submission.
3. The Turnitin similarity limit for this module is 20% overall and less than 5% from any
single source, excluding program source code.
4. Severe disciplinary action will be taken against those caught violating assessment rules
such as colluding, plagiarizing or transcribing.
5. The final assessment answers handed in should be 5–12 pages in total for
non-programming modules, with 1.5 line spacing and 12pt Times New Roman font.
6. Submission link is here. (Do not submit the question paper)
7. The breakdown of exam questions by Module Learning Outcome (MLO) and the associated
weighting is as follows:
MLO     Section(s)/Question(s)    Marks
MLO1    Question 1                / 20
MLO2    Question 2                / 20
MLO3    Question 3                / 20
MLO3    Question 4                / 20
MLO4    Question 5                / 20
TOTAL                             / 100
8. Start each answer on a separate page.
9. Complete the front cover of the examination answer booklet and question paper. Write
the question numbers attempted on the front cover of the answer booklet.
Lorita Angeline
202108FE
Part I: Association Rule Mining
1. Table 1 shows the records of computer purchases. Manually calculate all itemsets (from
one item up to the maximum number of items you can find) using the Apriori method. Prune
the itemsets using a minimum support of 35% and a minimum confidence of 75%. (20 marks)
a) Manually calculate all itemsets (10 marks)
b) Prune the itemsets using a minimum support of 35% (5 marks)
c) Prune the itemsets using a minimum confidence of 75% (5 marks)
Table 1: Computer purchase transactional record
TID  Age          Income  Student  Credit Rating  Class (buy comp)
1    lessEqual30  High    No       Fair           No
2    lessEqual30  High    No       Excellent      No
3    31…40        High    No       Fair           Yes
4    greatThan40  Medium  No       Fair           Yes
5    greatThan40  Low     Yes      Fair           Yes
6    greatThan40  Low     Yes      Excellent      No
7    31…40        Low     Yes      Excellent      Yes
8    lessEqual30  Medium  No       Fair           No
9    lessEqual30  Low     Yes      Fair           Yes
10   greatThan40  Medium  Yes      Fair           Yes
11   lessEqual30  Medium  Yes      Excellent      Yes
12   31…40        Medium  No       Excellent      Yes
13   31…40        High    Yes      Fair           Yes
14   greatThan40  Medium  No       Excellent      No
15   31…40        Medium  Yes      Fair           Yes
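Question 1 must be answered by hand, but the two measures it relies on can be sketched in R as a way of checking manual counts. The block below is only an illustration under stated assumptions: the data frame `purchases`, its column names, the shortened age label "31-40", and the helper functions `support` and `confidence` are all hypothetical names introduced here, not part of the question paper. It treats each attribute = value pair as an item, which is one common reading of the table.

```r
# Table 1 re-entered by hand. Column names and the "31-40" label are
# illustrative choices, not taken from the question paper.
purchases <- data.frame(
  Age = c("lessEqual30", "lessEqual30", "31-40", "greatThan40", "greatThan40",
          "greatThan40", "31-40", "lessEqual30", "lessEqual30", "greatThan40",
          "lessEqual30", "31-40", "31-40", "greatThan40", "31-40"),
  Income = c("High", "High", "High", "Medium", "Low", "Low", "Low", "Medium",
             "Low", "Medium", "Medium", "Medium", "High", "Medium", "Medium"),
  Student = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "No",
              "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes"),
  Credit = c("Fair", "Excellent", "Fair", "Fair", "Fair", "Excellent",
             "Excellent", "Fair", "Fair", "Fair", "Excellent", "Excellent",
             "Fair", "Excellent", "Fair"),
  Class = c("No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
            "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Support: fraction of transactions that match every attribute = value pair
# in `items` (a named character vector, e.g. c(Student = "Yes")).
support <- function(data, items) {
  matched <- rep(TRUE, nrow(data))
  for (col in names(items)) {
    matched <- matched & data[[col]] == items[[col]]
  }
  mean(matched)
}

# Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs).
confidence <- function(data, lhs, rhs) {
  support(data, c(lhs, rhs)) / support(data, lhs)
}

# Smallest transaction count that satisfies a 35% minimum support.
min_count <- ceiling(0.35 * nrow(purchases))

# One example itemset and one example rule.
student_support <- support(purchases, c(Student = "Yes"))
rule_confidence <- confidence(purchases, c(Student = "Yes"), c(Class = "Yes"))
```

Calling `support` with longer named vectors, for example `c(Student = "Yes", Credit = "Fair")`, gives the support of larger itemsets in the same way.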
Part II: Case Study
In metropolitan cities like Kuala Lumpur, a prospective home buyer considers several
factors, such as location, land size, proximity to parks, schools, hospitals and power
generation facilities and, most importantly, the house price. House price prediction
supports a significant financial decision for people working in the housing market as well
as for potential buyers. Whether investing or buying a house to live in, a person entering
the housing market is interested in the potential gain. Table 2 shows the property listing
and the factors for house price prediction. The full dataset is available on TIMeS (data_kl.csv).
Features:
• Rooms: Number of rooms
• Price: Price in Ringgit Malaysia (MYR)
• Distance: Distance from KL downtown
• Bedroom2: Number of Bedrooms
• Bathroom: Number of Bathrooms
• Car: Number of parking spaces
• Landsize: Land size
Table 2: Property listing and the factors for house price prediction
Rooms  Price      Distance  Bedroom2  Bathroom  Car  Landsize
2      1480000.0  2.5       2.0       1.0       1.0  202.0
2      1035000.0  2.5       2.0       1.0       0.0  156.0
3      1465000.0  2.5       3.0       2.0       0.0  134.0
3      850000.0   2.5       3.0       2.0       1.0  94.0
4      1600000.0  2.5       3.0       1.0       2.0  120.0
2      941000.0   2.5       2.0       1.0       0.0  181.0
3      1876000.0  2.5       4.0       2.0       0.0  245.0
2      1636000.0  2.5       2.0       1.0       2.0  256.0
3      1000000.0  2.5                 1.0       1.0  238.0
2      745000.0   2.5       2.0       1.0       1.0  113.0
1      300000.0   2.5       1.0       1.0       1.0  0.0
2      1097000.0  2.5       3.0       1.0       2.0  220.0
2      542000.0   2.5       2.0       1.0            195.0
2      760000.0   2.5       2.0
1      481000.0   2.5       1.0       1.0
(Questions 2–5 are based on the case study: predicting house prices.)
2. Clean the dataset data_kl.csv using pre-processing techniques in R. Describe each type
of noise and anomaly detected in the full dataset. (20 marks)
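The kinds of checks this question calls for can be sketched on a few rows copied from Table 2. This is a minimal illustration, not a complete cleaning procedure: the data frame `sample_rows` is a hypothetical name, and the blank cell in the 542000.0 row is assumed here to be a missing Car value and entered as NA. With the full file, the same checks would be run on the result of `read.csv("data_kl.csv")`.

```r
# Six rows copied from Table 2. The NA in Car is an assumption about which
# cell is blank in the printed table.
sample_rows <- data.frame(
  Rooms    = c(2, 2, 4, 3, 1, 2),
  Price    = c(1480000, 1035000, 1600000, 1876000, 300000, 542000),
  Distance = c(2.5, 2.5, 2.5, 2.5, 2.5, 2.5),
  Bedroom2 = c(2, 2, 3, 4, 1, 2),
  Bathroom = c(1, 1, 1, 2, 1, 1),
  Car      = c(1, 0, 2, 0, 1, NA),
  Landsize = c(202, 156, 120, 245, 0, 195)
)

# Missing values: count NA entries in each column.
missing_per_column <- colSums(is.na(sample_rows))

# Suspicious values: a land size of zero is worth reviewing.
zero_landsize_rows <- which(sample_rows$Landsize == 0)

# Duplicate records: count rows that repeat an earlier row exactly.
duplicate_count <- sum(duplicated(sample_rows))

# One possible treatment of missing rows: keep only complete cases.
complete_rows <- sample_rows[complete.cases(sample_rows), ]
```

Dropping incomplete rows is only one option. Imputing a median or a mode is another, and whichever is chosen should be justified in the report.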
3. Write an R program to develop a prediction model that predicts the house price for a
new property listing. You are allowed to amend the dataset (justify your amendments).
Apply your model to the new house on the listing, newList: (20 marks)
newList <- data.frame(Rooms = 2,
                      Distance = 2.5,
                      Bedroom2 = 2,
                      Bathroom = 1,
                      Car = 0,
                      Landsize = 181)
4. Based on the original dataset (or your updated dataset), can regression modelling be
applied? Which type(s) of regression modelling do you suggest? Justify your opinion with
a clear description. Write or adapt an R program to fit a regression model. Apply your
model to the new house, newList. (20 marks)
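As one possible starting point, a multiple linear regression can be fitted with `lm` and applied with `predict`. The sketch below makes several assumptions: it uses only the first eight complete rows of Table 2 rather than the full data_kl.csv, it keeps just two predictors to stay well-conditioned on so few rows, and it leaves out Distance because that column is constant (2.5) in the excerpt. The names `listings`, `fit` and `predicted_price` are hypothetical. The first column of the new listing is named Rooms here so that it matches the training column.

```r
# First eight complete rows of Table 2 (Rooms, Price and Landsize only).
listings <- data.frame(
  Rooms    = c(2, 2, 3, 3, 4, 2, 3, 2),
  Price    = c(1480000, 1035000, 1465000, 850000,
               1600000, 941000, 1876000, 1636000),
  Landsize = c(202, 156, 134, 94, 120, 181, 245, 256)
)

# Multiple linear regression of Price on the two chosen predictors.
fit <- lm(Price ~ Rooms + Landsize, data = listings)

# The new listing from the question paper. Columns the model does not use
# are simply ignored by predict().
newList <- data.frame(Rooms = 2,
                      Distance = 2.5,
                      Bedroom2 = 2,
                      Bathroom = 1,
                      Car = 0,
                      Landsize = 181)

predicted_price <- predict(fit, newdata = newList)
```

`summary(fit)` reports the coefficients and R-squared, which can support the justification the question asks for. A model fitted on the full dataset will give different estimates from this excerpt.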
5. Which model is preferable for house price prediction? Evaluate and describe the
performance of both models (developed in Questions 3 and 4) using appropriate performance
metrics. (20 marks)
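The question paper does not name the metrics to use. Three that are commonly reported for price prediction are root mean squared error (RMSE), mean absolute error (MAE) and R-squared. The sketch below defines them as plain R functions. The function names and the toy vectors are hypothetical and are not taken from the dataset.

```r
# Root mean squared error: penalises large errors more heavily.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Mean absolute error: average size of the error, in the unit of the target.
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# R-squared: share of the variance in `actual` explained by the predictions.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Toy values for illustration only (not from data_kl.csv).
actual    <- c(100, 200, 300)
predicted <- c(110, 190, 330)

rmse_value <- rmse(actual, predicted)
mae_value  <- mae(actual, predicted)
r2_value   <- r_squared(actual, predicted)
```

For a fair comparison, both models would be scored on the same held-out rows. Lower RMSE and MAE and higher R-squared indicate the better fit.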
- END OF QUESTION PAPER -
Submission Requirements
1. Font type : Times New Roman
2. Font size : 12
3. Line spacing : 1.5
4. Alignment : Justify Text
5. Document type : .pdf, .R
6. Number of pages : 5 – 12 pages
7. Your report should consist of the following (in order):
a) Cover page (Name, ID, Date, Signature, Score)
b) Report of your answer script
c) Appendices (line spacing = 1.0)
• R programming
• List of references (APA format)
• Report of similarity score (percentage of similarity score from each source needs
to be shown)
8. Start each question on a separate page.
9. Label all figures and tables properly.
10. File naming conventions: StudentName_FinalAssessment
Notes:
• Include in-text citations to support your answers and add the list of references at the end of your
report (APA format). The list of references is to be alphabetized by the first author's last name, or
(if no author is listed) the organization or title.
• You are required to add screenshots of the code and results for each question.
• The program code must be appended to the main report (put in Appendix).
• The original program files (*.R) are required to be attached to the report upon submission.
No Student Name Student ID Date Signature Score
1
ITS61504 Data Mining
Final Exam - Alternative Assessment
Marking Rubric (August 2021)
Criteria    Excellent (90–100)    Good (75–89)    Average (40–74)    Poor (0–39)
Q1: Describe association rule mining (MLO 1)
Excellent (90–100): All itemsets are calculated, the Apriori method is applied correctly,
and the solution is clearly elaborated in a step-by-step manner. The similarity is less
than 2%.
Good (75–89): All itemsets are calculated, the Apriori method is applied correctly, and
the solution is NOT clearly elaborated in a step-by-step manner. The similarity is less
than 2%.
Average (40–74): Two itemsets are calculated, the Apriori method is NOT applied correctly,
and the solution is NOT clearly elaborated in a step-by-step manner. The similarity is
between 2% and 4%.
Poor (0–39): One or no itemset is calculated, the Apriori method is NOT applied correctly,
and the solution is NOT clearly elaborated in a step-by-step manner. The similarity is
greater than or equal to 5%.
Q2: Pre-processing techniques (MLO 2)
Excellent (90–100): All types of noise and anomalies are detected with a high degree of
accuracy. The code is applied correctly and the solution is clearly elaborated in a
step-by-step manner. The similarity is less than 2%.
Good (75–89): All types of noise and anomalies are detected with a moderate degree of
accuracy. The code is applied correctly and the solution is NOT clearly elaborated in a
step-by-step manner. The similarity is less than 2%.
Average (40–74): Some noise and anomalies are detected with a moderate degree of
accuracy. The code is applied correctly and the solution is NOT elaborated in a
step-by-step manner. The similarity is between 2% and 4%.