Big Data Mining Techniques and Implementation
CSCI316 – Big Data Mining Techniques and Implementation
Group Assignments
2023 Session 3 (SIM)
10 Marks
Deadline: Refer to the submission link of assignments on Moodle
One task is included in each assignment. The specification of each task starts on a separate page.
You must implement and run all your Python code in Jupyter Notebook. The deliverables are project
presentation slides and source code.
All results of your implementation must be reproducible from your submitted Jupyter notebook source
files. In addition, the submission must include all execution outputs as well as a clear explanation of the
algorithms you implement (e.g., in Markdown cells or as comments in your Python code).
Submissions must be made online using the correct submission link for this subject on Moodle.
This is a group assignment. Only one submission per group. State the names and student numbers of
group members at the beginning of each submitted file.
Marking guidelines:
Correctness of the source code, and completeness and clarity of the project presentation.
Assignment 1
(10 marks)
Dataset: Loan data set for credit risk analysis
This data set contains several types of features, including categorical, numeric and date features. The
target variable is the default (index). In financing, a default can occur when a borrower is unable to make
timely payments, misses payments, or avoids or stops making payments. An explanation of the features is
given in the appendix of this document.
Objective
The objective of this task is to develop an end-to-end data mining project by using the Python machine learning
library Scikit-Learn. Scikit-Learn is the only machine learning library that can be used in this task. However,
all non-ML libraries (e.g., SciPy) are allowed.
Requirements
(1) This is a classification problem.
(2) Use 80% of the data for training and 20% for testing. Stratified sampling must be used.
(3) Main steps of the project should be (a) “discover and visualise the data”, (b) “prepare the data for
machine learning algorithms”, (c) “select and train models”, (d) “fine-tune the model” and (e)
“evaluate the outcomes”. You can structure the project in your own way. Some steps can be performed
more than once.
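As an illustration of requirement (2) and the outline in requirement (3), the following is a minimal sketch only, not a model solution. It assumes a synthetic stand-in for the loan data: the column names (loan_amount, interest_rate, default), the way the target is generated, the choice of logistic regression and the parameter grid are all hypothetical and are not taken from the assignment data set. It shows an 80/20 stratified split with train_test_split, followed by a small Scikit-Learn pipeline that is tuned with GridSearchCV and evaluated on the held-out test set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (hypothetical columns, illustrative only).
rng = np.random.default_rng(42)
n = 1000
loans = pd.DataFrame({
    "loan_amount": rng.normal(10_000, 3_000, n),
    "interest_rate": rng.uniform(5, 25, n),
})
# Assumed relationship: higher interest rates default more often.
p_default = 1 / (1 + np.exp(-(loans["interest_rate"] - 22) / 3))
loans["default"] = (rng.uniform(0, 1, n) < p_default).astype(int)

X = loans.drop(columns="default")
y = loans["default"]

# 80/20 split; stratify=y keeps the default rate the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
train_rate = y_train.mean()
test_rate = y_test.mean()

# Prepare the data and train a model inside one pipeline,
# so scaling is fitted on training folds only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Fine-tune the regularisation strength with 5-fold cross-validation.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate the tuned model once on the untouched test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))

print("train/test sizes:", len(X_train), len(X_test))
print("default rate (train, test):", round(train_rate, 3), round(test_rate, 3))
print("best params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
```

For an imbalanced target such as loan default, accuracy alone can be misleading, so in practice a metric such as precision, recall or ROC AUC may be more informative. Accuracy is used here only to keep the sketch short.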