ECLT5810 E-Commerce Data Mining Techniques
Problem Statement
This is an individual assignment, and you must use Weka to complete it. In Assignment 1, you
conducted the data preprocessing for the term deposit subscription prediction task. In this
assignment, the dataset from Assignment 1 serves as the training set only. You need to use the
training data to build a decision tree model and a logistic regression model to predict the term
deposit subscription of clients. A separate testing set will also be given. The performance of the
models will be analyzed using the training set as well as the testing set. The requirements of the
task are as follows. In this assignment, please use the ARFF format instead of the CSV format, as
ARFF is more compatible with Weka.
1. Conduct variable transformation and variable selection on the dataset (bank-additional.csv), with
reference to Assignment 1. Please save your file in ARFF format. Alternatively, you can simply use
assignment-1-anwser.arff.
2. Build a decision tree model to predict the term deposit subscription of clients. Save the model as
Decision-Tree.
3. Build a logistic regression model to predict the term deposit subscription of clients. Save the model
as Regression.
4. Assess the two models using the training data only. You may choose the test options yourself, but
do not use the supplied testing data. Compare the training accuracy and other metrics. Report the
results* and write down your comments on the comparison.
5. Analyze both learned models using the testing set stored in another dataset
(bank-additional-test.arff).
6. Before prediction, you should apply the same variable transformation and variable selection
techniques to the testing dataset, using the statistics of the training dataset. For example, in
normalization, use the min and max values of the training set when handling the testing dataset.
Please save your feature-engineered dataset in ARFF format.
7. Perform prediction on the testing dataset using the models trained in the previous steps. Report
the results*. Write down your comments on the results (Does the model perform well or not? Why?).
8. Modify or extend any of the steps above in order to achieve better accuracy on the testing set.
Note that only decision tree and logistic regression can be used as the learning models. Report the
results*. State and explain the modifications you have made.
9. State briefly and concisely any explanatory notes on the methodologies you used to improve the
models’ performance. Discuss briefly the weaknesses and assumptions of the models and
methodologies. Specifically, you should state which model you have chosen as the finalized model and
explain your reasons. Do not write more than one A4 page for this requirement.
* Use screenshots from Weka to report the results.
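
To illustrate the normalization example in requirement 6, the following is a minimal Python sketch, for explanation only; the assignment itself must still be completed in Weka. It computes the min and max from the training values and reuses them to scale the testing values. The function names (fit_min_max, apply_min_max) and the sample age values are hypothetical and are not taken from the assignment datasets.

```python
def fit_min_max(train_values):
    """Return (min, max) computed from the training values only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values relative to the training range [lo, hi]."""
    if hi == lo:
        # Constant training attribute: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values for one numeric attribute (not real data).
train_age = [20, 30, 40, 60]
test_age = [25, 70]

# Statistics come from the training set only.
lo, hi = fit_min_max(train_age)

# The same (lo, hi) pair is applied to both sets.
train_scaled = apply_min_max(train_age, lo, hi)
test_scaled = apply_min_max(test_age, lo, hi)

print(train_scaled)
print(test_scaled)
```

Note that a testing value outside the training range (70 in this sketch) is scaled to a value above 1. This is expected when the training statistics are reused, and it is not a reason to recompute the min and max on the testing set.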