ITEC3040 Introduction to Data Analytics
Assignment
Submission Instructions:
• This is an individual assignment.
• Use eClass to submit your work.
• At the top of each file, include your name and student number.
• You may use software (for example, R, SAS, MATLAB, or Python). Excel is not allowed.
1. Show ALL your work!
2. Submit ALL your program(s) along with your solutions (including comments, results, and
graphs).
• Evaluation is based on the work you submitted.
1. Textbook, page 387, 8.7
(a) How would you modify the basic decision tree algorithm to take into consideration the count
of each generalized data tuple (i.e., of each row entry)?
(b) Use your algorithm to construct a decision tree from the given data.
(c) Given a data tuple having the values “systems”, “26...30”, and “46–50K” for the attributes
department, age, and salary, respectively, what would the decision tree classification of the
status for the tuple be?
(d) Construct the Naïve Bayesian classifier and redo part (c).
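For part (a), one common way to account for the count column is to let each generalized tuple contribute its count, rather than 1, whenever class frequencies are tallied for the attribute selection measure. The following is a minimal sketch of that idea for information gain, not a full tree builder. The row layout `(attributes, label, count)`, the function names, and the toy rows are all hypothetical; the toy rows are made up for illustration and are not the textbook table.

```python
from collections import defaultdict
from math import log2


def weighted_entropy(rows):
    """Entropy of the class label when each row stands for `count` tuples."""
    totals = defaultdict(int)
    for _attrs, label, count in rows:
        totals[label] += count
    n = sum(totals.values())
    return -sum((c / n) * log2(c / n) for c in totals.values() if c > 0)


def weighted_info_gain(rows, attribute):
    """Information gain of splitting on `attribute`, weighting rows by count."""
    n = sum(count for _attrs, _label, count in rows)
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0][attribute]].append(row)
    # Expected entropy after the split; each partition is weighted by the
    # total count it covers, not by the number of rows it contains.
    remainder = sum(
        (sum(count for _a, _l, count in part) / n) * weighted_entropy(part)
        for part in partitions.values()
    )
    return weighted_entropy(rows) - remainder


# Hypothetical toy rows (NOT the textbook data): (attributes, class, count).
toy_rows = [
    ({"shape": "round", "size": "small"}, "yes", 30),
    ({"shape": "round", "size": "large"}, "no", 10),
    ({"shape": "square", "size": "small"}, "no", 20),
    ({"shape": "square", "size": "large"}, "no", 20),
]

for attr in ("shape", "size"):
    print(attr, round(weighted_info_gain(toy_rows, attr), 4))
```

Under this assumption, a row with count 30 behaves exactly like 30 identical rows with count 1, so the rest of the basic algorithm (choosing the best attribute, recursing on partitions, majority voting at the leaves) can stay as it is provided every tally uses counts. The same weighting applies to the prior and conditional probabilities in part (d).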
2. Suppose you are given the following data set, in which attributes A through C predict the
Class attribute.
ITEC3040 Assignment 2 York University
A    B    C   Class
30   35   6   YES
22   50   4   NO
34   200  2   NO
59   170  7   YES
25   40   2   YES
63   150  3   NO
77   105  8   YES
34   200  2   NO
59   170  7   YES
12   207  9   YES
55   181  5   NO
Using Manhattan distance, Euclidean distance, and supremum distance, classify the data point
(A = 37, B = 95, C = 3) according to its 3-nearest neighbors.
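The three distance measures and the 3-nearest-neighbor vote can be sketched in a few lines of Python. This is one possible setup rather than a required solution: the function names, the `TRAIN`/`QUERY` variables, and the tie-breaking rule (equal distances keep their input order, because Python's `sorted` is stable) are assumptions of this sketch. The attributes are used unscaled, exactly as given in the table.

```python
from collections import Counter
from math import sqrt


def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """L2 distance: square root of the sum of squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def supremum(p, q):
    """L-infinity (Chebyshev) distance: largest absolute difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def knn_classify(train, query, distance, k=3):
    """Majority class among the k training rows closest to `query`."""
    # sorted() is stable, so rows at equal distance keep their input order.
    ranked = sorted(train, key=lambda row: distance(row[0], query))
    votes = Counter(label for _point, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Rows copied from the table above: ((A, B, C), Class).
TRAIN = [
    ((30, 35, 6), "YES"), ((22, 50, 4), "NO"), ((34, 200, 2), "NO"),
    ((59, 170, 7), "YES"), ((25, 40, 2), "YES"), ((63, 150, 3), "NO"),
    ((77, 105, 8), "YES"), ((34, 200, 2), "NO"), ((59, 170, 7), "YES"),
    ((12, 207, 9), "YES"), ((55, 181, 5), "NO"),
]
QUERY = (37, 95, 3)

for name, fn in [("Manhattan", manhattan), ("Euclidean", euclidean),
                 ("Supremum", supremum)]:
    nearest = sorted(TRAIN, key=lambda row: fn(row[0], QUERY))[:3]
    print(name, [(p, lbl, round(fn(p, QUERY), 2)) for p, lbl in nearest],
          "->", knn_classify(TRAIN, QUERY, fn))
```

Printing the three nearest rows alongside the predicted class makes it easy to show the work by hand. Note that under the supremum distance two rows with different classes are tied for third place, so the written solution should state how the tie is resolved.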