CS280: Elements of Data Processing
Q1. (Getting to Know Data)
A. For positively skewed data, is the mean always larger than the median, and the median always larger than the mode? If so, briefly explain why. If not, give a counterexample.
(10 marks)
B. Consider data that contain an outlier. Which of the following descriptions guarantee that a reader can tell there is an outlier just by reading the description: boxplot, histogram, quantile plot, scatter plot? Please briefly explain your answer. (10 marks)
C. Given two data points A(2,5,3) and B(1,4,5), what is the distance from A to B under each of the following distance measures: Manhattan distance and Euclidean distance? (10 marks)
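For part A, one way to test a candidate counterexample is to compute the three statistics together with a skewness coefficient. This sketch assumes "positively skewed" means a positive Fisher-Pearson moment coefficient g1 = m3 / m2^(3/2); the helper `describe` and the 10-value dataset are hypothetical, chosen only for illustration. For this dataset the skewness is positive while the mean (4.9) is below the median and the mode (both 5).

```python
from collections import Counter
from statistics import mean, median


def describe(data):
    """Return (mean, median, mode, moment skewness) for a list of numbers."""
    n = len(data)
    mu = mean(data)
    # Population central moments; skewness g1 = m3 / m2**1.5 (an assumed definition)
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    skew = m3 / m2 ** 1.5
    # Mode: most frequent value (the first one found wins a tie)
    mode = Counter(data).most_common(1)[0][0]
    return mu, median(data), mode, skew


# Hypothetical data: a small cluster at 0 and a single large value at 14
data = [0, 0, 5, 5, 5, 5, 5, 5, 5, 14]
mu, med, mode, skew = describe(data)
print(f"mean={mu}, median={med}, mode={mode}, skewness={skew:.3f}")
```

The large value dominates the cubed deviations, so the skewness comes out positive even though the two zeros pull the mean below the median.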
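For part B, it helps to recall how a boxplot decides which points to draw individually. This sketch assumes the common Tukey convention, where points more than 1.5 × IQR beyond the quartiles are plotted as outliers (matplotlib's `boxplot` uses `whis=1.5` by default). Quartile definitions vary between tools, so `method="inclusive"` is an assumption, and `iqr_outliers` is a hypothetical helper.

```python
from statistics import quantiles


def iqr_outliers(data, k=1.5):
    """Return points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's boxplot rule)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3 (it sorts internally)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical data with one extreme value
print(iqr_outliers([1, 2, 3, 4, 5, 100]))
```

Note that the rule only flags points by this fixed threshold; a value a reader would call an outlier for other reasons may still fall inside the whiskers.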
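For part C, both measures are special cases of the Minkowski distance (p = 1 for Manhattan, p = 2 for Euclidean). The helper names below are illustrative; Python 3.8+ also provides `math.dist` for the Euclidean case.

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


A = (2, 5, 3)
B = (1, 4, 5)
print(manhattan(A, B))  # |2-1| + |5-4| + |3-5| = 4
print(euclidean(A, B))  # sqrt(1 + 1 + 4) = sqrt(6)
```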
Q2. (Frequent Pattern Mining)
Suppose that there are 9 items: 1, 2, 3, ..., 9. The transactions are:
TID Itemset
1 1,2,3,4,5,6
2 7,2,3,4,5,6
3 1,8,4,5
4 1,9,4,6
5 9,2,2,4,5
The minimum support threshold (minsup) is 3.
1) Please use the Apriori algorithm to find all frequent itemsets. (30 marks)
2) List all closed frequent itemsets and all maximal frequent itemsets. (10 marks)
3) Please use FP-growth to find all frequent patterns again, showing the steps. Compare the efficiency of the two mining processes (FP-growth on the FP-tree, and the Apriori algorithm). (30 marks)
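For sub-question 1), the level-wise search can be sketched as follows. The transactions are read as sets, so the repeated 2 in TID 5 counts once (an assumption about the intended data). The join step here unions any two frequent (k-1)-itemsets whose union has exactly k items, a simpler variant of the textbook sorted-prefix join; after the prune step both produce the same candidates. The function names are hypothetical.

```python
from itertools import combinations

# Transactions from the table, read as sets (the repeated 2 in TID 5 counts once)
TRANSACTIONS = [{1, 2, 3, 4, 5, 6}, {7, 2, 3, 4, 5, 6}, {1, 8, 4, 5},
                {1, 9, 4, 6}, {9, 2, 2, 4, 5}]
MINSUP = 3


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, minsup):
    """Return {frozenset: support} for all itemsets with support >= minsup."""
    items = sorted({i for t in transactions for i in t})
    # L1: frequent single items
    level = {}
    for i in items:
        s = support(frozenset([i]), transactions)
        if s >= minsup:
            level[frozenset([i])] = s
    frequent = dict(level)
    k = 2
    while level:
        # Join: union two frequent (k-1)-itemsets when the result has exactly k items
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        # Count: scan the transactions for each candidate, keep those reaching minsup
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= minsup:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


FREQ = apriori(TRANSACTIONS, MINSUP)
for itemset, s in sorted(FREQ.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The loop stops as soon as a level produces no frequent itemsets, since no larger itemset can then be frequent.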
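For sub-question 2), the definitions translate directly: a frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent. To stay self-contained, this sketch enumerates frequent itemsets by brute force (fine for 9 items, exponential in general) instead of reusing Apriori, and reads each transaction as a set; `all_frequent` and `closed_and_maximal` are hypothetical names.

```python
from itertools import combinations

# Transactions from the table, read as sets (the repeated 2 in TID 5 counts once)
TRANSACTIONS = [{1, 2, 3, 4, 5, 6}, {7, 2, 3, 4, 5, 6}, {1, 8, 4, 5},
                {1, 9, 4, 6}, {9, 2, 2, 4, 5}]
MINSUP = 3


def all_frequent(transactions, minsup):
    """Brute-force {frozenset: support} over every subset of the items."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in transactions if set(combo) <= t)
            if s >= minsup:
                freq[frozenset(combo)] = s
    return freq


def closed_and_maximal(freq):
    """Split frequent itemsets into closed ones and maximal ones."""
    closed, maximal = [], []
    for x, s in freq.items():
        # A superset with equal support would also be frequent, so checking
        # only frequent supersets is enough to decide closedness.
        supersets = [y for y in freq if x < y]
        if all(freq[y] < s for y in supersets):
            closed.append(x)
        if not supersets:
            maximal.append(x)
    return closed, maximal


CLOSED, MAXIMAL = closed_and_maximal(all_frequent(TRANSACTIONS, MINSUP))
print("closed: ", [sorted(x) for x in CLOSED])
print("maximal:", [sorted(x) for x in MAXIMAL])
```

Every maximal frequent itemset is also closed, so the maximal list is always a subset of the closed list.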
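For sub-question 3), a compact recursive FP-growth can be sketched as below. Each call makes two passes over its (weighted) input: one to count item supports and one to insert the filtered, reordered transactions into the FP-tree. Patterns are then grown from each item's conditional pattern base without generating candidates. Items with equal support (here 1, 2 and 6 all have support 3) are ordered by item id, an arbitrary tie-break, so a hand-drawn tree may differ in shape while yielding the same patterns. `Node`, `build_tree` and `fp_growth` are hypothetical names, and transactions are again read as sets.

```python
from collections import defaultdict

# Transactions from the table, read as sets (the repeated 2 in TID 5 counts once)
TRANSACTIONS = [{1, 2, 3, 4, 5, 6}, {7, 2, 3, 4, 5, 6}, {1, 8, 4, 5},
                {1, 9, 4, 6}, {9, 2, 2, 4, 5}]
MINSUP = 3


class Node:
    """FP-tree node: item, count, parent link and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted, minsup):
    """Build an FP-tree from (items, count) pairs; return (header table, item supports)."""
    # Pass 1: count item supports and keep only frequent items
    counts = defaultdict(int)
    for items, c in weighted:
        for i in items:
            counts[i] += c
    frequent = {i: c for i, c in counts.items() if c >= minsup}
    # Order by descending support; ties broken by item id (arbitrary but fixed)
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {i: r for r, i in enumerate(order)}
    # Pass 2: insert each filtered, reordered transaction, sharing common prefixes
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node that holds that item
    for items, c in weighted:
        node = root
        for i in sorted((x for x in items if x in frequent), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return header, frequent


def fp_growth(weighted, minsup, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset via recursive FP-growth."""
    header, frequent = build_tree(weighted, minsup)
    for item, support in frequent.items():
        pattern = suffix | {item}
        yield pattern, support
        # Conditional pattern base: each node's prefix path, weighted by the node's count
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:  # stop at the root
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional FP-tree built from that base
        yield from fp_growth(base, minsup, pattern)


RESULT = dict(fp_growth([(t, 1) for t in TRANSACTIONS], MINSUP))
for itemset, s in sorted(RESULT.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

This is the design difference the efficiency comparison turns on: Apriori generates candidates level by level and must count them against the database at every level, whereas FP-growth compresses the database into the tree once and mines it recursively through conditional trees.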