MANG 6554
Advanced Analytics
Learning objectives:
• Gain a basic understanding of Bayes' theorem
• Get familiar with Bayesian classifiers (naïve Bayes and Bayesian belief networks)
• Develop a philosophical understanding of Bayesian inference and its relation to statistical model formulations

Reference: Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Pearson Education India, 2016.

Bayes Classifier
A probabilistic framework for solving classification problems.
Conditional probability:
  P(Y | X) = P(X, Y) / P(X)
  P(X | Y) = P(X, Y) / P(Y)
Bayes' theorem:
  P(Y | X) = P(X | Y) P(Y) / P(X)

Example of Bayes' Theorem
Given:
• A salesperson knows that if a customer has already bought a keyboard, there is a 50% chance he/she will buy a mouse.
• The prior probability of any customer purchasing a keyboard is 1/50.
• The prior probability of any customer purchasing a mouse is 1/20.
If a customer has bought a mouse, what is the probability that he/she buys a keyboard?
  P(k | m) = P(m | k) P(k) / P(m) = 0.5 × (1/50) / (1/20) = 0.2

Using Bayes' Theorem for Classification
Consider each attribute and the class label as random variables.
Given a record with attributes (X1, X2, …, Xd):
• The goal is to predict the class Y.
• Specifically, we want to find the value of Y that maximizes P(Y | X1, X2, …, Xd).
Can we estimate P(Y | X1, X2, …, Xd) directly from the data?

Example Data
Given a test record:
  X = (Refund = No, Marital Status = Divorced, Income = 120K)
Can we estimate P(Evade = Yes | X) and P(Evade = No | X)?
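The keyboard/mouse calculation above can be sketched in a few lines of Python (the variable names are ours, not from the slides):

```python
# Bayes' theorem: P(k | m) = P(m | k) * P(k) / P(m)
p_m_given_k = 0.5   # chance of buying a mouse given a keyboard was bought
p_k = 1 / 50        # prior probability of buying a keyboard
p_m = 1 / 20        # prior probability of buying a mouse

p_k_given_m = p_m_given_k * p_k / p_m
print(p_k_given_m)  # ≈ 0.2
```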
In the following we will abbreviate Evade = Yes as Yes, and Evade = No as No.

Id  Refund  Marital Status  Taxable Income (K)  Evade (Y)
 1  Yes     Single          125                 No
 2  No      Married         100                 No
 3  No      Single           70                 No
 4  Yes     Married         120                 No
 5  No      Divorced         95                 Yes
 6  No      Married          60                 No
 7  Yes     Divorced        220                 No
 8  No      Single           85                 Yes
 9  No      Married          75                 No
10  No      Single           90                 Yes

Using Bayes' Theorem for Classification
Approach:
• Compute the posterior probability P(Y | X1, X2, …, Xd) using Bayes' theorem:
  P(Y | X1, X2, …, Xd) = P(X1, X2, …, Xd | Y) P(Y) / P(X1, X2, …, Xd)
• Maximum a posteriori (MAP): choose the Y that maximizes P(Y | X1, X2, …, Xd).
• Since the denominator does not depend on Y, this is equivalent to choosing the value of Y that maximizes P(X1, X2, …, Xd | Y) P(Y).
How do we estimate P(X1, X2, …, Xd | Y)?

Example Data
Given the test record X = (Refund = No, Marital Status = Divorced, Income = 120K) and the training data above, we want the class Y that maximizes P(X | Y) P(Y).

Naïve Bayes Classifier
Assume independence among the attributes Xi when the class is given:
• P(X1, X2, …, Xd | Yj) = P(X1 | Yj) P(X2 | Yj) … P(Xd | Yj)
• We can then estimate P(Xi | Yj) for all combinations of Xi and Yj from the training data.
• A new point is classified as Yj if P(Yj) ∏i P(Xi | Yj) is maximal.
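The counting-based estimates the slides describe can be sketched as follows (a minimal sketch; the record encoding and variable names are ours):

```python
from collections import Counter

# Training records from the table above: (refund, marital_status, income_k, evade)
records = [
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]

# Class priors P(Y) = Nc / N, estimated by counting
class_counts = Counter(r[3] for r in records)
priors = {y: c / len(records) for y, c in class_counts.items()}
print(priors)  # {'No': 0.7, 'Yes': 0.3}

# A categorical conditional, P(Status = Married | No), also by counting: 4/7
married_no = sum(1 for r in records if r[1] == "Married" and r[3] == "No")
p_married_given_no = married_no / class_counts["No"]
```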
Naïve Bayes on Example Data
Given the test record X = (Refund = No, Marital Status = Divorced, Income = 120K) and the training data above:
  P(X | Yes) = P(Refund = No | Yes) × P(Divorced | Yes) × P(Income = 120K | Yes)
  P(X | No)  = P(Refund = No | No) × P(Divorced | No) × P(Income = 120K | No)

Estimate Probabilities from Data
Class prior: P(Y) = Nc / N
– e.g., P(No) = 7/10, P(Yes) = 3/10
For categorical attributes: P(Xi | Yk) = |Xik| / Nc
– where |Xik| is the number of instances having attribute value Xi and belonging to class Yk
– Examples: P(Status = Married | No) = 4/7, P(Refund = Yes | Yes) = 0
For continuous attributes:
– Discretization: partition the range into bins and replace each continuous value with its bin value; the attribute changes from continuous to ordinal.
– Probability density estimation: assume the attribute follows a normal distribution and use the data to estimate the parameters of the distribution (e.g., mean and standard deviation). Once the probability distribution is known, use it to estimate the conditional probability P(Xi | Y).
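Putting the pieces together, here is a sketch (our own code, not from the slides) that scores the test record under each class, using counting for the categorical attributes and a normal density for income, as described above:

```python
import math

# Training records from the table above: (refund, marital_status, income_k, evade)
records = [
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]

def gaussian_pdf(x, values):
    """Normal density with mean and variance estimated from the sample values."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)  # sample variance
    return math.exp(-((x - m) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def score(y, refund, status, income):
    """P(Y) * P(Refund | Y) * P(Status | Y) * f(income | Y): what the MAP rule maximizes."""
    cls = [r for r in records if r[3] == y]
    prior = len(cls) / len(records)
    p_refund = sum(1 for r in cls if r[0] == refund) / len(cls)
    p_status = sum(1 for r in cls if r[1] == status) / len(cls)
    return prior * p_refund * p_status * gaussian_pdf(income, [r[2] for r in cls])

# Test record X = (Refund = No, Marital Status = Divorced, Income = 120K)
s_no = score("No", "No", "Divorced", 120)
s_yes = score("Yes", "No", "Divorced", 120)
print("predict:", "No" if s_no > s_yes else "Yes")  # the record is classified as No
```

The income values for class Yes (95, 85, 90) have mean 90 and standard deviation 5, so the normal density at 120K is tiny; this is what drives the prediction to No.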