1 Bayes Classifiers
Imagine that you have collected a data set, X and y, and you believe that every category in y can be reasonably modeled by a multivariate normal distribution over its corresponding samples in X. If the covariance matrices are spherical, this results in the nearest centroid classifier. If the covariance matrices are the same for each class, the problem is solved by Linear Discriminant Analysis. If the covariance matrices can differ between the classes, this results in Quadratic Discriminant Analysis. Each of these classifiers performs the following training routine:
1. For each class k:
(a) Compute the mean, μk
(b) Compute the covariance matrix, Σk
Then use Bayes' rule to classify a new sample:
$$P(c = k \mid x^{(i)}) = \frac{P(x^{(i)} \mid c = k)\,P(c = k)}{P(x^{(i)})}$$

where:

$$P(x^{(i)} \mid c = k) = \mathcal{N}(x^{(i)};\, \mu_k, \Sigma_k) \qquad \text{(the PDF of a multivariate normal distribution)}$$

$$P(c = k) = \frac{\text{number of samples in class } k}{n} \qquad \text{(the proportion of samples in each class)}$$

$$P(x^{(i)}) = \sum_k P(x^{(i)} \mid c = k)\,P(c = k) \qquad \text{(the marginal distribution for } x^{(i)}\text{)}$$
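To make the training routine and classification rule concrete, here is a minimal sketch of a Gaussian Bayes classifier in the most general case (each class gets its own covariance matrix, i.e. the QDA setting), written with NumPy and SciPy. The class name GaussianBayesClassifier and its method names are illustrative assumptions, not part of the text above.

```python
import numpy as np
from scipy.stats import multivariate_normal


class GaussianBayesClassifier:
    """Fit one Gaussian per class (mean, covariance, prior) and classify with Bayes' rule."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_, self.covs_, self.priors_ = [], [], []
        for k in self.classes_:
            Xk = X[y == k]
            self.means_.append(Xk.mean(axis=0))           # mu_k
            self.covs_.append(np.cov(Xk, rowvar=False))    # Sigma_k
            self.priors_.append(len(Xk) / len(X))          # P(c = k)
        return self

    def predict_proba(self, X):
        # Likelihood P(x^(i) | c = k) for every sample and every class, shape (n, p)
        likelihood = np.column_stack([
            multivariate_normal.pdf(X, mean=mu, cov=cov)
            for mu, cov in zip(self.means_, self.covs_)
        ])
        joint = likelihood * np.array(self.priors_)        # likelihood * prior
        evidence = joint.sum(axis=1, keepdims=True)        # P(x^(i)), the marginal
        return joint / evidence                            # posterior P(c = k | x^(i))

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Calling fit(X, y) stores μk, Σk, and P(c = k) for each class; predict_proba then combines likelihood, prior, and evidence exactly as in the equations above.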
2 Gaussian Mixture Models
BUT, what if we lost y and didn't know the labels for the samples? This is the problem that Gaussian Mixture Models (GMMs) attempt to solve. It is unsupervised because we don't know y, but we expect that there could be a y that explains the data well. This means a GMM must determine the μk and Σk for each class, as well as the class priors. More formally, we have n samples, m features, and p classes:
1. The data matrix X contains n samples (feature vectors): X ∈ R^{n×m}

$$X = \begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & \dots & x^{(1)}_m \\ x^{(2)}_1 & x^{(2)}_2 & \dots & x^{(2)}_m \\ \vdots & \vdots & \ddots & \vdots \\ x^{(n)}_1 & x^{(n)}_2 & \dots & x^{(n)}_m \end{bmatrix} = \begin{bmatrix} \text{---}\; x^{(1)\top} \text{---} \\ \text{---}\; x^{(2)\top} \text{---} \\ \vdots \\ \text{---}\; x^{(n)\top} \text{---} \end{bmatrix}$$

2. Weights vector (weights_) contains the prior probability of each of the p classes: φ ∈ R^p

$$\phi = [\phi_1, \phi_2, \dots, \phi_p] \qquad \phi_k = P(c = k)$$

3. Means matrix (means_) contains the mean vector for each of the p classes: μ ∈ R^{p×m}

$$\mu = \begin{bmatrix} \mu_{1,1} & \mu_{1,2} & \dots & \mu_{1,m} \\ \mu_{2,1} & \mu_{2,2} & \dots & \mu_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{p,1} & \mu_{p,2} & \dots & \mu_{p,m} \end{bmatrix} = \begin{bmatrix} \text{---}\; \mu_1^\top \text{---} \\ \text{---}\; \mu_2^\top \text{---} \\ \vdots \\ \text{---}\; \mu_p^\top \text{---} \end{bmatrix}$$
4. Covariances tensor (covariances_) contains the covariance matrix for each of the p classes:
Σ ∈ R^{p×m×m}

$$\Sigma = [\Sigma_1, \Sigma_2, \dots, \Sigma_p] \qquad \Sigma_k = \begin{bmatrix} \sigma_{k,1,1} & \sigma_{k,1,2} & \dots & \sigma_{k,1,m} \\ \sigma_{k,2,1} & \sigma_{k,2,2} & \dots & \sigma_{k,2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{k,m,1} & \sigma_{k,m,2} & \dots & \sigma_{k,m,m} \end{bmatrix}$$
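The attribute names weights_, means_, and covariances_ match scikit-learn's GaussianMixture estimator, so one way to see the shapes described above is to fit that estimator and inspect its fitted attributes. The synthetic data below is only a placeholder assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

n, m, p = 300, 2, 3                                 # n samples, m features, p classes
X = np.random.default_rng(0).normal(size=(n, m))    # placeholder data; any (n, m) array works

gmm = GaussianMixture(n_components=p, covariance_type="full", random_state=0).fit(X)

print(gmm.weights_.shape)        # (p,)       -> phi, one component weight per class
print(gmm.means_.shape)          # (p, m)     -> mu, one mean vector per class
print(gmm.covariances_.shape)    # (p, m, m)  -> Sigma, one covariance matrix per class
```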
Using these, we can apply Bayes' rule to compute the posterior P(c = k|x^(i)) from the following:
$$P(x^{(i)} \mid c = k) = \mathcal{N}(x^{(i)};\, \mu_k, \Sigma_k) \qquad \text{Likelihood (the PDF of a multivariate normal distribution)}$$

$$P(c = k) = \phi_k \qquad \text{Prior (the component weight for class } k\text{)}$$

$$P(x^{(i)}) = \sum_k P(x^{(i)} \mid c = k)\,P(c = k) \qquad \text{Evidence (the marginal distribution for } x^{(i)}\text{)}$$
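As a sketch of how these three quantities combine, the function below computes the posterior P(c = k | x^(i)) for every sample from a given set of parameters. The function name posterior and its argument names are hypothetical, not taken from the text above.

```python
import numpy as np
from scipy.stats import multivariate_normal


def posterior(X, phi, mu, Sigma):
    """Return the (n, p) matrix of posteriors P(c = k | x^(i)); rows sum to 1.

    X     : (n, m) data matrix
    phi   : (p,)   component weights, phi[k] = P(c = k)
    mu    : (p, m) component means
    Sigma : (p, m, m) component covariance matrices
    """
    p = len(phi)
    # Likelihood: N(x^(i); mu_k, Sigma_k) evaluated for every sample, shape (n, p)
    likelihood = np.column_stack([
        multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k]) for k in range(p)
    ])
    joint = likelihood * np.asarray(phi)            # likelihood * prior
    evidence = joint.sum(axis=1, keepdims=True)     # P(x^(i)), the marginal
    return joint / evidence
```

This direct form mirrors the equations above; a more numerically robust implementation would typically work with log-densities instead.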