Mixture of Gaussians (Gaussian Mixture Model, GMM)
A GMM is a Gaussian latent variable model $p(x)=\sum_z p(x,z)$, commonly used for clustering.
A mixture of $K$ multivariate Gaussian densities has the form $p(x)=\sum_{k=1}^{K}\pi_k\,\mathcal{N}(x\mid\mu_k,\Sigma_k)$, where the $\pi_k$ are called mixing coefficients and satisfy $\sum_{k=1}^{K}\pi_k=1$ and $\pi_k\ge 0$ for all $k$.
Consider a latent variable $z\in\{1,\dots,K\}$ with $p(z=k)=\pi_k$ and $p(x\mid z=k)=\mathcal{N}(x\mid\mu_k,\Sigma_k)$, so the joint density is $p(x,z=k)=p(z=k)\,p(x\mid z=k)=\pi_k\,\mathcal{N}(x\mid\mu_k,\Sigma_k)$.
By Bayes' rule, the posterior (responsibility) is $p(z=k\mid x)=\dfrac{\pi_k\,\mathcal{N}(x\mid\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j\,\mathcal{N}(x\mid\mu_j,\Sigma_j)}$.
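To make these two formulas concrete, here is a minimal numpy/scipy sketch (the parameter values and variable names are illustrative assumptions, not from the notes) that evaluates the mixture density $p(x)$ and the responsibilities $p(z=k\mid x)$ for a two-component GMM in two dimensions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters for a 2-component GMM in 2 dimensions (assumed values).
pis = np.array([0.4, 0.6])                            # mixing coefficients, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]

x = np.array([1.0, 1.0])

# Component densities N(x | mu_k, Sigma_k).
comp = np.array([multivariate_normal.pdf(x, mean=mus[k], cov=Sigmas[k]) for k in range(2)])

p_x = np.sum(pis * comp)     # mixture density p(x) = sum_k pi_k N(x | mu_k, Sigma_k)
gamma = pis * comp / p_x     # responsibilities p(z=k | x) by Bayes' rule; they sum to 1
print(p_x, gamma)
```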
For a dataset $X=\{x_1,\dots,x_n\}\in\mathbb{R}^{n\times m}$ with latent assignments $z\in\{1,\dots,K\}^n$, the log-likelihood is $\log p(X\mid\pi,\mu,\Sigma)=\sum_{i=1}^{n}\log\sum_{k=1}^{K}\pi_k\,\mathcal{N}(x_i\mid\mu_k,\Sigma_k)$.
Setting its derivatives to zero gives the MLE conditions $\mu_k=\frac{1}{N_k}\sum_{i=1}^{n}p(z=k\mid x_i)\,x_i$, $\Sigma_k=\frac{1}{N_k}\sum_{i=1}^{n}p(z=k\mid x_i)\,(x_i-\mu_k)(x_i-\mu_k)^{\top}$, and $\pi_k=\frac{N_k}{n}$, where $N_k=\sum_{i=1}^{n}p(z=k\mid x_i)$.
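As a sketch of how these update formulas translate to code (assuming hypothetical numpy arrays `X` of shape $(n,m)$ and `gamma` of shape $(n,K)$ with `gamma[i, k]` $=p(z=k\mid x_i)$):

```python
import numpy as np

def m_step(X, gamma):
    """Closed-form MLE updates given responsibilities gamma[i, k] = p(z=k | x_i)."""
    n, m = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                         # N_k = sum_i p(z=k | x_i)
    mus = (gamma.T @ X) / Nk[:, None]              # mu_k = (1/N_k) sum_i gamma_ik x_i
    Sigmas = np.empty((K, m, m))
    for k in range(K):
        diff = X - mus[k]                          # (n, m) residuals around mu_k
        Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    pis = Nk / n                                   # pi_k = N_k / n
    return pis, mus, Sigmas
```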
EM Algorithm
The MLE has no closed-form solution, because the update equations above depend on the posterior $p(z=k\mid x_i)$, which in turn depends on the parameters. We can use the EM algorithm to estimate the parameters iteratively.
The steps are:
- Initialize $\pi,\mu,\Sigma$.
- E-step: for each $i,k$, compute the posterior $p(z=k\mid x_i)=\dfrac{\pi_k\,\mathcal{N}(x_i\mid\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j\,\mathcal{N}(x_i\mid\mu_j,\Sigma_j)}$.
- M-step: re-estimate $\pi,\mu,\Sigma$ using the MLE formulas above, plugging in the current posteriors.
- Evaluate the log-likelihood and check for convergence (a sketch of the full loop follows this list).
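Putting the steps together, here is a minimal numpy/scipy sketch of the EM loop for a GMM (all names, the initialization scheme, and the tolerance-based stopping rule are illustrative choices, not prescribed by the notes):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    n, m = X.shape
    rng = np.random.default_rng(seed)
    # Initialization: uniform weights, K random data points as means, identity covariances.
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(n, K, replace=False)]
    Sigmas = np.array([np.eye(m) for _ in range(K)])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] proportional to pi_k N(x_i | mu_k, Sigma_k).
        dens = np.column_stack([
            pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
            for k in range(K)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates from the responsibilities.
        Nk = gamma.sum(axis=0)
        pis = Nk / n
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        # Log-likelihood under the parameters used in this E-step; stop when it stops improving.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, Sigmas, gamma
```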
More generally, consider a setting with latent variables: an observed dataset $X\in\mathbb{R}^{N\times D}$ and latent variables $Z\in\mathbb{R}^{N\times K}$. We want to maximize the log-likelihood $\log p(X\mid\theta)=\log\big(\sum_Z p(X,Z\mid\theta)\big)$:
- Initialize parameters $\theta^{\text{old}}$.
- E-step: use $\theta^{\text{old}}$ to compute the posterior $p(Z\mid X,\theta^{\text{old}})$.
- M-step: $\theta^{\text{new}}=\arg\max_{\theta}Q(\theta,\theta^{\text{old}})$, where $Q(\theta,\theta^{\text{old}})=\sum_Z p(Z\mid X,\theta^{\text{old}})\log p(X,Z\mid\theta)=\mathbb{E}\big[\log p(X,Z\mid\theta)\,\big|\,X,\theta^{\text{old}}\big]$, which is tractable in many applications.
- Replace $\theta^{\text{old}}\leftarrow\theta^{\text{new}}$ and repeat until convergence (a generic skeleton of this loop is sketched after the list).
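As a sketch of this general template (the `e_step`, `m_step`, and `log_lik` callables are hypothetical placeholders the user would supply):

```python
def em(X, theta_old, e_step, m_step, log_lik, n_iter=100, tol=1e-6):
    """Generic EM loop: e_step returns the posterior p(Z | X, theta_old),
    m_step maximizes Q(theta, theta_old) = E[log p(X, Z | theta) | X, theta_old]."""
    prev = log_lik(X, theta_old)
    for _ in range(n_iter):
        posterior = e_step(X, theta_old)       # E-step
        theta_new = m_step(X, posterior)       # M-step
        theta_old = theta_new                  # replace theta_old <- theta_new
        cur = log_lik(X, theta_old)
        if cur - prev < tol:                   # convergence check on the log-likelihood
            break
        prev = cur
    return theta_old
```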
If $z$ were observed, the MLE would be straightforward, since the complete-data log-likelihood separates over components:
$\log p(X,Z\mid\theta)=\sum_{i=1}^{n}\log p(x_i,z_i\mid\theta)=\sum_{i=1}^{n}\sum_{k=1}^{K}\mathbb{I}(z_i=k)\log\big(\pi_k\,\mathcal{N}(x_i\mid\mu_k,\Sigma_k)\big)$
For the E-step, the posterior factorizes as $p(Z\mid X,\theta)=\prod_{i=1}^{n}p(z_i\mid x_i,\theta)$, with $p(z_i=k\mid X,\theta)=p(z_i=k\mid x_i,\theta)=\dfrac{\pi_k\,\mathcal{N}(x_i\mid\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j\,\mathcal{N}(x_i\mid\mu_j,\Sigma_j)}$.
For the M-step, $\mathbb{E}\big[\mathbb{I}(z_i=k)\mid X,\theta^{\text{old}}\big]=p(z_i=k\mid X,\theta^{\text{old}})$, and so $Q(\theta,\theta^{\text{old}})=\mathbb{E}\big[\log p(X,Z\mid\theta)\mid X,\theta^{\text{old}}\big]=\sum_{i=1}^{n}\sum_{k=1}^{K}p(z_i=k\mid X,\theta^{\text{old}})\log\big(\pi_k\,\mathcal{N}(x_i\mid\mu_k,\Sigma_k)\big)$. Maximizing this over $\pi,\mu,\Sigma$ recovers the update formulas given earlier.
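As a short sketch of the step from $Q$ to those update formulas (standard calculus, with a Lagrange multiplier for the constraint $\sum_k\pi_k=1$; here $\gamma_{ik}$ abbreviates $p(z_i=k\mid X,\theta^{\text{old}})$):

```latex
% gamma_{ik} := p(z_i = k | X, theta_old),  N_k := sum_i gamma_{ik}
\frac{\partial Q}{\partial \mu_k}
   = \sum_{i=1}^{n} \gamma_{ik}\,\Sigma_k^{-1}(x_i - \mu_k) = 0
   \;\Longrightarrow\; \mu_k = \frac{1}{N_k}\sum_{i=1}^{n}\gamma_{ik}\,x_i .
% Maximizing over pi_k subject to sum_j pi_j = 1 (Lagrange multiplier lambda):
\frac{\partial}{\partial \pi_k}\Big( Q + \lambda\big(\textstyle\sum_{j=1}^{K}\pi_j - 1\big) \Big)
   = \frac{N_k}{\pi_k} + \lambda = 0
   \;\Longrightarrow\; \pi_k = \frac{N_k}{n} \quad (\text{since } \lambda = -n).
% Setting dQ/dSigma_k = 0 similarly yields the weighted sample covariance for Sigma_k.
```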
K-means can be viewed as a special case of the EM algorithm: fix shared spherical covariances $\Sigma_k=\epsilon I$ and let $\epsilon\to 0$, so the responsibilities collapse to hard assignments of each point to its nearest mean (see the sketch below).
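A minimal sketch of that connection (assuming hypothetical numpy arrays `X` of shape $(n,m)$ and initial means `mus` of shape $(K,m)$):

```python
import numpy as np

def kmeans_as_hard_em(X, mus, n_iter=100):
    """K-means as hard-assignment EM: Sigma_k = eps * I with eps -> 0, so each point
    is assigned entirely to its nearest mean. Assumes no cluster ever becomes empty."""
    for _ in range(n_iter):
        # "E-step": hard assignment (the limiting responsibilities are one-hot).
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=-1)   # squared distances
        z = d2.argmin(axis=1)
        # "M-step": each mean becomes the average of the points assigned to it.
        new_mus = np.array([X[z == k].mean(axis=0) for k in range(len(mus))])
        if np.allclose(new_mus, mus):
            break
        mus = new_mus
    return mus, z
```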