Probabilistic Principal Component Analysis
Sometimes data is very high dimensional, yet its important features can be accurately captured in a low-dimensional subspace. That is why we use PCA.
Given a data set $\{x^{(i)}\}_{i=1}^{N}$, where each vector $x^{(i)} \in \mathbb{R}^D$ is such that $x^{(i)} \approx \bar{x} + U z^{(i)}$, where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x^{(i)}$ is the mean of the data set, $U \in \mathbb{R}^{D \times K}$ is the orthogonal basis matrix of the principal subspace, and $z^{(i)} \in \mathbb{R}^{K}$ is the code vector.
Since $z^{(i)} = U^\top (x^{(i)} - \bar{x})$, we can approximate each point as $x^{(i)} \approx \bar{x} + U U^\top (x^{(i)} - \bar{x})$; that is, we choose $U$ by minimizing the reconstruction error, i.e. $U^* = \arg\min_U \sum_{i=1}^{N} \left\| x^{(i)} - \bar{x} - U U^\top (x^{(i)} - \bar{x}) \right\|^2$.
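To make this concrete, here is a minimal NumPy sketch of the reconstruction above; the data matrix `X` and the sizes `N`, `D`, `K` are illustrative placeholders, not from the text:

```python
import numpy as np

# Illustrative data: N points in D dimensions (placeholder values).
rng = np.random.default_rng(0)
N, D, K = 500, 10, 3
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))  # correlated toy data

x_bar = X.mean(axis=0)              # mean of the data set
Xc = X - x_bar                      # centered data

# Top-K principal directions from the SVD of the centered data;
# the columns of U form an orthonormal basis of the principal subspace.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
U = Vt[:K].T                        # D x K

Z = Xc @ U                          # code vectors z^(i) = U^T (x^(i) - x_bar)
X_hat = x_bar + Z @ U.T             # reconstructions x_bar + U U^T (x^(i) - x_bar)

# The reconstruction error that U* minimizes.
print(np.sum((X - X_hat) ** 2))
```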
PPCA on Gaussian Data
We use PPCA when a Gaussian latent variable model $p(x) = \int p(x, z)\, dz$ is used for dimensionality reduction.
Consider the latent variable model $z \sim \mathcal{N}(0, I_K)$, $x \mid z \sim \mathcal{N}(Wz + \mu, \sigma^2 I_D)$.
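As a quick illustration of this generative process, the sketch below draws samples by first sampling $z$ from the prior and then $x$ from $p(x \mid z)$; the values of $W$, $\mu$, and $\sigma$ are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
D, K, sigma = 5, 2, 0.1

# Made-up model parameters, for illustration only.
W = rng.normal(size=(D, K))
mu = rng.normal(size=D)

def sample_x(n):
    """Draw n samples x from the PPCA generative model."""
    z = rng.normal(size=(n, K))            # z ~ N(0, I_K)
    eps = sigma * rng.normal(size=(n, D))  # noise ~ N(0, sigma^2 I_D)
    return z @ W.T + mu + eps              # x | z ~ N(Wz + mu, sigma^2 I_D)

X = sample_x(1000)
print(X.shape)  # (1000, 5)
```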
As usual, to estimate the parameters of the model we use maximum likelihood estimation, that is, $\max_{W,\mu,\sigma^2} \log p(x \mid W, \mu, \sigma^2) = \max_{W,\mu,\sigma^2} \log \int p(x \mid z, W, \mu, \sigma^2)\, p(z)\, dz$.
Since $x \mid z \sim \mathcal{N}(Wz + \mu, \sigma^2 I_D)$, we can write $x = Wz + \mu + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I_D)$ is independent of $z$. Then $p(x \mid W, \mu, \sigma^2)$ is also a Gaussian distribution, and
- $\mathbb{E}[x] = \mathbb{E}[Wz + \mu + \epsilon] = \mathbb{E}[Wz] + \mu = \mu$
- $\mathrm{Cov}[x] = \mathbb{E}[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^\top] = \mathbb{E}[(Wz + \mu + \epsilon - \mu)(Wz + \mu + \epsilon - \mu)^\top] = \mathbb{E}[(Wz + \epsilon)(Wz + \epsilon)^\top] = \mathbb{E}[Wzz^\top W^\top] + \mathbb{E}[\epsilon\epsilon^\top] = W\,\mathbb{E}[zz^\top]W^\top + \sigma^2 I_D = WW^\top + \sigma^2 I_D$, where the cross terms vanish because $z$ and $\epsilon$ are independent and zero-mean.
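These two moments can be sanity-checked numerically: sampling many points from the generative model (with the same kind of made-up $W$, $\mu$, $\sigma$ as above), the empirical mean and covariance should approach $\mu$ and $WW^\top + \sigma^2 I_D$. This is only a verification sketch, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(2)
D, K, sigma = 5, 2, 0.1
W = rng.normal(size=(D, K))   # made-up parameters, as before
mu = rng.normal(size=D)

n = 200_000
z = rng.normal(size=(n, K))
eps = sigma * rng.normal(size=(n, D))
X = z @ W.T + mu + eps        # samples of x = Wz + mu + eps

emp_mean = X.mean(axis=0)
emp_cov = np.cov(X, rowvar=False)
theory_cov = W @ W.T + sigma**2 * np.eye(D)

print(np.abs(emp_mean - mu).max())         # close to 0:  E[x] = mu
print(np.abs(emp_cov - theory_cov).max())  # close to 0:  Cov[x] = W W^T + sigma^2 I_D
```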