
Basic Information on Multivariate Data

For multivariate data, we have $p$ variables, where $p \ge 2$, and $n$ observations (items/experimental units). We denote by $x_{jk}$ the measurement of the $k$th variable on the $j$th item or experimental unit.

  • $p = 1 \implies$ univariate data
  • We always use a matrix to represent multivariate data; for easier understanding in a machine-learning context, it can also be called an array, tensor, or vector.

Before we talk about data organization, let's first define the terms we will use.

Definitions for Multivariate Data

A random vector is a vector whose elements are random variables; similarly, a random matrix is a matrix whose elements are random variables.

Let $X$ be a $p\times 1$ random vector, with transpose $X' = \begin{bmatrix} X_1 & X_2 & \cdots & X_p \end{bmatrix}$ and joint PDF $f(x) = f_X(x_1, x_2, \ldots, x_p)$:

  • Independence $\implies f(x) = f_1(x_1)\cdots f_p(x_p)$
  • if $X_i, X_k$, two elements of $X$, are independent, then $Cov(X_i, X_k) = 0$, as the sketch below illustrates
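
As a quick sanity check, here is a minimal numpy sketch (the sample size and the two distributions are arbitrary choices) showing that the sample covariance of two independent variables is near zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent random variables (distributions chosen arbitrarily).
x = rng.normal(size=100_000)
y = rng.exponential(size=100_000)

# Off-diagonal entry of the 2x2 sample covariance matrix.
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)  # close to 0: independence implies zero covariance
```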

The expected value of a random vector/matrix is the vector/matrix consisting of the expected values of each of its elements.

  • if $X$ and $Y$ are random matrices of the same dimension, then $E[X+Y] = E[X] + E[Y]$
  • if $A$ and $B$ are matrices of constants, then $E[AXB] = AE[X]B$
  • We also define the population mean vector $\mu = E[X]$

The population variance-covariance matrix is $\Sigma = Cov[X] = E[(X-\mu)(X-\mu)']$

  • where $E[(X-\mu)(X-\mu)'] = E\left[\begin{bmatrix}X_1-\mu_1 \\ \vdots \\ X_p - \mu_p\end{bmatrix}\begin{bmatrix}X_1-\mu_1 & \cdots & X_p - \mu_p\end{bmatrix}\right] = \begin{bmatrix}\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2\end{bmatrix}$ is a square, symmetric matrix
  • $\Sigma = E[(X-\mu)(X-\mu)'] = E[XX'] - \mu\mu'$
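
The identity $\Sigma = E[XX'] - \mu\mu'$ is easy to check by simulation; a minimal sketch, with $\mu$ and $\Sigma$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Each row of X is one draw of the p x 1 random vector.
X = rng.multivariate_normal(mu, Sigma, size=500_000)

# E[(X - mu)(X - mu)'] estimated by averaging outer products.
lhs = (X - mu).T @ (X - mu) / len(X)

# E[XX'] - mu mu'
rhs = X.T @ X / len(X) - np.outer(mu, mu)

print(np.allclose(lhs, rhs, atol=0.05))  # True: both approximate Sigma
```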

We also have the correlation matrix $\rho = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}$ where $\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i\sigma_j}$ and $\rho$ stands for $\rho[X]$

  • square, symmetric

The standard deviation matrix is $V^{1/2} = \begin{bmatrix}\sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_p \end{bmatrix}$

  • square, symmetric, and diagonal

Then we have $\Sigma = V^{1/2}\rho V^{1/2}$ and $\rho = V^{-1/2}\Sigma V^{-1/2}$, where $V$ stands for the variance $V[X] = E[(X-\mu)^2] = E[X^2] - \mu^2$.
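These relationships between $\Sigma$, $\rho$, and $V^{1/2}$ can be verified directly with numpy; a sketch using an arbitrary $3\times 3$ covariance matrix:

```python
import numpy as np

Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 9.0, -0.3],
                  [0.5, -0.3, 1.0]])

# V^{1/2}: the diagonal matrix of standard deviations.
V_half = np.diag(np.sqrt(np.diag(Sigma)))
V_half_inv = np.linalg.inv(V_half)

# rho = V^{-1/2} Sigma V^{-1/2}
rho = V_half_inv @ Sigma @ V_half_inv

# Recover Sigma = V^{1/2} rho V^{1/2}
print(np.allclose(V_half @ rho @ V_half, Sigma))  # True
print(np.allclose(np.diag(rho), 1.0))             # unit diagonal
```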

Let $c$ be a $1\times p$ matrix of constants and define the new random variable $Y = cX$; then:

  • $E[Y] = cE[X] = c\mu$
  • $V[Y] = V[cX] = cV[X]c' = c\Sigma c'$

A deterministic matrix is a matrix that does not contain any random variables; e.g., if $X$ is deterministic, then $E[X] = X$ and $V[X] = 0$. We also have a change of variables: for a deterministic matrix $C_{q\times p}$, let $Y = CX$; then:

  • $E[Y] = CE[X] = C\mu_X$
  • $\Sigma_Y = C\Sigma_X C'$ (see the sketch below)
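
A minimal simulation of the change of variables (the particular $\mu$, $\Sigma$, and $2\times 3$ matrix $C$ below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
C = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])  # q x p with q = 2, p = 3

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = X @ C.T  # each row is Y = CX for one draw

print(np.allclose(Y.mean(axis=0), C @ mu, atol=0.05))        # E[Y] = C mu_X
print(np.allclose(np.cov(Y.T), C @ Sigma @ C.T, atol=0.05))  # Sigma_Y = C Sigma_X C'
```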

Back to the topic: we can represent a multivariate data set with a matrix. Let the $n\times p$ matrix $X$ represent a data set; then we have $X = \begin{bmatrix}x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}$

  • each row of $X_{n\times p}$ represents a multivariate observation
  • we say this data set $X$ is an $n$-observation sample of a $p$-variate population
  • we can also re-write it by observations, $X = \begin{bmatrix}X_{1}' \\ X_{2}' \\ \vdots \\ X_{n}' \end{bmatrix}$, where $X_i$ is a $p\times 1$ vector representing the $i$th observation of the data set with $p$ features
    • The $X_i$ together form a random sample from $f(x) = f_X(x_1, x_2, \ldots, x_p)$
    • Measurements of the $p$ variables within a single $X_i$ are generally correlated,
    • but measurements from different $X_i$ are independent of each other
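
This $n\times p$ convention (rows are observations, columns are variables) is also what numpy and most statistics libraries assume; a tiny illustrative example with made-up measurements:

```python
import numpy as np

# n = 4 observations of p = 3 variables (made-up numbers).
X = np.array([[42.0, 4.0, 10.0],
              [52.0, 5.0, 12.0],
              [48.0, 4.5, 11.0],
              [58.0, 3.0,  9.0]])

n, p = X.shape        # (4, 3)
first_obs = X[0]      # X_1': the first multivariate observation (a row)
second_var = X[:, 1]  # all n measurements of the 2nd variable (a column)
```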

Descriptive Statistics for Samples

Certain summary numbers, known as descriptive statistics, are used to describe a data set and to compare one data set with another.

  • sample mean of $x_j$ is $\bar x_j = \frac{1}{n} \sum_{i=1}^n x_{ij}$
  • sample variance of $x_j$ is $s_j^2 = \frac{1}{n-1} \sum_{i=1}^n (x_{ij} - \bar{x}_j)^2$, and the sample standard deviation is $s_j = \sqrt{s_j^2}$
  • sample covariance of $x_j$ and $x_k$ is $s_{jk} = \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar x_j)(x_{ik} - \bar x_k)$, where $s_{jk} = s_{kj}$
  • sample correlation coefficient of $x_j$ and $x_k$ is $\rho_{jk} = \frac{s_{jk}}{s_j s_k}$, where $\rho_{jk} = \rho_{kj}$
  • sample mean vector $\bar X = \begin{bmatrix}\bar x_1 \\ \bar x_2 \\ \vdots \\ \bar x_p \end{bmatrix}$
  • sample variance-covariance matrix $S = \begin{bmatrix}s_1^2 & s_{12} & \cdots & s_{1p} \\ s_{21} & s_2^2 & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_p^2 \end{bmatrix}$
  • sample correlation matrix $\rho = \begin{bmatrix}1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}$
    • $\Sigma = E[(X-\mu)(X-\mu)'] = E[XX'] - \mu\mu'$ is the common population covariance matrix for each $X_i$
    • $Cov(\bar X) = \frac{1}{n} \Sigma$
    • $\bar X$ is an unbiased estimator of $\mu$, i.e. $E[\bar X] = \mu$
    • $S$ is an unbiased estimator of $\Sigma$, i.e. $E[S] = \Sigma$; more specifically, $S = \frac{n}{n-1} S_n$, where $S_n$ is the sample covariance matrix with divisor $n$ and $E[S_n] = \frac{n-1}{n}\Sigma$
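
All of these sample statistics have direct numpy counterparts; a sketch on simulated data (note that `np.cov`, like $S$ above, divides by $n-1$ by default):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # n = 50 observations, p = 3 variables

x_bar = X.mean(axis=0)            # sample mean vector, shape (p,)
S = np.cov(X, rowvar=False)       # sample covariance matrix, divisor n - 1
R = np.corrcoef(X, rowvar=False)  # sample correlation matrix

# S from the definition agrees with np.cov:
D = X - x_bar
print(np.allclose(D.T @ D / (len(X) - 1), S))  # True
```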

Projection and Deviation

Let $1_n$ be the $n\times 1$ vector of ones; then the vector $(1/\sqrt{n})1_n$ has unit length and forms equal angles with each of the $n$ coordinate axes.

For the data $X_{n\times p}$, we can write $X = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_p \end{bmatrix}$, where $Y_i$ is the $i$th column:

  • The projection of $Y_i$ on $(1/\sqrt{n})1_n$ is $\bar X_i 1_n$, where $\bar X_i$ is the sample mean of $Y_i$, the $i$th feature of the data set

We also have the deviation vector, which collects the deviations of the elements of $Y_i$ from its projection on $(1/\sqrt{n})1_n$: $d_i = Y_i - \bar X_i 1_n$

  • $d_i$ in matrix form is $\begin{bmatrix} y_{i1} - \bar x_i \\ y_{i2} - \bar x_i \\ \vdots \\ y_{in} - \bar x_i \end{bmatrix}$
  • $d_i \perp \bar X_i 1_n$
  • $d_i'd_i = \langle d_i, d_i \rangle = \sum_{j=1}^n (x_{ji} - \bar X_i)^2$, so the sample variance can be written as $s_{ii} = \frac{1}{n-1}d_i'd_i$
  • $d_i'd_k = \langle d_i, d_k \rangle = \sum_{j=1}^n (x_{ji} - \bar X_i)(x_{jk} - \bar X_k)$, so the sample covariance can be written as $s_{ik} = \frac{1}{n-1}d_i'd_k$
    • recall the dot product has the form $A\cdot B = \|A\|\|B\|\cos(\theta)$,
    • where $\theta_{ik}$ is the angle between $d_i$ and $d_k$
  • $\rho_{ik} = \frac{s_{ik}}{s_i s_k} = \frac{d_i'd_k/(n-1)}{\sqrt{d_i'd_i/(n-1)}\sqrt{d_k'd_k/(n-1)}} = \frac{d_i'd_k}{\sqrt{d_i'd_i}\sqrt{d_k'd_k}} = \cos(\theta_{ik})$
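
The last identity, sample correlation as the cosine of the angle between deviation vectors, can be checked numerically; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
X[:, 1] += 0.8 * X[:, 0]  # induce some correlation between the two columns

# Deviation vectors d_i = Y_i - x_bar_i 1_n for each column.
D = X - X.mean(axis=0)
d1, d2 = D[:, 0], D[:, 1]

# cos(theta) between the two deviation vectors ...
cos_theta = (d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))

# ... equals the sample correlation coefficient.
r12 = np.corrcoef(X, rowvar=False)[0, 1]
print(np.isclose(cos_theta, r12))  # True
```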

We can also obtain some other quantities via projection:

  • the sample mean vector $\bar X = \frac{1}{n}X'1_n$, where $X$ is the $n\times p$ data matrix and $\bar X$ is the $p\times 1$ vector of sample means of the $p$ features
  • the $n\times p$ matrix of deviations $D = X - \frac{1}{n} 1_n1_n' X$
  • the $p\times p$ sample covariance matrix $S = \frac{1}{n-1}D'D$, where $D'D$ is the $p\times p$ matrix of inner products of the deviations
    • $S = \frac{1}{n-1}X'(I - \frac{1}{n} 1_n1_n')X$, where $I$ is the $n\times n$ identity matrix and $I - \frac{1}{n} 1_n1_n'$ is an orthogonal projector (symmetric and idempotent)
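
These matrix forms can also be verified directly; a short sketch using the centering projector $I - \frac{1}{n}1_n 1_n'$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
ones = np.ones((n, 1))

# Sample mean vector: (1/n) X' 1_n
x_bar = (X.T @ ones / n).ravel()

# Centering projector P and deviation matrix D = X - (1/n) 1_n 1_n' X
P = np.eye(n) - ones @ ones.T / n
D = P @ X
print(np.allclose(P @ P, P))  # True: P is idempotent (an orthogonal projector)

# S = D'D / (n-1) = X' P X / (n-1), matching np.cov
S = X.T @ P @ X / (n - 1)
print(np.allclose(S, np.cov(X, rowvar=False)))  # True
```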