Principal Component Analysis

Recently, while re-learning linear algebra, I came across principal component analysis.
Principal component analysis (PCA) transforms a set of points (written as a matrix) into a new set of points whose values are offsets along a different set of coordinate axes. These axes are the directions along which the points in the original set vary the most. PCA can be used for image recognition and image compression (the Karhunen-Loève, or KL, transform).
The steps to compute it are as follows (a rough NumPy sketch follows the list):
1. record the data (e.g. image intensities at row i, column j) in a matrix (each image would be a row of n^2 coordinates for an n by n image)
2. subtract the mean from each coordinate (so each value is now an offset from the mean for that coordinate)
3. compute the covariance matrix
i.e. Cov_ij = E[(x_i - avg(x_i))(x_j - avg(x_j))]
4. find the eigenvectors and eigenvalues of the covariance matrix.
5. Order the eigenvectors by eigenvalue. The vector with the largest eigenvalue is the first principal component.
6. The matrix of eigenvectors can be called FeatureVector. This is written with the eigenvectors in column form. Note that not all of the independent eigenvectors are necessarily used.
FinalData = FeatureVector^T*RowDataAdjust. RowDataAdjust is the matrix of step 2 transposed.
7. Recover the original data (or an approximation of it, if only a subset of the eigenvectors was kept) via
OriginalData^T = FeatureVector*FinalData + Means^T, where Means^T repeats the mean of each coordinate across its columns.
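
Here is a minimal NumPy sketch of the steps above. The sizes (20 items, 64 coordinates, i.e. 20 flattened 8 by 8 images) are just placeholders, and the variable names mirror FeatureVector, RowDataAdjust and FinalData from the list.

import numpy as np

# Step 1: each row is one data item (here, a flattened 8x8 image), each column a coordinate.
data = np.random.rand(20, 64)

# Step 2: subtract the mean of each coordinate.
means = data.mean(axis=0)
data_adjust = data - means

# Step 3: covariance matrix of the coordinates (64 x 64).
cov = np.cov(data_adjust, rowvar=False)

# Step 4: eigenvalues/eigenvectors (eigh, since the covariance matrix is symmetric).
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 5: sort by decreasing eigenvalue; the first column is now the first principal component.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 6: keep the top k eigenvectors as columns of FeatureVector and project the data.
k = 10
feature_vector = eigvecs[:, :k]
row_data_adjust = data_adjust.T                  # coordinates x items
final_data = feature_vector.T @ row_data_adjust  # k x items

# Step 7: recover an approximation of the original data.
recovered = (feature_vector @ final_data).T + means
print(abs(recovered - data).max())

With k equal to the number of nonzero eigenvalues the recovery is exact up to floating-point error; a smaller k gives a lossy approximation.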

In compression, the images are treated as the coordinates that the data varies over (i.e. they are the columns of the original data matrix). If we begin with 20 images and then select the 15 eigenvectors of the covariance matrix with the largest eigenvalues, element (i,j) of FinalData is the dot product of the ith eigenvector and the jth image. Since the dot product is an indicator of similarity, a larger value in element (i,j) means that image j is strongly correlated with the ith eigenvector.
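
A rough sketch of that compression view, again with placeholder 8 by 8 images: keeping 15 eigenvectors, element (i,j) of final_data is exactly that dot product.

import numpy as np

# 20 hypothetical images, each flattened to 64 pixels; after transposing,
# each image becomes one column of the adjusted data matrix.
images = np.random.rand(20, 64)
adjusted = images - images.mean(axis=0)

# Covariance over the pixel coordinates, eigenvectors sorted by decreasing eigenvalue.
cov = np.cov(adjusted, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
feature_vector = eigvecs[:, np.argsort(eigvals)[::-1][:15]]   # the 15 kept eigenvectors

# final_data[i, j] is the dot product of the ith eigenvector with the jth adjusted image:
# a measure of how strongly image j correlates with that eigenvector.
final_data = feature_vector.T @ adjusted.T                    # 15 x 20

Storing final_data plus the 15 kept eigenvectors, instead of all 20 full images, is where the compression comes from.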
