Let’s first define what PCA is.
Principal Component Analysis or PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
In other words, Principal Component Analysis (PCA) is a technique to detect the main components of a data set in order to reduce it to fewer dimensions while retaining the relevant information.
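Before deriving everything by hand, the idea can be illustrated with scikit-learn’s off-the-shelf PCA. This is only a minimal sketch; the 3-dimensional data set below is made up purely for illustration:

```python
# A minimal sketch of the idea using scikit-learn's built-in PCA
# (the 3-D data set here is made up purely for illustration).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
cov = [[3., 1., 0.],
       [1., 2., 0.],
       [0., 0., 0.5]]
X = rng.multivariate_normal([0., 0., 0.], cov, size=500)

pca = PCA(n_components=2)          # keep the two strongest directions
X_reduced = pca.fit_transform(X)   # 500 x 2: fewer dimensions, most variance kept
```

The first component captures the greatest share of the variance, the second the next greatest, and the weakest direction is discarded.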
As an example, let X be a data set with zero mean, that is, the n x m matrix formed by n observations of m variables, where the element x_ij denotes, as usual, the value of the j-th variable in the i-th observation.
A principal component is a linear combination w of the variables that maximizes the variance of the projected data Xw.
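As a quick numerical check of this definition (a sketch with made-up data): the variance of the data projected onto the leading eigenvector of the covariance matrix is larger than the variance along other directions, such as the x-axis.

```python
# A quick numerical check (made-up data): the leading eigenvector of the
# covariance matrix gives a larger projected variance than other directions.
import numpy as np

rng = np.random.RandomState(1)
X = rng.multivariate_normal([0., 0.], [[1., 0.9], [0.9, 1.]], size=1000)

vals, vecs = np.linalg.eig(np.cov(X.T))
w_top = vecs[:, np.argmax(vals)]           # leading eigenvector (unit length)

var_top = np.var(X @ w_top)                # variance along the principal direction
var_axis = np.var(X @ np.array([1., 0.]))  # variance along the x-axis, for comparison
```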
Let’s now see a PCA example step by step
1. Create a random toy data set
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

m1 = [4., -1.]
s1 = [[1, 0.9], [0.9, 1]]
c1 = np.random.multivariate_normal(m1, s1, 100)
plt.plot(c1[:,0], c1[:,1], 'r.')
Let’s plot the data set and compute the PCA. The red dots in the figure below show the considered data; the blue arrow shows the eigenvector with the largest eigenvalue.
vaps, veps = np.linalg.eig(np.cov(c1.T))
idx = np.argmax(vaps)
plt.plot(c1[:,0], c1[:,1], 'r.')
plt.arrow(np.mean(c1[:,0]), np.mean(c1[:,1]),
          vaps[idx]*veps[0,idx], vaps[idx]*veps[1,idx],
          linewidth=1, head_width=0.1, color='blue')
Now that we have visualized it, let’s code the closed-form solution for PCA
The first step is to standardize the data. We are going to use the Scikit-learn library.
from sklearn.preprocessing import StandardScaler
X_std = StandardScaler().fit_transform(c1)
Eigendecomposition – Computing Eigenvectors and Eigenvalues
The eigenvectors determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.
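To make that statement concrete, here is a small check on synthetic data (a sketch, not part of the original walkthrough): the sample variance of the data projected onto an eigenvector equals the corresponding eigenvalue of the covariance matrix.

```python
# A small check on synthetic data: the variance of the data projected onto
# an eigenvector equals the corresponding eigenvalue of the covariance matrix.
import numpy as np

rng = np.random.RandomState(2)
Y = rng.multivariate_normal([0., 0.], [[1., 0.8], [0.8, 1.]], size=400)

C = np.cov(Y.T)
vals, vecs = np.linalg.eig(C)

# ddof=1 matches np.cov's sample-variance normalization
proj_vars = [np.var(Y @ vecs[:, i], ddof=1) for i in range(2)]
```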
mean_vec = np.mean(X_std, axis=0)
cov_mat = (X_std - mean_vec).T.dot(X_std - mean_vec) / (X_std.shape[0] - 1)
print('Covariance Matrix \n%s' % cov_mat)
Covariance Matrix
[[ 1.01010101  0.88512031]
 [ 0.88512031  1.01010101]]
Let’s now verify the result against NumPy’s built-in covariance function
# Let's print our Covariance Matrix
print('NumPy Covariance Matrix: \n%s' % np.cov(X_std.T))
NumPy Covariance Matrix:
[[ 1.01010101  0.88512031]
 [ 0.88512031  1.01010101]]
Now we perform an eigendecomposition on the covariance matrix
cov_mat = np.cov(X_std.T)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)
Eigenvectors
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]

Eigenvalues
[ 1.89522132  0.1249807 ]
Let’s check that every eigenvector has unit length to see if everything is ok
# Check that every eigenvector has unit length
for ev in eig_vecs.T:
    np.testing.assert_array_almost_equal(1.0, np.linalg.norm(ev))
print('Everything ok!')
Now we need to make a list of the (eigenvalue, eigenvector) tuples and sort them from high to low.
# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]

# Sort the (eigenvalue, eigenvector) tuples from high to low
eig_pairs.sort(key=lambda x: x[0], reverse=True)

# Visually confirm that the list is correctly sorted by decreasing eigenvalues
print('Eigenvalues in descending order:')
for i in eig_pairs:
    print(i[0])
Eigenvalues in descending order:
1.89522131626
0.124980703938
Building a Projection Matrix
# Choose the "top 2" eigenvectors with the highest eigenvalues;
# we are going to use these vectors to build the matrix W.
matrix_w = np.hstack((eig_pairs[0][1].reshape(2,1),
                      eig_pairs[1][1].reshape(2,1)))
print('Matrix W:\n', matrix_w)
Matrix W:
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
We will use this data to plot our output later so we can compare with a custom gradient descent approach.
There are several numerical techniques that allow us to find a point corresponding to (w*, lambda*), the saddle point of the Lagrangian. One way to tackle the problem is to “construct a new function, related to the Lagrangian, that (ideally) has a minimum at (w*, lambda*).
This new function can be considered as ’distorting’ the Lagrangian at infeasible points so as to create a minimum at (w*, lambda*). Unconstrained minimization techniques can then be applied to the new function. This approach can make it easier to guarantee convergence to a local solution, but there is the danger that the local convergence properties of the method can be damaged.
The ’distortion’ of the Lagrangian function can lead to a ’distortion’ in the Newton equations for the method. Hence the behavior of the method near the solution may be poor unless care is taken.” Another way to tackle the condition is to maintain feasibility at every iteration, that is, to ensure that the updates follow the implicit curve w^T w = 1. For the toy problem we are considering here this is relatively easy. Assume we start from a point w_0 that satisfies w_0^T w_0 = 1, that is, it satisfies the constraint.
The algorithm can be summarized as follows:
- Compute the gradient (observe that we compute the gradient of the Lagrangian with respect to w).
- Compute an estimate of lambda by computing the value of lambda that minimizes the norm of that gradient.
- Assume that the update is w_{k+1} = w_k - alpha_k * grad. For each candidate update, project it back onto the constraint w^T w = 1. Find the alpha_k value that decreases the Lagrangian with respect to its previous value.
- Goto step 1 and repeat until convergence.
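The steps above can be sketched as a compact, self-contained function. This is only a sketch under the assumption of a symmetric covariance matrix; the names `projected_gradient_pca` and `A_demo` are illustrative and not part of the original code:

```python
# A compact, self-contained sketch of the algorithm above (the function name
# and the demo matrix A_demo are illustrative, not the article's code).
import numpy as np

def projected_gradient_pca(A, w0, tol=1e-8, max_iter=100000):
    """Leading eigenvector of a symmetric matrix A via projected gradient descent."""
    w = w0 / np.linalg.norm(w0)                  # start on the constraint ||w|| = 1
    lam = -w @ (A + A.T) @ w / (2 * w @ w)       # step 2: estimate of lambda
    lag = -w @ A @ w - lam * (w @ w - 1)         # current value of the Lagrangian
    alpha, it = 1.0, 0
    while alpha > tol and it < max_iter:
        it += 1
        grad = -(A + A.T) @ w - 2 * lam * w      # step 1: gradient of the Lagrangian
        while alpha > tol:
            cand = w - alpha * grad              # step 3: candidate update ...
            cand = cand / np.linalg.norm(cand)   # ... projected back onto ||w|| = 1
            cand_lam = -cand @ (A + A.T) @ cand / (2 * cand @ cand)
            cand_lag = -cand @ A @ cand - cand_lam * (cand @ cand - 1)
            if cand_lag < lag:                   # accept only descent steps
                w, lam, lag = cand, cand_lam, cand_lag
                alpha = 1.0
                break
            alpha /= 2.0                         # otherwise halve the step (line search)
    return w

A_demo = np.array([[1., 0.9], [0.9, 1.]])
w_pg = projected_gradient_pca(A_demo, np.array([1., 0.]))
```

Because the Lagrangian here is the negated variance, minimizing it drives w toward the eigenvector with the largest eigenvalue.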
Let’s now implement the KKT conditions to see if we can obtain the same result as the one obtained with the closed-form solution. We will use projected gradient descent to obtain the solution.
Let A be our covariance matrix
# A is the covariance matrix of the considered data
A = np.cov(c1.T)
A
Now we set up our initial values
# Tolerance
tol = 1e-08
# Initial alpha value (line search)
alpha = 1.0
# Iteration counter
cont = 0
# Initial value of w. DO NOT CHOOSE w = (0, 0)
w = np.array([1., 0.])
Now we compute the eigenvalues and eigenvectors
# let's see now the eigvals and eigvecs
eig_vals, eig_vecs = np.linalg.eig(A)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)
Now we project w onto the constraint w^T w = 1 by normalizing it
# Project w onto the constraint w.T w = 1 (normalize it)
den = np.sqrt(np.dot(w.T, w))
w = w / den
The next step is to compute lambda
# Now we calculate lambda and the initial value of the Lagrangian
lam = -np.dot(np.dot(w.T, A + A.T), w) / (2 * np.dot(w.T, w))
lag = -np.dot(w.T, np.dot(A, w)) - lam * (np.dot(w.T, w) - 1)
Let’s review our initial values
print("Initial values")
print("Lagrangian value =", lag)
print(" w =", w)
print(" x =", m1)
print(" y =", s1)
Initial values
Lagrangian value = -0.858313040377
 w = [ 1.  0.]
 x = [4.0, -1.0]
 y = [[1, 0.9], [0.9, 1]]
Let’s now minimize the Lagrangian using projected gradient descent
# let's now run projected gradient descent to minimize the Lagrangian
while (alpha > tol) and (cont < 100000):
    cont = cont + 1
    # Gradient of the Lagrangian with respect to w
    grw = -np.dot(w.T, A + A.T) - 2 * lam * w.T
    # Used to know if we finished the line search
    finished = 0
    while (finished == 0) and (alpha > tol):
        # Candidate update
        aux_w = w - alpha * grw
        # Project it back onto the constraint w.T w = 1
        den = np.sqrt(np.dot(aux_w.T, aux_w))
        aux_w = aux_w / den
        # Compute the new values of lambda and the Lagrangian
        aux_lam = -np.dot(np.dot(aux_w.T, A + A.T), aux_w) / (2 * np.dot(aux_w.T, aux_w))
        aux_lag = -np.dot(aux_w.T, np.dot(A, aux_w)) - aux_lam * (np.dot(aux_w.T, aux_w) - 1)
        # Check if this is a descent step
        if aux_lag < lag:
            w = aux_w
            lam = aux_lam
            lag = aux_lag
            alpha = 1.0
            finished = 1
        else:
            alpha = alpha / 2.0
Let’s now review our final values
# Let's now review our final values!
print(" Our Final Values")
print(" Number of iterations", cont)
print(" Obtained values are w =", w)
print(" Correct values are w =", veps[idx])
print(" Eigenvectors are =", eig_vecs)
 Our Final Values
 Number of iterations 22
 Obtained values are w = [ 0.71916397  0.69484041]
 Correct values are w = [ 0.71916398 -0.6948404 ]
 Eigenvectors are = [[ 0.71916398 -0.6948404 ]
 [ 0.6948404   0.71916398]]
Let’s compare our new values vs the ones obtained by the closed solution
# Full comparison
print(" Gradient Descent values w =", w)
print(" PCA analysis approach w =", matrix_w)
print(" Closed Solution w =", veps[idx])
print(" Closed Solution w =", veps, vaps)
 Gradient Descent values w = [ 0.71916397  0.69484041]
 PCA analysis approach w = [[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
 Closed Solution w = [ 0.71916398 -0.6948404 ]
 Closed Solution w = [[ 0.71916398 -0.6948404 ]
 [ 0.6948404   0.71916398]] [ 1.56340502  0.10299214]
Very close! Let’s plot it to visualize the new values versus the ones obtained with scikit-learn
import seaborn as sns
plt.plot(c1[:,0], c1[:,1], 'r.')
plt.arrow(np.mean(c1[:,0]), np.mean(c1[:,1]),
          vaps[idx]*veps[0,idx], vaps[idx]*veps[1,idx],
          linewidth=1, head_width=0.1, color='blue')
plt.arrow(np.mean(c1[:,0]), np.mean(c1[:,1]),
          vaps[idx]*w[0], vaps[idx]*w[1],
          linewidth=1, head_width=0.1, color='lightblue')
The post Principal Component Analysis (PCA): A Practical Example appeared first on 3Blades.