## Let’s first define what PCA is.

Principal Component Analysis or PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

In other words, Principal Component Analysis (PCA) is a technique for detecting the main components of a data set in order to reduce it to fewer dimensions while retaining the relevant information.

As an example, let X be a data set with zero mean, that is, the n × p matrix formed by n observations of p variables, where the elements of X are denoted as usual by x_ij, meaning that x_ij contains the value of the j-th variable observed in the i-th experiment.

A principal component is a linear combination w of the variables that maximizes the variance of the projected data, subject to the constraint wᵀw = 1.

## Let’s now see a PCA example step by step

### 1. Create a random toy data set

Let’s plot the data set and compute the PCA. The red dots in the figure below show the data under consideration; the blue arrow shows the eigenvector with the maximum eigenvalue.
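The code for this step is not included in the text; a minimal sketch that reproduces it might look like this (the random seed, sample size, and noise level are assumptions, not the post’s original values):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)

# Two correlated variables: the second is the first plus some noise.
a = rng.normal(0.0, 1.0, 100)
b = a + rng.normal(0.0, 0.35, 100)
X = np.column_stack([a, b])
X = X - X.mean(axis=0)  # center the data so it has zero mean

# Eigendecomposition of the covariance matrix.
cov = np.cov(X.T)
eig_vals, eig_vecs = np.linalg.eig(cov)
top = eig_vecs[:, np.argmax(eig_vals)]  # eigenvector of maximum eigenvalue

# Red dots for the data, a blue arrow for the leading eigenvector.
plt.scatter(X[:, 0], X[:, 1], color="red", s=10)
plt.quiver(0, 0, top[0], top[1], angles="xy", scale_units="xy", scale=0.5,
           color="blue")
plt.axis("equal")
plt.savefig("pca_toy_data.png")
```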

## Now that we have visualized it, let’s code the closed-form solution for PCA

The first step is to standardize the data. We are going to use the scikit-learn library.
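A sketch of this step using scikit-learn’s `StandardScaler` (the small data matrix here is a hypothetical stand-in for the post’s toy data set):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the toy data (n_samples x 2).
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

# StandardScaler removes each feature's mean and scales it to unit variance.
X_std = StandardScaler().fit_transform(X)

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # ~[1, 1]
```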

### Eigendecomposition – Computing Eigenvectors and Eigenvalues

The eigenvectors determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.
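The covariance computation itself is not shown in the text; a sketch might look as follows. The data here is a stand-in, so only the diagonal matches the output below: with 100 samples standardized using the population standard deviation, the sample covariance has diagonal entries 100/99 ≈ 1.0101.

```python
import numpy as np

rng = np.random.RandomState(1)

# Stand-in standardized data set: 100 samples of two correlated variables.
a = rng.normal(0.0, 1.0, 100)
X = np.column_stack([a, a + rng.normal(0.0, 0.4, 100)])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Sample covariance (divides by n - 1); with data standardized using the
# population std (ddof=0) the diagonal is exactly n / (n - 1) = 100/99.
cov_mat = np.cov(X_std.T)
print("Covariance Matrix\n%s" % cov_mat)
```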

```
Covariance Matrix
[[ 1.01010101  0.88512031]
 [ 0.88512031  1.01010101]]
```

### Let’s now print our Covariance Matrix

```
NumPy Covariance Matrix:
[[ 1.01010101  0.88512031]
 [ 0.88512031  1.01010101]]
```

Now we perform an eigendecomposition on the covariance matrix.
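The eigendecomposition is a single NumPy call, starting from the covariance matrix obtained above:

```python
import numpy as np

# The covariance matrix obtained above.
cov_mat = np.array([[1.01010101, 0.88512031],
                    [0.88512031, 1.01010101]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are the
# corresponding (unit-norm) eigenvectors.
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print("Eigenvectors\n%s" % eig_vecs)
print("Eigenvalues\n%s" % eig_vals)
```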

```
Eigenvectors
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
Eigenvalues
[ 1.89522132  0.1249807 ]
```

### Let’s sort the eigenvalues to check that everything is OK

### Now we make a list of (eigenvalue, eigenvector) tuples and sort them from high to low
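This sorting step can be sketched as follows, reusing the covariance matrix from above:

```python
import numpy as np

cov_mat = np.array([[1.01010101, 0.88512031],
                    [0.88512031, 1.01010101]])
eig_vals, eig_vecs = np.linalg.eig(cov_mat)

# Pair each eigenvalue with its (column) eigenvector, then sort descending.
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i]) for i in range(len(eig_vals))]
eig_pairs.sort(key=lambda pair: pair[0], reverse=True)

print("Eigenvalues in descending order:")
for val, vec in eig_pairs:
    print(val)
```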

```
Eigenvalues in descending order:
1.89522131626
0.124980703938
```

## Building a Projection Matrix
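A sketch of building W, stacking the eigenvectors (sorted by eigenvalue) as columns:

```python
import numpy as np

cov_mat = np.array([[1.01010101, 0.88512031],
                    [0.88512031, 1.01010101]])
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
eig_pairs = sorted(((np.abs(eig_vals[i]), eig_vecs[:, i])
                    for i in range(len(eig_vals))),
                   key=lambda pair: pair[0], reverse=True)

# Keep both components here; for dimensionality reduction we would keep
# only the leading column(s).
matrix_w = np.hstack((eig_pairs[0][1].reshape(2, 1),
                      eig_pairs[1][1].reshape(2, 1)))
print("Matrix W:\n%s" % matrix_w)
```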

```
Matrix W:
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
```

**We will use this data to plot our output later so we can compare with a custom gradient descent approach.**

There are several numerical techniques that allow us to find the point that corresponds to the saddle point of the Lagrangian. One way to tackle the problem is to “construct a new function, related to the Lagrangian, that (ideally) has a minimum at the solution.

This new function can be considered as ’distorting’ the Lagrangian at infeasible points so as to create a minimum at the solution. Unconstrained minimization techniques can then be applied to the new function. This approach can make it easier to guarantee convergence to a local solution, but there is the danger that the local convergence properties of the method can be damaged.

The ’distortion’ of the Lagrangian function can lead to a ’distortion’ in the Newton equations for the method. Hence the behavior of the method near the solution may be poor unless care is taken.” Another way to tackle the problem is to maintain feasibility at every iteration, that is, to ensure that the updates follow the implicit curve defined by the constraint wᵀw = 1. For the toy problem we are considering here this is relatively easy. Assume we start from a point w₀ that satisfies wᵀw = 1, that is, it satisfies the constraint.

The algorithm can be summarized as follows:

- Compute the gradient of the Lagrangian with respect to w.
- Compute an estimate of λ by finding the value of λ that minimizes the norm of that gradient.
- Assume the update is wₖ₊₁ = wₖ + αₖ∇L. For each candidate update, project it back onto the constraint wᵀw = 1. Find the αₖ value that decreases the Lagrangian with respect to its value at wₖ.
- Go to step 1 and repeat until convergence.

Let’s now implement the KKT conditions to see if we are able to obtain the same result as the one obtained with the closed solution. We will use the projected gradient descent to obtain the solution.

### Let A be our covariance matrix

**Now we set up our initial values**

**Now we compute the eigenvalues and eigenvectors**

**Now, we compute the projection of w onto the constraint w.T*w = 1**

**Next step is to compute lambda**

**Let’s review our initial values**

```
Initial values
Lagrangian value = -0.858313040377
w = [ 1.  0.]
x = [4.0, -1.0]
y = [[1, 0.9], [0.9, 1]]
```

## Let’s now optimize our function using gradient descent

**Let’s now review our final values**

```
Our Final Values
Number of iterations 22
Obtained values are w = [ 0.71916397  0.69484041]
Correct values are w = [ 0.71916398 -0.6948404 ]
Eigenvectors are = [[ 0.71916398 -0.6948404 ]
 [ 0.6948404   0.71916398]]
```

**Let’s compare our new values vs the ones obtained by the closed solution**

```
Gradient Descent values w = [ 0.71916397  0.69484041]
PCA analysis approach w = [[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
Closed Solution w = [ 0.71916398 -0.6948404 ]
Closed Solution w = [[ 0.71916398 -0.6948404 ]
 [ 0.6948404   0.71916398]] [ 1.56340502  0.10299214]
```

**Very close! Let’s plot it to visualize the new values versus the ones obtained with scikit-learn**

The post Principal Component Analysis (PCA): A Practical Example appeared first on 3Blades.
