## Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning

Optimization is always the ultimate goal, whether you are dealing with a real-life problem or building a software product. I, as a …

The post Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning appeared first on Analytics Vidhya.

## How to read most commonly used file formats in Data Science (using Python)?

If you have been part of the data industry, you would know the challenge of working with different data types. Different formats, different compression, …

The post How to read most commonly used file formats in Data Science (using Python)? appeared first on Analytics Vidhya.

## 5 More Deep Learning Applications a beginner can build in minutes (using Python)

Deep Learning is fundamentally changing everything around us. A lot of people think that you need to be an expert to use the power of …

The post 5 More Deep Learning Applications a beginner can build in minutes (using Python) appeared first on Analytics Vidhya.

## Let’s first define what PCA is

Principal Component Analysis or PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

In other words, Principal Component Analysis (PCA) is a technique for detecting the main components of a data set so that it can be reduced to fewer dimensions while retaining the relevant information.

As an example, let $X \in \mathbb{R}^{m \times n}$ be a data set with zero mean, that is, the matrix formed by $n$ observations of $m$ variables. The elements of $X$ are denoted as usual by $x_{ij}$, the value of variable $i$ in the $j$-th observation.

A principal component is a linear combination of the variables chosen so that it maximizes the variance.
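The claim that a principal component maximizes variance can be checked numerically. The following is a minimal sketch, not part of the original post: the seed, the sample size of 500, and the names `projected_variance`, `var_top` and `var_by_angle` are assumptions chosen for illustration. It compares the variance of the data projected onto the leading eigenvector with the variance along a sweep of other unit directions.

```python
import numpy as np

# Assumed toy data: same mean and covariance as the example in this post,
# but with a fixed seed and 500 samples (both are illustrative choices).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([4.0, -1.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
Xc = X - X.mean(axis=0)  # center the data

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eig_vals, eig_vecs = np.linalg.eigh(np.cov(Xc.T))
w_top = eig_vecs[:, -1]  # eigenvector with the largest eigenvalue


def projected_variance(w):
    """Sample variance of the centered data projected onto direction w."""
    w = w / np.linalg.norm(w)
    return np.var(Xc @ w, ddof=1)


var_top = projected_variance(w_top)

# Sweep unit vectors over half a circle and record the projected variance
angles = np.linspace(0.0, np.pi, 181)
var_by_angle = np.array(
    [projected_variance(np.array([np.cos(a), np.sin(a)])) for a in angles]
)

print("variance along first principal component:", var_top)
print("largest variance found in the sweep:     ", var_by_angle.max())
```

No direction in the sweep should exceed the variance along `w_top`, and that variance should equal the largest eigenvalue of the covariance matrix.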

## Let’s now see a PCA example step by step

### 1. Create a random toy data set

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

m1 = [4.,-1.]
s1 = [[1,0.9],[0.9,1]]
c1 = np.random.multivariate_normal(m1,s1,100)
plt.plot(c1[:,0],c1[:,1],'r.')


Let’s plot the data set and compute the PCA. In the figure below, the red dots show the data and the blue arrow shows the eigenvector with the largest eigenvalue.

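# Eigenvalues (vaps) and eigenvectors (veps, stored as columns) of the covariance matrix
vaps,veps = np.linalg.eig(np.cov(c1.T))
idx = np.argmax(vaps)

plt.plot(c1[:,0],c1[:,1],'r.')
# Arrow from the data mean along the leading eigenvector, scaled by its eigenvalue.
# The original call is cut off after 0.5; treating it as the width and using color='b' are assumptions.
plt.arrow(np.mean(c1[:,0]),np.mean(c1[:,1]),
          vaps[idx]*veps[0,idx],vaps[idx]*veps[1,idx],
          width=0.5, color='b')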


## Now that we have visualized it, let’s code the closed-form solution for the PCA

The first step is to standardize the data. We are going to use the scikit-learn library.

from sklearn.preprocessing import StandardScaler
X_std = StandardScaler().fit_transform(c1)


### Eigendecomposition – Computing Eigenvectors and Eigenvalues

The eigenvectors determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.

mean_vec = np.mean(X_std, axis=0)
cov_mat = (X_std - mean_vec).T.dot((X_std - mean_vec)) / (X_std.shape[0]-1)
print('Covariance Matrix \n%s' %cov_mat)

Covariance Matrix
[[ 1.01010101  0.88512031]
[ 0.88512031  1.01010101]]
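The diagonal of this matrix is 1.0101 rather than exactly 1 because of a mismatch in normalization: `StandardScaler` divides by the population standard deviation (equivalent to `ddof=0`), while the covariance formula above divides by `n - 1`. With 100 samples the ratio is 100/99. The sketch below reproduces this with plain NumPy instead of scikit-learn; the seed and the names `data`, `X_std_manual` and `diag` are illustrative assumptions.

```python
import numpy as np

# Assumed toy data with a fixed seed; 100 samples as in the post
rng = np.random.default_rng(0)
data = rng.multivariate_normal([4.0, -1.0], [[1.0, 0.9], [0.9, 1.0]], size=100)

# Same scaling StandardScaler applies: subtract the mean, divide by the
# population standard deviation (ddof=0)
X_std_manual = (data - data.mean(axis=0)) / data.std(axis=0, ddof=0)

n = data.shape[0]
cov_manual = np.cov(X_std_manual.T)  # np.cov divides by n - 1 by default
diag = np.diag(cov_manual)

print("diagonal of the covariance matrix:", diag)
print("n / (n - 1):                       ", n / (n - 1))
```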


### Let’s now print our Covariance Matrix

#Let's print our Covariance Matrix
print('NumPy Covariance Matrix: \n%s' %np.cov(X_std.T))

NumPy Covariance Matrix:
[[ 1.01010101  0.88512031]
[ 0.88512031  1.01010101]]


Now we perform an eigendecomposition on the covariance matrix

cov_mat = np.cov(X_std.T)

eig_vals, eig_vecs = np.linalg.eig(cov_mat)

print('Eigenvectors \n%s' %eig_vecs)
print('\nEigenvalues \n%s' %eig_vals)

Eigenvectors
[[ 0.70710678 -0.70710678]
[ 0.70710678  0.70710678]]

Eigenvalues
[ 1.89522132  0.1249807 ]


### Let’s check that every eigenvector has unit length

# Eigenvectors are the columns of eig_vecs; each one should have norm 1
for ev in eig_vecs.T:
    np.testing.assert_array_almost_equal(1.0, np.linalg.norm(ev))
print('Everything ok!')

Everything ok!


### Now we make a list of (eigenvalue, eigenvector) tuples and sort them from high to low

# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]

# Sort the (eigenvalue, eigenvector) tuples from high to low
eig_pairs.sort(key=lambda x: x[0], reverse=True)

# Visually confirm that the list is correctly sorted by decreasing eigenvalues
print('Eigenvalues in descending order:')
for i in eig_pairs:
    print(i[0])

Eigenvalues in descending order:
1.89522131626
0.124980703938
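Since the eigenvalues measure the variance along each new axis, dividing each one by their sum gives the share of total variance that each principal component explains. The short sketch below uses the two eigenvalues printed above; the names `explained` and `cumulative` are illustrative, not from the original post.

```python
import numpy as np

# Eigenvalues copied from the sorted output above
eig_vals = np.array([1.89522131626, 0.124980703938])

# Fraction of the total variance carried by each principal component
explained = eig_vals / eig_vals.sum()
cumulative = np.cumsum(explained)

for i, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print("PC%d: explained = %.4f, cumulative = %.4f" % (i, e, c))
```

Here the first component carries roughly 94% of the variance, which is why a one-dimensional projection loses little information for this data set.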


## Building a Projection Matrix

# Choose the "top 2" eigenvectors with the highest eigenvalues.
# They become the columns of the projection matrix W.
matrix_w = np.hstack((eig_pairs[0][1].reshape(2,1),
                      eig_pairs[1][1].reshape(2,1)))

print('Matrix W:\n', matrix_w)

('Matrix W:\n', array([[ 0.70710678, -0.70710678],
[ 0.70710678,  0.70710678]]))
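The projection matrix is applied by multiplying the standardized data by W. The sketch below is a self-contained illustration under stated assumptions: it regenerates toy data with a fixed seed, standardizes it with NumPy, and uses `np.linalg.eigh` with an explicit sort instead of the tuple list above. The names `data`, `order`, `Y` and `Y_1d` are hypothetical.

```python
import numpy as np

# Assumed toy data with a fixed seed, standardized with population std
rng = np.random.default_rng(0)
data = rng.multivariate_normal([4.0, -1.0], [[1.0, 0.9], [0.9, 1.0]], size=100)
X_std = (data - data.mean(axis=0)) / data.std(axis=0)

# Eigendecomposition of the covariance matrix, sorted from high to low
eig_vals, eig_vecs = np.linalg.eigh(np.cov(X_std.T))
order = np.argsort(eig_vals)[::-1]
eig_vals, matrix_w = eig_vals[order], eig_vecs[:, order]

# Project the data onto the new axes (columns of W)
Y = X_std.dot(matrix_w)

# In the new coordinates the features are uncorrelated and their
# variances are the eigenvalues
cov_Y = np.cov(Y.T)
print(np.round(cov_Y, 6))

# Keeping only the first column of W reduces the data to one dimension
Y_1d = X_std.dot(matrix_w[:, :1])
print(Y_1d.shape)
```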


We will use this data to plot our output later so we can compare with a custom gradient descent approach.

There are several numerical techniques for finding a point $x^*$ that satisfies $\nabla_{x,\lambda} L(x^*, \lambda^*) = 0$, the saddle point. One way to tackle the problem is to “construct a new function, related to the Lagrangian, that (ideally) has a minimum at $(x^*, \lambda^*)$.

This new function can be considered as ’distorting’ the Lagrangian at infeasible points so as to create a minimum at $(x^*, \lambda^*)$. Unconstrained minimization techniques can then be applied to the new function. This approach can make it easier to guarantee convergence to a local solution, but there is the danger that the local convergence properties of the method can be damaged.

The ’distortion’ of the Lagrangian function can lead to a ’distortion’ in the Newton equations for the method. Hence the behavior of the method near the solution may be poor unless care is taken.” Another way to tackle the condition $\nabla_{x,\lambda} L(x, \lambda) = 0$ is to maintain feasibility at every iteration, that is, to ensure that the updates $x^k$ follow the implicit curve $h(x) = 0$. For the toy problem we are considering here this is relatively easy. Assume we start from a point $x^0$ that satisfies the constraint, $h(x^0) = 0$.

The algorithm can be summarized as follows:

1. Compute the gradient $\nabla L(x^k)$ (observe that we compute the gradient of the Lagrangian with respect to $x$).
2. Compute an estimate of $\lambda$ by finding the value of $\lambda$ that minimizes $\|\nabla L(x^k)\|^2$.
3. Assume that the update is $x^{k+1} = x^k - \alpha^k \nabla L(x^k)$. For each candidate update $x^{k+1}$, project it onto the constraint $h(x) = 0$. Find the $\alpha^k$ value that decreases $L(x^{k+1})$ with respect to $L(x^k)$.
4. Go to step 1 and repeat until convergence.
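The four steps can be sketched as a single function for the PCA case, where the constraint is $w^T w = 1$ and the projection is a division by the norm. This is a minimal sketch under stated assumptions rather than the post's own listing: the function name `top_eigvec_projected_gd`, its parameters, and the fixed example matrix are hypothetical, and the objective is taken to be $-w^T A w$.

```python
import numpy as np


def top_eigvec_projected_gd(A, w0, tol=1e-8, max_iter=100000):
    """Minimize -w^T A w subject to w^T w = 1 with projected gradient
    descent and a halving line search (illustrative sketch)."""
    w = w0 / np.linalg.norm(w0)           # start on the constraint
    value = -(w @ A @ w)                  # Lagrangian value on the constraint
    alpha, it = 1.0, 0
    while alpha > tol and it < max_iter:
        it += 1
        # Step 2: least-squares estimate of lambda
        lam = -(w @ (A + A.T) @ w) / (2.0 * (w @ w))
        # Step 1: gradient of the Lagrangian with respect to w
        grad = -(A + A.T) @ w - 2.0 * lam * w
        accepted = False
        while not accepted and alpha > tol:
            # Step 3: candidate update, then projection onto w^T w = 1
            cand = w - alpha * grad
            cand = cand / np.linalg.norm(cand)
            cand_value = -(cand @ A @ cand)
            if cand_value < value:        # descent: accept and reset alpha
                w, value, alpha, accepted = cand, cand_value, 1.0, True
            else:                         # no descent: halve the step
                alpha /= 2.0
    return w, -value, it


# Usage on the covariance matrix used to generate the toy data
A = np.array([[1.0, 0.9], [0.9, 1.0]])
w, top_val, n_iter = top_eigvec_projected_gd(A, np.array([1.0, 0.0]))
print("w =", w, "eigenvalue estimate =", top_val, "iterations =", n_iter)
```

The loop stops when no step size above `tol` produces a descent, which mirrors the stopping rule used later in this post. For this matrix the exact leading eigenvector is $(1, 1)/\sqrt{2}$ with eigenvalue 1.9, so the result can be checked by hand.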

Let’s now implement the KKT conditions to see if we are able to obtain the same result as the one obtained with the closed solution. We will use the projected gradient descent to obtain the solution.
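For reference, the constrained problem can be written out explicitly. The block below is a reconstruction from the expressions used in the code of this post (the symbols $A$, $w$ and $\lambda$ mirror the variables `A`, `w` and `lam`), not a formula quoted from the original text, so treat it as an assumption about what the code is meant to solve.

```latex
\begin{aligned}
&\min_{w}\; -w^{\top} A w \quad \text{subject to} \quad h(w) = w^{\top} w - 1 = 0,\\
&L(w,\lambda) = -w^{\top} A w - \lambda\,(w^{\top} w - 1),\\
&\nabla_{w} L(w,\lambda) = -(A + A^{\top})\,w - 2\lambda w,\\
&\lambda^{k} = \arg\min_{\lambda}\,\lVert \nabla_{w} L(w^{k},\lambda)\rVert^{2}
  = -\frac{(w^{k})^{\top}(A + A^{\top})\,w^{k}}{2\,(w^{k})^{\top} w^{k}} .
\end{aligned}
```

For a symmetric $A$, setting the gradient to zero gives $A w = -\lambda w$, so a stationary point is an eigenvector of $A$ and $-\lambda$ is its eigenvalue. This is why the result can be compared with the eigendecomposition.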

### Let A be our covariance matrix

# A is the covariance matrix of the considered data
A = np.cov(c1.T)
A


Now we set up our initial values

# Tolerance
tol=1e-08

# Initial alpha value (line search)
alpha=1.0

# Initial values of w. DO NOT CHOOSE w=(0,0)
w = np.array([1., 0.])


Now we compute the eigenvalues and eigenvectors

# let's see now the eigvals and eigvects

eig_vals, eig_vecs = np.linalg.eig(A)

print('Eigenvectors \n%s' %eig_vecs)
print('\nEigenvalues \n%s' %eig_vals)


Now we project w onto the constraint w.T*w = 1 by dividing it by its norm

# Project w onto the constraint w.T*w = 1 (normalize it to unit length)
den = np.sqrt(np.dot(w.T,w))
w = w / den


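The next step is to compute lambda and the initial value of the Lagrangian

# Estimate lambda: the value that minimizes the squared norm of the Lagrangian's gradient.
# The gradient of w.T*A*w is (A + A.T)*w, and the whole denominator must be parenthesized.
lam = -np.dot(np.dot(w.T, (A + A.T)), w) / (2 * np.dot(w.T, w))

# Initial value of the Lagrangian (printed below and used by the line search)
lag = -np.dot(w.T, np.dot(A, w)) - lam * (np.dot(w.T, w) - 1)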


Let’s review our initial values

print("Initial values")
print("Lagrangian value =", lag)
print(" w =", w)
print(" x =", m1)
print(" y =", s1)

Initial values
Lagrangian value = -0.858313040377
w = [ 1.  0.]
x = [4.0, -1.0]
y = [[1, 0.9], [0.9, 1]]


## Let’s now compute our function using gradient descent

# Minimize the Lagrangian with projected gradient descent and a halving line search

cont = 0  # iteration counter (it was never initialized in the original listing)

while ((alpha > tol) and (cont < 100000)):
    cont = cont+1

    # Gradient of the Lagrangian with respect to w
    grw = -np.dot(w.T, (A + A.T)) - 2 * lam * w.T

    # Used to know if we finished line search
    finished = 0

    while ((finished == 0) and (alpha > tol)):
        # Update
        aux_w = w - alpha * grw

        # Our Projection
        den = np.sqrt(np.dot(aux_w.T,aux_w))
        aux_w = aux_w / den

        # Compute new value of the Lagrangian.
        aux_lam = -np.dot(np.dot(aux_w.T, (A + A.T)), aux_w) / (2 * np.dot(aux_w.T, aux_w))
        aux_lag = -np.dot(aux_w.T, np.dot(A, aux_w)) - lam * (np.dot(aux_w.T, aux_w) - 1)

        # Check if this is a descent
        if aux_lag < lag:
            w = aux_w
            lam = aux_lam
            lag = aux_lag
            alpha = 1.0
            finished = 1
        else:
            alpha = alpha / 2.0


Let’s now review our final values

# Let's now review our final values!
print(" Our Final Values")
print("  Number of iterations", cont)
print("  Obtained values are w =", w)
# Eigenvectors are the columns of veps, so take column idx (not row idx)
print("  Correct values are  w =", veps[:, idx])
print("  Eigenvectors are =", eig_vecs)

 Our Final Values
Number of iterations 22
Obtained values are w = [ 0.71916397  0.69484041]
Correct values are  w = [ 0.71916398  0.6948404 ]
Eigenvectors are = [[ 0.71916398 -0.6948404 ]
[ 0.6948404   0.71916398]]


Let’s compare our new values vs the ones obtained by the closed solution

# Full comparison
print("  Gradient Descent values   w =", w)
print("  PCA analysis approach     w =", matrix_w)
# Eigenvectors are the columns of veps, so take column idx (not row idx)
print("  Closed Solution           w =", veps[:, idx])
print("  Closed Solution           w =", veps, vaps)

  Gradient Descent values   w = [ 0.71916397  0.69484041]
PCA analysis approach     w = [[ 0.70710678 -0.70710678]
[ 0.70710678  0.70710678]]
Closed Solution           w = [ 0.71916398  0.6948404 ]
Closed Solution           w = [[ 0.71916398 -0.6948404 ]
[ 0.6948404   0.71916398]] [ 1.56340502  0.10299214]


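Very close! Let’s plot it to compare the new values with the ones obtained with the closed-form solution

import seaborn as sns
plt.plot(c1[:,0],c1[:,1],'r.')
# Both calls are cut off after 0.5 in the original; treating it as the width and the colors are assumptions.
# Arrow for the closed-form eigenvector
plt.arrow(np.mean(c1[:,0]),np.mean(c1[:,1]),
          vaps[idx]*veps[0,idx],vaps[idx]*veps[1,idx],
          width=0.5, color='b')
# Arrow for the gradient descent result: use both components of w
plt.arrow(np.mean(c1[:,0]),np.mean(c1[:,1]),
          vaps[idx]*w[0],vaps[idx]*w[1],
          width=0.5, color='g')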


The post Principal Component Analysis (PCA): A Practical Example appeared first on 3Blades.

Source: 3blades – Principal Component Analysis (PCA): A Practical Example

## How to get a data science job

You’ve done it. You just spent months learning how to analyze data and make predictions. You’re now able to go from raw data to well structured insights in a matter of hours. After all that effort, you feel like it’s time to take the next step, and get your first data science job.

Unfortunately for you, this is where the process starts to get much harder. There’s no clear path to go from having data science skills to getting a data science job. You’ll need to put in a lot of hard work to forge your own.

But don’t give up hope! Getting a data science job after learning on your own is very possible. In this post, we’ll discuss the things you should be doing to put yourself in position to start getting data science interviews. In a subsequent post, we’ll cover the interview process itself, and how to prepare.

Afterwards, you might get a snazzy new company laptop!

If you feel like your data science skills aren’t yet well developed enough to start looking for a job, you might want to check out…

Source: Dataquest – How to get a data science job

## Pandas Tutorial: Data analysis with Python: Part 2

We covered a lot of ground in Part 1 of our pandas tutorial. We went from the basics of pandas DataFrames to indexing and computations. If you’re still not confident with Pandas, you might want to check out the Dataquest pandas Course.

In this tutorial, we’ll dive into one of the most powerful aspects of pandas – its grouping and aggregation functionality. With this functionality, it’s dead simple to compute group summary statistics, discover patterns, and slice up your data in various ways.

Since Thanksgiving was just last week, we’ll use a dataset on what Americans typically eat for Thanksgiving dinner as we explore the pandas library. You can download the dataset here. It contains 1058 online survey responses collected by FiveThirtyEight. Each survey respondent was asked questions about what they typically eat for Thanksgiving, along with some demographic questions, like their gender, income, and location. This dataset will allow us to discover regional and income-based patterns in what Americans eat for Thanksgiving dinner. As we explore the data and try to find patterns, we’ll be heavily using the grouping and aggregation functionality of pandas.

## Python Web Scraping Tutorial using BeautifulSoup

When performing data science tasks, it’s common to want to use data found on the internet. You’ll usually be able to access this data in csv format, or via an Application Programming Interface(API). However, there are times when the data you want can only be accessed as part of a web page. In cases like this, you’ll want to use a technique called web scraping to get the data from the web page into a format you can work with in your analysis.

In this tutorial, we’ll show you how to perform web scraping using Python 3 and the BeautifulSoup library. We’ll be scraping weather forecasts from the National Weather Service, and then analyzing them using the Pandas library.

We’ll be scraping weather forecasts from the National Weather Service site.

Before we get started, if you’re looking for more background on APIs or the csv format, you might want to check out our Dataquest courses on APIs or data analysis.

## The components of a web page

When we visit a web page, our web browser makes a…

Source: Dataquest – Python Web Scraping Tutorial using BeautifulSoup

## Pandas Tutorial: Data analysis with Python: Part 1

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give you a single, convenient, place to do most of your data analysis and visualization work.

In this introduction, we’ll use Pandas to analyze data on video game reviews from IGN, a popular video game review site. The data was scraped by Eric Grinstein, and can be found here. As we analyze the video game reviews, we’ll learn key Pandas concepts like indexing.

Do games like the Witcher 3 tend to get better reviews on the PS4 than the Xbox One? This dataset can help us find out.

Just as a note, we’ll be using Python 3.5 and Jupyter Notebook to do our analysis.

## Importing Data with Pandas

The first step we’ll take is to read the data in. The data is stored as a comma-separated values, or csv, file, where each row…

Source: Dataquest – Pandas Tutorial: Data analysis with Python: Part 1

## NumPy Tutorial: Data analysis with Python

NumPy is a commonly used Python data analysis package. By using NumPy, you can speed up your workflow, and interface with other packages in the Python ecosystem, like scikit-learn, that use NumPy under the hood. NumPy was originally developed in the mid 2000s, and arose from an even older package called Numeric. This longevity means that almost every data analysis or machine learning package for Python leverages NumPy in some way.

In this tutorial, we’ll walk through using NumPy to analyze data on wine quality. The data contains information on various attributes of wines, such as pH and fixed acidity, along with a quality score between 0 and 10 for each wine. The quality score is the average of at least 3 human taste testers. As we learn how to work with NumPy, we’ll try to figure out more about the perceived quality of wine.

The wines we’ll be analyzing are from the Minho region of Portugal.

The data was downloaded from the UCI Machine Learning Repository, and is available here. Here are the first few rows of the winequality-red.csv

Source: Dataquest – NumPy Tutorial: Data analysis with Python

## 28 Jupyter Notebook tips, tricks and shortcuts

This post is based on a post that originally appeared on Alex Rogozhnikov’s blog, ‘Brilliantly Wrong’.

We have expanded the post and will continue to do so over time – if you have a suggestion please let us know in the comments. Thanks to Alex for graciously letting us republish his work here.

## Jupyter Notebook

Jupyter notebook, formerly known as the IPython notebook, is a flexible tool that helps you create readable analyses, as you can keep code, images, comments, formulae and plots together.

Jupyter is quite extensible, supports many programming languages and is easily hosted on your computer or on almost any server — you only need to have ssh or http access. Best of all, it’s completely free.

The Jupyter interface.

Project Jupyter was born out of the IPython project as the project evolved to become a notebook that could support multiple languages – hence its historical name as the IPython notebook. The name Jupyter is an indirect acronym of the three core languages it was designed for: JUlia, PYThon, and R, and is inspired by the planet Jupiter.

When working with Python in Jupyter, the…

Source: Dataquest – 28 Jupyter Notebook tips, tricks and shortcuts