t-SNE Dimensionality Reduction

Pravin Borate
Jul 28, 2020

This article is all about dimensionality reduction. If you want to learn about PCA, i.e. Principal Component Analysis, please check my previous article.

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets. It is extensively applied in image processing, NLP, genomics, and speech processing. It is a relatively recent technique: the research paper introducing it was published in 2008 by Laurens van der Maaten and Geoffrey Hinton.

So now, let’s understand the basic terminology of t-SNE.

What is a Neighborhood?
Consider a d-dimensional space with our data distributed in it. Choose a point X and compute its distance to every other point using some distance measure. The points closest to X are its neighborhood, and together they can form a cluster.
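To make this concrete, here is a minimal sketch in NumPy (the toy data, the value of k, and the choice of Euclidean distance are illustrative assumptions, not part of the original article):

```python
import numpy as np

# Toy data: 100 points in a 5-dimensional space (illustrative values).
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
X = data[0]  # the point whose neighborhood we want to find

# Euclidean distance from X to every point in the dataset.
distances = np.linalg.norm(data - X, axis=1)

# The k closest points (skipping index 0, which is X itself) form X's neighborhood.
k = 5
neighborhood = np.argsort(distances)[1:k + 1]
print("Indices of the 5 nearest neighbors of X:", neighborhood)
```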

What is Embedding?
In simple terms, embedding means picking a point from a higher-dimensional space and finding a corresponding point for it in a lower-dimensional space.

Now we will move to the actual steps of t-SNE.

In this algorithm, we first calculate the probability of similarity of points in the high-dimensional space, and then calculate the probability of similarity of points in the corresponding low-dimensional space. The similarity of points is calculated as the conditional probability that point A would choose point B as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian (normal distribution) centered at A.
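For reference, the original paper (van der Maaten & Hinton, 2008) writes this high-dimensional similarity as a conditional probability, where sigma_i is the bandwidth of the Gaussian centered at x_i (chosen via the perplexity):

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}
```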

Then it tries to minimize the difference between these conditional probabilities (or similarities) in the higher-dimensional and lower-dimensional spaces, to obtain a faithful representation of the data points in low dimensions.

To measure how well the low-dimensional similarities match the high-dimensional ones, t-SNE minimizes the sum of Kullback-Leibler divergences over all data points using a gradient descent method.
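In symbols, with P the distribution of pairwise similarities in the high-dimensional space and Q the distribution in the low-dimensional map, the cost function t-SNE minimizes is:

```latex
C = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
```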

In simple words, t-SNE minimizes the divergence between two distributions: the distribution that measures the pairwise similarities of the input objects, and the distribution that measures the pairwise similarities of the corresponding low-dimensional points in the embedding.

But one point is important about this embedding: t-SNE tries to keep points that are near each other in the higher dimension near each other in the lower dimension as well, i.e. if x1 and x2 are neighbors in the higher dimension, their embedded points x'1 and x'2 should be neighbors too. It gives no such guarantee for points that are far apart: large distances may not be preserved in the lower dimension. In other words, t-SNE preserves distances only within a neighborhood.

In t-SNE, there is one famous problem called the crowding problem.

Crowding Problem:
In a higher dimension there is more room, so points can have more neighbors. In 2D, only a few points can sit at the same distance from each other, so think about how we could embed them in 1D: there is simply not enough room to keep all of those neighbors equidistant. This is the crowding problem. To mitigate it, t-SNE uses a Student t-distribution for similarities in the low-dimensional space.
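For reference, the low-dimensional similarity in t-SNE uses a Student t-distribution with one degree of freedom, whose heavy tails give moderately distant points more room in the map:

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```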

The article below is awesome: you can visualize t-SNE on different types of data distributions, and also experiment with the perplexity, i.e. roughly how many neighbors each point is assumed to have, as well as the number of iterations.

Let’s now try to visualize t-SNE on the MNIST dataset.
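Here is a minimal sketch using scikit-learn’s TSNE on the small digits dataset that ships with scikit-learn (a stand-in for full MNIST; the hyperparameters are illustrative choices, not the article’s original settings):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 1797 images of 8x8 handwritten digits, flattened to 64 features each.
digits = load_digits()
X, y = digits.data, digits.target

# Reduce 64 dimensions down to 2 for visualization.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

# Scatter plot of the 2D embedding, colored by digit label.
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=10)
plt.legend(*scatter.legend_elements(), title="Digit")
plt.title("t-SNE embedding of the digits dataset")
plt.show()
```

If t-SNE has done its job, points belonging to the same digit should form well-separated clusters in the 2D plot.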

This is how t-SNE works to reduce dimensions.

Thanks for reading!

Contact Me:
LinkedIn: https://www.linkedin.com/in/pravin-borate-14a43b133/
Medium: https://medium.com/@1pravin.borate
GitHub: https://github.com/Pravin1Borate

References:
1. https://github.com/raveendarv/AppliedAiCourse-AssignmentAndNotes
2. https://www.datacamp.com/community/tutorials/introduction-t-sne
