How to do it...

We proceed with the recipe as follows:

  1. Import the required modules. We will definitely be using TensorFlow; we will also need numpy for some elementary matrix calculations, and matplotlib, mpl_toolkits, and seaborn for plotting:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
%matplotlib inline
  2. We load the dataset; we will use our favorite MNIST dataset:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/")
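As a quick optional sanity check, you can confirm what read_data_sets returns: the images are flattened to 784 values per digit, and since we did not pass one_hot=True, the labels are plain integer class ids (step 7 below relies on this for the color mapping):
print(mnist.train.images.shape)   # (55000, 784)
print(mnist.train.labels[:5])     # e.g. integer labels in [0, 9]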
  3. We define a class, TF_PCA, which will implement all our work. The class is initialized as follows:
class TF_PCA:
    def __init__(self, data, dtype=tf.float32):
        self._data = data
        self._dtype = dtype
        self._graph = None
        self._X = None
        self._u = None
        self._singular_values = None
        self._sigma = None
  4. The SVD of the given input data is calculated in the fit method. The method defines the computational graph and executes it to calculate the singular values and the orthonormal matrix U. self._data is fed to the placeholder self._X. tf.svd returns the singular values (singular_values) with shape [..., p], sorted in descending order of magnitude. We use tf.diag to convert this vector into a diagonal matrix:
def fit(self):
    self._graph = tf.Graph()
    with self._graph.as_default():
        self._X = tf.placeholder(self._dtype, shape=self._data.shape)
        # Perform SVD
        singular_values, u, _ = tf.svd(self._X)
        # Create sigma matrix
        sigma = tf.diag(singular_values)
    with tf.Session(graph=self._graph) as session:
        self._u, self._singular_values, self._sigma = session.run([u, singular_values, sigma], feed_dict={self._X: self._data})
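Before moving on, it may help to see exactly what fit computes. SVD factors the data matrix as X = U Sigma V^T, with the singular values sorted in descending order. The following standalone numpy sketch (illustrative only, not part of the recipe) verifies the factorization on a toy matrix:
import numpy as np
x = np.random.rand(6, 4)                         # toy data matrix
u, s, vt = np.linalg.svd(x, full_matrices=False)
sigma = np.diag(s)                               # same role as tf.diag above
print(np.allclose(x, u.dot(sigma).dot(vt)))      # True: X is reconstructed
print(s)                                         # singular values, descending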
  5. Now that we have the sigma matrix, the orthonormal U matrix, and the singular values, we calculate the reduced-dimension data by defining the reduce method. The method requires one of two input arguments, n_dimensions or keep_info. The n_dimensions argument specifies the number of dimensions we want to keep in the reduced dataset. The keep_info argument instead specifies the fraction of the information we intend to keep (a value of 0.8 means that we want to keep 80 percent of the information). The method creates a graph that slices the sigma matrix and calculates the reduced-dimension dataset Yr:
def reduce(self, n_dimensions=None, keep_info=None):
    if keep_info:
        # Normalize singular values
        normalized_singular_values = self._singular_values / sum(self._singular_values)
        # Information per dimension
        info = np.cumsum(normalized_singular_values)
        # Get the first index which is above the given information threshold
        it = iter(idx for idx, value in enumerate(info) if value >= keep_info)
        n_dimensions = next(it) + 1
    with self._graph.as_default():
        # Cut out the relevant part from sigma
        sigma = tf.slice(self._sigma, [0, 0], [self._data.shape[1], n_dimensions])
        # PCA
        pca = tf.matmul(self._u, sigma)

    with tf.Session(graph=self._graph) as session:
        return session.run(pca, feed_dict={self._X: self._data})
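To see how keep_info translates into a number of dimensions, here is a small standalone sketch with made-up singular values (the numbers are purely illustrative):
import numpy as np
singular_values = np.array([9.0, 4.0, 3.0, 2.0, 1.0, 1.0])  # hypothetical, descending
normalized = singular_values / singular_values.sum()
info = np.cumsum(normalized)   # [0.45 0.65 0.8 0.9 0.95 1.0]
keep_info = 0.8
n_dimensions = next(i for i, v in enumerate(info) if v >= keep_info) + 1
print(n_dimensions)            # 3: three dimensions retain 80 percent of the information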
  6. Our TF_PCA class is ready. Now we will use it to reduce the MNIST data from 784 dimensions (28 x 28) per input to just three dimensions per point. Here we retain only 10 percent of the information for easier viewing; normally you would need to retain roughly 80 percent of the information:
tf_pca = TF_PCA(mnist.train.images)
tf_pca.fit()
pca = tf_pca.reduce(keep_info=0.1)  # The reduced dimensions depend upon the % of information retained
print('original data shape', mnist.train.images.shape)
print('reduced data shape', pca.shape)

Following is the output of the preceding code:
original data shape (55000, 784)
reduced data shape (55000, 3)

  7. Let's now plot the 55,000 data points in three-dimensional space:
Set = sns.color_palette("Set2", 10)
color_mapping = {key: value for (key, value) in enumerate(Set)}
colors = list(map(lambda x: color_mapping[x], mnist.train.labels))
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(pca[:, 0], pca[:, 1], pca[:, 2], c=colors)
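If you run this script outside a Jupyter notebook (where %matplotlib inline does not apply), you may want to label the axes, display the figure explicitly, and optionally save it; the filename below is just an example:
ax.set_xlabel('PC 1')
ax.set_ylabel('PC 2')
ax.set_zlabel('PC 3')
plt.savefig('mnist_pca_3d.png', dpi=150)  # hypothetical output filename
plt.show()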