How to do it...

We proceed with the recipe as follows:

  1. Import the required modules. We will definitely be using TensorFlow; we will also need numpy for some elementary matrix calculations, and matplotlib, mpl_toolkits, and seaborn for plotting:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
%matplotlib inline
  2. We load the dataset; we will use our favorite MNIST dataset:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/")
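As a quick optional sanity check, you can confirm what read_data_sets returns: the images are flattened to 784 values per digit, and since we did not pass one_hot=True, the labels are plain integer class ids (step 7 below relies on this for the color mapping):
print(mnist.train.images.shape)   # (55000, 784)
print(mnist.train.labels[:5])     # e.g. integer labels in [0, 9]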
  3. We define a class, TF_PCA, which will implement all our work. The class is initialized as follows:
class TF_PCA:
    def __init__(self, data, dtype=tf.float32):
        self._data = data
        self._dtype = dtype
        self._graph = None
        self._X = None
        self._u = None
        self._singular_values = None
        self._sigma = None
  4. The SVD of the given input data is calculated in the fit method. The method defines the computational graph and executes it to calculate the singular values and the orthonormal matrix U. self._data is fed to the placeholder self._X. tf.svd returns the singular values (singular_values) with shape [..., p], sorted in descending order of magnitude. We use tf.diag to convert this vector into a diagonal matrix:
def fit(self):
    self._graph = tf.Graph()
    with self._graph.as_default():
        self._X = tf.placeholder(self._dtype, shape=self._data.shape)
        # Perform SVD
        singular_values, u, _ = tf.svd(self._X)
        # Create sigma matrix
        sigma = tf.diag(singular_values)
    with tf.Session(graph=self._graph) as session:
        self._u, self._singular_values, self._sigma = session.run([u, singular_values, sigma], feed_dict={self._X: self._data})
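Before moving on, it may help to see exactly what fit computes. SVD factors the data matrix as X = U Sigma V^T, with the singular values sorted in descending order. The following standalone numpy sketch (illustrative only, not part of the recipe) verifies the factorization on a toy matrix:
import numpy as np
x = np.random.rand(6, 4)                         # toy data matrix
u, s, vt = np.linalg.svd(x, full_matrices=False)
sigma = np.diag(s)                               # same role as tf.diag above
print(np.allclose(x, u.dot(sigma).dot(vt)))      # True: X is reconstructed
print(s)                                         # singular values, descending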
  5. Now that we have the sigma matrix, the orthonormal U matrix, and the singular values, we calculate the reduced-dimension data by defining the reduce method. The method requires one of two input arguments, n_dimensions or keep_info. The n_dimensions argument specifies the number of dimensions we want to keep in the reduced dataset. The keep_info argument instead specifies the fraction of the information we intend to keep (a value of 0.8 means that we want to keep 80 percent of the information). The method creates a graph that slices the sigma matrix and calculates the reduced-dimension dataset Yr:
def reduce(self, n_dimensions=None, keep_info=None):
    if keep_info:
        # Normalize singular values
        normalized_singular_values = self._singular_values / sum(self._singular_values)
        # Information per dimension
        info = np.cumsum(normalized_singular_values)
        # Get the first index which is above the given information threshold
        it = iter(idx for idx, value in enumerate(info) if value >= keep_info)
        n_dimensions = next(it) + 1
    with self._graph.as_default():
        # Cut out the relevant part from sigma
        sigma = tf.slice(self._sigma, [0, 0], [self._data.shape[1], n_dimensions])
        # PCA
        pca = tf.matmul(self._u, sigma)

    with tf.Session(graph=self._graph) as session:
        return session.run(pca, feed_dict={self._X: self._data})
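To see how keep_info translates into a number of dimensions, here is a small standalone sketch with made-up singular values (the numbers are purely illustrative):
import numpy as np
singular_values = np.array([9.0, 4.0, 3.0, 2.0, 1.0, 1.0])  # hypothetical, descending
normalized = singular_values / singular_values.sum()
info = np.cumsum(normalized)   # [0.45 0.65 0.8 0.9 0.95 1.0]
keep_info = 0.8
n_dimensions = next(i for i, v in enumerate(info) if v >= keep_info) + 1
print(n_dimensions)            # 3: three dimensions retain 80 percent of the information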
  6. Our TF_PCA class is ready. Now we will use it to reduce the MNIST data from 784 dimensions (28 x 28) per input to just three dimensions per point. Here we retain only 10 percent of the information for easier viewing; normally you would need to retain roughly 80 percent of the information:
tf_pca = TF_PCA(mnist.train.images)
tf_pca.fit()
pca = tf_pca.reduce(keep_info=0.1)  # The reduced dimensions depend upon the % of information retained
print('original data shape', mnist.train.images.shape)
print('reduced data shape', pca.shape)

Following is the output of the preceding code:
original data shape (55000, 784)
reduced data shape (55000, 3)

  7. Let's now plot the 55,000 data points in three-dimensional space:
Set = sns.color_palette("Set2", 10)
color_mapping = {key: value for (key, value) in enumerate(Set)}
colors = list(map(lambda x: color_mapping[x], mnist.train.labels))
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(pca[:, 0], pca[:, 1], pca[:, 2], c=colors)
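If you run this script outside a Jupyter notebook (where %matplotlib inline does not apply), you may want to label the axes, display the figure explicitly, and optionally save it; the filename below is just an example:
ax.set_xlabel('PC 1')
ax.set_ylabel('PC 2')
ax.set_zlabel('PC 3')
plt.savefig('mnist_pca_3d.png', dpi=150)  # hypothetical output filename
plt.show()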