Tensorflow

Tensorflow is another library designed for fast numerical calculations and automatic parallelism. It was released as an open source project by Google in 2015. Tensorflow works by building mathematical expressions similarly to Theano, except that the computation is not compiled to machine code but is executed on an external engine written in C++. Tensorflow supports the execution and deployment of parallel code on one or more CPUs and GPUs.

The usage of Tensorflow is quite similar to that of Theano. To create a symbolic variable in Tensorflow, you can use the tf.placeholder function, which takes a data type as input:

    import tensorflow as tf

    a = tf.placeholder('float64')

Mathematical expressions in Tensorflow are written much like in Theano, except for a few different naming conventions as well as a more restricted support for the NumPy semantics.
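As a minimal sketch of these naming differences (assuming a Tensorflow 1.x installation; the specific operations shown are only illustrative), a few common NumPy operations map to Tensorflow counterparts as follows:

    import numpy as np
    import tensorflow as tf

    a_np = np.array([[1.0, 2.0], [3.0, 4.0]])
    a = tf.placeholder('float64', shape=(2, 2))

    # NumPy: np.sum(a_np)      Tensorflow: tf.reduce_sum(a)
    # NumPy: np.mean(a_np)     Tensorflow: tf.reduce_mean(a)
    # NumPy: a_np.dot(a_np)    Tensorflow: tf.matmul(a, a)
    total = tf.reduce_sum(a)
    product = tf.matmul(a, a)

    with tf.Session() as session:
        print(session.run([total, product], feed_dict={a: a_np}))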

Tensorflow doesn't compile functions to C and then machine code like Theano does; instead, it serializes the defined mathematical functions (the data structure containing variables and transformations is called a computation graph) and executes them on specific devices. The configuration of devices and contexts can be done using the tf.Session object.

Once the desired expression is defined, a tf.Session needs to be initialized and can be used to execute computation graphs using the Session.run method. In the following example, we demonstrate the usage of the Tensorflow API to implement a simple element-wise sum of squares:

    a = tf.placeholder('float64')
    b = tf.placeholder('float64')
    ab_sq = a**2 + b**2

    with tf.Session() as session:
        result = session.run(ab_sq, feed_dict={a: [0, 1, 2],
                                               b: [3, 4, 5]})
        print(result)
        # Output:
        # array([ 9., 17., 29.])

Parallelism in Tensorflow is achieved automatically by its smart execution engine, and it generally works well without much fiddling. However, note that it is mostly suited for deep learning workloads that involve the definition of complex functions that use a lot of matrix multiplications and calculate their gradient.
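As an illustration of this point, the following sketch (again assuming the Tensorflow 1.x API, with illustrative values) shows how the gradient of an expression can be obtained symbolically with tf.gradients, which adds the derivative as a new node to the computation graph:

    import tensorflow as tf

    x = tf.placeholder('float64')
    y = x ** 2 + 3 * x

    # tf.gradients returns a list containing the symbolic derivative dy/dx
    grad = tf.gradients(y, [x])[0]

    with tf.Session() as session:
        # dy/dx evaluated at x = 2.0 is 2 * 2.0 + 3 = 7.0
        print(session.run(grad, feed_dict={x: 2.0}))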

We can now replicate the estimation of the pi example using Tensorflow capabilities and benchmark its execution speed and parallelism against the Theano implementation. What we will do is this:

  • Define our x and y variables and perform a hit test using broadcasted operations.
  • Calculate the sum of hit_tests using the tf.reduce_sum function.
  • Initialize a Session object with the inter_op_parallelism_threads and intra_op_parallelism_threads configuration options. These options control the number of threads used for different classes of parallel operations. Note that the first Session created with such options sets the number of threads for the whole script (even future Session instances).

We can now write a script named test_tensorflow.py containing the following code. Note that the number of threads is passed as the first argument of the script (sys.argv[1]):

    import tensorflow as tf
    import numpy as np
    import time
    import sys

    NUM_THREADS = int(sys.argv[1])
    samples = 30000

    print('Num threads', NUM_THREADS)

    # Generate random points in the square [-1, 1] x [-1, 1]
    x_data = np.random.uniform(-1, 1, samples)
    y_data = np.random.uniform(-1, 1, samples)

    x = tf.placeholder('float64', name='x')
    y = tf.placeholder('float64', name='y')

    # Hit test: True if the point falls inside the unit circle
    hit_tests = x ** 2 + y ** 2 <= 1.0
    hits = tf.reduce_sum(tf.cast(hit_tests, 'int32'))

    with tf.Session(config=tf.ConfigProto(
            inter_op_parallelism_threads=NUM_THREADS,
            intra_op_parallelism_threads=NUM_THREADS)) as sess:
        start = time.time()
        for i in range(10000):
            sess.run(hits, {x: x_data, y: y_data})
        print(time.time() - start)

If we run the script multiple times with different values of NUM_THREADS, we see that the performance is quite similar to Theano and that the speedup gained through parallelization is quite modest:

    $ python test_tensorflow.py 1
    13.059704780578613
    $ python test_tensorflow.py 2
    11.938535928726196
    $ python test_tensorflow.py 3
    12.783955574035645
    $ python test_tensorflow.py 4
    12.158143043518066

The main advantage of using software packages such as Tensorflow and Theano is the support for the parallel matrix operations that are commonly used in machine learning algorithms. This is very effective because these operations achieve impressive performance gains on GPU hardware, which is designed to carry them out with high throughput.
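As a sketch of how such computations can be directed to a GPU (assuming a Tensorflow 1.x build with GPU support; the matrix size and the allow_soft_placement option are illustrative choices), operations can be pinned to a device with tf.device:

    import numpy as np
    import tensorflow as tf

    data = np.random.rand(1024, 1024)

    # Pin the matrix multiplication to the first GPU
    with tf.device('/gpu:0'):
        m = tf.placeholder('float64', shape=(1024, 1024))
        mm = tf.matmul(m, m)

    # allow_soft_placement falls back to the CPU if no GPU is available
    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as session:
        result = session.run(mm, feed_dict={m: data})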
