How to do it...

We proceed with the recipe as follows:

  1. Consider this piece of code, which runs a matrix multiplication on a single GPU:
# single GPU (baseline)
import tensorflow as tf

# place the initial data on the CPU
with tf.device('/cpu:0'):
    input_data = tf.Variable([[1., 2., 3.],
                              [4., 5., 6.],
                              [7., 8., 9.],
                              [10., 11., 12.]])
    b = tf.Variable([[1.], [1.], [2.]])

# compute the result on the 0th GPU
with tf.device('/gpu:0'):
    output = tf.matmul(input_data, b)

# create a session and run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output))
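If the run succeeds, the session prints the 4x1 product [[9.], [21.], [33.], [45.]]. On a machine without a GPU, pinning the matmul to /gpu:0 raises a placement error; a common workaround, shown here as a minimal sketch that is not part of the original recipe, is to use the standard tf.ConfigProto options allow_soft_placement (fall back to an available device) and log_device_placement (print where each op lands):

# optional: tolerate missing GPUs and log the device chosen for every op
config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output))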
  2. Partition the code with in-graph replication, as in the following snippet, across two different GPUs. Note that the CPU acts as the master node, distributing the graph and collecting the final results:
# in-graph replication
import tensorflow as tf

num_gpus = 2

# place the initial data on the CPU
with tf.device('/cpu:0'):
    input_data = tf.Variable([[1., 2., 3.],
                              [4., 5., 6.],
                              [7., 8., 9.],
                              [10., 11., 12.]])
    b = tf.Variable([[1.], [1.], [2.]])

# split the data into chunks, one per GPU
inputs = tf.split(input_data, num_gpus)
outputs = []

# loop over the available GPUs and assign each chunk to a device
for i in range(num_gpus):
    with tf.device('/gpu:' + str(i)):
        outputs.append(tf.matmul(inputs[i], b))

# merge the partial results of the devices on the CPU
with tf.device('/cpu:0'):
    output = tf.concat(outputs, axis=0)

# create a session and run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output))
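Running this prints the same 4x1 result as the single-GPU baseline, now assembled on the CPU from the two partial products. Note that the snippet hard-codes num_gpus = 2, and tf.split also requires the first dimension (four rows here) to be divisible by the number of chunks. One way to set num_gpus from the machine you are actually running on, sketched below with TensorFlow's device_lib helper (not part of the recipe above), is:

# count the GPUs visible to TensorFlow at runtime
from tensorflow.python.client import device_lib

local_devices = device_lib.list_local_devices()
num_gpus = len([d for d in local_devices if d.device_type == 'GPU'])
print('found %d GPU(s)' % num_gpus)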