Implementation of inference methods

We will now add two methods for inference to our SequentialNetwork class, that is, methods for predicting an output for a particular input. The first, which will be used directly by the end user, we will call predict. During the training process, we will also have to make predictions based on partial results from only some of the layers; for that purpose, we will add a second method called partial_predict.

Let's start by implementing predict. This will take two inputs: a collection of samples in the form of a one- or two-dimensional NumPy array, and an optional user-defined CUDA stream. We will start by performing some type checks and formatting on the samples (here called x), remembering that the samples are stored row-wise:

def predict(self, x, stream=None):

    # fall back to the network's default stream if none was given
    if stream is None:
        stream = self.stream

    # coerce the input to a float32 NumPy array if it isn't one already
    if type(x) != np.ndarray:
        temp = np.array(x, dtype=np.float32)
        x = temp

    if x.size == self.network_mem[0].size:
        # the samples exactly fill the input buffer; copy them over directly
        self.network_mem[0].set_async(x, stream=stream)
    else:
        if x.size > self.network_mem[0].size:
            raise Exception("Error: batch size too large for input.")

        # zero-pad the samples out to the full size of the input buffer
        x0 = np.zeros((self.network_mem[0].size,), dtype=np.float32)
        x0[0:x.size] = x.ravel()
        self.network_mem[0].set_async(x0.reshape(self.network_mem[0].shape), stream=stream)

    # samples are stored row-wise, so a 2D input's row count is the batch size
    if len(x.shape) == 2:
        batch_size = x.shape[0]
    else:
        batch_size = 1
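
Note that this code relies on self.network_mem, a list of preallocated GPU arrays with one buffer per layer boundary. As a point of reference, here is a minimal sketch of how such buffers could be preallocated. This is an assumption based on how network_mem is used here, not the class's exact constructor code, and the max_batch_size, num_inputs, and num_outputs names are likewise assumed:

# hypothetical allocation sketch (assumed, not the exact constructor code);
# requires: import numpy as np, import pycuda.gpuarray as gpuarray
self.network_mem = [gpuarray.empty((self.max_batch_size, self.network[0].num_inputs), dtype=np.float32)]
for layer in self.network:
    self.network_mem.append(gpuarray.empty((self.max_batch_size, layer.num_outputs), dtype=np.float32))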

Now, let's perform the actual inference step. We just have to iterate through our entire neural network, calling eval_ on each layer:

    # iterate through the layers, feeding each layer's output buffer
    # in as the input of the next
    for i in xrange(len(self.network)):
        self.network[i].eval_(x=self.network_mem[i], y=self.network_mem[i+1], batch_size=batch_size, stream=stream)

We will now pull the final output of the NN off the GPU and return it to the user. If the number of samples in x is smaller than the maximum batch size, we will slice the output array appropriately before it is returned:

    # copy the final layer's output off the GPU
    y = self.network_mem[-1].get_async(stream=stream)

    # trim away any rows that correspond to zero-padded samples
    if len(y.shape) == 2:
        y = y[0:batch_size, :]

    return y
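
Here is a brief usage sketch of predict. The network itself is hypothetical here; assume that net is an already-constructed SequentialNetwork whose input layer expects three values per sample:

# hypothetical usage sketch: `net` is an assumed SequentialNetwork instance
import numpy as np

x = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]], dtype=np.float32)
y = net.predict(x)  # one row of outputs per input sample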

Now, with that done, let's implement partial_predict. Let's briefly discuss the idea behind it. During the training process, we will evaluate a collection of samples and then look at how the subtle change of adding delta to each weight and bias, individually, affects the outputs. To save time, we can compute the outputs of every layer for a given collection of samples and store them; then, when we change a weight in a particular layer, we only have to recompute the outputs for that layer and all subsequent layers. We will see the idea behind this in a little more depth soon, but for now, we can implement it like so:

def partial_predict(self, layer_index=None, w_t=None, b_t=None, partial_mem=None, stream=None, batch_size=None, delta=None):

    # re-evaluate only the modified layer, with the weight or bias indexed
    # by w_t or b_t nudged by delta, writing the result into scratch memory
    self.network[layer_index].eval_(x=self.network_mem[layer_index], y=partial_mem[layer_index+1], batch_size=batch_size, stream=stream, w_t=w_t, b_t=b_t, delta=delta)

    # propagate the change through all subsequent layers via the scratch buffers
    for i in xrange(layer_index+1, len(self.network)):
        self.network[i].eval_(x=partial_mem[i], y=partial_mem[i+1], batch_size=batch_size, stream=stream)
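
To make the intent concrete, here is a hedged sketch of how partial_predict could be used to estimate, by finite differences, how the output shifts when a single weight is nudged by delta. The index pair passed as w_t and the partial_mem scratch buffers (a list of GPU arrays mirroring network_mem) are assumptions consistent with the calls above; the actual training code comes later:

# hypothetical finite-difference sketch (not the actual training code);
# assumes k, i, j, delta, batch_size, x, and partial_mem are already defined
y0 = net.predict(x)  # fills network_mem with every layer's output for x

# nudge weight (i, j) of layer k by delta and re-evaluate layers k onward
net.partial_predict(layer_index=k, w_t=(i, j), partial_mem=partial_mem,
                    batch_size=batch_size, delta=delta)
y1 = partial_mem[-1].get()[0:batch_size, :]

# finite-difference estimate of how the outputs move with this one weight
dy_dw = (y1 - y0) / delta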