Implementation of the softmax layer

We will now look at how to implement a softmax layer. As we have already discussed, a sigmoid layer is used for assigning multiple labels to a sample; that is, if there are several nonexclusive characteristics that you want to infer from an input, you should use a sigmoid layer. A softmax layer is used when you want to assign only a single class to a sample by inference: it computes a probability for each possible class (with the probabilities over all classes, of course, summing to 100%). We can then select the class with the highest probability to give the final classification.

Now, let's see exactly what the softmax layer does. Given a collection of N real numbers (c0, ..., cN-1), we first compute the sum of the exponentials of all the numbers, s = exp(c0) + exp(c1) + ... + exp(cN-1), and then divide the exponential of each number by this sum to yield the softmax outputs, yi = exp(ci) / s, for i = 0, ..., N-1.
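To make the formula concrete, here is a minimal NumPy sketch of the same computation on the CPU; it is only for illustration and is not part of the GPU implementation that follows:

import numpy as np

def softmax_cpu(c):
    # exponentiate every input value ...
    exp_c = np.exp(np.array(c, dtype=np.float32))
    # ... and divide by the sum of all the exponentials
    return exp_c / np.sum(exp_c)

print(softmax_cpu([1.0, 2.0, 3.0]))  # the outputs sum to 1; the largest input gets the highest probability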

Let's begin our implementation by writing two very short CUDA kernels: one that takes the exponential of each input, and another that, for each sample, sums these exponentials and divides every value by that sum to normalize it:

# the imports we need (these may already have been done earlier in the chapter)
import numpy as np
import pycuda
import pycuda.autoinit
from pycuda import gpuarray
from pycuda.compiler import SourceModule

SoftmaxExpCode='''
__global__ void softmax_exp(int num, float *x, float *y, int batch_size)
{
    // one thread per input element; each thread handles its own element
    // across every sample in the batch
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < num)
    {
        for (int k = 0; k < batch_size; k++)
        {
            y[num * k + i] = expf(x[num * k + i]);
        }
    }
}
'''
exp_mod = SourceModule(SoftmaxExpCode)
exp_ker = exp_mod.get_function('softmax_exp')

SoftmaxMeanCode='''
__global__ void softmax_mean(int num, float *x, float *y, int batch_size)
{
    // one thread per sample in the batch; each thread normalizes its own
    // sample by the sum of the exponentials computed by softmax_exp
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < batch_size)
    {
        float temp = 0.0f;

        for (int k = 0; k < num; k++)
            temp += x[i * num + k];

        for (int k = 0; k < num; k++)
            y[i * num + k] = x[i * num + k] / temp;
    }
}'''

mean_mod = SourceModule(SoftmaxMeanCode)
mean_ker = mean_mod.get_function('softmax_mean')
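Before we wrap these kernels in a class, here is a minimal sketch of how the two of them can be chained by hand on a small batch; the array sizes and block/grid choices below are just illustrative assumptions, and the result is checked against a NumPy reference. Notice the division of labor: softmax_exp uses one thread per input element, while softmax_mean uses one thread per sample in the batch.

# a quick sanity check (assumes the kernels and imports defined above)
batch_size, num = 4, 10
x_host = np.random.randn(batch_size, num).astype(np.float32)
x_gpu = gpuarray.to_gpu(x_host)
y_gpu = gpuarray.empty_like(x_gpu)

# step 1: exponentiate every element (one thread per element)
exp_ker(np.int32(num), x_gpu, y_gpu, np.int32(batch_size),
        block=(32,1,1), grid=(int(np.ceil(num / 32)), 1, 1))

# step 2: normalize each sample by the sum of its exponentials (one thread per sample)
mean_ker(np.int32(num), y_gpu, y_gpu, np.int32(batch_size),
         block=(32,1,1), grid=(int(np.ceil(batch_size / 32)), 1, 1))

# compare with a NumPy reference implementation
ref = np.exp(x_host) / np.exp(x_host).sum(axis=1, keepdims=True)
print(np.allclose(y_gpu.get(), ref))   # should print True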

Now, let's write a Python wrapper class, as we did previously. We will start with the constructor, where num indicates the number of inputs and outputs (these are the same for a softmax layer). We can also specify a default stream, if we wish:

class SoftmaxLayer:
    def __init__(self, num=None, stream=None):
        self.num = np.int32(num)
        self.stream = stream

Now, let's write the eval_ function in a way that is similar to that of the dense layer:

    def eval_(self, x, y=None, batch_size=None, stream=None):
        if stream is None:
            stream = self.stream

        # if the input is not already on the GPU, copy it over
        if type(x) != pycuda.gpuarray.GPUArray:
            temp = np.array(x, dtype=np.float32)
            x = gpuarray.to_gpu_async(temp, stream=stream)

        # infer the batch size from the shape of the input if it isn't given
        if batch_size is None:
            if len(x.shape) == 2:
                batch_size = np.int32(x.shape[0])
            else:
                batch_size = np.int32(1)
        else:
            batch_size = np.int32(batch_size)

        # allocate the output array if the caller didn't supply one
        if y is None:
            if batch_size == 1:
                y = gpuarray.empty((self.num,), dtype=np.float32)
            else:
                y = gpuarray.empty((batch_size, self.num), dtype=np.float32)

        # step 1: exponentiate every input element
        exp_ker(self.num, x, y, batch_size, block=(32,1,1),
                grid=(int(np.ceil(self.num / 32)), 1, 1), stream=stream)

        # step 2: normalize each sample by the sum of its exponentials
        mean_ker(self.num, y, y, batch_size, block=(32,1,1),
                 grid=(int(np.ceil(batch_size / 32)), 1, 1), stream=stream)

        return y
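
As a quick illustrative test of the wrapper (the sizes here are arbitrary assumptions), we can feed a small random batch through the layer and check that each row of the output is a valid probability distribution:

softmax_layer = SoftmaxLayer(num=10)
x_test = np.random.randn(4, 10).astype(np.float32)   # a batch of 4 samples with 10 values each
y_test = softmax_layer.eval_(x_test).get()

print(np.allclose(y_test.sum(axis=1), 1.0))      # each row should sum to 1
print(y_test.argmax(axis=1))                     # the inferred class for each sample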
