Write operation

Before performing a write operation, we want to find the least recently used memory location, because that is where we have to write. How can we find the least recently used memory location? To find it, we compute a new vector called the usage weight vector. It is denoted by $w_t^u$ and is updated after every read and write step. It is just the sum of the read weight vector and the write weight vector, that is, $w_t^u = w_t^r + w_t^w$.

Along with adding the read and write weight vectors, we update our usage weight vector by adding the decayed previous usage weight vector, $w_{t-1}^u$. We use a decay parameter, $\gamma$, which determines how much the previous usage weights are decayed. So, our final usage weight vector is the sum of the decayed previous usage weight vector, the read weight vector, and the write weight vector:

$$w_t^u = \gamma \, w_{t-1}^u + w_t^r + w_t^w$$
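As a quick sanity check, here is a minimal NumPy sketch of this update; the weights and the decay value of 0.95 are made up purely for illustration:

import numpy as np

gamma = 0.95                                 # hypothetical decay parameter
prev_w_u = np.array([0.4, 0.1, 0.3, 0.2])    # previous usage weights (4 locations)
w_r = np.array([0.7, 0.0, 0.2, 0.1])         # current read weights
w_w = np.array([0.6, 0.1, 0.2, 0.1])         # current write weights

# Decay the previous usage and add the current read and write weights
w_u = gamma * prev_w_u + w_r + w_w
print(w_u)  # locations that were just read or written end up with higher usage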

Now that we have calculated the usage weight vector, how can we compute the least recently used location? For that, we introduce one more weight vector, called the least used weight vector, $w_t^{lu}$.

Computing the least used weight vector, $w_t^{lu}$, from the usage weight vector, $w_t^u$, is very simple. We simply set the index of the lowest value in the usage weight vector to 1 and the rest of the values to 0, because the location with the lowest usage weight is the one that is least recently used:

$$w_t^{lu}(i) = \begin{cases} 1 & \text{if } i = \arg\min_j w_t^u(j) \\ 0 & \text{otherwise} \end{cases}$$
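In NumPy, and with made-up usage weights, this is just a one-hot vector over the argmin:

import numpy as np

w_u = np.array([1.68, 0.19, 0.69, 0.39])  # hypothetical usage weights

# Mark the location with the smallest usage weight as least recently used
w_lu = np.zeros_like(w_u)
w_lu[np.argmin(w_u)] = 1.0
print(w_lu)  # [0. 1. 0. 0.]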

Okay, what's next? We have computed the least used weight vector. Now, how do we compute the write weight vector, $w_t^w$? We compute it using a sigmoid gate, which gives a convex combination of the previous read weight vector, $w_{t-1}^r$, and the previous least used weight vector, $w_{t-1}^{lu}$:

$$w_t^w = \sigma(\alpha) \, w_{t-1}^r + \big(1 - \sigma(\alpha)\big) \, w_{t-1}^{lu}$$
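As a minimal sketch with a made-up gate value, the convex combination looks like this:

import numpy as np

sig_alpha = 0.8                                # hypothetical sigmoid gate output
prev_w_r = np.array([0.7, 0.0, 0.2, 0.1])      # previous read weights
prev_w_lu = np.array([0.0, 1.0, 0.0, 0.0])     # previous least used weights

# Interpolate between writing to the last read location and the least used one
w_w = sig_alpha * prev_w_r + (1.0 - sig_alpha) * prev_w_lu
print(w_w)  # [0.56 0.2  0.16 0.08]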

After computing the write weight vector, we finally update our memory matrix by writing the key vector, $k_t$, to each location in proportion to its write weight:

$$M_t(i) = M_{t-1}(i) + w_t^w(i) \, k_t$$
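A minimal sketch of this update, assuming a 4 x 3 memory and a hypothetical key vector, is as follows:

import numpy as np

prev_M = np.zeros((4, 3))                   # previous memory matrix (4 locations, 3 columns)
w_w = np.array([0.56, 0.2, 0.16, 0.08])     # write weights from the previous sketch
k = np.array([0.5, -0.2, 0.1])              # hypothetical key vector to be written

# Each location receives the key vector scaled by its write weight
M = prev_M + np.outer(w_w, k)
print(M)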

We will see how to build this in TensorFlow.

We compute the usage weight vector:

# Usage weights: decayed previous usage plus the read and write weights of all heads
w_u = self.gamma * prev_w_u + tf.add_n(w_r_list) + tf.add_n(w_w_list)

Then, we compute the least used weight vector:

def least_used(w_u):
    # Sort the usage weights in descending order; the last head_num indices
    # point to the least used locations. Summing their one-hot vectors gives w_lu.
    _, indices = tf.nn.top_k(w_u, k=self.memory_size)
    w_lu = tf.reduce_sum(tf.one_hot(indices[:, -self.head_num:], depth=self.memory_size), axis=1)
    return indices, w_lu

We store previous indices and the previous least used weight vector:

prev_indices, prev_w_lu = least_used(prev_w_u)

We compute the write weight vector:

def write_head_addressing(sig_alpha, prev_w_r, prev_w_lu):
    # Convex combination of the previous read weights and the previous
    # least used weights, gated by the sigmoid output sig_alpha
    return sig_alpha * prev_w_r + (1. - sig_alpha) * prev_w_lu
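Here, sig_alpha is the output of a sigmoid gate applied to a learnable interpolation parameter. As a purely hypothetical usage sketch for a single head, assuming alpha is that learnable gate parameter and prev_w_r is the head's previous read weight vector:

# Hypothetical call, for illustration only
sig_alpha = tf.sigmoid(alpha)
w_w = write_head_addressing(sig_alpha, prev_w_r, prev_w_lu)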

Then, before writing, we erase the previously least used memory location so that it can be overwritten:

# Zero out the row of the least used memory location before writing to it
M_ = prev_M * tf.expand_dims(1. - tf.one_hot(prev_indices[:, -1], self.memory_size), axis=2)

We perform the write operation:


M = M_
with tf.variable_scope('writing'):
    for i in range(self.head_num):
        # Add each head's key vector to memory, scaled by its write weights
        # (outer product of the write weight vector and the key vector)
        w = tf.expand_dims(w_w_list[i], axis=2)
        k = tf.expand_dims(k_list[i], axis=1)
        M = M + tf.matmul(w, k)