Memory Neural Networks

Most machine learning models are not able to read from and write to a long-term memory component, nor are they able to combine old memories seamlessly with inference. RNNs and their variants, such as LSTMs, do have a memory component. However, their memory (encoded by hidden states and weights) is typically too small, and is not like the large arrays of blocks that we find in modern computers (in the form of RAM). They try to compress all past knowledge into one dense vector, the memory state. This can be very restrictive for a complex application such as a virtual assistant or a question-answering (QA) system, where the long-term memory effectively acts as a (dynamic) knowledge base and the output is a textual response. To address this problem, Memory Neural Networks (MemNNs) were developed by the Facebook AI Research group. The central idea of MemNNs is to combine the successful learning strategies developed in the deep learning literature for inference with a memory component that can be read from and written to, just like RAM. The model is also trained to learn how to operate effectively with the memory component. A memory network consists of a memory, m, an indexed array of objects (for example, vectors or arrays of strings), and four components that are to be learned, I, G, O, and R:

  • I: An input feature map, I, which converts the incoming input into the internal feature representation.
  • G: A generalization component, G, which updates old memories given new input. This is called generalization, as there is an opportunity for the network to compress and generalize its memories at this stage for some desired future use.
  • O: An output feature map, O, which produces a new output in the feature representation space given the new input and the current memory state.
  • R: A response component, R, which converts the output into the desired response format, for example, a textual response or an action. A minimal sketch of how these four components fit together follows this list.
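To make the data flow concrete, the following minimal Python sketch (an illustration, not the original formulation) shows how the four components are wired together; the callables I, G, O, and R stand in for the learned networks:

```python
# Minimal sketch of the MemNN control flow. I, G, O, and R are assumed to be
# plain callables here; in a trained MemNN they are learned neural networks.
class MemoryNetwork:
    def __init__(self, I, G, O, R):
        self.memory = []                              # m: an indexed array of memory objects
        self.I, self.G, self.O, self.R = I, G, O, R

    def step(self, x):
        features = self.I(x)                          # I: map the input to internal features
        self.memory = self.G(self.memory, features)   # G: update old memories with the new input
        output = self.O(features, self.memory)        # O: read memory, produce output features
        return self.R(output)                         # R: convert output features to a response
```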

When the components I, G, O, and R are neural networks, the resulting system is called a MemNN. Let's try to understand this with an example QA system. The system will be given a set of facts and a question, and it will output the answer to that question. We have the following six textual facts and a question, Q: Where is the milk now?

  • Joe went to the kitchen
  • Fred went to the kitchen
  • Joe picked up the milk
  • Joe traveled to the office
  • Joe left the milk
  • Joe went to the bathroom

Note that only a subset of the statements contains information needed for the answer; the others are essentially irrelevant distractions. We will represent this in terms of the MemNN modules I, G, O, and R. The module I is a simple embedding module that converts the text into binary bag-of-words vectors. The text is stored in the next available memory slot in its original form, so the G module is simple. The vocabulary of words used in the given facts, once stopwords are removed, is V = {Joe, Fred, traveled, picked, left, went, office, bathroom, kitchen, milk}. Here is the memory state after all of the text is stored:

Memory Slot#   Joe   Fred   ...   office   bathroom   kitchen   milk
1              1     0      ...   0        0          1         0
2              0     1      ...   0        0          1         0
3              1     0      ...   0        0          0         1
4              1     0      ...   1        0          0         0
5              1     0      ...   0        0          0         1
6              1     0      ...   0        1          0         0
7              (empty)
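The following toy sketch (a simplification for illustration, not code from the original model) shows how the I and G modules described above produce exactly this memory state: I is a binary bag-of-words encoder over the vocabulary V, and G appends the resulting vector to the next free slot:

```python
# Toy I and G modules for the QA example: I is a binary bag-of-words encoder
# over the vocabulary V, and G writes the vector to the next free memory slot.
V = ["Joe", "Fred", "traveled", "picked", "left", "went",
     "office", "bathroom", "kitchen", "milk"]

def I(sentence):
    words = {w.lower() for w in sentence.split()}
    return [1 if v.lower() in words else 0 for v in V]

def G(memory, features):
    memory.append(features)          # store in the next available slot
    return memory

facts = [
    "Joe went to the kitchen", "Fred went to the kitchen",
    "Joe picked up the milk", "Joe traveled to the office",
    "Joe left the milk", "Joe went to the bathroom",
]

memory = []
for fact in facts:
    memory = G(memory, I(fact))
# memory[0] == [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]  ->  Joe ... went ... kitchen
```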

The O module produces output features by finding k supporting memories given the question, q. For k = 2, the highest-scoring supporting memory is retrieved first:

o_1 = O_1(q, m) = \arg\max_{i=1,\dots,N} s_O(q, m_i)

where s_O is a function that scores the match between the input, q, and a memory, m_i, and o_1 is the index of the memory, m, with the best match. Now, using the query and the first retrieved memory, we can retrieve the next memory, m_{o_2}, which is close to both of them:

o_2 = O_2(q, m) = \arg\max_{i=1,\dots,N} s_O([q, m_{o_1}], m_i)

The combined query and memory result is [q, m_{o_1}, m_{o_2}] = [where is the milk now?, Joe left the milk., Joe traveled to the office.]. Finally, module R needs to produce a textual response, r. The R module can output a single-word answer, or it could be an RNN that generates a complete sentence. For a single-word response, let s_R be another function that scores the match between [q, m_{o_1}, m_{o_2}] and a word, w. The final response, r, is the word office:

r = \arg\max_{w \in V} s_R([q, m_{o_1}, m_{o_2}], w)
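As an illustration of the retrieval mechanics only, here is a toy sketch that reuses I, V, and memory from the previous sketch. The dot-product score below is a hand-coded stand-in for the learned functions s_O and s_R; it knows nothing about the word now or the order of the facts, so it will not reproduce the supporting memories chosen in the worked example:

```python
# Stand-in match score: a dot product of bag-of-words vectors. In a real MemNN,
# s_O and s_R are learned (and use temporal features), which is what lets the
# model prefer "Joe left the milk" over "Joe picked up the milk".
def score(x, m_i):
    return sum(a * b for a, b in zip(x, m_i))

def O(q_vec, memory):
    # Hop 1: the best-matching memory for the question alone.
    o1 = max(range(len(memory)), key=lambda i: score(q_vec, memory[i]))
    # Hop 2: the best match for the question combined with the first memory.
    combined = [a + b for a, b in zip(q_vec, memory[o1])]
    o2 = max((i for i in range(len(memory)) if i != o1),
             key=lambda i: score(combined, memory[i]))
    return o1, o2

def R(q_vec, m1, m2, vocab):
    # Single-word response: the vocabulary word best supported by [q, m_o1, m_o2];
    # with this toy score that is simply the word with the largest combined count.
    combined = [a + b + c for a, b, c in zip(q_vec, m1, m2)]
    return max(vocab, key=lambda w: combined[vocab.index(w)])

q = I("where is the milk now")
o1, o2 = O(q, memory)
answer = R(q, memory[o1], memory[o2], V)
```

With this crude score, the toy run retrieves the two milk sentences and answers milk rather than office, which is exactly why s_O and s_R are learned rather than hand-coded.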

This model is hard to train end-to-end using backpropagation because the argmax selections over memory are not differentiable, so it requires supervision of the supporting facts at each module of the network. There is a slight modification of this, effectively a continuous version of memory networks in which the hard selections are replaced by soft attention over all memory slots, called an End-To-End Memory Network (MemN2N). This network can be trained end-to-end by backpropagation.
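To make the continuous version concrete, here is a minimal sketch (an assumed illustration, not the full MemN2N architecture with its embedding matrices and multiple stacked hops) of a single soft-attention read over memory, in which the hard argmax is replaced by a softmax so that gradients can flow through the memory selection:

```python
import numpy as np

def memn2n_hop(q_emb, mem_in, mem_out):
    """One soft (differentiable) memory hop.
    q_emb:   (d,)   embedded question
    mem_in:  (N, d) memories under the input embedding
    mem_out: (N, d) memories under the output embedding
    """
    scores = mem_in @ q_emb            # one match score per memory slot
    p = np.exp(scores - scores.max())
    p = p / p.sum()                    # softmax: soft attention instead of a hard argmax
    o = p @ mem_out                    # weighted read from memory
    return q_emb + o                   # updated state for the next hop or the answer layer
```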
