Neural machine translation - training a seq2seq RNN

Sequence-to-sequence (seq2seq) is a particular kind of RNN architecture with successful applications in neural machine translation, text summarization, and speech recognition. In this recipe, we will discuss how to implement a neural machine translation system with results similar to those achieved by the Google Neural Machine Translation system (https://research.googleblog.com/2016/09/a-neural-network-for-machine.html). The key idea is to input a whole sequence of text, understand its entire meaning, and then output the translation as another sequence. This idea of reading an entire sequence is very different from previous architectures, where a fixed set of words was translated from a source language into a destination language.

This section is inspired by the 2016 PhD thesis, Neural Machine Translation, by Minh-Thang Luong (https://github.com/lmthang/thesis/blob/master/thesis.pdf). The first key concept is the presence of an encoder-decoder architecture, where an encoder transforms a source sentence into a vector representing its meaning. This vector is then passed through a decoder to produce a translation. Both the encoder and the decoder are RNNs that can capture long-range dependencies in languages, for example, gender agreement and syntactic structure, without knowing them a priori and with no need for a 1:1 mapping across languages. This is a powerful capability that enables very fluent translations:

An example of encoder-decoder as seen in https://github.com/lmthang/thesis/blob/master/thesis.pdf
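To make the encoder-decoder idea concrete, here is a minimal sketch written with the Keras API. It is only an illustration under assumed settings, not the code used later in this recipe; the vocabulary sizes and layer dimensions are arbitrary placeholders.

```python
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab = 8000, 8000   # placeholder vocabulary sizes
embed_dim, hidden_dim = 256, 512    # placeholder layer sizes

# Encoder: reads the source sentence and compresses it into its final states.
enc_inputs = layers.Input(shape=(None,), dtype="int32")
enc_embed = layers.Embedding(src_vocab, embed_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(hidden_dim, return_state=True)(enc_embed)

# Decoder: generates the target sentence conditioned on the encoder states.
dec_inputs = layers.Input(shape=(None,), dtype="int32")
dec_embed = layers.Embedding(tgt_vocab, embed_dim)(dec_inputs)
dec_outputs = layers.LSTM(hidden_dim, return_sequences=True)(
    dec_embed, initial_state=[state_h, state_c])
probs = layers.Dense(tgt_vocab, activation="softmax")(dec_outputs)

model = Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The encoder's final states act as the "meaning vector" passed to the decoder, which is exactly the flow described above.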

Let's see an example of an RNN translating the sentence She loves cute cats into Elle aime les chats mignons.

There are two RNNs: one that acts as the encoder, and one that acts as the decoder. The source sentence She loves cute cats is followed by a separator symbol and then by the target sentence Elle aime les chats mignons. These two concatenated sentences are given as input to the encoder for training, and the decoder learns to produce the target Elle aime les chats mignons. Of course, we need many examples like this one to achieve good training:

An example of sequence models for NMT as seen in https://github.com/lmthang/thesis/blob/master/thesis.pdf
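As a small illustration of this data layout, the following Python snippet assembles one such training pair. The <sep> and </s> marker names are assumptions made here for clarity; they are not taken from the recipe.

```python
SEP = "<sep>"   # assumed separator between source and target sentences
EOS = "</s>"    # assumed end-of-sentence marker for the decoder output

source = "She loves cute cats".lower().split()
target = "Elle aime les chats mignons".lower().split()

# The encoder reads the source followed by the separator;
# the decoder is trained to emit the target followed by the end marker.
encoder_tokens = source + [SEP]
decoder_tokens = target + [EOS]

print(encoder_tokens)  # ['she', 'loves', 'cute', 'cats', '<sep>']
print(decoder_tokens)  # ['elle', 'aime', 'les', 'chats', 'mignons', '</s>']
```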

There are a number of RNN variants we can use. Let's look at some of them; a short code sketch illustrating the cell choices follows the list:

  • RNNs can be unidirectional or bidirectional. The latter captures long-term dependencies in both directions.
  • RNNs can have multiple hidden layers. The choice is a matter of optimization: on the one hand, a deeper network can learn more; on the other hand, it might take longer to train and might overfit.
  • RNNs can have an embedding layer, which maps words into an embedding space where similar words are mapped close to one another.
  • RNNs can use either simple recurrent cells, LSTM cells, peephole LSTM cells, or GRU cells.
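As a rough illustration of the cell and direction choices, the sketch below instantiates the corresponding Keras layers with an arbitrary hidden size. It is not the code used in this recipe, and peephole LSTM is not shown because it is not part of the standard Keras layer set.

```python
from tensorflow.keras import layers

hidden_dim = 512   # placeholder hidden size

# Cell choice: a simple recurrent cell, an LSTM, or a GRU.
simple_rnn = layers.SimpleRNN(hidden_dim, return_sequences=True)
lstm = layers.LSTM(hidden_dim, return_sequences=True)
gru = layers.GRU(hidden_dim, return_sequences=True)

# A bidirectional wrapper processes the sequence in both directions.
bidirectional_lstm = layers.Bidirectional(
    layers.LSTM(hidden_dim, return_sequences=True))
```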

Still considering the PhD thesis, Neural Machine Translation (https://github.com/lmthang/thesis/blob/master/thesis.pdf), we can use an embedding layer to map the input sentences into an embedding space. Then, there are two RNNs joined together - the encoder for the source language and the decoder for the target language. As you can see, there are multiple hidden layers and two flows: the vertical, feed-forward direction connects the hidden layers, while the horizontal direction is the recurrent part, transferring knowledge from the previous step to the next one:

An example of Neural machine translation as seen in https://github.com/lmthang/thesis/blob/master/thesis.pdf
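The following is a rough sketch, again written with the Keras API, of such a stacked encoder-decoder with embedding layers. The number of layers and all dimensions are placeholders chosen for illustration, not the settings used in the recipe.

```python
from tensorflow.keras import layers, Model

vocab, embed_dim, hidden_dim = 8000, 256, 512   # placeholder sizes

# Encoder: an embedding plus two stacked LSTM layers. The vertical flow is the
# layer-to-layer connection; the horizontal flow is the recurrence over time.
enc_in = layers.Input(shape=(None,), dtype="int32")
x = layers.Embedding(vocab, embed_dim)(enc_in)
x = layers.LSTM(hidden_dim, return_sequences=True)(x)
_, h, c = layers.LSTM(hidden_dim, return_state=True)(x)

# Decoder: its first recurrent layer is initialised from the encoder states.
dec_in = layers.Input(shape=(None,), dtype="int32")
y = layers.Embedding(vocab, embed_dim)(dec_in)
y = layers.LSTM(hidden_dim, return_sequences=True)(y, initial_state=[h, c])
y = layers.LSTM(hidden_dim, return_sequences=True)(y)
out = layers.Dense(vocab, activation="softmax")(y)

stacked_nmt = Model([enc_in, dec_in], out)
```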

In this recipe, we use NMT (Neural Machine Translation), a translation demo package built on top of TensorFlow and available online.
