How to do it...

We proceed with the recipe as follows:

  1. Clone NMT from GitHub:
git clone https://github.com/tensorflow/nmt/
  2. Download a training dataset. In this case, we will use the training set for translating from Vietnamese to English. Other datasets are available at https://nlp.stanford.edu/projects/nmt/ for additional languages, such as German and Czech:
nmt/scripts/download_iwslt15.sh /tmp/nmt_data
  3. Considering https://github.com/tensorflow/nmt/, we define the first embedding layer. The embedding layer takes the input, the vocabulary size V, and the desired size of the output embedding space. Only the V most frequent words are embedded, while all the others are mapped to a common unknown token. In our case, the input is time-major, which means that max time is the first dimension (https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn); a small stand-alone example follows the snippet:
# These helper modules ship with TensorFlow 1.x
from tensorflow.python.ops import embedding_ops, variable_scope

# Embedding
embedding_encoder = variable_scope.get_variable(
    "embedding_encoder", [src_vocab_size, embedding_size], ...)
# Look up embedding:
#   encoder_inputs: [max_time, batch_size]
#   encoder_emb_inp: [max_time, batch_size, embedding_size]
encoder_emb_inp = embedding_ops.embedding_lookup(
    embedding_encoder, encoder_inputs)
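To make the shapes concrete, here is a minimal stand-alone sketch (not part of the nmt code) that runs the same lookup; the sizes src_vocab_size=1000, embedding_size=32, max_time=5, and batch_size=4 are arbitrary values chosen only for illustration:
import numpy as np
import tensorflow as tf

src_vocab_size, embedding_size = 1000, 32   # hypothetical sizes
max_time, batch_size = 5, 4                 # hypothetical sizes

embedding_encoder = tf.get_variable(
    "embedding_encoder", [src_vocab_size, embedding_size])
# Time-major matrix of word ids: [max_time, batch_size]
encoder_inputs = tf.placeholder(tf.int32, shape=(max_time, batch_size))
encoder_emb_inp = tf.nn.embedding_lookup(embedding_encoder, encoder_inputs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ids = np.random.randint(0, src_vocab_size, size=(max_time, batch_size))
    print(sess.run(encoder_emb_inp, feed_dict={encoder_inputs: ids}).shape)
    # (5, 4, 32) -> [max_time, batch_size, embedding_size]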
  4. Still considering https://github.com/tensorflow/nmt/, we define a simple encoder that uses tf.nn.rnn_cell.BasicLSTMCell(num_units) as the basic RNN cell. This is pretty simple, but it is important to notice that, given the basic RNN cell, we create the RNN by using tf.nn.dynamic_rnn (as specified in https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn):
# Build RNN cell
encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Run Dynamic RNN
#   encoder_outputs: [max_time, batch_size, num_units]
#   encoder_state: an LSTMStateTuple (c, h), each [batch_size, num_units]
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb_inp,
    sequence_length=source_sequence_length, time_major=True)
  5. After that, we need to define the decoder. The first thing is to create a basic RNN cell with tf.nn.rnn_cell.BasicLSTMCell. Together with a tf.contrib.seq2seq.TrainingHelper, which feeds the ground-truth target embeddings during training, the cell is wrapped in a tf.contrib.seq2seq.BasicDecoder, which is then unrolled with tf.contrib.seq2seq.dynamic_decode (an inference-time variant is sketched after the snippet):
# Build RNN cell
decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
# Helper
helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_emb_inp, decoder_lengths, time_major=True)
# Decoder
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper, encoder_state,
    output_layer=projection_layer)
# Dynamic decoding
outputs, _ = tf.contrib.seq2seq.dynamic_decode(decoder, ...)
logits = outputs.rnn_output
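At inference time there is no ground truth to feed back, so the TrainingHelper is typically swapped for a greedy (or beam-search) helper. The following is only a sketch along the lines of the same tutorial: embedding_decoder (the target-side embedding matrix), tgt_sos_id, tgt_eos_id, and maximum_iterations are assumptions, not variables defined above:
# Inference sketch (hypothetical names): feed back the previously emitted word.
inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding_decoder,
    tf.fill([batch_size], tgt_sos_id),   # start every sentence with <s>
    tgt_eos_id)                          # stop when </s> is produced
inference_decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, inference_helper, encoder_state,
    output_layer=projection_layer)
# [0] keeps only the outputs, whatever the dynamic_decode return arity is
outputs = tf.contrib.seq2seq.dynamic_decode(
    inference_decoder, maximum_iterations=maximum_iterations)[0]
translations = outputs.sample_id   # predicted word ids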
  6. The last stage of the network is a projection (dense) layer that transforms the top hidden states into a logit vector of target-vocabulary size; the softmax itself is applied inside the cross-entropy loss in the next step. Note that projection_layer has to be created before it is passed to the decoder above:
projection_layer = layers_core.Dense(
    tgt_vocab_size, use_bias=False)
  7. Of course, we need to define the cross-entropy function and the loss used during the training phase (a sketch of how the target_weights mask could be built follows the snippet):
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=decoder_outputs, logits=logits)
train_loss = (tf.reduce_sum(crossent * target_weights) /
              batch_size)
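Here, target_weights is a 0/1 mask with the same [max_time, batch_size] shape as the targets, so that padding positions do not contribute to the loss. A minimal sketch of how such a mask could be built with tf.sequence_mask, assuming decoder_lengths and max_time as used above (the transpose keeps it time-major):
# Hypothetical sketch: mask out positions beyond each target length.
target_weights = tf.sequence_mask(
    decoder_lengths, max_time, dtype=logits.dtype)   # [batch_size, max_time]
target_weights = tf.transpose(target_weights)        # [max_time, batch_size]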
  8. The next step is to define the operations needed for backpropagation and to use an appropriate optimizer (in this case, Adam). Note that the gradients are clipped by global norm and that Adam uses a predefined learning rate (a minimal training-loop sketch follows the snippet):
# Calculate and clip gradients
params = tf.trainable_variables()
gradients = tf.gradients(train_loss, params)
clipped_gradients, _ = tf.clip_by_global_norm(
    gradients, max_gradient_norm)
# Optimization
optimizer = tf.train.AdamOptimizer(learning_rate)
update_step = optimizer.apply_gradients(
    zip(clipped_gradients, params))
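With the graph built, one training iteration simply runs update_step in a session. The following is a minimal sketch, assuming num_train_steps and a make_feed_dict() helper (both hypothetical) that feeds the encoder/decoder placeholders defined above:
# Hypothetical training loop: repeatedly run the update op and monitor the loss.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_train_steps):
        _, loss_value = sess.run([update_step, train_loss],
                                 feed_dict=make_feed_dict())
        if step % 100 == 0:
            print("step", step, "loss", loss_value)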
  9. So now we can run the code and understand what the different executed steps are. First, the training graph is created, then the training iterations start. The metric used for evaluation is Bilingual Evaluation Understudy (BLEU). This metric is the standard for evaluating the quality of text that has been machine-translated from one natural language to another; quality is measured as the correspondence between the machine output and a human reference translation. As you can see, this value grows over time (a short BLEU illustration follows the training log below):
python -m nmt.nmt \
    --src=vi --tgt=en --vocab_prefix=/tmp/nmt_data/vocab \
    --train_prefix=/tmp/nmt_data/train --dev_prefix=/tmp/nmt_data/tst2012 \
    --test_prefix=/tmp/nmt_data/tst2013 --out_dir=/tmp/nmt_model \
    --num_train_steps=12000 --steps_per_stats=100 \
    --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu
# Job id 0
[...]
# creating train graph ...
num_layers = 2, num_residual_layers=0
cell 0 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
start_decay_step=0, learning_rate=1, decay_steps 10000,decay_factor 0.98
[...]
# Start step 0, lr 1, Thu Sep 21 12:57:18 2017
# Init train iterator, skipping 0 elements
global step 100 lr 1 step-time 1.65s wps 3.42K ppl 1931.59 bleu 0.00
global step 200 lr 1 step-time 1.56s wps 3.59K ppl 690.66 bleu 0.00
[...]
global step 9100 lr 1 step-time 1.52s wps 3.69K ppl 39.73 bleu 4.89
global step 9200 lr 1 step-time 1.52s wps 3.72K ppl 40.47 bleu 4.89
global step 9300 lr 1 step-time 1.55s wps 3.62K ppl 40.59 bleu 4.89
[...]
# External evaluation, global step 9000
decoding to output /tmp/nmt_model/output_dev.
done, num sentences 1553, time 17s, Thu Sep 21 17:32:49 2017.
bleu dev: 4.9
saving hparams to /tmp/nmt_model/hparams
# External evaluation, global step 9000
decoding to output /tmp/nmt_model/output_test.
done, num sentences 1268, time 15s, Thu Sep 21 17:33:06 2017.
bleu test: 3.9
saving hparams to /tmp/nmt_model/hparams
[...]
global step 9700 lr 1 step-time 1.52s wps 3.71K ppl 38.01 bleu 4.89
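As a side note on the metric itself, BLEU scores the n-gram overlap between a candidate translation and one or more references, with a brevity penalty for overly short candidates. The nmt scripts compute BLEU internally; purely as an illustration (not part of the recipe), here is how a corpus-level score could be obtained with NLTK:
from nltk.translate.bleu_score import corpus_bleu

# One list of references per sentence, each reference a list of tokens.
references = [[["the", "cat", "is", "on", "the", "mat"]]]
hypotheses = [["the", "cat", "is", "on", "the", "blue", "mat"]]
print(corpus_bleu(references, hypotheses))   # value in [0, 1]; higher is better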