The Skip-Gram architecture in Keras

To illustrate the Word2vec network architecture, we use the TED Talk dataset with aligned English and Spanish subtitles that we first introduced in Chapter 13, Working with Text Data.

The notebook contains the code to tokenize the documents and assign a unique ID to each token in the vocabulary. We keep only tokens that occur at least five times in the corpus, which leaves a vocabulary of 31,300 tokens.
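The notebook's own preprocessing code is not reproduced here. The following is a minimal pure-Python sketch of the same idea: count tokens, drop those seen fewer than five times, and map each remaining token to an integer ID. The regex tokenizer, the reserved `<unk>` ID 0 for out-of-vocabulary tokens, and ordering IDs by descending frequency are illustrative assumptions, not necessarily what the notebook does.

```python
import re
from collections import Counter

MIN_COUNT = 5  # minimum corpus frequency, matching the threshold in the text


def tokenize(doc):
    # Hypothetical tokenizer (assumption): lowercase, keep runs of letters, digits, apostrophes
    return re.findall(r"[a-z0-9']+", doc.lower())


def build_vocab(docs, min_count=MIN_COUNT):
    counts = Counter(token for doc in docs for token in tokenize(doc))
    # Keep tokens that meet the threshold; most frequent first, ties broken alphabetically
    kept = sorted((t for t, c in counts.items() if c >= min_count),
                  key=lambda t: (-counts[t], t))
    # Assumption: ID 0 is reserved for tokens that did not make the vocabulary
    token_to_id = {"<unk>": 0}
    for token in kept:
        token_to_id[token] = len(token_to_id)
    return token_to_id


def encode(doc, token_to_id):
    # Replace each token by its ID, falling back to 0 for unknown tokens
    return [token_to_id.get(t, 0) for t in tokenize(doc)]


docs = ["the talk was great"] * 5 + ["a rare word appears once"]
vocab = build_vocab(docs)
print(len(vocab))                               # 5: <unk> plus four frequent tokens
print(encode("the talk was rare", vocab))       # [3, 2, 4, 0]
```

Words that appear only once, such as "rare", map to the shared unknown ID, which is how a frequency threshold keeps the embedding matrix at a manageable size.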
