Summary

Many organizations, such as Google, freely provide pretrained word vectors (Google's were trained on a subset of Google News and cover the top three million words/phrases) in various vector dimensions: for example, 25d, 50d, 100d, 300d, and so on. You can find the code (and the resulting word vectors) here. In addition to Google News, there are other sources of trained word vectors, built from Wikipedia and covering various languages. If companies such as Google freely provide pretrained word vectors, you might ask, why bother building your own? The answer is, of course, application-dependent. Google's pretrained dictionary has three different vectors for the word java based on capitalization (JAVA, Java, and java mean different things), but perhaps your application is just about coffee, in which case a single version of java is all that is really needed.
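To make the capitalization point concrete, here is a minimal Python sketch of collapsing case variants into a single entry. Everything in it is an assumption made for illustration: the `pretrained` dictionary is a toy stand-in with invented three-dimensional values (not real pretrained vectors), `collapse_case` is a hypothetical helper, and averaging the variants is just one possible merging strategy.

```python
from collections import defaultdict

# Toy stand-in for a case-sensitive pretrained vocabulary.
# The numbers are invented for illustration; they are not real pretrained values.
pretrained = {
    "JAVA": [0.9, 0.1, 0.0],
    "Java": [0.6, 0.3, 0.3],
    "java": [0.0, 0.3, 0.9],
    "coffee": [0.1, 0.2, 0.9],
}


def collapse_case(vectors):
    """Merge entries that differ only by capitalization by averaging them."""
    groups = defaultdict(list)
    for word, vec in vectors.items():
        groups[word.lower()].append(vec)
    # Average each component across all case variants of the same word.
    return {
        word: [sum(component) / len(vecs) for component in zip(*vecs)]
        for word, vecs in groups.items()
    }


collapsed = collapse_case(pretrained)
print(sorted(collapsed))  # one key per case-insensitive word
```

Whether averaging is appropriate depends on the application. If only the coffee sense matters, keeping just the lowercase vector, or training vectors on your own domain corpus, may serve better than blending all three senses.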

Our goal for this chapter was to give you a clear and concise explanation of the word2vec algorithms and their popular extensions, such as doc2vec and sequence-to-sequence learning models, which employ various flavors of recurrent neural networks. As always, one chapter is hardly enough space to cover this extremely exciting field in natural language processing, but hopefully, this is enough to whet your appetite for now!

As practitioners and researchers in this field, we (the authors) are constantly thinking of new ways of representing documents as fixed-length vectors, and there are plenty of papers dedicated to this problem. You can consider LDA2vec and Skip-thought Vectors for further reading on the subject.

Some other blogs to add to your reading list regarding natural language processing (NLP) and vectorizing are as follows:

In the next chapter, we will see word vectors again, where we'll combine all of what you have learned so far to tackle a problem that requires the kitchen sink with respect to the various processing tasks and model inputs. Stick around!
