How to train your own word vector embeddings

Many tasks require embeddings of domain-specific vocabulary that models pretrained on a generic corpus may not represent well, or at all. Standard Word2vec models cannot assign vectors to out-of-vocabulary words and instead use a default vector, which reduces their predictive value.
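As a minimal sketch of this limitation (assuming the gensim 4.x API; the toy corpus and tokens below are purely illustrative), a trained Word2vec model simply has no entry for a word it never saw during training:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only)
sentences = [['earnings', 'rose', 'sharply'],
             ['revenue', 'fell', 'slightly'],
             ['earnings', 'beat', 'estimates']]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=10)

print(model.wv['earnings'].shape)   # (50,) — in-vocabulary word has a vector

try:
    model.wv['blockchain']          # never seen during training
except KeyError as err:
    print('No vector for out-of-vocabulary word:', err)
```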

For example, when working with industry-specific documents, the vocabulary or its usage may change over time as new technologies or products emerge. As a result, the embeddings need to evolve as well. In addition, corporate earnings releases use nuanced language not fully reflected in GloVe vectors pretrained on Wikipedia articles.

We will illustrate the Word2vec architecture using the Keras library, which we will introduce in more detail in the next chapter, and the more performant gensim adaptation of the code provided by the Word2vec authors. The notebook Word2vec contains additional implementation detail, including a reference to a TensorFlow implementation.
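A sketch of the gensim route might look as follows (assuming the gensim 4.x API, where the dimensionality parameter is vector_size rather than the earlier size; the corpus file earnings_calls.txt and all hyperparameter values are illustrative assumptions, not the settings used in the notebook):

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# LineSentence streams a pre-tokenized corpus from disk:
# one sentence per line, tokens separated by whitespace.
sentences = LineSentence('earnings_calls.txt')   # hypothetical corpus file

model = Word2Vec(
    sentences,
    sg=1,              # 1 = skip-gram architecture; 0 = CBOW
    vector_size=300,   # embedding dimensionality
    window=5,          # max distance between target and context word
    min_count=10,      # ignore tokens with fewer occurrences
    negative=5,        # negative samples per positive pair
    workers=4,         # parallel training threads
    epochs=5,
)

model.save('word2vec.model')                     # full model; training can resume
model.wv.save('word2vec.wordvectors')            # vectors only; smaller footprint
print(model.wv.most_similar('revenue', topn=5))  # nearest neighbors as a sanity check
```

Skip-gram (sg=1) tends to produce better representations for the infrequent, domain-specific terms that motivate custom training, while CBOW trains faster on large corpora.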
