Now that we have trained our word2vec model, let's explore what it was able to learn. We will use most_similar() to explore the relations between various words. In the following example, the model has learned that the word earth is related to crust, globe, planet, and other words. It is interesting that we only provided the raw data, and the model learned all these relations and concepts automatically!
model2vec.most_similar("earth")
[(u'crust', 0.6946468353271484),
(u'globe', 0.6748907566070557),
(u'inequalities', 0.6181437969207764),
(u'planet', 0.6092090606689453),
(u'orbit', 0.6079996824264526),
(u'laboring', 0.6058655977249146),
(u'sun', 0.5901342630386353),
(u'reduce', 0.5893668532371521),
(u'moon', 0.5724939107894897),
(u'eccentricity', 0.5709577798843384)]
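To make the idea behind most_similar() concrete, here is a minimal sketch of what such a query does conceptually: it ranks every other word in the vocabulary by cosine similarity to the query word's vector and returns the top n. The helper names and the 3-dimensional toy vectors are hypothetical and are not taken from the trained model. gensim's real implementation works on normalized vector matrices and differs in detail, although the ranking principle is the same. Note also that in recent gensim releases (4.x), these query methods are called on the model's `wv` attribute, e.g. `model2vec.wv.most_similar("earth")`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]  # skip the query itself
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical toy vectors for illustration only (NOT from the trained model).
toy_vectors = {
    "earth":  [0.9, 0.1, 0.3],
    "planet": [0.8, 0.2, 0.4],
    "moon":   [0.7, 0.0, 0.6],
    "art":    [0.1, 0.9, 0.2],
}

print(most_similar("earth", toy_vectors, topn=2))
```

With these toy vectors, planet and moon point in nearly the same direction as earth, so they rank first and second. art points elsewhere and falls to the bottom.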
Let's find the words related to human and see what the model has learned.
model2vec.most_similar("human")
[(u'art', 0.6744576692581177),
(u'race', 0.6348963975906372),
(u'industry', 0.6203593611717224),
(u'man', 0.6148483753204346),
(u'population', 0.6090731620788574),
(u'mummies', 0.5895125865936279),
(u'gods', 0.5859177112579346),
(u'domesticated', 0.5857442021369934),
(u'lives', 0.5848811864852905),
(u'figures', 0.5809590816497803)]
We can also derive an analogy. Using two positive vectors, earth and moon, and one negative vector, orbit, the model returns sun as the top match. This makes sense because of the semantic relation between the moon orbiting the earth and the earth orbiting the sun.
model2vec.most_similar_cosmul(positive=['earth','moon'], negative=['orbit'])
(u'sun', 0.8161555624008179)
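The analogy query above combines similarities multiplicatively. This scoring is often called 3CosMul. Each candidate word is scored by the product of its similarities to the positive words, divided by the product of its similarities to the negative words. The sketch below shows the idea under stated assumptions. Cosines are shifted from [-1, 1] into [0, 1] via (1 + cos) / 2, and a small epsilon guards the division. To our understanding, this mirrors gensim's approach, but the helper names, the epsilon value, and the toy vectors are hypothetical illustrations, not the library's code or the trained model's data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar_cosmul(positive, negative, vectors, topn=10, eps=1e-6):
    """3CosMul: product of shifted cosines to positive words / same for negative words."""
    scores = []
    for word, vec in vectors.items():
        if word in positive or word in negative:
            continue  # never return one of the query words
        # Shift each cosine from [-1, 1] into [0, 1] so the products stay non-negative.
        pos = math.prod((1 + cosine(vec, vectors[p])) / 2 for p in positive)
        neg = math.prod((1 + cosine(vec, vectors[n])) / 2 for n in negative)
        scores.append((word, pos / (neg + eps)))  # eps (assumed value) avoids division by zero
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical toy vectors for illustration only (NOT from the trained model).
toy_vectors = {
    "earth": [0.9, 0.3, 0.1],
    "moon":  [0.8, 0.4, 0.0],
    "orbit": [0.2, 0.9, 0.0],
    "sun":   [0.9, 0.0, 0.2],
    "art":   [0.0, 0.1, 0.9],
}

print(most_similar_cosmul(["earth", "moon"], ["orbit"], toy_vectors, topn=2))
```

Here sun is close to both earth and moon but far from orbit, so it outranks art. Multiplying the similarities, rather than adding them, keeps any single large similarity from dominating the score.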
So, we learned that a word2vec model can extract valuable information from raw, unlabeled data. This kind of learning is important for capturing the grammar of a language and the semantic relations between words.