A quick example

If you're new to word embeddings, you might be feeling a little lost right now. Hang in there; it will become clearer in just a moment. Let's try a concrete example.

Using word2vec, a popular word-embedding model, we can start with the word cat and find its 384-element vector, as shown in the following output:

array([ 5.81600726e-01,  3.07168198e+00,  3.73339128e+00,
        2.83814788e-01,  2.79787600e-01,  2.29124355e+00,
       -2.14855480e+00, -1.22236431e+00,  2.20581269e+00,
        1.81546474e+00,  2.06929898e+00, -2.71712840e-01, ...

I've cut the output short, but you get the idea. Every word in this model is converted into a 384-element vector. These vectors can be compared to evaluate the semantic similarity of words in a dataset.
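If you'd like to reproduce this kind of lookup yourself, here is a minimal sketch using the gensim library. The model name "word2vec-google-news-300" is an assumption for illustration; it produces 300-element vectors rather than the 384-element vectors shown above, but the lookup works the same way.

import gensim.downloader as api

# Download (on first use) and load a set of pre-trained word2vec vectors.
# The model name here is an assumption; this one yields 300-dimensional vectors.
model = api.load("word2vec-google-news-300")

# Look up the embedding for a single word as a NumPy array.
cat_vector = model["cat"]
print(cat_vector.shape)   # (300,)
print(cat_vector[:12])    # first few elements, similar in spirit to the output above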

Now that we have a vector for cat, I'm going to compute the word vectors for dog and lizard. I would suggest that cats are more like dogs than they are like lizards, so the cat vector should be closer to the dog vector than to the lizard vector. While there are many ways to compare two vectors, cosine similarity is probably the most commonly used for word vectors; it measures the cosine of the angle between the vectors, so a higher value means the words are more similar. In the following table, we compare the cosine similarity of cat with dog and with lizard:

        Dog     Lizard
Cat     0.74    0.63

As expected, in our vector space, cat is closer in meaning to dog than to lizard.
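Here is a rough sketch of how those similarity scores could be computed, again assuming the gensim model from the earlier sketch; the exact numbers you get will depend on which pre-trained vectors you load.

import numpy as np
import gensim.downloader as api

# Same assumed pre-trained vectors as in the earlier sketch.
model = api.load("word2vec-google-news-300")

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction,
    # 0.0 means orthogonal (unrelated) directions.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(model["cat"], model["dog"]))     # higher: cat is close to dog
print(cosine_similarity(model["cat"], model["lizard"]))  # lower: lizard is farther away

# gensim can also compute this directly:
# model.similarity("cat", "dog")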
