The following are the steps for using CountVectorizer:
- Import the CountVectorizer class from scikit-learn:
from sklearn.feature_extraction.text import CountVectorizer
- Create a list containing the text:
text = [" Machine translation automatically translate text from one human language to another text"]
- Create a CountVectorizer object, then tokenize the list of text and build the vocabulary:
vectorizer = CountVectorizer()
vectorizer.fit(text)
You will get the following output:
- Let's take a look at the vocabulary that was created:
print(vectorizer.vocabulary_)
We get the following output:
- Now, encode the text as a vector of token counts:
vector = vectorizer.transform(text)
- Let's check the type of the vector and print the document-term matrix:
print(type(vector))
print(vector.toarray())
We get the following output:
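Put together, the steps above form a single script. This is a minimal sketch assuming a recent scikit-learn release is installed; the exact class name printed by `type()` depends on the installed SciPy version, so it is not shown here.

```python
# Minimal end-to-end version of the CountVectorizer steps (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import CountVectorizer

# A single "document" wrapped in a list, as in the steps above.
text = [" Machine translation automatically translate text from one human language to another text"]

# Create the vectorizer and learn the vocabulary (token -> column index).
vectorizer = CountVectorizer()
vectorizer.fit(text)
print(vectorizer.vocabulary_)

# Encode the text as a sparse document-term matrix of token counts.
vector = vectorizer.transform(text)
print(type(vector))
print(vector.toarray())  # [[1 1 1 1 1 1 1 2 1 1 1]]
```

With its default settings, CountVectorizer lowercases the text and keeps tokens of two or more word characters, so this sentence yields 11 distinct tokens, indexed in alphabetical order. The word "text" occurs twice, which is why column 7 (the index of "text") holds a 2.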