Executing TF-IDF in Python

The following are the steps for executing TF-IDF in Python:

  1. Import the vectorizer class from scikit-learn:
from sklearn.feature_extraction.text import TfidfVectorizer
  2. Build a corpus of four short documents:
corpus = ['First document', 'Second document', 'Third document', 'First and second document']
  3. Create the vectorizer with its default settings:
vectorizer = TfidfVectorizer()
  4. Fit the vectorizer to the corpus and transform the text into TF-IDF features, then print the learned vocabulary and the shape of the resulting matrix:
X = vectorizer.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() on 1.0 or later
print(vectorizer.get_feature_names_out())
print(X.shape)

The output is as follows:

  5. Convert the sparse result to a dense array to view the document-term matrix; each row corresponds to one document and each column to one term:
print(X.toarray())

We get the following output:
