Multinomial Naive Bayes

We create a document-term matrix with 934 tokens as follows:

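from sklearn.feature_extraction.text import CountVectorizer
# keep tokens found in at least 0.1% and at most 80% of documents, minus English stop words
vectorizer = CountVectorizer(min_df=.001, max_df=.8, stop_words='english')
train_dtm = vectorizer.fit_transform(train.text)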
<1566668x934 sparse matrix of type '<class 'numpy.int64'>'
with 6332930 stored elements in Compressed Sparse Row format>
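The vocabulary filter can be illustrated on a toy corpus. The sketch below is a simplified pure-Python imitation, not scikit-learn's code. The regex tokenizer, the tiny STOP_WORDS set, and the helper names build_vocabulary and to_counts are all hypothetical. It keeps a token only when the number of documents containing it lies between min_df and max_df (given as fractions of the corpus size) and stores each document as a sparse {column: count} dictionary.

```python
import re

# Simplified imitation of CountVectorizer's float min_df/max_df filter.
# ASSUMPTION: the tokenizer and stop-word list are illustrative stand-ins,
# not scikit-learn's implementation.
STOP_WORDS = {"the", "a", "is", "and", "it"}  # hypothetical tiny stop list


def tokenize(text):
    # lowercase, keep runs of 2+ word characters, drop stop words
    return [t for t in re.findall(r"\b\w\w+\b", text.lower()) if t not in STOP_WORDS]


def build_vocabulary(docs, min_df=0.001, max_df=0.8):
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for token in set(tokenize(doc)):  # count each token once per document
            doc_freq[token] = doc_freq.get(token, 0) + 1
    low, high = min_df * n_docs, max_df * n_docs
    kept = sorted(t for t, df in doc_freq.items() if low <= df <= high)
    # map each surviving token to a column index (alphabetical order)
    return {token: idx for idx, token in enumerate(kept)}


def to_counts(doc, vocabulary):
    # sparse row as {column index: count}; out-of-vocabulary tokens are ignored
    row = {}
    for token in tokenize(doc):
        if token in vocabulary:
            idx = vocabulary[token]
            row[idx] = row.get(idx, 0) + 1
    return row


docs = ["good movie good plot", "bad movie", "great movie", "bad bad acting"]
vocab = build_vocabulary(docs, min_df=0.25, max_df=0.7)
print(vocab)  # "movie" appears in 3 of 4 documents (75% > 70%) and is dropped
print(to_counts("bad bad acting", vocab))
```

The same sparse layout explains the matrix summary above: 6,332,930 stored elements out of 1,566,668 × 934 ≈ 1.46 billion cells means only about 0.43% of entries are nonzero, or roughly four retained tokens per document on average.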

We then train the MultinomialNB classifier as before and predict the test set:

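from sklearn.naive_bayes import MultinomialNB
test_dtm = vectorizer.transform(test.text)  # reuse the vocabulary fitted on train.text
nb = MultinomialNB()
nb.fit(train_dtm, train.polarity)
predicted_polarity = nb.predict(test_dtm)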
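To make the model behind MultinomialNB concrete, here is a from-scratch sketch under stated assumptions: documents are sparse {column: count} dictionaries, labels are arbitrary hashable values, and smoothing is additive with alpha=1 (which matches MultinomialNB's default alpha). The function names fit_multinomial_nb and predict_multinomial_nb are hypothetical. Each class is scored by its log prior plus the count-weighted sum of log token probabilities, and the highest-scoring class is predicted.

```python
import math

# From-scratch multinomial Naive Bayes on sparse count rows.
# ASSUMPTION: additive smoothing with alpha=1; this is an illustration,
# not scikit-learn's implementation.


def fit_multinomial_nb(rows, labels, n_features, alpha=1.0):
    classes = sorted(set(labels))
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        log_prior[c] = math.log(len(class_rows) / len(rows))
        # total count of each token across all documents of class c
        token_counts = [0.0] * n_features
        for row in class_rows:
            for j, count in row.items():
                token_counts[j] += count
        # smoothed estimate of P(token j | class c)
        total = sum(token_counts) + alpha * n_features
        log_likelihood[c] = [math.log((n + alpha) / total) for n in token_counts]
    return classes, log_prior, log_likelihood


def predict_multinomial_nb(model, rows):
    classes, log_prior, log_likelihood = model
    predictions = []
    for row in rows:
        # log P(c) + sum_j x_j * log P(j | c); only nonzero counts contribute
        scores = {
            c: log_prior[c] + sum(count * log_likelihood[c][j] for j, count in row.items())
            for c in classes
        }
        predictions.append(max(scores, key=scores.get))
    return predictions


# hypothetical two-token vocabulary: column 0 = "good", column 1 = "bad"
rows = [{0: 2}, {0: 1}, {1: 2}, {1: 1}]
labels = [1, 1, 0, 0]
model = fit_multinomial_nb(rows, labels, n_features=2)
print(predict_multinomial_nb(model, [{0: 1}, {1: 3}]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a token never seen in one class from forcing that class's probability to zero.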

The classifier achieves an accuracy of about 77.7% on the test set:

from sklearn.metrics import accuracy_score
accuracy_score(test.polarity, predicted_polarity)
0.7768361581920904
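Accuracy is simply the share of test documents whose predicted label matches the true label. The hypothetical helper below reproduces that definition for plain label lists; accuracy_score additionally supports options such as sample_weight and normalize, which this sketch omits.

```python
# Accuracy as the fraction of matching labels (hypothetical helper,
# not scikit-learn's accuracy_score).
def accuracy(y_true, y_pred):
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy([0, 4, 4, 0], [0, 4, 0, 0]))  # 3 of 4 labels match
```

A score like this is best judged against the share of the most frequent class in the test set, since always predicting that class sets the baseline a useful model must beat.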