Implementing Naïve Bayes with scikit-learn

Coding from scratch and implementing your own solution is the best way to learn about a machine learning model. Of course, we can also take a shortcut by directly using the MultinomialNB class from the scikit-learn API:

>>> from sklearn.naive_bayes import MultinomialNB
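
In this section, the variables term_docs_train, Y_train, term_docs_test, and Y_test are assumed to have been prepared earlier. As a reminder, a minimal sketch of how such term-document matrices could be built with CountVectorizer is shown below; the names cleaned_emails_train, cleaned_emails_test, labels_train, and labels_test are hypothetical placeholders for the cleaned texts and their labels:

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> cv = CountVectorizer(stop_words='english', max_features=1000)
>>> # Learn the vocabulary from the training emails only
>>> term_docs_train = cv.fit_transform(cleaned_emails_train)
>>> # Reuse the same vocabulary for the test emails
>>> term_docs_test = cv.transform(cleaned_emails_test)
>>> Y_train, Y_test = labels_train, labels_test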

Let's initialize a model with a smoothing factor of 1.0 (specified as alpha in scikit-learn) and with the class prior learned from the training set (specified as fit_prior in scikit-learn):

>>> clf = MultinomialNB(alpha=1.0, fit_prior=True)
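
As a side note, scikit-learn also lets you use a uniform prior instead of one learned from the data by passing fit_prior=False, or supply your own prior through the class_prior parameter; for example (the variable names here are only illustrative):

>>> clf_uniform = MultinomialNB(alpha=1.0, fit_prior=False)
>>> clf_custom = MultinomialNB(alpha=1.0, class_prior=[0.5, 0.5])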

To train the Naïve Bayes classifier with the fit method, use the following command:

>>> clf.fit(term_docs_train, Y_train)

And to obtain the predicted probabilities with the predict_proba method, use the following commands (each row contains the probability of class 0 followed by the probability of class 1):

>>> prediction_prob = clf.predict_proba(term_docs_test)
>>> prediction_prob[0:10]
[[1.00000000e+00 3.96500362e-13]
[1.00000000e+00 2.15303766e-81]
[6.59774100e-01 3.40225900e-01]
[1.00000000e+00 2.28043493e-15]
[1.00000000e+00 1.77156705e-15]
[5.53261316e-05 9.99944674e-01]
[0.00000000e+00 1.00000000e+00]
[1.00000000e+00 3.49697719e-28]
[1.00000000e+00 4.43498548e-14]
[3.39263684e-01 6.60736316e-01]]

Do the following to directly acquire the predicted class values with the predict method (0.5 is the default threshold: if the predicted probability of class 1 is greater than 0.5, class 1 is assigned; otherwise, class 0 is used):

>>> prediction = clf.predict(term_docs_test)
>>> prediction[:10]
[0 0 0 0 0 1 1 0 0 1]
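
As a quick check, these labels are consistent with the probabilities shown earlier; a minimal sketch of applying the 0.5 threshold to prediction_prob by hand (its second column holds the probability of class 1) could look like this:

>>> manual_prediction = (prediction_prob[:, 1] > 0.5).astype(int)
>>> manual_prediction[:10]
[0 0 0 0 0 1 1 0 0 1]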

Finally, we measure the model's classification accuracy by calling the score method:

>>> accuracy = clf.score(term_docs_test, Y_test)
>>> print('The accuracy using MultinomialNB is: {0:.1f}%'.format(accuracy*100))
The accuracy using MultinomialNB is: 93.0%
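
The score method here is equivalent to comparing the predicted labels against the true labels yourself; a brief sketch using accuracy_score from sklearn.metrics would be:

>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(Y_test, prediction)  # should match clf.score(term_docs_test, Y_test)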