How to do it...

Include the following lines in a new Python file to add datasets:

from sklearn.datasets import fetch_20newsgroups 
category_mapping = {'misc.forsale': 'Sellings', 'rec.motorcycles': 'Motorbikes', 
        'rec.sport.baseball': 'Baseball', 'sci.crypt': 'Cryptography', 
        'sci.space': 'OuterSpace'} 
 
training_content = fetch_20newsgroups(subset='train', 
categories=category_mapping.keys(), shuffle=True, random_state=7)

Perform feature extraction to extract the main words from the text:

from sklearn.feature_extraction.text import CountVectorizer 
 
vectorizing = CountVectorizer() 
train_counts = vectorizing.fit_transform(training_content.data) 
print "nDimensions of training data:", train_counts.shape

Train the classifier:

from sklearn.naive_bayes import MultinomialNB 
from sklearn.feature_extraction.text import TfidfTransformer 
 
input_content = [ 
    "The curveballs of right handed pitchers tend to curve to the left", 
    "Caesar cipher is an ancient form of encryption", 
    "This two-wheeler is really good on slippery roads" 
] 
 
tfidf_transformer = TfidfTransformer() 
train_tfidf = tfidf_transformer.fit_transform(train_counts)

Implement the Multinomial Naive Bayes classifier:

classifier = MultinomialNB().fit(train_tfidf, training_content.target) 
input_counts = vectorizing.transform(input_content) 
input_tfidf = tfidf_transformer.transform(input_counts)

Predict the output categories:

categories_prediction = classifier.predict(input_tfidf)

Print the output:

for sentence, category in zip(input_content, categories_prediction): 
    print 'nInput:', sentence, 'nPredicted category:',  
            category_mapping[training_content.target_names[category]]

The following screenshot provides examples of predicting the object based on the input from the database:

Table of Contents for How to do it...

Create new playlist

Sign In

Sign Up

Table of Contents for
How to do it...