Bag of words

To use text in machine learning algorithms, we first have to convert it into a numerical representation, because computers are very good at handling numbers. This representation is called a feature vector, and a vector can be as simple as a list of numbers. The bag-of-words model is one of the feature-extraction algorithms for text: it represents each document by how often each word of the vocabulary occurs in it, ignoring grammar and word order.
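To make the idea concrete, here is a minimal pure-Python sketch of a bag of words. It assumes CountVectorizer's default preprocessing (lowercasing, keeping only tokens of two or more word characters, and numbering the vocabulary in alphabetical order); the helper names tokenize and bag_of_words and the toy corpus are illustrative only.

```python
import re
from collections import Counter

# Assumption: mimic CountVectorizer's default token pattern (2+ word characters)
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(document):
    """Lowercase the document and split it into word tokens."""
    return TOKEN_PATTERN.findall(document.lower())


def bag_of_words(corpus):
    """Return (vocabulary, vectors) for a list of documents.

    vocabulary maps each word to a column index (alphabetical order);
    vectors holds one list of word counts per document.
    """
    tokenized = [tokenize(doc) for doc in corpus]
    words = sorted({word for tokens in tokenized for word in tokens})
    vocabulary = {word: index for index, word in enumerate(words)}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One count per vocabulary word, in column order
        vectors.append([counts[word] for word in words])
    return vocabulary, vectors


corpus = ["The cat sat.", "The cat sat on the mat."]
vocabulary, vectors = bag_of_words(corpus)
print(vocabulary)  # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'the': 4}
print(vectors)     # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Each document becomes a row of counts with one column per vocabulary word, which is the same kind of matrix CountVectorizer produces.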

To generate a bag of words in Python, we can use scikit-learn (sklearn):

from sklearn.feature_extraction.text import CountVectorizer

We are going to use CountVectorizer to create the bag of words:

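# Build the corpus: one string per sentence of text
corpus = []
print(len(text.sentences))  # number of sentences in the text
for sentence in text.sentences:
    corpus.append(str(sentence))

vectorizer = CountVectorizer()
print(vectorizer.fit_transform(corpus).todense())  # one row per sentence, one column per word
print(vectorizer.vocabulary_)  # maps each word to its column index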

The output of the preceding snippet should look something like this:

Figure 13.6: Bag of words using the CountVectorizer class
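The printed matrix is hard to read on its own, because its columns are identified only by the indices stored in vocabulary_. A small helper, the hypothetical decode_row below, can invert that mapping to turn one row back into word counts; the vocabulary and row shown are a toy example, not output from the tweets.

```python
def decode_row(row, vocabulary):
    """Map one row of a bag-of-words matrix back to {word: count}.

    vocabulary is a word -> column-index dict such as CountVectorizer's
    vocabulary_; words with a zero count are left out.
    """
    index_to_word = {index: word for word, index in vocabulary.items()}
    return {index_to_word[i]: count for i, count in enumerate(row) if count > 0}


# Hypothetical vocabulary and row, for illustration only
vocabulary = {'the': 4, 'cat': 0, 'sat': 3, 'on': 2, 'mat': 1}
row = [1, 1, 1, 1, 2]
print(decode_row(row, vocabulary))
# {'cat': 1, 'mat': 1, 'on': 1, 'sat': 1, 'the': 2}
```

Applied to the real result, the first row could be decoded with decode_row(vectorizer.transform(corpus).toarray()[0], vectorizer.vocabulary_). Here .toarray() is used rather than .todense() because it returns a plain 2-D NumPy array whose rows iterate as single numbers, whereas a row of the matrix returned by .todense() is itself 2-D.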

Now let's use TextBlob to perform sentiment analysis:

from textblob import TextBlob

test = TextBlob(str(text))  # str() accepts either a plain string or a TextBlob
test.sentiment

It should output the sentiment as follows:

Sentiment(polarity=0.20195668693009117, subjectivity=0.5995670995670995)

The sentiment has two attributes:

  • polarity is a float within the range [-1.0, 1.0], where -1.0 is very negative and 1.0 is very positive
  • subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective

Each attribute can also be read on its own:

test.sentiment.polarity

which returns 0.20195668693009117.
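Scores in these ranges are often bucketed into labels. The sketch below does this for polarity; the function name label_polarity and the neutral band of ±0.05 are assumptions chosen for illustration, not TextBlob defaults.

```python
def label_polarity(polarity, neutral_band=0.05):
    """Turn a TextBlob-style polarity score in [-1.0, 1.0] into a label.

    neutral_band is a hypothetical threshold: scores within
    [-neutral_band, neutral_band] are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print(label_polarity(0.20195668693009117))  # positive
print(label_polarity(-0.5))                 # negative
print(label_polarity(0.0))                  # neutral
```

Widening neutral_band classifies more weakly worded tweets as neutral; the right value depends on the data and is worth checking against a few labeled examples.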

Let's write a loop to get the polarity of every tweet:

# Compute the polarity of every tweet
storage = []
for i in range(len(df)):
    x = str(df[i])
    y = TextBlob(x)
    z = y.sentiment.polarity
    storage.append(z)

 

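Now let's convert storage to a DataFrame and plot it:

from pandas import DataFrame
from matplotlib import rcParams

# Create a new DataFrame holding the polarity of each tweet
change = DataFrame({'trend': storage})

# Make the figure 20 x 10 inches
rcParams['figure.figsize'] = 20, 10

# Plot the trend of the sentiment (polarity plot)
change.plot.line()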

The output should look like the following screenshot:

Figure 13.7: Polarity trend plot of the tweets
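The raw polarity line jumps from tweet to tweet, so a trend can be hard to see. One common option, not part of the original walkthrough, is to smooth the series with a moving average. The sketch below computes a trailing moving average in pure Python; the function name moving_average and the toy input are illustrative.

```python
def moving_average(values, window):
    """Return the trailing moving average of a list of numbers.

    The first window - 1 positions have no full window and are skipped,
    so the result has len(values) - window + 1 entries (or none).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just left the window
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


print(moving_average([0.0, 0.5, 1.0, 0.5, 0.0], window=3))
# [0.5, 0.6666666666666666, 0.5]
```

With pandas, a comparable result is change['trend'].rolling(window=50).mean(), which keeps the first window - 1 positions as NaN instead of dropping them; calling .plot.line() on it draws the smoothed polarity trend. The window of 50 tweets is an arbitrary choice.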

Similarly, let's compute the subjectivity of every tweet:

# Compute the subjectivity of every tweet
storage2 = []
for i in range(len(df)):
    x2 = str(df[i])
    y2 = TextBlob(x2)
    z2 = y2.sentiment.subjectivity
    storage2.append(z2)

And now plot it:

# Create a new DataFrame holding the subjectivity of each tweet
change2 = DataFrame({'trend2': storage2})
rcParams['figure.figsize'] = 20, 10
# Plot the trend of the sentiment (subjectivity plot)
change2.plot.line()

The output should look like the following screenshot:

Figure 13.8: Subjectivity plot of the tweets
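The polarity and subjectivity loops each run TextBlob over every tweet, so every tweet is analyzed twice. A single pass can collect both scores at once. In the sketch below, score_texts and fake_scorer are hypothetical names, and fake_scorer is a toy stand-in for TextBlob so that the example runs without it.

```python
from collections import namedtuple

# Stand-in for TextBlob's Sentiment result, so this sketch runs without TextBlob
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def score_texts(texts, scorer):
    """Score every text once and return (polarities, subjectivities).

    scorer takes a string and returns a (polarity, subjectivity) pair,
    for example lambda t: TextBlob(t).sentiment.
    """
    polarities, subjectivities = [], []
    for text in texts:
        polarity, subjectivity = scorer(str(text))
        polarities.append(polarity)
        subjectivities.append(subjectivity)
    return polarities, subjectivities


def fake_scorer(text):
    """Hypothetical toy scorer: positive and subjective if 'good' appears."""
    if "good" in text.lower():
        return Sentiment(polarity=0.7, subjectivity=0.6)
    return Sentiment(polarity=0.0, subjectivity=0.0)


tweets = ["Good morning!", "Train is late", 42]
polarities, subjectivities = score_texts(tweets, fake_scorer)
print(polarities)      # [0.7, 0.0, 0.0]
print(subjectivities)  # [0.6, 0.0, 0.0]
```

Assuming df is a pandas Series of tweet texts with the default integer index, storage, storage2 = score_texts(df, lambda t: TextBlob(t).sentiment) would fill both lists in one pass. And because storage and storage2 already cover every tweet, plots of the first 500 tweets could also be built from storage[:500] and storage2[:500] instead of scoring those tweets again.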

To see the trends more clearly, let's zoom in on the first 500 tweets.

First, the polarity trend:

# Compute the polarity of the first 500 tweets
storage = []
for i in range(500):
    x = str(df[i])
    y = TextBlob(x)
    z = y.sentiment.polarity
    storage.append(z)

# Create a new DataFrame holding the polarity of each tweet
change = DataFrame({'trend': storage})
rcParams['figure.figsize'] = 20, 10
# Plot the trend of the sentiment (polarity plot)
change.plot.line()

The output should look like the following screenshot:

Figure 13.9: Polarity trends plot of first 500 tweets only

We can also look at the subjectivity trend of the same 500 tweets:

# Compute the subjectivity of the first 500 tweets
storage2 = []
for i in range(500):
    x2 = str(df[i])
    y2 = TextBlob(x2)
    z2 = y2.sentiment.subjectivity
    storage2.append(z2)

# Create a new DataFrame holding the subjectivity of each tweet
change2 = DataFrame({'trend2': storage2})
rcParams['figure.figsize'] = 20, 10

# Plot the trend of the sentiment (subjectivity plot)
change2.plot.line()

The output of the plot should look like the following screenshot:

Figure 13.10: Subjectivity trends of first 500 tweets

There are many other ways to analyze Twitter data and generate statistics from it, and this kind of analysis is widely used in business today. The main purpose of this exercise is to help you get started, so feel free to build on it and extract other statistics.
