To use text in machine learning algorithms, we first have to convert it into a numerical representation, because these algorithms operate on numbers rather than raw strings. This representation is called a feature vector; a vector can be as simple as a list of numbers. The bag-of-words model is one feature-extraction technique for text: it represents each document by how many times each vocabulary word occurs in it, ignoring word order. We can generate a bag of words with an existing library.
For that, we need scikit-learn (the sklearn package) in Python:
from sklearn.feature_extraction.text import CountVectorizer
We are going to use CountVectorizer to create the bag of words:
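# build the corpus: one string per sentence
corpus = []
len(text.sentences)
for sentence in text.sentences:
    corpus.append(str(sentence))
# learn the vocabulary and count the words in each sentence
vectorizer = CountVectorizer()
print(vectorizer.fit_transform(corpus).todense())
# mapping from each word to its column index in the matrix
print(vectorizer.vocabulary_)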
The output of the preceding snippet should look something like this:
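The exact matrix and vocabulary depend on the sentences in text. To illustrate the structure of that output, here is a minimal pure-Python sketch of the bag-of-words idea. It is not scikit-learn's implementation; it only roughly mirrors CountVectorizer's default behaviour (lowercasing, keeping tokens of two or more word characters, and ordering the vocabulary alphabetically). The function name bag_of_words and the toy corpus are assumptions made for this illustration.

```python
import re
from collections import Counter

# Token rule similar to CountVectorizer's default: words of 2+ characters.
TOKEN = re.compile(r"\b\w\w+\b")


def bag_of_words(corpus):
    """Return (count matrix, vocabulary) for a list of strings."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in corpus]
    # Alphabetically sorted vocabulary; each term maps to its column index.
    terms = sorted({token for tokens in tokenized for token in tokens})
    vocabulary = {term: index for index, term in enumerate(terms)}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary term.
        matrix.append([counts.get(term, 0) for term in terms])
    return matrix, vocabulary


# Hypothetical toy corpus, not the text analysed in this chapter.
toy_corpus = ["The cat sat.", "The dog sat on the mat."]
matrix, vocabulary = bag_of_words(toy_corpus)
print(matrix)      # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
print(vocabulary)  # {'cat': 0, 'dog': 1, 'mat': 2, 'on': 3, 'sat': 4, 'the': 5}
```

Each row is the feature vector of one sentence, and the vocabulary tells you which word each column counts.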
Now let TextBlob perform sentiment analysis:
# TextBlob expects a string, so pass the string form of text
test = TextBlob(str(text))
test.sentiment
It should output the sentiment text as follows:
There are two attributes for sentiment:
- The polarity score is a float within the range [-1.0, 1.0], where -1.0 is most negative and 1.0 is most positive
- The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective:
test.sentiment.polarity
Out[8]: 0.20195668693009117
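To make the two ranges concrete, the following sketch turns raw scores into coarse labels. The function names and the cut-off values (a neutral band of 0.1 around zero, and 0.5 as the objective/subjective boundary) are arbitrary assumptions for illustration; TextBlob itself only returns the numbers.

```python
def polarity_label(polarity, neutral_band=0.1):
    """Map a polarity in [-1.0, 1.0] to a coarse label (assumed cut-offs)."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be within [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def subjectivity_label(subjectivity, boundary=0.5):
    """Map a subjectivity in [0.0, 1.0] to a coarse label (assumed boundary)."""
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must be within [0.0, 1.0]")
    return "subjective" if subjectivity >= boundary else "objective"


# The polarity value shown above falls on the positive side.
print(polarity_label(0.20195668693009117))  # positive
```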
Let's write a loop to get the sentiment polarity of every tweet:
# collect the polarity score of every tweet
storage = []
for i in range(len(df)):
    x = str(df[i])
    y = TextBlob(x)
    z = y.sentiment.polarity
    storage.append(z)
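Now let's convert storage to a DataFrame and plot it:
# imports needed here, if not already done earlier
from pandas import DataFrame
from matplotlib import rcParams
# create a new DataFrame holding the polarity scores
change = DataFrame({'trend': storage})
# make the figure large enough to read
rcParams['figure.figsize'] = 20, 10
# plot the polarity trend
change.plot.line()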
The output should look like the following screenshot:
Let's obtain the subjectivity plot:
storage2 = []
for i in range(len(df)):
    x2 = str(df[i])
    y2 = TextBlob(x2)
    z2 = y2.sentiment.subjectivity
    storage2.append(z2)
And now plot it:
# create a new DataFrame holding the subjectivity scores
change2 = DataFrame({'trend2': storage2})
rcParams['figure.figsize'] = 20, 10
# plot the subjectivity trend
change2.plot.line()
The output should look like the following screenshot:
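The polarity and subjectivity loops each analyse every tweet separately, so each tweet is processed twice. A single pass can collect both scores. The sketch below shows one way to do that; score_texts and toy_analyze are hypothetical names introduced here, and toy_analyze is only a stand-in scorer so the example runs without TextBlob installed.

```python
def score_texts(texts, analyze):
    """Analyse each text once and return (polarities, subjectivities).

    `analyze` is any callable returning a (polarity, subjectivity) pair.
    """
    polarities, subjectivities = [], []
    for text in texts:
        polarity, subjectivity = analyze(str(text))
        polarities.append(polarity)
        subjectivities.append(subjectivity)
    return polarities, subjectivities


def toy_analyze(text):
    """Hypothetical stand-in scorer; a real run would use TextBlob instead."""
    lowered = text.lower()
    if "good" in lowered:
        polarity = 0.5
    elif "bad" in lowered:
        polarity = -0.5
    else:
        polarity = 0.0
    subjectivity = 1.0 if polarity != 0.0 else 0.0
    return polarity, subjectivity


tweets = ["Good morning", "Bad traffic today", "Train at 9am"]
storage, storage2 = score_texts(tweets, toy_analyze)
print(storage)   # [0.5, -0.5, 0.0]
print(storage2)  # [1.0, 1.0, 0.0]
```

With TextBlob available, passing `lambda t: TextBlob(t).sentiment` as `analyze` should work, since `sentiment` is a named tuple of polarity and subjectivity. This assumes that iterating over df yields the tweets themselves.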
To see the trend more clearly, let's zoom in on the first 500 tweets.
First is the polarity trend:
# collect the polarity score of the first 500 tweets
storage = []
for i in range(500):
    x = str(df[i])
    y = TextBlob(x)
    z = y.sentiment.polarity
    storage.append(z)
# create a new DataFrame holding the polarity scores
change = DataFrame({'trend': storage})
rcParams['figure.figsize'] = 20, 10
# plot the polarity trend
change.plot.line()
The output should look like the following screenshot:
And we can also see the subjectivity trends:
# collect the subjectivity score of the first 500 tweets
storage2 = []
for i in range(500):
    x2 = str(df[i])
    y2 = TextBlob(x2)
    z2 = y2.sentiment.subjectivity
    storage2.append(z2)
# create a new DataFrame holding the subjectivity scores
change2 = DataFrame({'trend2': storage2})
rcParams['figure.figsize'] = 20, 10
# plot the subjectivity trend
change2.plot.line()
The output of the plot should look like the following screenshot:
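The zoomed-in loops recompute scores that were already calculated for the full data set. If the full lists of scores are still available, slicing them gives the same first 500 values without analysing the tweets again. Slicing also copes with fewer than 500 items, whereas indexing a list inside `range(500)` would fail in that case. The helper name first_n and the sample data below are assumptions for illustration.

```python
def first_n(scores, n=500):
    """Return at most the first n scores; safe when fewer than n exist."""
    return list(scores[:n])


# Hypothetical scores standing in for the full polarity list.
all_scores = [0.1 * (i % 3) for i in range(1200)]
zoomed = first_n(all_scores)
print(len(zoomed))           # 500
print(first_n([0.2, -0.1]))  # [0.2, -0.1]
```

The sliced list can then be wrapped in a DataFrame and plotted in the same way as the full one.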
There are many other ways to analyze Twitter data and generate statistics from it, and this kind of analysis is widely used in today's market. The main purpose of this exercise is to help you get started. Feel free to take it from here and extract other statistics.