Understanding the application's architecture

Every application has its own structure, or architecture, which provides the overarching organization or framework for the application. For this application, we combine the three classes using a Java 8 stream in the ApplicationDriver class. This class consists of a constructor and two methods:

  • ApplicationDriver: The constructor, which collects the application's user input
  • performAnalysis: Performs the analysis
  • main: Creates the ApplicationDriver instance

The class structure is shown next. The three instance variables are used to control the processing:

public class ApplicationDriver { 
    private String topic; 
    private String subTopic; 
    private int numberOfTweets; 
 
    public ApplicationDriver() { ... } 
    public void performAnalysis() { ... } 
 
    public static void main(String[] args) { 
        new ApplicationDriver(); 
    } 
} 

The ApplicationDriver constructor follows. A Scanner instance is created and the sentiment analysis model is built:

public ApplicationDriver() { 
    Scanner scanner = new Scanner(System.in); 
    TweetHandler swt = new TweetHandler(); 
    swt.buildSentimentAnalysisModel(); 
    ... 
} 

The remainder of the constructor prompts the user for input and then calls the performAnalysis method:

// out refers to System.out, assuming: import static java.lang.System.out;
out.println("Welcome to the Tweet Analysis Application"); 
out.print("Enter a topic: "); 
this.topic = scanner.nextLine(); 
out.print("Enter a sub-topic: "); 
this.subTopic = scanner.nextLine().toLowerCase(); 
out.print("Enter number of tweets: "); 
this.numberOfTweets = scanner.nextInt(); 
performAnalysis(); 

The performAnalysis method uses a Java 8 Stream instance obtained from the TwitterStream instance. The TwitterStream class constructor uses the number of tweets and topic as input. This class is discussed in the Data acquisition using Twitter section:

public void performAnalysis() { 
    Stream<TweetHandler> stream = new TwitterStream( 
        this.numberOfTweets, this.topic).stream(); 
    ... 
} 

The stream uses a series of map and filter methods, followed by a forEach method, to perform the processing. The map method transforms the stream's elements. The filter method removes elements that fail a test. The forEach method is the terminal operation: it consumes the stream and generates the output.
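The roles of these three kinds of operation can be seen in isolation on plain strings. The following is a minimal sketch, not part of the application: the class name StreamStagesSketch and the sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative only; not one of the application's classes.
class StreamStagesSketch {
    // Runs a map -> filter -> forEach pipeline over plain strings
    // and returns the elements that reached the terminal operation.
    static List<String> run(String... texts) {
        List<String> seen = new ArrayList<>();
        Stream.of(texts)
                .map(String::toLowerCase)         // map: transform each element
                .filter(s -> s.contains("#mnf"))  // filter: drop elements that fail the test
                .forEach(seen::add);              // forEach: terminal, triggers the processing
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run("Touchdown! #MNF", "Unrelated post", "Great drive #mnf"));
    }
}
```

Nothing is processed until the terminal forEach call is reached; the map and filter calls only describe the work to be done.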

The individual methods of the stream are applied to each tweet in the order shown. When acquired from a public Twitter stream, the Twitter information arrives as a JSON document, which we process first. This allows us to extract the relevant tweet information and assign it to fields of the TweetHandler instance. Next, the text of the tweet is converted to lowercase. Only English tweets are kept, and stop words are removed from them. Of these, only the tweets that contain the sub-topic are kept. Sentiment analysis is then performed on each remaining tweet. The last step computes the statistics and prints the tweet:

stream 
        .map(s -> s.processJSON()) 
        .map(s -> s.toLowerCase()) 
        .filter(s -> s.isEnglish()) 
        .map(s -> s.removeStopWords()) 
        .filter(s -> s.containsCharacter(this.subTopic)) 
        .map(s -> s.performSentimentAnalysis()) 
        .forEach((TweetHandler s) -> { 
            s.computeStats(); 
            out.println(s); 
        }); 
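For this chain to compile, each method passed to map must return a TweetHandler, and each method passed to filter must return a boolean. One way to get this is for the mapping methods to update the object's state and return this. The following is a minimal sketch under that assumption. SketchHandler is a hypothetical stand-in with simplified method bodies, not the TweetHandler class used by the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for TweetHandler; not the book's implementation.
class SketchHandler {
    private String text;

    SketchHandler(String text) {
        this.text = text;
    }

    // A "map" step: updates the handler's state and returns it so calls can be chained.
    SketchHandler toLowerCase() {
        this.text = this.text.toLowerCase();
        return this;
    }

    // A "filter" step: returns a boolean that decides whether the element is kept.
    boolean containsCharacter(String subTopic) {
        return this.text.contains(subTopic);
    }

    String getText() {
        return text;
    }

    // Wraps raw texts in handlers and runs a shortened version of the pipeline.
    static List<String> process(String subTopic, String... rawTexts) {
        List<String> kept = new ArrayList<>();
        Stream.of(rawTexts)
                .map(SketchHandler::new)
                .map(s -> s.toLowerCase())
                .filter(s -> s.containsCharacter(subTopic))
                .forEach(s -> kept.add(s.getText()));
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(process("football", "Monday Night FOOTBALL", "Baseball tonight"));
    }
}
```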

The results of the processing are then displayed:

out.println(); 
out.println("Positive Reviews: " 
        + TweetHandler.getNumberOfPositiveReviews()); 
out.println("Negative Reviews: " 
        + TweetHandler.getNumberOfNegativeReviews()); 
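Because the totals are read through static methods, they are presumably held at the class level rather than per tweet. The following sketch shows one way such totals could be accumulated. SentimentTally and its record method are hypothetical; the pos and neg labels mirror the categories in the application's output, and TweetHandler may keep its counts differently.

```java
// Hypothetical counter class; the book's TweetHandler may track totals differently.
class SentimentTally {
    private static int positive;
    private static int negative;

    // Records one classified tweet; any label other than "pos" or "neg" is ignored.
    static void record(String category) {
        if ("pos".equals(category)) {
            positive++;
        } else if ("neg".equals(category)) {
            negative++;
        }
    }

    static int getNumberOfPositiveReviews() {
        return positive;
    }

    static int getNumberOfNegativeReviews() {
        return negative;
    }

    public static void main(String[] args) {
        record("pos");
        record("neg");
        record("neg");
        System.out.println("Positive Reviews: " + getNumberOfPositiveReviews());
        System.out.println("Negative Reviews: " + getNumberOfNegativeReviews());
    }
}
```

Static counters like these are simple, but they are shared by every stream that runs in the same JVM, so they would need resetting between analyses.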

We tested our application on a Monday night during a Monday Night Football game and used the topic #MNF. The # symbol marks a hashtag, which is used to categorize tweets. By selecting a popular category of tweets, we ensured that we would have plenty of Twitter data to work with. For simplicity, we chose football as the sub-topic. We also chose to analyze only 50 tweets for this example. The following is an abbreviated sample of our prompts, input, and output:

Building Sentiment Model
Welcome to the Tweet Analysis Application
Enter a topic: #MNF
Enter a sub-topic: football
Enter number of tweets: 50
Creating Twitter Stream
51 messages processed!
Text: rt @ bleacherreport : touchdown , broncos ! c . j . anderson punches ! lead , 7 - 6 # mnf # denvshou 
Date: Mon Oct 24 20:28:20 CDT 2016
Category: neg
...
Text: i cannot emphasize enough how big td drive . @ broncos offense . needed confidence booster & amp ; just got . # mnf # denvshou 
Date: Mon Oct 24 20:28:52 CDT 2016
Category: pos
Text: least touchdown game . # mnf 
Date: Mon Oct 24 20:28:52 CDT 2016
Category: neg
Positive Reviews: 13
Negative Reviews: 27

We print out the text of each tweet, along with a timestamp and category. Notice that the text of the tweet does not always make sense. This is partly due to the abbreviated nature of Twitter data and partly because the text has been cleaned and stop words have been removed. We should still see our topic, #MNF, although it appears in lowercase, and split into separate tokens, because of our text cleaning. At the end, we print out the total number of tweets classified as positive and negative.

The classification of tweets is done by the performSentimentAnalysis method. Notice that classification using sentiment analysis is not always precise. The following tweet mentions a touchdown by a Denver Broncos player. This tweet could be construed as positive or negative depending on an individual's personal feelings about that team, but our model classified it as positive:

Text: cj anderson td run @ broncos . broncos now lead 7 - 6 . # mnf 
Date: Mon Oct 24 20:28:42 CDT 2016
Category: pos

Additionally, some tweets may have a neutral tone, such as the one shown next, but still be classified as either positive or negative. The following tweet is a retweet from a popular sports news Twitter account, @bleacherreport:

Text: rt @ bleacherreport : touchdown , broncos ! c . j . anderson punches ! lead , 7 - 6 # mnf # denvshou 
Date: Mon Oct 24 20:28:37 CDT 2016
Category: neg

This tweet has been classified as negative but perhaps could be considered neutral. The contents of the tweet simply provide information about a score in a football game. Whether this is a positive or negative event depends on which team a person is rooting for. When we examine the entire set of tweet data analyzed, we notice that this same @bleacherreport tweet has been retweeted a number of times and classified as negative each time. A large number of improperly classified tweets like this could skew our analysis, since using incorrect data decreases the accuracy of the results.

One option, depending on the purpose of the analysis, may be to exclude tweets by news outlets or other popular Twitter users. Additionally, we could exclude tweets marked with RT, an abbreviation denoting that the tweet is a retweet of another user's tweet.
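Excluding retweets could be done with one more filter step. The following is a minimal sketch, assuming the text has already been cleaned and lowercased so that a retweet begins with the token rt, as in the sample output. RetweetFilterSketch and its methods are hypothetical and are not part of the application.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the book's application.
class RetweetFilterSketch {
    // Assumes cleaned, lowercased text in which a retweet starts with the token "rt".
    static boolean isRetweet(String cleanedText) {
        String t = cleanedText.trim();
        return t.equals("rt") || t.startsWith("rt ");
    }

    // Returns only the texts that are not retweets, preserving their order.
    static List<String> withoutRetweets(List<String> cleanedTexts) {
        return cleanedTexts.stream()
                .filter(t -> !isRetweet(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList(
                "rt @ bleacherreport : touchdown , broncos !",
                "least touchdown game . # mnf");
        System.out.println(withoutRetweets(texts));
    }
}
```

Checking for the whole token, rather than for any text beginning with the letters rt, avoids discarding tweets that merely start with a word such as rtfm.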

There are additional issues to consider when performing this type of analysis, including the sub-topic used. If we were to analyze the popularity of a Star Wars character, then we would need to be careful which names we use. For example, when choosing a character name such as Han Solo, the tweet may use an alias. Aliases for Han Solo include Vykk Draygo, Rysto, Jenos Idanian, Solo Jaxal, Master Marksman, and Jobekk Jonn, to mention a few (http://starwars.wikia.com/wiki/Category:Han_Solo_aliases). The actor's name, which is Harrison Ford in the case of Han Solo, may be used instead of the character's name. We may also want to consider the actor's nickname, such as Harry for Harrison.
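One way to handle aliases is to match a tweet against a list of terms instead of a single sub-topic. The following is a minimal sketch of that idea. AliasMatchSketch and mentionsAny are hypothetical names, the term list is a lowercased subset of the names mentioned above, and plain substring matching is a simplification that can produce false matches.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the book's application.
class AliasMatchSketch {
    // A few of the names from the text, lowercased to match cleaned tweet text.
    static final List<String> HAN_SOLO_TERMS = Arrays.asList(
            "han solo", "vykk draygo", "rysto", "jenos idanian", "harrison ford");

    // Returns true if the cleaned text contains at least one of the terms.
    static boolean mentionsAny(String cleanedText, List<String> terms) {
        return terms.stream().anyMatch(cleanedText::contains);
    }

    public static void main(String[] args) {
        System.out.println(mentionsAny("harrison ford was great tonight", HAN_SOLO_TERMS));
        System.out.println(mentionsAny("luke skywalker was great tonight", HAN_SOLO_TERMS));
    }
}
```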
