Chapter 12. New Directions in Sentiment Analysis: Charting Words

The aim of this chapter is to introduce the emerging field of text mining as a new form of sentiment analysis for use in trading markets.

This book has really been about the analysis of the shape of sentiment. We first looked at how traders can apply alternative charting to detect sentiment in the market. We can deduce that a price point in the market, represented by a candlestick, line, bar chart, price break chart, Kagi chart, or Renko chart, is the result of an adversarial contest between buyers and sellers. Each price close represents a unit of sentiment. From this perspective, we can consider technical analysis or chart reading to be a branch of shape science. As a result, a chart type is really a shape that has been revealed from sample data. Accordingly, price break charts, Kagi charts, point and figure charts, and Renko charts represent shapes but use different landmarks. The application of these charts is not very different from any pattern recognition process that cuts across disciplines. We can apply some of the principles of shape analysis, which started with the science of morphometrics (the study of biological form), to chart reading.[11]

Consider the following statement: "Shape is a definite entity, a configuration of points that keep geometric relationships among them."[12] Recognizing the shape of surprise in the market is not really different from recognizing a face in a crowd. Seen in this context, charting is a form of shape science.

What is exciting about our current time is that the art and the science of analyzing sentiment in the market are evolving. Charts following this logic do not simply map price action; they also track opinion data. We see this in consumer and producer sentiment data and the charting of that data. Also emerging is a new branch of sentiment analysis that is generating an entirely new type of data for the trader to evaluate and use in shaping trades: "text mining." The field draws on information extraction (IE), natural language processing (NLP), and event analysis. Economists at the Federal Reserve have recognized the need for real-time data sets that capture information about economic conditions more precisely than the data currently available. A recent research report stated:

Aggregate business conditions are of central importance in the business, finance, and policy communities worldwide, and huge resources are devoted to assessment of the continuously evolving state of the real economy. Literally thousands of newspapers, newsletters, television shows, and blogs, not to mention armies of employees in manufacturing and service industries, including the financial services industries, central banks, government and nongovernment organizations, grapple constantly with the measurement and forecasting of evolving business conditions. Of central importance is the constant grappling. Real economic agents, making real decisions, in real time, want accurate and timely estimates of the state of real activity. Business cycle chronologies such as those of the [National Bureau of Economic Research], which proclaim expansions and contractions long after the fact, are not useful in that regard.[13]

In this report's conclusion, there is an explicit recognition of the importance of words. Here is what the authors say:

We look forward to ... [i]ncorporation of indicators beyond macroeconomic and financial data. In particular, it will be of interest to attempt inclusion of qualitative information such as headline news.[14]

The Federal Reserve Bank of Philadelphia has established the "Real-Time Data Research Center." Research there focuses on how to incorporate real-time data, such as business forecasts, into the tracking of economic conditions, and on separating "the signal from the noise."

This focus is profoundly important because real-time, high-frequency surveys of "opinion" will produce new information for traders to consider. A recent study noted the importance of this category of new research:

Traders in financial markets are confronted with the problem that too much information is available from various, heterogeneous sources like newswires, forums, blogs, and collaborative tools. In order to make accurate trading decisions, traders have to filter the relevant information efficiently so that they are able to react to new information in a timely manner.[15]

This field is new, but is rapidly developing a topology and logic for the construction of technical tools. The figure below shows the overall logic.

Of course, we are interested in the application of "text mining" to trading. A recent study in the field of computational intelligence, focused on stock prediction, states:

Mining textual documents and time series concurrently, such as predicting the movements of stock prices based on news articles, is an emerging topic in data mining society nowadays. Previous research has already suggested that the relationships between news articles and stock prices do exist. However, all of the existing approaches are concerning mining single time series only. The interrelationships among different stocks are not well addressed. Mining multiple time series concurrently is not only more informative but also far more challenging. Research in such a direction is lacking. In this paper, we try to explore such an opportunity and propose a systematic framework for mining multiple time series based on the Efficient Market Hypothesis.[16]

A recent article released by the Federal Reserve and focused on the impact of words states,

The Federal Reserve's announcement following its January 28, 2004, policy meeting led to one of the largest reactions in the Treasury market on record, with two- and five-year yields jumping 20 and 25 basis points (bp) respectively in the half-hour surrounding the announcement—the largest movements around any Federal Open Market Committee (FOMC) announcement over the fourteen years for which we have data. "We find that 75 to 90 percent of the explainable variation in five- and ten-year Treasury yields in response to monetary policy announcements is due to the path factor (associated with statements) rather than to changes in the federal funds rate target." ... The study concluded, "Do central bank actions speak louder than words? We find that the answer to this question is a qualified 'no.' In particular, we find that viewing the effects of FOMC announcements on financial markets as driven by a single factor—changes in the federal funds rate target—is inadequate. Instead, we find that a second policy factor—one not associated with the current federal funds rate decision of the FOMC, but instead with statements that it releases—accounted for more than three-fourths of the explainable variation in the movements of five- and ten-year Treasury yields around FOMC meetings."[17]

The question for the day is this: given the importance of words and their impact on markets, what can a trader without high-powered economic and computer models do now to use word analysis as a technical analysis tool for detecting sentiment? Do we have to wait for greater progress? The answer is that even at this early stage of text mining, the trader can employ this new form of technical analysis of sentiment by using word (or tag) clouds. Word clouds belong to a class of text-mining tools that scan a document and generate word frequency counts and analyses of word associations. A publicly available early example of text mining's potential is at www.wordle.net. When text is inserted into the program, it generates an arrangement of words sized according to their frequency, so one can instantly see which words are emphasized.
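To make the mechanics concrete, here is a minimal Python sketch of what a word-cloud generator does before it draws anything: count the words in a text and scale each word's display size by its count. The stopword list, the linear size scaling, and the function names (word_frequencies, cloud_sizes) are our own assumptions for illustration; Wordle's actual weighting rules are not described in this chapter.

```python
import re
from collections import Counter

# A deliberately small stopword list, assumed for illustration only.
# Real word-cloud tools use much longer lists.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "was", "were", "be", "been", "that", "this", "it", "as", "by",
    "with", "at", "from", "has", "have", "its", "their", "we", "our",
}


def word_frequencies(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count non-stopwords."""
    # Letters, optionally joined by hyphens or apostrophes (keeps "longer-term" whole).
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cloud_sizes(freqs: Counter, top_n: int = 50,
                min_pt: int = 10, max_pt: int = 60) -> dict:
    """Map the top_n most frequent words to font sizes scaled linearly by count."""
    top = freqs.most_common(top_n)
    if not top:
        return {}
    most, least = top[0][1], top[-1][1]
    spread = (most - least) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_pt + (count - least) * (max_pt - min_pt) / spread)
        for word, count in top
    }


sample = "Global imbalances and global trade: trade imbalances in global markets."
print(cloud_sizes(word_frequencies(sample)))
# {'global': 60, 'imbalances': 35, 'trade': 35, 'markets': 10}
```

The most frequent word gets the largest font and the least frequent of the top words gets the smallest; everything a word cloud "says" comes from these counts.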

Figure 12.1 is a word cloud of Federal Reserve Chairman Ben Bernanke's speech on January 13, 2009, at the London School of Economics.[18]

Let's compare this word cloud with Figure 12.2, made for a key speech given by Jean-Claude Trichet, president of the European Central Bank, on April 18, 2009. The speech is entitled "The Global Dimensions of the Crisis."[19]

Figure 12.1. Bernanke Speech Analysis: Word Cloud. Source: www.wordle.net

Figure 12.2. Trichet Speech Analysis: Word Cloud. Source: www.wordle.net

A quick comparison of Ben Bernanke's word cloud with Jean-Claude Trichet's immediately shows different emphases. The words global, imbalances, and trade appear prominently in Trichet's speech but are comparatively hard to see in Bernanke's. Understanding the differences between these two key central bankers is important, and these word clouds help an investor see that expectations and sentiment in Europe differ from those in the United States. Word clouds are a step in the right direction, but much more quantification is necessary before text mining can be applied to trading. Traders need much more than word clouds can give them. They need to be alerted to changes in word frequency and emphasis, to the appearance of new words (new event detection), and to the disappearance of words.
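The alerts described above can be sketched in a few lines. In this minimal Python example, which assumes each document has already been reduced to word counts, a word is flagged as a first appearance if no earlier document used it, and as disappeared if the immediately preceding document used it but the latest one does not. The function name word_alerts and the sample counts are hypothetical.

```python
from collections import Counter


def word_alerts(history: list, latest: Counter) -> dict:
    """Compare the latest document's word counts with earlier documents.

    history: word Counters for earlier documents, oldest first.
    Returns words appearing for the first time in any document, and words
    used in the immediately preceding document but absent from the latest.
    """
    seen = set()
    for counts in history:
        seen.update(w for w, c in counts.items() if c > 0)
    previous = history[-1] if history else Counter()
    present = {w for w, c in latest.items() if c > 0}
    return {
        "first_appearance": sorted(present - seen),
        "disappeared": sorted(w for w, c in previous.items() if c > 0 and w not in present),
    }


# Hypothetical counts for three successive speeches.
earlier = [Counter(inflation=5, growth=3), Counter(inflation=4, growth=2, housing=1)]
print(word_alerts(earlier, Counter(growth=2, deflation=1, housing=2)))
# {'first_appearance': ['deflation'], 'disappeared': ['inflation']}
```

Checking first appearances against the whole history, rather than only the previous document, keeps a word that merely returns after a pause from being reported as new.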

The next step in the technical-analysis use of text mining will be to let any trader upload a speech or document and have a text-mining application generate a word matrix: an array of words organized to show each word's frequency and the change in that frequency from one document to another. Such a tool will help traders better understand the macro environment. If we detect a shift in emphasis, it can provide clues about the direction of public policy. What events are being feared? Is it deflation? Inflation? Credit tightening? Asset bubbles? A slight change in emphasis can reveal more to the trader about direction than looking at a chart.
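A word matrix of this kind is straightforward to build once each document has been reduced to word counts. The sketch below is one possible layout, not a description of any existing product: rows are words, columns are documents in chronological order, and a change column records the difference between consecutive documents. The sample counts for economy, liquidity, and creditors are the Bernanke figures quoted later in this chapter; the function name word_matrix is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def word_matrix(docs: Dict[str, Counter], words: Optional[List[str]] = None
                ) -> Tuple[List[str], Dict[str, Tuple[List[int], List[int]]]]:
    """Build a word-by-document frequency matrix.

    docs maps a document label to its word Counter, in chronological order.
    Each row holds the word's count in every document plus the change
    between each pair of consecutive documents.
    """
    labels = list(docs)
    if words is None:
        words = sorted(set().union(*docs.values()))
    rows = {}
    for w in words:
        counts = [docs[d][w] for d in labels]  # a Counter returns 0 for unseen words
        changes = [b - a for a, b in zip(counts, counts[1:])]
        rows[w] = (counts, changes)
    return labels, rows


labels, rows = word_matrix({
    "Speech 1": Counter(economy=3, liquidity=4),
    "Speech 2": Counter(economy=10, liquidity=10, creditors=5),
})
print("word".ljust(12), *labels, "change")
for word, (counts, changes) in rows.items():
    print(word.ljust(12), *counts, *changes)
```

With more than two documents, each row simply gains one count and one change per additional document, which is what turns the matrix into raw material for charting words over time.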

Let's look at the potential demonstrated even at this early stage of development. We will do some text mining of Ben Bernanke's and Jean-Claude Trichet's speeches and review the results.

There are two types of comparative text mining. The first is similar to a time series: it compares one person's words with that same person's previous speeches. The second, comparing Ben Bernanke with Jean-Claude Trichet or other people, is an example of cross-sectional analysis.

Several dimensions will be of interest to the trader. First is the frequency of key words: did the use of a word increase or decrease between speeches? Another dimension is the uniqueness of a word's appearance. A word that appears for the first time can be an important signal. When a key policy maker drafts a speech, each word is carefully weighed. The appearance of a new word, and its appearance in association with other words (known as key words in context), in an economic release or speech can therefore be a useful indicator. When we compare one person's words with another's, the analysis is similar to that of cross-sectional data. Between Ben Bernanke and Jean-Claude Trichet, for example, some words are common, some are unique to each, and the frequency of the shared words varies. Let's look at the Sentiment Matrix in action on recent speeches of Bernanke (Figures 12.3 through 12.5) and Trichet (Figure 12.6). We will first compare Ben Bernanke's speeches with each other, then assess Jean-Claude Trichet's speeches, and finally compare Bernanke's speeches with Trichet's.
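Key words in context is the simplest of these association measures: for each occurrence of a word, show a few words on either side. Below is a minimal Python sketch, applied to a sentence from the January 2008 FOMC statements reproduced in Table 12.1; the function name kwic and the default window of four words are our own choices.

```python
import re


def kwic(text: str, keyword: str, window: int = 4) -> list:
    """Key words in context: each occurrence of keyword with `window` words on each side."""
    tokens = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


statement = ("The Committee expects inflation to moderate in coming quarters, but it "
             "will be necessary to continue to monitor inflation developments carefully.")
for line in kwic(statement, "inflation", window=2):
    print(line)
# Committee expects [inflation] to moderate
# to monitor [inflation] developments carefully
```

Punctuation is dropped in this sketch, so context can run across sentence boundaries; a fuller tool would probably stop the window at the end of a sentence.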

Figure 12.3. Text Mining of Bernanke Speeches: Words of Increasing Frequency

Words are both coincident and leading indicators of sentiment. Any official statement by a key government official is carefully scrutinized for the impact every word will have. It therefore makes sense to focus on word appearances, their frequency, and their proximity to other words. An initial scan of a speech breaks all the words down into frequency counts, and these counts are the first clue to a shift in sentiment between two speeches. First, we look at which words are increasing in frequency. Comparing two Bernanke speeches, we find a hierarchy of words whose frequency increased significantly from the first speech to the second (Figure 12.3). The increase in frequency is not a coincidence; it provides a direct measure of attitudes. In contrast to prices, which are really the results of attitudes, word analysis is a leading indicator. Why does the word economy appear three times in Speech 1 and ten times in Speech 2? It is not a coincidence. Why does creditors appear zero times in Speech 1 but five times in Speech 2? The word balances appears zero times in Speech 1 but four times in Speech 2. The word liquidity appears four times in Speech 1 but ten times in Speech 2.

Decreasing Frequency Comparison

Comparing the same Ben Bernanke speeches with one another, we see a hierarchy of words that have decreased in frequency. For example, in the first speech, the word prices appears twenty-one times, and in the second speech, it appears ten times. The word boom appears nine times in the first speech and zero times in the second speech. Inflation appears ten times in the first speech and five times in the second speech. Oil appears six times in the first speech and two times in the second speech. We can see a decreasing emphasis on economic growth, inflation, and boom-related activity by the second speech.
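The hierarchy of increasing and decreasing words amounts to sorting by the change in count between the two speeches. The Python sketch below uses only the counts quoted in the two paragraphs above, not the full word lists behind Figures 12.3 and 12.4; the function name ranked_changes is hypothetical, and ties are broken alphabetically.

```python
def ranked_changes(speech1: dict, speech2: dict):
    """Split words into increasing and decreasing lists of (word, count1, count2, change),
    each ordered from the largest change to the smallest."""
    rows = []
    for w in set(speech1) | set(speech2):
        a, b = speech1.get(w, 0), speech2.get(w, 0)
        if a != b:
            rows.append((w, a, b, b - a))
    increasing = sorted((r for r in rows if r[3] > 0), key=lambda r: (-r[3], r[0]))
    decreasing = sorted((r for r in rows if r[3] < 0), key=lambda r: (r[3], r[0]))
    return increasing, decreasing


# Only the counts quoted in the text, not the full word lists in Figures 12.3 and 12.4.
speech1 = {"economy": 3, "liquidity": 4, "prices": 21, "boom": 9, "inflation": 10, "oil": 6}
speech2 = {"economy": 10, "creditors": 5, "balances": 4, "liquidity": 10,
           "prices": 10, "inflation": 5, "oil": 2}
up, down = ranked_changes(speech1, speech2)
print([w for w, *_ in up])    # ['economy', 'liquidity', 'creditors', 'balances']
print([w for w, *_ in down])  # ['prices', 'boom', 'inflation', 'oil']
```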

Figure 12.4. Bernanke Speeches: Decreasing Frequency Analysis

Comparison of Ben Bernanke's and Jean-Claude Trichet's Speeches: A Cross-Sectional Analysis

The trader who wants to get an edge on evaluating macro developments between countries, or even between companies, is currently dependent on lagging macroeconomic indicators and surveys of business attitudes. In the near future, the trader will be able to use text mining to compare and contrast the word appearances in two different documents. For example, comparing the minutes of the Bank of England with the minutes of the Federal Reserve could reveal whether there are significant differences in policy and emphasis. The speeches and annual reports of industry leaders can be mined for data that can be converted into leading indicators about their sectors. In a cross-sectional comparison between speeches or documents from two different individuals or countries, what matters is detecting the words that are shared and the words that are unique. Let's take a look at two different speeches by Ben Bernanke and Jean-Claude Trichet.

The most frequently mentioned words of Ben Bernanke (Figure 12.5), versus those of Jean-Claude Trichet (Figure 12.6), in two speeches made not far apart in time, show a clear difference in focus.

When Trichet's remarks were text mined, the words specific, or unique, to Trichet in comparison with Bernanke were of great interest. Trichet uses the word refinancing seventeen times and the word turbulence thirteen times; Bernanke uses neither. Notice that Trichet also uses the unique word longer-term, suggesting a priority on longer-term policy goals. Bernanke doesn't use this word.

Words Shared between Ben Bernanke and Jean-Claude Trichet

We can also examine the words shared by Ben Bernanke and Jean-Claude Trichet. For these words, the difference in frequency is very revealing. Bernanke mentions the word credit twenty-three times, while Trichet mentions it three times. Bernanke mentions the word economy eleven times, and Trichet mentions it once. Bernanke mentions the word financial twenty-eight times, and Trichet uses it fourteen times. Most revealing is that Bernanke refers to markets sixteen times, compared with Trichet's four.
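The cross-sectional comparison can be sketched the same way: words used by only one speaker are the unique words, and for shared words we need some measure of the difference in emphasis. The Python example below uses the counts quoted above for Bernanke and Trichet; the function name cross_section is hypothetical, and ordering shared words by the ratio of the larger count to the smaller one is our own assumption, not the method behind Figures 12.5 and 12.6.

```python
def cross_section(a: dict, b: dict):
    """Split two speakers' word counts into words unique to each and words they share.

    Shared words are ordered by the ratio of the larger count to the smaller one,
    one possible measure of a difference in emphasis.
    """
    only_a = {w: c for w, c in a.items() if c > 0 and b.get(w, 0) == 0}
    only_b = {w: c for w, c in b.items() if c > 0 and a.get(w, 0) == 0}
    shared = {w: (a[w], b[w]) for w in a if a[w] > 0 and b.get(w, 0) > 0}
    by_ratio = sorted(shared, key=lambda w: max(shared[w]) / min(shared[w]), reverse=True)
    return only_a, only_b, shared, by_ratio


bernanke = {"credit": 23, "economy": 11, "financial": 28, "markets": 16}
trichet = {"credit": 3, "economy": 1, "financial": 14, "markets": 4,
           "refinancing": 17, "turbulence": 13}
only_bernanke, only_trichet, shared, by_ratio = cross_section(bernanke, trichet)
print(only_trichet)  # {'refinancing': 17, 'turbulence': 13}
print(by_ratio)      # ['economy', 'credit', 'markets', 'financial']
```

Other choices rank the words differently. Ordering by the raw difference in counts would put credit first (23 versus 3), and because two speeches rarely have the same length, dividing each count by the speech's total word count before comparing is usually a fairer test.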

Text Mining of Federal Open Market Committee Statements

The entire trading world waits to see what the Federal Open Market Committee (FOMC) will say when it releases its statements. These statements are not lengthy, running to approximately sixty-five words, so every word is weighed very carefully. It is common wisdom that these statements move the market in the moments after their release.

Figure 12.5. Text Mining of Bernanke-Specific Words

Figure 12.6. Text Mining of Trichet-Specific Words

On January 22, 2008, the FOMC issued an extraordinary statement announcing a 75-basis-point cut in interest rates. Only eight days later, it cut rates by another 50 basis points. These statements moved the markets. The common method used today is a side-by-side analysis, in which one reads each statement and contrasts the statements' emphasis on key words.

This method is very cumbersome; with text mining, it can be nearly instant. When the statements in Table 12.1 are text mined for word frequency and unique word occurrence, and each key word is tracked over time, we can obtain a deeper understanding of nuances and shifts in sentiment. This frequency charting is a new type of charting that will soon be available to the trader. We text mined every FOMC statement from January 31, 2007, to April 30, 2009. The text-mining software we used is found at http://www.sobolsoft.com.

Table 12.1. Side-by-side display of Federal Open Market Committee statements published by Bloomberg L.P.

Paragraph 1

January 30, 2008: The Federal Open Market Committee decided today to lower its target for the federal funds rate 50 basis points to 3 percent.

January 22, 2008: The Federal Open Market Committee has decided to lower its target for the federal funds rate 75 basis points to 3-1/2 percent.

Paragraph 2

January 30, 2008: Financial markets remain under considerable stress, and credit has tightened further for some businesses and households. Moreover, recent information indicates a deepening of the housing contraction as well as some softening in labor markets.

January 22, 2008: The Committee took this action in view of a weakening of the economic outlook and increasing downside risks to growth. While strains in short-term funding markets have eased somewhat, broader financial market conditions have continued to deteriorate and credit has tightened further for some businesses and households. Moreover, incoming information indicates a deepening of the housing contraction as well as some softening in labor markets.

Paragraph 3

January 30, 2008: The Committee expects inflation to moderate in coming quarters, but it will be necessary to continue to monitor inflation developments carefully.

January 22, 2008: The Committee expects inflation to moderate in coming quarters, but it will be necessary to continue to monitor inflation developments carefully.

Paragraph 4

January 30, 2008: Today's policy action, combined with those taken earlier, should help to promote moderate growth over time and to mitigate the risks to economic activity. However, downside risks to growth remain. The Committee will continue to assess the effects of financial and other developments on economic prospects and will act in a timely manner as needed to address those risks.

January 22, 2008: Appreciable downside risks to growth remain. The Committee will continue to assess the effects of financial and other developments on economic prospects and will act in a timely manner as needed to address those risks.

We selected several key words and tracked them starting with the January 2007 statements. The word economic rises in frequency (Figure 12.7), reflecting growing economic concern. In contrast, the word inflation (Figure 12.8) has swung in emphasis several times and has declined substantially in usage. The word federal, hardly used at all in the earlier statements, spiked suddenly in the later months (Figure 12.9). The word market also varies in emphasis over time (Figure 12.10). The usage of the word growth (Figure 12.11) is interesting because of its precipitous decline after peak use in August 2008.
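Charting a word over time only requires counting it in each dated statement and keeping the counts in date order. The following Python sketch assumes the statements are stored as plain text keyed by ISO dates; the two abbreviated statements in the example are hypothetical, and the function name keyword_series is our own.

```python
import re
from collections import Counter


def keyword_series(statements: dict, keywords: list):
    """Count each keyword in every statement, in date order, ready for charting.

    statements maps ISO dates (YYYY-MM-DD), which sort chronologically as strings,
    to statement text. Matching is on exact lowercase word forms, so "market"
    and "markets" are counted separately.
    """
    dates = sorted(statements)
    counts = [Counter(re.findall(r"[a-z]+", statements[d].lower())) for d in dates]
    series = {k: [c[k.lower()] for c in counts] for k in keywords}
    return dates, series


# Hypothetical, abbreviated statements used only to show the mechanics.
statements = {
    "2008-01-30": "Downside risks to growth remain. Inflation should moderate.",
    "2007-01-31": "Inflation pressures remain. Inflation risks persist; growth is moderate.",
}
dates, series = keyword_series(statements, ["inflation", "growth"])
print(dates)   # ['2007-01-31', '2008-01-30']
print(series)  # {'inflation': [2, 1], 'growth': [1, 1]}
```

Each series can then be plotted against its dates with any charting library, for example matplotlib's plot function. Because matching is on exact word forms, a fuller version would probably normalize plurals so that market and markets are tracked together.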

Key questions arise. Which words are increasing in emphasis? Which words are decreasing? The proposition here is that charting words in their appearances over time is an important emerging form of technical and sentiment analysis.

FOMC Minutes

The minutes of central bank meetings are also released, and they have an impact on the market. Text mining of the minutes enables the trader to detect variations in sentiment. Programs now offer visualizations of the importance of words in a document, using color and shape to show variations in significance; the color red, for example, commonly highlights important words. Eagle Software in France processed text files of the FOMC minutes for December 16, 2008, January 28, 2009, and March 18, 2009, for us and developed a representation method that organizes key words and represents them in cells. (The full table breaking down the words is in Figure 12.12.) The graphic for each date (Figures 12.13 through 12.15) indicates each word's importance through color, cell size, and distance from the center.

The top twenty-five words provide a virtual inventory of the concepts tracked by policy makers. From a trader's perspective, the differences among the sets of minutes are what matter most. Here is one key example: the word decline had a frequency of 17 percent on December 16, which fell to 10 percent on March 18. Meanwhile, the frequency of the word inflation stayed within a 5–7 percent range.
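The chapter does not spell out how the percentage frequencies in the Eagle Software output were calculated. One plausible reading, which we assume in the sketch below, is that each word's count is expressed as a share of the combined count of the top twenty-five words. The function name relative_frequencies is our own, and the counts in the example are hypothetical, chosen to total 100.

```python
from collections import Counter


def relative_frequencies(freqs: Counter, top_n: int = 25) -> dict:
    """Each of the top_n words' counts as a percentage of the top_n words' total."""
    top = freqs.most_common(top_n)
    total = sum(c for _, c in top)
    return {w: 100.0 * c / total for w, c in top} if total else {}


# Hypothetical counts for one set of minutes.
minutes = Counter(credit=40, housing=37, decline=17, inflation=6)
shares = relative_frequencies(minutes)
print(shares["decline"], shares["inflation"])  # 17.0 6.0
```

Computing shares rather than raw counts lets minutes of different lengths be compared directly, which is what a comparison such as 17 percent versus 10 percent requires.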

Figure 12.7. FOMC Frequency: Economic

Figure 12.8. FOMC Frequency: Inflation

Figure 12.9. FOMC Frequency: Federal

Figure 12.10. FOMC Frequency: Market

Figure 12.11. FOMC Frequency: Growth

Figure 12.12. Summary of Key Words Across All Dates. Source: eeagle.com

Figure 12.13. Key Word Visualization, FOMC Minutes, December 16, 2008. Source: eeagle.com

Comparing Trichet Testimony with Bernanke Testimony

On January 10, 2008, Ben Bernanke delivered an important speech on the financial markets. About thirteen months later, on February 20, 2009, Jean-Claude Trichet gave a speech about the ECB's response to the financial crisis. Using text-mining analysis, we can detect the words that were shared and the words that were unique to each policy maker. What is striking about the comparison is the words unique to each speaker. They show that, although the financial crisis was a global one, the mindsets of these policy makers led to important differences in what they emphasized. Trichet's twenty most frequent words reveal that he is clearly Eurocentric in outlook, focusing on the response to the crisis as a process, whereas Bernanke was at that time focused on the housing-market-related causes of the crisis.[20]

Figure 12.14. Key Word Visualization, FOMC Minutes, January 28, 2009. Source: eeagle.com

One can see how words can become leading indicators of market expectations. The future of sentiment analysis will be in the direction of extracting real-time information from key statements of policy makers. Real-time word charts showing word frequency, co-occurrences of words, and differences among statements by key policy makers will soon become an everyday tool for the trader. What is important and exciting to realize is that when text mining enables real-time word analysis, words themselves will become units of sentiment and a new class of indicators and signals will arise. That time is not far away.
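Co-occurrence counts, one of the ingredients of the real-time word charts described above, record how often two words appear close together. The minimal Python sketch below counts unordered word pairs within a sliding window of tokens; the window size, the stopword handling, and the function name cooccurrences are assumptions for illustration, and the sample text is taken from the January 2008 FOMC statements in Table 12.1.

```python
import re
from collections import Counter


def cooccurrences(text: str, window: int = 5, stopwords=frozenset()) -> Counter:
    """Count unordered pairs of distinct words that fall within any span of
    `window` consecutive tokens (stopwords are removed first)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    pairs = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                pairs[tuple(sorted((w, v)))] += 1
    return pairs


text = "Appreciable downside risks to growth remain. However, downside risks to growth remain."
pairs = cooccurrences(text, window=2, stopwords=frozenset({"to", "however", "appreciable"}))
print(pairs[("downside", "risks")])  # 2
```

Sorting each pair alphabetically means "risks downside" and "downside risks" are counted as the same association; a tracker built on this would compare pair counts across statements in the same way the earlier sketches compare single-word counts.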

Text mining as a part of technical analysis of price action is here to stay.

Figure 12.15. Key Word Visualization, FOMC Minutes, March 18, 2009. Source: eeagle.com



[11] Subhash R. Lele and Joan T. Richtsmeier, An Invariant Approach to Statistical Analysis of Shapes. Boca Raton, FL: Chapman & Hall, 2001.

[12] http://palstrat.unigraz.at/methods%20in%20ostracodology/Contr GeomMorphom_vol13Berr(080708).

[13] Nii Ayi Armah and Norman R. Swanson, "Seeing Inside the Black Box: Using Diffusion Index Methodology to Construct Factor Proxies in Large Scale Macroeconomic Time Series Environments." Federal Reserve Bank of Philadelphia Working Paper 08–19, July 2008, retrieved from http://www.philadelphiafed.org/research-and-data/publications/working-papers/2008/wp08-19.pdf.

[14] Ibid.

[15] Uta Hellinger, Event and Sentiment Detection in Financial Markets, AIFB, Universität Karlsruhe, Germany.

[16] G. Pui Cheong Fung, J. Xu Yu, and Wai Lam, "Stock Prediction: Integrating Text Mining Approach Using Real-Time News," in Proceedings, IEEE International Conference on Computational Intelligence for Financial Engineering, Hong Kong, 2003, 395–402.

[17] Refet S. Gurkaynak, Brian P. Sack, and Eric T. Swanson, "Do Actions Speak Louder Than Words? The Response of Asset Prices to Monetary Policy Actions and Statements," FEDS Working Paper No. 2004-66, November 2004. Available at SSRN: http://ssrn.com/abstract=633281.

[18] http://www.federalreserve.gov/newsevents/speech/bernanke20080110a.htm

[19] http://www.ecb.int/press/key/date/2009/html/sp090418.en.html.

[20] http://www.ecb.int/press/key/date/2009/html/sp090220.en.html; http://www.federalreserve.gov/newsevents/speech/bernanke20080110a.htm.
