Text Mining

"What then is, generally speaking, the truth of history? A fable agreed upon. As it has been very ingeniously remarked"
- Napoleon Bonaparte

The world is awash with textual data. If you search Google, Bing, or Yahoo! for how much of that data is unstructured, that is, in a textual format, you will find estimates ranging from 80 to 90 percent. The exact number doesn't matter. What matters is that a large proportion of the data is in text format. The implication is that anyone seeking to find insights in that data must develop the capability to process and analyze text.

When I first started out as a market researcher, I used to manually pore through page after page of moderator-led focus group and interview transcripts with the hope of capturing some qualitative insight, an aha moment if you will, and then haggle with fellow team members over whether they had the same insight or not. Then, you would always have that one individual in a project who would swoop in and listen to two interviews—out of the 30 or 40 on the schedule—and, alas, they had their mind made up on what was really happening in the world. Contrast that with the techniques being used now, where an analyst can quickly distill data into meaningful quantitative results, support qualitative understanding, and maybe even sway the swooper.

Over the last few years, I've applied the techniques discussed here to mine physician-patient interactions, understand FDA fears on prescription drug advertising, capture patient concerns about rare cancer, and capture customer maintenance problems, to name just a few. Using R and the methods in this chapter, you too can extract the powerful information in textual data.

The following topics will be covered in this chapter:

  • Text mining framework and methods
  • Data overview
  • Word frequency
  • Sentiment analysis
  • N-grams
  • Topic models
  • Classifying text
  • Additional quantitative analysis