KDD, Data Mining, and Text Mining

"Certainty of death. Small chance of success. What are we waiting for?"
- Gimli, son of Gloin

Aside from being a buzzword, data mining is the analysis step of the Knowledge Discovery in Databases (KDD) process, which is concerned with uncovering hidden patterns in huge, often unstructured, datasets. The term data mining does not refer to a single method but to a broad collection of methods. These range from linear regression and clustering techniques all the way to visualizations, random forests, and artificial intelligence methods.

You may have already noticed that it is not easy to tell data mining apart from data science. I mostly think of data mining as something that data scientists do to big data (another buzzword). That said, practically any benefit that comes from data science can be traced, in one way or another, to data mining. Thriving data mining applications can be found in finance, healthcare, retail, marketing, astronomy, and the web industry.

Although data mining is not restricted to a single method or data type, this chapter is dedicated to text data. Mining text data is known as text mining. We will use a text mining framework to go through tweets and see what the R community is talking about. Along the way, we will look for insights that could help us improve our skills.

In this chapter, we will look at the following topics:

  • How to get yourself a dwarf name
  • The steps in the KDD process
  • R tools to retrieve text from the web
  • Some nuts and bolts of using the rtweet package
  • Analytical tools for analyzing text
  • How to visualize the results

How can you get from a complex dataset to an insightful report? Data mining is used to discover and extract patterns from datasets. Adopt a dwarf nickname and grab a suitable data pickaxe, because we are totally doing this: we are mining data!
