The Analytics Toolkit

Several platforms are in use today for large-scale data analytics. Broadly, they fall into two groups: platforms used primarily for data mining, such as analyzing large datasets with NoSQL platforms, and platforms used for data science, that is, machine learning and predictive analytics. Often, a solution has both characteristics: a robust underlying platform for storing and managing data, and solutions built on top of it that provide additional data science capabilities.

In this chapter, we will show you how to install and configure your Analytics Toolkit, a collection of software that we'll use throughout the remaining chapters:

  • Components of the Analytics Toolkit
  • System recommendations
    • Installing on a laptop or workstation
    • Installing on the cloud
  • Installing Hadoop
    • Hadoop distributions
    • Cloudera Distribution of Hadoop (CDH)
  • Installing Spark
  • Installing R and Python
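Once the components listed above are installed, a quick sanity check is to confirm that each one is reachable from the shell. The following is a minimal sketch, not part of the toolkit itself: `check_toolkit` and `TOOLKIT_COMMANDS` are hypothetical names, and the executable names (`hadoop`, `spark-submit`, `R`, `python3`) are assumptions about a typical installation. Actual names and PATH setup depend on your distribution and install method.

```python
import shutil

# Hypothetical mapping from toolkit component to the command-line executable
# a typical installation places on the PATH. These names are assumptions and
# may differ depending on the distribution and how it was installed.
TOOLKIT_COMMANDS = {
    "Hadoop": "hadoop",
    "Spark": "spark-submit",
    "R": "R",
    "Python": "python3",
}


def check_toolkit(commands=TOOLKIT_COMMANDS):
    """Map each component to its resolved executable path, or None if not on PATH."""
    # shutil.which returns the full path of the executable, or None if not found.
    return {name: shutil.which(cmd) for name, cmd in commands.items()}


if __name__ == "__main__":
    for name, path in check_toolkit().items():
        status = path if path else "NOT FOUND on PATH"
        print(f"{name:8s} {status}")
```

A component reported as "NOT FOUND on PATH" may still be installed; it may simply need its `bin` directory added to the PATH environment variable.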