Enterprise Data Science

So far, we have discussed a range of topics in data mining and machine learning. Most of the examples were designed so that anyone with a standard computer could run them and complete the exercises. In real-world settings, datasets are typically much larger than those encountered in home use.

Traditionally, organizations have relied on well-known database technologies such as SQL Server, Oracle, and others for data warehousing and data management. The advent of NoSQL and Hadoop-based solutions significantly changed this model of operation. Although companies were at first reluctant, the appeal of these tools became too large to ignore, and today most, if not all, large organizations use one or more of these non-traditional solutions for their enterprise data requirements.

Furthermore, the advent of cloud computing has transformed most businesses, and in-house data centers are rapidly being replaced by cloud-based infrastructure. The primary market leaders in the cloud space are Amazon (Amazon Web Services), Microsoft (Azure), and, to a lesser extent, Google (Google Cloud Platform, which includes Google Compute Engine).

Data warehousing, data science, and machine learning workloads are now delivered primarily on such platforms.

In this chapter, we will look at the technical platforms that are prevalent in the corporate/enterprise market, along with their strengths, use cases, and potential pitfalls. We will also complete a tutorial that uses an AWS trial account to launch new instances on demand.

We will cover the following topics in this chapter:

  • Enterprise data science overview
  • Enterprise data mining
  • Enterprise AI and machine learning
  • Enterprise infrastructure
  • Other considerations, such as data strategy, governance, and tool selection
  • Amazon Web Services tutorial