Overview of the five steps

The five essential steps to perform data science are as follows:

  1. Asking an interesting question
  2. Obtaining the data
  3. Exploring the data
  4. Modeling the data
  5. Communicating and visualizing the results

First, let's look at the five steps with reference to the big picture.

Asking an interesting question

This is probably my favorite step. As an entrepreneur, I ask myself (and others) interesting questions every day. I would treat this step as you would treat a brainstorming session. Start writing down questions regardless of whether or not you think the data to answer these questions even exists. The reason for this is twofold. First off, you don't want to start biasing yourself even before searching for data. Secondly, obtaining data might involve searching in both public and private locations and, therefore, might not be very straightforward. You might ask a question and immediately tell yourself "Oh, but I bet there's no data out there that can help me" and cross it off your list. Don't do that! Leave it on your list.

Obtaining the data

Once you have selected the question you want to focus on, it is time to scour the world for the data that might be able to answer that question. As mentioned before, the data can come from a variety of sources, both public and private, so this step can be very creative!

Exploring the data

Once we have the data, we use the lessons learned in Chapter 2, Types of Data, to break down the types of data that we are dealing with. This is a pivotal step in the process. By the end of it, the analyst has generally spent several hours learning about the domain and using code or other tools to manipulate and explore the data, and has a very good sense of what the data might be trying to tell them.
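As a rough illustration of what this first pass might look like, here is a minimal sketch using only the Python standard library. The rows, the column names, and the helper summarize_column are all hypothetical and invented for this example; real exploration would typically use a richer toolset and a much larger dataset.

```python
from collections import Counter

# Hypothetical sample rows; column names and values are invented for illustration.
rows = [
    {"city": "Austin", "rating": "4", "price": "12.50"},
    {"city": "Boston", "rating": "5", "price": "9.75"},
    {"city": "Austin", "rating": "3", "price": "11.00"},
    {"city": "Denver", "rating": "4", "price": ""},
]


def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize_column(rows, name):
    """Count missing values, guess a rough type, and give a small summary."""
    values = [row[name] for row in rows]
    present = [v for v in values if v != ""]
    missing = len(values) - len(present)
    if present and all(is_number(v) for v in present):
        numbers = [float(v) for v in present]
        return {
            "kind": "quantitative",
            "missing": missing,
            "min": min(numbers),
            "max": max(numbers),
            "mean": sum(numbers) / len(numbers),
        }
    # Anything that is not purely numeric is treated as a category here.
    return {
        "kind": "qualitative",
        "missing": missing,
        "top": Counter(present).most_common(1),
    }


for column in rows[0]:
    print(column, summarize_column(rows, column))
```

Note that the type guess is deliberately naive: a column such as rating parses as a number, yet it may be better treated as ordered categories. Settling questions like that is exactly the kind of judgment this step calls for.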

Modeling the data

This step involves the use of statistical and machine learning models. In this step, we are not only fitting and choosing models, but we are also implementing mathematical validation metrics in order to quantify how effective those models are.
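To make the idea of fitting a model and then quantifying it concrete, here is a minimal sketch in plain Python. It fits a straight line by ordinary least squares on part of a small dataset and scores it with mean squared error on the held-out remainder, next to a baseline that always predicts the training mean. The data values, the split, and the function names fit_line and mean_squared_error are assumptions made for this example, not a prescribed method.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measurements, invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2]

# Hold out the last two points so the metric reflects unseen data.
train_x, test_x = xs[:6], xs[6:]
train_y, test_y = ys[:6], ys[6:]

slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]

# Baseline: always predict the mean of the training targets.
baseline = [sum(train_y) / len(train_y)] * len(test_y)

model_mse = mean_squared_error(test_y, predictions)
baseline_mse = mean_squared_error(test_y, baseline)
print("model MSE:", model_mse)
print("baseline MSE:", baseline_mse)
```

Comparing against a simple baseline is one way to tell whether a model is doing useful work. Holding out the final two points is a simplification; a random or cross-validated split is often more appropriate.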

Communicating and visualizing the results

This is arguably the most important step. While it might seem obvious and simple, presenting your results in a digestible format is much more difficult than it seems. We will look at examples of cases in which results were communicated poorly and cases in which they were displayed very well.

In this book, we will focus mainly on steps 3, 4, and 5.

Note

Why are we skipping steps 1 and 2 in this book?

While the first two steps are undoubtedly essential to the process, they generally come before any statistical or programmatic work. Later in this book, we will touch upon the different ways to obtain data; however, to keep the focus on the more scientific aspects of the process, we will begin with exploration right away.
