Interactive, exploratory analysis using Python and Pixiedust

Now that our ETL process is in place, let's use a more lightweight, Python-based programming environment for some exploratory data analysis to get an idea of what the data looks like. We'll use a visualization and charting library called Pixiedust here.

The main advantage is that you can pass DataFrame objects to it directly, regardless of their size, and Pixiedust will take care of downsampling where necessary. It can create charts with a single line of code, whereas other libraries such as matplotlib need far more code to obtain similar charts. And the good news is that it is open source under the Apache License 2.0 and backed by IBM Watson Data Lab developers. More on Pixiedust can be found here: https://github.com/ibm-watson-data-lab/pixiedust.

We'll now implement this as a Python notebook executing the following steps:

  1. Load the DataFrame from the ObjectStore.
  2. Create an interactive chart using Pixiedust.

Let's go over every step in detail:

  1. First, we load the DataFrame. Loading data as a DataFrame from the ObjectStore is straightforward. Here is how you go about it:
  2. Using Pixiedust is just as straightforward; it takes only another two lines of Python code:
  3. By default, you are presented with a table view containing the first rows of the dataset. Note that, regardless of the size of the DataFrame, we can simply visualize it; Pixiedust uses the Apache Spark DataFrame API to pull only the data it needs, which avoids unnecessary operations:
  4. Now let's create a graph by clicking on the respective symbol:
  5. We have specified that the ts field, which contains the timestamp, resides on the x-axis, and that the vacc and hacc fields, which contain the vibration data, are plotted against the timestamp on the y-axis.
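As a rough sketch of the two code steps in this walkthrough, the snippet below loads a DataFrame and hands it to Pixiedust. Several things here are assumptions rather than details taken from the original listing: the helper name load_dataframe is hypothetical, the data is assumed to have been stored as Parquet by the ETL step, and the path is only a placeholder. The Pixiedust part (import pixiedust followed by display(df)) is shown as comments because it only works inside a notebook.

```python
def load_dataframe(spark, path):
    """Read the ETL output into a Spark DataFrame.

    Assumption: the ETL step wrote Parquet. If it used another format,
    replace this call with spark.read.csv, spark.read.json, and so on.
    """
    return spark.read.parquet(path)


# Notebook usage (requires Pixiedust; `spark` is the notebook's SparkSession,
# and the path below is a placeholder, not the real ObjectStore location):
#
#   import pixiedust
#   df = load_dataframe(spark, "<path to the ETL output>")
#   display(df)   # opens the table view; charts are chosen from its menu
```

Keeping the load step in a small function makes it easy to point the same notebook at a different location or format without touching the visualization cell.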

The result looks like this:

Again, note that we didn't do any sampling ourselves in order to visualize the data. It doesn't matter whether the DataFrame contains one MB, one TB, or even one PB, since Pixiedust uses the DataFrame API to obtain the required samples.
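To give a feel for what sampling through the DataFrame API can look like, here is a minimal sketch. It is not Pixiedust's actual implementation, which isn't shown in this section; the function names, the max_rows default, and the strategy itself are assumptions chosen for illustration. It relies only on the standard Spark DataFrame methods count() and sample().

```python
def sampling_fraction(total_rows, max_rows):
    """Fraction of rows to keep so that roughly max_rows remain."""
    if total_rows <= 0:
        return 1.0
    return min(1.0, max_rows / total_rows)


def downsample(df, max_rows=1000, seed=42):
    """Return df unchanged if it is small, otherwise a random sample.

    Hypothetical helper: Spark's sample() is approximate, so the result
    contains about max_rows rows, not exactly max_rows.
    """
    fraction = sampling_fraction(df.count(), max_rows)
    if fraction >= 1.0:
        return df
    return df.sample(withReplacement=False, fraction=fraction, seed=seed)
```

Because both count() and sample() are executed by Spark, only the reduced result ever needs to reach the notebook, which is why the size of the original DataFrame matters so little for plotting.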

We can clearly see that the vibrations increase as time progresses and reach their maximum at the end of the recordings, which is when the bearing breaks.
