Running the examples in Eclipse

Now that everything is set up, let's run the examples. We can simply run the Run.scala class as a Scala application, as illustrated in the following screenshot:

This application will connect to the MQTT message broker, which is part of the IBM Watson IoT Platform running in the cloud. This is the message hub that the test data generator publishes data to, and we simply subscribe to this data:
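To make the connection concrete, here is a minimal sketch of how such a subscription could look using the Eclipse Paho MQTT client. The broker URL, credential placeholders, and topic filter are illustrative assumptions, not the exact values used by Run.scala:

```scala
import org.eclipse.paho.client.mqttv3.{IMqttDeliveryToken, MqttCallback, MqttClient, MqttConnectOptions, MqttMessage}

object MqttSubscribeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical values -- substitute the organization ID, API key, and
    // token of your own IBM Watson IoT Platform instance.
    val brokerUrl = "tcp://<org-id>.messaging.internetofthings.ibmcloud.com:1883"
    val client = new MqttClient(brokerUrl, MqttClient.generateClientId())

    client.setCallback(new MqttCallback {
      override def connectionLost(cause: Throwable): Unit =
        println(s"Connection lost: $cause")
      override def messageArrived(topic: String, message: MqttMessage): Unit =
        println(s"Received: ${new String(message.getPayload)}")
      override def deliveryComplete(token: IMqttDeliveryToken): Unit = ()
    })

    val options = new MqttConnectOptions()
    options.setUserName("<api-key>")              // placeholder credentials
    options.setPassword("<api-token>".toCharArray)
    client.connect(options)

    // Subscribe to all device events (Watson IoT topic scheme).
    client.subscribe("iot-2/type/+/id/+/evt/+/fmt/+")
  }
}
```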

Please ignore any warnings that Vfs.Dir cannot be found. These are only warnings and don't affect the behavior of the application.

Now it's time to start the test data generator in the cloud running on Node-RED. We'll open up the browser again and click on the reset button in order to send another 30 seconds' worth of data to the message broker, where we can pick it up:

Now the application will receive the subscribed data from the message hub and fill a tumbling, count-based window. We covered sliding, tumbling, count-based, and time-based windows in Chapter 6, Structured Streaming.
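As a refresher, here is a minimal sketch (not the book's actual implementation) of a count-based tumbling window: records accumulate until the window holds a fixed number of elements, the full batch is emitted, and the buffer is cleared so that consecutive windows never overlap:

```scala
import scala.collection.mutable.ArrayBuffer

// Count-based tumbling window: collect a fixed number of records, hand the
// full batch to a callback, then start over. Size and type are illustrative.
class TumblingCountWindow[T](size: Int)(onFull: Seq[T] => Unit) {
  private val buffer = ArrayBuffer.empty[T]

  def add(record: T): Unit = {
    buffer += record
    if (buffer.size == size) {
      onFull(buffer.toList) // emit the complete window...
      buffer.clear()        // ...and reset for the next one (no overlap)
    }
  }
}

// Usage: every 100 records trigger one analysis step.
val window = new TumblingCountWindow[Double](100) { batch =>
  println(s"window full: ${batch.size} records")
}
```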

Once this window is full, the data is sent to the neural network, where it gets analyzed:

We can see that we started at iteration 0 with an initial reconstruction error of 392314.67211754626. This is due to the random initialization of the neural network's weight parameters and will therefore change on each run.

An autoencoder provides two things. First, through a neural bottleneck, it provides a non-linear, lower-dimensional representation of your data. Second, it tries to reconstruct the input signal at the output after passing it through the bottleneck, and therefore it doesn't learn noise and irrelevant data. Since an autoencoder measures the reconstruction error on the signal it was trained on, we can use it as an anomaly detector: data that has been seen before will yield a lower reconstruction error. More on the topic can be found at http://deeplearning.net/tutorial/dA.html#da and http://dl.acm.org/citation.cfm?id=2689747.
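To make this concrete, the following is a minimal sketch of how such an autoencoder could be configured with Deeplearning4j. The layer sizes, seed, and hyperparameters are illustrative assumptions rather than the exact configuration used in this example:

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.lossfunctions.LossFunctions

// Illustrative sizes: 3 input dimensions (e.g., the x, y, z axes of an
// accelerometer) squeezed through a 2-neuron bottleneck and reconstructed.
val conf = new NeuralNetConfiguration.Builder()
  .seed(12345) // a fixed seed makes the initial weights reproducible
  .list()
  .layer(0, new DenseLayer.Builder().nIn(3).nOut(2)
    .activation(Activation.TANH).build())
  .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
    .nIn(2).nOut(3).activation(Activation.IDENTITY).build())
  .build()

val net = new MultiLayerNetwork(conf)
net.init()

// The training target equals the input: the network learns to reproduce
// the signal, and the loss serves as the reconstruction error, e.g.:
// net.fit(new DataSet(features, features))
// val reconstructionError = net.score(new DataSet(features, features))
```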

This is demonstrated as follows:

So, we end up with 372.6741075529085 as the reconstruction error, since the neural network has adapted to the inherent hidden patterns in this signal. Let's run it again by clicking on the reset button of the test data generator once more:

Now we end up with a value of 77.8737141122287. This value will decrease further as long as we keep showing the normal signal to the neural network, but it will eventually converge to a minimum.

So, now, let's change the nature of the signal by clicking on broken in the test data generator:

Then, again, we click on reset in order to send the next 30 seconds' worth of data to the neural network:

If we now check on the reconstruction error, we'll see something interesting:

Now we end up with a reconstruction error of 11091.125671441947, which is significantly higher than 77.8737141122287 and clearly shows how anomalies can be detected in any time-series signal without further knowledge of the underlying domain.
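In practice, turning the reconstruction error into an alert can be as simple as a threshold check. The following sketch is illustrative; the cutoff value is an assumption and would normally be calibrated against the errors observed on known-healthy data:

```scala
// Illustrative thresholding: errors well above the level seen on healthy
// data (here ~78) are flagged as anomalies. The factor of 10 is an
// assumption, not a value from this example.
val healthyBaseline = 77.8737141122287
val threshold       = healthyBaseline * 10

def isAnomaly(reconstructionError: Double): Boolean =
  reconstructionError > threshold

println(isAnomaly(11091.125671441947)) // true  -> the broken signal
println(isAnomaly(372.6741075529085))  // false -> a normal, if early, run
```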

So, the only remaining thing is to push this application to Apache Spark, because, as you might have noticed, this example only runs locally on your machine. But at least it supports OpenBLAS, so it makes use of the SIMD instructions of your CPU, which speeds things up a bit. So let's push it to Apache Spark!
