Basic data analytics in IoT

Data analytics aims to find events, usually in a streaming series of data. There are multiple types of events and roles that a real-time streaming analytics engine must provide. The following is a superset of analytic functions based on the work of Srinath Perera and Sriskandarajah Suhothayan, "Solution patterns for real-time streaming analytics," in Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems (DEBS '15), ACM, New York, NY, USA, 247-255:

  • Preprocessing: Filter out events of little interest; denature data, extract features, and segment the stream; transform data into a more suitable form (although data lakes prefer no immediate transformation); and add attributes to data, such as tags (data lakes do need tags).
  • Alerting: Inspect data; if it exceeds some boundary condition, then raise an alert. The simplest example is if the temperature rises above a set limit on a sensor.
  • Windowing: A sliding window of events is created, and rules are evaluated only over that window. Windows can be based on time (for example, one hour) or length (2,000 sensor samples). They can be sliding windows (for example, inspect only the 10 latest sensor events and produce a result whenever a new event arrives), or batch windows (for example, produce an event only at the end of the window). Windowing is good for rules and for counting events. One could look for the number of temperature spikes in the last hour and infer that a defect is likely to occur on some machine.
  • Joins: Combine multiple data streams into a new single stream. A logistics scenario illustrates this. Say a shipping company tracks its shipments with asset-tracking beacons, and its fleet of trucks, planes, and facilities stream geolocation information as well. There are initially two streams of data: one for the package, and one for a given truck. When a truck picks up a package, those two streams become joined.
  • Errors: Millions of sensors will generate missing data, garbled data, and data that is out of sequence. This is important in the IoT case with multiple streams of asynchronous and independent data. For example, data may be lost in a cellular WAN if a vehicle enters an underground parking garage. This analytic pattern correlates data within its own stream to attempt to find these error conditions.
  • Databases: The analytics package will need to interact with some data warehouse. For example, when Bluetooth asset tags are used to detect whether an item has been stolen or lost, a database of missing tag IDs would be referenced by all the gateways streaming tag IDs into the system.
  • Temporal events and patterns: This is most often used with the window pattern mentioned previously. Here, a series or sequence of events constitutes a pattern of interest. One can think of this as a state machine. Say we are monitoring the health of a machine based on temperature, vibrations, and noise. A temporal event sequence could be as follows:  
    1. Detect if the temperature exceeds 100° C
    2. Then detect if vibrations exceed 1 m/s
    3. Next, detect if the machine is emitting noise at 110 dB
    4. If those events take place in that sequence, only then raise an alert
  • Tracking: Tracking involves knowing when or where something exists, when an event occurred, or when something doesn't exist where it should. A very basic example is geolocation of service trucks, where a company may need to know exactly where a truck is and when it was last there. This has applications in agriculture, human movement, patient tracking, high-value assets, luggage systems, smart city garbage collection, snow removal, and so on.
  • Trends: This pattern is particularly useful for predictive maintenance. Here, a rule is designed to detect an event based on a time-correlated series of data. This is similar to temporal events, but differs in that temporal events have no notion of time, only sequence order. This model uses time as a dimension in the process. A running history of time-correlated data could be used to find patterns, as in a livestock-monitoring example from farming. Here, a head of cattle may wear a sensor that detects the animal's movement and temperature. An event sequence can be constructed to see whether the animal moved in the last day. If there was no movement, the animal may be sick or dead.
  • Batch queries: Batch processing is typically more comprehensive and deeper than real-time stream processing. A well-designed streaming platform can fork analysis and call into a batch processing system. This will be discussed later in the form of Lambda processing.
  • Deep analytics pathway: In real-time processing, we make a decision on the fly that some event has occurred. Whether or not that event really should signal an alarm may require further processing that will not operate in real time. This works because such events should be rare; the system can pass them down to a detailed analysis engine while continuing to process new events streaming in real time. An example is a video surveillance system. Say a smart city issues an amber alert for a lost child. The smart city can issue a simple feature extraction and classification model to the real-time streaming engines. The model would detect license plates of a vehicle the child may be in, or potentially a logo on the child's shirt. The first step would be to image-capture license plate numbers of vehicles or logos on pedestrians, and send them to the cloud. The analysis package may identify a plate of interest or a logo out of millions of image samples as a first-level pass. That positively identified frame (and surrounding video frames) would be passed to a deeper analytics package that resolves the image with deeper object recognition algorithms (image fusion, super-resolution, machine learning) to eliminate false positives.
  • Models and training: The first-level model described previously may, in fact, be an inference engine for a machine learning system. These machine learning tools are built on trained models that can be used for in-flight, real-time analysis.
  • Signaling: It is often the case that an action needs to propagate back to the edge and sensor. A typical case is factory automation and safety. For example, if the temperature rises beyond a certain limit on a machine, log the event, but also send a signal back to the edge device to slow the machine down. The system must be able to be bidirectional in communication.
  • Control: Finally, we need a way to control these analysis tools. Whether that is starting, stopping, reporting, logging, or debugging, facilities need to be in place to manage this system.
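The windowing and alerting patterns above can be combined in a few lines of code. The following is a minimal Python sketch, not a production streaming engine; the `SlidingWindowAlert` class name and the thresholds (a 100° C spike limit, a one-hour window, an allowed spike count) are illustrative assumptions:

```python
from collections import deque

class SlidingWindowAlert:
    """Count temperature spikes inside a sliding time window and
    raise an alert when the count crosses a threshold."""

    def __init__(self, spike_limit_c=100.0, window_s=3600, max_spikes=5):
        self.spike_limit_c = spike_limit_c  # boundary condition for a spike
        self.window_s = window_s            # window length in seconds
        self.max_spikes = max_spikes        # spikes tolerated per window
        self.spikes = deque()               # timestamps of spikes in window

    def on_event(self, timestamp, temperature_c):
        # Evict spikes that have fallen out of the window
        while self.spikes and timestamp - self.spikes[0] > self.window_s:
            self.spikes.popleft()
        if temperature_c > self.spike_limit_c:
            self.spikes.append(timestamp)
        # Alerting: too many spikes in the window suggests a defect
        return len(self.spikes) > self.max_spikes

win = SlidingWindowAlert(max_spikes=2)
readings = [(0, 99.0), (10, 101.0), (20, 102.0), (30, 103.0)]
print([win.on_event(ts, c) for ts, c in readings])
# → [False, False, False, True]; only the third spike exceeds max_spikes=2
```

Note that the rule fires only on the aggregate (spike count), not on any single reading, which is what distinguishes windowing from simple per-event alerting.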
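The temporal event sequence in steps 1 to 4 above is essentially a small state machine. A minimal sketch follows, assuming hypothetical event kinds `temp_c`, `vibration`, and `noise_db`; out-of-order events are simply ignored rather than resetting the machine, which is one of several reasonable design choices:

```python
class TemporalSequence:
    """Alert only when events occur in order: temperature > 100 °C,
    then vibration > 1 m/s, then noise > 110 dB."""

    STEPS = (
        ("temp_c",    lambda v: v > 100.0),
        ("vibration", lambda v: v > 1.0),
        ("noise_db",  lambda v: v > 110.0),
    )

    def __init__(self):
        self.state = 0  # index of the step we are waiting for

    def on_event(self, kind, value):
        name, predicate = self.STEPS[self.state]
        if kind == name and predicate(value):
            self.state += 1
            if self.state == len(self.STEPS):
                self.state = 0
                return True  # full in-order sequence observed → alert
        return False

seq = TemporalSequence()
events = [("noise_db", 120), ("temp_c", 105), ("vibration", 1.5), ("noise_db", 115)]
print([seq.on_event(k, v) for k, v in events])
# → [False, False, False, True]; the early noise event is out of order
```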
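The livestock trend rule above can be expressed as a simple check over time-correlated samples. This is a minimal sketch; the hourly sampling interval, the 24-sample window, and the `min_mean` movement threshold are assumptions chosen for illustration:

```python
import statistics

def movement_trend_alert(samples, window=24, min_mean=0.05):
    """Trend rule over time-correlated data: alert if mean movement
    across the last `window` hourly samples falls below `min_mean`."""
    recent = [movement for _, movement in samples[-window:]]
    if len(recent) < window:
        return False  # not enough history yet to judge a trend
    return statistics.mean(recent) < min_mean

# A day of near-zero movement readings: the animal may be sick or dead
idle = [(hour, 0.0) for hour in range(24)]
active = [(hour, 0.3) for hour in range(24)]
print(movement_trend_alert(idle), movement_trend_alert(active))
# → True False
```

Unlike the temporal sequence pattern, the rule here depends on how long the condition persists, not on the order of discrete events.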

Now, we will concentrate on how to build a cloud-based analytics architecture that must ingest unpredictable and unstoppable streams of data, and deliver interpretations of that data as close to real-time as possible.
