IoT data analytics and machine learning comparison and assessment

Machine learning algorithms have their place in IoT. The typical case is a plethora of streaming data from which some meaningful conclusion must be drawn. A small collection of sensors in a latency-sensitive application may need only a simple rule engine on the edge, as sketched below. Systems with less-aggressive latency demands may instead stream data to a cloud service and apply rules there. When large amounts of data, unstructured data, and real-time analytics come into play, we need to consider machine learning to solve some of the hardest problems.
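
The following is a minimal sketch of such a rule engine running at the edge; the sensor fields and threshold values are hypothetical placeholders, not a prescribed schema.

```python
# A minimal sketch of a threshold-based rule engine that could run on an
# edge gateway. Field names and limits below are hypothetical.

def evaluate_rules(reading: dict) -> list:
    """Return the alerts triggered by a single sensor reading."""
    alerts = []
    if reading.get("temperature_c", 0.0) > 80.0:   # hypothetical limit
        alerts.append("over-temperature")
    if reading.get("vibration_g", 0.0) > 2.5:      # hypothetical limit
        alerts.append("excess-vibration")
    return alerts

# One reading streamed from a local sensor bus.
print(evaluate_rules({"temperature_c": 85.2, "vibration_g": 0.4}))
# -> ['over-temperature']
```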

In this section, we detail some tips and reminders for deploying machine learning analytics, and which use cases may warrant such tools.

Training phase: 

  • For random forests, use bagging techniques to create ensembles (see the sketch after this list).
  • When using a random forest, ensure you maximize the number of decision trees.
  • Watch for overfitting. Overfitting will lead to inaccurate field models. Techniques such as regularization and even injecting noise into the system will help the model generalize.
  • Don't train on the edge.
  • Gradient descent can lead to error: vanishing and exploding gradients make RNNs naturally difficult to train.
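
As a concrete illustration of the first three points, the sketch below trains a random forest offline with scikit-learn: bagging of bootstrap samples is built into the estimator, n_estimators sets the number of decision trees, and max_depth limits overfitting. The data set is synthetic and the parameter values are illustrative assumptions, not recommendations.

```python
# Offline (not on the edge) training of a random forest ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled IoT telemetry.
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(
    n_estimators=500,   # a large ensemble of decision trees
    max_depth=12,       # cap tree depth to limit overfitting
    bootstrap=True,     # bagging: each tree trains on a bootstrap sample
    n_jobs=-1,
    random_state=0,
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```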

Model in field:

  • Update the model with new data sets as they become available. Keep the training set current.
  • Models running on the edge can be reinforced with larger and more comprehensive models in the cloud.
  • Neural network execution can be optimized in the cloud and at the edge with minimal loss of accuracy by considering techniques such as pruning nodes and reducing precision (a sketch follows this list).
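
To illustrate the last point, the sketch below applies two common compression steps to a raw weight matrix: magnitude-based pruning and precision reduction to 16-bit floats. It uses plain NumPy on a random matrix, so the layer size and pruning threshold are illustrative assumptions rather than values tied to any particular network.

```python
# Illustrative model compression for edge deployment: magnitude pruning
# followed by precision reduction, applied to a NumPy weight matrix.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(1024, 1024)).astype(np.float32)  # stand-in layer

# Prune: zero out the 90% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(weights), 0.90)
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

# Reduce precision: keep the surviving weights as 16-bit floats.
compact = pruned.astype(np.float16)

density = np.count_nonzero(compact) / compact.size
print(f"non-zero weights remaining: {density:.1%}")
print(f"bytes: {weights.nbytes} -> {compact.nbytes} (before sparse encoding)")
```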

The following comparison summarizes each model family in terms of its best applications, worse fits and side effects, resource demands, and training characteristics.

Random forests (statistical models)

  Best application:
  • Anomaly detection
  • Systems with thousands of choice points and hundreds of inputs
  • Regression and classification
  • Handles mixed data types
  • Ignores missing values
  • Scales linearly with input

  Worse fit and side effects:
  • Feature extraction
  • Time and sequence analysis

  Resource demands: Low

  Training:
  • Training based on bagging techniques for maximum effectiveness
  • Training is fairly resource-light
  • Mainly supervised

RNN (temporal and sequence-based neural networks)

  Best application:
  • Prediction of an event based on a sequence
  • Streaming data patterns
  • Time-correlated series data
  • Maintains knowledge of past states to predict new states (electrical signals, audio, speech recognition)
  • Unstructured data
  • Input variables may or may not be dependent

  Worse fit and side effects:
  • Image and video analysis
  • Systems requiring thousands of features

  Resource demands:
  • Very high for training
  • High for inference execution

  Training:
  • Training is more cumbersome than CNN backpropagation
  • Very hard to train
  • Supervised

CNN (deep learning)

  Best application:
  • Prediction of an object based on surrounding values
  • Pattern and feature identification
  • 2D image recognition
  • Unstructured data
  • Input variables may or may not be dependent

  Worse fit and side effects:
  • Time-based and sequential predictions
  • Systems requiring thousands of features

  Resource demands:
  • Very high for training (floating-point precision, large training sets, large memory demands)
  • High for inference execution

  Training: Supervised and unsupervised

Bayesian networks (probabilistic models)

  Best application:
  • Noisy and incomplete data sets
  • Streaming data patterns
  • Time-correlated series data
  • Structured data
  • Signal analysis
  • Models developed quickly

  Worse fit and side effects:
  • Assumes all input variables are independent
  • Performs poorly with high orders of data dimensions

  Resource demands: Low

  Training:
  • Little training data needed compared with artificial neural networks
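
To illustrate the last row, the following sketch fits a Gaussian naive Bayes model (the simplest form of Bayesian network, which assumes independent inputs as noted above) on a small synthetic data set with missing values. The imputation strategy and data set are assumptions for illustration; the point is how little data and compute such a probabilistic model needs.

```python
# Illustrative probabilistic model on a noisy, incomplete data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Small synthetic data set with roughly 10% of the readings knocked out.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Impute missing readings with the column mean, then fit the model.
model = make_pipeline(SimpleImputer(strategy="mean"), GaussianNB())
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

Training completes in a fraction of a second on commodity hardware, which reflects the low resource demands noted in the table.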