The following are the key constraints in designing the analytics architecture:
- Scope: The scope is defined by the business viewpoint, which in turn defines the requirements and constraints for the system. The location of the analytics depends on the business use case and value proposition determined in the business viewpoint, and ultimately comes down to where and how the derived analytic results need to be acted upon.
- Response time and reliability: If the response must be fast and reliable, the analytics must be performed locally. If the time horizon is more generous, the analytics can be performed after the data has been collected.
- Bandwidth: When large numbers of sensors generate large amounts of data, transmitting and integrating that data places a burden on the network and other components. Infrastructure costs need to be balanced against the value of the analytic insights that can be achieved.
- Capacity: Although it may be optimal to perform analysis at a particular tier, the tier's capacity constraints, including latency, bandwidth, and computational capacity, may force the selection of another tier to perform the analysis.
- Security: The value of collecting and moving raw data should be balanced against the risks of securing the data as it is stored and transmitted. By performing analytics locally within the domain, summary, redacted, and anonymized data can be shared with other domains, while either discarding the raw data or storing it in a hardened database.
- Volume: Data generated by an IIoT system needs to be stored for the duration of analytics and processing, and possibly longer, so the IIoT system must have sufficient capacity for storing it. Collecting large amounts of high-volume, high-velocity raw data requires a large and expandable storage capacity. Storage costs can be reduced by storing only the derived data and discarding the raw data; archiving older data to cheaper but slower devices can alleviate some costs, and using cloud storage can provide additional cost savings.
- Velocity: Industrial systems typically collect measurements in real time and cyclically. Data-processing speeds need to keep up with high-frequency data collection, such as vibrations in an engine or machine. When collecting data on transient events, accurate timings and the order of occurrence of the measurements are needed to determine causality. When there are low-latency requirements on high-velocity data, it is best to perform analytics close to the source of the measurement.
- Variety: In some IIoT systems, the problem may be not so much velocity or volume as variety. This can occur when there are many types of equipment performing similar functions, with different interfaces, controls, and data measurements; the situation can arise as equipment is purchased over time or through acquisitions. Reliable analytics depends on the ability to interpret both the format (syntax) and the content (semantics) of the data.
- Analytics maturity: The benefit gained from analytics results should not be limited by where the analytics are performed. Measurements, information, and results can provide additional value when aligned with data from the broader system, outside events, and business and operational functions.
- Temporal correlation: When the analytics requirements involve correlating data from multiple devices, sensors, and control states, the correlation is better performed at a lower tier, near the data-collection points, than at a higher tier.
- Provenance: It is often desirable or required to maintain the lineage of data from its source. Once measurements are combined in computations at higher architectural tiers, it is difficult to maintain that lineage. Performing analytics at or close to the source makes it easier to maintain the data’s lineage.
- Compliance: Government regulations may impose restrictions that will impact architectural decisions. Regulations pertaining to security, privacy, and so on can prevent large-scale analytics from being performed at a higher architectural tier and in the cloud.
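As a concrete illustration of the bandwidth, volume, and security points above, the following sketch (hypothetical names, not from the text) shows an edge-tier summarizer: a window of raw, high-velocity samples is collapsed into a derived, lower-volume record at the plant, and per-sensor identifiers are dropped so that only site-level summaries leave the domain while the raw window can be discarded or stored locally.

```python
import statistics

def summarize_window(samples):
    """Collapse one window of raw sensor samples into a derived record.

    Keeps coarse provenance (the site) but drops per-sensor identifiers,
    so the record is safe to share with higher tiers.
    """
    values = [s["value"] for s in samples]
    return {
        "site": samples[0]["site"],   # site-level provenance only
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical raw window: five samples from individual plant sensors.
raw = [{"site": "plant-1", "sensor": f"s{i}", "value": 10.0 + i} for i in range(5)]
summary = summarize_window(raw)
print(summary)  # mean of 10.0..14.0 is 12.0; no "sensor" field survives
```

The derived record is a small, fixed size regardless of how many raw samples the window contained, which is what relieves the network and storage burden described above.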
These combined characteristics determine where the analytic capabilities should be deployed. Most systems take a hybrid approach, with both local and centralized analytics. The main factors in determining the location of the analytic capacity include the maximum acceptable network latency and jitter for events, the analytics’ criticality to the operation, and the cost of transmitting large amounts of data.
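The placement factors listed above can be sketched as a simple selection heuristic. This is a minimal illustration only; the threshold values are assumptions, not figures from the text:

```python
def choose_tier(max_latency_ms, criticality, daily_raw_gb):
    """Pick an analytics tier from latency, criticality, and transfer cost.

    Thresholds are illustrative assumptions, not normative values.
    """
    if max_latency_ms < 100 or criticality == "safety":
        return "plant"        # control-loop speeds demand local analytics
    if daily_raw_gb > 500:
        return "enterprise"   # too costly to ship this much data off-premises
    return "cloud"            # generous time horizon, affordable transfer

print(choose_tier(10, "safety", 1))        # plant
print(choose_tier(5000, "advisory", 900))  # enterprise
print(choose_tier(5000, "advisory", 20))   # cloud
```

A real system would combine such rules per analytic workload, which is how the hybrid local-plus-centralized deployments described above arise.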
The following table (industrial analytics location) lists the considerations for analytics deployment between plant, enterprise (on-premises), and the cloud (hosted):
| Criteria | Plant | Enterprise | Cloud |
| --- | --- | --- | --- |
| **Analysis scope** | | | |
| Single-site optimization | x | x | x |
| Multi-site comparison | | x | x |
| Multi-customer benchmarking | | | x |
| **Results response time** | | | |
| Control loop | x | | |
| Human decision | x | x | |
| Planning horizon | x | x | |
| **Connectivity reliability** | | | |
| Site | x | | |
| Organization | x | x | |
| Global | x | x | x |
| **Connectivity bandwidth** | | | |
| Raw data | x | | |
| Processed results | x | x | |
| Summarized results | x | x | x |
| **Storage and compute capacity** | | | |
| Server | x | x | x |
| Multiple servers | | x | x |
| Data center | | | x |
| **Data security** | | | |
| Secret | x | x | |
| Proprietary | x | x | |
| Shared | x | x | x |
| **Data characteristics** | | | |
| Volume | | | x |
| Velocity | x | | |
| Variety | x | x | x |
| **Analytics maturity** | | | |
| Descriptive | x | x | x |
| Predictive | x | x | x |
| Prescriptive | x | x | x |
| **Event correlation** | | | |
| Sub-seconds | x | | |
| Seconds | x | x | |
| Tens of seconds | x | x | x |
| **Data provenance** | | | |
| Sensor | x | | |
| Asset | x | x | |
| Site | x | x | x |
| **Regulatory compliance** | | | |
| Asset | x | x | x |
| Process | | x | x |
| Industry | | | x |
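One way to use the table programmatically is to encode each criterion as the set of tiers that supports it and intersect those sets for a given deployment. The helper below is hypothetical; the few rows shown are taken directly from the table:

```python
# Criterion → tiers that support it (a subset of the table's rows).
SUPPORTED = {
    "secret": {"plant", "enterprise"},
    "multi-site comparison": {"enterprise", "cloud"},
    "control loop": {"plant"},
    "raw data": {"plant"},
    "data center": {"cloud"},
    "summarized results": {"plant", "enterprise", "cloud"},
}

def feasible_tiers(criteria):
    """Return the tiers that satisfy every requested criterion."""
    tiers = {"plant", "enterprise", "cloud"}
    for criterion in criteria:
        tiers &= SUPPORTED[criterion]
    return tiers

print(feasible_tiers(["secret", "multi-site comparison"]))  # {'enterprise'}
print(feasible_tiers(["control loop"]))                     # {'plant'}
```

For example, secret data that must support multi-site comparison leaves only the enterprise tier, which matches reading the two rows of the table together.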