Sometimes you have to address bad data when the sensors in a few devices are affected. You may also have to cleanse the data and filter out sensitive data that cannot be stored in certain governance zones.
This chapter helps you understand how StreamSets Data Collector can simplify data movement.
Regardless of which type of data the sensors and devices generate, the backend architecture is usually the same. Such an architecture has the following characteristics:
- A publish/subscribe mechanism or an MQTT message broker such as Kafka, RabbitMQ, or Solace Systems handles the incoming data (a minimal publisher sketch follows this list).
- A StreamSets Data Collector pipeline routes, cleans, and enriches the data.
- A Hadoop cluster analyzes and processes the data.
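To make the first characteristic concrete, here is a minimal sketch of a device publishing a sensor reading to an MQTT broker using the paho-mqtt client library. The broker address, topic name, and payload fields are illustrative assumptions, not values from this chapter.

```python
# Minimal sketch: a device publishing one temperature reading to an MQTT broker.
# Broker address, topic, and payload fields are illustrative assumptions.
import json
import time

import paho.mqtt.client as mqtt  # third-party MQTT client library

BROKER_HOST = "broker.example.com"            # hypothetical broker address
TOPIC = "sensors/device-42/temperature"       # hypothetical topic

client = mqtt.Client()
client.connect(BROKER_HOST, 1883)

reading = {
    "device_id": "device-42",
    "timestamp": int(time.time()),
    "temperature_c": 21.7,
}

# Publish the reading as JSON; a downstream Data Collector pipeline
# (or any other subscriber) would consume it from the broker.
client.publish(TOPIC, json.dumps(reading))
client.disconnect()
```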
StreamSets Data Collector offers a drag-and-drop user interface for designing, testing, and operating data flow pipelines. The system is built for continuous processing and accepts data from several streaming sources such as Apache Kafka, RabbitMQ, and Cloudera Kafka. Built-in transformation processors let you implement sanitization methods such as merging, masking, hashing, splitting, lookups, and parsing, and the list of processors keeps growing. If you want to apply your own custom logic, you can use the Jython, JavaScript, or Groovy processors. Lastly, you can use an API to build Java-based processor stages.
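To give a feel for the scripting processors, here is a rough sketch of sanitization logic written in the style of a Jython evaluator script. The bindings shown (records, output) and the field names are assumptions for illustration; the exact scripting environment varies between Data Collector versions.

```python
# Rough sketch of custom sanitization logic in the style of a Jython evaluator
# script. The bindings `records` and `output`, and the field names, are
# assumptions and may differ across Data Collector versions.
import hashlib

for record in records:
    value = record.value

    # Hash a device identifier so downstream systems never see the raw value
    # (the field name 'device_id' is hypothetical).
    if 'device_id' in value:
        value['device_id'] = hashlib.sha256(
            value['device_id'].encode('utf-8')).hexdigest()

    # Mask a sensitive field entirely (field name 'owner_email' is hypothetical).
    if 'owner_email' in value:
        value['owner_email'] = '****'

    output.write(record)
```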
The StreamSets Data Collector pipeline executes each transformation in memory and delivers data in order, with either "at least once" or "at most once" delivery semantics. The IDE supports DevOps practices and helps you build, test, and run pipelines, so you can turn your streaming internet of things data into a dataset that is ready for consumption. This data can then be used for visualization or analysis.
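The difference between the two delivery guarantees is easiest to see from the consumer side. The sketch below, using the kafka-python client, is not part of Data Collector; the topic, broker address, and process_record() helper are illustrative assumptions. It shows that committing the read position after processing gives at-least-once behavior, while committing before processing would give at-most-once.

```python
# Illustration of at-least-once vs. at-most-once consumption with kafka-python.
# Topic, broker address, and process_record() are illustrative assumptions.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'sensor-readings',                           # hypothetical topic
    bootstrap_servers='kafka.example.com:9092',  # hypothetical broker
    group_id='iot-pipeline',
    enable_auto_commit=False,                    # we control offset commits
)

def process_record(raw_bytes):
    """Placeholder for routing, cleaning, or enriching a record."""
    pass

for message in consumer:
    process_record(message.value)  # handle the record first...
    consumer.commit()              # ...then commit: at-least-once delivery.
    # Committing BEFORE process_record() would instead give at-most-once:
    # a crash after the commit but before processing loses that record,
    # whereas the order above may process a record twice after a crash.
```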
During execution, you get high runtime visibility into your data flows, including error rates, processing time, and throughput for all the pipeline stages. You can also create threshold-based rules and alerts to handle scenarios where the processing rate slows down or where you have to deal with anomalous data values.
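In Data Collector itself, such rules are configured in the user interface rather than in code. The snippet below is only a plain-Python illustration of the underlying idea; the 5% threshold and the counts are arbitrary values chosen for the example.

```python
# Plain-Python illustration of a threshold-based alert on error rate.
# The threshold and counts are arbitrary illustrative values.
def check_error_rate(error_count, total_count, threshold=0.05):
    """Return an alert message if the error rate exceeds the threshold."""
    if total_count == 0:
        return None
    rate = error_count / float(total_count)
    if rate > threshold:
        return "ALERT: error rate %.1f%% exceeds %.1f%%" % (rate * 100,
                                                            threshold * 100)
    return None

# Example: 37 failed records out of 500 processed in the last window.
print(check_error_rate(37, 500))
```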
Problems with IoT Streaming Ingestion
Among the common problems you find in a large-scale internet of things deployment are data quality issues caused by aging devices. There is also the dilemma of multiple device versions distributed across the installed base. You also have to consider the need to enrich the data before it is sent to the data store. Lastly, you have to plan how the system will handle many sensor streams.
Managing Bad Data
Because IoT involves thousands of devices, there will come a time when you find a device with poor calibration or one that has become defective. The records these devices produce have to be handled before they reach the data store.
StreamSets Data Collector can discover the issue and present insights into the errors along with the exact error record. It can also display a stack trace for the failure condition. All of this is handled without affecting the primary pipeline. The error records can be stored on disk or routed to a secondary pipeline connected to Elasticsearch or Kafka for remediation.
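As a rough illustration of routing a bad reading to the error stream, here is another Jython-style script sketch. The bindings (records, output, error), the field name, and the plausible temperature range are assumptions for illustration, not the chapter's own code.

```python
# Sketch of flagging suspect sensor readings, Jython evaluator style.
# The bindings `records`, `output`, `error`, the field name, and the
# temperature range are assumptions for illustration only.
for record in records:
    try:
        temperature = record.value.get('temperature_c')

        # A miscalibrated or defective sensor often reports impossible values;
        # send such records to the error stream for later remediation.
        if temperature is None or not (-40.0 <= temperature <= 125.0):
            error.write(record, 'implausible temperature reading')
        else:
            output.write(record)
    except Exception as e:
        # Any unexpected failure also goes to the error stream with its reason,
        # without stopping the primary pipeline.
        error.write(record, str(e))
```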