Collecting All That Data - Strategies and Techniques

You stare at your drawing of the IoT device hanging on the wall of your cubicle, lost in thought about the ways you might manipulate the data to squeeze out game-changing insights. You can almost hear your colleagues cheer as you accept the Executive Award for best project of the year and the huge bonus that goes with it.

"Ahem!" someone coughs behind you. You almost jump out of your chair.

Your boss has sidled up to your cubicle. He looks both cheerful and amused. You are a little concerned about the amused part.

"You did an excellent job selling them on using the cloud for analytics, and they are fully on board and want to start immediately," he says with a big grin. You perk up, as this is great news.

"You did so well," he continues with a smirk, "that they want to double the data capture rate on the next generation of devices. They figure the cost will not change much if it is routed through cloud infrastructure. And since capacity constraints won't be an issue anymore, they want the monthly reports on a weekly schedule now. And several of the executives in other departments were very excited and want their own people to be able to use the data as well. Good work!"

He walks off chuckling to himself. You are, at the same time, happy about the outcome and bewildered about how to deliver on their expectations. How do you store the data so that others can interact with it? And you are certain there will be a much broader set of questions to answer now, especially once people from other departments are looking at the data as well. Whatever you do has to handle huge scale with lots of flexibility.

This chapter covers strategies for collecting IoT data to enable analytics. There are many options for storing IoT data for analytics. We will review Hadoop, a common technology in the field, along with how to use Amazon S3 as a big data store. The chapter also describes when and why to use Spark for data processing. We will discuss the trade-offs between streaming and batch processing, and we will review how to build flexibility into data processing so that future analytics can be integrated.

The chapter covers the following topics:

  • Designing data processing for analytics
  • Applying big data technology to storage
  • Apache Spark for data processing
  • Handling change