Chapter 2. Data Processing Using the DataStream API

Real-time analytics is currently a pressing concern, and many domains need to process data as it arrives. Technologies such as Storm and Spark have been on the market for a long time now, and applications derived from the Internet of Things (IoT) need data to be stored, processed, and analyzed in real or near real time. To cater for such needs, Flink provides a streaming data processing API called the DataStream API.

In this chapter, we are going to look at the details of the DataStream API, covering the following topics:

  • Execution environment
  • Data sources
  • Transformations
  • Data sinks
  • Connectors
  • Use case - sensor data analytics

Any Flink program works on a certain defined anatomy, as follows:

  • Obtain an execution environment
  • Load or create the initial data from one or more sources
  • Specify transformations on the data
  • Specify where to put the results of the computations (sinks)
  • Trigger the program execution

We will be looking at each step and at how we can use the DataStream API at each stage; a minimal program skeleton following this anatomy is sketched below.
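The following is a minimal sketch of that anatomy in Java; the socket source, host name, port, and job name are placeholders for illustration, not part of any specific example in this chapter:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class AnatomySketch {
        public static void main(String[] args) throws Exception {
            // 1. Obtain an execution environment
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // 2. Load the initial data from a source
            //    (socket host and port are placeholders)
            DataStream<String> lines = env.socketTextStream("localhost", 9999);

            // 3. Specify transformations on the data
            DataStream<String> upper = lines.map(String::toUpperCase);

            // 4. Specify where to put the results (a sink)
            upper.print();

            // 5. Trigger the program execution
            env.execute("Anatomy sketch");
        }
    }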

Execution environment

In order to start writing a Flink program, we first need to get an existing execution environment or create one. Depending on what you are trying to do, Flink supports:

  • Getting an already existing Flink environment
  • Creating a local environment
  • Creating a remote environment

Typically, you only need to use getExecutionEnvironment(). It will do the right thing based on your context: if you are executing your program inside an IDE, it will start a local execution environment, and if you submit the program as a JAR to a cluster, the Flink cluster manager will execute it in a distributed manner.
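As a minimal sketch, obtaining the environment looks like this (the class name is illustrative):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class EnvironmentSketch {
        public static void main(String[] args) throws Exception {
            // Returns a local environment when run in an IDE, or the
            // cluster environment when the program is submitted as a JAR
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        }
    }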

If you want to create a local or remote environment on your own, you can do so using methods such as createLocalEnvironment() and createRemoteEnvironment(String host, int port, String... jarFiles).
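A short sketch of both calls; the host name, port, parallelism, and JAR path below are placeholders for your own setup:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExplicitEnvironments {
        public static void main(String[] args) {
            // Explicit local environment with a chosen parallelism
            StreamExecutionEnvironment localEnv =
                StreamExecutionEnvironment.createLocalEnvironment(2);

            // Remote environment pointing at a JobManager; the JAR
            // containing your job classes is shipped to the cluster
            StreamExecutionEnvironment remoteEnv =
                StreamExecutionEnvironment.createRemoteEnvironment(
                    "jobmanager-host", 6123, "/path/to/your-job.jar");
        }
    }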
