Streaming

As you may have guessed, the Streaming tab shows information about Spark Streaming jobs. The tab appears only when a streaming job is running in the SparkContext.

We will learn how the Spark Streaming framework works in Chapter 9, Near Real-Time Processing with Spark Streaming. For now, let us run a job that listens on port 10000 and runs a word count on the incoming messages.

First, start a netcat listener on port 10000:

sparkuser@~$ nc -lk 10000

Now, in a second terminal, run the NetworkWordCount example bundled with the Spark distribution:

sparkuser@~$ $SPARK_HOME/bin/run-example streaming.NetworkWordCount localhost 10000
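To make the word count concrete, here is a minimal sketch in plain Python of the computation applied to each batch of lines received from the socket. It does not use Spark at all. The function name count_words is hypothetical, and splitting on single spaces is an assumption made for illustration.

```python
from collections import Counter


def count_words(batch_lines):
    """Count words in one batch of text lines, splitting on single spaces.

    Hypothetical helper for illustration only; not part of Spark.
    """
    counts = Counter()
    for line in batch_lines:
        # Skip empty strings produced by consecutive spaces.
        counts.update(word for word in line.split(" ") if word)
    return dict(counts)


if __name__ == "__main__":
    # Lines as they might be typed into the netcat terminal during one batch.
    batch = ["hello spark", "hello streaming"]
    print(count_words(batch))
```

Each line typed into the netcat terminal becomes part of a batch, and the streaming job prints the counts for every batch it processes.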

When you run the example, the Streaming tab appears in the Spark UI and can be accessed at http://localhost:4040/streaming/.

The Streaming tab provides a lot of information about the streaming job, such as the input rate, the scheduling delay, and the processing time of each batch. This information is very useful for tuning the performance of Spark Streaming jobs, because stream processing must keep up with the rate of incoming data to avoid delays.
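To see why processing time matters, the following sketch models how scheduling delay builds up when batches take longer to process than the batch interval. It is a simplified model, not Spark code: the function name scheduling_delays is hypothetical, and it assumes batches are processed one at a time, in order.

```python
def scheduling_delays(batch_interval_ms, processing_times_ms):
    """Return the scheduling delay of each batch in milliseconds.

    Simplified model (an assumption, not Spark's implementation):
    batch i becomes ready at (i + 1) * batch_interval_ms and starts
    once it is ready and the previous batch has finished.
    """
    delays = []
    free_at = 0  # time at which the previous batch finishes processing
    for i, processing_ms in enumerate(processing_times_ms):
        ready_at = (i + 1) * batch_interval_ms
        start_at = max(ready_at, free_at)
        delays.append(start_at - ready_at)
        free_at = start_at + processing_ms
    return delays


if __name__ == "__main__":
    # Processing is faster than the 1000 ms interval: no backlog forms.
    print(scheduling_delays(1000, [500, 500, 500]))
    # Processing is slower than the interval: the delay grows with every batch.
    print(scheduling_delays(1000, [1500, 1500, 1500]))
```

In this model, a job whose processing time stays below the batch interval never accumulates delay, while one that consistently exceeds it falls further behind with each batch. A steadily rising scheduling delay in the Streaming tab points to the same situation.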

It also lists the batches that have been processed, which is useful when debugging issues.

In this section, we learned about the various capabilities of the Spark UI.

In the next section, we will learn about some of the REST APIs, which are useful for monitoring Spark jobs.
