Stateless transformation

The simplest stateless transformation is a map function, in which each element is processed independently of the others. In the following example, we read incoming messages from a stream and map them to FlightDetails objects:

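// Read newline-delimited text messages from a socket
JavaReceiverInputDStream<String> inStream = jssc.socketTextStream("localhost", 9999);
// Parse each JSON message into a FlightDetails object (ObjectMapper is Jackson's com.fasterxml.jackson.databind.ObjectMapper)
JavaDStream<FlightDetails> flightDetailsStream = inStream.map(x -> {
    ObjectMapper mapper = new ObjectMapper();
    return mapper.readValue(x, FlightDetails.class);
});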
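To make the "independent of the others" property concrete, the following is a minimal plain-Java sketch that models each micro-batch as a List and applies the same function to every batch. It does not use the Spark API. The Flight class, its fields, and the comma-separated record format are hypothetical stand-ins for FlightDetails and its JSON input, chosen only so that the example runs without extra libraries.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java model of a stateless map over micro-batches; not the Spark API.
class StatelessMapSketch {

    // Hypothetical stand-in for FlightDetails; the fields are made up for this sketch.
    static final class Flight {
        final String flightId;
        final double temperature;

        Flight(String flightId, double temperature) {
            this.flightId = flightId;
            this.temperature = temperature;
        }

        @Override
        public String toString() {
            return flightId + "@" + temperature;
        }
    }

    // Parses one "flightId,temperature" line (an assumed format, not the book's JSON).
    static Flight parse(String line) {
        String[] parts = line.split(",");
        return new Flight(parts[0].trim(), Double.parseDouble(parts[1].trim()));
    }

    // Applies fn to every element of one batch. No state survives between calls,
    // so the result for a batch depends only on that batch's own elements.
    static <T, R> List<R> mapBatch(List<T> batch, Function<T, R> fn) {
        return batch.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> batch1 = Arrays.asList("AI101,21.5", "BA202,18.0");
        List<String> batch2 = Arrays.asList("LH303,25.25");

        System.out.println(mapBatch(batch1, StatelessMapSketch::parse)); // [AI101@21.5, BA202@18.0]
        System.out.println(mapBatch(batch2, StatelessMapSketch::parse)); // [LH303@25.25]
    }
}
```

Processing batch2 before batch1, or dropping batch1 altogether, would not change the result for batch2; that is what distinguishes a stateless transformation from a stateful one.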

The incoming messages can be printed or stored in HDFS as follows:

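// Print the first few elements of every batch to the console
flightDetailsStream.print();

// Save each batch under its own directory (suffixed with the batch time) so that batches do not write to the same path
flightDetailsStream.foreachRDD((rdd, time) -> rdd.saveAsTextFile("hdfs://namenode:port/path-" + time.milliseconds()));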

Some common stateless transformations are shown in the following table:

| Transformation | Description |
|---|---|
| flatMap(func) | Returns a new DStream of flattened data; that is, each input item can produce zero or more output items |
| filter(func) | Returns a new DStream containing only the records for which the function passed as an argument returns true |
| join(otherStream, [numTasks]) | For two DStreams of pair RDDs, (K, V) and (K, X), returns a new DStream of pair RDDs (K, (V, X)) |
| union(otherStream) | Returns a new DStream containing the union of the elements of the two DStreams |
| count() | Returns a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream |
| reduce(func) | Returns a new DStream of single-element RDDs by aggregating the elements of each RDD of the source DStream with the given function, which should be associative and commutative |
| reduceByKey(func, [numTasks]) | For a DStream of (K, V) pair RDDs, returns a new DStream of (K, V) pairs in which the values for each key are aggregated using the given reduce function |
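The semantics of these transformations can be sketched in plain Java by treating one micro-batch as a List and applying each operation to it. This is a model of the behaviour only, not the Spark API: the class and method names are hypothetical, the sample data (airline codes, delays, names) is made up, and the fixed String and Integer types are a simplification.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java model of per-batch semantics; every method sees a single batch only.
class BatchOpsSketch {

    // Helper that builds a (key, value) pair.
    static <V> Map.Entry<String, V> pair(String key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // flatMap(func): each input line can produce zero or more output words.
    static List<String> flatMapWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // filter(func): keep only the words for which the predicate is true.
    static List<String> filterLongerThan(List<String> words, int length) {
        return words.stream().filter(w -> w.length() > length).collect(Collectors.toList());
    }

    // union(otherStream): all elements of both batches, duplicates included.
    static List<String> union(List<String> left, List<String> right) {
        List<String> all = new ArrayList<>(left);
        all.addAll(right);
        return all;
    }

    // count(): one number per batch.
    static long count(List<String> batch) {
        return batch.size();
    }

    // reduce(func): fold the batch into one value; empty for an empty batch.
    static Optional<Integer> reduceSum(List<Integer> batch) {
        return batch.stream().reduce(Integer::sum);
    }

    // reduceByKey(func): aggregate the values of each key within the batch.
    static Map<String, Integer> reduceByKeySum(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, Integer::sum, TreeMap::new));
    }

    // join(otherStream): (K, V) and (K, X) with equal keys become (K, (V, X)).
    static List<String> join(List<Map.Entry<String, Integer>> left,
                             List<Map.Entry<String, String>> right) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, Integer> l : left) {
            for (Map.Entry<String, String> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    joined.add("(" + l.getKey() + ", (" + l.getValue() + ", " + r.getValue() + "))");
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> words = flatMapWords(Arrays.asList("to be or", "not to be"));
        System.out.println(words);                                  // [to, be, or, not, to, be]
        System.out.println(filterLongerThan(words, 2));             // [not]
        System.out.println(count(words));                           // 6
        System.out.println(reduceSum(Arrays.asList(1, 2, 3, 4)));   // Optional[10]

        List<Map.Entry<String, Integer>> delays =
                Arrays.asList(pair("AI", 5), pair("BA", 7), pair("AI", 3));
        List<Map.Entry<String, String>> names =
                Arrays.asList(pair("AI", "Air India"), pair("LH", "Lufthansa"));
        System.out.println(reduceByKeySum(delays));                 // {AI=8, BA=7}
        System.out.println(join(delays, names));                    // [(AI, (5, Air India)), (AI, (3, Air India))]
    }
}
```

Every method takes the current batch (or the current batches of two streams) as its only input, which is why these operations count as stateless: nothing computed for an earlier batch influences the result for a later one.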
