Stateless transformation

The simplest stateless transformation is a map function, which processes each passing element independently of the others. In the following example, we read incoming messages from a stream and map each one to a FlightDetails object:

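// Read newline-delimited text messages from a socket
JavaReceiverInputDStream<String> inStream = jssc.socketTextStream("localhost", 9999);
// Convert each message to a FlightDetails object, independently of all other messages
JavaDStream<FlightDetails> flightDetailsStream = inStream.map(x -> {
               ObjectMapper mapper = new ObjectMapper();
               return mapper.readValue(x, FlightDetails.class);
});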
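
To make the "each element is treated independently" property concrete without a running Spark cluster, here is a minimal plain-Java sketch. It is not the Spark API: it models each micro-batch as a List, and the Flight class, the parse helper, and the "flightId,temperature" line format are hypothetical stand-ins chosen for illustration (the example above parses messages with ObjectMapper instead). The point it shows is that the result for a batch depends only on the elements of that batch.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StatelessMapSketch {

    // Hypothetical stand-in for the FlightDetails class used in the text.
    static final class Flight {
        final String flightId;
        final double temperature;

        Flight(String flightId, double temperature) {
            this.flightId = flightId;
            this.temperature = temperature;
        }

        @Override
        public String toString() {
            return flightId + "@" + temperature;
        }
    }

    // Assumed input format: "flightId,temperature".
    static Flight parse(String line) {
        String[] parts = line.split(",");
        return new Flight(parts[0].trim(), Double.parseDouble(parts[1].trim()));
    }

    // Applies func to every element of one batch; nothing is kept between calls.
    static <T, R> List<R> mapBatch(List<T> batch, Function<T, R> func) {
        return batch.stream().map(func).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> batch1 = List.of("AI101,21.5", "BA202,19.0");
        List<String> batch2 = List.of("AI101,22.0");

        // Each batch is transformed on its own; batch2 knows nothing about batch1.
        System.out.println(mapBatch(batch1, StatelessMapSketch::parse));
        System.out.println(mapBatch(batch2, StatelessMapSketch::parse));
    }
}
```

Because mapBatch holds no fields and reads nothing but its arguments, calling it on the batches in any order gives the same per-batch results, which is what makes the transformation stateless.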

Incoming messages can be printed with print() or stored in HDFS as follows:

flightDetailsStream.foreachRDD((VoidFunction<JavaRDD<FlightDetails>>) rdd -> rdd.saveAsTextFile("hdfs://namenode:port/path")); 

Some common stateless transformations are shown in the following table:




flatMap(func)

Returns a DStream of flattened data; that is, each input item can produce zero or more output items

filter(func)

Returns a new DStream containing only the records for which the function passed as an argument returns true

join(otherStream, [numTasks])

For two DStreams of pair RDDs, (K, V) and (K, X), it returns a new DStream of pair RDDs (K, (V, X))

union(otherStream)

Returns a new DStream containing the union of the elements of two different DStreams

count()

Returns a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream

reduce(func)

The aggregate function reduces the elements of each RDD of the source DStream, returning a new DStream of single-element RDDs

reduceByKey(func, [numTasks])

For a DStream of (K, V) pair RDDs, it returns a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function
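
The per-batch meaning of the transformations in the table can be sketched in plain Java. This is an illustration only, not the Spark API: a batch is modelled as a List, key/value pairs as Map.Entry, and the class and method names (BatchTransformSketch, flatMapWords, filterLongerThan, and so on) are hypothetical. In Spark Streaming, the corresponding operation is applied to every RDD of the DStream in turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BatchTransformSketch {

    // flatMap: each input line contributes several output words.
    static List<String> flatMapWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // filter: keep only the elements for which the predicate is true.
    static List<String> filterLongerThan(List<String> words, int minLength) {
        return words.stream().filter(w -> w.length() > minLength).collect(Collectors.toList());
    }

    // union: all elements of both batches, duplicates included.
    static <T> List<T> union(List<T> left, List<T> right) {
        return Stream.concat(left.stream(), right.stream()).collect(Collectors.toList());
    }

    // count: a single number per batch.
    static long count(List<?> batch) {
        return batch.size();
    }

    // reduce: fold the batch with a binary function; empty for an empty batch.
    static <T> Optional<T> reduce(List<T> batch, BinaryOperator<T> func) {
        return batch.stream().reduce(func);
    }

    // reduceByKey: combine the values that share a key (TreeMap keeps keys sorted).
    static <K extends Comparable<K>, V> Map<K, V> reduceByKey(
            List<Map.Entry<K, V>> pairs, BinaryOperator<V> func) {
        Map<K, V> result = new TreeMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            result.merge(pair.getKey(), pair.getValue(), func);
        }
        return result;
    }

    // join: inner join of (K, V) and (K, X) pairs into (K, (V, X)).
    static <K, V, X> List<Map.Entry<K, Map.Entry<V, X>>> join(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, X>> right) {
        List<Map.Entry<K, Map.Entry<V, X>>> out = new ArrayList<>();
        for (Map.Entry<K, V> l : left) {
            for (Map.Entry<K, X> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    out.add(Map.entry(l.getKey(), Map.entry(l.getValue(), r.getValue())));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = flatMapWords(List.of("to be", "or not to be"));
        System.out.println(words);
        System.out.println(filterLongerThan(words, 2));
        System.out.println(union(List.of(1, 2), List.of(2, 3)));
        System.out.println(count(words));
        System.out.println(reduce(List.of(1, 2, 3), Integer::sum));

        // Hypothetical sample data: delay in minutes and origin airport per flight.
        List<Map.Entry<String, Integer>> delays =
                List.of(Map.entry("AI101", 5), Map.entry("BA202", 7), Map.entry("AI101", 3));
        List<Map.Entry<String, String>> origins = List.of(Map.entry("AI101", "DEL"));
        System.out.println(reduceByKey(delays, Integer::sum));
        System.out.println(join(delays, origins));
    }
}
```

None of these helpers keeps anything between calls, so the result for one batch never depends on an earlier batch. The nested-loop join is written for readability; it is not how a distributed join is executed.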


