The simplest stateless transformation is a map function where each passing element is treated independent of each other. In the following example, we will read incoming messages from a stream and map it to FlightDetails objects as follows:
JavaReceiverInputDStream<String> inStream= jssc.socketTextStream("localhost", 9999); JavaDStream<FlightDetails> flightDetailsStream = inStream.map(x -> { ObjectMapper mapper = new ObjectMapper(); return mapper.readValue(x, FlightDetails.class); });
Incoming messages can be printed or be stored hdfs as follows:
flightDetailsStream.print(); flightDetailsStream.foreachRDD((VoidFunction<JavaRDD<FlightDetails>>) rdd -> rdd.saveAsTextFile("hdfs://namenode:port/path"));
Some common stateless transformation are shown in the following table:
Transformation |
Description |
flatMap(func) |
Returns a DStream of flattened data; that is, each input item can return more than one output item |
filter(func) |
Filters the record satisfying the function passed as an argument |
join(otherStream, [numTasks]) |
For a DStream having a pair RDDs of (K, V) and (K, X) it returns a new DStream having pair RDDs (k, (V,X)) |
union(otherStream) |
A DStream is retuned having union of element of two different DStreams |
count() |
Returns a new DStream of single element RDDs by counting the number of elements in each RDD of the source DStream |
reduce(func) |
The aggregate function reduces the elements of a DStream while returning a new DStream of single element RDDs |
reduceByKey(func, [numTasks]) |
For a pair DStream RDD, it returns a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function |