In Chapter 7, Processing Massive Datasets with Parallel Streams – The Map and Reduce Model, we introduced streams, a new feature of Java 8. A stream is a sequence of elements that can be processed sequentially or in parallel. In this chapter, you will learn more about working with streams through the following topics:
The collect() method
In Chapter 7, Processing Massive Datasets with Parallel Streams – The Map and Reduce Model, we introduced streams. Let's recall their most important characteristics:
A stream is formed by the following three main elements:
The Stream API provides different terminal operations, but two of them stand out for their flexibility and power. In Chapter 7, Processing Massive Datasets with Parallel Streams – The Map and Reduce Model, you learned how to use the reduce() method, and in this chapter, you will learn how to use the collect() method. Let's introduce this method.
The collect() method allows you to transform and group the elements of the stream, generating a new data structure that holds the final results of the stream. You can use up to three different data types: an input data type (the type of the elements that come from the stream), an intermediate data type used to store the elements while the collect() method is running, and an output data type returned by the collect() method.
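There are two different versions of the collect() method. The first version accepts the following three functional parameters:
Supplier: a function that creates an object of the intermediate data type. With a parallel stream, it may be called several times and must return a fresh object each time.
Accumulator: a function that incorporates an input element into an intermediate object.
Combiner: a function that merges the second intermediate object into the first one.
This version of the collect() method works with two different data types: the input data type of the elements that come from the stream, and the intermediate data type, which is used both to store the intermediate elements and to return the final result.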
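As a minimal sketch of the three-parameter version, the following example collects the words longer than three characters into an ArrayList. The class name, the method name, and the sample words are hypothetical and serve only as illustration; the ArrayList acts both as the intermediate data type and as the final result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreeArgCollectExample {

    // Hypothetical helper: keeps the words longer than three characters.
    static List<String> longWords(List<String> words) {
        return words.parallelStream()
                .filter(w -> w.length() > 3)
                .collect(
                        ArrayList::new,      // supplier: creates a fresh intermediate list
                        ArrayList::add,      // accumulator: adds one element to a list
                        ArrayList::addAll);  // combiner: merges the second list into the first
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "stream", "api", "collect");
        System.out.println(longWords(words));
    }
}
```

Because the source list is ordered, the result keeps the encounter order even though the stream is parallel; the combiner is only invoked when the work has actually been split.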
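The second version of the collect() method accepts an object that implements the Collector interface. You can implement this interface yourself, but it's easier to use the Collector.of() static method. The arguments of this method are as follows:
Supplier: a function that creates an object of the intermediate data type.
Accumulator: a function that incorporates an input element into an intermediate object.
Combiner: a function that merges two intermediate objects and returns the combined result.
Finisher: a function that transforms the intermediate object into the final result of the output data type.
Characteristics: an optional list of Collector.Characteristics values that describe the behavior of the collector.
There is actually a slight difference between the two versions. In the three-parameter collect() method, the combiner is a BiConsumer, and it must merge the second intermediate result into the first one. In Collector.of(), the combiner is a BinaryOperator and must return the combined result. Therefore, it is free to merge the second into the first, to merge the first into the second, or to create a new intermediate result. There is another version of the of() method that accepts the same arguments except the finisher; in this case, no finishing transformation is performed.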
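The following sketch shows one way Collector.of() could be used with three distinct data types. The averageLength() collector is a hypothetical example, not part of the Java API: its input type is String, its intermediate type is a two-element long array holding the total length and the element count, and its output type is Double.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class CollectorOfExample {

    // Hypothetical collector that computes the average length of the input strings.
    static Collector<String, long[], Double> averageLength() {
        return Collector.of(
                // supplier: acc[0] holds the total length, acc[1] the element count
                () -> new long[2],
                // accumulator: adds one word to the intermediate result
                (acc, word) -> { acc[0] += word.length(); acc[1]++; },
                // combiner: a BinaryOperator, so it must return the merged result
                (left, right) -> { left[0] += right[0]; left[1] += right[1]; return left; },
                // finisher: converts the intermediate array into the final Double
                acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "collect");
        System.out.println(words.parallelStream().collect(averageLength()));
    }
}
```

Here the combiner merges the right array into the left one and returns it, but it could equally return a new array, which is the freedom the BinaryOperator signature provides.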
Java provides you with some predefined collectors in the Collectors factory class. You can get those collectors using one of its static methods. Some of those methods are:
averagingDouble(), averagingInt(), and averagingLong(): These return a collector that calculates the arithmetic mean of the values produced by applying a double-, int-, or long-valued function to the input elements.
groupingBy(): This returns a collector that groups the elements of a stream by an attribute of its objects, generating a map where the keys are the values of the selected attribute and the values are lists of the objects that have that value.
groupingByConcurrent(): This is similar to the previous one except for two important differences. The first is that it may work faster in parallel mode but slower in sequential mode than the groupingBy() method. The second and most important difference is that groupingByConcurrent() is an unordered collector: the items in the lists are not guaranteed to be in the same order as in the stream. The groupingBy() collector, on the other hand, guarantees the ordering.
joining(): This returns a Collector that concatenates the input elements into a string.
partitioningBy(): This returns a Collector that partitions the input elements based on the results of a predicate.
summarizingDouble(), summarizingInt(), and summarizingLong(): These return a Collector that calculates summary statistics of the input elements.
toMap(): This returns a Collector that transforms the input elements into a map based on two mapping functions.
toConcurrentMap(): This is similar to the previous one, but works in a concurrent way. Without a custom merger, toConcurrentMap() is simply faster for parallel streams. As with groupingByConcurrent(), this is an unordered collector, whereas toMap() uses the encounter order to make the conversion.
toList(): This returns a Collector that stores the input elements in a list.
toCollection(): This method allows you to accumulate the input elements into a new Collection (TreeSet, LinkedHashSet, and so on) in encounter order. The method receives as a parameter an implementation of the Supplier interface that creates the collection.
maxBy() and minBy(): These return a Collector that produces the maximal and minimal element, respectively, according to the comparator passed as a parameter.
toSet(): This returns a Collector that stores the input elements in a set.
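To illustrate a few of these predefined collectors, here is a short sketch applied to a small list of words. The class name, the method names, and the sample data are hypothetical; only the Collectors methods themselves come from the Java API.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PredefinedCollectorsExample {

    // Hypothetical sample data used by every method below.
    static final List<String> WORDS =
            Arrays.asList("map", "reduce", "stream", "api", "collect");

    // groupingBy(): word length -> list of the words with that length
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // partitioningBy(): true -> words longer than three characters, false -> the rest
    static Map<Boolean, List<String>> shortAndLong() {
        return WORDS.stream().collect(Collectors.partitioningBy(w -> w.length() > 3));
    }

    // joining(): concatenates the words, separated by a comma and a space
    static String joined() {
        return WORDS.stream().collect(Collectors.joining(", "));
    }

    // summarizingInt(): count, sum, min, max, and average of the word lengths
    static IntSummaryStatistics lengthStats() {
        return WORDS.stream().collect(Collectors.summarizingInt(String::length));
    }

    // toMap(): the first function produces the key, the second the value
    static Map<String, Integer> lengths() {
        return WORDS.stream().collect(Collectors.toMap(w -> w, String::length));
    }

    public static void main(String[] args) {
        System.out.println(byLength());
        System.out.println(shortAndLong());
        System.out.println(joined());
        System.out.println(lengthStats());
        System.out.println(lengths());
    }
}
```

Note that the two-argument toMap() throws an IllegalStateException when two elements produce the same key, so this sketch relies on the sample words being distinct.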