Actions on key/value pairs

In this section, we'll be looking at the actions on key/value pairs.

We will cover the following topics:

  • Examining actions on key/value pairs
  • Using collect()
  • Examining the output for the key/value RDD

In the first section of this chapter, we covered the transformations that are available on key/value pairs and saw that they differ somewhat from those on plain RDDs. Actions, by contrast, keep the same method names; only the shape of the result differs slightly.

Therefore, we'll use collect() and examine the output of this action on key/value pairs.

First, we will create our array of transactions, which we will then turn into an RDD keyed by userId:

 val keysWithValuesList =
   Array(
     UserTransaction("A", 100),
     UserTransaction("B", 4),
     UserTransaction("A", 100001),
     UserTransaction("B", 10),
     UserTransaction("C", 10)
   )

The first action that comes to mind is collect(). collect() gathers every element of the RDD into a local array on the driver, so the result is a plain collection of pairs rather than the RDD that keyBy produces.
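The steps described so far can be put together in a minimal sketch. It assumes that UserTransaction is a case class with userId and amount fields (the field names and the Int type are assumptions, as is the local SparkSession setup); treat it as an illustration rather than the exact code behind the test:

```scala
import org.apache.spark.sql.SparkSession

// Assumed shape of the transaction type; field names are illustrative.
case class UserTransaction(userId: String, amount: Int)

object CollectOnKeyedRdd {
  def main(args: Array[String]): Unit = {
    // Local context for experimentation only (assumed setup).
    val sc = SparkSession.builder()
      .master("local[2]")
      .appName("collect-on-keyed-rdd")
      .getOrCreate()
      .sparkContext

    val keysWithValuesList = Array(
      UserTransaction("A", 100),
      UserTransaction("B", 4),
      UserTransaction("A", 100001),
      UserTransaction("B", 10),
      UserTransaction("C", 10)
    )

    // keyBy is a transformation: it yields an RDD[(String, UserTransaction)].
    val keyed = sc.parallelize(keysWithValuesList).keyBy(_.userId)

    // collect() is an action: it returns the pairs to the driver as an Array.
    val res = keyed.collect().toList

    res.foreach(println)
    sc.stop()
  }
}
```

Here res is an ordinary List of (userId, UserTransaction) pairs, which is what the assertion that follows checks.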

Each element of the result is a pair consisting of a key, the userId, and a value, the UserTransaction itself. The following assertion shows that the same key can appear more than once:

 res should contain theSameElementsAs List(
   ("A", UserTransaction("A", 100)),
   ("B", UserTransaction("B", 4)),
   ("A", UserTransaction("A", 100001)),
   ("B", UserTransaction("B", 10)),
   ("C", UserTransaction("C", 10))
 ) // note the duplicated keys

As the preceding code shows, the same key occurs multiple times. For a simple key such as a string, this duplication is not very expensive. However, with a more complex key, it becomes expensive.

When we run this test, it passes. To find the other actions, we will look at the different methods available on the RDD.

If a method returns an RDD, such as collect[U](f: PartialFunction[(String, UserTransaction), U]), it is a transformation rather than an action. This rule holds for key/value pairs just as it does for plain RDDs.
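To make the distinction concrete, the following sketch contrasts the two collect overloads. The UserTransaction fields, the object name, and the local SparkSession setup are assumptions made for illustration:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Assumed shape of the transaction type; field names are illustrative.
case class UserTransaction(userId: String, amount: Int)

object TransformationVsAction {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder()
      .master("local[2]")
      .appName("transformation-vs-action")
      .getOrCreate()
      .sparkContext

    val keyed: RDD[(String, UserTransaction)] = sc
      .parallelize(Seq(
        UserTransaction("A", 100),
        UserTransaction("B", 4),
        UserTransaction("A", 100001)))
      .keyBy(_.userId)

    // collect with a partial function filters and maps in one step.
    // It returns an RDD, so it is a transformation and nothing runs yet.
    val amountsForA: RDD[Int] = keyed.collect { case ("A", tx) => tx.amount }

    // collect() without arguments returns an Array, so it is an action
    // and triggers the computation.
    val local: Array[Int] = amountsForA.collect()

    println(local.sorted.mkString(", ")) // prints "100, 100001"
    sc.stop()
  }
}
```

The return type is the quickest way to tell the two apart: RDD[Int] for the partial-function overload, Array[Int] for the no-argument one.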

collect() without arguments does not return an RDD but an array, so it is an action. count returns a Long, so it is also an action, and countByKey, which returns a Map, is an action too. reduce is an action as well, because it returns a single value, but reduceByKey returns an RDD and is therefore a transformation. This is the big difference between reduce and reduceByKey.
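The return types mentioned above can be checked with a small sketch. To keep it short, it uses plain (userId, amount) pairs instead of UserTransaction values; that simplification, the object name, and the local SparkSession setup are assumptions for illustration:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ActionsOnPairs {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder()
      .master("local[2]")
      .appName("actions-on-pairs")
      .getOrCreate()
      .sparkContext

    // Simplified (userId, amount) pairs mirroring the transactions above.
    val amounts: RDD[(String, Int)] = sc.parallelize(Seq(
      ("A", 100), ("B", 4), ("A", 100001), ("B", 10), ("C", 10)))

    // Actions: each returns a local value to the driver.
    val total: Long = amounts.count()           // 5
    val perKey = amounts.countByKey()           // A -> 2, B -> 2, C -> 1
    val sum: Int = amounts.values.reduce(_ + _) // 100125

    // Transformation: returns an RDD and is evaluated lazily.
    val sumsByKey: RDD[(String, Int)] = amounts.reduceByKey(_ + _)

    // An action is still needed to bring the per-key sums to the driver.
    val localSums = sumsByKey.collectAsMap()    // A -> 100101, B -> 14, C -> 10

    println(s"count=$total, perKey=$perKey, sum=$sum, sumsByKey=$localSums")
    sc.stop()
  }
}
```

Note that reduceByKey only describes the computation; nothing is executed until an action such as collectAsMap() is called on its result.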

Everything behaves as it does for a plain RDD: the actions are the same, and the differences lie only in the transformations.

In the next section, we will be looking at the available partitioners on key/value data.
