Actions and transformations

There are 2 types of Spark operations:

  • Transformations
  • Actions

Transformations specify general data manipulation operations such as filtering data, joining data, performing aggregations, sampling data, and so on. Transformations do not return any result when the line containing the transformation operation in the code is executed. Instead, the command, upon execution, supplements Spark's internal DAG with the corresponding operation request. Examples of common transformations include: map, filter, groupBy, union, coalesce, and many others.

Actions, on the other hand, return results. Namely, they execute the series of transformations (if any) that the user may have specified on the corresponding RDD and produce an output. In other words, actions trigger the execution of the steps in the DAG. Common Actions include: reduce, collect, take, aggregate, foreach, and many others.

Note that RDDs are immutable. They cannot be changed; transformations and actions will always produce new RDDs, but never modify existing ones.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.93.0