Directed acyclic graphs

In computer science and mathematics, a directed acyclic graph (DAG) is a set of nodes (also known as vertices) connected by edges (or lines) that are unidirectional and never form a cycle. Given Node A and Node B, an edge can run A → B or B → A, but not both. More generally, no sequence of edges leads from a node back to itself, so there is no circular relationship among any group of nodes, not just between a single pair.
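The acyclic property can be checked mechanically. The following is a minimal sketch in plain Python (not Spark code), assuming Python 3.9 or later for the standard-library graphlib module. The function name is_acyclic and the sample edge lists are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(edges):
    """Return True if the directed (src, dst) edges contain no directed cycle."""
    # Map every node to the set of nodes that point at it (its predecessors).
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(src, set())
        predecessors.setdefault(dst, set()).add(src)
    try:
        # A topological order exists only when the graph has no cycle;
        # static_order() raises CycleError otherwise.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


# Hypothetical sample graphs.
chain = [("A", "B"), ("B", "C")]                    # A -> B -> C
two_cycle = [("A", "B"), ("B", "A")]                # A -> B and B -> A
three_cycle = [("A", "B"), ("B", "C"), ("C", "A")]  # cycle through three nodes
```

The third sample shows why the definition must cover more than pairs: no two nodes point at each other, yet the graph is still cyclic.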

Spark uses the DAG concept to build an internal workflow that delineates the different stages of processing in a Spark job. Conceptually, this is akin to creating a virtual flowchart of the steps needed to obtain a certain output. For instance, if the required output is a count of the words in a document, the intermediary map, shuffle, and reduce steps can be represented as a series of operations that lead to the final result. By maintaining such a map, Spark is able to keep track of the dependencies involved in the operation. More specifically, RDDs are the nodes, and transformations, which are discussed later in this section, are the edges of the DAG.
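To make the nodes-and-edges picture concrete, the word count example can be modeled as a small graph. This is a sketch in plain Python, not Spark's internal representation or API. The dataset names (lines, words, pairs, counts) and the helper ancestors are hypothetical. The edge labels flatMap, map, and reduceByKey are the names of real RDD transformations, used here only as labels. It assumes Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Each edge is (parent dataset, child dataset, transformation label).
# Dataset names are hypothetical; the labels name Spark RDD transformations.
edges = [
    ("lines", "words", "flatMap"),       # split each line into words
    ("words", "pairs", "map"),           # turn each word into (word, 1)
    ("pairs", "counts", "reduceByKey"),  # sum the 1s per word (needs a shuffle)
]

# Nodes are datasets; record the direct parents of every node.
parents = {}
for parent, child, _label in edges:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

# A valid processing order: every dataset appears after the ones it depends on.
order = list(TopologicalSorter(parents).static_order())


def ancestors(node):
    """Return every dataset that `node` depends on, directly or indirectly."""
    seen = set()
    stack = [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

Here order lists the datasets from input to result, and ancestors("counts") returns everything the final result depends on, which is the kind of dependency information the paragraph above describes.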
