Let's perform the fourth execution to validate that the output has now reached its final stable state.
$ hadoop jar graph.jarGraphPathgraphout3graphout4
$ hadoop fs -cat /user/hadoop/graphout4/part-r-00000 12,3,40D 21,41D 31,5,61D 41,21D 53,62D 63,52D 76-1P
The output is as expected; since node 7 is not reachable by node 1 or any of its neighbors, it will remain Pending and never be processed further. Consequently, our graph is unchanged as shown in the following figure:
The one thing we did not build into our algorithm was an understanding of a terminating condition; the process is complete if a run does not create any new D or C nodes.
The mechanism we use here is manual, that is, we knew by examination that the graph representation had reached its final stable state. There are ways of doing this programmatically, however. In a later chapter, we will discuss custom job counters; we can, for example, increment a counter every time a new D or C node is created and only reexecute the job if that counter is greater than zero after the run.
The previous algorithm is the first time we have explicitly used the output of one MapReduce job as the input to another. In most cases, the jobs are different; but, as we have seen, there is value in repeatedly applying an algorithm until the output reaches a stable state.
For anyone familiar with graph algorithms, the previous process will seem very alien. This is simply a consequence of the fact that we are implementing a stateful and potentially recursive global and reentrant algorithm as a series of serial stateless MapReduce jobs. The important fact is not in the particular algorithm used; the lesson is in how we can take flat text structures and a series of MapReduce jobs, and from this implement something like graph traversal. You may have problems that at first don't appear to have any way of being implemented in the MapReduce paradigm; consider some of the techniques used here and remember that many algorithms can be modeled in MapReduce. They may look very different from the traditional approach, but the goal is the correct output and not an implementation of a known algorithm.
18.191.195.111