Example 3 – PageRank

The PageRank algorithm provides a ranking value for each of the vertices in a graph. It makes the assumption that the vertices that are connected to the most edges are the most important.

Search engines use PageRank to provide ordering for page display during a web search as can be seen from the following code:

val tolerance = 0.0001
val ranking = graph.pageRank(tolerance).vertices
val rankByPerson = vertices.join(ranking).map {
case (id, ( (person,age) , rank )) => (rank, id, person)
}

The example code creates a tolerance value and calls the graph pageRank method using it. The vertices are then ranked into a new value ranking. In order to make the ranking more meaningful, the ranking values are joined with the original vertices RDD. The rankByPerson value then contains the rank, id, and person name.

The PageRank result held in rankByPerson is then printed record by record using a case statement to identify the record contents and a format statement to print the contents. We did this because we wanted to define the format of the rank value, which can vary:

rankByPerson.collect().foreach {
case (rank, id, person) =>
println ( f"Rank $rank%1.2f id $id person $person")
}

The output from the application is then shown as follows; as expected, Mike and Sarah have the highest rank as they have the most relationships:

Rank 0.15 id 4 person Jim
Rank 0.15 id 6 person Flo
Rank 1.62 id 2 person Sarah
Rank 1.82 id 1 person Mike
Rank 1.13 id 3 person John
Rank 1.13 id 5 person Kate
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.138.178