Calculating PageRank

In this section, we will load data about users and their followers. Using the graph API and the structure of our data, we will calculate PageRank to rank the users.

First, we need to load the graph from an edge list file using GraphLoader.edgeListFile, as follows:

package com.tomekl007.chapter_7

import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite
import org.scalatest.Matchers._

class PageRankTest extends FunSuite {
  private val sc = SparkSession.builder().master("local[2]").getOrCreate().sparkContext

  test("should calculate page rank using GraphX API") {
    //given
    val graph = GraphLoader.edgeListFile(sc, getClass.getResource("/pagerank/followers.txt").getPath)

We have a followers.txt file; the following screenshot shows its format, which is similar to the file we saw in the Creating the loader component section. Each line holds a pair of vertex IDs:
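As a rough illustration of this edge list format, the following sketch parses a few lines into (source, destination) pairs in plain Scala, without Spark. The sample lines, the EdgeListSketch object, and the parseEdges helper are hypothetical and only meant to show the shape of the data; they are not the actual contents of followers.txt, and this is only an approximation of what GraphLoader.edgeListFile does when it reads a file:

```scala
object EdgeListSketch {
  // Hypothetical sample lines; the real followers.txt contents are not reproduced here.
  val sampleLines: Seq[String] = Seq(
    "# follower followed",
    "2 1",
    "4 1",
    "1 2"
  )

  // Parse whitespace-separated "srcId dstId" pairs,
  // skipping blank lines and lines that start with '#'.
  def parseEdges(lines: Seq[String]): Seq[(Long, Long)] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map { line =>
        val fields = line.split("\\s+")
        require(fields.length >= 2, s"Invalid line: $line")
        (fields(0).toLong, fields(1).toLong)
      }

  def main(args: Array[String]): Unit =
    parseEdges(sampleLines).foreach { case (src, dst) => println(s"$src -> $dst") }
}
```

Each parsed pair corresponds to one directed edge of the graph, which is the structure PageRank works on.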

We can see that each line relates two vertex IDs, that is, it describes one edge of the graph. Having loaded the graph from the followers.txt file, we run PageRank on it and take the resulting vertices, as follows:

    val ranks = graph.pageRank(0.0001).vertices

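PageRank will estimate the influence of each vertex from the relationships between our vertices. The 0.0001 argument is the convergence tolerance, and the resulting vertices pair each vertex ID with its rank.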
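To give an intuition for what the rank values mean, here is a minimal sketch of PageRank as a simple iteration in plain Scala. It is a simplification for illustration, not GraphX's actual implementation. It assumes the update rule rank(v) = resetProb + (1 - resetProb) * sum of rank(u) / outDegree(u) over all edges u -> v, with resetProb defaulting to 0.15 as in GraphX, and it stops once no rank changes by more than the tolerance. The PageRankSketch object, the maxIter safety limit, and the sample edges are hypothetical:

```scala
object PageRankSketch {
  // Simplified iteration; NOT GraphX's actual implementation.
  // maxIter is a hypothetical safety limit added for this sketch.
  def pageRank(
      edges: Seq[(Long, Long)],
      tol: Double,
      resetProb: Double = 0.15,
      maxIter: Int = 1000
  ): Map[Long, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }

    // Every vertex starts with rank 1.0.
    var ranks: Map[Long, Double] = vertices.map(_ -> 1.0).toMap
    var delta = Double.MaxValue
    var iter = 0

    while (delta > tol && iter < maxIter) {
      // Each edge src -> dst passes rank(src) / outDegree(src) to dst.
      val contribs = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }

      val updated = vertices.map { v =>
        v -> (resetProb + (1 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap

      // Largest change of any single rank in this iteration.
      delta = vertices.map(v => math.abs(updated(v) - ranks(v))).max
      ranks = updated
      iter += 1
    }
    ranks
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical follower edges: 2 and 3 both follow 1, and 1 follows 2.
    val edges = Seq((2L, 1L), (3L, 1L), (1L, 2L))
    pageRank(edges, tol = 0.0001).toSeq.sortBy(-_._2).foreach {
      case (id, rank) => println(f"$id%d -> $rank%.4f")
    }
  }
}
```

In this sample, vertex 1 is followed by two other vertices, so it ends up with the highest rank, while vertex 3, which nobody follows, keeps only the reset value.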
