Loading and reloading data about users and followers

To find out which user has which name, we need to load the users.txt file. The users.txt file maps each VertexId to a username and the user's own name. We use the following code:

    val users = sc.textFile(getClass.getResource("/pagerank/users.txt").getPath).map { line =>

The following is the users.txt file:

We split each line on the comma: the first field, fields(0), is the integer that becomes the vertex ID, and fields(1) is the username of the vertex, as follows:

      val fields = line.split(",")
      (fields(0).toLong, fields(1))
    }
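
To see what the parsing step does to a single line, here is a minimal sketch in plain Scala, without Spark. The object name ParseUserSketch, the helper parseUser, and the sample line are assumptions made for illustration; the sketch relies only on the id, username, name layout described above.

```scala
object ParseUserSketch {
  // Hypothetical helper (not part of the original test): parses one line of a
  // users.txt-style file into a (vertexId, username) pair.
  def parseUser(line: String): (Long, String) = {
    val fields = line.split(",")
    // fields(0) is the vertex ID, fields(1) is the username;
    // any further fields (such as the user's own name) are ignored here.
    (fields(0).trim.toLong, fields(1).trim)
  }

  def main(args: Array[String]): Unit = {
    // Assumed sample line in the id,username,name layout.
    println(parseUser("1,BarackObama,Barack Obama")) // prints (1,BarackObama)
  }
}
```

The function passed to map in the Spark code has the same shape: it turns each line of text into a pair keyed by the vertex ID, which is what makes the later join possible.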

Next, we join the users with the ranks. The join is keyed on the VertexId and gives us the username and rank of each user. Once we have that, we can sort everything by rank: we take the second element of the tuple and sort in descending order with sortBy((t) => t._2, ascending = false). The user with the most influence then comes first in the resulting list:

    //when
    val rankByUsername = users.join(ranks).map {
      case (_, (username, rank)) => (username, rank)
    }.sortBy((t) => t._2, ascending = false)
      .collect()
      .toList

We then print rankByUsername and assert that the usernames appear in the expected order, as follows:

    println(rankByUsername)
    //then
    rankByUsername.map(_._1) should contain theSameElementsInOrderAs List(
      "BarackObama",
      "ladygaga",
      "odersky",
      "jeresig",
      "matei_zaharia",
      "justinbieber"
    )
    }

    }

If we skip the sortBy call, Spark does not guarantee any ordering of the elements; to get a deterministic order, we need to call sortBy.
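
The join-then-sort logic can be tried out on plain Scala collections before running it on Spark. The following is a minimal sketch under stated assumptions: it uses local Seqs and a Map rather than Spark RDDs, it assumes each ID appears at most once, and the names JoinAndSortSketch and rankByUsername as well as the IDs, usernames, and rank values are made-up illustration data, not the results of the test above.

```scala
object JoinAndSortSketch {
  // Local stand-in for users.join(ranks) followed by a descending sort on rank.
  // Only IDs present in both collections survive, as in an inner join.
  // Assumes each ID appears at most once in ranks.
  def rankByUsername(
      users: Seq[(Long, String)],
      ranks: Seq[(Long, Double)]): List[(String, Double)] = {
    val rankById = ranks.toMap
    users
      .flatMap { case (id, username) => rankById.get(id).map(rank => (username, rank)) }
      .sortBy { case (_, rank) => -rank } // negate to sort from highest to lowest
      .toList
  }

  def main(args: Array[String]): Unit = {
    // Made-up illustration data.
    val users = Seq((1L, "alice"), (2L, "bob"), (3L, "carol"))
    val ranks = Seq((1L, 0.5), (2L, 1.5), (3L, 1.0))
    println(rankByUsername(users, ranks).map(_._1)) // prints List(bob, carol, alice)
  }
}
```

A local Seq keeps its order after sortBy, whereas on an RDD the explicit sortBy(..., ascending = false) is what produces a defined order across partitions.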

After running the code, we get the following output:

When we run this test, we can see whether GraphX PageRank was able to calculate the influence of our users. The output shows BarackObama first with an influence of 1.45, then ladygaga with 1.39, odersky with 1.29, jeresig with 0.99, matei_zaharia with 0.70, and finally justinbieber with 0.15.

As this example shows, we were able to run a complex algorithm with a minimal amount of code.
