Introduction to GraphX

As per the Apache Spark documentation: "GraphX is Apache Spark's API for graphs and graph-parallel computation". Graph-based computation has become increasingly popular as technology has advanced. Whether the task is finding the shortest path between two points, matching DNA sequences, or analyzing social media connections, graph computations have become ubiquitous.

A graph consists of vertices and edges, where vertices represent entities (nodes) and edges define the relationships between those entities. Edges can be unidirectional or bidirectional, depending on the relationship being modeled. For example, an edge describing a friendship between two users on Facebook is bidirectional; however, an edge describing a follower relationship between two users on Twitter may or may not be bidirectional, because one can follow another person on Twitter without being followed by that person.
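The difference between the two kinds of edges can be sketched in plain Scala, with no Spark dependency. This is only an illustration: the names Vertex, DirectedEdge, friendship, follows, and isMutual are hypothetical and are not part of any library. The sketch assumes that a bidirectional relationship is stored as a pair of directed edges, one in each direction.

```scala
// Hypothetical types for illustration only; this is not the GraphX API.
final case class Vertex(id: Long, name: String)
final case class DirectedEdge(src: Long, dst: Long, relation: String)

object EdgeDirectionSketch {
  val alice = Vertex(1L, "alice")
  val bob   = Vertex(2L, "bob")
  val carol = Vertex(3L, "carol")

  // A friendship is bidirectional: store one directed edge in each direction.
  def friendship(a: Vertex, b: Vertex): Seq[DirectedEdge] =
    Seq(DirectedEdge(a.id, b.id, "friend"), DirectedEdge(b.id, a.id, "friend"))

  // A follow is one-directional: a single edge from follower to followee.
  def follows(follower: Vertex, followee: Vertex): DirectedEdge =
    DirectedEdge(follower.id, followee.id, "follows")

  // alice and bob are friends; carol follows alice, but alice does not follow back.
  val edges: Seq[DirectedEdge] = friendship(alice, bob) :+ follows(carol, alice)

  // True only when an edge with this relation exists in both directions.
  def isMutual(a: Vertex, b: Vertex, relation: String): Boolean =
    edges.contains(DirectedEdge(a.id, b.id, relation)) &&
      edges.contains(DirectedEdge(b.id, a.id, relation))

  def main(args: Array[String]): Unit = {
    println(isMutual(alice, bob, "friend"))    // prints true
    println(isMutual(carol, alice, "follows")) // prints false
  }
}
```

Storing a bidirectional relationship as two directed edges means a single edge type can cover both the Facebook and the Twitter case.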

The Spark GraphX library runs graph-based computations on top of Spark. It provides a graph abstraction over the Spark RDD called a Property Graph, which is a directed multigraph in which every vertex and edge is associated with a property. The GraphX library provides a set of transformations to run on graphs. It also comes with basic graph algorithms, such as PageRank and Triangle Count, which help to run graph-based analytics.
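As a first taste, the following is a minimal sketch of a property graph, two transformations, and the two algorithms named above. It assumes Spark 2.x or later with the spark-graphx module on the classpath and a local master. The object name, user names, and relation labels are made up for illustration.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PropertyGraphSketch {
  // Vertex property = user name, edge property = relation label (illustrative data).
  def buildGraph(spark: SparkSession): Graph[String, String] = {
    val sc = spark.sparkContext
    val users: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val relations: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 1L, "follows"),
      Edge(3L, 1L, "follows"),
      Edge(3L, 1L, "mentions") // parallel edge: allowed because this is a multigraph
    ))
    Graph(users, relations)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PropertyGraphSketch")
      .master("local[*]") // assumption: running locally
      .getOrCreate()
    val graph = buildGraph(spark)

    // Transformation: rewrite every vertex property.
    val upperCased = graph.mapVertices((_, name) => name.toUpperCase)
    upperCased.vertices.collect().foreach(println)

    // Transformation: keep only the "follows" edges.
    val followsOnly = graph.subgraph(epred = t => t.attr == "follows")
    followsOnly.triplets.collect()
      .foreach(t => println(s"${t.srcAttr} ${t.attr} ${t.dstAttr}"))

    // Built-in algorithm: PageRank, iterated until the change falls below the tolerance.
    val ranks = graph.pageRank(0.001).vertices
    ranks.join(graph.vertices).collect()
      .foreach { case (_, (rank, name)) => println(s"$name $rank") }

    // Built-in algorithm: number of triangles passing through each vertex.
    graph.triangleCount().vertices.collect().foreach(println)

    spark.stop()
  }
}
```

Each transformation returns a new graph rather than modifying the existing one, in keeping with the immutable RDDs underneath.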

Let's start exploring the GraphX library!
