Graph Analytics with GraphX

In our interconnected world, graphs are omnipresent. The World Wide Web (WWW) is just one example of a complex structure that we can consider a graph, in which web pages are the entities and the incoming and outgoing links between them are the connections. In Facebook’s social graph, many millions of users form a network, connecting friends around the globe. Many other important structures that we can observe and collect data for today come equipped with a natural graph structure; that is, they can, at a very basic level, be understood as a collection of vertices that are connected to each other in a certain way by what we call edges. Stated in this generality, the observation reflects how ubiquitous graphs are. What makes it valuable is that graphs are well-studied structures and that many algorithms are available that allow us to gain important insights about what these graphs represent.

Spark’s GraphX library is a natural entry point to study graphs at scale. Because GraphX uses RDDs from the Spark core to encode vertices and edges, we can do graph analytics on vast amounts of data with it. In this chapter, you will learn about the following topics:

  • Basic graph properties and important graph operations
  • How GraphX represents property graphs and how to work with them
  • Loading graph data in various ways and generating synthetic graph data to experiment with
  • Computing essential graph properties by using GraphX’s core engine
  • Visualizing graphs with an open source tool called Gephi
  • Implementing efficient graph-parallel algorithms using two of GraphX’s key APIs
  • Using GraphFrames, an extension of DataFrames to graphs, and studying graphs using an elegant query language
  • Running important graph algorithms available in GraphX on a social graph consisting of retweets and on a graph of actors appearing in movies together