Chapter 8. Graphs

A graph is a type of data structure capable of handling networks. Graphs are widely used across various domains such as the following:

  • Transportation: To find the shortest routes to travel between two places
  • Communication-signaling networks: To optimize the network of inter-connected computers and systems
  • Understanding relationships: To build relationship trees across families or organizations
  • Hydrology: To perform flow regime simulation analysis of various fluids

This current chapter will build fundamentals for graphs and the following topics will be covered:

  • Terminology and representations
  • Graph implementations
  • Graph traversals
    • Depth-first search
    • Breadth-first search
    • Topological sort
  • Shortest-paths problems
    • Single-source shortest paths
  • Minimum-cost spanning trees
    • Prim's algorithm
    • Kruskal's algorithm

Terminology and representations

A graph (G) is a network of vertices (V) interconnected using a set of edges (E). Let |V| represent the count of vertices and |E| represent the count of edges. The value of |E| lies in the range of 0 to |V|2 - |V|. Based on the directional edges, the graphs are classified as directed or undirected. In directed graphs, the edges are directed from one vertex towards the other, whereas in undirected graphs, each vertex has an equal probability of being directionally connected with the others. An undirected graph is said to be connected if all the vertices are connected with at least one edge. If the vertices are indexed, then it is said to be a labeled graph, and if the edges are associated with some value (cost or weights), then it is said to be a weighted graph. Adjacent vertices (P and Q) connected by an edge are termed as neighbors (P, Q), and the connecting edge is termed as an incident. Figure 8.1 represents undirected, directed, and labeled (with weights) graphs.

Terminology and representations

Figure 8.1: Representation of an undirected graph, directed graph, and labeled (directed) graph (from left to right)

Consider a graph with n vertices. A sequence of interconnected vertices (v1, v2, v... vn) is termed as a path, and the path is said to be simple if all the vertices of the path are unique. The length of the path is the number of edges, which is one less than the number of vertices (n-1). In case the vertices of a given path are not unique and the length of the path is greater than two, then the path becomes a cycle. A cycle is simple if all the intermediate vertices are unique and only the first and last vertices are same. An undirected graph with no cycles is called an acyclic graph, and a directed graph with no cycles is called a directed acyclic graph (DAG).

A graph can be further split into multiple subgraphs (S) as shown in Figure 8.2.

Terminology and representations

Figure 8.2: A graph split into three sub-graphs. Even a single vertex forms a graph

A free tree is a form of a connected undirected graph with no cycles in simple form. It has |V|-1 number of edges.

Consider a graph (G) with n number of vertices, which can be represented in two forms. These forms can be directly used to perform mathematical computations:

  • Adjacency matrix: The adjacency matrix is an n×n array, with rows representing the from vertices and columns representing to vertices. The numbers in the matrix can either denote the presence of a directed connection between two vertices, or can indicate the weight (or distance) associated with the edge connecting the two vertices. As each position in the adjacency matrix can take in a numeric value, it requires one bit of memory. Thus, the asymptote of total memory requirement is θ(|V|2).
  • Adjacency list: As the name suggests, the adjacency list is an array of linked lists of length n. Each position in the array stores the pointers connecting linked lists of its adjacent connected vertices, and each linked list stores the value of its connecting edge. Unlike an adjacency matrix, the memory requirement of adjacency lists depends on both, the number of vertices (|V|) and the number of edges (|E|). The array takes into account vertex memory requirement, and the list takes in the account of edge memory requirement. Thus, the asymptote of total memory requirement is θ(|V| + |E|).

The aforementioned two forms of representation can be used to transform and store both directed and undirected graphs. Also, based on the number of interconnecting edges, the graphs can be termed as sparse or dense. A graph with a relatively less number of edges is termed as sparse, whereas a graph with a relatively large number of edges is termed as dense. Also, in case of all interconnected vertices, the graph is termed as complete (special case of a dense graph).

Assume two vertices, P and Q. If P and Q belong to a directed graph with P pointed towards Q, then in case of an adjacency matrix, only the position of P (in row) and Q (in column) would be filled with the edge value, and the position of Q (in row) and P (in column) would be left blank. Similarly, in case of adjacency lists, the array will have both the vertices P and Q, but the edge value will be assigned only to the linked list of P, which stores a pointer towards Q. If P and Q belong to an undirected graph, then both the positions of P and Q will be filled with the edge value in case of an adjacency matrix, and both the edge values will be assigned to both the linked lists of P and Q in case of adjacency lists. Figure 8.3 elucidates both directed and undirected graphs along with their respective adjacency matrices and adjacency lists.

In an adjacency matrix, one denotes the presence of a directed connection (from row to column) and zero represents no connection.

Terminology and representations

Figure 8.3: Representation of graphs using an adjacency matrix and adjacency lists. Part 1 shows a directed graph and Part 2 shows an undirected graph. Part 3 and 5 shows an adjacency matrix and adjacency list for directed graph. Parts 4 and 6 shows an adjacency matrix and adjacency list for an undirected graph.

Now let us analyze the memory efficiency of an adjacency matrix and adjacency list. The first differentiating factor is the number of edges in the graph. The matrix formulation of the adjacency matrix requires memory for each possible edge irrespective of its existence, whereas the linked list formulation of adjacency lists stores only those edges that are present in the graph. On the contrary, adjacency lists require additional memory for storing pointers, which can sometimes be relatively costlier. The cost primarily depends on the value of the edge (costlier in case of a binary flag indicating the mere existence of a connection). Thus, an adjacency matrix is relatively more efficient in case of denser graphs, as it has more number of edges (thereby more edge values), and an adjacency list is relatively more efficient in case of sparse graphs, as it has less number of edges (thereby less pointers to be stored).

The second differentiating factor is the number of computations required to complete a single iteration. Asymptotically, an adjacency matrix is relatively costlier than adjacency lists. In case of an adjacency matrix, all the positions within the matrix need to be scanned prior to the completion of the iteration, whereas in case of adjacency lists, only the linked connections (using pointers) are scanned, thereby reducing unnecessary lookups. Thus, the system runtime of an adjacency matrix is θ(|V|2) and that of adjacency lists is θ(|V| + |E|). Thus, the performance of an adjacency matrix is relatively poorer in case of sparse graphs, and is almost comparable to adjacency lists in case of dense graphs.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.85.135