Chapter 4. Modeling Data for Neo4j

In this chapter, we will get started with graph database modeling in Neo4j. Because this type of modeling can be quite different from what we are used to from our relational database backgrounds, we will start by explaining the fundamental constructs and then explore some recommended approaches.

We will cover the following topics in this chapter:

  • Modeling principles and how-to's
  • Modeling pitfalls and best practices

The four fundamental data constructs

As you may already know by now, graph theory gives us many different graphs to work with. Graphs come in many different shapes and sizes, and therefore, Neo4j needed to choose a very specific type of data structure that is flexible enough to support the versatility required by real-world datasets. This is why the underlying data model of Neo4j, the labeled property graph, is one of the most generic and versatile of all graph models.

This graph data model gives us four different fundamental building blocks to structure and store our data. Let's go through them:


[Figure: The labeled property graph model]

  • Nodes: These are typically used to store entity information. In the preceding example, these are the individual books, readers, and authors that are present in the library data model.
  • Relationships: These are used to connect nodes to one another explicitly and therefore provide a means of structuring your entities. They are the equivalent of an explicitly stored, and therefore precalculated, join-like operation in a relational database management system. As we have seen in the previous chapters, joins are no longer a query-time operation; they are as simple as the traversal of a relationship connecting two nodes. Relationships always have a type, a start node, an end node, and a direction. They can be self-referencing (a loop from a node back to itself), but they can never be dangling, that is, missing a start or an end node.
  • Properties: Both nodes and relationships are containers for properties, which are effectively name/value pairs. In the case of nodes, this is very intuitive. Just as a record in the relational database world has one or more fields or attributes, a node can have one or more properties. Less intuitive is the fact that relationships can have properties too. These are used to further qualify the strength or quality of a relationship and can be used during queries/traversals to evaluate the patterns that we are looking for.
  • Labels: This is a fundamental data model construct that was added to Neo4j with Version 2.0 at the end of 2013. Labels are a means to quickly and efficiently create subgraphs. By assigning labels to nodes, Neo4j makes the data model of most users a lot simpler. There is no longer a need to work with a type property on the nodes, or a need to connect nodes to definition nodes that provide meta-information about the graph. Neo4j now does this out of the box, and this is a huge asset, now and for the future. At the time of writing this book, labels are primarily used for indexing and some limited schema constraints. However, in the future, it is likely that the structural understanding that labels provide about the data stored in the graph will be used for other purposes, such as additional schema, security, and graph sharding/distribution.
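
To make the four constructs more tangible, here is a minimal in-memory sketch in plain Python. It is not Neo4j's API or storage model; the class names (Node, Relationship, Graph), the method names (add_node, relate, with_label, outgoing), and the sample library data are all hypothetical and chosen only for illustration. The sketch shows nodes and relationships both carrying properties, labels selecting a subgraph, and relationships being refused when they would be dangling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison: two nodes are never "equal by value"
class Node:
    """An entity with zero or more labels and a set of name/value properties."""
    labels: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    """A typed, directed connection; start and end nodes are mandatory."""
    rel_type: str
    start: Node
    end: Node
    properties: dict[str, object] = field(default_factory=dict)


class Graph:
    """Hypothetical toy container, for illustration only."""

    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.relationships: list[Relationship] = []

    def add_node(self, *labels: str, **properties: object) -> Node:
        node = Node(set(labels), dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, rel_type: str, end: Node,
               **properties: object) -> Relationship:
        # A relationship can never be dangling: both end points must exist.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both start and end nodes must be in the graph")
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def with_label(self, label: str) -> list[Node]:
        # Labels carve out a subgraph without needing a "type" property.
        return [n for n in self.nodes if label in n.labels]

    def outgoing(self, node: Node, rel_type: str) -> list[Relationship]:
        # Following stored relationships replaces a query-time join.
        return [r for r in self.relationships
                if r.start is node and r.rel_type == rel_type]


# Illustrative sample data for a small library model.
library = Graph()
author = library.add_node("Person", "Author", name="Jane Austen")
book = library.add_node("Book", title="Emma")
reader = library.add_node("Person", "Reader", name="Sam")
library.relate(author, "WROTE", book)
library.relate(reader, "READ", book, rating=4)  # property on a relationship

for person in library.with_label("Person"):
    print(person.properties["name"])
for rel in library.outgoing(reader, "READ"):
    print(rel.end.properties["title"], rel.properties["rating"])
```

Note that a node can carry several labels at once (the author is both a Person and an Author), and that the rating lives on the READ relationship rather than on either node, which is how a relationship property qualifies the connection itself.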

With these four data constructs, we can now start working with Neo4j.
