CHAPTER 5


Managing Your Data in Neo4j

Because it’s a graph database, Neo4j gives you a lot of freedom in how you structure your data. With something like MySQL, if you want to relate records to one another, you have to adhere to certain rules: typically you need foreign keys and, for many-to-many relationships, some kind of joining table. The way you structure the data in Neo4j is still similar in some respects, such as relating one node to another, but you don’t need a table to do that; you can use a relationship, or a node and multiple relationships.

A quick note about Gists

In this chapter, we’ll be covering some common pitfalls that can catch beginners, and then some example data structures that could be adapted for your own use. One issue with books that include code is that sometimes there are errors in the code, or, as the software is updated, the examples just don’t apply anymore. To avoid that, the examples used in this chapter will be hosted as Gists, in addition to being available to download from apress.com. This approach will be used where possible and appropriate throughout the book, so look out for the references; they’ll be explained as they’re used.

If the concept of a Gist is new to you, then it’s just a code snippet that’s hosted on GitHub, so they’ll always be available. It also means the snippets can be altered and updated, so the code in the Gist should always work. If you discover it doesn’t anymore, just leave a comment and then I can update the code so it functions again. This also means a full revision history will be kept, so if the code is for a different version it can be referenced in the change to show the version differences and so on. The sample in this book should be useful well after it’s printed, which is always a bonus. Until the book is published all Gists will be kept private, but after the publish date they’ll be made public.

Common pitfalls

When a technology exists for long enough, people get very good at using it, which brings wisdom and builds a list of Dos and Don’ts for the future. Neo4j is no different, and there are certain things that should be avoided, mainly performance issues. It can also be that the wrong data structure is being used and it needs to be streamlined. Either way, here are some common pitfalls and how to avoid them.

Bi-directional relationships

There are many cases where a relationship goes both ways, and in most of them you don’t care which way it goes, just as long as it exists. Neo4j can’t store a relationship without a direction, so the temptation is to create one relationship in each direction; this can be avoided by storing only one. In the case of two brothers, you could model the relationship in a number of ways. One obvious example would be something like this:

(chris)-[:RELATED_TO]->(kane)
(kane)-[:RELATED_TO]->(chris)

Since both are related to each other, it makes sense for the relationship to go both ways, right? Yes, and there are cases when it’s required. You may well use the relationship to store certain properties (in this example, it could have been `{relation: "Brother"}`), but in a lot of cases the same data is stored twice, as it’s assumed that having the context in both directions makes sense. The following structure results in the same problem:

(chris)-[:KICKED_BUTT {name: "Mario Kart", date: timestamp()}]->(kane)
(kane)-[:GOT_BUTT_KICKED {name: "Mario Kart", date: timestamp()}]->(chris)

This is done to ensure all of the context is kept, but the data in both is again the same. The only thing that changes is the relationship name. To avoid these problems, you simply need to have the relationship in one direction, and that’s it. In Cypher, a pattern doesn’t have to specify a direction, in which case a relationship is matched whichever way it points. This means that instead of having two relationships, you have one, and then just don’t specify a direction in the query. The first example changed to work this way would look like so:

(chris)-[:RELATED_TO {relation: "Brother"}]->(kane)

In this case the direction is pointing from `chris` to `kane`, but that doesn’t matter (it’s because I’m older that it’s not the other way around, but again, it doesn’t matter) because as long as the relationship exists, that’s enough. The property mentioned earlier has also been added in this case, so all the context required can be retrieved with a Cypher query, which would be:

MATCH (a:Person)-[r:RELATED_TO]-(b)
WHERE r.relation = "Brother"
RETURN a.name, b.name

The example would return two rows, each with “Chris” and “Kane” alternating as `a` and `b` for each row. The second example doesn’t really change, as the second relationship simply isn’t needed. You can imply that somebody got their butt kicked, or kicked butt depending on how the data is returned, so the other direction isn’t needed.
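To make that a bit more concrete, here is a rough sketch of how the single stored relationship can be read from either side. It assumes `chris` and `kane` are `Person` nodes with a `name` property (as in the query above) and that only the `KICKED_BUTT` relationship has been stored; the aliases in the `RETURN` clauses are just made up for illustration.

```cypher
// Winner's point of view: follow the relationship in its stored direction.
MATCH (winner:Person)-[r:KICKED_BUTT]->(loser:Person)
RETURN winner.name AS kicked_butt, loser.name AS got_butt_kicked, r.name AS game;

// Kane's point of view: the same relationship, read against its direction.
MATCH (loser:Person {name: "Kane"})<-[r:KICKED_BUTT]-(winner:Person)
RETURN loser.name AS player, winner.name AS beaten_by, r.name AS game;

// If the two mirrored rows from the undirected RELATED_TO query are unwanted,
// one option is to keep only the row where the names are in alphabetical order.
MATCH (a:Person)-[r:RELATED_TO]-(b:Person)
WHERE r.relation = "Brother" AND a.name < b.name
RETURN a.name, b.name;
```

The last query relies on the two names being different; any other stable property comparison would do the same job.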

The only real issue with this approach is that the data structure wouldn’t be even, in the sense that the relationship would only be from one node, and not both. So, if you like symmetry, you’ll need to resist the urge here (I’m one of those people, so I do it too) and just remember that Cypher has enough power to make the direction of the relationship not always required, and that in some cases (or all cases in real-life applications) one relationship is enough.

Using this method also requires less data, so it keeps your database smaller, and keeps queries cleaner.

Example Data Structures

With these example data structures, there will also be hints on how these can be used in real applications, so hopefully they can be tailored to your needs, and also benefit an application they’re used in, while still being a reference on how to structure data.

e-commerce

One very big area that takes huge advantage of recommendations through relationships is e-commerce, so we’ll run through a basic structure for that kind of application. Although this example is broken up into chunks, all of the queries can be found in a Gist (https://gist.github.com/chrisdkemper/794416dbae1bb17942b1), so any updates will be available there. With that out of the way, let’s begin. It’s worth noting that this is a very basic structure, with just enough information to give an idea of how to structure the data and then expand on it for your own needs. Table 5-1 contains the different node types that will be used with this example.

Table 5-1. e-commerce example node types and description

| Node type (Label) | Explanation |
| --- | --- |
| Customer | The customers of the application. Any information needed about them would be stored here: name, e-mail, that kind of thing. |
| Product | Products will have any properties needed, but at the very least a name. |
| Category | Categories can be used to group together products of a similar type. |
| Order | An order will attach to the products it contains and the customer that created it. |
| Bundle | Products that can be sold together as one unit. |
| Sale | A Sale will contain many Products, essentially like a Bundle. In this case though, it’s a collection of products that all have their own prices, rather than one set price. |

To get off to a good start, we know at least two things that need to be unique, the e-mail address of the user, and the uuid of the product. As has been mentioned before, this example only has the bare-bones of information, but even so, we can still add constraints, first the e-mail of the `Customer` node:

CREATE CONSTRAINT ON (c:Customer) ASSERT c.email IS UNIQUE;

Followed by the uuid constraint on the `Product` nodes:

CREATE CONSTRAINT ON (p:Product) ASSERT p.uuid IS UNIQUE;
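One thing to be aware of is that the constraint syntax has changed over time. The `ON ... ASSERT` form used here is the older one; recent Neo4j releases (4.4 and 5.x) use `FOR ... REQUIRE` instead. As a sketch only, and assuming you’re on one of those newer versions, the same two constraints would look something like the following. The constraint names are made up for this example.

```cypher
// Newer syntax; constraint names here are hypothetical.
CREATE CONSTRAINT customer_email_unique IF NOT EXISTS
FOR (c:Customer) REQUIRE c.email IS UNIQUE;

CREATE CONSTRAINT product_uuid_unique IF NOT EXISTS
FOR (p:Product) REQUIRE p.uuid IS UNIQUE;
```

Which form applies depends on the version of Neo4j you’re running, so check against your own install before relying on either.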

Although this example won’t have any conflicting nodes, at least if the structure is extended in actual use, the constraints are already in place. Next up, let’s add in some nodes:

CREATE (product1:Product {name: "Product 1", uuid: "d8d177cc-1542-11e5-b60b-1697f925ec7b", price: 10})
CREATE (product2:Product {name: "Product 2", uuid: "d8d17b28-1542-11e5-b60b-1697f925ec7b", price: 20})
CREATE (product3:Product {name: "Product 3", uuid: "d8d17c72-1542-11e5-b60b-1697f925ec7b", price: 30})
CREATE (product4:Product {name: "Product 4", uuid: "d8d1b958-1542-11e5-b60b-1697f925ec7b", price: 40})
CREATE (product5:Product {name: "Product 5", uuid: "d8d1bade-1542-11e5-b60b-1697f925ec7b", price: 50})

Here we just have some products being added, with various names, prices, and unique UUIDs, nothing too crazy here. Next up are the categories:

CREATE (category1:Category {name: "Category 1"})
CREATE (category2:Category {name: "Category 2"})
CREATE (category3:Category {name: "Category 3"})

Here the categories are just given a name, and referenced for later use in the query. There’s a bundle in this example, so let’s create that:

CREATE (bundle1:Bundle {name: "Bundle 1", price: 35})

The bundle here is created with a name, and also a price so we know how much the bundle sells for. Nothing can be sold until there are some customers, so let’s add a couple:

CREATE (customer1:Customer {name: "Chris", email: "[email protected]"})
CREATE (customer2:Customer {name: "Kane", email: "[email protected]"})

The customers are created, with a unique e-mail address to ensure they adhere to the constraint placed earlier. Of course this data would all be dynamically created and related normally, but for the sake of example it’s manually added. It’s time to relate the newly created nodes now, starting with the products:

CREATE UNIQUE (product1)-[:BELONGS_TO]->(category1)
CREATE UNIQUE (product2)-[:BELONGS_TO]->(category1)
CREATE UNIQUE (product3)-[:BELONGS_TO]->(category2)
CREATE UNIQUE (product4)-[:BELONGS_TO]->(category3)
CREATE UNIQUE (product5)-[:BELONGS_TO]->(category2)

The incrementing product aliases would still be valid to use if all of the snippets were run together, which is why they’ll be used in the different parts of the example. Each of the products is assigned to a category with the `BELONGS_TO` relationship. A product could be part of multiple categories as well; it just isn’t in this instance. We also have a couple of products in a bundle, so those relationships need to be added, and are like so:

CREATE UNIQUE (product1)-[:PART_OF]->(bundle1)
CREATE UNIQUE (product3)-[:PART_OF]->(bundle1)

`product1` and `product3` are part of this bundle, which is expressed with the `PART_OF` relationship. The final relationship to add is a sub category, which will be added like so:

CREATE UNIQUE (category3)-[:CHILD_OF]->(category1)

Here `category3` is actually a child of `category1`, which is shown by using the `CHILD_OF` relationship going in the correct direction. With the relationships now added, it’s now possible to query the database to have a look at the structure in the Neo4j Browser using `MATCH (n) RETURN n`, the result of which can be seen in Figure 5-1.


Figure 5-1. An example data structure taking into account categories, products, bundles, and some customers (with no orders)
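A note on `CREATE UNIQUE`, which is used for the relationships in this chapter: it was deprecated in later Neo4j releases and removed in 4.0, with `MERGE` being the closest replacement. The two don’t behave identically in every case, so treat the following as a sketch of the same idea rather than a drop-in swap. It assumes the `Product` and `Category` nodes created earlier already exist, and matches them by their properties so it can be run as a query of its own.

```cypher
// Find the two existing nodes, then create the relationship only if it isn't there yet.
MATCH (p:Product {uuid: "d8d177cc-1542-11e5-b60b-1697f925ec7b"}),
      (c:Category {name: "Category 1"})
MERGE (p)-[:BELONGS_TO]->(c);
```

Running this more than once leaves a single `BELONGS_TO` relationship between the two nodes, which is the behaviour `CREATE UNIQUE` is being used for here.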

It’s already possible to see how the data will develop, but there aren’t any orders yet, so we’d better add one. Since the order is added in a separate query from the one that created the data, the nodes it uses need to be matched first, so let’s do that:

MATCH (customer:Customer {email: "[email protected]"})
        ,(product1:Product {uuid: "d8d177cc-1542-11e5-b60b-1697f925ec7b"})
        ,(product2:Product {uuid: "d8d17b28-1542-11e5-b60b-1697f925ec7b"})

This chunk of Cypher will match the customer that made the order, and also the products included in it, so here we have the customer, `product1`, and `product2`, respectively. Now that the nodes have been found, the order itself can be created, which is added like so:

CREATE (order:Order {date: "2015-05-15"})

The only real information required for the order is the date it was placed, the rest of the information, such as cost, can be calculated as needed via queries. With the order created, it’s now time to relate everything together, which is done by adding three relationships, two for the products, and one relating the customer to the order, which looks like:

CREATE UNIQUE (product1)-[:IN_ORDER]->(order) 
CREATE UNIQUE (product2)-[:IN_ORDER]->(order)
CREATE UNIQUE (customer)-[:CREATED]->(order)

The `IN_ORDER` relationship has been used to relate the products to the order, and the `CREATED` relationship has been added between the customer and the order. With that data now in place the graph looks a little different, if we run the same `MATCH (n) RETURN n;` query as before, which can be seen in Figure 5-2.


Figure 5-2. With new data added the graph looks a little different
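Since the order only stores its date, things like the total cost are worked out when they’re needed. A minimal sketch of that, assuming the labels, relationships, and `price` property used in this example (the aliases are made up):

```cypher
// Total up each order from the prices of the products attached to it.
MATCH (customer:Customer)-[:CREATED]->(o:Order)<-[:IN_ORDER]-(product:Product)
RETURN customer.name AS customer,
       o.date AS placed,
       count(product) AS items,
       sum(product.price) AS total;
```

For the single order added above, which contains Product 1 (10) and Product 2 (20), the total works out at 30. This uses the product’s current price, so if prices change over time you may prefer to store the price paid on the `IN_ORDER` relationship instead.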

Even with one order, we can already see that `Chris` has a preferred category, which means this information can be used in Cypher queries to generate recommendations on products that may be useful, potentially because they’re on sale. Speaking of Sales, let’s add one:

CREATE (sale1:Sale {name: "Sale 1", active: TRUE})

The sale here only really needs a name or some kind of identifier, as the information that matters can be added to the relationship between the `Product` and the `Sale`, which in this case, is the price.

MATCH (sale1:Sale {name: "Sale 1"})
        ,(product4:Product {uuid: "d8d1b958-1542-11e5-b60b-1697f925ec7b"})
        ,(product5:Product {uuid: "d8d1bade-1542-11e5-b60b-1697f925ec7b"})
CREATE UNIQUE (product4)-[:ON_SALE {price: 36}]->(sale1)
CREATE UNIQUE (product5)-[:ON_SALE {price: 45}]->(sale1)

The sale and the products are first matched so the relationships can be added, then they are. Matching `sale1` matters here: this is a separate query, so an unbound `sale1` would create a new, empty node rather than reuse the `Sale`. In this case, a property of `price` is being added, with the price the item is on sale for. This could then be retrieved when querying the data, and replace the price returned from the actual product, if the product happens to be on sale.
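As a rough idea of how that price swap could work, the query below lists every product with its normal price and the price it currently sells for. It’s only a sketch, and assumes the `ON_SALE` relationship, its `price` property, and the `active` flag on the `Sale` node from this example; the aliases are made up.

```cypher
// OPTIONAL MATCH keeps products that aren't in any active sale;
// coalesce() then falls back to the normal price for those.
MATCH (p:Product)
OPTIONAL MATCH (p)-[s:ON_SALE]->(:Sale {active: true})
RETURN p.name AS product,
       p.price AS list_price,
       coalesce(s.price, p.price) AS current_price;
```

If a product could be in more than one active sale at once, it would come back once per sale, so you’d need to decide which price wins (the lowest, for instance).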

When more and more orders are added to the database, it becomes easy to detect trends. These trends can be that a customer buys more items from a certain category at certain times, or always buys a particular product. With this kind of data in hand, it’s possible to craft unique and tailored experiences for the user, based on their own data, so you get recommendations that actually work.
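Recommendations are covered properly later in the book, but as a small taste, here’s a sketch of a query that suggests products from the categories a customer has already bought from, including any direct sub categories. It assumes the sample data from this section, and that the order was created by the customer named "Chris".

```cypher
// Walk from the customer's purchases to their categories (and direct children
// of those categories), then out to products the customer hasn't ordered yet.
MATCH (c:Customer {name: "Chris"})-[:CREATED]->(:Order)<-[:IN_ORDER]-(bought:Product)
      -[:BELONGS_TO]->(cat:Category)<-[:CHILD_OF*0..1]-(:Category)<-[:BELONGS_TO]-(other:Product)
WHERE NOT (c)-[:CREATED]->(:Order)<-[:IN_ORDER]-(other)
RETURN DISTINCT other.name AS suggestion;
```

Against the sample data this should come back with Product 4, since it belongs to `category3`, which is a child of the category both purchased products are in. The `*0..1` on `CHILD_OF` is what lets the same pattern cover both the category itself and its direct children.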

The code for this e-commerce example is available as a whole via the GitHub Gist at https://gist.github.com/chrisdkemper/794416dbae1bb17942b1, so check there for any updates or changes to the example since the book was published. Alternatively, all my Gists can be found at https://gist.github.com/chrisdkemper, so if you don’t fancy typing the URL out, visit there first.

Social Network

Another big area that takes advantage of the power of graph databases is the social network side of things. Thanks to social networks, especially the giant that is Facebook, you can get in touch with somebody on the opposite side of the world through a friend, or a friend of a friend. When it comes to social graphs, it’s all about who you know or who knows you; it’s all about the common connections. Depending on what kind of social network you’re building, the common connections could be interests, what somebody does for a living, hobbies, or anything.

In this case, we’ll be using a basic example that will include: people, animals, and companies. As with the other example, the code for this one will be available as a GitHub gist, so any updates to the code will be available there, should anything change.

This mini social network structure gives a lot of potential for connections and relationships, and at least it’s a little different from the usual social network stuff, eh? To start things off, Table 5-2 outlines the different labels that’ll be used for nodes.

Table 5-2. Social network node types and description

| Node type (Label) | Explanation |
| --- | --- |
| Company | A company, as the name suggests. A company will be owned by a person or persons, and people can also work there. |
| Person | The main part of the social graph, people. |
| Animal | This label will be applied to any animal node, but in addition a label for the type, such as `Dog`, will also be added to give some extra context. |

Although there aren’t as many types in this example, there are a lot of relationships that can exist, and you don’t always need a lot of node types to create a complex dataset. When the code is present on the Gist, it’ll be commented as required, with different sections outlined, and instructions to run certain parts in isolation.

It’s always good practice to create constraints, so let’s do this here with the `Person` names to ensure they’re always unique:

CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;

Also, in our case we aren’t allowed to use companies with the same name either, so let’s add that constraint in too:

CREATE CONSTRAINT ON (c:Company) ASSERT c.name IS UNIQUE;

With the constraints in place it’s time to get some data into the database, starting with a number of `Persons`, six of them to be exact:

CREATE (person1:Person {name: "Chris"})
CREATE (person2:Person {name: "Kane"})
CREATE (person3:Person {name: "Dave"})
CREATE (person4:Person {name: "Claire"})
CREATE (person5:Person {name: "Ruth"})
CREATE (person6:Person {name: "Charlotte"})

This gives us a good number of people to work with. In this case it’s just been kept simple with names, but you could easily add additional properties, if desired. With the people in place, let’s add some animals to make things a bit more interesting.

CREATE (animal1:Animal:Dog {name: "Rolo"})
CREATE (animal2:Animal:Fish {name: "Totoro"})
CREATE (animal3:Animal:Fish {name: "Elsa"})
CREATE (animal4:Animal:Dog {name: "Ki"})
CREATE (animal5:Animal:Dog {name: "Rio"})

Here we have a number of animals, five in total, three dogs, and two fish. Each one is still an animal though, so if you ever wanted to find out the total number of animals, it saves having to add all the individual label counts together to get the total. You’ll notice the additional labels are chained with `:`s, which creates the node with both labels. Finally, we need some companies to work with, so let’s add those in:

CREATE (company1:Company {name: "Badass company"})
CREATE (company2:Company {name: "Supercorp"})
CREATE (company3:Company {name: "All of the things"})

Finally we have our companies, but without any relationships, these are just nodes in the database, so we’ll start with relating people to their animals, with an `OWNS` relationship.

CREATE UNIQUE (person1)-[:OWNS]->(animal4)
CREATE UNIQUE (person1)-[:OWNS]->(animal5)
CREATE UNIQUE (person2)-[:OWNS]->(animal4)
CREATE UNIQUE (person2)-[:OWNS]->(animal5)
CREATE UNIQUE (person4)-[:OWNS]->(animal2)
CREATE UNIQUE (person4)-[:OWNS]->(animal3)
CREATE UNIQUE (person6)-[:OWNS]->(animal1)

You’ll notice there are some multiples here. In this case, there’s shared ownership of some of the animals, so the relationships are doubled up, but since they only go one way, it doesn’t hit the bi-directional issue mentioned earlier. Having pets is great, but sometimes your pet can be better known than you. I know if I’m ever walking my dog and my brother’s friends see him, they’ll come up to me and say hello, even though I have no idea who they are, but since they know my dog, they apparently know me. Now let’s add in some relationships to link together certain people by various means.

CREATE UNIQUE (person1)-[:RELATED_TO]->(person2)
CREATE UNIQUE (person1)-[:FRIENDS_WITH]->(person6)
CREATE UNIQUE (person2)-[:KNOWS]->(person3)
CREATE UNIQUE (person4)-[:FRIENDS_WITH]->(person2)
CREATE UNIQUE (person5)-[:KNOWS]->(animal4)

Here we have a mix of relationships, `RELATED_TO`, `FRIENDS_WITH`, and `KNOWS`, mostly between `Person` nodes, with one `KNOWS` going from a `Person` to an `Animal`. I’m sure there are many people that you may know, or know of, but aren’t friends with, and that’s what is being illustrated here. There’s nothing to say that later down the line, a `KNOWS` relationship won’t become a `FRIENDS_WITH` or even a `DATING` relationship, who knows?!

With our people and animals related, it’s time to sort out the companies. In this case, we have owners, and employees, and for one company, a mascot. First things first, let’s set up the owners:

CREATE UNIQUE (person1)-[:FOUNDED]->(company3)
CREATE UNIQUE (person3)-[:FOUNDED]->(company2)
CREATE UNIQUE (person6)-[:FOUNDED]->(company1)

The `FOUNDED` relationship has been used here, but there could easily be additional properties for dates, or any other related information, if it was needed. Now the companies have been founded, employees are needed, so let’s add those now:

CREATE UNIQUE (person2)-[:WORKS_AT]->(company3)
CREATE UNIQUE (person4)-[:WORKS_AT]->(company2)
CREATE UNIQUE (person5)-[:WORKS_AT]->(company1)

The `WORKS_AT` relationship is used here to show which people work at which company. Again, there’s always the option for additional context, such as hours or pay, if it were needed. Finally, because I can, my dog is the mascot of my company, so let’s add that relationship:

CREATE UNIQUE (animal4)-[:MASCOT_OF]->(company1)

With all that data in place, we have a graph that makes extensive use of relationships (Figure 5-3).

Even with this small amount of data, it’s possible to see how close some of the `Person` nodes are, so with some simple queries, it could easily recommend new friends, or even a new place to work. Using information like this you could see which company your friends and family work for, and whose pet is the most popular.


Figure 5-3. A preview of the data structure created for the theoretical social network, which has animals, people, and companies

With more data, this would increase the potential connections in the database, and with new animals and companies, it would start to create a social graph that’d make finding new friends and/or opportunities easy.
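To give an idea of what those simple queries might look like, here are two sketches against the sample data above. They assume the labels, names, and relationship types used in this example; the aliases are made up, and treating `RELATED_TO`, `FRIENDS_WITH`, and `KNOWS` as equally good connections is just a choice made for this illustration.

```cypher
// Suggest people exactly two hops away who aren't already directly connected.
MATCH (me:Person {name: "Chris"})-[:FRIENDS_WITH|RELATED_TO|KNOWS*2]-(candidate:Person)
WHERE candidate <> me
  AND NOT (me)-[:FRIENDS_WITH|RELATED_TO|KNOWS]-(candidate)
RETURN DISTINCT candidate.name AS suggestion;

// See which companies your direct connections work at or founded.
MATCH (me:Person {name: "Chris"})-[:FRIENDS_WITH|RELATED_TO|KNOWS]-(contact:Person)
      -[:WORKS_AT|FOUNDED]->(company:Company)
RETURN contact.name AS contact, company.name AS company;
```

With the data as created here, the first query should suggest Dave and Claire, both of whom are connected to Chris through Kane. No direction is given on the people relationships, which ties back to the earlier point about only storing them one way.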

As mentioned earlier, the code used to create this demo will be available via GitHub on the Gist (https://gist.github.com/chrisdkemper/8c981b759275ec36d3bf) so any changes or updates will be made available there, or just check out https://gist.github.com/chrisdkemper for all of my Gists.

Summary

In this chapter we’ve been through some common pitfalls when structuring data in Neo4j, as well as some example structures. Hopefully these can be used to give an idea of how you can structure your own application to allow for better connections, less clutter, and therefore a better database experience. There is a lack of complex Cypher queries in this chapter, but don’t worry: in Chapter 7, we’ll be taking some of the lessons learned here a step further, to show how it’s possible to create things like recommendations based on an existing dataset.
