CHAPTER 4

Meet Cypher

With Neo4j finally installed, it’s time to get into the really interesting stuff, and start looking into its query language, Cypher. When I first heard the name Cypher, I actually thought of a certain green Pokémon with wings and knife hands that is pretty cool. For those who aren’t so into Pokémon, I’m referring to Scyther. Name aside, Cypher really is a brilliant language, and when you get the hang of it, it’s really easy to use.

This chapter will serve as a cheatsheet essentially, giving a rundown of the different commands and actions that can be performed, and the code needed to do them. With that out of the way, the next chapter is when things will start to get interesting, when we use the knowledge gained from this chapter, and apply it to actual data, to show the raw power Cypher has to offer.

Basic Syntax

When you perform a Cypher query, what you’re actually doing is giving it a pattern, and then Cypher will use that pattern to find data. As with any other pattern-matching language, you need to use certain symbols and identifiers to find or match certain data, and of course Cypher is no exception. We’ve already seen the patterns needed to match Nodes, Relationships, and Properties previously, but it’s time to go into a bit more detail.

In addition to the patterns themselves, one thing worth noting is the casing of the language. Throughout the book you’ll see certain keywords (or clauses, as they’re known in Cypher) written in uppercase. As with other languages (such as SQL) the casing of clauses actually isn’t important, but when you start writing more complex queries, having the clauses in capitals helps distinguish the different parts of the query. In this case, I’ll be staying with uppercase as it helps to break the query up, and makes it easier to read. Values, on the other hand, are case sensitive, as are labels, relationship types, and property names. With that out of the way, it’s time to cover how to use Nodes, Relationships, and Properties.

Before we go into the specifics of the different parts of the query, let’s look at an example query, and break down the components of it to make going through it later easier. Since it was used previously we’ll go with the following query:

MATCH (n) RETURN n;

The query in this case starts with MATCH, which is a clause. A query must start with a clause, and other possible options include CREATE and MERGE. If you didn’t start with a clause, then Cypher wouldn’t know what to do with the rest of the query, so just think of it as an instruction. We then have the node in its parentheses, aliased by `n`. Since there aren’t any label or property filters on this query (we’ll get to those later in the chapter), `n` actually represents every node in the database. The RETURN part of the query is what is actually returned, and since `n` is used again, every node will be returned. Thanks to the query not being filtered, this brings back every node with all of its properties (the Browser’s graph view will generally also show the relationships between the returned nodes).

Most of the time when you’re working with data, you’ll have different types, and of course Neo4j is no different here so to distinguish types, Labels are used. A label is represented using a colon, followed by the name of the label and can only be applied to nodes. For example, if you were to use the previous query but you wanted to get all of the nodes labeled with `Product` it would be as follows:

MATCH (n:Product) RETURN n;

It’s possible for a node to have multiple labels, and it’s also possible to query against multiple labels, so if a `Promoted` label were to be introduced to certain products (in addition to the Product label) that could be represented like so:

MATCH (n:Product:Promoted) RETURN n;

Now only nodes with both labels will be returned. The previous query could well be written without the use of the `Product` label to only return the promoted products, but it does show how multiple labels can work together.

These were just a few examples to give an idea on the basic structure of a query to make things a bit easier going forward. There are still things such as Properties and Relationships to go over, but they’ll be covered in isolation. With that out of the way let’s take a more in-depth look at nodes.

Nodes

As you may recall, nodes are represented using parentheses (), so if you see them in a query, you know it’s a node. In most cases you’d see a node with some kind of identifier inside it, but in some instances, if you don’t care about the name (or alias) of the node(s) you’re querying, it can actually be omitted. Most of the time, the node, or nodes that you are querying will most likely be used later in the query in some way, so having an alias makes sense. Of course, if you’re for example, just creating multiple nodes, then you could leave out the alias if you wanted to, but it’s entirely optional. Generally speaking, it’s easier to keep the aliases in just to get into the habit of doing it.

There are a few rules when it comes to identifiers. First, they can contain underscores and alphanumeric characters, but must always start with a letter and are case sensitive. It’s also possible to put spaces in your identifier, but if you do this, you’ll need to wrap the name in backquotes (or backticks, depending on your preference), for example (`this has a space`). For simple queries it makes sense to use an easy identifier, so in a lot of cases something as simple as `(n)` will be more than adequate.

With that note about identifiers out of the way it’s time to move on. As mentioned earlier, when you create nodes, it’s possible to add labels to them to make finding them easier later on. In most cases you may use one label, but multiple labels can be used if it suits your data.

If you want to use a label in your query, then it can be done like so: `(n:Person)`. In this case the label is `Person`, still using `n` as the identifier. Now if you had a complex query that returned different types of nodes, you may want to use a more specific identifier so that it can be reused at a later point. Something like `(people:Person)` will allow us to use the identifier `people`, instead of `n`, later on in our query. If additional labels are required, you can just add them like so: `(james:Person:Relative)`. In this case, the identifier for the node is `james` and there are two labels, `Person` and `Relative`.
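As a rough illustration of the node patterns described above, the following sketch creates a node with two labels and then matches it back using a more descriptive identifier. The `name` property and its value are assumptions made purely for the example, and the last query shows an identifier containing a space wrapped in backticks.

```cypher
// Create a node with two labels; `james` is only an identifier for this query
CREATE (james:Person:Relative {name: "James"})
RETURN james;

// Match every node that has both labels, using a descriptive identifier
MATCH (relatives:Person:Relative)
RETURN relatives;

// An identifier containing spaces must be wrapped in backticks
MATCH (`a person`:Person)
RETURN `a person`;
```

Each query here is a separate statement and would be run on its own; an identifier only exists for the duration of the query that declares it.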

In terms of just nodes by themselves, that pretty much covers the basic pattern. Of course, the node pattern can be used with properties, and the pattern is required when querying relationships, but these will be covered in their respective sections, so we may as well move on to properties.

Properties

A property is useless by itself, as it needs to be applied to either a node or a relationship, which is why Nodes came first. When it comes to representing properties, this is done using curly braces {} and in most cases a property will be inside the parentheses of a node. When performing a read-based query (getting data out of the database) the property will act as a filter, whereas when creating or updating nodes, the properties will be set onto that node, so depending on the context, properties can have multiple functions.

There are other times when properties are used, but the most common format they’ll be seen in will be `(n:People {name: "Chris"})` and its function would be altered depending on the context of its use. In the case of `(n:People {name: "Chris"})` we are looking for all `People` nodes, `n`, with a name property of `"Chris"`, which we know is a string literal from the double quotes. Many different value types can be used when saving properties, which can be seen in Table 4-1, but the easiest type is a string, as if it’s not a numeric value, a Boolean, or an array, it’s a string. The names for properties work in the same way as node identifiers, so they must start with a letter, be alphanumeric, and are case sensitive. Again like with nodes, if you want to use spaces you can, but these must be wrapped in backquotes, the “`” character, in order to work.

Table 4-1. Different datatypes available within Neo4j

| Property type | Explanation |
| --- | --- |
| Numerical values | Essentially, you can store any numerical value you want. The limits of this come from the JVM, in particular the Long, Float, and Double numeric types, so if you have some bespoke use cases for numerical property values, look in that direction. Otherwise, Neo4j in most cases will be fine with whatever number you throw at it. |
| String | Strings are fine to use within Neo4j, and will be stored without any craziness. |
| Boolean | Booleans are stored as `true`/`false`, in lower case, without any real issue. If you create a node with, say, TRUE, the value will be lowercased when stored. You can, however, still write Cypher queries using uppercase Boolean values, and they will work the same way. |
| Array | You can store arrays in Neo4j, but arrays have certain rules. An array must contain values of the same type; an array of different types (a string, an int, and a Boolean, for example) isn’t supported. If you tried to store an array of multiple types, Cypher will cast all values in the array to whatever the type of the first item is, so a string, an integer, and a Boolean would save as three strings, e.g., "string", "100", "true". In addition, you cannot create a node with an empty array, because Neo4j needs to know the type when storing the array. Once the type has been determined, the array can then be emptied. If you had an array of strings (such as the example above) you could then empty that array, but any values added to it would be cast as strings, because that’s the array type, and the type doesn’t change. |

Multiple properties can be specified when performing queries as needed. Each property identifier and value pair needs to be separated with a comma; just be sure not to leave a trailing one, or you’ll get a Cypher query error. This comes in really handy when creating nodes with a lot of properties, or when you want to return a very specific subset of nodes.
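To show how the same property pattern behaves differently depending on context, here is a small sketch. The label, property names, and values are all made up for illustration; the first query sets the properties on a new node, while the second uses two of them as a filter.

```cypher
// Writing: the properties are set on the newly created node
CREATE (n:Person {name: "Chris", age: 30, admin: false, nicknames: ["Chris", "Kit"]})
RETURN n;

// Reading: the properties act as a filter, and both must match
MATCH (n:Person {name: "Chris", age: 30})
RETURN n;
```

Note that the array in the first query only contains strings, in keeping with the same-type rule from Table 4-1.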

It’s not just nodes that can have properties added to them; relationships are also able to have them assigned, which makes being able to query nodes (or saving information about the relationship) really easy. Speaking of relationships, let’s discuss them, shall we?

Relationships

Relationships are probably one of the most powerful features within Neo4j and graph databases in general, and Cypher makes them really easy to use, both in terms of creation and retrieval. For a relationship to happen, there needs to be things to relate, and what do you relate in Neo4j? Nodes.

Depending on whether or not you’re querying a relationship, or creating one, the pattern of the relationship is slightly different. When creating a relationship, you need to at the very least specify a direction, so at the very basic level, nodes can be related like so:

(a)-->(b)

This shows that `a` is related to `b`, which can be seen by the use of an arrow, as it’s pointing to the node it’s related to. This relationship has no type, or properties, or an identifier, though, so it’s a very basic relationship. A more complex example of creating a relationship would be something like:

(j:Kerbal {name: "Jeb"})-[r:KNOWS]->(b:Kerbal {name: "Bill"})

The important part of the pattern is in the middle, as the first and last parts are just nodes. This particular pattern could be used to create a relationship between these nodes, or to search the database for nodes that met the correct criteria. The nodes here have the label of `Kerbal` with the property `name` with the value of Jeb, and another by the `name` of `Bill`.

In case the reference to `Kerbal` is lost on you never fear, as they’re just the race of people used in the game Kerbal Space Program which is a brilliant game. The names Bill and Jeb are in the game, however they’re the favorite characters of my favorite YouTuber, Robbaz. Not relevant to Neo4j, but always nice to know.

Anyway, that example has a bit more information than the basic one, as it’s not just an arrow this time; there are also square brackets (or brackets, as they’re actually called) in this one. An identifier and a type aren’t required if you don’t need them, which is why the first example had none: with nothing to go in the brackets, they were removed. Inside the brackets you can give the relationship an identifier if you’d like to use it later in the query, and also a type. Types are similar to a node’s labels; however, you can only have one type per relationship, although you can have many relationships between nodes.

The `-[r:KNOWS]->` part of the query is what we’re interested in. Here `r` is the identifier (which could be omitted if it’s not needed) and `KNOWS` is the type (capital letters and underscores are allowed); if they were needed, properties could be added here too. The head of the arrow is pointing right in this case, but can be on either side depending on the direction of the relationship, inwards or outwards.

In this example the relationship being created was that Jeb knows Bill, but not that Bill knows Jeb. Essentially this means that if you were to get every node with that relationship, then only Jeb knowing Bill would be returned, not the other way around. In cases where relationships work both ways, the relationship is simply created twice, with the direction of the relationship flipped in the second query.
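As a sketch of the two-way case, using the Jeb and Bill nodes from the earlier pattern, the first query below creates both nodes with the one-way `KNOWS` relationship, and the second adds the same type of relationship in the opposite direction. The `since` property is a hypothetical addition, included only to show where relationship properties would go.

```cypher
// Create both nodes and a one-way relationship: Jeb knows Bill
CREATE (j:Kerbal {name: "Jeb"})-[r:KNOWS {since: 2015}]->(b:Kerbal {name: "Bill"})
RETURN j, r, b;

// Find the existing nodes and relate them the other way: Bill knows Jeb
MATCH (j:Kerbal {name: "Jeb"}), (b:Kerbal {name: "Bill"})
CREATE (b)-[:KNOWS]->(j);
```

These would be run as two separate queries. Matching the nodes first in the second query avoids creating duplicate Jeb and Bill nodes.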

This just means that you’re able to create one-way relationships that can be inward or outward. Another example would be say, a dog and its owner. If there were Dog nodes and Person nodes, the Person could be related to the dog with an `OWNER` relationship, but this wouldn’t work the other way around.

Of course, this has just covered a one-to-one relationship, what about a chain of relationships? This is entirely possible in Neo4j and is known as a Path, although all relationships are paths really, just of different lengths. Paths are what makes Neo4j exciting, and where a lot of its power lies. Cypher also gives control over how it queries Paths. This is covered as needed in the next part of the chapter.

Querying Cypher

Knowing the patterns to perform a query is great, but without knowing how to query Cypher in the first place, you aren’t going to get very far. Depending on your use case, there are a number of ways to communicate with Cypher, which will be covered in a bit more detail momentarily. By far the most universal way is to use the REST API via HTTP, which will work regardless of your system. In these examples I’ll be using `curl` to interact with the API, as it’s the most common way of doing so. Before the `curl` side of things, let’s go through the easier way using the brilliant Neo4j Browser.

Browser

By far the easiest way to query your database with Cypher is by using the Browser. Depending on what stage you’re up to in terms of development, how you use the Browser will be different; however, it always has uses, whether for adding nodes, reviewing data, or performing certain admin actions.

In the early stages, it helps when getting used to the syntax of Cypher, gives useful error messages, keeps a history of your previous queries, shows you your results, and many more things. Even after the initial stages, the Browser is brilliant for debugging your application, so if you’re getting a strange return from Neo4j, checking the Cypher query in the Browser can help to work out if it’s the query that’s wrong, or if the issues lie somewhere else. You can of course get a query working as needed in the Browser and then use it in your application once it works as expected, which makes things a lot easier.

The Browser was created to allow interaction with the data within the database, so of course the prompt to perform Cypher queries is at the top of the page, with space below for the query history. To the right of the prompt are the options to save the query for later, create a new query, or execute the query.

The console also allows you to perform keyboard actions that you’d be used to in a proper editor, so highlighting words for copy/pasting is easy, and as an added bonus, pressing return will run the query, just like a proper terminal, so the experience of using it is very nice.

For every query performed (even the ones with errors) it adds another item below the prompt, so you have a full query history available at all times, which is really useful. Each item also comes with options to either export the resulting data, delete the item from the history, or make that particular result set full screen, which can be seen when you hover over an item. The fullscreen view is excellent for navigating a large result graph, or if you happen to return a large number of properties, where having the additional room is rather beneficial.

Although it’s possible to query data easily in the Browser, it can also be used to update or create nodes, relationships, and also manage the properties for these. It’s a very nice interface to interact with Cypher, and thanks to the query history and the instant feedback on errors, it’s a very powerful tool.

If you find yourself performing the same queries often, it may be worth saving the query for later use, as mentioned earlier. If you save a query, it’ll be available for later use via the ‘Saved queries’ button on the left of the Browser, which happens to be a star to keep things easy. When the save button is pressed, it’ll open up the saved scripts dialog for you regardless, and if you have a query in the prompt when it is pressed, that query will be added to the list. The star will also highlight to show the query has been saved.

After you save a query, changes can be made to it quite easily, so if the query changes, you can just select it from the saved scripts menu, which will load it into the prompt, ready to execute. If any changes are made, the star will change to an exclamation mark to show changes have been made, so just hit this button when you’re finished with the modifications, and it’ll be updated. You can of course just run the query and it won’t be updated.

Using the Browser is the easiest way to interact with your data, but is of course useless in applications, so when it comes to it being used in an application, it needs to use the REST API (which the Browser uses under the hood anyway) so let’s move from the Browser to that, shall we?

REST API

You communicate with Neo4j using its REST API, which allows you to manage your database by sending certain data, with certain headers, to certain endpoints. By using these things in different combinations you can do anything we’ve mentioned, such as creating nodes and relationships, even without using Cypher. In previous versions of Neo4j, Cypher had its own dedicated endpoint, so you would essentially send your Cypher query to `http://localhost:7474/db/data/cypher` and get a response back. This of course assumes you have Neo4j set up using the default path and ports. The usage of the REST API directly to perform Cypher queries may be overkill for most cases, but it’s still good to cover how it’s possible. In the new version, there is still an endpoint available to run Cypher queries, but it’s now done via the transaction endpoint.

We’ve touched on transactions before, but just in case it’s escaping your mind, a database transaction is a group of queries bundled together, so that it’s possible to roll back the previous actions if one fails, for example. For the most part that doesn’t matter though as the queries will be one-offs in the examples, so the transaction will only have one action inside of it.

To get back on track, as previously mentioned `curl` will be used to interact with Neo4j in these examples. One thing that needs to be added to the curl command is an authentication header, which is now required, so you’ll need your Neo4j username and password to interact with the database via curl.

In previous versions of Neo4j, the authentication module was disabled, so unless you wanted it to be secured, Neo4j would be open, and no username and password would be required. Now though, as of version 2.2, Neo4j requires authentication, which is why you need to log in to the Browser on first use. In addition to logging in, you need to change your password on first use, which is another security measure, but after all of these processes are complete, you will have a set of credentials that give you access to Neo4j. If you tried to communicate with Neo4j without changing the password, you would get an authentication error, telling you that you need to change your password. The easiest way to get around this problem is to log in via the Browser and change your password when prompted to do so. For the sake of ease, I’ll use the default values for these, which are a username of `neo4j` and a password of `neo4j`, but your password will be different, as you need to change it on first use, as previously mentioned.

We already know the endpoint we’ll be interacting with, which is the transaction endpoint, as it’s the only way to perform Cypher queries using the REST API. The endpoint being used is `http://localhost:7474/db/data/transaction/commit` (again, assuming the defaults are used) which you would also use if you were to perform a transaction with the API also, but we’re using it for Cypher queries. This endpoint is a little different, as it’s essentially for transactions that have one action. You’ll use the `commit` segment in the URL, which is essentially committing this transaction immediately, making it like a normal query, so we’ll be using this endpoint for our Cypher queries.

To perform the query, the basic version of the curl command is as follows:

curl -i -H "Content-Type: application/json" -X POST -u neo4j:neo4j http://localhost:7474/db/data/transaction/commit

There are many ways to perform curl queries, whether it’s in a Terminal window (Mac and Linux, that is; the Windows command prompt doesn’t support curl by default) or by installing one of the many available Chrome extensions or desktop applications. One popular Chrome extension is called Postman, but there are many other options.

The query in this case (which will be explained in more detail below) is a POST request, so it can also be done using any technology capable of sending a POST request. Although the examples will be done in curl, if you’re more comfortable using another platform to perform these queries, then by all means do so. A breakdown of the query params can be seen in Table 4-2.

Table 4-2. Query Parameters

| Flag | Explanation |
| --- | --- |
| -i | This includes the HTTP response headers in the output, and in this case is optional. The headers can be useful to help debug issues, but if the flag isn’t supplied, then only the response body from the server is shown and nothing else, which in this case will be JSON. |
| -H | This is the header flag, and allows you to add headers to the request. In this case, we’re telling the server that the content we’re sending over is JSON. This is important, as the server expects JSON, so if it’s not in the correct format, it won’t do anything. |
| -X | Represents the request method, which is typically POST, DELETE, PUT, or the default GET. In essence we’re posting to the endpoint and getting a result, so it’s just like a remote form. |
| -u | The authentication details for the request, in the form `username:password`. If you didn’t have this, or if the details were wrong, you would get an authentication error. |
| -d | This is the data sent along with the request, if it’s needed. This can be POST variables, or even a string of JSON, which is what we’ll be using it for. |
| -v (optional) | When you’re debugging (or learning, in this case) it’s generally helpful to get as much information as possible to help find the solution. With curl requests, this comes in the form of the verbose flag, which when used essentially gets curl to explain itself, and the steps the command is taking are output to the screen. You may not want that additional information to be displayed (which is why I’ve marked it as optional) as it outputs a lot more information, but depending on the use case it can be useful. To use the flag, just add it like the other flags; just be sure not to put it between a flag and its argument, such as after -X, because -X expects the request method to follow it. |

If this query were to be run now, it wouldn’t do anything, as no data is being sent over, which is why the `-d` flag is needed, which is where the JSON (including the Cypher query) is sent. Before the full query is used, let’s have a look at the JSON:

{
    "statements" : [
        { "statement" : "MATCH (n) RETURN n;" }
    ]
}

A query is described as a statement when submitting to the transaction endpoint, so essentially the JSON above is an array of ‘statements’, with one ‘statement’ inside it. Although the transactions in this case are only one statement, if you were to add multiple statements it would look something like the following:

{
    "statements" : [
        { "statement" : "MATCH (n:Person) RETURN n;" },
        { "statement" : "MATCH (n:Pet) RETURN n;" }
    ]
}

You’ll notice `n` is used in both queries which would cause an error if it was in the same query, but since these queries are performed in isolation from each other (although still grouped within the same transaction) then using the same alias isn’t an issue.

To make the queries in the chapter easier to read, they’ve been spread over multiple lines, but when you’re running queries on the command line it’s generally easier to remove the formatting and run everything on a single line. Below is an example of the curl query from earlier with the JSON required to perform the query, so pretty much everything is in one command.

curl -i -H "Content-Type: application/json" -X POST -u neo4j:password http://localhost:7474/db/data/transaction/commit -d '{"statements" : [{ "statement" : "MATCH (n) RETURN n;" }]}'

With all components of the query in place, if the query is run now, it’ll return every node with all of the properties attached to them, which depending on the structure of your nodes, will look something like:

{
    "results": [{
        "columns": ["n"],
        "data": [{
            "row": [{
                "uid": "1",
                "date": "29-03-15",
                "value": "10",
                "stat_id": "3"
            }]
        }, {
            "row": [{
                "uid": "1",
                "date": "24-04-15",
                "value": "1",
                "stat_id": "4"
            }]
        }]
    }],
    "errors": []
}

The JSON returned consists of two arrays, ‘results’ and ‘errors’. In this case, there is only one item in the result array, because only one statement has been run, but multiple statements would result in multiple result sets. Within the result set, the columns are whatever you have returned, so in this case it is `n` which is what I specified in the return statement. Each row with the results is a node, and each item within a row is the properties of the node being returned. One thing you may notice here is the lack of node ids, which is because they need to be returned in a certain way, which will be covered a little later on. If you had returned a relationship instead of a node, then each item within the results would be a relationship.

The JSON returned here can be used within an application however it’s needed, so whether you have one or many sets of results, they can be iterated over without any real issue. In this instance the errors array is empty (because there aren’t any, of course) but if there were any errors, they would be output within the errors array, so if it’s empty, there have been no errors.

Although this was a read query, write queries work in the same way so using this method of performing Cypher queries via curl, you can manage all of your Neo4j actions from the command line, if you’d like to. To make this process a little easier though, developers have created a number of libraries to interact with Neo4j, so rather than having to write the query yourself you can just use a function, method, or whatever the developer has deemed appropriate. There are many different options available for multiple programming languages, so if you’d like to use a library, you’ll most likely find one.

How to Build a Cypher Query

With the basics covered, it’s time to cover the anatomy of the queries, and show the different keywords and functions that can be used, and how they work. When covering the syntax, I made a point of leaving out keywords to ensure the usage in each context could be covered separately, without duplication. Each item will have an example with it, so seeing how it gets used is as easy as possible. Plus, this has the added bonus of being a great reference guide when you think to yourself “Ah, I know I can do that, but I can’t remember how” and with a quick glance it’ll all come flooding back.

A Quick Note on Comments

Before we dive into the anatomy of Cypher queries, it’s worth mentioning comments. A comment is a string of text that can be included within the query, but isn’t executed. You initiate a comment with `//`, which makes the rest of the line a comment. You can include a comment on a new line in a query, or at the end of a particular part. An example can be seen below:

MATCH (n) RETURN n; //Return all of the nodes, on one line

The text within the comment isn’t executed. The same goes for between lines, as well, like so:

MATCH (n)
//Time to return some nodes
RETURN n

The only time a comment won’t act like a comment is when it’s within quotes, as it then becomes a string, for assigning to a property, for example.
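To make the quoted case concrete, here is a minimal sketch, assuming a hypothetical `Note` label and `text` property. The `//` inside the double quotes is stored as part of the string, while the `//` after the closing parenthesis starts a real comment.

```cypher
// This whole line is a comment and is ignored
CREATE (n:Note {text: "Not // a comment"}) // but this trailing part is
RETURN n.text;
```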

Enough on that though, it’s time to get started, starting with a clause you’ll see a lot, RETURN.

RETURN

If you want to use the data you’re referencing in a query, then it’ll need to be returned; otherwise, the query will just execute and you won’t get anything back. This is fine for creating nodes or relationships as you don’t always want what you’ve just created to be returned, but when it comes to querying data, you’ll want it returned, or at least part of it.

There’s been a lot of talk of aliases, and the main use for an alias is when it’s being returned, or of course if you’re using it later in the query. If you reference a node with `n`, then you can use `RETURN n` to have access to that node’s data within your code. The same applies to relationships. You won’t always want the full node, what if you only want the `name` property value from your node? That’s not a problem, in this case `n.name` would just be used after the return. You can also return multiple properties by comma separating them, like so: `RETURN n.name,  n.age`. This can also be achieved with multiple nodes, so if you’re referencing a relationship, you may have node `a` and node `b`, you can return properties from a, and b, or both nodes, or nothing. It all depends on your needs. There is always the option of using `*` as well, which will return everything.
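As a quick sketch of these options, assuming the `Kerbal` nodes and `KNOWS` relationship used earlier in the chapter, the first query returns a single property from each side of a relationship, and the second uses `*` to return everything that was given an identifier.

```cypher
// Return single properties from both ends of a relationship
MATCH (a:Kerbal)-[r:KNOWS]->(b:Kerbal)
RETURN a.name, b.name;

// Return everything that was named in the query (a, r, and b)
MATCH (a:Kerbal)-[r:KNOWS]->(b:Kerbal)
RETURN *;
```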

When it comes to return, essentially just think about what you need to use in your application, if you realize you only use certain properties, then just return those properties. If you have a lot of complex nodes, returning single properties will optimize the query and make it faster. There’s also the added bonus of optional properties, so even if the value is null for that node it won’t error, it’ll just return null. Although the speed increase is small as Neo4j is already pretty fast, any speed increase is better than nothing, right?

If you need a return value to have a certain name to make your code easier, you can alias it by using `AS`, so an example would be `RETURN n AS Person`, which makes accessing the returned data in code much easier. You can also return unique results, just in case your query would return duplicates. You can do this by adding `DISTINCT` to your query, directly after `RETURN`.
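Both ideas can be combined in one query. The sketch below assumes `Person` nodes with a `name` property, and the column name `person_name` is just an illustrative choice.

```cypher
// Return each distinct name once, under a friendlier column name
MATCH (n:Person)
RETURN DISTINCT n.name AS person_name;
```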

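There’s also the option to evaluate expressions as part of the return. If you have a numeric value, or any other that can be evaluated against, you can use something like `RETURN n.age > 30`, which returns `true` or `false` for each node depending on whether its age is over 30. Note that this doesn’t remove any rows from the result; to only get back the nodes with an age over 30, the comparison needs to go in a `WHERE` clause before the `RETURN`.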
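A minimal sketch of the difference between evaluating a comparison in `RETURN` and filtering with `WHERE`, assuming `Person` nodes with hypothetical `name` and `age` properties (the `over_thirty` column name is also made up for the example):

```cypher
// Returns one row per Person, with a true/false column for the comparison
MATCH (n:Person)
RETURN n.name, n.age > 30 AS over_thirty;

// Restricts the rows themselves to people over 30
MATCH (n:Person)
WHERE n.age > 30
RETURN n;
```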

If you’re returning a node, you can also return its relationships if you want to. Since you can use commas to add an argument to a return clause, if you want the node’s relationships you can just add a pattern to the clause. To do this, just use `(n)-->()` which, assuming your node is aliased with `n`, will add the node’s outgoing relationships to the response.
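
To make these options concrete, here is a short sketch of a couple of return clauses. It assumes `Person` nodes with `name` and `age` properties, which are illustrative rather than taken from a particular dataset.

```cypher
// Return two properties under friendlier names, dropping duplicate rows.
MATCH (n:Person)
RETURN DISTINCT n.name AS name, n.age AS age;

// An expression in RETURN is evaluated per row rather than used as a filter:
// over_30 is true, false, or null (when age is not set) for each node.
MATCH (n:Person)
RETURN n.name AS name, n.age > 30 AS over_30;
```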

MATCH

When writing Cypher queries, you’ll no doubt see MATCH a lot, as it’s the main way to query the data and potentially return results. Using MATCH allows you to get information on nodes, properties, and relationships, while filtering them using various clauses. Let’s quickly run through some of the basic query patterns you may see.

MATCH (n)
RETURN n;

This will return every node stored within the database.

MATCH (n:Person { name: "Chris" })
RETURN n;

This will match any nodes that have the label `Person`, the property `name`, and the value of said property is `Chris`.

MATCH (a)--(b)
RETURN a, b;

Here we’re matching nodes that are related, regardless of the direction (notice the lack of an arrow) and returning the nodes from both sides.

MATCH (a)-[r]-(b)
RETURN a, r, b;

In this case, the relationships have been assigned to the `r` variable, which means they can be returned from the query, so if you need the relationship (or any of its properties) then it can be returned easily enough.

MATCH (a)-[:RELATED]->(b)
RETURN a, b;

When you don’t want to wildcard everything, you may want to have a certain type of relationship, which is added with the `:TYPE` pattern. In this case, the type is RELATED and since the relationship isn’t needed later the alias has been dropped.

Rather than only returning the nodes involved in a path, you may want the path itself, which can also be done easily enough. Using the previous example, that’ll be made into a named path.

MATCH p=(a)-[:RELATED]->(b)
RETURN p;

It’s possible to use multiple MATCH clauses in a query, so if you wanted to return two particular nodes, you could just use multiple MATCH clauses and then return the result, like so:

MATCH (a:Person {name: 'Chris'})
MATCH (b:Person {name: 'Kane'})
RETURN a, b;

The previous query could also be rewritten as follows and would still give the same result, using one MATCH.

MATCH (a:Person {name: 'Chris'}),(b:Person {name: 'Kane'})
RETURN a, b;

This would return the nodes requested just as you would expect, if the nodes can be found. If the second MATCH found nothing, then the whole query would return 0 results, even if the first node exists. It’s looking for `a` AND also `b`, so if it can’t find `b` then there is no row to return. You can get around this, though, by using an optional match. This will essentially return the match if it’s there, and if not it’ll return `null`, so the query still works.

MATCH (a:Person {name: 'Chris'})
OPTIONAL MATCH (b:Person {name: 'Kane'})
RETURN a, b;

You can also use the optional flag to return potential relationships for a node. If the node only might have a relationship, then an optional flag can be used to remedy this, like so:

MATCH (a:Person {name: 'Chris'})
OPTIONAL MATCH (a)-->(x)
RETURN a, x;

For all of the `Person` labeled nodes with the `name` of `Chris` both the nodes that do and don’t have relationships will be returned.

You may also see uses of `START` in some example queries online. `MATCH` now does what `START` used to do; `START` is deprecated and is only used if you want to pick out something from a legacy index. Going forward you shouldn’t see or use `START` in your queries, but if you do come across it anywhere, at least you’ll know what it is.

CREATE/CREATE UNIQUE

This clause is one you’ll be familiar with, as we’ve covered it briefly before, but CREATE does what it says: it creates things. This can be a node, a relationship, a node with a relationship, a node with properties, or any combination of these. One good thing about create queries is that you don’t always have to return anything. If you’re just creating a new node or relationship and don’t need it back, then there isn’t anything to return, which will make your query smaller. Of course the option to return from the query is there, but it’s not required. A create can be as simple as `CREATE (n)` which would create a node with no label or properties.

CREATE (n:Person:Developer)

Here we can see an example with multiple labels, but no properties.

CREATE (n:Person { name : 'Chris', job_title : 'Developer' })

This example has multiple properties on a node which also has a label.

MATCH (a:Person),(b:Person)
WHERE a.name = 'Chris' AND b.name = 'Kane'
CREATE (a)-[r:RELATED {relation: "brother"}]->(b)
RETURN r

In this example two previously created nodes are being matched, using MATCH and WHERE, and then CREATE is used to add the relationship between the previously matched nodes. In this case, there’s a RELATED relationship being added here, with a property of ‘relation: brother’ to give some context to the relationship.

You can also create relationships along with newly created nodes, so the following example would create two nodes, and a relationship:

CREATE (:Person {name: "Bill"})-[:KNOWS]->(:Person {name: "Bob"})

If you ran this query again, however, it would create duplicate nodes, which isn’t always ideal. Duplicates can be prevented with constraints, which are covered later in the chapter, but depending on the use case, this isn’t always needed.

Being able to create a unique Node or Relationship can come in very useful, for duplication reduction, and also updating existing nodes. To achieve this, you can use CREATE UNIQUE, which essentially performs a MATCH without you having to. Using CREATE UNIQUE means if you’re say, adding a relationship to a Node, you can safely use a create query without the worry of duplication. An example of that would be:

MATCH (bill:Person {name: "Bill"})
CREATE UNIQUE (bill)-[r:KNOWS]->(bob:Person {name: "Bob"})
RETURN r

In this case, if the `Bob` node didn’t exist, it would be created, with the relationship. If the query was run again though, then the “Bob” node wouldn’t be duplicated, and neither would the relationship. This means you can have some control over duplication of data within your applications.

DELETE/REMOVE

There always comes a time when you need to delete data, whether it’s a user that has left or a comment that’s been deleted, and that’s what DELETE does. This clause is very simple to use, and is used to delete Nodes and Relationships. When deleting a node, you must remember that it can’t be deleted while it still has relationships attached; unless you delete its relationships as well, the query will fail and the node will remain. A basic usage of DELETE is as easy as:

MATCH (n:RemoveMe)
DELETE n

In the query, DELETE essentially takes the place of RETURN and deletes whatever is matched in the query, which in this case is any nodes with the label `RemoveMe`. This label could have been added by a worker and flagged for deletion with the `RemoveMe` label.

When it comes to removing the relationships it can be done in one step, just use MATCH to get the Relationships and delete them, which can be done using:

MATCH (n:RemoveMe)-[r]-()
DELETE n, r

The main change here is the inclusion of `r`, which is the alias of the relationships (in either direction) found with MATCH. Note that this pattern only matches nodes that have at least one relationship. There comes a time when you may be working on a project and need to clear out your database frequently. This snippet can be used to remove relatively small amounts of data easily to allow you to start again. The snippet in question is:

MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

This isn’t recommended for large datasets, as it will attempt to delete all the nodes and relationships at once, which will be quite intensive. For deleting larger amounts of data, it’s possible to use DELETE with LIMIT, allowing for batches to be utilized. LIMIT will be covered in more detail later in the chapter.
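
As a rough sketch of that batching idea, the following deletes at most 1,000 flagged nodes per run, along with their relationships. The batch size is an arbitrary assumption, and the query relies on `WITH`, which is covered later in the chapter, to apply the limit before the delete happens.

```cypher
// Pick up to 1000 flagged nodes, then remove them and any relationships.
MATCH (n:RemoveMe)
WITH n LIMIT 1000
OPTIONAL MATCH (n)-[r]-()
DELETE n, r
```

Running this repeatedly until no `RemoveMe` nodes are left clears the data in manageable chunks.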

You don’t always want to remove the entire node though; just certain Properties or Labels, which can be achieved using REMOVE. The main concept for REMOVE is the same as DELETE, you use MATCH to get the nodes you wish to modify, then do so with REMOVE, which looks like this:

MATCH (bill { name: 'Bill' })
REMOVE bill.subscription_start
RETURN bill

This example would be used if Bill had decided he didn’t want the newsletter anymore, so code in the application needed his `subscription_start` property to be removed. Labels work in the same way, just MATCH it, then REMOVE it, like so:

MATCH (n { name: 'Sara' })
REMOVE n:Remove
RETURN n

In this scenario it’s assumed that there is only one node with a `name` property of `Sara`, and that said node already has a `Remove` label.

WHERE

The WHERE clause is a powerful one as it allows you to filter your queries, to get more specific results. Although WHERE is powerful, it’s useless without `MATCH`, `OPTIONAL MATCH`, `START`, or `WITH`, as it needs something to feed it data to filter.

So WHERE is a filter, but one with a lot of flexibility and power, and if you’re from an SQL background, you’ll be familiar with this already, as the behavior is essentially the same.

Being able to filter on if a node has a certain relationship, if a property matches a pattern, or as little as a property equals a value; this is just a taste of WHERE’s power, but enough talk, let’s get into some examples starting simple then getting more complex.

MATCH (n:Developer)
WHERE n.name = 'Chris'
RETURN n;

This example would only return nodes with the label `Developer`, with the property of `name`, and a value of `Chris`.

You can also use WHERE to filter based on value ranges using `<` and `>`, such as `WHERE n.age > 30`, which would match any nodes with the property `age` with a value greater than 30.

MATCH (n:Developer)
WHERE n.age > 30
RETURN n;

You can also combine WHERE with AND, OR, and NOT to build up some really specific results. Let’s have a look at an example:

MATCH (n:Person)
WHERE n.age > 18 AND (n.name = 'Chris' OR n.name = 'Kane') AND (n)-[:RELATED {relation: "brother"}]-()
RETURN n;

This was a big example, so let’s go over it in chunks. The first part of WHERE filters on whether `n.age` is greater than 18, followed by AND and a check that `name` is `Chris` or `Kane`, enclosed in brackets to ensure the `OR` is evaluated before the surrounding `AND`s. Next up is a check that the node has a `RELATED` relationship whose `relation` property is equal to `brother`. No direction is specified when checking the relationship, and empty parentheses are used rather than a specific node, because all that matters is that such a relationship exists, not which way it points or what it points to.

This is a very specific way of getting me and my brother, or anybody else who happens to be called Chris and Kane and are also brothers. Depending on the dataset this would be a bit much, so if you had a smaller dataset, the extra constraint on the relationship may not have been needed.

If you need to ensure a property exists on the nodes you return, then EXISTS is here to help.

MATCH (n:Developer)
WHERE EXISTS(n.subscription_start)
RETURN n;

In this instance, the nodes in question have started a subscription to something and the date has been stored, so if they’re sending out a newsletter, those that haven’t subscribed won’t be bothered. The EXISTS function can also be used for relationships, as well as nodes.

When working with property values, sometimes you may want nodes with a certain value, or even those that don’t have a property set. When a property isn’t set on a node, it’ll return NULL, so if you’re expecting a value to not be there, you must address it directly, which looks like this:

MATCH (n)
WHERE n.level = 'beginner' OR n.level IS NULL
RETURN n
ORDER BY n.name

This would get all nodes where `n.level` is `beginner`, as well as those where the level hasn’t been set and is therefore NULL.

It’s also possible to utilize Regular expressions, within your queries. You can declare a pattern by using `=~` followed by the pattern. An example of this would be:

MATCH (n)
WHERE n.name =~ '(?i)^[a-d].*'
RETURN n

The use of `(?i)` in the expression makes the whole thing case insensitive, that’s why that’s there. This particular example gets any names that start with the letters between `a` and `d`. This would only really be used if you had a huge amount of people in a list, and were batching e-mails to send out, or something of that nature.

You can also essentially inverse a query to exclude those particular nodes by using `NOT`. One usage would be:

MATCH (n)
WHERE NOT n.name = 'Chris'
RETURN n

This would give every node whose `name` is set to something other than `Chris`. (Nodes with no `name` property at all are also left out, because comparing against NULL gives NULL rather than true.) Maybe somebody needs to send an e-mail about planning my birthday party or something, who knows? That silliness aside, `NOT` can be very useful in complex queries when you have a tricky filter and certain values keep creeping in: “Uck, yes, them, but not you guys!”.

ORDER BY

This clause pretty much does what it says on the tin, it allows you to order the data by something, more specifically, properties. This can be useful if you want to alphabetize a list, order people by age, or anything like that. You can sort a response by the properties on a node or relationship, but not by the nodes or relationships themselves. A basic example is something like:

MATCH (n)
RETURN n
ORDER BY n.name

You’ll notice that the `ORDER BY` is after the RETURN, which is required, and will result in an error if it’s not in the correct place. Although ordering by one property is good, it’s also possible to sort by multiple values, which can be achieved by adding a comma, like so:

MATCH (n)
RETURN n
ORDER BY n.age, n.name

When it comes to sorting null values, these will appear at the end of the list, so most importantly, it doesn’t break the query if the value isn’t there. By default the sort order is ascending, so if you’d like a descending order, just add `DESC` after the property you’re ordering by, which will reverse the order. This also means that if you do have null values, they’ll be at the start, rather than the end, of the results. The previous example, reversed, would be:

MATCH (n)
RETURN n
ORDER BY n.age DESC, n.name DESC

In this case I’ve reversed both properties, but it only needs to be added to the applicable property.

INDEXES

Using indexes is always recommended, but isn’t always possible. An index is a redundant copy of the information that’s being indexed, to make looking up said information faster. When they can be used, indexes make things faster, and that’s always good, but it’s possible to have too much of a good thing. Storing an index takes up space, and also lowers write speed. This comes from the indexes needing to be updated when new information is stored in the database, creating a performance cost.

Neo4j allows you to create an index on properties of nodes that share the same label. If there is a particular property that you happen to query a lot, then it may be worth adding an index for it, if it doesn’t already have one. In some cases an index is created automatically, such as with constraints, which will be covered in a moment. It’s also possible for a node’s property to sit in multiple indexes, which has the potential to cause problems. In that case, the USING clause can be utilized, allowing you to specify which index the query will use. Unless USING is specified, Cypher will work out what it believes to be the most logical index to use, without any additional input from the user. We’ll also cover USING a little later.

If you create an index in Neo4j it’ll be automatically updated. This includes any updates to nodes that have properties in an index, and also when new nodes are created that meet the required criteria. Adding an index can be as easy as:

CREATE INDEX ON :Person(name)

This creates an index on any nodes with a `Person` Label and a `name` property. When the query to create an index is received by Neo4j, the index isn’t available immediately; it’s populated in the background to keep everything fast (adding an index on a huge dataset may take some time), but once it’s ready it’ll be used automatically.

As mentioned earlier, you can sometimes have too many indexes, which can actually hinder performance. It may also be that you have a particularly large index that doesn’t get used too often, and you want to save space by removing it. If you decide you don’t want a particular index anymore, that’s fine, it can be dropped as easily as it was created by using:

DROP INDEX ON :Person(name)

This will drop the index, and the database will act as though it never existed. If you change your mind you can always create the index again.

CONSTRAINTS

Using constraints helps keep your data unique, and its integrity intact. Data integrity can mean different things to different people, but for a registration-based system, having duplicates would be classified as an integrity violation. Unique constraints are extremely useful when working with information like e-mail addresses or usernames that are required to be unique, and can cause issues if they aren’t.

When a constraint is created, it also creates an index for the properties that are required to be unique, so there’s no need to manually create one. This index is used to keep track of the existing values, so if a value isn’t in the index, then it’s unique. Once the index has been built and all nodes scanned, it is available, and used on queries thereafter. If you have data within the database that violates the unique constraint, then said constraint will fail to be created. In the event that a constraint fails to be applied to your graph, you need to resolve any duplicates in your data before attempting to apply the constraint again. For this reason, it’s advisable that you add your constraints sooner rather than later to avoid this type of clean-up. In most cases though, you’ll just create a constraint, and then that’ll be it, which can be as easy as:

CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE

As with an index, a constraint is added on nodes with a certain label and property combination. In this case `Person` and `email` respectively. If you try and create a node that violates the constraint, then the CREATE will produce an error and the node will not be created.

Just like indexes, constraints can be dropped, which is as easy as creating one in the first place, and looks like:

DROP CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE

This would also remove the index used with the constraint, so if that index was helping with performance, or something of that nature, then it may be worth adding it back in after dropping the constraint, but in most cases it can be removed and then forgotten.

LIMIT

This clause simply limits the number of rows that’ll be returned. Without the use of this clause, all applicable rows will be returned, which isn’t always the desired outcome. There are many use cases for LIMIT, from getting the first five registered users, to being able to list the top ten products within a system. Although both of these things could work well as a full list, sometimes you just want a small subset of data.

Being able to limit results is also useful when it comes to batch deleting items, or if you need to limit a result set to pass it on to another part of the query.

When using a limit, once the number of rows reaches the limit, no additional rows will be returned. If there are fewer applicable rows than the limit, say 5 rows when a LIMIT of 10 is specified, all 5 rows will be returned. It can be added to a query like so:

MATCH (n)
RETURN n
ORDER BY n.name
LIMIT 3

When using LIMIT, ensure it comes at the end of the part of the query it relates to, after any `ORDER BY`. In this example LIMIT is used to restrict the query to 3 rows, so if there were 5 results, only 3 would be returned, but if there were only 2, then 2 would be returned.

SKIP

The `SKIP` clause works like an offset, so you essentially tell Cypher to skip the first x results. Using SKIP, in combination with LIMIT, allows things such as pagination to be created. In that case, your SKIP value would be the current page number minus 1 (so that the first page skips nothing), multiplied by the limit. Another potential use would be with promoted items. If there was a featured product on an e-commerce website, and the rest of the products were in a list, you’d want to SKIP 1 on the list query to avoid the featured product appearing twice. That is of course based on a number of assumptions, but the use case is sound.

If you wanted to skip the first row returned, then that can be as simple as:

MATCH (n)
RETURN n
ORDER BY n.name
SKIP 1

You can also combine SKIP and LIMIT together, so you can then limit the remaining rows down to a specified value. To build on the previous example, if we did only want the first 10 applicable rows after the first skipped one, it would look like so:

MATCH (n)
RETURN n
ORDER BY n.name
SKIP 1
LIMIT 10
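
For pagination, a worked example may help. Assuming pages are numbered from 1 and each page holds 10 rows (both assumptions for illustration), the third page needs to skip (3 - 1) × 10 = 20 rows:

```cypher
// Page 3 with 10 rows per page: skip the first 20 rows, return the next 10.
MATCH (n)
RETURN n
ORDER BY n.name
SKIP 20
LIMIT 10
```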

WITH

The WITH clause is one that may be familiar to the Terminal savvy, as it’s essentially a Unix pipe. WITH passes information to the next part of the query, but it can be used in different ways to achieve different goals. It can be used to filter down results, and make queries more efficient by stripping out unneeded data. It can also be used to collect additional data from a query, so it can be quite useful. Let’s start with a filter, shall we?

MATCH (me { name: "Chris" })--(friend)-->()
WITH friend, count(*) AS foaf
WHERE foaf > 1
RETURN friend

In this example, the query is matching `me`, then any connections I have to another node, aliased by `friend`. The query then looks for an outgoing relationship from that friend, so we’re looking for a friend of a friend (hence foaf) here. The WITH first passes on the `friend` value, then a count of `*`, aliased with `foaf`. The `friend` value is required, otherwise it couldn’t be returned. In this example, if you tried to return `me` the query would fail, as it’s not passed on to the next stage of the query by WITH. The count is the number of matched paths for each friend, which here means the number of outgoing relationships that friend has, as that’s the last part of the path. The count is then used in the WHERE to ensure only my friends that know more than one other person are returned.

A common use of WITH is to order your data before you return it which can be pretty useful, but there’s a bit more to it than that.

MATCH (me { name: "Chris" })--(f)
WITH f
ORDER BY f.name DESC
LIMIT 1
MATCH (f)--(fof)
RETURN fof.name

Since WITH passes the matched data on to the next part of the query, it allows you to order the data before it’s returned, limit that result, and then use it again straight away. Here the query is getting the related nodes of the matched one, sorting them by the `name` property in descending order, and limiting them to one. This query is essentially working out which of my friends has the name closest to the end of the alphabet, finding their friends, `fof`, and returning their names.

Next up is being able to aggregate data so that it can be collected, and also returned. So for the times you’d be looping through rows and collecting certain pieces of data, such as names, this saves you the loop.

MATCH (n)
WITH n
ORDER BY n.name DESC LIMIT 3
RETURN collect(n.name)

The WITH passes the nodes on so they can be ordered and limited, and `collect` then gathers the three names into a single list.

UNWIND

There may be times when you’re querying data, where you have a collection, and you want to have rows. Well, that is just what UNWIND does, it takes collections of nodes, or arrays of data and splits them into individual rows again. When using UNWIND, the data must be aliased for the query to work. A very simple example is as follows:

UNWIND [ 'Chris', 'Kyle', 'Andy', 'Dave', 'Kane'] AS x
RETURN x

This would return all of the names in individual rows, rather than as a collection as they were passed in. You can also pass in structured data which can be iterated over within a query, and then used in combination with other clauses to make multiple changes, or you can even create nodes, too.

Using UNWIND in conjunction with MERGE (which we’ll get to soon) can lead to a very efficient query that can create and/or update nodes and relationships. This assumes that structured data has been passed into the query, allowing it to be used by Cypher. An example of the data being passed in would be:

{
  "events" : [ {
    "year" : 2014,
    "id" : 1
  }, {
    "year" : 2014,
    "id" : 2
  } ]
}

This data can then be passed through to UNWIND and then, each item within the array of data can be passed through to something like MERGE, or even a CREATE statement. This means that rather than doing many creates, you can pass through data in an array, and make a query to do all of the hard work that only has to be done once. This doesn’t mean to say you can’t use a transaction and run a lot of queries that way, it’s just another option.
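
A minimal sketch of that idea follows. To keep it self-contained the list is written inline rather than passed in, and the `Event` label is an assumption made for illustration.

```cypher
// Turn each map in the list into a row, then create or update one node per row.
UNWIND [ { year: 2014, id: 1 }, { year: 2014, id: 2 } ] AS event
MERGE (e:Event { id: event.id })
SET e.year = event.year
RETURN e
```

In an application the list would normally be supplied as a query parameter, so the same query can be reused with different data.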

UNION

This clause is used to return the results of multiple queries as if they were one, uniting them if you will. This can save you running multiple single queries, or clean up the return statement of a more complex query. Say you had Tutor nodes and Pupil nodes, but just wanted names from both; UNION would be great there. If you were to return these in a normal statement you would need to return both values, probably aliased with two different names, such as `pupil_name` and `tutor_name`. With UNION that’s not a problem, and for this example the query would look like so:

MATCH (n:Tutor)
RETURN n.name
UNION ALL
MATCH (n:Pupil)
RETURN n.name

This would return the results from both queries in one result set, so in this case it would be an array of names. In this example both of the property names were the same, but this isn’t always the case, and doesn’t need to be to take advantage of UNION. As long as the values are returned with the same names, then it doesn’t matter if the name is the original property name, or an alias. An example of this can be seen as follows:

MATCH (n:Tutor)
RETURN n.tutor_name AS name
UNION ALL
MATCH (n:Pupil)
RETURN n.pupil_name AS name

To use UNION you must first return everything you want from the first query, then add in UNION before performing the next query. You’ll notice that in this example, `ALL` is present, which essentially returns the exact result from each query, maintaining duplicates. If you were to remove `ALL` then any duplicates within the result set would be removed, it’s as easy as that.

It’s worth noting you can also chain multiple `UNION` clauses, but whatever they return must have the same name, so be sure to use an alias (`AS`) to ensure what you’re returning from each query is consistently named with the others in the `UNION`.
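
As a sketch of chaining, the following combines three queries. The `Parent` label is hypothetical, added only to show a third query; leaving out `ALL` means duplicate names are removed from the combined result.

```cypher
MATCH (n:Tutor)
RETURN n.name AS name
UNION
MATCH (n:Pupil)
RETURN n.name AS name
UNION
MATCH (n:Parent)
RETURN n.name AS name
```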

USING

You’ll only really need this clause if you’re using a lot of indexes, because Neo4j takes care of which indexes to use automatically. There may be cases when Neo4j is using the wrong one and it’s causing problems, so by using `USING` you specify an index to use for a particular query. An example of that would be as follows:

MATCH (n:Person)
USING INDEX n:Person(name)
WHERE n.name = 'Chris'
RETURN n;

In the example the `WHERE` clause is used to filter down a result set, and since there’s already an index there, it can be selected with `USING`. Most of the time though, if you use indexes, then Neo4j will take care of a lot of the hard work for you. It’s worth noting though that you can use multiple indexes in one query, so it can be manually controlled.

MERGE

Although its name suggests that MERGE will merge your data, that’s not technically true. This clause ensures data exists within the graph: if it does, the data will be matched, and if not, the data will be created. This sounds similar to how CREATE UNIQUE works, but MERGE is a lot more powerful.

When using MERGE, if no existing node matches all the properties in the query, then a new node will be created. A basic example of MERGE works like so:

MERGE (bill { name:'Bill', age: 29 })
RETURN bill

Here, if there is a node with a `name` of `Bill` but there is no `age` set, then a new node would be created. The basic usage of MERGE is a mix of CREATE and MATCH, but there are some rules attached to this. Either the whole MERGE pattern matches, or the whole pattern is created. This becomes even more important when using unique constraints, as the pattern then needs to match 1 node, or no nodes. If the pattern only partially matches an existing node (for example, the constrained property matches but another property in the pattern doesn’t), MERGE will try to create a new node, which violates the constraint, so the query will fail. Be careful with that. Constraints can be hugely useful with MERGE, as it means you can create unique nodes, and if there’s a problem, it’ll error.

MERGE (char:Person { name:'Charlotte' })
RETURN char

If there was a constraint placed on the `name` property being unique (`CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE;`) then this query would either match an existing node, or create a new one. You can also use the same logic on Relationships, using MERGE to create them as needed, to help once again with reducing duplicates within the data.
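
A minimal sketch of MERGE on a relationship, assuming the `Bill` and `Bob` nodes from earlier in the chapter already exist:

```cypher
// Match both nodes first, then create the KNOWS relationship only if
// one doesn't already exist between them in this direction.
MATCH (bill:Person { name: 'Bill' }), (bob:Person { name: 'Bob' })
MERGE (bill)-[r:KNOWS]->(bob)
RETURN r
```

Because both nodes are bound by the MATCH, only the relationship is matched or created; running the query twice leaves a single `KNOWS` relationship between them.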

There is also a bit more control that can be gained by MERGE, with the use of ON CREATE and ON MATCH. Being able to use these clauses essentially gives the power to control the outcome of the query depending on if the query has MATCHed, or CREATEd a node, and you can use both in the same query too, which can be seen here:

MERGE (dave:Person { name:'Dave' })
ON CREATE SET dave.created = timestamp()
ON MATCH SET dave.last_login = timestamp()
RETURN dave

This example will SET (which will be covered, next actually) a “created” date for Dave if the node doesn’t exist, otherwise the `last_login` property will be updated with a new value. Although in this example both “ON CREATE” and “ON MATCH” have been used, this isn’t required and they can be used independently of each other, as well as together. The only thing to keep in mind is that the query used must be specific enough to return 1 or 0 nodes, because on a database with many people, the odds of having multiple people called “Dave” is most definitely a possibility. To improve the example, a unique identifier (such as an e-mail) could be used which would only ever return a single row (or none) provided the data was always kept unique, that is.
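
One way that improvement might look is sketched below, merging on an `email` property instead of the name. The `email` property and its value are assumptions for illustration, and the approach works best with a unique constraint on `Person` and `email`.

```cypher
// Merge on a value that identifies exactly one person.
MERGE (dave:Person { email: 'dave@example.com' })
ON CREATE SET dave.name = 'Dave', dave.created = timestamp()
ON MATCH SET dave.last_login = timestamp()
RETURN dave
```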

SET

The SET clause is used to update Labels on nodes, and also properties on Nodes and Relationships. To use SET you must first match the node you want to update, then just set the values you want to on said node, or relationship, depending on the use case. A basic example can be seen like so:

MATCH (n { name: 'Chris' })
SET n.username = 'chrisdkemper'
RETURN n

This would update the `Chris` node with the new username property, so if it didn’t have the property before it’ll be added, and if it did already exist it’ll be updated. In this example the updated node was returned, but this isn’t required and if it isn’t needed, the RETURN can be omitted.

It’s also possible to set several properties at once from a map, without touching any other properties on the node, by using `+=` in the following way:

MATCH (n { name: 'Chris' })
SET n += { username: 'chrisdkemper' , level: 'admin'}

Any property in the map that doesn’t yet exist on the node is added, and any that does exist is updated, while properties not mentioned in the map are left alone. If the previous example was run first, `username` would simply be set to the same value again and `level` would be added. Multiple properties can also be set individually in one clause; they just need to be separated by a comma, like so:

MATCH (n { name: 'Chris' })
SET n.username = 'chrisdkemper' , n.level = 'admin'

This would update/set these properties depending on if they previously existed on the node or not.

When dealing with Labels on Nodes, use the following:

MATCH (n { name: 'Chris' })
SET n :Moderator:Admin
RETURN n

This would set `Moderator` and `Admin` as labels for the matched node. It’s also possible to remove a property by using SET, by setting a properties value to `NULL` which is essentially saying, you don’t exist. There is a dedicated REMOVE clause for this, but it’s still nice to know how it’s possible with SET, which is achieved like so:

MATCH (n { name: 'Chris' })
SET n.level = NULL
RETURN n

Setting the `level` property to NULL removes it from the node, so it would need to be re-added before it could be used again.
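For comparison, the dedicated REMOVE clause mentioned above can be sketched as follows. It handles labels as well as properties, which SET with NULL cannot do; the property and label used here are the ones from the earlier examples:

```cypher
// Remove the level property and the Moderator label in one go.
MATCH (n { name: 'Chris' })
REMOVE n.level, n:Moderator
RETURN n
```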

shortestPath/allShortestPaths

After you’ve established a lot of data, or even if you haven’t, you’ll always end up wanting to find paths of some description, and there are functions in Cypher to achieve just that. If you don’t have much (or any) data of your own, the movie database (which we covered in Chapter 2) is always available to you from the browser. Anyway, say you have two people, and you want to know what the shortest path is between them. There could be a direct connection via a mutual friend, or one person’s grandma’s friend’s cousin may know the other person. Although that’s a far-fetched (and made-up) example, one of those paths is a lot shorter than the other. Using Cypher to work out the path can be achieved like so:

MATCH (bill:Person { name:"Bill" }),(bob:Person { name:"Bob" }),
p = shortestPath((bob)-[*..5]-(bill))
RETURN p

Using `shortestPath` here allows the path to be returned and used in an application. The same concept can also be applied to physical locations, where you can imagine a path as a route. An example of this would be train stations: although the physical distance wouldn’t be taken into account (this is possible, and will be discussed in Chapter 7), you can still see how many hops apart two stations are. In this case, however, the `*..5` inside the relationship’s square brackets specifies that the path must be at most 5 hops long, so any results here will be close ones.

If this query wasn’t returning any results, it’d be possible to increase the maximum hops, or even remove the upper limit entirely, which would return the single shortest path regardless of length. When databases contain a lot of nodes, if possible it’s advisable to add a limit to the number of hops the path will take.
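A sketch of the unbounded version is below. It is the same query with the upper limit dropped, plus `length(p)` so the number of hops is visible in the result; on a large database this can be slow, so treat it as a fallback rather than a default:

```cypher
// [*] places no upper bound on the number of hops.
MATCH (bill:Person { name: 'Bill' }), (bob:Person { name: 'Bob' }),
p = shortestPath((bob)-[*]-(bill))
RETURN p, length(p)
```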

If you’d rather look at every path that ties for the shortest, use `allShortestPaths` in place of `shortestPath`. It works in the same way, but where `shortestPath` returns a single path, `allShortestPaths` returns all of the paths that share the shortest length.
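Swapping the function name is all that’s needed, as this sketch based on the earlier Bill and Bob query shows:

```cypher
// Returns one row per path that is tied for the shortest length (up to 5 hops).
MATCH (bill:Person { name: 'Bill' }), (bob:Person { name: 'Bob' }),
p = allShortestPaths((bob)-[*..5]-(bill))
RETURN p
```

If Bob and Bill are connected through two different mutual friends, for example, both two-hop paths would come back.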

Key Functions

Neo4j has a lot of functions that can change how a query works, what it returns, and many other things. For example, there are a number of mathematical functions, which may be required if you’re analyzing complex data. If you’re in a position that requires a specific mathematical function within your query, then that information can be found easily enough online. There are, however, a number of functions that are useful in everyday queries, so rather than detailing every function, a smaller collection of these will be covered instead.

count

There could be many reasons that values need to be counted in Cypher queries, and Neo4j has your back with the count function. Its use is easy enough, and essentially has two forms. The first form, `count(*)`, counts the rows produced by the query, which on a very basic level can be something like:

MATCH (n)-->()
RETURN n, count(*)

In this instance, `n` acts as a grouping key, so each node returned from the MATCH appears once in the result set, alongside the number of matching rows for that node (here, its outgoing relationships). The other use case for count is when you know what you want to count, and it has been aliased with something you can count, like so:

MATCH (n {name: "Chris"})-->(x)
RETURN count(x)

In this case, any nodes related to `Chris` will be counted, as they’re aliased with `x` and that has been added to `count`. There is a chance that a query like this could have a lot of duplicates, which you can get around by using DISTINCT, so adding this to the previous example is as easy as:

MATCH (n { name: 'Chris' })-->(x)
RETURN count(DISTINCT x)

This means the count will only include unique values. Count can also be used to count non-null property values on nodes, which is just as simple:

MATCH (n:Person)
RETURN count(n.subscription_start)

This will count how many of the labeled nodes have that property with an actual value, as it skips `null` values.

length

The length function is essentially like count, but for paths or collections, and returns a number based on either the hops in a path, or the number of items within a collection. The function can take a path or a collection as an argument, so just remember that when using it, or you will have a bad time. (Newer Neo4j releases reserve `length` for paths and use `size` for collections, so check which applies to your version.) Enough chat, an example of `length` is:

MATCH p=(a)-->(b)
WHERE a.name = 'Chris'
RETURN length(p)

This query will return the length of every path returned (p) where the `Chris` node is related to any other node, so rather than the nodes themselves, the path lengths are returned. Because the pattern here is a single hop, every row will be 1; the function becomes more useful with variable-length patterns. In more complex queries, length can be used to ensure paths are long enough, for instance, or just as an additional piece of data in a query.
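As a sketch of the “long enough” case, the query below follows between 1 and 4 outgoing relationships from `Chris` and keeps only the paths of at least 2 hops. The hop limits are arbitrary values picked for illustration:

```cypher
// Variable-length pattern, filtered by path length.
MATCH p = (a { name: 'Chris' })-[*1..4]->(b)
WHERE length(p) >= 2
RETURN b, length(p)
```

This skips the nodes directly connected to `Chris` and returns only those further out, along with how far away each one is.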

type

In some queries, the type of a relationship doesn’t matter, you just care that it exists, rather than what its type is. When the type does matter, and it needs to be explicitly returned, then that’s where the `type` function comes in. It takes a relationship as an argument, then returns the type of the relationship supplied. An example of this is as follows:

MATCH (n)-[r]->()
WHERE n.name = 'Chris'
RETURN type(r)

This query will find any relationships that the `Chris` node has, and then return the type of the relationship, so the more relationships, the more rows returned.
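If one row per relationship is too noisy, `type` combines nicely with `count` from earlier in the chapter. A small sketch that summarizes the same relationships by type:

```cypher
// type(r) becomes the grouping key, so there is one row per relationship type.
MATCH (n { name: 'Chris' })-[r]->()
RETURN type(r), count(*)
```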

id

This function can be very useful, as it returns the actual id for a node or relationship within the database. When a node is created it’s assigned a numerical id, which cannot be set by a user: the previous node id is incremented and assigned to the new node. When a node is deleted, its id becomes available again, so the next node created will get the newly available id, rather than a new one. Relationships work in the same way, but rather than sharing one set of ids with nodes, nodes and relationships each keep their own separate list. Because ids can be reused like this, they aren’t a safe long-term identifier to store outside the database.

Although most of the querying in Cypher uses properties, sometimes you need the actual id, and the `id` function makes that easy. As with the other queries of its type, the node or relationship is first MATCH’ed and then passed into `id` where it is returned, which can look as simple as:

MATCH (n)
RETURN id(n)

This would return every node id in the database, as no query constraints have been added, but this would work in the same way, regardless of whether a WHERE or property filter was present. Speaking of WHERE, if you know a particular node’s id and want to be able to query against it, then you need to select that node via a WHERE clause, like so:

MATCH (n)
WHERE id(n) = 150
RETURN n

For multiple IDs, an IN could also be used, if you knew the IDs of the nodes you wanted to return, that is.
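A sketch of the IN version looks like this, where the ids themselves are made-up values:

```cypher
// Match any node whose internal id is in the list.
MATCH (n)
WHERE id(n) IN [150, 151, 152]
RETURN n
```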

timestamp

The timestamp function has a very simple task: return the number of milliseconds between January 1, 1970 UTC and now (a Unix/POSIX timestamp, in milliseconds). This is similar to how timestamp functions work in various other programming languages. This can be useful if you want to, say, check the server’s clock by running a simple:

RETURN timestamp()

Which would then return the timestamp in milliseconds for you to see. It can also be used when setting properties, so recording signup or creation dates, last sign-ins, and a number of other date-based operations can be simplified by using the `timestamp` function.
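Because the value is just a number, it can also be used in comparisons. The sketch below assumes `Person` nodes store a `last_login` property that was set with `timestamp()` (an assumption carried over from the MERGE example), and finds anyone who has signed in within the last 24 hours:

```cypher
// 24 * 60 * 60 * 1000 is the number of milliseconds in a day.
MATCH (n:Person)
WHERE n.last_login > timestamp() - 24 * 60 * 60 * 1000
RETURN n.name, n.last_login
```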

nodes/relationships

When you see a function called `nodes` or `relationships` you can assume (given the names) its use has something to do with nodes or relationships (depending on which one is used), and you’d be right. The `nodes` function is used to return the nodes within a supplied path, with the `relationships` function being used to return the relationships present in a path. Both functions require a path to be supplied as an argument in order for them to work. An example of the `nodes` function would be:

MATCH p=(a)-->(b)-->(c)
WHERE a.name = 'Chris' AND c.name = 'Kane'
RETURN nodes(p)

This will return all the nodes present in the path, but if `nodes` were to be replaced with `relationships` then instead of nodes, all the relationships present in the path would be returned.
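For completeness, the `relationships` version of the same query can be sketched as:

```cypher
// Same path as before, but returning the two relationships rather than the three nodes.
MATCH p = (a)-->(b)-->(c)
WHERE a.name = 'Chris' AND c.name = 'Kane'
RETURN relationships(p)
```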

labels

Although a lot of the time, nodes can be found using one label, what if you need all of the labels attached to a particular node? Well, that’s what the `labels` function is for. This function takes a node as an argument, and returns an array of all the labels that are attached to it. An example of this would be:

MATCH (n)
WHERE n.name = 'Chris'
RETURN labels(n)

This would also work if more than one node were to be returned. A collection would be generated for each node being returned.

collect

This powerful little function allows the aggregation of data, so essentially makes many rows into one row, on a basic level. If you’re getting one particular property from a node, then having to process each row, it may well be easier to get one row with every value inside it. That’s what collect does, and it’s very easy to use. A basic example would be:

MATCH (n:Person)
RETURN collect(n.name)

This query would return one row, with an array of the values of the `name` property on every `Person` node that has a name set. It also ignores null values, so that doesn’t need to be considered either.
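Like `count`, `collect` groups by whatever else is in the RETURN, which is where it gets really useful. In the sketch below, the `KNOWS` relationship type is a hypothetical name used only for illustration:

```cypher
// One row per person, each with a collection of the names of the people they know.
MATCH (n:Person)-[:KNOWS]->(friend)
RETURN n.name, collect(DISTINCT friend.name)
```

The DISTINCT works just as it does with `count`, keeping duplicate names out of each collection.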

Summary

There’s a lot of information in this chapter, but hopefully, if it doesn’t all go in on the first read, this chapter will remain useful as a reference guide. We’ve gone from the very basics of building Cypher queries to making more complex queries that ensure the data returned is as specific as possible. Of course this chapter doesn’t contain everything, and as the book progresses more practical uses for the different Cypher clauses and functions will be unearthed, but the basic usage and explanation will be here.
