Cypher is a declarative query language that makes it easy to write expressive and efficient queries to traverse, update, and administer a graph database. It provides a powerful way to express even complex graph traversals more simply so that you can concentrate on the domain aspects of your query instead of worrying about syntax.
Cypher expands upon a variety of established practices for querying. For example, the WHERE, ORDER BY, and CASE keywords are inspired by SQL syntax. Some of the list semantics are borrowed from Haskell and Python.
This chapter introduces the Cypher syntax with some examples. We will review the important aspects of the Cypher syntax and semantics in building graph traversal queries. We will discuss important keywords and the role they play in building the queries. We will take a look at the graph data model and how Cypher queries follow the data connections.
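In this chapter, we will cover these aspects: basic Cypher syntax, nodes syntax, relationships syntax, data types available in Cypher, keywords available in Cypher, working with indexes, working with constraints, and working with full-text indexes.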
We should have Neo4j Desktop installed and have created a local instance. There are no special technical skills required to follow the concepts discussed in this chapter. Having knowledge of SQL can be useful in understanding Cypher, but is not required.
Before we look into the Cypher syntax to query the data, it is important to understand how the data is persisted as a graph. Data diagram representations give a good idea about how the Cypher queries can be written using the data model. We will take a look at a data diagram and see how it helps us with querying:
Figure 2.1 – Sample graph data diagram
This diagram shows how the data is stored in the database. Each node represents one entity that knows what relationship entities it is connected to and whether they are incoming or outgoing; each property is an entity that is associated with a node or relationship. Each relationship entity knows what nodes it is connected to and the direction of the relationship.
The preceding diagram tells us that a person named Tom owns two addresses. He lives at one of them and rents out the other. A person named Shelley lives at an address that is rented. Read from the address perspective, the 1 Test Dr address is owned by Tom and rented by Shelley, and the 5 Test Dr address is owned by Tom, who also lives at that address.
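As a rough sketch, the data described above could be created with a single CREATE statement. The figure is not reproduced here, so the relationship types (OWNS, LIVES_AT, RENTS) and the street property are assumptions, and only the relationships named in the address-perspective reading are modeled.

```cypher
// Labels, relationship types, and property names are assumed for illustration.
CREATE (tom:Person {name: 'Tom'}),
       (shelley:Person {name: 'Shelley'}),
       (a1:Address {street: '1 Test Dr'}),
       (a5:Address {street: '5 Test Dr'}),
       (tom)-[:OWNS]->(a1),
       (tom)-[:OWNS]->(a5),
       (tom)-[:LIVES_AT]->(a5),
       (shelley)-[:RENTS]->(a1)
```

Each node is created once and then reused by its variable, so the four relationships connect the same four nodes.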
If the preceding data diagram were represented as a graph data model, it would look like this:
Figure 2.2 – Sample graph data model
This diagram shows what possible relationships can exist between given node types. In real data, these relationships need not exist for all nodes. A graph data model shows how the data can be connected and can provide a starting point for building graph traversals using Cypher queries.
Cypher syntax resembles ASCII (American Standard Code for Information Interchange) art: parentheses draw the nodes and arrows draw the relationships. A simple Cypher traversal pattern can look like this:
(A)-[:LIKES]->(B), (B)-[:LIKES]->(C), (A)-[:LIKES]->(C)
This also can be written as follows:
(A)-[:LIKES]->(B)-[:LIKES]->(C)<-[:LIKES]-(A)
Read aloud, the pattern is close to a plain statement: A likes B, who likes C, who is also liked by A. Nouns represent nodes and verbs represent relationships.
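A pattern like this becomes a runnable query once it is placed in a MATCH clause and followed by RETURN. The sketch below assumes that the nodes carry a Person label and a name property, neither of which is given in the pattern above.

```cypher
// Find every triple where a likes b, b likes c, and a also likes c.
// The Person label and the name property are assumptions for illustration.
MATCH (a:Person)-[:LIKES]->(b:Person)-[:LIKES]->(c:Person)<-[:LIKES]-(a)
RETURN a.name AS first, b.name AS second, c.name AS third
```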
Cypher supports various data types, which fall into three categories: property types, structural types, and composite types.
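The following are the property types available in Cypher: Number (with the subtypes Integer and Float), String, Boolean, the spatial type Point, and the temporal types Date, Time, LocalTime, DateTime, LocalDateTime, and Duration.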
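Property types have the following characteristics: they can be returned from Cypher queries, used as parameters, stored as properties, and constructed with Cypher literals.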
Let’s move on to the structural types available in Cypher.
The following are the different structural types: Node, Relationship, and Path.
Let’s now move on to the composite types.
The following are the different composite types: List and Map.
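The three categories can be illustrated with two short statements, offered here as a sketch. The first builds property and composite values from literals and built-in functions. The second obtains structural values by matching a pattern, so it only returns a row if the database contains at least one relationship. The column aliases are made up for this example, and the semicolons separate statements that are run one at a time.

```cypher
// Property values and composite values constructed directly in the query
RETURN 42 AS anInteger,
       3.14 AS aFloat,
       'Tom' AS aString,
       true AS aBoolean,
       point({x: 1.0, y: 2.0}) AS aPoint,
       date('2023-01-15') AS aDate,
       duration('P1D') AS aDuration,
       [1, 2, 3] AS aList,
       {name: 'Tom', age: 30} AS aMap;

// Structural values (node, relationship, path) come from pattern matching
MATCH path = (n)-[r]->(m)
RETURN n AS aNode, r AS aRelationship, path AS aPath
LIMIT 1;
```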
Now that we have reviewed the basic syntax of Cypher queries and property types, let’s look at the nodes syntax.
In Cypher, a node is surrounded by parentheses, (), making it resemble a circle in a diagram. Here are some example usages in Cypher:
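The common node forms can be sketched as follows. Each statement is independent (the semicolons separate them), and the Person label and name property are assumed to exist in the data.

```cypher
// Anonymous node: matches every node without binding a variable
MATCH () RETURN count(*) AS total;

// Node bound to the variable p so that later clauses can refer to it
MATCH (p) RETURN p LIMIT 5;

// Node with a label and no variable
MATCH (:Person) RETURN count(*) AS people;

// Node with a variable, a label, and a property map
MATCH (p:Person {name: 'Tom'}) RETURN p;
```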
Let’s move on to the relationships syntax.
In Cypher, a relationship can be represented using -->, which resembles an arrow on a diagram. Here are some example usages in Cypher:
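The common relationship forms can be sketched as follows. Each statement is independent, and the LIVES_AT and KNOWS relationship types, as well as the labels and properties, are assumptions about the data.

```cypher
// Any outgoing relationship from a to b
MATCH (a)-->(b) RETURN a, b LIMIT 5;

// Any relationship between a and b, ignoring direction
MATCH (a)--(b) RETURN a, b LIMIT 5;

// Relationship with a variable and a type
MATCH (p:Person)-[r:LIVES_AT]->(a:Address)
RETURN p.name, type(r), a.city;

// Variable-length relationship: one or two KNOWS hops
MATCH (p:Person {name: 'Tom'})-[:KNOWS*1..2]->(other)
RETURN DISTINCT other.name;
```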
Now that we have taken a look at the Cypher syntax and data types available, we will take a look at the keywords available in Cypher to build the queries.
In this section, we will introduce the Cypher keywords and their syntax. Detailed usage of these keywords will be covered in upcoming sections of the book.
Let us start by using the MATCH and OPTIONAL MATCH keywords.
The MATCH keyword allows you to specify the graph traversal patterns to find and return the data from Neo4j. It is most often coupled with a WHERE clause to filter out the results, and a RETURN clause to return the results.
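// Return all nodes in the database
MATCH (n)
RETURN n
// Return the title of every node with the Movie label
MATCH (n:Movie)
RETURN n.title
// Find nodes of any label whose title property is 'My Movie'
MATCH (n {title: 'My Movie'})
RETURN n.released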
Caution
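Matching on a property without a label, as in the preceding query, is the most common mistake made in the early phases of learning Cypher. Because no label is given, the database has to inspect every node, which can put a heavy load on the DB server, so this form should be avoided. We will discuss this more in the later sections.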
// Alternate syntax for the preceding query, using an explicit WHERE clause
MATCH (n)
WHERE n.title = 'My Movie'
RETURN n.released
The OPTIONAL MATCH clause works like MATCH, except that when no data matches the specified pattern, it does not stop the execution; it returns null for the missing parts of the pattern instead. In SQL terms, you can think of it as a LEFT JOIN.
Let’s go through a few OPTIONAL MATCH clause examples below:
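// Returns no rows if there are no Movie nodes
MATCH (n:Movie)
RETURN n.title
// Returns a single row with a null title if there are no Movie nodes
OPTIONAL MATCH (n:Movie)
RETURN n.title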
A MATCH clause stops the query and returns no results as soon as any MATCH segment finds no data. OPTIONAL MATCH continues with the next steps and returns the data that is available.
The following queries explain that behavior:
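// Returns no rows if there are no Movie nodes or no Person nodes
MATCH (m:Movie)
MATCH (p:Person)
RETURN m.title, p.name
// Returns no rows if there are no Movie nodes; if there are movies but no Person nodes, p.name is null
MATCH (m:Movie)
OPTIONAL MATCH (p:Person)
RETURN m.title, p.name
// Always returns at least one row; m.title and p.name are null where nothing matches
OPTIONAL MATCH (m:Movie)
OPTIONAL MATCH (p:Person)
RETURN m.title, p.name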
MATCH (:Movie {title:'Wall Street'})<-[:ACTED_IN]-(actor)
RETURN actor.name
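The LEFT JOIN comparison made earlier is easiest to see when the optional part is a relationship pattern. The following sketch reuses the Movie label and ACTED_IN relationship type from the query above and returns every movie, including those with no actors.

```cypher
// Every Movie produces at least one row.
// For movies without an incoming ACTED_IN relationship, actor is null.
MATCH (m:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(actor)
RETURN m.title AS title, actor.name AS actor
```

With a plain MATCH in place of OPTIONAL MATCH, movies without actors would drop out of the result, much like an inner join.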
Let us continue with creating and deleting data from the graph using the CREATE and DELETE keywords.
The CREATE clause will let you create new nodes and relationships:
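// Create a node with the Person label and a name property
CREATE (p:Person {name: 'Tom'})
RETURN p
// Create a node with a name property but no label
CREATE (p {name: 'Tom'})
RETURN p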
Caution
This is another common mistake. In Neo4j, labels are optional. By using this query, we create nodes without any labels, and querying for them would be very inefficient.
CREATE (p:Person {name:'Tom'})
-[:LIVES_AT]->
(:Address {city:'New York', country:'USA'})
Note
This query does not have a RETURN clause. For CREATE statements, RETURN is optional.
We will look at the usage of DELETE next:
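// Deletes the Person node named Tom. The query fails if the node still has relationships.
MATCH (p:Person {name: 'Tom'})
DELETE p
The following query finds a Person node with the name Tom and uses DETACH DELETE to delete the node along with all the relationships attached to it: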
MATCH (p:Person {name:'Tom'}) DETACH DELETE p
The following query deletes all the outgoing relationships of the LIVES_AT type from the Person node named Tom, keeping the node as it is:
MATCH (:Person {name:'Tom'})-[r:LIVES_AT]->() DELETE r
Caution
If the node is connected to many relationships, this can have a negative impact on the database. We will discuss options to do this safely in the upcoming sections.
This query deletes all the nodes and relationships in the database.
MATCH (n) DETACH DELETE n
Caution
Depending on how much data is in the database, this query can cause Out of Memory exceptions. We will discuss options for safe bulk deletion in the upcoming sections.
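One commonly used way to limit memory use, shown here only as a sketch and not necessarily the approach taken in the later sections, is to delete in fixed-size batches and rerun the statement until it reports zero deleted nodes. The batch size of 10000 is an arbitrary assumption.

```cypher
// Delete at most 10000 nodes per run, together with their relationships.
// Rerun the statement until `deleted` comes back as 0.
MATCH (n)
WITH n LIMIT 10000
DETACH DELETE n
RETURN count(*) AS deleted
```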
Let us continue with manipulating properties on nodes and relationships using the SET and REMOVE keywords.
The SET clause allows us to set the properties or labels on a node or set the properties on relationships. It can be used in conjunction with the MATCH or CREATE clauses:
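// Set the age property on the node
MATCH (n:Person {name:'Tom'})
SET n.age = 20
// Add the Actor label to the node
MATCH (n:Person {name:'Tom'})
SET n:Actor
// Replace all properties of Joe with a copy of Tom's properties
MATCH (n:Person {name:'Tom'})
MATCH (o:Person {name:'Joe'})
SET o = n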
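Assigning with = replaces every property on the target. When only some properties should change, the += operator with a map can be used instead. The following is a minimal sketch; the city property is an assumption made for illustration.

```cypher
// += adds or updates only the listed properties and leaves the others untouched.
MATCH (n:Person {name: 'Tom'})
SET n += {age: 21, city: 'New York'}
RETURN n
```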
The REMOVE clause is used to remove or delete properties from nodes or relationships. It can also remove labels from nodes:
// Remove the Actor label from the node
MATCH (n:Person:Actor {name:'Tom'})
REMOVE n:Actor
// Remove the age property from the node
MATCH (n:Person {name:'Tom'})
REMOVE n.age
Let us continue with filtering data using WHERE and other keywords.
The WHERE clause is how we filter the data in Cypher. Cypher provides options for both an implicit WHERE qualifier and an explicit WHERE clause:
MATCH (n:Person {name:'Tom'})
RETURN n.age
In explicit WHERE usage, you get a lot more control over the expressions:
MATCH (n:Person)
WHERE
n.name = 'Peter'
XOR (n.age < 30 AND n.name = 'Timothy')
OR NOT (n.name = 'Timothy' OR n.name = 'Peter')
RETURN
n.name AS name,
n.age AS age
// Explicit WHERE label check
MATCH (n)
WHERE n:Swedish
RETURN n.name, n.age
// Implicit WHERE label check. Most common usage
MATCH (n:Swedish)
RETURN n.name, n.age
// Range query with less than a value
MATCH (n:Person)
WHERE n.age < 50
RETURN n.name, n.age
// Range query with in between values
MATCH (n:Person)
WHERE 30 < n.age < 60
RETURN n.name, n.age
// This query shows how to check for property existence.
// It finds all Person nodes that have a title property.
MATCH (n:Person)
WHERE n.title IS NOT NULL
RETURN n.name, n.title
MATCH (n:Person)
WHERE n.title IS NULL
RETURN n.name
// Explicit WHERE condition
MATCH (n:Person)
-[k:KNOWS]->(f)
WHERE k.since < 2000
RETURN f.name
// Implicit WHERE condition
MATCH (n:Person)
-[k:KNOWS {since: 2000}]->(f)
RETURN f.name
MATCH (n:Person)
WHERE n.name STARTS WITH 'Tom'
RETURN n.name, n.age
MATCH (n:Person)
WHERE n.email =~ '.*\\.com'
RETURN n.name, n.age, n.email
MATCH (n:Person)
WHERE n.name =~ '(?i)ANDY.*'
RETURN n.name, n.age
MATCH
(tom:Person {name: 'Tom'}),
(other:Person)
WHERE
other.name IN ['Andy', 'Bob']
AND (other)-->(tom)
RETURN other.name, other.age
MATCH
(person:Person),
(tom:Person {name: 'Tom'})
WHERE NOT (person)-->(tom)
RETURN person.name, person.age
MATCH (n:Person)-[r]->()
WHERE n.name = 'Tom' AND type(r) =~ 'C.*'
RETURN type(r), r.since
Note
While this is valid syntax, its usage should be limited, as it may lead to non-performant queries. It should be limited to scenarios where you might not know all existing relationship names or relationship names can be dynamic.
In WHERE clauses, you can also use existential subqueries. The syntax of these queries looks like this:
EXISTS { MATCH [Pattern] WHERE [Expression] }
This allows you to specify a subquery as an expression. The WHERE clause in the subquery is optional, shown as follows.
MATCH (person:Person)
WHERE EXISTS {
MATCH (person)-[:HAS_DOG]->(:Dog)
}
RETURN person.name AS name
MATCH (person:Person)
WHERE EXISTS {
MATCH (person)-[:HAS_DOG]->(dog:Dog)
WHERE person.name = dog.name
}
RETURN person.name AS name
Existential subqueries can also be nested.
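A nested form could look like the following sketch, which extends the HAS_DOG example above. The HAS_TOY relationship type, the Toy label, and the toy name are hypothetical.

```cypher
// Find people who have a dog that, in turn, has a toy named 'Banana'.
MATCH (person:Person)
WHERE EXISTS {
  MATCH (person)-[:HAS_DOG]->(dog:Dog)
  WHERE EXISTS {
    MATCH (dog)-[:HAS_TOY]->(toy:Toy)
    WHERE toy.name = 'Banana'
  }
}
RETURN person.name AS name
```

The inner subquery can refer to dog because variables from the enclosing subquery remain visible inside it.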
Let us continue manipulating data using the MERGE keyword.
A MERGE clause is an upsert operation. It will check for the existence of the node or path and if it doesn’t exist, it tries to create the node or path as applicable:
MERGE (p:Person)
RETURN p
Note
Remember this creates only one node. When you run this multiple times, it does not create multiple nodes.
The MERGE operation is not thread-safe. If you run the same query in parallel in multiple threads, it can create multiple nodes. To avoid this, you should use unique constraints on the properties being merged.
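As a sketch of that advice, a unique constraint can be created before the MERGE is run. The constraint name person_name_unique is hypothetical, and the two statements are run separately because schema changes and data changes cannot share a transaction. If a plain index already exists on the same label and property, it may need to be dropped first.

```cypher
// Statement 1 (schema change): allow at most one Person node per name value.
CREATE CONSTRAINT person_name_unique IF NOT EXISTS
FOR (n:Person) REQUIRE n.name IS UNIQUE;

// Statement 2 (data change): concurrent runs can no longer end up
// with two Person nodes named Tom.
MERGE (p:Person {name: 'Tom'})
RETURN p;
```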
MERGE is often used to make sure we do not create duplicate nodes and relationships in the database:
MERGE (p:Person {name: 'Tom', age: 30})
RETURN p
At runtime, a MERGE statement lets us know whether a new node is being created or whether a handle to an existing node is returned. We can identify these scenarios using ON CREATE and ON MATCH clauses in conjunction with a MERGE clause. Both of those clauses are optional:
MERGE (p:Person {name: 'Tom', age: 30})
ON CREATE
SET p.created = timestamp()
ON MATCH
SET p.updated = timestamp()
RETURN p
A MERGE on a path either matches the whole pattern or creates the whole pattern. The following query checks three conditions: a Person node named Tom exists, a Person node named Andy exists, and a KNOWS relationship connects Tom to Andy. If any of these conditions is false, it will create a Person node named Tom, a Person node named Andy, and a KNOWS relationship between those nodes, because the MERGE clause tries to create the whole path as it is provided. It is immaterial whether a Person node named Tom or Andy already exists. If you had a unique constraint on the name and any of the nodes already existed, then this query would fail with a constraint error. If all the conditions are false or all of them are true, then there will not be any error:
MERGE (tom:Person {name: 'Tom'}) -[:KNOWS]-> (andy:Person {name: 'Andy'})
To reuse nodes that already exist, merge each node on its own and then merge the relationship between the bound variables:
MERGE (tom:Person {name: 'Tom'})
MERGE (andy:Person {name: 'Andy'})
MERGE (tom)-[:KNOWS]->(andy)
In a MERGE clause, a variable that is already bound is reused as part of the path; MERGE will not try to recreate that node.
Note
You should be very careful with MERGE to make sure you do not create multiple nodes and relationships. For that, you should understand PATH MERGE as explained previously to make sure you do not get errors with MERGE statements or create duplicated nodes. Also, note that the MERGE clause cannot be used with an explicit WHERE clause.
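One way to reduce the risk of duplicates, sketched here under the assumption that name alone identifies a person, is to merge only on the identifying property and set the remaining properties in ON CREATE. Merging on both name and age would create a second Tom if the existing node had a different age.

```cypher
// Match on the identifying property only; set the rest when the node is first created.
MERGE (p:Person {name: 'Tom'})
ON CREATE
  SET p.age = 30, p.created = timestamp()
ON MATCH
  SET p.updated = timestamp()
RETURN p
```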
Let us continue with iterating the lists using the FOREACH keyword.
A FOREACH clause will let you iterate over a list of values and perform write operations using CREATE, MERGE, DELETE, SET, or REMOVE clauses. You cannot use a MATCH clause within a FOREACH clause:
FOREACH(
n in nodesList |
SET n.marked = true
)
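In the fragment above, nodesList has to be bound by an earlier clause. A complete query might look like the following sketch, where the filter on names starting with 'T' is an arbitrary assumption used only to select some nodes.

```cypher
// Collect the matching nodes into a list, then mark each one.
MATCH (n:Person)
WHERE n.name STARTS WITH 'T'
WITH collect(n) AS nodesList
FOREACH (n IN nodesList | SET n.marked = true)
```

The MATCH happens before FOREACH, which respects the rule that MATCH cannot appear inside it.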
Let us continue with other means of iterating the lists using UNWIND.
An UNWIND clause converts a list into rows so that each entry can be processed:
WITH [1,2,3,4] as list
UNWIND list as x
RETURN x
Let’s visualize the output:
Figure 2.3 – UNWIND usage
Most of the time, an UNWIND clause is used to load batches of data into Neo4j. An example usage of this is shown next.
UNWIND $events AS event
MERGE (y:Year {year: event.year})
MERGE (y)<-[:IN]-(e:Event {id: event.id})
RETURN e.id AS x ORDER BY x
Note
As mentioned, this is the most common pattern used to load data in batches from a client. Also, notice the PATH MERGE operation in the second MERGE clause.
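To try the batch-loading pattern without a client application, the $events parameter can be replaced by a literal list of maps. The sample values below are made up for illustration; a client would normally pass the same structure as a query parameter.

```cypher
// Three sample events spread over two years
WITH [{id: 1, year: 2014}, {id: 2, year: 2014}, {id: 3, year: 2015}] AS events
UNWIND events AS event
MERGE (y:Year {year: event.year})
MERGE (y)<-[:IN]-(e:Event {id: event.id})
RETURN e.id AS x ORDER BY x
```

Because the first MERGE binds y, the second MERGE reuses that Year node and creates only the Event node and the IN relationship when they are missing.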
Next, let us take a look at returning data from queries using RETURN and related keywords, such as WITH, ORDER BY, SKIP, and LIMIT.
A WITH clause allows queries to be chained together, making sure that the results from one query are piped to the next as starting data points for the next query.
A RETURN clause is the last part of the query.
ORDER BY, SKIP, and LIMIT clauses can be used with either WITH or RETURN clauses:
MATCH (tom:Person {name: 'Tom'})
-[:KNOWS]->(other)
WITH other
WHERE other.name STARTS WITH 'An'
RETURN other.name as otherName, other.age as age
ORDER BY age DESC
// Using SKIP and LIMIT along WITH clause
MATCH (p:Person)
WHERE p.name STARTS WITH 'A'
WITH p.name as name
SKIP 3
LIMIT 3
RETURN name
// Using SKIP and LIMIT with RETURN clause
MATCH (p:Person)
RETURN p.name
SKIP 3
LIMIT 3
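Without ORDER BY, the order of the rows is not guaranteed, so SKIP and LIMIT may return different rows from one run to the next. For paging through results, a sketch that combines all three could look like this:

```cypher
// Second page of three names, in a stable alphabetical order
MATCH (p:Person)
RETURN p.name AS name
ORDER BY name
SKIP 3
LIMIT 3
```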
Next, let us take a look at the usage of the UNION keyword.
A UNION clause is used to combine the results of two or more queries. Each query must return the same number of values with the same alias names:
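// UNION removes duplicate rows from the combined result
MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name
// UNION ALL keeps all rows, including duplicates
MATCH (n:Actor)
RETURN n.name AS name
UNION ALL
MATCH (n:Movie)
RETURN n.title AS name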
Now, let us take a look at how to use indexes and constraints in Cypher.
Indexes and constraints play a critical role in obtaining optimal performance from the database, along with making sure data integrity is maintained.
The index options covered in this chapter are single-property indexes, composite indexes, text indexes, and full-text indexes.
The constraints covered in this chapter are unique node property constraints, existence constraints, and node key constraints.
Now let us take a look at how to create an index on a single property.
A single-property index covers one property of nodes with a given label or of relationships with a given type.
This is how we can create a single-property index on a node. The following query creates an index on the name property of nodes with the Person label. The index name (person_name) and the IF NOT EXISTS clause are optional. If we do not specify a name, then a generated name will be assigned to the index. The IF NOT EXISTS option will create the index only if an index for this name or combination does not yet exist:
CREATE INDEX person_name IF NOT EXISTS FOR (n:Person) ON (n.name)
Note
Remember the index is associated with a label, and only one label. When you query, you must specify this label in the MATCH query to be able to leverage the index.
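The effect of the label can be checked with EXPLAIN, which shows the query plan without running the query. In the following sketch, the first query is eligible to use the person_name index, while the second is not. The plan for the first typically shows an index seek and the plan for the second a scan over all nodes, although the exact operators depend on the Neo4j version and the data.

```cypher
// Eligible for the person_name index: label and indexed property are both specified.
EXPLAIN
MATCH (n:Person)
WHERE n.name = 'Tom'
RETURN n;

// Not eligible for the person_name index: no label is given.
EXPLAIN
MATCH (n)
WHERE n.name = 'Tom'
RETURN n;
```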
This is how we can create a single-property index on a relationship:
CREATE INDEX knows_since IF NOT EXISTS FOR ()-[r:KNOWS]->() ON (r.since)
Now that we have seen how to create a single-property index on a node and relationship, let’s look at creating composite property indexes.
Composite indexes function similarly to single-property indexes but on combinations of two or more properties:
CREATE INDEX person_name_age IF NOT EXISTS FOR (n:Person) ON (n.name, n.age)
In most scenarios, a single-property index on each property might be more efficient.
The same approach can be used to create a composite property index on relationships also.
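A composite index on relationship properties could be sketched as follows. The index name and the strength property are hypothetical, and the relationship pattern is written in the undirected form used in the Cypher manual.

```cypher
// Composite index over two properties of KNOWS relationships
CREATE INDEX knows_since_strength IF NOT EXISTS
FOR ()-[r:KNOWS]-() ON (r.since, r.strength)
```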
A text index is the same as a single-property index with the exception that it will only recognize and apply to string values. If there are other types of values assigned to that property, then they are not included in the index.
This creates a TEXT index on the Person name. If you create a TEXT index on a property that does not contain string values, then it is as good as not having an index on that property:
CREATE TEXT INDEX person_name_text IF NOT EXISTS FOR (n:Person) ON (n.name)
Note
The TEXT index can only be used as a single-property index.
It is possible to create a TEXT index on a relationship property also.
Now, let us take a look at creating full-text indexes.
Full-text indexes are Lucene native indexes. These can support multiple labels and properties to create a single index.
The query creates a full-text index for Movie and Book node labels for the title and description properties:
CREATE FULLTEXT INDEX full_text IF NOT EXISTS FOR (n:Movie|Book) ON EACH (n.title, n.description)
We can create a full-text index on relationship properties also.
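Unlike the other indexes in this chapter, a full-text index is not picked up automatically by MATCH; it is queried through a procedure. The following sketch searches the full_text index created above, with 'matrix' as an arbitrary search term.

```cypher
// Search title and description on Movie and Book nodes, best matches first
CALL db.index.fulltext.queryNodes('full_text', 'matrix')
YIELD node, score
RETURN labels(node) AS labels, node.title AS title, score
ORDER BY score DESC
```

The search string follows Lucene query syntax, so a term such as 'title:matrix' restricts the search to a single property.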
Now, let us take a look at unique node property constraints.
We can create unique constraints on node or relationship properties.
This creates a unique constraint on the Person name property:
CREATE CONSTRAINT person_name_c IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS UNIQUE
The unique constraint is backed by an index automatically.
Existence constraints can make sure a property exists on the node or relationship.
This query enforces that the name property exists when a Person node is created:
CREATE CONSTRAINT person_name_e IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS NOT NULL
Existence constraints can be created on relationships as well.
Node key constraints are similar to primary keys in the RDBMS world.
Here, we are showing a node key with multiple properties. It can be with a single property also:
CREATE CONSTRAINT person_name_age IF NOT EXISTS FOR (n:Person) REQUIRE (n.name, n.age) IS NODE KEY
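To check which constraints exist, or to remove one, the following statements can be used. This is a sketch that assumes the constraint names used earlier in this section; each statement is run separately.

```cypher
// List all constraints, including their names, types, and owned indexes
SHOW CONSTRAINTS;

// Remove the existence constraint created earlier, if it is present
DROP CONSTRAINT person_name_e IF EXISTS;
```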
Let’s summarize our understanding of this chapter.
In this chapter, we have covered these aspects: basic Cypher syntax, nodes syntax, relationships syntax, data types available in Cypher, keywords available in Cypher, working with indexes, working with constraints, and working with full-text indexes.
By now, you should be aware of the basic Cypher aspects and should be able to build basic Cypher queries.
Certain advanced aspects, such as subqueries and so on, are covered in later chapters, and built-in functions are introduced as we learn more about Cypher.
You can find the latest Cypher documentation in the Neo4j Cypher Manual (https://neo4j.com/docs/cypher-manual/current/).
You can also find a quick reference guide at https://neo4j.com/docs/cypher-refcard/current/.
In the next chapter, we will start building graph models and using Cypher to start loading data. We will be leveraging the concepts learned in this chapter to achieve that.