Modifying existing data

The last query in the previous section creates a relationship between two nodes. If we run that query twice, we will have two relationships between those nodes. In most cases, this redundancy is useless. Suppose our social network was online and had a button called "Add Friend". If two users, say A and B, click this button at the same time to add each other as friends, the relationship would be stored twice, which wastes storage. We therefore need to check the database and create the relationship only if it does not already exist. An OPTIONAL MATCH clause lets us perform this check, as illustrated in the following query:

MATCH (a:User {name: "Jack", surname: "Roe"}), 
      (b:User {name: "Jack", surname: "Smith"})
OPTIONAL MATCH (a) -[r:Knows]- (b)
WITH a,r,b
WHERE r IS NULL
CREATE (a) -[rn:Knows]-> (b)
RETURN a,rn,b

This query first finds the users Jack Roe and Jack Smith in the database (the MATCH clause), then checks whether they are connected through a relationship of the type Knows in either direction (the OPTIONAL MATCH clause). If they are not (r IS NULL means that no such relationship was found), the CREATE clause that follows creates a relationship between the nodes. The WITH clause is necessary to apply the WHERE clause to the whole query. Without the WITH clause, the WHERE clause would be applied only to the OPTIONAL MATCH clause.

If you run the preceding query after the query mentioned in the Creating relationships between existing nodes using read-and-write queries section, you'll get no rows. This is because the relationship is already created in the database. Clearly, this query isn't easy to read or write and it's error-prone. For these reasons, Cypher provides us with two keywords to deal with existing data.

Creating unique patterns

The complexity of the preceding query is due to the fact that we have to check the nonexistence of a relationship before creating it. This is because we want that relationship to be unique. Fortunately, Cypher provides us with a command that wraps such a check and ensures that the pattern specified is unique in the database.

For example, we can rewrite the preceding query using the CREATE UNIQUE command, as shown in the following query:

MATCH (a:User {name: "Jack", surname: "Roe"}), 
      (b:User {name: "Jack", surname: "Smith"})
CREATE UNIQUE (a) -[rn:Knows]-> (b)
RETURN a,rn,b

Using the CREATE UNIQUE command in the preceding query saved us from writing the entire OPTIONAL MATCH and WHERE clauses. My preferred motto is that the more code you write, the more bugs you hide, so the shorter query is the preferred choice here.

However, there are two important differences between the preceding query and the one in the previous section. They are as follows:

  • If the CREATE UNIQUE command finds the relationship multiple times in the database, it will throw an error. For example, if two instances of the Knows relationship exist between the users Jack Roe and Jack Smith, then the query with the CREATE UNIQUE command will fail with an error, while the query with the OPTIONAL MATCH clause will succeed without creating the relationship. In either case, the database is not modified. This difference is an advantage of the CREATE UNIQUE command rather than a disadvantage: an error thrown by the query means that the database is corrupted, as it has multiple instances of a relationship (or any pattern) that should be unique.

    Note

    In the next chapter, we will learn how to enforce certain assertions using constraints.

  • The query with the OPTIONAL MATCH command returns a row only if it creates a new relationship. However, the query with the CREATE UNIQUE command will return a result if it finds a relationship or creates a new one. This can be a useful feature in some contexts; we can know the state of certain paths in the database after the CREATE UNIQUE command is executed without performing another read-only query.

Yet, the CREATE UNIQUE command can be even more useful. Suppose we don't know whether a user named Jack Smith has been created; if not, we have to create the node and link it to the user Jack Roe. Consider the following read-and-write query:

MATCH (a:User {name: "Jack", surname: "Roe"})
CREATE UNIQUE (a) -[rn:Knows]-> 
                 (b:User {name: "Jack", surname: "Smith"})
RETURN a,rn,b

First of all, it looks for the user Jack Roe in the database, binding it to the variable a. If it cannot be found, the query will finish the execution and return zero rows. Otherwise, it executes the CREATE UNIQUE command, and there are four possible scenarios, which are listed as follows:

  1. The full path already exists and it is unique; we have the user node Jack Roe with exactly one relationship with the user node Jack Smith. In this case, the existing nodes and the existing relationship are bound to the variables a, b, and rn. Then, these variables are returned as the result.
  2. Neither the Jack Smith node nor the relationship exists in the database. In this case, the CREATE UNIQUE command creates the full path. The new relation is bound to the variable rn, while the new node is bound to the variable b.
  3. The path (a)-[:Knows]-(b) exists multiple times; for example, the Knows relationship exists more than once between the two nodes. If this happens, a Neo.ClientError.Statement.ConstraintViolation error is thrown because the CREATE UNIQUE command can't deal with multiple matching patterns.
  4. Both Jack Roe and Jack Smith exist in the database as nodes, but there is no Knows relationship between them. As the matching follows the all-or-none rule, the Cypher engine creates a new Jack Smith node and a new relationship bound to the variable rn. This is due to the fact that the purpose of the CREATE UNIQUE command is to ensure that a whole pattern is unique in the graph and if the node already exists but not the relationship, we do not have the whole pattern in the graph.

The last scenario could be a problem because we would have duplicated a user in the database. We can resolve this issue using the MERGE clause, which is discussed later in the chapter.
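
Scenarios 3 and 4 both involve duplicated data, so it can help to look for such duplicates before relying on the CREATE UNIQUE command. The following read-only queries are a minimal sketch of such a check, assuming the User label and the Knows relationship type used in this chapter; the aliases relCount and nodeCount are illustrative names only. Each statement is meant to be run separately.

```cypher
// Scenario 3: pairs of users connected by more than one Knows
// relationship in the same direction
MATCH (a:User)-[r:Knows]->(b:User)
WITH a, b, count(r) AS relCount
WHERE relCount > 1
RETURN a, b, relCount;

// Scenario 4: name/surname combinations stored on more than one User node
MATCH (u:User)
WITH u.name AS name, u.surname AS surname, count(u) AS nodeCount
WHERE nodeCount > 1
RETURN name, surname, nodeCount;
```

If both queries return zero rows, the graph contains neither duplicated Knows relationships nor duplicated users with the same name and surname.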

To summarize, the following diagram shows how the CREATE UNIQUE clause works:

[Figure: Creating unique patterns]

Complex patterns

Just as with the MATCH and CREATE clauses, you can join simple patterns to describe a complex one. Consider the following query:

MATCH (a:User {name: "Jack", surname: "Roe"})
CREATE UNIQUE (a) -[kn:Knows]-> 
                   (b:User {name: "Jack", surname: "Smith"}),
              (a) -[cw:Colleague]-> (b)

This query creates two relationships between two users. Only the relationships not found in the database are created. If you launch this query after the query from the previous section, you'll get the message Created 1 relationship, returned 0 rows in 307 ms.

In fact, the relationship Knows and the user Jack Smith were already in the database, while the Colleague relationship was missing. If all of them exist, this query makes no modifications to the graph. The second time you launch this query, you'll get the result Returned 0 rows in 229 ms, which means that neither relationships nor nodes were created.

Note that the CREATE UNIQUE command looks for a path that exactly matches the pattern. So, for example, the following query won't match either the existing user node or the existing relationship. Instead, it will create a new relationship and a new node.

MATCH (a:User {name: "Jack", surname: "Roe"})
CREATE UNIQUE (a) -[rn:Knows {friend: true}]-> 
     (b:User {name: "Jack",surname: "Smith", age:34})

The reason is that we haven't set the age property on the user Jack Smith in our database, so the existing node does not match the pattern. This behavior can produce unexpected results, as in the preceding example, where the user is duplicated. How can we update the user node without creating a new user when the pattern contains a new property? Again, this issue can be solved using the MERGE clause.

Setting properties and labels

First of all, we need to know how to set the property of an existing node. The SET clause is just the ticket. Let's start with an example. Consider the following query:

MATCH (a:User {name: "Jack", surname: "Roe"})
SET a.age = 34
RETURN a

This query takes the user node Jack Roe and sets the age property for it; then, it returns the updated node. Neo4j Browser shows the result as Set 1 property, returned 1 row in 478 ms.

Note that the SET clause here works on the nodes found using the MATCH clause. This means that we can unintentionally set a property on a huge list of nodes if we don't write the MATCH clause carefully. The following query sets the place property on all the nodes with the surname property Roe:

MATCH (a:User {surname: "Roe"})
SET a.place = "London"
RETURN a

In our database, this query updates three nodes: Jane, Jack, and Mary Roe. Neo4j Browser shows the result as Set 3 properties, returned 3 rows in 85 ms.
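
Because the SET clause applies to every node matched, one cautious approach is to run the MATCH clause on its own first and count the nodes that would be affected. The following read-only query is a minimal sketch of that check; the alias nodesToUpdate is an illustrative name, not something defined elsewhere in this chapter.

```cypher
// Count the nodes that a subsequent SET clause would modify
MATCH (a:User {surname: "Roe"})
RETURN count(a) AS nodesToUpdate
```

If the count is larger than expected, the pattern in the MATCH clause can be narrowed, for example by adding the name property, before the SET clause is applied.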

Again, you can chain several assignment expressions to make more property changes at the same time. For example, to set the country as well, the query will be as follows:

MATCH (a:User {surname: "Roe"})
SET a.place="London", a.country="UK"
RETURN a

The syntax to set a property on a relationship is the same, as shown in the following query:

MATCH (:User{surname: "Roe"})-[r:Knows]-() 
SET r.friend = true

This query finds all the Knows relationships of users with the surname property Roe and sets the property friend to true for all of them.

Cloning a node

The SET clause can also be used to copy all the properties of a node to another. For example, to copy the node x to the node y, use the following query:

SET y = x

Note that all of the destination node's properties will be removed before the node is copied.

Copying properties is useful when a node needs to be cloned. For example, our social network could offer a function to create an alias identity: the user starts by cloning his/her own identity and then modifies the new one. This operation can be coded as shown in the following query:

MATCH (a:User {name: "Jack", surname: "Roe"})
CREATE (b:Alias)-[:AliasOf]->(a)
WITH a,b
SET b = a
RETURN a,b

Once this query finds the user node to clone, it creates a new node with the label Alias that has a relationship of the type AliasOf with the source node. Then, it copies all the properties from the source node to the new node and finally returns both nodes. The command SET b = a doesn't affect the labels of the node b or its relationships; it just copies the properties.

Adding labels to nodes

The SET clause can also be used to add one or more labels to a node, as shown in the following query:

MERGE (b:User {name: "Jack", surname: "Smith"})
SET b:Inactive

The only difference from setting a property is that we need to use the label separator (:) instead of the property assignment. To chain more labels, just append them with the separator, as shown in the following query:

MERGE (b:User {name: "Jack", surname: "Smith"})
SET b:Inactive:NewUser:MustConfirmEmail
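
To verify which labels a node carries after such a query, the labels function can be used in a read-only query. This is a minimal sketch, assuming the Jack Smith node from the preceding examples exists in the database:

```cypher
// Return the list of labels currently set on the Jack Smith node
MATCH (b:User {name: "Jack", surname: "Smith"})
RETURN labels(b)
```

After the preceding queries, the returned list should include User along with the labels added by the SET clause.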

Merging matched patterns

The MERGE clause is a new feature of Cypher, introduced in Neo4j 2.0. The features of the MERGE clause are similar to those of the CREATE UNIQUE command. It checks whether a pattern exists in the graph. If not, it creates the whole pattern; otherwise, it matches it. The main difference is that the pattern doesn't have to be unique. The other differences are as follows:

  • The MERGE clause supports the single node pattern
  • The MERGE clause allows users to specify what to do when the pattern is matched and what to do when the pattern is being created

In an earlier section, we saw two issues with the CREATE UNIQUE command. They are as follows:

  1. How to create a new node if the pattern does not match, but match the existing node if the node exists?
  2. How to set the properties when merging nodes and relationships?

To answer the first question, let's recall the second query from the Creating unique patterns section:

MATCH (a:User {name: "Jack", surname: "Roe"})
CREATE UNIQUE (a) -[rn:Knows]-> 
                 (b:User {name: "Jack", surname: "Smith"})

Now, if the intent of this query is to match an existing Jack Smith user node before creating a relationship to it, the query will not do what we want. This is because if the relationship does not exist, a new Jack Smith node will be created, even if one is already in the database. We can take advantage of the single node pattern supported by the MERGE clause and write the following query:

MATCH (a:User {name: "Jack", surname: "Roe"})
MERGE (b:User {name: "Jack", surname: "Smith"})
WITH a,b
MERGE (a) -[rn:Knows]-> (b)
RETURN a,rn,b

To accomplish our goal, we had to split the query into two parts using the WITH clause. The first step is to find the Jack Roe user node in the graph with the MATCH clause. Then, the first MERGE clause ensures that a User node with the two specified properties (the name Jack and the surname Smith) exists in the database. In the latter part of the query, the focus is on the Knows relationship between the two nodes involved; the second MERGE clause ensures that the relationship exists after the execution. What happens if the Jack Smith user exists twice in the database and the nodes are already related to Jack Roe? The MERGE clause wouldn't fail; it would succeed, returning two rows.

Note

In the next chapter, we will learn how to create constraints in the database to ensure that it won't ever create nodes with the same property value.

Now, about the second problem of how to set properties during merging operations, the MERGE clause supports two interesting features. They are as follows:

  • ON MATCH SET: This clause is used to set one or more properties or labels on the matched nodes
  • ON CREATE SET: This clause is used to set one or more properties or labels on the new nodes

For example, suppose that we want to set the Jack Smith user node's place property to London only if we are creating it, then the following query can be used:

MERGE (b:User {name: "Jack", surname: "Smith"})
ON CREATE SET b.place = "London"

If at the same time, we want to set his age property to 34 only if the user already exists, then the following query can be used:

MERGE (b:User {name: "Jack", surname: "Smith"})
ON CREATE SET b.place = "London"
ON MATCH SET b.age = 34

Clearly, when we want to set a property in both cases, we can just append a SET clause to the MERGE clause, as shown in the following query:

MERGE (b:User {name: "Jack", surname: "Smith"})
SET b.age = 34
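
The same clauses can also be attached to a MERGE clause that works on a relationship. The following query is a minimal sketch that combines the two-step MERGE shown earlier with ON CREATE SET; it assumes the friend property used in the Setting properties and labels section, and it sets that property only when the Knows relationship is newly created.

```cypher
MATCH (a:User {name: "Jack", surname: "Roe"})
// Match the Jack Smith node, or create it if it is missing
MERGE (b:User {name: "Jack", surname: "Smith"})
WITH a, b
// Match the Knows relationship, or create it if it is missing
MERGE (a)-[rn:Knows]->(b)
// Applied only when the relationship has just been created
ON CREATE SET rn.friend = true
RETURN a, rn, b
```

If the relationship already exists, it is matched and its properties are left untouched.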

Note

Once you learn how to use the MERGE clause and the CREATE UNIQUE command, you may wonder when to use each of them. As a general rule, use the CREATE UNIQUE command when the pattern is conceived as a whole path that must be unique in the graph.

Idempotent queries

In certain applications, such as websites with several client types or parallel applications, some commands happen to be sent multiple times from external layers to the backend. This can happen for a number of reasons: user interfaces that are not up to date, users who send a command multiple times, synchronization issues, and so on. In these cases, a command could be executed multiple times; clearly, you don't want the second or the nth execution to have a further effect on the database. Commands that change the graph the first time they are executed but have no further effect when executed again on the same graph are idempotent. Both the MERGE and SET clauses allow you to write idempotent commands, which are very useful in such contexts.
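
As an illustration, the following sketch contrasts a non-idempotent command with an idempotent one, assuming the Jack Roe and Jack Smith nodes used throughout this chapter. Each statement is meant to be run separately.

```cypher
// Not idempotent: every execution adds another Knows relationship
MATCH (a:User {name: "Jack", surname: "Roe"}),
      (b:User {name: "Jack", surname: "Smith"})
CREATE (a)-[:Knows]->(b);

// Idempotent: the first execution creates the relationship if it is
// missing; later executions match it and leave the graph unchanged
MATCH (a:User {name: "Jack", surname: "Roe"}),
      (b:User {name: "Jack", surname: "Smith"})
MERGE (a)-[:Knows]->(b);
```

A SET clause that assigns a constant value, such as SET a.age = 34, is idempotent for the same reason: repeating it leaves the property at the same value.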
