Chapter 4. Improving Performance

Using Neo4j has several advantages, including a very natural model that lets you easily express complex schemas with many relationships, as well as Atomicity, Consistency, Isolation, and Durability (ACID) transactions. It also offers good performance compared to relational databases, particularly for queries that traverse many relationships.

Performance is a key feature in some scenarios and drives developers and architects to choose Neo4j. In this chapter, you will learn about:

  • Several performance issues that you face while using Cypher and some possible solutions
  • How to profile a Cypher query to estimate its computational cost and the number of I/O accesses it performs
  • How to use the schema to preserve the integrity of the database and take advantage of it to improve performance

Performance issues

We will focus on two common types of performance issues we face with Cypher. They are as follows:

  • A long and complex query executes too slowly. The query can be a complex read-write query or a read-only query with aggregation, complex computations, and sorting.
  • A small query is repeated many times, for example, in a loop. The query itself does not underperform, but repeating it many times causes the whole operation to take a long time to finish.

Now, suppose that we have a huge database with a lot of nodes; for example, a database for a social network. In fact, if we are experiencing performance problems with small databases, we probably need to check the hardware, or the operating system, or the configuration of Neo4j because it's unusual to experience a very slow query with small datasets. In this chapter, we will see some configuration options of Neo4j that are useful when tuning the performance of the database.

To simulate performance issues, the example used in this chapter must have a lot of data. To fill our example database, we can create a number of nodes with Neo4j Embedded by writing a few lines of Java code in a loop, as shown in the following code:

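for (int i = 0; i < 1000; i++) {
  // One transaction per node: simple, but it forces a commit on every iteration
  try (Transaction tx = graphDb.beginTx()) {
    Node node = graphDb.createNode();
    node.setProperty("email", "user" + i + "@learningcypher.com");
    node.setProperty("userId", i);
    node.addLabel(DynamicLabel.label("User"));

    // Every hundredth user (userId 0, 100, 200, ...) is also marked as Inactive
    if (i % 100 == 0) {
      node.addLabel(DynamicLabel.label("Inactive"));
    }

    // Mark the transaction as successful; it is committed when the try block closes
    tx.success();
  }
}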

Each iteration of the loop, which runs 1000 times, creates a node, sets two properties (email and userId), and adds the User label to the node. The DynamicLabel.label("User") call lets us specify a label without having to declare a statically typed label implementation. In addition, every hundredth node (those whose userId is a multiple of 100) also receives the Inactive label. Each iteration runs in its own transaction; a transaction is required because we are accessing the database. The problem here is that we have a lot of tiny transactions: every time we commit a transaction, Neo4j accesses the disk to persist the data, which adds performance overhead.
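One common way to reduce the per-commit overhead described above is to group several node creations into each transaction. The following is a minimal sketch of the bookkeeping only, not a definitive implementation. The BatchPlanner class, its plan method, and the batch size of 100 are all assumptions made for illustration; none of them is part of the Neo4j API. The Neo4j calls appear only in comments because they need the Neo4j library and an open graphDb.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Neo4j): splits the ids 0..total-1 into
// consecutive half-open ranges [from, to), one range per transaction.
final class BatchPlanner {

    static List<int[]> plan(int total, int batchSize) {
        if (total < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and batchSize > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int from = 0; from < total; from += batchSize) {
            // The last range may be shorter than batchSize
            int to = Math.min(from + batchSize, total);
            ranges.add(new int[] { from, to });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 1000 nodes in batches of 100 need 10 commits instead of 1000
        List<int[]> ranges = plan(1000, 100);
        System.out.println("transactions needed: " + ranges.size());

        // With Neo4j Embedded, each range would wrap the node-creation code
        // from the loop above in a single transaction:
        //
        // for (int[] r : ranges) {
        //   try (Transaction tx = graphDb.beginTx()) {
        //     for (int i = r[0]; i < r[1]; i++) { /* create node i */ }
        //     tx.success();
        //   }
        // }
    }
}
```

The batch size is a trade-off: larger batches mean fewer commits, but each open transaction holds more uncommitted changes in memory, so a suitable value depends on the data and the available heap.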

Alternatively, using any scripting language, you can generate a long Cypher query that creates all the nodes in a single statement, and then copy and paste it into the Neo4j Browser prompt. The query looks like the following:

CREATE (:User {email: 'user0@learningcypher.com', userId: 0 }),
       (:User {email: 'user1@learningcypher.com', userId: 1 }),
       (:User {email: 'user2@learningcypher.com', userId: 2 }),
       (:User {email: 'user3@learningcypher.com', userId: 3 }),
       ...

The preceding query will create the same data in a single transaction. Neo4j keeps the changes in memory and persists everything at once when the transaction commits. The problem here is that the Cypher engine must parse and process a huge string to translate it into the actual operations that will be performed on the database. In the next sections, we will look in detail at the performance issues found in these two approaches and at how to get rid of them.
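As an illustration of how such a long query could be generated, here is a minimal Java sketch that builds one CREATE statement with a node pattern per user. It assumes the same email pattern as the Java loop shown earlier ("user" + i + "@learningcypher.com"). The class and method names are hypothetical, and the sketch mirrors the query shown above, so it does not add the Inactive label.

```java
// Hypothetical generator for the long CREATE query; names are illustrative.
final class CreateQueryGenerator {

    static String buildCreateQuery(int userCount) {
        if (userCount <= 0) {
            throw new IllegalArgumentException("userCount must be positive");
        }
        StringBuilder query = new StringBuilder("CREATE ");
        for (int i = 0; i < userCount; i++) {
            if (i > 0) {
                // Separate patterns with a comma and align them under the first one
                query.append(",\n       ");
            }
            query.append("(:User {email: 'user").append(i)
                 .append("@learningcypher.com', userId: ").append(i)
                 .append(" })");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Print a short version of the query; use 1000 to match the example data
        System.out.println(buildCreateQuery(4));
    }
}
```

The output can be pasted into the Neo4j Browser prompt. Note that the generated string grows linearly with the number of users, which is exactly the parsing cost discussed above.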

In the code bundle, which can be downloaded for free from the Packt Publishing website (http://www.packtpub.com/support), you will find the whole generated script as well as the Java project with the Neo4j Embedded example code, so that you can set up the example database used in the rest of this chapter.

Tip

From Neo4j 2.1 onwards, bulk creations can be performed by reading from a comma-separated values (CSV) file with the following new clause:

LOAD CSV FROM "file.csv" AS csvLine
CREATE (:User{email: csvLine[0], userId: csvLine[1] })
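To try this clause with the chapter's example data, you need a CSV file whose first column is the email and whose second column is the userId, matching csvLine[0] and csvLine[1]. The following Java sketch writes such a file. The UserCsvWriter class is hypothetical, the output name file.csv is taken from the clause above, and the absence of a header row is an assumption based on the clause reading columns by index.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that writes the example users as "email,userId" rows.
final class UserCsvWriter {

    // Builds one CSV row per user, without a header row
    static List<String> buildLines(int userCount) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < userCount; i++) {
            lines.add("user" + i + "@learningcypher.com," + i);
        }
        return lines;
    }

    // Writes the rows to the target file, replacing any existing content
    static void write(Path target, int userCount) throws IOException {
        Files.write(target, buildLines(userCount), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        write(Paths.get("file.csv"), 1000);
    }
}
```

Keep in mind that LOAD CSV reads every field as a string, so userId is stored as a string unless the query converts it explicitly.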