Introducing indexing

Indexing is most certainly the number one feature that can boost performance in a database system, and if used correctly, it can simplify even the most complex queries.

But, what exactly is a database index? A database index is a data structure aimed at improving the performance of read queries at the cost of decreasing the performance of write queries and increasing the storage space required by the data.

As a metaphor, we can say that a database index is similar to a book's index Instead of looking through the entire book, the database takes a shortcut and uses an ordered list that points to the correct content, resulting in the queries that are the orders of magnitude faster.

When you interact with a database, you don't have to use an index; although, it is strongly recommended. A query that doesn't make use of an index is called a full table scan, which means that the database has to go through the entire table to find the query result. This process is basically similar to searching for something in a book without an index. You start at the beginning, and you read every page until you find what you're looking for. As you can imagine, table scans are extremely slow and inefficient.

RethinkDB supports various types of indexes:

  • Simple indexes are indexes constructed on the value of a single field within a document
  • Compound indexes are based on multiple fields
  • Advanced indexes are based on arbitrary ReQL expressions

Before we start creating and using indexes, let's try running a few example queries on the table that contains the data that we imported in the previous section.

First of all, let's run a simple query that outputs one document from the people table so that we can see how the documents are structured:

r.db('test').table('people').limit(1)

You might not get the exact same document, but the result of the query will be a JSON document similar to this:

{
    "age":52,
    "email":"[email protected]",
    "id":"00004713-a852-4865f578edf7949",
    "name":"Kathy",
    "surname":"Adkins"
}

As you can see from this output, all the documents in the people database have an id field that acts as the primary key and contains a random UUID, a name field, a surname field, an email field, and finally an age field.

Now, let's try a more complex query. Suppose, we want to retrieve all the documents of people whose first name is Alex. We can do this by running the following query:

r.db('test').table('people').filter(r.row('name').eq('Alex'))

This query doesn't use an index, so it performs a full table scan. What this means is that the database had to look at every one of the 30,000 documents and check if the name field was equal to Alex. As you can imagine, this is very inefficient.

Now that we know what indexes are, we're ready to start using them. However, it may be interesting to exactly understand how much impact indexing has on the query performance. For this reason, in the following section, we'll look at how to measure query performance.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.244.101