Advanced queries

Over the past chapters, we've looked at quite a few different queries. We've seen how to create tables, insert and remove data, filter data based on conditions, and efficiently query data using queries.

Although, these queries are the most common types of queries you will use, sometimes you may want to write more complex queries, especially when working with large datasets. In this section, you will learn new ReQL commands and use RethinkDB's advanced features to run more advanced queries.

We will start by looking at three extremely useful queries.

Limits, skips, and sorts

The most common queries in a database system are read queries, which make use of RethinkDB's filter command or, in case you're using an index, the getAll command. There are a number of query options that can be appended at the end of these queries; these include limiting the number of documents returned, skipping a number of documents, and ordering documents.

The ReQL query language is extremely expressive, so these options can be chained to the base query, and they can also be used all together.

The limit command

To set a limit to the number of documents returned by the query, chain the limit command to the end of your query. For example, to only return three documents from the people table, run the following command:

r.db("test").table("people").limit(3)

As you can imagine, only the first three documents will be returned from the database. It's useful to know that the limit command sets an upper limit but not a lower limit. If there are fewer than three documents matching the query in the table, only the number of matching documents will be returned.

The skip command

The skip command works just as the limit command. It skips a number of elements from the start of the result sequence.

Suppose we run this query:

r.db("test").table("people").skip(3)

This will skip the first three matches and return the rest of the matches. If, by any chance, the number of matching documents is less than the number specified by the skip command, no document will be returned.

The limit and skip commands can also be chained together in one query. Suppose, we want to limit the results to 10 documents, but we also want to skip the first five; the following query exactly does this:

r.db("test").table("people").skip(5).limit(10)

Sorting documents

The ReQL query language provides the orderBy command to sort documents. This command accepts the name of a field or index that can be used to sort as the input.

Sorting direction can be r.asc (ascending) or r.desc (descending). For example, to sort documents based on the age field in the ascending order, run the following query:

r.db("test").table("people").orderBy('age')

Let's look at another example. This time, let's sort the documents alphabetically based on the surname in the descending order, limiting the results to the first 10 documents and skipping the first five results:

r.db("test").table("people").orderBy(r.desc('surname')).skip(5).limit(10)

As you can see, this query is a bit more complex; however, ReQL is such a flexible language that even the most complex queries can be constructed simply by chaining commands together.

Although, these two examples are valid queries and provide the correct result, they are extremely inefficient as they require a full table scan. RethinkDB must access each and every document in the database to construct the correct order. As you can see from the following screenshot, the last query took more than 3 seconds on my server:

Sorting documents

Fortunately, RethinkDB also allows you to sort documents using an index. As an example, suppose we want to speed up the last query; we can create a simple index on the surname field to do this. We can do this from the web interface, but it's probably much faster doing it by running this simple query:

r.db("test").table("people").indexCreate('surname')

As usual, we need to wait a few seconds for the index to be built. When the index is ready to be used, we can run the previous query once again, this time using the index that we just created:

r.db("test").table("people").orderBy({index: r.desc("surname")}).skip(5).limit(10)

The resulting documents are exactly the same as before; however, let's take a look at the query profiler:

Sorting documents

Without the index, the query time was more than 3 seconds, whereas now, the query took just 16 milliseconds. What an improvement!

Finding a random document

A fairly common task in a database is to get a random document from a table. Other databases usually make this task surprisingly difficult; RethinkDB, however, provides us with a simple command called sample, which does just that.

If we want one random document from the people database, we can run the following query:

r.db("test").table("people").sample(1)

The argument passed to the sample command indicates the number of random documents that we want from the database.

Grouping

In a database system, grouping allows you to group data based on certain fields and combine their values.

RethinkDB provides us with the group command, which receives the name of a field as input and partitions the dataset into multiple groups based on the provided field.

This kind of aggregation query is extremely useful as it allows us, for example, to know how many documents there are in a specific group based on a field. If we want to organize the documents in groups based on the name of each person, we can use the group command:

r.db("test").table("people").group("name").count()

The result of the query will be as follows:

Grouping

As you can see, the results contain the name of the group (in our example, the name of the person) and the number of documents (that is, the number of people) in that group.

These types of queries become even more useful when you combine them with other commands such as orderBy. As an example, we can modify the previous query to find the five most popular names in the database:

r.db('test').table('people').group('name').count().ungroup().orderBy(r.desc('reduction')).limit(5)

This query first groups all documents by name, then calculates the number of items in each group. Finally, we use the ungroup command to turn the data into an array, order the results in descending order, and limit the results to five documents.

The final result will be this:

[
    {
        "group": "Beth" ,
        "reduction": 214
    } ,
    {
        "group": "Claire" ,
        "reduction": 214
    } ,
    {
        "group": "Neal" ,
        "reduction": 212
    } ,
    {
        "group": "Chris" ,
        "reduction": 208
    } ,
    {
        "group": "Wesley" ,
        "reduction": 196
    }
]

As we can see from this JSON document, the most popular names in the dataset are Beth and Claire, which each have 214 occurrences in the people table.

Aggregations

RethinkDB provides us with a bunch of aggregation commands that are easy to use and extremely useful.

We'll have a quick look at the most popular ones.

Average

The average command calculates the average of all values within a specified field in a sequence of documents. If we want to know the average age of the people in our dataset, we can use the avg command by passing the age field as a parameter:

r.db('test').table('people').avg('age').round()

We also chain the round command at the end of the query to round the resulting number. The result will be 29, which is the average age.

Maximum

The max command works in a very similar way to the avg command; calling the max command and passing the name of a field as a parameter will result in the maximum number in the sequence.

As an example, let's calculate the age of the oldest person in our dataset. The query will be as follows:

r.db('test').table('people').max('age')

The result will be the JSON document with the oldest person in the dataset:

{
    "age": 59 ,
    "email": [email protected], ยป
    "id": "06b6e9ae-da07-4625-aae0-2dbf7166692d" ,
    "name": "Dan" ,
    "surname": "Lawrence"
}

The pluck command

Sometimes, we don't want the entire JSON document, but we're only interested in one of the fields in the JSON query result. We can specify the field to keep in the resulting JSON by chaining the pluck command at the end of the query.

Building on from the previous example, if we don't want the entire JSON but we're only interested in the age, we can modify the query:

r.db('test').table('people').max('age').pluck("age")

The result will be as follows:

{
    "age": 59
}

As you can see from these simple examples, aggregations provide a very useful way to gather information and express it in a summarized form, such as a statistical analysis.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.237.154