Chapter 17. Seeing What Your Application Is Doing

Once you have an application up and running, how do you know what it’s doing? This chapter covers how to figure out what kind of queries MongoDB is running, how much data is being written, and how to investigate what MongoDB is actually doing. You’ll learn:

  • How to find slow operations and kill them

  • Getting and interpreting statistics about your collections and databases

  • Using command-line tools to give you a picture of what MongoDB is doing

Seeing the Current Operations

An easy way to find slow operations is to look at what is currently running: slow operations are more likely than fast ones to show up in the list, and to have been running longer. It’s not guaranteed, but it’s a good first step in seeing what might be slowing down an application.

To see the operations that are running, use the db.currentOp() function:

> db.currentOp()
{
    "inprog" : [
        {
            "opid" : 34820,
            "active" : true,
            "secs_running" : 0,
            "op" : "query",
            "ns" : "test.users",
            "query" : {
                "count" : "users",
                "query" : {
                    "username" : "user12345"
                },
                "fields" : {

                }
            },
            "client" : "127.0.0.1:39931",
            "desc" : "conn3",
            "threadId" : "0x7f12d61c7700",
            "connectionId" : 3,
            "locks" : {
                "^" : "r",
                "^test" : "R"
            },
            "waitingForLock" : false,
            "numYields" : 0,
            "lockStats" : {
                "timeLockedMicros" : {

                },
                "timeAcquiringMicros" : {
                    "r" : NumberLong(9),
                    "w" : NumberLong(0)
                }
            }
        },
        ...
    ]
}

This displays a list of operations that the database is performing. Here are some of the more important fields in the output:

"opid"

This is the operation’s unique identifier. You can use this number to kill an operation (see Killing Operations).

"active"

Whether this operation is running. If this field is false, it means the operation has yielded or is waiting for a lock.

"secs_running"

How long this operation has been running. You can use this to find queries that are taking too long or sucking up database resources.

"op"

The type of operation. This is generally a query, insert, update, or remove. Note that database commands are processed as queries.

"desc"

This field can be correlated with messages in the logs. Every log message related to this connection will be prefixed with [conn3], so you can use this to grep the logs for relevant information (see the example after this list).

"locks"

This describes the types of locks taken by this operation. "^" indicates the global lock.

"waitingForLock"

Whether this operation is currently blocking, waiting to acquire a lock.

"numYields"

The number of times this operation has yielded, releasing its lock to allow other operations to go. Generally, any operation that searches for documents (queries, updates, and removes) can yield. An operation will only yield if there are other operations enqueued and waiting to take its lock. Basically, if there are no operations in "waitingForLock" state, the current operations will not yield.

"lockstats.timeAcquiringMicros"

This shows how long it took this operation to acquire the locks it needed.
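
For example, to pull all the log lines for the connection shown above, you could grep for its "desc" value (the log path here is just an example; use wherever your mongod actually writes its log):

$ grep conn3 /var/log/mongodb/mongod.log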

You can filter currentOp() to only look for operations fulfilling certain criteria, such as operations on a certain namespace or ones that have been running for a certain length of time. Filter the results by passing in a query argument:

> db.currentOp({"ns" : "prod.users"})

You can query on any field in currentOp, using all the normal query operators.

Finding Problematic Operations

The most common use for db.currentOp() is looking for slow operations. You can use the filtering technique described in the previous section to find all queries that take longer than a certain amount of time, which may suggest a missing index or improper field filtering.
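
For example, assuming a database named prod, this would find operations on that database that have been active for more than three seconds:

> db.currentOp({"active" : true, "secs_running" : {"$gt" : 3}, "ns" : /^prod\./})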

Sometimes people will find that unexpected queries are running, generally because there’s an app server running an old or buggy version of software. The "client" field can help track down where unexpected operations are coming from.

Killing Operations

If you find an operation that you want to stop, you can kill it by passing db.killOp() its "opid":

> db.killOp(123)

Not all operations can be killed. In general, operations can only be killed when they yield, so updates, finds, and removes can all be killed. Operations holding or waiting for a lock usually cannot be killed.

Once you have sent a “kill” message to an operation, it will have a "killed" field in the db.currentOp() output. However, it won’t actually be dead until it disappears from the list of current operations.
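
For example, after killing an operation you can query for its "opid" to check on it; if it is still listed, it should now also carry the "killed" field (123 is just a placeholder opid):

> db.currentOp({"opid" : 123})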

False Positives

If you look for slow operations, you may see some long-running internal operations listed. Depending on your setup, MongoDB may have several long-running requests of its own. The most common are the replication thread (which continues fetching more operations from the sync source for as long as possible) and the writeback listener used by sharding. Any long-running query on local.oplog.rs can be ignored, as can any writebacklistener commands.

If you kill either of these operations, MongoDB will just restart them. However, you generally should not do that. Killing the replication thread will briefly halt replication and killing the writeback listener may cause mongos to miss legitimate write errors.

Preventing Phantom Operations

There is an odd, MongoDB-specific issue that you may run into, particularly if you’re bulk-loading data into a collection. Suppose you have a job that is firing thousands of update operations at MongoDB and MongoDB is grinding to a halt. You quickly stop the job and kill off all the updates that are currently occurring. However, you continue to see new updates appearing as soon as you kill the old ones, even though the job is no longer running!

If you are loading data using unacknowledged writes, your application will fire writes at MongoDB, potentially faster than MongoDB can process them. If MongoDB gets backed up, these writes will pile up in the operating system’s socket buffer. When you kill the writes MongoDB is working on, this allows MongoDB to start processing the writes in the buffer. Even if you stop the client sending the writes, any writes that made it into the buffer will get processed by MongoDB, since they’ve already been “received” (just not processed).

The best way to prevent these phantom writes is to do acknowledged writes: make each write wait until the previous write is complete, not just until the previous write is sitting in a buffer on the database server.
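
What an acknowledged write looks like depends on your driver, but it generally boils down to setting a write concern of at least w:1 or checking getLastError after each write. A minimal sketch in the shell, assuming a collection named foo:

> db.foo.insert({"x" : 1})
> // don't send the next write until this one has been applied
> db.runCommand({"getLastError" : 1})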

Using the System Profiler

To find slow operations you can use the system profiler, which records operations in a special system.profile collection. The profiler can give you tons of information about operations that are taking a long time, but at a cost: it slows down mongod’s overall performance. Thus, you may only want to turn on the profiler periodically to capture a slice of traffic. If your system is already heavily loaded, you may wish to use another technique described in this chapter to diagnose issues.

By default, the profiler is off and does not record anything. You can turn it on by running db.setProfilingLevel() in the shell:

> db.setProfilingLevel(2)
{ "was" : 0, "slowms" : 100, "ok" : 1 }

Level 2 means “profile everything.” Every read and write request received by the database will be recorded in the system.profile collection of the current database. Profiling is enabled per-database and incurs a heavy performance penalty: every write has to be written an extra time and every read has to take a write lock (because it must write an entry to the system.profile collection). However, it will give you an exhaustive listing of what your system is doing:

> db.foo.insert({x:1})
> db.foo.update({},{$set:{x:2}})
> db.foo.remove()
> db.system.profile.find().pretty()
{
    "ts" : ISODate("2012-11-07T18:32:35.219Z"),
    "op" : "insert",
    "ns" : "test.foo",
    "millis" : 37,
    "client" : "127.0.0.1",
    "user" : ""
}
{
    "ts" : ISODate("2012-11-07T18:32:47.334Z"),
    "op" : "update",
    "ns" : "test.foo",
    "query" : {

    },
    "updateobj" : {
        "$set" : {
            "x" : 2
        }
    },
    "nscanned" : 1,
    "fastmod" : true,
    "millis" : 3,
    "client" : "127.0.0.1",
    "user" : ""
}
{
    "ts" : ISODate("2012-11-07T18:32:50.058Z"),
    "op" : "remove",
    "ns" : "test.foo",
    "query" : {

    },
    "millis" : 0,
    "client" : "127.0.0.1",
    "user" : ""
}

You can use the "client" field to see which clients are sending which operations to the database. If we were using authentication, we could see which user was performing each operation, too.

Often, you do not care about most of the operations that your database is doing, just the slow ones. For this, you can set the profiling level to 1: profile only slow operations. By default, level 1 profiles operations that take longer than 100 ms. You can also specify a second argument, which defines what “slow” means to you. This would record all operations that took longer than 500 ms:

> db.setProfilingLevel(1, 500)
{ "was" : 2, "slowms" : 100, "ok" : 1 }

To turn profiling off, set the profiling level to 0:

> db.setProfilingLevel(0)
{ "was" : 1, "slowms" : 500, "ok" : 1 }

Generally, do not set slowms to a low value. Even with profiling off, slowms has an effect on mongod: it sets the threshold for printing slow operations in the log. Thus, if you set slowms to 2, every operation that takes longer than 2 ms will show up in the log, even with profiling off. So if you lower slowms to profile something, you might want to raise it again before turning off profiling.

You can see the current profiling level with db.getProfilingLevel(). The profiling level is not persistent: restarting the database clears the level.

There are command-line options for configuring the profiling level, --profile level and --slowms time, but bumping the profiling level is generally a temporary debugging measure, not something you want to add to your configuration long-term.
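
If you do decide to set these at startup anyway, the invocation might look something like this (along with whatever other options you normally pass to mongod):

$ mongod --profile 1 --slowms 500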

If you turn on profiling and the system.profile collection does not already exist, MongoDB creates a small capped collection for it (a few megabytes in size). If you want to run the profiler for an extended period of time, this may not be enough space for the number of operations you need to record. You can make a larger system.profile collection by turning off profiling, dropping the system.profile collection, and creating a new system.profile capped collection that is the size you desire. Then enable profiling on the database.
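
For example, the following recreates system.profile as a roughly 100 MB capped collection before turning profiling back on (the size is just an example; pick whatever suits how long you want to run the profiler):

> db.setProfilingLevel(0)
> db.system.profile.drop()
> db.createCollection("system.profile", {"capped" : true, "size" : 100*1024*1024})
> db.setProfilingLevel(1)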

Calculating Sizes

In order to provision the correct amount of disk and RAM, it is useful to know how much space documents, indexes, collections, and databases are taking up. See Calculating the Working Set for information on calculating your working set.

Documents

The easiest way to get the size of a document is to use the shell’s Object.bsonsize() function. Pass in any document to get the size it would be when stored in MongoDB.

For example, you can see that storing _ids as ObjectIds is more efficient than storing them as strings:

> Object.bsonsize({_id:ObjectId()})
22
> // ""+ObjectId() converts the ObjectId to a string
> Object.bsonsize({_id:""+ObjectId()})
39

More practically, you can pass in documents directly from your collections:

> Object.bsonsize(db.users.findOne())

This shows you exactly how many bytes a document is taking up on disk. However, this does not count padding or indexes, which can often be significant factors in the size of a collection.
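
If you want a rough total for a whole collection, you can sum Object.bsonsize() over every document, although this scans the entire collection and can be slow on anything large (users is just an example collection):

> var total = 0
> db.users.find().forEach(function(doc) { total += Object.bsonsize(doc) })
> total

In practice, the "size" field of the collection’s stats() output (covered next) gives you the same number without the scan.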

Collections

For seeing information about a whole collection, there is a stats function:

> db.boards.stats()
{
    "ns" : "brains.boards",
    "count" : 12,
    "size" : 32292,
    "avgObjSize" : 2691,
    "storageSize" : 270336,
    "numExtents" : 3,
    "nindexes" : 2,
    "lastExtentSize" : 212992,
    "paddingFactor" : 1.0099999999999825,
    "flags" : 1,
    "totalIndexSize" : 16352,
    "indexSizes" : {
        "_id_" : 8176,
        "username_1_slug_1" : 8176
    },
    "ok" : 1
}

stats starts with the namespace ("brains.boards") and then the count of all documents in the collection. The next couple of fields have to do with the size of the collection. "size" is what you’d get if you called Object.bsonsize() on each element in the collection and added up all the sizes: it’s the actual number of bytes the documents in the collection are taking up. Equivalently, if you multiply "avgObjSize" by "count", you’ll get "size".

As mentioned above, a total count of the documents’ bytes leaves out some important space a collection uses: the padding around each document and the indexes. "storageSize" not only includes those, but also empty space that has been set aside for the collection but not yet used. Collections always have empty space at the “end” so that new documents can be added quickly.

"nindexes" is the number of indexes on the collection. An index is not counted in "nindexes" until it finishes being built and cannot be used until it appears in this list. Each index is currently one “bucket” (8 KB), since the collection is so small. In general, indexes will be a lot larger than the amount of data they store, as there is a lot of free space to optimize adding new entries. You can minimize this free space by having right-balanced indexes (as described in Introduction to Compound Indexes). Indexes that are randomly distributed will generally be approximately 50% free space, whereas ascending-order indexes will be 10% free space.

As your collections get bigger, it may become difficult to read stats() output with sizes in the billions of bytes or beyond. Thus, you can pass in a scaling factor: 1024 for kilobytes, 1024*1024 for megabytes, and so on. For example, this would get the collection stats in terabytes:

> db.big.stats(1024*1024*1024*1024)

Databases

Databases have a stats function that’s similar to collections’:

> db.stats()
{
    "db" : "brains",
    "collections" : 11,
    "objects" : 1109,
    "avgObjSize" : 299.79440937781783,
    "dataSize" : 332472,
    "storageSize" : 1654784,
    "numExtents" : 15,
    "indexes" : 11,
    "indexSize" : 114464,
    "fileSize" : 201326592,
    "nsSizeMB" : 16,
    "ok" : 1
}

First, we have the name of the database and the number of collections it contains. "objects" is the total count of documents across all collections in this database.

The bulk of the document contains information about the size of your data. "fileSize" should always be the largest: that is the total amount allocated to the database files. This number should be equivalent to adding the sizes of all of the brains.* files in your data directory.

The next largest field is generally going to be "storageSize", which is the total amount of space the database is using. This doesn’t match "fileSize" because "fileSize" includes preallocated files. For example, if you have brains.0, brains.1, and brains.2 in your data directory, brains.2 will be filled with zeros. As soon as anything is written to brains.2, brains.3 will be allocated. You should always have an empty (full of zeros) file for each database: as soon as it’s written to, the next file will be preallocated. Thus, this empty file (plus anything not being used yet in the previous files) is the difference between "fileSize" and "storageSize".

"dataSize" is the amount of space the data in this database is taking up. Note that this does not include space on the free list, but it does include documents’ padding. Thus, the difference between this and the "storageSize" should be the size of documents deleted.

db.stats() can take a scale argument the same way that the collections’ stats() function can.
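
For example, to get the database stats in megabytes:

> db.stats(1024*1024)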

If you call db.stats() on a nonexistent database, "nsSizeMB" will be 0. This field is the size of the .ns file, which is essentially the database’s table of contents; any database that exists needs a .ns file.

Keep in mind that listing databases on a system with a high lock percent can be very slow and block other operations. Avoid doing it, if possible.

Using mongotop and mongostat

MongoDB comes with a few command-line tools that can help you determine what MongoDB is doing by printing stats every few seconds.

mongotop is similar to the top UNIX utility: it gives you an overview of which collections are busiest. You can also run mongotop --locks to give you locking statistics for each database.
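
For example, to print collection activity every 15 seconds instead of every second, or to see per-database locking statistics:

$ mongotop 15
$ mongotop --locks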

mongostat gives server-wide information. By default, mongostat prints out a list of statistics once per second, although this is configurable by passing a different number of seconds on the command line. Each of the fields gives a count of how many times the activity has happened since the field was last printed.

insert/query/update/delete/getmore/command

These are simple counts of how many of each of these operations there have been.

flushes

How many times mongod has flushed data to disk.

mapped

The amount of memory mongod has mapped. This is generally roughly the size of your data directory.

vsize

The amount of virtual memory mongod is using. This is generally twice the size of your data directory (once for the mapped files, once again for journaling).

res

This is the amount of resident (physical) memory mongod is using. This should generally be as close as possible to all the memory on the machine.

locked db

This shows the database that spent the most time locked in the last timeslice. It reports the percent of time the database was locked combined with how long the global lock was held, meaning that this value might be over 100%.

idx miss %

This is the most confusingly-named field in the output. It is how many index accesses had to page fault: the index entry (or section of index being searched) was not in memory, so mongod had to go to disk.

qr|qw

This is the queue size for reads and writes: how many reads and writes are blocking, waiting to be processed.

ar|aw

How many active clients there are: clients currently performing reads and writes.

netIn

Number of network bytes in, as counted by MongoDB (not necessarily the same as what the OS would measure).

netOut

Number of network bytes out, as counted by MongoDB.

conn

The number of connections this server has open, both incoming and outgoing.

time

The time at which these statistics were taken.

You can run mongostat on a replica set or sharded cluster. If you use the --discover option, mongostat will try to find all the members of the set or cluster from the member it initially connects to and will print one line per server per second for each. For a large cluster, this can get unmanageable fast, but it can be useful for small clusters and tools that can consume the data and present it in a more readable form.
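
For example, to print statistics every 5 seconds, or to do so for every member of a cluster discovered through a given mongos (the hostname is just a placeholder):

$ mongostat 5
$ mongostat --host mongos.example.net --discover 5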

mongostat is a great way to get a quick snapshot of what your database is doing, but for long-term monitoring a tool like MMS is preferred (see Chapter 21).
