Once you have an application up and running, how do you know what it's doing? This chapter covers how to figure out what kinds of queries MongoDB is running, how much data is being written, and how to investigate what MongoDB is actually doing. You'll learn:

- How to find slow operations and kill them
- How to get and interpret statistics about your collections and databases
- How to use command-line tools to get a picture of what MongoDB is doing
An easy way to find slow operations is to see what is currently running: a slow operation is more likely to show up in the list, and to have been running longer, than a fast one. It's not guaranteed, but it's a good first step to see what might be slowing down an application.
To see the operations that are running, use the db.currentOp() function:

> db.currentOp()
{
    "inprog" : [
        {
            "opid" : 34820,
            "active" : true,
            "secs_running" : 0,
            "op" : "query",
            "ns" : "test.users",
            "query" : {
                "count" : "users",
                "query" : {
                    "username" : "user12345"
                },
                "fields" : { }
            },
            "client" : "127.0.0.1:39931",
            "desc" : "conn3",
            "threadId" : "0x7f12d61c7700",
            "connectionId" : 3,
            "locks" : {
                "^" : "r",
                "^test" : "R"
            },
            "waitingForLock" : false,
            "numYields" : 0,
            "lockStats" : {
                "timeLockedMicros" : { },
                "timeAcquiringMicros" : {
                    "r" : NumberLong(9),
                    "w" : NumberLong(0)
                }
            }
        },
        ...
    ]
}
This displays a list of operations that the database is performing. Here are some of the more important fields in the output:
"opid"
This is the operation’s unique identifier. You can use this number to kill an operation (see Killing Operations).
"active"
Whether this operation is running. If this field is false, it means the operation has yielded or is waiting for a lock.
"secs_running"
How long this operation has been running. You can use this to find queries that are taking too long or sucking up database resources.
"op"
The type of operation. This is generally a query, insert, update, or remove. Note that database commands are processed as queries.
"desc"
This can be correlated with messages in the logs. Every log
message related to this connection will be prefixed with [conn3]
, so you can use this to grep the
logs for relevant information.
"locks"
This describes the types of locks taken by this operation.
"^
" indicates the global
lock.
"waitingForLock"
Whether this operation is currently blocking, waiting to acquire a lock.
"numYields"
The number of times this operation has
yielded, releasing its lock to allow other
operations to go. Generally, any operation that searches for
documents (queries, updates, and removes) can yield. An operation
will only yield if there are other operations enqueued and waiting
to take its lock. Basically, if there are no operations in "waitingForLock"
state, the current
operations will not yield.
"lockstats.timeAcquiringMicros"
This shows how long it took this operation to acquire the locks it needed.
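The "desc" field maps directly onto a grep pattern. As a sketch (the log lines below are made up; real entries will differ), this pulls every message for a given connection out of the log:

```shell
# Hypothetical mongod log excerpt; real log lines will differ.
cat > /tmp/mongod_sample.log <<'EOF'
Mon Nov  5 11:47:21 [conn3] query test.users ntoreturn:0 keyUpdates:0
Mon Nov  5 11:47:22 [conn4] insert test.foo keyUpdates:0
Mon Nov  5 11:47:23 [conn3] killcursors keyUpdates:0
EOF

# "desc" : "conn3" corresponds to the [conn3] prefix in the log
grep '\[conn3\]' /tmp/mongod_sample.log
```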
You can filter currentOp() to only look for operations fulfilling certain criteria, such as operations on a certain namespace or ones that have been running for a certain length of time. Filter the results by passing in a query argument:

> db.currentOp({"ns" : "prod.users"})

You can query on any field in currentOp, using all the normal query operators.
The most common use for db.currentOp()
is looking for slow
operations. You can use the filtering technique described in the
previous section to find all queries that take longer than a certain
amount of time, which may suggest a missing index or improper field
filtering.
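In the shell, such a filter looks like db.currentOp({"active" : true, "secs_running" : {"$gt" : 3}}). The matching happens server-side; as a sketch, this applies the same predicate to a made-up "inprog" array to show which operations the filter would return:

```javascript
// Apply the predicate from db.currentOp({"active" : true,
// "secs_running" : {"$gt" : 3}}) to a sample inprog array.
function slowOps(inprog, thresholdSecs) {
  return inprog.filter((op) => op.active && op.secs_running > thresholdSecs);
}

const inprog = [
  { opid: 34820, active: true, secs_running: 0, ns: "test.users" },
  { opid: 35112, active: true, secs_running: 27, ns: "prod.users" },
  { opid: 35113, active: false, secs_running: 41, ns: "prod.users" },
];

slowOps(inprog, 3).map((op) => op.opid);   // → [ 35112 ]
```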
Sometimes people will find that unexpected queries are running,
generally because there’s an app server running an old or buggy version
of software. The "client"
field can
help track down where unexpected operations are coming from.
If you find an operation that you want to stop, you can kill it by passing db.killOp() its "opid":

> db.killOp(123)
Not all operations can be killed. In general, operations can only be killed when they yield, so updates, finds, and removes can all be killed. Operations holding or waiting for a lock usually cannot be killed.
Once you have sent a "kill" message to an operation, it will have a "killed" field in the db.currentOp() output. However, it won't actually be dead until it disappears from the list of current operations.
If you look for slow operations, you may see some long-running internal operations listed. There are several long-running requests MongoDB may have running, depending on your setup. The most common are the replication thread (which will continue fetching more operations from the sync source for as long as possible) and the writeback listener for sharding. Any long-running query on local.oplog.rs can be ignored as well as any writebacklistener commands.
If you kill either of these operations, MongoDB will just restart them. However, you generally should not do that. Killing the replication thread will briefly halt replication and killing the writeback listener may cause mongos to miss legitimate write errors.
There is an odd, MongoDB-specific issue that you may run into, particularly if you’re bulk-loading data into a collection. Suppose you have a job that is firing thousands of update operations at MongoDB and MongoDB is grinding to a halt. You quickly stop the job and kill off all the updates that are currently occurring. However, you continue to see new updates appearing as soon as you kill the old ones, even though the job is no longer running!
If you are loading data using unacknowledged writes, your application will fire writes at MongoDB, potentially faster than MongoDB can process them. If MongoDB gets backed up, these writes will pile up in the operating system’s socket buffer. When you kill the writes MongoDB is working on, this allows MongoDB to start processing the writes in the buffer. Even if you stop the client sending the writes, any writes that made it into the buffer will get processed by MongoDB, since they’ve already been “received” (just not processed).
The best way to prevent these phantom writes is to do acknowledged writes: make each write wait until the previous write is complete, not just until the previous write is sitting in a buffer on the database server.
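A toy model (plain JavaScript, not driver code) makes the difference concrete: if mongod drains one write per unit of time, an unacknowledged client sending five per unit leaves a growing backlog in the socket buffer, while an acknowledged client that waits for each write leaves nothing behind:

```javascript
// Toy model of the phantom-write pile-up: mongod applies one write per
// "tick", while the client pushes sendPerTick writes into the socket buffer.
function simulate(sendPerTick, ticks) {
  const socketBuffer = [];
  let applied = 0;
  for (let t = 0; t < ticks; t++) {
    for (let i = 0; i < sendPerTick; i++) socketBuffer.push("update");
    if (socketBuffer.length > 0) {
      socketBuffer.shift();   // mongod applies one buffered write
      applied++;
    }
  }
  return { applied, buffered: socketBuffer.length };
}

// Unacknowledged: the client outruns the server, so writes keep arriving
// from the buffer even after the client stops sending.
simulate(5, 100);   // → { applied: 100, buffered: 400 }

// Acknowledged: at most one write in flight, so nothing accumulates.
simulate(1, 100);   // → { applied: 100, buffered: 0 }
```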
To find slow operations you can use the system profiler, which records operations in a special system.profile collection. The profiler can give you tons of information about operations that are taking a long time, but at a cost: it slows down mongod’s overall performance. Thus, you may only want to turn on the profiler periodically to capture a slice of traffic. If your system is already heavily loaded, you may wish to use another technique described in this chapter to diagnose issues.
By default, the profiler is off and does not record anything. You can turn it on by running db.setProfilingLevel() in the shell:

> db.setProfilingLevel(2)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
Level 2 means “profile everything.” Every read and write request received by the database will be recorded in the system.profile collection of the current database. Profiling is enabled per-database and incurs a heavy performance penalty: every write has to be written an extra time and every read has to take a write lock (because it must write an entry to the system.profile collection). However, it will give you an exhaustive listing of what your system is doing:
> db.foo.insert({x:1})
> db.foo.update({}, {$set:{x:2}})
> db.foo.remove()
> db.system.profile.find().pretty()
{
    "ts" : ISODate("2012-11-07T18:32:35.219Z"),
    "op" : "insert",
    "ns" : "test.foo",
    "millis" : 37,
    "client" : "127.0.0.1",
    "user" : ""
}
{
    "ts" : ISODate("2012-11-07T18:32:47.334Z"),
    "op" : "update",
    "ns" : "test.foo",
    "query" : { },
    "updateobj" : {
        "$set" : {
            "x" : 2
        }
    },
    "nscanned" : 1,
    "fastmod" : true,
    "millis" : 3,
    "client" : "127.0.0.1",
    "user" : ""
}
{
    "ts" : ISODate("2012-11-07T18:32:50.058Z"),
    "op" : "remove",
    "ns" : "test.foo",
    "query" : { },
    "millis" : 0,
    "client" : "127.0.0.1",
    "user" : ""
}
You can use the "client"
field to
see which users are sending which operations to the database. If we were
using authentication, we could see which user was doing each operation,
too.
Often, you do not care about most of the operations that your database is doing, just the slow ones. For this, you can set the profiling level to 1: profile only slow operations. By default, level 1 profiles operations that take longer than 100 ms. You can also specify a second argument, which defines what "slow" means to you. This would record all operations that took longer than 500 ms:

> db.setProfilingLevel(1, 500)
{ "was" : 2, "slowms" : 100, "ok" : 1 }
To turn profiling off, set the profiling level to 0:

> db.setProfilingLevel(0)
{ "was" : 1, "slowms" : 500, "ok" : 1 }
Generally, do not set slowms to a low value. Even with profiling off, slowms has an effect on mongod: it sets the threshold for printing slow operations in the log. Thus, if you set slowms to 2, every operation that takes longer than 2 ms will show up in the log, even with profiling off. If you lower slowms to profile something, you might want to raise it again before turning off profiling.
You can see the current profiling level with db.getProfilingLevel()
. The profiling level is
not persistent: restarting the database clears the level.
There are command-line options for configuring the profiling level, --profile level and --slowms time, but bumping the profiling level is generally a temporary debugging measure, not something you want to add to your configuration long-term.
If you turn on profiling and the system.profile collection does not already exist, MongoDB creates a small capped collection for it (a few megabytes in size). If you want to run the profiler for an extended period of time, this may not be enough space for the number of operations you need to record. You can make a larger system.profile collection by turning off profiling, dropping the system.profile collection, and creating a new system.profile capped collection that is the size you desire. Then enable profiling on the database.
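The resize sequence looks like this in the shell (here db is stubbed out so the sketch is self-contained and runnable; in the mongo shell, db is the current database, and the 1 GB size is just an example):

```javascript
// Stub standing in for the shell's db object; in the mongo shell these
// calls go straight to the server.
const calls = [];
const db = {
  setProfilingLevel: (level, slowms) =>
    calls.push(["setProfilingLevel", level, slowms]),
  system: { profile: { drop: () => calls.push(["drop"]) } },
  createCollection: (name, opts) =>
    calls.push(["createCollection", name, opts]),
};

db.setProfilingLevel(0);                  // turn off profiling first
db.system.profile.drop();                 // drop the old capped collection
db.createCollection("system.profile",     // recreate it at the size you want
  { capped: true, size: 1024 * 1024 * 1024 });  // e.g., 1 GB
db.setProfilingLevel(1, 500);             // then re-enable profiling
```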
In order to provision the correct amount of disk and RAM, it is useful to know how much space documents, indexes, collections, and databases are taking up. See Calculating the Working Set for information on calculating your working set.
The easiest way to get the size of a document is to use the shell's Object.bsonsize() function. Pass in any document to get the size it would be when stored in MongoDB.

For example, you can see that storing _ids as ObjectIds is more efficient than storing them as strings:

> Object.bsonsize({_id : ObjectId()})
22
> // ""+ObjectId() converts the ObjectId to a string
> Object.bsonsize({_id : ""+ObjectId()})
39
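Where do 22 and 39 come from? A BSON document is a 4-byte length, a list of elements, and a trailing zero byte; each element is a type byte, the field name as a C string, and the value. The arithmetic below reproduces both numbers (ObjectIds are always 12 bytes; as a string, the same value is a 24-character hex string with a 4-byte length prefix and a terminator):

```javascript
// BSON document framing: int32 length + elements + trailing 0x00.
const docOverhead = 4 + 1;
// Each element: 1 type byte + field name as a C string ("_id" + 0x00).
const elemOverhead = 1 + "_id".length + 1;

// ObjectId values are a fixed 12 bytes.
const objectIdDocSize = docOverhead + elemOverhead + 12;         // → 22

// String values: int32 length + 24 hex characters + trailing 0x00.
const stringDocSize = docOverhead + elemOverhead + (4 + 24 + 1); // → 39
```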
More practically, you can pass in documents directly from your collections:

> Object.bsonsize(db.users.findOne())
This shows you exactly how many bytes a document is taking up on disk. However, this does not count padding or indexes, which can often be significant factors in the size of a collection.
For seeing information about a whole collection, there is a stats function:

> db.boards.stats()
{
    "ns" : "brains.boards",
    "count" : 12,
    "size" : 32292,
    "avgObjSize" : 2691,
    "storageSize" : 270336,
    "numExtents" : 3,
    "nindexes" : 2,
    "lastExtentSize" : 212992,
    "paddingFactor" : 1.0099999999999825,
    "flags" : 1,
    "totalIndexSize" : 16352,
    "indexSizes" : {
        "_id_" : 8176,
        "username_1_slug_1" : 8176
    },
    "ok" : 1
}
stats starts with the namespace ("brains.boards") and then the count of all documents in the collection. The next couple of fields have to do with the size of the collection. "size" is what you'd get if you called Object.bsonsize() on each element in the collection and added up all the sizes: it's the actual number of bytes the documents in the collection are taking up. Equivalently, if you multiply "avgObjSize" by "count", you'll get "size".
As mentioned above, a total count of the documents’ bytes leaves
out some important space a collection uses: the padding around each
document and the indexes. "storageSize"
not only includes those, but
also empty space that has been set aside for the collection but not yet
used. Collections always have empty space at the “end” so that new
documents can be added quickly.
"nindexes"
is the number of
indexes on the collection. An index is not counted in "nindexes"
until it finishes being built and
cannot be used until it appears in this list. Each index is currently
one “bucket” (8 KB), since the collection is so small. In general,
indexes will be a lot larger than the amount of data they store, as
there is a lot of free space to optimize adding new entries. You can
minimize this free space by having right-balanced indexes (as described
in Introduction to Compound Indexes). Indexes that are randomly
distributed will generally be approximately 50% free space, whereas
ascending-order indexes will be 10% free space.
As your collections get bigger, it may become difficult to read stats() output with sizes in the billions of bytes or beyond. Thus, you can pass in a scaling factor: 1024 for kilobytes, 1024*1024 for megabytes, and so on. For example, this would get the collection stats in terabytes:

> db.big.stats(1024*1024*1024*1024)
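The scale argument simply divides the size fields before they are returned. A rough sketch of the effect (an illustration only, not how the server implements it):

```javascript
const TB = 1024 * 1024 * 1024 * 1024;

// Divide every size-like field by the scale, leaving counts alone.
function scaleStats(stats, scale) {
  const out = {};
  for (const [key, value] of Object.entries(stats)) {
    out[key] =
      typeof value === "number" && key.toLowerCase().includes("size")
        ? value / scale
        : value;
  }
  return out;
}

scaleStats({ ns: "test.big", count: 12, size: 2 * TB }, TB);
// → { ns: "test.big", count: 12, size: 2 }
```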
Databases have a stats function that's similar to the collections':

> db.stats()
{
    "db" : "brains",
    "collections" : 11,
    "objects" : 1109,
    "avgObjSize" : 299.79440937781783,
    "dataSize" : 332472,
    "storageSize" : 1654784,
    "numExtents" : 15,
    "indexes" : 11,
    "indexSize" : 114464,
    "fileSize" : 201326592,
    "nsSizeMB" : 16,
    "ok" : 1
}
First, we have the name of the database and the number of
collections it contains. "objects"
is
the total count of documents across all collections in this
database.
The bulk of the document contains information about the size of
your data. "fileSize"
should always
be the largest: that is the total amount allocated to the database
files. This number should be equivalent to adding the sizes of all of
the brains.* files in your data
directory.
The next largest field is generally going to be "storageSize"
, which is the total amount of
space the database is using. This doesn’t match "fileSize"
because "fileSize"
includes preallocated files. For
example, if you have brains.0,
brains.1, and brains.2 in your data directory, brains.2 will be filled with zeros. As soon
as anything is written to brains.2,
brains.3 will be allocated. You
should always have an empty (full of zeros) file for each database: as
soon as it’s written to, the next file will be preallocated. Thus, this
empty file (plus anything not being used yet in the previous files) is
the difference between "fileSize"
and
"storageSize"
.
"dataSize"
is the amount of
space the data in this database is taking up. Note that this does not
include space on the free list, but it does include documents’ padding.
Thus, the difference between this and the "storageSize"
should be the size of documents
deleted.
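Plugging in the numbers from the db.stats() output above makes these relationships concrete (the interpretation follows the text; the exact breakdown varies by deployment):

```javascript
// Numbers from the example db.stats() output above.
const stats = { dataSize: 332472, storageSize: 1654784, fileSize: 201326592 };

// fileSize - storageSize: preallocated file space not yet used.
const preallocated = stats.fileSize - stats.storageSize;  // 199671808

// storageSize - dataSize: free-list space, e.g. from deleted documents.
const freeSpace = stats.storageSize - stats.dataSize;     // 1322312
```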
db.stats()
can take a scale
argument the same way that the collections’ stats()
function can.
If you call db.stats() on a nonexistent database, "nsSizeMB" will be 0. This is the size of the .ns file, which is essentially a database's table of contents. Any database that exists needs a .ns file.
Keep in mind that listing databases on a system with a high lock percent can be very slow and block other operations. Avoid doing it, if possible.
MongoDB comes with a few command-line tools that can help you determine what MongoDB is doing by printing stats every few seconds.
mongotop is similar to the top UNIX utility: it gives you an overview of which collections are busiest. You can also run mongotop --locks to give you locking statistics for each database.
mongostat gives server-wide information. By default, mongostat prints out a list of statistics once per second, although this is configurable by passing a different number of seconds on the command line. Each of the fields gives a count of how many times the activity has happened since the field was last printed.
insert/query/update/delete/getmore/command
These are simple counts of how many of each of these operations there have been.

flushes
How many times mongod has flushed data to disk.

mapped
The amount of memory mongod has mapped. This is generally roughly the size of your data directory.

vsize
The amount of virtual memory mongod is using. This is generally twice the size of your data directory (once for the mapped files, once again for journaling).

res
The amount of memory mongod is using. This should generally be as close as possible to all the memory on the machine.

locked db
This shows the database that spent the most time locked in the last timeslice. It reports the percent of time the database was locked combined with how long the global lock was held, meaning that this value might be over 100%.

idx miss %
This is the most confusingly named field in the output. It is how many index accesses had to page fault: the index entry (or section of index being searched) was not in memory, so mongod had to go to disk.

qr|qw
The queue sizes for reads and writes: how many reads and writes are blocked, waiting to be processed.

ar|aw
How many active clients there are: clients currently performing reads and writes.

netIn
Number of network bytes in, as counted by MongoDB (not necessarily the same as what the OS would measure).

netOut
Number of network bytes out, as counted by MongoDB.

conn
The number of connections this server has open, both incoming and outgoing.

time
The time at which these statistics were taken.
You can run mongostat on a
replica set or sharded cluster. If you use the --discover
option, mongostat will try to find
all the members of the set or cluster from the member it initially
connects to and will print one line per server per second for each. For a
large cluster, this can get unmanageable fast, but it can be useful for
small clusters and tools that can consume the data and present it in a
more readable form.
mongostat is a great way to get a quick snapshot of what your database is doing, but for long-term monitoring a tool like MMS is preferred (see Chapter 21).