In this chapter, we’ll explain a process by which you can create a suitable configuration file for your MySQL server. It is a roundabout trip, with many points of interest and side trips to scenic overlooks. These side trips are necessary. Determining the shortest path to a suitable configuration doesn’t start with studying configuration options and asking which ones you should set or how you should change them. Nor does it begin with examining server behavior and asking whether any configuration options can improve it. It’s best to start with an understanding of MySQL’s internals and behavior. You can then use that knowledge as a guide for how to configure MySQL. Finally, you can compare the desired configuration to the current configuration and correct any significant and worthwhile differences.
People often ask, “What’s the optimal configuration file for my server with 32 GB of RAM and 12 CPU cores?” Unfortunately, it’s not that simple. You should configure the server for the workload, data, and application requirements, not just the hardware. MySQL has scores of settings that you can change - but you shouldn’t. It’s usually better to configure the basic settings correctly (and there are only a few that matter in most cases) and spend more time on schema optimization, indexes, and query design. After you’ve set MySQL’s basic configuration options correctly, the potential gains from further changes are usually small.
On the other hand, the potential downside of fiddling with the configuration can be great. The MySQL defaults are there for good reason. Changing them without understanding the impact can lead to crashes, constant stalls, or slow performance.
So what should you do? You should make sure the basics such as the InnoDB buffer pool and log file size are appropriate. Then you should set a few safety options if you wish to prevent undesired behavior (but note that these usually won’t improve performance - they’ll only avoid problems). Then leave the rest of the settings alone. If you begin to experience a problem, begin by diagnosing it carefully. If your problem is caused by a part of the server whose behavior can be corrected with a configuration option, then you might need to change it.
Sometimes you might also need to set specific configuration options that can have a significant performance impact in special cases. However, these should not be part of a basic server configuration file. You should set them only when you find the specific performance problems they address. That’s why we don’t suggest that you approach configuration options by looking for bad things to improve. If something needs to be improved, it should show up in query response times. It’s best to start your search with queries and their response times, not with configuration options. This could save you a lot of time and prevent many problems.
Another good way to save time and trouble is to use the defaults unless you know you shouldn’t. There is safety in numbers, and a lot of people are running with default settings. That makes them the most thoroughly tested settings. Unexpected bugs can arise when you change things needlessly.
We’ll begin by explaining MySQL’s configuration mechanisms, before covering what you should configure in MySQL. MySQL is generally pretty forgiving about its configuration, but following these suggestions might save you a lot of work and time.
The first thing to know is where MySQL gets configuration information: from command-line arguments and settings in its configuration file. On Unix-like systems, the configuration file is typically located at /etc/my.cnf or /etc/mysql/my.cnf. If you use your operating system’s startup scripts, this is typically the only place you’ll specify configuration settings. If you start MySQL manually, which you might do when you’re running a test installation, you can also specify settings on the command line. The server actually reads the contents of the configuration file, removes any comment lines and newlines, and then processes it together with the command-line options.
Be sure you know where your server’s configuration file is located! We’ve seen people try unsuccessfully to configure a server with a file it doesn’t read, such as /etc/my.cnf on Debian servers, which look in /etc/mysql/my.cnf for their configuration. Sometimes there are files in several places, perhaps because a previous system administrator was confused as well. If you don’t know which files your server reads, you can ask it:
$ which mysqld
/usr/sbin/mysqld
$ /usr/sbin/mysqld --verbose --help | grep -A 1 'Default options'
Default options are read from the following files in the given order:
/etc/mysql/my.cnf ~/.my.cnf /usr/etc/my.cnf
The configuration file is in the standard INI format and is divided into sections, each of which begins with a line that contains the section name in square brackets. A MySQL program will generally read the section that has the same name as that program, and many client programs also read the client section, which gives you a place to put common settings. The server usually reads the mysqld section. Be sure you place your settings in the correct section in the file, or they will have no effect.
Configuration settings are written in all lowercase, with words separated by underscores or dashes. The following are equivalent, and you might see both forms in command lines and configuration files:
/usr/sbin/mysqld --auto-increment-offset=5
/usr/sbin/mysqld --auto_increment_offset=5
We suggest that you pick a style and use it consistently. This makes it easier to search for settings in your files.
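Because both spellings are legal, a search for one form can miss settings written in the other. A tiny helper along these lines (our own illustration, not part of any MySQL tooling) normalizes option names before comparing or grepping:

```python
def normalize_option(opt: str) -> str:
    """Map '--auto-increment-offset=5' and 'auto_increment_offset = 5'
    to the same canonical name, since MySQL treats dashes and
    underscores in option names interchangeably."""
    name = opt.lstrip("-").split("=", 1)[0].strip()
    return name.replace("-", "_")

print(normalize_option("--auto-increment-offset=5"))   # auto_increment_offset
print(normalize_option("auto_increment_offset = 5"))   # auto_increment_offset
```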
Configuration settings can have several scopes. Some settings are server-wide (global scope); others are different for each connection (session scope); and others are per-object. Many session-scoped variables have global equivalents, which you can think of as defaults. If you change the session-scoped variable, it affects only the connection from which you changed it, and the changes are lost when the connection closes. Here are some examples of the variety of behaviors of which you should be aware:

- The max_connections variable is globally scoped.
- The sort_buffer_size variable has a global default, but you can set it per-session as well.
- The join_buffer_size variable has a global default and can be set per-session, but a single query that joins several tables can allocate one join buffer per join, so there might be several join buffers per query.

In addition to setting variables in the configuration files, you can also change many (but not all) of them while the server is running. MySQL refers to these as dynamic configuration variables. The following statements show different ways to change the session and global values of sort_buffer_size dynamically:
SET sort_buffer_size = <value>;
SET GLOBAL sort_buffer_size = <value>;
SET @@sort_buffer_size := <value>;
SET @@session.sort_buffer_size := <value>;
SET @@global.sort_buffer_size := <value>;
If you set variables dynamically, be aware that those settings will be lost when MySQL shuts down. If you want to keep the settings, you’ll have to update your configuration file as well.
There is also a special value you can assign to variables with the SET command: the keyword DEFAULT. Assigning this value to a session-scoped variable sets that variable to the corresponding globally scoped variable’s value; assigning it to a globally scoped variable sets the variable to the compiled-in default (not the value specified in the configuration file). This is useful for resetting session-scoped variables back to the values they had when you opened the connection. We advise you not to use it for global variables because it probably won’t do what you want - that is, it doesn’t set the values back to what they were when you started the server.
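For example, to undo a session-level experiment (the variable name here is just an example; any session-scoped variable works the same way):

```sql
-- Change the session value for one experiment...
SET @@session.sort_buffer_size = 4 * 1024 * 1024;

-- ...then restore it to the current global value
SET @@session.sort_buffer_size = DEFAULT;
```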
If all of this variable scoping and configuration business wasn’t complicated enough, you also had to be aware that if MySQL was restarted, it would revert back to what you had in your configuration file - even if you had used SET GLOBAL to change a global variable. This meant that you had to manage a configuration file and the runtime configuration of MySQL, and ensure they stayed in sync with each other. If you wanted to increase max_connections for your servers, you had to issue a SET GLOBAL max_connections command on each running instance, and then follow up by editing the configuration file to reflect your new configuration.
MySQL 8.0 introduced a new feature called persisted system variables, which makes this a little less complicated. The new SET PERSIST syntax lets you set a value once: it takes effect at runtime, and MySQL also writes the setting to disk so that it is applied again at the next restart.
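In practice it looks like this (the value shown is only an example; MySQL stores persisted settings in mysqld-auto.cnf in the data directory):

```sql
-- Applies at runtime and is written to mysqld-auto.cnf,
-- so it survives a restart
SET PERSIST max_connections = 1000;

-- Inspect which settings have been persisted
SELECT * FROM performance_schema.persisted_variables;

-- Remove a persisted setting (the runtime value is unchanged)
RESET PERSIST max_connections;
```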
Setting variables dynamically can have unexpected side effects, such as flushing dirty blocks from buffers. Be careful which settings you change online because this can cause the server to do a lot of work.
Sometimes you can infer a variable’s behavior from its name. For example, max_heap_table_size does what it sounds like: it specifies the maximum size to which implicit in-memory temporary tables are allowed to grow. However, the naming conventions aren’t completely consistent, so you can’t always guess what a variable will do by looking at its name.
Let’s take a look at some commonly used variables and the effects of changing them dynamically:
- table_open_cache: how many open table handlers MySQL caches. If it’s too small, the server spends time repeatedly opening and closing tables.
- thread_cache_size: how many idle threads MySQL keeps around for reuse by new connections (discussed in more detail later in this chapter).
- read_buffer_size: the buffer MySQL allocates for sequential table scans. When a query needs it, the buffer is allocated all at once, at its full size.
- read_rnd_buffer_size: the buffer used for reading rows in sorted order after a sort. Unlike read_buffer_size, it is allocated only when needed, and only as large as needed.
The official MySQL documentation explains what these variables do in detail, and this isn’t an exhaustive list. Our goal here is simply to show you what behavior to expect when you change a few common variables.
You should not raise the value of a per-connection setting globally unless you know it’s the right thing to do. Some buffers are allocated all at once, even if they’re not needed, so a large global setting can be a huge waste. Instead, you can raise the value when a query needs it.
Be careful when setting variables. More is not always better, and if you set the values too high, you can easily cause problems: you might run out of memory, or cause your server to swap.
Referring back to chapter MYSQLPERF, monitor your SLOs to ensure that your changes don’t impact the customer experience. Benchmarks aren’t enough, because they’re not real. If you don’t measure your server’s actual performance, you might hurt performance without knowing it. We’ve seen many cases where someone changed a server’s configuration and thought it improved performance, when in fact the server’s performance worsened overall because of a different workload at a different time of day or day of the week.
Ideally, you’re using a version control system to track changes to your configuration files. This strategy can be very effective at correlating a performance change or SLO breach to a specific configuration change. Just be aware that changing the configuration file doesn’t actually do anything by default - you have to change the runtime setting too.
Before you start changing your configuration, you should optimize your queries and your schema, addressing at least the obvious things such as adding indexes. If you get deep into tweaking the configuration and then change your queries or schema, you might need to reevaluate the configuration. Keep in mind that unless your hardware, workload, and data are completely static, chances are you’ll need to revisit your configuration later. And in fact, most people’s servers don’t even have a steady workload throughout the day - meaning that the “perfect” configuration for the middle of the morning is not right for midafternoon! Obviously, chasing the mythical “perfect” configuration is completely impractical. Thus, you don’t need to squeeze every last ounce of performance out of your server; in fact, the return for such an investment of time will probably be very small. We suggest that you focus on optimizing for your peak workload and then stop at “good enough,” unless you have reason to believe you’re forgoing a significant performance improvement.
Before we get started with server configuration, we want to encourage you to avoid a few common practices that we’ve found to be risky or practically not worth the effort. Warning: rants ahead!
You might be expected (or believe that you’re expected) to set up a benchmark suite and “tune” your server by changing its configuration iteratively in search of optimal settings. This usually is not something we advise most people to do. It requires so much work and research, and the potential payoff is so small in most cases, that it can be a huge waste of time. You are probably better off spending that time on other things such as checking your backups, monitoring changes in query plans, and so on.
You should not “tune by ratio.” The classic “tuning ratio” is the rule of thumb that your InnoDB buffer pool hit ratio should be higher than some percentage, and you should increase the cache size if the hit rate is too low. This is very wrong advice. Regardless of what anyone tells you, the cache hit ratio has nothing to do with whether the cache is too large or too small. To begin with, the hit ratio depends on the workload - some workloads simply aren’t cacheable no matter how big the cache is - and secondly, cache hits are meaningless, for reasons we’ll explain later. It sometimes happens that when the cache is too small, the hit rate is low, and increasing the cache size increases the hit rate. However, this is an accidental correlation and does not indicate anything about performance or proper sizing of the cache.
The problem with correlations that sometimes appear to be true is that people begin to believe they will always be true. Oracle DBAs abandoned ratio-based tuning years ago, and we wish MySQL DBAs would follow their lead.1 We wish even more fervently that people wouldn’t write “tuning scripts” that codify these dangerous practices and teach them to thousands of people. This leads to our next suggestion of what not to do: don’t use tuning scripts! There are several very popular ones that you can find on the Internet. It’s probably best to ignore them.
We also suggest that you avoid the word “tuning,” which we’ve used liberally in the past few paragraphs. We favor “configuration” or “optimization” instead (as long as that’s what you’re actually doing). The word “tuning” conjures up images of an undisciplined novice who tweaks the server and sees what happens. We suggested in the previous section that this practice is best left to those who are researching server internals. “Tuning” your server can be a stunning waste of time.
On a related topic, searching the Internet for configuration advice is not always a great idea. You can find a lot of bad advice in blogs, forums, and so on. Although many experts contribute what they know online, it is not always easy to tell who is qualified. We can’t give unbiased recommendations about where to find real experts, of course. But we can say that the credible, reputable MySQL service providers are a safer bet in general than what a simple Internet search turns up, because people who have happy customers are probably doing something right. Even their advice, however, can be dangerous to apply without testing and understanding, because it might have been directed at a situation that differed from yours in a way you don’t understand.
Finally, don’t believe the popular memory consumption formula - yes, the very one that MySQL itself prints out when it crashes. (We won’t repeat it here.) This formula is from an ancient time. It is not a reliable or even useful way to understand how much memory MySQL can use in the worst case. You might see some variations on this formula on the Internet, too. These are similarly flawed, even though they add in more factors that the original formula doesn’t have. The truth is that you can’t put an upper bound on MySQL’s memory consumption. It is not a tightly regulated database server that controls memory allocation.
As we mentioned at the beginning of this chapter, we don’t have a one-size-fits-all “best configuration file” for, say, a 4-CPU server with 16 GB of memory and 12 hard drives. You really do need to develop your own configurations, because even a good starting point will vary widely depending on how you’re using the server.
We’ve created a minimal sample configuration file for this book, which you can use as a good starting point for your own servers.2 You must choose values for a few of the settings; we’ll explain those later in this chapter. Our base file, built around MySQL 8.0, looks like this:
[mysqld]
# GENERAL
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
pid_file = /var/lib/mysql/mysql.pid
user = mysql
port = 3306
# INNODB
innodb_buffer_pool_size = <value>
innodb_log_file_size = <value>
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
# LOGGING
log_error = /var/lib/mysql/mysql-error.log
slow_query_log = 1
slow_query_log_file = /var/lib/mysql/mysql-slow.log
# OTHER
tmp_table_size = 32M
max_heap_table_size = 32M
max_connections = <value>
thread_cache_size = <value>
table_open_cache = <value>
open_files_limit = 65535
[client]
socket = /var/lib/mysql/mysql.sock
port = 3306
This might seem too minimal in comparison to what you’re used to seeing, but it’s actually more than many people need. There are a few other types of configuration options that you are likely to use as well, such as binary logging; we’ll cover those later in this and other chapters.
The first thing we configured is the location of the data. We chose /var/lib/mysql for this, because it’s a popular location on many Unix variants. There is nothing wrong with choosing another location; you decide. We’ve put the PID file into the same location, but many operating systems will want to place it in /var/run instead. That’s fine, too. We simply needed to have something configured for these settings. By the way, don’t let the socket and PID file be located according to the server’s compiled-in defaults; there are some bugs in various MySQL versions that can cause problems with this. It’s best to set these locations explicitly. (We’re not advising you to choose different locations; we’re just advising you to make sure the my.cnf file mentions those locations explicitly, so they won’t change and break things if you upgrade the server.)
We also specified that mysqld should run as the mysql user account on the operating system. You’ll need to make sure this account exists, and that it owns the data directory and all files within. The port is set to the default of 3306, but sometimes that is something you’ll want to change.
In MySQL 8.0, a new configuration option, innodb_dedicated_server, was introduced. This option examines the available memory on the server and configures four additional variables (innodb_buffer_pool_size, innodb_log_file_size, innodb_log_files_in_group, and innodb_flush_method) appropriately for a dedicated database server. This simplifies calculating and changing these values. It can be especially useful in a cloud environment, where you might run a VM with 128 GB of RAM and then reboot it to scale up to 256 GB of RAM: MySQL configures itself, and you don’t need to manage changing the values in the configuration file. This is often the best way to manage these four settings.
Most of the other settings in our sample file are pretty self-explanatory, and many of them are a matter of judgment. We’ll explore several of them in the rest of this chapter. We’ll also discuss some safety settings later in this chapter, which can be very helpful for making your server more robust and helping prevent bad data and other problems. We don’t show those settings here.
One setting to explain here is the open_files_limit option. We’ve set this as large as possible on a typical Linux system. Open file handles are very cheap on modern operating systems. If this setting isn’t large enough, you’ll see error 24, “too many open files.”
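Note that mysqld can’t raise its limit above what the operating system grants the process. On systemd-managed Linux hosts, that cap is typically raised with a unit override such as the following (the unit name and path are illustrative and vary by distribution; some use mysql.service rather than mysqld.service):

```ini
# /etc/systemd/system/mysqld.service.d/override.conf
[Service]
LimitNOFILE = 65535
```

After adding the override, run systemctl daemon-reload and restart MySQL for it to take effect.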
Skipping all the way to the end, the last section in the configuration file is for client programs such as mysql and mysqladmin, and simply lets them know how to connect to the server. You should set the values for client programs to match those you chose for the server.
Sometimes you can use the output from SHOW GLOBAL STATUS as input to your configuration to help customize the settings better for your workload. For the best results, look both at absolute values and at how the values change over time, preferably with several snapshots at peak and off-peak times. You can use the following command to see incremental changes to status variables every 60 seconds:
$ mysqladmin extended-status -ri60
We will frequently refer to changes in status variables over time as we explain various configuration settings. We will usually expect you to be examining the output of a command such as the one we just showed. Other helpful tools that can provide a compact display of status counter changes are Percona Toolkit’s pt-mext or pt-mysql-summary.
Now that we’ve shown you the preliminaries, we’ll take you on a guided tour of some server internals, interleaved with advice on configuration. This will give you the background you’ll need to choose appropriate values for configuration options when we return to the sample configuration file later.
Configuring innodb_dedicated_server will typically devote 50% to 75% of your RAM to the buffer pool. This leaves you with at least 25% for per-connection memory allocations, operating system overhead, and other memory settings. We go over each of these in the following sections, and then we take a more detailed look at the various MySQL caches’ requirements.
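As a rough illustration of that budget (our own sketch, not a formula MySQL uses), you can split total RAM into a buffer pool share and a reserve for everything else:

```python
def memory_budget(total_gb: float, buffer_pool_fraction: float = 0.75) -> dict:
    """Rough split of server RAM, mirroring the idea that a dedicated
    server gives the buffer pool 50-75% of memory. The 'reserve' must
    cover per-connection buffers, OS overhead, and other caches."""
    if not 0.5 <= buffer_pool_fraction <= 0.75:
        raise ValueError("fraction outside the typical 50-75% range")
    pool = total_gb * buffer_pool_fraction
    return {"buffer_pool_gb": pool, "reserve_gb": total_gb - pool}

print(memory_budget(128))  # {'buffer_pool_gb': 96.0, 'reserve_gb': 32.0}
```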
MySQL needs a small amount of memory just to hold a connection (thread) open. It also requires a base amount of memory to execute any given query. You’ll need to set aside enough memory for MySQL to execute queries during peak load times. Otherwise, your queries will be starved for memory, and they will run poorly or fail.
It’s useful to know how much memory MySQL will consume during peak usage, but some usage patterns can unexpectedly consume a lot of memory, which makes this hard to predict. Prepared statements are one example, because you can have many of them open at once. Another example is the InnoDB data dictionary (more about this later).
You don’t need to assume a worst-case scenario when trying to predict peak memory consumption. For example, if you configure MySQL to allow a maximum of 100 connections, it theoretically might be possible to simultaneously run large queries on all 100 connections, but in reality this probably won’t happen. Queries that use many large temporary tables, or complex stored procedures, are the most likely causes of high per-connection memory consumption.
Just as with queries, you need to reserve enough memory for the operating system to do its work. This involves running any local monitoring software, configuration management tooling, scheduled jobs, etc. The best indication that the operating system has enough memory is that it’s not actively swapping (paging) virtual memory to disk.
The InnoDB buffer pool needs more memory than anything else, as it’s generally the most important variable for performance. The InnoDB buffer pool doesn’t just cache indexes: it also holds row data, the adaptive hash index, the change buffer, locks, and other internal structures. InnoDB also uses the buffer pool to help it delay writes, so it can merge many writes together and perform them sequentially. In short, InnoDB relies heavily on the buffer pool, and you should be sure to allocate enough memory to it. You can use variables from SHOW commands or tools such as innotop to monitor your InnoDB buffer pool’s memory usage.
If you don’t have much data, and you know that your data won’t grow quickly, you don’t need to over allocate memory to the buffer pool. It’s not really beneficial to make it much larger than the size of the tables and indexes that it will hold. There’s nothing wrong with planning ahead for a rapidly growing database, of course, but sometimes we see huge buffer pools with a tiny amount of data. This isn’t necessary.
Large buffer pools come with some challenges, such as long shutdown and warm-up times. If there are a lot of dirty (modified) pages in the buffer pool, InnoDB can take a long time to shut down, because it writes the dirty pages to the data files upon shutdown. You can force it to shut down quickly, but then it just has to do more recovery when it restarts, so you can’t actually speed up the total shutdown and restart cycle time. If you know in advance when you need to shut down, you can change the innodb_max_dirty_pages_pct variable at runtime to a lower value, wait for the flush thread to clean up the buffer pool, and then shut down once the number of dirty pages becomes small. You can monitor the number of dirty pages by watching the Innodb_buffer_pool_pages_dirty server status variable or using innotop to monitor SHOW ENGINE INNODB STATUS. You can also use the innodb_fast_shutdown variable to tweak how shutdown occurs.
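The pre-shutdown routine described above looks roughly like this in practice (the percentage shown is only an example; pick a value appropriate for your write load):

```sql
-- Ask the page cleaners to flush more aggressively
SET GLOBAL innodb_max_dirty_pages_pct = 5;

-- Watch the dirty page count fall before shutting down
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';

-- Optionally request a full ("slow") shutdown that flushes
-- and purges everything, minimizing recovery on restart
SET GLOBAL innodb_fast_shutdown = 0;
```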
Lowering the value of the innodb_max_dirty_pages_pct variable doesn’t actually guarantee that InnoDB will keep fewer dirty pages in the buffer pool. Instead, it controls the threshold at which InnoDB stops being “lazy.” InnoDB’s default behavior is to flush dirty pages with a background thread, merging writes together and performing them sequentially for efficiency. This behavior is called “lazy” because it lets InnoDB delay flushing dirty pages in the buffer pool, unless it needs to use the space for some other data. When the percentage of dirty pages exceeds the threshold, InnoDB will flush pages as quickly as it can to try to keep the dirty page count lower. These page cleaner operations have been greatly optimized3 from previous behavior, including being able to configure multiple threads to perform flushing.
When MySQL starts back up again, the buffer pool cache is empty - also referred to as a cold cache. All of the benefit of having rows and pages in memory is now gone. Thankfully, by default, the configuration options innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup work together to warm the server at startup. The load at startup takes time, but it brings the server up to speed much faster than waiting for the cache to populate naturally.
The thread cache holds threads that aren’t currently associated with a connection but are ready to serve new connections. When there’s a thread in the cache and a new connection is created, MySQL removes the thread from the cache and gives it to the new connection. When the connection is closed, MySQL places the thread back into the cache, if there’s room. If there isn’t room, MySQL destroys the thread. As long as MySQL has a free thread in the cache it can respond rapidly to connection requests, because it doesn’t have to create a new thread for each connection.
The thread_cache_size variable specifies the number of threads MySQL can keep in the cache. You probably won’t need to change this from the default value of -1 (auto-sized) unless your server gets many connection requests. To check whether the thread cache is large enough, watch the Threads_created status variable. We generally try to keep the thread cache large enough that we see fewer than 10 new threads created each second, but it’s often pretty easy to get this number lower than 1 per second.
A good approach is to watch the Threads_connected variable and try to set thread_cache_size large enough to handle the typical fluctuation in your workload. For example, if Threads_connected usually stays between 100 and 120, you can set the cache size to 20. If it stays between 500 and 700, a thread cache of 200 should be large enough. Think of it this way: at 700 connections, there are probably no threads in the cache; at 500 connections, there are 200 cached threads ready to be used if the load increases to 700 again.
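That rule of thumb is simple arithmetic: size the cache to absorb the typical swing in Threads_connected. A sketch (our own helper, not a MySQL formula):

```python
def suggest_thread_cache(min_connected: int, max_connected: int) -> int:
    """Suggest a thread_cache_size large enough to absorb the usual
    fluctuation in Threads_connected, per the rule of thumb above."""
    if max_connected < min_connected:
        raise ValueError("max_connected must be >= min_connected")
    return max_connected - min_connected

print(suggest_thread_cache(100, 120))  # 20
print(suggest_thread_cache(500, 700))  # 200
```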
Making the thread cache very large is probably not necessary for most uses, but keeping it small doesn’t save much memory, so there’s little benefit in doing so. Each thread that’s in the thread cache or sleeping typically uses around 256 KB of memory. This is not very much compared to the amount of memory a thread can use when a connection is actively processing a query. In general, you should keep your thread cache large enough that Threads_created doesn’t increase very often. If this is a very large number, however (e.g., many thousand threads), you might want to set it lower because some operating systems don’t handle very large numbers of threads well, even when most of them are sleeping.
A few configuration options affect how MySQL synchronizes data to disk and performs recovery. These can affect performance dramatically, because they involve I/O operations. They also represent a trade-off between performance and data safety. In general, it’s expensive to ensure that your data is written to disk immediately and consistently. If you’re willing to risk the danger that a disk write won’t really make it to permanent storage, you can increase concurrency and/or reduce I/O waits, but you’ll have to decide for yourself how much risk you can tolerate.
InnoDB permits you to control not only how it recovers, but also how it opens and flushes its data, which greatly affects recovery and overall performance. InnoDB’s recovery process is automatic and always runs when InnoDB starts, though you can influence what actions it takes. Leaving aside recovery and assuming nothing ever crashes or goes wrong, there’s still a lot to configure for InnoDB. It has a complex chain of buffers and files designed to increase performance and guarantee ACID properties, and each piece of the chain is configurable. Figure 4-1 illustrates these files and buffers.
A few of the most important things to change for normal usage are the InnoDB log file size, how InnoDB flushes its log buffer, and how InnoDB performs I/O.
InnoDB uses its log to reduce the cost of committing transactions. Instead of flushing the buffer pool to disk when each transaction commits, it logs the transactions. The changes transactions make to data and indexes often map to random locations in the tablespace, so flushing these changes to disk would require random I/O. InnoDB assumes it’s using conventional disks, where random I/O is much more expensive than sequential I/O because of the time it takes to seek to the correct location on disk and wait for the desired part of the disk to rotate under the head.
InnoDB uses its log to convert this random disk I/O into sequential I/O. Once the log is safely on disk, the transactions are permanent, even though the changes haven’t been written to the data files yet. If something bad happens (such as a power failure), InnoDB can replay the log and recover the committed transactions.
Of course, InnoDB does ultimately have to write the changes to the data files, because the log has a fixed size. It writes to the log in a circular fashion: when it reaches the end of the log, it wraps around to the beginning. It can’t overwrite a log record if the changes contained there haven’t been applied to the data files, because this would erase the only permanent record of the committed transaction.
InnoDB uses a background thread to flush the changes to the data files intelligently. This thread can group writes together and make the data writes sequential, for improved efficiency. In effect, the transaction log converts random data file I/O into mostly sequential log file and data file I/O. Moving flushes into the background makes queries complete more quickly and helps cushion the I/O system from spikes in the query load.
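The interplay between the circular log and background flushing can be sketched as a toy model (purely illustrative; real InnoDB checkpointing is far more sophisticated): the write position can never wrap past the checkpoint, so when the log fills up, writers must wait for the background thread to apply changes to the data files.

```python
class ToyRedoLog:
    """Toy model of a circular redo log: the write position (head)
    cannot advance more than `capacity` bytes past the checkpoint,
    because records in between haven't reached the data files yet."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.head = 0        # total log bytes ever written
        self.checkpoint = 0  # total log bytes applied to data files

    def free_space(self) -> int:
        return self.capacity - (self.head - self.checkpoint)

    def write(self, nbytes: int) -> bool:
        """Append a log record; fails (a stall) if the log is full."""
        if nbytes > self.free_space():
            return False     # writer must wait for a checkpoint
        self.head += nbytes
        return True

    def apply_to_data_files(self, nbytes: int) -> None:
        """Background flushing advances the checkpoint, freeing space."""
        self.checkpoint = min(self.head, self.checkpoint + nbytes)

log = ToyRedoLog(capacity=100)
assert log.write(80)          # plenty of room
assert not log.write(30)      # only 20 bytes free: writer stalls
log.apply_to_data_files(50)   # background flushing frees space
assert log.write(30)          # now it fits
```

This is why the log file size matters for write performance: a larger log gives the background thread more slack before writers stall waiting for checkpoints.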
The overall log file size is controlled by innodb_log_file_size and innodb_log_files_in_group, and it’s very important for write performance. If you took our earlier advice, and used innodb_dedicated_server, these are managed for you based on how much memory your system has.
When InnoDB changes any data, it writes a record of the change into its log buffer, which it keeps in memory. InnoDB flushes the buffer to the log files on disk when the buffer gets full, when a transaction commits, or once per second - whichever comes first. Increasing the buffer size, which defaults to 16 MB in MySQL 8.0, can help reduce I/O if you have large transactions. The variable that controls the buffer size is called innodb_log_buffer_size.
You usually don’t need to make the buffer very large; the default is enough for most workloads unless you write a lot of huge BLOB records. The log entries are very compact compared to InnoDB’s normal data. They are not page-based, so they don’t waste space storing whole pages at a time. InnoDB also makes log entries as short as possible. They are sometimes even stored as the function number and parameters of a C function!
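As a my.cnf sketch for a hypothetical workload with very large transactions (the value here is a placeholder for illustration, not a recommendation):

```ini
# Redo log buffer, flushed when it fills, at transaction commit,
# or about once per second - whichever comes first.
# Larger values reduce I/O only when transactions are very large.
innodb_log_buffer_size = 64M
```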
When InnoDB flushes the log buffer to the log files on disk, it locks the buffer with a mutex, flushes it up to the desired point, and then moves any remaining entries to the front of the buffer. It is possible that more than one transaction will be ready to flush its log entries when the mutex is released. InnoDB uses a group commit feature that can commit all of them to the log in a single I/O operation.
The log buffer must be flushed to durable storage to ensure that committed transactions are fully durable. If you care more about performance than durability, you can change innodb_flush_log_at_trx_commit to control where and how often the log buffer is flushed.
Possible settings are as follows:

0
Writes the log buffer to the log file and flushes it to disk approximately once per second, but does nothing at transaction commit. With this setting, even a crash of the MySQL process alone can lose about a second of transactions.

1
Writes the log buffer to the log file and flushes it to durable storage every time a transaction commits. This is the default (and safest) setting; it guarantees that you don’t lose any committed transactions, unless the disk or operating system “fakes” the flush.

2
Writes the log buffer to the log file at every transaction commit but flushes it to durable storage approximately once per second. Because the write goes to the operating system’s cache, a crash of the MySQL process alone loses no transactions; only a machine crash or power loss can lose about a second of them.
It’s important to know the difference between writing the log buffer to the log file and flushing the log to durable storage. In most operating systems, writing the buffer to the log simply moves the data from InnoDB’s memory buffer to the operating system’s cache, which is also in memory. It doesn’t actually write the data to durable storage. Thus, settings 0 and 2 usually result in at most one second of lost data if there’s a crash or a power outage, because the data might exist only in the operating system’s cache. We say “usually” because InnoDB tries to flush the log file to disk about once per second no matter what, but it is possible to lose more than a second of transactions in some cases, such as when a flush gets stalled.
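For a hypothetical server where up to a second of lost transactions is an acceptable trade for faster commits, the relaxed setting looks like this in my.cnf:

```ini
# Trade durability for commit speed: write the log buffer to the log
# file at each commit, but fsync to durable storage only about once
# per second. An OS crash or power loss can lose ~1 second of commits.
innodb_flush_log_at_trx_commit = 2
```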
Sometimes the hard disk controller or operating system fakes a flush by putting the data into yet another cache, such as the hard disk’s own cache. This is faster but very dangerous, because the data might still be lost if the drive loses power. This is even worse than setting innodb_flush_log_at_trx_commit to something other than 1, because it can cause data corruption, not just lost transactions.
Setting innodb_flush_log_at_trx_commit to anything other than 1 can cause you to lose transactions. However, you might find the other settings useful if you don’t care about durability (the D in ACID). Maybe you just want some of InnoDB’s other features, such as clustered indexes, resistance to data corruption, and row-level locking.
The best configuration for high-performance transactional needs is to leave innodb_flush_log_at_trx_commit set to 1 and place the log files on a RAID volume with a battery-backed write cache and solid state drives (SSDs). This is both safe and very fast. In fact, we dare say that any production database server that’s expected to handle a serious workload needs to have this kind of hardware.
The innodb_flush_method option lets you configure how InnoDB actually interacts with the filesystem. Despite its name, it can affect how InnoDB reads data, not just how it writes it.
This is a slightly confusing option, because it affects both the log files and the data files, and it sometimes does different things to each kind of file. It would be nice to have one configuration option for the logs and another for the data files, but they’re combined.
If you use a Unix-like operating system and your RAID controller has a battery-backed write cache, we recommend that you use O_DIRECT. If not, either the default or O_DIRECT will probably be the best choice, depending on your application. If you opted to use innodb_dedicated_server, as we mentioned earlier, this option is set automatically for you.
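Under those assumptions (Unix-like OS, battery-backed write cache), the setting is a one-liner in my.cnf; with innodb_dedicated_server enabled, this choice is made for you:

```ini
# Bypass the OS page cache for data file I/O to avoid double
# buffering. Appropriate when the RAID controller provides a
# battery-backed write cache.
innodb_flush_method = O_DIRECT
```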
InnoDB keeps its data in a tablespace, which is essentially a virtual filesystem spanning one or many files on disk. InnoDB uses the tablespace for many purposes, not just for storing tables and indexes. It keeps its undo log (old row versions), change buffer, doublewrite buffer, and other internal structures in the tablespace.
You specify the tablespace files with the innodb_data_file_path configuration option. The files are all contained in the directory given by innodb_data_home_dir. Here’s an example:
innodb_data_home_dir = /var/lib/mysql/
innodb_data_file_path = ibdata1:1G;ibdata2:1G;ibdata3:1G
That creates a 3 GB tablespace in three files. Sometimes people wonder whether they can use multiple files to spread load across drives, like this:
innodb_data_file_path = /disk1/ibdata1:1G;/disk2/ibdata2:1G;...
While that does indeed place the files in different directories, which represent different drives in this example, InnoDB concatenates the files end-to-end. Thus, you usually don’t gain much this way. InnoDB will fill the first file, then the second when the first is full, and so on; the load isn’t really spread in the fashion you need for higher performance. A RAID controller is a smarter way to spread load.
To allow the tablespace to grow if it runs out of space, you can make the last file autoextend as follows:
...ibdata3:1G:autoextend
The default behavior is to create a single 10 MB autoextending file. If you make the file autoextend, it’s a good idea to place an upper limit on the tablespace’s size to keep it from growing very large, because once it grows, it doesn’t shrink. For example, the following example limits the autoextending file to 2 GB:
...ibdata3:1G:autoextend:max:2G
Managing a single tablespace can be a hassle, especially if it autoextends and you want to reclaim the space (for this reason, we recommend disabling the autoextend feature, or at least setting a reasonable cap on the space). The only way to reclaim space is to dump your data, shut down MySQL, delete all the files, change the configuration, restart, let InnoDB create new empty files, and restore your data. InnoDB is completely unforgiving about its tablespace - you cannot simply remove files or change their sizes. It will refuse to start if you corrupt its tablespace. It is likewise very strict about its log files. If you’re used to casually moving files around as you could do with MyISAM, take heed!
The innodb_file_per_table option lets you configure InnoDB to use one file per table. It stores the data in the database directory as tablename.ibd files. This makes it easier to reclaim space when you drop a table, and it can be useful for spreading tables across multiple disks. However, placing the data in multiple files can actually result in more wasted space overall, because it trades internal fragmentation in the single InnoDB tablespace for wasted space in the .ibd files. This is more of an issue for very small tables, because InnoDB’s page size is 16 KB. Even if your table has only 1 KB of data, it will still require at least 16 KB on disk.
Even if you enable the innodb_file_per_table option, you’ll still need the main tablespace for the undo logs and other system data. It will be smaller if you’re not storing all the data in it, but it’s still a good idea to disable autoextend, because you can’t shrink the file without reloading all your data.
Some people like to use innodb_file_per_table just because of the extra manageability and visibility it gives you. For example, it’s much faster to find a table’s size by examining a single file than it is to use SHOW TABLE STATUS, which has to perform more complex work to determine how many pages are allocated to a table.
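For comparison, the cached per-table figures are also available from the data dictionary; the schema and table names here (test.sbtest1) are hypothetical, and the numbers may lag the .ibd file on disk:

```sql
-- Approximate on-disk footprint of a hypothetical table test.sbtest1.
SELECT table_name,
       (data_length + index_length) / 1024 / 1024 AS total_mb,
       data_free / 1024 / 1024 AS free_mb
FROM information_schema.TABLES
WHERE table_schema = 'test' AND table_name = 'sbtest1';
```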
There is a dark side to innodb_file_per_table: slow DROP TABLE performance. This can be severe enough to cause a noticeable server-wide stall, because dropping a table requires locking and scanning the buffer pool to find and discard the pages that belong to the dropped tablespace, which takes time when the buffer pool is large; partitioning the buffer pool with innodb_buffer_pool_instances reduces, but doesn’t eliminate, the contention.

What’s the final recommendation? We suggest that you use innodb_file_per_table and cap the size of your shared tablespace to make your life easier. If you run into any circumstances that make this painful, as noted above, consider one of the fixes we suggested.
InnoDB’s tablespace can grow very large in a write-heavy environment. If transactions stay open for a long time (even if they’re not doing any work) and they’re using the default REPEATABLE READ transaction isolation level, InnoDB won’t be able to remove old row versions, because the uncommitted transactions will still need to be able to see them. InnoDB stores the old versions in the tablespace, so it continues to grow as more data is updated. The purge process is multithreaded, but it may need to be tuned for your workload if you experience problems with purge lag (innodb_purge_threads and innodb_purge_batch_size).
The output of SHOW ENGINE INNODB STATUS can help you pinpoint the problem. Look at the history list length in the TRANSACTIONS section; it shows the number of transactions whose old row versions have not yet been purged from the undo log.
------------
TRANSACTIONS
------------
Trx id counter 1081043769321
Purge done for trx's n:o < 1081041974531 undo n:o < 0 state: running but idle
History list length 697068
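If you’d rather not parse that output, the same counter is exposed in a query-friendly form in MySQL 5.7 and later (the trx_rseg_history_len metric is enabled by default):

```sql
-- History list length without parsing SHOW ENGINE INNODB STATUS.
SELECT name, count
FROM information_schema.INNODB_METRICS
WHERE name = 'trx_rseg_history_len';
```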
If you have a large undo log and your tablespace is growing because of it, you can force MySQL to slow down enough for InnoDB’s purge thread to keep up. This might not sound attractive, but there’s no alternative. Otherwise, InnoDB will keep writing data and filling up your disk until the disk runs out of space or the tablespace reaches the limits you’ve defined.
To throttle the writes, set the innodb_max_purge_lag variable to a value other than 0. This value indicates the maximum number of transactions that can be waiting to be purged before InnoDB starts to delay further queries that update data. You’ll have to know your workload to decide on a good value. As an example, if your average transaction affects 1 KB of rows and you can tolerate 100 MB of unpurged rows in your tablespace, you could set the value to 100,000.
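Following the arithmetic in that example (100 MB of tolerated unpurged rows ÷ 1 KB of modified rows per transaction = 100,000 transactions), the my.cnf sketch would be:

```ini
# Hypothetical workload: ~1 KB of modified rows per transaction and a
# tolerance for ~100 MB of unpurged row versions in the tablespace.
# 100 MB / 1 KB = 100,000 transactions of purge lag before throttling.
innodb_max_purge_lag = 100000
```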
Bear in mind that unpurged row versions impact all queries, because they effectively make your tables and indexes larger. If the purge thread simply can’t keep up, performance can decrease dramatically. Setting the innodb_max_purge_lag variable will slow down performance too, but it’s the lesser of the two evils.4
The sync_binlog option controls how MySQL flushes the binary log to disk. Its default value is 1, which means MySQL performs the flush at every commit and keeps the binary log durable and safe. This is the recommended setting, and we caution against setting it to any other value.
If you don’t keep sync_binlog set to 1, it’s likely that a crash will cause your binary log to be out of sync with your transactional data. This can easily break replication and make recovery impossible, especially if you’re using global transaction IDs (more on this in REPLICATION). The safety provided by leaving this at 1 far outweighs the I/O performance penalty that is incurred.
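As a my.cnf sketch, the fully durable pairing discussed here and in the redo log section (both values are the MySQL 8.0 defaults, shown explicitly for clarity):

```ini
# Full durability: fsync the binary log and the InnoDB redo log at
# every transaction commit. Both are the defaults in MySQL 8.0.
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
```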
We covered RAID in more depth in BPHARDWARE, but it’s worth repeating here that good-quality RAID controllers, with battery-backed write caches set to use the write-back policy, can handle thousands of writes per second and still give you durable storage. The data gets written to a fast cache with a battery, so it will survive even if the system loses power. When the power comes back, the RAID controller will write the data from the cache to the disk before making the disk available for use. Thus, a good RAID controller with a large enough battery-backed write cache can improve performance dramatically and is a very good investment. Of course, solid-state storage is also the recommended solution at this point, and it too dramatically improves I/O performance.
When you’re running MySQL in a high-concurrency workload, you might run into bottlenecks you wouldn’t otherwise experience. This section explains how to detect these problems when they happen, and how to get the best performance possible under these workloads.
If you have problems with InnoDB concurrency and you are not running at least MySQL 5.7, the solution is usually to upgrade the server. Older versions had many high-concurrency scalability problems: everything queued on global mutexes, such as the buffer pool mutex, and the server practically ground to a halt. If you upgrade to one of the newer versions of MySQL, you don’t need to limit concurrency in most cases.
If you find yourself hitting this bottleneck, your best option is to shard your data. If sharding is not a viable path forward, you may need to limit concurrency. InnoDB has its own “thread scheduler” that controls how threads enter its kernel to access data, and what they can do once they’re inside the kernel. The most basic way to limit concurrency is with the innodb_thread_concurrency variable, which limits how many threads can be in the kernel at once. A value of 0 means there is no limit on the number of threads. If you are having InnoDB concurrency problems in older MySQL versions, this variable is the most important one to configure.
MySQL’s online documentation5 provides the best guidance for configuring this. You will have to experiment to find the best value for your system, but our recommendation is to start by setting innodb_thread_concurrency to the number of CPU cores you have available and then tune up or down as needed.
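As a sketch of that starting point for a hypothetical 12-core server (0, the default, means no limit, which is usually right on modern versions):

```ini
# Starting point on a hypothetical 12-core server; tune up or down
# from here. Leave at 0 (no limit) unless you have a measured
# concurrency bottleneck on an older MySQL version.
innodb_thread_concurrency = 12
```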
If more than the allowed number of threads are already in the kernel, a thread can’t enter the kernel. InnoDB uses a two-phase process to try to let threads enter as efficiently as possible. The two-phase policy reduces the overhead of context switches caused by the operating system scheduler. The thread first sleeps for innodb_thread_sleep_delay microseconds, and then tries again. If it still can’t enter, it goes into a queue of waiting threads and yields to the operating system.
The default sleep time in the first phase is 10,000 microseconds. Changing this value can help in high-concurrency environments, when the CPU is underused with a lot of threads in the “sleeping before entering queue” status. The default value can also be much too large if you have a lot of small queries, because it adds to query latency.
Once a thread is inside the kernel, it has a certain number of “tickets” that let it back into the kernel for “free,” without any concurrency checks. This limits how much work it can do before it has to get back in line with other waiting threads. The innodb_concurrency_tickets option controls the number of tickets. It rarely needs to be changed unless you have a lot of extremely long-running queries. Tickets are granted per-query, not per-transaction. Once a query finishes, its unused tickets are discarded.
In addition to the bottlenecks in the buffer pool and other structures, there’s another concurrency bottleneck at the commit stage, which is largely I/O-bound because of flush operations. The innodb_commit_concurrency variable governs how many threads can commit at the same time. Configuring this option might help if there’s a lot of thread thrashing even when innodb_thread_concurrency is set to a low value.
After your basic configuration settings are in place, you might wish to enable a number of settings that make the server safer and more reliable. Some of them influence performance, because safety and reliability are often more costly to guarantee. Some are just sensible, however: they prevent silly mistakes such as inserting nonsensical data into the server. And some don’t make a difference in day-to-day operation, but prevent bad things from happening in edge cases.
Let’s look at a collection of useful options for general server behavior first:
expire_logs_days

max_connect_errors

max_connections
Set this high enough to accommodate the usual load that you think you’ll experience, as well as a safety margin to permit logging in and administering the server. For example, if you think you’ll have 300 or so connections in normal operations, you might set this to 500 or so. If you don’t know how many connections you’ll get, 500 is not an unreasonable starting point anyway. The default is 151, but that’s not enough for a lot of applications. Watch the Max_used_connections status variable over time. It is a high-water mark that shows you if the server has had a spike in connections at some point. If it reaches max_connections, chances are a client has been denied at least once.

wait_timeout

skip_name_resolve

sql_mode

sysdate_is_now

read_only and super_read_only
There is a more restrictive read-only option, super_read_only, which prevents even users with the SUPER privilege from being able to write data. With this enabled, the only thing that can write changes to your database is replication. We also strongly recommend enabling super_read_only. It will prevent you from accidentally using an administrator account to write data to your replica, putting it out of sync.
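On a replica, the pairing might look like this in my.cnf (enabling super_read_only implicitly enables read_only as well):

```ini
# Replica safety: block writes from ordinary users (read_only) and
# from SUPER/admin accounts too (super_read_only). Only the
# replication threads can then modify data.
read_only = ON
super_read_only = ON
```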
Some of these InnoDB options are quite important for server performance, and there are also a couple of safety options:
innodb_autoinc_lock_mode

innodb_buffer_pool_instances

innodb_io_capacity

innodb_read_io_threads and innodb_write_io_threads

innodb_strict_mode

innodb_old_blocks_time
After you’ve worked through this chapter, you should have a server configuration that is much better than the defaults. Your server should be fast and stable, and you should not need to tweak the configuration unless you run into an unusual circumstance.
To review, we suggest that you begin with our sample configuration file, set the basic options for your server and workload and add safety options as desired. That’s really all you need to do.
If you’re running a dedicated database server, the single best option you can set is innodb_dedicated_server, which handles 90% of your performance configuration. If you are unable to use this option, the most important options are these two:
innodb_buffer_pool_size
innodb_log_file_size
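Pulling the chapter’s advice together, a deliberately minimal my.cnf sketch; the explicit sizes are placeholders for a hypothetical 32 GB dedicated server, not recommendations:

```ini
[mysqld]
# Preferred on a dedicated database host: sizes the buffer pool,
# redo logs, and flush method automatically from system memory.
innodb_dedicated_server = ON

# If innodb_dedicated_server can't be used, set the two essentials
# by hand instead (placeholder sizes for a hypothetical 32 GB host):
# innodb_buffer_pool_size = 24G
# innodb_log_file_size    = 2G
```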
Congratulations - you just solved the vast majority of real-world configuration problems we’ve seen!
We’ve also made a lot of suggestions about what not to do. The most important of these: don’t “tune” your server, and don’t use ratios, formulas, or “tuning scripts” as a basis for setting the configuration variables.
1 If you are not convinced that “tuning by ratio” is bad, please read Optimizing Oracle Performance by Cary Millsap (O’Reilly). He even devotes an appendix to the topic, with a tool that can artificially generate any cache hit ratio you wish, no matter how badly your system is performing! Of course, it’s all for the purpose of illustrating how useless the ratio is.
2 Please note that newer versions of MySQL remove, deprecate, and change some options; check the docs for details.
3 https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-flushing.html
4 Note that the way this ought to be implemented is a topic of some debate; see MySQL bug 60776 for the details.
5 https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_thread_concurrency
6 Follow-up reading:
https://www.percona.com/blog/2019/12/18/give-love-to-your-ssds-reduce-innodb_io_capacity_max/
https://www.percona.com/blog/2020/01/22/innodb-flushing-in-action-for-percona-server-for-mysql/
https://www.percona.com/blog/2020/05/14/tuning-mysql-innodb-flushing-for-a-write-intensive-workload/