Chapter 4. Optimizing Server Settings

In this chapter, we’ll explain a process by which you can create a suitable configuration file for your MySQL server. It is a roundabout trip, with many points of interest and side trips to scenic overlooks. These side trips are necessary. Determining the shortest path to a suitable configuration doesn’t start with studying configuration options and asking which ones you should set or how you should change them. Nor does it begin with examining server behavior and asking whether any configuration options can improve it. It’s best to start with an understanding of MySQL’s internals and behavior. You can then use that knowledge as a guide for how to configure MySQL. Finally, you can compare the desired configuration to the current configuration and correct any significant and worthwhile differences.

People often ask, “What’s the optimal configuration file for my server with 32 GB of RAM and 12 CPU cores?” Unfortunately, it’s not that simple. You should configure the server for the workload, data, and application requirements, not just the hardware. MySQL has scores of settings that you can change - but you shouldn’t. It’s usually better to configure the basic settings correctly (and there are only a few that matter in most cases) and spend more time on schema optimization, indexes, and query design. After you’ve set MySQL’s basic configuration options correctly, the potential gains from further changes are usually small.

On the other hand, the potential downside of fiddling with the configuration can be great. The MySQL defaults are there with good reason. Changing them without understanding the impact can lead to crashing, constant stalls, or performing slowly.

So what should you do? You should make sure the basics such as the InnoDB buffer pool and log file size are appropriate. Then you should set a few safety options if you wish to prevent undesired behavior (but note that these usually won’t improve performance - they’ll only avoid problems). Then leave the rest of the settings alone. If you begin to experience a problem, begin by diagnosing it carefully. If your problem is caused by a part of the server whose behavior can be corrected with a configuration option, then you might need to change it.

Sometimes you might also need to set specific configuration options that can have a significant performance impact in special cases. However, these should not be part of a basic server configuration file. You should set them only when you find the specific performance problems they address. That’s why we don’t suggest that you approach configuration options by looking for bad things to improve. If something needs to be improved, it should show up in query response times. It’s best to start your search with queries and their response times, not with configuration options. This could save you a lot of time and prevent many problems.

Another good way to save time and trouble is to use the defaults unless you know you shouldn’t. There is safety in numbers, and a lot of people are running with default settings. That makes them the most thoroughly tested settings. Unexpected bugs can arise when you change things needlessly.

How MySQL’s Configuration Works

We’ll begin by explaining MySQL’s configuration mechanisms, before covering what you should configure in MySQL. MySQL is generally pretty forgiving about its configuration, but following these suggestions might save you a lot of work and time.

The first thing to know is where MySQL gets configuration information: from command-line arguments and settings in its configuration file. On Unix-like systems, the configuration file is typically located at /etc/my.cnf or /etc/mysql/my.cnf. If you use your operating system’s startup scripts, this is typically the only place you’ll specify configuration settings. If you start MySQL manually, which you might do when you’re running a test installation, you can also specify settings on the command line. The server actually reads the contents of the configuration file, removes any comment lines and newlines, and then processes it together with the command-line options.

Note
A note on terminology: because many of MySQL’s command-line options correspond to server variables, we sometimes use the terms option and variable interchangeably. Most variables have the same names as their corresponding command-line options, but there are a few exceptions. For example, --memlock sets the locked_in_memory variable.
Tip
Any settings you decide to use permanently should go into the global configuration file, instead of being specified at the command line. Otherwise, you risk accidentally starting the server without them. It’s also a good idea to keep all of your configuration files in a single place so that you can inspect them easily.

Be sure you know where your server’s configuration file is located! We’ve seen people try unsuccessfully to configure a server with a file it doesn’t read, such as /etc/my.cnf on Debian servers, which look in /etc/mysql/my.cnf for their configuration. Sometimes there are files in several places, perhaps because a previous system administrator was confused as well. If you don’t know which files your server reads, you can ask it:

$ which mysqld
/usr/sbin/mysqld
$ /usr/sbin/mysqld --verbose --help | grep -A 1 'Default options'
Default options are read from the following files in the given order:
/etc/mysql/my.cnf ~/.my.cnf /usr/etc/my.cnf

The configuration file is in the standard INI format and is divided into sections, each of which begins with a line that contains the section name in square brackets. A MySQL program will generally read the section that has the same name as that program, and many client programs also read the client section, which gives you a place to put common settings. The server usually reads the mysqld section. Be sure you place your settings in the correct section in the file, or they will have no effect.

Syntax, Scope, and Dynamism

Configuration settings are written in all lowercase, with words separated by underscores or dashes. The following are equivalent, and you might see both forms in command lines and configuration files:

/usr/sbin/mysqld --auto-increment-offset=5
/usr/sbin/mysqld --auto_increment_offset=5

We suggest that you pick a style and use it consistently. This makes it easier to search for settings in your files.

Configuration settings can have several scopes. Some settings are server-wide (global scope); others are different for each connection (session scope), and others are per-object. Many session-scoped variables have global equivalents, which you can think of as defaults. If you change the session-scoped variable, it affects only the connection from which you changed it, and the changes are lost when the connection closes. Here are some examples of the variety of behaviors of which you should be aware:

  • The max_connections variable is globally scoped.
  • The sort_buffer_size variable has a global default, but you can set it per-session as well.
  • The join_buffer_size variable has a global default and can be set per-session, but a single query that joins several tables can allocate one join buffer per join, so there might be several join buffers per query.

In addition to setting variables in the configuration files, you can also change many (but not all) of them while the server is running. MySQL refers to these as dynamic configuration variables. The following statements show different ways to change the session and global values of sort_buffer_size dynamically:

SET sort_buffer_size = <value>;
SET GLOBAL sort_buffer_size = <value>;
SET @@sort_buffer_size := <value>;
SET @@session.sort_buffer_size := <value>;
SET @@global.sort_buffer_size := <value>;

If you set variables dynamically, be aware that those settings will be lost when MySQL shuts down. If you want to keep the settings, you’ll have to update your configuration file as well.

Warning
If you set a variable’s global value while the server is running, the values for the current session and any other existing sessions are not affected. Keep this in mind if your clients rely on persistent database connections. This is because the session values are initialized from the global value when the connections are created. You should inspect the output of SHOW GLOBAL VARIABLES after each change to make sure it’s had the desired effect.

There is also a special value you can assign to variables with the SET command: the keyword DEFAULT. Assigning this value to a session-scoped variable sets that variable to the corresponding globally scoped variable’s value; assigning it to a globally scoped variable sets the variable to the compiled-in default (not the value specified in the configuration file). This is useful for resetting session-scoped variables back to the values they had when you opened the connection. We advise you not to use it for global variables because it probably won’t do what you want - that is, it doesn’t set the values back to what they were when you started the server.

Persisted System Variables

If all of this variable scoping and configuration business wasn’t complicated enough, you also had to be aware that if MySQL was restarted, it would revert back to what you had in your configuration file - even if you had used SET GLOBAL to change a global variable. This meant that you had to manage a configuration file and the runtime configuration of MySQL, and ensure they stayed in sync with each other. If you wanted to increase max_connections for your servers, you had to issue a SET GLOBAL max_connections command on each running instance, and then follow up with editing the configuration file to reflect your new configuration.

MySQL 8.0 introduced a new feature called persisted system variables which helps to make this a little less complicated. The new syntax SET PERSIST now allows you to set the value once for both runtime and MySQL will write this setting out to disk enabling it to be used at the next restart.

Side Effects of Setting Variables

Setting variables dynamically can have unexpected side effects, such as flushing dirty blocks from buffers. Be careful which settings you change online because this can cause the server to do a lot of work.

Sometimes you can infer a variable’s behavior from its name. For example, max_heap_table_size does what it sounds like: it specifies the maximum size to which implicit in-memory temporary tables are allowed to grow. However, the naming conventions aren’t completely consistent, so you can’t always guess what a variable will do by looking at its name.

Let’s take a look at some commonly used variables and the effects of changing them dynamically:

table_open_cache
Setting this variable has no immediate effect - the effect is delayed until the next time a thread opens a table. When this happens, MySQL checks the variable’s value. If the value is larger than the number of tables in the cache, the thread can insert the newly opened table into the cache. If the value is smaller than the number of tables in the cache, MySQL deletes unused tables from the cache.
thread_cache_size
Setting this variable has no immediate effect - the effect is delayed until the next time a connection is closed. At that time, MySQL checks whether there is space in the cache to store the thread. If so, it caches the thread for future reuse by another connection. If not, it kills the thread instead of caching it. In this case, the number of threads in the cache, and hence the amount of memory the thread cache uses, does not immediately decrease; it decreases only when a new connection removes a thread from the cache to use it. (MySQL adds threads to the cache only when connections close and removes them from the cache only when new connections are created.)
read_buffer_size
MySQL doesn’t allocate any memory for this buffer until a query needs it, but then it immediately allocates the entire chunk of memory specified here.
read_rnd_buffer_size
MySQL doesn’t allocate any memory for this buffer until a query needs it, and then it allocates only as much memory as needed. (The name max_read_rnd_buffer_size would describe this variable more accurately.)

The official MySQL documentation explains what these variables do in detail, and this isn’t an exhaustive list. Our goal here is simply to show you what behavior to expect when you change a few common variables.

You should not raise the value of a per-connection setting globally unless you know it’s the right thing to do. Some buffers are allocated all at once, even if they’re not needed, so a large global setting can be a huge waste. Instead, you can raise the value when a query needs it.

Planning Your Variable Changes

Be careful when setting variables. More is not always better, and if you set the values too high, you can easily cause problems: you might run out of memory, or cause your server to swap.

Referring back to chapter MYSQLPERF, monitor your SLOs to ensure that your changes don’t impact the customer experience. Benchmarks aren’t enough, because they’re not real. If you don’t measure your server’s actual performance, you might hurt performance without knowing it. We’ve seen many cases where someone changed a server’s configuration and thought it improved performance, when in fact the server’s performance worsened overall because of a different workload at a different time of day or day of the week.

Ideally, you’re using a version control system to track changes to your configuration files. This strategy can be very effective at correlating a performance change or SLO breach to a specific configuration change. Just be aware that changing the configuration file doesn’t actually do anything by default - you have to change the runtime setting too.

Before you start changing your configuration, you should optimize your queries and your schema, addressing at least the obvious things such as adding indexes. If you get deep into tweaking the configuration and then change your queries or schema, you might need to reevaluate the configuration. Keep in mind that unless your hardware, workload, and data are completely static, chances are you’ll need to revisit your configuration later. And in fact, most people’s servers don’t even have a steady workload throughout the day - meaning that the “perfect” configuration for the middle of the morning is not right for midafternoon! Obviously, chasing the mythical “perfect” configuration is completely impractical. Thus, you don’t need to squeeze every last ounce of performance out of your server; in fact, the return for such an investment of time will probably be very small. We suggest that you focus on optimizing for your peak workload and then stop at “good enough,” unless you have reason to believe you’re forgoing a significant performance improvement.

What Not to Do

Before we get started with server configuration, we want to encourage you to avoid a few common practices that we’ve found to be risky or practically not worth the effort. Warning: rants ahead!

You might be expected (or believe that you’re expected) to set up a benchmark suite and “tune” your server by changing its configuration iteratively in search of optimal settings. This usually is not something we advise most people to do. It requires so much work and research, and the potential payoff is so small in most cases, that it can be a huge waste of time. You are probably better off spending that time on other things such as checking your backups, monitoring changes in query plans, and so on.

You should not “tune by ratio.” The classic “tuning ratio” is the rule of thumb that your InnoDB buffer pool hit ratio should be higher than some percentage, and you should increase the cache size if the hit rate is too low. This is very wrong advice. Regardless of what anyone tells you, the cache hit ratio has nothing to do with whether the cache is too large or too small. To begin with, the hit ratio depends on the workload - some workloads simply aren’t cacheable no matter how big the cache is - and secondly, cache hits are meaningless, for reasons we’ll explain later. It sometimes happens that when the cache is too small, the hit rate is low, and increasing the cache size increases the hit rate. However, this is an accidental correlation and does not indicate anything about performance or proper sizing of the cache.

The problem with correlations that sometimes appear to be true is that people begin to believe they will always be true. Oracle DBAs abandoned ratio-based tuning years ago, and we wish MySQL DBAs would follow their lead.1 We wish even more fervently that people wouldn’t write “tuning scripts” that codify these dangerous practices and teach them to thousands of people. This leads to our next suggestion of what not to do: don’t use tuning scripts! There are several very popular ones that you can find on the Internet. It’s probably best to ignore them.

We also suggest that you avoid the word “tuning,” which we’ve used liberally in the past few paragraphs. We favor “configuration” or “optimization” instead (as long as that’s what you’re actually doing). The word “tuning” conjures up images of an undisciplined novice who tweaks the server and sees what happens. We suggested in the previous section that this practice is best left to those who are researching server internals. “Tuning” your server can be a stunning waste of time.

On a related topic, searching the Internet for configuration advice is not always a great idea. You can find a lot of bad advice in blogs, forums, and so on. Although many experts contribute what they know online, it is not always easy to tell who is qualified. We can’t give unbiased recommendations about where to find real experts, of course. But we can say that the credible, reputable MySQL service providers are a safer bet in general than what a simple Internet search turns up, because people who have happy customers are probably doing something right. Even their advice, however, can be dangerous to apply without testing and understanding, because it might have been directed at a situation that differed from yours in a way you don’t understand.

Finally, don’t believe the popular memory consumption formula - yes, the very one that MySQL itself prints out when it crashes. (We won’t repeat it here.) This formula is from an ancient time. It is not a reliable or even useful way to understand how much memory MySQL can use in the worst case. You might see some variations on this formula on the Internet, too. These are similarly flawed, even though they add in more factors that the original formula doesn’t have. The truth is that you can’t put an upper bound on MySQL’s memory consumption. It is not a tightly regulated database server that controls memory allocation.

Creating a MySQL Configuration File

As we mentioned at the beginning of this chapter, we don’t have a one-size-fits-all “best configuration file” for, say, a 4-CPU server with 16 GB of memory and 12 hard drives. You really do need to develop your own configurations, because even a good starting point will vary widely depending on how you’re using the server.

Minimal Configuration

We’ve created a minimal sample configuration file for this book, which you can use as a good starting point for your own servers.2 You must choose values for a few of the settings; we’ll explain those later in this chapter. Our base file, built around MySQL 8.0, looks like this:

$ [mysqld]
# GENERAL
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
pid_file = /var/lib/mysql/mysql.pid
user = mysql
port = 3306
# INNODB
innodb_buffer_pool_size = <value>
innodb_log_file_size = <value>
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
# LOGGING
log_error = /var/lib/mysql/mysql-error.log
log_slow_queries = /var/lib/mysql/mysql-slow.log
# OTHER
tmp_table_size = 32M
max_heap_table_size = 32M
max_connections = <value>
thread_cache_size = <value>
table_open_cache = <value>
open_files_limit = 65535
[client]
socket = /var/lib/mysql/mysql.sock
port = 3306

This might seem too minimal in comparison to what you’re used to seeing, but it’s actually more than many people need. There are a few other types of configuration options that you are likely to use as well, such as binary logging; we’ll cover those later in this and other chapters.

The first thing we configured is the location of the data. We chose /var/lib/mysql for this, because it’s a popular location on many Unix variants. There is nothing wrong with choosing another location; you decide. We’ve put the PID file into the same location, but many operating systems will want to place it in /var/run instead. That’s fine, too. We simply needed to have something configured for these settings. By the way, don’t let the socket and PID file be located according to the server’s compiled-in defaults; there are some bugs in various MySQL versions that can cause problems with this. It’s best to set these locations explicitly. (We’re not advising you to choose different locations; we’re just advising you to make sure the my.cnf file mentions those locations explicitly, so they won’t change and break things if you upgrade the server.)

We also specified that mysqld should run as the mysql user account on the operating system. You’ll need to make sure this account exists, and that it owns the data directory and all files within. The port is set to the default of 3306, but sometimes that is something you’ll want to change.

In MySQL 8.0, a new configuration option, innodb_dedicated_server, was introduced. This option examines the available memory on the server and configures four additional variables (innodb_buffer_pool_size, innodb_log_file_size, innodb_log_files_in_group, and innodb_flush_method) appropriately for a dedicated database server. This simplifies calculating and changing these values. This can be especially useful in a cloud environment, where you might run a VM with 128GB of RAM, and then reboot it to scale up to 256GB RAM. MySQL here would be self-configuring and you don’t need to manage changing the values in the configuration file. This is often the best way to manage these four settings.

Most of the other settings in our sample file are pretty self-explanatory, and many of them are a matter of judgment. We’ll explore several of them in the rest of this chapter. We’ll also discuss some safety settings later in this chapter, which can be very helpful for making your server more robust and helping prevent bad data and other problems. We don’t show those settings here.

One setting to explain here is the open_files_limit option. We’ve set this as large as possible on a typical Linux system. Open file handles are very cheap on modern operating systems. If this setting isn’t large enough, you’ll see error 24, “too many open files.”

Skipping all the way to the end, the last section in the configuration file is for client programs such as mysql and mysqladmin, and simply lets them know how to connect to the server. You should set the values for client programs to match those you chose for the server.

Inspecting MySQL Server Status Variables

Sometimes you can use the output from SHOW GLOBAL STATUS as input to your configuration to help customize the settings better for your workload. For the best results, look both at absolute values and at how the values change over time, preferably with several snapshots at peak and off-peak times. You can use the following command to see incremental changes to status variables every 60 seconds:

$ mysqladmin extended-status -ri60

We will frequently refer to changes in status variables over time as we explain various configuration settings. We will usually expect you to be examining the output of a command such as the one we just showed. Other helpful tools that can provide a compact display of status counter changes are Percona Toolkit’s pt-mext or pt-mysql-summary.

Now that we’ve shown you the preliminaries, we’ll take you on a guided tour of some server internals, interleaved with advice on configuration. This will give you the background you’ll need to choose appropriate values for configuration options when we return to the sample configuration file later.

Configuring Memory Usage

Using innodb_dedicated_server will typically use 50% - 75% of your RAM. This leaves you with at least 25% for per-connection memory allocations, operating system overhead and other memory settings. We go over each of these in the following sections, and then we take a more detailed look at the various MySQL caches’ requirements.

Per-Connection Memory Needs

MySQL needs a small amount of memory just to hold a connection (thread) open. It also requires a base amount of memory to execute any given query. You’ll need to set aside enough memory for MySQL to execute queries during peak load times. Otherwise, your queries will be starved for memory, and they will run poorly or fail.

It’s useful to know how much memory MySQL will consume during peak usage, but some usage patterns can unexpectedly consume a lot of memory, which makes this hard to predict. Prepared statements are one example, because you can have many of them open at once. Another example is the InnoDB data dictionary (more about this later).

You don’t need to assume a worst-case scenario when trying to predict peak memory consumption. For example, if you configure MySQL to allow a maximum of 100 connections, it theoretically might be possible to simultaneously run large queries on all 100 connections, but in reality this probably won’t happen. Queries that use many large temporary tables, or complex stored procedures, are the most likely causes of high per-connection memory consumption.

Reserving Memory for the Operating System

Just as with queries, you need to reserve enough memory for the operating system to do its work. This involves running any local monitoring software, configuration management tooling, scheduled jobs, etc. The best indication that the operating system has enough memory is that it’s not actively swapping (paging) virtual memory to disk.

The InnoDB Buffer Pool

The InnoDB buffer pool needs more memory than anything else, as it’s generally the most important variable for performance. The InnoDB buffer pool doesn’t just cache indexes: it also holds row data, the adaptive hash index, the change buffer, locks, and other internal structures. InnoDB also uses the buffer pool to help it delay writes, so it can merge many writes together and perform them sequentially. In short, InnoDB relies heavily on the buffer pool, and you should be sure to allocate enough memory to it. You can use variables from SHOW commands or tools such as innotop to monitor your InnoDB buffer pool’s memory usage.

If you don’t have much data, and you know that your data won’t grow quickly, you don’t need to over allocate memory to the buffer pool. It’s not really beneficial to make it much larger than the size of the tables and indexes that it will hold. There’s nothing wrong with planning ahead for a rapidly growing database, of course, but sometimes we see huge buffer pools with a tiny amount of data. This isn’t necessary.

Large buffer pools come with some challenges, such as long shutdown and warm-up times. If there are a lot of dirty (modified) pages in the buffer pool InnoDB can take a long time to shut down, because it writes the dirty pages to the data files upon shutdown. You can force it to shut down quickly, but then it just has to do more recovery when it restarts, so you can’t actually speed up the shutdown and restart cycle time. If you know in advance when you need to shut down, you can change the innodb_max_dirty_pages_pct variable at runtime to a lower value, wait for the flush thread to clean up the buffer pool, and then shut down once the number of dirty pages becomes small. You can monitor the number of dirty pages by watching the Innodb_buffer_pool_pages_dirty server status variable or using innotop to monitor SHOW INNODB STATUS. You can also use the variable innodb_fast_shutdown to tweak how shutdown occurs.

Lowering the value of the innodb_max_dirty_pages_pct variable doesn’t actually guarantee that InnoDB will keep fewer dirty pages in the buffer pool. Instead, it controls the threshold at which InnoDB stops being “lazy.” InnoDB’s default behavior is to flush dirty pages with a background thread, merging writes together and performing them sequentially for efficiency. This behavior is called “lazy” because it lets InnoDB delay flushing dirty pages in the buffer pool, unless it needs to use the space for some other data. When the percentage of dirty pages exceeds the threshold, InnoDB will flush pages as quickly as it can to try to keep the dirty page count lower. These page cleaner operations have been greatly optimized3 from previous behavior, including being able to configure multiple threads to perform flushing.

When MySQL starts back up again, the buffer pool cache is empty - also referred to as a cold cache. All of the benefit of having rows and pages in memory is now gone. Thankfully, by default, the configuration options innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup work together to warm the server at startup. The load at startup takes time, but it can speed up the performance of a server much faster than waiting for it to naturally populate.

The Thread Cache

The thread cache holds threads that aren’t currently associated with a connection but are ready to serve new connections. When there’s a thread in the cache and a new connection is created, MySQL removes the thread from the cache and gives it to the new connection. When the connection is closed, MySQL places the thread back into the cache, if there’s room. If there isn’t room, MySQL destroys the thread. As long as MySQL has a free thread in the cache it can respond rapidly to connection requests, because it doesn’t have to create a new thread for each connection.

The thread_cache_size variable specifies the number of threads MySQL can keep in the cache. You probably won’t need to change this from the default value of -1, or auto-sized unless your server gets many connection requests. To check whether the thread cache is large enough, watch the Threads_created status variable. We generally try to keep the thread cache large enough that we see fewer than 10 new threads created each second, but it’s often pretty easy to get this number lower than 1 per second.

A good approach is to watch the Threads_connected variable and try to set thread_cache_size large enough to handle the typical fluctuation in your workload. For example, if Threads_connected usually stays between 100 and 120, you can set the cache size to 20. If it stays between 500 and 700, a thread cache of 200 should be large enough. Think of it this way: at 700 connections, there are probably no threads in the cache; at 500 connections, there are 200 cached threads ready to be used if the load increases to 700 again.

Making the thread cache very large is probably not necessary for most uses, but keeping it small doesn’t save much memory, so there’s little benefit in doing so. Each thread that’s in the thread cache or sleeping typically uses around 256 KB of memory. This is not very much compared to the amount of memory a thread can use when a connection is actively processing a query. In general, you should keep your thread cache large enough that Threads_created doesn’t increase very often. If this is a very large number, however (e.g., many thousand threads), you might want to set it lower because some operating systems don’t handle very large numbers of threads well, even when most of them are sleeping.

Configuring MySQL’s I/O Behavior

A few configuration options affect how MySQL synchronizes data to disk and performs recovery. These can affect performance dramatically, because they involve I/O operations. They also represent a trade-off between performance and data safety. In general, it’s expensive to ensure that your data is written to disk immediately and consistently. If you’re willing to risk the danger that a disk write won’t really make it to permanent storage, you can increase concurrency and/or reduce I/O waits, but you’ll have to decide for yourself how much risk you can tolerate.

InnoDB permits you to control not only how it recovers, but also how it opens and flushes its data, which greatly affects recovery and overall performance. InnoDB’s recovery process is automatic and always runs when InnoDB starts, though you can influence what actions it takes. Leaving aside recovery and assuming nothing ever crashes or goes wrong, there’s still a lot to configure for InnoDB. It has a complex chain of buffers and files designed to increase performance and guarantee ACID properties, and each piece of the chain is configurable. Figure 4-1 illustrates these files and buffers.

A few of the most important things to change for normal usage are the InnoDB log file size, how InnoDB flushes its log buffer, and how InnoDB performs I/O.

The InnoDB transaction log

InnoDB uses its log to reduce the cost of committing transactions. Instead of flushing the buffer pool to disk when each transaction commits, it logs the transactions. The changes transactions make to data and indexes often map to random locations in the tablespace, so flushing these changes to disk would require random I/O. InnoDB assumes it’s using conventional disks, where random I/O is much more expensive than sequential I/O because of the time it takes to seek to the correct location on disk and wait for the desired part of the disk to rotate under the head.

Figure 4-1. InnoDB’s buffers and files

InnoDB uses its log to convert this random disk I/O into sequential I/O. Once the log is safely on disk, the transactions are permanent, even though the changes haven’t been written to the data files yet. If something bad happens (such as a power failure), InnoDB can replay the log and recover the committed transactions.

Of course, InnoDB does ultimately have to write the changes to the data files, because the log has a fixed size. It writes to the log in a circular fashion: when it reaches the end of the log, it wraps around to the beginning. It can’t overwrite a log record if the changes contained there haven’t been applied to the data files, because this would erase the only permanent record of the committed transaction.

InnoDB uses a background thread to flush the changes to the data files intelligently. This thread can group writes together and make the data writes sequential, for improved efficiency. In effect, the transaction log converts random data file I/O into mostly sequential log file and data file I/O. Moving flushes into the background makes queries complete more quickly and helps cushion the I/O system from spikes in the query load.

The overall log file size is controlled by innodb_log_file_size and innodb_log_files_in_group, and it’s very important for write performance. If you took our earlier advice, and used innodb_dedicated_server, these are managed for you based on how much memory your system has.

Log buffer

When InnoDB changes any data, it writes a record of the change into its log buffer, which it keeps in memory. InnoDB flushes the buffer to the log files on disk when the buffer gets full, when a transaction commits, or once per second - whichever comes first. Increasing the buffer size, which is 1 MB by default, can help reduce I/O if you have large transactions. The variable that controls the buffer size is called innodb_log_buffer_size.

You usually don’t need to make the buffer very large. The recommended range is 1 to 8 MB, and this usually will be enough unless you write a lot of huge BLOB records. The log entries are very compact compared to InnoDB’s normal data. They are not page-based, so they don’t waste space storing whole pages at a time. InnoDB also makes log entries as short as possible. They are sometimes even stored as the function number and parameters of a C function!

How InnoDB flushes the log buffer

When InnoDB flushes the log buffer to the log files on disk, it locks the buffer with a mutex, flushes it up to the desired point, and then moves any remaining entries to the front of the buffer. It is possible that more than one transaction will be ready to flush its log entries when the mutex is released. InnoDB uses a group commit feature that can commit all of them to the log in a single I/O operation.

The log buffer must be flushed to durable storage to ensure that committed transactions are fully durable. If you care more about performance than durability, you can change innodb_flush_log_at_trx_commit to control where and how often the log buffer is flushed.

Possible settings are as follows:

0
Write the log buffer to the log file and flush the log file every second, but do nothing at transaction commit.
1
Write the log buffer to the log file and flush it to durable storage every time a transaction commits. This is the default (and safest) setting; it guarantees that you won’t lose any committed transactions, unless the disk or operating system “fakes” the flush operation.
2
Write the log buffer to the log file at every commit, but don’t flush it. InnoDB schedules a flush once every second. The most important difference from the 0 setting is that 2 won’t lose any transactions if the MySQL process crashes. If the entire server crashes or loses power, however, you can still lose transactions.

It’s important to know the difference between writing the log buffer to the log file and flushing the log to durable storage. In most operating systems, writing the buffer to the log simply moves the data from InnoDB’s memory buffer to the operating system’s cache, which is also in memory. It doesn’t actually write the data to durable storage. Thus, settings 0 and 2 usually result in at most one second of lost data if there’s a crash or a power outage, because the data might exist only in the operating system’s cache. We say “usually” because InnoDB tries to flush the log file to disk about once per second no matter what, but it is possible to lose more than a second of transactions in some cases, such as when a flush gets stalled.

Sometimes the hard disk controller or operating system fakes a flush by putting the data into yet another cache, such as the hard disk’s own cache. This is faster but very dangerous, because the data might still be lost if the drive loses power. This is even worse than setting innodb_flush_log_at_trx_commit to something other than 1, because it can cause data corruption, not just lost transactions.

Setting innodb_flush_log_at_trx_commit to anything other than 1 can cause you to lose transactions. However, you might find the other settings useful if you don’t care about durability (the D in ACID). Maybe you just want some of InnoDB’s other features, such as clustered indexes, resistance to data corruption, and row-level locking.

The best configuration for high-performance transactional needs is to leave innodb_flush_log_at_trx_commit set to 1 and place the log files on a RAID volume with a battery-backed write cache and solid state drives (SSDs). This is both safe and very fast. In fact, we dare say that any production database server that’s expected to handle a serious workload needs to have this kind of hardware.

How InnoDB opens and flushes log and data files

The innodb_flush_method option lets you configure how InnoDB actually interacts with the filesystem. Despite its name, it can affect how InnoDB reads data, not just how it writes it.

Warning
Changing how InnoDB performs I/O operations can impact performance greatly, so be sure you understand what you’re doing before you change anything!

This is a slightly confusing option, because it affects both the log files and the data files, and it sometimes does different things to each kind of file. It would be nice to have one configuration option for the logs and another for the data files, but they’re combined.

If you use a Unix-like operating system and your RAID controller has a battery-backed write cache, we recommend that you use O_DIRECT. If not, either the default or O_DIRECT will probably be the best choice, depending on your application. If you opted to use innodb_dedicated_server, as we mentioned earlier, this option is set automatically for you.

The InnoDB tablespace

InnoDB keeps its data in a tablespace, which is essentially a virtual filesystem spanning one or many files on disk. InnoDB uses the tablespace for many purposes, not just for storing tables and indexes. It keeps its undo log (old row versions), change buffer, doublewrite buffer, and other internal structures in the tablespace.

Configuring the tablespace

You specify the tablespace files with the innodb_data_file_path configuration option. The files are all contained in the directory given by innodb_data_home_dir. Here’s an example:

innodb_data_home_dir = /var/lib/mysql/
innodb_data_file_path = ibdata1:1G;ibdata2:1G;ibdata3:1G

That creates a 3 GB tablespace in three files. Sometimes people wonder whether they can use multiple files to spread load across drives, like this:

innodb_data_file_path = /disk1/ibdata1:1G;/disk2/ibdata2:1G;...

While that does indeed place the files in different directories, which represent different drives in this example, InnoDB concatenates the files end-to-end. Thus, you usually don’t gain much this way. InnoDB will fill the first file, then the second when the first is full, and so on; the load isn’t really spread in the fashion you need for higher performance. A RAID controller is a smarter way to spread load.

To allow the tablespace to grow if it runs out of space, you can make the last file autoextend as follows:

...ibdata3:1G:autoextend

The default behavior is to create a single 10 MB autoextending file. If you make the file autoextend, it’s a good idea to place an upper limit on the tablespace’s size to keep it from growing very large, because once it grows, it doesn’t shrink. For example, the following example limits the autoextending file to 2 GB:

...ibdata3:1G:autoextend:max:2G

Managing a single tablespace can be a hassle, especially if it autoextends and you want to reclaim the space (for this reason, we recommend disabling the autoextend feature, or at least setting a reasonable cap on the space). The only way to reclaim space is to dump your data, shut down MySQL, delete all the files, change the configuration, restart, let InnoDB create new empty files, and restore your data. InnoDB is completely unforgiving about its tablespace - you cannot simply remove files or change their sizes. It will refuse to start if you corrupt its tablespace. It is likewise very strict about its log files. If you’re used to casually moving files around as you could do with MyISAM, take heed!

The innodb_file_per_table option lets you configure InnoDB to use one file per table. It stores the data in the database directory as tablename.ibd files. This makes it easier to reclaim space when you drop a table, and it can be useful for spreading tables across multiple disks. However, placing the data in multiple files can actually result in more wasted space overall, because it trades internal fragmentation in the single InnoDB tablespace for wasted space in the .ibd files. This is more of an issue for very small tables, because InnoDB’s page size is 16 KB. Even if your table has only 1 KB of data, it will still require at least 16 KB on disk.

Even if you enable the innodb_file_per_table option, you’ll still need the main tablespace for the undo logs and other system data. It will be smaller if you’re not storing all the data in it, but it’s still a good idea to disable autoextend, because you can’t shrink the file without reloading all your data.

Some people like to use innodb_file_per_table just because of the extra manageability and visibility it gives you. For example, it’s much faster to find a table’s size by examining a single file than it is to use SHOW TABLE STATUS, which has to perform more complex work to determine how many pages are allocated to a table.

There is a dark side to innodb_file_per_table: slow DROP TABLE performance. This can be severe enough to cause a noticeable server-wide stall, for two reasons:

  • Dropping the table unlinks (deletes) the file at the filesystem level, which can be very slow on some filesystems (ext3, we’re looking at you). You can shorten the duration of this with tricks on the filesystem: link the .ibd file to a zero-sized file, then delete the file manually, instead of waiting for MySQL to do it.
  • When you enable this option, each table gets its own tablespace inside InnoDB. It turns out that removing the tablespace actually requires InnoDB to lock and scan the buffer pool while it looks for pages belonging to this tablespace, which is very slow on a server with a large buffer pool. This is improved if you break the buffer pool into many parts using innodb_buffer_pool_instances.

What’s the final recommendation? We suggest that you use innodb_file_per_table and cap the size of your shared tablespace to make your life easier. If you run into any circumstances that make this painful, as noted above, consider one of the fixes we suggested.

Old row versions and the tablespace

InnoDB’s tablespace can grow very large in a write-heavy environment. If transactions stay open for a long time (even if they’re not doing any work) and they’re using the default REPEATABLE READ transaction isolation level, InnoDB won’t be able to remove old row versions, because the uncommitted transactions will still need to be able to see them. InnoDB stores the old versions in the tablespace, so it continues to grow as more data is updated. The purge process is multi-threaded but may need to be tuned for workloads if you experience problems with purge lag (innodb_purge_threads and innodb_purge_batch_size).

The output of SHOW INNODB STATUS can help you pinpoint the problem. Look at the history list length in the TRANSACTIONS section; it shows the size of the undo log.

------------
TRANSACTIONS
------------
Trx id counter 1081043769321
Purge done for trx's n:o < 1081041974531 undo n:o < 0 state: running but idle
History list length 697068

If you have a large undo log and your tablespace is growing because of it, you can force MySQL to slow down enough for InnoDB’s purge thread to keep up. This might not sound attractive, but there’s no alternative. Otherwise, InnoDB will keep writing data and filling up your disk until the disk runs out of space or the tablespace reaches the limits you’ve defined.

To throttle the writes, set the innodb_max_purge_lag variable to a value other than 0. This value indicates the maximum number of transactions that can be waiting to be purged before InnoDB starts to delay further queries that update data. You’ll have to know your workload to decide on a good value. As an example, if your average transaction affects 1 KB of rows and you can tolerate 100 MB of unpurged rows in your tablespace, you could set the value to 100,000.

Bear in mind that unpurged row versions impact all queries, because they effectively make your tables and indexes larger. If the purge thread simply can’t keep up, performance can decrease dramatically. Setting the innodb_max_purge_lag variable will slow down performance too, but it’s the lesser of the two evils.4

Other I/O configuration options

The sync_binlog option controls how MySQL flushes the binary log to disk. Its default value is 1, which means MySQL will perform flushing and keep binary logs durable and safe. This is the recommended setting and we caution against you for setting this to any other value.

If you don’t keep sync_binlog set at 1, it’s likely that a crash will cause your binary log to be out of sync with your transactional data. This can easily break replication and make recovery impossible, especially if using global transaction IDs (more on this in REPLICATION). The safety provided by leaving this at 1 far outweighs the IO performance penalty that is incurred.

We covered RAID in more depth in BPHARDWARE, but it’s worth repeating here that good-quality RAID controllers, with battery-backed write caches set to use the write-back policy, can handle thousands of writes per second and still give you durable storage. The data gets written to a fast cache with a battery, so it will survive even if the system loses power. When the power comes back, the RAID controller will write the data from the cache to the disk before making the disk available for use. Thus, a good RAID controller with a large enough battery-backed write cache can improve performance dramatically and is a very good investment. Of course, solid-state storage is also the recommended solution at this point and that also dramatically improves IO performance.

Configuring MySQL Concurrency

When you’re running MySQL in a high-concurrency workload, you might run into bottlenecks you wouldn’t otherwise experience. This section explains how to detect these problems when they happen, and how to get the best performance possible under these workloads.

If you have problems with InnoDB concurrency and you are not running at least MySQL 5.7, the solution is usually to upgrade the server. Older versions still held a lot of high concurrency scalability challenges. Everything queued on global mutexes such as the buffer pool mutex, and the server practically ground to a halt. If you upgrade to one of the newer versions of MySQL, you don’t need to limit concurrency in most cases.

If you find yourself hitting this bottleneck, your best option is to shard your data. If sharding is not a viable path forward, you may need to limit concurrency. InnoDB has its own “thread scheduler” that controls how threads enter its kernel to access data, and what they can do once they’re inside the kernel. The most basic way to limit concurrency is with the innodb_thread_concurrency variable, which limits how many threads can be in the kernel at once. A value of 0 means there is no limit on the number of threads. If you are having InnoDB concurrency problems in older MySQL versions, this variable is the most important one to configure.

MySQL’s online documentation5 provides the best guide or configuration here. You will have to experiment to find the best value for your system but our recommendation is to start with setting innodb_thread_concurrency to the same number of CPU cores you have available and then begin tuning up or down as needed.

If more than the allowed number of threads are already in the kernel, a thread can’t enter the kernel. InnoDB uses a two-phase process to try to let threads enter as efficiently as possible. The two-phase policy reduces the overhead of context switches caused by the operating system scheduler. The thread first sleeps for innodb_thread_sleep_delay microseconds, and then tries again. If it still can’t enter, it goes into a queue of waiting threads and yields to the operating system.

The default sleep time in the first phase is 10,000 microseconds. Changing this value can help in high-concurrency environments, when the CPU is underused with a lot of threads in the “sleeping before entering queue” status. The default value can also be much too large if you have a lot of small queries, because it adds to query latency.

Once a thread is inside the kernel, it has a certain number of “tickets” that let it back into the kernel for “free,” without any concurrency checks. This limits how much work it can do before it has to get back in line with other waiting threads. The innodb_concurrency_tickets option controls the number of tickets. It rarely needs to be changed unless you have a lot of extremely long-running queries. Tickets are granted per-query, not per-transaction. Once a query finishes, its unused tickets are discarded.

In addition to the bottlenecks in the buffer pool and other structures, there’s another concurrency bottleneck at the commit stage, which is largely I/O-bound because of flush operations. The innodb_commit_concurrency variable governs how many threads can commit at the same time. Configuring this option might help if there’s a lot of thread thrashing even when innodb_thread_concurrency is set to a low value.

Safety Settings

After your basic configuration settings are in place, you might wish to enable a number of settings that make the server safer and more reliable. Some of them influence performance, because safety and reliability are often more costly to guarantee. Some are just sensible, however: they prevent silly mistakes such as inserting nonsensical data into the server. And some don’t make a difference in day-to-day operation, but prevent bad things from happening in edge cases.

Let’s look at a collection of useful options for general server behavior first:

expire_logs_days
If you enable binary logging, you should enable this option, which causes the server to purge old binary logs after the specified number of days. If you don’t enable it, you will eventually run the server out of disk space, and it will freeze or crash. We suggest setting this option large enough that you can recover from at least two backups ago (in case the most recent backup fails). Even if you take backups every day, still leave yourself at least 7 to 14 days’ worth of binary logs. Our experience shows that you’ll be grateful for a week or two of binary logs when you have some unusual problem, such as rebuilding a replica and then trying to get it caught up again with the source. You want to keep enough binary logs around to give yourself some breathing room for operations such as these.
max_connect_errors
If something goes wrong with your networking for a moment, there is an application or configuration error, or there is another problem such as privileges that prevent connections from completing successfully for a brief period of time, clients can get blacklisted and will be unable to connect again until you flush the host cache. The default setting for this option is so small that this problem can happen too easily. You might want to increase it, and in fact, if you know that the server is adequately secured against brute-force attacks, you can just make it very large to effectively disable host blacklisting.
max_connections
This setting acts like an emergency brake to keep your server from being overwhelmed by a surge of connections from the application. If the application misbehaves, or the server encounters a problem such as a stall, a lot of new connections can be opened. But opening a connection does no good if it can’t execute queries, so being denied with a “too many connections” error is a way to fail fast and fail cheaply.
Set max_connections high enough to accommodate the usual load that you think you’ll experience, as well as a safety margin to permit logging in and administering the server. For example, if you think you’ll have 300 or so connections in normal operations, you might set this to 500 or so. If you don’t know how many connections you’ll get, 500 is not an unreasonable starting point anyway. The default is 151, but that’s not enough for a lot of applications.
Beware also of surprises that might make you hit the limit of connections. For example, if you restart an application server, it might not close its connections cleanly, and MySQL might not realize they’ve been closed. When the application server comes back up and tries to open connections to the database, it might be refused due to the dead connections that haven’t timed out yet. This can also come into play if you do not use persistent connections and your application does not disconnect gracefully. The server will keep a connection around for the number of seconds configured with wait_timeout.
Watch the Max_used_connections status variable over time. It is a high-water mark that shows you if the server has had a spike in connections at some point. If it reaches max_connections, chances are a client has been denied at least once.
skip_name_resolve
This setting disables another networking- and authentication-related trap: DNS lookups. DNS is one of the weak points in MySQL’s connection process. When you connect to the server, by default it tries to determine the hostname from which you’re connecting and uses that as part of the authentication credentials. (That is, your credentials are your username, hostname, and password - not just your username and password.) But to verify your hostname, the server needs to perform both a reverse and a forward DNS lookup. This is all fine until DNS starts to have problems, which is pretty much a certainty at some point in time. When that happens, everything piles up and eventually the connection times out. To prevent this, we strongly recommend that you set this option, which disables DNS lookups during authentication. However, if you do this you will need to convert all of your hostname-based grants to use IP addresses, wildcards, or the special hostname “localhost,” because hostname-based accounts will be disabled.
sql_mode
This setting can accept a variety of options that modify server behavior. We don’t recommend changing these just for the fun of it; it’s better to let MySQL be MySQL in most ways and not try to make it behave like other database servers. (Many client and GUI tools expect MySQL to have its own flavor of SQL, for example, so if you change it to speak more ANSI-compliant SQL some things might break.) However, several of the settings are very useful, and some might be worth considering in your specific cases. In the past, MySQL was generally very loose about sql_mode, but it is much more strict in later versions.
However, be aware that it might not be a good idea to change these settings for existing applications, because doing so might make the server incompatible with the application’s expectations. It’s pretty common for people to unwittingly write queries that refer to columns not in the GROUP BY clause or use aggregate functions, for example, so if you want to enable the ONLY_FULL_GROUP_BY option it’s a good idea to do it in a development or staging server first, and only deploy it in production once you’re sure everything is working.
sysdate_is_now
This is another setting that might be backward-incompatible with applications’ expectations. But if you don’t explicitly desire the SYSDATE() function to have non deterministic behavior, which can break replication and make point-in-time recovery from backups unreliable, you might want to enable this option and make its behavior deterministic.
read_only and super_read_only
This option prevents unprivileged users from making changes on replicas, which should be receiving changes only via replication, not from the application. We strongly recommend setting replicas to read-only mode.

There is a more restrictive read-only option, super_read_only, which prevents even users with the SUPER privilege from being able to write data. With this enabled, the only thing that can write changes to your database is replication. We also strongly recommend enabling super_read_only. It will prevent you from accidentally using an administrator account to write data to your replica, putting it out of sync.

Advanced InnoDB Settings

Some of these InnoDB options are quite important for server performance, and there are also a couple of safety options:

innodb_autoinc_lock_mode
This option controls how InnoDB generates auto incrementing primary key values, which can be a bottleneck in some cases, such as high-concurrency inserts. If you have many transactions waiting on the auto increment lock (you can see this in SHOW ENGINE INNODB STATUS), you should investigate this setting. We won’t repeat the manual’s explanation of the options and their behaviors.
innodb_buffer_pool_instances
This setting divides the buffer pool into multiple segments in MySQL 5.5 and newer, and is probably one of the most important ways to improve MySQL’s scalability on multicore machines with a highly concurrent workload. Multiple buffer pools partition the workload so that some of the global mutexes are not such hot contention points.
innodb_io_capacity
InnoDB used to be hardcoded to assume that it ran on a single hard disk capable of 100 I/O operations per second. This was a bad default. Now you can inform InnoDB how much I/O capacity is available to it. InnoDB sometimes needs this set quite high (tens of thousands on extremely fast storage such as PCIe flash devices) to flush dirty pages in a steady fashion, for reasons that are quite complex to explain6.
innodb_read_io_threads and innodb_write_io_threads
These options control how many background threads are available for I/O operations. The default in recent versions of MySQL is to have four read threads and four write threads, which is enough for a lot of servers, especially with the native asynchronous I/O available since MySQL 5.5. If you have many hard drives and a high-concurrency workload, and you see that the threads are having a hard time keeping up, you can increase the number of threads, or you can simply set them to the number of physical spindles you have for I/O (even if they’re behind a RAID controller).
innodb_strict_mode
This setting makes InnoDB throw errors instead of warnings for some conditions, especially invalid or possibly dangerous CREATE TABLE options. If you enable this option, be certain to check all of your CREATE TABLE options, because it might not let you create some tables that used to be fine. Sometimes it’s a bit pessimistic and overly restrictive. You wouldn’t want to find this out when you were trying to restore a backup.
innodb_old_blocks_time
InnoDB has a two-part buffer pool least recently used (LRU) list, which is designed to prevent ad hoc queries from evicting pages that are used many times over the long term. A one-off query such as those issued by mysqldump will typically bring a page into the buffer pool LRU list, read the rows from it, and move on to the next page. In theory, the two-part LRU list will prevent this page from displacing pages that will be needed for a long time by placing it into the “young” sublist and only moving it to the “old” sublist after it has been accessed multiple times. But InnoDB is not configured to prevent this by default, because the page has multiple rows, and thus the multiple accesses to read rows from the page will cause it to be moved to the “old” sublist immediately, placing pressure on pages that need a long lifetime. This variable specifies the number of milliseconds that must elapse before a page can move from the “young” part of the LRU list to the “old” part. It’s set to 0 by default, and setting it to a small value such as 1000 (one second) has proven very effective in our benchmarks.

Summary

After you’ve worked through this chapter, you should have a server configuration that is much better than the defaults. Your server should be fast and stable, and you should not need to tweak the configuration unless you run into an unusual circumstance.

To review, we suggest that you begin with our sample configuration file, set the basic options for your server and workload and add safety options as desired. That’s really all you need to do.

If you’re running a dedicated database server, then the best option you can set is innodb_dedicated_server which handles 90% of your performance configuration. If you are unable to use this option, then the most important options are these two:

  • innodb_buffer_pool_size
  • innodb_log_file_size

Congratulations - you just solved the vast majority of real-world configuration problems we’ve seen!

We’ve also made a lot of suggestions about what not to do. The most important of these are not to “tune” your server; not to use ratios, formulas, or “tuning scripts” as a basis for setting the configuration variables.

1 If you are not convinced that “tuning by ratio” is bad, please read Optimizing Oracle Performance by Cary Millsap (O’Reilly). He even devotes an appendix to the topic, with a tool that can artificially generate any cache hit ratio you wish, no matter how badly your system is performing! Of course, it’s all for the purpose of illustrating how useless the ratio is.

2 Please note that newer versions of MySQL remove, deprecate, and change some options; check the docs for details.

3 https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-flushing.html

4 Note that the way this ought to be implemented is a topic of some debate; see MySQL bug 60776 for the details.

5 https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_thread_concurrency

6 Follow-up reading:
https://www.percona.com/blog/2019/12/18/give-love-to-your-ssds-reduce-innodb_io_capacity_max/
https://www.percona.com/blog/2020/01/22/innodb-flushing-in-action-for-percona-server-for-mysql/
https://www.percona.com/blog/2020/05/14/tuning-mysql-innodb-flushing-for-a-write-intensive-workload/

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.232.169.110