The query cache explained

The query cache stores the queries that have been executed against the server and their results. If a user issues a query that is in the cache, the results are returned immediately. Of course, the server still checks whether the account has the right to execute the query.

How the query cache affects the performance of a server depends strictly on the workload; in many cases, it can even cause a performance loss. It is enabled by default, but the DBA should carefully benchmark the applications with the query cache enabled and disabled to determine whether it is beneficial for the server, and the tests should be repeated periodically if the applications change. It is also possible to activate the cache on demand: the query cache is normally disabled, except for the queries that explicitly ask to be cached. This can be very useful if the query cache hurts performance overall, but a limited set of queries would be boosted greatly by it.

The query cache is protected by a mutex. Mutex stands for mutual exclusion; it is a type of lock used by MariaDB to make sure that two threads cannot access the same resource (usually a cache) at the same time. This lock can slow down the database if the concurrency is high or the cache is too big. Also, remember that memory assigned to the query cache reduces the memory available for data or key caches (such as the InnoDB buffer pool), which are generally more useful.
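For example, the on-demand mode can be set up as in the following sketch; the size is only illustrative and should be benchmarked, and, depending on the server version, the cache may need a non-zero size configured at startup before it can be enabled at runtime:

```sql
-- DEMAND mode: only queries that explicitly specify SQL_CACHE
-- are considered for caching.
SET GLOBAL query_cache_type = DEMAND;
-- The cache must have a non-zero size to hold anything.
SET GLOBAL query_cache_size = 16 * 1024 * 1024;  -- 16 MB
```

The same settings can be made permanent in the configuration file with query_cache_type=2 and query_cache_size=16M.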

Any change to a table's structure or data invalidates the cached queries that involve that table. Note that even the smallest change to a table invalidates all cached queries that read it, no matter how many cached result sets are affected. If an invalidated query can be executed by several connections at the same time, multiple connections might re-execute the query and try to recache it simultaneously. This problem is called a miss storm or a cache stampede.

If the data is changed often, the query cache will hardly improve the performance. If the data is seldom changed (for example, only by night or, even better, once a week), it may be a good idea to execute some queries immediately after the changes to cache them.
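For example, if the data is reloaded by a nightly batch job, a hypothetical warm-up script run right after the load could execute the most important queries, so that the first users of the day find them already cached (the table and the query here are illustrative):

```sql
-- Executed once after the nightly load; subsequent identical
-- queries are served from the query cache.
SELECT SQL_CACHE product, SUM(amount) AS total
FROM sales
GROUP BY product;
```

The SQL_CACHE keyword is only needed if the cache runs in on-demand mode.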

In a Galera cluster and in servers with the OQGRAPH storage engine enabled, the query cache must be disabled. It should also be avoided in servers containing SPIDER tables, but it can be used on the remote servers. Galera is explained in Chapter 12, MariaDB Galera Cluster, and the SPIDER storage engine is explained in Chapter 11, Data Sharding.

A query sent by the user matches a query in the cache only if their texts are identical. Case differences, whitespace differences, and (by default) comments make two queries different, as in the following pairs of non-matching queries:

SELECT * FROM t;
select * from t;

SELECT *   FROM t;
SELECT * FROM t;

SELECT * FROM t;
SELECT * FROM t/* */;

Also, queries only match if they use the same default database, default character set, SQL_MODE, and protocol version. A prepared statement can be cached if it is sent via the API (not via the PREPARE SQL statement). Prepared statements only match other prepared statements, and they must use the same parameters. Subqueries are not stored in the query cache, but they have a dedicated cache of their own, the subquery cache. Outer queries containing subqueries can be cached.

Queries cannot be cached in the following scenarios:

  • If they generate warnings
  • If they are executed within a stored program
  • If they use temporary tables or user-defined variables
  • If they contain non-deterministic functions, stored functions, or UDFs
  • If they use system tables
  • If they have one of the following clauses: INTO OUTFILE, INTO DUMPFILE, LOCK IN SHARE MODE, and FOR UPDATE
  • If they do not use tables (for example, SELECT VERSION();)
  • If they reference a table for which the account has column-level privileges
  • If they involve tables that use certain special storage engines, such as SPIDER
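For example, a query containing a non-deterministic function is never cached, which can be verified by watching the status counters (a sketch, assuming a table t):

```sql
SELECT id, NOW() FROM t;  -- not cacheable: NOW() is non-deterministic;
                          -- Qcache_not_cached is incremented
SELECT id FROM t;         -- cacheable: Qcache_inserts is incremented
```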

If the query cache is enabled, it is still possible to prevent a query from being cached using the following syntax:

SELECT SQL_NO_CACHE …

If the query cache is enabled on demand only, the following syntax can be used to cache a query:

SELECT SQL_CACHE …

The query cache contents are modified by the server and, after some time, the memory can get fragmented. To defragment it, the following statement can be used:

FLUSH QUERY CACHE;

If, for some reason, the cache has been populated with queries that are not useful anymore, it can be emptied with the following statement:

RESET QUERY CACHE;

Configuring the query cache

If we use the query cache, it is important to configure it properly. It is also a good idea to repeat the tests regularly to be sure that the server's performance still benefits from the query cache. The main configuration variables are discussed as follows:

  • query_cache_type: This enables or disables the query cache. The value 0 or OFF disables it; 1 or ON enables it; 2 or DEMAND only enables it on demand as explained previously. If this variable is set as a startup option, numeric values must be used (for example, --query-cache-type=2).
  • query_cache_size: This specifies the size of the query cache. Its default size is 1 MB. The value 0 disables the cache but, in this case, query_cache_type should also be specified to prevent the server from checking whether the queries are cacheable and protecting the cache with a mutex. Very small values are not allowed. This value will be rounded off to the nearest multiple of 1,024.
  • query_cache_alloc_block_size: This is the size of the memory blocks used in the cache. This value can be modified at any moment. A high value may reduce fragmentation, making the cache faster, but may also waste more memory.
  • query_cache_limit: This determines the maximum size of the results of the cached queries. If a query returns results that are too big, it cannot be cached. This variable is very important, because it prevents a small number of big queries from consuming all the cache.
  • query_cache_strip_comments: This variable, if set to ON, causes the comments to be stripped from the queries before caching. In this way, identical queries with different comments will match. Note that this is not necessary if the client or the API already strips the comments; for example, the mysql client does so unless it is started with the --comments option.
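Putting these variables together, a possible starting configuration in the server's configuration file might look like the following sketch; the values are only illustrative and must be validated with benchmarks:

```ini
[mysqld]
# Illustrative starting values; benchmark before adopting them.
query_cache_type           = 1      # 1 = ON, 2 = DEMAND, 0 = OFF
query_cache_size           = 32M    # rounded to a multiple of 1024
query_cache_limit          = 512K   # do not cache larger result sets
query_cache_strip_comments = ON     # let differently commented queries match
```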

Information on the status of the query cache

To get generic information about the status of the query cache, the following query can be used:

MariaDB [(none)]> SHOW STATUS LIKE 'qcache%';
+-------------------------+--------+
| Variable_name           | Value  |
+-------------------------+--------+
| Qcache_free_blocks      | 1      |
| Qcache_free_memory      | 974512 |
| Qcache_hits             | 1      |
| Qcache_inserts          | 7      |
| Qcache_lowmem_prunes    | 0      |
| Qcache_not_cached       | 4      |
| Qcache_queries_in_cache | 7      |
| Qcache_total_blocks     | 16     |
+-------------------------+--------+
8 rows in set (0.00 sec)

The Qcache_free_blocks and Qcache_free_memory status variables represent the free memory, measured in memory blocks and in bytes, respectively. If the query cache is heavily used and the number of free blocks is high, the cache is probably fragmented. In this case, FLUSH QUERY CACHE can be used to defragment it.

The value of Qcache_total_blocks is the total number of memory blocks, both used and unused. The query cache uses variable-sized blocks.

The value of Qcache_hits is the number of queries for which a match was found in the cache. Check that this number is high and always keeps increasing.

The value of Qcache_inserts is the number of entries added to the cache. The value of Qcache_lowmem_prunes is the number of entries deleted from the cache to make room for new ones. If these values are high relative to Qcache_hits, the query cache is not being used efficiently: entries are added or evicted faster than they are reused.

The value of Qcache_not_cached is the number of queries that could not be cached.

The value of Qcache_queries_in_cache is the number of the currently cached queries.
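The hits alone do not say how often the cache actually serves traffic. A common approximation of the hit ratio compares Qcache_hits to the number of SELECT statements executed; the Com_select counter is not incremented for queries answered from the cache. A sketch using the information_schema.GLOBAL_STATUS table:

```sql
-- Fraction of SELECT statements answered from the query cache.
SELECT h.VARIABLE_VALUE / (h.VARIABLE_VALUE + s.VARIABLE_VALUE)
       AS query_cache_hit_ratio
FROM information_schema.GLOBAL_STATUS h
JOIN information_schema.GLOBAL_STATUS s
  ON h.VARIABLE_NAME = 'QCACHE_HITS'
 AND s.VARIABLE_NAME = 'COM_SELECT';
```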

MariaDB also provides the query_cache_info plugin. The plugin is not enabled by default and must be enabled with the following statement:

INSTALL SONAME 'query_cache_info';

Once installed, the QUERY_CACHE_INFO table is added to the information_schema database. It stores information about the individual entries in the query cache. Its columns are:

  • STATEMENT_SCHEMA: The database that was selected when the statement was executed
  • STATEMENT_TEXT: The cached query
  • RESULT_BLOCKS_COUNT: The number of memory blocks used to store the results
  • RESULT_BLOCKS_SIZE: The memory block size
  • RESULT_BLOCKS_SIZE_USED: The amount of memory used to store results (the blocks' total size is usually bigger than used memory)
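For example, the table can be queried to find which cached entries consume the most memory (a sketch):

```sql
-- The ten cached queries with the biggest result sets.
SELECT STATEMENT_SCHEMA,
       LEFT(STATEMENT_TEXT, 50) AS query_start,
       RESULT_BLOCKS_SIZE_USED
FROM information_schema.QUERY_CACHE_INFO
ORDER BY RESULT_BLOCKS_SIZE_USED DESC
LIMIT 10;
```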

Now, we'll look at an example. First, we will empty the cache, execute SHOW STATUS, and query the QUERY_CACHE_INFO table while the query cache is empty. Then, we will execute one simple query and issue the same commands again to see the differences. The session is as follows:

MariaDB [test]> RESET QUERY CACHE;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> SHOW STATUS LIKE 'Qcache%';
+-------------------------+--------+
| Variable_name           | Value  |
+-------------------------+--------+
| Qcache_free_blocks      | 1      |
| Qcache_free_memory      | 982192 |
| Qcache_hits             | 1      |
| Qcache_inserts          | 12     |
| Qcache_lowmem_prunes    | 0      |
| Qcache_not_cached       | 17     |
| Qcache_queries_in_cache | 0      |
| Qcache_total_blocks     | 1      |
+-------------------------+--------+
8 rows in set (0.00 sec)
MariaDB [test]> SELECT * FROM information_schema.QUERY_CACHE_INFO\G
Empty set (0.00 sec)
MariaDB [test]> SELECT COUNT(*) FROM test.t;
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)
MariaDB [test]> SHOW STATUS LIKE 'Qcache%';
+-------------------------+--------+
| Variable_name           | Value  |
+-------------------------+--------+
| Qcache_free_blocks      | 1      |
| Qcache_free_memory      | 980656 |
| Qcache_hits             | 1      |
| Qcache_inserts          | 13     |
| Qcache_lowmem_prunes    | 0      |
| Qcache_not_cached       | 18     |
| Qcache_queries_in_cache | 1      |
| Qcache_total_blocks     | 4      |
+-------------------------+--------+
8 rows in set (0.00 sec)
MariaDB [test]> SELECT * FROM information_schema.QUERY_CACHE_INFO\G
*************************** 1. row ***************************
       STATEMENT_SCHEMA: test
         STATEMENT_TEXT: SELECT COUNT(*) FROM test.t
    RESULT_BLOCKS_COUNT: 1
     RESULT_BLOCKS_SIZE: 512
RESULT_BLOCKS_SIZE_USED: 127
1 row in set (0.00 sec)

Explaining the subquery cache

As mentioned before, subqueries are cached in a separate cache called the subquery cache. It is useful when the same subqueries occur within different outer queries. The subquery cache was introduced in MariaDB 5.2 and is enabled by default. To disable it, the following statement can be used:

SET @@global.optimizer_switch='subquery_cache=off';

Two status variables provide information about the efficiency of the subquery cache. The Subquery_cache_hit variable is the number of subqueries for which a match was found in the cache. The Subquery_cache_miss variable is the number of subqueries for which no match was found. The hit rate is defined as follows:

Subquery_cache_hit / (Subquery_cache_hit + Subquery_cache_miss)

For each subquery that needs to be cached, a temporary MEMORY table is created. This table stores the results of the subquery and its parameters. As with the internal temporary tables created to order or group query results (see Chapter 3, Optimizing Queries), the subquery cache tables are affected by the tmp_table_size and max_heap_table_size system variables. When one of these limits is exceeded, the behavior of the cache depends on the hit rate. If the hit rate is lower than 0.2, the cache is disabled for that subquery. If the hit rate is between 0.2 and 0.7, the table is emptied but its structure is preserved, so that it can be refilled later. If the hit rate is 0.7 or higher, the table is written to disk.
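The two counters can be inspected with SHOW STATUS LIKE 'Subquery_cache%'; the hit rate defined above can also be computed directly, for example with the following sketch, which reads the same counters from information_schema.GLOBAL_STATUS:

```sql
-- Hit rate of the subquery cache, as defined above.
SELECT h.VARIABLE_VALUE / (h.VARIABLE_VALUE + m.VARIABLE_VALUE)
       AS subquery_cache_hit_rate
FROM information_schema.GLOBAL_STATUS h
JOIN information_schema.GLOBAL_STATUS m
  ON h.VARIABLE_NAME = 'SUBQUERY_CACHE_HIT'
 AND m.VARIABLE_NAME = 'SUBQUERY_CACHE_MISS';
```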

Alternative query caching methods

The problem with the query cache is that frequent changes to the tables invalidate the cached queries, making the performance hard to predict and causing cache stampedes. For such reasons, depending on the workload, we may want to implement different query caching methods.

Sometimes very expensive queries generate small results. A typical example is an aggregation (such as a count, an average, or a geometric mean) of values in a big table. Such queries usually generate statistical data and often do not need to be perfectly up to date. For example, if a query aggregates data by month (like the sales that happened last month), it is not important for the user that the results include data generated today (or even this week, or this month). In such cases, the workload may benefit from summary tables. A summary table is a table that is periodically emptied and repopulated with the results of a query. The query that populates the table might be expensive, but the queries that read the summary table are much faster.
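As a sketch, assuming a hypothetical sales table, a monthly summary table could be refreshed by a script or a scheduled event along these lines:

```sql
-- Hypothetical summary table: rebuilt periodically, queried cheaply.
CREATE TABLE IF NOT EXISTS sales_by_month (
    month CHAR(7)        NOT NULL PRIMARY KEY,  -- e.g. '2014-05'
    total DECIMAL(15, 2) NOT NULL
);

-- Periodic refresh (for example, from a cron job or an EVENT):
DELETE FROM sales_by_month;
INSERT INTO sales_by_month (month, total)
SELECT DATE_FORMAT(sale_date, '%Y-%m'), SUM(amount)
FROM sales
GROUP BY DATE_FORMAT(sale_date, '%Y-%m');
```

Queries that read sales_by_month now scan a handful of rows instead of aggregating the whole sales table.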

It may be sufficient to aggregate data into a single summary table, with several queries retrieving a subset of its rows. Or, it is possible to store each different result-set in a different table. Or again, a hybrid solution could be the best choice.

In many cases, queries generate HTML reports or XML documents. Some of these reports are not expected to change every hour or every day. Or, if they do, the changes may be irrelevant. If a report is expected to remain the same for some time or is allowed to do so, the whole report could be cached in a table. This may greatly speed up the applications, particularly if updating the report involves expensive queries.

MEMORY tables are usually the best choice in both cases. Of course, their data is lost if the server crashes, but the tables can be repopulated, because the non-aggregated data is stored in on-disk tables. Another option to store such summaries or reports is a key-value store that is optimized for this kind of request; Memcached is the most common choice. One of its advantages is the ability to set a timeout for each stored value. This allows us to store stale data that is not invalidated when the underlying data changes, but automatically disappears after a given amount of time. However, when data expires, the cache stampede problem can still occur. Thus, for many workloads, this solution may not solve the main problem that affects the query cache.
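A report cache along these lines could be a MEMORY table keyed by report name, with a timestamp that the application checks to decide whether the stored document is still fresh (all names here are illustrative):

```sql
-- MEMORY tables do not support TEXT or BLOB columns, so the
-- report body is stored in a generously sized VARCHAR column.
CREATE TABLE report_cache (
    report_name  VARCHAR(64)    NOT NULL PRIMARY KEY,
    generated_at DATETIME       NOT NULL,
    body         VARCHAR(16000) NOT NULL
) ENGINE = MEMORY;

-- The application serves the cached body only if it is recent enough;
-- otherwise it regenerates the report and replaces the row.
SELECT body FROM report_cache
WHERE report_name = 'monthly_sales_html'
  AND generated_at > NOW() - INTERVAL 1 DAY;
```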
