Introducing indexes

Before discussing the EXPLAIN statement and how the MariaDB optimizer chooses an execution plan, it is important to understand how MariaDB uses indexes.

An index can be defined on one or more columns and their order is relevant. An index that involves string columns can be defined on their prefixes (the leftmost part of the data). For the TEXT and BLOB columns, the index is mandatory.

Tip

Rarely can the use of an index prefix speedup queries. However, sometimes we may want to reduce the disk space occupied by indexes. If only the leftmost characters of a string column are used in WHERE clauses, we can choose to use a partial index on it; for example, this can be feasible if a column contains codes, where each character or group of characters has a special meaning. However, we will probably only save a considerable amount of space if we have a very high number of rows, or if several column prefixes can be indexed.

There are two important index types: BTREE and HASH. Another type, RTREE, is only used for geometric data. The FULLTEXT indexes are used to execute full-text queries. These types will not be discussed in this book.

The HASH type can only be used for equality comparisons: the ones involving the = or <=> (NULL-safe equal) operators. It cannot be used to order or group rows. Where the index type is relevant, all examples in this book use BTREE indexes, and so the type is not specified.

The BTREE type can be used with many comparison operators, such as <, <=, =, >=, >, LIKE, BETWEEN, and IN. It can be used to order and group rows.

Thus, any statement can take advantage of a BTREE index if it is built on the relevant columns. HASH can typically be used for queries that search for an exact index entry. For such queries, HASH indexes can be faster than BTREE indexes. But remember that, for different query types, these indexes will simply be ignored. Even in such cases, though, using a BTREE index is not generally considered a major performance problem.

The BTREE index is the default type for most storage engines. However, since the MEMORY storage engine is often used to cache data (thus, to search one exact entry), its default index type is HASH. Remember to explicitly define the BTREE indexes if HASH is not suitable for some of your queries. As explained later, in the Storage engines and indexes section, InnoDB can optionally use an adaptive algorithm to silently convert the BTREE indexes to HASH, if the latter option seems to be more effective for the server's workload.

MariaDB is able to use the leftmost part of an index. For example, if an index is defined on a column called col1, a query involving the initial part of the column will usually take advantage of the index:

SELECT * FROM t WHERE col1 LIKE 'begin%';

However, if we execute a query that only reads the rightmost, or central, part of the index, the index will not be used. For example, the following query will not use the index:

SELECT * FROM t WHERE col1 LIKE '%end';

For the same reason, if an index involves multiple columns, it can speedup queries that use the leftmost columns in the index, but not queries that do not involve the leftmost part. For example, if an index is defined on two columns called col1 and col2, the following query can take advantage of it in this order:

SELECT * FROM t WHERE col1 = 10;

The following query cannot use such index:

SELECT * FROM t WHERE col2 = 10;

The ORDER BY and GROUP BY clauses can use an index even if the column's order does not match. However, this requires a two-pass sort; the data will be copied into a temporary table or file, and then ordered. The I/O required by such operations can be a performance killer and should be avoided whenever possible. For example, the following query can use our index and does not require extra sorting operations:

SELECT * FROM t ORDER BY col1, col2;

However, the following query requires a temporary table or file:

SELECT * FROM t ORDER BY col2, col1;

An index is usually of great help if it can avoid any access to data for a query. This happens if the SELECT statement only returns columns from the index, and all the other clauses can take advantage from the index. In this case, it is called a covering index.

Table statistics

Even if a query can use an index, the optimizer can decide if the use of an index is not of great help, and thus use another index or a full table scan. The general rule is that indexes are useful when they help reduce read operations. What reduces read operations is the index cardinality, that is, the number of unique values in an index. If an index is unique, its cardinality is equal to the difference in the number of rows and the NULL values. If an index only has a few possible values (for example, it is an ENUM field, or TINYINT used as a boolean), its cardinality will be very low. The selectivity is a correlated term that indicates how many rows can be excluded if an index is used to satisfy a WHERE clause.

To determine whether an index is useful, the optimizer takes into account factors such as the index cardinality, index length, and number of rows in the table. For example, if a table contains only a few rows, there is no point in using an index to avoid reading the data.

The problem with cardinality is that MariaDB does not know its exact value; it is just an estimated value. While in most cases the cardinality is accurate enough, this value can sometimes be out of sync with the real data. This can lead the optimizer to exclude an index (and maybe choose a full table scan) for a query, which would greatly benefit from the use of an index. To recalculate a table's index cardinality, the ANALYZE TABLE statement can be used. The estimate index cardinality is reported in the output of the SHOW INDEX statement.

Index statistics are collected by storage engines and, optionally, by the server. The statistics are based on groups of identical values stored in each index. The most important value is the average number of elements present in groups. Some storage engines have a variable that determines whether NULL values should be considered like any other value (in this context) or not. These variables are innodb_stats_method, aria_stats_method, and myisam_stats_method. They all accept the same values. A nulls_equal value means that the average number of elements should be calculated considering the group of NULL values as any other group. A nulls_unequal value means that each individual NULL value should be considered a group. Both these options affect the statistics in a different way; the first value should be preferred if there are a few NULL index records, while the latter value is preferable when there are several NULL index records. A nulls_ignored value also exists, which causes the NULL values not to be counted. For InnoDB and MyISAM, nulls_equal is the default value, while for Aria, nulls_unequal is the default value.

InnoDB has a variable called @@innodb_stats_on_metadata. It is disabled by default. If it is set to 1, the InnoDB statistics are collected each time the user executes SHOW KEYS, or a corresponding query on the information_schema database. This helps to have updated statistics, but can take a long time on big tables.

Since MariaDB 10, the server can also collect statistics. Such statistics are collected not only for the indexed columns, but also for nonindexed columns. The main problem with this feature is that recalculating statistics for a table always requires a full table scan. Engine-independent statistics are disabled by default, and can be configured using the @@user_stat_tables server system variable. A value never disables the feature, whereas other values enable the engine-independent statistics; with complementary, the features will be used only when the storage engine does not provide the required information, while preferably uses the server statistics whenever possible.

Storage engines and indexes

An InnoDB table always has a clustered index. A clustered index is a column containing sorted values that uniquely identify each row. Each row in secondary indexes contains the indexed values, followed by the clustered index value from the same table row. Only the clustered index contains pointers to the position of the physical rows in the files. This technique is much more complex than the index structure used in MyISAM. All MyISAM keys (the primary key and all other indexes) contain the offset of each row in the data file. The use of clustered indexes makes InnoDB much faster when searching for a specific value in the primary key.

InnoDB uses the primary key as a clustered index, if it exists. If a table does not have a primary key, a unique index is used. If there are no unique keys, a cluster index is created by InnoDB. In such cases, the cluster index is a unique value of 6 bytes that is invisible to the user. Such clustered indexes imply more locks than a normal clustered index based on the primary key. For these reasons, with InnoDB, it is important to explicitly define small primary keys.

The AUTO_INCREMENT values generally imply a lock that prevents two concurrent connections from using the same value. The working of this lock depends on the value of the innodb_autoinc_lock_mode system variable. The allowed values, for the innodb_autoinc_lock_mode system variable are discussed as follows:

  • 0: This value describes that all the INSERT statements hold a table lock until the end of the operation. This was the only available mode in the older InnoDB versions. It is slow, but can be used if a problem arises with value 1.
  • 1: This value describes that the multirow INSERT statements and LOAD DATA INFILE still require a table lock. For a single row INSERT statements, a lighter lock is used. This is the default process because it is the most scalable mode that can be safely used with a statement-based replication.
  • 2: This value denotes that no table locks are ever held. This is not safe with the statement-based replication. This value is, however, safe with row-based replication, and is mandatory with Galera, as explained in Chapter 12, MariaDB Galera Cluster.

If the innodb_adaptive_hash_index server variable is set to 1 (the default), InnoDB can automatically convert BTREE indexes to HASH, and vice versa. This is generally useful for tables whose data is stored in the buffer pool, queried by statements that use the = operator in the WHERE clause. To decide which type of index should be used, InnoDB collects statistical data about how the tables are used. Increasing the value of the innodb_adaptive_max_sleep_delay system variable reduces the overhead due to such operations, which can be useful on busy servers.

With Aria and MyISAM, indexes store the minimum and maximum value of each field they contain. Thus, a SELECT command without any optional clauses, and that returns only maximum and minimum values, is immediately executed.

Index limits (number of indexes per table, number of columns per index, index length, prefix length, and so on) may vary depending on the storage engine. However, the limits are very high.

Some special storage engines, such as CSV, do not support indexes or have a very limited support for them.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.14.252.56