Query optimizations

If the partitioning expression and the partition type have been carefully selected, most queries will only involve one partition or a set of partitions.

In many cases, the optimizer will find out which partitions are not relevant for the current query. This optimization is called partition pruning.

Also, the user can use a SQL clause to specify the list of partitions that must be used. This is called partition selecting.

Note

However, in MariaDB, queries are never parallelized. Even if the optimizer knows that two partitions must be read, and those partitions are on different disks, the same thread will read them sequentially. In particular, one can expect full table scans and full index scans on partitioned tables to run much faster because of parallelization, but this is not the case.

Partition pruning

In MariaDB, partition pruning is possible with RANGE and LIST partitioning, but not with RANGE COLUMNS or LIST COLUMNS. When a statement references the columns used by the partitioning expression, the optimizer is usually able to determine whether one or more partitions can be excluded by the query plan. Often, it can exclude all partitions but one. In some cases, it can even detect an Impossible WHERE and avoid executing the query at all.

The optimizer examines the WHERE clause in the following statements to determine whether partition pruning can be applied:

  • SELECT
  • INSERT … SELECT
  • REPLACE … SELECT
  • DELETE
  • UPDATE

For the INSERT statements, the optimizer examines the inserted values. For the REPLACE statements, both the WHERE clause and the new values are examined.

When the optimizer analyzes a WHERE clause, it can use the following operators for pruning:

  • =
  • !=
  • <
  • >
  • <=
  • >=
  • BETWEEN
  • IN

If a partition only contains rows for which the partitioning expression returns NULL values, IS NULL and IS NOT NULL can also be used for pruning.

To obtain a statement execution plan, the EXPLAIN command can be used. We discussed this in Chapter 3, Optimizing Queries. Now that we are dealing with partition, we need to use an EXPLAIN extension: the PARTITIONS option. It adds the partitions' columns to the output of EXPLAIN. This column contains a list of the partitions that will be used to execute the statement. This extension has been specifically added to check whether and how partition pruning is applied. We will use it in the following examples.

For the following examples, we will use the article table, with a RANGE partitioning based on the column ID:

CREATE TABLE article (
  id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  date DATE NOT NULL,
  author VARCHAR(100),
  language TINYINT UNSIGNED,
  text TEXT,
  PRIMARY KEY (id, date)
)
  ENGINE = InnoDB
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (5000),
  PARTITION p1 VALUES LESS THAN (10000),
  PARTITION p2 VALUES LESS THAN (15000),
  PARTITION p3 VALUES LESS THAN (20000)
);

Queries that contain a WHERE condition based on the ID column are very likely to take advantage of partition pruning. Consider the following example:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT *  FROM article WHERE id > 5000 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
   partitions: p1,p2,p3

1 row in set (0.00 sec)

Only the relevant columns are shown here. Since the first partition only contains VALUES LESS THAN (5000), the optimizer understands that it cannot contain any relevant row. Thus, p0 is not accessed, which is shown as follows:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT *  FROM article WHERE id BETWEEN 8000 AND 13000 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
   partitions: p1,p2

1 row in set (0.00 sec)

This technique works perfectly with a range of values that are spread over multiple partitions, as shown in this example where p0 and p4 are not accessed:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT *  FROM article WHERE id = 11000 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
   partitions: p2

1 row in set (0.00 sec)

Queries that retrieve a single row, or at least search for a single value, should always be able to only access one partition. For common cases, this is the best case, shown as follows:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT * FROM article WHERE id = 9999999 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL

        Extra: Impossible WHERE noticed after reading const tables
1 row in set (0.00 sec)

In this case, no partition can contain the value we are looking for. As clearly stated in the Extra column, an impossible WHERE was detected after examining the partitions' definitions. Thus, the query is not executed at all, which is shown as follows:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT *  FROM article WHERE id = 18000 OR id < 10000 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
   partitions: p0,p1,p3

1 row in set (0.00 sec)

This example merely shows that partition pruning also works when more value ranges are queried, even if different operators are used.

We will try some queries on LIST partitioned tables too, just to demonstrate that partition pruning works with this partitioning type. The table definition we will use is the following:

CREATE TABLE article (
  id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  date DATE NOT NULL,
  author VARCHAR(100),
  language TINYINT UNSIGNED,
  text TEXT,
  PRIMARY KEY (id, language)
)
  ENGINE = InnoDB
PARTITION BY LIST (language) (
  PARTITION pn VALUES IN (NULL),
  PARTITION p0 VALUES IN (1),
  PARTITION p1 VALUES IN (2,3),
  PARTITION p2 VALUES IN (4,5,6,7),
  PARTITION p3 VALUES IN (8,9,10,11)
);

Now, let's try some queries, shown as follows:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT *  FROM article WHERE language = 1 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
   partitions: p0

1 row in set (0.00 sec)

Only the first query contains the value 1, and the optimizer prunes other partitions away:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT *  FROM article WHERE language IN (1, 10) G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
   partitions: p0,p3

1 row in set (0.00 sec)

This time we queried the table for noncontiguous values that cannot be stored on contiguous partitions, shown as follows:

MariaDB [test]> EXPLAIN PARTITIONS
    -> SELECT *  FROM article WHERE language BETWEEN 2 AND 5 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: article
   partitions: p1,p2

1 row in set (0.00 sec)

Looking for a range of values across multiple partitions also works as expected.

Partition selection

With partition pruning, the optimizer automatically determines which partitions must be accessed to execute a statement. MariaDB 10.0 also provides a way for the user to explicitly declare which partitions must be involved in the query: the PARTITION clause.

This clause can be specified after any table name, in any of the following SQL statements:

SELECT, including JOIN and UNION queries;
CREATE TABLE … SELECT;
INSERT … SELECT;
INSERT;
UPDATE;
DELETE;
REPLACE;
LOAD DATA INFILE;
LOAD XML INFILE;

The syntax of the PARTITION clause is the following:

<table_name> PARTITION (<partition_list>)

The partition_list variable is a list of one or more partitions, separated by a comma. Subpartitions can also be included in the list, concatenating their name with their main partition's name.

For example, if we have a tab table, the following statement returns all rows from the partition p0, and the s3 subpartition in the partition p1:

SELECT * FROM tab PARTITION (p0, p1s3);

If any of the specified partitions or subpartitions do not exist, the whole statement fails with an error similar to the following:

ERROR 1735 (HY000): Unknown partition <partition_name> in table <table_name>

By contrast, existing partitions can be named more than once in any order, and the subpartitions can be named even if their main partition is included in the list.

Since partition pruning is automatic and it was implemented in MariaDB even before partition selection, this feature can seem useless at first glance. However, it can be useful in several ways. For example:

  • It works with any partitioning expression.
  • It works with all partitioning types.
  • It allows us to specify subpartitions that cannot be automatically pruned.
  • It can be used in rare situations where the optimizer does not apply an optimal partition pruning, perhaps because of a bug.
  • For complex queries, it can slightly speed up the optimizer.
  • It can be less verbose than the corresponding WHERE clause.
  • Statements that destroy or modify existing data, such as DELETE and UPDATE, can cause a data loss if the WHERE clause is not correct. An additional PARTITION clause can reduce the risks when similar damages occur.
  • Similarly, a PARTITION clause can be added to statements that add a huge amount of data to prevent the writing of incorrect values. This technique causes the whole bulk insertion to fail if at least one row does not fit one of the specified partitions. The IGNORE clause does not affect this behavior.
  • It allows us to quickly query one partition or subpartition at a time to analyze the overall data distribution.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.79.84