The SPIDER storage engine

Like the FEDERATEDX storage engine and the CONNECT MYSQL table type, SPIDER allows remote access to tables as if they were on the local server. However, the SPIDER storage engine has been specifically designed for data sharding. Its main function is to access data from a multitude of servers by querying a single local table.

Data sharding is implemented in SPIDER using table partitioning. If a SPIDER table is partitioned, each partition can be linked to a different remote table. The SPIDER storage engine is suitable with the RANGE and LIST partitioning methods, including RANGE COLUMNS and LIST COLUMNS.

The SPIDER storage engine supports both regular SQL transactions and XA transactions, if the remote tables support them too.

The SPIDER storage engine has been originally designed for MySQL. The version distributed along with MariaDB is slightly modified to take advantage of the MariaDB-specific features.

The original author of SPIDER is Kentoku Shiba. The project's official site is http://www.spiderformysql.com/.

Explaining the working of the SPIDER storage engine

The SPIDER storage engine is essentially a storage engine that communicates with the local server's optimizer on one side and the remote server on the other side. When the optimizer chooses an execution plan for a query, which involves SPIDER tables, SPIDER translates this plan into calls to one or more remote servers, acting like a MariaDB client.

When SPIDER is asked to insert data into more than one remote server, it internally uses a transaction with a two-phase commit. The problem with a single-phase commit is that it guarantees data integrity only if it involves one server. When all the modifications have been requested, the commit makes them effective. However, imagine that the modifications involve two servers. We send the commands to both servers, and we get no error. Then, we issue a commit on server 1 and it succeeds. Finally, we send the commit to server 2. If this commit fails, we have created an inconsistency. In fact, the changes we requested are already effective on server 1 and they cannot be undone anymore. For this reason, a single-phase commit is not suitable to execute a transaction between multiple servers.

The two-phase commit transaction model is the same as the model used for the XA transactions. XA commands can be sent to SPIDER by the user because they are fully supported. However, even if a normal transaction is used, SPIDER uses a two-phase commit to make the changes effective across multiple servers. With this technique, when a server receives a commit, it does not immediately apply the changes. It knows that the transaction is finished, but it waits for a second commit. If any of the remote servers returns an error or is not reachable, SPIDER rolls back the transaction on each involved server, avoiding data corruption. Only if the first commit succeeds on all the remote servers, SPIDER sends a second commit to each server. The second commit makes data effective.

When a query involves multiple SPIDER partitions, or multiple unpartitioned SPIDER tables, they are broken into multiple threads. A separate thread is used for each remote server that needs to be accessed by the query. Keeping this in mind, the DBA can augment the parallelization of queries by adding more partitions that point to a remote server.

Query results are buffered by SPIDER until they can be sent to the clients. The incomplete result sets can be stored on the remote servers or on the local server.

The SPIDER storage engine maintains in-memory statistics on remote tables and indexes. These statistics are updated at regular intervals of time. Like other storage engines, SPIDER communicates these values to the optimizer so that it can use them to choose good query plans.

Installing the SPIDER storage engine

The SPIDER storage engine is compiled in official MariaDB distributions, but it is not enabled by default. Before we start using it, the following steps are necessary:

  1. Installing the plugin
  2. Executing install_spider.sql

Like all plugins, the SPIDER storage engine can be installed while the server is running, with the SQL INSTALL statement.

The exact location of the install_spider.sql file depends on the MariaDB distribution and the operating system we are using. It is generally located in the share subdirectory. This file creates the system tables used by SPIDER, in the mysql database.

The following example shows a successful installation of SPIDER:

MariaDB [(none)]> INSTALL SONAME 'ha_spider';
Query OK, 0 rows affected (0.07 sec)
MariaDB [(none)]> SOURCE /usr/local/mariadb/share/install_spider.sql;

Then, we can check that the SPIDER system tables have been installed:

MariaDB [(none)]> SHOW TABLES FROM mysql LIKE '%spider%';
+----------------------------+
| Tables_in_mysql (%spider%) |
+----------------------------+
| spider_link_failed_log     |
| spider_link_mon_servers    |
| spider_tables              |
| spider_xa                  |
| spider_xa_failed_log       |
| spider_xa_member           |
+----------------------------+
6 rows in set (0.01 sec)

Creating a SPIDER table

The SPIDER storage engine supports a special syntax to specify where the remote tables are located. This syntax is different from the syntaxes used for FEDERATEDX and CONNECT, and it must be used in the COMMENT table option.

The following example shows how to create a simple, unpartitioned SPIDER table, which connects to a remote table. We will use the user table, which we already used for some FEDERATEDX and CONNECT examples:

CREATE TABLE test.user (
  id INTEGER UNSIGNED NOT NULL,
  username VARCHAR(50),
  password VARCHAR(50)
)
  ENGINE = SPIDER,
  COMMENT = 'user "user1", password "pwd" host "remote_server", port "3306", database "db1", table "user"';

If table is not specified, it will be the same as the local table name. If database is not specified, it will be the same database as the one that contains the local table.

We can also provide a defined server name, shown as follows:

CREATE TABLE test.user (
  id INTEGER UNSIGNED NOT NULL,
  username VARCHAR(50),
  password VARCHAR(50)
)
  ENGINE = SPIDER,
  COMMENT = 'srv1';

The SPIDER storage engine can automatically discover the structure of the remote table and create an identical local table. Thus, we can simply write:

CREATE TABLE test.user
  ENGINE = SPIDER,
  COMMENT = 'server "srv1"';

The SPIDER storage engine tables are really useful when they are partitioned. Each partition can point to a different remote table. The following example shows how to create a SPIDER partitioned table:

CREATE TABLE test.user (
  id INTEGER UNSIGNED NOT NULL,
  username VARCHAR(50),
  password VARCHAR(50)
)
  ENGINE = SPIDER
PARTITION BY RANGE (id)
(
  PARTITION p0 VALUES LESS THAN (1000000)
    COMMENT = 'server "srv1"',
  PARTITION p1 VALUES LESS THAN (2000000)
    COMMENT = 'server "srv2"',
  PARTITION p2 VALUES LESS THAN (3000000)
    COMMENT = 'server "srv3"'
);

The following table options can be used to create a SPIDER table, or partition, which connects to a remote server using SSL:

Option name

Description

SSL_CA

This is the name or path of the authority certificate

SSL_CAPATH

This is the path of the authority certificate's directory

SSL_CERT

This is the name or path of the certificate

SSL_CIPHER

This is a comma-separated list of the encryption algorithms that can be used

SSL_KEY

This is the path of the public key

SSL_VERIFY_SERVER_CERT

If set to 1, the remote server's certificate will be verified

For further details about SSL connections in MariaDB, see Chapter 5, Users and Connections.

Logging of queries and errors

Statements sent by the users to a SPIDER table can be logged into the general query log, similar to all the other statements. And, of course, if a command generated by SPIDER returns an error, the remote server can record this error in the error log. This behavior depends on the @@general_log and @@log_error server variables, as explained in Chapter 3, Optimizing Queries.

However, when a SPIDER table queries a remote table, the remote server does not log the command by default. It is possible to log such commands by setting the @@spider_general_log server variable to ON in the remote servers. The SPIDER commands will be written in the general query log.

When a command generated by SPIDER returns an error on the remote server, the error can also be logged into the local servers by setting the @@spider_log_result_errors server variable to ON.

Executing arbitrary statements on remote servers

Some User Defined Functions (UDFs) provide an easy way to execute arbitrary SQL statements against remote servers. Such UDFs are automatically installed with SPIDER.

Unlike MYSQL CONNECT tables with a SRCDEF clause, these functions can return a result set.

Note

Note that these functions can be invoked to remotely execute SQL statements in any situation. In fact, while they are designed to assist the management of a cluster based on SPIDER, they work even if SPIDER is not used at all.

Explaining the spider_direct_sql() function

The spider_direct_sql() function allows us to execute arbitrary SQL statements against a remote MariaDB or MySQL server. The results of the query are copied into a temporary table that needs to be explicitly created before calling this function. Note that the table needs to be temporary.

Consider the following example:

MariaDB [test]> CREATE TEMPORARY TABLE output (
    -> v VARCHAR(255)
    -> ) ENGINE = InnoDB;
Query OK, 0 rows affected (0.21 sec)
MariaDB [test]> SELECT spider_direct_sql(
    -> 'SELECT VERSION() AS v',
    -> 'output',
    -> 'user "user1", password "pwd", host "remote_server", port "3306"'
    -> ) AS v;
+---+
| v |
+---+
| 1 |
+---+
1 row in set (0.06 sec)
MariaDB [test]> SELECT * FROM output;
+---------------------+
| v                   |
+---------------------+
| 10.0.11-MariaDB-log |
+---------------------+
1 row in set (0.01 sec)

Let's examine the spider_direct_sql() call. It has three parameters:

  • The SQL command that we want to execute on a remote server.
  • The name of the temporary table that will store the result set. Note that the table has been previously created. The function does not automatically create it.
  • A string that contains the information necessary to access the remote server. The syntax is the same that is used to create SPIDER tables. It is also possible to specify a defined server name.

Explaining the spider_bg_direct_sql() function

When spider_direct_sql() is called, the current connection stays on hold until the remote query execution is completed and its result set is stored into the specified temporary table. However, sometimes we need to execute a long-running statement, and we do not want the current connection to wait until it is completed. In these cases, we can use the spider_bg_direct_sql() function. As the name suggests, it executes the query in the background.

The syntax of spider_bg_direct_sql() is identical to the syntax of spider_direct_sql().

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.224.54.168