2 Configuration

You have already seen how to produce a basic configuration for a cluster. However, you will almost certainly find that you want to specify some, if not all, of the optional configuration settings; this chapter explains what they all do.

All these settings go in the config.ini file, which is often placed in the DataDir of the management node (conventionally /var/lib/mysql-cluster). If you change this file, you will have to completely restart the cluster, either by following a process very similar to that of an online upgrade (as described in Chapter 1, “Installation”) or by doing a backup, a total cluster shutdown, and a re-import.

Note

Technically, the config.ini file can go anywhere, as long as you are in that directory when you start the management daemon (ndb_mgmd). However, we strongly suggest that you keep it in your DataDir, where all the other files that the management daemon creates (such as log files) will go.

To do a rolling online configuration change, you follow these steps:

1.   Change the configuration file as desired (for example, increase DataMemory).

2.   Stop your management daemon by using <id> STOP, where <id> is the ID of the management node, as displayed in the management client (ndb_mgm).

3.   Start your management daemon.

4.   Carry out the following process for each storage node (one node at a time):

Note

You can restart more than one storage node at a time. The actual restriction is that you must have one node per node group remaining up at all times. However, we suggest that you restart only one node at a time because (a) if something goes wrong, you have only one node down, and (b) you know exactly which node is causing problems.

•   Stop the storage node by using <id> STOP, where <id> is the ID of the storage node, as displayed in the management client.

•   Start the storage node by using ndbd --initial. The --initial option is required only for certain parameters; however, it is always safe to use (it may just make the restart take slightly more time). This is where the upgrade actually takes place; in other words, when the node starts, it is upgraded.

•   Wait for the storage node to fully start before moving on to the next node. To determine that it is done, look at the results of a SHOW command in the management client.

The parameters that require the --initial option for the ndbd restart are ones that change the on-disk format, such as NoOfFragmentLogFiles, but it is always safer to use it. There is no disadvantage except for a small increase in the time the node takes to restart.
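
For example, a rolling configuration change on a small cluster might look like the following sketch. It assumes a management node with ID 1, two storage nodes with IDs 2 and 3, and config.ini in /var/lib/mysql-cluster; your IDs and paths may differ:

# In the management client, stop the management daemon:
ndb_mgm> 1 STOP
# On the management host, edit config.ini, then restart the daemon:
shell> ndb_mgmd -f /var/lib/mysql-cluster/config.ini
# Restart each storage node in turn, waiting for each to start fully:
ndb_mgm> 2 STOP
shell> ndbd --initial    # run on the host of node 2
ndb_mgm> SHOW            # repeat until node 2 reports "started"
ndb_mgm> 3 STOP
shell> ndbd --initial    # run on the host of node 3
ndb_mgm> SHOW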

To do a backup and restart, you follow these steps:

1.   Make a backup of the cluster.

2.   Shut down the entire cluster.

3.   Restart all the nodes by using the --initial option.

4.   Reimport your backup.
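
In terms of commands, the process might look something like the following sketch; the backup ID, node IDs, and paths are illustrative assumptions, so check the ndb_restore options for your version before relying on them:

ndb_mgm> START BACKUP
ndb_mgm> SHUTDOWN
# Change config.ini, restart ndb_mgmd, and run "ndbd --initial" on each storage host.
# Then re-import the backup once per storage node (node IDs 2 and 3 assumed;
# -m restores the metadata and needs to be given only once):
shell> cd /var/lib/mysql-cluster/BACKUP/BACKUP-1
shell> ndb_restore -b 1 -n 2 -m -r
shell> ndb_restore -b 1 -n 3 -r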

Luckily, there aren’t very many parameters that require a process this severe. This process is required only when an option change requires the data to be partitioned in a different fashion. The following changes require this:

•   Changing NoOfReplicas

•   Adding/removing data nodes

This chapter is split into several sections: one that covers configuration options that apply to all nodes, one that covers management nodes, one that covers storage nodes, and one that covers SQL nodes. Each of these sections covers the options that can be used as defaults for all nodes of that type.

It is worth pointing out that parameter names in MySQL Cluster are no longer case-sensitive. (In older versions of 4.1, all the parameters were case-sensitive.) So, for example, you can specify Id=3, ID=3, or id=3, and they will all work the same.

The Structure of config.ini

The config.ini file has several parts. You have already been introduced to a very simple config.ini file. You can find a list of all the sections and what you can put in them in Chapter 1.

The config.ini file has sections for settings for management, storage, and SQL nodes. Within each section are defaults that apply to all nodes of that type, and you must have one section per node (even if you have no parameters in it).

These are the sections:


#
# STORAGE NODES
#

[NDBD DEFAULT]
# In here you put the settings that apply to all storage nodes

[NDBD]
# In here you put the settings for one individual storage node
# You must have one [NDBD] section for EACH storage node

#
# MANAGEMENT NODES
#

[NDB_MGMD DEFAULT]
# In here you put the settings that apply to all management nodes

[NDB_MGMD]
# In here you put the settings for one individual management node
# You must have one [NDB_MGMD] section for EACH management node

#
# SQL NODES
#

[MYSQLD DEFAULT]
# In here you put the settings that apply to all SQL nodes

[MYSQLD]
# In here you put the settings for one individual SQL node
# You must have one [MYSQLD] section for EACH SQL node


We suggest that you use this block as your config.ini template; you can then easily add parameters to the correct section. You can download this file from this book’s website, www.mysql-cluster.com.

The following sections cover the parameters you can set in each section.

All Nodes

Two options can go under the [NDBD], [NDB_MGMD], and [MYSQLD] sections: Id and HostName. It is conventional to define HostName for all nodes.

Id

Each node in a cluster has a unique identity, ranging from 1 to 64, inclusive. This ID is used by all internal cluster messages for addressing the node, and it is also used to identify the node in log files and to create output files that can be matched to the node. If you do not set this, the management daemon assigns the next free ID to each node that connects (and it assigns itself an ID of 1).

If you choose to run multiple management nodes, you must set the IDs. If you are running a large cluster, you may want to set the IDs so you know without question that the same node will always have the same ID; often, however, it is perfectly satisfactory to let the management daemon just assign the IDs.
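
For example, the following fragment pins the IDs explicitly (the IP addresses are placeholders):

[NDB_MGMD]
Id=1
HostName=10.0.0.1

[NDBD]
Id=2
HostName=10.0.0.2

[NDBD]
Id=3
HostName=10.0.0.3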

HostName

The HostName parameter defines the hostname or IP address of the computer that the storage node is to reside on (for example, HostName=10.0.0.2).

We suggest that you use IP addresses whenever possible. If you use hostnames, you need to make sure that both your forward and reverse DNS lookups are in perfect order, and if you experience problems, the first thing you should try is to switch the hostnames for IP addresses and see if that fixes the problem.

Management Nodes

As with all three types of node, there are parameters that you should set for all nodes (and it makes sense to set these in the DEFAULT section rather than in each node’s section) and some that you must set individually.

Parameters to Define for All Management Nodes (NDB_MGMD_DEFAULT)

All the following options can go under [NDB_MGMD] headers in addition to the DEFAULT section. A typical cluster has only one management daemon, but it is possible to define two or more. The procedure for doing so is covered later in this section. Either way, it makes sense to define settings that apply to all management nodes if you have more than one in the DEFAULT section.

PortNumber

The PortNumber parameter specifies the port number on which the management server listens for configuration requests and management commands. The default, which we suggest you not change, is 1186. You can specify different ports for different management nodes, but we recommend that you do not do so. If you wanted to run multiple management nodes on one server (for example, if you had two separate clusters that shared a management server), you could create two config.ini files (in different folders) with different port numbers and data directories and start both daemons, which would not then conflict.
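
As a sketch of that two-clusters-on-one-server arrangement (the ports, paths, and IP address here are illustrative assumptions):

# /etc/cluster-one/config.ini
[NDB_MGMD]
Id=1
HostName=10.0.0.1
PortNumber=1186
DataDir=/var/lib/cluster-one

# /etc/cluster-two/config.ini
[NDB_MGMD]
Id=1
HostName=10.0.0.1
PortNumber=1187
DataDir=/var/lib/cluster-two

# Start one daemon per configuration file:
shell> ndb_mgmd -f /etc/cluster-one/config.ini
shell> ndb_mgmd -f /etc/cluster-two/config.ini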

LogDestination

The LogDestination parameter specifies where the management daemon will send the cluster logs it creates. There are three options in this regard: CONSOLE, SYSLOG, and FILE.

CONSOLE outputs the log to stdout (that is, it displays it on the screen of the server, which is great for debugging when you have the screen in front of you but not so great if you have your servers “headless” in a rack elsewhere). This is easy to set:


LogDestination=CONSOLE


SYSLOG sends the log to a syslog facility; you must provide a facility parameter for this, with possible values being auth, authpriv, cron, daemon, ftp, kern, lpr, mail, news, syslog, user, uucp, local0, local1, local2, local3, local4, local5, local6, and local7 (not every facility is necessarily supported by every operating system). Here’s an example:


LogDestination=SYSLOG:facility=syslog


FILE pipes the cluster log output to a regular file on the same machine. You can also set the following three parameters:

•   filename: The name of the log file. The default is ndb_<NODEID>_cluster.log (where <NODEID> is the node ID of the management node, typically 1).

•   maxsize: The maximum size to which the file can grow before logging rolls over to a new file. When this occurs, the old log file is renamed by appending .x to the filename, where x is the next number not yet used with this name. The default is 1000000.

•   maxfiles: The maximum number of log files. The default is 6.

The following is an example of a configuration line:


LogDestination=FILE:filename=my-cluster.log,maxsize=500000,maxfiles=4


It is also possible to specify multiple log destinations by using a semicolon-delimited string:


LogDestination=CONSOLE;SYSLOG:facility=syslog;FILE:filename=/var/log/cluster-log


Again, you can specify different log destinations for different management nodes, but we suggest that you do not do so.

DataDir

The DataDir parameter sets the directory where output files from the management server will be placed. These files include process output files and the daemon’s process ID file, and they may include cluster log files, depending on the settings of LogDestination.

ArbitrationRank

The ArbitrationRank parameter is used to define which nodes can act as arbitrators. Only management nodes and SQL nodes can be arbitrators. Therefore, this configuration option works identically whether it appears in an [NDB_MGMD] section or a [MYSQLD] section of config.ini (the only difference is the different default values for the two types of nodes).

ArbitrationRank can take one of the following values:

•   0: The node will never be used as an arbitrator.

•   1: The node has high priority; that is, it will be preferred as an arbitrator over low-priority nodes.

•   2: The node has low priority; that is, it will not be used as an arbitrator unless there are no nodes with higher priority available.

Normally, you should configure all management servers as high-priority arbitrators by setting their ArbitrationRank parameter to 1 (the default value) and setting this parameter for all SQL nodes to 2 (also the default).
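
In config.ini terms, the recommended arrangement is simply the defaults made explicit:

[NDB_MGMD DEFAULT]
# Management nodes are preferred as arbitrators
ArbitrationRank=1

[MYSQLD DEFAULT]
# SQL nodes arbitrate only if no management node is available
ArbitrationRank=2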

Parameters to Define for Each Management Node (NDB_MGMD)

You should specify a HostName parameter and, if you are setting Id values for other nodes in the cluster for the sake of neatness or if you are using multiple management nodes, you should set Id as well. Other than that, there is no need to set any other parameters, as long as the compulsory ones are specified in the DEFAULT section.

An Example of a Management Section of a config.ini File

The following is an example of a configuration with one management node:


[NDB_MGMD DEFAULT]
# Listen on default port, 1186
PortNumber=1186
# Log to console, syslog and also to a file
# cluster-log in /var/log/.
LogDestination=CONSOLE;SYSLOG:facility=syslog;FILE:filename=/var/log/cluster-log
# Store files in /var/lib/mysql-cluster
DataDir=/var/lib/mysql-cluster
#All management nodes should be high priority for arbitration.

ArbitrationRank=1

[NDB_MGMD]
# ID of 1 (standard for management node)
Id = 1
# Enter IP instead of xx.xx.xx.xx
Hostname = xx.xx.xx.xx


You can find an example using two management nodes at the end of this chapter, in the section “Using Multiple Management Nodes.” Because there is only one management node in the preceding example, you have probably noticed that it would be functionally identical to ignore the DEFAULT section and just use the following:


[NDB_MGMD]
Id = 1
Hostname = xx.xx.xx.xx
PortNumber=1186
LogDestination=CONSOLE;SYSLOG:facility=syslog;FILE:filename=/var/log/cluster-log
DataDir=/var/lib/mysql-cluster
ArbitrationRank=1


However, it is good practice to use the DEFAULT section, if possible.

Storage Nodes

All the following options go under the [NDBD DEFAULT] or [NDBD] headings in config.ini.

For each parameter that defines a memory size, it is possible to use k, M, or G as a suffix to indicate units of 1024, 1024², or 1024³ bytes.

Parameters to Define for All Storage Nodes (NDBD_DEFAULT)

There are some parameters that you must define identically for all nodes. For example, setting NoOfReplicas to different values for different nodes will not work. Also, it does not make sense to set different values of DataMemory and IndexMemory for different nodes: If you do, you will find yourself limited to the lowest value (storage nodes must hold roughly the same amount of data). It therefore makes sense to put these parameters in the DEFAULT section.

NoOfReplicas

The global parameter NoOfReplicas, which can be set only in the [NDBD DEFAULT] section, defines the number of replicas (copies) of each fragment of data stored in the cluster. This parameter also specifies the size of node groups. (Remember that a node group is a set of nodes that all store the same information.)

Node groups are formed implicitly. The first node group is formed by the set of data nodes with the lowest node IDs, the next node group by the set of the next-lowest node identities, and so on.

There is no default value for NoOfReplicas (it must be specified); the maximum possible value is 4. You can set it to 1, but of course that gives you no redundancy, so we recommend that you set it to 2 or more.

You should bear in mind that your total number of storage nodes should be a multiple of NoOfReplicas and a power of 2, if possible (clearly, this is not possible if NoOfReplicas is set to 3), so if you have NoOfReplicas set to 2, you should have 2, 4, 8, or 16 (and so on) nodes. This is due to the way that MySQL Cluster currently partitions the data, and this is expected to become less important in version 5.1 due to the fact that you will be able to control the partitioning mechanism in that version.
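
For example, with NoOfReplicas=2 and four storage nodes, the node groups form implicitly by ascending node ID (the IDs here are illustrative):

[NDBD DEFAULT]
NoOfReplicas=2

[NDBD]
Id=2    # node group 0
[NDBD]
Id=3    # node group 0
[NDBD]
Id=4    # node group 1
[NDBD]
Id=5    # node group 1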

Changing the setting for NoOfReplicas for an active cluster requires that you take down the entire cluster and re-import the data, so generally you should avoid changing this parameter after you have your cluster set up.

DataDir

The DataDir parameter specifies the directory where local checkpoints, trace files, log files, process ID files, and error logs are placed. This can be specified for each individual storage node, but we recommend that you define it in the DEFAULT section to avoid confusion, and we recommend that you set it to /var/lib/mysql-cluster, which is the conventional location. There is no default value; you must specify this either in the [NDBD DEFAULT] section or in each separate [NDBD] section.

DataMemory and IndexMemory

DataMemory and IndexMemory are parameters that specify the size of memory segments used to store the actual records and their indexes. Although you do not have to set these, the defaults are very low, and you are unlikely to be able to import anything larger than a token table with them.

When setting these parameters, it may be helpful to look up the syntax of the ALL DUMP 1000 command in Appendix B, “Management Commands,” and also the detailed example of its usage in Chapter 2, “Configuration.”

We strongly recommend that you set DataMemory and IndexMemory to the same values for all nodes, which is why we also suggest that you define these parameters in the [NDBD DEFAULT] section. Because data is distributed evenly over all nodes in the cluster, your storage nodes will be able to use only as much as the smallest setting, so there is really no point in setting different values for different nodes.

DataMemory and IndexMemory can be changed, but decreasing either of them can be risky; doing so can easily lead to a node or even an entire cluster being unable to restart due to insufficient memory space. Increasing these values should be fine, but it is important that such upgrades be performed in the same manner as software upgrades, following the procedure outlined at the beginning of this chapter.

DataMemory defines the amount of RAM that is available for the storage node to keep its data in. Data in this context means the actual data in the table and all indexes apart from the primary (hashed) key.

If you set this parameter too low, you get “table is full” errors when you try to add more data. If you set it too high, the system may start to swap (that is, use the hard drive as RAM) excessively and become unstable. We suggest that when you are importing your initial data, you set this to 70% of your system RAM size. After you have your tables imported, you can use the ALL DUMP 1000 command (see Appendix B and Chapter 1 for more details) to see how much RAM is actually being used, and you can lower this value to be that plus a sensible amount for growth and temporary usage.

You set the DataMemory parameter as a number plus the unit (for example, DataMemory=1024MB). We suggest that you never set this to 100% of your system RAM; if you need that much DataMemory, it is time to add more RAM to your system.

Something that has not been previously discussed that you should be aware of is that DataMemory is also used for UNDO information: For each update, a copy of the unaltered record is allocated in DataMemory. It is therefore necessary to allocate enough memory to handle the largest transactions performed by applications using the cluster. You should avoid large transactions for this and the following reasons:

•   Large transactions are not any faster than smaller ones.

•   Large transactions increase the number of operations that are lost and must be repeated in the event of transaction failure.

•   Large transactions use more memory.

The default value for DataMemory is 80MB, and the minimum value is 1MB. There is no maximum, but in practice you should choose a value low enough that the node does not start swapping when the limit is reached.

IndexMemory defines the amount of RAM that is available for the storage node to keep its hash indexes (that is, primary keys) in. Similar problems occur with IndexMemory as with DataMemory if you set the value too high or low. We suggest that you set IndexMemory to 15% of your system RAM size to start with and adjust it as for DataMemory after your data is imported. It is set identically to DataMemory, with a number followed by units (for example, IndexMemory=128MB).

The default value for IndexMemory is 18MB, and the minimum value is 1MB.
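
As a worked example, suppose each storage node has 4GB of RAM (a hypothetical figure): 70% of 4096MB is roughly 2800MB, and 15% is roughly 600MB, so a reasonable starting point for the initial import would be:

[NDBD DEFAULT]
DataMemory=2800M     # ~70% of system RAM; trim after checking ALL DUMP 1000
IndexMemory=600M     # ~15% of system RAM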

FileSystemPath

The FileSystemPath parameter specifies the directory where all files created for metadata, REDO logs, UNDO logs, and data files are placed on the storage node. The default is the directory specified by DataDir. Note that this directory must exist before the ndbd process is initiated.

The storage daemon creates a subdirectory for the node’s file system in the directory specified by FileSystemPath. The name of this subdirectory includes the node ID; for example, if the node ID is 2, the subdirectory is named ndb_2_fs. If you leave this at the default and set DataDir to /var/lib/mysql-cluster, your FileSystemPath is effectively /var/lib/mysql-cluster/ndb_X_fs/, where X is the node ID of the storage daemon.

You can set FileSystemPath to a different value for each storage node, but we suggest that you not do this. Similarly, we cannot think of many instances in which it would be particularly beneficial to keep the FileSystemPath outside the default location (if you have multiple disks it may be a good idea to put this onto a different physical disk to spread out the write operations).

BackupDataDir

It is possible to specify the directory in which backups are placed. By default, this directory is FileSystemPath/BACKUP, which in turn defaults to DataDir/BACKUP unless you have defined FileSystemPath. There should be no reason to change this unless you have multiple hard drives and want to insure against drive failure, in which case you could set BackupDataDir to a folder on a separate hard drive.

Transaction Parameters

The next three parameters are important because they affect the number of parallel transactions and the sizes of transactions that can be handled by the system.

MaxNoOfConcurrentTransactions sets the number of parallel transactions possible in a node and MaxNoOfConcurrentOperations (and its partner parameter MaxNoOfLocalOperations) sets the number of records that can be in an update phase or locked simultaneously.

These parameters (especially MaxNoOfConcurrentOperations) are likely candidates for explicit values rather than the defaults. The default values are sized for small clusters using small transactions, in order to ensure that they do not use excessive memory. If you make extensive use of transactions in an application, it is very likely that you will have some large transactions and will therefore have to increase these parameters.

MaxNoOfConcurrentTransactions

Transaction records are allocated to individual MySQL servers, and generally there is at least one transaction record allocated per connection that is using any table in the cluster. For this reason, you should ensure that there are more transaction records in the cluster than there are concurrent connections to all MySQL servers in the cluster (and a lot more if you have individual transactions using multiple tables).

The default setting for the MaxNoOfConcurrentTransactions parameter is 4096, which should be sufficient for all but the most heavily laden clusters. This parameter must be set to the same value for all cluster nodes.

Changing this parameter is never safe and can cause a cluster to crash. When a node crashes, one of the nodes (the oldest surviving node) builds up the transaction state of all transactions ongoing in the crashed node at the time of the crash. It is thus important that this node have as many transaction records as the failed nodes; in other words, it is a good idea to have this number set significantly higher than you anticipate you will ever use.

MaxNoOfConcurrentOperations and MaxNoOfLocalOperations

When performing transactions of only a few operations each and not involving a great many records, there is no need to set this parameter very high. When performing large transactions involving many records, you should set this parameter higher.

Records are kept for each transaction updating cluster data, both in the transaction coordinator (a randomly chosen storage node) and in the nodes where the actual updates are performed. These records contain state information needed in order to find UNDO records for rollbacks, lock queues, and other purposes.

The MaxNoOfConcurrentOperations parameter should therefore be set to the number of records to be updated simultaneously in transactions, divided by the number of cluster data nodes. For example, in a cluster that has 4 data nodes and that is expected to handle 1,000,000 concurrent updates using transactions, you should set this value to 1,000,000 / 4 = 250,000. Of course, you want to build in a safety margin, so you should set the value higher than this.

Read queries that set locks also cause operation records to be created. Some extra operations should be allocated to accommodate cases where the distribution is not perfect over the nodes.

When queries make use of the unique hash index (which most complex queries can be assumed to do at some stage), there are actually two operation records used per record in the transaction: The first record represents the read in the index table (which is a completely separate and hidden table), and the second handles the operation on the base table.

The default value for MaxNoOfConcurrentOperations is 32768. If it is exceeded, you get the error message “Out of operation records in transaction coordinator.”

MaxNoOfConcurrentOperations has a companion parameter that can be configured separately. This second parameter, MaxNoOfLocalOperations, specifies how many operation records are to be local to the node; by default, it is calculated as 1.1 × MaxNoOfConcurrentOperations (in other words, each node can handle up to 10% more than the average for all nodes).

A very large transaction performed on an eight-node cluster requires as many operation records as there are reads, updates, and deletions involved in the transaction. Therefore, the chances are that you will want to set MaxNoOfLocalOperations higher than the default (that is, 10% greater than MaxNoOfConcurrentOperations) if you are dealing with large transactions. If it is necessary to configure the system for one very large transaction, it is a good idea to configure the two parts separately, with MaxNoOfLocalOperations set much higher than MaxNoOfConcurrentOperations, because a single transaction’s operations cannot be “spread out” over many nodes.
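
Continuing the earlier four-node example (1,000,000 simultaneous updates), a hedged sketch with a safety margin might look like this; the exact figures are assumptions to be sized against your own workload:

[NDBD DEFAULT]
MaxNoOfConcurrentOperations=300000   # 1,000,000 / 4 nodes = 250,000, plus a margin
MaxNoOfLocalOperations=450000        # well above the default 1.1x, for large transactions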

Transaction Temporary Storage

The next set of parameters is used to define the temporary storage available to the storage nodes when executing a query that is part of a cluster transaction. All records are released when the query is completed and the cluster is waiting for the commit or rollback.

The default values for these parameters are adequate in most situations. However, users who need to support transactions involving large numbers of rows or operations may need to increase these values to enable better parallelism over the cluster. On the other hand, users whose applications require relatively small transactions can decrease the values in order to save memory.

MaxNoOfConcurrentIndexOperations

For queries that use a unique hash index, a temporary set of operation records is used during a query’s execution phase. The MaxNoOfConcurrentIndexOperations parameter sets the size of the pool of records available for such use. Each record is allocated only while a part of a query is being executed; as soon as that part has been executed, the record is released. (The states needed to handle the other phases in a transaction’s life, aborts and commits, are handled by the normal operation records, whose pool size is set by the MaxNoOfConcurrentOperations parameter.)

The default value of MaxNoOfConcurrentIndexOperations is 8192. It should be pointed out that only in rare cases of extremely high parallelism using unique hash indexes should it be necessary to increase this value. Using a smaller value is possible and can save memory if you are certain that a high degree of parallelism is not required for your cluster.

MaxNoOfFiredTriggers

A record is created when an operation is performed that affects a unique hash index. Inserting or deleting a record in a table with unique hash indexes or updating a column that is part of a unique hash index “fires” an insert or a deletion into the index table. The resulting record is used to represent this index table operation while waiting for the original operation that fired it to complete. This operation is short lived but can still require a large number of records in its pool for situations with many parallel write operations on a base table containing a set of unique hash indexes.

The default value of MaxNoOfFiredTriggers is 4000, which is sufficient in most situations. In some cases, you can even decrease this value if you feel that the need for parallelism in the cluster is not high.

TransactionBufferMemory

The memory affected by the TransactionBufferMemory parameter is used for tracking operations fired when updating index tables and reading unique indexes. This memory is used to store the key and column information for these operations. It is only very rarely that the value for this parameter needs to be altered from the default.

A similar buffer used for read and write operations can be changed only at compile time. See “TransactionBufferMemory” in the online documentation (http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-db-definition.html) for more details on this parameter, which you are unlikely to need to change.

The default value for TransactionBufferMemory is 1MB.

Scans and Buffering

Table and range scans are used during transactions, and there are certain maximum values you need to set. Several buffers are also used. With all these parameters, the defaults will probably work for most clusters, but you may need to increase them. If you are able to reduce them, you will typically save RAM.

MaxNoOfConcurrentScans

The MaxNoOfConcurrentScans parameter is used to control the number of parallel table or range scans that can be performed in the cluster. Each transaction coordinator (a randomly picked storage node) can handle the number of parallel scans defined for this parameter.

Each scan query is performed by scanning all partitions in parallel (for performance reasons). Each partition scan uses a “scan record” in the node where the partition is located, the number of “scan records” being the value of this parameter multiplied by the number of nodes.

Scans are typically performed in one of two cases. The first of these cases occurs when no hash or ordered index exists to handle the query, in which case the query is executed by performing a full table scan (which can be very slow). The second case is encountered when there is no hash index to support the query but there is an ordered index. Using the ordered index means executing a parallel range scan. Because the order is kept on the local partitions only, it is necessary to perform the index scan on all partitions, which may involve communicating with other storage nodes that hold other partitions (or fragments) of the data.

The default value of MaxNoOfConcurrentScans is 256. The maximum value is 500.

MaxNoOfLocalScans

The MaxNoOfLocalScans parameter specifies the number of local scan records that can take place at any one time on each storage node. You might want to increase this value if many of your table scans are not fully parallelized (that is, able to be executed on many nodes at the same time).

The default value for this parameter is the value for MaxNoOfConcurrentScans multiplied by the total number of storage nodes in the cluster.

BatchSizePerLocalScan

The BatchSizePerLocalScan parameter is used to calculate the number of lock records that need to exist to handle concurrent scan operations. If you increase MaxScanBatchSize (defined for SQL nodes later in this chapter, in the section “Parameters to Define for All SQL Nodes (MYSQLD_DEFAULT)”), you should also increase this parameter. The default is 64.

LongMessageBuffer

The LongMessageBuffer parameter defines the size of the internal buffer that is used for passing messages within individual nodes and between nodes. Although it is highly unlikely that this would need to be changed, it is configurable. By default this parameter is set to 1MB.

Logging and Checkpointing

Storage nodes do a large amount of logging to various local log files as part of their operation. This is mostly not logging in the traditional sense of error logs and so on, but rather logging of recent transactions. These logs are used only in the event of a full cluster shutdown, but they are critical: Without them, you would not be able to restart your cluster after such a shutdown.

NoOfFragmentLogFiles

The NoOfFragmentLogFiles parameter sets the number of REDO log files that the storage node will keep. REDO log files are organized in a ring: The cluster creates a new file when the last one reaches 16MB and continues until it has created NoOfFragmentLogFiles groups of four files, at which point it goes back to the beginning and starts overwriting the oldest files.

It is extremely important that the first and last log files (sometimes referred to as the head and tail log files, respectively) do not meet; when they approach one another too closely, the node begins aborting all transactions encompassing updates due to not having enough room for new log records.

A REDO log record is not removed until three local checkpoints have been completed since that log record was created. Checkpointing frequency is determined by the TimeBetweenLocalCheckpoints and TimeBetweenGlobalCheckpoints parameters, discussed later in this chapter. Clearly, if you increase the time between checkpoints such that the head and tail of the REDO logs come close to each other, you should increase the value of NoOfFragmentLogFiles.

The default NoOfFragmentLogFiles parameter value is 8, which means eight sets of four 16MB files, for a total of 512MB. (Note that REDO log space must be allocated in blocks of 64MB, or four files of 16MB.) In scenarios that require a great many updates, the value for NoOfFragmentLogFiles may need to be set as high as 300 or even higher in order to provide sufficient space for REDO logs. If you have this sort of requirement, you are probably going to need a very fast disk drive (for example, RAID or SCSI), so you must make sure that the cluster is not held up writing REDO logs to disk.

If the checkpointing is slow and there are so many writes to the database that the log files are full and the log tail cannot be cut without jeopardizing recovery, all updating transactions are aborted with internal error code 410, or “Out of log file space temporarily.” This condition prevails until a checkpoint has completed and the log tail can be moved forward.
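
The sizing arithmetic is simple: Each unit of NoOfFragmentLogFiles buys four 16MB files, or 64MB of REDO log. For example (an illustrative value for a write-heavy cluster, not a recommendation):

[NDBD DEFAULT]
# 300 x 4 x 16MB = about 18.75GB of REDO log space
NoOfFragmentLogFiles=300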

MaxNoOfSavedMessages

The MaxNoOfSavedMessages parameter sets the maximum number of trace files that will be kept before old ones are overwritten. Trace files are generated when, for whatever reason, the node crashes and are very useful for debugging. You need them if you are trying to report a bug to a developer or if someone else on the mailing list asks you for them, as they can help determine what the cluster was doing at the exact moment it crashed.

The default is 25 trace files. We suggest that you do not change this.

Metadata Objects

Metadata objects are pretty much everything within your cluster, including database tables, system tables, indexes, and so on. You will have to increase some of these parameters if you are attempting to import anything larger than a token database because the defaults are often very low. This is especially true for MaxNoOfAttributes.

MaxNoOfAttributes

The MaxNoOfAttributes parameter sets the number of attributes (including fields, indexes, and primary keys) that can be defined in the cluster.

The default value for this parameter is 1000, with the minimum possible value being 32. There is no maximum. Each attribute consumes around 200 bytes of storage per node (even if it is not used) due to the fact that all metadata is fully replicated on the servers.

This is quite a common limit to hit, and you are likely to be greeted with error 708: “No more attribute metadata records (increase MaxNoOfAttributes)” if you try importing a big table without first increasing this parameter.

MaxNoOfTables

The MaxNoOfTables parameter might appear fairly self-explanatory, but remember that a table object is allocated for each table, unique hash index, and ordered index, even though, in the case of indexes, it is hidden from view. This parameter sets the maximum number of table objects for the cluster as a whole.

For each attribute that has a BLOB data type, an extra table is also used to store most of the BLOB data. These tables must also be taken into account when defining the total number of tables.

The default value of this parameter is 128, the minimum is 8, and the maximum is 1600. Each table object consumes approximately 20KB per node, regardless of whether it is used.

MaxNoOfOrderedIndexes

For each ordered index in the cluster, an object is allocated to describe what is being indexed and in which storage segments it resides. Each unique index and primary key has both an ordered index and a hash index.

The default value of the MaxNoOfOrderedIndexes parameter is 128. Each object consumes approximately 10KB of data per node, regardless of whether it is used.

MaxNoOfUniqueHashIndexes

For each unique index that is not a primary key, a special table is allocated that maps the unique key to the primary key of the indexed table. By default, an ordered index is also defined for each unique index. To prevent this, you must use the USING HASH option when defining the unique index. (See Chapter 1 for more detail on the USING HASH option.)

The default value of the MaxNoOfUniqueHashIndexes parameter is 64. Each index consumes approximately 15KB per node.

MaxNoOfTriggers

Internal update, insert, and delete triggers are allocated for each unique hash index. (This means that three triggers are created for each unique hash index.) However, an ordered index requires only a single trigger object. Backups also use three trigger objects for each normal table in the cluster.

The MaxNoOfTriggers parameter sets the maximum number of trigger objects in the cluster. The default value for this parameter is 768.
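
The following sketch pulls these metadata limits together for a moderately large schema; the numbers are illustrative assumptions and should be sized against your own tables and indexes:

[NDBD DEFAULT]
MaxNoOfAttributes=5000        # ~200 bytes each per node, used or not
MaxNoOfTables=512             # remember BLOB columns and unique indexes use extra tables
MaxNoOfOrderedIndexes=512
MaxNoOfUniqueHashIndexes=256
MaxNoOfTriggers=2048          # three per unique hash index, one per ordered index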

Boolean Parameters

The behavior of data nodes is affected by a set of parameters that take on Boolean values. You can specify these parameters as TRUE by setting them equal to 1 or Y and as FALSE by setting them equal to 0 or N.

LockPagesInMainMemory

For a number of operating systems, including Solaris and Linux, it is possible to lock a process into memory and so avoid any swapping to disk. You can use the LockPagesInMainMemory parameter to help guarantee the cluster’s real-time characteristics.

By default, this feature is disabled. Of course, if you do this and then don’t have enough RAM, you will have a problem (typically a node crash). You will also have no protection from a complete cluster shutdown; you will lose all your data and will have to restore from your last full backup. We don’t recommend that you use this feature.

StopOnError

The StopOnError parameter specifies whether an ndbd process is to exit or to perform an automatic restart when an error condition is encountered. This feature is enabled by default.

You will notice that when you start ndbd, you actually get two processes. If StopOnError is disabled, the “angel” process restarts the “child” process whenever the child meets an error condition. Otherwise (the default behavior), both processes exit.

Quite commonly, you will want to turn off this parameter, which makes the ndbd process attempt to restart automatically if it encounters an error. This is particularly useful in the event of a short-term network failure that may be fixed later on.

The drawback of setting this parameter is that, in theory, the data node may keep failing and restarting over and over, which could lead to overall cluster performance issues.
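
For example, to have storage nodes restart themselves after a crash, you would disable the parameter (either Boolean form works):

[NDBD DEFAULT]
StopOnError=0    # or StopOnError=N; the angel process now restarts ndbd after an error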

Diskless

It is possible to specify cluster tables as diskless, meaning that tables are not checkpointed to disk and that no logging occurs. Such tables exist only in main memory. A consequence of using diskless tables is that neither the tables nor the records in those tables will be preserved after a crash. However, when operating in diskless mode, it is possible to run ndbd on a diskless computer.

The ability to employ diskless mode on a per-table basis is planned for the 5.1 release of MySQL Cluster but is not currently supported.

Currently, when the Diskless parameter is enabled, backups are performed, but backup data is not actually stored (which makes the backups completely pointless).

Diskless is disabled by default, and we recommend that you not use it unless you are using diskless machines. Using it is very risky, and if your entire cluster shuts down, you will lose all your data (except for any traditional SQL backups you may have).

Controlling Timeouts, Intervals, and Disk Paging

A number of parameters specify timeouts and intervals between various actions in cluster data nodes. Most of the timeout values are specified in milliseconds.

TimeBetweenWatchDogCheck

To prevent the main ndbd thread from getting stuck in an endless loop, a “watchdog” thread checks the main thread. The TimeBetweenWatchDogCheck parameter specifies the number of milliseconds between checks. If the process remains in the same state after three checks, it is terminated by the watchdog thread.

The TimeBetweenWatchDogCheck parameter can easily be changed for purposes of experimentation or to adapt to local conditions. It can be specified on a per-node basis, although there seems to be little reason for doing so.

The default timeout is 4000 milliseconds (that is, 4 seconds).

StartPartialTimeout

The StartPartialTimeout parameter specifies the time that the cluster will wait for all storage nodes to come up before the cluster initialization routine is invoked. This timeout is used to avoid a partial cluster startup whenever possible.

The default value is 30000 milliseconds (that is, 30 seconds). A value of 0 means an eternal timeout; in other words, the cluster will start only when all nodes are available.

StartPartitionedTimeout

If the cluster is ready to start after waiting for StartPartialTimeout milliseconds but is still in a possibly partitioned state (in other words, there is at least one node in each node group not yet connected), the cluster will wait this further period of time before starting. If it is not in a potentially partitioned state—for example, if all the nodes in one node group have connected—it will start immediately.

The default timeout is 60000 milliseconds (that is, 60 seconds).

StartFailureTimeout

If a storage node has not completed its startup sequence within the time specified by the StartFailureTimeout parameter, the node startup fails. If you set this parameter to 0, no data node timeout is applied, and nodes will continue to attempt to start eternally.

The default value is 60000 milliseconds (that is, 60 seconds). For data nodes that contain extremely large amounts of data, this parameter should be increased. For example, in the case of a storage node containing several gigabytes of data, a period as long as 10 to 15 minutes (that is, 600,000 to 1,000,000 milliseconds) might be required in order to perform a node start.

HeartbeatIntervalDbDb

One of the primary methods of discovering failed storage nodes is heartbeats. The HeartbeatIntervalDbDb parameter states how often heartbeat signals are sent and how often to expect to receive them. After missing three heartbeat intervals in a row, the node is declared dead. Thus the maximum time for discovering a failure through the heartbeat mechanism is four times the heartbeat interval.

The default heartbeat interval is 1500 milliseconds (that is, 1.5 seconds). This parameter must not be changed drastically and should not vary widely between nodes. If one node sends heartbeats every 5,000 milliseconds but the node watching it expects one every 1,000 milliseconds, the first node will obviously be declared dead very quickly. This parameter can be changed during an online software upgrade, but only in small increments. (If half the nodes in the cluster are expecting to receive heartbeats every 1.5 seconds but you have changed the other half to send them every 15 seconds, you will get nodes thinking that other nodes are dead.)

HeartbeatIntervalDbApi

Each storage node sends heartbeat signals to each MySQL server (SQL node) to ensure that it remains in contact. If a MySQL server fails to send a heartbeat in time, it is declared dead, in which case all ongoing transactions are completed and all resources are released. The SQL node cannot reconnect until all activities started by the previous MySQL instance have been completed. The three-heartbeat criterion for the HeartbeatIntervalDbApi parameter is the same as that described for HeartbeatIntervalDbDb.

The default interval is again 1500 milliseconds (that is, 1.5 seconds). This interval can vary between individual storage nodes because each storage node watches the MySQL servers connected to it independently of all other storage nodes. However, again, there seems limited point in varying this, so we suggest that you define it in the DEFAULT section.

TimeBetweenLocalCheckpoints

The sizes of all write operations executed since the start of the previous local checkpoint are added together to decide when nodes should start the next local checkpoint. The TimeBetweenLocalCheckpoints parameter specifies the minimum amount of write activity between local checkpoints and can be used to prevent storage nodes in high-update/insert clusters from starting the next local checkpoint immediately after the previous one has finished. You can also use it to ensure that local checkpoints follow each other with no gap (independent of the cluster’s workload) by setting it to 6 or less.

It should be noted that you are not actually setting a time in milliseconds: You are specifying the base-2 logarithm of the number of 4-byte words of write operations that must build up since the last local checkpoint started. The default value of 20 therefore means 4MB (4 × 2²⁰ bytes) of write operations, 21 would mean 8MB, and so on, up to a maximum value of 31, which equates to 8GB of write operations.

Note

If you are wondering why the TimeBetweenLocalCheckpoints parameter has been defined this way, you are not alone! You should get used to it, however, because a value given in milliseconds would break your configuration by being far too large (2²⁰⁰⁰ words, for example, if you entered 2000 in an attempt to mean 2 seconds).
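
A short sketch of the logarithmic scale (the second value is an illustrative assumption for a write-heavy cluster):

[NDBD DEFAULT]
# Default: 2^20 x 4 bytes = 4MB of writes between local checkpoint starts
TimeBetweenLocalCheckpoints=20
# A write-heavy cluster might use 24 instead: 2^24 x 4 bytes = 64MB
# TimeBetweenLocalCheckpoints=24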

TimeBetweenGlobalCheckpoints

When a transaction is committed, it is committed into the main memory (RAM) of all nodes on which the data is mirrored. However, transaction log records are not flushed to disk (hard drive) as part of the commit. The reasoning behind this behavior is that having the transaction safely committed on at least two autonomous host machines (in RAM) should meet reasonable standards for durability and is essential for good write/update performance. It means that during a complete cluster shutdown, you can technically lose a small number of fully committed transactions (the number depends on the value of this parameter).

It is also important to ensure that even the worst of cases—a complete crash of the cluster—is handled sensibly. To guarantee that this happens, all transactions taking place within a given interval are put into a global checkpoint, which can be thought of as a set of committed transactions that have been flushed to disk.

In other words, as part of the commit process, a transaction is placed in a global checkpoint group. Later on, this group’s log records are flushed to disk, and then the entire group of transactions is safely committed to disk on all computers in the cluster as well.

The TimeBetweenGlobalCheckpoints parameter defines the interval between global checkpoints. The default is 2000 milliseconds (that is, 2 seconds). This means that in a worst-case scenario, you could lose up to 2 seconds of committed transactions from your cluster in the event of a total cluster shutdown the instant after the previous global checkpoint starts. If this is unacceptable to you, you should decrease the value of this parameter, but you need to realize that this will impose an extra load on your storage nodes.

TimeBetweenInactiveTransactionAbortCheck

Timeout handling is performed by checking a timer on each transaction once during every interval specified by the TimeBetweenInactiveTransactionAbortCheck parameter. Thus, if this parameter is set to 1000 milliseconds, every transaction will be checked for timing out once per second.

The default value for this parameter is 1000 milliseconds (that is, 1 second).

TransactionInactiveTimeout

If a transaction is currently not performing any queries but is waiting for further user input, the TransactionInactiveTimeout parameter states the maximum time the transaction can remain inactive in this way before it is aborted.

The default for this parameter is 0 (no timeout). For a real-time database that needs to ensure that no transaction keeps locks for too long a time, this parameter should be set to a small value, but not 0. The unit is milliseconds.

TransactionDeadlockDetectionTimeout

When a node executes a query involving an update or a write (in other words, creating a transaction), the node waits for the other nodes in the cluster to respond before continuing. A failure to respond can occur for any of the following reasons:

•   The node is “dead.”

•   The operation has entered a lock queue.

•   The node requested to perform the action is heavily overloaded.

The TransactionDeadlockDetectionTimeout parameter states how long the transaction coordinator will wait for query execution by another node before aborting the transaction, and this is important for both node failure handling and deadlock detection. Setting it too high can cause undesirable behavior in situations involving deadlocks and node failure.

The default timeout value is 1200 milliseconds (that is, 1.2 seconds).

NoOfDiskPagesToDiskAfterRestartTUP

When executing a local checkpoint, the storage nodes flush all data pages to disk. Merely doing this as quickly as possible without any moderation is likely to impose excessive loads on processors, networks, and disks. To control the write speed, the NoOfDiskPagesToDiskAfterRestartTUP parameter specifies how many pages per 100 milliseconds are to be written. In this context, a page is defined as 8KB; thus, this parameter is specified in units of 80KBps. Therefore, setting NoOfDiskPagesToDiskAfterRestartTUP to a value of 20 means writing 1.6MB of data pages to disk each second during a local checkpoint. Put simply, you should increase it if you want your local checkpoints to take less time but use more resources.

This value controls the writing of all data stored in DataMemory, so it includes the UNDO log records for data pages.

The default value is 40 (that is, 3.2MB of data pages per second).

NoOfDiskPagesToDiskAfterRestartACC

The NoOfDiskPagesToDiskAfterRestartACC parameter uses the same units as NoOfDiskPagesToDiskAfterRestartTUP (units of 80KB) and acts in a similar fashion, but it limits the speed of writing index pages from index memory.

The default value for this parameter is 20 index memory pages per second (that is, 1.6MBps).

NoOfDiskPagesToDiskDuringRestartTUP

The NoOfDiskPagesToDiskDuringRestartTUP parameter is similar to NoOfDiskPagesToDiskAfterRestartTUP and NoOfDiskPagesToDiskAfterRestartACC, but it is concerned with the local checkpoints executed on a node while that node is restarting. A local checkpoint is always performed as part of a node restart. During a node restart, it is possible to write to disk at a higher speed than at other times because fewer activities are being performed in the node.

The default value is still 40 (that is, 3.2MBps), but we recommend that you increase this, depending on the speed of your nodes.

NoOfDiskPagesToDiskDuringRestartACC

The NoOfDiskPagesToDiskDuringRestartACC parameter controls the number of index memory pages that can be written to disk during the local checkpoint phase of a node restart.

As with NoOfDiskPagesToDiskAfterRestartTUP and NoOfDiskPagesToDiskAfterRestartACC, values for this parameter are expressed in terms of 8KB pages written per 100 milliseconds (that is, 80KBps).

The default value is 20 (that is, 1.6MBps).
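
Putting the four disk-paging parameters together (each unit is one 8KB page per 100 milliseconds, or 80KBps; the non-default values are illustrative assumptions):

[NDBD DEFAULT]
NoOfDiskPagesToDiskAfterRestartTUP=40    # 40 x 80KBps = 3.2MBps (the default)
NoOfDiskPagesToDiskAfterRestartACC=20    # 1.6MBps (the default)
NoOfDiskPagesToDiskDuringRestartTUP=80   # 6.4MBps; restarts can afford faster writes
NoOfDiskPagesToDiskDuringRestartACC=40   # 3.2MBps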

ArbitrationTimeout

The ArbitrationTimeout parameter specifies the time that the data node will wait for a response from the arbitrator to an arbitration message. If this time is exceeded, it is assumed that the network has split (and it will then carry out the relevant logic discussed in Chapter 1).

The default value is 1000 milliseconds (that is, 1 second).

Buffering

Buffers are used as front ends to the file system when writing log records to disk, to improve performance (the storage nodes do not have to wait for the disk to catch up). If the node is running in diskless mode, these parameters can be set to their minimum values without penalty due to the fact that disk writes are “faked” (that is, they do not actually take place) by the NDB storage engine’s file system abstraction layer.

UndoIndexBuffer

The UNDO buffer is used during local checkpoints. The NDB storage engine uses a recovery scheme based on checkpoint consistency in conjunction with an operational REDO log. Put simply, it regularly takes a snapshot of the data in the cluster (a local checkpoint), which it does by applying the UNDO log to remove half-complete transactions and then flushing the resulting consistent copy of the database to disk. It also regularly stores the REDO log (which contains the more recent transactions) to disk as part of the global checkpoint process. In the event of a failure, it restores the last complete local checkpoint from disk and then applies the REDO log up to the last global checkpoint.

The UNDO index buffer is used to store updates on the primary key hash index. Inserts and deletions rearrange the hash index; the NDB storage engine writes UNDO log records that map all physical changes to an index page so that they can be undone at system restart. It also logs all active insert operations for each fragment at the start of a local checkpoint.

Reads and updates set lock bits and update a header in the hash index entry. These changes are handled by the page-writing algorithm to ensure that these operations need no UNDO logging.

The UndoIndexBuffer parameter is 2MB by default. The minimum value is 1MB, and for most applications, the minimum is sufficient. If your applications use extremely large and/or numerous insertions and deletions together with large transactions and large primary keys, it may be necessary to increase the size of this buffer. If this buffer is too small, the NDB storage engine issues internal error code 677: “Index UNDO buffers overloaded.”

UndoDataBuffer

The UNDO data buffer plays the same role as the UNDO index buffer, except that it is used with regard to data memory rather than index memory. This buffer is used during the local checkpoint phase of a fragment for insertions, deletions, and updates.

Because UNDO log entries tend to grow larger as more operations are logged, the UndoDataBuffer parameter is typically set larger than its index memory counterpart, with a default value of 16MB.

For some applications, this amount of memory may be unnecessarily large. In such cases, it is possible to decrease this size down to a minimum of 1MB.

It is rarely necessary to increase the size of this buffer. If there is such a need, it is a good idea to check whether the disks can actually handle the load caused by database update activity. A lack of sufficient disk space or I/O capacity cannot be overcome by increasing the size of this buffer.

If this buffer is too small and gets congested, the NDB storage engine issues internal error code 891: “Data UNDO buffers overloaded.”

RedoBuffer

All update activities need to be logged. The REDO log makes it possible to replay these updates whenever the system is restarted. The NDB recovery algorithm uses a “fuzzy” checkpoint of the data together with the UNDO log, and then it applies the REDO log to play back all changes up to the restoration point.

The RedoBuffer parameter is 8MB by default. The minimum value is 1MB.

If this buffer is too small, the NDB storage engine issues error code 1221: “REDO log buffers overloaded.”

Backup Parameters

The parameters discussed in the following sections define memory buffers set aside for execution of online backups.

BackupDataBufferSize

In creating a backup, there are two buffers used for sending data to the disk. The backup data buffer is used to fill in data recorded by scanning a node’s tables. When this buffer has been filled to the level specified as BackupWriteSize (see the “BackupWriteSize” section), the pages are sent to disk. While flushing data to disk, the backup process can continue filling this buffer until it runs out of space. When this happens, the backup process pauses the scan and waits until some disk writes have completed and have thus freed up memory so that scanning may continue.

The default value for the BackupDataBufferSize parameter is 2MB.

BackupLogBufferSize

The backup log buffer fulfills a role similar to that played by the backup data buffer, except that it is used for generating a log of all table writes made during execution of the backup. The same principles apply for writing these pages as with the backup data buffer, except that when there is no more space in the backup log buffer, the backup fails. For that reason, the backup log buffer must be big enough to handle the load caused by write activities while the backup is being made.

The default value for the BackupLogBufferSize parameter should be sufficient for most applications. In fact, a backup failure is more likely to be caused by insufficient disk write speed than by the backup log buffer becoming full. If the disk subsystem is not configured for the write load caused by applications, the cluster is unlikely to be able to perform the desired operations.

It is preferable to configure cluster nodes in such a manner that the processor, rather than the disks or the network connections, becomes the bottleneck.

The default value is 2MB.

BackupWriteSize

The BackupWriteSize parameter specifies the size of messages written to disk by the backup log and backup data buffers.

The default value is 32KB.
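As an illustration, the three backup parameters could be written out explicitly in the [NDBD DEFAULT] section as follows. These are simply the defaults; you would raise the buffer sizes only for write-heavy workloads:


[NDBD DEFAULT]
# Buffer filled by the table scan during a backup
BackupDataBufferSize=2M
# Buffer that logs table writes made while the backup runs;
# if it fills up, the backup fails
BackupLogBufferSize=2M
# Size of each message flushed to disk from the two buffers
BackupWriteSize=32K
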

Logging

Some parameters are set as a level from 0 to 15, which determines how much information is sent to stdout (typically printed onscreen). The default is to send some (but not much) information about startup events to stdout and nothing else. Note that this is completely separate from the logging covered earlier, in the section “LogDestination.” These settings affect each individual storage node, although it makes sense to use the same values for all storage nodes (in other words, to set them in the [NDBD DEFAULT] section).

LogLevelStartup

The LogLevelStartup parameter specifies the reporting level for events generated during startup of the process.

The default level is 1.

LogLevelShutdown

The LogLevelShutdown parameter specifies the reporting level for events generated as part of graceful shutdown of a node.

The default level is 0.

LogLevelStatistic

The LogLevelStatistic parameter specifies the reporting level for statistical events such as number of primary key reads, number of updates, number of inserts, information relating to buffer usage, and so on.

The default level is 0.

LogLevelCheckpoint

The LogLevelCheckpoint parameter specifies the reporting level for events generated by local and global checkpoints.

The default level is 0.

LogLevelNodeRestart

The LogLevelNodeRestart parameter specifies the reporting level for events generated during node restart.

The default level is 0.

LogLevelConnection

The LogLevelConnection parameter specifies the reporting level for events generated by connections between cluster nodes.

The default level is 0.

LogLevelError

The LogLevelError parameter specifies the reporting level for events generated by errors and warnings by the cluster as a whole. These errors do not cause any node failure but are still considered worth reporting.

The default level is 0.

LogLevelInfo

The LogLevelInfo parameter specifies the reporting level for events generated for information about the general state of the cluster.

The default level is 0.
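For example, to get verbose startup reporting plus some checkpoint and error reporting on every storage node, you might set the following in the [NDBD DEFAULT] section. The levels chosen here are arbitrary illustrations on the 0 to 15 scale, not recommendations:


[NDBD DEFAULT]
# Report everything during startup
LogLevelStartup=15
# Report basic checkpoint activity
LogLevelCheckpoint=3
# Report errors and warnings
LogLevelError=7
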

Parameters to Define for Each Storage Node (NDBD)

Although you could set all the parameters we have looked at so far on a per-storage-node basis, it generally makes sense to use the same values for every node in the cluster. Therefore, unless there is a specific reason for one node to have a different setting, we suggest that you define all parameters in the [NDBD DEFAULT] section.

This leaves the Id and Hostname parameters to set in each [NDBD] section.

An Example of a Storage Section of a config.ini File

Here is an example of a configuration with two storage nodes. It defines the parameters that most people will have to define, and this would make a good first configuration file to work from:


[NDBD DEFAULT] 
# 2 replicas (min. for redundancy; requires 2 storage nodes min.) 
NoOfReplicas=2 
# Standard data directory 
DataDir=/var/lib/mysql-cluster

# 512MB storage for data, 64MB for primary key hash indexes. 
DataMemory=512M 
IndexMemory=64M 
# Define MaxNoOfConcurrentOperations; leave 
# MaxNoOfLocalOperations as default (1.1*
# MaxNoOfConcurrentOperations) 
MaxNoOfConcurrentOperations=10000 
# Slightly increase the number of ordered indexes, unique hash 
# indexes, tables and attributes allowed from defaults 
# When importing tables, set this as high as you can 
# Then reduce based on output from ALL DUMP 1000 command 
MaxNoOfOrderedIndexes=512 
MaxNoOfUniqueHashIndexes=256 
MaxNoOfTables=256 
MaxNoOfAttributes=1500 

# 2 storage nodes 
[NDBD] 
# Allow auto allocation of ID, so just set hostname: 
Hostname = 10.0.0.5 

[NDBD] 
Hostname = 10.0.0.6 


Hopefully, you can see from this example how you would define the storage node section of your config.ini file, which is by far the largest and most important section.

Remember that if you change anything to do with the file system, you must restart the storage nodes by using the --initial parameter; the procedure for a cluster restart like this without any downtime is given at the beginning of this chapter, under the procedure for a rolling online configuration change.
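For example, restarting storage node 3 during such a rolling restart might look like this (a sketch: the node ID is an example, and if your version of ndb_mgm lacks the -e option, type the commands interactively in the management client):


# Stop storage node 3 from the management client
ndb_mgm -e "3 STOP"
# On that node's own host, restart the daemon with --initial
ndbd --initial
# Wait until SHOW reports the node as started before moving on
ndb_mgm -e "SHOW"
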

SQL Nodes

SQL nodes can either be MySQL servers that connect to the cluster—which most are—or other cluster-specific binaries, such as the binaries described in Chapter 3, “Backup and Recovery”.

Parameters to Define for All SQL Nodes (MYSQLD_DEFAULT)

We mention the following parameters for reference. It is unlikely that you are going to want to change them from the defaults. In addition, these only apply to SQL nodes that are MySQL servers connecting to the cluster—not to other binaries that may connect to the cluster.

ArbitrationRank

See the parameter with the same name in the section “Parameters to Define for All Management Nodes (NDB_MGMD_DEFAULT).” This parameter is identical to that one except that its default value is 2 (low priority).

ArbitrationDelay

Setting the ArbitrationDelay parameter to any value other than 0 (the default) means that the arbitrator's responses to arbitration requests will be delayed by the stated number of milliseconds. It is usually not necessary to change this value.

BatchByteSize

For queries that are translated into full table scans or range scans on indexes, it is important for best performance to fetch records in sensibly sized batches. It is possible to set the proper size both in terms of number of records (BatchSize) and in terms of bytes (BatchByteSize). The actual batch size is limited by both parameters (that is, the batch is made as large as possible while remaining below both limits).

The speed at which queries are performed can vary by more than 40%, depending on how the BatchByteSize parameter is set. In future releases, the MySQL server will make educated guesses on how to set parameters relating to batch size, based on the query type.

This parameter is measured in bytes and by default is equal to 32KB. If you have very fast hardware and good networking between nodes, you might want to increase this.

BatchSize

The BatchSize parameter is measured in terms of the number of records and is by default set to 64. The maximum size is 992. If you have very fast hardware and good networking between nodes, you might want to increase this.

MaxScanBatchSize

The batch size is the size of each batch sent from each storage node. Most scans are performed in parallel on the storage nodes, so in order to protect the MySQL server from receiving too much data from many nodes in parallel, the MaxScanBatchSize parameter sets a limit on the total batch size over all nodes. If your SQL nodes are not overloaded, you might want to increase this value significantly.

The default value of this parameter is set to 256KB. Its maximum size is 16MB.
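Here is a sketch of how these parameters could be grouped in config.ini; the non-default batch values illustrate what you might try on a fast, well-networked cluster and are not recommendations:


[MYSQLD DEFAULT]
# Low arbitration priority (the default for SQL nodes)
ArbitrationRank=2
# Batch limits for table scans and range scans on indexes
BatchSize=128
BatchByteSize=64K
# Cap on the total batch size across all storage nodes
MaxScanBatchSize=512K
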

Parameters to Define for Each SQL Node

As with storage nodes, you can set all the parameters that we suggest you put in the DEFAULT section in each individual SQL node part of the configuration file. If you have different hardware for different SQL nodes, we suggest that you do this. However, if you have identical hardware for all SQL nodes, there seems little point.

This leaves the Id and Hostname parameters to set in each [MYSQLD] section. We suggest that at the bottom of the SQL nodes section of your config.ini file, you leave a few SQL nodes with no Hostname so that you can connect extra cluster-related daemons from any host within your network. (When giving this advice, we assume that you have taken earlier advice and that your cluster is on a private network, so there is no chance of anyone evil connecting to your cluster.) You will probably find this useful later on.
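Here is a minimal sketch of what those spare slots might look like at the end of the SQL node section; the hostnames are arbitrary examples continuing the two-node layout from earlier:


# Regular SQL nodes
[MYSQLD]
Hostname=10.0.0.5
[MYSQLD]
Hostname=10.0.0.6
# Spare slots with no Hostname; any host on the network may connect
[MYSQLD]
[MYSQLD]
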

A Final Note on SQL Nodes

You should remember that if you are going to attempt to recover a backup, you need one SQL node per storage node available for the ndb_restore command to connect to. If you have followed our advice and have a few spare [MYSQLD] sections free with no Hostname defined, you should be able to do this without any changes.
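To illustrate why those spare slots matter, here is roughly the shape of an ndb_restore invocation; each running copy of ndb_restore occupies one free API slot while it works. This is only a sketch (the management host, node ID, backup ID, and path are placeholders, and the full procedure is covered in Chapter 3):


# Restore metadata (-m) and data (-r) for storage node 3 from backup 1
ndb_restore -c mgmhost:1186 -n 3 -b 1 -m -r /var/lib/mysql-cluster/BACKUP/BACKUP-1
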

Remember that there is no problem at all with specifying two SQL nodes with the same IP address. The first node ID will go to the first one to connect, and so on (you can specify which node is which with connectstrings, but we do not really see the point in doing so).

You should by now be familiar enough with producing configuration files that we do not give an example of the SQL node section of a configuration file here. There is, however, a full config.ini example in the next section, “Using Multiple Management Nodes.”

Using Multiple Management Nodes

To use multiple management nodes, you first need to create a config.ini file that is completely identical on all management nodes. (No checking is done to enforce this, but if the files differ, you will end up with a giant mess and a cluster crash.) When using multiple management nodes, you must assign all node IDs manually in the configuration file; you cannot rely on auto-assignment, because there would then be multiple management nodes handing out IDs.

You then need to decide which of your management nodes is going to be primary. For example, say that the primary management node is 10.0.0.0 and the backup is 10.0.0.1, with a storage node and a SQL node on each of 10.0.0.2 and 10.0.0.3:


[NDB_MGMD DEFAULT] 
PortNumber=1186 
LogDestination=CONSOLE;SYSLOG:facility=syslog;FILE:filename=/var/log/cluster-log 
DataDir=/var/lib/mysql-cluster 
ArbitrationRank=1 

# First (PRIMARY) mgm node 
[NDB_MGMD] 
Id = 1 
Hostname = 10.0.0.0
 
# Second (BACKUP) mgm node 
[NDB_MGMD] 
Id = 2 
Hostname = 10.0.0.1 
 
# Storage nodes 
[NDBD DEFAULT] 
DataDir=/var/lib/mysql-cluster 
[NDBD] 
Id=3 
Hostname=10.0.0.2 
[NDBD] 
Id=4 
Hostname=10.0.0.3 
 
# SQL nodes 
[MYSQLD] 
Id=5 
Hostname=10.0.0.2 
[MYSQLD] 
Id=6 
Hostname=10.0.0.3 
# Spare SQL node slots (no Hostname, so any host may connect) 
[MYSQLD] 
Id=7 
[MYSQLD] 
Id=8 


Now, on all four servers, you place the following in /etc/my.cnf:


[mysqld] 
ndbcluster 
#connectstring: primary,secondary management nodes 
ndb-connectstring=10.0.0.0,10.0.0.1 
[mysql_cluster] 
ndb-connectstring=nodeid=x,10.0.0.0,10.0.0.1 


Notice the nodeid=x in the second connectstring: Make sure you put the correct node ID (as specified in the configuration file) in here.
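For example, on 10.0.0.2, where the storage node was given ID 3 in the configuration file above, the section might read as follows (a sketch; adjust the node ID on each host):


[mysql_cluster] 
# Read by the NDB binaries (such as ndbd) on this host
ndb-connectstring=nodeid=3,10.0.0.0,10.0.0.1 
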

Now, on the primary management server, you start ndb_mgmd as per the instructions in Chapter 1, in the section “Starting a Management Node”. Once it is started, you switch to the secondary management server and start ndb_mgmd there exactly as before. Finally, you start the two storage nodes (ndbd) and restart MySQL on the two SQL nodes. You should see something like this:


ndb_mgm> SHOW
Connected to Management Server at: 10.0.0.0:1186
Cluster Configuration
--------------------- 
[ndbd(NDB)]     2 node(s)
id=3    @10.0.0.2  (Version: 5.0.12, Nodegroup: 0)
id=4    @10.0.0.3  (Version: 5.0.12, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 2 node(s)
id=1   (Version: 5.0.12)
id=2   (Version: 5.0.12)

[mysqld(API)]   4 node(s)
id=5    @10.0.0.2  (Version: 5.0.12)
id=6    @10.0.0.3  (Version: 5.0.12)
id=7   (not connected, accepting connect from any host)
id=8   (not connected, accepting connect from any host)


Note

Note that 10.0.0.0 is the IP address of the primary management daemon. If you use ndb_mgm on any of the four servers (including the server that is running the backup ndb_mgmd daemon), you should still get this output: The client connects to the primary because at this stage the primary is working.

Now, let’s see what happens when you unplug the power cable of the server running the primary management daemon, 10.0.0.0. First, start ndb_mgm on one of the other three servers (that is, not the one you are about to unplug). Then unplug the primary management server and try to run a SHOW command:


ndb_mgm> SHOW 
Could not get status 
*     0: No error 
*        Executing: ndb_mgm_disconnect 


It appears not to have worked. But this is what you expected; the management client is still trying to talk to the primary management daemon. If you exit and reopen ndb_mgm, it should work:


ndb_mgm> exit
[root@s1 mysql-cluster]# ndb_mgm 
-- NDB Cluster -- Management Client -- 
ndb_mgm> show; 
Connected to Management Server at: 10.0.0.1:1186 
Cluster Configuration 
---------------------
[ndbd(NDB)]     2 node(s)
id=3    @10.0.0.2  (Version: 5.0.12, Nodegroup: 0)
id=4    @10.0.0.3  (Version: 5.0.12, Nodegroup: 0, Master) 

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 10.0.0.0) 
id=2    @10.0.0.1  (Version: 5.0.12)

[mysqld(API)]   4 node(s)
id=5   (Version: 5.0.12) 
id=6   (Version: 5.0.12)
id=7   (not connected, accepting connect from any host)
id=8   (not connected, accepting connect from any host)


Note

Note that you are now connected to the IP address of the backup management node (10.0.0.1).

The NDB and API nodes do not need to be restarted in the event of a primary management daemon failure. They will start communicating with the backup management daemon almost immediately, and they will switch back as and when they need to.



1 By doing this, the cluster does not have to block writes while the checkpoint is taken. The UNDO log is activated on one table fragment at a time.
