In this section we discuss miscellaneous tools and techniques for using SQL and MySQL. We introduce:
Choosing keys and indexes for fast searching
Elementary database-tuning techniques
Adding and deleting users of a DBMS, and changing user permissions
Limitations of MySQL
As
discussed earlier in our introduction to SQL, each table should have
a PRIMARY KEY
definition as part of the
CREATE TABLE
statement. A primary key is an
attribute—or set of attributes—that uniquely identifies a
row in a table. Storing two rows with the same primary key
isn’t permitted and, indeed, an attempt to
INSERT
duplicate primary keys produces an error.
In MySQL, the attribute values of the primary key are stored in an index to allow fast access to a row. The default MySQL index type is fast for queries that find a specific row, a range of rows, for joins between tables, grouping data, ordering data, and finding minimum and maximum values. Indexes don’t provide any speed improvement for retrieving all the rows in a table or for other query types.
Indexes are also useful for fast access to rows by values other than those that are associated with attributes in the primary key. For example, in the customer table, you might define an index by adding the clause:
KEYnamecity
(surname
,firstname
,city
)
to the CREATE
TABLE
statement.
After you define this index, some queries that select a particular
customer through a WHERE
clause can use it.
Consider an example:
SELECT * FROM customer WHERE surname = 'Marzalla' AND firstname = 'Dimitria' AND city = 'St Albans';
This query can use the new index to locate—in at most a few
disk accesses—the row that matches the search criteria. Without
the index, the DBMS must scan all the rows in the
customer table and compare each row to the
WHERE
clause. This might be quite slow and
certainly requires significantly more disk accesses than the
index-based approach (assuming the table has more than a few rows).
A particular feature of DBMSs is that they develop a query evaluation
strategy and optimize it without any interaction from the user or
programmer. If an index is available, and it makes sense to use it in
the context of a query, the DBMS does this automatically. All you
need to do is identify which queries are common, and make an index
available for those common queries by adding the
KEY
clause to the CREATE TABLE
statement or using ALTER TABLE
on an existing
table.
Careful index design is important. The namecity
index we have defined can also speed queries other than those that
supply a complete surname
,
firstname
, and city
. For
example, consider a query:
SELECT * FROM customer WHERE surname = 'LaTrobe' AND firstname = 'Anthony';
This query can also use the index namecity
,
because the index permits access to rows in sorted order first by
surname
, then firstname
, and
then city
. With this sorting, all
“LaTrobe, Anthony” index entries
are clustered together in the index. Indeed, the index can also be
used for the query:
SELECT * FROM customer WHERE surname LIKE 'Mar%';
Again, all surnames beginning with “Mar” are clustered together in the index. However, the index can’t be used for a query such as:
SELECT * FROM customer WHERE firstname = 'Dimitria' AND city = 'St Albans';
The index can’t be used because the leftmost
attribute named in the index, surname
,
isn’t part of the WHERE
clause.
In this case, all rows in the customer table
must be scanned and the query is much slower (again assuming there
are more than a few rows in the customer table, and assuming there is
no other index).
Careful choice of the order of
attributes in a KEY
clause is important. For an index to be usable in a query, the
leftmost attribute must appear in a WHERE
clause.
There are other cases in which an index can’t be
used, such as when a query contains an OR
that
isn’t on an indexed attribute:
SELECT * FROM customer WHERE surname = 'Marzalla' OR email = '[email protected]';
Again, the customer table must be completely
scanned, because the second condition,
email='[email protected]'
, requires all rows
to be retrieved as there is no index available on the attribute
email
. Also, the case where the
OR
ed attribute isn’t the leftmost
attribute in an index requires a complete scan of the
customer table. The following example requires a
complete scan:
SELECT * FROM customer WHERE firstname = 'Dimitria' OR surname = 'Marzalla';
If all the attributes in the index are used in all the queries, to
optimize index size, the leftmost attribute in the
KEY
clause should be the attribute with the
highest number of duplicate entries.
Because indexes speed up queries, why not create indexes on all the attributes you can possibly search on? The answer is that while indexes are fast for searching, they consume space and require updates each time rows are added or deleted, or key attributes are changed. So, if a database is largely static, additional indexes have low overheads, but if a database changes frequently, each additional index slows the update process significantly. In either case, indexes consume additional space, and unnecessary indexes should be avoided.
One way to reduce the size of an index and speed updates is to create
an index on a prefix of an attribute. Our namecity
index uses considerable space: for each row in the
customer table, an index entry is up to 120
characters in length because it is created from the combined values
of the surname
, firstname
, and
city
attributes.[7] To reduce space, you can define the index as:
KEY namecity (surname(10),firstname(3),city(2));
This uses only the first 10 characters of surname
,
3 of firstname
, and the first 2 characters of
city
to distinguish index entries. This is quite
reasonable, because 10 characters from a surname distinguishes
between most surnames, and the addition of a few characters from a
first name and the prefix of their city should be sufficient to
uniquely identify almost all customers. Having a smaller index with
less information can also mean that queries are actually faster,
because more index information can be retrieved from disk per second,
and disk retrieval speed is almost always the bottleneck in query
performance.
The space saving is significant with a reduced index. A new index
entry requires only 15 characters, a saving of up to 105 characters,
so index insertions, deletions, and modifications are now likely to
be much faster. Note that for TEXT
and
BLOB
attribute types, a prefix must be taken when
indexing, because indexing the entire attribute is impractical and
isn’t permitted by the MySQL DBMS.
Careful index design is one technique that improves the speed of a DBMS and can reduce the resource requirements of a database. However, comprehensive database tuning is a complex topic that fills many books. We include in this section only a few additional practical ideas to begin to improve the performance of a database system.
As discussed previously, accessing a hard disk is slow and is usually the bottleneck in DBMS performance. More specifically, disk seeking—moving the disk head to get information from another location of the disk—is the slowest component of disk access. Therefore, most techniques described in this section are also techniques that improve performance by minimizing disk space requirements.[8]
Here are some ways to improve DBMS performance:
Carefully choose attribute types and lengths. Where possible, use
small variants such as SMALLINT
or
MEDIUMINT
rather than the regular choice
INT
. When using fixed-length attributes, such as
CHAR
, specify a length that is as short as
practical.
Use fixed-length attributes; that is, try to avoid types such as
VARCHAR
or BLOB
. While
fixed-length text attributes may waste space, scanning fixed-length
rows in a query is much faster than scanning variable-length rows.
Design indexes with care. As discussed in the last section, keep the primary key index as small as possible, create only indexes that are needed, and use prefixes of attributes where possible. Ensure that the leftmost attribute in the index is the most frequently used in queries and, if all attributes are used, make sure the leftmost attribute is the one with the highest number of duplicate entries.
Create a statistics table if aggregate functions such as
COUNT( )
or SUM( )
are
frequently used in queries on large tables. A statistics table stores
only one row that is manually updated with the aggregate values of
another table. For example, if the statistics table maintains the
count of rows in a large customer table, each
time a row is inserted or deleted in the
customer table, the count is updated in the
statistics table. For large tables, this is often faster than
calculating aggregate functions with the slow built-in functions that
require complete processing of all rows.
If large numbers of rows are deleted from a table, or a table containing variable-length attributes is frequently modified, disk space may be wasted. MySQL doesn’t usually remove deleted or modified data; it only marks the location as being no longer in use. Wasted space can affect access speed.
To reorganize a table—by copying data to a temporary location
and back again—MySQL provides the OPTIMIZE
TABLE
command, which should be used periodically.
For example:
OPTIMIZE TABLE customer;
The OPTIMIZE
command should be run when the DBMS
is offline for scheduled maintenance. The command is nonstandard SQL.
It is possible to create different table types for specific tasks.
The default in MySQL is the MyISAM
type, and all
the tables described so far are this table type. For small,
temporary, frequently used lookup tables, a different type, the
heap
table type, can be used. There are other
types, and we briefly discuss alternatives in Chapter 6. More details are provided in Section 9.4 of
the MySQL user manual.
Section 10.7 of the MySQL manual includes other excellent ideas for simple performance improvement.
Another aspect of database tuning is optimizing the performance of the DBMS itself. Included with the MySQL installation is the mysqladmin tool for database administration. Details of the system setup can be found by running the following command from a Linux shell:
% mysqladmin -ppassword
variables
This shows, in part, the following selected system parameters:
join_buffer current value: 131072 key_buffer current value: 8388600 net_buffer_length current value: 16384 record_buffer current value: 131072 sort_buffer current value: 2097144 table_cache current value: 64
The important parameters are those that impact disk use. MySQL has several main-memory buffer parameters that control how much data is kept in memory for processing. These include:
The record_buffer
for scanning all rows in a table
The sort_buffer
for ORDER BY
and GROUP BY
operations
The key_buffer
for storing indexes in main memory
The join_buffer
for joins that
don’t use indexes
In general, the larger these buffers, the more data from disk is cached or stored in memory and the fewer disk accesses are required. However, if the sum of these parameters is near to exceeding the size of the memory installed in the server, the underlying operating system will start to swap data between disk and memory, and the DBMS will be slow. In any case, careful experimentation based on the application is likely to improve DBMS performance.
Section 10.2.3 of the MySQL manual suggests parameter settings when starting the MySQL server. First, for machines with at least 64 MB of memory, large tables in the DBMS, and a moderate number of users, use:
safe_mysqld -O key_buffer=16M -O table_cache=128 -O sort_buffer=4M -O record_buffer=1M &
Second, if there is less than 64 MB of memory available, and there are many users, try the following:
safe_mysqld -O key_buffer=512k -O sort_buffer=100k -O record_buffer=100k &
The following setting might be appropriate for the winestore, because many users are expected, the queries are largely index-based, and the database is small:
safe_mysqld -O key_buffer=512k -O sort_buffer=16k -O table_cache=32 -O record_buffer=8k -O net_buffer=1K &
Even more conservative settings might also be acceptable.
There are two other parameters we have not discussed. The
table_cache
parameter manages the maximum number
of open tables per user connection, while the
net_buffer
parameter sets the minimum size of the
network query buffer in which incoming queries are kept before they
are executed.
The mysqladmin utility can report the status of the DBMS:
% mysqladmin -ppassword
status
The output has the following format:
Uptime: 5721024 Threads: 14 Questions: 7874982 Slow queries: 6 Opens: 115136 Flush tables: 1 Open tables: 62
This gives a brief point-in-time summary of the DBMS status and can
help find more about the number of user connections, queries, and
table use. Similar output can be generated by running the commands
SHOW STATUS
and SHOW VARIABLES
through the MySQL command interpreter.
Information about query performance can be gained with the
benchmark( )
function, which can be used
iteratively for tuning when altering table design or DBMS system
parameters. The following statement illustrates benchmarking:
SELECT benchmark(10000, COUNT(*)) FROM items;
This statement reports the time taken to evaluate 10,000 calls to
COUNT( )
on the items
table.
We have not yet discussed adding and deleting users from the MySQL DBMS. Our rationale in leaving this topic until this final section is that DBMS users aren’t as important in a web database application as in other applications. Because access to the database and DBMS is generally controlled in the application logic of the middle tier, usually only one or two DBMS users are needed.
A user, hugh, who has full control over all aspects of the DBMS and can access the DBMS from the machine that hosts the DBMS, can be created with the statement:
GRANT ALL PRIVILEGES ON *.* TO hugh@localhost
IDENTIFIED BY 'password
' WITH GRANT OPTION;
Allowing access over a network can be added with:
GRANT ALL PRIVILEGES ON *.* TO hugh@"%"
IDENTIFIED BY 'password
' WITH GRANT OPTION;
There is no need to allow network access for a web database application if the middle-tier components—the web server and scripting engine—are installed on the same machine as the DBMS.
This user can then connect to the database from the shell with the command:
% mysql -ppassword
-uhugh
The user information is stored in the mysql database in the user table, which can be explored with:
USE mysql; SELECT * FROM user;
The mysql database and the
user table can be managed in the same way as any
other database. For example, you can update the password of the new
user with the UPDATE
statement:
UPDATE user
SET password=password('newpwd
')
WHERE user='hugh';
Note the use of the password( )
function we
described earlier to encrypt the password for storage in the
user table.
Users can be
added to the system with an INSERT INTO
the
user table in the mysql
database or, as previously illustrated, you can use the
GRANT
statement. Moreover, privileges can be
adjusted with an UPDATE
,
added with GRANT
, or removed with
REVOKE
.
Consider the following example:
GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,DROP
ON winestore.*
TO dave@localhost
IDENTIFIED BY 'password
';
This adds a new user dave and allows him to use
only the SQL statements listed in the winestore
database. The parameter winestore.*
means all
tables within the winestore database.
Privileges can be removed with the REVOKE
statement. For example:
REVOKE DROP,CREATE ON winestore.* FROM dave@localhost
;
If the privilege or privileges are to be revoked for all databases in
the DBMS, not just a single database, winestore.*
can be replaced with *.*
.
The following privileges can be used in GRANT
and
REVOKE
statements:
ALL PRIVILEGES, FILE, RELOAD, ALTER, INDEX, SELECT, CREATE, INSERT, SHUTDOWN, DELETE, PROCESS, UPDATE, DROP, REFERENCES, USAGE
The most significant limitation of MySQL is that it doesn’t support nested queries. However, support is planned in MySQL Version 4. Nested queries are those that contain another query. Consider an example nested query to find the wines that have inventory stock:
SELECT DISTINCT wine_id FROM wine WHERE wine_id IN (SELECT wine_id from inventory);
The query returns the wine_id
values from the
wine table that are found in the
inventory table. Nested queries use the
IN
, NOT
IN
,
EXISTS
, and NOT
EXISTS
operators.
In many cases, a nested query can be rewritten as a join query. For example, to find the wines that are in stock, you can use the following join query:
SELECT DISTINCT wine.wine_id FROM wine, inventory WHERE wine.wine_id = inventory.wine_id;
However, some nested queries can’t be rewritten as join queries; for difficult queries, temporary tables are often a useful workaround.
A limitation of DELETE
and UPDATE
is that only one table can be specified in the
FROM
clause. This problem is particular to MySQL
and related to the lack of support for nested queries. This
limitation can make modifications of data difficult. For example, it
prevents data being deleted or updated using the properties of
another table. A solution involves data being copied to a temporary
table using a combined INSERT
and
SELECT
statement that joins together data from
more than one table. Then, the data can be deleted or updated in the
temporary table and then transferred back to the original table.
Another approach, using the concat( )
string
function, is discussed in Section 1.4.4.1 in the MySQL manual.
To avoid UPDATE
and DELETE
problems, consider adding additional attributes to tables at design
time. For example, in the winestore we added a
DATE
attribute to the items
table so that shopping-cart items can be removed easily if they
aren’t purchased within one day. Removing rows from
the items table based on the
DATE
in the orders table is
difficult without support for nested queries.
MySQL doesn’t support stored procedures or triggers. Stored procedures are queries that are compiled and stored in the DBMS. They are then invoked by the middle-tier application logic, with the benefit that the query is parsed only once and there is less communication overhead between the middle and database tiers. Triggers are similar to stored procedures but are invoked by the DBMS when a condition is met. Stored-procedure support is planned for MySQL, but trigger support isn’t.
Views aren’t supported in MySQL. Views consolidate read-only access to several tables based on a join condition. For example, a view might allow a user to browse the sales made up to April without the need to create a temporary table, as we did in the example in Section 3.8. View support is planned for the future.
Limitations that we don’t discuss here include the lack of support for foreign keys and cursors. More detail on the limitations of MySQL can be found in Section 1.4 of the manual distributed with MySQL.
[7] This isn’t the space actually required by an index entry, because the data is compressed for storage. However, even with compression, the fewer characters indexed, the more compact the representation, the more space saved, and—depending on the usability of the index—the faster searching and updates are.
[8] Reducing disk space requirements improves both disk seek and read performance. Disk read performance is improved because less data is required to be transferred, while seek performance is improved because the disk head has to move less on average when randomly accessing a smaller file than when accessing a larger file.
3.137.178.9