Appendix A. Scripts and Sample Programs

The body of this book contains many code samples and snippets. I took quite a number of them (and slightly adapted them for confidentiality reasons) from code I have audited and modified; because these samples cannot run without a full array of tables and indexes filled with classified data, providing them makes little sense.

This book also includes many other programs and examples that I wrote specifically for it (which I sometimes merely allude to), as well as utility scripts that query the data dictionary and are of general use. These programs and scripts are available for download. You will find them on the O’Reilly website for this book, http://www.oreilly.com/catalog/9780596514976.

This appendix describes what you will find for download and provides a number of additional comments, which I hope you’ll find interesting.

Because most people are interested in one particular DBMS, I have grouped the code samples by DBMS, with one archive file per DBMS. Within each archive file, I organized the code samples by chapter (one directory per chapter).

Note that I wrote some of the programs specifically for one DBMS and did not port them to the others (this is the case, for example, with all the Chapter 7 examples, which were written exclusively for MySQL). In several cases, you can adapt them to other products with minor changes. Therefore, don’t hesitate to look at the code for the other products if you are interested in a program that isn’t provided for your favorite DBMS.

In the descriptions that follow, for programs that aren’t available for all of MySQL, Oracle, and SQL Server, I have indicated in parentheses the DBMS product(s) for which they are provided. In other words, if nothing is specified, a version of the program or script is available for all three products. Note that in the case of scripts that query the data dictionary, a similar name doesn’t mean all versions of the script return exactly the same information, but rather the same type of information.

Chapter 1

To begin, I used four scripts to create the tables and indexes in this chapter (these tables appear again in some examples in later chapters). If you want to run the programs on your own test database, you should start by generating the sample data.

These two scripts create the tables and the primary key indexes:

refactoring_big_tables.sql
refactoring_small_tables.sql (includes data)

These two scripts create additional indexes; the second script creates an index that replaces the index created by the first one (please refer to Chapter 1 for details):

additional_index.sql
additional_index_alternate.sql

The big transactions table is populated by GenerateData.java, which requires the database.properties file; you must update this file to specify how to connect to your database.

Once the tables are created and populated, you can run the following programs:

FirstExample.java
SecondExample.java
ThirdExample.java (Java 1.5 or later required)
FourthExample.java
FifthExample.java
SixthExample.java

All of them use the same database.properties file as the data generation program.

The following scripts are product-specific:

  • dba_analysis.sql (SQL Server) is the query the SQL Server DBA ran to diagnose problems (like all DBA scripts, it requires some system privileges).

  • profiler_analysis.sql (SQL Server) is a query you can run if you save what the SQL Server profiler collects to a table. It includes a T-SQL function called Normalize to normalize statement text.

  • snapmon.sql (Oracle, SQL Server) is my core monitoring tool. It collects statistical information about all the queries that are currently held in the cache, as well as global values, which are returned as though they were associated with a dummy query. In the course of my assignments, I have developed monitoring tools in various languages (shell script included), all based on this query or slight variants of it. Each query is identified by an identifier, and a particular dummy identifier is assigned to the global statistics; comparing the per-query figures to the global figures allows you to check whether you have collected meaningful information or whether you have missed a lot of statements. In the latter case, I suggest that you run a modified version of this query in which you identify queries by the identifier of their execution plan.
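
To give an idea of what such a query looks like, here is a bare-bones, Oracle-flavored sketch of the approach (it is not the actual snapmon.sql, whose column list and choice of global statistics differ):

    select sql_id as identifier,
           executions,
           buffer_gets,
           disk_reads
    from   v$sql
    union all
    -- Global values returned as though they belonged to a dummy query;
    -- if the per-statement rows account for only a small fraction of these
    -- totals, many statements have aged out of the cache between two polls.
    select 'GLOBAL' as identifier,
           sum(decode(name, 'execute count', value, 0)),
           sum(decode(name, 'session logical reads', value, 0)),
           sum(decode(name, 'physical reads', value, 0))
    from   v$sysstat
    where  name in ('execute count', 'session logical reads', 'physical reads');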

The values returned by snapmon.sql are cumulative. If you want values by time slice, you must compute, each time you poll, the difference from the previous results.

You can do this in several ways. You can dump the output of the query to files, merge and sort two successive files, or make creative use of awk or perl to get what you want. I advise you to use SQLite (http://www.sqlite.org), which has APIs for most languages and scripting languages, and proceed as follows:

  1. Create two temporary tables, named snaptmp1 and snaptmp2, with a structure that matches the output of snapmon.sql. Insert the output of each poll alternately into snaptmp1 and snaptmp2. Note that you don’t need to query the database very often; within a few minutes, you can usually get a very detailed picture of what the database is doing.

  2. After the second pass, begin to insert data into a permanent table with the same structure by executing a query on this pattern:

    insert into ...
    select <identifier>, abs(sum(<col1>)), ..., abs(sum(<coln>))
    from (select <identifier>, <col1>, ..., <coln>
          from snaptmp1
          union all
          select <identifier>, -1 * <col1>, ..., -1 * <coln>
          from snaptmp2) x
    group by <identifier>
    having sum(...) <> 0

    You can refine this further by selecting only the top queries (to which the dummy “global query” will necessarily belong). You should also have a table that holds the text of statements. Whenever new identifiers are encountered, you should get the associated statement text from the database. I don’t want to stymie your creativity by suggesting too many things, but you can collect other information as well (e.g., waits) and generate graphics that will show you the load.
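
For instance, assuming (purely for illustration) that snapmon.sql returned an identifier plus two counters named executions and buffer_gets, and that the permanent table is called snap_history, the differencing insert would look like this:

    -- abs() makes the difference positive whichever of the two temporary
    -- tables happens to hold the most recent snapshot
    insert into snap_history
    select identifier,
           abs(sum(executions))  as executions,
           abs(sum(buffer_gets)) as buffer_gets
    from (select identifier, executions, buffer_gets
          from snaptmp1
          union all
          select identifier, -1 * executions, -1 * buffer_gets
          from snaptmp2) x
    group by identifier
    having sum(executions) <> 0;

The having clause simply discards statements whose counters haven’t moved between the two polls.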

At the time of this writing, nothing similar was available for MySQL, for which other monitoring means may be more appropriate. Nevertheless, depending on the storage engine in use, some useful data can be collected from information_schema.global_status (the COM_xxx and INNODB_xxx status variables) as well as from the falcon_xxx tables, which will at least give you a global picture. This can be complemented by select statements against information_schema.processlist, which must be sampled far more frantically than the equivalent Oracle and SQL Server sources because it shows only current activity.
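
As a starting point (a rough sketch only, not a packaged script), the global counters and the currently running statements can be sampled on MySQL as follows:

    -- Cumulative global counters; differences between two polls give rates
    select variable_name, variable_value
    from   information_schema.global_status
    where  variable_name like 'COM\_%'
       or  variable_name like 'INNODB\_%';

    -- Statements currently executing; because this shows only the present
    -- instant, it must be polled very often to catch short-lived queries
    select id, user, db, time, state, info
    from   information_schema.processlist
    where  command <> 'Sleep';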

Chapter 2

The following scripts and programs are used in Chapter 2:

  • IndexSelectivity.java tests the impact of selectivity on index access performance.

  • stats.sql (MySQL, Oracle) displays some statistical values regarding tables.

  • indexing.sql lists all tables in the current schema and tells how many indexes they have, among other information.

  • checkparse.sql is a script that tells you whether many statements are parsed on your database, which is always indicative of hardcoded statements (although the reverse isn’t quite true, because the DBMS sometimes substitutes parameters for constants, as you saw in the chapter).
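
For Oracle, the gist of such a check (a bare-bones sketch, not necessarily how checkparse.sql proceeds) is to compare parse counts to execution counts:

    select name, value
    from   v$sysstat
    where  name in ('parse count (total)',
                    'parse count (hard)',
                    'execute count');
    -- A 'parse count (total)' close to 'execute count' means that statements
    -- are re-parsed at almost every execution; a high 'parse count (hard)'
    -- means that they aren't even found again in the shared pool.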

A few other .java programs come with the mandatory database.properties file. The following three programs test the performance loss due to parsing:

HardCoded.java
FirmCoded.java
SoftCoded.java

These scripts show step by step how a list of values can be passed as a single parameter:

list0.sql
list1.sql
list2.sql
list3.sql

These three programs fetch rows from the transactions table; the first one uses the default fetch size, the second one takes the fetch size as a command-line argument, and the third one activates the MySQL streaming mode:

DumpTx_default.java
DumpTx.java
DumpTx_Stream.java (MySQL only)

And finally, this program tests the impact of the commit rate on throughput:

UpdateTx.java

Chapter 3

For Oracle, there are three different ways to write a function that counts patterns. There is also a script that creates a test table and successively applies the three functions to this table.

function1.sql (Oracle)
function2.sql (Oracle)
function3.sql (Oracle)
test_search.sql (Oracle)

Also for Oracle, there are three functions that check whether a date corresponds to a weekend day: first a nondeterministic function, then a deterministic function, and finally a function that removes the time part from the date before handing it to a deterministic function:

weekend_day1.sql (Oracle)
weekend_day2.sql (Oracle)
weekend_day3.sql (Oracle)
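
To give the flavor of the technique (a minimal sketch, not the downloadable scripts, whose names and logic differ; the tx_date column is assumed), a deterministic version could be written as follows. The point of removing the time part is that trunc() leaves only a handful of distinct argument values per query, which lets Oracle reuse results instead of re-evaluating the function for every row:

    create or replace function weekend_day(p_day in date)
    return number
    deterministic
    is
    begin
      -- Fix the date language so the result doesn't depend on NLS settings
      if to_char(p_day, 'DY', 'NLS_DATE_LANGUAGE=AMERICAN') in ('SAT', 'SUN')
      then
        return 1;
      end if;
      return 0;
    end;
    /

    -- Calling the function on trunc(tx_date) rather than on the raw date
    -- drastically limits the number of distinct inputs
    select count(*)
    from   transactions
    where  weekend_day(trunc(tx_date)) = 1;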

A naive function is available for computing the next business day (NextBusinessDay.sql). This function relies on a table created by public_holidays.sql.

There are also, mostly for Oracle but in one case for MySQL as well, various attempts at refactoring this function. For Oracle, NextBusinessDay2.sql and NextBusinessDay3.sql both require Oracle 11g or later, which isn’t the case for NextBusinessDay4.sql:

public_holidays.sql
NextBusinessDay.sql
NextBusinessDay2.sql (MySQL, Oracle)
NextBusinessDay3.sql (Oracle)
NextBusinessDay4.sql (Oracle)

Similarly, there is a naive currency conversion function called fx_convert.sql and, for Oracle and MySQL, alternative versions (the Oracle fx_convert2.sql file requires Oracle 11g or later):

fx_convert.sql
fx_convert2.sql (MySQL, Oracle)
fx_convert3.sql (MySQL, Oracle)

Finally, one script replaces the function with a simple join:

fxjoin.sql

You’ll find various views built on the tables created by the scripts and programs in Chapter 1:

v_amount_by_currency.sql
v_amount_main_currencies.sql
v_amount_other_currencies.sql
v_last_rate.sql

You’ll also find a query that tries to identify complex views that might be a reason for poor query performance:

complex_views.sql

Chapter 4

For Chapter 4, there is the definition of a view that returns 10 rows, numbered 0 through 9:

ten_rows.sql

For SQL Server, there is the creation of the view that is required for further generation of random data, and the function that uses this view:

random_view.sql (SQL Server)
randuniform.sql (SQL Server)

These two scripts contain functions that return random numbers that are not uniformly distributed over an interval:

randexp.sql
randgauss.sql
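
The classic techniques here are the inverse transform (for an exponential distribution) and the Box-Muller transform (for a Gaussian distribution). The downloadable functions don’t necessarily use these exact formulas, but the following MySQL expressions give the idea; 1 - rand() is used rather than rand() to avoid taking the logarithm of zero:

    -- Exponentially distributed value with mean 1/lambda (here lambda = 0.5)
    select -ln(1 - rand()) / 0.5 as rand_exp;

    -- Normally distributed value (mean 0, standard deviation 1), Box-Muller
    select sqrt(-2 * ln(1 - rand())) * cos(2 * pi() * rand()) as rand_gauss;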

And if you want to see what the distribution looks like, a small script that draws a character-based histogram is available:

histogram.sql

To generate a string of random characters (which can be useful for password generation), you have this script:

randstring.sql

job_ref.sql shows how to generate job names that follow a predefined distribution.

To generate American names that are more realistic than those of all the people who email you to tout various drugs, anatomical enhancement devices, and fake watches, I suggest you go to http://www.census.gov/genealogy/www/ and click the link that will bring you to a list of frequently occurring surnames. You should find there a spreadsheet that contains the top 1,000 names in the United States. Download it and save it as a .csv file; my scripts assume that a semicolon is used for a separator.

Once you have the data, create a table to load it:

name_ref.sql

The MySQL script creates the table, loads and normalizes the data, and creates an index, and so does the SQL Server script (which uses an ancillary .xml file).
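
If you want to adapt the loading step, the core of the MySQL load is something like the following sketch (the file name and column list are hypothetical; the real name_ref.sql defines the actual layout and post-processing):

    load data local infile 'top1000names.csv'
    into table name_ref
    fields terminated by ';'
    ignore 1 lines
    (name, occurrences);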

For Oracle, name_ref.sql just creates the table. You must load the data using name_ref.ctl, which is a control file you can use with SQL*Loader as follows:

sqlldr <username>/<password> control=name_ref.ctl

Then, still with Oracle, run name_ref2.sql to normalize the data and index the table.

get_ten_names.sql generates 10 random American surnames.

For Oracle, there are three additional scripts: dept_ref.sql allows you to randomly pick a department number for the emp table; gen_emp.sql is a pure SQL generator of rows for the emp table that fails to get truly random data; and gen_emp_pl.sql is a PL/SQL version that works fine.

For the generation of random text, please see Appendix B.

check (MySQL and Oracle) is a bash script that runs the output of an arbitrary query through the md5sum program.

qrysum.sql is a function (Oracle) or procedure (MySQL and SQL Server) that checksums the output of an arbitrary query. The script qrysum_complicated.sql does the same for Oracle, but applies successive checksums to buffers. It is mostly interesting as a wild PL/SQL example of dynamic SQL.

qrysum.sql
qrysum_complicated.sql (Oracle)

Chapter 5

There is no script sample for this chapter. Examples in this chapter come from real cases.

Chapter 6

Most of the examples in this chapter also come from real cases, but I wrote some of them for this book.

The SQL Server directory contains the files for the example of unlocking accounts after payment. Two functions were written in Visual Basic—one that strictly follows specifications and another that performs the same task with a more clever use of SQL:

create_tables.sql (SQL Server)
procs.vb (SQL Server)
create_vb_procs.sql (SQL Server)

There is also a rather complete PHP example written for MySQL, and a configuration file that must be updated with connection data. The first two examples show an incorrect and a correct use of found_rows() (very specific to MySQL):

config.php (MySQL)
sample_bad.php (MySQL)
sample_ok.php (MySQL)

Another series of PHP programs shows how you can avoid counting before fetching data when there is an upper limit on the number of rows that will be displayed:

with_count.php (MySQL)
with_count2.php (MySQL)
without_count.php (MySQL)

The most interesting query in the last program is one that returns the total number of rows in the result set in the first row. It is provided (as a query on the transactions table that returns more than 500 rows) for all DBMS products (including MySQL).

top500GBP.sql
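
For Oracle and SQL Server, a common way of obtaining the total count together with the data is the count(*) over () window function, as in the sketch below (the column names on the transactions table are assumed, and this is not necessarily how the downloadable query or its MySQL counterpart proceed; note that the window function repeats the count on every row rather than only on the first one):

    select txid,
           amount,
           count(*) over () as total_rows
    from   transactions
    where  currency = 'GBP'
    order by amount desc;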

For Oracle users, a very simple example that uses emp and dept shows why I dislike cursor loops:

cursor.sql (Oracle)
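
The gist of the argument, in a deliberately simplistic sketch (not the downloadable cursor.sql), is that a row-by-row cursor loop such as this one:

    begin
      for c in (select e.empno
                from   emp e
                       inner join dept d
                               on d.deptno = e.deptno
                where  d.dname = 'SALES')
      loop
        -- One round trip between the PL/SQL and SQL engines per employee
        update emp
        set    sal = sal * 1.1
        where  empno = c.empno;
      end loop;
    end;
    /

performs in many single-row operations what one set-based statement does in a single pass:

    update emp
    set    sal = sal * 1.1
    where  deptno = (select deptno
                     from   dept
                     where  dname = 'SALES');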

Chapter 7 (MySQL)

For Chapter 7, there are code samples for MySQL only (however, SQL Server and Oracle users mustn’t forget the Roughbench tool, which is described in Appendix B).

The table that is used in the examples is created by fifo.sql.

The various programs that simulate the processing of messages need the following sources and the accompanying makefile:

common.c
common.h
consumer_inno.c
consumer_lock.c
consumer_naive.c
consumer_not_so_naive.c
makefile

For the test of concurrent inserts, there are two subdirectories, one for each storage engine used. Scripts suffixed with _ai use an auto-increment key, and scripts suffixed with _p use a partitioned table. There are scripts to create the tables, and scripts to insert data, which you must run with Roughbench.

InnoDB/
    create_concurrent_insert.sql
    create_concurrent_insert_ai.sql
    create_concurrent_insert_p.sql
    insert_concurrent_insert.sql
    insert_concurrent_insert_ai.sql
    insert_concurrent_insert_p.sql
MyISAM/
    create_concurrent_insert.sql
    create_concurrent_insert_ai.sql
    create_concurrent_insert_p.sql
    insert_concurrent_insert.sql
    insert_concurrent_insert_ai.sql
    insert_concurrent_insert_p.sql