THIS APPENDIX DESCRIBES TWO TOOLS, THE SOURCE CODE FOR WHICH IS AVAILABLE FOR DOWNLOAD, and which I hope you’ll find useful. The first one is actually a pair of programs, mklipsum and lipsum, which I briefly described in Chapter 4. The second one is a Java tool called Roughbench, which I have used often in my tests, in particular in Chapter 7.
These tools are released under the GPL license and come with the usual disclaimers.
mklipsum and lipsum are tools for generating arbitrary random text. The source code is written in C, and I prepared it using GNU Autotools (as best as I could). Windows executables are also provided (they have been ported with Mingwin). Both programs require SQLite3, available from http://www.sqlite.org. For the record, mklipsum contains nontrivial SQL code.
On a *nix machine, after having uncompressed and unarchived the files, you should be able to build the programs by typing ./configure
, then make
and make install
(read the INSTALL file for more information and to learn how to customize default settings).
Using mklipsum and lipsum is fairly simple. mklipsum prepares an SQLite file for use by lipsum from text that it tokenizes and analyzes. The text is read from the standard input. The SQLite file stores the words using UTF8 encoding, and it should also be the character set used in the input. If, however, the library libiconv, which converts character sets, is available on your system, mklipsum will be able to perform character set conversion. The availability or nonavailability of the character set conversion feature is displayed when you run the program.
mklipsum can optionally take one parameter (two if character set conversion is available).
The first parameter is a file identifier that can be used to identify the language of the source file. By default, the SQLite file created is named lipsum.db.
If you run:
mklipsum xx < sample_text_file
the SQLite file will be named lipsumxx.db. This feature is intended for the generation of random text with words from different languages. You can use the ISO code of the language to identify the various files.
The second parameter, if character set conversion is available, is the character set used in the source file. Note that you cannot specify the character set alone, and that it must always follow a file identifier.
I have tested mklipsum and lipsum with different languages that use the Latin alphabet (with extensions); it is possible that languages that use the Cyrillic alphabet may require some adjustments to the programs, and very likely that a language such as Arabic or Hebrew, to say nothing of Chinese or Japanese, will require some rewrites (the programs try hard to retain the original capitalization, but follow Western rules and capitalize the first letter of words that follow a period, for instance).
Here is an example, using a few chapters from Thomas Hobbes’ Leviathan:
$ ./mklipsum < leviathan_sample.txt
mklipsum $Revision$ with charset conversion
-- Reading input ...
-- Loading vocabulary ...
--- 770 words of length 1 analyzed
--- 761 words of length 2 analyzed
--- 800 words of length 3 analyzed
--- 614 words of length 4 analyzed
--- 474 words of length 5 analyzed
--- 312 words of length 6 analyzed
--- 237 words of length 7 analyzed
--- 152 words of length 8 analyzed
--- 119 words of length 9 analyzed
--- 72 words of length 10 analyzed
--- 43 words of length 11 analyzed
--- 22 words of length 12 analyzed
--- 21 words of length 13 analyzed
--- 5 words of length 14 analyzed
--- 1 words of length 15 analyzed
--- 1 words of length 16 analyzed
--- 1 words of length 19 analyzed
-- Vocabulary: 1029 distinct words loaded
-- Loading lengths relationships ...
--- ...................................................................................
..............................................................
-- Indexing
$
To generate random text, you must run the lipsum program that can take as a parameter a file identifier telling which SQLite file should be used.
$ ./lipsum
Speech Thy Eye up with serves which no Businesse; can of is therefore
up can false a Joynts happen, he late Marcus, up any motion But Eyes
of hath Species; up my Cold which my How First Voyces my are up them
also exercise which a Death are any of parts if Circumstances; Image
whereas els waking is any must, Sees For office is am cease. Ever
say. Prognostiques many course, one cause I Speech mind of For removed
of though. Give not onely Anger horse can them, well sight.
Here is an example using another file seeded by a German text written by Immanuel Kant:
$ ./lipsum de
Sein daβ Sitz Begriff Synthesis als, von priori Satz kann vorher
vielleicht. Lassen, in komme eine bewuβt, des Kreis die ist. Um
allen selbst natürlicher a dieser die Prädikat hergibt Zufälligkeit
bewuβt, priori Verstand a fassen leicht, er Begriffe. Hat wir wird
denn desjenigen der empfangen sollten. Seine in der einander ist
mithin man korrespondiert, als.
The lipsum program can take a number of flags:
$ ./lipsum -?
Usage : ./lipsum [flags] [suffix]
This program looks in the current directory for an
sqlite file named lipsum<suffix>.db that must have
previously been generated by the mklipsum utility.
Flags:
-h, -? : Display this and exit.
-n <count> : Number of paragraphs to generate (default 1)
-w <len>[,<dev>] : Approximate number of words per paragraph.
The program generates paragraphs averaging <len>
words, with a standard deviation <dev>. By
default, <dev> is 20% of <len>.
Default for <len>: 500.
This flag is exclusive of -c.
-c <len>[,<dev>] : Same as -c, but based on character count instead of
word count.
-l <len> : Insert a carriage return as soon as possible
after having output <len> characters. Default: 65.
Roughbench takes as a parameter the name of a simple SQL script, reads it, and executes it, for a number of times or for a certain duration of time, and can start several concurrent threads running the same script.
On completion, statistical information is displayed.
You need the J2SE Development Kit (JDK) version 1.5 or later to compile the program. The JDK is available from http://java.sun.com/j2se.
Using Roughbench is straightforward:
Specify the JDBC parameters in a file named roughbench.properties.
Check that your CLASSPATH
environment variable is set properly.
Prepare a file containing the (single) statement you want to run. This statement can contain ? characters as placeholders for values generated by Roughbench.
Specify a number of options and how to generate values on the command line.
The command line looks like this:
java [options
] roughbench<sqlfile> [generators
... ]
The roughbench.properties file must contain at least the following two values:
CONNECT=<JDBC URL>
DRIVER=<Driver package name>
Additionally, if there is no implicit authentication and if no authentication method is provided inside the JDBC URL, you must specify this:
USERNAME=<...>
and this:
PASSWORD=<...>
For example, assuming that we want to connect as user bench
, password possward
, to a database named TEST running on the same server, we can use the following:
roughbench.properties file with Oracle: |
CONNECT=jdbc:oracle:thin:@localhost:1521:TEST USERNAME=bench PASSWORD=possward DRIVER=oracle.jdbc.OracleDriver
Or, with MySQL, we can specify the username and password in the URL:
CONNECT=jdbc:mysql://localhost:3306/TEST?user=bench&password=possward DRIVER=com.mysql.jdbc.Driver
Options, which are listed in Table B-1, are specified on the command line with the following syntax:
-D<option name>=<value>
Option name | Comment | Default value |
| Number of executions between two successive commits (ignored for | 1 (commit after every DML statement) |
| Number of times the SQL statement must be run. Exclusive of - | 1 |
| Number of executions per minute. The rate is an average; the interval between two successive executions is random. | |
| Identifier for the run. | Timestamp, format MonDD-hh:mn |
| Number of concurrent threads to run. All threads execute the same SQL statement. | 1 |
| Number of minutes during which the SQL statement must be run. Exclusive of - |
You can generate random variables to bind to the statement by specifying as many generators on the command line as there are placeholders in the statement. Generators are specified as shown in Table B-2.
Generator | Meaning |
| Constant value; |
| Uniform random value from the list (the same item can occur several times in the list to give it more weight). Values can be of any type. |
| Random value in the range |
| Normal (Gaussian) distribution, with an average of |
| Incremental value. |
| String of random letters of length between |
The generation of integer or float values depends on the type of the first number given in the generator specification. For instance:
R1-250000
will generate integer values between 1 and 250,000.
R1.0-250000
will generate float values in the same range.
To generate dates, you must use the functions available with your DBMS and generate integer values to be used with date functions and date arithmetic.
For instance, you can use the following expressions in different SQL dialects associated to an incremental generator to generate dates in chronological order from the past 300 days. Because 300 days equal almost 26 million seconds, you can generate almost as many days as you want in the interval to the current date.
In MySQL:
date_add(date_sub(curdate(), INTERVAL 300 DAY), INTERVAL ? SECOND)
In Oracle:
sysdate - 300 + ?/86400
In T-SQL (SQL Server/Sybase):
dateadd(ss, ?, dateadd(dd, -300, getdate()))
and so on.
Roughbench displays informational messages on the standard error and results on the standard output, which makes it easy to redirect clean results to a file.
The provided information consists of the following:
Program name, version, and copyright/licensing information
Reminder of the command-line parameters
Text of the query being run
Results are displayed as follows:
tag
<tab>filename
<space>thread#<
tab>tenth of second
<tab>count
tag is the value of the tag specified by -DTAG= … on the command line. It defaults to a timestamp; for example, Jun11-15:42 . |
filename is the name of the file that contains the SQL statement. |
thread# is the number (starting with 0) identifying a particular thread. |
tenth of second and count tell how many statements were executed in what time. A value of 0 for tenth of second means less than 0.1 second, a value of 1 means between 0.1 and 0.2 seconds, and so forth. |
For example, the following output means that for the run of October 23 at 4 p.m., thread 3 executed the script insert.sql 15,874 times, of which five times took between 0.1 and 0.2 seconds and the remainder took less than 0.1 second:
Oct23-16:00 insert.sql 3 0 15869 Oct23-16:00 insert.sql 3 1 5
Additional lines may also be displayed. For instance, when operating in loop mode, the total time to execute the required number of loops is displayed as follows:
tag
<tab>filename
<space>thread#
<tab>elapsed (ms)
<tab>time
If some of the executions end in failure, the number of successful executions and the number of failures will be output as shown here:
tag
<tab>filename
<space>thread#
<tab>OK<tab>count
tag
<tab>filename
<space>thread#
<tab>KO<tab>count
18.116.118.229