Chapter 12. Linux Database Software

A cornucopia of database management systems!

Linux has benefited in the realm of DBMSs by being a UNIX-like OS. A number of software vendors have ported their database servers to Linux. Better yet, a few have even been developed under Linux. In the last year or so, major commercial database makers have also made versions of their systems for Linux.

This chapter actually started as a section, but it got so big and DBMSs are becoming such an important aspect of computing, we made it a chapter. We first talk about two very popular, lightweight servers in detail. Then we have a less detailed listing of other native database servers. Finally, a description of a couple of databases with partial Linux support round out the chapter.

This is not an attempt to enumerate every database server for Linux. It is a list of databases we've used and/or found interesting. For a more complete listing of commercial databases, see the Linux Commercial HOWTO.

Lastly, let's go over a few acronyms and terms for those of you who are not very familiar with the world of databases:

  • SQL—Structured Query Language, a language for performing queries on relational databases.

  • DBMS—Database Management System.

  • Key—A column or set of columns that organizes a table. Often, but not always, keys will be unique, that is, a particular combination of values of the columns that make up the key must be unique (for a one-column key, this means each value in the column must be unique).

  • Schema—An interrelated set of tables and/or databases. Also a description of how a set of tables are joined via keys.

  • RDBMS—Relational DBMS.

  • ODBMS—Object Oriented DBMS.

MySQL and mSQL

These two servers are often mentioned together as they developed somewhat in parallel and MySQL was initially based off of mSQL. Both are used for similar tasks as well.

mSQL

mSQL stands for mini-SQL, a shareware database server (free for non-commercial use). It supports, as its name suggests, a subset of the normal SQL specification. Additionally, it has some extensions to SQL. Before we go any further, it is probably best to list what mSQL can't do so you can decide if it's worth your while to use it for your particular application.

  • There is no grants database or "logging in" to the database. Access is granted to users listed in /etc/passwd to read and/or write on a per-database basis.

  • Databases can be accessed locally via a UNIX domain socket or remotely via a TCP/IP socket. If you grant TCP/IP access to a database, you do so for all usernames, even though the username on the remote host is not necessarily the same person as on the local system. There is no method for finer grained remote versus local access control.

  • It does not support the BLOB data type and is thus not really suitable for storing large blocks of data.

  • It has no ALTER command. This means that, for example, once a table is created, columns cannot be added, removed, or changed. While you may think that careful planning can minimize the inconvenience of this, don't. Invariably, some outside force, be it a new product, new marketing strategy, or simple oversight will force you to alter your carefully planned database schema.

  • It is single-threaded, meaning it can only process one query at a time. As long as the load is not too heavy and the queries not too complex, this shouldn't really be a problem. What exactly too heavy and too complex mean will depend on your system and may require some experimentation on your part.

Despite these shortcomings, mSQL is still a useful database server. For small datasets with a simple schema and low likelihood of simultaneous accesses, it is well-suited. There are several other reasons you might want to use mSQL:

  • It is lightweight, consuming only a small amount of system resources when simply running (as opposed to monsters like Oracle and Sybase, which require many megabytes of RAM simply to start up).

  • It is free for non-commercial use and the shareware fee for commercial use is only $200.

  • It has a large install base and widespread support on the 'net.

  • As a side effect of its popularity, most of the more popular languages have support for mSQL: C (the API is for C), Python, Java, Perl, Tcl, PHP.

  • It has two enhanced LIKE operators: RLIKE, which allows use of the regular expression library, and CLIKE, which is case-insensitive. Since there is a separate RLIKE that implements regex, LIKE and CLIKE do not need to and thus are faster than most other SQL operators, which typically implement at least a subset of regex's functionality.

Lite and W3-msql

These two additional packages are a part of the standard mSQL distribution.

Lite is a scripting language designed for use with mSQL. Its syntax is similar to Perl. Its variables are dynamically typed, arrays are dynamically sized, and variable names start with a $. Its chief advantage over other languages is its tight integration with mSQL.

W3-msql is a Web interface to Lite that is embedded in HTML and server-side parsed. Also included is W3-auth, an access control package for W3-msql. It supports users and groups and gives somewhat finer access control than the non-Web access control file. In some ways, W3-msql is very similar to PHP, though it lacks support for other databases and is simply not as full-featured. See the Web serving chapter for more information on PHP.

Some Other Notes

mSQL's author claims that it will manage up to 4GB of data, but suggests that other system limitations will make such a large database impractical under mSQL. It's not entirely clear how this number is arrived at, if it comes from a real-life example, or to how many rows it corresponds.

Linux 1.3 or higher is required. Earlier versions do not have full mmap support, which mSQL 2 requires. Additionally, you should have the kernel sources installed to ensure that you have all the necessary libraries and header files.

For an example of how to install mSQL, see the Web serving chapter.

MySQL

An increasingly popular free database server, MySQL started off as an enhanced version of mSQL 1. It attempted to add a more complete set of the ANSI SQL specification, plus some enhancements. Today it implements nearly all of ANSI SQL and a bevy of additional features.

Like mSQL it implements a regex LIKE separate from LIKE to enhance performance. It is also fairly lightweight, though not light as mSQL.

Some things that MySQL supports that mSQL doesn't include are:

  • Expanded SQL, including things like an ALTER TABLE clause (allowing for columns to be added or dropped from a table after it is created) and embeddable functions that operate on all datatypes.

  • More datatypes—especially BLOBs, binary large objects. It also supports varying sizes of existing datatypes, e.g., tiny, small, medium, and large integer types.

  • Usernames and passwords independent of /etc/passwd. This is nice for CGI use. Rather than having to allow the user nobody read and possibly write access to the database, you can require a valid MySQL username/password combination that you could ask for on a form. The level of access a user has can be controlled on the operational (selecting, deleting, updating, preloading, and shutting down the server), database, and host levels.

  • Optimized queries, including optimizations for joins (all joins are done in one pass).

  • It is multithreaded and able to handle multiple requests concurrently. This mean that subsequent queries will not have to wait on a large, time-consuming query that was presented to the server first.

  • It has much looser licensing requirements. Basically, unless you are going to sell MySQL, you don't need to purchase a license for it.

  • It is currently serving larger databases (as many as 50 million rows) than mSQL. This isn't to say that mSQL couldn't handle so large a database, it is just an empirical observation.

Some disadvantage of MySQL include:

  • Not as lightweight as mSQL.

  • More complex access control system, which is probably the most difficult aspect of MySQL to master.

In the past year or two, MySQL's popularity has exploded. It has replaced mSQL in many places. Virtually every common programming language has a module or wrapper that implements the C API. Support on Internet mailing lists and on the Web is excellent.

In general, if you are frustrated with mSQL's limitations or licensing, you should definitely look into MySQL. If your favorite language doesn't have an interface and you know a little C, it is probably worth your while to port mSQL modules to MySQL.

Installing MySQL

MySQL is free in almost every circumstance, the one exception being that if you wish to sell it, you have to purchase a license. Getting a legal copy of it for your use is thus simply a matter of FTPing the distribution. The source code is available, or if you choose, you can get a Linux binary distribution. Binaries also exist for Solaris and SunOS, as well as a growing number of other OSes.

MySQL's home page is http://www.tcx.se/; there are also numerous mirrors: in the U.S. (http://www.buoy.com/mysql), Australia (http://mysql.bluep.com/), and many others, all listed on the home page. Also on the home page are pointers to FTP for MySQL, the documentation home page, mailing list, and contributed software.

Compiling and Installing MySQL

If you choose to compile MySQL yourself, you will need a multithreading library. If you have Red Hat 5.02 or later, the threading is built into the C library. Otherwise, you will need the LinuxThreads library. You will also need a C++ compiler, presumably g++ under and libg++. Note that libg++ is distributed separately from GCC/g++/gobj-c, just as GNU libc is. Lastly, you will need recent versions of GCC, at least 2.7.2; GNU make, at least 3.75; and libc 5.4.12 or glibc 2.0.2.

GNU autoconf is used, so the common sequence of configure, make, and make install works. Useful options for the MySQL configure are:

--prefix==/path/to/mysql/install/baseWhere to install MySQL
--localstatedir==/path/to/mysql/datafilesWhere to keep the datafiles.
--without-serverOnly compile client libraries and binaries.
--enable-sharedBuild libmysqlclient as a shared library.

Other configure options can be listed with the --help option. You may also want to run an optional make check before installing.

We're not going to go into a lot of detail about how to deal with a troublesome compile, since you can always fall back to a binary distribution. One thing to note is that unpacking the binary distribution and installing from the compiled source distribution will place things in slightly different directories. The binary distribution creates a directory named mysql-<version> and subdirectories for libraries, header files, and program files. Installing from the source makes mysql subdirectories in /usr/local/lib and /usr/local/include. This is important to keep in mind if you compile MySQL-dependent programs or modules.

Beyond making sure that you have the right versions of the tools needed to compile, there is little to check. If you run into trouble, mysql mailing list is probably the best place to ask for help. Information about it can be found in the FAQ at the MySQL home page or any of its mirrors.

Once you have MySQL compiled, you can run the optional make check and then proceed to running the make install. This will install MySQL in subdirectories of your install base as follows: binaries in bin/, libraries in lib/mysql, header files in include/mysql, and the data area will be in var/.

Installing from a Binary Distribution

Unpack the tarball where you want the mysql directory to be created. It will create a directory named mysql-<version info>-<platform info>; you can then create a link using ln or just rename it to mysql. The MySQL binaries will now be in /usr/local/mysql/bin, libraries in /usr/local/mysql/lib, header files in /usr/local/mysql/include, and /usr/local/mysql/data will be the home of the data files and subdirectories.

Setting up the Grants Database and Other Data Files

The next step in our installation is to set up the grants database. Unless you've done some fiddling after the installation, you should just be able to run it and let it do its thing. What it does is set up tables for access control for users. These users do not need to be system users, that is, they do not need /etc/passwd entries.

The name of the grants database is simply mysql. It has three tables: db, host, and user. The db table lists access for the combination of user, database, and host from which the user is connecting. host controls access for the pairing of host and database. Finally, user combines host, user, and password.

To initialize the mysql database, you need to run mysql_install_db. If you used a binary distribution, run the following command in the directory created when you unpacked the archive type:

scripts/mysql_install_db

If you compiled from source, run the above command in the top level of the source tree. This will set up root as a "superuser" when connecting locally, lock out remote users, and force other local users to be granted access on a per-database level in the db table. It will also set up db to allow any access to the test database, or any database starting with test_ to any user. It will start the MySQL daemon, mysqld, for the first time as well.

Since the grants database is probably the most complex aspect of using MySQL, we'll give a few examples here. There are a few things to remember when manipulating the access tables:

  • Passwords are stored encrypted.

  • The tables are sorted, putting wildcard and " " entries last. The first match is then used. Host table is sorted first, then user, and last, for the db table, database.

  • The host table is only consulted if the host field in the user table is " ". Then the privileges that are listed by the user table are logically ANDed with those in the host table.

  • The results of the previous step are ORed (once again in the logical sense) with the user table.

  • If there is no matching entry in db, the matching entry from host is ANDed with the matching user entry.

To start manipulating the mysql database, as root, enter:

mysql mysql

If you installed from a binary distribution, you will likely have to specify the path to the mysql binary.

The following set of entries will lock out all local users that don't otherwise have an entry for a particular database in db:

INSERT INTO user VALUES
('localhost','','','N','N','N','N','N','N','N','N','N','N'),
INSERT INTO user VALUES
('myhost','','','N','N','N','N','N','N','N','N','N','N'),

INSERT INTO host VALUES ('localhost','%','Y','Y','Y','Y','Y','Y'),
INSERT INTO host VALUES ('myhost','%','Y','Y','Y','Y','Y','Y'),

In the above and following examples, myhost stands for the name of the computer where mysqld is running.

Now, to grant the user cary full access to the database notebooks, we would add this row to the db table:

INSERT INTO db VALUES
('%','notebooks','cary','Y','Y','Y','Y','Y','Y'),

If we wanted to give all other users access to the data in notebooks, but not let them modify the table:

INSERT INTO db VALUES
('%','notebooks','','Y','N','N','N','N','N'),

To give the user mark full access with the password osprey from any host:

INSERT INTO user VALUES ('%','mark',password('osprey'),'Y','Y','Y','Y','Y','Y','Y','Y','Y','Y'),

To test your access control, you can use mysqlaccess. The syntax is very straightforward: mysqlaccess [host] [user] [database]. There are a number of options that control the amount of and format of the output as well; the --help option lists them.

In general, it is easiest to give a user no access at all in the user table; simply set up their username and password. Then, do all the access control in the db table. Of course, superusers have to be given full privileges in user. Also, if you want to give a user shutdown or reload privileges, you can only do this in the user table as well.

Now, you may be a little worried about slinging your MySQL usernames and passwords over your LAN. You need not be worried, or at least not as worried as you would normally be since MySQL encrypts its network traffic to at least make it difficult to sniff out users and passwords. (Of course, no decryptable encryption scheme is 100% foolproof. Cracking encryption is a matter of patience, cleverness, and lots of CPU power.)

You're now ready to start populating your database! Just a couple of last notes. When storing data in binary large objects (BLOBs) or variable-length text (VARCHARs), MySQL doesn't allocate space in the table on disk until there is data there. This can substantially reduce the amount of disk space used by your data compared to servers, where space for the entire VARCHAR or BLOB (or TEXT in mSQL) is allocated when the row is created, regardless of how much is actually stored in it.

Contributed Software for MySQL

An Apache authorization module, which is useful if you have a large number of users and lookups in the password flat file are becoming slow.

  • Another Apache module for storing your logs in a MySQL database.

  • A module for running radius authentication through MySQL.

  • Cryllic language extensions (MySQL already supports German and Swedish on top of English).

  • A C++ interface.

  • An NSAPI (Netscape API) authentication extension, which is useful in the same situations that the Apache authentication module is.

  • A number of applications are written using MySQL. Using PHP or Perl's DBI, you can quickly write your own Web-based applications. Applications written to use other database systems can be easily rewritten to use MySQL in many cases.

xmysql: A GUI Interface to MySQL

An X-based interface to the MySQL client, xmysql, exists and is available from the MySQL home page. It requires the Xforms library for compilation. Xforms can be found at http://bragg.phys.uwm.edu/xforms. The interface allows the user to choose tables and columns by pointing and clicking, doing a good part of the query building on its own. Of course, it's likely that the user will have to enter some information in the where clause.

Xmysqladmin

This is an admin for MySQL. It supports the adding/dropping of databases and tables, full access and manipulation of the grants database, browsing and killing of threads, stopping and starting the server, and repair and optimization of tables.

Other Native Linux DMBSs

Freeware

PostgreSQL

Postgres (http://www.postgresql.org/) has been around the UNIX world for quite some time. It is not purely relational and has some objected-oriented features in its core. It's usually referred to as object-relational. In any event, in its most recent incarnation, it supports nearly all of ANSI SQL plus some extensions, including embeddable functions. PostgreSQL is considered by many to be an alternative to MySQL since it has much of the same support for SQL and includes API code for most major languages, including support for ODBC and Perl's DBI.

PostgreSQL is included with the Red Hat CD-ROM, and documentation can be found in /usr/doc/.

Commercial

Solid

Solid Technologies sells its DBMS for nearly every UNIX platform, including Linux for the x86 and Alpha architectures. Support for Solid is available for Python, Perl, PHP, and of course, C.

This SQL server is aimed at higher-end applications and larger businesses. A free evaluation copy is available. Client software for non-Linux UNIX systems is available too.

Solid supports a number of things you'd expect of a high-end database server:

  • Online backups—i.e., the server remains running and a "snapshot" is saved.

  • Crash and power failure recovery to the last checkpoint.

  • Fine-grained access control (at a similar level to MySQL's).

  • ODBC support.

  • Automatic crash recovery.

  • Symmetric mulitprocessing (SMP) support.

Solid's manuals are available online in HTML. It is designed to require a minimal amount of administration. It is easy to set up and more or less runs itself. System requirements are minimal: a couple of megabytes of disk space and a megabyte of free RAM.

Pricing for Solid's Linux product is considerably less than for other systems; as of this writing, Solid Technologies is offering the desktop version (single user, only UNIX socket connections) free for Linux.

Information on Solid can be found at http://www.solidtech.com/.

Empress

Empress is one of the first commercial databases for Linux. It is a relational database, like Sybase and Oracle. Empress supports these features (among others):

  • Multimedia support.

  • Check-pointing and rollback.

  • BLOBs (binary large objects).

  • Shared libraries and a C API.

  • Interfaces for Tcl/Tk, Perl, and FORTRAN (!).

  • A GUI builder for quickly creating graphical interfaces to Empress applications.

  • Two-phase commit.

  • A CGI pass-through for processing Empress HTML SQL (EHSQL), ESQL, is embedded into a document and then parsed by the CGI and passed on to the client browser.

Personal Empress for Linux is available also. More information on Empress can be found at http://www.empress.com/.

Texpress

KE Software (http://www.kesoftware.com/) is a Canadian company that makes an object-oriented (as opposed to relational) database and numerous support applications and modules. It supports relational, free text, and multimedia datatypes.

Texpress also supports massive datasets and OO features, including inheritance and extensibility. You can define methods on objects in the database. Methods can be defined for data displaying, validation, assignment, and branch expression. This effectively lets you write stored procedures and then associate them with a particular object or class of objects.

Texpress' design actually favors more complicated queries, processing them equally as fast if not faster than simpler ones. It also supports massive datasets with ease. As database applications become more and more common and complicated, these are important features to consider.

Texpress' object-oriented design typically means that data can be stored and represented in a more natural fashion and less structured data is handled better than a relational database server.

Essentia

Essentia (http://www.inter-soft.com/) is another database that incorporates some nontraditional technology. It employs what its developer, Intersoft, calls RISE, the Reduced Instruction Set Engine. The communication interface is kept small and well-defined. It supports both relational and object-oriented models of data management.

Essentia also sports these features:

  • Journaling and consistency checking.

  • Database versions—allows you to go back and examine previous states of databases.

  • Incremental backups.

  • Mirroring.

  • Shadowing—data to be modified is copied to a shadow and not committed to disk until a checkpoint is cleared.

  • Locking at the row, table, and schema levels.

Sybase

After making their client libraries available for free for Linux, Sybase has at last released a Linux version. It is available for free, without support.

Oracle

Oracle has recently ported version 8 of their software to Linux.

Informix

Of the four big DBMS makers, Informix was the first to release a server for Linux. This has helped its acceptance in the Linux world relative to its competitors.

IBM

IBM as well has released a Linux version of DB2, its relational database manager.

Summary

Linux has a fairly large and rich set of databases available for use. These range from the very lightweight mSQL to the freeware MySQL to larger commercial database servers like Empress, Solid, and Texpress to Oracle running in SCO emulation.

All of these servers have different strengths and weaknesses. If you find that one of the smaller free or shareware database servers is not fulfilling your needs, nearly every commercial server listed here offers free or low-cost personal editions and/or a trial version of their larger packages. Take a few weeks to really evaluate these before shelling out what could be a substantial amount of money.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.61.81