Design Considerations in Database Systems

This section will be applicable to you only if you have the luxury of building your database system from scratch. If you are starting with a blank sheet of paper, an empty computer room, and a bottomless bank account you can build a nearly ideal system.

Of course, an ideal system designed to spec today, delivered by the vendors in three months, and operable in nine to twelve months will never be ideal when it goes into production. At current rates of hardware and software development, something new and better seems to come along every three to four months. So the ideal system is just that, an ideal. System vendors themselves never have ideal systems, so why should you expect one for yourself?

In real life, you are probably inheriting an existing system. You have time and budget constraints, you have to maintain production (whatever that means in your environment), and you have a backlog. Welcome to the real world.

Still, the design issues that you would face in building a state-of-the-art system from scratch are applicable to you even if you have an existing system. As your system grows and matures, you will probably have opportunities to upgrade your system, add peripherals, upgrade the software, and modify your applications. Who knows, your company may land a big contract and the CEO might come into your office some day with a blank check.

So what is a state-of-the-art system? The answer, of course, is highly time-dependent. What is hot and new as I am writing this chapter will be old news in a year. You'll find the latest and greatest in current magazines. Maybe you'll see it from your hardware vendors, but in my experience if you don't ask for it, you'll never find out from vendors. An amazing attitude from people who make a living selling you things!

Your hardware decisions will mainly concern choice of equipment manufacturers, processor layout, and peripheral layout. We'll stay away from questions of who makes the best hardware, because there is no correct answer. Your best manufacturer is the one who makes something you feel comfortable with and from whom you can get the best service. I've always found that the service component is much more important than the hardware end. If you have a company that gives you A+ hardware and B- service, you'll probably be much less pleased than with a company giving you B+ hardware and A+ service.

Hardware is something that you agonize over while making the original choices and then find ways to live with for the rest of the life cycle. As long as it works and performs adequately, you'll probably be happy. Service is something that you have to live with daily. When you need support or service, odds are you're in a bind and you need it NOW. You probably have users screaming at you and cannot put up with lackadaisical support or service. Go for the service! With luck, you can get both good hardware and good support, but don't hold your breath.

Hardware

Computer systems continually evolve. A decade ago, proprietary systems were the way to go. You would get all of your hardware and software from one source and that one source would hold your hand forever and handle all of your needs. The end result of this commitment to proprietary systems was that the user became locked into one vendor. Purchasers of systems began to realize that this limited their flexibility and their bargaining power with their system vendor.

Systems are now evolving toward the concept of open systems. Rather than being locked into one "big iron" vendor, you pick the best pieces from the best vendors and somehow make them all work together. The glue that ties it all together is the UNIX operating system and the evolving standards that are being adopted by vendors. These standards include UNIX for the operating system, TCP/IP for networking, X-Window for graphical displays, C and C++ as programming languages, and SQL for the database language. Informix is a major player in the open systems movement, with database engines that run on everything from PCs to mainframes. Personal computers, even those not running UNIX or UNIX-derivatives, also play in the open systems arena. Microsoft Windows™, Microsoft Windows NT™, and OS/2 systems are increasingly finding themselves used in connection with UNIX servers. These personal computers often serve multiple purposes, ranging from acting as terminals talking to UNIX boxes over the network to serving as X-Window displays using the PC's intelligence to emulate an X-Window terminal. Personal computers are also serving as full-fledged clients in distributed applications, with the PCs handling the input/output and the graphics and the UNIX servers handling the database management.

In recent years, NT has moved more and more into the mainstream of database environments. When Informix introduced server products on NT, everybody thought they were crazy. This tune will change as more people see NT systems performing on a par with, or better than, their UNIX brethren. This is happening now, and will continue into the future because of the rapid pace of hardware advancement.

It is a fact of life that with the size of the Intel-compatible Windows NT/Windows 95 market, as soon as someone comes out with faster chips, faster buses, or faster peripherals, they will be running on NT the next day. Most UNIX systems are more proprietary and will lag behind the rapid pace of development of PC-based hardware. I can buy an Intel-based PC today that is twice the speed and half the cost of the one I bought a year ago, and it will run Informix 7.30 on NT out of the box. Any inefficiencies in O/S design will be more than made up for in hardware speed. Fast hardware can cure a lot of ills.

There are other options for running Informix on this latest and greatest hardware. Operating systems such as UNIXWare will run on commodity hardware and give you the robustness of UNIX along with the ability to keep pace with hardware developments.

A somewhat surprising advance in database functionality is now happening with the Linux operating system. Linux is a UNIX variant that is essentially free; you can buy a CD with Linux on it for $50 or less. Many UNIX fans are adopting Linux as the software of choice because of its low price and robustness. Linux fans swear that Linux is more reliable and less prone to crashing than NT, and it runs on commodity Intel processors. Many organizations are reluctant to bet the farm on an operating system that nobody owns and that nobody supports. That's the line, anyway. What really happens is that while there is no central owner who supports Linux, the actual support is provided by all of the users and sellers of the product. Since the source code of Linux is freely available, if someone else can't fix your problem you have the option of breaking out your C compiler and fixing it yourself. Try that with NT!

Informix has recently made available the Informix-SE engine on Linux and has made it available free for download over the Web. Plans call for porting other products to Linux and for developing a means of supporting the products on Linux.

A big advantage of the open systems concept is scalability. By choosing to remain compliant with standards, a system can grow and evolve as your needs change. Applications running on an SCO UNIXWare server on a personal computer can easily be ported to larger minicomputers or even mainframes as long as the basic database and language tools are available. This flexibility also extends to your choice of vendors. You are no longer married for life to a hardware or software manufacturer. If you find that you've made a bad choice or two, or if a vendor is no longer providing what you need, you no longer have to scrap everything and begin anew. The vendors realize this and are beginning to behave in a more competitive manner. They now see that users and purchasers have choices, and this will give users much more leverage with vendors.

Processor Decisions

In today's market, the rising stars in the hardware arena are multiprocessor and parallel processing systems. We are discovering that several less powerful processors cooperating inside one box can often outperform one super-fast processor. By breaking tasks up into pieces that can be farmed out to multiple processors, the system can often get much more work done and provide higher throughput.

In some systems, the multiprocessing or parallel processing paradigm can also allow for more tolerance for failure, as the processors can fill in for each other should one fail. This is by no means universal. In fact, in some multiprocessing arrangements, failure of any one processor will bring the system down, giving you a higher probability of failure.

The OnLine product provides various levels of support for multiple and parallel processors, depending on the version. Beginning with OnLine 4.1, Informix provided the psort capability, which farms out sorts to multiple processors if your system has them. This was enhanced in Version 5.0. INFORMIX-Dynamic Server makes greater use of multiple processors in the areas of sorting, indexing, archiving, and database access. Version 7.X provides support for fully parallel architectures and delivers outstanding performance for both OLTP and DSS needs.
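On systems with multiple processors, the parallel-sort capability is enabled through per-session environment settings. A minimal sketch, assuming a Bourne-style shell; the values shown are illustrative, not recommendations:

```shell
# Let parallel sorts use up to four sort threads (value is illustrative)
PSORT_NPROCS=4
export PSORT_NPROCS

# On Dynamic Server 7.x, request a share of PDQ resources for this session
PDQPRIORITY=50
export PDQPRIORITY
```

These are per-session knobs; engine-wide ceilings on parallel resource use are governed by parameters in the ONCONFIG file.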

Clustered Systems

Beginning with the 8.X series of servers, Informix has provided support for clusters of either UNIX or NT systems. Using what Informix calls a "shared-nothing" design, these systems allow multiple IDS engines to cooperate and share data and queries, providing almost linear performance increases as additional systems are added to the cluster. In traditional SMP and MPP environments, which share buses and other resources, increasing the number of processors gives diminishing returns in processing performance as the shared resources begin to saturate.

The shared-nothing systems scale much better, since each node handles its own resources; they also give the DBA much finer granularity of control over what happens in each system. In addition to the 8.X (XPS) family, NT versions of IDS beginning with 7.30 have the capability of operating in such clusters.

The first commercially available NT clusters were produced by Tandem. In 1995, Stratus introduced the Radio line of NT cluster systems that won "Best New Product" at several major computer shows. Radio depended upon a 100-megabit network backbone that connected dual-processor off-the-shelf Intel systems into a virtual machine that was designed for fault-tolerance. The fault-tolerance was provided by software running on each node of the cluster.

Microsoft is now working on its "Wolfpack" clustering scheme, which should bring clustered systems into the mainstream. As more such developments occur, the penetration of NT into the "big iron" world should continue to increase.

Disk Storage Systems

Much of the work that a database system does revolves around taking information that is stored on a hard disk drive and making it available for user processes. On most computer systems, the slowest thing that happens is transferring data to and from a hard disk. It's obvious that if a product depends so heavily on something that is relatively slow, management of the slow resource takes on a critical role.

This is the case with Informix. Many of the setup and tuning strategies the DBA employs to extract the best performance from the system are aimed at reducing the amount of time the system spends accessing the disks. Thus, there is a heavy emphasis on caching data items in shared memory, where access is many times faster than from disk, and much time can be spent physically placing tables in the areas of the disk that are easiest to access. The aim is to cache the data in shared memory if you can; when you do have to go to disk, the transfer should be as fast as possible and involve the least possible movement of the disk drive heads.

From these requirements, it is obvious that you should have the fastest disk storage system that you can obtain if you are looking for peak performance. Since the disk system is such a common bottleneck to high performance, improvements in this area can often pay impressive performance dividends.

In the area of disk layout, there are four main topics to consider: physical disk configuration, disk striping, mirroring, and RAID (Redundant Arrays of Inexpensive Disks).

Physical Disk Configuration

No matter what your disk system type, you need to carefully consider how you use it to get the best performance. The choices that you make regarding placement of the rootdbs, your logical logfiles, your physical log, and your most active tables can make a big difference in performance.

If you have multiple disks, it is usually better to place the database components on different physical drives from the UNIX or NT filesystems used by your applications. This way, movement of the disk heads caused by UNIX or NT jobs does not interfere with those belonging to Informix. If you can place these areas on separate controllers, so much the better.

I'm assuming that you are using raw devices for your data areas. They will show a uniform improvement over UNIX filesystems for the simple reason that the work that UNIX would do to maintain a filesystem is redundant for a chunk. Informix is running its own routines to manage its chunks, and there's no need to duplicate them. The situation of raw vs. cooked filesystems becomes murkier with the NT port. Versions prior to 7.30 did not have the option of using raw filesystems. They had to use NT's NTFS filesystem. Nonetheless, performance was very good. The jury is still out on the use of raw devices in IDS 7.30+ on NT.
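For reference, a chunk on a raw device is assigned with the onspaces utility. A sketch, with a hypothetical device path and an illustrative size (the -o offset and -s size are given in kilobytes):

```shell
# Create a dbspace named datadbs with its first chunk on a raw device
# (device path and size are hypothetical; -o and -s are in kilobytes)
onspaces -c -d datadbs -p /dev/rdsk/c0t2d0s4 -o 0 -s 500000
```

This must run against a live instance as the informix user; on most UNIX ports the raw device should be owned by informix with group informix before the engine will accept it.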

If you have multiple raw disks available for your database, you have to decide how to divide up the space. Often, an inexperienced DBA will simply create one large rootdbs and place everything there. This is not usually a good idea, for several reasons. The rootdbs can never be dropped; to get rid of it, you have to reinitialize your database and restore the data from tape or other sources. As such, space that you devote to the rootdbs in Informix is space that you can never get back. If you need to make changes or reallocate resources, you have to rebuild.

On the other hand, if you have a reasonably sized rootdbs with separate dbspaces for various tables or databases, you maintain some flexibility in dealing with change. You can move databases or tables from dbspace to dbspace a lot easier than by recreating everything from scratch. These convenience and flexibility items are in addition to the performance issues that can be addressed by having separate dbspaces for different database components.

You may have some disks that perform better than others. If so, this is where you want your most active tables. You can put them there by creating dbspaces on the fast devices and using the IN DBSPACE clause of your table creation statements. You'll find that some tables and some portions of the database system will get a lot more writing activity than others. Put these on the faster devices. Good candidates for placement on faster devices are your physical and logical logfiles.
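The placement itself happens at table-creation time. A sketch in SQL, assuming a dbspace named fastdbs has already been created on the fast device (the table and its columns are purely illustrative):

```sql
-- Put a heavily used table in a dbspace built on the fastest drive
-- (table, columns, and dbspace name are all illustrative)
CREATE TABLE orders
(
    order_num   SERIAL,
    cust_num    INTEGER,
    order_date  DATE
) IN fastdbs;
```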

Disk Striping

Disk striping is a method of creating logical disks that are actually composed of several physical disk devices. Disk striping is also known as RAID 0. Striping is supported on many UNIX and NT machines and is often beneficial to database performance. With striping, a logical disk is composed of data tracks that are interleaved across several devices. Thus, you can give a command calling for accesses to the disk and have the heads of several devices move at once to retrieve the data. In some cases, this can improve performance. It makes actual physical placement of tables and dbspaces on disk less critical, as the striped volume is distributing the read load across many spindles. It also cuts down on the possibility of contention, or forcing a disk head to make large jumps from disk request to disk request.

Striping is an operating system function. Informix does not know or care about how the volume is laid out. As long as the engine can execute calls to the logical volume and receive data, it does not matter whether the underlying device is a physical device or a logical device.

Striping is a powerful tool in your search for performance. If you have it available to you, it is almost always better to use it than not to. If you have an efficient, reliable striping method, you do not really have to worry about the physical location of tablespaces and dbspaces in your system. This can drastically cut down on your possibility of error and on the amount of time you spend fiddling with disk layout. With striping, you can do it once and let UNIX take care of it in the future.

Disk Mirroring

Both striping and mirroring are actually subsets of the RAID concept, each being a different RAID level. Mirroring is also known as RAID 1. There are two choices for mirroring with Informix engines. You can either let the operating system or hardware handle the mirroring, or you can let Informix handle it.

If you enable operating system mirroring, the O/S handles it much like a striped device: when one side of the mirror has a data change, the drivers automatically make the change on the other side. Mirroring is not really a performance feature; it is a redundancy feature. Using mirrored devices allows you to survive the failure of a physical drive and continue running on the other device. Mirroring actually slows performance somewhat, but the added fault-tolerance often makes it worth the cost.

The other approach to mirroring is to let Informix handle the mirroring. Whenever you create a chunk, you can tell the engine that you want to mirror it. To do so, you must have twice the space normally required on your drives. Thus, to mirror a 1-gigabyte chunk, you would need to have two 1-gigabyte chunks available, a primary chunk and a mirror chunk. The database engine will automatically apply changes from the primary chunk to the mirror chunk. Informix mirroring is somewhat slower than operating system or hardware mirroring, especially if your mirroring is built into your hardware.
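As a sketch, an Informix-mirrored dbspace is created by handing onspaces both a primary path and a mirror path. The device paths here are hypothetical, and the 1,000,000-KB size roughly matches the 1-gigabyte example above:

```shell
# Create dbspace mirrdbs with a primary chunk and a mirror chunk
# -m takes the mirror device path and its offset (paths are hypothetical)
onspaces -c -d mirrdbs -p /dev/rdsk/c0t2d0s4 -o 0 -s 1000000 \
         -m /dev/rdsk/c1t2d0s4 0
```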

The natural question is "Which one is better?" I generally tend toward taking the O/S mirroring over the Informix mirroring in systems that need to minimize maintenance and upkeep. Mirroring that is integrated into either your hardware or your operating system can be more highly optimized for your specific environment. Informix mirroring is a fairly generic product. It works the same for all environments, while the native mirroring is often more highly tuned. From an operations standpoint, they both seem to provide the same level of security.

For situations where the DBA is proactive and sophisticated, Informix mirroring is often a better idea. This mirroring gives the DBA some additional options for moving and relocating disk devices that can prove very helpful if the DBA is savvy enough to take advantage of them. In modern RAID designs, the write speed penalty for RAID 1 is not usually significant. These systems make extensive use of intelligent drivers and caching controllers. The read performance of RAID 1 is often better than with a JBOD (just a bunch of disks), because the RAID offers parallel read capability, reading from either the primary or the mirror.

RAID

Both mirroring and striping are considered to be low-level RAID implementations. RAIDs utilize several physical disks that are addressed as one logical disk. Various levels of RAID systems lay out the disks in ways that can provide performance gains as well as increase reliability.

A good RAID implementation can provide several advantages to a state-of-the-art system. First, the individual disks that comprise a RAID system are usually less expensive than traditional disks used with larger minicomputer systems. These disk drives are often 4 to 9 gigabyte drives designed to be used with personal computers. These disks are priced more for the PC market than for the minicomputer market and are relatively inexpensive compared to those sold strictly for the high-end markets.

Second, RAIDs provide a level of redundancy and can survive the failure of one or more of the individual disks. Some of the better implementations also allow a hot swap capability in which a bad disk can be replaced without taking the system offline. After the bad disk is replaced, the RAID will begin the process of resynchronizing the disks so that the redundancy is restored.

Not only can you get better performance due to the striping capability of RAIDs, you can also get a level of tunable performance. Some systems can allow different levels of RAID redundancy in different parts of the file system.

As an example, some areas such as the physical and logical logs in Informix will see a lot of write activity and few or no reads. If these areas can be placed in an area that is tuned for writing rather than random reading, overall performance will improve.
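Moving the logs onto such an area is done with the onparams utility. A sketch with hypothetical dbspace names and an illustrative physical-log size; it assumes the dbspaces already exist and, for the physical-log change, that the engine is in quiescent mode:

```shell
# Add a logical-log file in a dbspace on the write-tuned device
onparams -a -d logdbs

# Move the physical log to its own dbspace (size in kilobytes, illustrative)
onparams -p -d physdbs -s 20000
```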

RAID 5 is a common choice for database systems. It provides improvements in reading speed with a fair degree of fault-tolerance, at the cost of sometimes significant degradation of write speeds. If your system is write-intensive, RAID 5 is probably not the way to go.
