The Status Script

In this chapter, we've seen a lot of individual scripts and programs that enable the DBA to deal with different parts of the database. Now, it's time to pull them together and create a "dashboard" for the OnLine system.

Of all the tools that come with OnLine, there is nothing that provides a quick, one-page peek into the engine. Various invocations of tbstat will allow you to look at individual items, but nothing gives you a quick status report. If the OnLine system is a database engine, certainly there should be a dashboard. Like a dashboard, the status script gives you or your users one place to look for indications of current problems or problems to come.

This script depends upon having other scripts located in your $INFORMIXDIR/local_bin directory. It requires the following scripts:

  • logging

  • archiving

  • thislog

  • dbdf

Here is the code for the status script:

INFORMIX 103> cat status
#! /bin/csh
clear
set INFORMIXLOCALBIN = "$INFORMIXDIR/local_bin"
echo "Please wait....."
set list = ( '$INFORMIXDIR/bin/tbstat -p | fgrep -v e' )
set dskreads = $list[l]
set pagreads = $list[2]
set bufreads = $list[3]
set readcache = $list[4]
set dskwrits = $list[5]
set pagwrits = $list[6]
set bufwrits = $list[7]
set writecache = $list[8]
set isamtot = $list[9]
set open = $list[10]
set start = $list[ll]
set read = $list[12]
set write = $list[13]
set rewrite = $list[14]
set delete = $list[15]
set commit = $list[16]
set rollbk = $list[17]
set ovtbls = $list[18]
set ovlock = $list [19]
set ovuser = $list[20]
set ovbuff = $list[21]
set usercpu = $list[22]
set syscpu = $list[23]
set numckpts = $list[24]
set flushes = $list[25]
set bufwaits = $list[26]
set lokwaits = $list[27]
set lokreqs = $list[28]
set deadlks = $list[29]
set dltouts = $list[30]
set lchwaits = $list[31]
set ckpwaits = $list[32]
set compress = $list[33]

if ( $lokwaits > 0 ) then
@ lokhold = ( $lokwaits * 100 ) / $lokreqs
@ lokfigure = 100 - $lokhold
else
@ lokfigure = 100
endif
if ( $lchwaits > 0 ) then
@ lchhold = ( $lchwaits * 100 ) / $isamtot
@ lchfigure = 100 - $lchhold
else
@ lchfigure = 100
endif
fgrep Checkpoint $INFORMIXDIR/online.log | tail -6 >! ~/temp_stat
#ps - efa > ~/tempjobs
ps -a > ~/tempJobs
@ NUMBERFULL = '$INFORMIXDIR/bin/tbstat -1 | grep -v RSAM | grep U | wc -1`
endif
@ cpunum = `echo $syscpu | awk -F. '{print $1}'`
@ usernum = `echo $usercpu | awk -F. '{print $1}'`
if ( $cpunum == 0 ) then
@ cpunum = 1
endif
@ ratio = $usernum / $cpunum
echo "
INFORMIX STATUS:COMPUTER NAME: TIME: `date` 
`$INFORMIXDIR/bin/tbstat -` 
CPU UTILIZATION: 
`uptime` 

RATIOS: READ CACHE   WRITE CACHE LOCK HIT% LATCH HIT% USER/SYSTEM
         $readcache       $writecache      $lokfigure        $lchfigure
        ${ratio}-l 
WAITS:  CHECKPOINTS   BUFFERS     DEADLOCKS   
              $ckpwaits      $bufwaits           $deadlks "
echo "
LOGS:   There  are now ${NUMBERFULL} log(s) in use. This includes the current log:
`$INFORMIXLOCALBIN/thislog`"
echo "

CHECKPOINTS:  The last six checkpoints occurred as follows:
`cat ~/temp_stat | sed 's/Checkpoint Completed/ /g'`"
echo 
========================================================================
echo -n " CONTINUOUS BACKUP OF LOG FILES: `$INFORMIXLOCALBIN/logging` ARCHIVE:
`$INFORMIXLOCALBIN/archiving` "
echo " "
Sh $INFORMIXLOCALBIN/dbdf

$INFORMIXDIR/bin/tbstat -d >! ~/temphold99
fgrep PD ~/temphold99 > /dev/null

if ($status == 0) then
echo "CAUTION: A CHUNK IS DOWN"
endif
rm ~/temp_stat* ~/temphold* ~/tempjobs*
echo " "

This script is designed to be relatively portable across systems. I've avoided hard coding anything into the system, with the exception of some of the code in the dbdf script that excludes page usage in my logspaces. Since the logfiles are preallocated, the space usage does not vary. You may or may not want to see the data.

The screen output looks like this:

Please wait.....
INFORMIX STATUS:COMPUTER NAME: batman   TIME:  Fri  Jan 21 15:40:11 CST 1994
RSAM Version 4.10.UD4   -- On-Line --  Up 3  days  17:41:18 -- 18704 Kbytes
CPU UTILIZATION:
3:40pm  up 3 days, 17:46,  79 users,   load  average:  12.94, 13.50, 13.27
RATIOS: READ CACHE   WRITE CACHE    LOCK HIT%    LATCH  HIT%   USER/SYSTEM
              98.39           97.10            100           100         9-1
WAITS: CHECKPOINTS      BUFFERS                DEADLOCKS
               125            9941                    5

LOGS:  There are now 2 log(s) in use.   This includes the current log:
e06d26f0 11       U---C-L  3214     70f4f4       30000    13435    44.78
CHECKPOINTS:   The last six checkpoints occurred as follows:
12:59:42      13:31:53     14:04:41     14:22:41     14:56:41      15:33:37
=============================================================================
  CONTINUOUS  BACKUP OF LOG FILES: OFF     ARCHIVE: OFF
rootdbs       is 37.34 percent full and has 167305 free pages
dbspace2      is 23.56 percent full and has 206379 free pages
dbspacel      is 89.84 percent full and has 166120 free pages
$lowdbspace   is 92.90 percent full and has 61061 free pages
(Chunk down warning appears here if applicable)

The screen does tend to get a bit crowded. In fact, if you have more than four or five dbspaces in your system that are reported by the dbdf script, your report may scroll off the page. If this is the case, you could either reformat the screen output, taking out some of the blank lines that I put in for readability, or you could pipe the result through the UNIX more utility, which would allow the user to page through the output.

Understanding the Status Screen

The first four lines of the screen give you basic status and identifying information about the system. It includes the computer name (from the UNIX hostname utility, some UNIX versions may not have this utility) and the status date on the first line. The second line is the header from tbstat and tells you the RSAM version of OnLine, its status (online, offline, checkpoints), its uptime, and the size of its shared memory segment. The fourth line gives you the output from your UNIX uptime utility so that you can get a feel for the load on your system.

The RATIOS section gives you the read cache and write cache figures from tbstat -p. A well-tuned system will have 95+ percent for read and 85+ percent for write cache hit rates. The next two numbers are derived from tbstat -p output. The LOCK HIT % entry gives you the percentage of time that requests for locks were successful. It is calculated by this formula:

lock hit % =  100  -  ( lock waits  x  100)  /(lock requests)

This number should be close to 100 percent at all times. If this number begins to drop, it means that jobs are having to wait for locks to be released. Look for applications that are holding locks for too long or for problems with long transactions.

The LATCH HIT % statistic is similar to the LOCK HIT % statistic and shows you the number of times that latch requests were successful. There is no statistic available for the number of latch requests issued by the engine, so we used the statistics for the total number of ISAM calls. This gives a count of the number of database primitives that were executed. The formula for this calculation is

latch hit % =  100  -  ( latch waits  x  100)/  (number of isam calls)

This number should also remain at 100 percent. If it starts to go down, look at contention for resources such as buffers, logfiles, etc.

The WAITS line tells you the actual number of times that processes had to wait for checkpoints to complete and for buffers to get free. Use this in conjunction with the LATCH HIT and LOCK HIT percentages if you begin to see slowdowns. The DEADLOCKS number gives the number of times that the engine has taken action to avoid a deadlock situation. This is not the number of times the engine has been deadlocked. Theoretically, a deadlock cannot happen. If your DEADLOCKS count is large, look to your applications. Jobs are failing and are being rolled back because of these potential deadlocks. You have a design error somewhere.

The LOGS lines give you a count of the number of logs that are not yet freed by being backed up to tape. This count also counts the number of logs that have been backed up but have not been released, because they hold information regarding uncommitted transactions. Watch the number of logs carefully. If it gets close to your total number of logfiles, you could be looking at a long transaction rollback. Check this screen to be sure that you have continuous backup of logfiles running. If it is running and if you have many logs in use, run a tbstat -u and look for transactions that have a lot of reads or writes. They are probably the long ones. The second LOGS line shows you the information about the CURRENT logfile. It is in the same format as the tbstat -1 output.

The CHECKPOINTS lines show you the times of the last six checkpoints. If these checkpoints are closer together than you usually see, look for some jobs running that are causing a lot of logging. Long inserts and long deletes are a common culprit here. Unless you begin to get into logfile problems, there's not a lot that you can do. If you find that the checkpoints occur too frequently, you may need to tune the database.

The line immediately after the double lines gives you a quick status of your logging and archiving processes. If you are doing archives using the scripts in this section, you will be piping the output to the archive.log file. If so, when you are doing an archive, the percentage complete will appear after ARCHIVE. If not, it will just read ON or OFF. Continuous backup of logs will be a simple ON/OFF display.

The final section gives you dbspace by dbspace figures of percentage used and free pages. The final section will give you a warning about chunks that are offline if you have any. Note that this script does not consider mirroring chunks. If you are using Informix mirroring, you may wish to modify the script to show the mirror chunk data.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.17.75.227