The Tbstat Utility

The tbstat program gives you instantaneous data about the contents of the shared memory structures. It takes very little CPU time, and locks no tables and creates no latches. As such, you can consider that tbstat is essentially "free"; that is, it has a minuscule effect on database performance. You can let it run continuously in the background if you wish, collecting performance data and usage data with little or no performance cost.

There is only one option that could possibly cause you any problems, and that is the tbstat -o filename option. This option places the entire contents of shared memory into a file specified by filename. If you have a large shared memory, be sure that you have sufficient room in your UNIX file system to save the file. This option will also cause significant I/O in the system if you have it saving a large shared memory file to disk.

The tbstat utility is often used as a troubleshooting tool. When you encounter problems in the engine such as unexpected freeze-ups, long transactions, deadlocks, and the like, you will turn to the tbstat utility to tell you what's happening in the engine. Usually, this is an effective technique, but sometimes your problem will be transient.

Since tbstat gives you an instantaneous snapshot, by the time you enter the correct tbstat command, the problem may be gone. If you are looking for solutions to these transient problems, you may need to use the tbstat -o filename option. The trick is to catch the database in the act of doing whatever it is doing and save the shared memory to a file.

If you can get the output file saved before the problem goes away, you can then inspect the output file at your leisure and run any or all of the tbstat options against the file without worrying about the transient problem going away. To do this you would use the following series of tbstat commands:

   tbstat -o any_filename                                (takes the snapshot)
   tbstat -"any_options" "filename"               (reports from snapshot file)

The first step in fully understanding the tbstat utility is understanding what all of the options mean. This is the help screen for tbstat (commonly used items are marked with an asterisk):

   joe  60>tbstat --
   usage: tbstat [-abcdklmpstuzBDFPRX] [-r secs][-o file][infile]
   *    --     print this help screen
        -     print just the tbstat header (undocumented feature)
        a     print all info (options: bcdklmpstu)
        b     print buffers
        c     print configuration file
   *    d     print dbspaces and chunks
   *    k     print locks
   *    1     print logging
        m     print message log
   *    p     print profile
   *    s     print latches
   *    t     print tblspaces
   *    u     print users
        z     zero profile counts
        B     print all buffers
        D     print dbspaces and detailed chunk stats
        F     print page flushers
        P     print profile, including BIGreads
        R     print LRU queues
        X     print entire list of sharers and waiters for buffers
   *    r     repeat options every n seconds (default: 5)
        o     put shared memory into file (default: tbstat.out)
   infile use infile to obtain shared memory information

Tbstat - -

Using this option displays the help screen for the tbstat command. This is actually the "minus" option, not the "minus minus" option. The first minus sign indicates that a command options follows; the second minus is the actual option.

This convention is followed by most of the OnLine utilities. Just enter commandname -- and you'll see the help text.

Tbstat -

Using just the first minus and a null option will show just the header of the tbstat series of commands. This header looks like:

   RSAM Version 5.01.UC1   -- On-Line -- Up 7 days 23:54:27 -- 18704 Kbytes

This header includes five important sections. The first section gives the version number of the OnLine engine. The second section shows the current operating mode. Possible values of the mode are:

Offline: Engine not running.
Quiescent: No user can start a server process. User informix can use administrative options of tbmonitor. Any user can use tbstat or view options of tbmonitor.
Online: Fully usable by all users.
Shutdown: System is in transition from online to quiescent or from quiescent to offline. Cannot be canceled.
Recovery: Moving from offline to quiescent. Fast recovery is performed here.

The third section is the (checkpoint) indicator. If applicable,this section can contain:

(CKPT REQ) Some process has requested a checkpoint, and the engine is trying to initiate the checkpoint. This flag is often seen if the database is hung waiting for a checkpoint.
(CHKT INP) Checkpoint is in progress. Users have read-only access to the database. No changes to data are allowed until the checkpoint ends.

The fourth section shows the real time that the database has been running, that is, the time since the last database startup. The fifth and final section shows the amount of shared memory in use by OnLine in kilobytes. This is roughly the size of the output file generated by the tbstat -o command.

Tbstat -d

The tbstat -d command provides several important pieces of information, the most important being the disk and chunk usage for the database. The output is divided into two sections, Dbspaces and Chunks.

   joe  43>tbstat -d
   RSAM Version 5.01.UC1   -- On-Line -- Up 7 days 23:48:38 -- 18704 Kbytes
   Dbspaces
   address  number   flags    fchunk   nchunks  flags    owner    name
   e0016e88 1        1        1        3        N        informix rootdbs
   e0016eb8 2        1        4        2        N        informix logspace
   e0016ee8 3        1        5        2        N        informix dbspacel
   e0016fl8 4        1        7        2        N        Informix logspace2
   e0016f48 5        1        10       1        N        Informix slowdbspace
   5 active, 20 total

Table . tbstat -d output (dbspace section)
address Dbspace's address in the shared memory table.
number Dspace's unique ID number. Dependent upon the order in which the dbspaces were created.
Flags First series of flags describes mirroring, status, and/or blobspace status:
1 0x0001 Not mirrored
2 0x0002 Mirrored
3 0x0004 Dbspace is down
4 0x0008 Newlymirroreddbspace
5 0x0010 Blobspace
 Unique identifying number of the first chunk in the dbspace.
nchunks Total number of chunks in this dbspace.
flags Second series of flags gives essentially the same information as the first only using alphabetic codes:
 First position M Mirrored
  N Not mirrored
 Second position X Newly mirrored
 Third Position B Blobspace
owner User name of the owner of the dbspace.
name Name of the dbspace.

  Chunks
  address chk/dbs offset size     free     bpages   flags pathname
  e0014e78 1  1   0      135000   35718             PO-   /dev/rootdbs3
  e0014f10 2  1   0      57000    56997             PO-   /dev/rootdbsl
  e0014fa8 3  1   0      75000    74997             PO-   /dev/rootdbs2
  e0015040 4  2   0      147500   24542             PO-   /dev/log1
  4 active, 27 total

Table . tbstat -d output (chunks section)
address Address of the chunk
chk Unique ID number of the chunk
dbs Number of the dbspace containing this chunk. Corresponds to the number field in the dbspaces output of this command.
Offset Number of pages that this chunk is offset into the device, if any.
Size Size of the chunk in pages.
free Number of free pages in this chunk. This is the number of pages that are not part of any allocated tablespace. It does not take into account any unused pages within extents that are allocated to tablespaces.
Bpages Number of free blobpages. Blobpages can have their own page sizes. Don't confuse this number with free pages. They don't necessarily correspond.
Flags Status flags for the chunk, using the following codes:
Position 1 PPrimary chunk
M Mirror chunk
Position 2: O Chunk Online *
D Chunk Down *
R Recovering
XNewly mirrored
Position 3: B Blobspace
- Dbspace
pathname Either the device name or the path name of the UNIX cooked file that contains this chunk.

On most terminals, the screen displays for the O and the D letters are easily confused. It's often hard to tell from the screen whether a chunk is up or down. Versions prior to 5.0 can run with a chunk down, so it's important to watch your status.

Usage Tips

If you're like most DBAs, you'll spend a lot of time fighting the database "battle of the bulge." Murphy's Law says that any database will expand to fill the amount of space available. OnLine does not bother to warn you when you're beginning to fill up your disks.

The first sign of a problem will be when users get messages saying that OnLine cannot allocate more extents for a table. Especially when you're using raw devices, there is no easy way to tell how much disk space you are using.

The tbstat -d program is your most powerful tool for monitoring the usage of your disks. This invocation of tbstat is used to develop percentage full statistics for various dbspaces in the status script that we develop later. You'll probably develop various scripts on your own to massage the output of all the utilities. With tbstat -d, you'll find yourself writing scripts to answer such questions as "How much space is there left in dbspace X?"

Here it becomes important to have some naming conventions for your chunks and dbspaces. If all of the chunks contained in dbspace dbsl begin with the letters dbsl; for example, it will be a simple matter to grep for dbsl followed by the UNIX command, awk '{print $6}' to get the free space in dbsl. If you have the luxury of creating your databases from scratch, keep in mind this consistency in naming. Do the same with the names of your dbspaces dedicated to logfiles and with the names of your tables too if possible.

You may notice that the examples in this book do not always exhibit this level of foresight. I don't have the excuse of having inherited the databases. I developed many of these databases before I began developing DBA scripts to manage them. If you follow these recommendations, you'll do a better job. I know that doing so would have made my life as a scriptwriter a lot easier.

If you're not familiar with the capabilities of the UNIX commands grep, sed, and awk and with the operations of UNIX regular expressions, it may make some sense to learn them early in the game. Using these UNIX tools on the outputs of OnLine utilities can allow you to generate just about any sort of reports and data

Tbstat -D

This is a useful variant of the tbstat -d command. It provides exactly the same output as the tbstat -d command, with the exception that in the Chunks section the size and free statistics are replaced by page Rd and page Wr numbers representing the numbers of pages read from and written to in each of the chunks.

This option provides a very useful tool for locating "hotspots" in your databases, areas that have abnormally high I/O activity. Since disk I/O is most often the slowest activity your computer system will perform, the elimination or minimization of these hotspots can go a long way toward improving the performance of your databases.

One minor drawback to tbstat -D is that it reports cumulative data rather than instantaneous data. The statistics shown cover the time period beginning when OnLine was either last started or when the statistics were zeroed out. While it would be easy to write a script that subtracts out the last totals and only shows you the differences from the last time the script was run, there is an even easier way to do this. Write a short c-shell script and name it hotspot:

   joe 51> cat hotspot
   #!/bin/csh
   tbstat -D
   tbstat -z

Every time you run the hotspot script, it will zero out the tbstat statistics when you finish. Each subsequent run will show the actual activity that has occurred since the last run. This is suitable for running hotspot repeatedly in one sitting or over a short period of time. It could be modified to save the date and statistics of the last run, but why get complicated? Just remember that running hotspot will zero out all of the tbstat statistics. If you (or any other user) are running cumulative statistics monitoring, the tbstat -z's will interfere with this statistics gathering.

The most important thing I've discovered by running hotspot is the tremendous amount of activity that occurs to the physical logfile. Conventional wisdom says to locate the logfiles in an area that is optimized for heavy write activity. I had taken this advice to mean the logical logfiles. Study of hotspot output will show that it's the physical logfile that needs to be optimized. It is relatively easy to relocate and/or resize the physical logfile (the database does need to come down however). Watch your hotspots for a while and then experiment with relocating and resizing the physical log. You'll see how much activity it gets. It is also possible to experiment with the size of the physical log buffers as a way to optimize the writing activity.

Tbstat -z

Since we just used tbmode -z in a script, this may be a good time to talk about this command option. There's not much to say. It just zero's out everything. This does not affect anything in the database. You can zero it out all day long if you wish. Don't confuse these statistics with the statistics kept by the OnLine UPDATE STATISTICS command. That's a completely different subject. Just be aware of the fact that you may interfere with other statistics-gathering programs. I discourage my users from using any of the OnLine utilities, preferring to give them access through scripts that I control. That way, they won't be stepping on my toes.

If your OnLine system stays up for long periods of time without zeroing out the tbstat statistics, you'll eventually reach your UNIX limits on integer sizes and some of the numbers will roll over, showing negative numbers. To correct this, just run the tbstat -z program.

Tbstat -u

This option gives you an overview of the database processes that exist at any particular time. This option gives you data from the user pseudotable in the shared memory. It is useful as a debugging tool when the system has inexplicably frozen up and the DBA is attempting to find out what is going on. In the case where everything is frozen, you will need to compare information from this option with data from other tbstat options to trace down the offending process. This usually takes some time because you have to track the transactions through several invocations of the utility. If the database is hung up, probably nothing in the user table is changing and you can run multiple tbstats against the shared memory and get the same data. If the database is not frozen, you often run into a problem because the data in the user table is changing. In this instance it is often useful to take a snapshot of the shared memory by running the tbstat -o file command and then running your tbstat options against file.

With a hung database in a production environment, it is often tempting to simply restart the database and hope the problem goes away. While this is often a viable solution to the problem, there are instances when the recovery time would be excessive and when it would be better to try to track down the specific problem.

You can expect such excessively long recovery times in recovery from such problems as long transactions interrupted while in rollback or other occasions in which much logfile data has not been released. Whether to try to debug or restart the database is a judgment call. Even if you decide to shut down the database, do a tbstat -o debug.filename so that you can later try to find out what caused the problem.

If you learn nothing else of tbstat, learn how to read the tbstat -u output. OnLine is a black box and tbstat is the main way of looking into this black box. This data is impossible to decipher from a UNIX ps output. It just shows up as sqlturbo. You'll note that many of the flags correspond with the invocation flags for tbstat. Keep working with it. Soon you'll be reading these outputs in your sleep.

The output of tbstat -u changed from OnLine Version 4.11 to Version 5.0. Both 4.11 and 5.0 have the same first section. OnLine 5.0 gives you a second section, Transactions, that applies to the X/A environment and to transactions that are running under INFORMIX-Star.

Here is a greatly abbreviated sample of the output from the tbstat -u utility:

  joe  46>tbstat -u

address  flags  pid    user    tty     wait    tout locks nreads  nwrites


e0001b70 ------D 555     informix console  0        0    0     6746     2607
e0001f30 ------- 21059   dba      -        0        0    1     31538    3813

address Address of user in user table in shared memory. This is the only point of contact between user addresses found from other tbstat command forms. As you trace a problem using the -s output (latches), the -b -B and -X outputs (buffer status), and the -k output (locks held), you will eventually trace it to a user address. Then match the address with a PID or user name.
Flags Probably the most important output of this command. Tells you what the process is doing. Has a four-position alpha output:
Position 1: B Waiting for Buffer
C Waiting for a Checkpoint
L Waiting for a Lock
S Waiting for a Latch
X Long trans. Awaiting rollback
G Waiting for loG buffer
T* Waiting for transaction*
Position 2: B Transaction w/Begin work
T In trans. Logging occurred
R In Rollback
A Archive process
C* Committed or is committing*
H* Heuristically aborted or aborting*
P* Prepare state. Precommitted*
X* X/A prepared or is doing so*
Position 3: R Reading from the database
X Inside a CRITICAL section
Position 4: M Running tbmonitor program
D Running a Daemon
C Corpse. Dead process awaiting cleanup
F Page Flusher daemon
pid UNIX process identification number.
user Name of the user running the process.
tty TTY associated with the process, if any.
wait Latch or lock ID being waited for, if any.
tout How many times a lock timeout has occurred for the process.
locks Number of locks held by this process.
nreads Number of read calls executed by this process.
nwrites Number of write calls executed by this process.

Note

Items marked with * used in Online V.5.0 and above only Versions 5.0 and above include this Transactions section in the output of the tbstat -u command:


  Transactions
  address  flags user     locks log begin isolation retrys coordinator
  e0004a24 A---- e0001bbc 0     0         NOTRANS   0
  e0004c04 A---- e0001c28 0     0         NOTRANS   0
  e0005384 A---- e0001eb0 1     0         COMMIT    0
  e0005564 A---- e0001f1c 1     0         COMMIT    0

address Address of the transaction in transaction pseudotable.
flags A five-position alpha flag of which #1, #3 & #5 are used:
Position 1: A Attached to a server process.
C Cleanup. Probably crashed.
Position 3: B Begin. Work has been logged.
Writable operation has occurred. Not the same as a BEGIN WORK.
C Transaction being committed.
H Heuristic rollback in progress.
P Prepared to commit.
R Rollback in progress.
Position 5: C This transaction is Coordinator.
G Global transaction.
P Participant transaction.
user Address of the user process that owns the transaction
locks Locks owned by the transaction and thus by its owner process
log begin ID of the logical log where this transaction begins
isolation Transaction Isolation Level:
COMMIT Committed Read
CURSOR Cursor Stability
DIRTY Dirty Read
REPEAT Repeatable Read
NO TRANS Processes that don't own transaction, or Databases with no logging
retrys Number of times this transaction has exceeded the timeout (TXTIMEOUT)
coordinator Name of the transaction coordinator

Tbstat -t

This invocation of the tbstat utility provides information from the tblspace pseudotable in shared memory. These may be regular database tables currently in use, temporary tables created by users, or temporary tables created implicitly by the OnLine system for such tasks as SORT or ORDER BY. Each table in use will have only one entry.

   joe:/devel/usr/joe> tbstat -t

   RSAM Version 5.00.UC3   -- On-Line -- Up 2 days 08:09:41 -- 16080 Kbytes
   Tblspaces

    n address  flgs ucnt tblnum   physaddr npages nused npdata nrows nextns
    0 c0095c98 1    1    1000001  10000e   2214   2214  0      0      3
    1 c0095da8 1    1    2000301  200004   1285   71    0      0      1
    2 c0095eb8 1    1    3000001  300004   245    245   0      0      1
    3 c0095fc8 1    1    4000001  400004   115    2     0      0      1
    4 c00960d8 1    1    5000001  500004   285    2     0      0      1
    5 c00961e8 1    2    6000001  700004   15     2     0      0      1
    6 c00962f8 1    2    7000001  800004   15     2     0      0      1


   7 active, 1500 total, 512 hash buckets

Table . Explanation of the fields in the tbstat -t output.
n Counter of the number of open tablespaces.
address Address of the tablespace in the tblspace pseudotable.
flgs Status of the tablespace:
0×l 1 Busy
0×2 2 Dirty, needs to be written
ucnt Usage count. How many users are using the tablespace?
Tblnum Tablespace number (hexadecimal). Corresponds to hex(partnum) from the system table, systables, for the database. The first digit (leftmost 8 bits) of the tblnum will always tell you the dbspace that contains the table (cross-reference to the tbstat -d output). The remaining portion gives you the order in which the tables were created in the dbspace. The internal table known as tablespace tablespace will always be X000001, where X is the number of the dbspace.
Physaddr Physical address on disk of the tablespace.
npages Total number of pages allocated to the tablespace. This includes data and index pages as well as empty space that is left over in any allocated extents.
Nused The number of pages in the tablespace that have been used.
Npdata Number of data pages used. Does not necessarily correspond with npages because an extent may have empty pages. This does not differentiate between data and index pages, although the heading (npdata) may seem to indicate so. This refers to DATA + INDEX pages.
Nrows Number of rows in the tablespace.
nextns Number of noncontiguous extents allotted. Even though the table grows larger and new extents may be added, if these extents can be created contiguous to the last extent, this number does not get bigger. This is a good indicator of the level of fragmentation of a tablespace.

The final line of the tbstat -t output is

   2 active, 1100 total, 512 hash buckets

The active represents the number of tablespaces that are currently active, while the total figure is the configuration parameter found in the $TBCONFIG table.

A hash bucket is part of an algorithm to speed up access to the tblspace table. It's a method of dividing up all of the possible values and cross-referencing them to values in the hash table. The cross referencing is done by an internal mathematical algorithm that takes each entry in the tblspace table, performs some math on it, and relates it to an entry in the hash table. Each entry in the hash table is called a hash bucket.

Tbstat -s

The tbstat -s invocation provides information on the active latches in the OnLine system memory. An active latch represents a user process that is modifying a shared memory structure. In normal operations, latches are granted and released rapidly. At any particular time, a run of tbstat -s will most often reveal no latches held.

A latch is a type of lock that is placed on shared memory. Most often, latches are placed on the pseudotables in shared memory. When a user needs to access one of these pseudotables, he first must acquire a latch on the table, then make his changes, and then release the latch. If the user cannot obtain the latch, the user process will wait for that latch to be freed. If for some reason it can never obtain the latch, the user process will hang.

When the database is hung, the DBA usually runs tbstat -s to see if someone is frozen on a latch. If anything is found, this is usually a good place to start looking for the bottleneck.

   joe 53> tbstat -s

   Latches with lock or user set

   name     address  lock wait user
   bbf[0]   e075adb8 1    0    e0002770
   bf[2070] e071af2c 1    0    e0002a10
   pt[33]   e067afa0 1    0    e0002590
   bh[4749] e06e9cb0 1    0    e0001e70

Table . tbstat -u output codes.
name The name of the structure that is being latched by this latch.
locks Lock table latch
tblsps Tblspace table latch
ckpt Checkpoint latch
archive Archive latch
chunks Chunk table latch
loglog Logical log buffers latch
physlog Physical log latch
users Users table latch
trans Transaction table latch
flushctl Flush table latches
flush%d Flush process control latches
flushr Page cleaners
LRU%d LRU queue latch
pt [%d] Partition latch
pt Tablespace tablespace latch
bh [%d] Buffer hash latch
bf [%d] Buffer latch
bbf [%d] Big buffer latch
altlatch Alter table count latch
physbl Physical log buffer 1 latch
physb2 Physical log buffer 2 latch
NOTE: The %d in the above codes will be an integer representing a subscripted value of the structure
Address The address of the latch in shared memory. This address corresponds to the wait field in tbstat -u output if the process is waiting for this latch.
Lock Machine-dependent flag indicates if the latch is locked and set.
Wait Indicates if any other processes are waiting for this lock.
User The address in shared memory of the owner of the latch. This corresponds to the address in the tbstat -u output.

Tips on Using the Tbstat -s Output:

Should a persistent latch be found in the tbstat -s output, it is a straightforward job to track down the culprit. A definite sign of a problem is a persistent latch that has many other users waiting for the latch. If the database is frozen, you can expect to see quite a few users stacked up waiting for this latch to release. The information given in the name field will provide detailed information about what kind of latch is giving the problem. Knowing the type of latch that is hung up can point to the underlying problem in the database.

Once the latch is identified and the fact noted that several users are waiting on the latch, you can run tbstat -u and note which users have this latch address in their wait field. If these users and processes are the ones that are hung up, you've found your problem. Going back to the tbstat -s output for the offending latch, you can check its user field and get the address of the user process that is blocking everything. Going back to tbstat -u will identify the process and give you a UNIX process ID number. You can then decide whether or not to kill the offending process.

This process is the heart of troubleshooting with tbstat. You usually poke around until you find something that is out of place or just doesn't seem right. Then you backtrack using the tbstat -u command along with specific commands that seem to be related to the problem. The eventual goal is to find a user process of a job that can be killed to clear up the problem.

Of course, the DBA should be certain that the user process is neither holding a latch nor in a critical part of the code before killing a sqlturbo process, as either case will cause OnLine to abort, crashing the system.

The tbstat -s output can also alert you to possible bottlenecks and resource limitations in your OnLine system. If you find yourself with an abnormally high percentage of jobs waiting for a short time to get latches, you may be able to improve your performance by increasing the numbers of the structures that are controlled by the latch. You cannot alter the number of latches themselves, because they are set based upon the size of the resource's parameters.

There are several statistics in the tbstat -p output that can point to problems with inadequate resources. When we get to that section, I'll point out several ratios that are listed and several ratios that you can derive from the statistics that can give you an early warning to some resource inadequacies.

The status script presented in Chapter 6 makes use of several of these ratios and will give you a central place to look to evaluate the performance of your OnLine database engine.

Tbstat -k

The tbstat -k invocation of tbstat provides information about the state of the locks in the shared memory lock table. The lock table is accessed through a lock hash table in much the same way as the tblspace table. Access to the lock table itself is controlled by a single latch.

Before a process can modify the lock table, it must first acquire the latch. Once the modification is complete, the latch is released. Likewise, each entry in the lock hash table is also guarded by one latch that much be acquired before modifications can be made to any particular hash bucket.

Locks are on tablespaces. Their types and scope (table lock, page lock, row lock, key lock, or byte lock) can be deciphered by using the data and a few rules.

   INFORMIX  206>tbstat -k
   RSAM Version 5.01.UC1  -- On-Line -- Up 5 days 04:09:22 -- 18752 Kbytes
   Locks
   address  wtlist   owner    lklist   type     tblsnum  rowid    size
   11056e48 0        110024b0 0            S    1000002  206      0
   11056ea8 0        1100243C 0            S    1000002  206      0
   11056f08 0        1100209C 11057548 HDR+IX   200035a  0        0
   11056f28 0        11002354 110572c8 HDR+IX   200035f  0        0
   11056f48 0        11002168 110574c8 HDR+IX   2000354  0        0
   11056fc8 0        11002184 110577a8 HDR+IX   200035e  0        0
   11057008 0        11002524 0            S    1000002  206      0
   110571c8 0        110026f4 0            S    1000002  206      0
   110571e8 0        110023c8 110570c8 HDR+IX   2000360  0        0
   11057208 0        110027dc 0            S    1000002  206      0
   11057288 0        1100243C 11056ea8 HDR+IX   200035c  0        0
   110572c8 0        11002354 0            S    1000002  206      0
   11057308 0        11002028 0            S    1000002  206      0
   11057388 0        11002524 11057008 HDR+IX   2000356  0        0
   28 active, 200000 total, 16384 hash buckets

Table . Explanation of the tbstat -k output.
address Address of the lock in the lock table. Corresponds to the wait field in the tbstat -u output if a process is waiting on this lock.
Wtlist Address of first process that is waiting for this lock.
Owner Shared memory address of the owner of the lock. If this value is zero, it means that the server process that owned the transaction is dead but that it continues to hold a lock. In this case, the lock will be almost impossible to clear up and if it's in the way, you'll probably have to restart the database. A valid entry (other than zero) corresponds to the owner's address in the tbstat -u output.
Lklist If the owner of this lock holds other locks, the address of the owner's next lock.
Type HDR Header lock (lock hash table)
 B Bytes lock (for VARCHARS)
 S Shared lock
 X Exclusive lock
 U Update lock
 IX Intent Exclusive
 IS Intent Shared
 SIX Shared, intent exclusive
 A lock can be any combination of the above, its elements are joined by a "+."
Tblsnum The tablespace number for the locked resource. The name can be derived by comparing this number with the tblnum in the tbstat -t output. The first digit represents the dbspace that the tablespace occupies.
Rowid The rowid identifies the row being locked. The rowid can be used to identify the scope of the lock by using the following rules:
 Table Lock rowid = 0
 Page Lock rowid ends in 00
 Row Lock rowid is six digits or less and does not end in 00
 Key Lock hex number with more than six digits
 Byte Lock rowid ends in 00, and the "size" field is not zero
size Number of bytes locked for VARCHAR locking (see Byte Lock).

Tbstat -b and Tbstat -B

These two options will print information about buffer utilization by the OnLine engine. The difference between the two is that the lowercase tbstat -b will list information only about buffers that are in use by engine processes, while the uppercase tbstat -B will print out information about all buffers provided for in the configuration file, whether in use or not. Buffers may and probably will contain data even though they are not currently being accessed by an engine process. The fields reported by both commands are identical. An OnLine buffer structure is actually composed of three separate components:

Buffer pool Contains copies of pages from disk exactly as they come from disk. These pages can be the contents of any disk structure such as data pages, index pages, system data, etc. Each page in this buffer pool is exactly one page in size, so there is no room left to put housekeeping information.
Buffer header table This header table is where the housekeeping information is kept. This header table includes information that OnLine needs to manage the related buffer pool entry. It contains entries for the address of the user using the buffer, the state of the buffer (dirty or clean), and the address of the buffer.
Buffer hash table Used to facilitate faster access to the buffer pool.

Each of the three elements of the buffer has latches that must be acquired before the buffer can be modified. These latches will show up in the tbstat -s output. Each buffer has an individual latch. The buffer header table has a single latch that controls access to the table. Each hash bucket has an individual latch.

joe  31> tbstat -b


RSAM Version 4.10.UD4  -- On-Line --Up 8 days 17:26:14 - 18704 Kbytes

Buffers
address  user    flgs pagenum memaddr nslots pgflgs xflgs owner waitlist


e070976c 0        0     651566   e0937000 29     1      400   e00023b0 0
e06fff6c 0        6     50479c   e0807000 23     1      0     0       0
2526 modified, 4750 total, 8192  hash buckets, 2048 buffer size

Table . Explanation of tbstat -b output, continued.
address Address of the buffer header in the buffer table.
user Address of the current user or the most recent user to access the buffer.
Flgs 0x01 Buffer contains modified data.
 0x02 Buffer contains data.
 0x04 Buffer on LRU list.
 0x08 I/O error on this page.
 0x10 Buffer is in near table.
 0x20 Buffer is being flushed.
pagenum Physical page number the page comes from.
memaddr Address of the buffer page in memory.
nslots Number of slots on the page. The number of rows stored on this page.
pgflgs Flag that indicates the type of data held on the page.
 0x0001 Data page
 0x0002 Partition page (tablespace tablespace)
 0x0004 Free list page
 0x0008 Chunk free list page
 0x0009 Remainder data page
 0x000b Partition resident blob page
 0x000c Blobspace resident blob page
 0x000d Blob chunk free list bit page
 0x000e Blob chunk blob map page
 0x0010 B-tree node page (index)
 0x0020 B-tree root node page (index)
 0x0040 B-tree branch node page (index)
 0x0080 B-tree leaf node page (index)
 0x0100 Logical log page
 0x0200 Last page of logical log
 0x0400 Lync page of logical log
 0x0800 Physical log page
 0x1000 Reserved root pages
xflgs Access flags showing how the buffer can be accessed:
 0x0100 Access buffer in share mode.
 0x0200 Access buffer in update mode.
 0x0400 Access buffer in exclusive mode.
 0x0600 Process is waiting for all other users to finish.
owner User process that locked the buffer with above xflgs.
waitlist First of the list of users waiting to access the buffer.

Tbstat -R

This invocation mode is used to display information about OnLine's Least Recently Used (LRU) queue structures. These queues are a means that OnLine uses to distribute the allocation of the buffer pool in as even a method as possible. When OnLine determines that it needs to place a page into the buffer pool from disk, it needs to be able to find a free buffer in which to place the page.

Since OnLine tries to keep pages in shared memory that may be useful, it uses an LRU algorithm to determine which pages to flush to disk. Depending on the OnLine configuration parameters, there are between 3 and 32 pairs of LRU queues. These queues are in pairs, consisting of a clean queue and a dirty queue. When a page is initially brought into shared memory, it is on a clean queue. When it is modified, it is moved to a dirty queue. Every time a buffer is accessed, it is moved to the head of its queue, called the MRU (most recently used) end.

This movement from the LRU end to the MRU end is done in two steps. First, a latch on the queue is acquired and the buffer is removed from the queue. The latch is then released. Now, OnLine attempts to reobtain the latch. If the latch is reobtained, the buffer is placed on the MRU end of the queue. If OnLine cannot obtain the latch for the second time, it tries the next available queue (either clean or dirty as the case may be). It keeps trying to obtain latches on subsequent queues until it gets one, at which time it places the buffer on the MRU end of the queue.

The effect of this queue dance is that buffers may move from queue to queue in an attempt to balance the load between the queues. The tbstat -R command lets the DBA see how often the queues are being accessed and how their cleaning is going on. This is more of a tuning tool than a debugging tool. The DBA will use this command to decide whether or not he has enough queues allocated and to decide threshold points for the page cleaner daemons. See the section on tuning in Chapter 8 for more details.

   joe 58> tbstat -R
   LRU  0:    6 ( 0.9%) modified of  643 total
   .... (several lines deleted)
   57 dirty, 4750 queued, 4750 total, 8192 hash buckets, 2048 buffer size
   start clean at 60% dirty, stop at 50%; first pass search 70%

The first line of the output gives the total number of queues allocated by the LRUS parameter in $TBCONFIG. For each pair, the output gives you a raw number of modified buffers, its percentage of the whole, and the total number of buffers on each queue. In the last line, the start clean is the $TBCONFIG LRU_MAX_DIRTY parameter and the stop at is the LRU_MIN_DIRTY parameter. Notice that the "total" buffer counts are not necessarily equal for each queue pair. This is because of the "queue dance" just discussed in which the buffers move from queue to queue to balance the load.

Tbstat -F

The tbstat -F invocation provides information about the page cleaners or page flushers allocated to your OnLine system. It gives you historical data on the different types of writes that have been used to flush your buffers off to disk. It also tells you instantaneous status on the page cleaners currently allocated for your engine.

Different types of writes are more or less efficient than others. By noting the types of writes that your system is performing, you may be able to affect your overall throughput by minimizing some and maximizing others.

   joe   59>tbstat -F
   Fg Writes      LRU Writes     Idle Writes   Chunk Writes
   0              279744         5142          471793

   address   flusher  snooze   state    data
   e0000lc30 0        60       I        0        = 0
   e0001c90  1        60       I        0        = 0
   states:  Exit Idle Chunk Lru

Table . Explanation of types of writes in tbstat -f output.
Fg Writes Writes performed by the server process itself. If your system is doing a lot of these writes, your server is constantly having to interrupt itself and go off and do the page cleaner's job. Either increase the number of page cleaners or start them working earlier by lowering LRU_MAX_DIRTY.
LRU Writes Writes performed by the page cleaners when woken up by the engine, not during the normal operations of the page cleaners. The page cleaners can have their sleep time overridden by the engine in two ways. First, if 16 fg writes occur the engine will assume that the page cleaners need to have their sleep interrupted and will wake them up. The other reason is when the percentage of dirty buffers on the LRU queue goes above LRU_MAX_DIRTY before the cleaner's snooze time is up, the engine will awaken the page cleaners and have them begin cleaning the buffers. In this example, this is the case, as LRU_MAX_DIRTY is set at 60%.
Idle Writes Writes performed by the page cleaners as part of their normal operations as they wake up at the expiration of their snooze time.
Chunk Writes Writes that happen during checkpoints. Chunk writes are the most efficient writes performed by OnLine because they are performed as sorted writes. They are also performed when no other writes are allowed and thus can receive more CPU time.

Table . Explanation of page cleaner data in tbstat -F output.
Address Memory address of the cleaner in the page cleaner table. Although the number of page cleaners is allocated according to the CLEANERS parameter in $TBCONFIG, the table will always have 32 slots, which is the maximum number of page cleaners.
Flusher Sequential ID number assigned to this page cleaner.
snooze Current sleep time for this cleaner. This value is set by the page cleaners based upon the amount of activity it is getting, but is a maximum of 60 seconds.
State Code indicating what the page cleaner is doing at this instant:
 I Idle (sleeping).
 C Checkpoint (chunk) write in progress.
 L LRU write in progress.
 E Exiting. Either the system is shutting down or the page cleaner timed out on its last run, often due to a heavy workload. If it timed out, you'll get a note in your online message logfile.
Data Used in conjunction with the state field above, this code is written in decimal, followed by an equals sign, then in hex:
 If state=C Data is chunk number where buffers are being written to.
 If state=L Data is LRU queue from which the page cleaner is writing.

The tbstat -F command is often used in conjunction with tbstat -R to tune the page cleaner system. We'll get into more detail in the section on tuning in Chapter 11, but one point needs to be made now.

Different OnLine applications have different tuning needs, and you have to be sensitive to what your system requires. This is one area where the DBA needs to make tradeoffs. Most of the early information on tuning OnLine emphasized the efficiency of Chunk Writes, and a lot of effort was expended to cause systems to do most of their writes during checkpoints to maximize throughput.

The only problem with that technique is that users started screaming. Even though the throughput may have been much better, the users would have frequent interruptions because writes are prevented during checkpoints. The system would appear to freeze up every few minutes. This would be OK in a batch processing environment, but not in an OLTP environment. Here, the perceived utility of the system was enhanced by forcing the less efficient LRU writes to carry the page-cleaning load while minimizing interruptions to the users when the checkpoints were doing their admittedly more efficient but more intrusive work. The lesson: Know your system. Know your users. Don't follow rules slavishly.

Tbstat -l

This option will give the DBA information about the physical and logical logging of the OnLine instance. There's a lot of information in this invocation, and it is used for both tuning and troubleshooting. We'll break it up into three sections. The first section contains physical logging statistics.

   joe   62>tbstat -l
   Physical  Logging
   Buffer  bufused  bufsize  numpages numwrits pages/I/O
     P-1   69       128      713655   5901     120.94
          phybegin physize  phypos   phyused  %used
          117202   6000     2999     69       1.15

Table . Explanation of tbstat -l (physical log section) output.
Buffer Which of the two physical logs is being used, P-1 or P-2?
Bufused Number of buffer pages used.
bufsize Size of the buffer in pages.
numpages Number of pages written to the physical log.
numwrits Number of writes to the disk.
pages/I/O numpages/numwrits. This statistic is used to tune buffers..
Phybegin Page number of the beginning of the physical logfile.
physize Size of the physical log in pages.
phypos Current offset into the physical log. Where the next page write will occur.
Phyused Number of pages used in the physical log.
%used Percentage of pages used (phyused/physize). This statistic is often used for tuning the size of the physical log buffers.

   Logical Logging
   Buffer bufused  bufsize  numrecs  numpages numwrits recs/pages pages/I/O
   L-1  8        64      31030148 668262   21047   46.4     31.8

Table . Explanation of tbstat -l (logical logging section) output.
Buffer Which of the three logical log buffers is used? L-1, L-2, L-3?
Bufused Number of pages in the buffer that have been used.
bufsize Size of the buffer in pages.
numrecs Number of records written to logical logfiles.
numpages Number of pages written to logical logfiles.
numwrits Number of disk writes to logical logfiles.
recs/pages numrecs/numpages. Dependent upon types of transactions logged.
Pages/io numpages/numwrices. Another tuning statistic. You can alter it by changing the LOGBUFF (buffer size).

   address  number   flags   uniqid  begin       size     used    %used
   e06d26f0  1       F------ 0       70f4f4     30000        0     0.00
   e06d270c  2       F------ 0       716a24     30000        0     0.00
   e06d2744  3       U---C-L 2296    907533     30000    14897    49.65

Table . Explanation of tbstat -1 (actual logfile status) output
address Address of the logfile descriptor.
number Logfile number.
flags Status of each logfile:
 F Free, available for use.
 B Backed up.
 C Current logfile, now receiving transactions.
 U In use, still contains active transactions.
 A Newly added. Will become available for use after the next archive.
 L Contains last completed checkpoint.
uniqid Unique ID number of the logfile.
Begin Location of beginning page of the logfile.
size Size of the logfile in pages.
used Number of pages currently used.
%used Used/Size. How full is this logfile?

The first two sections of this output see most use as a tuning tool. There are more specifics in the section on tuning, but basically the concept is to try to keep the pages per I/O statistics pretty close to the actual number of pages in the buffers, to ensure that your writes are doing the most possible work each time they are done. This is logical, since the write will have to look at all of the data anyway. You may as well have the data it sees as full as possible to wring the most work out of each of your writes.

The last section will become one of your most familiar friends (or enemies) as a DBA. Since the logfiles are so important to recovery and since the consequences of filling them up are so disastrous, you need to keep a close eye on your logs. First, you need to have continuous backup going on your logs at all times. If you forget and don't turn it on, you'll have problems when they get full. Next, you're looking for incipient long transactions. If you see that you have many logfiles that show a B for backed up and do not show an F for free, you know that some sort of a transaction is still pending. If you have lots of logs, you can probably just let it go and wait for the transaction to complete. If you begin to get short of logs, you need to think about killing the runaway transaction. Do it early enough that you'll have ample logfiles to log all of the activities that accompany a rollback. If you allow the logfiles to reach the percentage full specified in the LTX_HWM parameter, the engine will lock all other users out of the system, causing their jobs to freeze up.

Tbstat -p and Tbstat -P

These two invocations of the tbstat utility provide the closest thing to a dashboard for the OnLine engine that Informix provides. It reports on statistics kept in shared memory that relate to the performance of the OnLine engine. This is probably the number one tuning and monitoring tool available to the DBA. The statistics are useful in evaluating the overall efficiency of your database system and in measuring the effects of changes in parameters upon your database operations.

The outputs for both the uppercase and lowercase invocations are the same except that the tbstat -P output includes an additional field, BlGReads as the first element in the report. Everything else is identical. The output is divided into four lines of data with headings above each line. We'll look at each line individually:

   joe 63>tbstat -P


Profile

BlGReads dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
235 14970755 22166265 498552631 97.00   543747   2139408  49870693 98.91

Table . Explanation of tbstat - p (buffer activity) output.
BIGReads In tbstat -P output only, the number of big buffer reads.
dskreads Number of actual reads from disk.
pagreads Number of pages read from disk.
bufreads Number of reads from the buffer cache.
%cached Percentage of reads cached 100* (bufreads-dskreads)/bufreads If this figure is above 95%, your system is performing very well..
Dskwrits Number of actual physical writes to disk.
pagwrits Number of pages written to disk.
bufwrits Number of writes to the buffers.
%cached Percentage of writes cached 100* (bufwrits-dskwrits)/bufwrits If this number is above about 85%, you're doing well..

isamtot    open     start    read     write    rewrite  delete  commit rolLbk
424714409 5813437  23348999 175473799 10989374 5499045  2603325  8699910  101

Table . Explanation of tbstat -p (ISAM statistics section) output.
isamtot Total number of ISAM (Indexed Sequential Access Method) Does not necessarily correspond to queries, as a query may make many ISAM calls.
Open Calls that open a tablespace.
start Increments by one when positioning within an index.
read Increments when the read function is called.
write Increments when the write function is called.
rewrite Increments when an update occurs.
delete Increments when a row is deleted.
commit Increments when a transaction is successfully committed.
rollbk Increments when a transaction is rolled back.

   ovtbls   ovlock   ovuser   ovbuff   usercpu    syscpu  numckpts flushes
   0        0        0        0        387006.97 41611.33 294      587

Table . Explanation of tbstat -p (other statistics) output.
ovtbls Number of times the tblspace table has overflowed. In the ovrXXXX statistics that follow, a small number of occurrences is not necessarily critical. If the numbers are large or keep increasing, increase the parameter.
Ovlock Number of times attempted to exceed LOCKS parameter.
ovuser Number of times attempted to exceed USERS parameter.
ovbuff Number of times attempted to exceed BUFFERS parameter.
usercpu Cumulative total CPU time by OnLine processes, in seconds.
syscpu Total system CPU time (UNIX system calls).
numckpts Number of checkpoints since last boot.
flushes Number of times the buffer pool has been flushed to disk.

   bufwaits lokwaits lockreqs deadlks dltouts  Ichwaits ckpwaits compress
   21748    8046     476500421 9       0       1594907 1664     2220019

Table . Explanation of tbstat -1 (wait states) output.
bufwaits Increments when a process has to wait for a buffer.
lokwaits Increments when a process has to wait for a lock.
lokreqs Increments when a request for a lock occurs.
deadlks Increments when potential deadlock situation is automatically prevented by the engine. (It kills one of the processes.)
dltouts Increments when a distributed deadlock timeout occurs.
Ichwaits Increments when a process is forced to wait for a latch on a resource.
chkpwaits Increments when a process is forced to wait for a checkpoint.
compress Increments whenever a data page is compressed.

The first section concerning buffers and cache percentages is most useful in tuning your OnLine parameters for performance. You are looking mostly at the %cached fields for reading and writing. If these numbers are low, it is possible that you can increase your performance by tuning the buffer parameters. Be cautious about reading these figures immediately after starting the database. Until the database has "warmed up" and has representative data in the buffer pool, these %cached statistics will be low. One good method is to let the database run for a while, and then do a tbstat -z to zero out all of the statistics. The statistics gathered from then on will not be tainted by the initial loading of the caches that occur when the system is initially started up.

The second section will give you an idea of the types of activities that your system is performing. These statistics are meaningful only if you have something to compare them to. As you gather your performance statistics, you will begin to develop a feel for the patterns you can expect.

The ovrXXXX sections of the next line are an early warning of a need to increase the value of their respective parameters in $TBCONFIG file. For example, if you are getting high numbers for ovlock, increase the LOCKS parameter in your $TBCONFIG file. If you're regularly going over the parameters, your users are probably already complaining about jobs that don't complete.

The lokwaits and lchwaits statistics on the last line can tip you off to a need to increase the value of some of your $TBCONFIG parameters. They can also tip you off to poorly designed programs that may be hogging resources or causing contention between themselves and other programs.

The status script included in Chapter 6 makes extensive use of these statistics in preparing a more complete "dashboard" for monitoring the activity of your database engine.

Tbstat - r and Stacking Tbstat Options

The tbstat command can run with any combination of options. You can stack the options like

   tbstat  -lsp

You can also use tbstat -r with an optional integer number. This option runs tbstat in a loop with a number second pause between loops. This can be quite useful if you are trying to establish trends or are just anxious to find out when something happens. With no number, it runs with a 5-second pause. For example, the following command runs the tables option every 15 seconds:

   tbstat   -tr   15

Tbstat -a

The tbstat -a command gives you one output that includes the following options:

b buffers
c configuration
d dbspaces
k locks
1 logs
m message file
P profile
s latches
t tablespaces
u users

Tbstat -m

Runs the UNIX tail command on your online message log.

Tbstat -c

Prints your $TBCONFIG file.

Tbstat (no options)

Gives you a short combined output of:

u users
P profile

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.120.133