Make a Fast Recovery go Faster

This is a trick question. It can't be done in OnLine. If your system comes down ungracefully, either because of a crash or because you initiated an immediate shutdown, you will be forced to go into recovery mode when you restart the database system. The length of time for fast recovery depends upon how much work has been done since the last checkpoint. If you absolutely have to bring the system down ungracefully, try to do a checkpoint before the shutdown. It will help speed up the recovery process. This is one thing to consider when establishing the frequency of your checkpoints. Spacing your checkpoints further apart may make the interruptions that they cause to processing less intrusive, but you'll take the risk of longer recovery times. You need to decide upon a balance between performance and speed of recovery.

IDS systems have an ONCONFIG parameter called RECOVERY_THREADS that allows the DBA to specify how many threads are started to handle fast recovery. Increasing this parameter will affect the recovery speed, but only in IDS systems.

As part of the recovery, the OnLine system will roll back any transactions that were working at the time of the shutdown and that were not yet committed. If you have a very long transaction working at the time of shutdown, your recovery time will be much longer. You can get a feel for how long it will take to recover by looking at how much logfile data you have. Recovery, especially in cases caused by long transactions, can take hours. Remember this when you're designing applications. This is another reason to keep your transactions short. You can alleviate these types of problems to a large extent by using programs with flutter locks that keep transactions as short as possible.

You will sometimes need to decide whether or not to bring the database down ungracefully. At least you have a choice here. In a crash, you have no choices at all. When making this decision, you must think about what is going on in your system and how long it will take for the recovery to take place. You may save some time by bringing the database down to stop a rogue process, but doing so may increase your recovery time. It's sort of a case of "pay me now or pay me later." Either way, it will take some time. You need to watch these things, especially if you are running a production database and your users need to have it working all the time.

You will eventually be faced with a fast recovery time that seems to go on forever. Maybe someone crashed the system while a long job was going on. Maybe someone brought the database down without checking for active transactions. Anyway, your database, which usually takes about 15 seconds for fast recovery on a restart, has already been cranking away for 15 minutes and the users are beginning to call, wondering when the database will be back up. You need to know how long it will take to recover.

First, run tbstat -1 to check the status of your logfiles. If you've followed my suggestions, you've monitored your recovery times and have a rule of thumb for how long it takes to recover per logfile. This number is a very rough estimate. Sometimes it can be off by several factors of ten. During recovery, you cannot access the database. This means that the system tables are not available to you. However, the tbstat utility will run during recovery. Usually, what takes a lot of time during fast recovery is the rollback of transactions that were not committed when the system went down. You can do a tbstat -u and see which daemons are running during the recovery. This will be a very simplified output. All that will show up are the daemons for fast recovery and rollforward. Nobody else can be in the system.

A common cause of long recovery times is bringing the database down in the middle of long inserts into a table or deletes from a table. If you know, for example, that a table was being built from scratch, that is, it had no rows in it (or if you knew how many rows it initially had), you can get a very good estimate of your rollback time. Do a tbstat -t command and look at the number of rows in any table whose tb1num does not end in 00001. This will be the table that is being rolled back. Then run several tbstat -t commands and note whether this number of rows is increasing or decreasing. If it is a scratch-built table, your recovery will go on until the number of rows equals zero. You can run several tbstat -t commands and calculate the rate at which the rows are being deleted. Extrapolate that number and you'll have an estimate of the rollback time. If it's a delete, and if you know how many rows were originally in the table, the number of rows will be increasing. You can perform the same extrapolation in the opposite direction to get an approximate rollback time.

None of this will make the recovery go any faster, but it'll help you make decisions about what to tell your users. If nothing else, having an accurate estimate and an explanation of why the rollback is occurring will make you look somewhat more in control of the situation.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.216.233.58