Kill a Backend

Informix OnLine versions earlier than 6.0 use a two-process architecture. Whenever a tool such as I-SQL, ESQL/C, or I-4GL makes a request of the database, the tool is considered a client process. This client process spawns a server process that actually accesses the database. This server process is called a "backend" process.

OnLine takes great pains to ensure the consistency of the data in the database. Whenever something occurs that could potentially destroy this consistency, OnLine immediately aborts. When the database is restarted, the fast recovery process rolls back any incomplete transactions that would otherwise leave the data inconsistent.

One of the major causes of database inconsistency is the premature death of a backend process. If a backend is killed before it has the chance to ensure the consistency of the database, OnLine can abort, crashing the entire database. There are two specific instances in which a prematurely killed backend causes an abort:

  • The backend dies while in a critical section (writing to the database).

  • The backend dies while holding a latch.

You can't depend on tbstat output to tell you whether either of these circumstances applies. By the time you try to kill the process, the circumstances may have changed.

This means that you should never, never, never use a kill -9 on a backend! Did I make my point? A kill -9 (SIGKILL) cannot be caught or ignored, so the backend gets no chance to finish its critical section or release its latches before it dies. Sometimes you can get away with it, and there may even be occasions when you break this rule deliberately, hoping to dodge a bullet and kill the backend without aborting the database. If you do, be prepared for the system to crash. Use the kill -9 command only as a last resort.

Why would you want to kill a backend? Often a user starts a query and then realizes that she is joining every table in your database with every other one and that the query will generate a few billion rows. Maybe someone has started an update that was wrong and will destroy your biggest database table. Maybe someone has killed a client process and the server process is still running. There are many reasons why you may be tempted to kill a query. If the query will not destroy a table or eat up a lot of system resources, though, it's sometimes best just to let it run. If the query is running from within isql, the user can usually press the BREAK or CONTROL-C key to stop it gracefully. That's usually the user's first option.

Assuming that you've determined that a query should be terminated, how do you kill a backend properly? The recommended method is the following command:

tbmode -z PID_OF_THE_BACKEND_PROCESS

The tbmode -z command usually succeeds in killing the process. After running it, run tbstat -u. The entry for the process should be gone, or its flags field should show an R in the second position, which indicates that the process is being rolled back. If neither is the case, try the tbmode command several more times. If the query is still cranking away, it's time for a few more desperate measures. Try

kill -15 PID_OF_THE_BACKEND_PROCESS
kill -13 PID_OF_THE_BACKEND_PROCESS

The engine catches these signals (SIGTERM and SIGPIPE on most UNIX systems) properly, so they are safe and pose no danger to your database. As with the tbmode command, try them several times if needed. You may need to switch user to root to get permission to kill the backends. Again, check the tbstat -u output to see whether the processes are dead or in rollback.
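
If you check the result from a script rather than by eye, the test reduces to looking up one row of tbstat -u. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that each user row has the address, flags, and pid as its first three whitespace-separated columns, which you should verify against the tbstat -u header on your own system. It applies the rule given earlier: no entry means the backend is gone, and an R in the second flag position means it is rolling back.

```python
from typing import Optional


def find_flags(tbstat_u_output: str, pid: int) -> Optional[str]:
    """Return the flags field for `pid` from tbstat -u text, or None if absent.

    ASSUMPTION: each user row is whitespace-separated with the address,
    flags, and pid in the first three columns.
    """
    for line in tbstat_u_output.splitlines():
        cols = line.split()
        if len(cols) >= 3 and cols[2] == str(pid):
            return cols[1]
    return None


def backend_state(flags: Optional[str]) -> str:
    """Classify a backend after tbmode -z or kill -15/-13 from its flags field."""
    if flags is None:
        return "gone"            # no entry any more: the backend has exited
    if len(flags) >= 2 and flags[1] == "R":
        return "rolling back"    # R in the 2nd position (index 1): rollback under way
    return "still running"       # neither: retry, then escalate


# Made-up sample for illustration only; real tbstat -u output has more columns.
SAMPLE = """\
address  flags    pid   user
a1b2c3   -------  1234  fred
a1b2d4   -R-----  5678  wilma
"""

for pid in (1234, 5678, 9999):
    print(pid, backend_state(find_flags(SAMPLE, pid)))
```

The header row is skipped naturally, because its third column is the word "pid" rather than a number.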

Some processes seem to be unkillable. Anything that is creating an index will be unkillable, and UPDATE STATISTICS commands seem to be unkillable as well. Many of these processes are so deep in their processing loops that they never check the global kill flag that the signal sets in the backend. Be patient when trying to kill them with a kill -15 or a kill -13; the processes may not go away immediately.
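
The order of attack so far, tbmode -z first and then kill -15 and kill -13, each tried several times with a pause and a tbstat -u check in between, can be sketched as a loop. This is a hypothetical Python helper, not an Informix tool; the retry count and pause length are arbitrary assumptions, and the status check is supplied by the caller. Note that kill -9 never appears in it.

```python
import subprocess
import time
from typing import Callable, List, Optional

# The safe methods from the text, in the order it suggests trying them.
# kill -9 is deliberately NOT in this list.
SAFE_STEPS: List[List[str]] = [["tbmode", "-z"], ["kill", "-15"], ["kill", "-13"]]


def run_command(argv: List[str]) -> None:
    """Run one command; a non-zero exit status is ignored here."""
    subprocess.run(argv, check=False)


def escalate(pid: int,
             is_dead_or_rolling_back: Callable[[int], bool],
             attempts_per_step: int = 3,
             pause_seconds: float = 5.0,
             run: Callable[[List[str]], None] = run_command,
             wait: Callable[[float], None] = time.sleep) -> Optional[List[str]]:
    """Try each safe method several times; return the command that worked.

    Returns None if the backend is still running after every attempt.
    """
    for step in SAFE_STEPS:
        for _ in range(attempts_per_step):
            argv = step + [str(pid)]
            run(argv)
            wait(pause_seconds)          # be patient: the backend may not check its kill flag at once
            if is_dead_or_rolling_back(pid):
                return argv
    return None
```

In practice `is_dead_or_rolling_back` would parse tbstat -u and return True when the backend's entry is missing or shows an R in the second flag position. Run it as a user allowed to signal the backend, which may mean root.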

There's one final method of killing a sqlturbo (backend) process, and it is definitely a last-ditch effort. It is dangerous, so be ready for the system to crash when you use it. Informix Technical Support doesn't even acknowledge that the technique exists, and they certainly don't recommend it. I don't even recommend it. But if you're faced with having to bring down a production database to kill a job and you've tried everything else short of a kill -9, what do you have to lose?

This method depends on your flavor of UNIX supporting job control. Check the kill man page for a signal that halts a process the same way the CONTROL-Z key halts it under job control. If you can't find it in the man page, run kill -l to list the available signals. Also look for the reciprocal command that restarts the stopped job. On my Pyramid, the stop command is kill -STOP PID and the restart command is kill -CONT PID, where PID is the UNIX process ID of the process you want to kill.

The problem with using tbstat information to decide whether you can safely kill a process is that tbstat gives only a snapshot. In the time it takes you to read the tbstat output and type your kill command, the process may acquire a latch or enter a critical section. To do this safely, you have to stop the process completely and check it with tbstat while it is stopped. If it is safe, kill it; if not, restart the process, let it run, and stop it again, checking each time whether the stopped process is safe to kill.
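
The stop, inspect, and restart cycle is easiest to see as a loop. This Python sketch is an illustration of the same dangerous technique under stated assumptions, not a tested tool: it sends SIGSTOP, SIGKILL, and SIGCONT through os.kill (assuming your UNIX supports them, as the Pyramid's kill -STOP and kill -CONT suggest), and the hypothetical `is_safe_while_stopped` callback stands in for the manual ps and tbstat checks. It sends kill -9 only if the callback approves while the process is frozen, and it leaves the process running if no safe moment turns up.

```python
import os
import signal
import time
from typing import Callable


def stop_check_kill(pid: int,
                    is_safe_while_stopped: Callable[[int], bool],
                    max_rounds: int = 10,
                    run_seconds: float = 1.0,
                    send: Callable[[int, int], None] = os.kill,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Last-ditch kill: freeze, inspect, then kill -9 or thaw and retry.

    `is_safe_while_stopped` must confirm that the process really is stopped
    and is neither in a critical section nor holding a latch.
    Returns True if SIGKILL was sent, False if no safe moment was found.
    """
    for _ in range(max_rounds):
        send(pid, signal.SIGSTOP)        # kill -STOP: freeze it so the tbstat check cannot go stale
        if is_safe_while_stopped(pid):
            send(pid, signal.SIGKILL)    # kill -9, only while frozen in a safe state
            return True
        send(pid, signal.SIGCONT)        # kill -CONT: unsafe right now, let it run a while
        sleep(run_seconds)
    return False                         # the process is left running, not stopped
```

Making `send` and `sleep` parameters lets the loop be exercised with fakes before it ever touches a real backend.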

Run the kill -STOP command, then keep running ps until the process is no longer accumulating processor time. Then run tbstat -u and check the third position of the flags field: anything other than an X means that the process is not in a critical section, writing to the database. Note the value of the first column, which is the address of the user process. Now run tbstat -s and look at its last column, which shows the owner of each latch. Does your offending process's address appear there? If it doesn't, if the process is not in a critical section according to tbstat -u, and if the process really is stopped, kill it with a kill -9. If it is in a critical section or is holding a latch, run kill -CONT (or whatever command restarts a stopped process on your system), let it run a while, and repeat the procedure. If you can stop the process while it is in a safe state, you have a decent chance of killing it without harm.
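
The safety test itself combines two lookups: the third flag position from tbstat -u and the latch-owner column from tbstat -s. A minimal Python sketch follows; the function names and the sample text are made up for illustration, and it assumes that addresses appear as hex strings in both outputs so they can be compared directly. It does not check that the process is actually stopped; that still comes from ps.

```python
from typing import Set


def latch_owners(tbstat_s_output: str) -> Set[str]:
    """Collect latch-owner addresses from the last column of tbstat -s text.

    ASSUMPTION: the owner is the last whitespace-separated column; header
    words are skipped because they do not parse as hex.
    """
    owners: Set[str] = set()
    for line in tbstat_s_output.splitlines():
        cols = line.split()
        if not cols:
            continue
        last = cols[-1].lower()
        try:
            int(last, 16)                # keep only hex-looking values
        except ValueError:
            continue
        owners.add(last)
    return owners


def safe_to_kill(flags: str, user_address: str, owners: Set[str]) -> bool:
    """True only if the backend is outside a critical section and holds no latch."""
    in_critical_section = len(flags) >= 3 and flags[2] == "X"   # X in the 3rd position
    holds_latch = user_address.lower() in owners
    return not in_critical_section and not holds_latch


# Made-up sample for illustration only.
SAMPLE_S = """\
name     address  lock  wait  owner
bf       a0f000   1     0     a1b2d4
"""

owners = latch_owners(SAMPLE_S)
print(safe_to_kill("-------", "a1b2c3", owners))  # True
print(safe_to_kill("-------", "a1b2d4", owners))  # False: holds a latch
print(safe_to_kill("--X----", "a1b2c3", owners))  # False: in a critical section
```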

Remember, this is a last-ditch measure. Although I have used it for several years without harm to the database, I've seen stranger things happen to databases. I would use it only if I were prepared to restart the database after a crash and to recover from an archive. About the only thing that would justify it is a process that is destroying a table I would have to recover from tape anyway.
