2. System Hangs and Panics

Anyone with system administration experience has been there. You are in the middle of a production cycle or just working at the desktop when the computer, for some mysterious reason, hangs or displays an elaborate screen message full of hex addresses and perhaps the stack trace of an offending NULL dereference.

What to do? In this chapter, we hope to provide an answer as we discuss kernel panics, oops, hangs, and hardware faults. We examine what the system does in these situations and discuss the tools required for initial analysis. We begin by discussing OS hangs. We then discuss kernel panics and oops panics. Finally, we conclude with hardware machine checks.

It is important to identify whether you are encountering a panic, a hang, or a hardware fault to know how to remedy the problem. Panics are easy to detect because they consist of the kernel voluntarily shutting down. Hangs can be more difficult to detect because the kernel has gone into some unknown state and the driver has ceased to respond for some reason, preventing the processes from being scheduled. Hardware faults occur at a lower level, independent of and beneath the OS, and are observed through firmware logs.

When you encounter a hang, panic, or hardware fault, determine whether it is easily reproducible. This information helps to identify whether the underlying problem is a hardware or software problem. If it is easily reproducible on different machines, chances are that the problem is software-related. If it is reproducible on only one machine, focus on ruling out a problem with supported hardware.

One final important point before we begin discussing hangs: Whether you are dealing with an OS hang or panic, you must confirm that the hardware involved is supported by the Linux distribution before proceeding. Make sure the manufacturer supports the Linux kernel and hardware configuration used. Contact the manufacturer or consult its documentation or official Web site. This step is so important because when the hardware is supported, the manufacturer has already contributed vast resources to ensure compatibility and operability with the Linux kernel. Conversely, if it is not supported, you will not have the benefit of this expertise, even if you can find the bug, and either the manufacturer would have to implement your fix, or you would have to modify the open source driver yourself. However, even if the hardware is not supported, you may find this chapter to be a helpful learning tool because we highlight why the driver, kernel module, application, and hardware are behaving as they are.

OS Hangs

OS hangs come in two types: interruptible and non-interruptible. The first step to remedying a hang is to identify which type you are facing. We know we have an interruptible hang when the system still responds to an external interrupt; conversely, we know we have a non-interruptible hang when it does not.

To determine whether the hang responds to an external interrupt, attempt a ping test, checking for a response. If a keyboard is attached, perform a test by simply pressing the Caps Lock key to see whether the Caps Lock light cycles. If you have console access, determine whether the console gives you line returns when you press the Enter key. If one or more of these yields the sought response, you know you have an interruptible hang.
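For example, a quick interrupt-response check can be run from another machine on the same network (the hostname below is only an illustration):

# ping -c 3 hungserver

If replies come back, the network stack is still servicing interrupts, which points to an interruptible hang.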


Note

Any time an OS hangs, one or more offending processes are usually responsible for the hang. This is true whether software or hardware is ultimately to blame for the OS hang. Even when a hardware problem exists, a process has made a request of the hardware that the hardware could not fulfill, so processes stack up as a result.


Troubleshooting Interruptible Hangs

The first step in troubleshooting an interruptible hang is obtaining a stack trace of the offending process or processes by using the Magic SysRq keystroke. Some Linux distributions have this functionality enabled by default, whereas others do not. We recommend always having this functionality enabled. The following example shows how to enable it.

Check whether the Magic SysRq is enabled:

# cat /proc/sys/kernel/sysrq
0
( 0 = disabled 1 = enabled)

Because it is not enabled, enable it:

# echo 1 > /proc/sys/kernel/sysrq
# cat /proc/sys/kernel/sysrq
1

Alternatively, we can use the sysctl command:

# sysctl -n kernel.sysrq
0
# sysctl -w kernel.sysrq=1
# sysctl -n kernel.sysrq
1

To make this setting persistent, just put an entry into the configuration file:

# /etc/sysctl.conf
kernel.sysrq=1

When the functionality is enabled, a stack trace can be obtained by sending a break followed by t to the console. Unfortunately, this can be more difficult than it first appears. On a standard VGA console, it is accomplished with the Alt+sysrq+t keystroke combination; however, the key sequence differs for other console emulators, in which case you would need to determine the required sequence from the particular manufacturer. For example, if a Windows user utilizes emulation software, such as Reflections, key mapping can be an issue. Linux distributions sometimes provide tools such as cu and minicom, which do not affect the key mapping by default.
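As an illustration, with minicom (after configuring the serial port with minicom -s) the break is sent from its command menu; the key names below are minicom's defaults, and the t command key is just an example:

# minicom
Ctrl+A F        (send a BREAK to the remote console)
t               (the SysRq command key, here "t" for a task dump)

Other terminal programs use different escape sequences to send a break, so consult their documentation.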

With the latest 2.4 kernel releases, a new file is introduced in /proc called sysrq-trigger. By simply echoing a predefined character to this file, the administrator avoids the need to send a break to the console. However, if the terminal window is hung, the break sequence is the only way. The following example shows how to use the functionality to obtain a stack trace.

Using a serial console or virtual console (/dev/ttyS0 or /dev/vc/1), press Alt+sysrq+h.

Provided that sysrq is enabled, after this key combination is entered, the output on the screen is as follows:

kernel: SysRq : HELP : loglevel0-8 reBoot tErm kIll saK showMem Off
showPc unRaw Sync showTasks Unmount

The following is a description of each parameter:

loglevel: Sets the console logging level.

reBoot: Resets the processor so that the system starts booting.

tErm: Sends SIGTERM to all processes except init.

kIll: Sends SIGKILL to all processes except init.

saK: Kills all processes on the current virtual console (SAK: Secure Attention Key). This ensures that the login prompt is init's and not some third-party application.

showMem: Shows a system memory report.

Off: Shuts the machine down.

showPc: Dumps registers, including the PID of the running process, the instruction pointer, and the control registers.

unRaw: Turns off keyboard RAW mode and sets it to ASCII (XLATE).

Sync: Syncs the filesystems.

showTasks: Dumps all tasks and their information.

Unmount: Attempts to remount all mounted filesystems read-only.

shoWcpus: Shows the stack for each CPU by "walking" each CPU (SMP kernel).

With the latest kernels, it is possible to test the “Magic SysRq” functionality by writing a “key” character from the previous list to the /proc/sysrq-trigger file. The following example causes the CPU stacks to be written to the kernel ring buffer:

# echo w > /proc/sysrq-trigger

To view the processor stacks, simply execute the dmesg command or view the syslog.

# dmesg
...
SysRq : Show CPUs
CPU1:

Call Trace: [<e000000004415860>] sp=0xe0000040fc96fca0
bsp=0xe0000040fc969498 show_stack [kernel] 0x80
[<e000000004653d30>] sp=0xe0000040fc96fe60 bsp=0xe0000040fc969480
showacpu [kernel] 0x90
[<e000000004653d80>] sp=0xe0000040fc96fe60 bsp=0xe0000040fc969470
sysrq_handle_showcpus [kernel] 0x20
[<e000000004654380>] sp=0xe0000040fc96fe60 bsp=0xe0000040fc969428
__handle_sysrq_nolock [kernel] 0x120
[<e000000004654230>] sp=0xe0000040fc96fe60 bsp=0xe0000040fc9693f0
handle_sysrq [kernel] 0x70
[<e00000000458a930>] sp=0xe0000040fc96fe60 bsp=0xe0000040fc9693c8
write_sysrq_trigger [kernel] 0xf0
[<e0000000045167a0>] sp=0xe0000040fc96fe60 bsp=0xe0000040fc969348
sys_write [kernel] 0x1c0
[<e00000000440e900>] sp=0xe0000040fc96fe60 bsp=0xe0000040fc969348
ia64_ret_from_syscall [kernel] 0x0
CPU0:

Call Trace: [<e000000004415860>] sp=0xe0000040fbe07ac0
bsp=0xe0000040fbe01518 show_stack [kernel] 0x80
[<e000000004653d30>] sp=0xe0000040fbe07c80 bsp=0xe0000040fbe01500
showacpu [kernel] 0x90
[<e000000004446980>] sp=0xe0000040fbe07c80 bsp=0xe0000040fbe014a0
handle_IPI [kernel] 0x200

[<e000000004412500>] sp=0xe0000040fbe07c80 bsp=0xe0000040fbe01460
handle_IRQ_event [kernel] 0x100
[<e000000004412be0>] sp=0xe0000040fbe07c80 bsp=0xe0000040fbe01418
do_IRQ [kernel] 0x160
[<e000000004414e20>] sp=0xe0000040fbe07c80 bsp=0xe0000040fbe013d0
ia64_handle_irq [kernel] 0xc0
[<e00000000440e920>] sp=0xe0000040fbe07c80 bsp=0xe0000040fbe013d0
ia64_leave_kernel [kernel] 0x0
[<e00000000447b9b0>] sp=0xe0000040fbe07e20 bsp=0xe0000040fbe012c8
schedule [kernel] 0xa70
[<e000000004486ce0>] sp=0xe0000040fbe07e30 bsp=0xe0000040fbe01288
do_syslog [kernel] 0x460
[<e0000000045876f0>] sp=0xe0000040fbe07e60 bsp=0xe0000040fbe01260
kmsg_read [kernel] 0x30
[<e000000004516400>] sp=0xe0000040fbe07e60 bsp=0xe0000040fbe011d8
sys_read [kernel] 0x1c0
[<e00000000440e900>] sp=0xe0000040fbe07e60 bsp=0xe0000040fbe011d8
ia64_ret_from_syscall [kernel] 0x0
CPU3:

Call Trace: [<e000000004415860>] sp=0xe000004083e87b00
bsp=0xe000004083e81318 show_stack [kernel] 0x80
[<e000000004653d30>] sp=0xe000004083e87cc0 bsp=0xe000004083e81300
showacpu [kernel] 0x90
[<e000000004446980>] sp=0xe000004083e87cc0 bsp=0xe000004083e812a0
handle_IPI [kernel] 0x200
[<e000000004412500>] sp=0xe000004083e87cc0 bsp=0xe000004083e81260
handle_IRQ_event [kernel] 0x100
[<e000000004412be0>] sp=0xe000004083e87cc0 bsp=0xe000004083e81218
do_IRQ [kernel] 0x160
[<e000000004414e20>] sp=0xe000004083e87cc0 bsp=0xe000004083e811d0
ia64_handle_irq [kernel] 0xc0
[<e00000000440e920>] sp=0xe000004083e87cc0 bsp=0xe000004083e811d0
ia64_leave_kernel [kernel] 0x0
[<e0000000044160c0>] sp=0xe000004083e87e60 bsp=0xe000004083e811d0
default_idle [kernel] 0x0
[<e0000000044161e0>] sp=0xe000004083e87e60 bsp=0xe000004083e81160
cpu_idle [kernel] 0x100

[<e00000000499c380>] sp=0xe000004083e87e60 bsp=0xe000004083e81150
start_secondary [kernel] 0x80
[<e0000000044080c0>] sp=0xe000004083e87e60 bsp=0xe000004083e81150
start_ap [kernel] 0x1a0
CPU2:

Call Trace: [<e000000004415860>] sp=0xe0000040fdc77b00
bsp=0xe0000040fdc71318 show_stack [kernel] 0x80
[<e000000004653d30>] sp=0xe0000040fdc77cc0 bsp=0xe0000040fdc71300
showacpu [kernel] 0x90
[<e000000004446980>] sp=0xe0000040fdc77cc0 bsp=0xe0000040fdc712a0
handle_IPI [kernel] 0x200
[<e000000004412500>] sp=0xe0000040fdc77cc0 bsp=0xe0000040fdc71260
handle_IRQ_event [kernel] 0x100
[<e000000004412be0>] sp=0xe0000040fdc77cc0 bsp=0xe0000040fdc71218
do_IRQ [kernel] 0x160
[<e000000004414e20>] sp=0xe0000040fdc77cc0 bsp=0xe0000040fdc711d0
ia64_handle_irq [kernel] 0xc0
[<e00000000440e920>] sp=0xe0000040fdc77cc0 bsp=0xe0000040fdc711d0
ia64_leave_kernel [kernel] 0x0
[<e0000000044161d0>] sp=0xe0000040fdc77e60 bsp=0xe0000040fdc71160
cpu_idle [kernel] 0xf0
[<e00000000499c380>] sp=0xe0000040fdc77e60 bsp=0xe0000040fdc71150
start_secondary [kernel] 0x80
[<e0000000044080c0>] sp=0xe0000040fdc77e60 bsp=0xe0000040fdc71150
start_ap [kernel] 0x1a0

Scenario 2-1: Hanging OS

In this scenario, the OS hangs, but the user is unable to determine why. The way to start troubleshooting is to gather stacks and logs.

Because the OS is not responding to telnet, ssh, or any attempt to log in, we must resort to another log collection method. In this case, we test the keyboard to see whether the system still responds to interrupts. As mentioned at the start of this section, an easy test is to press the Caps Lock key and see whether the Caps Lock light toggles on and off. If not, the hang is considered non-interruptible, which we discuss later. If the light does toggle and the Magic SysRq keys are enabled, gather the system registers by pressing the Alt+sysrq+p key combination.

The following is output from the register dump:

SysRq: Show Regs (showPc)

Process:0,{             swapper}
kernel 2.4.9-e.3smp
EIP: 0010:[<c010542e>] CPU: 0
EIP is at default_idle [kernel] 0x2e
  EFLAGS: 00000246 Not Tainted
EAX: 00000000 EBX: c030a000 ECX: c030a000 EDX: 00000000
ESI: c0105400 EDI: c030a000 EBP: ffffe000 DS: 0018 ES: 0018
CR0: 8005003b CR2: 0819d038 CR3: 1544c000 CR4: 000006d0

Call Trace: [<c0105492>] cpu_idle [kernel] 0x32
[<c0105000>] stext [kernel] 0x0
[<c02405e0>]  .rodata.str1.32 [kernel] 0x560


../drivers/char/sysrq.c
...
static struct sysrq_key_op sysrq_showregs_op = {
        .handler        = sysrq_handle_showregs,
        .help_msg       = "showPc",
        .action_msg     = "Show Regs",
};

Referring to the register dump output, we can assume the machine is in an idle loop because the kernel is in the default_idle function and the machine is no longer responding. This message also informs us that the kernel is not “tainted.” The latest source code provides us with the various tainted kernel states, as shown in the following code snippet.

linux/kernel/panic.c
...
/**
* print_tainted - return a string to represent the kernel taint state.
*
* 'P' - Proprietary module has been loaded.

* 'F' - Module has been forcibly loaded.
* 'S' - SMP with CPUs not designed for SMP.
* 'U' - Unsupported modules loaded.
* 'X' - Modules with external support loaded.
*
*     The string is overwritten by the next call to print_taint().
*/

const char *print_tainted(void)
{
        static char buf[20];
        if (tainted) {
                snprintf(buf, sizeof(buf), "Tainted: %c%c%c%c%c",
                        tainted & TAINT_MACHINE_CHECK ? 'M' : ' ',
                        tainted & TAINT_PROPRIETARY_MODULE ? 'P' : 'G',
                        tainted & TAINT_FORCED_MODULE ? 'F' : ' ',
                        tainted & TAINT_UNSAFE_SMP ? 'S' : ' ',
                        tainted & TAINT_NO_SUPPORT ? 'U' :
                                (tainted & TAINT_EXTERNAL_SUPPORT ? 'X' :
' '));
        }
        else
                snprintf(buf, sizeof(buf), "Not tainted");
        return(buf);
}

In most cases, if the kernel were in a tainted state, a tech support organization would suggest that you remove the “unsupported” kernel module that is tainting the kernel before proceeding to troubleshoot the issue. In this case, the kernel is not tainted, so we proceed along our original path.

Reviewing the register dump tells us the offset location for the instruction pointer. In this case, the offset is at default_idle+46 (0x2e hex = 46 dec). With this new information, we can use GDB to obtain the instruction details.

gdb vmlinux-2.4.9-e.3smp
(gdb) disassemble default_idle

Dump of assembler code for function default_idle:
0xc0105400 <default_idle>:      mov    $0xffffe000,%ecx
0xc0105405 <default_idle+5>:    and    %esp,%ecx
0xc0105407 <default_idle+7>:    mov    0x20(%ecx),%edx
0xc010540a <default_idle+10>:   mov    %edx,%eax
0xc010540c <default_idle+12>:   shl    $0x5,%eax
0xc010540f <default_idle+15>:   add    %edx,%eax
0xc0105411 <default_idle+17>:   cmpb   $0x0,0xc0366985(,%eax,4)
0xc0105419 <default_idle+25>:   je     0xc0105431 <default_idle+49>
0xc010541b <default_idle+27>:   mov    0xc0365208,%eax
0xc0105420 <default_idle+32>:   test   %eax,%eax
0xc0105422 <default_idle+34>:   jne    0xc0105431 <default_idle+49>
0xc0105424 <default_idle+36>:   cli
0xc0105425 <default_idle+37>:   mov    0x14(%ecx),%eax
0xc0105428 <default_idle+40>:   test   %eax,%eax
0xc010542a <default_idle+42>:   jne    0xc0105430 <default_idle+48>
0xc010542c <default_idle+44>:   sti
0xc010542d <default_idle+45>:   hlt
0xc010542e <default_idle+46>:   ret    ←  Our offset!
0xc010542f <default_idle+47>:   nop
0xc0105430 <default_idle+48>:   sti
0xc0105431 <default_idle+49>:   ret
0xc0105432 <default_idle+50>:   lea    0x0(%esi,1),%esi

Now we know that the OS is hung on a return to caller (the ret immediately following the hlt in the idle loop). At this point, we are stuck, because this return could have been reached from some other code path that had already executed.

Solution 2-1: Update the Kernel

The problem was solved when we updated the kernel to the latest supported patch release. Before spending considerable time and resources tracking down what appears to be a bug, or “feature,” as we sometimes say, confirm that the kernel and all the relevant applications have been patched or updated to their latest revisions. In this case, after the kernel was patched, the hang was no longer reproducible.

The Magic SysRq is logged in three places: the kernel message ring buffer (read by dmesg), the syslog, and the console. The package responsible for this is sysklogd, which provides klogd and syslogd. Of course, not all events are logged to the console. Event levels control whether something is logged to the console. To enable all messages to be printed on the console, set the log level to 8 through dmesg -n 8 or klogd -c 8. If you are already on the console, you can use the SysRq keys to indicate the log level by pressing Alt+sysrq+level, where level is a number from 0 to 8. More details on these commands can be found in the dmesg and klogd man pages and of course in the source code.
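For example, either of the following raises the console log level to its maximum so that SysRq output appears on the console; the first field of /proc/sys/kernel/printk reflects the current console level:

# dmesg -n 8                        (or: klogd -c 8)
# cat /proc/sys/kernel/printk       (first field is the console log level)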

Reviewing the source, we can see that not all the keyboard characters are used.

See: /drivers/char/sysrq.c
...
static struct sysrq_key_op *sysrq_key_table[SYSRQ_KEY_TABLE_LENGTH] = {
/* 0 */ &sysrq_loglevel_op,
/* 1 */ &sysrq_loglevel_op,
/* 2 */ &sysrq_loglevel_op,
/* 3 */ &sysrq_loglevel_op,
/* 4 */ &sysrq_loglevel_op,
/* 5 */ &sysrq_loglevel_op,
/* 6 */ &sysrq_loglevel_op,
/* 7 */ &sysrq_loglevel_op,
/* 8 */ &sysrq_loglevel_op,
/* 9 */ &sysrq_loglevel_op,
/* a */ NULL, /* Don't use for system provided sysrqs,
                 it is handled specially on the sparc
                 and will never arrive */
/* b */ &sysrq_reboot_op,
/* c */ &sysrq_crash_op,
/* d */ NULL,
/* e */ &sysrq_term_op,
/* f */ NULL,
/* g */ NULL,
/* h */ NULL,
/* i */ &sysrq_kill_op,
/* j */ NULL,

#ifdef CONFIG_VT
/* k */ &sysrq_SAK_op,
#else
/* k */ NULL,
#endif
/* l */ NULL,
/* m */ &sysrq_showmem_op,
/* n */ NULL,
/* o */ NULL, /* This will often be registered
                 as 'Off' at init time */
/* p */ &sysrq_showregs_op,
/* q */ NULL,
/* r */ &sysrq_unraw_op,
/* s */ &sysrq_sync_op,
/* t */ &sysrq_showstate_op,
/* u */ &sysrq_mountro_op,
/* v */ NULL,
/* w */ &sysrq_showcpus_op,
/* x */ NULL,
/* y */ NULL,
/* z */ NULL
};
...

Collecting the dump is more difficult if the machine is not set up properly. In the case of an interruptible hang, the syslog daemon might not be able to write to its message file, so we have to rely on the console to collect the dump messages. If the only console on the machine is a graphics console, you must copy the dump down by hand. Note that the dump messages are written only to the virtual console, not to an X Windows session. For the panic case, the Linux kernel addresses this by blinking the keyboard LEDs, notifying the administrator that this is not an OS hang but rather an OS panic.

The following 2.4 series source code illustrates this LED-blinking feature that is used when the Linux kernel pulls a panic. Notice that we start with the kernel/panic.c source to determine which functions are called and to see whether anything relating to blinking is referenced.

# linux/kernel/panic.c
...
       for(;;) {
#if defined(CONFIG_X86) && defined(CONFIG_VT)
                extern void panic_blink(void);
                panic_blink();
#endif
                CHECK_EMERGENCY_SYNC
       }
...

We tracked down the panic_blink() function in the following source:

# linux/drivers/char/pc_keyb.c
...
static int blink_frequency = HZ/2;

/* Tell the user who may be running in X and not see the console that
   we have panicked. This is to distinguish panics from "real" lockups.
   Could in theory send the panic message as Morse, but that is left as
   an exercise for the reader. */
void panic_blink(void)
{
        static unsigned long last_jiffie;
        static char led;
        /* Roughly 1/2s frequency. KDB uses about 1s. Make sure it is
           different. */
        if (!blink_frequency)
                return;
        if (jiffies - last_jiffie > blink_frequency) {
                led ^= 0x01 | 0x04;
                while (kbd_read_status() & KBD_STAT_IBF) mdelay(1);
                kbd_write_output(KBD_CMD_SET_LEDS);
                mdelay(1);
                while (kbd_read_status() & KBD_STAT_IBF) mdelay(1);

                mdelay(1);
                kbd_write_output(led);
                last_jiffie = jiffies;
        }
}

static int __init panicblink_setup(char *str)
{
    int par;
    if (get_option(&str,&par))
            blink_frequency = par*(1000/HZ);
    return 1;
}
/* panicblink=0 disables the blinking as it caused problems with some
   console switches. Otherwise argument is ms of a blink period. */
__setup("panicblink=", panicblink_setup);

By default, the 2.6 kernel release does not include the panic_blink() function. It was later added through a patch.

Even though this source informs the user that the machine has pulled a panic, it does not give us the stack or even the kernel state. If we were lucky, klogd managed to write it to the syslog; then again, if we were truly lucky, the machine would not have panicked in the first place. For this reason, we recommend configuring a serial console so that you can collect the panic message when a panic takes place. Refer to Scenario 2-3 for an illustration.
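A minimal serial console sketch, assuming an IA-32 system booted with GRUB, a serial port at ttyS0, and 9600 baud (the device, speed, kernel image, and root device are illustrative and must be adjusted to your system):

# /boot/grub/grub.conf (append the console options to the kernel line)
kernel /vmlinuz-2.4.9-e.3smp ro root=/dev/sda2 console=tty0 console=ttyS0,9600n8

# /etc/inittab (spawn a login getty on the serial port)
S0:2345:respawn:/sbin/agetty -L ttyS0 9600 vt100

# /etc/securetty (permit root logins on the serial console)
ttyS0

With this in place, another machine attached to the serial port, directly or through a console server, can log the panic messages as they are emitted.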

Troubleshooting Non-Interruptible Hangs

The non-interruptible hang is the worst kind of hang because the standard troubleshooting techniques mentioned previously do not work, so correcting the problem is substantially more difficult. Again, the first step is to confirm that the hardware is supported and that all the drivers have been tested with this configuration. Keep in mind that hardware vendors and OS distributions spend vast resources confirming the supported configurations. Therefore, if the machine you are troubleshooting is outside of the supported configuration, it would be considered a best effort by those in the Linux community. It would be best to remove the unsupported hardware or software and see whether the problem persists.

Try to determine what the OS was doing before the hang. Ask these questions: Does the hang occur frequently? What application(s) are spawned at the time of the hang? What, if any, hardware is being accessed (for example, tape, disk, CD, DVD, and so on)? What software or hardware changes have been made on the system since the hangs began?

The answers to these questions provide “reference points” and establish an overall direction for troubleshooting the hang. For example, an application using a third-party driver module might have caused a crash that left CPU spinlocks (hardware locks) in place. In this case, it is probably not a coincidence that the machine hung every time when the user loaded his or her new driver module.

You should attempt to isolate the application or hardware interfaces being used. This might be futile, though, because the needed information might not be contained within the logs. Chances are the CPU is looping in some type of spinlock, or the kernel has attempted to crash but the bus is in a compromised state, preventing it from proceeding with the crash handler.

When the kernel has gotten into a non-interruptible hang, the goal is to get the kernel to pull a panic, save state, and create a dump. Linux achieves this goal on some platforms with a boot option to the kernel, enabling non-maskable interrupts (nmi_watchdog). More detailed information can be found in linux/Documentation/nmi_watchdog.txt. In short, the kernel sends an interrupt to the CPU every five seconds. As long as the CPU responds, the kernel stays up. When the interrupt does not return, the NMI handler generates an oops, which can be used to debug the hang. However, as with interruptible hangs, we must be able to collect the kernel dump, and this is where the serial console plays a role. If panic_on_oops is enabled, the kernel pulls a panic(), enabling other dump collection mechanisms, which are discussed later in this chapter.
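As a sketch of enabling the watchdog on an IA-32 system booted with GRUB (the kernel image and root device are illustrative; whether to use 1 for I/O-APIC mode or 2 for local APIC mode depends on the platform, as described in nmi_watchdog.txt):

# /boot/grub/grub.conf (append the parameter to the kernel line)
kernel /vmlinuz-2.4.9-e.3smp ro root=/dev/sda2 nmi_watchdog=1

# After a reboot, confirm that the NMI counts in /proc/interrupts are
# increasing for every CPU:
# grep NMI /proc/interrupts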

We recommend getting the hardware and distribution vendors involved. As stated earlier, they have already put vast resources into confirming the hardware and software operability.

Often the most effective ways to troubleshoot a non-interruptible hang are the obvious ones. For example, it is sometimes worthwhile to remove all unnecessary hardware and drivers, leaving the machine in a "bare bones" state. It is also important to confirm that the OS kernel is fully up-to-date on patches, and to stop all unnecessary software.
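For illustration, the following commands show one way to pare the system down; the module and service names are placeholders, and the chkconfig/service commands assume a Red Hat-style distribution:

# lsmod                             (list currently loaded driver modules)
# rmmod <unneeded_module>           (unload modules that are not required)
# chkconfig --list | grep ':on'     (review services started at boot)
# service <unneeded_service> stop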

Scenario 2-2: Linux 2.4.9-e.27 Kernel

In this scenario, the user added new fiber cards to an existing database system, and now the machine hangs intermittently, always at night when peak loads are down. Nothing of note seems to trigger it, although the user is running an outdated kernel. The user tried gathering a sysrq+p for each processor on the system, which requires two keystrokes for each processor. With newer kernels, sysrq+w performs a “walk” of all the CPUs with one keystroke. The user then tries gathering sysrq+m and sysrq+t. Unfortunately, the console is hung and does not accept break sequences.

The next step is to disable all unnecessary drivers and to enable a forced kernel panic. The user set up nmi_watchdog so that he could get a forced oops panic. Additionally, all hardware monitors were disabled, and unnecessary driver modules were unloaded. After disabling the hardware monitors, the user noticed that the hangs stopped occurring.

Solution 2-2: Update the Hardware Monitor Driver

We provided the configuration and troubleshooting methodology (action plan) to the hardware vendor so that its staff could offer a solution. The fact that this symptom only took place when running their monitor was enough ammunition. The hardware event lab was aware of such events and already had released a newer monitor. While the administrator was waiting on the newer monitor module to be available, he removed the old one, preventing the kernel from experiencing hangs.

If the kernel hang persists and the combination of these tools does not lead to an answer, enabling kernel debugging might be the way to go. Directions on how to use KDB can be found at http://oss.sgi.com/projects/kdb/.

OS Panics

An OS panic is caused by some unexpected condition or kernel state that results in a voluntary kernel shutdown. In this case, we are not talking about the OS shutdown command, but rather a condition where the code finds itself calling panic().

Because the panic is a voluntary kernel shutdown, a reboot is necessary before troubleshooting can begin. By default, Linux does not reboot when encountering a panic(). Automatic system reboots can be set by entering the number of seconds to wait in /proc/sys/kernel/panic. 0 is the default for most Linux distributions, meaning that the system will not reboot and will remain in a hung state. Otherwise, a hardware-forced reset can be used.
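For example, to reboot automatically 60 seconds after a panic (the interval is only an illustration), following the same pattern used earlier for the sysrq setting:

# echo 60 > /proc/sys/kernel/panic
# sysctl -n kernel.panic
60

# /etc/sysctl.conf (to make the setting persistent)
kernel.panic=60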

Troubleshooting OS Panics

To troubleshoot an OS panic, first try to obtain a dump. Consult the console, which contains the panic string. The panic string leads us to the source of the panic. From there, we can determine the function calls that were made at the time of the panic. Sometimes the console data is not enough. If more data is required, a dump utility must be enabled. When the kernel pulls panic(), the crash dump facility takes control and writes kernel memory to a dump device. To date, this feature is not part of the mainline kernel.org kernel.

Several competing technologies are available for obtaining a dump. For example, the SGI, SUSE, and HP Telco distributions use LKCD (Linux Kernel Crash Dump), while Red Hat offers netdump and its alternative, diskdump (similar to LKCD). Again, these mechanisms are triggered as a result of a kernel panic() and depend on the dump driver supporting the underlying hardware. That being said, if the system's state is unstable (for example, held spinlocks, a compromised bus state, or a bad CPU interrupt state), these utilities might not be able to save a kernel dump.

Unlike other flavors of Unix, the Linux kernel does not have a native dump mechanism. Rather, in Linux, a kernel dump is the result of a panic in which one of the aforementioned capabilities is enabled.

Scenario 2-3: Users Experience Multiple OS Panics

In this case, the system administrator has a machine that has been in service for some time. It is her primary production machine, and she needs to add a new PCI host bus adapter (HBA). Instead of confirming that the kernel is at the supported patch level, or, for that matter, that the machine still boots properly after the 250 package updates installed two weeks earlier, she decides to simply shut down the system and install the new card. Because this will affect production, management has allotted 30 minutes to perform the hardware addition, after which the machine must be back online.

After shutting down the system and adding the hardware, the system administrator gets the machine to boot with no errors. After a few minutes pass, however, the system administrator notices that the machine is no longer responding, and the console shows that the machine has panicked. Because production has been impacted, managers become involved, and the system administrator is under pressure to get the machine stabilized.

Because the machine was not booted since the last package updates were installed, it is very difficult to determine whether the PCI card is causing the problem. The first step is to review the stack trace in an attempt to isolate the code section that triggered the panic.

Stack traces appear like this:

      Bad slab found on cache free list
      slab 0xf53d8580:
        next 0xf7f7a0b0, prev 0xf7f7a0b0, mem 0xf43ef000
        colouroff 0x0000, inuse 0xffffffe3, free 0x0000
      cache 0xf7f7a0a0 ("names_cache"):
        full    0xf7f788a0 <-> 0xf53d85a0
        partial 0xf7f7a0a8 <-> 0xf7f7a0a8
        free    0xf53d8580 <-> 0xf53d8580
        next    0xf7f7a200 <-> 0xf7f7bf38
        objsize 0x1000, flags 0x12000, num 0x0001, batchcount 0x001e
        order 0, gfp 0x0000, colour 0x0000:0x0020:0x0000
        slabcache 0xf7f7c060, growing 0, dflags 0x0001, failures 0

    kernel BUG at slab.c:2010!
    invalid operand: 0000
    Kernel 2.4.9-e.49smp
    CPU:      1
    EIP:      0010:[<c0138d95>]    Tainted: P
    EFLAGS: 00010082
    EIP is at proc_getdata [kernel] 0x145
    eax: 0000001e   ebx: f53d8580   ecx: c02f8b24    edx: 000054df
    esi: f7f7a0a0   edi: 0000085e   ebp: 00000013    esp: f42ffef8
    ds: 0018   es: 0018       ss: 0018
    Process bgscollect (pid: 2933, stackpage=f42ff000)
    Stack: c0267dfb 000007da 00000000 00000013 f42fff68 f8982000 00000c00
    00000000

       c0138eec f8982000 f42fff68 00000000 00000c00 f8982000 00000c00
    00000000
       c0169e8a f8982000 f42fff68 00000000 00000c00 f42fff64 00000000
    f42fe000
    Call Trace: [<c0267dfb>] .rodata.str1.1 [kernel] 0x2c16 (0xf42ffef8)
    [<c0138eec>] slabinfo_read_proc [kernel] 0x1c (0xf42fff18)
    [<c0169e8a>] proc_file_read [kernel] 0xda (0xf42fff38)
    [<c0146296>] sys_read [kernel] 0x96 (0xf42fff7c)
    [<c01073e3>] system_call [kernel] 0x33 (0xf42fffc0)

Immediately, we can see that this is a tainted kernel and that the module tainting it is proprietary in nature. This module might be the culprit; however, because the machine has been in production for a while, it would be difficult to blame the panic on the driver module alone. What we do know is that the panic occurred because of memory corruption, which could be either hardware or software related.

Continuing with troubleshooting, we note that additional stack traces from the console appear like this:

12:40:50: ds: 0018   es: 0018   ss: 0018
12:40:50: Process kswapd (pid: 10, stackpage=f7f29000)
12:40:50: Stack: c0267dfb 00000722 00000000 f7f7a0b0 f7f7a0a8 c0137c13
c5121760 00000005
12:40:50:        00000000 00000000 00000000 00000018 000000c0 00000000
0008e000 c013ca6f
12:40:51:        000000c0 00000000 00000001 00000000 c013cb83 000000c0
00000000 c0105000
12:40:51: Call Trace: [<c0267dfb>] .rodata.str1.1 [kernel] 0x2c16
(0xf7f29f78)
12:40:51: [<c0137c13>] kmem_cache_shrink_nr [kernel] 0x53 (0xf7f29f8c)
12:40:51: [<c013ca6f>] do_try_to_free_pages [kernel] 0x7f (0xf7f29fb4)
12:40:51: [<c013cb83>] kswapd [kernel] 0x103 (0xf7f29fc8)
12:40:51: [<c0105000>] stext [kernel] 0x0 (0xf7f29fd4)
12:40:51: [<c0105000>] stext [kernel] 0x0 (0xf7f29fec)
12:40:51: [<c0105856>] arch_kernel_thread [kernel] 0x26 (0xf7f29ff0)
12:40:51: [<c013ca80>] kswapd [kernel] 0x0 (0xf7f29ff8)

12:40:51:
12:40:51:
12:40:51: Code: 0f 0b 58 5a 8b 03 45 39 f8 75 dd 8b 4e 2c 89 ea 8b 7e 4c d3
12:40:51:  <0>Kernel panic: not continuing
12:40:51: Uhhuh. NMI received for unknown reason 30.
12:51:30: Dazed and confused, but trying to continue.
12:51:30: Do you have a strange power saving mode enabled?

It is difficult to identify exactly what is causing the problem here; however, because NMI caused the panic, the problem is probably hardware related. The kernel error message, “NMI received for unknown reason,” informs us that the system administrator has set up NMI in case of a hardware hang. Looking through the source, we find this message mentioned in linux/arch/i386/kernel/traps.c.

The following is a snapshot of the source:

...
static void unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
{
#ifdef CONFIG_MCA
        /* Might actually be able to figure out what the guilty party
         * is. */
        if( MCA_bus ) {
                mca_handle_nmi();
                return;
        }
#endif
        printk("Uhhuh. NMI received for unknown reason %02x.\n", reason);
        printk("Dazed and confused, but trying to continue\n");
        printk("Do you have a strange power saving mode enabled?\n");
}
...

Solution 2-3: Replace the PCI Card

Because the HBA was new to the environment, and because replacing it was easier and faster than digging through the stacks and debugging each crash, we suggested that the administrator simply replace the card with a new HBA. After obtaining a new replacement for the PCI card, the kernel no longer experienced panics.

Troubleshooting Panics Resulting from Oops

It is possible for an oops to cause an OS panic. Sometimes applications attempt to use invalid pointers. As a result, the kernel identifies and kills the process that called into the kernel and lists its stack, memory address, and kernel register values. This scenario is known as a kernel oops. Usually the result of bad code, the oops is debugged with the ksymoops command. In today’s Linux distributions, klogd uses the kernel’s symbols to decode the oops and pass it off to the syslog daemon, which in turn writes the oops to the message file (normally /var/log/messages).
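On older kernels, where klogd does not decode the oops for you, a typical ksymoops run might look like the following; the oops text file, vmlinux image, and System.map paths are illustrative and must match the kernel that produced the oops:

# ksymoops -v /usr/src/linux/vmlinux -m /boot/System.map-2.4.9-e.3smp < oops.txt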

As long as the oops does not take place in the interrupt handler, the kernel only kills the offending process and the OS does not panic. However, this does not mean that the kernel is safe to use. It is possible that the program just made a bad code reference; it is also possible that the offending code has put the kernel in such a state that more oopses follow. If this occurs, focus on the first oops rather than the subsequent ones. To avoid running the machine in this relatively unstable state, enable the “panic on oops” option, controlled by the file /proc/sys/kernel/panic_on_oops. Of course, the next time the kernel encounters any kind of oops (whether or not it is in an interrupt handler), it panics. If the dump utilities are enabled, a dump that can be analyzed is created.
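A minimal sketch of enabling the option, assuming a kernel that provides the panic_on_oops tunable:

# echo 1 > /proc/sys/kernel/panic_on_oops

# /etc/sysctl.conf (to make the setting persistent)
kernel.panic_on_oops=1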

Scenario 2-4: Oops Causes Frequent System Panics

In this scenario, an application has been performing many NULL pointer dereferences (oops), and because the system administrator has the kernel configured to panic on oops, each oops results in a panic. The oops only seems to take place when the system is under a heavy load. The heavy load is caused by an application called VMware. This product creates a virtual machine of another OS type. In this case, the system administrator is running several virtual machines under the Linux kernel.

We gather the VMware version from the customer along with the kernel version (also noted in the oops). The next step is to review the logs and screen dumps. The dump details are as follows:

Unable to handle kernel NULL pointer dereference at virtual address
00000084
*pde = 20ffd001
Oops: 0000
Kernel 2.4.9-e.38enterprise
CPU:    4
EIP:    0010:[<c0138692>] Tainted: PF
EFLAGS: 00013002
EIP is at do_ccupdate_local [kernel] 0x22
eax: 00000000   ebx: 00000004   ecx: f7f15efc   edx: c9cc8000
esi: 00000080   edi: c9cc8000   ebp: c0105420   esp: c9cc9f60
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c9cc9000)
Stack: c9cc8000 c9cc8000 c9cc8000 c0113bef f7f15ef8 c0105420 c02476da
c0105420
       c9cc8000 00000004 c9cc8000 c9cc8000 c0105420 00000000 c9cc0018
c9cc0018
       fffffffa c010544e 00000010 00003246 c01054b2 0402080c 00000000
00000000
Call Trace: [<c0113bef>] smp_call_function_interrupt [kernel] 0x2f
(0xc9cc9f6c)
[<c0105420>] default_idle [kernel] 0x0 (0xc9cc9f74)
[<c02476da>] call_call_function_interrupt [kernel] 0x5 (0xc9cc9f78)
[<c0105420>] default_idle [kernel] 0x0 (0xc9cc9f7c)
[<c0105420>] default_idle [kernel] 0x0 (0xc9cc9f90)
[<c010544e>] default_idle [kernel] 0x2e (0xc9cc9fa4)
[<c01054b2>] cpu_idle [kernel] 0x32 (0xc9cc9fb0)
[<c011ceb8>] printk [kernel] 0xd8 (0xc9cc9fd0)
[<c0265e4a>] .rodata.str1.1 [kernel] 0xd25 (0xc9cc9fe4)

Code: 8b 3c 1e 89 04 1e 8b 42 20 89 3c 81 5b 5e 5f c3 8d b4 26 00
LLT:10035: timer not called for 122 ticks

Kernel panic: not continuing
In idle task - not syncing

Interesting—“oops” in the swapper code. This does not sound right because we know that virtually no Linux machines panic because of swapper. So, in this case, we assume that some other software bundle or hardware exception is causing the anomaly. Notice that this kernel is tainted because it has a proprietary module that has been forcibly loaded.

While we are researching the traces and studying the source, the machine panics again. The details of the subsequent kernel panic follow:

Unable to handle kernel NULL pointer dereference at virtual address
00000074
*pde = 24c6b001
Oops: 0000
Kernel 2.4.9-e.38enterprise
CPU:    0
EIP:    0010:[<c0138692>] Tainted: PF
EFLAGS: 00013002
EIP is at do_ccupdate_local [kernel] 0x22
eax: 00000000   ebx: 00000004   ecx: c9cedefc   edx: e7998000
esi: 00000070   edi: 0000013b   ebp: e7999f90   esp: e7999df8
ds: 0018   es: 0018   ss: 0018
Process vmware (pid: 9131, stackpage=e7999000)
Stack: e7998000 e7998000 0000013b c0113bef c9cedef8 00000000 c02476da
00000000
       efa1fca0 c0335000 e7998000 0000013b e7999f90 00000000 e7990018
f90c0018
       fffffffa f90cc2a2 00000010 00003286 00003002 ca212014 00000001
e6643180
Call Trace: [<c0113bef>] smp_call_function_interrupt [kernel] 0x2f
(0xe7999e04)
[<c02476da>] call_call_function_interrupt [kernel] 0x5 (0xe7999e10)
[<f90cc2a2>] .text.lock [vmmon] 0x86 (0xe7999e3c)
[<c0119af2>] __wake_up [kernel] 0x42 (0xe7999e5c)
[<c01da9a9>] sock_def_readable [kernel] 0x39 (0xe7999e84)

[<c021e0ad>] unix_stream_sendmsg [kernel] 0x27d (0xe7999ea0)
[<c01d7a21>] sock_recvmsg [kernel] 0x31 (0xe7999ed0)
[<c01d79cc>] sock_sendmsg [kernel] 0x6c (0xe7999ee4)
[<c01d7bf7>] sock_write [kernel] 0xa7 (0xe7999f38)
[<c0146d36>] sys_write [kernel] 0x96 (0xe7999f7c)
[<c0156877>] sys_ioctl [kernel] 0x257 (0xe7999f94)
[<c01073e3>] system_call [kernel] 0x33 (0xe7999fc0)

Code: 8b 3c 1e 89 04 1e 8b 42 20 89 3c 81 5b 5e 5f c3 8d b4 26 00
<4>rtc: lost some interrupts at 256Hz.
Kernel panic: not continuing [Tue Jul 27
LLT:10035: timer not called for 150 ticks

The second panic reveals that the machine was in VMware code and on a different CPU.

Solution 2-4: Install a Patch

It took the combined efforts of many engineers to isolate and provide a fix for this scenario. The problem was found to reside in smp_call_function() in the Red Hat Advanced Server 2.1 release, which used the 2.4.9 kernel. It turns out that the Linux kernel available on http://www.kernel.org did not contain the bug, so no other distributions experienced the issue. The Red Hat team, with the assistance of the VMware software team, provided a fix for the condition and resolved the oops panics.

Hardware Machine Checks

Finally, let us briefly discuss hardware aborts and interrupts on the IA-64 Itanium Processor Family (IPF) platform. Whereas hangs and panics are normally handled by the kernel, the firmware is in control during a Machine Check Abort (MCA) or INIT.

Work is under way for the IA64 kernel to handle INITs. An INIT is a Processor Abstraction Layer (PAL)-based interrupt where in essence a transfer of control has taken place from the OS to the hardware. This interrupt is serviced by PAL firmware, system firmware, or in some cases the OS. During an INIT, the System Abstraction Layer (SAL) checks for an OS INIT handler; if one is not present, a soft reset takes place. Some of the current Linux kernel releases have built-in support for the INIT handler, and if a dump utility is enabled, the kernel will perform a panic and create a memory dump. In addition to the dump, we can extract the processor registers from the Extensible Firmware Interface (EFI), which can enable us to determine the code of execution at the time of the INIT.

A hardware abort, better known on the IA64 platform as an MCA, tells the processors to save state and perform a hardware reset. Hardware normally is the cause; however, software can be the cause as well. Examples of such an abort would be any sort of double bit error (whether it be in memory bus, I/O bus, or CPU bus). The current kernel releases do not show any console output when the IA64 platform experiences an MCA; however, work is under way to remedy this issue.

The kernel is essentially independent of a hardware machine check. Other tools must be used to collect the savestate registers and isolate the hardware at fault. At the EFI prompt, the errdump command is used to collect the savestate registers. At this point, the hardware vendor must be contacted to decode the dump and processor registers.

Summary

When encountering a kernel hang, panic, oops, or MCA (IA-64 only), remember that troubleshooting each condition involves key steps. In the event of an OS hang, you must decide whether it is an interruptible hang. After you determine this, you can proceed with troubleshooting. The goal with any hang, panic, oops, or MCA is to obtain a stack trace. This information is necessary early in the troubleshooting process to guide us to the source of the problem. Let us recap the key points of each type of scenario and the steps to troubleshooting them.

Interruptible hangs

  1. Use the Magic SysRq keys to attempt to gather the stacks of the processors and the offending processes.
  2. Check the registers: Alt+sysrq+p.
  3. Gather process stacks: Alt+sysrq+t.
  4. If running an SMP kernel, gather all processor stacks: Alt+sysrq+w.
  5. Synchronize filesystems: Alt+sysrq+s.
  6. Reboot (soft reset) the system to clear the hang: Alt+sysrq+b.
  7. After system is booted, review all the logs.
  8. A serial console may be required to capture output of the sysrq keys.

Non-interruptible hangs

  1. Set up a dump utility in case a panic takes place. Recommended dump utilities include diskdump and lkcd. If running on an IPF system, a dump can be achieved by issuing a TOC, forcing a hardware INIT. The System Abstraction Layer then sees that the OS has an INIT handler; if the functionality is in place, the kernel handles the INIT and pulls panic(), utilizing the aforementioned dump utilities to create a dump. (SUSE Linux Enterprise Server 9 (ia64) with kernel 2.6.5-7.97 uses lkcd and has this feature enabled by default.)
  2. On IA-32 x86 systems, the nmi_watchdog timer can be helpful in troubleshooting a hang. See linux/Documentation/nmi_watchdog.txt.
  3. As with interruptible hangs, review system logs.

Panics

  1. Collect the panic string.
  2. Review hardware and software logs.
  3. If problem cannot be identified through the console, the dump utilities must be enabled.

Oops

  1. Review the stack trace of the oops with ksymoops (no longer needed with the latest klogd and kernel releases).
  2. Locate the line that states Unable to handle kernel NULL pointer.
  3. Locate the line showing the instruction pointer (IP).
  4. Use gdb to look at the surrounding code.

MCA

At the EFI shell (IA64 only), collect the CPU registers by performing the following steps:

  1. shell> errdump mca > mca.out.
  2. shell> errdump init > init.out.
  3. Send to hardware vendor for review.

Although these are the key conditions and steps to remember, every troubleshooting process is unique and should be evaluated individually to determine the ideal troubleshooting path. Of course, before consuming vast resources troubleshooting a problem, confirm that you are running on the latest supported kernel and that all software/hardware combinations are in their vendor-supported configurations.
