Debugging by Watching

Sometimes the problems you’re encountering are not that bad and running an application program in user space to examine the way the driver reacts to system calls can help track down minor problems or confirm that the driver is working correctly. For example, I was able to feel confident about scull after looking at how its read implementation reacted to read requests for different amounts of data.

There are various ways to watch a user-space program working. You can run a debugger on it to step through its functions, add print statements, or run the program under strace. The last technique is most interesting when the real goal is examining kernel code.

The strace command is a powerful tool that shows all the system calls issued by a user-space program. Not only does it show the calls, but it can also show the arguments to the calls, as well as return values in symbolic form. When a system call fails, both the symbolic value of the error (e.g., ENOMEM) and the corresponding string (Out of memory) are displayed. strace has many command-line options; the most useful are -t to display the time when each call is executed, -T to display the time spent in the call, and -o to redirect the output to a file. By default, strace prints tracing information on stderr.

strace receives information from the kernel itself. This means that a program can be traced regardless of whether it was compiled with debugging support (the -g option to gcc) or whether it is stripped. You can also attach tracing to a running process, similar to the way a debugger can connect to a running process and control it.

The trace information is often used to support bug reports sent to application developers, but it’s also invaluable to kernel programmers. We’ve seen how driver code executes by making system calls; strace allows us to check the consistency of each call’s input and output data.

For example, the following screen dump shows the last lines of tracing the command ls /dev > /dev/scull0:

% strace ls /dev > /dev/scull0
[...]
readdir(3, {d_ino=894, d_name="scull0"}) = 1
readdir(3, {d_ino=895, d_name="scull1"}) = 1
readdir(3, {d_ino=896, d_name="scull2"}) = 1
readdir(3, {d_ino=897, d_name="scull3"}) = 1
readdir(3, 0x8013000)                   = 0
close(3)                                = 0
brk(0x8035000)                          = 0x8035000
brk(0x8035000)                          = 0x8035000
brk(0x8025000)                          = 0x8025000
fstat(1, {st_mode=S_IFCHR|0664, st_rdev=makedev(63, 0), ...}) = 0
ioctl(1, TCGETS, 0xbffffac4)            = -1 EINVAL (Invalid argument)
write(1, "MAKEDEV
XOR
arp
atibm
audio
"..., 4096) = 4000
write(1, "3
ttyr4
ttyr5
ttyr6
ttyr7
t"..., 96) = 96
write(1, "3
ttys4
ttys5
ttys6
ttys7
t"..., 535) = 535
_exit(0)                                = ?

It’s apparent in the first write call that after ls finished looking in the target directory, it tried to write 4KB. Strangely (for ls), only four thousand bytes were written, and the operation was retried. However, we know that the write implementation in scull writes a single quantum at a time, so we could have expected the partial write. After a few steps, everything sweeps through, and the program exits successfully.

As another example, let’s read the scull device:

% strace wc -c /dev/scull0
[...]
open("/dev/scull0", O_RDONLY)           = 3
fstat(3, {st_mode=S_IFCHR|0664, st_rdev=makedev(63, 0), ...}) = 0
read(3, "MAKEDEV
XOR
arp
atibm
audio
"..., 16384) = 4000
read(3, "3
ttyr4
ttyr5
ttyr6
ttyr7
t"..., 16384) = 631
read(3, "", 16384)                      = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(4, 13), ...}) = 0
brk(0x800b000)                          = 0x800b000
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "   4631 /dev/scull0
", 20   4631 /dev/scull0
)   = 20
close(3)                                = 0
_exit(0)                                = ?

As expected, read is able to retrieve only four thousand bytes at a time, but the total amount of data is the same. It’s interesting to note how retries are organized in this example, as opposed to the previous trace. wc is optimized for fast reading and thus bypasses the standard library, trying to read more data with a single system call. You can see from the read lines in the trace how wc tried to read 16KB at a time.

Unix experts can find much useful information in the output of strace. If you’re put off by all the symbols, you can limit yourself to watching how the file methods (open, read, and so on) work.

Personally, I find the tracing utility most useful for pinpointing run-time errors from system calls. Often the perror call in the application or demo program isn’t verbose enough to be useful for debugging, and being able to tell exactly which arguments to which system call triggered the error can be a great help.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.202.45