Core files

Core files capture the state of a failing program at the point that it terminates. You don't even have to be in the room with a debugger when the bug manifests itself. So when you see Segmentation fault (core dumped), don't shrug; investigate the core file and extract the goldmine of information in there.

The first observation is that core files are not created by default, but only when the core file resource limit for the process is non-zero. You can change it for the current shell using ulimit -c. To remove all limits on the size of core files, type the following:

$ ulimit -c unlimited
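
The ulimit command only affects the current shell and the programs started from it. A program that is launched some other way, for example from an init script, can raise its own limit at start-up instead. The following is a minimal sketch of that idea using getrlimit(2) and setrlimit(2); the function name enable_core_dumps is hypothetical and not part of any example in this chapter:

```c
#include <sys/resource.h>

/*
 * Raise the soft core-file size limit of the calling process to its
 * hard limit. An unprivileged process cannot go above the hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;  /* soft limit may not exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling enable_core_dumps() early in main() has the same effect for that one process as ulimit -c has for the shell, provided the hard limit inherited from the parent is itself non-zero.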

By default, the core file is named core and is placed in the current working directory of the process, which is the one pointed to by /proc/<PID>/cwd. There are a number of problems with this scheme. Firstly, when looking at a device with several files named core it is not obvious which program generated each one. Secondly, the current working directory of the process may well be in a read-only filesystem, or there may not be enough space to store the core file, or the process may not have permissions to write to the current working directory.

There are two files that control the naming and placement of core files. The first is /proc/sys/kernel/core_uses_pid. Writing a 1 to it causes the PID number of the dying process to be appended to the filename, which is somewhat useful as long as you can associate the PID number with a program name from log files.

Much more useful is /proc/sys/kernel/core_pattern, which gives you a lot more control over core files. The default pattern is core but you can change it to a pattern composed of these meta characters:

  • %p: the PID
  • %u: the real UID of the dumped process
  • %g: the real GID of the dumped process
  • %s: number of the signal causing the dump
  • %t: the time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
  • %h: the hostname
  • %e: the executable filename
  • %E: the pathname of the executable, with slashes (/) replaced by exclamation marks (!)
  • %c: the core file size soft resource limit of the dumped process

You can also use a pattern that begins with an absolute directory name so that all core files are gathered together in one place. As an example, the following pattern puts all core files into the /corefiles directory and names them with the program name and the time of the crash:

# echo /corefiles/core.%e.%t > /proc/sys/kernel/core_pattern

Following a core dump, you would find something like this:

$ ls /corefiles/
core.sort-debug.1431425613
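
To make the expansion concrete, here is a small user-space sketch that imitates how a pattern such as the one above turns into a filename. It is only an illustration: the kernel does the real expansion, the function name expand_core_pattern is hypothetical, and only %p, %e, %t, and %% are handled. Any other specifier is simply skipped in this sketch.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/*
 * Expand a subset of core_pattern specifiers (%p, %e, %t, %%) into out.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int expand_core_pattern(const char *pattern, pid_t pid, const char *exe,
                        time_t when, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;

    for (const char *c = pattern; *c != '\0'; c++) {
        char piece[64];
        const char *text = piece;

        if (*c != '%') {
            piece[0] = *c;          /* ordinary character: copy as-is */
            piece[1] = '\0';
        } else {
            c++;
            if (*c == '\0')
                break;              /* a lone trailing % is dropped */
            switch (*c) {
            case 'p': snprintf(piece, sizeof piece, "%ld", (long)pid); break;
            case 't': snprintf(piece, sizeof piece, "%lld", (long long)when); break;
            case 'e': text = exe; break;
            case '%': piece[0] = '%'; piece[1] = '\0'; break;
            default:  piece[0] = '\0'; break;  /* not handled in this sketch */
            }
        }

        size_t len = strlen(text);
        if (used + len >= size)
            return -1;              /* keep room for the terminating NUL */
        memcpy(out + used, text, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Passing the pattern /corefiles/core.%e.%t with the executable name sort-debug and the time 1431425613 produces the filename shown in the listing above.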

For more information, refer to the man page core(5).

For more sophisticated processing of core files, you can pipe them to a program that does some post-processing. In that case, the core pattern begins with a pipe symbol, |, followed by the program name and parameters. My Ubuntu 14.04, for example, has this core pattern:

|/usr/share/apport/apport %p %s %c %P

Apport is the crash reporting tool used by Canonical. A crash reporting tool invoked in this way runs while the crashed process is still in memory, and the kernel passes the core image data to it on standard input. Thus, the program can process the image, possibly stripping parts of it to reduce its size in the filesystem, or just scan it at the time of the core dump for specific information. It can also look at various pieces of system data, for example, by reading the /proc filesystem entries for the crashed process, and it can use ptrace system calls to operate on that process and read data from it. However, once the core image data has been read from standard input, the kernel does various cleanups that make information about the process no longer available.
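
The core of such a handler is a loop that drains standard input. The sketch below shows only that step, under the assumption that the handler just wants to store the image unchanged; the function name copy_core_image is hypothetical, and a real tool such as Apport does considerably more.

```c
#include <stdio.h>

/*
 * Copy everything readable from 'in' into a new file at 'path'.
 * Returns the number of bytes copied, or -1 on any I/O error.
 */
long long copy_core_image(FILE *in, const char *path)
{
    char buf[4096];
    size_t n;
    long long total = 0;
    FILE *out = fopen(path, "wb");

    if (out == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            fclose(out);
            return -1;
        }
        total += (long long)n;
    }

    if (ferror(in)) {               /* fread stopped on an error, not EOF */
        fclose(out);
        return -1;
    }
    if (fclose(out) != 0)           /* flush failures show up here */
        return -1;
    return total;
}
```

A handler built around this would call copy_core_image(stdin, path) from its main() function, with the path constructed from the parameters given in the core pattern, such as the PID passed as %p. Anything that needs to inspect the /proc entries of the crashed process has to happen before the copy completes.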

Using GDB to look at core files

Here is a sample GDB session looking at a core file:

$ arm-poky-linux-gnueabi-gdb sort-debug /home/chris/MELP/rootdirs/rootfs/corefiles/core.sort-debug.1431425613
[...]
Core was generated by `./sort-debug'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000085c8 in addtree (p=0x0, w=0xbeac4c60 "the") at sort-debug.c:41
41     p->word = strdup (w);

That shows that the program stopped at line 41. The list command shows the code in the immediate vicinity:

(gdb) list
37    static struct tnode *addtree (struct tnode *p, char *w)
38    {
39        int cond;
40
41        p->word = strdup (w);
42        p->count = 1;
43        p->left = NULL;
44        p->right = NULL;
45

The backtrace command (shortened to bt) shows how we got to this point:

(gdb) bt
#0  0x000085c8 in addtree (p=0x0, w=0xbeac4c60 "the") at sort-debug.c:41
#1  0x00008798 in main (argc=1, argv=0xbeac4e24) at sort-debug.c:89

An obvious mistake: addtree() was called with a null pointer.
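
One plausible fix is for addtree() to allocate a node when it is handed a null pointer, so that an empty tree can be represented by a null root. The following sketch assumes that this is the intent. The layout of struct tnode is inferred from the four members assigned in the listing, the comparison and recursion are guesses at the rest of the function, which is not shown, and the sketch drops static and takes a const pointer so that it can stand alone.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <stdlib.h>
#include <string.h>

/* Assumed layout: the listing only shows these four members being set. */
struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/*
 * Add the word w to the tree rooted at p.
 * Returns p, or a newly allocated node when p is NULL.
 * Returns NULL only when p is NULL and allocation fails.
 */
struct tnode *addtree(struct tnode *p, const char *w)
{
    int cond;

    if (p == NULL) {
        /* Empty (sub)tree: create the node instead of dereferencing NULL */
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
        p->word = strdup(w);
        if (p->word == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = NULL;
        p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0) {
        p->count++;                         /* word seen before */
    } else if (cond < 0) {
        p->left = addtree(p->left, w);      /* sorts before this node */
    } else {
        p->right = addtree(p->right, w);    /* sorts after this node */
    }
    return p;
}
```

With this version, a caller can start with a null root and write root = addtree(root, word) for each word, which is the calling pattern that the backtrace suggests main() uses.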
