Chapter 11. Kernel Statistics

With contributions from Peter Boothby

The Solaris kernel provides a set of functions and data structures for device drivers and other kernel modules to export module-specific statistics to the outside world. This infrastructure, referred to as kstat, provides the following to the Solaris software developer:

  • C-language functions for device drivers and other kernel modules to present statistics

  • C-language functions for applications to retrieve statistics data from Solaris without needing to directly read kernel memory

  • Perl-based command-line program /usr/bin/kstat to access statistics data interactively or in shell scripts (introduced in Solaris 8)

  • Perl library interface for constructing custom performance-monitoring utilities

C-Level Kstat Interface

The Solaris libkstat library contains the C-language functions for accessing kstats from an application. These functions utilize the pseudo-device /dev/kstat to provide a secure interface to kernel data, obviating the need for programs that are setuid to root.

Since many developers are interested in accessing kernel statistics through C programs, this chapter focuses on libkstat. The chapter explains the data structures and functions, and provides example code to get you started using the library.

Data Structure Overview

Solaris kernel statistics are maintained in a linked list of structures referred to as the kstat chain. Each kstat has a common header section and a type-specific data section, as shown in Figure 11.1.

Kstat Chain

Figure 11.1. Kstat Chain

The chain is initialized at system boot time, but since Solaris is a dynamic operating system, this chain may change over time. Kstat entries can be added and removed from the system as needed by the kernel. For example, when you add an I/O board and all of its attached components to a running system by using Dynamic Reconfiguration, the device drivers and other kernel modules that interact with the new hardware will insert kstat entries into the chain.

The structure member ks_data is a pointer to the kstat’s data section. Multiple data types are supported: raw, named, timer, interrupt, and I/O. These are explained in Section 11.1.3.

The following header contains the full kstat header structure.

typedef struct kstat {
       /*
        * Fields relevant to both kernel and user
        */
       hrtime_t       ks_crtime;               /* creation time */
       struct kstat  *ks_next;                 /* kstat chain linkage */
       kid_t          ks_kid;                  /* unique kstat ID */
       char           ks_module[KSTAT_STRLEN]; /* module name */
       uchar_t        ks_resv;                 /* reserved */
       int            ks_instance;             /* module's instance */
       char           ks_name[KSTAT_STRLEN];   /* kstat name */
       uchar_t        ks_type;                 /* kstat data type */
       char           ks_class[KSTAT_STRLEN];  /* kstat class */
       uchar_t        ks_flags;                /* kstat flags */
       void          *ks_data;                 /* kstat type-specific data */
       uint_t         ks_ndata;                /* # of data records */
       size_t         ks_data_size;            /* size of kstat data section */
       hrtime_t       ks_snaptime;             /* time of last data snapshot */
       /*
        * Fields relevant to kernel only
        */
       int (*ks_update)(struct kstat *, int);
       void           *ks_private;
       int (*ks_snapshot)(struct kstat *, void *, int);
       void           *ks_lock;
} kstat_t;

The significant members are described below.

  • ks_crtime. This member reflects the time the kstat was created. Using the value, you can compute the rates of various counters since the kstat was created (“rate since boot” is replaced by the more general concept of “rate since kstat creation”).

    All times associated with kstats, such as creation time, last snapshot time, kstat_timer_t, kstat_io_t timestamps, and the like, are 64-bit nanosecond values.

    The accuracy of kstat timestamps is machine-dependent, but the precision (units) is the same across all platforms. Refer to the gethrtime(3C) man page for general information about high-resolution timestamps.

  • ks_next. Kstats are stored as a NULL-terminated linked list, or chain. ks_next points to the next kstat in the chain.

  • ks_kid. This member is a unique identifier for the kstat.

  • ks_module and ks_instance. These members contain the name and instance of the module that created the kstat. In cases where there can only be one instance, ks_instance is 0. Refer to Section 11.1.4 for more information.

  • ks_name. This member gives a meaningful name to a kstat. For additional kstat namespace information, see Section 11.1.4.

  • ks_type. This member identifies the type of data in this kstat. Kstat data types are covered in Section 11.1.3.

  • ks_class. Each kstat can be characterized as belonging to some broad class of statistics, such as bus, disk, net, vm, or misc. This field can be used as a filter to extract related kstats.

    The following values are currently in use by Solaris:

    bus

    hat

    net

    rpc

    controller

    kmem_cache

    nfs

    ufs

    device_error

    kstat

    pages

    vm

    taskq

    mib2

    crypto

    errorq

    disk

    misc

    partition

    vmem

  • ks_data, ks_ndata, and ks_data_size. ks_data is a pointer to the kstat’s data section. The type of data stored there depends on ks_type. ks_ndata indicates the number of data records. Only some kstat types support multiple data records.

    The following kstat types support multiple data records:

    • KSTAT_TYPE_RAW

    • KSTAT_TYPE_NAMED

    • KSTAT_TYPE_TIMER

    The following kstat types support only one data record:

    • KSTAT_TYPE_INTR

    • KSTAT_TYPE_IO

    ks_data_size is the total size of the data section, in bytes.

  • ks_snaptime. Timestamp for the last data snapshot. With it, you can compute activity rates based on the following computational method:

    rate = (new_count - old_count) / (new_snaptime - old_snaptime)

Getting Started

To use kstats, a program must first call kstat_open(), which returns a pointer to a kstat control structure. The following header shows the structure members.

typedef struct kstat_ctl {
         kid_t     kc_chain_id;    /* current kstat chain ID */
         kstat_t   *kc_chain;      /* pointer to kstat chain */
         int       kc_kd;          /* /dev/kstat descriptor */
} kstat_ctl_t;

kc_chain points to the head of your copy of the kstat chain. You typically walk the chain or use kstat_lookup() to find and process a particular kind of kstat. kc_chain_id is the kstat chain identifier, or KCID, of your copy of the kstat chain. Its use is explained in Section 11.1.4.

To avoid unnecessary overhead in accessing kstat data, a program first searches the kstat chain for the type of information of interest, then uses the kstat_read() and kstat_data_lookup() functions to get the statistics data from the kernel.

The following code fragment shows how you might print out all kstat entries with information about disk I/O. It traverses the entire chain looking for kstats of ks_type KSTAT_TYPE_IO, calls kstat_read() to retrieve the data, and then processes the data with my_io_display(). How to implement this sample function is shown in <ref>.

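     kstat_ctl_t     *kc;
     kstat_t        *ksp;
     kstat_io_t      kio;

     /* kstat_open() returns NULL on failure */
     if ((kc = kstat_open()) == NULL) {
          perror("kstat_open");
          exit(1);
     }
     for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
          if (ksp->ks_type == KSTAT_TYPE_IO) {
               /* kstat_read() returns -1 on failure; skip such kstats */
               if (kstat_read(kc, ksp, &kio) == -1)
                    continue;
               my_io_display(kio);
          }
     }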

Data Types

The data section of a kstat can hold one of five types, identified in the ks_type field. The following kstat types can hold multiple records. The number of records is held in ks_ndata.

  • KSTAT_TYPE_RAW

  • KSTAT_TYPE_NAMED

  • KSTAT_TYPE_TIMER

The other two types are KSTAT_TYPE_INTR and KSTAT_TYPE_IO. The field ks_data_size holds the size, in bytes, of the entire data section.

KSTAT_TYPE_RAW

The “raw” kstat type is treated as an array of bytes and is generally used to export well-known structures, such as vminfo (defined in /usr/include/sys/sysinfo.h). The following example shows one method of printing this information.

static void print_vminfo(kstat_t *kp)
{
     vminfo_t *vminfop;
     vminfop = (vminfo_t *)(kp->ks_data);

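     /* vminfo counters are 64-bit unsigned values */
     printf("Free memory: %llu\n", (unsigned long long)vminfop->freemem);
     printf("Swap reserved: %llu\n", (unsigned long long)vminfop->swap_resv);
     printf("Swap allocated: %llu\n", (unsigned long long)vminfop->swap_alloc);
     printf("Swap available: %llu\n", (unsigned long long)vminfop->swap_avail);
     printf("Swap free: %llu\n", (unsigned long long)vminfop->swap_free);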
}

KSTAT_TYPE_NAMED

This type of kstat contains a list of arbitrary name=value statistics. The following example shows the data structure used to hold named kstats.

typedef struct kstat_named {
        char    name[KSTAT_STRLEN];     /* name of counter */
        uchar_t data_type;              /* data type */
        union {
                char            c[16];  /* enough for 128-bit ints */
                int32_t         i32;
                uint32_t        ui32;
                struct {
                        union {
                                char            *ptr;    /* NULL-term string */
#if defined(_KERNEL) && defined(_MULTI_DATAMODEL)
                                caddr32_t       ptr32;
#endif
                                char            __pad[8]; /* 64-bit padding */
                        } addr;
                        uint32_t        len;    /* # bytes for strlen + '\0' */
                } str;
#if defined(_INT64_TYPE)
                int64_t         i64;
                uint64_t        ui64;
#endif
                long            l;
                ulong_t         ul;

                /* These structure members are obsolete */

                longlong_t      ll;
                u_longlong_t    ull;
                float           f;
                double          d;
        } value;                        /* value of counter */
} kstat_named_t;

#define KSTAT_DATA_CHAR         0
#define KSTAT_DATA_INT32        1
#define KSTAT_DATA_UINT32       2
#define KSTAT_DATA_INT64        3
#define KSTAT_DATA_UINT64       4

#if !defined(_LP64)
#define KSTAT_DATA_LONG         KSTAT_DATA_INT32
#define KSTAT_DATA_ULONG        KSTAT_DATA_UINT32
#else
#if !defined(_KERNEL)
#define KSTAT_DATA_LONG         KSTAT_DATA_INT64
#define KSTAT_DATA_ULONG        KSTAT_DATA_UINT64
#else
#define KSTAT_DATA_LONG         7       /* only visible to the kernel */
#define KSTAT_DATA_ULONG        8       /* only visible to the kernel */
#endif  /* !_KERNEL */
#endif  /* !_LP64 */
                                                                        See sys/kstat.h

The example program in “Putting It All Together,” later in this chapter, uses a function my_named_display() to show how one might display named kstats.

Note that if the type is KSTAT_DATA_CHAR, the 16-byte value field is not guaranteed to be null-terminated. This is important to remember when you are printing the value with functions like printf().

KSTAT_TYPE_TIMER

This kstat holds event timer statistics. These provide basic counting and timing information for any type of event.

typedef struct kstat_timer {
        char            name[KSTAT_STRLEN];     /* event name */
        uchar_t         resv;                   /* reserved */
        u_longlong_t    num_events;             /* number of events */
        hrtime_t        elapsed_time;           /* cumulative elapsed time */
        hrtime_t        min_time;               /* shortest event duration */
        hrtime_t        max_time;               /* longest event duration */
        hrtime_t        start_time;             /* previous event start time */
        hrtime_t        stop_time;              /* previous event stop time */
} kstat_timer_t;
                                                                        See sys/kstat.h

KSTAT_TYPE_INTR

This type of kstat holds interrupt statistics. Interrupts are categorized as listed in Table 11.1 and as shown below the table.

Table 11.1. Types of Interrupt Kstats

Interrupt Type      Definition
----------------    ------------------------------------------------------------
Hard                Sourced from the hardware device itself
Soft                Induced by the system by means of some system interrupt source
Watchdog            Induced by a periodic timer call
Spurious            An interrupt entry point was entered but there was no interrupt to service
Multiple Service    An interrupt was detected and serviced just before returning from any of the other types

#define KSTAT_INTR_HARD      0
#define KSTAT_INTR_SOFT      1
#define KSTAT_INTR_WATCHDOG  2
#define KSTAT_INTR_SPURIOUS  3
#define KSTAT_INTR_MULTSVC   4
#define KSTAT_NUM_INTRS      5
typedef struct kstat_intr {
    uint_t intrs[KSTAT_NUM_INTRS]; /* interrupt counters */
} kstat_intr_t;
                                                                        See sys/kstat.h

KSTAT_TYPE_IO

This kstat counts I/O operations for statistical analysis.

typedef struct kstat_io {
     /*
      * Basic counters.
      */
     u_longlong_t     nread;      /* number of bytes read */
     u_longlong_t     nwritten;   /* number of bytes written */
     uint_t           reads;      /* number of read operations */
     uint_t           writes;     /* number of write operations */

     /*
      * Accumulated time and queue length statistics.
      */
     hrtime_t   wtime;            /* cumulative wait (pre-service) time */
     hrtime_t   wlentime;         /* cumulative wait length*time product*/
     hrtime_t   wlastupdate;      /* last time wait queue changed */
     hrtime_t   rtime;            /* cumulative run (service) time */
     hrtime_t   rlentime;         /* cumulative run length*time product */
     hrtime_t   rlastupdate;      /* last time run queue changed */
     uint_t     wcnt;             /* count of elements in wait state */
     uint_t     rcnt;             /* count of elements in run state */
} kstat_io_t;
                                                                        See sys/kstat.h

Accumulated Time and Queue Length Statistics

Time statistics are kept as a running sum of “active” time. Queue length statistics are kept as a running sum of the product of queue length and elapsed time at that length; that is, a Riemann sum for queue length integrated against time. Figure 11.2 shows a sample graph of queue length against time.

Queue Length Sampling

Figure 11.2. Queue Length Sampling

At each change of state (either an entry to or an exit from the queue), the elapsed time since the previous state change is added to the active time (wtime or rtime fields) if the queue length was non-zero during that interval.

The product of that elapsed time and the queue length is added to the running length*time sum (wlentime or rlentime fields).

Stated programmatically:

if (queue length != 0) {
    time += elapsed time since last state change;
    lentime +=  (elapsed time since last state change * queue length);
}

You can generalize this method to measure residency in any defined system. Instead of queue lengths, think of “outstanding RPC calls to server X.”

A large number of I/O subsystems have at least two basic lists of transactions they manage:

  • A list for transactions that have been accepted for processing but for which processing has yet to begin

  • A list for transactions that are actively being processed but that are not complete

For these reasons, two cumulative time statistics are defined:

  • Pre-service (wait) time

  • Service (run) time

The units of cumulative busy time are accumulated nanoseconds.

Kstat Names

The kstat namespace is defined by three fields from the kstat structure:

  • ks_module

  • ks_instance

  • ks_name

The combination of these three fields is guaranteed to be unique.

For example, imagine a system with four FastEthernet interfaces. The device driver module for Sun’s FastEthernet controller is called “hme”. The first Ethernet interface would be instance 0, the second instance 1, and so on. The “hme” driver provides two types of kstat for each interface. The first contains named kstats with performance statistics. The second contains interrupt statistics.

The kstat data for the first interface’s network statistics is found under ks_module == “hme”, ks_instance == 0, and ks_name == “hme0”. The interrupt statistics are contained in a kstat identified by ks_module == “hme”, ks_instance == 0, and ks_name == “hmec0”.

In that example, the combination of module name and instance number to make the ks_name field (“hme0” and “hmec0”) is simply a convention for this driver. Other drivers may use similar naming conventions to publish multiple kstat data types but are not required to do so; the module is required to make sure that the combination is unique.

How do you determine what kstats the kernel provides? One of the easiest ways with Solaris 8 is to run /usr/bin/kstat with no arguments. This command prints nearly all the current kstat data. The Solaris kstat command can dump most of the known kstats of type KSTAT_TYPE_RAW.

Functions

The following functions are available to C programs for accessing kstat data:

kstat_ctl_t * kstat_open(void);

Initializes a kstat control structure to provide access to the kernel statistics library. It returns a pointer to this structure, which must be supplied as the kc argument in subsequent libkstat function calls.

kstat_t * kstat_lookup(kstat_ctl_t *kc, char *ks_module, int ks_instance,
                       char *ks_name);

Traverses the kstat chain, searching for a kstat with the given ks_module, ks_instance, and ks_name fields. If ks_module is NULL, ks_instance is -1, or ks_name is NULL, then that field is ignored in the search. For example, kstat_lookup(kc, NULL, -1, "foo") simply finds the first kstat with the name "foo".

void * kstat_data_lookup(kstat_t *ksp, char *name);

Searches the kstat's data section for the record with the specified name. This operation is valid only for kstat types that have named data records. Currently, only the KSTAT_TYPE_NAMED and KSTAT_TYPE_TIMER kstats have named data records. You must first call kstat_read() to get the data from the kernel. This routine then finds a particular record in the data section.

kid_t kstat_read(kstat_ctl_t *kc, kstat_t *ksp, void *buf);

Gets data from the kernel for a particular kstat.

kid_t kstat_write(kstat_ctl_t *kc, kstat_t *ksp, void *buf);

Writes data to a particular kstat in the kernel. Only the superuser can use kstat_write().

kid_t kstat_chain_update(kstat_ctl_t *kc);

Synchronizes the user's kstat header chain with that of the kernel.

int kstat_close(kstat_ctl_t *kc);

Frees all resources that were associated with the kstat control structure. This is done
automatically on exit(2) and execve(). (For more information on exit(2) and execve(),
see the exec(2) man page.)

Management of Chain Updates

Recall that the kstat chain is dynamic in nature. The libkstat library function kstat_open() returns a copy of the kernel’s kstat chain. Since the content of the kernel’s chain may change, your program should call the kstat_chain_update() function at the appropriate times to see if its private copy of the chain is the same as the kernel’s. This is the purpose of the KCID (stored in kc_chain_id in the kstat control structure).

Each time a kernel module adds or removes a kstat from the system’s chain, the KCID is incremented. When your program calls kstat_chain_update(), the function checks to see if the kc_chain_id in your program’s control structure matches the kernel’s. If not, kstat_chain_update() rebuilds your program’s local kstat chain. The function returns one of the following:

  • The new KCID if the chain has been updated

  • 0 if no change has been made

  • -1 if some error was detected

If your program has cached some local data from previous calls to the kstat library, then a new KCID acts as a flag to indicate that the chain has changed and that the cached data may be out of date. You can search the chain again to see if data that your program is interested in has been added or removed.

A practical example is the system command iostat. It caches some internal data about the disks in the system and needs to recognize that a disk has been brought on-line or off-line. If iostat is called with an interval argument, it prints I/O statistics every interval seconds. Each time through the loop, it calls kstat_chain_update() to see if something has changed. If a change took place, it figures out if a device of interest has been added or removed.

Putting It All Together

Your C source file must contain:

#include <kstat.h>

When your program is linked, the compiler command line must include the argument -lkstat.

$ cc -o print_some_kstats print_some_kstats.c -lkstat

The following is a short example program. First, it uses kstat_lookup() and kstat_read() to find the system’s CPU speed. Then it goes into an infinite loop to print a small amount of information about all kstats of type KSTAT_TYPE_IO. Note that at the top of the loop, it calls kstat_chain_update() to check that you have current data. If the kstat chain has changed, the program sends a short message on stderr.

/*  print_some_kstats.c:
 *  print out a couple of interesting things
 */
#include <kstat.h>
#include <stdio.h>
#include <inttypes.h>
#include <unistd.h>     /* sleep() */
#define SLEEPTIME 10

void my_named_display(char *, char *, kstat_named_t *);
void my_io_display(char *, char *, kstat_io_t);

int main(int argc, char **argv)
{
     kstat_ctl_t    *kc;
     kstat_t       *ksp;
     kstat_io_t     kio;
     kstat_named_t *knp;

     kc = kstat_open();

     /*
      * Print out the CPU speed. We make two assumptions here:
      * 1) All CPUs are the same speed, so we'll just search for the
      *    first one;
      * 2) At least one CPU is online, so our search will always
      *    find something. :)
      */
     ksp = kstat_lookup(kc, "cpu_info", -1, NULL);
     kstat_read(kc, ksp, NULL);
     /* lookup the CPU speed data record */
     knp = kstat_data_lookup(ksp, "clock_MHz");
     printf("CPU speed of system is ");
     my_named_display(ksp->ks_name, ksp->ks_class, knp);
     printf("\n");
     /* dump some info about all I/O kstats every
        SLEEPTIME seconds  */
     while(1) {
        /* make sure we have current data */
         /* a return > 0 is the new KCID; -1 indicates an error */
         if(kstat_chain_update(kc) > 0)
             fprintf(stderr, "<<State Changed>>\n");
         for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
           if (ksp->ks_type == KSTAT_TYPE_IO) {
              kstat_read(kc, ksp, &kio);
              my_io_display(ksp->ks_name, ksp->ks_class, kio);
           }
         }
         sleep(SLEEPTIME);
     } /* while(1) */

}

void my_io_display(char *devname, char *class, kstat_io_t k)
{
     printf("Name: %s Class: %s\n",devname,class);
     printf("\tnumber of bytes read %llu\n", k.nread);
     printf("\tnumber of bytes written %llu\n", k.nwritten);
     printf("\tnumber of read operations %u\n", k.reads);
     printf("\tnumber of write operations %u\n\n", k.writes);
}
void
my_named_display(char *devname, char *class, kstat_named_t *knp)
{
     switch(knp->data_type) {
     case KSTAT_DATA_CHAR:
          printf("%.16s",knp->value.c);
          break;
     case KSTAT_DATA_INT32:
          printf("%" PRId32,knp->value.i32);
          break;
     case KSTAT_DATA_UINT32:
          printf("%" PRIu32,knp->value.ui32);
          break;
     case KSTAT_DATA_INT64:
          printf("%" PRId64,knp->value.i64);
          break;
     case KSTAT_DATA_UINT64:
          printf("%" PRIu64,knp->value.ui64);
    }
}

Command-Line Interface

This section explains the tools you can use to access kstat information from shell scripts. Included are a few examples to introduce the kstat(1M) program and the Perl language module it uses to extract kernel statistics.

The Solaris 8 OS introduced a new method to access kstat information from the command line or in custom-written scripts. You can use the command-line tool /usr/bin/kstat interactively to print all or selected kstat information from a system. This program is written in the Perl language, and you can use the Perl XS extension module to write your own custom Perl programs. Both facilities are documented in the pages of the online manual.

The kstat Command

You can invoke the kstat command on the command line or within shell scripts to selectively extract kernel statistics. Like many other Solaris OS commands, kstat takes optional interval and count arguments for repetitive, periodic output. Its command options are quite flexible.

The kstat command accepts its selection criteria in two forms. The first form follows standard UNIX command-line syntax, with the -m, -i, -n, and -s options naming the module, instance, name, and statistic. The second form passes the same specifiers as colon-separated fields in a module:instance:name:statistic operand. Both forms offer the same functionality. Each of the module, instance, name, or statistic specifiers may be a shell glob pattern or a Perl regular expression enclosed by “/” characters. You can use both specifier types within a single operand. Leaving a specifier empty is equivalent to using the “*” glob pattern for that specifier. Running kstat with no arguments prints nearly all kstat entries from the running kernel (most, but not all, kstats of KSTAT_TYPE_RAW are decoded).

The tests specified by the options are logically ANDed, and all matching kstats are selected. The argument for the -c, -i, -m, -n, and -s options can be specified as a shell glob pattern, or a Perl regular expression enclosed in “/” characters.

If you pass a glob pattern or regular expression containing shell metacharacters to the command, you must protect it from the shell by enclosing it in the appropriate quotation marks. For example, to show all kstats that have a statistic name beginning with intr in the module cpu_stat, you could use the following command:

$ kstat -p -m cpu_stat -s 'intr*'
cpu_stat:0:cpu_stat0:intr       878951000
cpu_stat:0:cpu_stat0:intrblk    21604
cpu_stat:0:cpu_stat0:intrthread 668353070
cpu_stat:1:cpu_stat1:intr       211041358
cpu_stat:1:cpu_stat1:intrblk    280
cpu_stat:1:cpu_stat1:intrthread 209879640

The -p option used in the preceding example displays output in a parsable format. If you do not specify this option, kstat produces output in a human-readable, tabular format. In the following example, we leave out the -p flag and use the module:instance:name:statistic argument form and a Perl regular expression.

$ kstat cpu_stat:::/^intr/
module: cpu_stat                        instance: 0
name:   cpu_stat0                       class:     misc
        intr                            879131909
        intrblk                         21608
        intrthread                      668490486
module: cpu_stat                        instance: 1
name:   cpu_stat1                       class:     misc
        intr                            211084960
        intrblk                         280
        intrthread                      209923001

Sometimes you may just want to test for the existence of a kstat entry. You can use the -q flag, which returns the appropriate exit status for matches against given criteria. The exit codes are as follows:

  • 0: One or more statistics were matched.

  • 1: No statistics were matched.

  • 2: Invalid command-line options were specified.

  • 3: A fatal error occurred.

Suppose that you have a Bourne shell script gathering network statistics, and you want to see if the NFS server is configured. You might create a script such as the one in the following example.

#!/bin/sh
# ... do some stuff
# Check for NFS server
kstat -q nfs::nfs_server:
if [ $? = 0 ]; then
    echo "NFS Server configured"
else
    echo "No NFS Server configured"
fi
# ... do some more stuff
exit 0

Real-World Example That Uses kstat and nawk

If you are adept at writing shell scripts with editing tools like sed or awk, here is a simple example to create a network statistics utility with kstats.

The /usr/bin/netstat command has a command-line option -I interface with which you can print out statistics about a particular network interface. Optionally, netstat takes an interval argument to print out the statistics every interval seconds. The following example illustrates that option.

$ netstat -I qfe0 5
    input   qfe0      output           input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls
2971681 0     1920781 0     0      11198281 0     10147381 0     0
9       0     7       0     0      31      0     29      0     0
4       0     5       0     0      24      0     25      0     0
...

Unfortunately, this command accepts only one -I flag argument. What if you want to print statistics about multiple interfaces simultaneously, similar to what iostat does for disks? You could devise a Bourne shell script using kstat and nawk to provide this functionality. You want your output to look like the following example.

$ netstatMulti.sh ge0 ge2 ge1 5
              input             output
          packets    errs  packets    errs  colls
ge0       111702738  10    82259260   0     0
ge2       28475869   0     61288614   0     0
ge1       25542766   4     55587276   0     0
ge0       1638       0     1075       0     0
ge2       518        0     460        0     0
ge1       866        0     7688       0     0
...

The next example is the statistics script. Note that extracting the kstat information is simple, and most of the work goes into parsing and formatting the output. The script uses kstat -q to check the user’s arguments for valid interface names and then passes a list of formatted module:instance:name:statistic arguments to kstat before piping the output to nawk.

#!/bin/sh
# netstatMulti.sh: print out netstat-like stats for
# multiple interfaces
#   using /usr/bin/kstat and nawk
USAGE="$0: interface_name ... interval"

INTERFACES="" # args list for kstat

while [ $# -gt 1 ]
do
    kstat -q -c net ::$1:   # test for valid interface
                            # name
    if [ $? != 0 ]; then
        echo $USAGE
        echo "  Interface $1 not found"
        exit 1
    fi
    INTERFACES="$INTERFACES ::$1:" # add to list
    shift
done

interval=$1

# check interval arg for int
if [ X`echo $interval | tr -d '[0-9]'` != X"" ]; then
        echo $USAGE
        exit 1
fi

kstat -p $INTERFACES $interval | nawk '
function process_stat(STATNAME, VALUE) {
    found = 0

    for(i=1;i<=5;i++) {
        if(STATNAME == FIELDS[i]) {
            found = 1
            break
        }
    }

    if ( found == 0 ) return

    kstat = sprintf("%s:%s", iface, STATNAME)

    if(kstat in b_kstats) {
        kstats[kstat] = VALUE - b_kstats[kstat]
    } else {
        b_kstats[kstat] = VALUE
        kstats[kstat] = VALUE
    }
}

function print_stats() {
    printf("%-10s",iface)
    for(i=1;i<=5;i++) {
        kstat = sprintf("%s:%s",iface,FIELDS[i])
        printf(FORMATS[i],kstats[kstat])
        printf(" ")
    }
    print " "
}

BEGIN {
    print "              input             output      "
    print "          packets    errs  packets    errs  colls"
    split("ipackets,ierrors,opackets,oerrors,collisions",
      FIELDS,",")
    split("%-10u %-5u %-10u %-5u %-6u",FORMATS," ")
}

NF == 1 {
    if(iface) {
        print_stats()
    }
    split($0,t,":")
    iface = t[3]
    next
}

{
    split($1,stat,":")
    process_stat(stat[4], $2)
}
'

Using Perl to Access kstats

The previous example illustrates how simple it is to extract the information you need from the kernel; however, it also shows how tedious it can be to format the output in a shell script. Fortunately, the Perl extension module that /usr/bin/kstat uses is documented so that you can write custom Perl programs. Because Perl is a “real programming language” and is ideally suited for text formatting, you can write solutions that are quite robust and comprehensive.

The Tied-Hash Interface to the kstat Facility

Access to kstats is made through a Perl XSUB extension module called Sun::Solaris::Kstat. To access Solaris kernel statistics in a Perl program, include the statement use Sun::Solaris::Kstat; to import the module.

The module contains two methods, new() and update(), correlating with the libkstat C functions kstat_open() and kstat_chain_update(). The module provides kstat data through a tree of hashes based on a three-part key, consisting of the module, instance, and name (ks_module, ks_instance, and ks_name are members of the C-language kstat struct). Following is a synopsis.

Sun::Solaris::Kstat->new();
Sun::Solaris::Kstat->update();
Sun::Solaris::Kstat->{module}{instance}{name}{statistic}

The lowest-level “statistic” member of the hierarchy is a tied hash implemented in the XSUB module and holds the following elements from struct kstat:

  • ks_crtime. Creation time, which is presented as the statistic crtime

  • ks_snaptime. Time of last data snapshot, which is presented as the statistic snaptime

  • ks_class. The kstat class, which is presented as the statistic class

  • ks_data. Kstat type-specific data decoded into individual statistics (the module produces one statistic per member of whatever structure is being decoded)

Because the module converts all kstat types, you need not worry about the different data structures for named and raw types. Most of the Solaris OS raw kstat entries are decoded by the module, giving you easy access to low-level data about things such as kernel memory allocation, swap, NFS performance, etc.

The update() Method

The update() method updates all the statistics you have accessed so far and adds a bit of functionality on top of the libkstat kstat_chain_update() function. If called in scalar context, it acts the same as kstat_chain_update(). It returns 0 if the kstat chain has not changed and 1 if it has. However, if update() is called in list context, it returns references to two arrays. The first array holds the keys of any kstats that have been added since the call to new() or the last call to update(); the second holds a list of entries that have been deleted. The entries in the arrays are strings of the form module:instance:name. This is useful for implementing programs that cache state information about devices, such as disks, that you can dynamically add or remove from a running system.

Once you access a kstat, it will always be read by subsequent calls to update(). To stop it from being reread, you can clear the appropriate hash. For example:

$kstat->{$module}{$instance}{$name} = ();
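To make the added and deleted lists that update() returns in list context more concrete, the following C sketch computes the same kind of difference between two snapshots of module:instance:name keys. It is only an illustration of the idea: demo_diff_keys() is a hypothetical helper, not part of libkstat or the Perl module, and it assumes both key lists are already sorted with strcmp() ordering and that the caller supplies output arrays that are large enough.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare two sorted lists of "module:instance:name" keys.
 * Keys found only in cur are stored in added[]; keys found only in
 * prev are stored in removed[]. The counts are returned through
 * nadded and nremoved.
 */
void
demo_diff_keys(const char *const *prev, size_t nprev,
    const char *const *cur, size_t ncur,
    const char **added, size_t *nadded,
    const char **removed, size_t *nremoved)
{
    size_t i = 0, j = 0;

    *nadded = 0;
    *nremoved = 0;

    while (i < nprev || j < ncur) {
        int c;

        if (i == nprev)
            c = 1;                        /* only cur has keys left */
        else if (j == ncur)
            c = -1;                       /* only prev has keys left */
        else
            c = strcmp(prev[i], cur[j]);

        if (c == 0) {                     /* present in both snapshots */
            i++;
            j++;
        } else if (c < 0) {               /* vanished since prev */
            removed[(*nremoved)++] = prev[i++];
        } else {                          /* appeared since prev */
            added[(*nadded)++] = cur[j++];
        }
    }
}
```

A monitoring program that caches per-disk state could use the two result lists to create state for new devices and discard state for devices that have gone away.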

64-Bit Values

At the time the kstat tied-hash interface was first released on the Solaris 8 OS, Perl 5 could not yet internally support 64-bit integers, so the kstat module approximates these values.

  • Timer. Values ks_crtime and ks_snaptime in struct kstat are of type hrtime_t, as are values of timer kstats and the wtime, wlentime, wlastupdate, rtime, rlentime, and rlastupdate fields of the kstat I/O statistics structures. This is a C-type definition used for the Solaris high-resolution timer, which is a 64-bit integer value. These fields are measured by the kstat facility in nanoseconds, meaning that a 32-bit value would represent approximately four seconds. The alternative is to store the values as floating-point numbers, which offer approximately 53 bits of precision on present hardware. You can store 64-bit intervals and timers as floating-point values expressed in seconds, meaning that this module rounds up time-related kstats to approximately microsecond resolution.

  • Counters. Because it is not useful to store these values as 32-bit values and because floating-point values offer 53 bits of precision, all 64-bit counters are also stored as floating-point values.
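The trade-off described in the two items above can be checked with a few lines of C. This is a minimal sketch, assuming IEEE 754 double-precision arithmetic; the function names are invented for the example. The first function converts a nanosecond count to floating-point seconds, and the second reports whether a 64-bit counter survives the conversion to a double unchanged, which holds for every value up to 2^53 but not for all larger values.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 64-bit nanosecond count (such as an hrtime_t value) to seconds. */
double
demo_ns_to_seconds(uint64_t ns)
{
    return ((double)ns / 1e9);
}

/*
 * Return 1 if a 64-bit counter is unchanged after a round trip through
 * a double, or 0 if rounding altered it. The argument is restricted to
 * values below 2^63 so that converting back to an integer is always
 * well defined.
 */
int
demo_survives_double(uint64_t v)
{
    double d;

    assert(v < (UINT64_C(1) << 63));
    d = (double)v;              /* rounds to 53 significant bits */
    return ((uint64_t)d == v);
}
```

For instance, demo_ns_to_seconds() applied to 2^32 nanoseconds yields a little under 4.3 seconds, the 32-bit limit mentioned above, while demo_survives_double() returns 1 for 2^53 and 0 for 2^53 + 1.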

Getting Started with Perl

As in our first example, the following example shows a Perl program that gives the same output as obtained by calling /usr/sbin/psrinfo without arguments.

#!/usr/bin/perl -w

# psrinfo.perl: emulate the Solaris psrinfo command
use strict;
use Sun::Solaris::Kstat;

my $kstat = Sun::Solaris::Kstat->new();

my $mh = $kstat->{cpu_info};

foreach my $cpu (keys(%$mh)) {
    my ($state, $when) = @{$kstat->{cpu_info}{$cpu}
       {"cpu_info".$cpu}}{qw(state state_begin)};
    my ($sec,$min,$hour,$mday,$mon,$year) =
       (localtime($when))[0..5];
    printf("%d\t%-8s  since %.2d/%.2d/%.2d %.2d:%.2d:%.2d\n",
        $cpu,$state,$mon + 1,$mday,$year - 100,$hour,$min,$sec);

}

This program produces the following output:

$ psrinfo.perl
0        on-line  since 07/09/01 08:29:00
1        on-line  since 07/09/01 08:29:07

The psrinfo command has a -v (verbose) option that prints much more detail about the processors in the system. The output looks like the following example:

$ psrinfo -v
Status of processor 0 as of: 08/17/01 16:52:44
  Processor has been on-line since 08/14/01 16:27:56.
  The sparcv9 processor operates at 400 MHz,
        and has a sparcv9 floating point processor.
Status of processor 1 as of: 08/17/01 16:52:44
  Processor has been on-line since 08/14/01 16:28:03.
  The sparcv9 processor operates at 400 MHz,
        and has a sparcv9 floating point processor.

All the information in the psrinfo command is accessible through the kstat interface. As an exercise, try modifying the simple psrinfo.perl example script to print the verbose information, as in this example.

netstatMulti Implemented in Perl

The Perl script in the following example has the same function as our previous example (in Section 11.2.2) that used the kstat and nawk commands. Note that we have to implement our own search methods to find the kstat entries that we want to work with. Although this script is not shorter than our first example, it is certainly easier to extend with new functionality. Without much work, you could create a generic search method, similar to how /usr/bin/kstat works, and import it into any Perl scripts that need to access Solaris kernel statistics.

#!/usr/bin/perl -w
# netstatMulti.perl: print out netstat-like stats for multiple interfaces
#  using the kstat tied hash facility

use strict;
use Sun::Solaris::Kstat;

my $USAGE = "usage: $0 <interface> ... interval";

######
# Main
######
sub interface_exists($);
sub get_kstats();
sub print_kstats();

# process args

my $argc = scalar(@ARGV);
my @interfaces = ();
my $fmt = "%-10s %-10u %-10u %-10u %-10u %-10u\n";

if ($argc < 2) {
  print "$USAGE\n";
  exit 1;
} elsif ( !($ARGV[-1] =~ /^\d+$/) ) {
  print "$USAGE\n";
  print "   interval must be an integer.\n";
  exit 1;
}

# get kstat chain a la kstat_open()
my $kstat = Sun::Solaris::Kstat->new();

# Check for interfaces
foreach my $interface (@ARGV[-($argc)..-2]) {
  my $iface;
  if(! ($iface = interface_exists($interface)) ){
    print "$USAGE\n";
    print "   interface $interface not found.\n";
    exit 1;
  }
  push @interfaces, $iface;
}

my $interval = $ARGV[-1];
# print header
print "           input                output     \n";
print "    packets    errs       packets    errs       colls\n";

# loop forever printing stats
while(1) {
  get_kstats();
  print_kstats();
  sleep($interval);
  $kstat->update();
}
#############
# Subroutines
#############

# search for the first kstat with given name
sub interface_exists($) {
  my ($name) = @_;
  my ($mod, $inst) = $name =~ /^(.+?)(\d+)$/;
  return(exists($kstat->{$mod}{$inst}{$name})
         ? { module => $mod, instance => $inst, name => $name }
         : undef);
}

# get kstats for given interface
sub get_kstats() {
  my (@statnames) = ('ipackets','ierrors','opackets',
    'oerrors','collisions');
  my ($m, $i, $n);
  foreach my $interface (@interfaces) {
    $m = $interface->{module};
    $i = $interface->{instance};
    $n = $interface->{name};
    foreach my $statname (@statnames) {
      my $stat = $kstat->{$m}{$i}{$n}{$statname};
      die "kstat not found: $m:$i:$n:$statname" unless defined $stat;
      my $begin_stat = "b_" . $statname; # name of first sample
      if(exists $interface->{$begin_stat}) {
        $interface->{$statname} = $stat -
          $interface->{$begin_stat};
      }else { # save first sample to calculate deltas
        $interface->{$statname} =  $stat;
        $interface->{$begin_stat} = $stat;
      }
    }
  }

}

# print out formatted information a la netstat
sub print_kstats() {
  foreach my $i (@interfaces) {
    printf($fmt,$i->{name},$i->{ipackets},$i->{ierrors},
      $i->{opackets},$i->{oerrors},$i->{collisions});
  }
}

In the subroutine interface_exists(), you cache the members of the key if an entry is found. This way, you need not do another search in get_kstats(). You could fairly easily modify the script to display all network interfaces on the system (rather than take command-line arguments) and use the update() method to discover if interfaces are added or removed from the system (with ifconfig, for example). This exercise is left up to you.

Snooping a Program’s kstat Use with DTrace

Using DTrace, it is possible to examine the kstat instances that a program uses. The following DTrace script shows how this could be done.

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("%-16s %-16s %-6s %s\n",
            "CMD", "CLASS", "TYPE", "MOD:INS:NAME");
}

fbt::read_kstat_data:entry
{
        self->uk = (kstat_t *)copyin((uintptr_t)arg1, sizeof (kstat_t));
        printf("%-16s %-16s %-6s %s:%d:%s\n", execname, self->uk->ks_class,
            self->uk->ks_type == 0 ? "raw"
            :  self->uk->ks_type == 1 ? "named"
            :  self->uk->ks_type == 2 ? "intr"
            :  self->uk->ks_type == 3 ? "io"
            :  self->uk->ks_type == 4 ? "timer" : "?",
            self->uk->ks_module, self->uk->ks_instance, self->uk->ks_name);
}

When we run the DTrace script above, it prints out the commands and their use of kstat.

# kstat_types.d
CMD              CLASS            TYPE   MOD:INS:NAME
vmstat           misc             named  cpu_info:0:cpu_info0
vmstat           misc             named  cpu:0:vm
vmstat           misc             named  cpu:0:sys
vmstat           disk             io     cmdk:0:cmdk0
vmstat           disk             io     sd:0:sd0
vmstat           misc             raw    unix:0:sysinfo
vmstat           vm               raw    unix:0:vminfo
vmstat           misc             named  unix:0:dnlcstats
vmstat           misc             named  unix:0:system_misc

Adding Statistics to the Solaris Kernel

The kstat mechanism provides lightweight statistics that are a stable part of kernel code. The kstat interface can provide standard information that would be reported from a user-level tool. For example, if you wanted to add your own device driver I/O statistics into the statistics pool reported by the iostat command, you would add a kstat provider.

The statistics reported by vmstat, iostat, and most of the other Solaris tools are gathered by a central kernel statistics subsystem, known as “kstat.” The kstat facility is an all-purpose interface for collecting and reporting named and typed data.

A typical scenario will have a kstat producer and a kstat reader. The kstat reader is a utility in user mode that reads, potentially aggregates, and then reports the results. For example, the vmstat utility is a kstat reader that aggregates statistics provided by the vm system in the kernel.

Statistics are named and accessed by a four-tuple: class, module, name, instance. Solaris 8 introduced a new method to access kstat information from the command line or in custom-written scripts. You can use the command-line tool /usr/bin/kstat interactively to print all or selected kstat information from a system. This program is written in the Perl language, and you can use the Perl XS extension module to write your own custom Perl programs. Both facilities are documented in the pages of the Perl online manual.

A kstat Provider Walkthrough

To add your own statistics to the Solaris kernel, you need to create a kstat provider, which consists of an initialization function that creates the statistics group and a callback function that updates the statistics before they are read. The callback function is often used to aggregate or summarize information before it is reported to the reader. The kstat provider interface is defined in kstat(3KSTAT) and kstat(9S). More detailed information can be found in usr/src/uts/common/sys/kstat.h.

The first step is to decide on the type of information you want to export. The primary types are RAW, NAMED, and IO. The RAW interface exports raw C data structures to userland; its use is strongly discouraged, since a change in the C structure will cause incompatibilities in the reader. The NAMED and IO mechanisms are preferred, since both use typed data and the NAMED type is extensible.

The NAMED type provides single or multiple records of data and is the most common choice. The IO record provides I/O statistics only. It is collected and reported by the iostat command and therefore should be used only for items that can be viewed and reported as I/O devices (we do this currently for I/O devices and NFS file systems).

A simple example of NAMED statistics is the virtual memory summaries provided by system_pages.

$ kstat -n system_pages
module: unix                            instance: 0
name:   system_pages                    class:    pages
        availrmem                       343567
        crtime                          0
        desfree                         4001
        desscan                         25
        econtig                         4278190080
        fastscan                        256068
        freemem                         248309
        kernelbase                      3556769792
        lotsfree                        8002
        minfree                         2000
        nalloc                          11957763
        nalloc_calls                    9981
        nfree                           11856636
        nfree_calls                     6689
        nscan                           0
        pagesfree                       248309
        pageslocked                     168569
        pagestotal                      512136
        physmem                         522272
        pp_kernel                       64102
        slowscan                        100
        snaptime                        6573953.83957897

These are first declared and initialized by the following C structs in usr/src/uts/common/os/kstat_fr.c.

struct {

        kstat_named_t physmem;
        kstat_named_t nalloc;
        kstat_named_t nfree;
        kstat_named_t nalloc_calls;
        kstat_named_t nfree_calls;
        kstat_named_t kernelbase;
        kstat_named_t econtig;
        kstat_named_t freemem;
        kstat_named_t availrmem;
        kstat_named_t lotsfree;
        kstat_named_t desfree;
        kstat_named_t minfree;
        kstat_named_t fastscan;
        kstat_named_t slowscan;
        kstat_named_t nscan;
        kstat_named_t desscan;
        kstat_named_t pp_kernel;
        kstat_named_t pagesfree;
        kstat_named_t pageslocked;
        kstat_named_t pagestotal;
} system_pages_kstat = {

        { "physmem",             KSTAT_DATA_ULONG },
        { "nalloc",              KSTAT_DATA_ULONG },
        { "nfree",               KSTAT_DATA_ULONG },
        { "nalloc_calls",        KSTAT_DATA_ULONG },
        { "nfree_calls",         KSTAT_DATA_ULONG },
        { "kernelbase",          KSTAT_DATA_ULONG },
        { "econtig",             KSTAT_DATA_ULONG },
        { "freemem",             KSTAT_DATA_ULONG },
        { "availrmem",           KSTAT_DATA_ULONG },
        { "lotsfree",            KSTAT_DATA_ULONG },
        { "desfree",             KSTAT_DATA_ULONG },
        { "minfree",             KSTAT_DATA_ULONG },
        { "fastscan",            KSTAT_DATA_ULONG },
        { "slowscan",            KSTAT_DATA_ULONG },
        { "nscan",               KSTAT_DATA_ULONG },
        { "desscan",             KSTAT_DATA_ULONG },
        { "pp_kernel",           KSTAT_DATA_ULONG },
        { "pagesfree",           KSTAT_DATA_ULONG },
        { "pageslocked",         KSTAT_DATA_ULONG },
        { "pagestotal",          KSTAT_DATA_ULONG },
};

These statistics are the simplest type, merely a basic list of unsigned long variables (64 bits wide on a 64-bit kernel). Once declared, the kstats are registered with the subsystem.

static int system_pages_kstat_update(kstat_t *, int);

...

        kstat_t *ksp;

        ksp = kstat_create("unix", 0, "system_pages", "pages", KSTAT_TYPE_NAMED,
                sizeof (system_pages_kstat) / sizeof (kstat_named_t),
                KSTAT_FLAG_VIRTUAL);
        if (ksp) {
                ksp->ks_data = (void *) &system_pages_kstat;
                ksp->ks_update = system_pages_kstat_update;
                kstat_install(ksp);
        }

...

The kstat_create() function takes the 4-tuple description, the kstat type, and the number of data records in the kstat, and returns a handle to the created kstat. The handle is then updated to include a pointer to the data and a callback function, which is invoked when the user reads the statistics.

When invoked, the callback function has the task of updating the data structure pointed to by ks_data. If you choose not to update, simply set the callback function to default_kstat_update(). The system pages kstat preamble looks like this:

static int
system_pages_kstat_update(kstat_t *ksp, int rw)
{

        if (rw == KSTAT_WRITE) {
                return (EACCES);
        }

This basic preamble checks to see if the user code is trying to read or write the structure. (Yes, it’s possible to write to some statistics if the provider allows it.) Once basic checks are done, the update callback simply stores the statistics into the predefined data structure, and then returns.

...
        system_pages_kstat.freemem.value.ul     = (ulong_t)freemem;
        system_pages_kstat.availrmem.value.ul   = (ulong_t)availrmem;
        system_pages_kstat.lotsfree.value.ul    = (ulong_t)lotsfree;
        system_pages_kstat.desfree.value.ul     = (ulong_t)desfree;
        system_pages_kstat.minfree.value.ul     = (ulong_t)minfree;
        system_pages_kstat.fastscan.value.ul    = (ulong_t)fastscan;
        system_pages_kstat.slowscan.value.ul    = (ulong_t)slowscan;
        system_pages_kstat.nscan.value.ul       = (ulong_t)nscan;
        system_pages_kstat.desscan.value.ul     = (ulong_t)desscan;
        system_pages_kstat.pagesfree.value.ul   = (ulong_t)freemem;
...

        return (0);
}

That’s it for a basic named kstat.
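The declare, register, and update steps above can be tried outside the kernel with stand-in types. The following user-space sketch is only an analogy: demo_named_t, demo_pages, demo_pages_update(), demo_lookup(), and the DEMO_ constants are hypothetical names, and nothing here calls the real kstat interfaces. It keeps the same shape as the kernel code, with one named record per statistic, a record count derived with sizeof, and an update callback that rejects writes with EACCES before refreshing the values.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's read/write flags and record type. */
enum { DEMO_READ = 0, DEMO_WRITE = 1 };

typedef struct {
    const char    *name;    /* statistic name */
    unsigned long  value;   /* most recently published value */
} demo_named_t;

/* One member per statistic, each initialized with its name. */
static struct {
    demo_named_t freemem;
    demo_named_t lotsfree;
} demo_pages = {
    { "freemem",  0 },
    { "lotsfree", 0 },
};

/* Record count, derived the same way as in the kstat_create() call above. */
#define DEMO_NDATA (sizeof (demo_pages) / sizeof (demo_named_t))

/* Stand-ins for the live variables that the callback publishes. */
static unsigned long demo_freemem = 248309;
static unsigned long demo_lotsfree = 8002;

/* Update callback: refuse writes, otherwise refresh the published values. */
int
demo_pages_update(int rw)
{
    if (rw == DEMO_WRITE)
        return (EACCES);

    demo_pages.freemem.value = demo_freemem;
    demo_pages.lotsfree.value = demo_lotsfree;
    return (0);
}

/* Look up a record by name, treating the struct as an array of records. */
const demo_named_t *
demo_lookup(const char *name)
{
    const demo_named_t *recs = (const demo_named_t *)&demo_pages;
    size_t i;

    for (i = 0; i < DEMO_NDATA; i++) {
        if (strcmp(recs[i].name, name) == 0)
            return (&recs[i]);
    }
    return (NULL);
}
```

Until demo_pages_update() has been called with DEMO_READ, every value is still zero, which mirrors why the real callback runs before the statistics are handed to the reader.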

I/O Statistics

In this section, we can see an example of how I/O stats are measured and recorded. As discussed in Section 11.1.3.5, there is a special type of kstat for I/O statistics.

I/O devices are measured as a queue, using a Riemann sum: a count of the visits to the queue and a sum of the “active” time. These two metrics can be used to determine the average service time and I/O counts for the device. There are typically two queues for each device, the wait queue and the active queue. The wait queue covers the time a request spends after it has been accepted and enqueued but before it is issued to the device; the active queue covers the time the request spends active on the device.
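The queue accounting described above can be sketched for a single queue as follows. This is a simplified illustration, not the kernel's implementation: demo_queue_t and its functions are hypothetical, timestamps are passed in explicitly rather than read from a clock, and the field names only loosely mirror the time and length-time fields mentioned earlier in the chapter. Each state change first adds the elapsed time, weighted by the current queue length, to a running sum, which is the Riemann sum of queue length over time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue accounting record (not the real kstat I/O structure). */
typedef struct {
    uint64_t last_update;   /* time of the last state change, ns */
    uint64_t busy_time;     /* cumulative ns with the queue non-empty */
    uint64_t len_time;      /* cumulative sum of (queue length * ns) */
    uint64_t completed;     /* requests that have left the queue */
    uint32_t count;         /* requests currently in the queue */
} demo_queue_t;

/* Account for the interval since the last state change. */
static void
demo_queue_advance(demo_queue_t *q, uint64_t now)
{
    uint64_t delta;

    assert(now >= q->last_update);
    delta = now - q->last_update;
    q->last_update = now;
    if (q->count > 0) {
        q->busy_time += delta;
        q->len_time += delta * q->count;
    }
}

/* A request joins the queue at time now. */
void
demo_queue_enter(demo_queue_t *q, uint64_t now)
{
    demo_queue_advance(q, now);
    q->count++;
}

/* A request leaves the queue at time now. */
void
demo_queue_exit(demo_queue_t *q, uint64_t now)
{
    demo_queue_advance(q, now);
    assert(q->count > 0);
    q->count--;
    q->completed++;
}

/* Average queue length over an observation period of elapsed ns. */
double
demo_queue_avg_length(const demo_queue_t *q, uint64_t elapsed)
{
    return ((double)q->len_time / (double)elapsed);
}

/* Average time each completed request spent in the queue, in ns. */
double
demo_queue_avg_residence(const demo_queue_t *q)
{
    return ((double)q->len_time / (double)q->completed);
}
```

As a worked case, suppose two requests enter at times 0 and 10 and leave at times 30 and 40. The length-time sum is 10 + 2 * 20 + 10 = 60, so the average queue length over the 40 time units is 1.5 and the average time per request is 30.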

An I/O device driver has a similar declare and create section, as we saw with the NAMED statistics. For instance, the floppy disk device driver (usr/src/uts/sun/io/fd.c) shows kstat_create() in the device driver attach function.

static int
fd_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
...
        fdc->c_un->un_iostat = kstat_create("fd", 0, "fd0", "disk",
            KSTAT_TYPE_IO, 1, KSTAT_FLAG_PERSISTENT);
        if (fdc->c_un->un_iostat) {
                fdc->c_un->un_iostat->ks_lock = &fdc->c_lolock;
                kstat_install(fdc->c_un->un_iostat);
        }
...
}

The per-I/O statistics are first updated in the device driver strategy function, the location where the I/O is first received and queued. At this point, the I/O is marked as waiting on the wait queue.

#define KIOSP    KSTAT_IO_PTR(un->un_iostat)

static int
fd_strategy(register struct buf *bp)
{
        struct fdctlr *fdc;
        struct fdunit *un;

        fdc = fd_getctlr(bp->b_edev);
        un = fdc->c_un;
...
        /* Mark I/O as waiting on wait q */
        if (un->un_iostat) {
                kstat_waitq_enter(KIOSP);
        }

...
}

The I/O spends some time on the wait queue until the device is able to process the request. For each I/O the fdstart() routine moves the I/O from the wait queue to the run queue with the kstat_waitq_to_runq() function.

static void
fdstart(struct fdctlr *fdc)
{

...
                /* Mark I/O as active, move from wait to active q */
                if (un->un_iostat) {
                        kstat_waitq_to_runq(KIOSP);
                }
...

                /* Do I/O... */
...

When the I/O is complete (still in the fdstart() function), it is marked with kstat_runq_exit() as leaving the active queue. This updates the last part of the statistic, leaving us with the number of I/Os and the total time spent on each queue.

                /* Mark I/O as complete */
                if (un->un_iostat) {
                        if (bp->b_flags & B_READ) {
                                KIOSP->reads++;
                                KIOSP->nread +=
                                        (bp->b_bcount - bp->b_resid);
                        } else {
                                KIOSP->writes++;
                                KIOSP->nwritten += (bp->b_bcount - bp->b_resid);
                        }
                        kstat_runq_exit(KIOSP);
                }
                biodone(bp);

...

}

These statistics provide us with our familiar metrics, where actv is the average length of the queue of active I/Os and asvc_t is the average service time in the device. The wait queue is represented accordingly with wait and wsvc_t.

$ iostat -xn 10
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2    0.1    9.2    1.1  0.1  0.5    0.1   10.4   1   1 fd0
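Columns such as these can be derived from the difference between two snapshots of the accumulated counters. The sketch below shows one plausible way to do the arithmetic for the read and active-queue columns; it is an assumption-laden illustration, not the iostat source. The types demo_io_snap_t and demo_io_rates_t and the function demo_io_rates() are hypothetical, and the snapshot fields are simplified stand-ins named after the counters discussed in this chapter. Writes and the wait queue would be handled the same way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of accumulated I/O counters at one point in time. */
typedef struct {
    uint64_t snaptime;  /* time of the snapshot, ns */
    uint64_t reads;     /* completed read operations */
    uint64_t nread;     /* bytes read */
    uint64_t rtime;     /* cumulative ns with the active queue non-empty */
    uint64_t rlentime;  /* cumulative (active queue length * ns) */
} demo_io_snap_t;

/* Derived per-interval figures, loosely matching iostat column names. */
typedef struct {
    double reads_per_sec;
    double kb_read_per_sec;
    double actv;        /* average number of active requests */
    double asvc_ms;     /* average active service time, ms */
    double pct_busy;    /* percentage of time the device was busy */
} demo_io_rates_t;

demo_io_rates_t
demo_io_rates(const demo_io_snap_t *prev, const demo_io_snap_t *cur)
{
    demo_io_rates_t r = { 0.0, 0.0, 0.0, 0.0, 0.0 };
    double dt_ns, dt_sec, ops, dlen;

    assert(cur->snaptime > prev->snaptime);
    dt_ns = (double)(cur->snaptime - prev->snaptime);
    dt_sec = dt_ns / 1e9;
    ops = (double)(cur->reads - prev->reads);
    dlen = (double)(cur->rlentime - prev->rlentime);

    r.reads_per_sec = ops / dt_sec;
    r.kb_read_per_sec = (double)(cur->nread - prev->nread) / 1024.0 / dt_sec;
    r.actv = dlen / dt_ns;                           /* length-time / time */
    r.asvc_ms = (ops > 0.0) ? dlen / ops / 1e6 : 0.0; /* ns per op, in ms */
    r.pct_busy = 100.0 * (double)(cur->rtime - prev->rtime) / dt_ns;
    return (r);
}
```

Working from differences rather than absolute values is what lets a reader report activity for just the most recent interval, even though the kernel counters only ever accumulate.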

Additional Information

Much of the information in this chapter derives from various SunSolve InfoDocs, Solaris white papers, and Solaris man pages (section 3KSTAT). For detailed information on the APIs, refer to the Solaris 8 Reference Manual Collection and Writing Device Drivers. Both publications are available at docs.sun.com.
