With contributions from Peter Boothby
The Solaris kernel provides a set of functions and data structures for device drivers and other kernel modules to export module-specific statistics to the outside world. This infrastructure, referred to as kstat, provides the following to the Solaris software developer:
C-language functions for device drivers and other kernel modules to present statistics
C-language functions for applications to retrieve statistics data from Solaris without needing to directly read kernel memory
Perl-based command-line program /usr/bin/kstat to access statistics data interactively or in shell scripts (introduced in Solaris 8)
Perl library interface for constructing custom performance-monitoring utilities
The Solaris libkstat library contains the C-language functions for accessing kstats from an application. These functions use the pseudo-device /dev/kstat to provide a secure interface to kernel data, obviating the need for programs that are setuid to root.
Since many developers are interested in accessing kernel statistics through C programs, this chapter focuses on libkstat. The chapter explains the data structures and functions, and provides example code to get you started using the library.
Solaris kernel statistics are maintained in a linked list of structures referred to as the kstat chain. Each kstat has a common header section and a type-specific data section, as shown in Figure 11.1.
The chain is initialized at system boot time, but since Solaris is a dynamic operating system, this chain may change over time. Kstat entries can be added and removed from the system as needed by the kernel. For example, when you add an I/O board and all of its attached components to a running system by using Dynamic Reconfiguration, the device drivers and other kernel modules that interact with the new hardware will insert kstat entries into the chain.
The structure member ks_data is a pointer to the kstat’s data section. Multiple data types are supported: raw, named, timer, interrupt, and I/O. These are explained in Section 11.1.3.
The following listing shows the full kstat header structure.
typedef struct kstat {
    /*
     * Fields relevant to both kernel and user
     */
    hrtime_t      ks_crtime;               /* creation time */
    struct kstat  *ks_next;                /* kstat chain linkage */
    kid_t         ks_kid;                  /* unique kstat ID */
    char          ks_module[KSTAT_STRLEN]; /* module name */
    uchar_t       ks_resv;                 /* reserved */
    int           ks_instance;             /* module's instance */
    char          ks_name[KSTAT_STRLEN];   /* kstat name */
    uchar_t       ks_type;                 /* kstat data type */
    char          ks_class[KSTAT_STRLEN];  /* kstat class */
    uchar_t       ks_flags;                /* kstat flags */
    void          *ks_data;                /* kstat type-specific data */
    uint_t        ks_ndata;                /* # of data records */
    size_t        ks_data_size;            /* size of kstat data section */
    hrtime_t      ks_snaptime;             /* time of last data snapshot */
    /*
     * Fields relevant to kernel only
     */
    int           (*ks_update)(struct kstat *, int);
    void          *ks_private;
    int           (*ks_snapshot)(struct kstat *, void *, int);
    void          *ks_lock;
} kstat_t;
The significant members are described below.
ks_crtime. This member reflects the time the kstat was created. Using the value, you can compute the rates of various counters since the kstat was created (“rate since boot” is replaced by the more general concept of “rate since kstat creation”).
All times associated with kstats, such as creation time, last snapshot time, kstat_timer_t and kstat_io_t timestamps, and the like, are 64-bit nanosecond values.
The accuracy of kstat timestamps is machine-dependent, but the precision (units) is the same across all platforms. Refer to the gethrtime(3C) man page for general information about high-resolution timestamps.
ks_next. kstats are stored as a NULL-terminated linked list, or chain. ks_next points to the next kstat in the chain.
ks_kid. This member is a unique identifier for the kstat.
ks_module and ks_instance. These members contain the name and instance of the module that created the kstat. In cases where there can only be one instance, ks_instance is 0. Refer to Section 11.1.4 for more information.
ks_name. This member gives a meaningful name to a kstat. For additional kstat namespace information, see Section 11.1.4.
ks_type. This member identifies the type of data in this kstat. Kstat data types are covered in Section 11.1.3.
ks_class. Each kstat can be characterized as belonging to some broad class of statistics, such as bus, disk, net, vm, or misc. This field can be used as a filter to extract related kstats.
The following values are currently in use by Solaris:
ks_data, ks_ndata, and ks_data_size. ks_data is a pointer to the kstat’s data section. The type of data stored there depends on ks_type. ks_ndata indicates the number of data records. Only some kstat types support multiple data records. The following kstat types support multiple data records:
KSTAT_TYPE_RAW
KSTAT_TYPE_NAMED
KSTAT_TYPE_TIMER
The following kstat types support only one data record:
KSTAT_TYPE_INTR
KSTAT_TYPE_IO
ks_data_size is the total size of the data section, in bytes.
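ks_snaptime. Timestamp for the last data snapshot. With it, you can compute activity rates based on the following computational method:
$$\mathrm{rate} = \frac{\mathit{new\_count} - \mathit{old\_count}}{\mathit{new\_snaptime} - \mathit{old\_snaptime}}$$
Because snapshot times are in nanoseconds, this quotient is a rate per nanosecond; multiply it by $10^9$ to express it per second.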
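A minimal C sketch of this calculation follows. It assumes two samples of the same counter taken from successive kstat_read() calls, each paired with that call's ks_snaptime. The function name kstat_rate_per_sec and the my_hrtime_t typedef (a local stand-in for hrtime_t so the sketch compiles without Solaris headers) are illustrative, not part of libkstat.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for hrtime_t: a signed 64-bit nanosecond count. */
typedef int64_t my_hrtime_t;

#define NS_PER_SEC 1000000000.0  /* nanoseconds per second */

/*
 * Rate of a counter between two snapshots, in events per second.
 * new_snaptime must be later than old_snaptime.
 */
double kstat_rate_per_sec(uint64_t old_count, uint64_t new_count,
                          my_hrtime_t old_snaptime, my_hrtime_t new_snaptime)
{
    assert(new_snaptime > old_snaptime);
    double delta_count = (double)(new_count - old_count);
    double delta_ns = (double)(new_snaptime - old_snaptime);
    /* events per nanosecond, scaled to events per second */
    return delta_count * NS_PER_SEC / delta_ns;
}
```

For instance, a counter that rises from 1000 to 1500 between snapshots taken 2,000,000,000 ns apart gives kstat_rate_per_sec(1000, 1500, 0, 2000000000) == 250.0 events per second.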
To use kstats, a program must first call kstat_open(), which returns a pointer to a kstat control structure. The following listing shows the structure members.
typedef struct kstat_ctl {
    kid_t    kc_chain_id;  /* current kstat chain ID */
    kstat_t  *kc_chain;    /* pointer to kstat chain */
    int      kc_kd;        /* /dev/kstat descriptor */
} kstat_ctl_t;
kc_chain points to the head of your copy of the kstat chain. You typically walk the chain or use kstat_lookup() to find and process a particular kind of kstat. kc_chain_id is the kstat chain identifier, or KCID, of your copy of the kstat chain. Its use is explained in Section 11.1.4.
To avoid unnecessary overhead in accessing kstat data, a program first searches the kstat chain for the type of information of interest, then uses the kstat_read() and kstat_data_lookup() functions to get the statistics data from the kernel.
The following code fragment shows how you might print all kstat entries with information about disk I/O. It traverses the entire chain looking for kstats of ks_type KSTAT_TYPE_IO, calls kstat_read() to retrieve the data, and then processes the data with my_io_display(). An implementation of this sample function appears in the print_some_kstats.c program later in this chapter.
kstat_ctl_t *kc;
kstat_t *ksp;
kstat_io_t kio;
if ((kc = kstat_open()) == NULL) {
    perror("kstat_open");
    exit(1);
}
for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
    if (ksp->ks_type == KSTAT_TYPE_IO) {
        /* copy this kstat's I/O counters into kio */
        if (kstat_read(kc, ksp, &kio) != -1)
            my_io_display(ksp->ks_name, ksp->ks_class, kio);
    }
}
The data section of a kstat can hold one of five types, identified in the ks_type field. The following kstat types can hold multiple records; the number of records is held in ks_ndata.
KSTAT_TYPE_RAW
KSTAT_TYPE_NAMED
KSTAT_TYPE_TIMER
The other two types, KSTAT_TYPE_INTR and KSTAT_TYPE_IO, hold a single record. The field ks_data_size holds the size, in bytes, of the entire data section.
The “raw” kstat type is treated as an array of bytes and is generally used to export well-known structures, such as vminfo (defined in /usr/include/sys/sysinfo.h). The following example shows one method of printing this information. It assumes that kstat_read() has already been called for kp, so that ks_data holds a current snapshot.
static void
print_vminfo(kstat_t *kp)
{
    vminfo_t *vminfop;

    vminfop = (vminfo_t *)(kp->ks_data);
    /* cast each counter so the format matches whatever its width */
    printf("Free memory: %llu\n", (unsigned long long)vminfop->freemem);
    printf("Swap reserved: %llu\n", (unsigned long long)vminfop->swap_resv);
    printf("Swap allocated: %llu\n", (unsigned long long)vminfop->swap_alloc);
    printf("Swap available: %llu\n", (unsigned long long)vminfop->swap_avail);
    printf("Swap free: %llu\n", (unsigned long long)vminfop->swap_free);
}
This type of kstat contains a list of arbitrary name=value statistics. The following example shows the data structure used to hold named kstats.
typedef struct kstat_named {
char name[KSTAT_STRLEN]; /* name of counter */
uchar_t data_type; /* data type */
union {
char c[16]; /* enough for 128-bit ints */
int32_t i32;
uint32_t ui32;
struct {
union {
char *ptr; /* NULL-term string */
#if defined(_KERNEL) && defined(_MULTI_DATAMODEL)
caddr32_t ptr32;
#endif
char __pad[8]; /* 64-bit padding */
} addr;
uint32_t len; /* # bytes for strlen + '\0' */
} str;
#if defined(_INT64_TYPE)
int64_t i64;
uint64_t ui64;
#endif
long l;
ulong_t ul;
/* These structure members are obsolete */
longlong_t ll;
u_longlong_t ull;
float f;
double d;
} value; /* value of counter */
} kstat_named_t;
#define KSTAT_DATA_CHAR 0
#define KSTAT_DATA_INT32 1
#define KSTAT_DATA_UINT32 2
#define KSTAT_DATA_INT64 3
#define KSTAT_DATA_UINT64 4
#if !defined(_LP64)
#define KSTAT_DATA_LONG KSTAT_DATA_INT32
#define KSTAT_DATA_ULONG KSTAT_DATA_UINT32
#else
#if !defined(_KERNEL)
#define KSTAT_DATA_LONG KSTAT_DATA_INT64
#define KSTAT_DATA_ULONG KSTAT_DATA_UINT64
#else
#define KSTAT_DATA_LONG 7 /* only visible to the kernel */
#define KSTAT_DATA_ULONG 8 /* only visible to the kernel */
#endif /* !_KERNEL */
#endif /* !_LP64 */
See sys/kstat.h
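Because a named statistic's value sits in a union selected by data_type, a consumer usually dispatches on the type before doing arithmetic. The sketch below shows one way to widen the integer types to a double. my_named_t is a hypothetical stand-in that mirrors only data_type and the integer members of the value union, so the code compiles without <kstat.h>; the type codes are the ones listed above. Note that 64-bit values larger than 2^53 lose precision in a double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes as defined in sys/kstat.h (see above). */
#define KSTAT_DATA_CHAR   0
#define KSTAT_DATA_INT32  1
#define KSTAT_DATA_UINT32 2
#define KSTAT_DATA_INT64  3
#define KSTAT_DATA_UINT64 4

/* Hypothetical stand-in for the parts of kstat_named_t used here. */
typedef struct {
    unsigned char data_type;
    union {
        char     c[16];
        int32_t  i32;
        uint32_t ui32;
        int64_t  i64;
        uint64_t ui64;
    } value;
} my_named_t;

/* Widen an integer-valued named statistic to double; *ok reports success. */
double named_value_as_double(const my_named_t *knp, int *ok)
{
    assert(knp != NULL && ok != NULL);
    switch (knp->data_type) {
    case KSTAT_DATA_INT32:  *ok = 1; return (double)knp->value.i32;
    case KSTAT_DATA_UINT32: *ok = 1; return (double)knp->value.ui32;
    case KSTAT_DATA_INT64:  *ok = 1; return (double)knp->value.i64;
    case KSTAT_DATA_UINT64: *ok = 1; return (double)knp->value.ui64;
    }
    *ok = 0;  /* KSTAT_DATA_CHAR or a type this sketch does not handle */
    return 0.0;
}
```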
The print_some_kstats.c example program later in this chapter uses a function, my_named_display(), to show how one might display named kstats.
Note that if the type is KSTAT_DATA_CHAR, the 16-byte value field is not guaranteed to be null-terminated. This is important to remember when you print the value with functions like printf().
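One way to respect this is to copy the field into a buffer that has room for a terminator. The helper below, named_char_copy (a hypothetical name), takes the 16-byte field as a plain char array so it compiles without <kstat.h>; passing knp->value.c from a real kstat_named_t would work the same way.

```c
#include <assert.h>
#include <string.h>

#define NAMED_CHAR_LEN 16  /* size of kstat_named_t's value.c[] */

/*
 * Copy a KSTAT_DATA_CHAR value into out, which must hold at least
 * NAMED_CHAR_LEN + 1 bytes, and always NUL-terminate the result.
 */
void named_char_copy(const char value_c[NAMED_CHAR_LEN],
                     char out[NAMED_CHAR_LEN + 1])
{
    assert(value_c != NULL && out != NULL);
    memcpy(out, value_c, NAMED_CHAR_LEN);
    out[NAMED_CHAR_LEN] = '\0';  /* terminator the kstat does not promise */
}
```

When you only need to print the value, bounding the conversion with a precision, as in printf("%.16s", knp->value.c), avoids the copy.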
This kstat holds event timer statistics. These provide basic counting and timing information for any type of event.
typedef struct kstat_timer {
char name[KSTAT_STRLEN]; /* event name */
uchar_t resv; /* reserved */
u_longlong_t num_events; /* number of events */
hrtime_t elapsed_time; /* cumulative elapsed time */
hrtime_t min_time; /* shortest event duration */
hrtime_t max_time; /* longest event duration */
hrtime_t start_time; /* previous event start time */
hrtime_t stop_time; /* previous event stop time */
} kstat_timer_t;
See sys/kstat.h
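Because elapsed_time accumulates over all events, dividing it by num_events gives the mean event duration. The sketch below uses my_timer_t, a hypothetical stand-in that mirrors the numeric fields of kstat_timer_t with int64_t in place of hrtime_t, so it compiles without Solaris headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the numeric fields of kstat_timer_t. */
typedef struct {
    uint64_t num_events;    /* number of events */
    int64_t  elapsed_time;  /* cumulative elapsed time, ns */
    int64_t  min_time;      /* shortest event duration, ns */
    int64_t  max_time;      /* longest event duration, ns */
} my_timer_t;

/* Mean event duration in nanoseconds; 0 if no events were recorded. */
double timer_mean_ns(const my_timer_t *t)
{
    assert(t != NULL);
    if (t->num_events == 0)
        return 0.0;
    return (double)t->elapsed_time / (double)t->num_events;
}
```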
This type of kstat holds interrupt statistics. Interrupts are categorized as listed in Table 11.1 and as shown below the table.
Table 11.1. Types of Interrupt Kstats
Interrupt Type | Definition |
---|---|
Hard | Sourced from the hardware device itself |
Soft | Induced by the system by means of some system interrupt source |
Watchdog | Induced by a periodic timer call |
Spurious | An interrupt entry point was entered but there was no interrupt to service |
Multiple Service | An interrupt was detected and serviced just before returning from any of the other types |
#define KSTAT_INTR_HARD 0
#define KSTAT_INTR_SOFT 1
#define KSTAT_INTR_WATCHDOG 2
#define KSTAT_INTR_SPURIOUS 3
#define KSTAT_INTR_MULTSVC 4
#define KSTAT_NUM_INTRS 5
typedef struct kstat_intr {
uint_t intrs[KSTAT_NUM_INTRS]; /* interrupt counters */
} kstat_intr_t;
See sys/kstat.h
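Because intrs[] is indexed by the KSTAT_INTR_* constants, totals and per-category shares fall out of a simple loop. The sketch below copies the constants shown above and uses my_intr_t, a hypothetical stand-in for kstat_intr_t. The counters are cumulative, so for rates you would take the difference between two reads, as with other counters.

```c
#include <assert.h>
#include <stddef.h>

/* Constants as defined in sys/kstat.h (see above). */
#define KSTAT_INTR_HARD     0
#define KSTAT_INTR_SOFT     1
#define KSTAT_INTR_WATCHDOG 2
#define KSTAT_INTR_SPURIOUS 3
#define KSTAT_INTR_MULTSVC  4
#define KSTAT_NUM_INTRS     5

/* Hypothetical stand-in for kstat_intr_t. */
typedef struct {
    unsigned int intrs[KSTAT_NUM_INTRS];
} my_intr_t;

/* Sum of all interrupt categories. */
unsigned long long intr_total(const my_intr_t *ki)
{
    unsigned long long total = 0;
    assert(ki != NULL);
    for (int i = 0; i < KSTAT_NUM_INTRS; i++)
        total += ki->intrs[i];
    return total;
}

/* Fraction of interrupts that were spurious; 0 when there were none. */
double intr_spurious_fraction(const my_intr_t *ki)
{
    unsigned long long total = intr_total(ki);
    return total ? (double)ki->intrs[KSTAT_INTR_SPURIOUS] / (double)total : 0.0;
}
```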
This kstat counts I/O operations for statistical analysis.
typedef struct kstat_io {
/*
* Basic counters.
*/
u_longlong_t nread; /* number of bytes read */
u_longlong_t nwritten; /* number of bytes written */
uint_t reads; /* number of read operations */
uint_t writes; /* number of write operations */
/*
* Accumulated time and queue length statistics.
*/
hrtime_t wtime; /* cumulative wait (pre-service) time */
hrtime_t wlentime; /* cumulative wait length*time product*/
hrtime_t wlastupdate; /* last time wait queue changed */
hrtime_t rtime; /* cumulative run (service) time */
hrtime_t rlentime; /* cumulative run length*time product */
hrtime_t rlastupdate; /* last time run queue changed */
uint_t wcnt; /* count of elements in wait state */
uint_t rcnt; /* count of elements in run state */
} kstat_io_t;
See sys/kstat.h
Time statistics are kept as a running sum of “active” time. Queue length statistics are kept as a running sum of the product of queue length and elapsed time at that length. That is, a Riemann sum for queue length integrated against time. Figure 11.2 illustrates a sample graphical representation of queue vs. time.
At each change of state (either an entry to or an exit from the queue), the elapsed time since the previous state change is added to the active time (the wtime or rtime field) if the queue length was nonzero during that interval.
The product of the elapsed time and the queue length is added to the running sum of length multiplied by time (the wlentime or rlentime field).
Stated programmatically:
if (queue_length != 0) {
    time += elapsed_time_since_last_state_change;
    lentime += elapsed_time_since_last_state_change * queue_length;
}
You can generalize this method to measure residency in any defined system. Instead of queue lengths, think of “outstanding RPC calls to server X.”
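To make the bookkeeping concrete, here is a small C simulation of one queue that applies the rule above at every entry and exit. The names queue_stats and queue_change are hypothetical; the struct members play the roles of the wtime, wlentime, wlastupdate, and wcnt fields (or their r* counterparts), with timestamps in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of one queue's fields in kstat_io_t. */
typedef struct {
    int64_t  time;        /* cumulative time queue was non-empty (like wtime) */
    int64_t  lentime;     /* cumulative length*time product (like wlentime) */
    int64_t  lastupdate;  /* time of last state change (like wlastupdate) */
    uint32_t cnt;         /* current queue length (like wcnt) */
} queue_stats;

/*
 * Account for the interval since the last state change, then adjust
 * the queue length by delta (+1 for an entry, -1 for an exit).
 */
void queue_change(queue_stats *q, int64_t now, int delta)
{
    assert(now >= q->lastupdate);
    int64_t elapsed = now - q->lastupdate;
    if (q->cnt != 0) {
        q->time += elapsed;
        q->lentime += elapsed * (int64_t)q->cnt;
    }
    q->lastupdate = now;
    assert(delta >= 0 || q->cnt > 0);  /* cannot exit an empty queue */
    q->cnt = (uint32_t)((int64_t)q->cnt + delta);
}
```

With entries at t = 0 and t = 10 and exits at t = 20 and t = 40, the queue ends with time = 40 and lentime = 50, so while it was non-empty it held 50 / 40 = 1.25 entries on average.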
A large number of I/O subsystems have at least two basic lists of transactions they manage:
A list for transactions that have been accepted for processing but for which processing has yet to begin
A list for transactions that are actively being processed but that are not complete
For these reasons, two cumulative time statistics are defined:
Pre-service (wait) time
Service (run) time
The units of cumulative busy time are accumulated nanoseconds.
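Dividing the change in these cumulative sums by the elapsed snapshot interval turns them into averages: the change in wlentime over the interval is the mean wait-queue length, the change in rlentime is the mean number of requests in service, and the changes in wtime and rtime give the fraction of time something was waiting or in service. These correspond to what iostat -x reports as wait, actv, %w, and %b. The sketch below computes them from two snapshots; io_sample, io_metrics, and io_compute are hypothetical names, and only the needed kstat_io_t fields plus ks_snaptime are mirrored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of kstat_io_t plus the kstat's ks_snaptime (all ns). */
typedef struct {
    int64_t snaptime;           /* ks_snaptime */
    int64_t wtime, wlentime;    /* wait (pre-service) sums */
    int64_t rtime, rlentime;    /* run (service) sums */
} io_sample;

typedef struct {
    double avg_wait_len;  /* mean number of requests waiting */
    double avg_run_len;   /* mean number of requests in service */
    double pct_wait;      /* % of time at least one request was waiting */
    double pct_busy;      /* % of time at least one request was in service */
} io_metrics;

io_metrics io_compute(const io_sample *old, const io_sample *cur)
{
    io_metrics m;
    double dt = (double)(cur->snaptime - old->snaptime);
    assert(dt > 0);
    m.avg_wait_len = (double)(cur->wlentime - old->wlentime) / dt;
    m.avg_run_len = (double)(cur->rlentime - old->rlentime) / dt;
    m.pct_wait = 100.0 * (double)(cur->wtime - old->wtime) / dt;
    m.pct_busy = 100.0 * (double)(cur->rtime - old->rtime) / dt;
    return m;
}
```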
The kstat namespace is defined by three fields from the kstat structure:
ks_module
ks_instance
ks_name
The combination of these three fields is guaranteed to be unique.
For example, imagine a system with four FastEthernet interfaces. The device driver module for Sun’s FastEthernet controller is called “hme”. The first Ethernet interface would be instance 0, the second instance 1, and so on. The “hme” driver provides two types of kstat for each interface. The first contains named kstats with performance statistics. The second contains interrupt statistics.
The kstat data for the first interface’s network statistics is found under ks_module == “hme”, ks_instance == 0, and ks_name == “hme0”. The interrupt statistics are contained in a kstat identified by ks_module == “hme”, ks_instance == 0, and ks_name == “hmec0”.
In that example, combining the module name and instance number to form the ks_name field (“hme0” and “hmec0”) is simply a convention of this driver. Other drivers may use similar naming conventions to publish multiple kstat data types but are not required to do so; the module is required only to make sure that the combination is unique.
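Because only the three-part combination is guaranteed unique, tools commonly identify a kstat by joining the fields, as the kstat command's colon-separated module:instance:name:statistic syntax does later in this chapter. The sketch below builds such a key and checks whether a name follows the hme-style module-plus-instance convention. Both format_kstat_key and name_is_module_instance are hypothetical helpers; snprintf truncates rather than overflows if a field is too long.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a "module:instance:name" key. Returns the length the full key
 * needs (the snprintf result), which is >= size if it was truncated.
 */
int format_kstat_key(char *buf, size_t size, const char *module,
                     int instance, const char *name)
{
    assert(buf != NULL && module != NULL && name != NULL);
    return snprintf(buf, size, "%s:%d:%s", module, instance, name);
}

/*
 * Nonzero if name is exactly module followed by the instance number,
 * as with "hme0". Drivers are not required to follow this convention.
 */
int name_is_module_instance(const char *module, int instance, const char *name)
{
    char expected[64];
    int n = snprintf(expected, sizeof expected, "%s%d", module, instance);
    return n > 0 && (size_t)n < sizeof expected && strcmp(expected, name) == 0;
}
```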
How do you determine which kstats the kernel provides? One of the easiest ways, on Solaris 8 and later, is to run /usr/bin/kstat with no arguments. This command prints nearly all the current kstat data. The Solaris kstat command can dump most of the known kstats of type KSTAT_TYPE_RAW.
The following functions are available to C programs for accessing kstat data from user programs:
kstat_ctl_t *kstat_open(void);
Initializes a kstat control structure to provide access to the kernel statistics library. It returns a pointer to this structure, which must be supplied as the kc argument in subsequent libkstat function calls.
kstat_t *kstat_lookup(kstat_ctl_t *kc, char *ks_module, int ks_instance, char *ks_name);
Traverses the kstat chain searching for a kstat with the given ks_module, ks_instance, and ks_name fields. If ks_module is NULL, ks_instance is -1, or ks_name is NULL, then those fields are ignored in the search. For example, kstat_lookup(kc, NULL, -1, "foo") simply finds the first kstat with the name "foo".
void *kstat_data_lookup(kstat_t *ksp, char *name);
Searches the kstat's data section for the record with the specified name. This operation is valid only for kstat types that have named data records. Currently, only the KSTAT_TYPE_NAMED and KSTAT_TYPE_TIMER kstats have named data records. You must first call kstat_read() to get the data from the kernel; this routine then finds a particular record in the data section.
kid_t kstat_read(kstat_ctl_t *kc, kstat_t *ksp, void *buf);
Gets data from the kernel for a particular kstat. The data is stored in ksp->ks_data and, if buf is not NULL, also copied into buf.
kid_t kstat_write(kstat_ctl_t *kc, kstat_t *ksp, void *buf);
Writes data to a particular kstat in the kernel. Only the superuser can use kstat_write().
kid_t kstat_chain_update(kstat_ctl_t *kc);
Synchronizes the user's kstat header chain with that of the kernel.
int kstat_close(kstat_ctl_t *kc);
Frees all resources that were associated with the kstat control structure. This is done automatically on exit(2) and execve(). (For more information, see the exit(2) and exec(2) man pages.)
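The wildcard rules for kstat_lookup() can be illustrated with a portable sketch that walks a NULL-terminated list. my_kstat_t and my_lookup are hypothetical stand-ins that carry only the chain pointer and the three name fields; the real function searches the kstat_t chain held in kstat_ctl_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with only the fields kstat_lookup() matches on. */
typedef struct my_kstat {
    struct my_kstat *ks_next;     /* chain linkage */
    const char      *ks_module;
    int              ks_instance;
    const char      *ks_name;
} my_kstat_t;

/*
 * Return the first entry matching every non-wildcard criterion:
 * a NULL module or name, or an instance of -1, matches anything.
 */
my_kstat_t *my_lookup(my_kstat_t *chain, const char *module,
                      int instance, const char *name)
{
    for (my_kstat_t *ksp = chain; ksp != NULL; ksp = ksp->ks_next) {
        if (module != NULL && strcmp(ksp->ks_module, module) != 0)
            continue;
        if (instance != -1 && ksp->ks_instance != instance)
            continue;
        if (name != NULL && strcmp(ksp->ks_name, name) != 0)
            continue;
        return ksp;
    }
    return NULL;
}
```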
Recall that the kstat chain is dynamic in nature. The libkstat library function kstat_open() returns a copy of the kernel’s kstat chain. Since the content of the kernel’s chain may change, your program should call the kstat_chain_update() function at the appropriate times to see if its private copy of the chain is the same as the kernel’s. This is the purpose of the KCID (stored in kc_chain_id in the kstat control structure).
Each time a kernel module adds or removes a kstat from the system’s chain, the KCID is incremented. When your program calls kstat_chain_update(), the function checks whether the kc_chain_id in your program’s control structure matches the kernel’s. If not, kstat_chain_update() rebuilds your program’s local kstat chain. The function returns one of the following:
The new KCID if the chain has been updated
0 if no change has been made
-1 if some error was detected
If your program has cached some local data from previous calls to the kstat library, then a new KCID acts as a flag to indicate that the chain has changed and your cached information may be out of date. You can search the chain again to see whether data that your program is interested in has been added or removed.
A practical example is the system command iostat. It caches some internal data about the disks in the system and needs to recognize when a disk has been brought online or taken offline. If iostat is called with an interval argument, it prints I/O statistics every interval seconds. Each time through the loop, it calls kstat_chain_update() to see whether something has changed. If a change took place, it figures out whether a device of interest has been added or removed.
Your C source file must contain:
#include <kstat.h>
When your program is linked, the compiler command line must include the argument -lkstat.
$ cc -o print_some_kstats -lkstat print_some_kstats.c
The following is a short example program. First, it uses kstat_lookup() and kstat_read() to find the system’s CPU speed. Then it goes into an infinite loop to print a small amount of information about all kstats of type KSTAT_TYPE_IO. Note that at the top of the loop, it calls kstat_chain_update() to check that you have current data. If the kstat chain has changed, the program sends a short message on stderr.
/* print_some_kstats.c:
 * print out a couple of interesting things
 */
#include <kstat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <inttypes.h>

#define SLEEPTIME 10

void my_named_display(char *, char *, kstat_named_t *);
void my_io_display(char *, char *, kstat_io_t);

int
main(void)
{
    kstat_ctl_t *kc;
    kstat_t *ksp;
    kstat_io_t kio;
    kstat_named_t *knp;

    if ((kc = kstat_open()) == NULL) {
        perror("kstat_open");
        exit(1);
    }
    /*
     * Print out the CPU speed. We make two assumptions here:
     * 1) All CPUs are the same speed, so we'll just search for the
     *    first one;
     * 2) At least one CPU is online, so our search will always
     *    find something. :)
     */
    ksp = kstat_lookup(kc, "cpu_info", -1, NULL);
    if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1) {
        fprintf(stderr, "cannot read cpu_info kstat\n");
        exit(1);
    }
    /* lookup the CPU speed data record */
    knp = kstat_data_lookup(ksp, "clock_MHz");
    if (knp != NULL) {
        printf("CPU speed of system is ");
        my_named_display(ksp->ks_name, ksp->ks_class, knp);
        printf("\n");
    }
    /* dump some info about all I/O kstats every SLEEPTIME seconds */
    while (1) {
        /* make sure we have current data */
        if (kstat_chain_update(kc))
            fprintf(stderr, "<<State Changed>>\n");
        for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
            if (ksp->ks_type == KSTAT_TYPE_IO) {
                if (kstat_read(kc, ksp, &kio) != -1)
                    my_io_display(ksp->ks_name, ksp->ks_class, kio);
            }
        }
        sleep(SLEEPTIME);
    } /* while (1) */
}

void
my_io_display(char *devname, char *class, kstat_io_t k)
{
    printf("Name: %s Class: %s\n", devname, class);
    printf("\tnumber of bytes read %llu\n", k.nread);
    printf("\tnumber of bytes written %llu\n", k.nwritten);
    printf("\tnumber of read operations %u\n", k.reads);
    printf("\tnumber of write operations %u\n\n", k.writes);
}

void
my_named_display(char *devname, char *class, kstat_named_t *knp)
{
    switch (knp->data_type) {
    case KSTAT_DATA_CHAR:
        printf("%.16s", knp->value.c);  /* value.c may lack a NUL */
        break;
    case KSTAT_DATA_INT32:
        printf("%" PRId32, knp->value.i32);
        break;
    case KSTAT_DATA_UINT32:
        printf("%" PRIu32, knp->value.ui32);
        break;
    case KSTAT_DATA_INT64:
        printf("%" PRId64, knp->value.i64);
        break;
    case KSTAT_DATA_UINT64:
        printf("%" PRIu64, knp->value.ui64);
        break;
    }
}
In this section, we explain tools with which you can access kstat information from shell scripts. Included are a few examples to introduce the kstat(1M) program and the Perl language module it uses to extract kernel statistics.
The Solaris 8 OS introduced a new method to access kstat information from the command line or in custom-written scripts. You can use the command-line tool /usr/bin/kstat interactively to print all or selected kstat information from a system. This program is written in the Perl language, and you can use the Perl XS extension module it is built on to write your own custom Perl programs. Both facilities are documented in the online manual pages.
You can invoke the kstat command on the command line or within shell scripts to selectively extract kernel statistics. Like many other Solaris OS commands, kstat takes optional interval and count arguments for repetitive, periodic output. Its command options are quite flexible.
Selection criteria can be given in two equivalent forms: standard UNIX command-line options (such as -m module, -i instance, -n name, and -s statistic), or operands of the form module:instance:name:statistic, which pass the same criteria as colon-separated fields. Each of the module, instance, name, or statistic specifiers may be a shell glob pattern or a Perl regular expression enclosed by “/” characters. You can use both specifier types within a single operand. Leaving a specifier empty is equivalent to using the “*” glob pattern for that specifier. Running kstat with no arguments prints nearly all kstat entries from the running kernel (most, but not all, kstats of KSTAT_TYPE_RAW are decoded).
The tests specified by the options are logically ANDed, and all matching kstats are selected. The argument for the -c, -i, -m, -n, and -s options can be specified as a shell glob pattern or as a Perl regular expression enclosed in “/” characters.
If you pass a regular expression containing shell metacharacters to the command, you must protect it from the shell by enclosing it in the appropriate quotation marks. For example, to show all kstats that have a statistic name beginning with intr in the module cpu_stat, you could use the following command:
$ kstat -p -m cpu_stat -s 'intr*'
cpu_stat:0:cpu_stat0:intr 878951000
cpu_stat:0:cpu_stat0:intrblk 21604
cpu_stat:0:cpu_stat0:intrthread 668353070
cpu_stat:1:cpu_stat1:intr 211041358
cpu_stat:1:cpu_stat1:intrblk 280
cpu_stat:1:cpu_stat1:intrthread 209879640
The -p option used in the preceding example displays output in a parsable format. If you do not specify this option, kstat produces output in a human-readable, tabular format. In the following example, we leave out the -p flag and use the module:instance:name:statistic argument form and a Perl regular expression.
$ kstat cpu_stat:::/^intr/
module: cpu_stat instance: 0
name: cpu_stat0 class: misc
intr 879131909
intrblk 21608
intrthread 668490486
module: cpu_stat instance: 1
name: cpu_stat1 class: misc
intr 211084960
intrblk 280
intrthread 209923001
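The parsable -p format, one statistic per line, is also easy to consume from C, for example when a program reads kstat -p output through a pipe. The sketch below splits one line into its four colon-separated fields and the value. parse_kstat_line and struct kstat_line are hypothetical names; the sketch assumes the key and value are separated by a space or tab, that the module and name contain no colons, and it keeps only the first word of the value.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD_LEN 64

/* Hypothetical holder for one "module:instance:name:statistic value" line. */
struct kstat_line {
    char module[FIELD_LEN];
    int  instance;
    char name[FIELD_LEN];
    char statistic[FIELD_LEN];
    char value[FIELD_LEN];  /* kept as text: values may be numbers or strings */
};

/* Returns 1 if all five parts were parsed, 0 otherwise. */
int parse_kstat_line(const char *line, struct kstat_line *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%63[^:]:%d:%63[^:]:%63[^ \t] %63s",
                  out->module, &out->instance, out->name,
                  out->statistic, out->value) == 5;
}
```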
Sometimes you may just want to test for the existence of a kstat entry. You can use the -q flag, which returns the appropriate exit status for matches against given criteria. The exit codes are as follows:
0: One or more statistics were matched.
1: No statistics were matched.
2: Invalid command-line options were specified.
3: A fatal error occurred.
Suppose that you have a Bourne shell script gathering network statistics, and you want to see if the NFS server is configured. You might create a script such as the one in the following example.
#!/bin/sh
# ... do some stuff
# Check for NFS server
kstat -q nfs::nfs_server:
if [ $? = 0 ]; then
    echo "NFS Server configured"
else
    echo "No NFS Server configured"
fi
# ... do some more stuff
exit 0
If you are adept at writing shell scripts with editing tools like sed or awk, here is a simple example of creating a network statistics utility with kstats.
The /usr/bin/netstat command has a command-line option, -I interface, with which you can print statistics about a particular network interface. Optionally, netstat takes an interval argument to print the statistics every interval seconds. The following example illustrates that option.
$ netstat -I qfe0 5
input qfe0 output input (Total) output
packets errs packets errs colls packets errs packets errs colls
2971681 0 1920781 0 0 11198281 0 10147381 0 0
9 0 7 0 0 31 0 29 0 0
4 0 5 0 0 24 0 25 0 0
...
Unfortunately, this command accepts only one -I flag argument. What if you want to print statistics about multiple interfaces simultaneously, similar to what iostat does for disks? You could devise a Bourne shell script using kstat and nawk to provide this functionality. You want your output to look like the following example.
$ netstatMulti.sh ge0 ge2 ge1 5
input output
packets errs packets errs colls
ge0 111702738 10 82259260 0 0
ge2 28475869 0 61288614 0 0
ge1 25542766 4 55587276 0 0
ge0 1638 0 1075 0 0
ge2 518 0 460 0 0
ge1 866 0 7688 0 0
...
The next example is the statistics script. Note that extracting the kstat information is simple, and most of the work goes into parsing and formatting the output. The script uses kstat -q to check the user’s arguments for valid interface names and then passes a list of formatted module:instance:name:statistic arguments to kstat before piping the output to nawk.
#!/bin/sh
# netstatMulti.sh: print out netstat-like stats for
# multiple interfaces
# using /usr/bin/kstat and nawk
USAGE="$0: interface_name ... interval"
INTERFACES=""                        # args list for kstat
while [ $# -gt 1 ]
do
    kstat -q -c net ::$1:            # test for valid interface name
    if [ $? != 0 ]; then
        echo $USAGE
        echo " Interface $1 not found"
        exit 1
    fi
    INTERFACES="$INTERFACES ::$1:"   # add to list
    shift
done
interval=$1
# check interval arg for int
if [ X`echo $interval | tr -d '0-9'` != X"" ]; then
    echo $USAGE
    exit 1
fi
kstat -p $INTERFACES $interval | nawk '
function process_stat(STATNAME, VALUE) {
    found = 0
    for (i = 1; i <= 5; i++) {
        if (STATNAME == FIELDS[i]) {
            found = 1
            break
        }
    }
    if (found == 0)
        return
    kstat = sprintf("%s:%s", iface, STATNAME)
    if (kstat in b_kstats) {
        # per-interval delta from the previous sample
        kstats[kstat] = VALUE - b_kstats[kstat]
    } else {
        kstats[kstat] = VALUE
    }
    b_kstats[kstat] = VALUE          # remember this sample for next interval
}
function print_stats() {
    printf("%-10s", iface)
    for (i = 1; i <= 5; i++) {
        kstat = sprintf("%s:%s", iface, FIELDS[i])
        printf(FORMATS[i], kstats[kstat])
        printf(" ")
    }
    print " "
}
BEGIN {
    print " input output "
    print " packets errs packets errs colls"
    split("ipackets,ierrors,opackets,oerrors,collisions", FIELDS, ",")
    split("%-10u %-5u %-10u %-5u %-6u", FORMATS, " ")
}
NF == 1 {
    if (iface) {
        print_stats()
    }
    split($0, t, ":")
    iface = t[3]
    next
}
{
    split($1, stat, ":")
    process_stat(stat[4], $2)
}'
The previous example illustrates how simple it is to extract the information you need from the kernel; however, it also shows how tedious it can be to format the output in a shell script. Fortunately, the Perl extension module that /usr/bin/kstat uses is documented, so you can write custom Perl programs. Because Perl is a “real programming language” and is ideally suited for text formatting, you can write solutions that are quite robust and comprehensive.
Access to kstats is made through a Perl extension in the XSUB interface module called Sun::Solaris::Kstat. To access Solaris kernel statistics in a Perl program, you import the module with the following statement:
use Sun::Solaris::Kstat;
The module contains two methods, new() and update(), corresponding to the libkstat C functions kstat_open() and kstat_chain_update(). The module provides kstat data through a tree of hashes based on a three-part key consisting of the module, instance, and name (ks_module, ks_instance, and ks_name are members of the C-language kstat struct). Following is a synopsis.
my $kstat = Sun::Solaris::Kstat->new();
$kstat->update();
$kstat->{module}{instance}{name}{statistic}
The lowest level of the hierarchy, the hash that holds the statistics for a given name, is a tied hash implemented in the XSUB module. It holds the following elements from struct kstat:
ks_crtime. Creation time, which is presented as the statistic crtime
ks_snaptime. Time of last data snapshot, which is presented as the statistic snaptime
ks_class. The kstat class, which is presented as the statistic class
ks_data. Kstat type-specific data decoded into individual statistics (the module produces one statistic per member of whatever structure is being decoded)
Because the module converts all kstat types, you need not worry about the different data structures for named and raw types. Most of the Solaris OS raw kstat entries are decoded by the module, giving you easy access to low-level data about things such as kernel memory allocation, swap, NFS performance, etc.
The update() method updates all the statistics you have accessed so far and adds a bit of functionality on top of the libkstat kstat_chain_update() function. If called in scalar context, it behaves much like kstat_chain_update(): it returns 0 if the kstat chain has not changed and 1 if it has. However, if update() is called in list context, it returns references to two arrays. The first array holds the keys of any kstats that have been added since the call to new() or the last call to update(); the second holds a list of entries that have been deleted. The entries in the arrays are strings of the form module:instance:name. This is useful for implementing programs that cache state information about devices, such as disks, that you can dynamically add to or remove from a running system.
Once you access a kstat, it will always be read by subsequent calls to update(). To stop it from being reread, you can clear the appropriate hash. For example:
$kstat->{$module}{$instance}{$name} = ();
At the time the kstat tied-hash interface was first released on the Solaris 8 OS, Perl 5 could not yet internally support 64-bit integers, so the kstat module approximates these values.
Timer. Values ks_crtime and ks_snaptime in struct kstat are of type hrtime_t, as are values of timer kstats and the wtime, wlentime, wlastupdate, rtime, rlentime, and rlastupdate fields of the kstat I/O statistics structures. This is a C-type definition used for the Solaris high-resolution timer, which is a 64-bit integer value. These fields are measured by the kstat facility in nanoseconds, meaning that a 32-bit value would represent approximately four seconds. The alternative is to store the values as floating-point numbers, which offer approximately 53 bits of precision on present hardware. You can store 64-bit intervals and timers as floating-point values expressed in seconds, meaning that this module rounds up time-related kstats to approximately microsecond resolution.
Counters. Because it is not useful to store these values as 32-bit values and because floating-point values offer 53 bits of precision, all 64-bit counters are also stored as floating-point values.
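The 53-bit limit is easy to demonstrate in C: a double represents every integer up to 2^53 exactly, but above that adjacent integers collapse onto the same value. The sketch below, using the hypothetical helpers fits_in_double and ns_to_seconds, checks whether a 64-bit counter survives a round trip through double and converts a nanosecond count to seconds. It illustrates the trade-off described above; it is not the Perl module's code.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if v converts to double and back without loss. */
int fits_in_double(uint64_t v)
{
    double d = (double)v;
    /* Values near UINT64_MAX round up to 2^64, which uint64_t cannot
     * hold, so reject them before converting back. */
    if (d >= 18446744073709551616.0)
        return 0;
    return (uint64_t)d == v;
}

/* Nanoseconds (an hrtime_t-style value) to seconds as a double. */
double ns_to_seconds(int64_t ns)
{
    return (double)ns / 1e9;
}
```

For example, fits_in_double((uint64_t)1 << 53) is true but fits_in_double(((uint64_t)1 << 53) + 1) is false, and ns_to_seconds(4294967296) is about 4.29, the span a 32-bit nanosecond counter can cover.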
As in our first example, the following example shows a Perl program that gives the same output as calling /usr/sbin/psrinfo without arguments.
#!/usr/bin/perl -w
# psrinfo.perl: emulate the Solaris psrinfo command
use strict;
use Sun::Solaris::Kstat;
my $kstat = Sun::Solaris::Kstat->new();
my $mh = $kstat->{cpu_info};
# visit CPUs in numeric order; hash keys are otherwise unordered
foreach my $cpu (sort { $a <=> $b } keys(%$mh)) {
    my ($state, $when) =
        @{$kstat->{cpu_info}{$cpu}{"cpu_info".$cpu}}{qw(state state_begin)};
    my ($sec, $min, $hour, $mday, $mon, $year) = (localtime($when))[0..5];
    printf("%d %-8s since %.2d/%.2d/%.2d %.2d:%.2d:%.2d\n",
        $cpu, $state, $mon + 1, $mday, $year % 100, $hour, $min, $sec);
}
This program produces the following output:
$ psrinfo.perl
0 on-line since 07/09/01 08:29:00
1 on-line since 07/09/01 08:29:07
The psrinfo command has a -v (verbose) option that prints much more detail about the processors in the system. The output looks like the following example:
$ psrinfo -v
Status of processor 0 as of: 08/17/01 16:52:44
Processor has been on-line since 08/14/01 16:27:56.
The sparcv9 processor operates at 400 MHz,
and has a sparcv9 floating point processor.
Status of processor 1 as of: 08/17/01 16:52:44
Processor has been on-line since 08/14/01 16:28:03.
The sparcv9 processor operates at 400 MHz,
and has a sparcv9 floating point processor.
All the information that the psrinfo command prints is available through the kstat interface. As an exercise, try modifying the simple psrinfo.perl example script to print the verbose information shown in this example.
The Perl script in the following example has the same function as our previous example (in Section 11.2.2) that used the kstat and nawk commands. Note that we have to implement our own search methods to find the kstat entries that we want to work with. Although this script is not shorter than our first example, it is certainly easier to extend with new functionality. Without much work, you could create a generic search method, similar to the one in /usr/bin/kstat, and import it into any Perl script that needs to access Solaris kernel statistics.
#!/usr/bin/perl -w
# netstatMulti.perl: print out netstat-like stats for multiple interfaces
#   using the kstat tied hash facility
use strict;
use Sun::Solaris::Kstat;

my $USAGE = "usage: $0 ... interval";

######
# Main
######
sub interface_exists($);
sub get_kstats();
sub print_kstats();

# process args
my $argc = scalar(@ARGV);
my @interfaces = ();
my $fmt = "%-10s %-10u %-10u %-10u %-10u %-10u\n";

if ($argc < 2) {
    print "$USAGE\n";
    exit 1;
} elsif ( !($ARGV[-1] =~ /^\d+$/) ) {
    print "$USAGE\n";
    print "    interval must be an integer.\n";
    exit 1;
}

# get kstat chain a la kstat_open()
my $kstat = Sun::Solaris::Kstat->new();

# Check for interfaces
foreach my $interface (@ARGV[-($argc)..-2]) {
    my $iface;
    if (! ($iface = interface_exists($interface)) ) {
        print "$USAGE\n";
        print "    interface $interface not found.\n";
        exit 1;
    }
    push @interfaces, $iface;
}

my $interval = $ARGV[-1];

# print header (column widths match $fmt)
printf("%-10s %-21s %-21s\n", "", "input", "output");
printf("%-10s %-10s %-10s %-10s %-10s %-10s\n",
    "", "packets", "errs", "packets", "errs", "colls");

# loop forever printing stats
while (1) {
    get_kstats();
    print_kstats();
    sleep($interval);
    $kstat->update();
}

#############
# Subroutines
#############

# search for the first kstat with given name
sub interface_exists($) {
    my ($name) = @_;
    my ($mod, $inst) = $name =~ /^(.+?)(\d+)$/;
    return (defined($mod) && exists($kstat->{$mod}{$inst}{$name}))
        ? { module => $mod, instance => $inst, name => $name }
        : undef;
}

# get kstats for given interface
sub get_kstats() {
    my (@statnames) = ('ipackets', 'ierrors', 'opackets',
                       'oerrors', 'collisions');
    my ($m, $i, $n);
    foreach my $interface (@interfaces) {
        $m = $interface->{module};
        $i = $interface->{instance};
        $n = $interface->{name};
        foreach my $statname (@statnames) {
            my $stat = $kstat->{$m}{$i}{$n}{$statname};
            die "kstat not found: $m:$i:$n:$statname" unless defined $stat;
            my $begin_stat = "b_" . $statname;   # name of first sample
            if (exists $interface->{$begin_stat}) {
                $interface->{$statname} = $stat - $interface->{$begin_stat};
            } else {
                # save first sample to calculate deltas
                $interface->{$statname} = $stat;
                $interface->{$begin_stat} = $stat;
            }
        }
    }
}

# print out formatted information a la netstat
sub print_kstats() {
    foreach my $i (@interfaces) {
        printf($fmt, $i->{name}, $i->{ipackets}, $i->{ierrors},
            $i->{opackets}, $i->{oerrors}, $i->{collisions});
    }
}
In the subroutine interface_exists(), you cache the members of the key if an entry is found. This way, you need not do another search in get_kstats(). You could fairly easily modify the script to display all network interfaces on the system (rather than take command-line arguments). You could then use the update() method to discover whether interfaces have been added to or removed from the system (with ifconfig, for example). This exercise is left to you.
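The regular expression in interface_exists() splits an interface name into the kstat module and instance that key the tied hash. Here is a C equivalent, written as a hypothetical helper split_ifname(). It assumes interface names end in their instance number, as the Perl pattern does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Split an interface name such as "hme0" into the kstat module ("hme")
   and instance (0), mirroring the Perl pattern /^(.+?)(\d+)$/.
   Returns 0 on success, -1 if there are no trailing digits or the
   module buffer is too small. */
int split_ifname(const char *ifname, char *module, size_t modlen,
                 long *instance)
{
    size_t len, p;

    assert(ifname != NULL && module != NULL && instance != NULL);
    len = strlen(ifname);
    p = len;

    /* Walk back over the trailing run of digits. */
    while (p > 0 && isdigit((unsigned char)ifname[p - 1]))
        p--;
    /* The pattern needs at least one module character before the digits. */
    if (p == 0)
        p = 1;
    if (p >= len || p >= modlen)
        return -1;

    memcpy(module, ifname, p);
    module[p] = '\0';
    *instance = strtol(ifname + p, NULL, 10);
    return 0;
}
```

Because only the trailing digits count as the instance, names with embedded digits, such as "e1000g12", still split correctly into module "e1000g" and instance 12.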
Using DTrace, it is possible to examine the kstat instances that a program uses. The following DTrace script shows how this could be done.
#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
    printf("%-16s %-16s %-6s %s\n", "CMD", "CLASS", "TYPE", "MOD:INS:NAME");
}

/*
 * arg1 is the user-space address of the kstat_t that the reader passed
 * to the kernel; copy it in. The copyin() buffer is only valid within
 * this clause, so keep the pointer in a clause-local (this->) variable.
 */
fbt::read_kstat_data:entry
{
    this->uk = (kstat_t *)copyin((uintptr_t)arg1, sizeof (kstat_t));
    printf("%-16s %-16s %-6s %s:%d:%s\n", execname,
        this->uk->ks_class,
        this->uk->ks_type == 0 ? "raw" :
        this->uk->ks_type == 1 ? "named" :
        this->uk->ks_type == 2 ? "intr" :
        this->uk->ks_type == 3 ? "io" :
        this->uk->ks_type == 4 ? "timer" : "?",
        this->uk->ks_module, this->uk->ks_instance, this->uk->ks_name);
}
When we run the DTrace script above, it prints each command and the kstats that it reads.
# kstat_types.d
CMD              CLASS            TYPE   MOD:INS:NAME
vmstat           misc             named  cpu_info:0:cpu_info0
vmstat           misc             named  cpu:0:vm
vmstat           misc             named  cpu:0:sys
vmstat           disk             io     cmdk:0:cmdk0
vmstat           disk             io     sd:0:sd0
vmstat           misc             raw    unix:0:sysinfo
vmstat           vm               raw    unix:0:vminfo
vmstat           misc             named  unix:0:dnlcstats
vmstat           misc             named  unix:0:system_misc
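The nested conditional in the D script decodes ks_type by hand. In C, the same mapping reads more clearly as a lookup table. The constant values below match those in <sys/kstat.h>; they are redefined here only so that the sketch compiles without Solaris headers. kstat_type_name() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Values as defined in <sys/kstat.h>, repeated so this builds anywhere. */
#define KSTAT_TYPE_RAW    0
#define KSTAT_TYPE_NAMED  1
#define KSTAT_TYPE_INTR   2
#define KSTAT_TYPE_IO     3
#define KSTAT_TYPE_TIMER  4

/* Table-driven version of the D script's nested ?: chain. */
const char *kstat_type_name(int type)
{
    static const char *const names[] = {
        [KSTAT_TYPE_RAW]   = "raw",
        [KSTAT_TYPE_NAMED] = "named",
        [KSTAT_TYPE_INTR]  = "intr",
        [KSTAT_TYPE_IO]    = "io",
        [KSTAT_TYPE_TIMER] = "timer",
    };

    if (type < 0 || (size_t)type >= sizeof names / sizeof names[0])
        return "?";
    return names[type];
}
```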
The kstat mechanism provides lightweight statistics that are a stable part of kernel code. The kstat interface can supply the standard information that user-level tools report. For example, if you wanted to add your own device driver I/O statistics to the statistics pool reported by the iostat command, you would add a kstat provider.
The statistics reported by vmstat, iostat, and most of the other Solaris tools are gathered by a central kernel statistics subsystem, known as “kstat.” The kstat facility is an all-purpose interface for collecting and reporting named and typed data.
A typical scenario has a kstat producer and a kstat reader. The kstat reader is a user-mode utility that reads, potentially aggregates, and then reports the results. For example, the vmstat utility is a kstat reader that aggregates statistics provided by the vm system in the kernel.
Statistics are named and accessed by a four-tuple: class, module, name, instance. Solaris 8 introduced a new method to access kstat information from the command line or in custom-written scripts. You can use the command-line tool /usr/bin/kstat interactively to print all or selected kstat information from a system. This program is written in Perl. You can use the same Perl XS extension module, Sun::Solaris::Kstat, to write your own custom Perl programs. Both facilities are documented in the online manual pages.
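Readers locate a kstat by its identifying fields. libkstat's kstat_lookup() matches on module, instance, and name, and ignores any of those fields passed as NULL (or -1 for the instance). The following user-space sketch models that matching rule over a plain array. struct ks_id and ks_find() are hypothetical; they stand in for the real kstat chain and kstat_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down kstat header: identifying fields only. */
struct ks_id {
    const char *module;
    int         instance;
    const char *name;
    const char *kclass;     /* carried along, not used for matching */
};

/* Return the first entry matching module, instance, and name. A NULL
   module or name, or an instance of -1, matches anything. */
const struct ks_id *ks_find(const struct ks_id *chain, size_t n,
                            const char *module, int instance,
                            const char *name)
{
    assert(chain != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct ks_id *k = &chain[i];

        if (module != NULL && strcmp(k->module, module) != 0)
            continue;
        if (instance != -1 && k->instance != instance)
            continue;
        if (name != NULL && strcmp(k->name, name) != 0)
            continue;
        return k;
    }
    return NULL;
}
```

Calling ks_find(chain, n, "cpu", -1, NULL) returns the first kstat of any instance from the cpu module. This is the same wildcard behavior that makes a generic search method, like the one suggested for the Perl scripts, straightforward to write.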
To add your own statistics to your Solaris kernel, you need to create a kstat provider. A provider consists of an initialization function that creates the statistics group and a callback function that updates the statistics before they are read. The callback function is often used to aggregate or summarize information before it is reported to the reader. The kstat provider interface is defined in kstat(3KSTAT) and kstat(9S). More verbose information can be found in usr/src/uts/common/sys/kstat.h.
The first step is to decide on the type of information you want to export. The primary types are RAW, NAMED, and IO. The RAW interface exports raw C data structures to userland. Its use is strongly discouraged, since any change to the C structure breaks compatibility with existing readers. The NAMED mechanism is preferred because its data is typed and extensible. Both the NAMED and IO types use typed data.
The NAMED type provides single or multiple records of data and is the most common choice. The IO type provides I/O statistics only. These are collected and reported by the iostat command, so the IO type should be used only for items that can be viewed and reported as I/O devices. We currently do this for I/O devices and NFS file systems.
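The extensibility of NAMED data is clearest from the reader's side. Statistics are located by name rather than by offset, so a provider can add fields without breaking existing readers. The sketch below uses simplified, hypothetical stand-ins for kstat_named_t and its type codes; the real structure in <sys/kstat.h> supports more data types. The lookup helper is written in the spirit of libkstat's kstat_data_lookup().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified type codes; <sys/kstat.h> defines more. */
enum sketch_type { SK_UINT32, SK_UINT64, SK_STRING };

/* Simplified stand-in for kstat_named_t: a name, a type tag, a value. */
struct sketch_named {
    char name[31];                 /* KSTAT_STRLEN is 31 in <sys/kstat.h> */
    enum sketch_type data_type;
    union {
        uint32_t    ui32;
        uint64_t    ui64;
        const char *str;
    } value;
};

/* Find a statistic by name. Readers that search by name keep working
   when a provider adds or reorders records. */
const struct sketch_named *named_find(const struct sketch_named *recs,
                                      size_t n, const char *name)
{
    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(recs[i].name, name, sizeof recs[i].name) == 0)
            return &recs[i];
    }
    return NULL;
}
```

A RAW reader, by contrast, would cast the data to a fixed C structure. Such a reader silently misreads fields as soon as the provider's structure changes.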
A simple example of NAMED statistics is the set of virtual memory summaries provided by system_pages.
$ kstat -n system_pages
module: unix instance: 0
name: system_pages class: pages
availrmem 343567
crtime 0
desfree 4001
desscan 25
econtig 4278190080
fastscan 256068
freemem 248309
kernelbase 3556769792
lotsfree 8002
minfree 2000
nalloc 11957763
nalloc_calls 9981
nfree 11856636
nfree_calls 6689
nscan 0
pagesfree 248309
pageslocked 168569
pagestotal 512136
physmem 522272
pp_kernel 64102
slowscan 100
snaptime 6573953.83957897
These statistics are first declared and initialized by the following C struct in usr/src/uts/common/os/kstat_fr.c.
struct {
    kstat_named_t physmem;
    kstat_named_t nalloc;
    kstat_named_t nfree;
    kstat_named_t nalloc_calls;
    kstat_named_t nfree_calls;
    kstat_named_t kernelbase;
    kstat_named_t econtig;
    kstat_named_t freemem;
    kstat_named_t availrmem;
    kstat_named_t lotsfree;
    kstat_named_t desfree;
    kstat_named_t minfree;
    kstat_named_t fastscan;
    kstat_named_t slowscan;
    kstat_named_t nscan;
    kstat_named_t desscan;
    kstat_named_t pp_kernel;
    kstat_named_t pagesfree;
    kstat_named_t pageslocked;
    kstat_named_t pagestotal;
} system_pages_kstat = {
    { "physmem",      KSTAT_DATA_ULONG },
    { "nalloc",       KSTAT_DATA_ULONG },
    { "nfree",        KSTAT_DATA_ULONG },
    { "nalloc_calls", KSTAT_DATA_ULONG },
    { "nfree_calls",  KSTAT_DATA_ULONG },
    { "kernelbase",   KSTAT_DATA_ULONG },
    { "econtig",      KSTAT_DATA_ULONG },
    { "freemem",      KSTAT_DATA_ULONG },
    { "availrmem",    KSTAT_DATA_ULONG },
    { "lotsfree",     KSTAT_DATA_ULONG },
    { "desfree",      KSTAT_DATA_ULONG },
    { "minfree",      KSTAT_DATA_ULONG },
    { "fastscan",     KSTAT_DATA_ULONG },
    { "slowscan",     KSTAT_DATA_ULONG },
    { "nscan",        KSTAT_DATA_ULONG },
    { "desscan",      KSTAT_DATA_ULONG },
    { "pp_kernel",    KSTAT_DATA_ULONG },
    { "pagesfree",    KSTAT_DATA_ULONG },
    { "pageslocked",  KSTAT_DATA_ULONG },
    { "pagestotal",   KSTAT_DATA_ULONG },
};
These statistics are the simplest type, merely a basic list of 64-bit variables. Once declared, the kstats are registered with the subsystem.
static int system_pages_kstat_update(kstat_t *, int);
...
kstat_t *ksp;

ksp = kstat_create("unix", 0, "system_pages", "pages", KSTAT_TYPE_NAMED,
    sizeof (system_pages_kstat) / sizeof (kstat_named_t),
    KSTAT_FLAG_VIRTUAL);
if (ksp) {
    ksp->ks_data = (void *) &system_pages_kstat;
    ksp->ks_update = system_pages_kstat_update;
    kstat_install(ksp);
}
...
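The kstat_create() function takes the following arguments:

- the four-tuple description (module, instance, name, and class);
- the kstat type;
- the number of data records;
- flags.

It returns a handle to the newly created kstat. KSTAT_FLAG_VIRTUAL tells kstat_create() not to allocate a data section. The caller therefore sets the handle's ks_data to point at its own data, and sets ks_update to a callback that is invoked when a user reads the statistics. Finally, kstat_install() makes the kstat visible to readers.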
When invoked, the callback function has the task of updating the data structure pointed to by ks_data. If you choose not to update, simply set the callback function to default_kstat_update(). The system pages kstat preamble looks like this:
static int
system_pages_kstat_update(kstat_t *ksp, int rw)
{
    if (rw == KSTAT_WRITE) {
        return (EACCES);
    }
This basic preamble checks whether the caller is trying to write the structure rather than read it, and rejects writes with EACCES. (Yes, it’s possible to write to some statistics if the provider allows it.) Once the basic checks are done, the update callback simply stores the current values into the predefined data structure and returns.
    ...
    system_pages_kstat.freemem.value.ul = (ulong_t)freemem;
    system_pages_kstat.availrmem.value.ul = (ulong_t)availrmem;
    system_pages_kstat.lotsfree.value.ul = (ulong_t)lotsfree;
    system_pages_kstat.desfree.value.ul = (ulong_t)desfree;
    system_pages_kstat.minfree.value.ul = (ulong_t)minfree;
    system_pages_kstat.fastscan.value.ul = (ulong_t)fastscan;
    system_pages_kstat.slowscan.value.ul = (ulong_t)slowscan;
    system_pages_kstat.nscan.value.ul = (ulong_t)nscan;
    system_pages_kstat.desscan.value.ul = (ulong_t)desscan;
    system_pages_kstat.pagesfree.value.ul = (ulong_t)freemem;
    ...
    return (0);
}
That’s it for a basic named kstat.
In this section, we look at an example of how I/O statistics are measured and recorded. As discussed in Section 11.1.3.5, there is a special type of kstat for I/O statistics.
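I/O devices are measured as queues, using a Riemann sum. The kernel counts the visits to each queue. At every change in queue length, it multiplies the time elapsed since the previous change by the queue length over that time, and adds the product to a running total (wlentime for the wait queue, rlentime for the active queue). It also accumulates the time during which each queue was non-empty (wtime and rtime). From these sums and the I/O counts, you can derive the average queue length and the average service time for the device. There are typically two queues for each device:

- the wait queue, which measures the time from when a request is accepted and enqueued until the device starts on it;
- the active queue, which measures the time the request spends active on the device.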
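The Riemann-sum bookkeeping can be sketched in a few lines of C. The functions below are hypothetical user-space models of the update rule used by the kernel's kstat_waitq_enter(), kstat_waitq_to_runq(), and kstat_runq_exit() routines. Before a queue's length changes, the elapsed interval is credited to the busy time (if the queue was non-empty). The same interval, multiplied by the old length, is added to the length-time product. Timestamps are passed in explicitly so the results are deterministic; the kernel routines read the high-resolution clock themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the kstat_io_t queue fields (nanosecond units). */
struct io_queues {
    int64_t wtime, wlentime, wlastupdate;   /* wait queue */
    int64_t rtime, rlentime, rlastupdate;   /* run (active) queue */
    int     wcnt, rcnt;                     /* current queue lengths */
};

/* One Riemann-sum step for a single queue. */
static void queue_step(int64_t *busy, int64_t *lentime, int64_t *last,
                       int *cnt, int64_t now, int change)
{
    int64_t dt = now - *last;

    if (*cnt > 0) {
        *busy += dt;                     /* queue was non-empty for dt */
        *lentime += dt * (int64_t)*cnt;  /* area under the length curve */
    }
    *last = now;
    *cnt += change;
    assert(*cnt >= 0);
}

/* Request accepted and queued (cf. kstat_waitq_enter()). */
void waitq_enter(struct io_queues *q, int64_t now)
{
    queue_step(&q->wtime, &q->wlentime, &q->wlastupdate, &q->wcnt, now, +1);
}

/* Device starts on the request (cf. kstat_waitq_to_runq()). */
void waitq_to_runq(struct io_queues *q, int64_t now)
{
    queue_step(&q->wtime, &q->wlentime, &q->wlastupdate, &q->wcnt, now, -1);
    queue_step(&q->rtime, &q->rlentime, &q->rlastupdate, &q->rcnt, now, +1);
}

/* Request completed (cf. kstat_runq_exit()). */
void runq_exit(struct io_queues *q, int64_t now)
{
    queue_step(&q->rtime, &q->rlentime, &q->rlastupdate, &q->rcnt, now, -1);
}
```

Consider two overlapping I/Os:

- I/O A waits from t=0 to t=10 and runs until t=40.
- I/O B waits from t=5 to t=40 and runs until t=60.

At the end, wlentime is 45 (the sum of the two individual waits, 10 + 35), while wtime is 40 (the span during which anything was waiting). On the run queue, rlentime and rtime are both 50, because the two I/Os never ran at the same time.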
An I/O device driver has a declare-and-create section similar to the one we saw for NAMED statistics. For instance, the floppy disk device driver (usr/src/uts/sun/io/fd.c) calls kstat_create() in its attach function.
static int
fd_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    ...
    fdc->c_un->un_iostat = kstat_create("fd", 0, "fd0", "disk",
        KSTAT_TYPE_IO, 1, KSTAT_FLAG_PERSISTENT);
    if (fdc->c_un->un_iostat) {
        fdc->c_un->un_iostat->ks_lock = &fdc->c_lolock;
        kstat_install(fdc->c_un->un_iostat);
    }
    ...
}
The per-I/O statistics are first updated in the device driver's strategy function, which is where the I/O is first received and queued. At this point, the I/O is marked as waiting on the wait queue.
#define KIOSP   KSTAT_IO_PTR(un->un_iostat)

static int
fd_strategy(register struct buf *bp)
{
    struct fdctlr *fdc;
    struct fdunit *un;

    fdc = fd_getctlr(bp->b_edev);
    un = fdc->c_un;
    ...
    /* Mark I/O as waiting on wait q */
    if (un->un_iostat) {
        kstat_waitq_enter(KIOSP);
    }
    ...
}
The I/O spends some time on the wait queue until the device is able to process the request. For each I/O, the fdstart() routine moves the I/O from the wait queue to the run queue with the kstat_waitq_to_runq() function.
static void
fdstart(struct fdctlr *fdc)
{
    ...
    /* Mark I/O as active, move from wait to active q */
    if (un->un_iostat) {
        kstat_waitq_to_runq(KIOSP);
    }
    ...
    /* Do I/O... */
    ...
When the I/O is complete (still in the fdstart() function), it is marked with kstat_runq_exit() as leaving the active queue. This updates the last part of the statistic, leaving us with the number of I/Os and the total time spent on each queue.
    /* Mark I/O as complete */
    if (un->un_iostat) {
        /* bytes transferred = requested count minus residual */
        if (bp->b_flags & B_READ) {
            KIOSP->reads++;
            KIOSP->nread += (bp->b_bcount - bp->b_resid);
        } else {
            KIOSP->writes++;
            KIOSP->nwritten += (bp->b_bcount - bp->b_resid);
        }
        kstat_runq_exit(KIOSP);
    }
    biodone(bp);
    ...
}
These statistics provide our familiar metrics, where actv is the average length of the queue of active I/Os and asvc_t is the average service time in the device. The wait queue is represented correspondingly by wait and wsvc_t.
$ iostat -xn 10
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2    0.1    9.2    1.1  0.1  0.5    0.1   10.4   1   1 fd0
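The columns above can be derived from two snapshots of a KSTAT_TYPE_IO kstat taken an interval apart:

- The change in wlentime or rlentime divided by the elapsed time gives the average queue length (wait, actv).
- The same change divided by the number of completed I/Os gives the average time per I/O (wsvc_t, asvc_t). This step is Little's law.

The sketch below is not iostat's actual source. It assumes the snapshots have already been read into a hypothetical struct io_snap whose field names follow kstat_io_t, and that kr/s and kw/s use 1024-byte kilobytes.

```c
#include <assert.h>
#include <stdint.h>

/* Field names follow kstat_io_t; snaptime is the kstat's ks_snaptime.
   All times are in nanoseconds. */
struct io_snap {
    uint64_t nread, nwritten;   /* bytes transferred */
    uint32_t reads, writes;     /* completed operations */
    int64_t  wtime, wlentime;   /* wait queue: busy time, length x time */
    int64_t  rtime, rlentime;   /* run queue: busy time, length x time */
    int64_t  snaptime;          /* when the snapshot was taken */
};

/* One line of iostat -x style figures. */
struct io_rates {
    double rps, wps, krps, kwps;   /* per-second rates */
    double wait, actv;             /* average queue lengths */
    double wsvc_ms, asvc_ms;       /* average time per I/O, milliseconds */
    double pct_w, pct_b;           /* percent of interval each queue busy */
};

/* Compare snapshot a (earlier) with snapshot b (later). */
struct io_rates compute_io_rates(const struct io_snap *a,
                                 const struct io_snap *b)
{
    struct io_rates r = {0};
    double dt_ns = (double)(b->snaptime - a->snaptime);
    double dt_s = dt_ns / 1e9;
    uint32_t dr = b->reads - a->reads;      /* unsigned: tolerates one wrap */
    uint32_t dw = b->writes - a->writes;
    double ops = (double)dr + (double)dw;
    double dwl = (double)(b->wlentime - a->wlentime);
    double drl = (double)(b->rlentime - a->rlentime);

    assert(dt_ns > 0);
    r.rps  = dr / dt_s;
    r.wps  = dw / dt_s;
    r.krps = (double)(b->nread - a->nread) / 1024.0 / dt_s;
    r.kwps = (double)(b->nwritten - a->nwritten) / 1024.0 / dt_s;
    r.wait = dwl / dt_ns;              /* area / time = mean queue length */
    r.actv = drl / dt_ns;
    if (ops > 0) {                     /* Little's law: length / throughput */
        r.wsvc_ms = dwl / ops / 1e6;
        r.asvc_ms = drl / ops / 1e6;
    }
    r.pct_w = 100.0 * (double)(b->wtime - a->wtime) / dt_ns;
    r.pct_b = 100.0 * (double)(b->rtime - a->rtime) / dt_ns;
    return r;
}
```

Working from deltas between two snapshots is what lets a reader report per-interval rates from counters that only ever grow.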
Much of the information in this chapter derives from various SunSolve InfoDocs, Solaris white papers, and Solaris man pages (section 3KSTAT). For detailed information on the APIs, refer to the Solaris 8 Reference Manual Collection and Writing Device Drivers. Both publications are available at docs.sun.com.