Chapter 7. Networks

In this chapter, we review the tools available to monitor networking within and between Solaris systems. We examine tools for systemwide network statistics and per-process statistics.

Terms for Network Analysis

The following list of terms related to network analysis also serves as an overview of the topics in this section.

  • Packets. Network interface packet counts can be fetched from netstat -i and roughly indicate network activity.

  • Bytes. Measuring throughput in terms of bytes is useful because interface maximum throughput is measured in comparable terms, bits/sec. Byte statistics for interfaces are provided by Kstat, SNMP, nx.se, and nicstat.

  • Utilization. Heavy network use can degrade application response. The nicstat tool calculates utilization by dividing current throughput by a known maximum.

  • Saturation. Once an interface is saturated, network applications usually experience delays. Saturation can occur elsewhere on the network.

  • Errors. netstat -i is useful for printing error counts: collisions (small numbers are normal), input errors (bad FCS), and output errors (late collisions).

  • Link status. link_status, link_speed, and link_mode are three values that describe the state of the interface; they are provided by kstat or ndd.

  • Tests. There is great value in test driving the network to see what speed it can really manage. Tools such as TTCP can be used.

  • By-process. Network I/O by process can be analyzed with DTrace. Scripts such as tcptop and tcpsnoop perform this analysis.

  • TCP. Various TCP statistics are kept for MIB-II,[1] plus additional statistics. These statistics are useful for troubleshooting and are obtained with kstat or netstat -s.

  • IP. Various IP statistics are kept for MIB-II, plus additional statistics. They are obtained with kstat or netstat -s.

  • ICMP. Tests, such as the ping and traceroute commands, that make use of ICMP can inform about the network surroundings. Various ICMP statistics, obtained with kstat or netstat -s, are also kept.

Table 7.1 summarizes and cross-references the tools discussed in this section.

Table 7.1. Tools for Network Analysis

Tool          Uses        Ref.                          Description
netstat       Kstat       7.7.1                         Kitchen sink of network statistics: route table, established connections, interface packet counts, and errors
kstat         Kstat       7.7.2, 7.9.2, 7.10.2, 7.11.1  For fetching raw kstat counters for each network interface and the TCP, IP, and ICMP modules
nx.se         Kstat       7.7.3                         For printing network interface and TCP throughput in terms of kilobytes
nicstat       Kstat       7.7.4                         For printing network interface utilization
snmpnetstat   SNMP        7.7.5                         For network interface statistics from SNMP
checkcable    Kstat, ndd  7.7.6                         For network interface status: link speed, link mode, link up availability
ping          ICMP        7.7.7                         To test whether remote hosts are "alive"
traceroute    UDP, ICMP   7.7.8                         To print the path to a remote host, including delays to each hop
snoop         /dev        7.7.9                         To capture network packets
TTCP          TCP         7.7.10                        For applying a network traffic workload
pathchar      UDP, ICMP   7.7.11                        For analysis of the path to a remote host, including speed between hops
ntop          libpcap     7.7.12                        For reporting on sniffed traffic
nfsstat       Kstat       7.7.13, 7.7.14                For viewing NFS client and server statistics
tcptop        DTrace      7.8.1                         For printing a by-process summary of network usage
tcpsnoop      DTrace      7.8.2                         For tracing network packets by process
dtrace        DTrace      7.9.4, 7.10.4, 7.11.3         For capturing TCP, IP, and ICMP statistics programmatically

Packets Are Not Bytes

The official tool in Solaris for monitoring network traffic is the netstat command.

$ netstat -i 1
    input   hme0      output           input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls
141461153 29    152961282 0     0      234608752 29    246108881 0     0
295     0     2192    0     0      299     0     2196    0      0
296     0     2253    0     0      300     0     2257    0      0
295     0     2258    0     0      299     0     2262    0      0
179     0     1305    0     0      183     0     1309    0      0
...

In the above output, we can see that the hme0 interface had very few errors (which is useful to know) and was sending over 2,000 packets per second. Is 2,000 a lot? We don’t know whether this means the interface is at 100% utilization or 1% utilization; all it tells us is that traffic is occurring.

Measuring traffic by using packet counts is like measuring rainfall by listening for rain. Network cards are rated in terms of throughput, 100 Mbits/sec, 1000 Mbits/sec, etc. Measuring the current network traffic in similar terms (by using bytes) helps us understand how utilized the interface really is.

Bytes per second are indeed tracked by Kstat, and netstat is a Kstat consumer. However, netstat doesn’t surrender this information without a fight.[2] These days we are supposed to use kstat to get it.

$ kstat -p 'hme:0:hme0:*bytes64'
hme:0:hme0:obytes64     51899673435
hme:0:hme0:rbytes64     47536009231

This output shows that byte statistics for network interfaces are indeed in Kstat, which will let us calculate a percent utilization. Later, we cover tools that help us do that. For now we discuss why network utilization, saturation, and errors are useful metrics to observe.

Network Utilization

The following points help describe the effects of network utilization.

  • Network events, like disk events, are slow. They are often measured in milliseconds. A client application that is heavily network bound will experience delays. Network server applications often obviate these delays by being multithreaded or multiprocess.

  • A network card that is at 100% utilization will most likely degrade application performance. However, there are times when we expect 100% utilization, such as during bulk network transfers.

  • Dividing the current Kbytes/sec by the speed of the network card can provide a useful measure of network utilization (a worked example follows this list).

  • Using only Kbytes/sec in a utilization calculation fails to account for per-packet overheads.

  • Unexpectedly high utilization may occur when auto-negotiation has failed and selected a much slower speed.
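
As a worked example (the card speed and throughput figures here are assumptions, not measurements): a 100 Mbit/sec interface currently transmitting 6,000 Kbytes/sec is moving 6,000 × 1,024 × 8 ≈ 49.2 Mbits/sec, or roughly 49% utilization; the same traffic on a 1000 Mbit/sec interface would be about 5% utilization.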

Network Saturation

A network card that is asked to send more traffic than it can transmit in an interval queues data in various buffers, including the TCP buffer. This causes application delays as the network card clears the backlog.

An important point is that while your system may not be saturated, something else on the network may be. Often your network traffic will pass through several hops, any of which may be experiencing problems.

Network Errors

Errors can occur from network collisions and as such are a normal occurrence. With hubs they occurred so often that various rules were formulated to help us know what really was a problem (> 5% of packet counts).

Three types of errors are visible in the previous netstat -i output:

  • output:colls. Collisions. Normal in small doses.

  • input:errs. A frame failed its frame check sequence.

  • output:errs. Late collisions. A collision occurred after the first 64 bytes were sent.

The last two types of errors can be caused by bad wiring, faulty cards, auto-negotiation problems, and electromagnetic interference. If you are monitoring a microwave link, add “rain fade” and nesting pigeons to the list. And if your Solaris server happens to be on a satellite, you get to mention Solar winds as well.

Misconfigurations

Sometimes poor network performance is due to misconfigured components. This can be difficult to identify because no error statistic indicates a fault; the misconfiguration might be found only after meticulous scrutiny of all network settings.

Places to check: all interface settings (ifconfig -a), route tables (netstat -rn), interface flags (link_speed/link_mode, discussed in Section 7.7.6), name server configurations (/etc/nsswitch.conf), DNS resolvers (/etc/resolv.conf), /var/adm/messages, FMA faults (fmadm faulty, fmdump), firewall configurations, and configurable network components (switches, routers, gateways).

Systemwide Statistics

The following tools allow us to observe network statistics, including statistics for TCP, IP, and each network interface, throughout the system.

netstat Command

The Solaris netstat command is the catch-all for a number of different network status programs.

$ netstat -i
Name  Mtu  Net/Dest      Address        Ipkts  Ierrs Opkts  Oerrs Collis Queue
lo0   8232 localhost     localhost      191    0     191    0     0      0
ipge0 1500 waterbuffalo  waterbuffalo   31152163 0     24721687 0     0      0

$ netstat -i 3
    input   ipge0     output       input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls
31152218 0     24721731 0     0      31152409 0     24721922 0     0

$ netstat -I ipge0 -i 3
    input   ipge0     output       input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls
31152284 0     24721797 0     0      31152475 0     24721988 0     0

netstat -i, mentioned earlier, prints only packet counts. We don’t know if they are big packets or small packets, and we cannot use them to accurately determine how utilized the network interface is. Other performance monitoring tools plot this as a “be all and end all” value—this is wrong.

Packet counts may help as an indicator of activity. A packet count of less than 100 per second can be treated as fairly idle; a worst case for Ethernet makes this around 150 Kbytes/sec (100 packets/sec × a maximum MTU of 1,500 bytes = 150,000 bytes/sec).

The netstat -i output may be much more valuable for its error counts, as discussed in Section 7.5.

netstat -s dumps various network-related counters from kstat. This shows that Kstat does track at least some details in terms of bytes.

$ netstat -s | grep Bytes
        tcpOutDataSegs      =37367847   tcpOutDataBytes      =166744792
        tcpRetransSegs      =153437     tcpRetransBytes      =72298114
        tcpInAckSegs        =25548715   tcpInAckBytes        =148658291
        tcpInInorderSegs    =35290928   tcpInInorderBytes    =3637819567
        tcpInUnorderSegs    =324309     tcpInUnorderBytes    =406912945
        tcpInDupSegs        =152795     tcpInDupBytes        =73998299
        tcpInPartDupSegs    =  7896     tcpInPartDupBytes    =5821485
        tcpInPastWinSegs    =    38     tcpInPastWinBytes    =971347352

However, the byte values above are for TCP in total, including loopback traffic that didn’t travel through the network interfaces. These statistics can still be of some value, especially if large numbers of errors are observed. For more details on these and a reference table, see Section 7.9.

netstat -k on Solaris 9 and earlier dumped all kstat counters.

From the output below, we can see that there are byte counters (rbytes64, obytes64) for the hme0 interface, which is just what we need to measure per-interface traffic. However, netstat -k was an undocumented switch that has now been dropped in Solaris 10. This is fine, since there are better ways to get to Kstat, including the C library, which is used by tools such as vmstat.

$ netstat -k | awk '/^hme0/,/^$/'
hme0:
ipackets 70847004 ierrors 6 opackets 73438793 oerrors 0 collisions 0
defer 0 framing 0 crc 0 sqe 0 code_violations 0 len_errors 0
ifspeed 100000000 buff 0 oflo 0 uflo 0 missed 6 tx_late_collisions 0
retry_error 0 first_collisions 0 nocarrier 0 nocanput 0
allocbfail 0 runt 0 jabber 0 babble 0 tmd_error 0 tx_late_error 0
rx_late_error 0 slv_parity_error 0 tx_parity_error 0 rx_parity_error 0
slv_error_ack 0 tx_error_ack 0 rx_error_ack 0 tx_tag_error 0
rx_tag_error 0 eop_error 0 no_tmds 0 no_tbufs 0 no_rbufs 0
rx_late_collisions 0 rbytes 289601566 obytes 358304357 multircv 558 multixmt 73411
brdcstrcv 3813836 brdcstxmt 1173700 norcvbuf 0 noxmtbuf 0   newfree 0
ipackets64 70847004 opackets64 73438793 rbytes64 47534241822 obytes64 51897911909
align_errors 0
fcs_errors 0   sqe_errors 0 defer_xmts 0 ex_collisions 0
macxmt_errors 0 carrier_errors 0 toolong_errors 0 macrcv_errors 0
link_duplex 0 inits 31 rxinits 0 txinits 0 dmarh_inits 0
dmaxh_inits 0 link_down_cnt 0 phy_failures 0 xcvr_vendor 524311
asic_rev 193 link_up 1

kstat Command

The Solaris Kernel Statistics framework tracks network usage, and as of Solaris 8, the kstat command fetches these details (see Chapter 11). This command has a variety of options for selecting statistics and can be executed by non-root users.

The -m option for kstat matches on a module name. In the following example, we use it to display all available statistics for the networking modules.

$ kstat -m tcp
module: tcp                             instance: 0
name:   tcp                             class:    mib2
        activeOpens                     803
        attemptFails                    312
        connTableSize                   56
...
$ kstat -m ip
module: ip                              instance: 0
name:   icmp                            class:    mib2
        crtime                          3.207830752
        inAddrMaskReps                  0
        inAddrMasks                     0
...

$ kstat -m hme
module: hme                             instance: 0
name:   hme0                            class:     net

        align_errors                    0
        allocbfail                      0
...

These commands fetch statistics for ip, tcp, and hme (our Ethernet card). The first group of statistics from each of the tcp and ip modules (the rest were truncated) states its class as mib2: These statistic groups are maintained by the TCP and IP code for MIB-II and then copied into Kstat during a kstat update.

The following kstat command fetches byte statistics for our network interface, printing output every second.

$ kstat -p 'hme:0:hme0:*bytes64' 1
hme:0:hme0:obytes64     51899673435
hme:0:hme0:rbytes64     47536009231

hme:0:hme0:obytes64     51899673847
hme:0:hme0:rbytes64     47536009709
...

Using kstat in this manner is currently the best way to fetch network interface statistics with the tools shipped with Solaris. Other tools take the final step and print this data in a more meaningful way: Kbytes/sec or percent utilization. Two such tools are nx.se and nicstat.
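
The conversion itself is straightforward. The following is a minimal sketch of the same idea (not nicstat itself), assuming an hme0 interface and a 100 Mbit/sec link speed; it takes two kstat samples 10 seconds apart and prints the read and write rates in Kbytes/sec plus an approximate utilization.

$ kstat -p 'hme:0:hme0:*bytes64' 10 2 | nawk '
    # remember each sample of each statistic, in arrival order
    { sample[$1, ++n[$1]] = $2 }
    END {
        interval = 10; ifspeed = 100000000    # bits/sec; an assumed 100 Mbit/sec card
        rkb = (sample["hme:0:hme0:rbytes64", 2] - sample["hme:0:hme0:rbytes64", 1]) / 1024 / interval
        wkb = (sample["hme:0:hme0:obytes64", 2] - sample["hme:0:hme0:obytes64", 1]) / 1024 / interval
        util = 100 * (rkb > wkb ? rkb : wkb) * 1024 * 8 / ifspeed
        printf("rKB/s %.1f  wKB/s %.1f  approx %%util %.2f\n", rkb, wkb, util)
    }'

nicstat does essentially this, continuously and for every interface, and, where available, reads the interface speed from Kstat rather than assuming it.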

nx.se Tool

The SE Toolkit provides a language, SymbEL, that lets us write our own performance monitoring tools. It also contains a collection of example tools, including nx.se, which helps us calculate network utilization.

$ se nx.se 1
Current tcp RtoMin is 400, interval 1, start Sun Oct  9 10:36:42 2005

10:36:43 Iseg/s Oseg/s InKB/s OuKB/s Rst/s  Atf/s  Ret%  Icn/s  Ocn/s
tcp      841.6    4.0  74.98   0.27   0.00   0.00   0.0   0.00   0.00
Name    Ipkt/s Opkt/s InKB/s OuKB/s IErr/s OErr/s Coll% NoCP/s Defr/s
hme0     845.5  420.8 119.91  22.56  0.000  0.000   0.0   0.00   0.00

10:36:44 Iseg/s Oseg/s InKB/s OuKB/s Rst/s  Atf/s  Ret%  Icn/s  Ocn/s
tcp      584.2    5.0  77.97   0.60   0.00   0.00   0.0   0.00   0.00
Name    Ipkt/s Opkt/s InKB/s OuKB/s IErr/s OErr/s Coll% NoCP/s Defr/s
hme0     579.2  297.1 107.95  16.16  0.000  0.000   0.0   0.00   0.00

Having KB/s lets us determine how busy our network interfaces are. Other useful fields include collision percent (Coll%), no-can-puts per second (NoCP/s), and defers per second (Defr/s), which may be evidence of network saturation. nx.se also prints useful TCP statistics above the interface lines.

nicstat Tool

nicstat, a tool from the freeware K9Toolkit, reports network utilization and saturation by interface. It is available as a C or Perl kstat consumer.

$ nicstat 1
    Time   Int   rKb/s   wKb/s   rPk/s   wPk/s    rAvs    wAvs   %Util      Sat
10:48:30  hme0    4.02    4.39    6.14    6.36  670.73  706.50    0.07     0.00
10:48:31  hme0    0.29    0.50    3.00    4.00   98.00  127.00    0.01     0.00
10:48:32  hme0    1.35    4.23   14.00   15.00   98.79  289.00    0.05     0.00
10:48:33  hme0   67.73   19.08  426.00  207.00  162.81   94.39    0.71     0.00
10:48:34  hme0  315.22  128.91 1249.00  723.00  258.44  182.58    3.64     0.00
10:48:35  hme0  529.96   67.53 2045.00 1046.00  265.37   66.11    4.89     0.00
10:48:36  hme0  454.14   62.16 2294.00 1163.00  202.72   54.73    4.23     0.00
10:48:37  hme0   93.55   15.78  583.00  295.00  164.31   54.77    0.90     0.00
10:48:38  hme0   74.84   32.41  516.00  298.00  148.52  111.38    0.88     0.00
10:48:39  hme0    0.76    4.17    7.00    9.00  111.43  474.00    0.04     0.00
                                                 See K9Toolkit; nicstat.c or nicstat.pl

In this example output of nicstat, we can see a small amount of network traffic, peaking at 4.89% utilization.

The following are the switches available from version 0.98 of the Perl version of nicstat.

$ nicstat -h
USAGE: nicstat [-hsz] [-i int[,int...]] | [interval [count]]
   eg, nicstat               # print a 1 second sample
       nicstat 1             # print continually every 1 second
       nicstat 1 5           # print 5 times, every 1 second
       nicstat -s            # summary output
       nicstat -i hme0       # print hme0 only

The utilization measurement is based on the current throughput divided by the maximum speed of the interface (if available through kstat). The saturation measurement is a value that reflects errors due to saturation if kstat found any.

This method for calculating utilization does not account for other per-packet costs, such as Ethernet preamble. These costs are generally minor, and we assume they do not greatly affect the utilization value.

SNMP

It’s worth mentioning that useful data is also available in SNMP, which is used by software such as MRTG (a popular freeware network utilization plotter). A full install of Solaris 10 provides Net-SNMP, putting many of the commands under /usr/sfw/bin.

Here we demonstrate the use of snmpget to fetch interface statistics.

$ snmpget -v1 -c public localhost ifOutOctets.2 ifInOctets.2
IF-MIB::ifOutOctets.2 = Counter32: 10016768
IF-MIB::ifInOctets.2 = Counter32: 11932165

The .2 corresponds to our primary interface. These values are the outbound and inbound bytes. In Solaris 10 a full description of the IF-MIB statistics can be found in /etc/sma/snmp/mibs/IF-MIB.txt.

Other software products fetch and present data from the IF-MIB, which is a valid and desirable approach for monitoring network interface activity. Solaris 10’s Net-SNMP supports SNMPv3, which provides the User-based Security Model (USM) for the creation of user accounts and encrypted sessions, and the View-based Access Control Model (VACM) to restrict users to view only the statistics they need. When configured, they greatly enhance the security of SNMP. For information on each, see snmpusm(1M) and snmpvacm(1M).

Net-SNMP also provides a version of netstat called snmpnetstat. Besides the standard output using -i, snmpnetstat has a -o option to print octets (bytes) instead of packets.

$ snmpnetstat -v1 -c public -i localhost
Name      Mtu Network   Address        Ipkts Ierrs Opkts Oerrs Queue
lo0      8232 loopback  localhost       6639     0  6639     0     0
hme0     1500 192.168.1 titan         385635     0 86686     0     0
hme0:1   1500 192.168.1 192.168.1.204      0     0     0     0     0
$
$ snmpnetstat -v1 -c public -o localhost
Name    Network   Address        Ioctets   Ooctets
lo0     loopback  localhost            0         0
hme0    192.168.1 titan          98241462 55500788
hme0:1  192.168.1 192.168.1.204        0         0

Input bytes (Ioctets) and output bytes (Ooctets) can be seen. Now all we need is an interval for this information to be of real value.

# snmpnetstat -v1 -c public -I hme0 -o localhost 10
     input   (hme0)     output                        input  (Total)     output
   packets     errs    packets     errs    colls    packets     errs    packets     errs    colls
    386946        0      88300        0        0     395919        0      97273        0        0
       452        0        797        0        0        538        0        883        0        0
         0        0          0        0        0          0        0          0        0        0
         0        0          0        0        0          0        0          0        0        0
       844        0       1588        0        0        952        0       1696        0        0
         0        0          0        0        0          0        0          0        0        0
         0        0          0        0        0          0        0          0        0        0
       548        0        965        0        0        656        0       1073        0        0
         0        0          0        0        0          0        0          0        0        0
         0        0          0        0        0          0        0          0        0        0
^C

Even though we provided the -o option, by also providing an interval (10 seconds), we caused the snmpnetstat command to revert to printing packet counts. Also, the statistics that SNMP uses are only updated every 30 seconds. Future versions of snmpnetstat may correctly print octets with intervals.
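
Given that 30-second update granularity, one workaround sketch is to fetch the octet counters twice with snmpget, at least 30 to 60 seconds apart, and compute the deltas yourself (the community string and interface index are the same assumptions as in the earlier snmpget example):

$ snmpget -v1 -c public localhost ifOutOctets.2 ifInOctets.2
IF-MIB::ifOutOctets.2 = Counter32: 10016768
IF-MIB::ifInOctets.2 = Counter32: 11932165
$ sleep 60; snmpget -v1 -c public localhost ifOutOctets.2 ifInOctets.2

Subtracting the first pair of counters from the second and dividing by 60 gives outbound and inbound bytes per second; bear in mind that Counter32 values eventually wrap.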

checkcable Tool

Sometimes network performance problems can be caused by incorrect auto-negotiation that selects a lower speed or duplex. There is a way to retrieve the settings that a particular network card has chosen, but there is not one way that works for all cards. It usually involves poking around with the ndd command and using a lookup table for your particular card to decipher the output of ndd.

Consistent data for network cards should be available from Kstat, and Sun does have a standard in place. However, many of the network drivers were written before the standard existed, and some were written by third-party companies. The state of consistent Kstat data for network cards is improving and at some point in the future should boil down to a few well-understood one-liners of the kstat command, such as: kstat -p | grep <interfacename>.

In the meantime, it is not always that easy. Some data is available from kstat, much of it from ndd. The following example demonstrates fetching ndd data for an hme card.

# ndd /dev/hme link_status
1
# ndd /dev/hme link_speed
1
# ndd /dev/hme link_mode
1

These numbers indicate a connected or unconnected cable (link_status), the current speed (link_speed), and the duplex (link_mode). What 1 or some other number means depends on the card. The available ndd variables for this card can be listed with ndd -get /dev/hme ? (the -get is optional).

SunSolve has Infodocs to explain what these numbers mean for various cards. If you have mainly one type of card at your site, you eventually remember what the numbers mean. As a very general rule, “1” is often good, “0” is often bad; so “0” for link_mode probably means half duplex.
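
For drivers that export the newer Kstat names, some of this can also be read without ndd. The following is a sketch using the hme statistics that appeared in the netstat -k dump earlier (ifspeed, link_up, link_duplex); the names available, and the meaning of their values, remain driver-dependent.

# kstat -p hme:0:hme0:ifspeed hme:0:hme0:link_up hme:0:hme0:link_duplex
hme:0:hme0:ifspeed      100000000
hme:0:hme0:link_up      1
hme:0:hme0:link_duplex  0

Deciphering such values by hand is exactly what the next tool automates.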

The checkcable tool, available from the K9Toolkit, deciphers many card types for you.[3] It uses both kstat and ndd to retrieve the network settings because not all the data is available from either one alone.

# checkcable
Interface    Link Duplex  Speed  AutoNEG
hme0          UP   FULL    100        ON

# checkcable
Interface    Link Duplex  Speed  AutoNEG
hme0         DOWN   FULL    100       ON

The first output has the hme0 interface as link-connected (UP), full duplex, 100 Mbits/sec, and auto-negotiation on; the second output was with the cable disconnected. The speed and duplex must match what the switch expects for the network link to function correctly.

There are still some cards that checkcable is unable to view. The state of card statistics is slowly getting better; eventually, checkcable will not be needed to translate these numbers.

ping Tool

ping is the classic network probe tool; it uses ICMP messages to test the response time of round-trip packets.

$ ping -s mars
PING mars: 56 data bytes
64 bytes from mars (192.168.1.1): icmp_seq=0. time=0.623 ms
64 bytes from mars (192.168.1.1): icmp_seq=1. time=0.415 ms
64 bytes from mars (192.168.1.1): icmp_seq=2. time=0.464 ms
^C
----mars PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.415/0.501/0.623/0.11

So we discover that mars is up and that it responds within 1 millisecond. Solaris 10 enhanced ping to print three decimal places for the times. ping is handy to see if a host is up, but that’s about all.
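
One refinement worth knowing: Solaris ping -s accepts an optional data size and packet count, so a fixed-length test can be scripted without pressing Ctrl-C. A sketch, reusing the host above with an arbitrary count of ten 56-byte probes:

$ ping -s mars 56 10

After the tenth reply, the same packet-loss and round-trip summary is printed automatically.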

traceroute Tool

traceroute sends a series of UDP packets with an increasing TTL, and by watching the ICMP time-expired replies, we can discover the hops to a host (assuming the hops actually decrement the TTL):

$ traceroute www.sun.com
traceroute: Warning: Multiple interfaces found; using 260.241.10.2 @ hme0:1
traceroute to www.sun.com (209.249.116.195), 30 hops max, 40 byte packets
 1  tpggate (260.241.10.1)  21.224 ms  25.933 ms  25.281 ms
 2  172.31.217.14 (172.31.217.14)  49.565 ms  27.736 ms  25.297 ms
 3  syd-nxg-ero-zeu-2-gi-3-0.tpgi.com.au (220.244.229.9)  25.454 ms  22.066 ms  26.237
ms
 4  syd-nxg-ibo-l3-ge-0-2.tpgi.com.au (220.244.229.132)  42.216 ms *  37.675 ms
 5  220-245-178-199.tpgi.com.au (220.245.178.199)  40.727 ms  38.291 ms  41.468 ms
 6  syd-nxg-ibo-ero-ge-1-0.tpgi.com.au (220.245.178.193)  37.437 ms  38.223 ms  38.373
ms
 7  Gi11-2.gw2.syd1.asianetcom.net (202.147.41.193)  24.953 ms  25.191 ms  26.242 ms
 8  po2-1.gw1.nrt4.asianetcom.net (202.147.55.110)  155.811 ms  169.330 ms  153.217 ms
 9  Abovenet.POS2-2.gw1.nrt4.asianetcom.net (203.192.129.42)  150.477 ms  157.173 ms *
10  so-6-0-0.mpr3.sjc2.us.above.net (64.125.27.54)  240.077 ms  239.733 ms  244.015 ms
11  so-0-0-0.mpr4.sjc2.us.above.net (64.125.30.2)  224.560 ms  228.681 ms  221.149 ms
12  64.125.27.102 (64.125.27.102)  241.229 ms  235.481 ms  238.868 ms
13  * *^C

The times may provide some idea of where a network bottleneck is. We must also remember that networks are dynamic and that this may not be the permanent path to that host (and could even change as traceroute executes).

snoop Tool

The power to capture and inspect network packets live from the interface is provided by snoop, an indispensable tool. When network events don’t seem to be working, it can be of great value to verify that the packets are actually arriving in the first place.

snoop places a network device in “promiscuous mode” so that all network traffic, addressed to this host or not, is captured. You ought to have permission to be sniffing network traffic, as often snoop displays traffic contents—including user names and passwords.

# snoop
Using device /dev/hme (promiscuous mode)
     jupiter -> titan        TCP D=22 S=36570 Ack=1602213819 Seq=1929072366 Len=0
Win=49640
      titan -> jupiter      TCP D=36570 S=22 Push Ack=1929072366 Seq=1602213819 Len=128
Win=49640
     jupiter -> titan        TCP D=22 S=36570 Ack=1602213947 Seq=1929072366 Len=0
Win=49640
...

The most useful options include the following: don’t resolve hostnames (-r), change the device (-d), output to a capture file (-o), input from a capture file (-i), print semi-verbose (-V, one line per protocol layer), print full-verbose (-v, all details), and send packets to /dev/audio (-a). Packet filter syntax can also be applied.

By using output files, you can try different options when reading them (-v, -V). Moreover, outputting to a file incurs less CPU overhead than the default live output.
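
As a sketch combining these options (the interface, capture file name, and port 22 filter are arbitrary choices): capture without name resolution to a file, then re-read the file at whatever verbosity is required.

# snoop -r -d hme0 -o /tmp/ssh.cap port 22
# snoop -i /tmp/ssh.cap -V | head

The trailing packet-filter expression follows the syntax described in snoop(1M).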

TTCP

Test TCP is a freeware tool that tests the throughput between two hops. It needs to be run on both the source and destination, and a Java version of TTCP runs on many different operating systems. Beware, it floods the network with traffic to perform its test.

The following is run on one host as a receiver. The options used here made the test run for a reasonable duration—around 60 seconds.

$ java ttcp -r -n 65536
Receive: buflen= 8192  nbuf= 65536 port= 5001

Then the following was run on the second host as the transmitter:

$ java ttcp -t jupiter -n 65536
Transmit: buflen= 8192  nbuf= 65536 port= 5001
Transmit connection:
  Socket[addr=jupiter/192.168.1.5,port=5001,localport=46684].
Transmit: 536870912 bytes in 46010 milli-seconds = 11668.57 KB/sec (93348.56 Kbps).

This example shows that the speed between these hosts for this test is around 11.6 megabytes per second.

It is not uncommon for people to test the speed of their network by transferring a large file around. This may be better than it sounds; any test is better than none.
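
A crude sketch of the large-file approach, with an arbitrary host name and a 100-Mbyte transfer; note that this times the whole pipeline, including ssh encryption overhead, not just the network:

$ /usr/bin/time dd if=/dev/zero bs=1024k count=100 | ssh jupiter "cat > /dev/null"

Dividing 100 Mbytes by the reported real time gives a rough throughput figure; around 9 seconds, for example, would correspond to roughly 11 Mbytes/sec.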

pathchar Tool

After writing traceroute, Van Jacobson wrote pathchar, an amazing tool that identifies network bottlenecks. It operates like traceroute, but rather than printing response time to each hop, it prints bandwidth between each pair of hops.

# pathchar 192.168.1.1
pathchar to 192.168.1.1 (192.168.1.1)
 doing 32 probes at each of 64 to 1500 by 32
 0 localhost
 |    30 Mb/s,   79 us (562 us)
 1 neptune.drinks.com (192.168.2.1)
 |    44 Mb/s,   195 us (1.23 ms)
 2 mars.drinks.com (192.168.1.1)
2 hops, rtt 547 us (1.23 ms), bottleneck  30 Mb/s, pipe 7555 bytes

This tool works by sending “shaped” traffic over a long interval and carefully measuring the response times. It doesn’t flood the network like TTCP does.

Binaries for pathchar can be found on the Internet, but the source code has yet to be released. Some open source versions, based on the ideas from pathchar, are in development.

ntop Tool

ntop sniffs network traffic and issues comprehensive reports through a web interface. It is very useful, so long as you can (and are allowed to) snoop the traffic of interest. It is driven from a web browser aimed at localhost:3000.

# ntop
ntop v.1.3.1 MT [sparc-sun-solaris2.8] listening on [hme0,hme0:0,hme0:1].
Copyright 1998-2000 by Luca Deri <[email protected]>
Get the freshest ntop from http://www.ntop.org/

Initialising...
Loading plugins (if any)...
WARNING: Unable to find the plugins/ directory.
Waiting for HTTP connections on port 3000...
Sniffying...

NFS Client Statistics: nfsstat -c

$ nfsstat -c

Client rpc:
Connection oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs   timers
202499     0          0          0          0          0          0
cantconn   nomem      interrupts
0          0          0
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds   badverfs
0          0          0          0          0          0          0
timers     nomem      cantsend
0          0          0

Client nfs:
calls     badcalls  clgets    cltoomany
200657    0         200657    7
Version 2: (0 calls)
null     getattr  setattr  root     lookup   readlink read     wrcache
0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0%
write    create   remove   rename   link     symlink  mkdir    rmdir
0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0%
readdir  statfs
0 0%     0 0%
Version 3: (0 calls)
null        getattr     setattr     lookup      access      readlink
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
read        write       create      mkdir       symlink     mknod
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
remove      rmdir       rename      link        readdir     readdirplus
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
fsstat      fsinfo      pathconf    commit
0 0%        0 0%        0 0%        0 0%

Client statistics printed include retransmissions (retrans), unmatched replies (badxids), and timeouts. See nfsstat(1M) for verbose descriptions.

NFS Server Statistics: nfsstat -s

The server version of nfsstat prints a screenful of statistics to pick through. Of interest are the value of badcalls and the counts for each file operation.

$ nfsstat -s

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
5897288    0          0          0          0          372803     0
Connectionless:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
87324      0          0          0          0          0          0

...
Version 4: (949163 calls)
null                compound
3175 0%             945988 99%
Version 4: (3284515 operations)
reserved            access              close               commit
0 0%                72954 2%            199208 6%           2948 0%
create              delegpurge          delegreturn         getattr
4 0%                0 0%                16451 0%            734376 22%
getfh               link                lock                lockt
345041 10%          6 0%                101 0%              0 0%
locku               lookup              lookupp             nverify
101 0%              145651 4%           5715 0%             171515 5%
open                openattr            open_confirm        open_downgrade
199410 6%           0 0%                271 0%              0 0%
putfh               putpubfh            putrootfh           read
914825 27%          0 0%                581 0%              130451 3%
readdir             readlink            remove              rename
5661 0%             11905 0%            15 0%               201 0%
renew               restorefh           savefh              secinfo
30765 0%            140543 4%           146336 4%           277 0%
setattr             setclientid         setclientid_confirm verify
23 0%               26 0%               26 0%               10 0%
write               release_lockowner   illegal
9118 0%             0 0%                0 0%
...

Per-Process Network Statistics

In this section, we explore tools to monitor network usage by process. We build on DTrace to provide these tools.

In previous versions of Solaris it was difficult to measure network I/O by process, just as it was difficult to measure disk I/O by process. Both of these problems have been solved with DTrace—disk by process is now trivial with the io provider. However, at the time of this writing, a network provider has yet to be released. So while network-by-process measurement is possible with DTrace, it is not straightforward.[4]

tcptop Tool

tcptop, a DTrace-based tool from the freeware DTraceToolkit, summarizes TCP traffic by system and by process.

# tcptop 10
Sampling... Please wait.
2005 Jul  5 04:55:25,  load: 1.11,  TCPin:      2 Kb,  TCPout:    110 Kb

 UID    PID LADDR           LPORT FADDR           FPORT      SIZE NAME
 100  20876 192.168.1.5     36396 192.168.1.1        79      1160 finger
 100  20875 192.168.1.5     36395 192.168.1.1        79      1160 finger
 100  20878 192.168.1.5     36397 192.168.1.1        23      1303 telnet
 100  20877 192.168.1.5       859 192.168.1.1       514    115712 rcp
                                                                      See DTraceToolkit

The first line of the above report contains the date, CPU load average (one minute), and two TCP statistics, TCPin and TCPout. These are from the TCP MIB; they track local host traffic as well as physical network traffic.

The rest of the report contains per-process data and includes fields for the PID, local address (LADDR), local port (LPORT), remote address (FADDR[5]), remote port (FPORT), number of bytes transferred during the sample (SIZE), and process name (NAME). tcptop retrieves this data by tracing TCP events.

This particular version of tcptop captures these per-process details only for connections that were established while tcptop was running and could observe the handshake. Since the TCPin and TCPout fields are for all traffic, a large discrepancy between them and the per-process details may suggest that we missed observing handshakes for busy sessions.[6]

It turns out to be quite difficult to kludge DTrace to trace network traffic by process such that it identifies all types of traffic correctly 100% of the time. Without a network provider, the events must be traced from fbt. The fbt provider is an unstable interface, meaning that probes may change for minor releases of Solaris.[7]

The greatest problem with using DTrace to trace network traffic by process is that both inbound and outbound traffic are asynchronous to the process, so we can’t simply look at the on-CPU PID when the network event occurs. Even from userland, where the PID is correct, there is no single code path by which TCP traffic is generated that we could simply trace then and there. We also have to contend with many other issues; for example, when tracing traffic to the telnet server, we would want to identify in.telnetd as the process responsible (principle of least surprise?). However, in.telnetd never steps onto the CPU after establishing the connection, and instead we find that telnet traffic is caused by a plethora of unlikely suspects: ls, find, date, etc. With enough D code, though, we can solve these issues with DTrace.

tcpsnoop Tool

The tcpsnoop tool is the companion to tcptop. It is also from the DTraceToolkit and prints TCP packet details live by process.

# tcpsnoop
  UID    PID LADDR           LPORT DR RADDR           RPORT  SIZE CMD
  100  20892 192.168.1.5     36398 -> 192.168.1.1        79     54 finger
  100  20892 192.168.1.5     36398 <- 192.168.1.1        79     66 finger
  100  20892 192.168.1.5     36398 -> 192.168.1.1        79     54 finger
  100  20892 192.168.1.5     36398 -> 192.168.1.1        79     56 finger
  100  20892 192.168.1.5     36398 <- 192.168.1.1        79     54 finger
  100  20892 192.168.1.5     36398 <- 192.168.1.1        79    606 finger
  100  20892 192.168.1.5     36398 -> 192.168.1.1        79     54 finger
  100  20892 192.168.1.5     36398 <- 192.168.1.1        79     54 finger
  100  20892 192.168.1.5     36398 -> 192.168.1.1        79     54 finger
  100  20892 192.168.1.5     36398 -> 192.168.1.1        79     54 finger
  100  20892 192.168.1.5     36398 <- 192.168.1.1        79     54 finger
    0    242 192.168.1.5        23 <- 192.168.1.1     54224     54 inetd
    0    242 192.168.1.5        23 -> 192.168.1.1     54224     54 inetd
    0    242 192.168.1.5        23 <- 192.168.1.1     54224     54 inetd
    0    242 192.168.1.5        23 <- 192.168.1.1     54224     78 inetd
    0    242 192.168.1.5        23 -> 192.168.1.1     54224     54 inetd
    0  20893 192.168.1.5        23 -> 192.168.1.1     54224     57 in.telnetd
    0  20893 192.168.1.5        23 <- 192.168.1.1     54224     54 in.telnetd
    0  20893 192.168.1.5        23 -> 192.168.1.1     54224     78 in.telnetd
...

In the above output we can see a PID column and packet details, the result of tracking TCP traffic that has travelled on external interfaces. While running, tcpsnoop captured the details of an outbound finger command and an inbound telnet.

As with tcptop, this version of tcpsnoop examines only sessions that connected while tcpsnoop has been running. This behavior can be useful: when the tcpsnoop tool is run over an existing network session (such as ssh), it doesn’t trace its own output.

TCP Statistics

The TCP code maintains a large number of statistics for MIB-II, which is used by SNMP. These counters track details such as the number of established connections and the total number of segments sent, received, and retransmitted.

They could be used as an indicator of activity, although you must remember that these statistics usually include loopback traffic. You could also use them when you are troubleshooting networking issues: A large number of retransmissions may be a sign that a network fault is causing packet loss.

TCP statistics can be found in the following places:

  • TCP MIB-II statistics, listed in /etc/sma/snmp/mibs/TCP-MIB.txt on Solaris 10 or in RFC 2012; available from both the SNMP daemon and Kstat.

  • Solaris additions to TCP MIB-II, listed in /usr/include/inet/mib2.h and available from Kstat.

  • Extra Kstat collections maintained by the module.

TCP Statistics Internals

To explain how the TCP MIB statistics are maintained, we show tcp.c code that updates two of these statistics.

static int
tcp_snmp_get(queue_t *q, mblk_t *mpctl)
{
...
                        tcp = connp->conn_tcp;
                        UPDATE_MIB(&tcp_mib, tcpInSegs, tcp->tcp_ibsegs);
                        tcp->tcp_ibsegs = 0;
                        UPDATE_MIB(&tcp_mib, tcpOutSegs, tcp->tcp_obsegs);
                        tcp->tcp_obsegs = 0;
...
                                                              See uts/common/inet/tcp/tcp.c

UPDATE_MIB increases the statistic by the argument specified. Here the tcpInSegs and tcpOutSegs statistics are updated. These are standard TCP MIB-II statistics that the Solaris 10 SNMP daemon[8] makes available; they are defined on Solaris 10 in the TCP-MIB.txt[9] file.

The tcp.c code also maintains additional MIB statistics. For example,

void
tcp_rput_data(void *arg, mblk_t *mp, void *arg2)
{
...
                BUMP_MIB(&tcp_mib, tcpInDataInorderSegs);
                UPDATE_MIB(&tcp_mib, tcpInDataInorderBytes, seg_len);
...
                                                          See uts/common/inet/tcp/tcp.c

BUMP_MIB increments the tcpInDataInorderSegs statistic by 1; UPDATE_MIB then adds seg_len to tcpInDataInorderBytes. These are not RFC-defined standard statistics, and as such they are not currently made available by the SNMP daemon. They are some of the many extra and useful statistics maintained by the Solaris code.

A list of these extra statistics is in mib2.h after the comment that reads /* In addition to MIB-II */.

typedef struct mib2_tcp {
...
/* In addition to MIB-II */
...
        /* total # of data segments received in order */
        Counter tcpInDataInorderSegs;
        /* total # of data bytes received in order */
        Counter tcpInDataInorderBytes;
...
                                                           See /usr/include/inet/mib2.h

Table 7.2 lists all the extra statistics. The kstat view of TCP statistics (see Section 7.7.2) is copied from these MIB counters during each kstat update.

Table 7.2. TCP Kstat/MIB-II Statistics

Statistic                Description
tcpRtoAlgorithm          Algorithm used for transmit timeout value
tcpRtoMin                Minimum retransmit timeout (ms)
tcpRtoMax                Maximum retransmit timeout (ms)
tcpMaxConn               Maximum # of connections supported
tcpActiveOpens           # of direct transitions CLOSED -> SYN-SENT
tcpPassiveOpens          # of direct transitions LISTEN -> SYN-RCVD
tcpAttemptFails          # of direct SYN-SENT/RCVD -> CLOSED/LISTEN
tcpEstabResets           # of direct ESTABLISHED/CLOSE-WAIT -> CLOSED
tcpCurrEstab             # of connections ESTABLISHED or CLOSE-WAIT
tcpInSegs                Total # of segments received
tcpOutSegs               Total # of segments sent
tcpRetransSegs           Total # of segments retransmitted
tcpConnTableSize         Size of tcpConnEntry_t
tcpOutRsts               # of segments sent with RST flag
...
/* In addition to MIB-II */
tcpOutDataSegs           Total # of data segments sent
tcpOutDataBytes          Total # of bytes in data segments sent
tcpRetransBytes          Total # of bytes in segments retransmitted
tcpOutAck                Total # of ACKs sent
tcpOutAckDelayed         Total # of delayed ACKs sent
tcpOutUrg                Total # of segments sent with the urg flag on
tcpOutWinUpdate          Total # of window updates sent
tcpOutWinProbe           Total # of zero window probes sent
tcpOutControl            Total # of control segments sent (syn, fin, rst)
tcpOutFastRetrans        Total # of segments sent due to “fast retransmit”
tcpInAckSegs             Total # of ACK segments received
tcpInAckBytes            Total # of bytes ACKed
tcpInDupAck              Total # of duplicate ACKs
tcpInAckUnsent           Total # of ACKs acknowledging unsent data
tcpInDataInorderSegs     Total # of data segments received in order
tcpInDataInorderBytes    Total # of data bytes received in order
tcpInDataUnorderSegs     Total # of data segments received out of order
tcpInDataUnorderBytes    Total # of data bytes received out of order
tcpInDataDupSegs         Total # of complete duplicate data segments received
tcpInDataDupBytes        Total # of bytes in the complete duplicate data segments received
tcpInDataPartDupSegs     Total # of partial duplicate data segments received
tcpInDataPartDupBytes    Total # of bytes in the partial duplicate data segments received
tcpInDataPastWinSegs     Total # of data segments received past the window
tcpInDataPastWinBytes    Total # of data bytes received past the window
tcpInWinProbe            Total # of zero window probes received
tcpInWinUpdate           Total # of window updates received
tcpInClosed              Total # of data segments received after the connection has closed
tcpRttNoUpdate           Total # of failed attempts to update the rtt estimate
tcpRttUpdate             Total # of successful attempts to update the rtt estimate
tcpTimRetrans            Total # of retransmit timeouts
tcpTimRetransDrop        Total # of retransmit timeouts dropping the connection
tcpTimKeepalive          Total # of keepalive timeouts
tcpTimKeepaliveProbe     Total # of keepalive timeouts sending a probe
tcpTimKeepaliveDrop      Total # of keepalive timeouts dropping the connection
tcpListenDrop            Total # of connections refused because backlog is full on listen
tcpListenDropQ0          Total # of connections refused because half-open queue (q0) is full
tcpHalfOpenDrop          Total # of connections dropped from a full half-open queue (q0)
tcpOutSackRetransSegs    Total # of retransmitted segments by SACK retransmission
tcp6ConnTableSize        Size of tcp6ConnEntry_t

This behavior leads to an interesting situation: Since kstat provides a copy of all the MIB statistics that Solaris maintains, kstat provides a greater number of statistics than does SNMP. So to delve into TCP statistics in greater detail, use Kstat commands such as kstat and netstat -s.

TCP Statistics from Kstat

The kstat command can fetch all the TCP MIB statistics.

$ kstat -n tcp
module: tcp                             instance: 0
name:   tcp                             class:    mib2
        activeOpens                     812
        attemptFails                    312
        connTableSize                   56
        connTableSize6                  84
        crtime                          3.203529053
        currEstab                       5
        estabResets                     2
...

You can print all statistics from the TCP module by specifying -m instead of -n; -m includes tcpstat, a collection of extra kstats that are not contained in the Solaris TCP MIB. And you can print individual statistics by using -s.
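
For example, to watch retransmissions during troubleshooting, an individual counter can be polled at an interval. A sketch (note that the Kstat names drop the tcp prefix, as the listing above shows, so tcpRetransSegs becomes retransSegs):

$ kstat -p tcp:0:tcp:retransSegs 10

Each line prints the cumulative counter; a value that climbs steadily between samples is the cue to investigate further.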

TCP Statistics Reference

Table 7.2 lists all the TCP MIB-II statistics and the Solaris additions. This list was taken from mib2.h. See TCP-MIB.txt for more information about some of these statistics.

TCP Statistics from DTrace

DTrace can probe TCP MIB statistics as they are incremented, since the BUMP_MIB and UPDATE_MIB macros have been instrumented with mib provider probes. The following command lists the TCP MIB statistic probes available from DTrace.

# dtrace -ln 'mib:ip::tcp*'
   ID   PROVIDER            MODULE                          FUNCTION NAME
  789        mib                ip                  tcp_find_pktinfo tcpInErrs
  790        mib                ip                   ip_rput_data_v6 tcpInErrs
  791        mib                ip                      ip_tcp_input tcpInErrs
 1163        mib                ip                     tcp_ack_timer tcpOutAckDelayed
 1164        mib                ip              tcp_xmit_early_reset tcpOutRsts
 1165        mib                ip                      tcp_xmit_ctl tcpOutRsts
...

While it can be useful to trace these counters as they are incremented, some needs are still unfulfilled. For example, tracking network activity by PID, UID, project, or zone is not possible with these probes alone: There is no guarantee that they will fire in the context of the responsible thread, so DTrace’s variables such as execname and pid sometimes match the wrong process.

DTrace can be useful to capture these statistics during an interval of your choice. The following one-liner does this until you press Ctrl-C.

# dtrace -n 'mib:::tcp* { @[probename] = sum(arg0); }'
dtrace: description 'mib:::tcp* ' matched 93 probes
^C

  tcpInDataInorderSegs                                              7
  tcpInAckSegs                                                     14
  tcpRttUpdate                                                     14
  tcpInDataInorderBytes                                            16
  tcpOutDataSegs                                                   16
  tcpOutDataBytes                                                4889
  tcpInAckBytes                                                  4934
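
A variation on the same one-liner samples for a fixed ten seconds instead of waiting for Ctrl-C; here it is restricted to the retransmission counters, assuming the tcpRetrans* probes exist on your build (this can be checked with dtrace -ln 'mib:::tcpRetrans*').

# dtrace -n 'mib:::tcpRetrans* { @[probename] = sum(arg0); }
    tick-10sec { printa(@); exit(0); }'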

IP Statistics

As with TCP statistics, Solaris maintains a large number of statistics in the IP code for SNMP MIB-II. These often exclude loopback traffic and may be a better indicator of physical network activity than are the TCP statistics. They can also help with troubleshooting as various packet errors are tracked. The IP statistics can be found in the following places:

  • IP MIB-II statistics, listed in /etc/sma/snmp/mibs/IP-MIB.txt on Solaris 10 or in RFC 2011; available from both the SNMP daemon and Kstat.

  • Solaris additions to IP MIB-II, listed in /usr/include/inet/mib2.h and available from Kstat.

  • Extra Kstat collections maintained by the module.

IP Statistics Internals

The IP MIB statistics are maintained in the Solaris code in the same way as the TCP MIB statistics (see Section 7.9.1). The Solaris code also maintains additional IP statistics to extend MIB-II.

IP Statistics from Kstat

The kstat command can fetch all the IP MIB statistics as follows.

$ kstat -n ip
module: ip                              instance: 0
name:   ip                              class:    mib2
        addrEntrySize                   96
        crtime                          3.207689216
        defaultTTL                      255
        forwDatagrams                   0
        forwProhibits                   0
        forwarding                      2
        fragCreates                     0
...

You can print all Kstats from the IP module by using -m instead of -n. The -m option includes extra Kstats that are not related to the Solaris IP MIB. You can print individual statistics with -s.
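
For instance, a quick check of the error-related IP counters; the glob pattern is simply a guess at what is interesting and can be adjusted:

$ kstat -p -m ip -n ip -s '*Err*'

Nonzero checksum or header error counts are worth following up on.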

IP Statistics Reference

Table 7.3 lists all the IP MIB-II statistics and the Solaris additions. This list was taken from mib2.h. See IP-MIB.txt for more information about some of these statistics.

Table 7.3. IP Kstat/MIB-II Statistics

Statistic                Description
ipForwarding             Forwarder? 1 = gateway; 2 = not gateway
ipDefaultTTL             Default time-to-live for IPH
ipInReceives             # of input datagrams
ipInHdrErrors            # of datagram discards for IPH error
ipInAddrErrors           # of datagram discards for bad address
ipForwDatagrams          # of datagrams being forwarded
ipInUnknownProtos        # of datagram discards for unknown protocol
ipInDiscards             # of datagram discards of good datagrams
ipInDelivers             # of datagrams sent upstream
ipOutRequests            # of outdatagrams received from upstream
ipOutDiscards            # of good outdatagrams discarded
ipOutNoRoutes            # of outdatagram discards: no route found
ipReasmTimeout           Seconds received fragments are held for reassembly
ipReasmReqds             # of IP fragments needing reassembly
ipReasmOKs               # of datagrams reassembled
ipReasmFails             # of reassembly failures (not datagram count)
ipFragOKs                # of datagrams fragmented
ipFragFails              # of datagram discards for no fragmentation set
ipFragCreates            # of datagram fragments from fragmentation
ipAddrEntrySize          Size of mib2_ipAddrEntry_t
ipRouteEntrySize         Size of mib2_ipRouteEntry_t
ipNetToMediaEntrySize    Size of mib2_ipNetToMediaEntry_t
ipRoutingDiscards        # of valid route entries discarded
...
/* The following defined in MIB-II as part of TCP and UDP groups */
tcpInErrs                Total # of segments received with error
udpNoPorts               # of received datagrams not deliverable (no application)
...
/* In addition to MIB-II */
ipInCksumErrs            # of bad IP header checksums
ipReasmDuplicates        # of complete duplicates in reassembly
ipReasmPartDups          # of partial duplicates in reassembly
ipForwProhibits          # of packets not forwarded for administrative reasons
udpInCksumErrs           # of UDP packets with bad UDP checksums
udpInOverflows           # of UDP packets dropped because of queue overflow
rawipInOverflows         # of RAW IP packets (all IP protocols except UDP, TCP, and ICMP) dropped because of queue overflow
...
/* The following are private IPSEC MIB */
ipsecInSucceeded         # of incoming packets that succeeded with policy checks
ipsecInFailed            # of incoming packets that failed policy checks
ipMemberEntrySize        Size of ip_member_t
ipInIPv6                 # of IPv6 packets received by IPv4 and dropped
ipOutIPv6                # of IPv6 packets transmitted by ip_wput
ipOutSwitchIPv6          # of times ip_wput has switched to become ip_wput_v6

IP Statistics from DTrace

As with TCP, DTrace can trace these statistics as they are updated. The following command lists the probes that correspond to IP MIB statistics whose name begins with “ip” (which is not quite all of them; see Table 7.3).

# dtrace -ln 'mib:ip::ip*'
   ID   PROVIDER            MODULE                          FUNCTION NAME
  691        mib                ip                  ndp_input_advert ipv6IfIcmpInBad...
  692        mib                ip                 ndp_input_solicit ipv6IfIcmpInBad...
  693        mib                ip                ill_frag_free_pkts ipReasmFails
  694        mib                ip                  ill_frag_timeout ipReasmFails
  695        mib                ip                  ill_frag_timeout ipv6ReasmFails
  697        mib                ip                   ip_wput_frag_v6 ipv6OutFragOKs
...

And the following one-liner tracks these statistics until Ctrl-C is pressed.

# dtrace -n 'mib:::ip* { @[probename] = sum(arg0); }'
dtrace: description 'mib:::ip* ' matched 209 probes
^C

  ipInDelivers                                                      6
  ipInReceives                                                     91
  ipOutRequests                                                   153

ICMP Statistics

ICMP statistics are maintained by Solaris in the same way as TCP and IP, as explained in the previous two sections. To avoid unnecessary repetition, we list only key points and differences in this section.

The MIB-II statistics are in /etc/sma/snmp/mibs/IP-MIB.txt and in RFC 2011, along with IP. Solaris has a few additions to the ICMP MIB.

ICMP Statistics from Kstat

The following command prints all of the ICMP MIB statistics.

$ kstat -n icmp
module: ip                              instance: 0
name:   icmp                            class:    mib2
        crtime                          3.207830752
        inAddrMaskReps                  0
        inAddrMasks                     0
...

ICMP Statistics Reference

Table 7.4 from mib2.h lists ICMP MIB-II statistics plus Solaris additions.

Table 7.4. ICMP Kstat/MIB-II Statistics

Statistic                Description
icmpInMsgs               Total # of received ICMP messages
icmpInErrors             # of received ICMP messages with errors
icmpInDestUnreachs       # of received “dest unreachable” messages
icmpInTimeExcds          # of received “time exceeded” messages
icmpInParmProbs          # of received “parameter problem” messages
icmpInSrcQuenchs         # of received “source quench” messages
icmpInRedirects          # of received “ICMP redirect” messages
icmpInEchos              # of received “echo request” messages
icmpInEchoReps           # of received “echo reply” messages
icmpInTimestamps         # of received “timestamp” messages
icmpInTimestampReps      # of received “timestamp reply” messages
icmpInAddrMasks          # of received “address mask request” messages
icmpInAddrMaskReps       # of received “address mask reply” messages
icmpOutMsgs              Total # of sent ICMP messages
icmpOutErrors            # of messages not sent for internal ICMP errors
icmpOutDestUnreachs      # of “dest unreachable” messages sent
icmpOutTimeExcds         # of “time exceeded” messages sent
icmpOutParmProbs         # of “parameter problem” messages sent
icmpOutSrcQuenchs        # of “source quench” messages sent
icmpOutRedirects         # of “ICMP redirect” messages sent
icmpOutEchos             # of “Echo request” messages sent
icmpOutEchoReps          # of “Echo reply” messages sent
icmpOutTimestamps        # of “timestamp request” messages sent
icmpOutTimestampReps     # of “timestamp reply” messages sent
icmpOutAddrMasks         # of “address mask request” messages sent
icmpOutAddrMaskReps      # of “address mask reply” messages sent
...
/* In addition to MIB-II */
icmpInCksumErrs          # of received packets with checksum errors
icmpInUnknowns           # of received packets with unknown codes
icmpInFragNeeded         # of received unreachables with “fragmentation needed”
icmpOutFragNeeded        # of sent unreachables with “fragmentation needed”
icmpOutDrops             # of messages not sent since original packet was broadcast/multicast or an ICMP error packet
icmpInOverflows          # of ICMP packets dropped because of queue overflow
icmpInBadRedirects       # of received “ICMP redirect” messages that are bad and thus ignored

ICMP Statistics from DTrace

The following DTrace one-liner tracks ICMP MIB events.

# dtrace -n 'mib:::icmp* { @[probename] = sum(arg0); }'
dtrace: description 'mib:::icmp* ' matched 34 probes
^C

  icmpInEchoReps                                                    1
  icmpInEchos                                                       3
  icmpOutEchoReps                                                   3
  icmpOutMsgs                                                       3
  icmpInMsgs                                                        4

Tracing Raw Network Functions

The fbt provider traces raw kernel functions, but its use is not recommended, because kernel functions may change between minor releases of Solaris, breaking DTrace scripts that use them. On the other hand, being able to trace these events is certainly better than not having the option at all.

The following example counts the frequency of TCP/IP functions called during this demonstration.

# dtrace -n 'fbt:ip::entry { @[probefunc] = count(); }'
dtrace: description 'fbt:ip::entry ' matched 1757 probes
^C
...
  ip_cksum                                                        519
  tcp_wput_data                                                  3058
  tcp_output                                                     3165
  tcp_wput                                                       3195
  squeue_enter                                                   3203

This one-liner matched 1,757 probes for this build of Solaris 10 (the number of matches will vary for other builds). Another line of attack is the network driver itself. Here we demonstrate hme.

#  dtrace -n 'fbt:hme::entry { @[probefunc] = count(); }'
dtrace: description 'fbt:hme::entry '  matched 100 probes
^C
...
  hmewput                                                         221
  hmeintr                                                         320
  hme_check_dma_handle                                            668
  hme_check_acc_handle                                            762

The 100 probes provided by this hme driver may be sufficient for the task at hand and are easier to use than 1,757 probes. rtls provides even fewer probes, 33.
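
As one more example at the driver layer, the hmeintr routine from the listing above can be counted per second; as this section warns, a raw function probe like this is unstable and may not exist on other builds or for other drivers.

# dtrace -n 'fbt:hme:hmeintr:entry { @ = count(); }
    tick-1sec { printa("hme interrupts/sec: %@d\n", @); clear(@); }'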



[1] Management Information Base, a collection of documented statistics that SNMP uses

[2] The secret -k option that dumped all kstats has been dropped in Solaris 10 anyway.

[3] checkcable is written in Perl; the source can be read to see supported cards and contribution history.

[4] The DTraceToolkit’s TCP tools are the only ones so far to measure tcp/pid events correctly. The shortest of the tools is over 400 lines. If a net provider is released, that script might be only 12 lines.

[5] We chose the name “FADDR” after looking too long at the connection structure (struct conn_s).

[6] A newer version of tcptop is in development to examine all sessions regardless of connection time (and has probably been released by the time you are reading this). The new version has an additional command-line option to revert to the older behavior.

[7] Not only can the fbt probes change, but they have done so; a recent change to the kernel has changed TCP slightly, meaning that many of the DTrace TCP scripts need updating.

[8] The SNMP daemon is based on Net-SNMP.

[9] This file from RFC 2012 defines updated TCP statistics for SNMPv2. Also of interest is RFC 1213, the original MIB-II statistics, which include TCP.
