In Chapter 3 we discussed the data structures used by all interfaces and the initialization of those data structures. In this chapter we show how the Ethernet device driver operates once it has been initialized and is receiving and transmitting frames. The second half of this chapter covers the generic ioctl
commands for configuring network devices. Chapter 5 covers the SLIP and loopback drivers.
We won’t go through the entire source code for the Ethernet driver, since it is around 1,000 lines of C code (half of which is concerned with the hardware details of one particular interface card), but we do look at the device-independent Ethernet code and how the driver interfaces with the rest of the kernel.
If the reader is interested in going through the source code for a driver, the Net/3 release contains the source code for many different interfaces. Access to the interface’s technical specifications is required to understand the device-specific commands. Figure 4.1 shows the various drivers provided with Net/3, including the LANCE driver, which we discuss in this text.
Table 4.1. Ethernet drivers available in Net/3.
Device | File |
---|---|
DEC DEUNA Interface | |
3Com Ethernet Interface | |
Excelan EXOS 204 Interface | |
Interlan Ethernet Communications Controller | |
Interlan NP100 Ethernet Communications Controller | |
Digital Q-BUS to NI Adapter | |
CMC ENP-20 Ethernet Controller | |
Excelan EXOS 202(VME) & 203(QBUS) | |
ACC VERSAbus Ethernet Controller | |
AMD 7990 LANCE Interface | |
NE2000 Ethernet | |
Western Digital 8003 Ethernet Adapter | |
Network device drivers are accessed through the seven function pointers in the ifnet
structure (Figure 3.6). Figure 4.2 lists the entry points to our three example drivers.
Table 4.2. Interface functions for the example drivers.
| ifnet | Ethernet | SLIP | Loopback | Description |
|---|---|---|---|---|
| | | | | hardware initialization |
| | | | | accept and queue frame for transmission |
| | | | | begin transmission of frame |
| | | | | output complete (unused) |
| | | | | handle |
| | | | | reset the device to a known state |
| | | | | watch the device for failures or collect statistics |
Input functions are not included in Figure 4.2 as they are interrupt-driven for network devices. The configuration of interrupt service routines is hardware-dependent and beyond the scope of this book. We’ll identify the functions that handle device interrupts, but not the mechanism by which these functions are invoked.
Only the if_output and if_ioctl functions are called with any consistency. if_init, if_done, and if_reset are never called or only called from device-specific code (e.g., leinit is called directly by leioctl). if_start is called only by the ether_output function.
The code for the Ethernet device driver and the generic interface ioctls
resides in two headers and three C files, which are listed in Figure 4.3.
The global variables shown in Figure 4.4 include the protocol input queues, the LANCE interface structure, and the Ethernet broadcast address.
le_softc
is an array, since there can be several Ethernet interfaces.
The statistics collected in the ifnet
structure for each interface are described in Figure 4.5.
Table 4.5. Statistics maintained in the ifnet
structure.
| | Description | Used by SNMP |
|---|---|---|
| | #collisions on CSMA interfaces | |
| | total #bytes received | • |
| | #packets received with input errors | • |
| | #packets received as multicasts or broadcasts | • |
| | #packets received on interface | • |
| | #packets dropped on input, by this interface | • |
| | time of last change to statistics | • |
| | #packets destined for unsupported protocol | • |
| | total #bytes sent | • |
| | #output errors on interface | • |
| | #packets sent as multicasts | • |
| | #packets sent on interface | • |
| | #packets dropped during output | • |
| | #packets in output queue | |
Figure 4.6 shows some sample output from the netstat
command, which includes statistics from the ifnet
structure.
Table 4.6. Sample interface statistics.
    Name  Mtu   Network     Address            Ipkts Ierrs     Opkts Oerrs    Coll
    le0   1500  <Link>      8.0.9.13.d.33   28680519   814  29234729    12  942798
    le0   1500  128.32.33   128.32.33.5     28680519   814  29234729    12  942798
    sl0*  296   <Link>                         54036     0     45402     0       0
    sl0*  296   128.32.33   128.32.33.5        54036     0     45402     0       0
    sl1   296   <Link>                         40397     0     33544     0       0
    sl1   296   128.32.33   128.32.33.5        40397     0     33544     0       0
    sl2*  296   <Link>                             0     0         0     0       0
    sl3*  296   <Link>                             0     0         0     0       0
    lo0   1536  <Link>                        493599     0    493599     0       0
    lo0   1536  127         127.0.0.1         493599     0    493599     0       0
The first column contains if_name and if_unit displayed as a string. If the interface is shut down (IFF_UP is not set), an asterisk appears next to the name. In Figure 4.6, sl0, sl2, and sl3 are shut down.
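As a minimal sketch of how such a name string could be formed, the following helper joins the name and unit number and appends an asterisk when IFF_UP is clear. The function format_ifname is hypothetical (it is not part of netstat or Net/3); the value of IFF_UP is the one used by BSD systems.

```c
#include <stdio.h>

#define IFF_UP 0x1   /* interface is up (value used by BSD <net/if.h>) */

/*
 * Hypothetical helper, for illustration only: build the string shown
 * in the "Name" column from if_name, if_unit, and if_flags.
 * Returns the value returned by snprintf.
 */
int
format_ifname(char *out, size_t outlen, const char *if_name,
              int if_unit, int if_flags)
{
    /* an asterisk marks an interface that is shut down */
    return snprintf(out, outlen, "%s%d%s", if_name, if_unit,
                    (if_flags & IFF_UP) ? "" : "*");
}
```

Called with the name "sl", unit 0, and no flags set, the helper produces "sl0*", which matches the form of the shut-down SLIP entries in the sample output.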
The second column shows if_mtu. The output under the “Network” and “Address” headings depends on the type of address. For link-level addresses, the contents of sdl_data from the sockaddr_dl structure are displayed. For IP addresses, the subnet and unicast addresses are displayed. The remaining columns are if_ipackets, if_ierrors, if_opackets, if_oerrors, and if_collisions.
Approximately 3% of the packets collide on output (942,798/29,234,729 ≈ 0.032).
The SLIP output queues are never full on this machine since there are no output errors for the SLIP interfaces.
The 12 Ethernet output errors are problems detected by the LANCE hardware during transmission. Some of these errors may also be counted as collisions.
The 814 Ethernet input errors are also problems detected by the hardware, such as packets that are too short or that have invalid checksums.
Figure 4.7 shows a single interface entry object (ifEntry
) from the SNMP interface table (ifTable
), which is constructed from the ifnet
structures for each interface.
Table 4.7. Variables in interface table: ifTable
.
| Interface table, index = <ifIndex> | | |
|---|---|---|
| SNMP variable | | Description |
| | | uniquely identifies the interface |
| | | text name of interface |
| | | type of interface (e.g., Ethernet, SLIP, etc.) |
| | | MTU of the interface in bytes |
| | (see text) | nominal speed of the interface in bits per second |
| | | media address (from |
| | (see text) | desired state of the interface ( |
| | | operational state of the interface ( |
| | (see text) | last time the statistics changed |
| | | total #input bytes |
| | | #input unicast packets |
| | | #input broadcast or multicast packets |
| | | #packets discarded because of implementation limits |
| | | #packets with errors |
| | | #packets destined to an unknown protocol |
| | | #output bytes |
| | | #output unicast packets |
| | | #output broadcast or multicast packets |
| | | #output packets dropped because of implementation limits |
| | | #output packets dropped because of errors |
| | | output queue length |
| | | SNMP object ID for media-specific information (not implemented) |
The ISODE SNMP agent derives ifSpeed
from if_type
and maintains an internal variable for ifAdminStatus
. The agent reports ifLastChange
based on if_lastchange
in the ifnet
structure but relative to the agent’s boot time, not the boot time of the system. The agent returns a null variable for ifSpecific
.
Net/3 Ethernet device drivers all follow the same general design. This is common for most Unix device drivers because the writer of a driver for a new interface card often starts with a working driver for another card and modifies it. In this section we’ll provide a brief overview of the Ethernet standard and outline the design of an Ethernet driver. We’ll refer to the LANCE driver to illustrate the design.
Figure 4.8 illustrates Ethernet encapsulation of an IP packet.
Ethernet frames consist of 48-bit destination and source addresses followed by a 16-bit type field that identifies the format of the data carried by the frame. For IP packets, the type is 0x0800
(2048). The frame is terminated with a 32-bit CRC (cyclic redundancy check), which detects errors in the frame.
We are describing the original Ethernet framing standard published in 1982 by Digital Equipment Corp., Intel Corp., and Xerox Corp., as it is the most common form used today in TCP/IP networks. An alternative form is specified by the IEEE (Institute of Electrical and Electronics Engineers) 802.2 and 802.3 standards. Section 2.2 in Volume 1 describes the differences between the two forms. See [Stallings 1987] for more information on the IEEE standards.
Encapsulation of IP packets for Ethernet is specified by RFC 894 [Hornig 1984] and for 802.3 networks by RFC 1042 [Postel and Reynolds 1988].
We will refer to the 48-bit Ethernet addresses as hardware addresses. The translation from IP to hardware addresses is done by the ARP protocol described in Chapter 21 (RFC 826 [Plummer 1982]) and from hardware to IP addresses by the RARP protocol (RFC 903 [Finlayson et al. 1984]). Ethernet addresses come in two types, unicast and multicast. A unicast address specifies a single Ethernet interface, and a multicast address specifies a group of Ethernet interfaces. An Ethernet broadcast is a multicast received by all interfaces. Ethernet unicast addresses are assigned by the device’s manufacturer, although some devices allow the address to be changed by software.
Some DECNET protocols require the hardware addresses of a multihomed host to be identical, so DECNET must be able to change the Ethernet unicast address of a device.
Figure 4.9 illustrates the data structures and functions that are part of the Ethernet interface.
In figures, a function is identified by an ellipse (leintr), data structures by a box (le_softc[0]), and a group of functions by a rounded box (ARP protocol).
In the top left corner of Figure 4.9 we show the input queues for the OSI Connectionless Network Layer (clnl
) protocol, IP, and ARP. We won’t say anything more about clnlintrq
, but include it to emphasize that ether_input
demultiplexes Ethernet frames into multiple protocol queues.
Technically, OSI uses the term Connectionless Network Protocol (CLNP versus CLNL) but we show the terminology used by the Net/3 code. The official standard for CLNP is ISO 8473. [Stallings 1993] summarizes the standard.
The le_softc
interface structure is in the center of Figure 4.9. We are interested only in the ifnet
and arpcom
portions of the structure. The remaining portions are specific to the LANCE hardware. We showed the ifnet
structure in Figure 3.6 and the arpcom
structure in Figure 3.26.
We start with the reception of Ethernet frames. For now, we assume that the hardware has been initialized and the system has been configured so that leintr
is called when the interface generates an interrupt. In normal operation, an Ethernet interface receives frames destined for its unicast hardware address and for the Ethernet broadcast address. When a complete frame is available, the interface generates an interrupt and the kernel calls leintr
.
In Chapter 12, we’ll see that many Ethernet interfaces may be configured to receive Ethernet multicast frames (other than broadcasts).
Some interfaces can be configured to run in promiscuous mode, in which the interface receives all frames that appear on the network. The tcpdump program described in Volume 1 can take advantage of this feature using BPF.
leintr
examines the hardware and, if a frame has arrived, calls leread
to transfer the frame from the interface to a chain of mbufs (with m_devget
). If the hardware reports that a frame transmission has completed or an error has been detected (such as a bad checksum), leintr
updates the appropriate interface statistics, resets the hardware, and calls lestart
, which attempts to transmit another frame.
All Ethernet device drivers deliver their received frames to ether_input
for further processing. The mbuf chain constructed by the device driver does not include the Ethernet header, so it is passed as a separate argument to ether_input
. The ether_header
structure is shown in Figure 4.10.
Table 4.10. The ether_header
structure.
    ------------------------------------------------------------ if_ether.h
    38 struct ether_header {
    39     u_char  ether_dhost[6];     /* Ethernet destination address */
    40     u_char  ether_shost[6];     /* Ethernet source address */
    41     u_short ether_type;         /* Ethernet frame type */
    42 };
    ------------------------------------------------------------ if_ether.h
38-42
The Ethernet CRC is not generally available. It is computed and checked by the interface hardware, which discards frames that arrive with an invalid CRC. The Ethernet device driver is responsible for converting ether_type
between network and host byte order. Outside of the driver, it is always in host byte order.
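The byte-order conversion can be sketched outside the kernel as follows. This is an illustration under our own assumptions, not the driver's code: struct eth_hdr and eth_parse are hypothetical names, and the type field is assembled by hand from the two big-endian bytes on the wire rather than with ntohs, so that the example does not depend on the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space mirror of the 14-byte Ethernet header. */
struct eth_hdr {
    uint8_t  dhost[6];   /* destination address */
    uint8_t  shost[6];   /* source address */
    uint16_t type;       /* frame type, stored here in host byte order */
};

/*
 * Fill *h from the first 14 bytes of a received frame.  The type
 * field arrives in network (big-endian) byte order, so the two bytes
 * are combined explicitly.
 */
void
eth_parse(const uint8_t *frame, struct eth_hdr *h)
{
    memcpy(h->dhost, frame, 6);
    memcpy(h->shost, frame + 6, 6);
    h->type = (uint16_t)((frame[12] << 8) | frame[13]);
}
```

For a frame whose bytes 12 and 13 are 0x08 and 0x00, the sketch yields a type of 0x0800, the value for IP, regardless of the host's byte order.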
The leread
function (Figure 4.11) starts with a contiguous buffer of memory passed to it by leintr
and constructs an ether_header
structure and a chain of mbufs. The chain contains the data from the Ethernet frame. leread
also passes the incoming frame to BPF.
Table 4.11. leread
function.
    ------------------------------------------------------------ if_le.c
    528 leread(unit, buf, len)
    529 int     unit;
    530 char   *buf;
    531 int     len;
    532 {
    533     struct le_softc *le = &le_softc[unit];
    534     struct ether_header *et;
    535     struct mbuf *m;
    536     int     off, resid, flags;
    537     le->sc_if.if_ipackets++;
    538     et = (struct ether_header *) buf;
    539     et->ether_type = ntohs((u_short) et->ether_type);
    540     /* adjust input length to account for header and CRC */
    541     len = len - sizeof(struct ether_header) - 4;
    542     off = 0;
    543     if (len <= 0) {
    544         if (ledebug)
    545             log(LOG_WARNING,
    546                 "le%d: ierror(runt packet): from %s: len=%d\n",
    547                 unit, ether_sprintf(et->ether_shost), len);
    548         le->sc_runt++;
    549         le->sc_if.if_ierrors++;
    550         return;
    551     }
    552     flags = 0;
    553     if (bcmp((caddr_t) etherbroadcastaddr,
    554              (caddr_t) et->ether_dhost, sizeof(etherbroadcastaddr)) == 0)
    555         flags |= M_BCAST;
    556     if (et->ether_dhost[0] & 1)
    557         flags |= M_MCAST;
    558     /*
    559      * Check if there's a bpf filter listening on this interface.
    560      * If so, hand off the raw packet to enet.
    561      */
    562     if (le->sc_if.if_bpf) {
    563         bpf_tap(le->sc_if.if_bpf, buf, len + sizeof(struct ether_header));
    564         /*
    565          * Keep the packet if it's a broadcast or has our
    566          * physical ethernet address (or if we support
    567          * multicast and it's one).
    568          */
    569         if ((flags & (M_BCAST | M_MCAST)) == 0 &&
    570             bcmp(et->ether_dhost, le->sc_addr,
    571                  sizeof(et->ether_dhost)) != 0)
    572             return;
    573     }
    574     /*
    575      * Pull packet off interface.  Off is nonzero if packet
    576      * has trailing header; m_devget will then force this header
    577      * information to be at the front, but we still have to drop
    578      * the type and length which are at the front of any trailer data.
    579      */
    580     m = m_devget((char *) (et + 1), len, off, &le->sc_if, 0);
    581     if (m == 0)
    582         return;
    583     m->m_flags |= flags;
    584     ether_input(&le->sc_if, et, m);
    585 }
    ------------------------------------------------------------ if_le.c
528-539
The leintr function passes three arguments to leread: unit, which identifies the particular interface card that received a frame; buf, which points to the received frame; and len, the number of bytes in the frame (including the header and the CRC).
The function constructs the ether_header
structure by pointing et
to the front of the buffer and converting the Ethernet type value to host byte order.
540-551
The number of data bytes is computed by subtracting the sizes of the Ethernet header and the CRC from len
. Runt packets, which are too short to be a valid Ethernet frame, are logged, counted, and discarded.
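The length adjustment amounts to simple arithmetic, sketched below. The constant and function names are our own, chosen for illustration; the sizes (a 14-byte header and a 4-byte CRC) and the `len <= 0` test are taken from the text and from leread.

```c
#define ETHER_HDR_LEN 14   /* two 6-byte addresses plus the 2-byte type */
#define ETHER_CRC_LEN 4    /* 32-bit CRC at the end of the frame */

/* Number of data bytes in a frame of framelen bytes (header and CRC included). */
int
ether_payload_len(int framelen)
{
    return framelen - ETHER_HDR_LEN - ETHER_CRC_LEN;
}

/* Nonzero if the frame carries no data, the condition leread treats as a runt. */
int
ether_is_runt(int framelen)
{
    return ether_payload_len(framelen) <= 0;
}
```

A 64-byte frame therefore carries 46 bytes of data, and any frame of 18 bytes or fewer is rejected by this test.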
552-557
Next, the destination address is examined to determine if it is the Ethernet broadcast or an Ethernet multicast address. The Ethernet broadcast address is a special case of an Ethernet multicast address; it has every bit set. etherbroadcastaddr
is an array defined as
u_char etherbroadcastaddr[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
This is a convenient way to define a 48-bit value in C. This technique works only if we assume that characters are 8-bit values, something that isn’t guaranteed by ANSI C.
If bcmp
reports that etherbroadcastaddr
and ether_dhost
are the same, the M_BCAST
flag is set.
An Ethernet multicast address is identified by the low-order bit of the most significant byte of the address. Figure 4.12 illustrates this.
In Chapter 12 we’ll see that not all Ethernet multicast frames are IP multicast datagrams and that IP must examine the packet further.
If the multicast bit is on in the address, M_MCAST
is set in the mbuf header. The order of the tests is important: first ether_input
compares the entire 48-bit address to the Ethernet broadcast address, and if they are different it checks the low-order bit of the most significant byte to identify an Ethernet multicast address (Exercise 4.1).
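The ordered tests can be sketched as a stand-alone function. This follows the else-if form used by ether_input, where a broadcast address is reported only as a broadcast. The function name ether_dst_flags is hypothetical, and the flag values are given only so that the example is self-contained.

```c
#include <string.h>

#define M_BCAST 0x0100   /* illustrative flag values for this sketch */
#define M_MCAST 0x0200

/*
 * Classify a 6-byte Ethernet destination address.
 * Returns M_BCAST, M_MCAST, or 0 for a unicast address.
 */
int
ether_dst_flags(const unsigned char dhost[6])
{
    static const unsigned char bcast[6] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    /* test for broadcast first: it also has the multicast bit set */
    if (memcmp(bcast, dhost, sizeof(bcast)) == 0)
        return M_BCAST;
    /* low-order bit of the first byte marks a multicast address */
    if (dhost[0] & 1)
        return M_MCAST;
    return 0;
}
```

If the two tests were reversed, the broadcast address would be classified as an ordinary multicast, since its first byte is 0xff and the low-order bit is set.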
558-573
If the interface is tapped by BPF, the frame is passed directly to BPF by calling bpf_tap
. We’ll see that for SLIP and the loopback interfaces, a special BPF frame is constructed since those networks do not have a link-level header (unlike Ethernet).
When an interface is tapped by BPF, it can be configured to run in promiscuous mode and receive all Ethernet frames that appear on the network instead of the subset of frames normally received by the hardware. The packet is discarded by leread
if it was sent to a unicast address that does not match the interface’s address.
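The filtering step that follows the BPF tap can be sketched as a predicate. The name keep_frame and the flag values are assumptions made for this example; the logic mirrors the test on lines 569-572 of leread.

```c
#include <string.h>

#define M_BCAST 0x0100   /* illustrative flag values for this sketch */
#define M_MCAST 0x0200

/*
 * Return 1 if a frame received in promiscuous mode should be passed
 * up the protocol stack, 0 if it was only of interest to BPF.
 * flags holds the broadcast/multicast bits computed for the frame.
 */
int
keep_frame(int flags, const unsigned char dhost[6],
           const unsigned char myaddr[6])
{
    /* a unicast frame addressed to some other interface is dropped */
    if ((flags & (M_BCAST | M_MCAST)) == 0 &&
        memcmp(dhost, myaddr, 6) != 0)
        return 0;
    return 1;
}
```

Broadcast and multicast frames are always kept at this point; deciding whether a multicast frame is wanted is left to the protocol layer.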
574-585
m_devget
(Section 2.6) copies the data from the buffer passed to leread
to an mbuf chain it allocates. The first argument to m_devget
points to the first byte after the Ethernet header, which is the first data byte in the frame. If m_devget
runs out of memory, leread
returns immediately. Otherwise the broadcast and multicast flags are set in the first mbuf in the chain, and ether_input
processes the packet.
ether_input
, shown in Figure 4.13, examines the ether_header
structure to determine the type of data that has been received and then queues the received packet for processing.
Table 4.13. ether_input
function.
    ------------------------------------------------------------ if_ethersubr.c
    196 void
    197 ether_input(ifp, eh, m)
    198 struct ifnet *ifp;
    199 struct ether_header *eh;
    200 struct mbuf *m;
    201 {
    202     struct ifqueue *inq;
    203     struct llc *l;
    204     struct arpcom *ac = (struct arpcom *) ifp;
    205     int     s;
    206     if ((ifp->if_flags & IFF_UP) == 0) {
    207         m_freem(m);
    208         return;
    209     }
    210     ifp->if_lastchange = time;
    211     ifp->if_ibytes += m->m_pkthdr.len + sizeof(*eh);
    212     if (bcmp((caddr_t) etherbroadcastaddr, (caddr_t) eh->ether_dhost,
    213              sizeof(etherbroadcastaddr)) == 0)
    214         m->m_flags |= M_BCAST;
    215     else if (eh->ether_dhost[0] & 1)
    216         m->m_flags |= M_MCAST;
    217     if (m->m_flags & (M_BCAST | M_MCAST))
    218         ifp->if_imcasts++;
    219     switch (eh->ether_type) {
    220     case ETHERTYPE_IP:
    221         schednetisr(NETISR_IP);
    222         inq = &ipintrq;
    223         break;
    224     case ETHERTYPE_ARP:
    225         schednetisr(NETISR_ARP);
    226         inq = &arpintrq;
    227         break;
    228     default:
    229         if (eh->ether_type > ETHERMTU) {
    230             m_freem(m);
    231             return;
    232         }
                /* OSI code */
    307     }
    308     s = splimp();
    309     if (IF_QFULL(inq)) {
    310         IF_DROP(inq);
    311         m_freem(m);
    312     } else
    313         IF_ENQUEUE(inq, m);
    314     splx(s);
    315 }
    ------------------------------------------------------------ if_ethersubr.c
196-209
The arguments to ether_input
are ifp
, a pointer to the receiving interface’s ifnet
structure; eh
, a pointer to the Ethernet header of the received packet; and m
, a pointer to the received packet (excluding the Ethernet header).
Any packets that arrive on an inoperative interface are silently discarded. The interface may not have been configured with a protocol address, or may have been disabled by an explicit request from the ifconfig(8) program (Section 6.6).
210-218
The variable time
is a global timeval
structure that the kernel maintains with the current time and date, as the number of seconds and microseconds past the Unix Epoch (00:00:00 January 1, 1970, Coordinated Universal Time [UTC]). A brief discussion of UTC can be found in [Itano and Ramsey 1993]. We’ll encounter the timeval
structure throughout the Net/3 sources:
    struct timeval {
        long    tv_sec;     /* seconds */
        long    tv_usec;    /* and microseconds */
    };
ether_input
updates if_lastchange
with the current time and increments if_ibytes
by the size of the incoming packet (the packet length plus the 14-byte Ethernet header).
Next, ether_input
repeats the tests done by leread
to determine if the packet is a broadcast or multicast packet.
Some kernels may not have been compiled with the BPF code, so the test must also be done in
ether_input
.
219-227
ether_input
jumps according to the Ethernet type field. For an IP packet, schednetisr
schedules an IP software interrupt and the IP input queue, ipintrq
, is selected. For an ARP packet, the ARP software interrupt is scheduled and arpintrq
is selected.
An isr is an interrupt service routine.
In previous BSD releases, ARP packets were processed immediately while at the network interrupt level by calling arpinput directly. By queueing the packets, they can be processed at the software interrupt level.

If other Ethernet types are to be handled, a kernel programmer would add additional cases here. Alternatively, a process can receive other Ethernet types using BPF. For example, RARP servers are normally implemented using BPF under Net/3.
228-307
The default
case processes unrecognized Ethernet types or packets that are encapsulated according to the 802.3 standard (such as the OSI connectionless transport). The Ethernet type field and the 802.3 length field occupy the same position in an Ethernet frame. The two encapsulations can be distinguished because the range of types in an Ethernet encapsulation is distinct from the range of lengths in the 802.3 encapsulation (Figure 4.14). We have omitted the OSI code. [Stallings 1993] contains a description of the OSI link-level protocols.
There are many additional Ethernet type values that are assigned to various protocols; we don’t show them in Figure 4.14. RFC 1700 [Reynolds and Postel 1994] contains a list of the more common types.
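The dispatch on the type field, including the type-versus-length test in the default case, can be sketched as follows. The enum and the function classify_ether_type are hypothetical; the constants carry the usual Ethernet values (0x0800 for IP, 0x0806 for ARP, and an MTU of 1500 bytes).

```c
#define ETHERMTU      1500     /* largest 802.3 length value */
#define ETHERTYPE_IP  0x0800
#define ETHERTYPE_ARP 0x0806

enum frame_kind {
    FRAME_IP,            /* queue for IP input */
    FRAME_ARP,           /* queue for ARP input */
    FRAME_8023,          /* value is a length: 802.3 encapsulation */
    FRAME_UNKNOWN_TYPE   /* unrecognized Ethernet type: discard */
};

/* Classify the 16-bit field that follows the source address (host byte order). */
enum frame_kind
classify_ether_type(unsigned short ether_type)
{
    switch (ether_type) {
    case ETHERTYPE_IP:
        return FRAME_IP;
    case ETHERTYPE_ARP:
        return FRAME_ARP;
    default:
        /* values above the MTU cannot be lengths, so they are types */
        if (ether_type > ETHERMTU)
            return FRAME_UNKNOWN_TYPE;
        return FRAME_8023;
    }
}
```

In this sketch a value such as 0x8035 (the type assigned to RARP) falls into the unknown-type branch, which corresponds to the point where ether_input frees the packet.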
308-315
Finally, ether_input
places the packet on the selected queue or discards the packet if the queue is full. We’ll see in Figures 7.23 and 21.16 that the default limit for the IP and ARP input queues is 50 (ipqmaxlen
) packets each.
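The queue-or-drop behavior can be modeled with a small bounded queue. This is a user-space sketch, not the kernel's ifqueue: the structure and function names are our own, and the interrupt blocking done by splimp and splx is omitted.

```c
#include <stddef.h>

struct pkt {
    struct pkt *next;
};

/* Simplified stand-in for a protocol input queue. */
struct pktq {
    struct pkt *head;
    struct pkt *tail;
    int len;       /* packets currently queued */
    int maxlen;    /* limit, e.g. 50 */
    int drops;     /* packets discarded because the queue was full */
};

/* Append p to the queue; returns 0 on success, -1 if p was dropped. */
int
pktq_enqueue(struct pktq *q, struct pkt *p)
{
    if (q->len >= q->maxlen) {   /* queue full: count the drop */
        q->drops++;
        return -1;
    }
    p->next = NULL;
    if (q->tail == NULL)
        q->head = p;
    else
        q->tail->next = p;
    q->tail = p;
    q->len++;
    return 0;
}

/* Remove and return the packet at the head, or NULL if the queue is empty. */
struct pkt *
pktq_dequeue(struct pktq *q)
{
    struct pkt *p = q->head;

    if (p != NULL) {
        if ((q->head = p->next) == NULL)
            q->tail = NULL;
        p->next = NULL;
        q->len--;
    }
    return p;
}
```

The caller remains responsible for freeing a packet that could not be queued, as ether_input does with m_freem.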
When ether_input
returns, the device driver tells the hardware that it is ready to receive the next packet, which may already be present in the device. The packet input queues are processed when the software interrupt scheduled by schednetisr
occurs (Section 1.12). Specifically, ipintr
is called to process the packets on the IP input queue, and arpintr
is called to process the packets on the ARP input queue.
We now examine the output of Ethernet frames, which starts when a network-level protocol such as IP calls the if_output
function, specified in the interface’s ifnet
structure. The if_output
function for all Ethernet devices is ether_output
(Figure 4.2). ether_output
takes the data portion of an Ethernet frame, encapsulates it with the 14-byte Ethernet header, and places it on the interface’s send queue. This is a large function so we describe it in four parts:
verification,
protocol-specific processing,
frame construction, and
interface queueing.
Figure 4.15 includes the first part of the function.
Table 4.15. ether_output
function: verification.
    ------------------------------------------------------------ if_ethersubr.c
    49 int
    50 ether_output(ifp, m0, dst, rt0)
    51 struct ifnet *ifp;
    52 struct mbuf *m0;
    53 struct sockaddr *dst;
    54 struct rtentry *rt0;
    55 {
    56     short   type;
    57     int     s, error = 0;
    58     u_char  edst[6];
    59     struct mbuf *m = m0;
    60     struct rtentry *rt;
    61     struct mbuf *mcopy = (struct mbuf *) 0;
    62     struct ether_header *eh;
    63     int     off, len = m->m_pkthdr.len;
    64     struct arpcom *ac = (struct arpcom *) ifp;
    65     if ((ifp->if_flags & (IFF_UP | IFF_RUNNING)) != (IFF_UP | IFF_RUNNING))
    66         senderr(ENETDOWN);
    67     ifp->if_lastchange = time;
    68     if (rt = rt0) {
    69         if ((rt->rt_flags & RTF_UP) == 0) {
    70             if (rt0 = rt = rtalloc1(dst, 1))
    71                 rt->rt_refcnt--;
    72             else
    73                 senderr(EHOSTUNREACH);
    74         }
    75         if (rt->rt_flags & RTF_GATEWAY) {
    76             if (rt->rt_gwroute == 0)
    77                 goto lookup;
    78             if (((rt = rt->rt_gwroute)->rt_flags & RTF_UP) == 0) {
    79                 rtfree(rt);
    80                 rt = rt0;
    81               lookup: rt->rt_gwroute = rtalloc1(rt->rt_gateway, 1);
    82                 if ((rt = rt->rt_gwroute) == 0)
    83                     senderr(EHOSTUNREACH);
    84             }
    85         }
    86         if (rt->rt_flags & RTF_REJECT)
    87             if (rt->rt_rmx.rmx_expire == 0 ||
    88                 time.tv_sec < rt->rt_rmx.rmx_expire)
    89                 senderr(rt == rt0 ? EHOSTDOWN : EHOSTUNREACH);
    90     }
    ------------------------------------------------------------ if_ethersubr.c
49-64
The arguments to ether_output
are ifp
, which points to the outgoing interface’s ifnet
structure; m0
, the packet to send; dst
, the destination address of the packet; and rt0
, routing information.
65-67
The macro senderr
is called throughout ether_output
.
#define senderr(e) { error = (e); goto bad;}
senderr
saves the error code and jumps to bad
at the end of the function, where the packet is discarded and ether_output
returns error
.
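The error-handling pattern built around senderr can be sketched in isolation. The function demo_output and its parameters are hypothetical, and a malloc'ed buffer stands in for the mbuf chain; only the macro itself is taken from the text.

```c
#include <errno.h>
#include <stdlib.h>

#define senderr(e) { error = (e); goto bad;}

/*
 * Hypothetical output routine illustrating the pattern: every failure
 * path records an error code and jumps to a single cleanup label.
 * pkt must point to memory obtained from malloc (or be NULL).
 */
int
demo_output(int if_up, int queue_full, void *pkt)
{
    int error = 0;

    if (!if_up)
        senderr(ENETDOWN);
    if (queue_full)
        senderr(ENOBUFS);

    /* success: a real driver would queue pkt; this sketch just releases it */
    free(pkt);
    return (error);

bad:
    if (pkt)
        free(pkt);      /* discard the packet on any error */
    return (error);
}
```

Keeping the cleanup in one place means each new check added to the function needs only a single senderr call, and the packet cannot be leaked on an error path.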
If the interface is up and running, ether_output
updates the last change time for the interface. Otherwise, it returns ENETDOWN
.
68-74
rt0
points to the routing entry located by ip_output
and passed to ether_output
. If ether_output
is called from BPF, rt0
can be null, in which case control passes to the code in Figure 4.16. Otherwise, the route is verified. If the route is not valid, the routing tables are consulted and EHOSTUNREACH
is returned if a route cannot be located. At this point, rt0
and rt
point to a valid route for the next-hop destination.
Table 4.16. ether_output
function: network protocol processing.
    ------------------------------------------------------------ if_ethersubr.c
     91     switch (dst->sa_family) {
     92     case AF_INET:
     93         if (!arpresolve(ac, rt, m, dst, edst))
     94             return (0);         /* if not yet resolved */
     95         /* If broadcasting on a simplex interface, loopback a copy */
     96         if ((m->m_flags & M_BCAST) && (ifp->if_flags & IFF_SIMPLEX))
     97             mcopy = m_copy(m, 0, (int) M_COPYALL);
     98         off = m->m_pkthdr.len - m->m_len;
     99         type = ETHERTYPE_IP;
    100         break;
    101     case AF_ISO:
                /* OSI code */
    142     case AF_UNSPEC:
    143         eh = (struct ether_header *) dst->sa_data;
    144         bcopy((caddr_t) eh->ether_dhost, (caddr_t) edst, sizeof(edst));
    145         type = eh->ether_type;
    146         break;
    147     default:
    148         printf("%s%d: can't handle af%d\n", ifp->if_name, ifp->if_unit,
    149                dst->sa_family);
    150         senderr(EAFNOSUPPORT);
    151     }
    ------------------------------------------------------------ if_ethersubr.c
75-85
If the next hop for the packet is a gateway (versus a final destination), a route to the gateway is located and pointed to by rt
. If a gateway route cannot be found, EHOSTUNREACH
is returned. At this point, rt
points to the route for the next-hop destination. The next hop may be a gateway or the final destination.
86-90
The RTF_REJECT
flag is enabled by the ARP code to discard packets to the destination when the destination is not responding to ARP requests. This is described with Figure 21.24.
ether_output
processing continues according to the destination address of the packet. Since Ethernet devices respond only to Ethernet addresses, to send a packet, ether_output
must find the Ethernet address that corresponds to the IP address of the next-hop destination. The ARP protocol (Chapter 21) implements this translation. Figure 4.16 shows how the driver accesses the ARP protocol.
91-101
ether_output
jumps according to sa_family
in the destination address. We show only the AF_INET, AF_ISO
, and AF_UNSPEC
cases in Figure 4.16 and have omitted the code for AF_ISO
.
The AF_INET
case calls arpresolve
to determine the Ethernet address corresponding to the destination IP address. If the Ethernet address is already in the ARP cache, arpresolve
returns 1 and ether_output
proceeds. Otherwise this IP packet is held by ARP, and when ARP determines the address, it calls ether_output
from the function in_arpinput
.
Assuming the ARP cache contains the hardware address, ether_output
checks if the packet is going to be broadcast and if the interface is simplex (i.e., it can’t receive its own transmissions). If both tests are true, m_copy
makes a copy of the packet. After the switch
, the copy is queued as if it had arrived on the Ethernet interface. This is required by the definition of broadcasting; the sending host must receive a copy of the packet.
We’ll see in Chapter 12 that multicast packets may also be looped back to be received on the output interface.
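The test that decides whether a broadcast must be copied back to the sender can be written as a small predicate. The function name is hypothetical, and the flag values are included only to make the example self-contained.

```c
#define M_BCAST     0x0100   /* illustrative mbuf flag value */
#define IFF_SIMPLEX 0x800    /* interface can't hear its own transmissions */

/*
 * Return 1 if an outgoing packet should also be delivered locally:
 * it is a broadcast and the interface will not receive its own copy.
 */
int
needs_loopback_copy(int m_flags, int if_flags)
{
    return (m_flags & M_BCAST) && (if_flags & IFF_SIMPLEX);
}
```

Both conditions are required: a unicast packet is never looped back here, and an interface that does hear its own transmissions receives the broadcast without help from the driver.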
142-146
Some protocols, such as ARP, need to specify the Ethernet destination and type explicitly. The address family constant AF_UNSPEC
indicates that dst
points to an Ethernet header. bcopy
duplicates the destination address in edst
and assigns the Ethernet type to type
. It isn’t necessary to call arpresolve
(as for AF_INET
) because the Ethernet destination address has been provided explicitly by the caller.
147-151
Unrecognized address families generate a console message and ether_output
returns EAFNOSUPPORT
.
In the next section of ether_output
, shown in Figure 4.17, the Ethernet frame is constructed.
Table 4.17. ether_output
function: Ethernet frame construction.
    ------------------------------------------------------------ if_ethersubr.c
    152     if (mcopy)
    153         (void) looutput(ifp, mcopy, dst, rt);
    154     /*
    155      * Add local net header.  If no space in first mbuf,
    156      * allocate another.
    157      */
    158     M_PREPEND(m, sizeof(struct ether_header), M_DONTWAIT);
    159     if (m == 0)
    160         senderr(ENOBUFS);
    161     eh = mtod(m, struct ether_header *);
    162     type = htons((u_short) type);
    163     bcopy((caddr_t) &type, (caddr_t) &eh->ether_type,
    164           sizeof(eh->ether_type));
    165     bcopy((caddr_t)edst, (caddr_t)eh->ether_dhost, sizeof (edst));
    166     bcopy((caddr_t)ac->ac_enaddr, (caddr_t)eh->ether_shost,
    167           sizeof(eh->ether_shost));
    ------------------------------------------------------------ if_ethersubr.c
152-167
If the code in the switch
made a copy of the packet, the copy is processed as if it had been received on the output interface by calling looutput
. The loopback interface and looutput
are described in Section 5.4.
M_PREPEND
ensures that there is room for 14 bytes at the front of the packet.
Most protocols arrange to leave room at the front of the mbuf chain so that M_PREPEND needs only to adjust some pointers (e.g., sosend for UDP output in Section 16.7 and igmp_sendreport in Section 13.6).
ether_output
forms the Ethernet header from type, edst
, and ac_enaddr
(Figure 3.26). ac_enaddr
is the unicast Ethernet address associated with the output interface and is the source Ethernet address for all frames transmitted on the interface. ether_output
overwrites the source address the caller may have specified in the ether_header
structure with ac_enaddr
. This makes it more difficult to forge the source address of an Ethernet frame.
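The header layout produced by this code can be sketched with a flat byte buffer in place of an mbuf. The function eth_build_header is hypothetical; it writes the type in network byte order by hand, which has the same effect as the htons call followed by bcopy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write a 14-byte Ethernet header at buf: destination address, source
 * address, then the type in network (big-endian) byte order.
 * Returns the number of bytes written.
 */
size_t
eth_build_header(uint8_t *buf, const uint8_t dst[6],
                 const uint8_t src[6], uint16_t type)
{
    memcpy(buf, dst, 6);
    memcpy(buf + 6, src, 6);
    buf[12] = (uint8_t)(type >> 8);     /* high-order byte first */
    buf[13] = (uint8_t)(type & 0xff);
    return 14;
}
```

In this sketch the caller supplies the source address; in ether_output it always comes from ac_enaddr, whatever the caller may have provided.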
At this point, the mbuf contains a complete Ethernet frame except for the 32-bit CRC, which is computed by the Ethernet hardware during transmission. The code shown in Figure 4.18 queues the frame for transmission by the device.
Table 4.18. ether_output
function: output queueing.
    ------------------------------------------------------------ if_ethersubr.c
    168     s = splimp();
    169     /*
    170      * Queue message on interface, and start output if interface
    171      * not yet active.
    172      */
    173     if (IF_QFULL(&ifp->if_snd)) {
    174         IF_DROP(&ifp->if_snd);
    175         splx(s);
    176         senderr(ENOBUFS);
    177     }
    178     IF_ENQUEUE(&ifp->if_snd, m);
    179     if ((ifp->if_flags & IFF_OACTIVE) == 0)
    180         (*ifp->if_start) (ifp);
    181     splx(s);
    182     ifp->if_obytes += len + sizeof(struct ether_header);
    183     if (m->m_flags & M_MCAST)
    184         ifp->if_omcasts++;
    185     return (error);
    186   bad:
    187     if (m)
    188         m_freem(m);
    189     return (error);
    190 }
    ------------------------------------------------------------ if_ethersubr.c
168-185
If the output queue is full, ether_output
discards the frame and returns ENOBUFS
. If the output queue is not full, the frame is placed on the interface’s send queue, and the interface’s if_start
function transmits the next frame if the interface is not already active.
186-190
The senderr
macro jumps to bad
where the frame is discarded and an error code is returned.
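The full-queue test, drop count, and tail insertion can be sketched outside the kernel as follows. The names `struct pkt`, `struct pkt_queue`, and `queue_for_output` are hypothetical stand-ins for the mbuf, the interface send queue, and the queueing step of ether_output; the interrupt blocking done by splimp and splx is omitted because it has no user-space equivalent.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf chain holding one frame. */
struct pkt {
    struct pkt *next;
};

/* Hypothetical stand-in for an interface send queue. */
struct pkt_queue {
    struct pkt *head;
    struct pkt *tail;
    int len;      /* frames currently queued */
    int maxlen;   /* capacity of the queue */
    int drops;    /* frames discarded because the queue was full */
};

/*
 * Queue a frame for output. Returns 0 on success, or ENOBUFS if the
 * queue is full; in that case the drop counter is incremented and the
 * caller still owns the frame and must free it.
 */
int
queue_for_output(struct pkt_queue *q, struct pkt *p)
{
    if (q->len >= q->maxlen) {
        q->drops++;
        return ENOBUFS;
    }
    p->next = NULL;
    if (q->tail == NULL)
        q->head = p;          /* queue was empty */
    else
        q->tail->next = p;    /* append after the current tail */
    q->tail = p;
    q->len++;
    return 0;
}
```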
The lestart
function dequeues frames from the interface output queue and arranges for them to be transmitted by the LANCE Ethernet card. If the device is idle, the function is called to begin transmitting frames. An example appears at the end of ether_output
(Figure 4.18), where lestart
is called indirectly through the interface’s if_start
function.
If the device is busy, it generates an interrupt when it completes transmission of the current frame. The driver calls lestart
to dequeue and transmit the next frame. Once the device has been started, the protocol layer can queue frames without calling lestart
since the driver dequeues and transmits frames until the queue is empty.
Figure 4.19 shows the lestart
function. lestart
assumes splimp
has been called to block any device interrupts.
Table 4.19. lestart
function.
    ------------------------------------------------------------ if_le.c
    325 lestart(ifp)
    326 struct ifnet *ifp;
    327 {
    328     struct le_softc *le = &le_softc[ifp->if_unit];
    329     struct letmd *tmd;
    330     struct mbuf *m;
    331     int len;

    332     if ((le->sc_if.if_flags & IFF_RUNNING) == 0)
    333         return (0);

                /* device-specific code */

    335     do {

                /* device-specific code */

    340         IF_DEQUEUE(&le->sc_if.if_snd, m);
    341         if (m == 0)
    342             return (0);
    343         len = leput(le->sc_r2->ler2_tbuf[le->sc_tmd], m);
    344         /*
    345          * If bpf is listening on this interface, let it
    346          * see the packet before we commit it to the wire.
    347          */
    348         if (ifp->if_bpf)
    349             bpf_tap(ifp->if_bpf, le->sc_r2->ler2_tbuf[le->sc_tmd],
    350                     len);

                /* device-specific code */

    359     } while (++le->sc_txcnt < LETBUF);
    360     le->sc_if.if_flags |= IFF_OACTIVE;
    361     return (0);
    362 }
    ------------------------------------------------------------ if_le.c
325-333
If the interface is not initialized, lestart
returns immediately.
335-342
If the interface is initialized, the next frame is removed from the queue. If the interface output queue is empty, lestart
returns.
343-350
leput
copies the frame in m
to the hardware buffer pointed to by the first argument to leput
. If the interface is tapped by BPF, the frame is passed to bpf_tap
. We have omitted the device-specific code that initiates the transmission of the frame from the hardware buffer.
359
lestart
stops passing frames to the device when le->sc_txcnt
equals LETBUF
. Some Ethernet interfaces can queue more than one outgoing Ethernet frame. For the LANCE driver, LETBUF
is the number of hardware transmit buffers available to the driver, and le->sc_txcnt
keeps track of how many of the buffers are in use.
360-362
Finally, lestart
turns on IFF_OACTIVE
in the ifnet
structure to indicate the device is busy transmitting frames.
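The control flow of the loop is easier to see with the hardware removed. The sketch below is a simplified model under stated assumptions: `NTXBUF`, `struct tx_state`, and `start_output` are hypothetical names, frames are represented only by counters, and the function is assumed to be called only when at least one transmit buffer is free.

```c
#define NTXBUF 2   /* assumed number of hardware transmit buffers */

/* Hypothetical model of the driver state that the loop touches. */
struct tx_state {
    int queued;    /* frames waiting on the software send queue */
    int txcnt;     /* hardware transmit buffers currently in use */
    int oactive;   /* nonzero once every transmit buffer is in use */
};

/*
 * Hand frames to the device until the queue is empty or all transmit
 * buffers are in use. Returns the number of frames handed over.
 */
int
start_output(struct tx_state *st)
{
    int moved = 0;

    do {
        if (st->queued == 0)
            return moved;      /* queue drained before buffers ran out */
        st->queued--;          /* models dequeueing one frame */
        moved++;               /* models copying it to a hardware buffer */
    } while (++st->txcnt < NTXBUF);

    st->oactive = 1;           /* every buffer is busy */
    return moved;
}
```

As in the listing, the busy flag is set only when the loop ends because the buffers are exhausted; when the queue empties first, the function returns from inside the loop and the flag is left alone.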
There is an unfortunate side effect to queueing multiple frames in the device for transmission. According to [Jacobson 1988a], the LANCE chip is able to transmit queued frames with very little delay between frames. Unfortunately, some [broken] Ethernet devices drop the frames because they can’t process the incoming data fast enough.
This interacts badly with an application such as NFS that sends large UDP datagrams (often greater than 8192 bytes) that are fragmented by IP and queued in the LANCE device as multiple Ethernet frames. Fragments are lost on the receiving side, resulting in many incomplete datagrams and high delays as NFS retransmits the entire UDP datagram.
Jacobson noted that Sun’s LANCE driver only queued one frame at a time, perhaps to avoid this problem.
The ioctl
system call supports a generic command interface used by a process to access features of a device that aren’t supported by the standard system calls. The prototype for ioctl
is:
int ioctl (int fd, unsigned long com, ...);
fd is a descriptor, usually a device or network connection. Each type of descriptor supports its own set of ioctl
commands specified by the second argument, com. A third argument is shown as “...” in the prototype, since it is a pointer of some type that depends on the ioctl
command being invoked. If the command is retrieving information, the third argument must point to a buffer large enough to hold the data. In this text, we discuss only the ioctl
commands applicable to socket descriptors.
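As an illustration of how the third argument depends on the command, the following sketch issues SIOCGIFFLAGS, whose third argument is a pointer to an ifreq structure. It assumes a system that provides the BSD socket ioctl interface through `<sys/ioctl.h>` and `<net/if.h>`; `get_if_flags` is a hypothetical helper name, not part of any library.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose struct ifreq under strict glibc modes */
#endif
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>

/*
 * Fetch the flags of the named interface. Returns 0 and stores the
 * flags in *flags on success; returns -1 on any failure.
 */
int
get_if_flags(const char *name, short *flags)
{
    struct ifreq ifr;
    int fd, rc;

    fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket will do */
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, name, sizeof(ifr.ifr_name) - 1);

    rc = ioctl(fd, SIOCGIFFLAGS, &ifr);    /* third argument: struct ifreq * */
    close(fd);
    if (rc < 0)
        return -1;

    *flags = ifr.ifr_flags;
    return 0;
}
```

A caller might pass a name such as "lo0" and test the result against IFF_UP; the interface names available depend on the system.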
The prototype we show for system calls is the one used by a process to issue the system call. We’ll see in Chapter 15 that the function within the kernel that implements a system call has a different prototype.
We describe the implementation of the ioctl
system call in Chapter 17 but we discuss the implementation of individual ioctl
commands throughout the text.
The first ioctl
commands we discuss provide access to the network interface structures that we have described. Throughout the text we summarize ioctl
commands as shown in Figure 4.20.
Table 4.20. Interface ioctl
commands.
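Command | Third argument | Function | Description |
---|---|---|---|
SIOCGIFCONF | struct ifconf * | ifconf | retrieve list of interface configuration |
SIOCGIFFLAGS | struct ifreq * | ifioctl | get interface flags |
SIOCGIFMETRIC | struct ifreq * | ifioctl | get interface metric |
SIOCSIFFLAGS | struct ifreq * | ifioctl | set interface flags |
SIOCSIFMETRIC | struct ifreq * | ifioctl | set interface metric |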
The first column shows the symbolic constant that identifies the ioctl
command (the second argument, com). The second column shows the type of the third argument passed to the ioctl
system call for the command shown in the first column. The third column names the function that implements the command.
Figure 4.21 shows the organization of the various functions that process ioctl
commands. The shaded functions are the ones we describe in this chapter. The remaining functions are described in other chapters.
The ioctl
system call routes the five commands shown in Figure 4.20 to the ifioctl
function shown in Figure 4.22.
Table 4.22. ifioctl
function: overview and SIOCGIFCONF
.
    ------------------------------------------------------------ if.c
    394 int
    395 ifioctl(so, cmd, data, p)
    396 struct socket *so;
    397 int cmd;
    398 caddr_t data;
    399 struct proc *p;
    400 {
    401     struct ifnet *ifp;
    402     struct ifreq *ifr;
    403     int error;

    404     if (cmd == SIOCGIFCONF)
    405         return (ifconf(cmd, data));
    406     ifr = (struct ifreq *) data;
    407     ifp = ifunit(ifr->ifr_name);
    408     if (ifp == 0)
    409         return (ENXIO);
    410     switch (cmd) {

                /* other interface ioctl commands (Figures 4.29 and 12.11) */

    447     default:
    448         if (so->so_proto == 0)
    449             return (EOPNOTSUPP);
    450         return ((*so->so_proto->pr_usrreq) (so, PRU_CONTROL,
    451                                             cmd, data, ifp));
    452     }
    453     return (0);
    454 }
    ------------------------------------------------------------ if.c
394-405
For the SIOCGIFCONF
command, ifioctl
calls ifconf
to construct a table of variable-length ifreq
structures.
406-410
For the remaining ioctl
commands, the data argument is a pointer to an ifreq
structure. ifunit
searches the ifnet
list for an interface with the text name provided by the process in ifr->ifr_name
(e.g., "sl0", "le1", or "lo0"
). If there is no matching interface, ifioctl
returns ENXIO
. The remaining code depends on cmd
and is described with Figure 4.29.
447-454
If the interface ioctl
command is not recognized, ifioctl
forwards the command to the user-request function of the protocol associated with the socket on which the request was made. For IP, these commands are issued on a UDP socket and udp_usrreq
is called. The commands that fall into this category are described in Figure 6.10. Section 23.10 describes the udp_usrreq
function in detail.
If control falls out of the switch
, 0 is returned.
ifconf
provides a standard way for a process to discover the interfaces present and the addresses configured on a system. Interface information is represented by ifreq
and ifconf
structures shown in Figures 4.23 and 4.24.
Table 4.23. ifreq
structure.
    ------------------------------------------------------------ if.h
    262 struct ifreq {
    263 #define IFNAMSIZ 16
    264     char ifr_name[IFNAMSIZ];        /* if name, e.g. "en0" */
    265     union {
    266         struct sockaddr ifru_addr;
    267         struct sockaddr ifru_dstaddr;
    268         struct sockaddr ifru_broadaddr;
    269         short ifru_flags;
    270         int ifru_metric;
    271         caddr_t ifru_data;
    272     } ifr_ifru;
    273 #define ifr_addr ifr_ifru.ifru_addr             /* address */
    274 #define ifr_dstaddr ifr_ifru.ifru_dstaddr       /* other end of p-to-p link */
    275 #define ifr_broadaddr ifr_ifru.ifru_broadaddr   /* broadcast address */
    276 #define ifr_flags ifr_ifru.ifru_flags           /* flags */
    277 #define ifr_metric ifr_ifru.ifru_metric         /* metric */
    278 #define ifr_data ifr_ifru.ifru_data             /* for use by interface */
    279 };
    ------------------------------------------------------------ if.h
Table 4.24. ifconf
structure.
    ------------------------------------------------------------ if.h
    292 struct ifconf {
    293     int ifc_len;                    /* size of associated buffer */
    294     union {
    295         caddr_t ifcu_buf;
    296         struct ifreq *ifcu_req;
    297     } ifc_ifcu;
    298 #define ifc_buf ifc_ifcu.ifcu_buf   /* buffer address */
    299 #define ifc_req ifc_ifcu.ifcu_req   /* array of structures returned */
    300 };
    ------------------------------------------------------------ if.h
262-279
An ifreq
structure contains the name of an interface in ifr_name
. The remaining members in the union are accessed by the various ioctl
commands. As usual, macros simplify the syntax required to access the members of the union.
292-300
In the ifconf
structure, ifc_len
is the size in bytes of the buffer pointed to by ifc_buf
. The buffer is allocated by a process but filled in by ifconf
with an array of variable-length ifreq
structures. For the ifconf
function, ifr_addr
is the relevant member of the union in the ifreq
structure. Each ifreq
structure has a variable length because the length of ifr_addr
(a sockaddr
structure) varies according to the type of address. The sa_len
member from the sockaddr
structure must be used to locate the end of each entry. Figure 4.25 illustrates the data structures manipulated by ifconf
.
In Figure 4.25, the data on the left is in the kernel and the data on the right is in a process. We’ll refer to this figure as we discuss the ifconf
function listed in Figure 4.26.
Table 4.26. ifconf
function.
    ------------------------------------------------------------ if.c
    462 int
    463 ifconf(cmd, data)
    464 int cmd;
    465 caddr_t data;
    466 {
    467     struct ifconf *ifc = (struct ifconf *) data;
    468     struct ifnet *ifp = ifnet;
    469     struct ifaddr *ifa;
    470     char *cp, *ep;
    471     struct ifreq ifr, *ifrp;
    472     int space = ifc->ifc_len, error = 0;

    473     ifrp = ifc->ifc_req;
    474     ep = ifr.ifr_name + sizeof(ifr.ifr_name) - 2;
    475     for (; space > sizeof(ifr) && ifp; ifp = ifp->if_next) {
    476         strncpy(ifr.ifr_name, ifp->if_name, sizeof(ifr.ifr_name) - 2);
    477         for (cp = ifr.ifr_name; cp < ep && *cp; cp++)
    478             continue;
    479         *cp++ = '0' + ifp->if_unit;
    480         *cp = '\0';
    481         if ((ifa = ifp->if_addrlist) == 0) {
    482             bzero((caddr_t) & ifr.ifr_addr, sizeof(ifr.ifr_addr));
    483             error = copyout((caddr_t) & ifr, (caddr_t) ifrp,
    484                             sizeof(ifr));
    485             if (error)
    486                 break;
    487             space -= sizeof(ifr), ifrp++;
    488         } else
    489             for (; space > sizeof(ifr) && ifa; ifa = ifa->ifa_next) {
    490                 struct sockaddr *sa = ifa->ifa_addr;
    491                 if (sa->sa_len <= sizeof(*sa)) {
    492                     ifr.ifr_addr = *sa;
    493                     error = copyout((caddr_t) & ifr, (caddr_t) ifrp,
    494                                     sizeof(ifr));
    495                     ifrp++;
    496                 } else {
    497                     space -= sa->sa_len - sizeof(*sa);
    498                     if (space < sizeof(ifr))
    499                         break;
    500                     error = copyout((caddr_t) & ifr, (caddr_t) ifrp,
    501                                     sizeof(ifr.ifr_name));
    502                     if (error == 0)
    503                         error = copyout((caddr_t) sa,
    504                                         (caddr_t) & ifrp->ifr_addr, sa->sa_len);
    505                     ifrp = (struct ifreq *)
    506                         (sa->sa_len + (caddr_t) & ifrp->ifr_addr);
    507                 }
    508                 if (error)
    509                     break;
    510                 space -= sizeof(ifr);
    511             }
    512     }
    513     ifc->ifc_len -= space;
    514     return (error);
    515 }
    ------------------------------------------------------------ if.c
462-474
The two arguments to ifconf
are: cmd
, which is ignored; and data
, which points to a copy of the ifconf
structure specified by the process.
ifc
is data
cast to an ifconf
structure pointer. ifp
traverses the interface list starting at ifnet
(the head of the list), and ifa
traverses the address list for each interface. cp
and ep
control the construction of the text interface name within ifr
, which is the ifreq
structure that holds an interface name and address before they are copied to the process’s buffer. ifrp
points to this buffer and is advanced after each address is copied. space
is the number of bytes remaining in the process’s buffer, cp
is used to search for the end of the name, and ep
marks the last possible location for the numeric portion of the interface name.
475-488
The for
loop traverses the list of interfaces. For each interface, the text name is copied to ifr_name
followed by the text representation of the if_unit
number. If no addresses have been assigned to the interface, an address of all 0s is constructed, the resulting ifreq
structure is copied to the process, space
is decreased, and ifrp
is advanced.
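The name construction can be tried in isolation. The sketch below follows the same steps as the kernel loop but runs in user space; `make_ifname` and `NAMSIZ` are hypothetical names, with `NAMSIZ` standing in for IFNAMSIZ.

```c
#include <string.h>

#define NAMSIZ 16   /* stands in for IFNAMSIZ */

/*
 * Build "<name><unit>" in out[NAMSIZ], e.g. "le" and 1 give "le1".
 * Like the code it follows, this appends a single digit, so it is
 * only meaningful for unit numbers 0 through 9.
 */
void
make_ifname(char out[NAMSIZ], const char *name, int unit)
{
    char *cp;
    char *ep = out + NAMSIZ - 2;    /* last slot that may hold the digit */

    strncpy(out, name, NAMSIZ - 2); /* leave room for digit and terminator */
    for (cp = out; cp < ep && *cp; cp++)
        continue;                   /* find the end of the name */
    *cp++ = (char)('0' + unit);     /* append the unit digit */
    *cp = '\0';                     /* terminate the string */
}
```

Limiting the copy to two bytes less than the buffer size guarantees that the digit and the terminating null byte always fit, even when the driver name fills the rest of the buffer.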
489-515
If the interface has one or more addresses, the for
loop processes each one. The address is added to the interface name in ifr
and then ifr
is copied to the process. Addresses longer than a standard sockaddr
structure don’t fit in ifr
and are copied directly out to the process. After each address, space
and ifrp
are adjusted. After all the interfaces are processed, the length of the buffer is updated (ifc->ifc_len
) and ifconf
returns. The ioctl
system call takes care of copying the new contents of the ifconf
structure back to the ifconf
structure in the process.
Figure 4.27 shows the configuration of the interface structures after the Ethernet, SLIP, and loopback interfaces have been initialized.
Figure 4.28 shows the contents of ifc
and buffer
after the following code is executed.
    struct ifconf ifc;      /* SIOCGIFCONF adjusts this */
    char buffer[144];       /* contains interface addresses when ioctl returns */
    int s;                  /* any socket */

    ifc.ifc_len = 144;
    ifc.ifc_buf = buffer;
    if (ioctl(s, SIOCGIFCONF, &ifc) < 0) {
        perror("ioctl failed");
        exit(1);
    }
There are no restrictions on the type of socket specified with the SIOCGIFCONF
command, which, as we have seen, returns the addresses for all protocol families.
In Figure 4.28, ifc_len
has been changed from 144 to 108 by ioctl
since the three addresses returned in the buffer only occupy 108 (3×36) bytes. Three sockaddr_dl
addresses are returned and the last 36 bytes of the buffer are unused. The first 16 bytes of each entry contain the text name of the interface. In this case only 3 of the 16 bytes are used.
ifr_addr
has the form of a sockaddr
structure, so the first value is the length (20 bytes) and the second value is the type of address (18, AF_LINK
). The next value is sdl_index
, which is different for each interface as is sdl_type
(6, 28, and 24 correspond to IFT_ETHER, IFT_SLIP
, and IFT_LOOP
).
The next three values are sdl_nlen
(the length of the text name), sdl_alen
(the length of the hardware address), and sdl_slen
(unused). sdl_nlen
is 3 for all three entries. sdl_alen
is 6 for the Ethernet address and 0 for both the SLIP and loopback interfaces. sdl_slen
is always 0.
Finally, the text interface name appears, followed by the hardware address (Ethernet only). Neither the SLIP nor the loopback interface stores a hardware-level address in the sockaddr_dl
structure.
In the example, only sockaddr_dl
addresses are returned (because no other address types were configured in Figure 4.27), so each entry in the buffer is the same size. If other addresses (e.g., IP or OSI addresses) were configured for an interface, they would be returned along with the sockaddr_dl
addresses, and the size of each entry would vary according to the type of address returned.
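A process that reads the returned buffer has to step through it using the length stored in each address. The following sketch shows one way to do that. Because not every system's sockaddr carries a length byte, it defines its own hypothetical structures (`struct xsockaddr`, `struct xifreq`) that mirror the layout described in the text, and `count_entries` is an illustrative helper rather than a system function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAMSIZ 16   /* stands in for IFNAMSIZ */

/* Hypothetical BSD-style address: the first byte is its total length. */
struct xsockaddr {
    uint8_t sa_len;
    uint8_t sa_family;
    char    sa_data[14];
};

/* Hypothetical mirror of the name and address parts of an ifreq. */
struct xifreq {
    char             ifr_name[NAMSIZ];
    struct xsockaddr ifr_addr;
};

/*
 * Count the entries in a buffer of len bytes. An address no longer
 * than struct xsockaddr occupies a whole struct xifreq; a longer
 * address extends its entry by the extra bytes.
 */
int
count_entries(const char *buf, size_t len)
{
    size_t off = 0;
    int n = 0;

    while (len - off >= sizeof(struct xifreq)) {
        struct xifreq ifr;
        size_t alen;

        memcpy(&ifr, buf + off, sizeof(ifr));   /* avoid unaligned access */
        alen = ifr.ifr_addr.sa_len;
        if (alen < sizeof(struct xsockaddr))
            alen = sizeof(struct xsockaddr);
        if (len - off < NAMSIZ + alen)
            break;                              /* truncated final entry */
        off += NAMSIZ + alen;                   /* step to the next entry */
        n++;
    }
    return n;
}
```

With the buffer from the example, each entry is 16 + 20 = 36 bytes, so a returned length of 108 yields three entries.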
The four remaining interface commands from Figure 4.20 (SIOCGIFFLAGS, SIOCGIFMETRIC, SIOCSIFFLAGS
, and SIOCSIFMETRIC
) are handled by the ifioctl
function. Figure 4.29 shows the case
statements for these commands.
Table 4.29. ifioctl
function: flags and metrics.
    ------------------------------------------------------------ if.c
    410     switch (cmd) {

    411     case SIOCGIFFLAGS:
    412         ifr->ifr_flags = ifp->if_flags;
    413         break;

    414     case SIOCGIFMETRIC:
    415         ifr->ifr_metric = ifp->if_metric;
    416         break;

    417     case SIOCSIFFLAGS:
    418         if (error = suser(p->p_ucred, &p->p_acflag))
    419             return (error);
    420         if (ifp->if_flags & IFF_UP && (ifr->ifr_flags & IFF_UP) == 0) {
    421             int s = splimp();
    422             if_down(ifp);
    423             splx(s);
    424         }
    425         if (ifr->ifr_flags & IFF_UP && (ifp->if_flags & IFF_UP) == 0) {
    426             int s = splimp();
    427             if_up(ifp);
    428             splx(s);
    429         }
    430         ifp->if_flags = (ifp->if_flags & IFF_CANTCHANGE) |
    431             (ifr->ifr_flags & ~IFF_CANTCHANGE);
    432         if (ifp->if_ioctl)
    433             (void) (*ifp->if_ioctl) (ifp, cmd, data);
    434         break;

    435     case SIOCSIFMETRIC:
    436         if (error = suser(p->p_ucred, &p->p_acflag))
    437             return (error);
    438         ifp->if_metric = ifr->ifr_metric;
    439         break;
    ------------------------------------------------------------ if.c
410-416
For the two SIOCGxxx
commands, ifioctl
copies the if_flags
or if_metric
value for the interface into the ifreq
structure. For the flags, the ifr_flags
member of the union is used and for the metric, the ifr_metric
member is used (Figure 4.23).
417-429
To change the interface flags, the calling process must have superuser privileges. If the process is shutting down a running interface or bringing up an interface that isn’t running, if_down
or if_up
is called, respectively.
430-434
Recall from Figure 3.7 that some interface flags cannot be changed by a process. The expression (ifp->if_flags & IFF_CANTCHANGE) clears the interface flags that can be changed by the process, and the expression (ifr->ifr_flags & ~IFF_CANTCHANGE) clears the flags in the request that may not be changed by the process. The two expressions are ORed together and saved as the new value for ifp->if_flags. Before returning, the request is passed to the if_ioctl function associated with the device (e.g., leioctl for the LANCE driver, Figure 4.31).
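The two-mask merge is a general bit-manipulation idiom and can be checked with small numbers. In this sketch `merge_if_flags` is a hypothetical helper, and the protected-bit mask is passed as a parameter rather than taken from a system header, so the values used with it are arbitrary.

```c
/*
 * Combine the current interface flags with the requested flags.
 * Bits set in cantchange keep their current value; every other bit
 * takes its value from the request.
 */
unsigned int
merge_if_flags(unsigned int current, unsigned int requested,
               unsigned int cantchange)
{
    return (current & cantchange) | (requested & ~cantchange);
}
```

For example, with current flags 0x0c, requested flags 0x03, and a protected mask of 0x0a, the first term keeps 0x08, the second contributes 0x01, and the result is 0x09.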
435-439
Changing the interface metric is easier; as long as the process has superuser privileges, ifioctl
copies the new metric into if_metric
for the interface.
With the ifconfig
program, an administrator can enable and disable an interface by setting or clearing the IFF_UP
flag through the SIOCSIFFLAGS
command. Figure 4.30 shows the code for the if_down
and if_up
functions.
Table 4.30. if_down
and if_up
functions.
    ------------------------------------------------------------ if.c
    292 void
    293 if_down(ifp)
    294 struct ifnet *ifp;
    295 {
    296     struct ifaddr *ifa;

    297     ifp->if_flags &= ~IFF_UP;
    298     for (ifa = ifp->if_addrlist; ifa; ifa = ifa->ifa_next)
    299         pfctlinput(PRC_IFDOWN, ifa->ifa_addr);
    300     if_qflush(&ifp->if_snd);
    301     rt_ifmsg(ifp);
    302 }

    308 void
    309 if_up(ifp)
    310 struct ifnet *ifp;
    311 {
    312     struct ifaddr *ifa;

    313     ifp->if_flags |= IFF_UP;
    314     rt_ifmsg(ifp);
    315 }
    ------------------------------------------------------------ if.c
292-302
When an interface is shut down, the IFF_UP
flag is cleared and the PRC_IFDOWN
command is issued by pfctlinput
(Section 7.7) for each address associated with the interface. This gives each protocol an opportunity to respond to the interface being shut down. Some protocols, such as OSI, terminate connections using the interface. IP attempts to reroute connections through other interfaces if possible. TCP and UDP ignore failing interfaces and rely on the routing protocols to find alternate paths for the packets.
if_qflush
discards any packets queued for the interface. The routing system is notified of the change by rt_ifmsg
. TCP retransmits the lost packets automatically; UDP applications must explicitly detect and respond to this condition on their own.
308-315
When an interface is enabled, the IFF_UP
flag is set and rt_ifmsg
notifies the routing system that the interface status has changed.
We saw in Figure 4.29 that for the SIOCSIFFLAGS
command, ifioctl
calls the if_ioctl
function for the interface. In our three sample interfaces, the slioctl
and loioctl
functions return EINVAL
for this command, which is ignored by ifioctl
. Figure 4.31 shows the leioctl
function and SIOCSIFFLAGS
processing of the LANCE Ethernet driver.
Table 4.31. leioctl
function: SIOCSIFFLAGS
.
    ------------------------------------------------------------ if_le.c
    614 leioctl(ifp, cmd, data)
    615 struct ifnet *ifp;
    616 int cmd;
    617 caddr_t data;
    618 {
    619     struct ifaddr *ifa = (struct ifaddr *) data;
    620     struct le_softc *le = &le_softc[ifp->if_unit];
    621     struct lereg1 *ler1 = le->sc_r1;
    622     int s = splimp(), error = 0;

    623     switch (cmd) {

                /* SIOCSIFADDR code (Figure 6.28) */

    638     case SIOCSIFFLAGS:
    639         if ((ifp->if_flags & IFF_UP) == 0 &&
    640             ifp->if_flags & IFF_RUNNING) {
    641             LERDWR(le->sc_r0, LE_STOP, ler1->ler1_rdp);
    642             ifp->if_flags &= ~IFF_RUNNING;
    643         } else if (ifp->if_flags & IFF_UP &&
    644                    (ifp->if_flags & IFF_RUNNING) == 0)
    645             leinit(ifp->if_unit);
    646         /*
    647          * If the state of the promiscuous bit changes, the interface
    648          * must be reset to effect the change.
    649          */
    650         if (((ifp->if_flags ^ le->sc_iflags) & IFF_PROMISC) &&
    651             (ifp->if_flags & IFF_RUNNING)) {
    652             le->sc_iflags = ifp->if_flags;
    653             lereset(ifp->if_unit);
    654             lestart(ifp);
    655         }
    656         break;

                /* SIOCADDMULTI and SIOCDELMULTI code (Figure 12.31) */

    672     default:
    673         error = EINVAL;
    674     }
    675     splx(s);
    676     return (error);
    677 }
    ------------------------------------------------------------ if_le.c
614-623
leioctl
casts the third argument, data
, to an ifaddr
structure pointer and saves the value in ifa
. The le
pointer references the le_softc
structure indexed by ifp->if_unit
. The switch
statement, based on cmd
, makes up the main body of the function.
638-656
Only the SIOCSIFFLAGS
case is shown in Figure 4.31. By the time ifioctl
calls leioctl
, the interface flags have been changed. The code shown here forces the physical interface into a state that matches the configuration of the flags. If the interface is going down (IFF_UP
is not set), but the interface is operating, the interface is shut down. If the interface is going up but is not operating, the interface is initialized and restarted.
If the promiscuous bit has been changed, the interface is shut down, reset, and restarted to implement the change.
The expression including the exclusive OR and IFF_PROMISC
is true only if the request changes the IFF_PROMISC
bit.
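The exclusive OR test generalizes to any single flag bit. A minimal sketch, using the hypothetical helper name `flag_changed` and taking the bit of interest as a parameter:

```c
/*
 * Return 1 if the given bit differs between the new and the saved
 * flag words, 0 otherwise. The XOR leaves a 1 in every bit position
 * where the two words disagree; the mask then selects the one bit.
 */
int
flag_changed(unsigned int new_flags, unsigned int saved_flags,
             unsigned int bit)
{
    return ((new_flags ^ saved_flags) & bit) != 0;
}
```

Changes to other bits do not affect the result, which is why the driver can compare the complete flag words and still reset the interface only when the promiscuous setting itself has changed.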
672-677
The default
case for unrecognized commands posts EINVAL
, which is returned at the end of the function.
In this chapter we described the implementation of the LANCE Ethernet device driver, which we refer to throughout the text. We saw how the Ethernet driver detects broadcast and multicast addresses on input, how the Ethernet and 802.3 encapsulations are detected, and how incoming frames are demultiplexed to the appropriate protocol queue. In Chapter 21 we’ll see how IP addresses (unicast, broadcast, and multicast) are converted into the correct Ethernet addresses on output.
Finally, we discussed the protocol-specific ioctl
commands that access the interface-layer data structures.
4.1 | In |
4.1 |
When the interface is not tapped, the tests must be done in |
4.2 | In |
4.2 | If the tests were reversed, the broadcast flag would never be set. If the second |