Chapter 23. UDP: User Datagram Protocol

Introduction

The User Datagram Protocol, or UDP, is a simple, datagram-oriented, transport-layer protocol: each output operation by a process produces exactly one UDP datagram, which causes one IP datagram to be sent.

A process accesses UDP by creating a socket of type SOCK_DGRAM in the Internet domain. By default the socket is termed unconnected. Each time the process sends a datagram it must specify the destination IP address and port number. Each time a datagram is received for the socket, the process can receive the source IP address and port number from the datagram.

We mentioned in Section 22.5 that a UDP socket can also be connected to one particular IP address and port number. This causes all datagrams written to the socket to go to that destination, and only datagrams arriving from that IP address and port number are passed to the process.

This chapter examines the implementation of UDP.

Code Introduction

There are nine UDP functions in a single C file and various UDP definitions in two headers, as shown in Figure 23.1.

Figure 23.1. Files discussed in this chapter.

File                    Description
netinet/udp.h           udphdr structure definition
netinet/udp_var.h       other UDP definitions
netinet/udp_usrreq.c    UDP functions

Figure 23.2 shows the relationship of the six main UDP functions to other kernel functions. The shaded ellipses are the six functions that we cover in this chapter. We also cover three additional UDP functions that are called by some of these six functions.

Figure 23.2. Relationship of UDP functions to rest of kernel.

Global Variables

Seven global variables are introduced in this chapter, which are shown in Figure 23.3.

Figure 23.3. Global variables introduced in this chapter.

Variable          Datatype              Description
udb               struct inpcb          head of the UDP PCB list
udp_last_inpcb    struct inpcb *        pointer to PCB for last received datagram: one-behind cache
udpcksum          int                   flag for calculating and verifying UDP checksum
udp_in            struct sockaddr_in    holds sender’s IP address and port on input
udpstat           struct udpstat        UDP statistics (Figure 23.4)
udp_recvspace     u_long                default size of socket receive buffer, 41,600 bytes
udp_sendspace     u_long                default size of socket send buffer, 9216 bytes

Statistics

Various UDP statistics are maintained in the global structure udpstat, described in Figure 23.4. We’ll see where these counters are incremented as we proceed through the code.

Figure 23.4. UDP statistics maintained in the udpstat structure.

udpstat member        Description                                                              Used by SNMP
udps_badlen           #received datagrams with data length larger than packet                  •
udps_badsum           #received datagrams with checksum error                                  •
udps_fullsock         #received datagrams not delivered because input socket full
udps_hdrops           #received datagrams with packet shorter than header                      •
udps_ipackets         total #received datagrams                                                •
udps_noport           #received datagrams with no process on destination port                  •
udps_noportbcast      #received broadcast/multicast datagrams with no process on dest. port    •
udps_opackets         total #output datagrams                                                  •
udpps_pcbcachemiss    #received input datagrams missing pcb cache

Figure 23.5 shows some sample output of these statistics, from the netstat -s command.

Figure 23.5. Sample UDP statistics.

netstat -s output                                                 udpstat member
18,575,142 datagrams received                                     udps_ipackets
  • 0 with incomplete header                                      udps_hdrops
  • 18 with bad data length field                                 udps_badlen
  • 58 with bad checksum                                          udps_badsum
  • 84,079 dropped due to no socket                               udps_noport
  • 446 broadcast/multicast datagrams dropped due to no socket    udps_noportbcast
  • 5,356 dropped due to full socket buffers                      udps_fullsock
  • 18,485,185 delivered                                          (see text)
18,676,277 datagrams output                                       udps_opackets

The number of UDP datagrams delivered (the second from last line of output) is the number of datagrams received (udps_ipackets) minus the six variables that precede it in Figure 23.5.
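As a check on the sample output, the following sketch recomputes the delivered count from the other six input counters. The structure and function names are hypothetical, chosen only for this illustration; Net/3 keeps these counters in the udpstat structure and netstat performs the subtraction when it prints them.

```c
#include <stdint.h>

/* Hypothetical snapshot of the input-side counters from Figure 23.4. */
struct udp_in_counters {
    uint64_t ipackets;      /* udps_ipackets: total received */
    uint64_t hdrops;        /* udps_hdrops */
    uint64_t badlen;        /* udps_badlen */
    uint64_t badsum;        /* udps_badsum */
    uint64_t noport;        /* udps_noport */
    uint64_t noportbcast;   /* udps_noportbcast */
    uint64_t fullsock;      /* udps_fullsock */
};

/* Delivered = received minus every counted reason for not delivering. */
static uint64_t
udp_delivered(const struct udp_in_counters *c)
{
    return c->ipackets - c->hdrops - c->badlen - c->badsum -
        c->noport - c->noportbcast - c->fullsock;
}
```

With the values from Figure 23.5, the six drop counters total 89,957, and 18,575,142 minus 89,957 is the 18,485,185 shown as delivered.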

SNMP Variables

Figure 23.6 shows the four simple SNMP variables in the UDP group and which counters from the udpstat structure implement that variable.

Figure 23.6. Simple SNMP variables in udp group.

SNMP variable      udpstat member                             Description
udpInDatagrams     udps_ipackets                              #received datagrams delivered to processes
udpInErrors        udps_hdrops + udps_badsum + udps_badlen    #undeliverable UDP datagrams for reasons other than no application at destination port (e.g., UDP checksum error)
udpNoPorts         udps_noport + udps_noportbcast             #received datagrams for which no application process was at the destination port
udpOutDatagrams    udps_opackets                              #datagrams sent

Figure 23.7 shows the UDP listener table, named udpTable. The values returned by SNMP for this table are taken from a UDP PCB, not the udpstat structure.

Figure 23.7. Variables in UDP listener table: udpTable.

UDP listener table, index = <udpLocalAddress>.<udpLocalPort>

SNMP variable      PCB variable    Description
udpLocalAddress    inp_laddr       local IP address for this listener
udpLocalPort       inp_lport       local port number for this listener

UDP protosw Structure

Figure 23.8 lists the protocol switch entry for UDP.

Figure 23.8. The UDP protosw structure.

Member          inetsw[1]            Description
pr_type         SOCK_DGRAM           UDP provides datagram packet services
pr_domain       &inetdomain          UDP is part of the Internet domain
pr_protocol     IPPROTO_UDP (17)     appears in the ip_p field of the IP header
pr_flags        PR_ATOMIC|PR_ADDR    socket layer flags, not used by protocol processing
pr_input        udp_input            receives messages from IP layer
pr_output       0                    not used by UDP
pr_ctlinput     udp_ctlinput         control input function for ICMP errors
pr_ctloutput    ip_ctloutput         respond to administrative requests from a process
pr_usrreq       udp_usrreq           respond to communication requests from a process
pr_init         udp_init             initialization for UDP
pr_fasttimo     0                    not used by UDP
pr_slowtimo     0                    not used by UDP
pr_drain        0                    not used by UDP
pr_sysctl       udp_sysctl           for sysctl(8) system call

We describe the five functions that begin with udp_ in this chapter. We also cover a sixth function, udp_output, which is not in the protocol switch entry but is called by udp_usrreq when a UDP datagram is output.

UDP Header

The UDP header is defined as a udphdr structure. Figure 23.9 shows the C structure and Figure 23.10 shows a picture of the UDP header.

Figure 23.9. udphdr structure.

----------------------------------------------------------------------- udp.h
struct udphdr {
    u_short uh_sport;           /* source port */
    u_short uh_dport;           /* destination port */
    short   uh_ulen;            /* udp length */
    u_short uh_sum;             /* udp checksum */
};
----------------------------------------------------------------------- udp.h

Figure 23.10. UDP header and optional data.

In the source code the UDP header is normally referenced as an IP header immediately followed by a UDP header. This is how udp_input processes received IP datagrams, and how udp_output builds outgoing IP datagrams. This combined IP/UDP header is a udpiphdr structure, shown in Figure 23.11.

Figure 23.11. udpiphdr structure: combined IP/UDP header.

------------------------------------------------------------------------ udp_var.h
struct udpiphdr {
    struct ipovly ui_i;         /* overlaid ip structure */
    struct udphdr ui_u;         /* udp header */
};

#define ui_next     ui_i.ih_next
#define ui_prev     ui_i.ih_prev
#define ui_x1       ui_i.ih_x1
#define ui_pr       ui_i.ih_pr
#define ui_len      ui_i.ih_len
#define ui_src      ui_i.ih_src
#define ui_dst      ui_i.ih_dst
#define ui_sport    ui_u.uh_sport
#define ui_dport    ui_u.uh_dport
#define ui_ulen     ui_u.uh_ulen
#define ui_sum      ui_u.uh_sum
------------------------------------------------------------------------ udp_var.h

The 20-byte IP header is defined as an ipovly structure, shown in Figure 23.12.

Figure 23.12. ipovly structure.

------------------------------------------------------------------------- ip_var.h
struct ipovly {
    caddr_t ih_next, ih_prev;   /* for protocol sequence q's */
    u_char  ih_x1;              /* (unused) */
    u_char  ih_pr;              /* protocol */
    short   ih_len;             /* protocol length */
    struct in_addr ih_src;      /* source internet address */
    struct in_addr ih_dst;      /* destination internet address */
};
------------------------------------------------------------------------- ip_var.h

Unfortunately this structure does not match the real IP header shown in Figure 8.8. The size is the same (20 bytes) but the fields are different. We’ll return to this discrepancy when we discuss the calculation of the UDP checksum in Section 23.6.
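The size claim can be checked in user space. The sketch below is a model under stated assumptions, not the kernel definition: the sk_ structure names are hypothetical, and the two queue pointers are modeled as 32-bit integers, because with 64-bit pointers the overlay would no longer occupy 20 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for ipovly; the name and fixed-width types are assumptions. */
struct sk_ipovly {
    uint32_t ih_next, ih_prev;  /* placeholders for the two 32-bit queue pointers */
    uint8_t  ih_x1;             /* (unused) */
    uint8_t  ih_pr;             /* protocol */
    int16_t  ih_len;            /* protocol length */
    uint32_t ih_src;            /* source internet address */
    uint32_t ih_dst;            /* destination internet address */
};

/* Stand-in for udphdr. */
struct sk_udphdr {
    uint16_t uh_sport, uh_dport;
    int16_t  uh_ulen;
    uint16_t uh_sum;
};

/* Stand-in for udpiphdr: overlay followed immediately by the UDP header. */
struct sk_udpiphdr {
    struct sk_ipovly ui_i;
    struct sk_udphdr ui_u;
};

/* Compile-time checks of the sizes quoted in the text. */
_Static_assert(sizeof(struct sk_ipovly) == 20, "overlay is as large as an IP header");
_Static_assert(sizeof(struct sk_udphdr) == 8, "UDP header is 8 bytes");
_Static_assert(sizeof(struct sk_udpiphdr) == 28, "combined header is 28 bytes");

/* ih_pr lands on the protocol byte of a standard IPv4 header (offset 9),
 * and the addresses land on the source and destination addresses. */
_Static_assert(offsetof(struct sk_ipovly, ih_pr) == 9, "protocol byte");
_Static_assert(offsetof(struct sk_ipovly, ih_src) == 12, "source address");
_Static_assert(offsetof(struct sk_ipovly, ih_dst) == 16, "destination address");
```

In this model ih_len sits at offset 10, where a standard IPv4 header carries its header checksum, which is one example of the fields being different while the size is the same.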

udp_init Function

The domaininit function calls UDP’s initialization function (udp_init, Figure 23.13) at system initialization time.

Figure 23.13. udp_init function.

--------------------------------------------------------------------- udp_usrreq.c
void
udp_init()
{
    udb.inp_next = udb.inp_prev = &udb;
}
--------------------------------------------------------------------- udp_usrreq.c

The only action performed by this function is to set the next and previous pointers in the head PCB (udb) to point to itself. This is an empty doubly linked list.

The remainder of the udb PCB is initialized to 0, although the only other field used in this head PCB is inp_lport, the next UDP ephemeral port number to allocate. In the solution for Exercise 22.4 we mention that because this local port number is initialized to 0, the first ephemeral port number will be 1024.
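The empty-list convention set up by udp_init can be illustrated with a small user-space model. This is a sketch only: struct pcb and the pcb_list_ functions are hypothetical names, and inserting immediately after the head is an assumption made for the example rather than a statement about how PCBs are allocated.

```c
#include <stddef.h>

/* Minimal stand-in for the linkage fields of a PCB (hypothetical). */
struct pcb {
    struct pcb *next, *prev;
    int         lport;
};

/* Same idea as udp_init: an empty list is a head that points to itself. */
static void
pcb_list_init(struct pcb *head)
{
    head->next = head->prev = head;
}

static int
pcb_list_empty(const struct pcb *head)
{
    return head->next == head;
}

/* Link p into the list immediately after the head. */
static void
pcb_list_insert(struct pcb *head, struct pcb *p)
{
    p->next = head->next;
    p->prev = head;
    head->next->prev = p;
    head->next = p;
}

/* Walk the list; reaching the head again terminates the search. */
static struct pcb *
pcb_list_find(struct pcb *head, int lport)
{
    for (struct pcb *p = head->next; p != head; p = p->next)
        if (p->lport == lport)
            return p;
    return NULL;
}
```

Because the head points to itself, neither insertion nor traversal needs a special case for the empty list.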

udp_output Function

UDP output occurs when the application calls one of the five write functions: send, sendto, sendmsg, write, or writev. If the socket is connected, any of the five functions can be called, although a destination address cannot be specified with sendto or sendmsg. If the socket is unconnected, only sendto and sendmsg can be called, and a destination address must be specified. Figure 23.14 summarizes how these five write functions end up with udp_output being called, which in turn calls ip_output.

Figure 23.14. How the five write functions end up calling udp_output.

All five functions end up calling sosend, passing a pointer to a msghdr structure as an argument. The data to output is packaged into an mbuf chain and an optional destination address and optional control information are also put into mbufs by sosend. A PRU_SEND request is issued.

UDP calls the function udp_output, the first half of which is shown in Figure 23.15. The four arguments are inp, a pointer to the socket Internet PCB; m, a pointer to the mbuf chain for output; addr, an optional pointer to an mbuf with the destination address packaged as a sockaddr_in structure; and control, an optional pointer to an mbuf with control information from sendmsg.

Figure 23.15. udp_output function: temporarily connect an unconnected socket.

---------------------------------------------------------------------- udp_usrreq.c
333 int
334 udp_output(inp, m, addr, control)
335 struct inpcb *inp;
336 struct mbuf *m;
337 struct mbuf *addr, *control;
338 {
339     struct udpiphdr *ui;
340     int     len = m->m_pkthdr.len;
341     struct in_addr laddr;
342     int     s, error = 0;

343     if (control)
344         m_freem(control);       /* XXX */

345     if (addr) {
346         laddr = inp->inp_laddr;
347         if (inp->inp_faddr.s_addr != INADDR_ANY) {
348             error = EISCONN;
349             goto release;
350         }
351         /*
352          * Must block input while temporarily connected.
353          */
354         s = splnet();
355         error = in_pcbconnect(inp, addr);
356         if (error) {
357             splx(s);
358             goto release;
359         }
360     } else {
361         if (inp->inp_faddr.s_addr == INADDR_ANY) {
362             error = ENOTCONN;
363             goto release;
364         }
365     }
366     /*
367      * Calculate data length and get an mbuf for UDP and IP headers.
368      */
369     M_PREPEND(m, sizeof(struct udpiphdr), M_DONTWAIT);
370     if (m == 0) {
371         error = ENOBUFS;
372         goto release;
373     }
                                                                                   
                   /* remainder of function shown in Figure 23.20 */               
                                                                                   
409   release:
410     m_freem(m);
411     return (error);
412 }
---------------------------------------------------------------------- udp_usrreq.c

Discard optional control information

333-344

Any optional control information is discarded by m_freem, without generating an error. UDP output does not use control information for any purpose.

The comment XXX is because the control information is ignored without generating an error. Other protocols, such as the routing domain and TCP, generate an error if the process passes control information.

Temporarily connect an unconnected socket

345-359

If the caller specifies a destination address for the UDP datagram (addr is nonnull), the socket is temporarily connected to that destination address by in_pcbconnect. The socket will be disconnected at the end of this function. Before doing this connect, a check is made as to whether the socket is already connected, and, if so, the error EISCONN is returned. This is why a sendto that specifies a destination address on a connected socket returns an error.

Before the socket is temporarily connected, IP input processing is stopped by splnet. This is done because the temporary connect changes the foreign address, foreign port, and possibly the local address in the socket’s PCB. If a received UDP datagram were processed while this PCB was temporarily connected, that datagram could be delivered to the wrong process. Setting the processor priority to splnet only stops a software interrupt from causing the IP input routine to be executed (Figure 1.12); it does not prevent the interface layer from accepting incoming packets and placing them onto IP’s input queue.

[Partridge and Pink 1993] note that this operation of temporarily connecting the socket is expensive and consumes nearly one-third of the cost of each UDP transmission.

The local address from the PCB is saved in laddr before temporarily connecting, because if it is the wildcard address it will be changed by in_pcbconnect when it calls in_pcbbind.

The same rules apply to the destination address that would apply if the process called connect, since in_pcbconnect is called for both cases.

360-364

If the process doesn’t specify a destination address, and the socket is not connected, ENOTCONN is returned.
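The four combinations of "destination address supplied" and "socket connected" handled by lines 345-364 can be summarized in a small decision function. This is a sketch of the logic only; the function name and its two boolean parameters are hypothetical, and the real code also performs the temporary connect, which can fail with its own error.

```c
#include <errno.h>

/*
 * Model of the address check at the top of udp_output.
 * addr_given:       nonzero if the caller supplied a destination address
 * socket_connected: nonzero if the PCB already has a foreign address
 * Returns 0 if output may proceed, otherwise the errno value returned.
 */
static int
udp_output_addr_check(int addr_given, int socket_connected)
{
    if (addr_given) {
        /* sendto or sendmsg with an address on a connected socket */
        if (socket_connected)
            return EISCONN;
        return 0;               /* proceed: temporarily connect */
    }
    /* no address: the socket must already be connected */
    if (!socket_connected)
        return ENOTCONN;
    return 0;
}
```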

Prepend IP and UDP headers

366-373

M_PREPEND allocates room for the IP and UDP headers in front of the data. Figure 1.8 showed one scenario, assuming there is not room in the first mbuf on the chain for the 28 bytes of header. Exercise 23.1 details the other possible scenarios. The flag M_DONTWAIT is specified because if the socket is temporarily connected, IP processing is blocked, and M_PREPEND should not block.

Earlier Berkeley releases incorrectly specified M_WAIT here.

Prepending IP/UDP Headers and Mbuf Clusters

There is a subtle interaction between the M_PREPEND macro and mbuf clusters. If the user data is placed into a cluster by sosend, then 56 bytes (max_hdr from Figure 7.17) are left unused at the beginning of the cluster, allowing room for the Ethernet, IP, and UDP headers. This is to prevent M_PREPEND from allocating another mbuf just to hold these headers. M_PREPEND calls M_LEADINGSPACE to calculate how much space is available at the beginning of the mbuf:

   #define M_LEADINGSPACE(m) \
       ((m)->m_flags & M_EXT ? /* (m)->m_data - (m)->m_ext.ext_buf */ 0 : \
           (m)->m_flags & M_PKTHDR ? (m)->m_data - (m)->m_pktdat : \
           (m)->m_data - (m)->m_dat)

The code that correctly calculates the amount of room at the front of a cluster is commented out, and the macro always returns 0 if the data is in a cluster. This means that when the user data is in a cluster, M_PREPEND always allocates a new mbuf for the protocol headers instead of using the room allocated for this purpose by sosend.

The reason for commenting out the correct code in M_LEADINGSPACE is that the cluster might be shared (Section 2.9), and, if it is shared, using the space before the user’s data in the cluster could wipe out someone else’s data.

With UDP data, clusters are not shared, since udp_output does not save a copy of the data. TCP, however, saves a copy of the data in its send buffer (waiting for the data to be acknowledged), and if the data is in a cluster, it is shared. But tcp_output doesn’t call M_LEADINGSPACE, because sosend leaves room for only 56 bytes at the beginning of the cluster for datagram protocols. tcp_output always calls MGETHDR instead, to allocate an mbuf for the protocol headers.

UDP Checksum Calculation and Pseudo-Header

Before showing the last half of udp_output we describe how UDP fills in some of the fields in the IP/UDP headers, calculates the UDP checksum, and passes the IP/UDP headers and the data to IP for output. The way this is done with the ipovly structure is tricky.

Figure 23.16 shows the 28-byte IP/UDP headers that are built by udp_output in the first mbuf in the chain pointed to by m. The unshaded fields are filled in by udp_output and the shaded fields are filled in by ip_output. This figure shows the format of the headers as they appear on the wire.

Figure 23.16. IP/UDP headers: unshaded fields filled in by UDP; shaded fields filled in by IP.

The UDP checksum is calculated over three areas: (1) a 12-byte pseudo-header containing fields from the IP header, (2) the 8-byte UDP header, and (3) the UDP data. Figure 23.17 shows the 12 bytes of pseudo-header used for the checksum computation, along with the UDP header. The UDP header used for the checksum calculation is identical to the UDP header that appears on the wire (Figure 23.16).

Figure 23.17. Pseudo-header used for checksum computation and UDP header.

The following three facts are used in computing the UDP checksum. (1) The third 32-bit word in the pseudo-header (Figure 23.17) looks similar to the third 32-bit word in the IP header (Figure 23.16): two 8-bit values and a 16-bit value. (2) The order of the three 32-bit values in the pseudo-header is irrelevant. Actually, the computation of the Internet checksum does not depend on the order of the 16-bit values that are used (Section 8.7). (3) Including additional 32-bit words of 0 in the checksum computation has no effect.
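Facts (2) and (3) are easy to confirm with a stand-alone version of the Internet checksum. The function below is a simplified sketch written for this illustration (the name inet_cksum16 is hypothetical); it operates on an array of 16-bit words rather than an mbuf chain, unlike the kernel's in_cksum.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum over n 16-bit words: one's complement sum,
 * folded to 16 bits, then complemented.
 */
static uint16_t
inet_cksum16(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += w[i];
    /* fold the carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Since the sum is just repeated addition with end-around carry, permuting the words gives the same result, and words of 0 contribute nothing. Both properties are what let udp_output reuse the ipovly overlay, with its zeroed first two words, as the pseudo-header.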

udp_output takes advantage of these three facts and fills in the fields in the udpiphdr structure (Figure 23.11), which we depict in Figure 23.18. This structure is contained in the first mbuf in the chain pointed to by the argument m.

Figure 23.18. udpiphdr structure used by udp_output.

The last three 32-bit words in the 20-byte IP header (the five members ui_x1, ui_pr, ui_len, ui_src, and ui_dst) are used as the pseudo-header for the checksum computation. The first two 32-bit words in the IP header (ui_next and ui_prev) are also used in the checksum computation, but they’re initialized to 0, and don’t affect the checksum.

Figure 23.19 summarizes the operations we’ve described.

Figure 23.19. Operations to fill in IP/UDP headers and calculate UDP checksum.

  1. The top picture shown in Figure 23.19 is the protocol definition of the pseudo-header, which corresponds to Figure 23.17.

  2. The middle picture is the udpiphdr structure that is used in the source code, which corresponds to Figure 23.11. (To make the figure readable, the prefix ui_ has been left off all the members.) This is the structure built by udp_output in the first mbuf and then used to calculate the UDP checksum.

  3. The bottom picture shows the IP/UDP headers that appear on the wire, which corresponds to Figure 23.16. The seven fields with an arrow above are filled in by udp_output before the checksum computation. The three fields with an asterisk above are filled in by udp_output after the checksum computation. The remaining six shaded fields are filled in by ip_output.

Figure 23.20 shows the last half of the udp_output function.

Figure 23.20. udp_output function: fill in headers, calculate checksum, pass to IP.

---------------------------------------------------------------------- udp_usrreq.c
374     /*
375      * Fill in mbuf with extended UDP header
376      * and addresses and length put into network format.
377      */
378     ui = mtod(m, struct udpiphdr *);
379     ui->ui_next = ui->ui_prev = 0;
380     ui->ui_x1 = 0;
381     ui->ui_pr = IPPROTO_UDP;
382     ui->ui_len = htons((u_short) len + sizeof(struct udphdr));
383     ui->ui_src = inp->inp_laddr;
384     ui->ui_dst = inp->inp_faddr;
385     ui->ui_sport = inp->inp_lport;
386     ui->ui_dport = inp->inp_fport;
387     ui->ui_ulen = ui->ui_len;

388     /*
389      * Stuff checksum and output datagram.
390      */
391     ui->ui_sum = 0;
392     if (udpcksum) {
393         if ((ui->ui_sum = in_cksum(m, sizeof(struct udpiphdr) + len)) == 0)
394                     ui->ui_sum = 0xffff;
395     }
396     ((struct ip *) ui)->ip_len = sizeof(struct udpiphdr) + len;
397     ((struct ip *) ui)->ip_ttl = inp->inp_ip.ip_ttl;    /* XXX */
398     ((struct ip *) ui)->ip_tos = inp->inp_ip.ip_tos;    /* XXX */
399     udpstat.udps_opackets++;
400     error = ip_output(m, inp->inp_options, &inp->inp_route,
401               inp->inp_socket->so_options & (SO_DONTROUTE | SO_BROADCAST),
402                       inp->inp_moptions);

403     if (addr) {
404         in_pcbdisconnect(inp);
405         inp->inp_laddr = laddr;
406         splx(s);
407     }
408     return (error);
---------------------------------------------------------------------- udp_usrreq.c

Prepare pseudo-header for checksum computation

374-387

All the members in the udpiphdr structure (Figure 23.18) are set to their respective values. The local and foreign sockets from the PCB are already in network byte order, but the UDP length must be converted to network byte order. The UDP length is the number of bytes of data (len, which can be 0) plus the size of the UDP header (8). The UDP length field appears twice in the UDP checksum calculation: ui_len and ui_ulen. One of them is redundant.

Calculate checksum

388-395

The checksum is calculated by first setting it to 0 and then calling in_cksum. If UDP checksums are disabled (a bad idea; see Section 11.3 of Volume 1), 0 is sent as the checksum. If the calculated checksum is 0, 16 one bits are stored in the header instead of 0. (In one’s complement arithmetic, all one bits and all zero bits are both considered 0.) This allows the receiver to distinguish between a UDP packet without a checksum (the checksum field is 0) and a UDP packet with a checksum whose value is 0 (the checksum is 16 one bits).

The variable udpcksum (Figure 23.3) normally defaults to 1, enabling UDP checksums. The kernel can be compiled for 4.2BSD compatibility, which initializes udpcksum to 0.
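The whole calculation, including the substitution of 0xffff for a computed 0, can be sketched in user space. This is not the kernel code: it builds the 12-byte pseudo-header of Figure 23.17 explicitly instead of overlaying it on the IP header, and the names sum_bytes and udp_cksum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add len bytes to a running one's complement sum, reading them as
 * big-endian 16-bit words; an odd final byte is padded with a zero byte.
 */
static uint32_t
sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return sum;
}

/*
 * src, dst: IPv4 addresses as they appear on the wire.
 * hdr:      the 8-byte UDP header with its checksum field set to 0.
 * data/len: the UDP data (len may be 0).
 * Returns the value to store in the checksum field.
 */
static uint16_t
udp_cksum(const uint8_t src[4], const uint8_t dst[4],
          const uint8_t hdr[8], const uint8_t *data, size_t len)
{
    uint16_t ulen = (uint16_t)(8 + len);    /* UDP header plus data */
    uint8_t pseudo[12] = {
        src[0], src[1], src[2], src[3],
        dst[0], dst[1], dst[2], dst[3],
        0, 17,                              /* zero byte, IPPROTO_UDP */
        (uint8_t)(ulen >> 8), (uint8_t)ulen
    };
    uint32_t sum;
    uint16_t ck;

    sum = sum_bytes(0, pseudo, sizeof pseudo);
    sum = sum_bytes(sum, hdr, 8);
    sum = sum_bytes(sum, data, len);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    ck = (uint16_t)~sum;
    return ck == 0 ? 0xffff : ck;           /* 0 means "no checksum" */
}
```

The final line corresponds to lines 393-394 of Figure 23.20: a transmitted value of 0 is reserved for "sender did not compute a checksum", so a computed 0 is sent as 16 one bits.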

Fill in UDP length, TTL, and TOS

396-398

The pointer ui is cast to a pointer to a standard IP header (ip), and three fields in the IP header are set by UDP. The IP length field is set to the amount of data in the UDP datagram, plus 28, the size of the IP/UDP headers. Notice that this field in the IP header is stored in host byte order, not network byte order like the rest of the multibyte fields in the header. ip_output converts it to network byte order before transmission.

The TTL and TOS fields in the IP header are then set from the values in the socket’s PCB. These values are defaulted by UDP when the socket is created, but can be changed by the process using setsockopt. Since these three fields (IP length, TTL, and TOS) are not part of the pseudo-header and are not used in the UDP checksum computation, they must be set after the checksum is calculated but before ip_output is called.

Send datagram

400-402

ip_output sends the datagram. The second argument, inp_options, holds the IP options that the process can set using setsockopt. These IP options are placed into the IP header by ip_output. The third argument is a pointer to the cached route in the PCB, and the fourth argument is the socket options. The only socket options that are passed to ip_output are SO_DONTROUTE (bypass the routing tables) and SO_BROADCAST (allow broadcasting). The final argument is a pointer to the multicast options for this socket.

Disconnect temporarily connected socket

403-407

If the socket was temporarily connected, in_pcbdisconnect disconnects the socket, the local IP address is restored in the PCB, and the interrupt level is restored to its saved value.

udp_input Function

UDP output is driven by a process calling one of the five write functions. The functions shown in Figure 23.14 are all called directly as part of the system call. UDP input, on the other hand, occurs when IP input receives an IP datagram on its input queue whose protocol field specifies UDP. IP calls the function udp_input through the pr_input function in the protocol switch table (Figure 8.15). Since IP input is at the software interrupt level, udp_input also executes at this level. The goal of udp_input is to place the UDP datagram onto the appropriate socket’s buffer and wake up any process blocked for input on that socket.

We’ll divide our discussion of the udp_input function into three sections:

  1. the general validation that UDP performs on the received datagram,

  2. processing UDP datagrams destined for a unicast address: locating the appropriate PCB and placing the datagram onto the socket’s buffer, and

  3. processing UDP datagrams destined for a broadcast or multicast address: the datagram may be delivered to multiple sockets.

This last step is new with the support of multicasting in Net/3, but consumes almost one-third of the code.

General Validation of Received UDP Datagram

Figure 23.21 shows the first section of UDP input.

Figure 23.21. udp_input function: general validation of received UDP datagram.

---------------------------------------------------------------------- udp_usrreq.c
 55 void
 56 udp_input(m, iphlen)
 57 struct mbuf *m;
 58 int     iphlen;
 59 {
 60     struct ip *ip;
 61     struct udphdr *uh;
 62     struct inpcb *inp;
 63     struct mbuf *opts = 0;
 64     int     len;
 65     struct ip save_ip;

 66     udpstat.udps_ipackets++;

 67     /*
 68      * Strip IP options, if any; should skip this,
 69      * make available to user, and use on returned packets,
 70      * but we don't yet have a way to check the checksum
 71      * with options still present.
 72      */
 73     if (iphlen > sizeof(struct ip)) {
 74         ip_stripoptions(m, (struct mbuf *) 0);
 75         iphlen = sizeof(struct ip);
 76     }
 77     /*
 78      * Get IP and UDP header together in first mbuf.
 79      */
 80     ip = mtod(m, struct ip *);
 81     if (m->m_len < iphlen + sizeof(struct udphdr)) {
 82         if ((m = m_pullup(m, iphlen + sizeof(struct udphdr))) == 0) {
 83             udpstat.udps_hdrops++;
 84             return;
 85         }
 86         ip = mtod(m, struct ip *);
 87     }
 88     uh = (struct udphdr *) ((caddr_t) ip + iphlen);

 89     /*
 90      * Make mbuf data length reflect UDP length.
 91      * If not enough data to reflect UDP length, drop.
 92      */
 93     len = ntohs((u_short) uh->uh_ulen);
 94     if (ip->ip_len != len) {
 95         if (len > ip->ip_len) {
 96             udpstat.udps_badlen++;
 97             goto bad;
 98         }
 99         m_adj(m, len - ip->ip_len);
100         /* ip->ip_len = len; */
101     }
102     /*
103      * Save a copy of the IP header in case we want to restore
104      * it for sending an ICMP error message in response.
105      */
106     save_ip = *ip;

107     /*
108      * Checksum extended UDP header and data.
109      */
110     if (udpcksum && uh->uh_sum) {
111         ((struct ipovly *) ip)->ih_next = 0;
112         ((struct ipovly *) ip)->ih_prev = 0;
113         ((struct ipovly *) ip)->ih_x1 = 0;
114         ((struct ipovly *) ip)->ih_len = uh->uh_ulen;
115         if (uh->uh_sum = in_cksum(m, len + sizeof(struct ip))) {
116             udpstat.udps_badsum++;
117             m_freem(m);
118             return;
119         }
120     }
---------------------------------------------------------------------- udp_usrreq.c

55-65

The two arguments to udp_input are m, a pointer to an mbuf chain containing the IP datagram, and iphlen, the length of the IP header (including possible IP options).

Discard IP options

67-76

If IP options are present they are discarded by ip_stripoptions. As the comments indicate, UDP should save a copy of the IP options and make them available to the receiving process through the IP_RECVOPTS socket option, but this isn’t implemented yet.

77-88

If the length of the first mbuf on the mbuf chain is less than 28 bytes (the size of the IP header plus the UDP header), m_pullup rearranges the mbuf chain so that at least 28 bytes are stored contiguously in the first mbuf.

Verify UDP length

89-101

There are two lengths associated with a UDP datagram: the length field in the IP header (ip_len) and the length field in the UDP header (uh_ulen). Recall that ipintr subtracted the length of the IP header from ip_len before calling udp_input (Figure 10.11). The two lengths are compared and there are three possibilities:

  1. ip_len equals uh_ulen. This is the common case.

  2. ip_len is greater than uh_ulen. The IP datagram is too big, as shown in Figure 23.22.

    Figure 23.22. UDP length too small.

    The code believes the smaller of the two lengths (the UDP header length) and m_adj removes the excess bytes of data from the end of the datagram. In the code the second argument to m_adj is negative, which we said in Figure 2.20 trims data from the end of the mbuf chain. It is possible in this scenario that the UDP length field has been corrupted. If so, the datagram will probably be discarded shortly, assuming the sender calculated the UDP checksum, that this checksum detects the error, and that the receiver verifies the checksum. The IP length field should be correct since it was verified by IP against the amount of data received from the interface, and the IP length field is covered by the mandatory IP header checksum.

  3. ip_len is less than uh_ulen. The IP datagram is smaller than possible, given the length in the UDP header. Figure 23.23 shows this case.

    Figure 23.23. UDP length too big.

Something is wrong and the datagram is discarded. There is no other choice here: if the UDP length field has been corrupted, it can’t be detected with the UDP checksum. The correct UDP length is needed to calculate the checksum.
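The three cases can be restated as a small classification function. This is a sketch of the comparison on lines 93-101 only; the enum, the function name, and the trim output parameter are hypothetical, and both lengths are assumed to be in host byte order with the IP header length already subtracted from ip_len.

```c
enum udp_len_verdict {
    UDP_LEN_OK,     /* case 1: lengths agree */
    UDP_LEN_TRIM,   /* case 2: IP datagram longer than UDP length */
    UDP_LEN_BAD     /* case 3: UDP length exceeds what IP delivered */
};

/*
 * ip_len: number of bytes following the IP header
 * ulen:   UDP length field (UDP header plus data)
 * *trim:  set to the number of trailing bytes to discard (case 2 only)
 */
static enum udp_len_verdict
udp_check_len(int ip_len, int ulen, int *trim)
{
    *trim = 0;
    if (ulen == ip_len)
        return UDP_LEN_OK;
    if (ulen > ip_len)
        return UDP_LEN_BAD;     /* udp_input counts this in udps_badlen */
    *trim = ip_len - ulen;      /* udp_input passes the negative of this to m_adj */
    return UDP_LEN_TRIM;
}
```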

As we’ve said, the UDP length is redundant. In Chapter 28 we’ll see that TCP does not have a length field in its header; it uses the IP length field, minus the lengths of the IP and TCP headers, to determine the amount of data in the datagram. Why does the UDP length field exist? Possibly to add a small amount of error checking, since UDP checksums are optional.

Save copy of IP header and verify UDP checksum

102-106

udp_input saves a copy of the IP header before verifying the checksum, because the checksum computation wipes out some of the fields in the original IP header.

110

The checksum is verified only if UDP checksums are enabled for the kernel (udpcksum), and if the sender calculated a UDP checksum (the received checksum is nonzero).

This test is incorrect. If the sender calculated a checksum, it should be verified, regardless of whether outgoing checksums are calculated or not. The variable udpcksum should only specify whether outgoing checksums are calculated. Unfortunately many vendors have copied this incorrect test, although many vendors today finally ship their kernels with UDP checksums enabled by default.
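The difference between the test as coded and the test argued for here is small enough to show side by side. The two function names below are hypothetical and the sketch is not the kernel code; it only restates the two conditions, assuming the received checksum field is passed in as uh_sum.

```c
#include <assert.h>
#include <stdint.h>

/* Test as described for Net/3: verify only when the kernel flag is on. */
int verify_as_coded(int udpcksum, uint16_t uh_sum)
{
    return udpcksum && uh_sum != 0;
}

/* Test argued for in the text: a nonzero received checksum is always verified. */
int verify_as_recommended(int udpcksum, uint16_t uh_sum)
{
    (void)udpcksum;    /* would govern outgoing checksums only */
    return uh_sum != 0;
}
```

The two functions disagree only when udpcksum is 0 and the sender did compute a checksum, which is exactly the case in which a corrupted datagram would be accepted unverified.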

111-120

Before calculating the checksum, the IP header is referenced as an ipovly structure (Figure 23.18) and the fields are initialized as described in the previous section when the UDP checksum is calculated by udp_output.
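To make the checksum calculation concrete, here is a minimal user-level sketch of the UDP checksum over the 12-byte pseudo-header (source address, destination address, a zero byte, the protocol, and the UDP length) followed by the UDP header and data. It is not the kernel's in_cksum routine: the function names are hypothetical, the datagram is assumed to sit in one contiguous buffer rather than an mbuf chain, and all bytes are assumed to be in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to acc; an odd byte is zero padded. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t acc)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        acc += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        acc += (uint32_t)(buf[len - 1] << 8);
    return acc;
}

/* Fold the carries back in to get a 16-bit one's complement sum. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/*
 * Checksum of pseudo-header plus UDP header and data (udp_len < 65536).
 * With the checksum field zeroed, the result is the value to store.
 * With the checksum field filled in, a result of 0 means "valid".
 */
uint16_t udp_cksum_sketch(const uint8_t src[4], const uint8_t dst[4],
                          const uint8_t *udp, size_t udp_len)
{
    uint8_t pseudo[12];
    uint32_t acc;
    int i;

    assert(udp != NULL && udp_len >= 8 && udp_len <= 0xffff);
    for (i = 0; i < 4; i++) {
        pseudo[i] = src[i];
        pseudo[4 + i] = dst[i];
    }
    pseudo[8] = 0;                          /* zero byte */
    pseudo[9] = 17;                         /* protocol: UDP */
    pseudo[10] = (uint8_t)(udp_len >> 8);   /* UDP length, high byte */
    pseudo[11] = (uint8_t)(udp_len & 0xff); /* UDP length, low byte */

    acc = sum16(pseudo, sizeof pseudo, 0);
    acc = sum16(udp, udp_len, acc);
    return (uint16_t)~fold(acc);
}
```

A sender would call the function with the checksum bytes set to 0 and store the result in the header; a receiver calls it on the datagram as received and discards the datagram if the result is nonzero. The sketch does not handle the convention that a computed checksum of 0 is transmitted as all one bits.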

At this point special code is executed if the datagram is destined for a broadcast or multicast IP address. We defer this code until later in the section.

Demultiplexing Unicast Datagrams

Assuming the datagram is destined for a unicast address, Figure 23.24 shows the code that is executed.

Table 23.24. udp_input function: demultiplex unicast datagram.

--------------------------------------------------------------------- udp_usrreq.c
                                                                                      
            /* demultiplex broadcast & multicast datagrams (Figure 23.26) */      
                                                                                      
206     /*
207      * Locate pcb for unicast datagram.
208      */
209     inp = udp_last_inpcb;
210     if (inp->inp_lport != uh->uh_dport ||
211         inp->inp_fport != uh->uh_sport ||
212         inp->inp_faddr.s_addr != ip->ip_src.s_addr ||
213         inp->inp_laddr.s_addr != ip->ip_dst.s_addr) {

214         inp = in_pcblookup(&udb, ip->ip_src, uh->uh_sport,
215                            ip->ip_dst, uh->uh_dport, INPLOOKUP_WILDCARD);
216         if (inp)
217             udp_last_inpcb = inp;
218         udpstat.udpps_pcbcachemiss++;
219     }
220     if (inp == 0) {
221         udpstat.udps_noport++;
222         if (m->m_flags & (M_BCAST | M_MCAST)) {
223             udpstat.udps_noportbcast++;
224             goto bad;
225         }
226         *ip = save_ip;
227         ip->ip_len += iphlen;
228         icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_PORT, 0, 0);
229         return;
230     }
--------------------------------------------------------------------- udp_usrreq.c

Check one-behind cache

206-209

UDP maintains a pointer to the last Internet PCB for which it received a datagram, udp_last_inpcb. Before calling in_pcblookup, which might have to search many PCBs on the UDP list, the foreign and local addresses and ports of that last PCB are compared against the received datagram. This is called a one-behind cache [Partridge and Pink 1993], and it is based on the assumption that the next datagram received has a high probability of being destined for the same socket as the last received datagram [Mogul 1991]. This cache was introduced with the 4.3BSD Tahoe release.

210-213

The order of the four comparisons between the cached PCB and the received datagram is intentional. If the PCBs don’t match, the comparisons should stop as soon as possible. The highest probability is that the destination port numbers are different; this is therefore the first test. The lowest probability of a mismatch is between the local addresses, especially on a host with just one interface, so this is the last test.

Unfortunately this one-behind cache, as coded, is practically useless [Partridge and Pink 1993]. The most common type of UDP server binds only its well-known port, leaving its local address, foreign address, and foreign port wildcarded. The most common type of UDP client does not connect its UDP socket; it specifies the destination address for each datagram using sendto. Therefore most of the time the three values in the PCB (inp_laddr, inp_faddr, and inp_fport) are wildcards. In the cache comparison the four values in the received datagram are never wildcards, meaning the cache entry will compare equal with the received datagram only when the PCB has all four local and foreign values specified to nonwildcard values. This happens only for a connected UDP socket.
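A small sketch shows why the exact comparison defeats the cache for the typical server. The structures and the function cache_hit are hypothetical stand-ins for struct inpcb and the test on lines 210-213, and the value 0 is assumed to represent both a wildcard address and an unset port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WILDCARD 0u   /* stands in for INADDR_ANY and an unset port */

struct pcb_sketch {              /* hypothetical subset of struct inpcb */
    uint32_t laddr, faddr;
    uint16_t lport, fport;
};

struct dgram_sketch {            /* addressing of a received datagram */
    uint32_t src, dst;
    uint16_t sport, dport;
};

/* Exact four-way comparison, in the order used by the one-behind cache. */
int cache_hit(const struct pcb_sketch *p, const struct dgram_sketch *d)
{
    assert(p != NULL && d != NULL);
    return p->lport == d->dport &&
           p->fport == d->sport &&
           p->faddr == d->src &&
           p->laddr == d->dst;
}
```

A server PCB that has bound only its port compares unequal on the second test, because the datagram's source port is never 0. Only a PCB with all four values filled in, that is, a connected socket, can produce a hit.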

On the system bsdi, the counter udpps_pcbcachemiss was 41,253 and the counter udps_ipackets was 42,485. The difference, 1,232 datagrams, is a cache hit rate of less than 3%.

The netstat -s command prints most of the fields in the udpstat structure (Figure 23.5). Unfortunately the Net/3 version, and most vendors’ versions, never print udpps_pcbcachemiss. If you want to see the value, use a debugger to examine the variable in the running kernel.

Search all UDP PCBs

214-218

Assuming the comparison with the cached PCB fails, in_pcblookup searches for a match. The INPLOOKUP_WILDCARD flag is specified, allowing a wildcard match. If a match is found, the pointer to the PCB is saved in udp_last_inpcb, which we said is a cache of the last received UDP datagram’s PCB.

Generate ICMP port unreachable error

220-230

If a matching PCB is not found, UDP normally generates an ICMP port unreachable error. First the m_flags for the received mbuf chain is checked to see if the datagram was sent to a link-level broadcast or multicast destination address. It is possible to receive an IP datagram with a unicast IP address that was sent to a broadcast or multicast link-level address, but an ICMP port unreachable error must not be generated. If it is OK to generate the ICMP error, the IP header is restored to its received value (save_ip) and the IP length is also set back to its original value.

This check for a link-level broadcast or multicast address is redundant. icmp_error also performs this check. The only advantage in this redundant check is to maintain the counter udps_noportbcast in addition to the counter udps_noport.

The addition of iphlen back into ip_len is a bug. icmp_error will also do this, causing the IP length field in the IP header returned in the ICMP error to be 20 bytes too large. You can tell if a system has this bug by adding a few lines of code to the Traceroute program (Chapter 8 of Volume 1) to print this field in the ICMP port unreachable that is returned when the destination host is finally reached.

Figure 23.25 is the next section of processing for a unicast datagram, delivering the datagram to the socket corresponding to the destination PCB.

Table 23.25. udp_input function: deliver unicast datagram to socket.

------------------------------------------------------------------- udp_usrreq.c
231     /*
232      * Construct sockaddr format source address.
233      * Stuff source address and datagram in user buffer.
234      */
235     udp_in.sin_port = uh->uh_sport;
236     udp_in.sin_addr = ip->ip_src;

237     if (inp->inp_flags & INP_CONTROLOPTS) {
238         struct mbuf **mp = &opts;

239         if (inp->inp_flags & INP_RECVDSTADDR) {
240             *mp = udp_saveopt((caddr_t) & ip->ip_dst,
241                               sizeof(struct in_addr), IP_RECVDSTADDR);
242             if (*mp)
243                 mp = &(*mp)->m_next;
244         }
245 #ifdef notyet
246         /* IP options were tossed above */
247         if (inp->inp_flags & INP_RECVOPTS) {
248             *mp = udp_saveopt((caddr_t) opts_deleted_above,
249                               sizeof(struct in_addr), IP_RECVOPTS);
250             if (*mp)
251                 mp = &(*mp)->m_next;
252         }
253         /* ip_srcroute doesn't do what we want here, need to fix */
254         if (inp->inp_flags & INP_RECVRETOPTS) {
255             *mp = udp_saveopt((caddr_t) ip_srcroute(),
256                               sizeof(struct in_addr), IP_RECVRETOPTS);
257             if (*mp)
258                 mp = &(*mp)->m_next;
259         }
260 #endif
261     }
262     iphlen += sizeof(struct udphdr);
263     m->m_len -= iphlen;
264     m->m_pkthdr.len -= iphlen;
265     m->m_data += iphlen;
266     if (sbappendaddr(&inp->inp_socket->so_rcv, (struct sockaddr *) &udp_in,
267                      m, opts) == 0) {
268         udpstat.udps_fullsock++;
269         goto bad;
270     }
271     sorwakeup(inp->inp_socket);
272     return;

273   bad:
274     m_freem(m);
275     if (opts)
276         m_freem(opts);
277 }
------------------------------------------------------------------- udp_usrreq.c

Return source IP address and source port

231-236

The source IP address and source port number from the received IP datagram are stored in the global sockaddr_in structure udp_in. This structure is passed as an argument to sbappendaddr later in the function.

Using a global to hold the IP address and port number is OK because udp_input is single threaded. When this function is called by ipintr it processes the received datagram completely before returning. Also, sbappendaddr copies the socket address structure from the global into an mbuf.

IP_RECVDSTADDR socket option

237-244

The constant INP_CONTROLOPTS is the combination of the three socket options that the process can set to cause control information to be returned through the recvmsg system call for a UDP socket (Figure 22.5). The IP_RECVDSTADDR socket option returns the destination IP address from the received UDP datagram as control information. The function udp_saveopt allocates an mbuf of type MT_CONTROL and stores the 4-byte destination IP address in the mbuf. We show this function in Section 23.8.

This socket option appeared with 4.3BSD Reno and was intended for applications such as TFTP, the Trivial File Transfer Protocol, that should not respond to client requests that are sent to a broadcast address. Unfortunately, even if the receiving application uses this option, it is nontrivial to determine if the destination IP address is a broadcast address or not (Exercise 23.6).

When the multicasting changes were added in 4.4BSD, this code was left in only for datagrams destined for a unicast address. We’ll see in Figure 23.26 that this option is not implemented for datagrams sent to a broadcast or multicast address. This defeats the purpose of the option!

Table 23.26. udp_input function: demultiplexing of broadcast and multicast datagrams.

--------------------------------------------------------------------- udp_usrreq.c
121     if (IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) ||
122         in_broadcast(ip->ip_dst, m->m_pkthdr.rcvif)) {
123         struct socket *last;
124         /*
125          * Deliver a multicast or broadcast datagram to *all* sockets
126          * for which the local and remote addresses and ports match
127          * those of the incoming datagram.  This allows more than
128          * one process to receive multi/broadcasts on the same port.
129          * (This really ought to be done for unicast datagrams as
130          * well, but that would cause problems with existing
131          * applications that open both address-specific sockets and
132          * a wildcard socket listening to the same port -- they would
133          * end up receiving duplicates of every unicast datagram.
134          * Those applications open the multiple sockets to overcome an
135          * inadequacy of the UDP socket interface, but for backwards
136          * compatibility we avoid the problem here rather than
137          * fixing the interface.  Maybe 4.5BSD will remedy this?)
138          */

139         /*
140          * Construct sockaddr format source address.
141          */
142         udp_in.sin_port = uh->uh_sport;
143         udp_in.sin_addr = ip->ip_src;
144         m->m_len -= sizeof(struct udpiphdr);
145         m->m_data += sizeof(struct udpiphdr);
146         /*
147          * Locate pcb(s) for datagram.
148          * (Algorithm copied from raw_intr().)
149          */
150         last = NULL;
151         for (inp = udb.inp_next; inp != &udb; inp = inp->inp_next) {
152             if (inp->inp_lport != uh->uh_dport)
153                 continue;
154             if (inp->inp_laddr.s_addr != INADDR_ANY) {
155                 if (inp->inp_laddr.s_addr !=
156                     ip->ip_dst.s_addr)
157                     continue;
158             }
159             if (inp->inp_faddr.s_addr != INADDR_ANY) {
160                 if (inp->inp_faddr.s_addr !=
161                     ip->ip_src.s_addr ||
162                     inp->inp_fport != uh->uh_sport)
163                     continue;
164             }
165             if (last != NULL) {
166                 struct mbuf *n;

167                 if ((n = m_copy(m, 0, M_COPYALL)) != NULL) {
168                     if (sbappendaddr(&last->so_rcv,
169                                      (struct sockaddr *) &udp_in,
170                                      n, (struct mbuf *) 0) == 0) {
171                         m_freem(n);
172                         udpstat.udps_fullsock++;
173                     } else
174                         sorwakeup(last);
175                 }
176             }
177             last = inp->inp_socket;
178             /*
179              * Don't look for additional matches if this one does
180              * not have either the SO_REUSEPORT or SO_REUSEADDR
181              * socket options set.  This heuristic avoids searching
182              * through all pcbs in the common case of a non-shared
183              * port.  It assumes that an application will never
184              * clear these options after setting them.
185              */
186             if ((last->so_options & (SO_REUSEPORT | SO_REUSEADDR) == 0))
187                 break;
188         }

189         if (last == NULL) {
190             /*
191              * No matching pcb found; discard datagram.
192              * (No need to send an ICMP Port Unreachable
193              * for a broadcast or multicast datagram.)
194              */
195             udpstat.udps_noportbcast++;
196             goto bad;
197         }
198         if (sbappendaddr(&last->so_rcv, (struct sockaddr *) &udp_in,
199                          m, (struct mbuf *) 0) == 0) {
200             udpstat.udps_fullsock++;
201             goto bad;
202         }
203         sorwakeup(last);
204         return;
205     }
--------------------------------------------------------------------- udp_usrreq.c

Unimplemented socket options

245-260

This code is commented out because it doesn’t work. The intent of the IP_RECVOPTS socket option is to return the IP options from the received datagram as control information, and the intent of IP_RECVRETOPTS socket option is to return source route information. The manipulation of the mp variable by all three IP_RECV socket options is to build a linked list of up to three mbufs that are then placed onto the socket’s buffer by sbappendaddr. The code shown in Figure 23.25 only returns one option as control information, so the m_next pointer of that mbuf is always a null pointer.

Append data to socket’s receive queue

262-272

At this point the received datagram (the mbuf chain pointed to by m) is ready to be placed onto the socket’s receive queue along with a socket address structure representing the sender’s IP address and port (udp_in), and optional control information (the destination IP address, the mbuf pointed to by opts). This is done by sbappendaddr. Before calling this function, however, the pointer and lengths of the first mbuf on the chain are adjusted to ignore the IP and UDP headers. Before returning, sorwakeup is called for the receiving socket to wake up any processes asleep on the socket’s receive queue.

Error return

273-276

If an error is encountered during UDP input processing, udp_input jumps to the label bad. The mbuf chain containing the datagram is released, along with the mbuf chain containing any control information (if present).

Demultiplexing Multicast and Broadcast Datagrams

We now return to the portion of udp_input that handles datagrams sent to a broadcast or multicast IP address. The code is shown in Figure 23.26.

121-138

As the comments indicate, these datagrams are delivered to all sockets that match, not just a single socket. The inadequacy of the UDP interface that is mentioned refers to the inability of a process to receive asynchronous errors on a UDP socket (notably ICMP port unreachables) unless the socket is connected. We described this in Section 22.11.

139-145

The source IP address and port number are saved in the global sockaddr_in structure udp_in, which is passed to sbappendaddr. The mbuf chain’s length and data pointer are updated to ignore the IP and UDP headers.

146-164

The large for loop scans each UDP PCB to find all matching PCBs. in_pcblookup is not called for this demultiplexing because it returns only one PCB, whereas the broadcast or multicast datagram may be delivered to more than one PCB.

If the local port in the PCB doesn’t match the destination port from the received datagram, the entry is ignored. If the local address in the PCB is not the wildcard, it is compared to the destination IP address and the entry is skipped if they’re not equal. If the foreign address in the PCB is not a wildcard, it is compared to the source IP address and if they match, the foreign port must also match the source port. This last test assumes that if the socket is connected to a foreign IP address it must also be connected to a foreign port, and vice versa. This is the same logic we saw in in_pcblookup.

165-177

If this is not the first match found (last is nonnull), a copy of the datagram is placed onto the receive queue for the previous match. Since sbappendaddr releases the mbuf chain when it is done, a copy is first made by m_copy. Any processes waiting for this data are awakened by sorwakeup. A pointer to this matching socket structure is saved in last.

This use of the variable last avoids calling m_copy (an expensive operation since an entire mbuf chain is copied) unless there are multiple recipients for a given datagram. In the common case of a single recipient, the for loop just sets last to the single matching PCB, and when the loop terminates, sbappendaddr places the mbuf chain onto the socket’s receive queue; no copy is made.
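The copy-avoidance idea can be isolated from the mbuf details. In the following sketch the structure sock_sketch and the function deliver_all are hypothetical; a counter stands in for m_copy and an increment stands in for sbappendaddr, so only the control flow of the loop is modeled.

```c
#include <assert.h>
#include <stddef.h>

struct sock_sketch {      /* hypothetical stand-in for a socket */
    int match;            /* nonzero if this socket matches the datagram */
    int received;         /* datagrams appended to its receive queue */
};

/*
 * Deliver one datagram to every matching socket, copying only when a
 * second (third, ...) match turns up.  Returns the number of copies
 * made; *delivered is set to the number of sockets that got the data.
 */
int deliver_all(struct sock_sketch *socks, size_t n, int *delivered)
{
    struct sock_sketch *last = NULL;
    int copies = 0;
    size_t i;

    assert(delivered != NULL);
    *delivered = 0;
    for (i = 0; i < n; i++) {
        if (!socks[i].match)
            continue;
        if (last != NULL) {       /* the earlier match gets a copy */
            copies++;
            last->received++;
            (*delivered)++;
        }
        last = &socks[i];         /* remember this match for later */
    }
    if (last != NULL) {           /* the final match gets the original */
        last->received++;
        (*delivered)++;
    }
    return copies;
}
```

With one matching socket the function makes no copies; with three matching sockets it makes two, one fewer than the number of recipients.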

178-188

If this matching socket doesn’t have either the SO_REUSEPORT or the SO_REUSEADDR socket option set, then there’s no need to check for additional matches and the loop is terminated. The datagram is placed onto the single socket’s receive queue in the call to sbappendaddr outside the loop.
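One detail of line 186 as printed in Figure 23.26 deserves care. In C the == operator binds more tightly than &, so an expression of the form x & (A | B) == 0 is parsed as x & ((A | B) == 0). Since A | B is nonzero, the comparison yields 0 and the whole test is always false, which means the break is never taken. The behavior described in the text corresponds to masking first and then comparing with 0. The sketch below contrasts the two forms; the flag values and function names are hypothetical.

```c
#include <assert.h>

#define REUSEADDR_SK 0x0004   /* hypothetical flag values for this sketch */
#define REUSEPORT_SK 0x0200

/* How the compiler parses the expression as printed: == before &. */
int stop_as_printed(int so_options)
{
    return so_options & ((REUSEPORT_SK | REUSEADDR_SK) == 0);
}

/* Parenthesized to match the prose: mask first, then compare with 0. */
int stop_as_described(int so_options)
{
    return (so_options & (REUSEPORT_SK | REUSEADDR_SK)) == 0;
}
```

For a socket with neither option set, stop_as_described returns 1 and the search would end, while stop_as_printed returns 0 for every input, so the loop would scan the entire PCB list. The datagram is still delivered correctly in that case; only the shortcut is lost.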

189-197

If last is null at the end of the loop, no matches were found. An ICMP error is not generated because the datagram was sent to a broadcast or multicast IP address.

198-204

The final matching entry (which could be the only matching entry) has the original datagram (m) placed onto its receive queue. After sorwakeup is called, udp_input returns, since the processing of the broadcast or multicast datagram is complete.

The remainder of the function (shown previously in Figure 23.24) handles unicast datagrams.

Connected UDP Sockets and Multihomed Hosts

There is a subtle problem when using a connected UDP socket to exchange datagrams with a process on a multihomed host. Datagrams from the peer may arrive with a different source IP address and will not be delivered to the connected socket.

Consider the example shown in Figure 23.27.

Figure 23.27. Example of connected UDP socket sending datagram to a multihomed host.

Three steps take place.

  1. The client on bsdi creates a UDP socket and connects it to 140.252.1.29, the PPP interface on sun, not the Ethernet interface. A datagram is sent on the socket to the server.

    The server on sun receives the datagram and accepts it, even though it arrives on an interface that differs from the destination IP address. (sun is acting as a router, so whether it implements the weak end system model or the strong end system model doesn’t matter.) The datagram is delivered to the server, which is waiting for client requests on an unconnected UDP socket.

  2. The server sends a reply, but since the reply is being sent on an unconnected UDP socket, the source IP address for the reply is chosen by the kernel based on the outgoing interface (140.252.13.33). The destination IP address in the request is not used as the source address for the reply.

    When the reply is received by bsdi it is not delivered to the client’s connected UDP socket since the IP addresses don’t match.

  3. bsdi generates an ICMP port unreachable error since the reply can’t be demultiplexed. (This assumes that there is not another process on bsdi eligible to receive the datagram.)

The problem in this example is that the server does not use the destination IP address from the request as the source IP address of the reply. If it did, the problem wouldn’t exist, but this solution is nontrivial (see Exercise 23.10). We’ll see in Figure 28.16 that a TCP server uses the destination IP address from the client as the source IP address from the server, if the server has not explicitly bound a local IP address to its socket.

udp_saveopt Function

If a process specifies the IP_RECVDSTADDR socket option to receive the destination IP address from the received datagram, udp_saveopt is called by udp_input:

*mp = udp_saveopt((caddr_t) &ip->ip_dst, sizeof(struct in_addr),
                  IP_RECVDSTADDR);

Figure 23.28 shows this function.

Table 23.28. udp_saveopt function: create mbuf with control information.

---------------------------------------------------------------------- udp_usrreq.c
278 /*
279  * Create a "control" mbuf containing the specified data
280  * with the specified type for presentation with a datagram.
281  */
282 struct mbuf *
283 udp_saveopt(p, size, type)
284 caddr_t p;
285 int     size;
286 int     type;
287 {
288     struct cmsghdr *cp;
289     struct mbuf *m;

290     if ((m = m_get(M_DONTWAIT, MT_CONTROL)) == NULL)
291         return ((struct mbuf *) NULL);
292     cp = (struct cmsghdr *) mtod(m, struct cmsghdr *);
293     bcopy(p, CMSG_DATA(cp), size);
294     size += sizeof(*cp);
295     m->m_len = size;
296     cp->cmsg_len = size;
297     cp->cmsg_level = IPPROTO_IP;
298     cp->cmsg_type = type;
299     return (m);
300 }
---------------------------------------------------------------------- udp_usrreq.c

278-289

The arguments are p, a pointer to the information to be stored in the mbuf (the destination IP address from the received datagram); size, its size in bytes (4 in this example, the size of an IP address); and type, the type of control information (IP_RECVDSTADDR).

290-299

An mbuf is allocated, and since the code is executing at the software interrupt layer, M_DONTWAIT is specified. The pointer cp points to the data portion of the mbuf, and it is cast into a pointer to a cmsghdr structure (Figure 16.14). The IP address is copied from the IP header into the data portion of the cmsghdr structure by bcopy. The length of the mbuf is then set (to 16 in this example), followed by the remainder of the cmsghdr structure. Figure 23.29 shows the final state of the mbuf.

Figure 23.29. Mbuf containing destination address from received datagram as control information.

The cmsg_len field contains the length of the cmsghdr structure (12) plus the size of the cmsg_data field (4 for this example). If the application calls recvmsg to receive the control information, it must go through the cmsghdr structure to determine the type and length of the cmsg_data field.
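From the application's side, walking the control information might look like the following sketch. The function find_cmsg is hypothetical; CMSG_FIRSTHDR, CMSG_NXTHDR, CMSG_DATA, and CMSG_LEN are the standard macros from <sys/socket.h>. The sizes 12 and 16 quoted above are those of the system described here, and the header size differs on other systems, so the sketch relies on the macros rather than on fixed offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Search the control information returned by recvmsg for an entry with
 * the given level and type whose data is exactly len bytes, and copy
 * the data to out.  Returns 1 if found, 0 otherwise.
 */
int find_cmsg(struct msghdr *msg, int level, int type, void *out, size_t len)
{
    struct cmsghdr *cm;

    assert(msg != NULL && out != NULL);
    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == level && cm->cmsg_type == type &&
            cm->cmsg_len == CMSG_LEN(len)) {
            memcpy(out, CMSG_DATA(cm), len);   /* copy out the cmsg_data */
            return 1;
        }
    }
    return 0;
}
```

On a system that defines IP_RECVDSTADDR, a process that has set the option could call find_cmsg(&msg, IPPROTO_IP, IP_RECVDSTADDR, &dst, sizeof dst) after recvmsg returns, with dst declared as a struct in_addr.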

udp_ctlinput Function

When icmp_input receives an ICMP error (destination unreachable, parameter problem, redirect, source quench, or time exceeded), the corresponding protocol’s pr_ctlinput function is called:

    if (ctlfunc = inetsw[ip_protox[icp->icmp_ip.ip_p]].pr_ctlinput)
        (*ctlfunc)(code, (struct sockaddr *)&icmpsrc, &icp->icmp_ip);

For UDP, Figure 22.32 showed that the function udp_ctlinput is called. We show this function in Figure 23.30.

Table 23.30. udp_ctlinput function: process received ICMP errors.

--------------------------------------------------------------------- udp_usrreq.c
314 void
315 udp_ctlinput(cmd, sa, ip)
316 int     cmd;
317 struct sockaddr *sa;
318 struct ip *ip;
319 {
320     struct udphdr *uh;
321     extern struct in_addr zeroin_addr;
322     extern u_char inetctlerrmap[];

323     if (!PRC_IS_REDIRECT(cmd) &&
324         ((unsigned) cmd >= PRC_NCMDS || inetctlerrmap[cmd] == 0))
325         return;
326     if (ip) {
327         uh = (struct udphdr *) ((caddr_t) ip + (ip->ip_hl << 2));
328         in_pcbnotify(&udb, sa, uh->uh_dport, ip->ip_src, uh->uh_sport,
329                      cmd, udp_notify);
330     } else
331         in_pcbnotify(&udb, sa, 0, zeroin_addr, 0, cmd, udp_notify);
332 }
--------------------------------------------------------------------- udp_usrreq.c

314-322

The arguments are cmd, one of the PRC_xxx constants from Figure 11.19; sa, a pointer to a sockaddr_in structure containing the source IP address from the ICMP message; and ip, a pointer to the IP header that caused the error. For the destination unreachable, parameter problem, source quench, and time exceeded errors, the pointer ip points to the IP header that caused the error. But when udp_ctlinput is called by pfctlinput for redirects (Figure 22.32), sa points to a sockaddr_in structure containing the destination address that should be redirected, and ip is a null pointer. There is no loss of information in this final case, since we saw in Section 22.11 that a redirect is applied to all TCP and UDP sockets connected to the destination address. The nonnull third argument is needed, however, for other errors, such as a port unreachable, since the protocol header following the IP header contains the unreachable port.

323-325

If the error is not a redirect, and either the PRC_xxx value is too large or there is no error code in the global array inetctlerrmap, the ICMP error is ignored. To understand this test we need to review what happens to a received ICMP message.

  1. icmp_input converts the ICMP type and code into a PRC_xxx error code.

  2. The PRC_xxx error code is passed to the protocol’s control-input function.

  3. The Internet protocols (TCP and UDP) map the PRC_xxx error code into one of the Unix errno values using inetctlerrmap, and this value is returned to the process.

Figures 11.1 and 11.2 summarize this processing of ICMP messages.

Returning to Figure 23.30, we can see what happens to an ICMP source quench that arrives in response to a UDP datagram. icmp_input converts the ICMP message into the error PRC_QUENCH and udp_ctlinput is called. But since the errno column for this ICMP error is blank in Figure 11.2, the error is ignored.
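The filtering role of the error map can be shown with a reduced table. Everything named in this sketch is hypothetical: the three SK_ codes stand in for a few of the PRC_xxx constants, sk_errmap stands in for inetctlerrmap, and the errno values chosen are illustrative. The special case for redirects is not modeled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, reduced set of control codes; the real PRC_xxx list is longer. */
enum { SK_QUENCH, SK_UNREACH_HOST, SK_UNREACH_PORT, SK_NCMDS };

/* An entry of 0 means "no errno for this code": the error is dropped. */
static const int sk_errmap[SK_NCMDS] = {
    0,               /* source quench: ignored */
    EHOSTUNREACH,    /* host unreachable */
    ECONNREFUSED     /* port unreachable */
};

/* Returns the errno to post to the socket, or 0 if the error is ignored. */
int ctl_to_errno(int cmd)
{
    /* The unsigned cast rejects negative codes along with large ones. */
    if ((unsigned)cmd >= SK_NCMDS || sk_errmap[cmd] == 0)
        return 0;
    return sk_errmap[cmd];
}
```

In this sketch a source quench maps to 0 and is therefore dropped, just as the blank errno column causes udp_ctlinput to return without notifying any PCB.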

326-331

The function in_pcbnotify notifies the appropriate PCBs of the ICMP error. If the third argument to udp_ctlinput is nonnull, the source and destination UDP ports from the datagram that caused the error are passed to in_pcbnotify along with the source IP address.

udp_notify Function

The final argument to in_pcbnotify is a pointer to a function that in_pcbnotify calls for each PCB that is to receive the error. The function for UDP is udp_notify and we show it in Figure 23.31.

Table 23.31. udp_notify function: notify process of an asynchronous error.

--------------------------------------------------------------------- udp_usrreq.c
305 static void
306 udp_notify(inp, errno)
307 struct inpcb *inp;
308 int     errno;
309 {
310     inp->inp_socket->so_error = errno;
311     sorwakeup(inp->inp_socket);
312     sowwakeup(inp->inp_socket);
313 }
--------------------------------------------------------------------- udp_usrreq.c

301-313

The errno value, the second argument to this function, is stored in the socket’s so_error variable. By setting this socket variable, the socket becomes readable and writable if the process calls select. Any processes waiting to receive or send on the socket are then awakened to receive the error.

udp_usrreq Function

The protocol’s user-request function is called for a variety of operations. We saw in Figure 23.14 that a call to any one of the five write functions on a UDP socket ends up calling UDP’s user-request function with a request of PRU_SEND.

Figure 23.32 shows the beginning and end of udp_usrreq. The body of the switch is discussed in separate figures following this figure. The function arguments are described in Figure 15.17.

Table 23.32. Body of udp_usrreq function.

--------------------------------------------------------------------- udp_usrreq.c
417 int
418 udp_usrreq(so, req, m, addr, control)
419 struct socket *so;
420 int     req;
421 struct mbuf *m, *addr, *control;
422 {
423     struct inpcb *inp = sotoinpcb(so);
424     int     error = 0;
425     int     s;

426     if (req == PRU_CONTROL)
427         return (in_control(so, (int) m, (caddr_t) addr,
428                            (struct ifnet *) control));
429     if (inp == NULL && req != PRU_ATTACH) {
430         error = EINVAL;
431         goto release;
432     }
433     /*
434      * Note: need to block udp_input while changing
435      * the udp pcb queue and/or pcb addresses.
436      */
437     switch (req) {
                                                                                  
                                   /* switch cases */                             
                                                                                  
522     default:
523         panic("udp_usrreq");
524     }

525   release:
526     if (control) {
527         printf("udp control data unexpectedly retained\n");
528         m_freem(control);
529     }
530     if (m)
531         m_freem(m);
532     return (error);
533 }
--------------------------------------------------------------------- udp_usrreq.c

417-428

The PRU_CONTROL request is from the ioctl system call. The function in_control processes the request completely.

429-432

The socket pointer was converted to the PCB pointer when inp was declared at the beginning of the function. The only time a null PCB pointer is allowed is when a new socket is being created (PRU_ATTACH).

433-436

The comment indicates that whenever entries are being added to or deleted from UDP’s PCB list, the code must be protected by splnet. This is done because udp_usrreq is called as part of a system call, and it doesn’t want to be interrupted by UDP input (called by IP input, which is called as a software interrupt) while it is modifying the doubly linked list of PCBs. UDP input is also blocked while modifying the local or foreign addresses or ports in a PCB, to prevent a received UDP datagram from being delivered incorrectly by in_pcblookup.

We now discuss the individual case statements. The PRU_ATTACH request, shown in Figure 23.33, is from the socket system call.

Table 23.33. udp_usrreq function: PRU_ATTACH and PRU_DETACH requests.

---------------------------------------------------------------------- udp_usrreq.c
438     case PRU_ATTACH:
439         if (inp != NULL) {
440             error = EINVAL;
441             break;
442         }
443         s = splnet();
444         error = in_pcballoc(so, &udb);
445         splx(s);
446         if (error)
447             break;
448         error = soreserve(so, udp_sendspace, udp_recvspace);
449         if (error)
450             break;
451         ((struct inpcb *) so->so_pcb)->inp_ip.ip_ttl = ip_defttl;
452         break;

453     case PRU_DETACH:
454         udp_detach(inp);
455         break;
---------------------------------------------------------------------- udp_usrreq.c

438-447

If the socket structure already points to a PCB, EINVAL is returned. in_pcballoc allocates a new PCB, adds it to the front of UDP’s PCB list, and links the socket structure and the PCB to each other.

448-450

soreserve reserves buffer space for a receive buffer and a send buffer for the socket. As noted in Figure 16.7, soreserve just enforces system limits; the buffer space is not actually allocated. The default values for the send and receive buffer sizes are 9216 bytes (udp_sendspace) and 41,600 bytes (udp_recvspace). The former allows for a maximum UDP datagram size of 9200 bytes (to hold 8 Kbytes of data in an NFS packet), plus the 16-byte sockaddr_in structure for the destination address. The latter allows for 40 1024-byte datagrams, each accompanied by its 16-byte sockaddr_in structure, to be queued at one time for the socket: 40 × (1024 + 16) = 41,600. The process can change these defaults by calling setsockopt.
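The arithmetic behind the two defaults can be written out directly. The names in this sketch are hypothetical, the 16-byte address size is the one quoted in the text, and any per-mbuf overhead that the socket buffer accounting may add is ignored.

```c
#include <assert.h>

/* Values quoted in the text, written so the arithmetic is visible. */
enum {
    SK_ADDR_LEN  = 16,                        /* sockaddr_in size from the text */
    SK_SENDSPACE = 9200 + SK_ADDR_LEN,        /* 9216 */
    SK_RECVSPACE = 40 * (1024 + SK_ADDR_LEN)  /* 41,600 */
};

/* Number of queued datagrams of a given payload size that fit in space. */
int datagrams_that_fit(int space, int payload)
{
    assert(payload >= 0);
    return space / (payload + SK_ADDR_LEN);   /* each carries its address */
}
```

Under these assumptions datagrams_that_fit(SK_RECVSPACE, 1024) is 40, the figure given above. Smaller datagrams raise the count, but the address structure still consumes 16 bytes for each one.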

451-452

There are two fields in the prototype IP header in the PCB that the process can change by calling setsockopt: the TTL and the TOS. The TTL defaults to 64 (ip_defttl) and the TOS defaults to 0 (normal service), since the PCB is initialized to 0 by in_pcballoc.

453-455

The close system call issues the PRU_DETACH request. The function udp_detach, shown in Figure 23.34, is called. This function is also called later in this section for the PRU_ABORT request.

Table 23.34. udp_detach function: delete a UDP PCB.

------------------------------------------------------------ udp_usrreq.c
534 static void
535 udp_detach(inp)
536 struct inpcb *inp;
537 {
538     int     s = splnet();

539     if (inp == udp_last_inpcb)
540         udp_last_inpcb = &udb;
541     in_pcbdetach(inp);
542     splx(s);
543 }
------------------------------------------------------------ udp_usrreq.c

If the last-received PCB pointer (the one-behind cache) points to the PCB being detached, the cache pointer is set to the head of the UDP list (udb). The function in_pcbdetach removes the PCB from UDP’s list and releases the PCB.

Returning to udp_usrreq, a PRU_BIND request is the result of the bind system call and a PRU_LISTEN request is the result of the listen system call. Both are shown in Figure 23.35.

Table 23.35. udp_usrreq function: PRU_BIND and PRU_LISTEN requests.

-------------------------------------------------------------- udp_usrreq.c
456     case PRU_BIND:
457         s = splnet();
458         error = in_pcbbind(inp, addr);
459         splx(s);
460         break;

461     case PRU_LISTEN:
462         error = EOPNOTSUPP;
463         break;
-------------------------------------------------------------- udp_usrreq.c

456-460

All the work for a PRU_BIND request is done by in_pcbbind.

461-463

The PRU_LISTEN request is invalid for a connectionless protocol; it is used only by connection-oriented protocols.

We mentioned earlier that a UDP application, either a client or server (normally a client), can call connect. This fixes the foreign IP address and port number that this socket can send to or receive from. Figure 23.36 shows the PRU_CONNECT, PRU_CONNECT2, and PRU_ACCEPT requests.

Table 23.36. udp_usrreq function: PRU_CONNECT, PRU_CONNECT2, and PRU_ACCEPT requests.

----------------------------------------------------------------- udp_usrreq.c
464     case PRU_CONNECT:
465         if (inp->inp_faddr.s_addr != INADDR_ANY) {
466             error = EISCONN;
467             break;
468         }
469         s = splnet();
470         error = in_pcbconnect(inp, addr);
471         splx(s);
472         if (error == 0)
473             soisconnected(so);
474         break;

475     case PRU_CONNECT2:
476         error = EOPNOTSUPP;
477         break;

478     case PRU_ACCEPT:
479         error = EOPNOTSUPP;
480         break;
----------------------------------------------------------------- udp_usrreq.c

464-474

If the socket is already connected, EISCONN is returned. The socket should never be connected at this point, because a call to connect on an already-connected UDP socket generates a PRU_DISCONNECT request before this PRU_CONNECT request. Otherwise in_pcbconnect does all the work. If no errors are encountered, soisconnected marks the socket structure as being connected.

475-477

The socketpair system call issues the PRU_CONNECT2 request, which is defined only for the Unix domain protocols.

478-480

The PRU_ACCEPT request is from the accept system call, which is defined only for connection-oriented protocols.

The PRU_DISCONNECT request can occur in two cases for a UDP socket:

  1. When a connected UDP socket is closed, PRU_DISCONNECT is called before PRU_DETACH.

  2. When a connect is issued on an already-connected UDP socket, soconnect issues the PRU_DISCONNECT request before the PRU_CONNECT request.

Figure 23.37 shows the PRU_DISCONNECT request.

Table 23.37. udp_usrreq function: PRU_DISCONNECT request.

------------------------------------------------------------------ udp_usrreq.c
481     case PRU_DISCONNECT:
482         if (inp->inp_faddr.s_addr == INADDR_ANY) {
483             error = ENOTCONN;
484             break;
485         }
486         s = splnet();
487         in_pcbdisconnect(inp);
488         inp->inp_laddr.s_addr = INADDR_ANY;
489         splx(s);
490         so->so_state &= ~SS_ISCONNECTED;    /* XXX */
491         break;
------------------------------------------------------------------ udp_usrreq.c

If the socket is not already connected, ENOTCONN is returned. Otherwise in_pcbdisconnect sets the foreign IP address to 0.0.0.0 and the foreign port to 0. The local address is also set to 0.0.0.0, since this PCB variable could have been set by connect.

A call to shutdown specifying that the process has finished sending data generates the PRU_SHUTDOWN request, although it is rare for a process to issue this system call for a UDP socket. Figure 23.38 shows the PRU_SHUTDOWN, PRU_SEND, and PRU_ABORT requests.

Table 23.38. udp_usrreq function: PRU_SHUTDOWN, PRU_SEND, and PRU_ABORT requests.

------------------------------------------------------------------- udp_usrreq.c
492     case PRU_SHUTDOWN:
493         socantsendmore(so);
494         break;

495     case PRU_SEND:
496         return (udp_output(inp, m, addr, control));

497     case PRU_ABORT:
498         soisdisconnected(so);
499         udp_detach(inp);
500         break;
------------------------------------------------------------------- udp_usrreq.c

492-494

socantsendmore sets the socket’s flags to prevent any future output.

495-496

In Figure 23.14 we showed how the five write functions ended up calling udp_usrreq with a PRU_SEND request. udp_output sends the datagram. udp_usrreq returns, to avoid falling through to the label release (Figure 23.32), since the mbuf chain containing the data (m) must not be released yet. IP output appends this mbuf chain to the appropriate interface output queue, and the device driver will release the mbuf when the data has been transmitted.

The only buffering of UDP output within the kernel is on the interface’s output queue. If there is room in the socket’s send buffer for the datagram and destination address, sosend calls udp_usrreq, which we see calls udp_output. We saw in Figure 23.20 that ip_output is then called, which calls ether_output for an Ethernet, placing the datagram onto the interface’s output queue (if there is room). If the process calls sendto faster than the interface can transmit the datagrams, ether_output can return ENOBUFS, which is returned to the process.

497-500

A PRU_ABORT request should never be generated for a UDP socket, but if it is, the socket is disconnected and the PCB detached.

The PRU_SOCKADDR and PRU_PEERADDR requests are from the getsockname and getpeername system calls, respectively. These two requests, and the PRU_SENSE request, are shown in Figure 23.39.

Table 23.39. udp_usrreq function: PRU_SOCKADDR, PRU_PEERADDR, and PRU_SENSE requests.

-------------------------------------------------------------------- udp_usrreq.c
501     case PRU_SOCKADDR:
502         in_setsockaddr(inp, addr);
503         break;

504     case PRU_PEERADDR:
505         in_setpeeraddr(inp, addr);
506         break;

507     case PRU_SENSE:
508         /*
509          * fstat: don't bother with a blocksize.
510          */
511         return (0);
-------------------------------------------------------------------- udp_usrreq.c

501-506

The functions in_setsockaddr and in_setpeeraddr fetch the information from the PCB, storing the result in the addr argument.

507-511

The fstat system call generates the PRU_SENSE request. The function returns OK, but doesn’t return any other information. We’ll see later that TCP returns the size of the send buffer as the st_blksize element of the stat structure.

The remaining seven PRU_xxx requests, shown in Figure 23.40, are not supported for a UDP socket.

Table 23.40. udp_usrreq function: unsupported requests.

--------------------------------------------------------------------- udp_usrreq.c
512     case PRU_SENDOOB:
513     case PRU_FASTTIMO:
514     case PRU_SLOWTIMO:
515     case PRU_PROTORCV:
516     case PRU_PROTOSEND:
517         error = EOPNOTSUPP;
518         break;

519     case PRU_RCVD:
520     case PRU_RCVOOB:
521         return (EOPNOTSUPP);    /* do not free mbuf's */
--------------------------------------------------------------------- udp_usrreq.c

There is a slight difference in how the last two are handled because PRU_RCVD doesn’t pass a pointer to an mbuf as an argument (m is a null pointer) and PRU_RCVOOB passes a pointer to an mbuf for the protocol to fill in. In both cases the error is immediately returned, without breaking out of the switch and releasing the mbuf chain. With PRU_RCVOOB the caller releases the mbuf that it allocated.

udp_sysctl Function

The sysctl function for UDP supports only a single option, the UDP checksum flag. The system administrator can enable or disable UDP checksums using the sysctl(8) program. Figure 23.41 shows the udp_sysctl function. This function calls sysctl_int to fetch or set the value of the integer udpcksum.

Table 23.41. udp_sysctl function.

--------------------------------------------------------------------- udp_usrreq.c
547 udp_sysctl(name, namelen, oldp, oldlenp, newp, newlen)
548 int    *name;
549 u_int   namelen;
550 void   *oldp;
551 size_t *oldlenp;
552 void   *newp;
553 size_t  newlen;
554 {
555     /* All sysctl names at this level are terminal. */
556     if (namelen != 1)
557         return (ENOTDIR);

558     switch (name[0]) {
559     case UDPCTL_CHECKSUM:
560         return (sysctl_int(oldp, oldlenp, newp, newlen, &udpcksum));
561     default:
562         return (ENOPROTOOPT);
563     }
564     /* NOTREACHED */
565 }
--------------------------------------------------------------------- udp_usrreq.c
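The behavior of an integer fetch-or-set helper can be modeled in user space. The following is a sketch only: model_sysctl_int is a hypothetical stand-in for sysctl_int, memcpy takes the place of the copies across the user/kernel boundary, and the particular argument checks are assumptions rather than something shown in this chapter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model of an integer get/set helper in the style of sysctl_int.
 * oldp/oldlenp receive the current value; newp/newlen supply a new one.
 * Either pair may be omitted by passing a null pointer.
 */
int model_sysctl_int(void *oldp, size_t *oldlenp,
                     const void *newp, size_t newlen, int *valp)
{
    if (oldp != NULL && (oldlenp == NULL || *oldlenp < sizeof(int)))
        return ENOMEM;                    /* caller's buffer cannot hold an int */
    if (newp != NULL && newlen != sizeof(int))
        return EINVAL;                    /* new value must be exactly one int */
    if (oldlenp != NULL)
        *oldlenp = sizeof(int);
    if (oldp != NULL)
        memcpy(oldp, valp, sizeof(int));  /* report the current value first */
    if (newp != NULL)
        memcpy(valp, newp, sizeof(int));  /* then install the new value */
    return 0;
}
```

With valp pointing at a flag such as udpcksum, a single call can both return the previous setting and store a new one, which is the fetch-or-set behavior described above.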

Implementation Refinements

UDP PCB Cache

In Section 22.12 we talked about some general features of PCB searching and how the code we’ve seen uses a linear search of the protocol’s PCB list. We now tie this together with the one-behind cache used by UDP in Figure 23.24.

The problem with the one-behind cache occurs when the cached PCB contains wildcard values (for either the local address, foreign address, or foreign port): the cached value never matches any received datagram. One solution tested in [Partridge and Pink 1993] is to modify the cache to not compare wildcarded values. That is, instead of comparing the foreign address in the PCB with the source address in the datagram, compare these two values only if the foreign address in the PCB is not a wildcard.

There’s a subtle problem with this approach [Partridge and Pink 1993]. Assume there are two sockets bound to local port 555. One has the remaining three elements wildcarded, while the other has connected to the foreign address 128.1.2.3 and the foreign port 1600. If we cache the first PCB and a datagram arrives from 128.1.2.3, port 1600, we can’t ignore comparing the foreign addresses just because the cached value has a wildcarded foreign address. This is called cache hiding. The cached PCB has hidden another PCB that is a better match in this example.

To get around cache hiding requires more work when a new entry is added to or deleted from the cache. Those PCBs that hide other PCBs cannot be cached. This is not a problem, however, because the normal scenario is to have one socket per local port. The example we just gave with two sockets bound to local port 555, while possible (especially on a multihomed host), is rare.
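Cache hiding can be made concrete with a small model. The sketch below is not the kernel's in_pcblookup: struct pcb and struct dgram are simplified stand-ins, all names are hypothetical, and "best match" is taken to mean the matching PCB with the fewest wildcarded fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY 0u   /* wildcard address */
#define PORT_ANY 0u   /* wildcard port */

struct pcb {          /* simplified stand-in for struct inpcb */
    uint32_t laddr, faddr;
    uint16_t lport, fport;
};

struct dgram {        /* addresses and ports of a received datagram */
    uint32_t src, dst;
    uint16_t sport, dport;
};

/* Relaxed cache test: a wildcarded PCB field matches anything. */
int relaxed_match(const struct pcb *p, const struct dgram *d)
{
    return p->lport == d->dport &&
           (p->laddr == ADDR_ANY || p->laddr == d->dst) &&
           (p->faddr == ADDR_ANY || p->faddr == d->src) &&
           (p->fport == PORT_ANY || p->fport == d->sport);
}

/* Count wildcarded fields; fewer wildcards means a more specific PCB. */
int wildcard_count(const struct pcb *p)
{
    return (p->laddr == ADDR_ANY) + (p->faddr == ADDR_ANY) +
           (p->fport == PORT_ANY);
}

/* Full search: among all matching PCBs, return the most specific one. */
const struct pcb *best_match(const struct pcb *list, size_t n,
                             const struct dgram *d)
{
    const struct pcb *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!relaxed_match(&list[i], d))
            continue;
        if (best == NULL || wildcard_count(&list[i]) < wildcard_count(best))
            best = &list[i];
    }
    return best;
}

/* Would the relaxed cache accept `cached` although a better PCB exists? */
int cache_hides(const struct pcb *cached, const struct pcb *list, size_t n,
                const struct dgram *d)
{
    return relaxed_match(cached, d) && best_match(list, n, d) != cached;
}
```

With one PCB bound only to port 555 and another connected to 128.1.2.3 port 1600, a datagram from that peer makes cache_hides return 1 for the wildcarded PCB. A datagram from any other source makes it return 0, because the wildcarded PCB is then the best match.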

The next enhancement tested in [Partridge and Pink 1993] is to also remember the PCB of the last datagram sent. This is motivated by [Mogul 1991], who shows that half of all datagrams received are replies to the last datagram that was sent. Cache hiding is a problem here also, so PCBs that would hide other PCBs are not cached.

On a general-purpose system measured over roughly 100,000 received UDP datagrams, [Partridge and Pink 1993] report a 57% hit rate for the last-received PCB cache and a 30% hit rate for the last-sent PCB cache. The amount of CPU time spent in udp_input is more than halved, compared to the version with no caching.

These two caches still depend on a certain amount of locality: that with a high probability the UDP datagram that just arrived is either from the same peer as the last UDP datagram received or from the peer to whom the last datagram was sent. The latter is typical for request-response applications that send a datagram and wait for a reply. [McKenney and Dove 1992] show that some applications, such as data entry into an online transaction processing (OLTP) system, don’t yield the high cache hit rates that [Partridge and Pink 1993] observed. As we mentioned in Section 22.12, placing the PCBs onto hash chains provided an order of magnitude improvement over the last-received and last-sent caches for a system with thousands of OLTP connections.

UDP Checksum

The next area for improving the implementation is to combine the copying of data between the process and the kernel with the calculation of the checksum. In Net/3, each byte of data is processed twice during an output operation: once when copied from the process into an mbuf (the function uiomove, which is called by sosend), and again when the UDP checksum is calculated (by the function in_cksum, which is called by udp_output). This happens on input as well as output.

[Partridge and Pink 1993] modified the UDP output processing from what we showed in Figure 23.14 so that a UDP-specific function named udp_sosend is called instead of sosend. This new function calculates the checksum of the UDP header and the pseudo-header in-line (instead of calling the general-purpose function in_cksum) and then copies the data from the process into an mbuf chain using a special function named in_uiomove (instead of the general-purpose uiomove). This new function copies the data and updates the checksum. The amount of time spent copying the data and calculating the checksum is reduced with this technique by about 40 to 45%.
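The idea of touching each byte once can be sketched in portable C. This is not the in_uiomove function from [Partridge and Pink 1993], whose source is not shown here. It is a minimal illustration, with hypothetical names, of a loop that copies data and accumulates the Internet checksum's one's complement sum in the same pass, alongside a separate-pass version for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
uint16_t cksum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Separate pass: sum the buffer as big-endian 16-bit words. */
uint32_t cksum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)
        sum += (uint32_t)buf[i] << 8;   /* odd trailing byte, zero-padded */
    return sum;
}

/* Combined pass: copy src to dst and accumulate the same sum as we go. */
uint32_t copy_and_cksum(uint8_t *dst, const uint8_t *src, size_t len,
                        uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return sum;
}
```

Both routines produce the same folded sum for the same data. The 32-bit accumulator is wide enough for a maximum-size datagram. Calls can be chained across buffers by passing the running sum back in, provided every buffer except the last has an even length.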

On the receive side the scenario is different. UDP calculates the checksum of the UDP header and the pseudo-header, removes the UDP header, and queues the data for the appropriate socket. When the application reads the data, a special version of soreceive (called udp_soreceive) completes the calculation of the checksum while copying the data into the user’s buffer. If the checksum is in error, however, the error is not detected until the entire datagram has been copied into the user’s buffer. In the normal case of a blocking socket, udp_soreceive just waits for the next datagram to arrive. But if the socket is nonblocking, the error EWOULDBLOCK must be returned if another datagram is not ready to be passed to the process. This implies two changes in the socket interface for a nonblocking read from a UDP socket:

  1. The select function can indicate that a nonblocking UDP socket is readable, yet the error EWOULDBLOCK is unexpectedly returned by one of the read functions if the checksum fails.

  2. Since a checksum error is detected after the datagram has been copied into the user’s buffer, the application’s buffer is changed even though no data is returned by the read.

Even with a blocking socket, if the datagram with the checksum error contains 100 bytes of data and the next datagram without an error contains 40 bytes of data, recvfrom returns a length of 40, but the 60 bytes that follow in the user’s buffer have also been modified.

[Partridge and Pink 1993] compare the timings for a copy versus a copy-with-checksum for six different computers. They show that the checksum is calculated for free during the copy operation on many architectures. This occurs when memory access speeds and CPU processing speeds are mismatched, as is true for many current RISC processors.

Summary

UDP is a simple, connectionless protocol, which is why we cover it before looking at TCP. UDP output is simple: IP and UDP headers are prepended to the user’s data, as much of the header is filled in as possible, and the result is passed to ip_output. The only complication is calculating the UDP checksum, which involves prepending a pseudo-header just for the checksum computation. We’ll encounter a similar pseudo-header for the calculation of the TCP checksum in Chapter 26.

When udp_input receives a datagram, it first performs a general validation (the length and checksum); the processing then differs depending on whether the destination IP address is a unicast address or a broadcast or multicast address. A unicast datagram is delivered to at most one process, but a broadcast or multicast datagram can be delivered to multiple processes. A one-behind cache is maintained for unicast datagrams, which maintains a pointer to the last Internet PCB for which a UDP datagram was received. We saw, however, that because of the prevalence of wildcard addressing with UDP applications, this cache is practically useless.

The udp_ctlinput function is called to handle received ICMP messages, and the udp_usrreq function handles the PRU_xxx requests from the socket layer.

Exercises

23.1

List the five types of mbuf chains that udp_output passes to ip_output. (Hint: look at sosend.)

23.1

sosend places the user data into a single mbuf if the size is less than or equal to 100 bytes; into two mbufs if the size is less than or equal to 207 bytes; or into one or more mbufs, each with a cluster, otherwise. Furthermore, sosend calls MH_ALIGN if the size is less than 100 bytes, which, it is hoped, will allow room at the beginning of the mbuf for the protocol headers. Since udp_output calls M_PREPEND, the following five scenarios are possible: (1) If the size of the user data is less than or equal to 72 bytes, a single mbuf contains the IP header, UDP header, and data. (2) If the size is between 73 and 100 bytes, one mbuf is allocated by sosend for the data and another is allocated by M_PREPEND for the IP and UDP headers. (3) If the size is between 101 and 207 bytes, two mbufs are allocated by sosend for the data and another by M_PREPEND for the IP and UDP headers. (4) If the size is between 208 and MCLBYTES, one mbuf with a cluster is allocated by sosend for the data and another by M_PREPEND for the IP and UDP headers. (5) Beyond this size, sosend allocates as many mbufs with clusters as necessary to hold the data (up to 64 for a maximum data size of 65507 bytes with 1024-byte clusters), and one mbuf is allocated by M_PREPEND for the IP and UDP headers.
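The size boundaries in this solution can be written down as a small classifier. This is a sketch with hypothetical names, assuming 1024-byte clusters (the cluster size used in the solution's own arithmetic) and the 28 bytes of IP and UDP headers; it only restates the thresholds above and does not model sosend itself.

```c
#include <assert.h>
#include <stddef.h>

/* Thresholds from the solution text; CLUSTER_BYTES assumes MCLBYTES is 1024. */
#define ONE_MBUF_MAX   100u   /* largest size sosend stores in a single mbuf */
#define TWO_MBUF_MAX   207u   /* largest size stored in two plain mbufs */
#define HDR_BYTES       28u   /* 20-byte IP header plus 8-byte UDP header */
#define CLUSTER_BYTES 1024u   /* assumed cluster size */

/* Return the scenario number (1 to 5) for a given amount of user data. */
int udp_mbuf_scenario(size_t datalen)
{
    if (datalen <= ONE_MBUF_MAX - HDR_BYTES)
        return 1;   /* headers fit in front of the data in one mbuf */
    if (datalen <= ONE_MBUF_MAX)
        return 2;   /* one data mbuf plus one header mbuf */
    if (datalen <= TWO_MBUF_MAX)
        return 3;   /* two data mbufs plus one header mbuf */
    if (datalen <= CLUSTER_BYTES)
        return 4;   /* one cluster plus one header mbuf */
    return 5;       /* several clusters plus one header mbuf */
}

/* Number of clusters needed to hold datalen bytes (scenarios 4 and 5). */
size_t udp_cluster_count(size_t datalen)
{
    return (datalen + CLUSTER_BYTES - 1) / CLUSTER_BYTES;
}
```

For the maximum data size of 65507 bytes, udp_cluster_count returns 64, in agreement with the figure quoted in scenario 5.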

23.2

What happens to the answer for the previous exercise when the process specifies IP options for the outgoing datagram?

23.2

IP options are passed to ip_output, which calls ip_insertoptions to insert the options into the outgoing IP datagram. This function in turn allocates a new mbuf to hold the IP header including options if the first mbuf in the chain points to a cluster (which never happens with UDP output) or if there is not enough room at the beginning of the first mbuf in the chain for the options. In scenario 1 from the previous solution, the size of the options determines whether another mbuf is allocated by ip_insertoptions: if the size of the user data is less than 100 - 28 - optlen (where optlen is the number of bytes of IP options), there is room in the mbuf for the IP header with options, the UDP header, and the data.

In scenarios 2, 3, 4, and 5, the first mbuf in the chain is always allocated by M_PREPEND just for the IP and UDP headers. M_PREPEND calls m_prepend, which calls MH_ALIGN, moving the 28 bytes of headers to the end of the mbuf, hence there is always room for the maximum of 40 bytes of IP options in this first mbuf in the chain.

23.3

Does a UDP client need to call bind? Why or why not?

23.3

No. The function in_pcbconnect is called, either when the application calls connect or when the first datagram is sent on an unconnected UDP socket. Since the local address is a wildcard and the local port is 0, in_pcbconnect sets the local port to an ephemeral port (by calling in_pcbbind) and sets the local address based on the route to the destination.

23.4

What happens to the processor priority level in udp_output if the socket is unconnected and the call to M_PREPEND in Figure 23.15 fails?

23.4

The processor priority level is left at splnet; it is not restored to the saved value. This is a bug.

23.5

udp_output does not check for a destination port of 0. Is it possible to send a UDP datagram with a destination port of 0?

23.5

No. in_pcbconnect will not allow a connection to port 0. Even if the process doesn’t call connect directly, an implicit connect is performed, so in_pcbconnect is called regardless.

23.6

Assuming the IP_RECVDSTADDR socket option worked when a datagram was sent to a broadcast address, how can you then determine if this address is a broadcast address?

23.6

The application must call ioctl with the SIOCGIFCONF command to return information on all configured IP interfaces. The destination address in the received UDP datagram must then be compared against all the IP addresses and broadcast addresses in the list returned by ioctl. (As an alternative to ioctl, the sysctl system call described in Section 19.14 can also be used to obtain the information on all the configured interfaces.)
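The comparison against the interface list might look like the following sketch. It substitutes getifaddrs, a later convenience interface available on most current Unix systems, for the SIOCGIFCONF ioctl and the sysctl call named above. It checks only broadcast addresses, not the interfaces' unicast addresses. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE   /* expose IFF_BROADCAST under strict -std modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <ifaddrs.h>

/*
 * Return 1 if dst is the limited broadcast address or the broadcast
 * address of a configured IPv4 interface, 0 otherwise.
 */
int is_broadcast_dst(struct in_addr dst)
{
    struct ifaddrs *list, *ifa;
    int found = 0;

    if (dst.s_addr == htonl(INADDR_BROADCAST))
        return 1;                       /* 255.255.255.255 */
    if (getifaddrs(&list) != 0)
        return 0;                       /* cannot enumerate interfaces */

    for (ifa = list; ifa != NULL && !found; ifa = ifa->ifa_next) {
        const struct sockaddr_in *b;

        /* Skip entries that are not IPv4 or cannot broadcast. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (!(ifa->ifa_flags & IFF_BROADCAST) || ifa->ifa_broadaddr == NULL)
            continue;
        b = (const struct sockaddr_in *)ifa->ifa_broadaddr;
        if (b->sin_addr.s_addr == dst.s_addr)
            found = 1;
    }
    freeifaddrs(list);
    return found;
}
```

An application would pass in the destination address obtained through IP_RECVDSTADDR and treat a nonzero result as meaning that the datagram was sent to a broadcast address.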

23.7

Who releases the mbuf that udp_saveopt (Figure 23.28) allocates?

23.7

recvit releases the mbuf with the control information.

23.8

How can a process disconnect a connected UDP socket? That is, the process calls connect and exchanges datagrams with that peer, and then the process wants to disconnect the socket, allowing it to call sendto and send a datagram to some other host.

23.8

To disconnect a connected UDP socket, call connect with an invalid address, such as 0.0.0.0, and a port of 0. Since the socket is already connected, soconnect calls sodisconnect, which calls udp_usrreq with a PRU_DISCONNECT request. This sets the foreign address to 0.0.0.0 and the foreign port to 0, allowing a subsequent call to sendto that specifies a destination address to succeed. Specifying the invalid address causes the PRU_CONNECT request from soconnect to fail. We don’t want the connect to succeed; we just want the PRU_DISCONNECT request executed, and this back door through connect is the only way to execute this request, since the sockets API doesn’t provide a disconnect function.

The manual page for connect(2) usually contains the following note that hints at this: “Datagram sockets may dissolve the association by connecting to an invalid address, such as a null address.” What this note fails to mention is that the call to connect for the invalid address is expected to return an error. The term null address is also vague: it means the IP address 0.0.0.0, not a null pointer for the second argument to connect.
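A user-level sketch of dissolving the association follows. It deliberately uses a different spelling of the "invalid address" from the 0.0.0.0 address described above: an address whose family is AF_UNSPEC, which POSIX specifies as resetting a datagram socket's peer. That choice is an assumption made for portability to current systems, not the Net/3 behavior, and the helper names are hypothetical. The return value of connect is ignored because some systems report an error even though the disconnect has taken effect.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Return 1 if the socket currently has a peer address, 0 if it does not. */
int udp_is_connected(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof(peer);

    return getpeername(fd, (struct sockaddr *)&peer, &len) == 0;
}

/*
 * Dissolve the association of a connected datagram socket by connecting
 * to an address of family AF_UNSPEC. The result of connect is not checked:
 * depending on the system it may be 0 or an error, and in either case the
 * caller should use udp_is_connected to confirm the outcome.
 */
void udp_dissolve(int fd)
{
    struct sockaddr sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_family = AF_UNSPEC;
    (void)connect(fd, &sa, sizeof(sa));
}
```

After udp_dissolve, the socket can again be used with sendto and an explicit destination address.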

23.9

In our discussion of Figure 22.25 we noted that a UDP application that calls connect with a foreign IP address of 255.255.255.255 actually sends datagrams out the primary interface with a destination IP address corresponding to the broadcast address of that interface. What happens if a UDP application uses an unconnected socket instead, calling sendto with a destination address of 255.255.255.255?

23.9

Since an unconnected UDP socket is temporarily connected to the foreign IP address by in_pcbconnect, the scenario is the same as if the process calls connect: the datagram is sent out the primary interface with a destination IP address corresponding to the broadcast address of that interface.

23.10

After discussing the problem with Figure 23.27, we mentioned that this problem would not exist if the server used the destination IP address from the request as the source IP address of the reply. Explain how the server could do this.

23.10

The server must set the IP_RECVDSTADDR socket option and use recvmsg to obtain the destination IP address from the client’s request. For this address to be the source IP address of the reply requires that this IP address be bound to the socket. Since you cannot bind a socket more than once, the server must create a brand new socket for each reply.

23.11

Implement changes to allow a process to perform path MTU discovery using UDP: the process must be able to set the “don’t fragment” bit in the resulting IP datagram and be told if the corresponding ICMP destination unreachable error is received.

23.11

Notice in ip_output (Figure 8.22) that IP does not modify the DF bit supplied by the caller. A new socket option could be defined to cause udp_output to set the DF bit before passing datagrams to IP.

23.12

Does the variable udp_in need to be global?

23.12

No. It is used only in the udp_input function and should be local to that function.

23.13

Modify udp_input to save the IP options and make them available to the receiver with the IP_RECVOPTS socket option.

23.14

Fix the one-behind cache in Figure 23.24.

23.15

Fix udp_input to implement the IP_RECVOPTS and IP_RETOPTS socket options.

23.16

Fix udp_input so that the IP_RECVDSTADDR socket option works for datagrams sent to a broadcast or multicast address.
