A process accesses the raw IP layer by creating a socket of type SOCK_RAW
in the Internet domain. There are three uses for raw sockets:

1. Raw sockets allow a process to send and receive ICMP and IGMP messages.
   The Ping program uses this type of socket to send ICMP echo requests and to receive ICMP echo replies.
   Some routing daemons use this feature to track ICMP redirects that are processed by the kernel. We saw in Section 19.7 that Net/3 generates an RTM_REDIRECT message on a routing socket when a redirect is processed, obviating the need for this use of raw sockets.
   This feature is also used to implement protocols based on ICMP, such as router advertisement and router solicitation (Section 9.6 of Volume 1), which use ICMP but are better implemented as user processes than within the kernel.
   The multicast routing daemon uses a raw IGMP socket to send and receive IGMP messages.
2. Raw sockets let a process build its own IP headers. The Traceroute program uses this feature to build its own UDP datagrams, including the IP and UDP headers.
3. Raw sockets let a process read and write IP datagrams with an IP protocol type that the kernel doesn’t support.
   The gated program uses this to support three routing protocols that are built directly on IP: EGP, HELLO, and OSPF.
   This type of raw socket can also be used to experiment with new transport layers on top of IP, instead of adding support to the kernel. It is usually much easier to debug code within a user process than it is within the kernel.
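As an illustration of the first use, the following is a minimal user-space sketch of how a Ping-like program might build an ICMP echo request before writing it to a raw ICMP socket. The function names `in_cksum` and `build_echo_request` and the byte-array layout are assumptions made for this example, not code taken from Ping or from Net/3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum: one's-complement sum of big-endian 16-bit words. */
static uint16_t in_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len == 1)                   /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)               /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Build an ICMP echo request (type 8, code 0) in buf.
 * Returns the message length, or 0 if buf is too small.
 */
static size_t build_echo_request(uint8_t *buf, size_t buflen,
                                 uint16_t id, uint16_t seq,
                                 const void *payload, size_t paylen)
{
    size_t len = 8 + paylen;
    uint16_t ck;

    assert(buf != NULL);
    if (buflen < len)
        return 0;
    buf[0] = 8;                     /* type: echo request */
    buf[1] = 0;                     /* code */
    buf[2] = 0;                     /* checksum is zero while summing */
    buf[3] = 0;
    buf[4] = (uint8_t)(id >> 8);
    buf[5] = (uint8_t)(id & 0xff);
    buf[6] = (uint8_t)(seq >> 8);
    buf[7] = (uint8_t)(seq & 0xff);
    if (paylen > 0)
        memcpy(buf + 8, payload, paylen);
    ck = in_cksum(buf, len);
    buf[2] = (uint8_t)(ck >> 8);
    buf[3] = (uint8_t)(ck & 0xff);
    return len;
}
```

A process would typically pass the resulting buffer to sendto on a socket created with a type of SOCK_RAW and a protocol of IPPROTO_ICMP, which requires superuser privileges.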
This chapter examines the implementation of raw IP sockets.
There are five raw IP functions in a single C file, shown in Figure 32.1.
Figure 32.2 shows the relationship of the five raw IP functions to other kernel functions.
The shaded ellipses are the five functions that we cover in this chapter. Be aware that the “rip” prefix used within the raw IP functions stands for “raw IP” and not the “Routing Information Protocol,” whose common acronym is RIP.
Four global variables are introduced in this chapter, which are shown in Figure 32.3.
Table 32.3. Global variables introduced in this chapter.
Variable | Datatype | Description |
---|---|---|
rawinpcb | struct inpcb | head of the raw IP Internet PCB list |
ripsrc | struct sockaddr_in | contains sender’s IP address on input |
rip_recvspace | u_long | default size of socket receive buffer, 8192 bytes |
rip_sendspace | u_long | default size of socket send buffer, 8192 bytes |
Raw IP maintains two of the counters in the ipstat
structure (Figure 8.4). We describe these in Figure 32.4.
The use of the ips_noproto
counter with SNMP is shown in Figure 8.6. Figure 8.5 shows some sample output of these two counters.
Unlike all other protocols, raw IP is accessed through multiple entries in the inetsw array. There are four entries in this structure with a socket type of SOCK_RAW, each with a different protocol value: IPPROTO_ICMP (protocol value of 1), IPPROTO_IGMP (protocol value of 2), IPPROTO_RAW (protocol value of 255), and the raw wildcard entry (protocol value of 0).
The first two entries for ICMP and IGMP were described earlier (Figures 11.12 and 13.9). The difference in these four entries can be summarized as follows:
If the process creates a raw socket (SOCK_RAW) with a nonzero protocol value (the third argument to socket), and if that value matches IPPROTO_ICMP, IPPROTO_IGMP, or IPPROTO_RAW, then the corresponding protosw entry is used.
If the process creates a raw socket with a nonzero protocol value that is not known to the kernel, the wildcard entry with a protocol of 0 is matched by pffindproto. This allows a process to handle any IP protocol that is not known to the kernel, without making kernel modifications.
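The lookup rule just described can be modeled in a few lines of user-space C. This is a simplified sketch, not the kernel's pffindproto: the table contents, the `RAW_TYPE` and `DGRAM_TYPE` constants, and the `find_proto` name are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DGRAM_TYPE 2            /* stands in for SOCK_DGRAM */
#define RAW_TYPE   3            /* stands in for SOCK_RAW */

struct proto_entry {
    int type;                   /* socket type */
    int protocol;               /* IP protocol value, 0 = wildcard */
    const char *name;
};

/* Hypothetical protocol switch: one UDP entry and four raw entries. */
static const struct proto_entry table[] = {
    { DGRAM_TYPE, 17,  "udp" },
    { RAW_TYPE,   255, "raw" },
    { RAW_TYPE,   1,   "icmp" },
    { RAW_TYPE,   2,   "igmp" },
    { RAW_TYPE,   0,   "raw wildcard" },
};

/*
 * Return the entry whose type and protocol both match. For raw
 * sockets only, fall back to the first wildcard entry (protocol 0).
 */
static const struct proto_entry *find_proto(int protocol, int type)
{
    const struct proto_entry *maybe = NULL;
    size_t i;

    assert(protocol != 0);      /* the rule applies to nonzero protocols */
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        const struct proto_entry *pr = &table[i];

        if (pr->protocol == protocol && pr->type == type)
            return pr;          /* exact match wins */
        if (type == RAW_TYPE && pr->type == RAW_TYPE &&
            pr->protocol == 0 && maybe == NULL)
            maybe = pr;         /* remember the wildcard */
    }
    return maybe;
}
```

With this table, a raw socket created for protocol 89 (OSPF) resolves to the wildcard entry, while protocol 1 resolves to the ICMP entry.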
We saw in Section 7.8 that all entries in the ip_protox
array that are unknown are set to point to the entry for IPPROTO_RAW
, whose protocol switch entry we show in Figure 32.5.
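Table 32.5. The raw IP protosw structure.
Member | Value | Description |
---|---|---|
pr_type | SOCK_RAW | raw socket |
pr_domain | &inetdomain | raw IP is part of the Internet domain |
pr_protocol | IPPROTO_RAW (255) | appears in the ip_p field of the IP header |
pr_flags | PR_ATOMIC\|PR_ADDR | socket layer flags, not used by protocol processing |
pr_input | rip_input | receives messages from IP layer |
pr_output | 0 | not used by raw IP |
pr_ctlinput | 0 | not used by raw IP |
pr_ctloutput | rip_ctloutput | respond to administrative requests from a process |
pr_usrreq | rip_usrreq | respond to communication requests from a process |
pr_init | 0 | not used by raw IP |
pr_fasttimo | 0 | not used by raw IP |
pr_slowtimo | 0 | not used by raw IP |
pr_drain | 0 | not used by raw IP |
pr_sysctl | 0 | not used by raw IP |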
We describe the three functions that begin with rip_
in this chapter. We also cover the function rip_output
, which is not in the protocol switch entry but is called by rip_usrreq
when a raw IP datagram is output.
The fifth raw IP function, rip_init
, is contained only in the wildcard entry. The initialization function must be called only once, so it could appear in either the IPPROTO_RAW
entry or in the wildcard entry.
What Figure 32.5 doesn’t show, however, is that other protocols (ICMP and IGMP) also reference some of the raw IP functions in their protosw
entries. Figure 32.6 compares the relevant fields in the protosw
entries for the four SOCK_RAW
protocols. To highlight the differences, values in these rows are in a bolder font when they differ.
Table 32.6. Comparison of protocol switch values for raw sockets.
Member | IPPROTO_ICMP (1) | IPPROTO_IGMP (2) | IPPROTO_RAW (255) | wildcard (0) |
---|---|---|---|---|
(table body not recoverable from the source) | | | | |
The implementation of raw sockets has changed with the different BSD releases. The entry with a protocol of IPPROTO_RAW has always been used as the wildcard entry in the ip_protox table for unknown IP protocols. The entry with a protocol of 0 has always been the default entry, to allow processes to read and write IP datagrams with a protocol that the kernel doesn’t support.

Usage of the IPPROTO_RAW entry by a process started when Traceroute was developed by Van Jacobson, because Traceroute was the first process that needed to write its own IP headers (to change the TTL field). The kernel patches to 4.3BSD and Net/1 to support Traceroute included a change to rip_output so that if the protocol was IPPROTO_RAW, it was assumed the process had passed a complete IP datagram, including the IP header. This was changed with Net/2 when the IP_HDRINCL socket option was introduced, removing this overloading of the IPPROTO_RAW protocol and allowing a process to send its own IP header with the wildcard entry.
The domaininit
function calls the raw IP initialization function rip_init
(Figure 32.7) at system initialization time.
The only action performed by this function is to set the next and previous pointers in the head PCB (rawinpcb
) to point to itself. This is an empty doubly linked list.
Whenever a socket of type SOCK_RAW
is created by the socket
system call, we’ll see that the raw IP PRU_ATTACH
function creates an Internet PCB and puts it onto the rawinpcb
list.
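The list handling described here can be sketched in user space as follows. The `struct pcb` type and the `pcb_list_init`, `pcb_insert`, `pcb_remove`, and `pcb_count` names are hypothetical stand-ins for the kernel's Internet PCB and its list routines; only the circular, doubly linked shape is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for an Internet PCB: just the list linkage. */
struct pcb {
    struct pcb *next;
    struct pcb *prev;
};

/* An empty list is a head whose pointers both refer to itself. */
static void pcb_list_init(struct pcb *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Link p in directly after the head. */
static void pcb_insert(struct pcb *head, struct pcb *p)
{
    p->next = head->next;
    p->prev = head;
    head->next->prev = p;
    head->next = p;
}

/* Unlink p from whatever list it is on. */
static void pcb_remove(struct pcb *p)
{
    p->prev->next = p->next;
    p->next->prev = p->prev;
    p->next = p->prev = NULL;
}

/* Walk the list the way rip_input does: stop on reaching the head. */
static size_t pcb_count(const struct pcb *head)
{
    const struct pcb *p;
    size_t n = 0;

    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Because the head points to itself when the list is empty, a traversal loop needs no special case for an empty list.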
Since all entries in the ip_protox
array for unknown protocols are set to point to the entry for IPPROTO_RAW
(Section 7.8), and since the pr_input
function for this protocol is rip_input
(Figure 32.6), this function is called for all IP datagrams that have a protocol value that the kernel doesn’t recognize. But from Figure 32.2 we see that both ICMP and IGMP also call rip_input
. This happens under the following conditions:
icmp_input
calls rip_input
for all unknown ICMP message types and for all ICMP messages that are not reflected.
igmp_input
calls rip_input
for all IGMP packets.
One reason for calling rip_input
in these two cases is to allow a process with a raw socket to handle new ICMP and IGMP messages that might not be supported by the kernel.
Figure 32.8 shows the rip_input
function.
Table 32.8. rip_input
function.
------------------------------------------------------------------------- raw_ip.c
 59 void
 60 rip_input(m)
 61 struct mbuf *m;
 62 {
 63     struct ip *ip = mtod(m, struct ip *);
 64     struct inpcb *inp;
 65     struct socket *last = 0;
 66     ripsrc.sin_addr = ip->ip_src;
 67     for (inp = rawinpcb.inp_next; inp != &rawinpcb; inp = inp->inp_next) {
 68         if (inp->inp_ip.ip_p && inp->inp_ip.ip_p != ip->ip_p)
 69             continue;
 70         if (inp->inp_laddr.s_addr &&
 71             inp->inp_laddr.s_addr == ip->ip_dst.s_addr)
 72             continue;
 73         if (inp->inp_faddr.s_addr &&
 74             inp->inp_faddr.s_addr == ip->ip_src.s_addr)
 75             continue;
 76         if (last) {
 77             struct mbuf *n;
 78             if (n = m_copy(m, 0, (int) M_COPYALL)) {
 79                 if (sbappendaddr(&last->so_rcv, &ripsrc,
 80                                  n, (struct mbuf *) 0) == 0)
 81                     /* should notify about lost packet */
 82                     m_freem(n);
 83                 else
 84                     sorwakeup(last);
 85             }
 86         }
 87         last = inp->inp_socket;
 88     }
 89     if (last) {
 90         if (sbappendaddr(&last->so_rcv, &ripsrc,
 91                          m, (struct mbuf *) 0) == 0)
 92             m_freem(m);
 93         else
 94             sorwakeup(last);
 95     } else {
 96         m_freem(m);
 97         ipstat.ips_noproto++;
 98         ipstat.ips_delivered--;
 99     }
100 }
------------------------------------------------------------------------- raw_ip.c
59-66
The source address from the IP datagram is put into the global variable ripsrc
, which becomes an argument to sbappendaddr
whenever a matching PCB is found. Unlike UDP, there is no concept of a port number with raw IP, so the sin_port
field in the sockaddr_in
structure is always 0.
67-88
Raw IP handles its list of PCBs differently from UDP and TCP. We saw that these two protocols maintain a pointer to the PCB for the most recently received datagram (a one-behind cache) and call the generic function in_pcblookup
to search for a single “best” match when the received datagram does not match the cache entry. Raw IP has completely different criteria for a matching PCB, so it searches the PCB list itself. in_pcblookup
cannot be used because a raw IP datagram can be delivered to multiple sockets, so every PCB on the raw PCB list must be scanned. This is similar to UDP’s handling of a received datagram destined for a broadcast or multicast address (Figure 23.26).
68-69
If the protocol field in the PCB is nonzero, and if it doesn’t match the protocol field in the IP header, the PCB is ignored. This implies that a raw socket with a protocol value of 0 (the third argument to socket
) can match any received raw IP datagram.
70-75
If the local address in the PCB is nonzero, and if it doesn’t match the destination IP address in the IP header, the PCB is ignored. If the foreign address in the PCB is nonzero, and if it doesn’t match the source IP address in the IP header, the PCB is ignored.
These three tests imply that a process can create a raw socket with a protocol of 0, not bind a local address, and not connect to a foreign address, and the process receives all datagrams processed by rip_input
.
Lines 71 and 74 both contain the same bug: the test for equality should be a test for inequality.
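The three tests can be modeled in user space. The sketch below uses the inequality tests that the text says were intended, not the equality tests that appear on lines 71 and 74 of the listing. The `raw_pcb` structure and the `raw_pcb_matches` name are illustrative, and addresses are treated as plain 32-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Filter state held per raw socket; 0 means "not set" in each field. */
struct raw_pcb {
    uint8_t  proto;             /* third argument to socket */
    uint32_t laddr;             /* set by bind */
    uint32_t faddr;             /* set by connect */
};

/*
 * Return true if a datagram with the given protocol, source, and
 * destination should be delivered to the socket owning this PCB.
 */
static bool raw_pcb_matches(const struct raw_pcb *pcb, uint8_t proto,
                            uint32_t src, uint32_t dst)
{
    assert(pcb != NULL);
    if (pcb->proto != 0 && pcb->proto != proto)
        return false;           /* protocol filter */
    if (pcb->laddr != 0 && pcb->laddr != dst)
        return false;           /* bound local address filter */
    if (pcb->faddr != 0 && pcb->faddr != src)
        return false;           /* connected foreign address filter */
    return true;
}
```

A PCB with all three fields zero passes every test, which corresponds to the socket that receives all datagrams processed by rip_input.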
76-94
sbappendaddr
passes a copy of the received datagram to the process. The use of the variable last
is similar to what we saw in Figure 23.26: since sbappendaddr
releases the mbuf after placing it onto the appropriate queue, if more than one process receives a copy of the datagram, rip_input
must make a copy by calling m_copy
. But if only one process receives the datagram, there’s no need to make a copy.
95-99
If no matching sockets are found for the datagram, the mbuf is released, ips_noproto
is incremented, and ips_delivered
is decremented. This latter counter was incremented by IP just before calling rip_input
(Figure 8.15). It must be decremented so that the two SNMP counters, ipInDiscards
and ipInDelivers
(Figure 8.6) are correct, since the datagram was not really delivered to a transport layer.
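The role of the last variable can be illustrated with a small model that counts how many copies are made for a given set of matching sockets. It is a sketch only: the `delivery_stats` structure and `deliver_to_matches` function are invented for this example, and the model assumes that appending to a receive buffer never fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct delivery_stats {
    unsigned delivered;         /* datagrams handed to sockets */
    unsigned copies;            /* times the datagram was duplicated */
    unsigned noproto;           /* datagrams with no matching socket */
};

/*
 * matches[i] says whether PCB i accepts the datagram. Each match
 * except the final one receives a copy; the final match receives
 * the original, so n matches cost n - 1 copies.
 */
static void deliver_to_matches(const bool *matches, size_t n,
                               struct delivery_stats *st)
{
    bool have_last = false;
    size_t i;

    assert(st != NULL);
    for (i = 0; i < n; i++) {
        if (!matches[i])
            continue;
        if (have_last) {        /* an earlier match is still pending */
            st->copies++;
            st->delivered++;
        }
        have_last = true;       /* defer delivery to this match */
    }
    if (have_last)
        st->delivered++;        /* original goes to the final match */
    else
        st->noproto++;          /* nobody wanted the datagram */
}
```

With three matching sockets the model reports three deliveries and two copies; with none it counts one undeliverable datagram.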
At the beginning of this section we mentioned that icmp_input calls rip_input for unknown message types and for messages that are not reflected. This means that the receipt of an ICMP host unreachable causes ips_noproto to be incremented if there are no raw listeners whose PCB is matched by rip_input. That’s one reason this counter has such a large value in Figure 8.5. The description of this counter as being “unknown or unsupported protocols” is not entirely accurate.

Net/3 does not generate an ICMP destination unreachable message with code 2 (protocol unreachable) when an IP datagram is received with a protocol field that is not handled by either the kernel or some process through a raw socket. RFC 1122 says an implementation should generate this ICMP error. (See Exercise 32.4.)
We saw in Figure 32.6 that rip_output
is called for output for raw sockets by ICMP, IGMP, and raw IP. Output occurs when the application calls one of the five write functions: send, sendto, sendmsg, write
, or writev
. If the socket is connected, any of the five functions can be called, although a destination address cannot be specified with sendto
or sendmsg
. If the socket is unconnected, only sendto
and sendmsg
can be called, and a destination address must be specified.
The function rip_output
is shown in Figure 32.9.
Table 32.9. rip_output
function.
------------------------------------------------------------------------- raw_ip.c
105 int
106 rip_output(m, so, dst)
107 struct mbuf *m;
108 struct socket *so;
109 u_long dst;
110 {
111     struct ip *ip;
112     struct inpcb *inp = sotoinpcb(so);
113     struct mbuf *opts;
114     int flags = (so->so_options & SO_DONTROUTE) | IP_ALLOWBROADCAST;
115     /*
116      * If the user handed us a complete IP packet, use it.
117      * Otherwise, allocate an mbuf for a header and fill it in.
118      */
119     if ((inp->inp_flags & INP_HDRINCL) == 0) {
120         M_PREPEND(m, sizeof(struct ip), M_WAIT);
121         ip = mtod(m, struct ip *);
122         ip->ip_tos = 0;
123         ip->ip_off = 0;
124         ip->ip_p = inp->inp_ip.ip_p;
125         ip->ip_len = m->m_pkthdr.len;
126         ip->ip_src = inp->inp_laddr;
127         ip->ip_dst.s_addr = dst;
128         ip->ip_ttl = MAXTTL;
129         opts = inp->inp_options;
130     } else {
131         ip = mtod(m, struct ip *);
132         if (ip->ip_id == 0)
133             ip->ip_id = htons(ip_id++);
134         opts = NULL;
135         /* XXX prevent ip_output from overwriting header fields */
136         flags |= IP_RAWOUTPUT;
137         ipstat.ips_rawout++;
138     }
139     return (ip_output(m, opts, &inp->inp_route, flags, inp->inp_moptions));
140 }
------------------------------------------------------------------------- raw_ip.c
119-128
If the IP_HDRINCL
socket option is not defined, M_PREPEND
allocates room for an IP header, and fields in the IP header are filled in. The fields that are not filled in here are left for ip_output
to initialize (Figure 8.22). The protocol field is set to the value stored in the PCB, which we’ll see in Figure 32.10 is the third argument to the socket
system call.
Table 32.10. rip_usrreq
function: PRU_ATTACH
request.
------------------------------------------------------------------------- raw_ip.c
194 int
195 rip_usrreq(so, req, m, nam, control)
196 struct socket *so;
197 int req;
198 struct mbuf *m, *nam, *control;
199 {
200     int error = 0;
201     struct inpcb *inp = sotoinpcb(so);
202     extern struct socket *ip_mrouter;
203     switch (req) {
204     case PRU_ATTACH:
205         if (inp)
206             panic("rip_attach");
207         if ((so->so_state & SS_PRIV) == 0) {
208             error = EACCES;
209             break;
210         }
211         if ((error = soreserve(so, rip_sendspace, rip_recvspace)) ||
212             (error = in_pcballoc(so, &rawinpcb)))
213             break;
214         inp = (struct inpcb *) so->so_pcb;
215         inp->inp_ip.ip_p = (int) nam;
216         break;
------------------------------------------------------------------------- raw_ip.c
The TOS is set to 0 and the TTL to 255 (MAXTTL). These values are always used for a raw socket when the kernel fills in the header. This differs from UDP and TCP, where the process can choose these values by setting the IP_TTL
and IP_TOS
socket options.
129
Any IP options set by the process with the IP_OPTIONS
socket option are passed to ip_output
through the opts
variable.
130-133
If the IP_HDRINCL
socket option is set, the caller supplies a completed IP header at the front of the datagram. The only modification made to this IP header is to set the ID field if the value supplied by the process is 0. The ID field of an IP datagram can be 0. The assignment of the ID field here by rip_output
is just a convention that allows the process to set it to 0, asking the kernel to assign an ID value based on the kernel’s current ip_id
variable.
134-136
The opts
variable is set to a null pointer, which ignores any IP options the process may have set with the IP_OPTIONS
socket option. The convention here is that if the caller builds its own IP header, that header includes any IP options the caller might want. The flags
variable must also include the IP_RAWOUTPUT
flag, telling ip_output
to leave the header alone.
137
The counter ips_rawout
is incremented. Running Traceroute causes this variable to be incremented by 1 for each datagram sent by Traceroute.
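The two branches of rip_output can be summarized with a user-space model. This is a sketch under simplifying assumptions: `ip_model` holds only the fields discussed here rather than a real IP header, `next_ip_id` stands in for the kernel's ip_id variable, and `raw_prepare_header` is a name invented for this example.

```c
#include <arpa/inet.h>          /* htons */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only the header fields that rip_output touches. */
struct ip_model {
    uint8_t  tos;
    uint16_t len;
    uint16_t id;                /* network byte order */
    uint16_t off;
    uint8_t  ttl;
    uint8_t  proto;
    uint32_t src;
    uint32_t dst;
};

struct raw_pcb_model {
    bool     hdrincl;           /* IP_HDRINCL socket option set? */
    uint8_t  proto;             /* protocol given to socket */
    uint32_t laddr;             /* bound local address, or 0 */
};

static uint16_t next_ip_id = 1; /* stand-in for the kernel's ip_id */

static void raw_prepare_header(const struct raw_pcb_model *pcb,
                               struct ip_model *ip,
                               uint16_t total_len, uint32_t dst)
{
    assert(pcb != NULL && ip != NULL);
    if (!pcb->hdrincl) {
        /* Kernel builds the header: fixed TOS and TTL. */
        ip->tos = 0;
        ip->off = 0;
        ip->proto = pcb->proto;
        ip->len = total_len;
        ip->src = pcb->laddr;
        ip->dst = dst;
        ip->ttl = 255;
    } else if (ip->id == 0) {
        /* Caller built the header; only a zero ID is replaced. */
        ip->id = htons(next_ip_id++);
    }
}
```

In the model, as in the listing, a caller that sets IP_HDRINCL keeps every field it supplied except an ID of 0.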
The operation of rip_output has changed over time. When the IP_HDRINCL socket option is used in Net/3, the only change made to the IP header by rip_output is to set the ID field, if the process sets it to 0. The Net/3 ip_output function does nothing to the IP header fields because the IP_RAWOUTPUT flag is set. Net/2, however, always set certain fields in the IP header, even if the IP_HDRINCL socket option was set: the IP version was set to 4, the fragment offset was set to 0, and the more-fragments flag was cleared.
The protocol’s user-request function is called for a variety of operations. As with the UDP and TCP user-request functions, rip_usrreq
is a large switch
statement, with one case
for each PRU_
xxx request.
The PRU_ATTACH
request, shown in Figure 32.10, is generated by the socket
system call.
194-206
Since the socket
function creates a new socket
structure each time it is called, that structure cannot point to an Internet PCB.
207-210
Only the superuser can create a raw socket. This is to prevent random users from writing their own IP datagrams to the network.
211-215
Space is reserved for input and output queues, and in_pcballoc
allocates a new Internet PCB. The PCB is added to the raw IP PCB list (rawinpcb
). The PCB is linked to the socket
structure. The nam
argument to rip_usrreq
is the third argument to the socket
system call: the protocol. It is stored in the PCB since it is used by rip_input
to demultiplex received datagrams, and its value is placed into the protocol field of outgoing datagrams by rip_output
(if IP_HDRINCL
is not set).
A raw IP socket can be connected to a foreign IP address, just as a UDP socket can. This fixes the foreign IP address from which the raw socket receives datagrams, as we saw in rip_input
. Since raw IP is a connectionless protocol like UDP, a PRU_DISCONNECT
request can occur in two cases:
When a connected raw socket is closed, PRU_DISCONNECT
is called before PRU_DETACH
.
When a connect
is issued on an already-connected raw socket, soconnect
issues the PRU_DISCONNECT
request before the PRU_CONNECT
request.
Figure 32.11 shows the PRU_DISCONNECT, PRU_ABORT
, and PRU_DETACH
requests.
Table 32.11. rip_usrreq
function: PRU_DISCONNECT, PRU_ABORT
, and PRU_DETACH
requests.
------------------------------------------------------------------------- raw_ip.c
217     case PRU_DISCONNECT:
218         if ((so->so_state & SS_ISCONNECTED) == 0) {
219             error = ENOTCONN;
220             break;
221         }
222         /* FALLTHROUGH */
223     case PRU_ABORT:
224         soisdisconnected(so);
225         /* FALLTHROUGH */
226     case PRU_DETACH:
227         if (inp == 0)
228             panic("rip_detach");
229         if (so == ip_mrouter)
230             ip_mrouter_done();
231         in_pcbdetach(inp);
232         break;
------------------------------------------------------------------------- raw_ip.c
217-222
The socket must already be connected; otherwise the error ENOTCONN is returned.
223-225
A PRU_ABORT request should never be issued for a raw IP socket, but this case also handles the fall through from PRU_DISCONNECT. The socket is marked as disconnected.
226-230
The close system call issues the PRU_DETACH request, and this case also handles the fall through from the PRU_DISCONNECT and PRU_ABORT requests. If the socket structure is the one used for multicast routing (ip_mrouter), multicast routing is disabled by calling ip_mrouter_done. Normally the mrouted(8) daemon issues the DVMRP_DONE socket option to disable multicast routing, so this check handles the case of the router daemon terminating (i.e., crashing) without issuing the socket option.
231
The Internet PCB is released by in_pcbdetach
, which also removes the PCB from the list of raw IP PCBs (rawinpcb
).
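The fall-through structure of these three cases can be modeled as follows. The `raw_sock` structure, the `raw_req` enumeration, and the `raw_teardown` function are invented for this sketch, which mirrors the control flow of the listing and nothing else (no multicast routing check, no PCB list).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum raw_req { REQ_DISCONNECT, REQ_ABORT, REQ_DETACH };

struct raw_sock {
    bool connected;             /* models SS_ISCONNECTED */
    bool attached;              /* models "PCB is allocated" */
};

/* Returns 0 on success or an errno value, like the error variable. */
static int raw_teardown(struct raw_sock *so, enum raw_req req)
{
    assert(so != NULL);
    switch (req) {
    case REQ_DISCONNECT:
        if (!so->connected)
            return ENOTCONN;    /* nothing to disconnect */
        /* FALLTHROUGH */
    case REQ_ABORT:
        so->connected = false;  /* models soisdisconnected */
        /* FALLTHROUGH */
    case REQ_DETACH:
        so->attached = false;   /* models in_pcbdetach */
        break;
    }
    return 0;
}
```

In this model a successful disconnect request also releases the PCB, because control falls through to the detach case exactly as it does in the listing.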
A raw IP socket can be bound to a local IP address with the PRU_BIND
request, shown in Figure 32.12. We saw in rip_input
that the socket will receive only datagrams sent to this IP address.
Table 32.12. rip_usrreq
function: PRU_BIND
request.
------------------------------------------------------------------------- raw_ip.c
233     case PRU_BIND:
234         {
235             struct sockaddr_in *addr = mtod(nam, struct sockaddr_in *);
236             if (nam->m_len != sizeof(*addr)) {
237                 error = EINVAL;
238                 break;
239             }
240             if ((ifnet == 0) ||
241                 ((addr->sin_family != AF_INET) &&
242                  (addr->sin_family != AF_IMPLINK)) ||
243                 (addr->sin_addr.s_addr &&
244                  ifa_ifwithaddr((struct sockaddr *) addr) == 0)) {
245                 error = EADDRNOTAVAIL;
246                 break;
247             }
248             inp->inp_laddr = addr->sin_addr;
249             break;
250         }
------------------------------------------------------------------------- raw_ip.c
233-250
The process fills in a sockaddr_in
structure with the local IP address. The following three conditions must all be true, or else the error EADDRNOTAVAIL
is returned:
at least one interface must be configured,
the address family must be AF_INET
(or AF_IMPLINK
, a historical artifact), and
if the IP address being bound is not 0.0.0.0, it must correspond to a local interface. For the call to ifa_ifwithaddr
to succeed, the port number in the caller’s sockaddr_in
must be 0.
The local IP address is stored in the PCB.
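The three conditions can be expressed as a small user-space check. This is a sketch: the list of local addresses replaces the kernel's interface list and ifa_ifwithaddr, the `FAM_INET` and `FAM_IMPLINK` constants are stand-ins for the real address family values, and the port number requirement is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAM_INET    2           /* stands in for AF_INET */
#define FAM_IMPLINK 3           /* stands in for AF_IMPLINK */

/*
 * Decide whether a raw socket may bind to addr. local_addrs lists
 * the addresses of the configured interfaces; an empty list models
 * a system with no interfaces. Returns 0 or EADDRNOTAVAIL.
 */
static int check_raw_bind(const uint32_t *local_addrs, size_t naddrs,
                          int family, uint32_t addr)
{
    size_t i;

    if (naddrs == 0)
        return EADDRNOTAVAIL;   /* no interface configured */
    assert(local_addrs != NULL);
    if (family != FAM_INET && family != FAM_IMPLINK)
        return EADDRNOTAVAIL;   /* wrong address family */
    if (addr == 0)
        return 0;               /* 0.0.0.0 is always acceptable */
    for (i = 0; i < naddrs; i++)
        if (local_addrs[i] == addr)
            return 0;           /* address belongs to an interface */
    return EADDRNOTAVAIL;
}
```

Note that, as in the listing, a wrong address family produces EADDRNOTAVAIL here rather than a family-specific error.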
A process can also connect a raw IP socket to a particular foreign IP address. We saw in rip_input
that this restricts the process so that it receives only IP datagrams with a source IP address equal to the connected IP address. A process has the option of calling bind, connect
, both, or neither, depending on the type of filtering it wants rip_input
to place on received datagrams. Figure 32.13 shows the PRU_CONNECT
request.
Table 32.13. rip_usrreq
function: PRU_CONNECT
request.
------------------------------------------------------------------------- raw_ip.c
251     case PRU_CONNECT:
252         {
253             struct sockaddr_in *addr = mtod(nam, struct sockaddr_in *);
254             if (nam->m_len != sizeof(*addr)) {
255                 error = EINVAL;
256                 break;
257             }
258             if (ifnet == 0) {
259                 error = EADDRNOTAVAIL;
260                 break;
261             }
262             if ((addr->sin_family != AF_INET) &&
263                 (addr->sin_family != AF_IMPLINK)) {
264                 error = EAFNOSUPPORT;
265                 break;
266             }
267             inp->inp_faddr = addr->sin_addr;
268             soisconnected(so);
269             break;
270         }
------------------------------------------------------------------------- raw_ip.c
251-270
If the caller’s sockaddr_in
is initialized correctly and at least one IP interface is configured, the specified foreign IP address is stored in the PCB. Notice that this process differs from the connection of a UDP socket to a foreign address. In the UDP case, in_pcbconnect
acquires a route to the foreign address and also stores the outgoing interface as the local address (Figure 22.9). With raw IP, only the foreign IP address is stored in the PCB, and unless the process also calls bind, only the foreign address is compared by rip_input
.
A call to shutdown
specifying that the process has finished sending data generates the PRU_SHUTDOWN
request, although it is rare for a process to issue this system call for a raw IP socket. Figure 32.14 shows the PRU_CONNECT2
and PRU_SHUTDOWN
requests.
Table 32.14. rip_usrreq
function: PRU_CONNECT2
and PRU_SHUTDOWN
requests.
---------------------------------------------------------------------------- raw_ip.c
271     case PRU_CONNECT2:
272         error = EOPNOTSUPP;
273         break;
274         /*
275          * Mark the connection as being incapable of further input.
276          */
277     case PRU_SHUTDOWN:
278         socantsendmore(so);
279         break;
---------------------------------------------------------------------------- raw_ip.c
271-273
The PRU_CONNECT2
request is not supported for a raw IP socket.
274-279
socantsendmore
sets the socket’s flags to prevent any future output.
In Figure 23.14 we showed how the five write functions call the protocol’s pr_usrreq
function with a PRU_SEND
request. We show this request in Figure 32.15.
Table 32.15. rip_usrreq
function: PRU_SEND
request.
-------------------------------------------------------------------------- raw_ip.c
280         /*
281          * Ship a packet out. The appropriate raw output
282          * routine handles any massaging necessary.
283          */
284     case PRU_SEND:
285         {
286             u_long dst;
287             if (so->so_state & SS_ISCONNECTED) {
288                 if (nam) {
289                     error = EISCONN;
290                     break;
291                 }
292                 dst = inp->inp_faddr.s_addr;
293             } else {
294                 if (nam == NULL) {
295                     error = ENOTCONN;
296                     break;
297                 }
298                 dst = mtod(nam, struct sockaddr_in *)->sin_addr.s_addr;
299             }
300             error = rip_output(m, so, dst);
301             m = NULL;
302             break;
303         }
-------------------------------------------------------------------------- raw_ip.c
280-303
If the socket state is connected, the caller cannot specify a destination address (the nam
argument). Likewise, if the state is unconnected, a destination address is required. If all is OK, in either state, dst
is set to the destination IP address. rip_output
sends the datagram. The mbuf pointer m
is set to a null pointer, to prevent it from being released at the end of the function. This is because the interface output routine will release the mbuf after it has been output. (Remember that rip_output passes the mbuf chain to ip_output, which appends it to the interface’s output queue.)
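The destination selection in the PRU_SEND case reduces to a small decision function. The sketch below models it in user space; `choose_destination` is a hypothetical name, and a null `nam` pointer models the absence of a destination address from the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the destination for an outgoing raw datagram.
 * connected: the socket has a foreign address (faddr) from connect.
 * nam:       destination supplied with this send, or NULL if none.
 * Returns 0 and stores the destination in *dst, or an errno value.
 */
static int choose_destination(bool connected, uint32_t faddr,
                              const uint32_t *nam, uint32_t *dst)
{
    assert(dst != NULL);
    if (connected) {
        if (nam != NULL)
            return EISCONN;     /* may not override connected address */
        *dst = faddr;
    } else {
        if (nam == NULL)
            return ENOTCONN;    /* unconnected sends need an address */
        *dst = *nam;
    }
    return 0;
}
```

The two error cases correspond to calling sendto with an address on a connected socket, and calling write or send on an unconnected one.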
The final part of rip_usrreq
is shown in Figure 32.16. The PRU_SENSE
request, generated by the fstat
system call, returns nothing. The PRU_SOCKADDR
and PRU_PEERADDR
requests are from the getsockname
and getpeername
system calls, respectively. The remaining requests are not supported.
Table 32.16. rip_usrreq
function: remaining requests.
---------------------------------------------------------------------------- raw_ip.c
304     case PRU_SENSE:
305         /*
306          * fstat: don't bother with a blocksize.
307          */
308         return (0);
309         /*
310          * Not supported.
311          */
312     case PRU_RCVOOB:
313     case PRU_RCVD:
314     case PRU_LISTEN:
315     case PRU_ACCEPT:
316     case PRU_SENDOOB:
317         error = EOPNOTSUPP;
318         break;
319     case PRU_SOCKADDR:
320         in_setsockaddr(inp, nam);
321         break;
322     case PRU_PEERADDR:
323         in_setpeeraddr(inp, nam);
324         break;
325     default:
326         panic("rip_usrreq");
327     }
328     if (m != NULL)
329         m_freem(m);
330     return (error);
331 }
---------------------------------------------------------------------------- raw_ip.c
319-324
The functions in_setsockaddr
and in_setpeeraddr
fetch the information from the PCB, storing the result in the nam
argument.
The setsockopt
and getsockopt
system calls invoke the rip_ctloutput
function. Only one IP socket option is handled here, along with eight socket options related to multicast routing.
Figure 32.17 shows the first part of the rip_ctloutput
function.
Table 32.17. rip_ctloutput
function: process IP_HDRINCL
socket option.
------------------------------------------------------------------------------- raw_ip.c
144 int
145 rip_ctloutput(op, so, level, optname, m)
146 int op;
147 struct socket *so;
148 int level, optname;
149 struct mbuf **m;
150 {
151     struct inpcb *inp = sotoinpcb(so);
152     int error;
153     if (level != IPPROTO_IP)
154         return (EINVAL);
155     switch (optname) {
156     case IP_HDRINCL:
157         if (op == PRCO_SETOPT || op == PRCO_GETOPT) {
158             if (m == 0 || *m == 0 || (*m)->m_len < sizeof(int))
159                 return (EINVAL);
160             if (op == PRCO_SETOPT) {
161                 if (*mtod(*m, int *))
162                     inp->inp_flags |= INP_HDRINCL;
163                 else
164                     inp->inp_flags &= ~INP_HDRINCL;
165                 (void) m_free(*m);
166             } else {
167                 (*m)->m_len = sizeof(int);
168                 *mtod(*m, int *) = inp->inp_flags & INP_HDRINCL;
169             }
170             return (0);
171         }
172         break;
------------------------------------------------------------------------------- raw_ip.c
144-172
The mbuf that either contains the new value of the option or will hold its current value must be at least as large as an integer. For the setsockopt
system call, the flag is set if the integer value in the mbuf is nonzero, or cleared otherwise. For the getsockopt
system call, the value returned in the mbuf is either 0 or the nonzero value of the flag. The function returns, to avoid the processing at the end of the switch
statement for other IP options.
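The set and get behavior for this option can be modeled with a single flag word. The `FLAG_HDRINCL` bit value and the `pcb_flags`, `set_hdrincl`, and `get_hdrincl` names are assumptions for this sketch. What it illustrates is that the value returned for a set flag is the flag bit itself, which is nonzero but not necessarily 1.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_HDRINCL 0x08       /* illustrative bit for INP_HDRINCL */

struct pcb_flags {
    int flags;                  /* models inp_flags */
};

/* Models the set case: any nonzero value sets the flag. */
static void set_hdrincl(struct pcb_flags *p, int value)
{
    assert(p != NULL);
    if (value)
        p->flags |= FLAG_HDRINCL;
    else
        p->flags &= ~FLAG_HDRINCL;
}

/* Models the get case: returns 0 or the flag bit itself. */
static int get_hdrincl(const struct pcb_flags *p)
{
    assert(p != NULL);
    return p->flags & FLAG_HDRINCL;
}
```

A caller of getsockopt should therefore test the returned integer for nonzero rather than compare it with 1.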
Figure 32.18 shows the last portion of the rip_ctloutput
function. It handles eight multicast routing socket options.
Table 32.18. rip_ctloutput
function: process multicast routing socket option.
------------------------------------------------------------------------- raw_ip.c
173     case DVMRP_INIT:
174     case DVMRP_DONE:
175     case DVMRP_ADD_VIF:
176     case DVMRP_DEL_VIF:
177     case DVMRP_ADD_LGRP:
178     case DVMRP_DEL_LGRP:
179     case DVMRP_ADD_MRT:
180     case DVMRP_DEL_MRT:
            /* shown in Figure 14.9 */
188     }
189     return (ip_ctloutput(op, so, level, optname, m));
190 }
------------------------------------------------------------------------- raw_ip.c
173-188
These eight socket options are valid only for the setsockopt
system call. They are processed by the ip_mrouter_cmd
function as discussed with Figure 14.9.
189
Any other IP socket options, such as IP_OPTIONS
to set the IP options, are processed by ip_ctloutput
.
Raw sockets provide three capabilities for an IP host.
They are used to send and receive ICMP and IGMP messages.
They allow a process to build its own IP headers.
They allow additional IP-based protocols to be supported in a user process.
We saw that raw IP output is simple (it just fills in a few fields in the IP header), but it allows a process to supply its own IP header. This allows diagnostic programs to create any type of IP datagram.
Raw IP input provides three types of filtering for incoming IP datagrams. The process chooses to receive datagrams based on (1) the protocol field, (2) the source IP address (set by connect
), and (3) the destination IP address (set by bind
). The process chooses which combination of these three filters (if any) to apply.
32.1 | Assume the |
32.1 | 0 in the first example, and 255 in the second. Both of these values are reserved in RFC 1700 [Reynolds and Postel 1994] and should not appear in datagrams. This means, for example, that a socket created with a protocol of |
32.2 | A process creates a raw socket with a protocol value of |
32.2 | Since the IP protocol value of 255 is reserved, datagrams should never appear on the wire with this protocol value. Since this is a nonzero protocol value, the first of the three tests in |
32.3 | A process creates a raw socket with a protocol value of 0. What type of IP datagrams will the process receive on this socket? |
32.3 | Even though this protocol value is reserved and datagrams should never appear on the wire with this value, the first of the three tests in |
32.4 | Modify |
32.4 | Since the array |
32.5 | If a process wants to write its own IP datagrams with its own IP header, what are the differences in using a raw IP socket with the |
32.5 | In both cases the process must build its own IP header, in addition to whatever follows the IP header (UDP datagram, TCP segment, or whatever). With a raw IP socket, output is normally done using BPF requires the process to supply a complete data-link header, such as an Ethernet header. Output is normally done by calling |
32.6 | When would a process read from a raw IP socket, and when would it read from BPF? |
32.6 | A raw IP socket receives only IP datagrams destined for an IP protocol that the kernel does not process itself. A process cannot receive TCP segments or UDP datagrams on a raw socket, for example. BPF can receive all frames received on a specified interface, regardless of whether they are IP datagrams or not. The |