Chapter 20. Routing Sockets

Introduction

A process sends and receives the routing messages described in the previous chapter by using a socket in the routing domain. The socket system call is issued specifying a family of PF_ROUTE and a socket type of SOCK_RAW.

The process can then send five routing messages to the kernel:

RTM_ADD: add a new route.
RTM_DELETE: delete an existing route.
RTM_GET: fetch all the information about a route.
RTM_CHANGE: change the gateway, interface, or metrics of an existing route.
RTM_LOCK: specify which metrics the kernel should not modify.

Additionally, the process can receive any of the other seven types of routing messages that are generated by the kernel when some event, such as interface down, redirect received, etc., occurs.

This chapter looks at the routing domain, the routing control blocks that are created for each routing socket, the function that handles messages from a process (route_output), the function that sends routing messages to one or more processes (raw_input), and the various functions that support all the socket operations on a routing socket.

`routedomain` and `protosw` Structures

Before describing the routing socket functions, we need to discuss additional details about the routing domain; the SOCK_RAW protocol supported in the routing domain; and routing control blocks, one of which is associated with each routing socket.

Figure 20.1 lists the domain structure for the PF_ROUTE domain, named routedomain.

Table 20.1. routedomain structure.

Member	`Value`	Description
`dom_family`	`PF_ROUTE`	protocol family for domain
`dom_name`	`route`	name
`dom_init`	`route_init`	domain initialization, Figure 18.30
`dom_externalize`	`0`	not used in routing domain
`dom_dispose`	`0`	not used in routing domain
`dom_protosw`	`routesw`	protocol switch structure, Figure 20.2
`dom_protoswNPROTOSW`		pointer past end of protocol switch structure
`dom_next`		filled in by `domaininit`, Figure 7.15
`dom_rtattach`	`0`	not used in routing domain
`dom_rtoffset`	`0`	not used in routing domain
`dom_maxrtkey`	`0`	not used in routing domain

Unlike the Internet domain, which supports multiple protocols (TCP, UDP, ICMP, etc.), only one protocol (of type SOCK_RAW) is supported in the routing domain. Figure 20.2 lists the protocol switch entry for the PF_ROUTE domain.

Table 20.2. The routing protocol protosw structure.

Member	`routesw[0]`	Description
`pr_type`	`SOCK_RAW`	raw socket
`pr_domain`	`&routedomain`	part of the routing domain
`pr_protocol`	`0`
`pr_flags`	`PR_ATOMIC|PR_ADDR`	socket layer flags, not used by protocol processing
`pr_input`	`raw_input`	this entry not used; `raw_input` called directly
`pr_output`	`route_output`	called for `PRU_SEND` requests
`pr_ctlinput`	`raw_ctlinput`	control input function
`pr_ctloutput`	`0`	not used
`pr_usrreq`	`route_usrreq`	respond to communication requests from a process
`pr_init`	`raw_init`	initialization
`pr_fasttimo`	`0`	not used
`pr_slowtimo`	`0`	not used
`pr_drain`	`0`	not used
`pr_sysctl`	`sysctl_rtable`	for `sysctl`(8) system call

Routing Control Blocks

Each time a routing socket is created with a call of the form

socket(PF_ROUTE, SOCK_RAW, protocol);

the corresponding PRU_ATTACH request to the protocol’s user-request function (route_usrreq) allocates a routing control block and links it to the socket structure. The protocol can restrict the messages sent to the process on this socket to one particular family. If a protocol of AF_INET is specified, for example, only routing messages containing Internet addresses will be sent to the process. A protocol of 0 causes all routing messages from the kernel to be sent on the socket.

Recall that we call these structures routing control blocks, not raw control blocks, to avoid confusion with the raw IP control blocks in Chapter 32.

Figure 20.3 shows the definition of the rawcb structure.

Table 20.3. rawcb structure.

----------------------------------------------------------------------- raw_cb.h
 39 struct rawcb {
 40     struct rawcb *rcb_next;     /* doubly linked list */
 41     struct rawcb *rcb_prev;
 42     struct socket *rcb_socket;  /* back pointer to socket */
 43     struct sockaddr *rcb_faddr; /* destination address */
 44     struct sockaddr *rcb_laddr; /* socket's address */
 45     struct sockproto rcb_proto; /* protocol family, protocol */
 46 };

 47 #define sotorawcb(so)       ((struct rawcb *)(so)->so_pcb)
----------------------------------------------------------------------- raw_cb.h

Additionally, a global of the same name, rawcb, is allocated as the head of the doubly linked list. Figure 20.4 shows the arrangement.

Figure 20.4. Relationship of raw protocol control blocks to other data structures.

39-47

We showed the sockproto structure in Figure 19.26. Its sp_family member is set to PF_ROUTE and its sp_protocol member is set to the third argument to the socket system call. The rcb_faddr member is permanently set to point to route_src, which we described with Figure 19.26. rcb_laddr is always a null pointer.

`raw_init` Function

The raw_init function, shown in Figure 20.5, is the protocol initialization function in the protosw structure in Figure 20.2. We described the entire initialization of the routing domain with Figure 18.29.

Table 20.5. raw_init function: initialize doubly linked list of routing control blocks.

-------------------------------------------------------------------- raw_usrreq.c
 38 void
 39 raw_init()
 40 {

 41     rawcb.rcb_next = rawcb.rcb_prev = &rawcb;
 42 }
-------------------------------------------------------------------- raw_usrreq.c

38-42

The function initializes the doubly linked list of routing control blocks by setting the next and previous pointers of the head structure to point to itself.
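The self-referencing head can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel source: `struct cb` is a simplified stand-in for `struct rawcb`, and the names `cb_init`, `cb_insert`, `cb_remove`, and `cb_count` are hypothetical. It shows why pointing the head at itself is convenient: insertion and removal need no special case for an empty list.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct rawcb: only the list links and a tag. */
struct cb {
    struct cb *next;
    struct cb *prev;
    int        id;      /* hypothetical field, for illustration only */
};

/* Make the head an empty circular list: both links point at the head. */
static void cb_init(struct cb *head)
{
    head->next = head->prev = head;
}

/* Link a new block immediately after the head. */
static void cb_insert(struct cb *head, struct cb *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink a block; the first and last entries need no special handling. */
static void cb_remove(struct cb *n)
{
    assert(n->next != NULL && n->prev != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Count entries by walking until we come back around to the head. */
static int cb_count(const struct cb *head)
{
    int count = 0;
    for (const struct cb *p = head->next; p != head; p = p->next)
        count++;
    return count;
}
```

A traversal stops when it returns to the head, so an empty list is simply one whose head points to itself.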

`route_output` Function

As we showed in Figure 18.11, route_output is called when the PRU_SEND request is issued to the protocol’s user-request function, which is the result of a write operation by a process to a routing socket. In Figure 18.9 we indicated that five different types of routing messages are accepted by the kernel from a process.

Since this function is invoked as a result of a write by a process, the data from the process (the routing message to be processed) is in an mbuf chain from sosend. Figure 20.6 shows an overview of the processing steps, assuming the process sends an RTM_ADD command, specifying three addresses: the destination, its gateway, and a network mask (hence this is a network route, not a host route).

Figure 20.6. Example processing of an RTM_ADD command from a process.

There are numerous points to note in this figure, most of which we’ll cover as we proceed through the source code for route_output. Also note that, to save space, we omit the RTAX_ prefix for each array index in the rt_addrinfo structure.

The process specifies which socket address structures follow the fixed-length rt_msghdr structure by setting the bitmask rtm_addrs. We show a bitmask of 0x07, which corresponds to a destination address, a gateway address, and a network mask (Figure 19.19). The RTM_ADD command requires the first two; the third is optional. Another optional address, the genmask, specifies the mask to be used for generating cloned routes.
The write system call (the sosend function) copies the buffer from the process into an mbuf chain in the kernel.
m_copydata copies the mbuf chain into a buffer that route_output obtains using malloc. It is easier to access all the information in the structure and the socket address structures that follow when stored in a single contiguous buffer than it is when stored in an mbuf chain.
The function rt_xaddrs is called by route_output to take the bitmask and build the rt_addrinfo structure that points into the buffer. The code in route_output references these structures using the names shown in the fifth column in Figure 19.19. The bitmask is also copied into the rti_addrs member.
route_output normally modifies the rt_msghdr structure. If an error occurs, the corresponding errno value is returned in rtm_errno (for example, EEXIST if the route already exists); otherwise the flag RTF_DONE is logically ORed into the rtm_flags supplied by the process.
The rt_msghdr structure and the addresses that follow become input to 0 or more processes that are reading from a routing socket. The buffer is first converted back into an mbuf chain by m_copyback. raw_input goes through all the routing PCBs and passes a copy to the appropriate processes. We also show that a process with a routing socket receives a copy of each message it writes to that socket unless it disables the SO_USELOOPBACK socket option.
To avoid receiving a copy of their own routing messages, some programs, such as route, call shutdown with a second argument of 0 to prevent any data from being received on the routing socket.
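The role of the rtm_addrs bitmask in the first step can be sketched in a few lines. This is an illustration only: the `RTA_` values are assumed to match the 4.4BSD `<net/route.h>` definitions, and the helper names `add_mask_ok` and `is_network_route` are hypothetical. The kernel itself tests the pointers built by rt_xaddrs rather than the raw bits.

```c
#include <assert.h>

/* Assumed values of the address bits (as in 4.4BSD <net/route.h>). */
#define RTA_DST     0x01    /* destination address present */
#define RTA_GATEWAY 0x02    /* gateway address present */
#define RTA_NETMASK 0x04    /* network mask present */
#define RTA_GENMASK 0x08    /* cloning mask present */

/* RTM_ADD needs both a destination and a gateway. */
static int add_mask_ok(unsigned addrs)
{
    const unsigned required = RTA_DST | RTA_GATEWAY;
    return (addrs & required) == required;
}

/* A route with a network mask is a network route, not a host route. */
static int is_network_route(unsigned addrs)
{
    return (addrs & RTA_NETMASK) != 0;
}
```

With the bitmask of 0x07 used in the example, both helpers return nonzero: the two required addresses are present and the mask makes this a network route.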

We examine the source code for route_output in seven parts. Figure 20.7 shows an overview of the function.

Table 20.7. Summary of route_output processing steps.

------------------------------------------------------------------------------
int
route_output()
{
    R_Malloc() to allocate buffer;
    m_copydata() to copy from mbuf chain into buffer;
    rt_xaddrs() to build rt_addrinfo{};

    switch (message type) {
    case RTM_ADD:
        rtrequest(RTM_ADD);
        rt_setmetrics();
        break;
    case RTM_DELETE:
        rtrequest(RTM_DELETE);
        break;

    case RTM_GET:
    case RTM_CHANGE:
    case RTM_LOCK:
        rtalloc1();

        switch (message type) {
        case RTM_GET:
            rt_msg2(RTM_GET);
            break;

        case RTM_CHANGE:
            change appropriate fields;
            /* fall through */

        case RTM_LOCK:
            set rmx_locks;
            break;
        }
        break;
    }

    set rtm_errno if error, else set RTF_DONE flag;

    m_copyback() to copy from buffer into mbuf chain;

    raw_input();    /* mbuf chain to appropriate processes */
}
------------------------------------------------------------------------------

The first part of route_output is shown in Figure 20.8.

Table 20.8. route_output function: initial processing, copy message from mbuf chain.

--------------------------------------------------------------------------- rtsock.c
113 int
114 route_output(m, so)
115 struct mbuf *m;
116 struct socket *so;
117 {
118     struct rt_msghdr *rtm = 0;
119     struct rtentry *rt = 0;
120     struct rtentry *saved_nrt = 0;
121     struct rt_addrinfo info;
122     int     len, error = 0;
123     struct ifnet *ifp = 0;
124     struct ifaddr *ifa = 0;

125 #define senderr(e) { error = e; goto flush;}
126     if (m == 0 || ((m->m_len < sizeof(long)) &&
127                            (m = m_pullup(m, sizeof(long))) == 0))
128                 return (ENOBUFS);
129     if ((m->m_flags & M_PKTHDR) == 0)
130         panic("route_output");
131     len = m->m_pkthdr.len;
132     if (len < sizeof(*rtm) ||
133         len != mtod(m, struct rt_msghdr *)->rtm_msglen) {
134         dst = 0;
135         senderr(EINVAL);
136     }
137     R_Malloc(rtm, struct rt_msghdr *, len);
138     if (rtm == 0) {
139         dst = 0;
140         senderr(ENOBUFS);
141     }
142     m_copydata(m, 0, len, (caddr_t) rtm);
143     if (rtm->rtm_version != RTM_VERSION) {
144         dst = 0;
145         senderr(EPROTONOSUPPORT);
146     }
147     rtm->rtm_pid = curproc->p_pid;

148     info.rti_addrs = rtm->rtm_addrs;
149     rt_xaddrs((caddr_t) (rtm + 1), len + (caddr_t) rtm, &info);

150     if (dst == 0)
151         senderr(EINVAL);

152     if (genmask) {
153         struct radix_node *t;
154         t = rn_addmask((caddr_t) genmask, 1, 2);
155         if (t && Bcmp(genmask, t->rn_key, *(u_char *) genmask) == 0)
156             genmask = (struct sockaddr *) (t->rn_key);
157         else
158             senderr(ENOBUFS);
159     }
--------------------------------------------------------------------------- rtsock.c

Check mbuf for validity

113-136

The mbuf chain is checked for validity: its length must be at least the size of an rt_msghdr structure. The first longword is fetched from the data portion of the mbuf, which contains the rtm_msglen value.

Allocate buffer

137-142

A buffer is allocated to hold the entire message and m_copydata copies the message from the mbuf chain into the buffer.

Check version number

143-146

The version of the message is checked. In the future, should a new version of the routing messages be introduced, this member could be used to provide support for older versions.
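The three checks made so far (minimum length, declared length equal to actual length, and version) follow a pattern common to any length-prefixed message. The sketch below shows that pattern in user space under stated assumptions: `struct msg_hdr`, `MSG_VERSION`, and `check_msg` are hypothetical stand-ins, not the kernel's rt_msghdr or RTM_VERSION.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MSG_VERSION 3   /* hypothetical; stands in for RTM_VERSION */

/* Hypothetical, much reduced header: only the fields the checks need. */
struct msg_hdr {
    unsigned short msglen;   /* total length claimed by the sender */
    unsigned char  version;  /* message format version */
    unsigned char  type;     /* command, unused in this sketch */
};

/* Return 0 if the message passes the checks, else an errno value. */
static int check_msg(const void *buf, size_t len)
{
    struct msg_hdr h;

    if (buf == NULL || len < sizeof h)
        return EINVAL;              /* too short to hold a header */
    memcpy(&h, buf, sizeof h);      /* copy out; avoids unaligned access */
    if (len != h.msglen)
        return EINVAL;              /* claimed and actual lengths differ */
    if (h.version != MSG_VERSION)
        return EPROTONOSUPPORT;     /* unknown message version */
    return 0;
}
```

The order matters: the header fields cannot be trusted, or even read, until the buffer is known to be large enough to contain them.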

147-149

The process ID is copied into rtm_pid and the bitmask supplied by the process is copied into info.rti_addrs, a structure local to this function. The function rt_xaddrs (shown in the next section) fills in the eight socket address pointers in the info structure to point into the buffer now containing the message.

Destination address required

150-151

A destination address is a required address for all commands. If the info.rti_info[RTAX_DST] element is a null pointer, EINVAL is returned. Remember that dst refers to this array element (Figure 19.19).

Handle optional `genmask`

152-159

A genmask is optional and is used as the network mask for routes created when the RTF_CLONING flag is set (Figure 19.8). rn_addmask adds the mask to the tree of masks, first searching for an existing entry for the mask and then referencing that entry if found. If the mask is found or added to the mask tree, an additional check is made that the entry in the mask tree really equals the genmask value, and, if so, the genmask pointer is replaced with a pointer to the mask in the mask tree.

Figure 20.9 shows the next part of route_output, which handles the RTM_ADD and RTM_DELETE commands.

Table 20.9. route_output function: process RTM_ADD and RTM_DELETE commands.

-------------------------------------------------------------------------- rtsock.c
160     switch (rtm->rtm_type) {

161     case RTM_ADD:
162         if (gate == 0)
163             senderr(EINVAL);
164         error = rtrequest(RTM_ADD, dst, gate, netmask,
165                           rtm->rtm_flags, &saved_nrt);
166         if (error == 0 && saved_nrt) {
167             rt_setmetrics(rtm->rtm_inits,
168                           &rtm->rtm_rmx, &saved_nrt->rt_rmx);
169             saved_nrt->rt_refcnt--;
170             saved_nrt->rt_genmask = genmask;
171         }
172         break;

173     case RTM_DELETE:
174         error = rtrequest(RTM_DELETE, dst, gate, netmask,
175                           rtm->rtm_flags, (struct rtentry **) 0);
176         break;
-------------------------------------------------------------------------- rtsock.c

162-163

An RTM_ADD command requires the process to specify a gateway.

164-165

rtrequest processes the request. The netmask pointer can be null if the route being entered is a host route. If all is OK, the pointer to the new routing table entry is returned through saved_nrt.

166-172

The rt_metrics structure is copied from the caller’s buffer into the routing table entry. The reference count is decremented and the genmask pointer is stored (possibly a null pointer).

173-176

Processing the RTM_DELETE command is simple because all the work is done by rtrequest. Since the final argument is a null pointer, rtrequest calls rtfree if the reference count is 0, deleting the entry from the routing table (Figure 19.7).

The next part of the processing is shown in Figure 20.10, which handles the common code for the RTM_GET, RTM_CHANGE, and RTM_LOCK commands.

Table 20.10. route_output function: common processing for RTM_GET, RTM_CHANGE, and RTM_LOCK.

-------------------------------------------------------------------------- rtsock.c
177     case RTM_GET:
178     case RTM_CHANGE:
179     case RTM_LOCK:
180         rt = rtalloc1(dst, 0);
181         if (rt == 0)
182             senderr(ESRCH);
183         if (rtm->rtm_type != RTM_GET) {     /* XXX: too grotty */
184             struct radix_node *rn;
185             extern struct radix_node_head *mask_rnhead;

186             if (Bcmp(dst, rt_key(rt), dst->sa_len) != 0)
187                 senderr(ESRCH);
188             if (netmask && (rn = rn_search(netmask,
189                                            mask_rnhead->rnh_treetop)))
190                 netmask = (struct sockaddr *) rn->rn_key;
191             for (rn = rt->rt_nodes; rn; rn = rn->rn_dupedkey)
192                 if (netmask == (struct sockaddr *) rn->rn_mask)
193                     break;
194             if (rn == 0)
195                 senderr(ETOOMANYREFS);
196             rt = (struct rtentry *) rn;
197         }
-------------------------------------------------------------------------- rtsock.c

Locate existing entry

177-182

Since all three commands reference an existing entry, rtalloc1 locates the entry. If the entry isn’t found, ESRCH is returned.

Do not allow network match

183-187

For the RTM_CHANGE and RTM_LOCK commands, a network match is inadequate: an exact match with the routing table key is required. Therefore, if the dst argument doesn’t equal the routing table key, the match was a network match and ESRCH is returned.

Use network mask to find correct entry

188-193

Even with an exact match, if there are duplicate keys, each with a different network mask, the correct entry must still be located. If a netmask argument was supplied, it is looked up in the mask table (mask_rnhead). If found, the netmask pointer is replaced with the pointer to the mask in the mask tree. Each leaf node in the duplicate key list is examined, looking for an entry with an rn_mask pointer that equals netmask. This test compares the pointers, not the structures that they point to. This works because all masks appear in the mask tree, and only one copy of each unique mask is stored in this tree. In the common case, keys are not duplicated, so the for loop iterates once. If a host entry is being modified, a mask must not be specified and then both netmask and rn_mask are null pointers (which are equal). But if an entry that has an associated mask is being modified, that mask must be specified as the netmask argument.
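The reason a pointer comparison suffices can be shown with a small sketch. It assumes a flat table in place of the kernel's radix tree of masks, and the names `mask_table`, `mask_intern`, `MAX_MASKS`, and `MASK_LEN` are hypothetical. The point is only that when each distinct mask is stored once, two equal masks always resolve to the same address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MASKS 16    /* hypothetical table capacity */
#define MASK_LEN  4     /* hypothetical fixed mask size in bytes */

/* One stored copy per distinct mask; stands in for the mask tree. */
static unsigned char mask_table[MAX_MASKS][MASK_LEN];
static size_t mask_count;

/* Return the single stored copy of mask, adding it if it is new.
 * Returns NULL if the table is full. */
static const unsigned char *mask_intern(const unsigned char mask[MASK_LEN])
{
    for (size_t i = 0; i < mask_count; i++)
        if (memcmp(mask_table[i], mask, MASK_LEN) == 0)
            return mask_table[i];       /* already present: share it */
    if (mask_count == MAX_MASKS)
        return NULL;
    memcpy(mask_table[mask_count], mask, MASK_LEN);
    return mask_table[mask_count++];
}
```

Once both the mask supplied by the caller and the mask stored with a route have passed through such a lookup, comparing the two pointers is equivalent to comparing the mask contents, and is cheaper.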

194-195

If the for loop terminates without finding a matching network mask, ETOOMANYREFS is returned.

The comment XXX is because this function must go to all this work to find the desired entry. All these details should be hidden in another function similar to rtalloc1 that detects a network match and handles a mask argument.

The next part of this function, shown in Figure 20.11, continues processing the RTM_GET command. This command is unique among the commands supported by route_output in that it can return more data than it was passed. For example, only a single socket address structure is required as input, the destination, but at least two are returned: the destination and its gateway. With regard to Figure 20.6, this means the buffer allocated for m_copydata to copy into might need to be increased in size.

Table 20.11. route_output function: RTM_GET processing.

-------------------------------------------------------------------------------- rtsock.c
198         switch (rtm->rtm_type) {

199         case RTM_GET:
200             dst = rt_key(rt);
201             gate = rt->rt_gateway;
202             netmask = rt_mask(rt);
203             genmask = rt->rt_genmask;
204             if (rtm->rtm_addrs & (RTA_IFP | RTA_IFA)) {
205                 if (ifp = rt->rt_ifp) {
206                     ifpaddr = ifp->if_addrlist->ifa_addr;
207                     ifaaddr = rt->rt_ifa->ifa_addr;
208                     rtm->rtm_index = ifp->if_index;
209                 } else {
210                     ifpaddr = 0;
211                     ifaaddr = 0;
212                 }
213             }
214             len = rt_msg2(RTM_GET, &info, (caddr_t) 0,
215                           (struct walkarg *) 0);
216             if (len > rtm->rtm_msglen) {
217                 struct rt_msghdr *new_rtm;
218                 R_Malloc(new_rtm, struct rt_msghdr *, len);
219                 if (new_rtm == 0)
220                     senderr(ENOBUFS);
221                 Bcopy(rtm, new_rtm, rtm->rtm_msglen);
222                 Free(rtm);
223                 rtm = new_rtm;
224             }
225             (void) rt_msg2(RTM_GET, &info, (caddr_t) rtm,
226                            (struct walkarg *) 0);
227             rtm->rtm_flags = rt->rt_flags;
228             rtm->rtm_rmx = rt->rt_rmx;
229             rtm->rtm_addrs = info.rti_addrs;
230             break;
-------------------------------------------------------------------------------- rtsock.c

Return destination, gateway, and masks

198-203

Four pointers are stored in the rti_info array: dst, gate, netmask, and genmask. The latter two might be null pointers. These pointers in the info structure point to the socket address structures that will be returned to the process.

Return interface information

204-213

The process can set the bits RTA_IFP and RTA_IFA in the rtm_addrs bitmask. If either or both are set, the process wants to receive the contents of both the ifaddr structures pointed to by this routing table entry: the link-level address of the interface (pointed to by rt_ifp->if_addrlist) and the protocol address for this entry (pointed to by rt_ifa->ifa_addr). The interface index is also returned.

Construct reply

214-224

rt_msg2 is called with a null third pointer to calculate the length of the routing message corresponding to RTM_GET and the addresses pointed to by the info structure. If the length of the result message exceeds the length of the input message, then a new buffer is allocated, the input message is copied into the new buffer, the old buffer is released, and rtm is set to point to the new buffer.

225-230

rt_msg2 is called again, this time with a nonnull third pointer, which builds the result message in the buffer. The final three members in the rt_msghdr structure are then filled in.
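Calling the same builder twice, first to measure and then to fill, is a general technique. The sketch below illustrates it in user space under stated assumptions: `build_msg`, `build_reply`, and `HDR_LEN` are hypothetical, strings stand in for socket address structures, and `malloc` and `free` stand in for `R_Malloc` and `Free`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define HDR_LEN 8   /* hypothetical fixed header size */

/* Return the bytes needed for a header plus all parts. When buf is
 * non-null, also store the parts after the header, which is untouched. */
static size_t build_msg(const char *const parts[], int nparts, char *buf)
{
    size_t len = HDR_LEN;
    for (int i = 0; i < nparts; i++) {
        size_t n = strlen(parts[i]) + 1;    /* include the terminator */
        if (buf != NULL)
            memcpy(buf + len, parts[i], n);
        len += n;
    }
    return len;
}

/* Grow *bufp (cur_len bytes) if the reply will not fit, keeping its
 * contents, then build the reply. Returns the reply length, or 0 if
 * memory could not be allocated. */
static size_t build_reply(char **bufp, size_t cur_len,
                          const char *const parts[], int nparts)
{
    size_t need = build_msg(parts, nparts, NULL);   /* pass 1: size only */
    if (need > cur_len) {
        char *nbuf = malloc(need);
        if (nbuf == NULL)
            return 0;
        memcpy(nbuf, *bufp, cur_len);   /* carry the old message over */
        free(*bufp);
        *bufp = nbuf;
    }
    return build_msg(parts, nparts, *bufp);         /* pass 2: fill */
}
```

Because the measuring pass and the filling pass share one code path, the two lengths cannot drift apart, which is the main attraction of the design.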

Figure 20.12 shows the processing of the RTM_CHANGE and RTM_LOCK commands.

Table 20.12. route_output function: RTM_CHANGE and RTM_LOCK processing.

-------------------------------------------------------------------------- rtsock.c
231         case RTM_CHANGE:
232             if (gate && rt_setgate(rt, rt_key(rt), gate))
233                 senderr(EDQUOT);
234             /* new gateway could require new ifaddr, ifp; flags may also be
235                different; ifp may be specified by ll sockaddr when protocol
236                address is ambiguous */
237             if (ifpaddr && (ifa = ifa_ifwithnet(ifpaddr)) &&
238                 (ifp = ifa->ifa_ifp))
239                 ifa = ifaof_ifpforaddr(ifaaddr ? ifaaddr : gate,
240                                        ifp);
241             else if ((ifaaddr && (ifa = ifa_ifwithaddr(ifaaddr))) ||
242                      (ifa = ifa_ifwithroute(rt->rt_flags,
243                                             rt_key(rt), gate)))
244                 ifp = ifa->ifa_ifp;
245             if (ifa) {
246                 struct ifaddr *oifa = rt->rt_ifa;
247                 if (oifa != ifa) {
248                     if (oifa && oifa->ifa_rtrequest)
249                         oifa->ifa_rtrequest(RTM_DELETE,
250                                             rt, gate);
251                     IFAFREE(rt->rt_ifa);
252                     rt->rt_ifa = ifa;
253                     ifa->ifa_refcnt++;
254                     rt->rt_ifp = ifp;
255                 }
256             }
257             rt_setmetrics(rtm->rtm_inits, &rtm->rtm_rmx,
258                           &rt->rt_rmx);
259             if (rt->rt_ifa && rt->rt_ifa->ifa_rtrequest)
260                 rt->rt_ifa->ifa_rtrequest(RTM_ADD, rt, gate);
261             if (genmask)
262                 rt->rt_genmask = genmask;
263             /*
264              * Fall into
265              */
266         case RTM_LOCK:
267             rt->rt_rmx.rmx_locks &= ~(rtm->rtm_inits);
268             rt->rt_rmx.rmx_locks |=
269                 (rtm->rtm_inits & rtm->rtm_rmx.rmx_locks);
270             break;
271         }
272         break;

273     default:
274         senderr(EOPNOTSUPP);
275     }
-------------------------------------------------------------------------- rtsock.c

Change gateway

231-233

If a gate address was passed by the process, rt_setgate is called to change the gateway for the entry.

Locate new interface

234-244

The new gateway (if changed) can also require new rt_ifp and rt_ifa pointers. The process can specify these new values by passing either an ifpaddr socket address structure or an ifaaddr socket address structure. The former is tried first, and then the latter. If neither locates an interface, ifa_ifwithroute is called with the route's flags, its key, and the gateway. If that also fails, the rt_ifp and rt_ifa pointers are left alone.

Check if interface changed

245-256

If an interface was located (ifa is nonnull), then the existing rt_ifa pointer for the route is compared to the new value. If it has changed, new values for rt_ifp and rt_ifa are stored in the routing table entry. Before doing this the interface request function (if defined) is called with a command of RTM_DELETE. The delete is required because the link-layer information from one type of network to another can be quite different, say changing a route from an X.25 network to an Ethernet, and the output routines must be notified.

Update metrics

257-258

The metrics in the routing table entry are updated by rt_setmetrics.

Call interface request function

259-260

If an interface request function is defined, it is called with a command of RTM_ADD.

Store clone generation mask

261-262

If the process specifies the genmask argument, the pointer to the mask that was obtained in Figure 20.8 is saved in rt_genmask.

Update bitmask of locked metrics

266-270

The RTM_LOCK command updates the bitmask stored in rt_rmx.rmx_locks. Figure 20.13 shows the values of the different bits in this bitmask, one value per metric.

Table 20.13. Constants to initialize or lock metrics.

Constant	Value	Description
`RTV_MTU`	`0x01`	initialize or lock `rmx_mtu`
`RTV_HOPCOUNT`	`0x02`	initialize or lock `rmx_hopcount`
`RTV_EXPIRE`	`0x04`	initialize or lock `rmx_expire`
`RTV_RPIPE`	`0x08`	initialize or lock `rmx_recvpipe`
`RTV_SPIPE`	`0x10`	initialize or lock `rmx_sendpipe`
`RTV_SSTHRESH`	`0x20`	initialize or lock `rmx_ssthresh`
`RTV_RTT`	`0x40`	initialize or lock `rmx_rtt`
`RTV_RTTVAR`	`0x80`	initialize or lock `rmx_rttvar`

The rmx_locks member of the rt_metrics structure in the routing table entry is the bitmask telling the kernel which metrics to leave alone. That is, those metrics specified by rmx_locks won’t be updated by the kernel. The only use of these metrics by the kernel is with TCP, as noted with Figure 27.3. The rmx_pksent metric cannot be locked or initialized, but it turns out this member is never even referenced or updated by the kernel.

The rtm_inits value in the message from the process specifies the bitmask of which metrics were just initialized by rt_setmetrics. The rtm_rmx.rmx_locks value in the message specifies the bitmask of which metrics should now be locked. The value of rt_rmx.rmx_locks is the bitmask in the routing table of which metrics are currently locked. First, any bits to be initialized (rtm_inits) are unlocked. Any bits that are both initialized (rtm_inits) and locked (rtm_rmx.rmx_locks) are locked.
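A worked example makes the two-step update concrete. The function name `update_locks` is hypothetical; the two statements in its body mirror the ones in the RTM_LOCK case, and the constants are taken from the table of RTV_ values.

```c
#include <assert.h>

/* Bit values from the table of RTV_ constants. */
#define RTV_MTU      0x01
#define RTV_HOPCOUNT 0x02
#define RTV_RTT      0x40

/* Clear every bit named in inits, then set those also named in locks.
 * Bits of current that are not in inits are left as they were. */
static unsigned long update_locks(unsigned long current,
                                  unsigned long inits,
                                  unsigned long locks)
{
    current &= ~inits;
    current |= inits & locks;
    return current;
}
```

Suppose the MTU and RTT metrics are currently locked (0x41), and the message has rtm_inits of RTV_MTU|RTV_HOPCOUNT (0x03) with rmx_locks of RTV_HOPCOUNT (0x02). The first step unlocks the MTU, giving 0x40. The second step locks the hop count, giving 0x42. The RTT lock is untouched because its bit is not in rtm_inits. A lock bit that is not also in rtm_inits has no effect.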

273-275

This default is for the switch at the beginning of Figure 20.9 and catches any of the routing commands other than the five that are supported in messages from a process.

The final part of route_output, shown in Figure 20.14, sends the reply to raw_input.

Table 20.14. route_output function: pass results to raw_input.

----------------------------------------------------------------------------- rtsock.c
276   flush:
277     if (rtm) {
278         if (error)
279             rtm->rtm_errno = error;
280         else
281             rtm->rtm_flags |= RTF_DONE;
282     }
283     if (rt)
284         rtfree(rt);
285     {
286         struct rawcb *rp = 0;
287         /*
288          * Check to see if we don't want our own messages.
289          */
290         if ((so->so_options & SO_USELOOPBACK) == 0) {
291             if (route_cb.any_count <= 1) {
292                 if (rtm)
293                     Free(rtm);
294                 m_freem(m);
295                 return (error);
296             }
297             /* There is another listener, so construct message */
298             rp = sotorawcb(so);
299         }
300         if (rtm) {
301             m_copyback(m, 0, rtm->rtm_msglen, (caddr_t) rtm);
302             Free(rtm);
303         }
304         if (rp)
305             rp->rcb_proto.sp_family = 0;    /* Avoid us */
306         if (dst)
307             route_proto.sp_protocol = dst->sa_family;
308         raw_input(m, &route_proto, &route_src, &route_dst);
309         if (rp)
310             rp->rcb_proto.sp_family = PF_ROUTE;
311     }
312     return (error);
313 }
----------------------------------------------------------------------------- rtsock.c

Return error or OK

276-282

flush is the label jumped to by the senderr macro defined at the beginning of the function. If an error occurred it is returned in the rtm_errno member; otherwise the RTF_DONE flag is set.

Release held route

283-284

If a route is being held, it is released. The call to rtalloc1 at the beginning of Figure 20.10 holds the route, if found.

No process to receive message

285-296

The SO_USELOOPBACK socket option is true by default and specifies that the sending process is to receive a copy of each routing message that it writes to a routing socket. (If the sender doesn’t receive a copy, it can’t receive any of the information returned by RTM_GET.) If that option is not set, and the total count of routing sockets is less than or equal to 1, there are no other processes to receive the message and the sender doesn’t want a copy. The buffer and mbuf chain are both released and the function returns.

Other listeners but no loopback copy

297-299

There is at least one other listener but the sending process does not want a copy. The pointer rp, which defaults to null, is set to point to the routing control block for the sender and is also used as a flag that the sender doesn’t want a copy.

Convert buffer into mbuf chain

300-303

The buffer is converted back into an mbuf chain (Figure 20.6) and the buffer released.

Avoid loopback copy

304-305

If rp is set, some other process might want the message but the sender does not want a copy. The sp_family member of the sender’s routing control block is temporarily set to 0, but the sp_family of the message (the route_proto structure, shown with Figure 19.26) has a family of PF_ROUTE. This trick prevents raw_input from passing a copy of the result to the sending process because raw_input does not pass a copy to any socket with an sp_family of 0.

Set address family of routing message

306-308

If dst is a nonnull pointer, the address family of that socket address structure becomes the protocol of the routing message. With the Internet protocols this value would be PF_INET. A copy is passed to the appropriate listeners by raw_input.

309-313

If the sp_family member in the calling process was temporarily set to 0, it is reset to PF_ROUTE, its normal value.
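The effect of temporarily zeroing the sender's family can be simulated in user space. The sketch below assumes the matching rule described in this chapter: a socket receives a message only if its family equals the message's family, and its protocol is either 0 or equal to the message's protocol. The names `struct listener`, `deliver`, and `send_without_loopback`, and the numeric constants, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FAM_ROUTE  17   /* stand-in for PF_ROUTE; the value is arbitrary */
#define PROTO_INET 2    /* stand-in for AF_INET */

struct listener {
    int family;     /* plays the role of rcb_proto.sp_family */
    int protocol;   /* plays the role of sp_protocol; 0 accepts all */
    int received;   /* messages delivered so far */
};

/* Deliver one message to every matching listener; return the count. */
static int deliver(struct listener *ls, size_t n,
                   int msg_family, int msg_protocol)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ls[i].family != msg_family)
            continue;                   /* wrong family (or hidden) */
        if (ls[i].protocol != 0 && ls[i].protocol != msg_protocol)
            continue;                   /* restricted to another protocol */
        ls[i].received++;
        count++;
    }
    return count;
}

/* Hide the sender for the duration of one delivery, then restore it. */
static int send_without_loopback(struct listener *ls, size_t n,
                                 struct listener *sender, int msg_protocol)
{
    int saved = sender->family;
    sender->family = 0;                 /* no message has family 0 */
    int count = deliver(ls, n, FAM_ROUTE, msg_protocol);
    sender->family = saved;
    return count;
}
```

The attraction of the trick is that the delivery loop needs no extra argument or test for "skip this socket"; the ordinary family comparison does the work.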

`rt_xaddrs` Function

The rt_xaddrs function is called only once from route_output (Figure 20.8) after the routing message from the process has been copied from the mbuf chain into a buffer and after the bitmask from the process (rtm_addrs) has been copied into the rti_addrs member of an rt_addrinfo structure. The purpose of rt_xaddrs is to take this bitmask and set the pointers in the rti_info array to point to the corresponding address in the buffer. Figure 20.15 shows the function.

Table 20.15. rt_xaddrs function: fill rti_info array with pointers.

-------------------------------------------------------------------------- rtsock.c
330 #define ROUNDUP(a) \
331     ((a) > 0 ? (1 + (((a) - 1) | (sizeof(long) - 1))) : sizeof(long))
332 #define ADVANCE(x, n) (x += ROUNDUP((n)->sa_len))

333 static void
334 rt_xaddrs(cp, cplim, rtinfo)
335 caddr_t cp, cplim;
336 struct rt_addrinfo *rtinfo;
337 {
338     struct sockaddr *sa;
339     int     i;

340     bzero(rtinfo->rti_info, sizeof(rtinfo->rti_info));
341     for (i = 0; (i < RTAX_MAX) && (cp < cplim); i++) {
342         if ((rtinfo->rti_addrs & (1 << i)) == 0)
343             continue;
344         rtinfo->rti_info[i] = sa = (struct sockaddr *) cp;
345         ADVANCE(cp, sa);
346     }
347 }
-------------------------------------------------------------------------- rtsock.c

330-340

The array of pointers is set to 0 so all the pointers to address structures not appearing in the bitmask will be null.

341-347

Each of the 8 (RTAX_MAX) possible bits in the bitmask is tested and, if set, a pointer is stored in the rti_info array to the corresponding socket address structure. The ADVANCE macro takes the sa_len field of the socket address structure, rounds it up to the next multiple of sizeof(long) (4 bytes on the 32-bit systems that Net/3 targets), and increments the pointer cp accordingly. A length of 0 is treated as sizeof(long) bytes.
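To experiment with the rounding rule and the bitmask walk outside the kernel, the following user-space sketch may help. It is not the kernel code: demo_roundup, demo_xaddrs, and DEMO_RTAX_MAX are hypothetical names, and the first byte of each record in the buffer is assumed to play the role of sa_len. Unlike the kernel macro, the sketch clamps the pointer so that it never moves past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RTAX_MAX 8     /* assumed to mirror RTAX_MAX */

/* Same rounding rule as ROUNDUP: a zero length still consumes sizeof(long). */
static size_t
demo_roundup(size_t len)
{
    return len > 0 ? 1 + ((len - 1) | (sizeof(long) - 1)) : sizeof(long);
}

/*
 * For each bit set in addrs, store a pointer to the next record in buf.
 * The first byte of a record is its length (the stand-in for sa_len).
 * Returns the number of records located.
 */
static int
demo_xaddrs(const unsigned char *buf, size_t buflen, unsigned addrs,
            const unsigned char *info[DEMO_RTAX_MAX])
{
    const unsigned char *cp = buf, *cplim = buf + buflen;
    int i, found = 0;

    assert(buf != NULL && info != NULL);
    for (i = 0; i < DEMO_RTAX_MAX; i++)
        info[i] = NULL;             /* absent addresses stay null */

    for (i = 0; i < DEMO_RTAX_MAX && cp < cplim; i++) {
        size_t step, remaining;

        if ((addrs & (1u << i)) == 0)
            continue;
        info[i] = cp;
        found++;
        step = demo_roundup(cp[0]);
        remaining = (size_t)(cplim - cp);
        cp += step < remaining ? step : remaining;  /* clamp to buffer end */
    }
    return found;
}
```

With a bitmask of 0x5 and a 16-byte record followed by a 5-byte record, info[0] points at offset 0, info[1] stays null, and info[2] points at offset 16.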

`rt_setmetrics` Function

This function was called twice from route_output: when a new route was added and when an existing route was changed. The rtm_inits member in the routing message from the process specifies which of the metrics the process wants to initialize from the rtm_rmx array. The bit values in the bitmask are shown in Figure 20.13.

Notice that both rtm_addrs and rtm_inits are bitmasks in the message from the process, the former specifying the socket address structures that follow, and the latter specifying which metrics are to be initialized. Socket address structures whose bits don’t appear in rtm_addrs don’t even appear in the routing message, to save space. But the entire rt_metrics array always appears in the fixed-length rt_msghdr structure; elements in the array whose bits are not set in rtm_inits are ignored.

Figure 20.16 shows the rt_setmetrics function.

Table 20.16. rt_setmetrics function: set elements of the rt_metrics structure.

--------------------------------------------------------------------- rtsock.c
314 void
315 rt_setmetrics(which, in, out)
316 u_long  which;
317 struct rt_metrics *in, *out;
318 {
319 #define metric(f, e) if (which & (f)) out->e = in->e;
320     metric(RTV_RPIPE, rmx_recvpipe);
321     metric(RTV_SPIPE, rmx_sendpipe);
322     metric(RTV_SSTHRESH, rmx_ssthresh);
323     metric(RTV_RTT, rmx_rtt);
324     metric(RTV_RTTVAR, rmx_rttvar);
325     metric(RTV_HOPCOUNT, rmx_hopcount);
326     metric(RTV_MTU, rmx_mtu);
327     metric(RTV_EXPIRE, rmx_expire);
328 #undef metric
329 }
--------------------------------------------------------------------- rtsock.c

314-318

The which argument is always the rtm_inits member of the routing message from the process. in points to the rt_metrics structure from the process, and out points to the rt_metrics structure in the routing table entry that is being created or modified.

319-329

Each of the 8 bits in the bitmask is tested and if set, the corresponding metric is copied. Notice that when a new routing table entry is being created with the RTM_ADD command, route_output calls rtrequest, which sets the entire routing table entry to 0 (Figure 19.9). Hence, any metrics not specified by the process in the routing message default to 0.
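The bitmask-selected copy can be tried in user space with a cut-down structure. In this sketch the structure demo_metrics, the DEMO_RTV_xxx flags and their values, and the function demo_setmetrics are all hypothetical stand-ins, and only three of the eight metrics are modeled. The helper macro is wrapped in do/while(0), a defensive choice that the kernel macro does not make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct rt_metrics. */
struct demo_metrics {
    unsigned long mtu;
    unsigned long hopcount;
    unsigned long expire;
};

/* Demo flag values, chosen for illustration only. */
#define DEMO_RTV_MTU      0x1
#define DEMO_RTV_HOPCOUNT 0x2
#define DEMO_RTV_EXPIRE   0x4

/* Copy from in to out only the metrics whose bit is set in which. */
static void
demo_setmetrics(unsigned long which, const struct demo_metrics *in,
                struct demo_metrics *out)
{
    assert(in != NULL && out != NULL);
#define COPY_IF(flag, field) \
    do { if (which & (flag)) out->field = in->field; } while (0)
    COPY_IF(DEMO_RTV_MTU, mtu);
    COPY_IF(DEMO_RTV_HOPCOUNT, hopcount);
    COPY_IF(DEMO_RTV_EXPIRE, expire);
#undef COPY_IF
}
```

If out starts as all zeros, as it does for a newly created routing table entry, any metric whose bit is clear in which simply remains 0.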

`raw_input` Function

All routing messages destined for a process (those that originate from within the kernel and those that originate from a process) are given to raw_input, which selects the processes to receive the message. Figure 18.11 summarizes the four functions that call raw_input.

When a routing socket is created, the family is always PF_ROUTE and the protocol, the third argument to socket, can be 0, which means the process wants to receive all routing messages, or a value such as AF_INET, which restricts the socket to messages containing addresses of that specific protocol family. A routing control block is created for each routing socket (Section 20.3) and these two values are stored in the sp_family and sp_protocol members of the rcb_proto structure.

Figure 20.17 shows the raw_input function.

Table 20.17. raw_input function: pass routing messages to 0 or more processes.

----------------------------------------------------------------------- raw_usrreq.c
 51 void
 52 raw_input(m0, proto, src, dst)
 53 struct mbuf *m0;
 54 struct sockproto *proto;
 55 struct sockaddr *src, *dst;
 56 {
 57     struct rawcb *rp;
 58     struct mbuf *m = m0;
 59     int     sockets = 0;
 60     struct socket *last;

 61     last = 0;
 62     for (rp = rawcb.rcb_next; rp != &rawcb; rp = rp->rcb_next) {
 63         if (rp->rcb_proto.sp_family != proto->sp_family)
 64             continue;
 65         if (rp->rcb_proto.sp_protocol &&
 66             rp->rcb_proto.sp_protocol != proto->sp_protocol)
 67             continue;
 68         /*
 69          * We assume the lower level routines have
 70          * placed the address in a canonical format
 71          * suitable for a structure comparison.
 72          *
 73          * Note that if the lengths are not the same
 74          * the comparison will fail at the first byte.
 75          */
 76 #define equal(a1, a2) \
 77   (bcmp((caddr_t)(a1), (caddr_t)(a2), a1->sa_len) == 0)
 78         if (rp->rcb_laddr && !equal(rp->rcb_laddr, dst))
 79             continue;
 80         if (rp->rcb_faddr && !equal(rp->rcb_faddr, src))
 81             continue;
 82         if (last) {
 83             struct mbuf *n;
 84             if (n = m_copy(m, 0, (int) M_COPYALL)) {
 85                 if (sbappendaddr(&last->so_rcv, src,
 86                                  n, (struct mbuf *) 0) == 0)
 87                     /* should notify about lost packet */
 88                     m_freem(n);
 89                 else {
 90                     sorwakeup(last);
 91                     sockets++;
 92                 }
 93             }
 94         }
 95         last = rp->rcb_socket;
 96     }
 97     if (last) {
 98         if (sbappendaddr(&last->so_rcv, src,
 99                          m, (struct mbuf *) 0) == 0)
100             m_freem(m);
101         else {
102             sorwakeup(last);
103             sockets++;
104         }
105     } else
106         m_freem(m);
107 }
----------------------------------------------------------------------- raw_usrreq.c

51-61

In all four calls to raw_input that we’ve seen, the proto, src, and dst arguments are pointers to the three globals route_proto, route_src, and route_dst, which are declared and initialized as shown with Figure 19.26.

Compare address family and protocol

62-67

The for loop goes through every routing control block checking for a match. The family in the control block (normally PF_ROUTE) must match the family in the sockproto structure or the control block is skipped. Next, if the protocol in the control block (the third argument to socket) is nonzero, it must match the protocol in the sockproto structure (which for routing messages is the address family of the destination), or the control block is skipped. Hence a process that creates a routing socket with a protocol of 0 receives all routing messages.

Compare local and foreign addresses

68-81

These two tests compare the local address in the control block with dst and the foreign address in the control block with src, but only if the control block specifies that address. Currently the process is unable to set the rcb_laddr or rcb_faddr members of the control block. Normally a process would set the former with bind and the latter with connect, but that is not possible with routing sockets in Net/3. Instead, we’ll see that route_usrreq permanently connects the socket to the route_src socket address structure, which is OK since that is always the src argument to this function.

Append message to socket receive buffer

82-107

If last is nonnull, it points to the most recently seen socket structure that should receive this message. If this variable is nonnull, a copy of the message is appended to that socket’s receive buffer by m_copy and sbappendaddr, and any processes waiting on this receive buffer are awakened. Then last is set to point to this socket that just matched the previous tests. The use of last is to avoid calling m_copy (an expensive operation) if only one process is to receive the message.

If N processes are to receive the message, the first N - 1 receive a copy and the final one receives the message itself.

The variable sockets that is incremented within this function is not used. Since it is incremented only when a message is passed to a process, if it is 0 at the end of the function it indicates that no process received the message (but the value isn’t stored anywhere).
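The matching rules and the use of last can be modeled in user space without mbufs. In the sketch below, demo_listener, demo_matches, demo_deliver, and the value of DEMO_PF_ROUTE are hypothetical; setting the delivered flag stands in for sbappendaddr plus sorwakeup, and the returned count stands in for the number of m_copy calls.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PF_ROUTE 17    /* demo value for the routing family */

/* Hypothetical stand-in for the fields of a routing control block. */
struct demo_listener {
    int family;     /* models rcb_proto.sp_family */
    int protocol;   /* models rcb_proto.sp_protocol; 0 is the wildcard */
    int delivered;  /* set when the message reaches this listener */
};

/* Family must match exactly; a nonzero protocol must match too. */
static int
demo_matches(const struct demo_listener *l, int family, int protocol)
{
    if (l->family != family)
        return 0;
    if (l->protocol != 0 && l->protocol != protocol)
        return 0;
    return 1;
}

/* Deliver to every matching listener; return how many copies were made. */
static int
demo_deliver(struct demo_listener *ls, int n, int family, int protocol)
{
    struct demo_listener *last = NULL;
    int i, copies = 0;

    assert(ls != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!demo_matches(&ls[i], family, protocol))
            continue;
        if (last != NULL) {         /* an earlier match gets a copy */
            last->delivered = 1;
            copies++;
        }
        last = &ls[i];
    }
    if (last != NULL)
        last->delivered = 1;        /* final match gets the original */
    return copies;
}
```

With two matching listeners the function reports one copy, which is the N - 1 behavior described above. A listener whose family is 0, as the sender's is temporarily set when it does not want a loopback copy, never matches.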

`route_usrreq` Function

route_usrreq is the routing protocol’s user-request function. It is called for a variety of operations. Figure 20.18 shows the function.

Table 20.18. route_usrreq function: process PRU_xxx requests.

----------------------------------------------------------------------------- rtsock.c
 64 int
 65 route_usrreq(so, req, m, nam, control)
 66 struct socket *so;
 67 int     req;
 68 struct mbuf *m, *nam, *control;
 69 {


 70     int     error = 0;
 71     struct rawcb *rp = sotorawcb(so);
 72     int     s;

 73     if (req == PRU_ATTACH) {
 74         MALLOC(rp, struct rawcb *, sizeof(*rp), M_PCB, M_WAITOK);
 75         if (so->so_pcb = (caddr_t) rp)
 76             bzero(so->so_pcb, sizeof(*rp));
 77     }
 78     if (req == PRU_DETACH && rp) {
 79         int     af = rp->rcb_proto.sp_protocol;
 80         if (af == AF_INET)
 81             route_cb.ip_count--;
 82         else if (af == AF_NS)
 83             route_cb.ns_count--;
 84         else if (af == AF_ISO)
 85             route_cb.iso_count--;
 86         route_cb.any_count--;
 87     }
 88     s = splnet();
 89     error = raw_usrreq(so, req, m, nam, control);
 90     rp = sotorawcb(so);
 91     if (req == PRU_ATTACH && rp) {
 92         int     af = rp->rcb_proto.sp_protocol;
 93         if (error) {
 94             free((caddr_t) rp, M_PCB);
 95             splx(s);
 96             return (error);
 97         }
 98         if (af == AF_INET)
 99             route_cb.ip_count++;
100         else if (af == AF_NS)
101             route_cb.ns_count++;
102         else if (af == AF_ISO)
103             route_cb.iso_count++;
104         route_cb.any_count++;

105         rp->rcb_faddr = &route_src;
106         soisconnected(so);
107         so->so_options |= SO_USELOOPBACK;
108     }
109     splx(s);
110     return (error);
111 }
----------------------------------------------------------------------------- rtsock.c

`PRU_ATTACH:` allocate control block

64-77

The PRU_ATTACH request is issued when the process calls socket. Memory is allocated for a routing control block. The pointer returned by MALLOC is stored in the so_pcb member of the socket structure, and if the memory was allocated, the rawcb structure is set to 0.

`PRU_DETACH:` decrement counters

78-87

The close system call issues the PRU_DETACH request. If the socket structure points to a protocol control block, two of the counters in the route_cb structure are decremented: one is the any_count and one is based on the protocol.

Process request

88-90

The function raw_usrreq is called to process the PRU_xxx request further.

Increment counters

91-104

If the request is PRU_ATTACH and the socket points to a routing control block, a check is made for an error from raw_usrreq. Two of the counters in the route_cb structure are then incremented: one is the any_count and one is based on the protocol.

Connect socket

105-106

The foreign address in the routing control block is set to route_src. This permanently connects the new socket to receive routing messages from the PF_ROUTE family.

Enable `SO_USELOOPBACK` by default

107-111

The SO_USELOOPBACK socket option is enabled. This is the only socket option that defaults to being enabled; all others default to being disabled.
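The counter bookkeeping performed on PRU_ATTACH and PRU_DETACH can be sketched as a single helper. The structure demo_route_cb, the DEMO_AF_xxx values, and the function demo_adjust are hypothetical stand-ins for route_cb and the address family constants. The point is only that any_count always changes while at most one per-protocol counter does.

```c
#include <assert.h>
#include <stddef.h>

/* Demo address family values, chosen for illustration only. */
enum { DEMO_AF_INET = 2, DEMO_AF_NS = 6, DEMO_AF_ISO = 7 };

/* Hypothetical stand-in for the route_cb counters. */
struct demo_route_cb {
    int ip_count;
    int ns_count;
    int iso_count;
    int any_count;
};

/* delta is +1 for an attach and -1 for a detach. */
static void
demo_adjust(struct demo_route_cb *cb, int af, int delta)
{
    assert(cb != NULL);
    assert(delta == 1 || delta == -1);
    if (af == DEMO_AF_INET)
        cb->ip_count += delta;
    else if (af == DEMO_AF_NS)
        cb->ns_count += delta;
    else if (af == DEMO_AF_ISO)
        cb->iso_count += delta;
    cb->any_count += delta;     /* counts every routing socket */
}
```

A socket created with a protocol of 0 changes only any_count, which is why any_count is the value that tells whether any other listeners exist.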

`raw_usrreq` Function

raw_usrreq performs most of the processing for the user request in the routing domain. It was called by route_usrreq in the previous section. The reason the user-request processing is divided between these two functions is that other protocols (e.g., the OSI CLNP) call raw_usrreq but not route_usrreq. raw_usrreq is not intended to be the pr_usrreq function for a protocol. Instead it is a common subroutine called by the various pr_usrreq functions.

Figure 20.19 shows the beginning and end of the raw_usrreq function. The body of the switch is discussed in separate figures following this figure.

Table 20.19. Body of raw_usrreq function.

--------------------------------------------------------------- raw_usrreq.c
119 int
120 raw_usrreq(so, req, m, nam, control)
121 struct socket *so;
122 int     req;
123 struct mbuf *m, *nam, *control;
124 {
125     struct rawcb *rp = sotorawcb(so);
126     int     error = 0;
127     int     len;

128     if (req == PRU_CONTROL)
129         return (EOPNOTSUPP);
130     if (control && control->m_len) {
131         error = EOPNOTSUPP;
132         goto release;
133     }
134     if (rp == 0) {
135         error = EINVAL;
136         goto release;
137     }
138     switch (req) {
                                                                      
                              /* switch cases */                      
                                                                      
262     default:
263         panic("raw_usrreq");
264     }
265   release:
266     if (m != NULL)
267         m_freem(m);
268     return (error);
269 }
--------------------------------------------------------------- raw_usrreq.c

`PRU_CONTROL` requests invalid

119-129

The PRU_CONTROL request is from the ioctl system call and is not supported in the routing domain.

Control information invalid

130-133

If control information was passed by the process (using the sendmsg system call) an error is returned, since the routing domain doesn’t use this optional information.

Socket must have a control block

134-137

If the socket structure doesn’t point to a routing control block, an error is returned. If a new socket is being created, it is the caller’s responsibility (i.e., route_usrreq) to allocate this control block and store the pointer in the so_pcb member before calling this function.

262-269

The default for this switch catches two requests that are not handled by case statements: PRU_BIND and PRU_CONNECT. The code for these two requests is present but commented out in Net/3. Therefore issuing the bind or connect system calls on a routing socket causes a kernel panic. This is a bug. Fortunately it requires a superuser process to create this type of socket.

We now discuss the individual case statements. Figure 20.20 shows the processing for the PRU_ATTACH and PRU_DETACH requests.

Table 20.20. raw_usrreq function: PRU_ATTACH and PRU_DETACH requests.

----------------------------------------------------------------------- raw_usrreq.c
139         /*
140          * Allocate a raw control block and fill in the
141          * necessary info to allow packets to be routed to
142          * the appropriate raw interface routine.
143          */
144     case PRU_ATTACH:
145         if ((so->so_state & SS_PRIV) == 0) {
146             error = EACCES;
147             break;
148         }
149         error = raw_attach(so, (int) nam);
150         break;

151         /*
152          * Destroy state just before socket deallocation.
153          * Flush data or not depending on the options.
154          */
155     case PRU_DETACH:
156         if (rp == 0) {
157             error = ENOTCONN;
158             break;
159         }
160         raw_detach(rp);
161         break;
----------------------------------------------------------------------- raw_usrreq.c

139-148

The PRU_ATTACH request is a result of the socket system call. A routing socket must be created by a superuser process.

149-150

The function raw_attach (Figure 20.24) links the control block into the doubly linked list. The nam argument is the third argument to socket and gets stored in the control block.

151-159

The PRU_DETACH is issued by the close system call. The test of a null rp pointer is superfluous, since the test was already done before the switch statement.

160-161

raw_detach (Figure 20.25) removes the control block from the doubly linked list.

Figure 20.21 shows the processing of the PRU_CONNECT2, PRU_DISCONNECT, and PRU_SHUTDOWN requests.

Table 20.21. raw_usrreq function: PRU_CONNECT2, PRU_DISCONNECT, and PRU_SHUTDOWN requests.

--------------------------------------------------------------------- raw_usrreq.c
186     case PRU_CONNECT2:
187         error = EOPNOTSUPP;
188         goto release;

189     case PRU_DISCONNECT:
190         if (rp->rcb_faddr == 0) {
191             error = ENOTCONN;
192             break;
193         }
194         raw_disconnect(rp);
195         soisdisconnected(so);
196         break;

197         /*
198          * Mark the connection as being incapable of further input.
199          */
200     case PRU_SHUTDOWN:
201         socantsendmore(so);
202         break;
--------------------------------------------------------------------- raw_usrreq.c

186-188

The PRU_CONNECT2 request is from the socketpair system call and is not supported in the routing domain.

189-196

Since a routing socket is always connected (Figure 20.18), the PRU_DISCONNECT request is issued by close before the PRU_DETACH request. The socket must already be connected to a foreign address, which is always true for a routing socket. raw_disconnect and soisdisconnected complete the processing.

197-202

The PRU_SHUTDOWN request is from the shutdown system call when the argument specifies that no more writes will be performed on the socket. socantsendmore disables further writes.

The most common request for a routing socket, PRU_SEND, and the PRU_ABORT and PRU_SENSE requests are shown in Figure 20.22.

Table 20.22. raw_usrreq function: PRU_SEND, PRU_ABORT, and PRU_SENSE requests.

---------------------------------------------------------------------- raw_usrreq.c
203         /*
204          * Ship a packet out.  The appropriate raw output
205          * routine handles any massaging necessary.
206          */
207     case PRU_SEND:
208         if (nam) {
209             if (rp->rcb_faddr) {
210                 error = EISCONN;
211                 break;
212             }
213             rp->rcb_faddr = mtod(nam, struct sockaddr *);
214         } else if (rp->rcb_faddr == 0) {
215             error = ENOTCONN;
216             break;
217         }
218         error = (*so->so_proto->pr_output) (m, so);
219         m = NULL;
220         if (nam)
221             rp->rcb_faddr = 0;
222         break;

223     case PRU_ABORT:
224         raw_disconnect(rp);
225         sofree(so);
226         soisdisconnected(so);
227         break;

228     case PRU_SENSE:
229         /*
230          * stat: don't bother with a blocksize.
231          */
232         return (0);
---------------------------------------------------------------------- raw_usrreq.c

203-217

The PRU_SEND request is issued by sosend when the process writes to the socket. If a nam argument is specified, that is, the process specified a destination address using either sendto or sendmsg, an error is returned because route_usrreq always sets rcb_faddr for a routing socket.

218-222

The message in the mbuf chain pointed to by m is passed to the protocol’s pr_output function, which is route_output.

223-227

If a PRU_ABORT request is issued, the control block is disconnected, the socket is released, and the socket is disconnected.

228-232

The PRU_SENSE request is issued by the fstat system call. The function returns OK.
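The address checks made for PRU_SEND reduce to a small truth table, sketched here in user space. The structure demo_rawcb and the function demo_send_check are hypothetical; a nonnull nam models a destination supplied with sendto or sendmsg, and a nonnull faddr models a connected control block.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in: only the foreign address pointer matters here. */
struct demo_rawcb {
    const void *faddr;  /* models rcb_faddr; nonnull means connected */
};

/* Return 0 if the send may proceed, else the errno the kernel would set. */
static int
demo_send_check(const struct demo_rawcb *rp, const void *nam)
{
    assert(rp != NULL);
    if (nam != NULL) {
        if (rp->faddr != NULL)
            return EISCONN;     /* destination given on a connected socket */
    } else if (rp->faddr == NULL) {
        return ENOTCONN;        /* no destination and not connected */
    }
    return 0;
}
```

Because a routing socket is always connected to route_src, only the first two cases of the table (connected with and without a destination) can arise for a routing socket.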

Figure 20.23 shows the remaining PRU_xxx requests.

Table 20.23. raw_usrreq function: final part.

---------------------------------------------------------------------- raw_usrreq.c
233         /*
234          * Not supported.
235          */
236     case PRU_RCVOOB:
237     case PRU_RCVD:
238         return (EOPNOTSUPP);

239     case PRU_LISTEN:
240     case PRU_ACCEPT:
241     case PRU_SENDOOB:
242         error = EOPNOTSUPP;
243         break;

244     case PRU_SOCKADDR:
245         if (rp->rcb_laddr == 0) {
246             error = EINVAL;
247             break;
248         }
249         len = rp->rcb_laddr->sa_len;
250         bcopy((caddr_t) rp->rcb_laddr, mtod(nam, caddr_t), (unsigned) len);
251         nam->m_len = len;
252         break;

253     case PRU_PEERADDR:
254         if (rp->rcb_faddr == 0) {
255             error = ENOTCONN;
256             break;
257         }
258         len = rp->rcb_faddr->sa_len;
259         bcopy((caddr_t) rp->rcb_faddr, mtod(nam, caddr_t), (unsigned) len);
260         nam->m_len = len;
261         break;
---------------------------------------------------------------------- raw_usrreq.c

233-243

These five requests are not supported.

244-261

The PRU_SOCKADDR and PRU_PEERADDR requests are from the getsockname and getpeername system calls respectively. The former always returns an error, since the bind system call, which sets the local address, is not supported in the routing domain. The latter always returns the contents of the socket address structure route_src, which was set by route_usrreq as the foreign address.

`raw_attach, raw_detach`, and `raw_disconnect` Functions

The raw_attach function, shown in Figure 20.24, was called by raw_usrreq to finish processing the PRU_ATTACH request.

Table 20.24. raw_attach function.

------------------------------------------------------------------------- raw_cb.c
 49 int
 50 raw_attach(so, proto)
 51 struct socket *so;
 52 int     proto;
 53 {
 54     struct rawcb *rp = sotorawcb(so);
 55     int     error;

 56     /*
 57      * It is assumed that raw_attach is called
 58      * after space has been allocated for the
 59      * rawcb.
 60      */
 61     if (rp == 0)
 62         return (ENOBUFS);
 63     if (error = soreserve(so, raw_sendspace, raw_recvspace))
 64         return (error);
 65     rp->rcb_socket = so;
 66     rp->rcb_proto.sp_family = so->so_proto->pr_domain->dom_family;
 67     rp->rcb_proto.sp_protocol = proto;
 68     insque(rp, &rawcb);
 69     return (0);
 70 }
------------------------------------------------------------------------- raw_cb.c

49-64

The caller must have already allocated the raw protocol control block. soreserve sets the high-water marks for the send and receive buffers to 8192. This should be more than adequate for the routing messages.

65-67

A pointer to the socket structure is stored in the protocol control block along with the dom_family (which is PF_ROUTE from Figure 20.1 for the routing domain) and the proto argument (which is the third argument to socket).

68-70

insque adds the control block to the front of the doubly linked list headed by the global rawcb.

The raw_detach function, shown in Figure 20.25, was called by raw_usrreq to finish processing the PRU_DETACH request.

Table 20.25. raw_detach function.

------------------------------------------------------------------------- raw_cb.c
 75 void
 76 raw_detach(rp)
 77 struct rawcb *rp;
 78 {
 79     struct socket *so = rp->rcb_socket;

 80     so->so_pcb = 0;
 81     sofree(so);
 82     remque(rp);
 83     free((caddr_t) (rp), M_PCB);
 84 }
------------------------------------------------------------------------- raw_cb.c

75-84

The so_pcb pointer in the socket structure is set to null and the socket is released. The control block is removed from the doubly linked list by remque and the memory used for the control block is released by free.
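The list manipulation done by insque and remque can be sketched with a circular, doubly linked list whose head is a dummy element, which is how the loop in raw_input can terminate when it returns to &rawcb. The names demo_rawcb, demo_list_init, demo_insque, demo_remque, and demo_count are hypothetical, and this is a user-space approximation rather than the kernel routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the link fields of a raw control block. */
struct demo_rawcb {
    struct demo_rawcb *next;
    struct demo_rawcb *prev;
    int protocol;
};

/* An empty list is a head element that points at itself. */
static void
demo_list_init(struct demo_rawcb *head)
{
    head->next = head->prev = head;
}

/* Insert elem immediately after pred (after the head means at the front). */
static void
demo_insque(struct demo_rawcb *elem, struct demo_rawcb *pred)
{
    assert(elem != NULL && pred != NULL);
    elem->next = pred->next;
    elem->prev = pred;
    pred->next->prev = elem;
    pred->next = elem;
}

/* Unlink elem from whatever list it is on. */
static void
demo_remque(struct demo_rawcb *elem)
{
    assert(elem != NULL);
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    elem->next = elem->prev = elem;
}

/* Walk the list the way raw_input does, stopping back at the head. */
static int
demo_count(const struct demo_rawcb *head)
{
    const struct demo_rawcb *rp;
    int n = 0;

    for (rp = head->next; rp != head; rp = rp->next)
        n++;
    return n;
}
```

Inserting a and then b after the head leaves b at the front of the list, so the most recently created socket is the first one examined by a traversal.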

The raw_disconnect function, shown in Figure 20.26, was called by raw_usrreq to process the PRU_DISCONNECT and PRU_ABORT requests.

Table 20.26. raw_disconnect function.

--------------------------------------------------------------------- raw_cb.c
 88 void
 89 raw_disconnect(rp)
 90 struct rawcb *rp;
 91 {

 92     if (rp->rcb_socket->so_state & SS_NOFDREF)
 93         raw_detach(rp);
 94 }
--------------------------------------------------------------------- raw_cb.c

88-94

If the socket does not reference a descriptor, raw_detach releases the socket and control block.

Summary

A routing socket is a raw socket in the PF_ROUTE domain. Routing sockets can be created only by a superuser process. If a nonprivileged process wants to read the routing information contained in the kernel, the sysctl system call supported by the routing domain can be used (we described this in the previous chapter).

This chapter was our first encounter with the protocol control blocks (PCBs) that are normally associated with each socket. In the routing domain a special rawcb contains information about the routing socket: the local and foreign addresses, the address family, and the protocol. We’ll see in Chapter 22 that the larger Internet protocol control block (inpcb) is used with UDP, TCP, and raw IP sockets. The concepts are the same, however: the socket structure is used by the socket layer, and the PCB, a rawcb or an inpcb, is used by the protocol layer. The socket structure points to the PCB and vice versa.

The route_output function handles the five routing requests that can be issued by a process. raw_input delivers a routing message to one or more routing sockets, depending on the protocol and address family. The various PRU_xxx requests for a routing socket are handled by raw_usrreq and route_usrreq. In later chapters we’ll encounter additional xxx_usrreq functions, one per protocol (UDP, TCP, and raw IP), each consisting of a switch statement to handle each request.

Exercises

20.1	List two ways a process can receive the return value from `route_output` when the process writes a message to a routing socket. Which method is more reliable?
Solution 20.1	The return value is returned in the `rtm_errno` member of the message (Figure 20.14) and also as the return value from `write` (Figure 20.22). The latter is more reliable since the former may run into mbuf starvation, causing the reply message to be discarded (Figure 20.17).
20.2	What happens when a process specifies a nonzero protocol argument to the `socket` system call, since the `pr_protocol` member of the `routesw` structure is 0?
Solution 20.2	For a `SOCK_RAW` socket, the `pffindproto` function (Figure 7.20) returns the entry with a protocol of 0 (the wildcard) if an exact match isn’t found.
20.3	Routes in the routing table (other than ARP entries) never time out. Implement a timeout on routes.
