The various protocols within the kernel don’t access the routing trees directly using the functions from the previous chapter. Instead they call a few functions that we describe in this chapter: rtalloc and rtalloc1 are two that perform routing table lookups, rtrequest adds and deletes routing table entries, and rtinit is called by most interfaces when the interface goes up or down.
Routing messages communicate information in two directions. A process such as the route command or one of the routing daemons (routed or gated) writes routing messages to a routing socket, causing the kernel to add a new route, delete an existing route, or modify an existing route. The kernel also generates routing messages that can be read by any routing socket when events occur in which the processes might be interested: an interface has gone down, a redirect has been received, and so on. In this chapter we cover the formats of these routing messages and the information contained therein, and we save our discussion of routing sockets until the next chapter.
The kernel provides another interface to the routing tables through the sysctl system call, which we describe at the end of this chapter. This system call allows a process to read the entire routing table or a list of all the configured interfaces and interface addresses.
rtalloc and rtalloc1 are the functions normally called to look up an entry in the routing table. Figure 19.1 shows rtalloc.
Figure 19.1. rtalloc function.
------------------------------------------------------------------------- route.c
58 void
59 rtalloc(ro)
60 struct route *ro;
61 {
62     if (ro->ro_rt && ro->ro_rt->rt_ifp && (ro->ro_rt->rt_flags & RTF_UP))
63         return;                 /* XXX */
64     ro->ro_rt = rtalloc1(&ro->ro_dst, 1);
65 }
------------------------------------------------------------------------- route.c
58-65
The argument ro is often the pointer to a route structure contained in an Internet PCB (Chapter 22), which is used by UDP and TCP. If ro already points to an rtentry structure (ro_rt is nonnull), and that structure points to an interface structure, and the route is up, the function returns. Otherwise rtalloc1 is called with a second argument of 1. We’ll see the purpose of this argument shortly.
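The cached-route test on lines 62-63 can be modeled in user space. The following is only a sketch under our own assumptions: the m_ names, the lookups counter, and the single table_entry are hypothetical stand-ins for the kernel structures and for rtalloc1, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define M_UP 0x1                    /* stand-in for RTF_UP */

/* Hypothetical, cut-down versions of rtentry and route. */
struct m_rtentry { int flags; void *ifp; };
struct m_route   { struct m_rtentry *ro_rt; };

int lookups;                        /* counts simulated table searches */
struct m_rtentry table_entry;       /* the only entry our "table" holds */

/* Stand-in for rtalloc1: always "finds" table_entry. */
struct m_rtentry *m_lookup(void)
{
    lookups++;
    return &table_entry;
}

/* Same shape as rtalloc: search only when the cached route is unusable. */
void m_rtalloc(struct m_route *ro)
{
    if (ro->ro_rt && ro->ro_rt->ifp && (ro->ro_rt->flags & M_UP))
        return;                     /* cached route is still good */
    ro->ro_rt = m_lookup();
}
```

Calling m_rtalloc twice on the same m_route performs one search as long as the cached entry stays up; clearing the flag forces the next call to search again.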
rtalloc1, shown in Figure 19.2, calls the rnh_matchaddr function, which is always rn_match (Figure 18.17) for Internet addresses.
Figure 19.2. rtalloc1 function.
------------------------------------------------------------------------- route.c
66 struct rtentry *
67 rtalloc1(dst, report)
68 struct sockaddr *dst;
69 int report;
70 {
71     struct radix_node_head *rnh = rt_tables[dst->sa_family];
72     struct rtentry *rt;
73     struct radix_node *rn;
74     struct rtentry *newrt = 0;
75     struct rt_addrinfo info;
76     int s = splnet(), err = 0, msgtype = RTM_MISS;
77     if (rnh && (rn = rnh->rnh_matchaddr((caddr_t) dst, rnh)) &&
78         ((rn->rn_flags & RNF_ROOT) == 0)) {
79         newrt = rt = (struct rtentry *) rn;
80         if (report && (rt->rt_flags & RTF_CLONING)) {
81             err = rtrequest(RTM_RESOLVE, dst, SA(0),
82                             SA(0), 0, &newrt);
83             if (err) {
84                 newrt = rt;
85                 rt->rt_refcnt++;
86                 goto miss;
87             }
88             if ((rt = newrt) && (rt->rt_flags & RTF_XRESOLVE)) {
89                 msgtype = RTM_RESOLVE;
90                 goto miss;
91             }
92         } else
93             rt->rt_refcnt++;
94     } else {
95         rtstat.rts_unreach++;
96       miss:if (report) {
97             bzero((caddr_t) & info, sizeof(info));
98             info.rti_info[RTAX_DST] = dst;
99             rt_missmsg(msgtype, &info, 0, err);
100         }
101     }
102     splx(s);
103     return (newrt);
104 }
------------------------------------------------------------------------- route.c
66-76
The first argument is a pointer to a socket address structure containing the address to search for. The sa_family member selects the routing table to search.
77-78
If the following three conditions are met, the search is successful.
A routing table exists for the protocol family,
rn_match returns a nonnull pointer, and
the matching radix_node does not have the RNF_ROOT flag set.
Remember that the two leaves that mark the end of the tree both have the RNF_ROOT flag set.
94-101
If the search fails because any one of the three conditions is not met, the statistic rts_unreach is incremented and, if the second argument to rtalloc1 (report) is nonzero, a routing message is generated that can be read by any interested processes on a routing socket. The routing message has the type RTM_MISS, and the function returns a null pointer.
79
If all three of the conditions are met, the lookup succeeded and the pointer to the matching radix_node is stored in rt and newrt. Notice that in the definition of the rtentry structure (Figure 18.24) the two radix_node structures are at the beginning, and, as shown in Figure 18.8, the first of these two structures contains the leaf node. Therefore the pointer to a radix_node structure returned by rn_match is really a pointer to an rtentry structure, which is the matching leaf node.
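The cast on line 79 works only because the leaf node is the first thing in the entry. A minimal user-space sketch of that layout follows; struct node, struct entry, and the two helper functions are simplified, hypothetical stand-ins for radix_node and rtentry, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for radix_node. */
struct node {
    int n_flags;
};

/* Simplified stand-in for rtentry: the two nodes come first. */
struct entry {
    struct node e_nodes[2];     /* e_nodes[0] is the leaf, e_nodes[1] the internal node */
    int         e_refcnt;
};

/* A tree search hands back a pointer to the leaf node ... */
struct node *leaf_of(struct entry *e)
{
    return &e->e_nodes[0];
}

/* ... which has the same address as the enclosing entry,
 * so the caller may cast it back, as rtalloc1 does. */
struct entry *entry_of(struct node *leaf)
{
    return (struct entry *) leaf;
}
```

If another member were ever placed ahead of the node array, the cast would silently yield a wrong pointer, which is why the ordering of the structure members matters here.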
80-82
If the caller specified a nonzero second argument, and if the RTF_CLONING flag is set, rtrequest is called with a command of RTM_RESOLVE to create a new rtentry structure that is a clone of the one that was located. This feature is used by ARP and for multicast addresses.
83-87
If rtrequest returns an error, newrt is set back to the entry returned by rn_match and its reference count is incremented. A jump is made to miss, where an RTM_MISS message is generated.
88-91
If rtrequest succeeds but the newly cloned entry has the RTF_XRESOLVE flag set, a jump is made to miss, this time to generate an RTM_RESOLVE message. The intent of this message is to notify a user process when the route is created, and it could be used with the conversion of IP addresses to X.121 addresses.
92-93
When the search succeeds but the RTF_CLONING flag is not set, this statement increments the entry’s reference count. This is the normal flow through the function, which then returns the nonnull pointer.
For a small function, rtalloc1 has many options in how it operates. There are seven different flows through the function, summarized in Figure 19.3.
We note that the first two rows (entry not found) are impossible if a default route exists. Also we show rt_refcnt being incremented in the fifth and sixth rows when the call to rtrequest with a command of RTM_RESOLVE is OK. The increment is done by rtrequest.
The RTFREE macro, shown in Figure 19.4, calls the rtfree function only if the reference count is less than or equal to 1; otherwise it just decrements the reference count.
Figure 19.4. RTFREE macro.
------------------------------------------------------------------------- route.h
209 #define RTFREE(rt) \
210     if ((rt)->rt_refcnt <= 1) \
211         rtfree(rt); \
212     else \
213         (rt)->rt_refcnt--;      /* no need for function call */
------------------------------------------------------------------------- route.h
209-213
The rtfree function, shown in Figure 19.5, releases an rtentry structure when there are no more references to it. We’ll see in Figure 22.7, for example, that when a protocol control block is released, if it points to a routing entry, rtfree is called.
Figure 19.5. rtfree function: release an rtentry structure.
------------------------------------------------------------------------- route.c
105 void
106 rtfree(rt)
107 struct rtentry *rt;
108 {
109     struct ifaddr *ifa;
110     if (rt == 0)
111         panic("rtfree");
112     rt->rt_refcnt--;
113     if (rt->rt_refcnt <= 0 && (rt->rt_flags & RTF_UP) == 0) {
114         if (rt->rt_nodes->rn_flags & (RNF_ACTIVE | RNF_ROOT))
115             panic("rtfree 2");
116         rttrash--;
117         if (rt->rt_refcnt < 0) {
118             printf("rtfree: %x not freed (neg refs)\n", rt);
119             return;
120         }
121         ifa = rt->rt_ifa;
122         IFAFREE(ifa);
123         Free(rt_key(rt));
124         Free(rt);
125     }
126 }
------------------------------------------------------------------------- route.c
105-115
The entry’s reference count is decremented, and if it is less than or equal to 0 and the route is not usable, the entry can be released. If either of the flags RNF_ACTIVE or RNF_ROOT is set, this is an internal error. If RNF_ACTIVE is set, this structure is still part of the routing table tree. If RNF_ROOT is set, this structure is one of the end markers built by rn_inithead.
116
rttrash is a debugging counter of the number of routing entries not in the routing tree, but not released. It is incremented by rtrequest when it begins deleting a route, and then decremented here. Its value should normally be 0.
117-122
A check is made that the reference count is not negative, and then IFAFREE decrements the reference count for the ifaddr structure and releases it by calling ifafree when it reaches 0.
123-124
The memory occupied by the routing entry key and its gateway is released. We’ll see in rt_setgate that the memory for both is allocated in one contiguous chunk, allowing both to be released with a single call to Free. Finally the rtentry structure itself is released.
The handling of the routing table reference count, rt_refcnt, differs from most other reference counts. We see in Figure 18.2 that most routes have a reference count of 0, yet the routing table entries without any references are not deleted. We just saw the reason in rtfree: an entry with a reference count of 0 is not deleted unless the entry’s RTF_UP flag is not set. The only time this flag is cleared is by rtrequest when a route is deleted from the routing tree.
Most routes are used in the following fashion.
If the route is created automatically as a route to an interface when the interface is configured (which is typical for Ethernet interfaces, for example), then rtinit calls rtrequest with a command of RTM_ADD, creating the new entry and setting the reference count to 1. rtinit then decrements the reference count to 0 before returning.
A point-to-point interface follows a similar procedure, so the route starts with a reference count of 0.
If the route is created manually by the route command or by a routing daemon, a similar procedure occurs, with route_output calling rtrequest with a command of RTM_ADD, setting the reference count to 1. This is then decremented by route_output to 0 before it returns.
Therefore all newly created routes start with a reference count of 0.
When an IP datagram is sent on a socket, be it TCP or UDP, we saw that ip_output calls rtalloc, which calls rtalloc1. In Figure 19.3 we saw that the reference count is incremented by rtalloc1 if the route is found.
The located route is called a held route, since a pointer to the routing table entry is being held by the protocol, normally in a route structure contained within a protocol control block. An rtentry structure that is being held by someone else cannot be deleted, which is why rtfree doesn’t release the structure until its reference count reaches 0.
A protocol releases a held route by calling RTFREE or rtfree. We saw this in Figure 8.24 when ip_output detects a change in the destination address. We’ll encounter it in Chapter 22 when a protocol control block that holds a route is released.
Part of the confusion we’ll encounter in the code that follows is that rtalloc1 is often called to look up a route in order to verify that a route to the destination exists, even though the caller doesn’t want to hold the route. Since rtalloc1 increments the counter, the caller immediately decrements it.
Consider a route being deleted by rtrequest. The RTF_UP flag is cleared, and if no one is holding the route (its reference count is 0), rtfree should be called. But rtfree considers it an error for the reference count to go below 0, so rtrequest checks whether its reference count is less than or equal to 0, and, if so, increments it and calls rtfree. Normally this sets the reference count to 1 and rtfree decrements it to 0 and deletes the route.
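The interplay between the reference count and the RTF_UP flag can be traced with a small user-space model. This is a sketch under our own assumptions: struct route_model and the two model_ functions are hypothetical, and the released member merely marks the point where the kernel would free the memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fields that rtfree examines. */
struct route_model {
    int  refcnt;      /* plays the role of rt_refcnt */
    bool up;          /* plays the role of the RTF_UP flag */
    bool released;    /* set where the kernel would free the entry */
};

/* Follows rtfree: drop one reference; release only an unheld route that is no longer up. */
void model_rtfree(struct route_model *rt)
{
    assert(rt != 0);
    rt->refcnt--;
    if (rt->refcnt <= 0 && !rt->up) {
        if (rt->refcnt < 0)
            return;               /* the kernel prints a warning and does not free */
        rt->released = true;
    }
}

/* Follows the end of the RTM_DELETE case when the caller passes a null ret_nrt. */
void model_delete(struct route_model *rt)
{
    rt->up = false;
    if (rt->refcnt <= 0) {
        rt->refcnt++;             /* so the decrement in rtfree lands on 0, not -1 */
        model_rtfree(rt);
    }
}
```

Deleting an unheld route releases it at once. Deleting a held route only clears the flag, and the release is deferred until the holder makes its own call to model_rtfree.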
The rtrequest function is the focal point for adding and deleting routing table entries. Figure 19.6 shows some of the other functions that call it.
rtrequest is a switch statement with one case per command: RTM_ADD, RTM_DELETE, and RTM_RESOLVE. Figure 19.7 shows the start of the function and the RTM_DELETE command.
Figure 19.7. rtrequest function: RTM_DELETE command.
------------------------------------------------------------------------- route.c
290 int
291 rtrequest(req, dst, gateway, netmask, flags, ret_nrt)
292 int req, flags;
293 struct sockaddr *dst, *gateway, *netmask;
294 struct rtentry **ret_nrt;
295 {
296     int s = splnet();
297     int error = 0;
298     struct rtentry *rt;
299     struct radix_node *rn;
300     struct radix_node_head *rnh;
301     struct ifaddr *ifa;
302     struct sockaddr *ndst;
303 #define senderr(x) { error = x ; goto bad; }
304     if ((rnh = rt_tables[dst->sa_family]) == 0)
305         senderr(ESRCH);
306     if (flags & RTF_HOST)
307         netmask = 0;
308     switch (req) {
309     case RTM_DELETE:
310         if ((rn = rnh->rnh_deladdr(dst, netmask, rnh)) == 0)
311             senderr(ESRCH);
312         if (rn->rn_flags & (RNF_ACTIVE | RNF_ROOT))
313             panic("rtrequest delete");
314         rt = (struct rtentry *) rn;
315         rt->rt_flags &= ~RTF_UP;
316         if (rt->rt_gwroute) {
317             rt = rt->rt_gwroute;
318             RTFREE(rt);
319             (rt = (struct rtentry *) rn)->rt_gwroute = 0;
320         }
321         if ((ifa = rt->rt_ifa) && ifa->ifa_rtrequest)
322             ifa->ifa_rtrequest(RTM_DELETE, rt, SA(0));
323         rttrash++;
324         if (ret_nrt)
325             *ret_nrt = rt;
326         else if (rt->rt_refcnt <= 0) {
327             rt->rt_refcnt++;
328             rtfree(rt);
329         }
330         break;
------------------------------------------------------------------------- route.c
290-307
The second argument, dst, is a socket address structure specifying the key to be added to or deleted from the routing table. The sa_family from this key selects the routing table. If the flags argument indicates a host route (instead of a route to a network), the netmask pointer is set to null, ignoring any value the caller may have passed.
309-315
The rnh_deladdr function (rn_delete from Figure 18.17) deletes the entry from the routing table tree and returns a pointer to the corresponding rtentry structure. The RTF_UP flag is cleared.
316-320
If the entry is an indirect route through a gateway, RTFREE decrements the rt_refcnt member of the gateway’s entry and deletes it if the count reaches 0. The rt_gwroute pointer is set to null and rt is set back to point to the entry that was deleted.
321-322
If an ifa_rtrequest function is defined for this entry, that function is called. This function is used by ARP, for example, in Chapter 21 to delete the corresponding ARP entry.
323-330
The rttrash global is incremented because the entry may not be released in the code that follows. If the caller wants the pointer to the rtentry structure that was deleted from the routing tree (if ret_nrt is nonnull), then that pointer is returned, but the entry cannot be released: it is the caller’s responsibility to call rtfree when it is finished with the entry. If ret_nrt is null, the entry can be released: if the reference count is less than or equal to 0, it is incremented, and rtfree is called. The break causes the function to return.
Figure 19.8 shows the next part of the function, which handles the RTM_RESOLVE command. This function is called with this command only from rtalloc1, when a new entry is to be created from an entry with the RTF_CLONING flag set.
Figure 19.8. rtrequest function: RTM_RESOLVE command.
------------------------------------------------------------------------- route.c
331     case RTM_RESOLVE:
332         if (ret_nrt == 0 || (rt = *ret_nrt) == 0)
333             senderr(EINVAL);
334         ifa = rt->rt_ifa;
335         flags = rt->rt_flags & ~RTF_CLONING;
336         gateway = rt->rt_gateway;
337         if ((netmask = rt->rt_genmask) == 0)
338             flags |= RTF_HOST;
339         goto makeroute;
------------------------------------------------------------------------- route.c
331-339
The final argument, ret_nrt, is used differently for this command: it contains the pointer to the entry with the RTF_CLONING flag set (Figure 19.2). The new entry will have the same rt_ifa pointer, the same flags (with the RTF_CLONING flag cleared), and the same rt_gateway. If the entry being cloned has a null rt_genmask pointer, the new entry has its RTF_HOST flag set, because it is a host route; otherwise the new entry is a network route and the network mask of the new entry is copied from the rt_genmask value. We give an example of cloned routes with a network mask at the end of this section. This case continues at the label makeroute, which is in the next figure.
Figure 19.9 shows the RTM_ADD command.
Figure 19.9. rtrequest function: RTM_ADD command.
------------------------------------------------------------------------- route.c
340     case RTM_ADD:
341         if ((ifa = ifa_ifwithroute(flags, dst, gateway)) == 0)
342             senderr(ENETUNREACH);
343       makeroute:
344         R_Malloc(rt, struct rtentry *, sizeof(*rt));
345         if (rt == 0)
346             senderr(ENOBUFS);
347         Bzero(rt, sizeof(*rt));
348         rt->rt_flags = RTF_UP | flags;
349         if (rt_setgate(rt, dst, gateway)) {
350             Free(rt);
351             senderr(ENOBUFS);
352         }
353         ndst = rt_key(rt);
354         if (netmask) {
355             rt_maskedcopy(dst, ndst, netmask);
356         } else
357             Bcopy(dst, ndst, dst->sa_len);
358         rn = rnh->rnh_addaddr((caddr_t) ndst, (caddr_t) netmask,
359                               rnh, rt->rt_nodes);
360         if (rn == 0) {
361             if (rt->rt_gwroute)
362                 rtfree(rt->rt_gwroute);
363             Free(rt_key(rt));
364             Free(rt);
365             senderr(EEXIST);
366         }
367         ifa->ifa_refcnt++;
368         rt->rt_ifa = ifa;
369         rt->rt_ifp = ifa->ifa_ifp;
370         if (req == RTM_RESOLVE)
371             rt->rt_rmx = (*ret_nrt)->rt_rmx;    /* copy metrics */
372         if (ifa->ifa_rtrequest)
373             ifa->ifa_rtrequest(req, rt, SA(ret_nrt ? *ret_nrt : 0));
374         if (ret_nrt) {
375             *ret_nrt = rt;
376             rt->rt_refcnt++;
377         }
378         break;
379     }
380   bad:
381     splx(s);
382     return (error);
383 }
------------------------------------------------------------------------- route.c
340-342
The function ifa_ifwithroute finds the appropriate local interface for the destination (dst), returning a pointer to its ifaddr structure.
343-348
An rtentry structure is allocated. Recall that this structure contains both the two radix_node structures for the routing tree and the other routing information. The structure is zeroed and the rt_flags are set from the caller’s flags, including the RTF_UP flag.
349-352
The rt_setgate function (Figure 19.11) allocates memory for both the routing table key (dst) and its gateway. It then copies gateway into the new memory and sets the pointers rt_key, rt_gateway, and rt_gwroute.
353-357
The destination address (the routing table key dst) must now be copied into the memory pointed to by rn_key. If a network mask is supplied, rt_maskedcopy logically ANDs dst and netmask, forming the new key. Otherwise dst is copied into the new key. The reason for logically ANDing dst and netmask is to guarantee that the key in the table has already been ANDed with its mask, so when a search key is compared against the key in the table only the search key needs to be ANDed. For example, the following command adds another IP address (an alias) to the Ethernet interface le0, with subnet 12 instead of 13:
bsdi $ ifconfig le0 inet 140.252.12.63 netmask 0xffffffe0 alias
The problem is that we’ve incorrectly specified all one bits for the host ID. Nevertheless, when the key is stored in the routing table we can verify with netstat that the address is first logically ANDed with the mask:
Destination      Gateway    Flags   Refs   Use   Interface
140.252.12.32    link#1     U C     0      0     le0
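The arithmetic behind this netstat output is a single AND. The sketch below works on 32-bit values in host byte order rather than on socket address structures, so it is a simplification of what rt_maskedcopy does; the helper names ipv4 and masked_key are our own.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four dotted-decimal octets into one 32-bit value (host byte order). */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t) a << 24) | ((uint32_t) b << 16) |
           ((uint32_t) c << 8)  |  (uint32_t) d;
}

/* The key stored in the table is the destination ANDed with its mask. */
uint32_t masked_key(uint32_t dst, uint32_t netmask)
{
    return dst & netmask;
}
```

With the values from the ifconfig command, masked_key(ipv4(140, 252, 12, 63), 0xffffffe0) equals ipv4(140, 252, 12, 32): the low-order byte 63 (binary 00111111) ANDed with 0xe0 (binary 11100000) leaves 32.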
358-366
The rnh_addaddr function (rn_addroute from Figure 18.17) adds this rtentry structure, with its destination and mask, to the routing table tree. If an error occurs, the structures are released and EEXIST is returned (i.e., the entry is already in the routing table).
367-369
The ifaddr structure’s reference count is incremented and the pointers to its ifaddr and ifnet structures are stored.
370-371
If the command was RTM_RESOLVE (not RTM_ADD), the entire metrics structure is copied from the cloned entry into the new entry. If the command was RTM_ADD, the caller can set the metrics after this function returns.
372-373
If an ifa_rtrequest function is defined for this entry, that function is called. ARP uses this to perform additional processing for both the RTM_ADD and RTM_RESOLVE commands (Section 21.13).
374-378
If the caller wants a copy of the pointer to the new structure, it is returned through ret_nrt and the rt_refcnt reference count is incremented from 0 to 1.
The only use of the rt_genmask value is with cloned routes created by the RTM_RESOLVE command in rtrequest. If an rt_genmask pointer is nonnull, then the socket address structure pointed to by this pointer becomes the network mask of the newly created route. In our routing table, Figure 18.2, the cloned routes are for the local Ethernet and for multicast addresses. The following example from [Sklower 1991] provides a different use of cloned routes. Another example is in Exercise 19.2.
Consider a class B network, say 128.1, that is behind a point-to-point link. The subnet mask is 0xffffff00, the typical value that uses 8 bits for the subnet ID and 8 bits for the host ID. We need a routing table entry for all possible 254 subnets, with a gateway value of a router that is directly connected to our host and that knows how to reach the link to which the 128.1 network is connected.
The easiest solution, assuming the gateway router isn’t our default router, is a single entry with a destination of 128.1.0.0 and a mask of 0xffff0000. Assume, however, that the topology of the 128.1 network is such that each of the possible 254 subnets can have different operational characteristics: RTTs, MTUs, delays, and so on. If a separate routing table entry were used for each subnet, we would see that whenever a connection is closed, TCP would update the routing table entry with statistics about that route: its RTT, RTT variance, and so on (Figure 27.3). While we could create up to 254 entries by hand using the route command, one per subnet, a better solution is to use the cloning feature.
One entry is created by the system administrator with a destination of 128.1.0.0 and a network mask of 0xffff0000. Additionally, the RTF_CLONING flag is set and the genmask is set to 0xffffff00, which differs from the network mask. If the routing table is searched for 128.1.2.3, and an entry does not exist for the 128.1.2 subnet, the entry for 128.1 with the mask of 0xffff0000 is the best match. A new entry is created (since the RTF_CLONING flag is set) with a destination of 128.1.2 and a network mask of 0xffffff00 (the genmask value). The next time any host on this subnet is referenced, say 128.1.2.88, it will match this newly created entry.
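The 128.1 example can be traced with a few lines of user-space C. This is a sketch, assuming IPv4 addresses held as 32-bit host-order values; struct simple_route and its helpers are hypothetical and model only the destination, mask, genmask, and cloning flag of a route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified route entry. */
struct simple_route {
    uint32_t dst;       /* key, already ANDed with mask */
    uint32_t mask;      /* network mask */
    uint32_t genmask;   /* mask handed to clones; 0 means clones are host routes */
    bool     cloning;   /* stands in for RTF_CLONING */
};

/* True if addr falls within the route's masked destination. */
bool route_matches(const struct simple_route *r, uint32_t addr)
{
    return (addr & r->mask) == r->dst;
}

/* Build the entry that a lookup of addr would clone from parent. */
struct simple_route clone_for(const struct simple_route *parent, uint32_t addr)
{
    struct simple_route c;

    c.mask    = parent->genmask ? parent->genmask : 0xffffffffu;
    c.dst     = addr & c.mask;
    c.genmask = 0;
    c.cloning = false;  /* the cloning flag is cleared in the new entry */
    return c;
}
```

Starting from a parent of { 0x80010000, 0xffff0000, 0xffffff00, true }, a lookup of 128.1.2.3 (0x80010203) yields a clone with destination 0x80010200 and mask 0xffffff00. A later lookup of 128.1.2.88 matches the clone, while an address on another subnet such as 128.1.3.1 still matches only the parent.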
Each leaf in the routing tree has a key (rt_key, which is just the rn_key member of the radix_node structure contained at the beginning of the rtentry structure), and an associated gateway (rt_gateway). Both are socket address structures specified when the routing table entry is created. Memory is allocated for both structures by rt_setgate, as shown in Figure 19.10.
This example shows two of the entries from Figure 18.2, the ones with keys of 127.0.0.1 and 140.252.13.33. The former’s gateway member points to an Internet socket address structure, while the latter’s points to a data-link socket address structure that contains an Ethernet address. The former was entered into the routing table by the route command when the system was initialized, and the latter was created by ARP.
We purposely show the two structures pointed to by rt_key and rt_gateway one right after the other, since they are allocated together by rt_setgate, which we show in Figure 19.11.
Figure 19.11. rt_setgate function.
------------------------------------------------------------------------- route.c
384 int
385 rt_setgate(rt0, dst, gate)
386 struct rtentry *rt0;
387 struct sockaddr *dst, *gate;
388 {
389     caddr_t new, old;
390     int dlen = ROUNDUP(dst->sa_len), glen = ROUNDUP(gate->sa_len);
391     struct rtentry *rt = rt0;
392     if (rt->rt_gateway == 0 || glen > ROUNDUP(rt->rt_gateway->sa_len)) {
393         old = (caddr_t) rt_key(rt);
394         R_Malloc(new, caddr_t, dlen + glen);
395         if (new == 0)
396             return 1;
397         rt->rt_nodes->rn_key = new;
398     } else {
399         new = rt->rt_nodes->rn_key;
400         old = 0;
401     }
402     Bcopy(gate, (rt->rt_gateway = (struct sockaddr *) (new + dlen)), glen);
403     if (old) {
404         Bcopy(dst, new, dlen);
405         Free(old);
406     }
407     if (rt->rt_gwroute) {
408         rt = rt->rt_gwroute;
409         RTFREE(rt);
410         rt = rt0;
411         rt->rt_gwroute = 0;
412     }
413     if (rt->rt_flags & RTF_GATEWAY) {
414         rt->rt_gwroute = rtalloc1(gate, 1);
415     }
416     return 0;
417 }
------------------------------------------------------------------------- route.c
384-391
dlen is the length of the destination socket address structure, and glen is the length of the gateway socket address structure. The ROUNDUP macro rounds the value up to the next multiple of 4 bytes, but the size of most socket address structures is already a multiple of 4.
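The rounding can be illustrated with a small helper. This is a sketch of the idea only: roundup4 and key_gate_bytes are hypothetical names of our own, and roundup4 is not the kernel’s ROUNDUP macro, whose definition is not shown in this chapter.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of 4; a multiple of 4 is returned unchanged.
 * Hypothetical helper, not the kernel's ROUNDUP macro. */
size_t roundup4(size_t len)
{
    return (len + 3) & ~(size_t) 3;
}

/* Bytes needed for a key of dlen bytes followed by a gateway of glen bytes,
 * each rounded separately, as on lines 390 and 394. */
size_t key_gate_bytes(size_t dlen, size_t glen)
{
    return roundup4(dlen) + roundup4(glen);
}
```

For example, lengths of 16 and 20 are already multiples of 4 and need 36 bytes in total, while a 5-byte length would be rounded up to 8.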
392-397
If memory has not been allocated for this routing table key and gateway yet, or if glen is greater than the current size of the structure pointed to by rt_gateway, a new piece of memory is allocated and rn_key is set to point to the new memory.
398-401
An adequately sized piece of memory is already allocated for the key and gateway, so new is set to point to this existing memory.
402
The new gateway structure is copied and rt_gateway is set to point to the socket address structure.
403-406
If a new piece of memory was allocated, the routing table key (dst) is copied right before the gateway field that was just copied. The old piece of memory is released.
407-412
If the routing table entry contains a nonnull rt_gwroute pointer, that structure is released by RTFREE and the rt_gwroute pointer is set to null.
413-415
If the routing table entry is an indirect route, rtalloc1 locates the entry for the new gateway, which is stored in rt_gwroute. If an invalid gateway is specified for an indirect route, an error is not returned by rt_setgate, but the rt_gwroute pointer will be null.
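The single-chunk layout used for the key and gateway explains why one call to Free releases both. The following user-space sketch shows the same idea with malloc and free; struct keygate and its functions are hypothetical, and for brevity the lengths are not rounded up as they are in rt_setgate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair of pointers into one allocation. */
struct keygate {
    char *key;      /* start of the chunk */
    char *gate;     /* dlen bytes into the same chunk */
};

/* Returns 0 on success and 1 on allocation failure, the same convention as rt_setgate. */
int keygate_alloc(struct keygate *kg, const void *dst, size_t dlen,
                  const void *gate, size_t glen)
{
    char *buf = malloc(dlen + glen);

    if (buf == NULL)
        return 1;
    memcpy(buf, dst, dlen);             /* key first ... */
    memcpy(buf + dlen, gate, glen);     /* ... gateway immediately after it */
    kg->key = buf;
    kg->gate = buf + dlen;
    return 0;
}

/* Freeing the key pointer releases the gateway as well. */
void keygate_free(struct keygate *kg)
{
    free(kg->key);
    kg->key = kg->gate = NULL;
}
```

The cost of this layout is that a larger gateway cannot be stored in place: as in lines 392-397, a new chunk must be allocated and the key copied across.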
There are four calls to rtinit from the Internet protocols to add or delete routes associated with interfaces.
in_control calls rtinit twice when the destination address of a point-to-point interface is set (Figure 6.21). The first call specifies RTM_DELETE to delete any existing route to the destination; the second call specifies RTM_ADD to add the new route.
in_ifinit calls rtinit to add a network route for a broadcast network or a host route for a point-to-point link (Figure 6.19). If the route is for an Ethernet interface, the RTF_CLONING flag is automatically set by in_ifinit.
in_ifscrub calls rtinit to delete an existing route for an interface.
Figure 19.12 shows the first part of the rtinit function. The cmd argument is always RTM_ADD or RTM_DELETE.
Figure 19.12. rtinit function: call rtrequest to handle command.
------------------------------------------------------------------------- route.c
441 int
442 rtinit(ifa, cmd, flags)
443 struct ifaddr *ifa;
444 int cmd, flags;
445 {
446     struct rtentry *rt;
447     struct sockaddr *dst;
448     struct sockaddr *deldst;
449     struct mbuf *m = 0;
450     struct rtentry *nrt = 0;
451     int error;
452     dst = flags & RTF_HOST ? ifa->ifa_dstaddr : ifa->ifa_addr;
453     if (cmd == RTM_DELETE) {
454         if ((flags & RTF_HOST) == 0 && ifa->ifa_netmask) {
455             m = m_get(M_WAIT, MT_SONAME);
456             deldst = mtod(m, struct sockaddr *);
457             rt_maskedcopy(dst, deldst, ifa->ifa_netmask);
458             dst = deldst;
459         }
460         if (rt = rtalloc1(dst, 0)) {
461             rt->rt_refcnt--;
462             if (rt->rt_ifa != ifa) {
463                 if (m)
464                     (void) m_free(m);
465                 return (flags & RTF_HOST ? EHOSTUNREACH
466                         : ENETUNREACH);
467             }
468         }
469     }
470     error = rtrequest(cmd, dst, ifa->ifa_addr, ifa->ifa_netmask,
471                       flags | ifa->ifa_flags, &nrt);
472     if (m)
473         (void) m_free(m);
------------------------------------------------------------------------- route.c
452
If the route is to a host, the destination address is the other end of the point-to-point link. Otherwise we’re dealing with a network route and the destination address is the unicast address of the interface (masked with ifa_netmask).
453-459
If a route is being deleted, the destination must be looked up in the routing table to locate its routing table entry. If the route being deleted is a network route and the interface has an associated network mask, an mbuf is allocated and the destination address is copied into the mbuf by rt_maskedcopy, logically ANDing the caller’s address with the mask. dst is set to point to the masked copy in the mbuf, and that is the destination looked up in the next step.
460-469
rtalloc1 searches the routing table for the destination address. If the entry is found, its reference count is decremented (since rtalloc1 incremented the reference count). If the pointer to the interface’s ifaddr in the routing table does not equal the caller’s argument, an error is returned.
470-473
rtrequest executes the command, either RTM_ADD or RTM_DELETE. When it returns, if an mbuf was allocated earlier, it is released.
Figure 19.13 shows the second half of rtinit.
Figure 19.13. rtinit function: second half.
------------------------------------------------------------------------- route.c
474     if (cmd == RTM_DELETE && error == 0 && (rt = nrt)) {
475         rt_newaddrmsg(cmd, ifa, error, nrt);
476         if (rt->rt_refcnt <= 0) {
477             rt->rt_refcnt++;
478             rtfree(rt);
479         }
480     }
481     if (cmd == RTM_ADD && error == 0 && (rt = nrt)) {
482         rt->rt_refcnt--;
483         if (rt->rt_ifa != ifa) {
484             printf("rtinit: wrong ifa (%x) was (%x)\n", ifa,
485                    rt->rt_ifa);
486             if (rt->rt_ifa->ifa_rtrequest)
487                 rt->rt_ifa->ifa_rtrequest(RTM_DELETE, rt, SA(0));
488             IFAFREE(rt->rt_ifa);
489             rt->rt_ifa = ifa;
490             rt->rt_ifp = ifa->ifa_ifp;
491             ifa->ifa_refcnt++;
492             if (ifa->ifa_rtrequest)
493                 ifa->ifa_rtrequest(RTM_ADD, rt, SA(0));
494         }
495         rt_newaddrmsg(cmd, ifa, error, nrt);
496     }
497     return (error);
498 }
------------------------------------------------------------------------- route.c
474-480
If a route was deleted, and rtrequest returned 0 along with a pointer to the rtentry structure that was deleted (in nrt), a routing socket message is generated by rt_newaddrmsg. If the reference count is less than or equal to 0, it is incremented and the route is released by rtfree.
481-482
If a route was added, and rtrequest returned 0 along with a pointer to the rtentry structure that was added (in nrt), the reference count is decremented (since rtrequest incremented it).
483-494
If the pointer to the interface’s ifaddr in the new routing table entry does not equal the caller’s argument, an error occurred. Recall that rtrequest determines the ifa pointer that is stored in the new entry by calling ifa_ifwithroute (Figure 19.9). When this error occurs the following steps take place: an error message is output to the console, the ifa_rtrequest function is called (if defined) with a command of RTM_DELETE, the ifaddr structure is released, the rt_ifa pointer is set to the value specified by the caller, the interface reference count is incremented, and the new interface’s ifa_rtrequest function (if defined) is called with a command of RTM_ADD.
When an ICMP redirect is received, icmp_input calls rtredirect and then calls pfctlinput (Figure 11.27). This latter function calls udp_ctlinput and tcp_ctlinput, which go through all the UDP and TCP protocol control blocks. If the PCB is connected to the foreign address that has been redirected, and if the PCB holds a route to that foreign address, the route is released by rtfree. The next time any of these control blocks is used to send an IP datagram to that foreign address, rtalloc will be called and the destination will be looked up in the routing table, possibly finding a new (redirected) route.
The purpose of rtredirect, the first half of which is shown in Figure 19.14, is to validate the information in the redirect, update the routing table immediately, and then generate a routing socket message.
Figure 19.14. rtredirect function: validate received redirect.
------------------------------------------------------------------------- route.c
147 int
148 rtredirect(dst, gateway, netmask, flags, src, rtp)
149 struct sockaddr *dst, *gateway, *netmask, *src;
150 int flags;
151 struct rtentry **rtp;
152 {
153     struct rtentry *rt;
154     int error = 0;
155     short *stat = 0;
156     struct rt_addrinfo info;
157     struct ifaddr *ifa;
158     /* verify the gateway is directly reachable */
159     if ((ifa = ifa_ifwithnet(gateway)) == 0) {
160         error = ENETUNREACH;
161         goto out;
162     }
163     rt = rtalloc1(dst, 0);
164     /*
165      * If the redirect isn't from our current router for this dst,
166      * it's either old or wrong. If it redirects us to ourselves,
167      * we have a routing loop, perhaps as a result of an interface
168      * going down recently.
169      */
170 #define equal(a1, a2) (bcmp((caddr_t)(a1), (caddr_t)(a2), (a1)->sa_len) == 0)
171     if (!(flags & RTF_DONE) && rt &&
172         (!equal(src, rt->rt_gateway) || rt->rt_ifa != ifa))
173         error = EINVAL;
174     else if (ifa_ifwithaddr(gateway))
175         error = EHOSTUNREACH;
176     if (error)
177         goto done;
178     /*
179      * Create a new entry if we just got back a wildcard entry
180      * or if the lookup failed. This is necessary for hosts
181      * which use routing redirects generated by smart gateways
182      * to dynamically build the routing tables.
183      */
184     if ((rt == 0) || (rt_mask(rt) && rt_mask(rt)->sa_len < 2))
185         goto create;
------------------------------------------------------------------------- route.c
147-157
The arguments are dst
, the destination IP address of the datagram that caused the redirect (HD in Figure 8.18); gateway
, the IP address of the router to use as the new gateway field for the destination (R2 in Figure 8.18); netmask
, which is a null pointer; flags
, which is RTF_GATEWAY
and RTF_HOST; src
, the IP address of the router that sent the redirect (R1 in Figure 8.18); and rtp
, which is a null pointer. We indicate that netmask
and rtp
are both null pointers when called by icmp_input
, but these arguments might be nonnull when called from other protocols.
158-162
The new gateway must be directly connected or the redirect is invalid.
163-177
rtalloc1
searches the routing table for a route to the destination. The following conditions must all be true, or the redirect is invalid and an error is returned. Notice that icmp_input
ignores any error return from rtredirect
. ICMP does not generate an error in response to an invalid redirect; it just ignores it.
the RTF_DONE
flag must not be set;
rtalloc1
must have located a routing table entry for dst;
the address of the router that sent the redirect (src
) must equal the current rt_gateway
for the destination;
the interface for the new gateway (the ifa
returned by ifa_ifwithnet
) must equal the current interface for the destination (rt_ifa
), that is, the new gateway must be on the same network as the current gateway; and
the new gateway cannot redirect this host to itself, that is, there cannot exist an attached interface with a unicast address or a broadcast address equal to gateway
.
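The tests on lines 159-175 can be condensed into a single predicate. What follows is a minimal sketch rather than the kernel code: the `redirect_check` structure and its field names are hypothetical stand-ins for the results of the lookups (`ifa_ifwithnet`, `rtalloc1`, `ifa_ifwithaddr`), and the error values come from `<errno.h>`. Note that, as in the listing, the checks against the current route apply only when a route was found and `RTF_DONE` is not set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the lookups rtredirect performs (not a kernel type). */
struct redirect_check {
    bool gateway_on_attached_net; /* ifa_ifwithnet(gateway) != 0 */
    bool done_flag_set;           /* flags & RTF_DONE */
    bool route_found;             /* rtalloc1(dst, 0) != 0 */
    bool src_is_current_gateway;  /* equal(src, rt->rt_gateway) */
    bool same_interface;          /* rt->rt_ifa == ifa */
    bool gateway_is_local_addr;   /* ifa_ifwithaddr(gateway) != 0 */
};

/* Returns 0 if the redirect passes, else the errno value the listing assigns. */
int redirect_validate(const struct redirect_check *c)
{
    if (!c->gateway_on_attached_net)
        return ENETUNREACH;          /* new gateway not directly connected */
    if (!c->done_flag_set && c->route_found &&
        (!c->src_is_current_gateway || !c->same_interface))
        return EINVAL;               /* old or wrong redirect */
    if (c->gateway_is_local_addr)
        return EHOSTUNREACH;         /* redirect to ourselves */
    return 0;
}
```

Keeping the lookups separate from the decision makes each rejection reason easy to exercise in isolation.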
178-185
If a route to the destination was not found, or if the routing table entry that was located is the default route, a new entry is created for the destination. As the comment indicates, a host with access to multiple routers can use this feature to learn of the correct router when the default is not correct. The test for finding the default route is whether the routing table entry has an associated mask and if the length field of the mask is less than 2, since the mask for the default route is rn_zeros
(Figure 18.35).
Figure 19.15 shows the second half of this function.
Table 19.15. rtredirect
function: second half.
------------------------------------------------------------------------- route.c
186     /*
187      * Don't listen to the redirect if it's
188      * for a route to an interface.
189      */
190     if (rt->rt_flags & RTF_GATEWAY) {
191         if (((rt->rt_flags & RTF_HOST) == 0) && (flags & RTF_HOST)) {
192             /*
193              * Changing from route to net => route to host.
194              * Create new route, rather than smashing route to net.
195              */
196           create:
197             flags |= RTF_GATEWAY | RTF_DYNAMIC;
198             error = rtrequest((int) RTM_ADD, dst, gateway,
199                               netmask, flags,
200                               (struct rtentry **) 0);
201             stat = &rtstat.rts_dynamic;
202         } else {
203             /*
204              * Smash the current notion of the gateway to
205              * this destination. Should check about netmask!!!
206              */
207             rt->rt_flags |= RTF_MODIFIED;
208             flags |= RTF_MODIFIED;
209             stat = &rtstat.rts_newgateway;
210             rt_setgate(rt, rt_key(rt), gateway);
211         }
212     } else
213         error = EHOSTUNREACH;
214   done:
215     if (rt) {
216         if (rtp && !error)
217             *rtp = rt;
218         else
219             rtfree(rt);
220     }
221   out:
222     if (error)
223         rtstat.rts_badredirect++;
224     else if (stat != NULL)
225         (*stat)++;
226     bzero((caddr_t) & info, sizeof(info));
227     info.rti_info[RTAX_DST] = dst;
228     info.rti_info[RTAX_GATEWAY] = gateway;
229     info.rti_info[RTAX_NETMASK] = netmask;
230     info.rti_info[RTAX_AUTHOR] = src;
231     rt_missmsg(RTM_REDIRECT, &info, flags, error);
232 }
------------------------------------------------------------------------- route.c
186-195
If the current route to the destination is a network route and the redirect is a host redirect and not a network redirect, a new host route is created for the destination and the existing network route is left alone. We mentioned that the flags
argument always specifies RTF_HOST
since the Net/3 ICMP considers all received redirects as host redirects.
196-201
rtrequest
creates the new route, setting the RTF_GATEWAY
and RTF_DYNAMIC
flags. The netmask
argument is a null pointer, since the new route is a host route with an implied mask of all one bits. stat
points to a counter that is incremented later.
202-211
This code is executed when the current route to the destination is already a host route. A new entry is not created, but the existing entry is modified. The RTF_MODIFIED
flag is set and rt_setgate
changes the rt_gateway
field of the routing table entry to the new gateway address.
212-213
If the current route to the destination is a direct route (the RTF_GATEWAY
flag is not set), it is a redirect for a destination that is already directly connected. EHOSTUNREACH
is returned.
214-225
If a routing table entry was located, it is either returned (if rtp
is nonnull and there were no errors) or released by rtfree
. The appropriate statistic is incremented.
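The three outcomes of a validated redirect (create a host route, modify the existing gateway, or reject) can be summarized as a small decision function. This is an illustrative sketch only: the `redirect_action` enum and function are hypothetical names, and the two flag values are reproduced from what we believe `<net/route.h>` defines so that the example stands alone.

```c
/* Flag values assumed to match Net/3 <net/route.h>; the logic does not depend on them. */
#define RTF_GATEWAY 0x2
#define RTF_HOST    0x4

/* Hypothetical names for the three outcomes in Figures 19.14 and 19.15. */
enum redirect_action {
    REDIR_CREATE_HOST_ROUTE,  /* rtrequest(RTM_ADD, ...), counts rts_dynamic */
    REDIR_MODIFY_GATEWAY,     /* rt_setgate(...), counts rts_newgateway */
    REDIR_REJECT              /* EHOSTUNREACH, counts rts_badredirect */
};

enum redirect_action
redirect_action(int have_route, int route_is_default,
                int rt_flags, int redirect_flags)
{
    if (!have_route || route_is_default)
        return REDIR_CREATE_HOST_ROUTE;       /* the "goto create" on line 185 */
    if (!(rt_flags & RTF_GATEWAY))
        return REDIR_REJECT;                  /* destination is directly connected */
    if (!(rt_flags & RTF_HOST) && (redirect_flags & RTF_HOST))
        return REDIR_CREATE_HOST_ROUTE;       /* network route, host redirect */
    return REDIR_MODIFY_GATEWAY;              /* existing indirect host route */
}
```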
Routing messages consist of a fixed-length header followed by up to eight socket address structures. The fixed-length header is one of the following three structures:
rt_msghdr
if_msghdr
ifa_msghdr
Figure 18.11 provided an overview of which functions generated the different messages and Figure 18.9 showed which structure is used by each message type. The first three members of the three structures have the same data type and meaning: the message length, version, and type. This allows the receiver of the message to decode the message. Also, each structure has a member that encodes which of the eight potential socket address structures follow the structure (a bitmask): the rtm_addrs
, ifm_addrs
, and ifam_addrs
members.
Figure 19.16 shows the most common of the structures, rt_msghdr
. The RTM_IFINFO
message uses an if_msghdr
structure, shown in Figure 19.17. The RTM_NEWADDR
and RTM_DELADDR
messages use an ifa_msghdr
structure, shown in Figure 19.18.
Table 19.16. rt_msghdr
structure.
------------------------------------------------------------------------ route.h
139 struct rt_msghdr {
140     u_short rtm_msglen;         /* to skip over non-understood messages */
141     u_char rtm_version;         /* future binary compatibility */
142     u_char rtm_type;            /* message type */
143     u_short rtm_index;          /* index for associated ifp */
144     int rtm_flags;              /* flags, incl. kern & message, e.g. DONE */
145     int rtm_addrs;              /* bitmask identifying sockaddrs in msg */
146     pid_t rtm_pid;              /* identify sender */
147     int rtm_seq;                /* for sender to identify action */
148     int rtm_errno;              /* why failed */
149     int rtm_use;                /* from rtentry */
150     u_long rtm_inits;           /* which metrics we are initializing */
151     struct rt_metrics rtm_rmx;  /* metrics themselves */
152 };
------------------------------------------------------------------------ route.h
Table 19.17. if_msghdr
structure.
--------------------------------------------------------------------------- if.h
235 struct if_msghdr {
236     u_short ifm_msglen;       /* to skip over non-understood messages */
237     u_char ifm_version;       /* future binary compatability */
238     u_char ifm_type;          /* message type */
239     int ifm_addrs;            /* like rtm_addrs */
240     int ifm_flags;            /* value of if_flags */
241     u_short ifm_index;        /* index for associated ifp */
242     struct if_data ifm_data;  /* statistics and other data about if */
243 };
--------------------------------------------------------------------------- if.h
Table 19.18. ifa_msghdr
structure.
--------------------------------------------------------------------------- if.h
248 struct ifa_msghdr {
249     u_short ifam_msglen;   /* to skip over non-understood messages */
250     u_char ifam_version;   /* future binary compatability */
251     u_char ifam_type;      /* message type */
252     int ifam_addrs;        /* like rtm_addrs */
253     int ifam_flags;        /* value of ifa_flags */
254     u_short ifam_index;    /* index for associated ifp */
255     int ifam_metric;       /* value of ifa_metric */
256 };
--------------------------------------------------------------------------- if.h
Note that the first three members across the three different structures have the same data types and meanings.
The three variables rtm_addrs
, ifm_addrs
, and ifam_addrs
are bitmasks defining which socket address structures follow the header. Figure 19.19 shows the constants used with these bitmasks.
Table 19.19. Constants used to refer to members of rti_info
array.
Bitmask constant | Bitmask value | Array index constant | Array index value | Name in rtsock.c | Description
---|---|---|---|---|---
RTA_DST | 0x01 | RTAX_DST | 0 | dst | destination socket address structure
RTA_GATEWAY | 0x02 | RTAX_GATEWAY | 1 | gate | gateway socket address structure
RTA_NETMASK | 0x04 | RTAX_NETMASK | 2 | netmask | netmask socket address structure
RTA_GENMASK | 0x08 | RTAX_GENMASK | 3 | genmask | cloning mask socket address structure
RTA_IFP | 0x10 | RTAX_IFP | 4 | ifpaddr | interface name socket address structure
RTA_IFA | 0x20 | RTAX_IFA | 5 | ifaaddr | interface address socket address structure
RTA_AUTHOR | 0x40 | RTAX_AUTHOR | 6 | | socket address structure for author of redirect
RTA_BRD | 0x80 | RTAX_BRD | 7 | brdaddr | broadcast or point-to-point destination address
 | | RTAX_MAX | 8 | | #elements in an rti_info[] array
The bitmask value is always the constant 1 left shifted by the number of bits specified by the array index. For example, 0x20
(RTA_IFA
) is 1 left shifted by five bits (RTAX_IFA
). We’ll see this fact used in the code.
The socket address structures that are present always occur in order of increasing array index, one right after the other. For example, if the bitmask is 0x87
, the first socket address structure contains the destination, followed by the gateway, followed by the network mask, followed by the broadcast address.
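The relationship between a bitmask and the order of the addresses that follow the header can be illustrated with a short routine. This is a sketch under stated assumptions: `rta_present` and `rtax_name` are hypothetical helpers, and the description strings are taken from Figure 19.19.

```c
#include <stddef.h>

#define RTAX_MAX 8   /* number of array indexes, per Figure 19.19 */

/* Descriptions indexed by array index (RTAX_xxx). */
static const char *const rtax_names[RTAX_MAX] = {
    "destination", "gateway", "netmask", "cloning mask",
    "interface name", "interface address", "author of redirect",
    "broadcast or point-to-point destination"
};

/*
 * Stores in order[] the array indexes of the addresses present in addrs,
 * in the order they appear in the message, and returns how many there are.
 */
int rta_present(int addrs, int order[RTAX_MAX])
{
    int n = 0;
    for (int i = 0; i < RTAX_MAX; i++)
        if (addrs & (1 << i))      /* bitmask value is 1 << array index */
            order[n++] = i;
    return n;
}

/* Returns the description for an array index, or NULL if out of range. */
const char *rtax_name(int index)
{
    return (index >= 0 && index < RTAX_MAX) ? rtax_names[index] : NULL;
}
```

For the bitmask 0x87 this yields the indexes 0, 1, 2, and 7: destination, gateway, netmask, and broadcast address.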
The array indexes in Figure 19.19 are used within the kernel to refer to its rt_addrinfo
structure, shown in Figure 19.20. This structure holds the same bitmask that we described, indicating which addresses are present, and pointers to those socket address structures.
Table 19.20. rt_addrinfo
structure: encode which addresses are present and pointers to them.
------------------------------------------------------------------------- route.h
199 struct rt_addrinfo {
200     int rti_addrs;                        /* bitmask, same as rtm_addrs */
201     struct sockaddr *rti_info[RTAX_MAX];
202 };
------------------------------------------------------------------------- route.h
For example, if the RTA_GATEWAY
bit is set in the rti_addrs
member, then the member rti_info
[
RTAX_GATEWAY
]
is a pointer to a socket address structure containing the gateway’s address. In the case of the Internet protocols, the socket address structure is a sockaddr_in
containing the gateway’s IP address.
The fifth column in Figure 19.19 shows the names used for the corresponding members of an rti_info
array throughout the file rtsock.c
. These definitions look like
#define dst info.rti_info[RTAX_DST]
We’ll encounter these names in many of the source files later in this chapter. The RTAX_AUTHOR
element is not assigned a name because it is never passed from a process to the kernel.
We’ve already encountered this rt_addrinfo
structure twice: in rtalloc1
(Figure 19.2) and rtredirect
(Figure 19.14). Figure 19.21 shows the format of this structure when built by rtalloc1
, after a routing table lookup fails, when rt_missmsg
is called.
All the unused pointers are null because the structure is set to 0 before it is used. Also note that the rti_addrs
member is not initialized with the appropriate bitmask because when this structure is used within the kernel, a null pointer in the rti_info
array indicates a nonexistent socket address structure. The bitmask is needed only for messages between a process and the kernel.
Figure 19.22 shows the format of the structure built by rtredirect
when it calls rt_missmsg
.
The following sections show how these structures are placed into the messages sent to a process.
Figure 19.23 shows the route_cb
structure, which we’ll encounter in the following sections. It contains four counters: one each for the IP, XNS, and OSI protocols, and an “any” counter. Each counter is the number of routing sockets currently in existence for that domain.
Table 19.23. route_cb
structure: counters of routing socket listeners.
-------------------------------------------------------------------------- route.h
203 struct route_cb {
204     int ip_count;   /* IP */
205     int ns_count;   /* XNS */
206     int iso_count;  /* ISO */
207     int any_count;  /* sum of above three counters */
208 };
-------------------------------------------------------------------------- route.h
203-208
By keeping track of the number of routing socket listeners, the kernel avoids building a routing message and calling raw_input
to send the message when there aren’t any processes waiting for a message.
The function rt_missmsg
, shown in Figure 19.24, takes the structures shown in Figures 19.21 and 19.22, calls rt_msg1
to build a corresponding variable-length message for a process in an mbuf chain, and then calls raw_input
to pass the mbuf chain to all appropriate routing sockets.
Table 19.24. rt_missmsg
function.
------------------------------------------------------------------------- rtsock.c
516 void
517 rt_missmsg(type, rtinfo, flags, error)
518 int type, flags, error;
519 struct rt_addrinfo *rtinfo;
520 {
521     struct rt_msghdr *rtm;
522     struct mbuf *m;
523     struct sockaddr *sa = rtinfo->rti_info[RTAX_DST];
524     if (route_cb.any_count == 0)
525         return;
526     m = rt_msg1(type, rtinfo);
527     if (m == 0)
528         return;
529     rtm = mtod(m, struct rt_msghdr *);
530     rtm->rtm_flags = RTF_DONE | flags;
531     rtm->rtm_errno = error;
532     rtm->rtm_addrs = rtinfo->rti_addrs;
533     route_proto.sp_protocol = sa ? sa->sa_family : 0;
534     raw_input(m, &route_proto, &route_src, &route_dst);
535 }
------------------------------------------------------------------------- rtsock.c
516-525
If there aren’t any routing socket listeners, the function returns immediately.
526-528
rt_msg1
(Section 19.12) builds the appropriate message in an mbuf chain, and returns the pointer to the chain. Figure 19.25 shows an example of the resulting mbuf chain, using the rt_addrinfo
structure from Figure 19.22. The information needs to be in an mbuf chain because raw_input
calls sbappendaddr
to append the mbuf chain to a socket’s receive buffer.
529-532
The two members rtm_flags
and rtm_errno
are set to the values passed by the caller. The rtm_addrs
member is copied from the rti_addrs
value. We showed this value as 0 in Figures 19.21 and 19.22, but rt_msg1
calculates and stores the appropriate bitmask, based on which pointers in the rti_info
array are nonnull.
533-534
The final three arguments to raw_input
specify the protocol, source, and destination of the routing message. These three structures are initialized as
struct sockaddr route_dst = { 2, PF_ROUTE, };
struct sockaddr route_src = { 2, PF_ROUTE, };
struct sockproto route_proto = { PF_ROUTE, };
The first two structures are never modified by the kernel. The sockproto
structure, shown in Figure 19.26, is one we haven’t seen before.
Table 19.26. sockproto
structure.
------------------------------------------------------------------------- socket.h
128 struct sockproto {
129     u_short sp_family;    /* address family */
130     u_short sp_protocol;  /* protocol */
131 };
------------------------------------------------------------------------- socket.h
The family is never changed from its initial value of PF_ROUTE
, but the protocol is set each time raw_input
is called. When a process creates a routing socket by calling socket
, the third argument (the protocol) specifies the protocol in which the process is interested. The caller of raw_input
sets the sp_protocol
member of the route_proto
structure to the protocol of the routing message. In the case of rt_missmsg
, it is set to the sa_family
of the destination socket address structure (if specified by the caller), which in Figures 19.21 and 19.22 would be AF_INET
.
In Figure 4.30 we saw that if_up
and if_down
both call rt_ifmsg
, shown in Figure 19.27, to generate a routing socket message when an interface goes up or down.
Table 19.27. rt_ifmsg
function.
------------------------------------------------------------------------- rtsock.c
540 void
541 rt_ifmsg(ifp)
542 struct ifnet *ifp;
543 {
544     struct if_msghdr *ifm;
545     struct mbuf *m;
546     struct rt_addrinfo info;
547     if (route_cb.any_count == 0)
548         return;
549     bzero((caddr_t) & info, sizeof(info));
550     m = rt_msg1(RTM_IFINFO, &info);
551     if (m == 0)
552         return;
553     ifm = mtod(m, struct if_msghdr *);
554     ifm->ifm_index = ifp->if_index;
555     ifm->ifm_flags = ifp->if_flags;
556     ifm->ifm_data = ifp->if_data;   /* structure assignment */
557     ifm->ifm_addrs = 0;
558     route_proto.sp_protocol = 0;
559     raw_input(m, &route_proto, &route_src, &route_dst);
560 }
------------------------------------------------------------------------- rtsock.c
547-548
If there aren’t any routing socket listeners, the function returns immediately.
549-552
An rt_addrinfo
structure is set to 0 and rt_msg1
builds an appropriate message in an mbuf chain. Notice that all socket address pointers in the rt_addrinfo
structure are null, so only the fixed-length if_msghdr
structure becomes the routing message; there are no addresses.
553-557
The interface’s index, flags, and if_data
structure are copied into the message in the mbuf and the ifm_addrs
bitmask is set to 0.
In Figure 19.13 we saw that rtinit
calls rt_newaddrmsg
with a command of RTM_ADD
or RTM_DELETE
when an interface has an address added or deleted. Figure 19.28 shows the first half of the function.
Table 19.28. rt_newaddrmsg
function: first half: create ifa_msghdr
message.
------------------------------------------------------------------------- rtsock.c
569 void
570 rt_newaddrmsg(cmd, ifa, error, rt)
571 int cmd, error;
572 struct ifaddr *ifa;
573 struct rtentry *rt;
574 {
575     struct rt_addrinfo info;
576     struct sockaddr *sa;
577     int pass;
578     struct mbuf *m;
579     struct ifnet *ifp = ifa->ifa_ifp;
580     if (route_cb.any_count == 0)
581         return;
582     for (pass = 1; pass < 3; pass++) {
583         bzero((caddr_t) & info, sizeof(info));
584         if ((cmd == RTM_ADD && pass == 1) ||
585             (cmd == RTM_DELETE && pass == 2)) {
586             struct ifa_msghdr *ifam;
587             int ncmd = cmd == RTM_ADD ? RTM_NEWADDR : RTM_DELADDR;
588             ifaaddr = sa = ifa->ifa_addr;
589             ifpaddr = ifp->if_addrlist->ifa_addr;
590             netmask = ifa->ifa_netmask;
591             brdaddr = ifa->ifa_dstaddr;
592             if ((m = rt_msg1(ncmd, &info)) == NULL)
593                 continue;
594             ifam = mtod(m, struct ifa_msghdr *);
595             ifam->ifam_index = ifp->if_index;
596             ifam->ifam_metric = ifa->ifa_metric;
597             ifam->ifam_flags = ifa->ifa_flags;
598             ifam->ifam_addrs = info.rti_addrs;
599         }
------------------------------------------------------------------------- rtsock.c
580-581
If there aren’t any routing socket listeners, the function returns immediately.
582
The for
loop iterates twice because two messages are generated. If the command is RTM_ADD
, the first message is of type RTM_NEWADDR
and the second message is of type RTM_ADD
. If the command is RTM_DELETE
, the first message is of type RTM_DELETE
and the second message is of type RTM_DELADDR
. The RTM_NEWADDR
and RTM_DELADDR
messages are built from an ifa_msghdr
structure, while the RTM_ADD
and RTM_DELETE
messages are built from an rt_msghdr
structure. The function generates two messages because one describes the interface address that was added or deleted and the other describes the corresponding routing table entry.
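The ordering of the two messages can be captured in a small helper. This is only a sketch: `newaddrmsg_type` is a hypothetical name, and the four message type values are what we believe `<net/route.h>` defines; they are repeated here so that the example is self-contained.

```c
/* Message types, assumed to match Net/3 <net/route.h>. */
#define RTM_ADD     0x1
#define RTM_DELETE  0x2
#define RTM_NEWADDR 0xc
#define RTM_DELADDR 0xd

/*
 * Returns the type of the message generated on the given pass (1 or 2)
 * of the loop in rt_newaddrmsg. cmd is assumed to be RTM_ADD or RTM_DELETE.
 */
int newaddrmsg_type(int cmd, int pass)
{
    if ((cmd == RTM_ADD && pass == 1) || (cmd == RTM_DELETE && pass == 2))
        return cmd == RTM_ADD ? RTM_NEWADDR : RTM_DELADDR; /* ifa_msghdr */
    return cmd;                                            /* rt_msghdr */
}
```

On an add, the address message precedes the route message; on a delete, the route message precedes the address message.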
583
An rt_addrinfo
structure is set to 0.
588-591
Pointers to four socket address structures containing information about the interface address that has been added or deleted are stored in the rti_info
array. Recall from Figure 19.19 that ifaaddr, ifpaddr, netmask
, and brdaddr
reference elements in the rti_info
array of the rt_addrinfo structure named info. rt_msg1
builds the appropriate message in an mbuf chain. Notice that sa
is set to point to the ifa_addr
structure, and we’ll see at the end of the function that the family of this socket address structure becomes the protocol of the routing message.
Remaining members of the ifa_msghdr
structure are filled in with the interface’s index, metric, and flags, along with the bitmask set by rt_msg1
.
Figure 19.29 shows the second half of rt_newaddrmsg
, which creates an rt_msghdr
message with information about the routing table entry that was added or deleted.
Table 19.29. rt_newaddrmsg
function: second half, create rt_msghdr
message.
------------------------------------------------------------------------- rtsock.c
600         if ((cmd == RTM_ADD && pass == 2) ||
601             (cmd == RTM_DELETE && pass == 1)) {
602             struct rt_msghdr *rtm;
603             if (rt == 0)
604                 continue;
605             netmask = rt_mask(rt);
606             dst = sa = rt_key(rt);
607             gate = rt->rt_gateway;
608             if ((m = rt_msg1(cmd, &info)) == NULL)
609                 continue;
610             rtm = mtod(m, struct rt_msghdr *);
611             rtm->rtm_index = ifp->if_index;
612             rtm->rtm_flags |= rt->rt_flags;
613             rtm->rtm_errno = error;
614             rtm->rtm_addrs = info.rti_addrs;
615         }
616         route_proto.sp_protocol = sa ? sa->sa_family : 0;
617         raw_input(m, &route_proto, &route_src, &route_dst);
618     }
619 }
------------------------------------------------------------------------- rtsock.c
600-609
Pointers to three socket address structures are stored in the rti_info
array: the rt_mask
, rt_key
, and rt_gateway
structures. sa
is set to point to the destination address, and its family becomes the protocol of the routing message. rt_msg1
builds the appropriate message in an mbuf chain.
Additional fields in the rt_msghdr
structure are filled in, including the bitmask set by rt_msg1
.
The functions described in the previous three sections each called rt_msg1
to build the appropriate routing message. In Figure 19.25 we showed the mbuf chain that was built by rt_msg1
from the rt_msghdr
and rt_addrinfo
structures in Figure 19.22. Figure 19.30 shows the function.
Table 19.30. rt_msg1
function: obtain and initialize mbuf.
------------------------------------------------------------------------- rtsock.c
399 static struct mbuf *
400 rt_msg1(type, rtinfo)
401 int type;
402 struct rt_addrinfo *rtinfo;
403 {
404     struct rt_msghdr *rtm;
405     struct mbuf *m;
406     int i;
407     struct sockaddr *sa;
408     int len, dlen;
409     m = m_gethdr(M_DONTWAIT, MT_DATA);
410     if (m == 0)
411         return (m);
412     switch (type) {
413     case RTM_DELADDR:
414     case RTM_NEWADDR:
415         len = sizeof(struct ifa_msghdr);
416         break;
417     case RTM_IFINFO:
418         len = sizeof(struct if_msghdr);
419         break;
420     default:
421         len = sizeof(struct rt_msghdr);
422     }
423     if (len > MHLEN)
424         panic("rt_msg1");
425     m->m_pkthdr.len = m->m_len = len;
426     m->m_pkthdr.rcvif = 0;
427     rtm = mtod(m, struct rt_msghdr *);
428     bzero((caddr_t) rtm, len);
429     for (i = 0; i < RTAX_MAX; i++) {
430         if ((sa = rtinfo->rti_info[i]) == NULL)
431             continue;
432         rtinfo->rti_addrs |= (1 << i);
433         dlen = ROUNDUP(sa->sa_len);
434         m_copyback(m, len, dlen, (caddr_t) sa);
435         len += dlen;
436     }
437     if (m->m_pkthdr.len != len) {
438         m_freem(m);
439         return (NULL);
440     }
441     rtm->rtm_msglen = len;
442     rtm->rtm_version = RTM_VERSION;
443     rtm->rtm_type = type;
444     return (m);
445 }
------------------------------------------------------------------------- rtsock.c
399-422
An mbuf with a packet header is obtained and the length of the fixed-size message is stored in len
. Two of the message types in Figure 18.9 use an ifa_msghdr
structure, one uses an if_msghdr
structure, and the remaining nine use an rt_msghdr
structure.
423-424
The size of the fixed-length structure must fit entirely within the data portion of the packet header mbuf, because the mbuf pointer is cast to a structure pointer using mtod
and the structure is then referenced through the pointer. The largest of the three structures is if_msghdr
, which at 84 bytes is less than MHLEN
(100).
425-428
The two fields in the packet header are initialized and the structure in the mbuf is set to 0.
429-436
The caller passes a pointer to an rt_addrinfo
structure. The socket address structures corresponding to all the nonnull pointers in the rti_info
are copied into the mbuf by m_copyback
. The value 1 is left shifted by the RTAX_
xxx index to generate the corresponding RTA_
xxx bitmask (Figure 19.19), and each individual bitmask is logically ORed into the rti_addrs
member, which the caller can store on return into the corresponding member of the message structure. The ROUNDUP
macro rounds the size of each socket address structure up to the next multiple of 4 bytes.
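The loop's bookkeeping (bitmask, rounding, and running length) can be reproduced outside the kernel. The following is a sketch under stated assumptions: `sa_stub`, `roundup4`, and `msg_layout` are hypothetical names, the rounding is fixed at 4 bytes to match the description above, and a zero length is assumed to round up to 4.

```c
#include <stddef.h>

#define RTAX_MAX 8

/* Hypothetical stand-in for struct sockaddr: only the length byte matters here. */
struct sa_stub {
    unsigned char sa_len;
    unsigned char sa_family;
};

/* Round n up to the next multiple of 4; 0 is treated as 4 (an assumption). */
size_t roundup4(size_t n)
{
    return n > 0 ? (1 + ((n - 1) | 3)) : 4;
}

/*
 * Computes the total message length for a header of hdrlen bytes followed
 * by every nonnull address in info[], and stores the bitmask in *addrs.
 */
size_t msg_layout(struct sa_stub *info[RTAX_MAX], size_t hdrlen, int *addrs)
{
    size_t len = hdrlen;

    *addrs = 0;
    for (int i = 0; i < RTAX_MAX; i++) {
        if (info[i] == NULL)
            continue;                       /* null pointer: address absent */
        *addrs |= (1 << i);                 /* RTA_xxx = 1 << RTAX_xxx */
        len += roundup4(info[i]->sa_len);   /* each address is padded */
    }
    return len;
}
```

A 5-byte netmask therefore occupies 8 bytes in the message, while a 16-byte address occupies exactly 16.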
437-440
If, when the loop terminates, the length in the mbuf packet header does not equal len
, the function m_copyback
wasn’t able to obtain a required mbuf.
441-445
The length, version, and message type are stored in the first three members of the message structure. Again, all three xxx_msghdr
structures start with the same three members, so this code works with all three structures even though the pointer rtm
is a pointer to an rt_msghdr
structure.
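The shared three-member prefix is what makes it possible to fill in any of the headers through a single type. The sketch below uses simplified, hypothetical structures (`rt_hdr`, `if_hdr`, `ifa_hdr`) whose leading members follow Figures 19.16 through 19.18 and whose remaining members are truncated. Where the kernel uses a plain pointer cast, this sketch uses a union, which ISO C explicitly permits for inspecting a common initial sequence.

```c
#include <stddef.h>

/* Simplified stand-ins: only the first three members matter for this sketch. */
struct rt_hdr {
    unsigned short msglen; unsigned char version; unsigned char type;
    unsigned short index; int flags; int addrs;
};
struct if_hdr {
    unsigned short msglen; unsigned char version; unsigned char type;
    int addrs; int flags; unsigned short index;
};
struct ifa_hdr {
    unsigned short msglen; unsigned char version; unsigned char type;
    int addrs; int flags; unsigned short index; int metric;
};

union any_hdr {
    struct rt_hdr  rt;
    struct if_hdr  ifm;
    struct ifa_hdr ifam;
};

/* Fills the common prefix through the rt_hdr view, whatever the message type. */
void fill_prefix(union any_hdr *h, unsigned short len,
                 unsigned char version, unsigned char type)
{
    h->rt.msglen = len;
    h->rt.version = version;
    h->rt.type = type;
}
```

A receiver can read the length, version, and type through any of the three views before deciding which structure the message really contains.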
rt_msg1
constructs a routing message in an mbuf chain, and the three functions that called it then called raw_input
to append the mbuf chain to the receive buffers of one or more sockets. rt_msg2
is different: it builds a routing message in a memory buffer, not an mbuf chain, and has as an argument a pointer to a walkarg
structure that is used when rt_msg2
is called by the two functions that handle the sysctl
system call for the routing domain. rt_msg2
is called in two different scenarios:
from route_output
to process the RTM_GET
command, and
from sysctl_dumpentry
and sysctl_iflist
to process a sysctl
system call.
Before looking at rt_msg2
, Figure 19.31 shows the walkarg
structure that is used in scenario 2. We go through all these members as we encounter them.
Table 19.31. walkarg
structure: used with the sysctl
system call in the routing domain.
----------------------------------------------------------------------- rtsock.c
41 struct walkarg {
42     int w_op;         /* NET_RT_xxx */
43     int w_arg;        /* RTF_xxx for FLAGS, if_index for IFLIST */
44     int w_given;      /* size of process' buffer */
45     int w_needed;     /* #bytes actually needed (at end) */
46     int w_tmemsize;   /* size of buffer pointed to by w_tmem */
47     caddr_t w_where;  /* ptr to process' buffer (maybe null) */
48     caddr_t w_tmem;   /* ptr to our malloc'ed buffer */
49 };
----------------------------------------------------------------------- rtsock.c
Figure 19.32 shows the first half of the rt_msg2
function. This portion is similar to the first half of rt_msg1
.
Table 19.32. rt_msg2
function: copy socket address structures.
------------------------------------------------------------------------- rtsock.c
446 static int
447 rt_msg2(type, rtinfo, cp, w)
448 int type;
449 struct rt_addrinfo *rtinfo;
450 caddr_t cp;
451 struct walkarg *w;
452 {
453     int i;
454     int len, dlen, second_time = 0;
455     caddr_t cp0;
456     rtinfo->rti_addrs = 0;
457   again:
458     switch (type) {
459     case RTM_DELADDR:
460     case RTM_NEWADDR:
461         len = sizeof(struct ifa_msghdr);
462         break;
463     case RTM_IFINFO:
464         len = sizeof(struct if_msghdr);
465         break;
466     default:
467         len = sizeof(struct rt_msghdr);
468     }
469     if (cp0 = cp)
470         cp += len;
471     for (i = 0; i < RTAX_MAX; i++) {
472         struct sockaddr *sa;
473         if ((sa = rtinfo->rti_info[i]) == 0)
474             continue;
475         rtinfo->rti_addrs |= (1 << i);
476         dlen = ROUNDUP(sa->sa_len);
477         if (cp) {
478             bcopy((caddr_t) sa, cp, (unsigned) dlen);
479             cp += dlen;
480         }
481         len += dlen;
482     }
------------------------------------------------------------------------- rtsock.c
446-455
Since this function stores the resulting message in a memory buffer, the caller specifies the start of that buffer in the cp
argument. It is the caller’s responsibility to ensure that the buffer is large enough for the message that is generated. To help the caller determine this size, if the cp
argument is null, rt_msg2
doesn’t store anything but processes the input and returns the total number of bytes required to hold the result. We’ll see that route_output
uses this feature and calls this function twice: first to determine the size and then to store the result, after allocating a buffer of the correct size. When rt_msg2
is called by route_output
, the final argument is null. This final argument is nonnull when called as part of the sysctl
system call processing.
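The "call once to size, call again to fill" convention is a common one and can be shown with a small user-level analogue. This is a sketch, not the kernel interface: `build_message` and `build_message_alloc` are hypothetical names, and the message here is simply a sequence of null-terminated strings rather than socket address structures.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Follows the rt_msg2 convention: if buf is null nothing is stored,
 * but the number of bytes the result would occupy is still returned.
 */
size_t build_message(const char *const parts[], size_t nparts, char *buf)
{
    size_t len = 0;

    for (size_t i = 0; i < nparts; i++) {
        size_t n = strlen(parts[i]) + 1;    /* include the terminating null */
        if (buf != NULL)
            memcpy(buf + len, parts[i], n);
        len += n;
    }
    return len;
}

/* Caller side: determine the size, allocate, then build for real. */
char *build_message_alloc(const char *const parts[], size_t nparts, size_t *lenp)
{
    size_t need = build_message(parts, nparts, NULL);  /* first call: size only */
    char *buf = malloc(need);

    if (buf == NULL)
        return NULL;
    *lenp = build_message(parts, nparts, buf);         /* second call: store */
    return buf;
}
```

The caller owns the returned buffer and must release it with `free`.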
458-470
The size of the fixed-length message structure is set based on the message type. If the cp
pointer is nonnull, it is incremented by this size.
471-482
The for
loop goes through the rti_info
array, and for each element that is a nonnull pointer it sets the appropriate bit in the rti_addrs
bitmask, copies the socket address structure (if cp
is nonnull), and updates the length.
Figure 19.33 shows the second half of rt_msg2
, most of which handles the optional walkarg
structure.
Table 19.33. rt_msg2
function: handle optional walkarg
argument.
------------------------------------------------------------------------- rtsock.c
483     if (cp == 0 && w != NULL && !second_time) {
484         struct walkarg *rw = w;
485         rw->w_needed += len;
486         if (rw->w_needed <= 0 && rw->w_where) {
487             if (rw->w_tmemsize < len) {
488                 if (rw->w_tmem)
489                     free(rw->w_tmem, M_RTABLE);
490                 if (rw->w_tmem = (caddr_t)
491                     malloc(len, M_RTABLE, M_NOWAIT))
492                     rw->w_tmemsize = len;
493             }
494             if (rw->w_tmem) {
495                 cp = rw->w_tmem;
496                 second_time = 1;
497                 goto again;
498             } else
499                 rw->w_where = 0;
500         }
501     }
502     if (cp) {
503         struct rt_msghdr *rtm = (struct rt_msghdr *) cp0;
504         rtm->rtm_version = RTM_VERSION;
505         rtm->rtm_type = type;
506         rtm->rtm_msglen = len;
507     }
508     return (len);
509 }
------------------------------------------------------------------------- rtsock.c
483-484
This if
statement is true only when a pointer to a walkarg
structure was passed and this is the first loop through the function. The variable second_time
was initialized to 0 but can be set to 1 within this if
statement, and a jump made back to the label again
in Figure 19.32. The test for cp
being a null pointer is superfluous since whenever the w
pointer is nonnull, the cp
pointer is null, and vice versa.
485-486
w_needed
is incremented by the size of the message. This variable is initialized to 0 minus the size of the user’s buffer to the sysctl
function. For example, if the buffer size is 500 bytes, w_needed
is initialized to -500. As long as it remains negative or 0, there is room in the buffer. w_where
is a pointer to the buffer in the calling process. It is null if the process doesn’t want the result: the process just wants sysctl
to return the size of the result, so the process can allocate a buffer and call sysctl
again. rt_msg2
doesn’t copy the data back to the process (that is up to the caller), but if the w_where pointer is null, there’s no need for rt_msg2
to malloc
a buffer to hold the result and loop back through the function again, storing the result in this buffer. There are really five different scenarios that this function handles, summarized in Figure 19.34.
Table 19.34. Summary of different scenarios for rt_msg2
.
called from | cp | w | w.w_where | second_time | Description
---|---|---|---|---|---
route_output | null | null | | | wants return length
route_output | nonnull | null | | | wants result
sysctl_rtable | null | nonnull | null | 0 | process wants return length
sysctl_rtable | null | nonnull | nonnull | 0 | first time around to calculate length
sysctl_rtable | nonnull | nonnull | nonnull | 1 | second time around to store result
487-493
w_tmemsize
is the size of the buffer pointed to by w_tmem
. It is initialized to 0 by sysctl_rtable
, so the first time rt_msg2
is called for a given sysctl
request, the buffer must be allocated. Also, if the size of the result increases, the existing buffer must be released and a new (larger) buffer allocated.
494-499
If w_tmem
is nonnull, a buffer already exists or one was just allocated. cp
is set to point to this buffer, second_time
is set to 1, and a jump is made to again
. The if
statement at the beginning of this figure won’t be true during this second pass, since second_time
is now 1. If w_tmem
is null, the call to malloc
failed, so the pointer to the buffer in the process is set to null, preventing anything from being returned.
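The arithmetic behind w_needed is easy to check in isolation. In this sketch the `walk_sketch` structure and the three functions are hypothetical; `walk_total` additionally assumes that the caller adds the buffer size back once all messages have been processed, as the "#bytes actually needed (at end)" comment in Figure 19.31 suggests.

```c
/* Hypothetical reduction of walkarg to the two members used for accounting. */
struct walk_sketch {
    int w_given;    /* size of the process's buffer */
    int w_needed;   /* starts at -w_given, grows by each message length */
};

void walk_init(struct walk_sketch *w, int bufsize)
{
    w->w_given = bufsize;
    w->w_needed = 0 - bufsize;
}

/* Accounts for one message; returns 1 if it still fits in the buffer. */
int walk_account(struct walk_sketch *w, int len)
{
    w->w_needed += len;
    return w->w_needed <= 0;    /* same test as line 486 */
}

/* Total bytes required for all messages seen so far (assumed final step). */
int walk_total(const struct walk_sketch *w)
{
    return w->w_needed + w->w_given;
}
```

With a 500-byte buffer, messages of 200 and 300 bytes both fit; a further 100-byte message does not, and the total required becomes 600 bytes.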
The sysctl_rtable function handles the sysctl
system call on a routing socket. It is called by net_sysctl
as shown in Figure 18.11.
Before going through the source code, Figure 19.35 shows the typical use of this system call with respect to the routing table. This example is from the arp
program.
Table 19.35. Example of sysctl with routing table.
-------------------------------------------------------------------------
int     mib[6];
size_t  needed;
char   *buf, *lim, *next;
struct rt_msghdr *rtm;

mib[0] = CTL_NET;
mib[1] = PF_ROUTE;
mib[2] = 0;
mib[3] = AF_INET;       /* address family; can be 0 */
mib[4] = NET_RT_FLAGS;  /* operation */
mib[5] = RTF_LLINFO;    /* flags; can be 0 */
if (sysctl(mib, 6, NULL, &needed, NULL, 0) < 0)
    quit("sysctl error, estimate");
if ( (buf = malloc(needed)) == NULL)
    quit("malloc");
if (sysctl(mib, 6, buf, &needed, NULL, 0) < 0)
    quit("sysctl error, retrieval");
lim = buf + needed;
for (next = buf; next < lim; next += rtm->rtm_msglen) {
    rtm = (struct rt_msghdr *)next;
    ...     /* do whatever */
}
-------------------------------------------------------------------------
The first three elements in the mib array cause the kernel to call sysctl_rtable to process the remaining elements.
mib[4] specifies the operation. Three operations are supported.
NET_RT_DUMP: return the routing table corresponding to the address family specified by mib[3]. If the address family is 0, all routing tables are returned.
An RTM_GET routing message is returned for each routing table entry. Each message contains two, three, or four socket address structures: those addresses pointed to by rt_key, rt_gateway, rt_netmask, and rt_genmask. The final two pointers might be null.
NET_RT_FLAGS: the same as the previous command except mib[5] specifies an RTF_xxx flag (Figure 18.25), and only entries with this flag set are returned.
NET_RT_IFLIST: return information on all the configured interfaces. If the mib[5] value is nonzero, it specifies an interface index and only the interface with the corresponding if_index is returned. Otherwise all interfaces on the ifnet linked list are returned.
For each interface one RTM_IFINFO message is returned, with information about the interface itself, followed by one RTM_NEWADDR message for each ifaddr structure on the interface’s if_addrlist linked list. If the mib[3] value is nonzero, RTM_NEWADDR messages are returned for only the addresses with an address family that matches the mib[3] value. Otherwise mib[3] is 0 and information on all addresses is returned.
This operation is intended to replace the SIOCGIFCONF ioctl (Figure 4.26).
One problem with this system call is that the amount of information returned can vary, depending on the number of routing table entries or the number of interfaces. Therefore the first call to sysctl typically specifies a null pointer as the third argument, which means: don’t return any data, just return the number of bytes of return information. As we see in Figure 19.35, the process then calls malloc, followed by sysctl to fetch the information. This second call to sysctl again returns the number of bytes through the fourth argument (which might have changed since the previous call), and this value provides the pointer lim that points just beyond the final byte of data that was returned. The process then steps through the routing messages in the buffer, using the rtm_msglen member to step to the next message.
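The stepping loop can be tried without a BSD kernel by walking a buffer of back-to-back messages by hand. The following is a minimal sketch under stated assumptions: struct msg_hdr is a hypothetical stand-in for struct rt_msghdr that keeps only the leading length, version, and type members, and count_msgs is an illustrative helper, not part of any system interface. Unlike the loop in Figure 19.35, it also rejects a length field that is too small or that runs past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rt_msghdr: only the members
 * needed to step from one message to the next. */
struct msg_hdr {
    unsigned short msglen;  /* length of this message, header included */
    unsigned char  version;
    unsigned char  type;
};

/*
 * Walk a buffer of back-to-back messages and count those of a given
 * type. 'len' plays the role of the 'needed' value returned by the
 * second sysctl call. Returns -1 if a message is malformed.
 */
int count_msgs(const char *buf, size_t len, unsigned char type)
{
    const char *next = buf, *lim = buf + len;
    int count = 0;

    assert(buf != NULL);
    while (next < lim) {
        struct msg_hdr hdr;

        if ((size_t)(lim - next) < sizeof(hdr))
            return -1;                      /* truncated header */
        memcpy(&hdr, next, sizeof(hdr));    /* avoid unaligned access */
        if (hdr.msglen < sizeof(hdr) || hdr.msglen > (size_t)(lim - next))
            return -1;                      /* bad length field */
        if (hdr.type == type)
            count++;
        next += hdr.msglen;
    }
    return count;
}
```

Checking the length field before advancing matters in practice: a zero length would otherwise leave the loop stuck on the same message.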
Figure 19.36 shows the values for these six mib variables that various Net/3 programs specify to access the routing table and interface list.
Table 19.36. Examples of programs that call sysctl to obtain routing table and interface list.
The first three programs fetch entries from the routing table and the last three fetch the interface list. The routed program supports only the Internet routing protocols, so it specifies a mib[3] value of AF_INET, while gated supports other protocols, so its value for mib[3] is 0.
Figure 19.37 shows the organization of the three sysctl_xxx functions that we cover in the following sections.
Figure 19.38 shows the sysctl_rtable function.
Table 19.38. sysctl_rtable function: process sysctl system call requests.
------------------------------------------------------------------------- rtsock.c
705 int
706 sysctl_rtable(name, namelen, where, given, new, newlen)
707 int    *name;
708 int     namelen;
709 caddr_t where;
710 size_t *given;
711 caddr_t *new;
712 size_t  newlen;
713 {
714     struct radix_node_head *rnh;
715     int     i, s, error = EINVAL;
716     u_char  af;
717     struct walkarg w;
718     if (new)
719         return (EPERM);
720     if (namelen != 3)
721         return (EINVAL);
722     af = name[0];
723     Bzero(&w, sizeof(w));
724     w.w_where = where;
725     w.w_given = *given;
726     w.w_needed = 0 - w.w_given;
727     w.w_op = name[1];
728     w.w_arg = name[2];
729     s = splnet();
730     switch (w.w_op) {
731     case NET_RT_DUMP:
732     case NET_RT_FLAGS:
733         for (i = 1; i <= AF_MAX; i++)
734             if ((rnh = rt_tables[i]) && (af == 0 || af == i) &&
735                 (error = rnh->rnh_walktree(rnh,
736                                            sysctl_dumpentry, &w)))
737                 break;
738         break;
739     case NET_RT_IFLIST:
740         error = sysctl_iflist(af, &w);
741     }
742     splx(s);
743     if (w.w_tmem)
744         free(w.w_tmem, M_RTABLE);
745     w.w_needed += w.w_given;
746     if (where) {
747         *given = w.w_where - where;
748         if (*given < w.w_needed)
749             return (ENOMEM);
750     } else {
751         *given = (11 * w.w_needed) / 10;
752     }
753     return (error);
754 }
------------------------------------------------------------------------- rtsock.c
705-719
The new argument is used when the process is calling sysctl to set the value of a variable, which isn’t supported with the routing tables. Therefore this argument must be a null pointer.
720-721
namelen must be 3 because at this point in the processing of the system call, three elements in the name array remain: name[0], the address family (what the process specifies as mib[3]); name[1], the operation (mib[4]); and name[2], the flags (mib[5]).
723-728
A walkarg structure (Figure 19.31) is set to 0 and the following members are initialized: w_where is the address in the calling process of the buffer for the results (this can be a null pointer, as we mentioned); w_given is the size of the buffer in bytes (this is meaningless on input if w_where is a null pointer, but it must be set on return to the amount of data that would have been returned); w_needed is set to the negative of the buffer size; w_op is the operation (the NET_RT_xxx value); and w_arg is the flags value.
731-738
The NET_RT_DUMP and NET_RT_FLAGS operations are handled the same way: a loop is made through all the routing tables (the rt_tables array), and if the routing table is in use and either the address family argument was 0 or the address family argument matches the family of this routing table, the rnh_walktree function is called to process the entire routing table. In Figure 18.17 we show that this function is normally rn_walktree. The second argument to this function is the address of another function that is called for each leaf of the routing tree (sysctl_dumpentry). The third argument is just a pointer to anything that rn_walktree passes to the sysctl_dumpentry function. Here it is a pointer to the walkarg structure that contains all the information about this sysctl call.
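The walk-with-callback arrangement can be illustrated with a small user-space sketch. Everything here is hypothetical: arrays of leaves stand in for the radix trees, NTABLES stands in for AF_MAX + 1, and walk_table plays the role of rn_walktree. One deliberate difference from sysctl_rtable is that this sketch returns 0 when no table matches, whereas the kernel function starts with error set to EINVAL.

```c
#include <assert.h>
#include <stddef.h>

#define NTABLES 4   /* hypothetical stand-in for AF_MAX + 1 */

struct leaf  { int value; };
struct table { struct leaf *leaves; size_t nleaves; };

typedef int (*walk_fn)(struct leaf *, void *);

/* Call f for every leaf; stop at the first nonzero return value. */
int walk_table(struct table *t, walk_fn f, void *arg)
{
    size_t i;
    int error;

    assert(t != NULL && f != NULL);
    for (i = 0; i < t->nleaves; i++)
        if ((error = f(&t->leaves[i], arg)) != 0)
            return error;
    return 0;
}

/* Visit every table in use whose index matches af (0 means all). */
int walk_all(struct table *tables[NTABLES], int af, walk_fn f, void *arg)
{
    int i, error = 0;

    for (i = 1; i < NTABLES; i++)
        if (tables[i] != NULL && (af == 0 || af == i) &&
            (error = walk_table(tables[i], f, arg)) != 0)
            break;
    return error;
}

/* Example callback: add each leaf's value into the long arg points to. */
int sum_leaf(struct leaf *l, void *arg)
{
    *(long *)arg += l->value;
    return 0;
}
```

The opaque third argument is what lets one generic walker serve many callers: the walker never looks inside it, and only the callback knows its real type.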
739-740
The NET_RT_IFLIST operation calls the function sysctl_iflist, which goes through all the ifnet structures.
743-744
If a buffer was allocated by rt_msg2 to contain a routing message, it is now released.
745
The size of each message was added to w_needed by rt_msg2. Since this variable was initialized to the negative of w_given, its value can now be expressed as
w_needed = 0 - w_given + totalbytes
where totalbytes is the sum of all the message lengths added by rt_msg2. By adding the value of w_given back into w_needed, we get
w_needed = 0 - w_given + totalbytes + w_given = totalbytes
the total number of bytes. Since the two values of w_given in this equation end up canceling each other, when the process specifies w_where as a null pointer it need not initialize the value of w_given. Indeed, we see in Figure 19.35 that the variable needed was not initialized.
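The cancellation is easy to check numerically. The sketch below mirrors the bookkeeping with a hypothetical helper, total_needed, and made-up message lengths; it is an illustration of the arithmetic only.

```c
#include <assert.h>
#include <stddef.h>

/* Mirror the bookkeeping: start at -given, add each message length,
 * then add given back. The result is the sum of the lengths. */
int total_needed(int given, const int *msglen, size_t n)
{
    int needed = 0 - given;
    size_t i;

    assert(msglen != NULL || n == 0);
    for (i = 0; i < n; i++)
        needed += msglen[i];
    return needed + given;
}
```

With three messages of 124, 124, and 156 bytes the function returns 404 whether given is 0, 77, or 1000.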
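746-749
If where is nonnull, the number of bytes stored in the buffer is returned through the given pointer. If this value is less than the total number of bytes required (w_needed), ENOMEM is returned because the return information has been truncated.
750-752
If where is null, the process just wants the size of the result. The value returned through given is the total number of bytes plus 10%, which allows for some growth in the tables between this call to sysctl and the next.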
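The final size report at the end of sysctl_rtable can be isolated into a small function for experimentation. This is a sketch with hypothetical names (report_size and its parameters); it reproduces only the comparison and the 10% estimate from lines 746-752.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Decide what to report back through *given.
 *   have_buf: nonzero if the process supplied a result buffer
 *   stored:   bytes actually copied into that buffer
 *   needed:   total bytes of all messages
 * Returns 0 or ENOMEM.
 */
int report_size(int have_buf, size_t stored, size_t needed, size_t *given)
{
    assert(given != NULL);
    if (have_buf) {
        *given = stored;
        if (*given < needed)
            return ENOMEM;              /* result was truncated */
    } else {
        *given = (11 * needed) / 10;    /* size estimate plus 10% */
    }
    return 0;
}
```

For a 400-byte result, a sizing call reports 440, a full retrieval reports 400, and a retrieval that managed to store only 248 bytes reports 248 along with ENOMEM.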
In the previous section we described how sysctl_dumpentry is called by rn_walktree, which in turn is called by sysctl_rtable. Figure 19.39 shows the function.
Table 19.39. sysctl_dumpentry function: process one routing table entry.
------------------------------------------------------------------------- rtsock.c
623 int
624 sysctl_dumpentry(rn, w)
625 struct radix_node *rn;
626 struct walkarg *w;
627 {
628     struct rtentry *rt = (struct rtentry *) rn;
629     int     error = 0, size;
630     struct rt_addrinfo info;
631     if (w->w_op == NET_RT_FLAGS && !(rt->rt_flags & w->w_arg))
632         return 0;
633     bzero((caddr_t) & info, sizeof(info));
634     dst = rt_key(rt);
635     gate = rt->rt_gateway;
636     netmask = rt_mask(rt);
637     genmask = rt->rt_genmask;
638     size = rt_msg2(RTM_GET, &info, 0, w);
639     if (w->w_where && w->w_tmem) {
640         struct rt_msghdr *rtm = (struct rt_msghdr *) w->w_tmem;
641         rtm->rtm_flags = rt->rt_flags;
642         rtm->rtm_use = rt->rt_use;
643         rtm->rtm_rmx = rt->rt_rmx;
644         rtm->rtm_index = rt->rt_ifp->if_index;
645         rtm->rtm_errno = rtm->rtm_pid = rtm->rtm_seq = 0;
646         rtm->rtm_addrs = info.rti_addrs;
647         if (error = copyout((caddr_t) rtm, w->w_where, size))
648             w->w_where = NULL;
649         else
650             w->w_where += size;
651     }
652     return (error);
653 }
------------------------------------------------------------------------- rtsock.c
623-630
Each time this function is called, its first argument points to a radix_node structure, which is also a pointer to a rtentry structure. The second argument points to the walkarg structure that was initialized by sysctl_rtable.
631-632
If the process specified a flag value (mib[5]), this entry is skipped if the rt_flags member doesn’t have the desired flag set. We see in Figure 19.36 that the arp program uses this to select only those entries with the RTF_LLINFO flag set, since these are the entries of interest to ARP.
633-638
The following four pointers in the rti_info array are copied from the routing table entry: dst, gate, netmask, and genmask. The first two are always nonnull, but the other two can be null. rt_msg2 forms an RTM_GET message.
639-651
If the process wants the message returned and a buffer was allocated by rt_msg2, the remainder of the routing message is formed in the buffer pointed to by w_tmem and copyout copies the message back to the process. If the copy was successful, w_where is incremented by the number of bytes copied. If the copy failed, w_where is set to null and the error is returned.
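The selection test on lines 631-632 can be tried in isolation. In this sketch entry_selected is a hypothetical helper, and the constants are defined locally so the code compiles on its own; the numeric values are illustrative and should be taken from the system headers in real code.

```c
#include <assert.h>

/* Illustrative values, defined locally so the sketch is self-contained. */
#define NET_RT_DUMP   1
#define NET_RT_FLAGS  2
#define RTF_UP        0x1
#define RTF_GATEWAY   0x2
#define RTF_HOST      0x4
#define RTF_LLINFO    0x400

/* Nonzero if an entry with rt_flags should be reported for (op, arg). */
int entry_selected(int op, int arg, int rt_flags)
{
    if (op == NET_RT_FLAGS && !(rt_flags & arg))
        return 0;   /* filter requested and none of its flags are set */
    return 1;
}
```

Two consequences follow from the expression: the flags argument is ignored for NET_RT_DUMP, and a NET_RT_FLAGS request with a flags value of 0 selects nothing, because the bitwise AND is then always 0.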
The sysctl_iflist function, shown in Figure 19.40, is called directly by sysctl_rtable to return the interface list to the process.
Table 19.40. sysctl_iflist function: return list of interfaces and their addresses.
------------------------------------------------------------------------- rtsock.c
654 int
655 sysctl_iflist(af, w)
656 int     af;
657 struct walkarg *w;
658 {
659     struct ifnet *ifp;
660     struct ifaddr *ifa;
661     struct rt_addrinfo info;
662     int     len, error = 0;
663     bzero((caddr_t) & info, sizeof(info));
664     for (ifp = ifnet; ifp; ifp = ifp->if_next) {
665         if (w->w_arg && w->w_arg != ifp->if_index)
666             continue;
667         ifa = ifp->if_addrlist;
668         ifpaddr = ifa->ifa_addr;
669         len = rt_msg2(RTM_IFINFO, &info, (caddr_t) 0, w);
670         ifpaddr = 0;
671         if (w->w_where && w->w_tmem) {
672             struct if_msghdr *ifm;
673             ifm = (struct if_msghdr *) w->w_tmem;
674             ifm->ifm_index = ifp->if_index;
675             ifm->ifm_flags = ifp->if_flags;
676             ifm->ifm_data = ifp->if_data;
677             ifm->ifm_addrs = info.rti_addrs;
678             if (error = copyout((caddr_t) ifm, w->w_where, len))
679                 return (error);
680             w->w_where += len;
681         }
682         while (ifa = ifa->ifa_next) {
683             if (af && af != ifa->ifa_addr->sa_family)
684                 continue;
685             ifaaddr = ifa->ifa_addr;
686             netmask = ifa->ifa_netmask;
687             brdaddr = ifa->ifa_dstaddr;
688             len = rt_msg2(RTM_NEWADDR, &info, 0, w);
689             if (w->w_where && w->w_tmem) {
690                 struct ifa_msghdr *ifam;
691                 ifam = (struct ifa_msghdr *) w->w_tmem;
692                 ifam->ifam_index = ifa->ifa_ifp->if_index;
693                 ifam->ifam_flags = ifa->ifa_flags;
694                 ifam->ifam_metric = ifa->ifa_metric;
695                 ifam->ifam_addrs = info.rti_addrs;
696                 if (error = copyout(w->w_tmem, w->w_where, len))
697                     return (error);
698                 w->w_where += len;
699             }
700         }
701         ifaaddr = netmask = brdaddr = 0;
702     }
703     return (0);
704 }
------------------------------------------------------------------------- rtsock.c
This function is a for loop that iterates through each interface starting with the one pointed to by ifnet. Then a while loop proceeds through the linked list of ifaddr structures for each interface. An RTM_IFINFO routing message is generated for each interface and an RTM_NEWADDR message for each address.
654-666
The process can specify a nonzero flags argument (mib[5] in Figure 19.36) to select only the interface with a matching if_index value.
667-670
The only socket address structure returned with the RTM_IFINFO message is ifpaddr. The message is built by rt_msg2. The pointer ifpaddr in the info structure is then set to 0, since the same info structure is used for generating the subsequent RTM_NEWADDR messages.
671-681
If the process wants the message returned, the remainder of the if_msghdr structure is filled in, copyout copies the buffer to the process, and w_where is incremented.
682-684
Each ifaddr structure for the interface is processed and the process can specify a nonzero address family (mib[3] in Figure 19.36) to select only the interface addresses of the given family.
685-688
Up to three socket address structures are returned in each RTM_NEWADDR message: ifaaddr, netmask, and brdaddr. The message is built by rt_msg2.
689-699
If the process wants the message returned, the remainder of the ifa_msghdr structure is filled in, copyout copies the buffer to the process, and w_where is incremented.
701
These three pointers in the info array are set to 0, since the same array is used for the next interface message.
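The shape of the two nested loops, and the effect of the two filters, can be seen in a user-space sketch that only counts the messages that would be generated. The structures addr and iface are hypothetical, pared-down versions of ifaddr and ifnet, and count_iflist is an illustrative helper. As in Figure 19.40, the first entry on each address list supplies the address for the interface message, and the inner loop starts with the entry after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down versions of ifaddr and ifnet. */
struct addr  { int family; struct addr *next; };
struct iface { int index; struct addr *addrlist; struct iface *next; };

struct counts { int ifinfo; int newaddr; };

/*
 * Count the messages an interface-list request would produce.
 * want_index == 0 selects every interface; af == 0 selects every
 * address family.
 */
struct counts count_iflist(const struct iface *list, int want_index, int af)
{
    struct counts c = { 0, 0 };
    const struct iface *ifp;
    const struct addr *ifa;

    assert(want_index >= 0 && af >= 0);
    for (ifp = list; ifp != NULL; ifp = ifp->next) {
        if (want_index && want_index != ifp->index)
            continue;
        c.ifinfo++;                         /* one RTM_IFINFO */
        ifa = ifp->addrlist;
        if (ifa == NULL)
            continue;                       /* guard added for the sketch */
        while ((ifa = ifa->next) != NULL) {
            if (af && af != ifa->family)
                continue;
            c.newaddr++;                    /* one RTM_NEWADDR */
        }
    }
    return c;
}
```

Note that the address family filter applies only to the address messages: an interface message is still produced for every selected interface, even if none of its addresses match.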
Routing messages all have the same format: a fixed-length structure followed by a variable number of socket address structures. There are three different types of messages, each corresponding to a different fixed-length structure, and the first three elements of each structure identify the length, version, and type of message. A bitmask in each structure identifies which socket address structures follow the fixed-length structure.
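A receiver uses the bitmask to locate each socket address structure that follows the fixed-length part. The sketch below shows one way to do this under stated assumptions: split_addrs and roundup_len are hypothetical helpers, each socket address is assumed to begin with a one-byte length, and the rounding unit ALIGNTO is fixed at 4 for the example, whereas Net/3 rounds to sizeof(long).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ADDRS 8   /* bit positions 0 through 7 */
#define ALIGNTO   4   /* assumed rounding unit for this sketch */

/* Round a socket address length up to the next multiple of ALIGNTO;
 * a length of 0 still occupies one unit. */
static size_t roundup_len(size_t len)
{
    return len > 0 ? (len + ALIGNTO - 1) / ALIGNTO * ALIGNTO : ALIGNTO;
}

/*
 * 'addrs' is the bitmask from the fixed-length structure and 'cp'
 * points just past that structure. On return, out[i] points to the
 * address for bit i, or is NULL if the bit is clear. Returns the
 * number of bytes consumed, or -1 if the addresses run past 'len'.
 */
long split_addrs(int addrs, const unsigned char *cp, size_t len,
                 const unsigned char *out[MAX_ADDRS])
{
    size_t used = 0;
    int i;

    assert(cp != NULL && out != NULL);
    for (i = 0; i < MAX_ADDRS; i++)
        out[i] = NULL;
    for (i = 0; i < MAX_ADDRS; i++) {
        size_t n;

        if ((addrs & (1 << i)) == 0)
            continue;                   /* this address is not present */
        if (used >= len)
            return -1;
        n = roundup_len(cp[used]);      /* first byte is the length */
        if (n > len - used)
            return -1;
        out[i] = cp + used;
        used += n;
    }
    return (long)used;
}
```

Because the addresses appear in bit order and absent ones take no space, the bitmask is the only way to tell which address is which; the position in the buffer alone is not enough.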
These messages are passed between a process and the kernel in two different ways. Messages can be passed in either direction, one message per read or write, across a routing socket. This allows a superuser process complete read and write access to the kernel’s routing tables. This is how routing daemons such as routed and gated implement their desired routing policy.
Alternatively, any process can read the contents of the kernel’s routing tables using the sysctl system call. This does not involve a routing socket and does not require special privileges. The entire result, normally consisting of many routing messages, is returned as part of the system call. Since the process does not know the size of the result, a method is provided for the system call to return this size without returning the actual result.
19.1 | What is the difference in the |
19.1 | The |
19.2 | What happens when the default route is entered with the command of the form bsdi $ route add default -cloning -genmask 255.255.255.255 sun |
19.2 | A host route is created for each host accessed through the default route. TCP can then maintain and update routing metrics for each individual host (Figure 27.3). |
19.3 | Estimate the space required by |
19.3 | Each |