CHAPTER 13

Networking

Network support in the kernel is implemented primarily in the BSD layer. The BSD flavors of UNIX are renowned for their robust and secure networking support. Consequently, code from the BSD networking stack has made its way into a wide variety of operating systems, including Mac OS X and iOS. While the networking support is primarily in the BSD layer, it has hooks into I/O Kit, which provides the interface for building hardware-based network drivers. A conceptual view of the kernel network architecture is shown in Figure 13-1.

Figure 13-1. Conceptual view of the kernel network architecture

From a user space application's perspective, networking services are accessed through the BSD/POSIX socket API, with functions such as connect(), listen(), and bind(). However, the socket API is not only about networking. It also handles various forms of inter-process communication (IPC), such as UNIX domain sockets. Unlike most BSD versions, the XNU kernel also implements an in-kernel socket API (KPI). This KPI allows the kernel and KEXTs to use sockets much the same way as in user space applications. The key difference is that functions in the socket KPI are named with a “sock_” prefix. For example, the connect() function is named sock_connect() in the kernel KPI.

Higher-level APIs, such as Core Foundation and Cocoa, build their network support on top of the socket API. The socket API communicates with the kernel through the standard system call interface. The socket layer has much in common with the file system APIs; a socket is just a special type of file descriptor, so the read() and write() system calls can be used on socket descriptors as well.
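
The point that a socket is a file descriptor can be tried out from user space. The following is a minimal sketch, not taken from the original text: it creates a connected pair of UNIX domain sockets with socketpair() and moves data through them with plain write() and read(). The function name echo_through_socketpair() is hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg through one end of a UNIX domain socket pair using write()
 * and reads it back from the other end using read().
 * Returns the number of bytes received, or -1 on error. */
ssize_t echo_through_socketpair(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    ssize_t n = -1;

    assert(msg != NULL && out != NULL);

    /* AF_UNIX sockets are IPC rather than networking, but use the same API. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    size_t len = strlen(msg);
    /* The file I/O calls work on socket descriptors too. */
    if (write(fds[0], msg, len) == (ssize_t)len)
        n = read(fds[1], out, outlen);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

No send() or recv() call is needed here; the descriptors returned by socketpair() are closed with close() just like file descriptors.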

The kernel part of the socket API is responsible for queuing and routing data to and from the appropriate protocol handler in the protocol stack. The protocol stack constructs the network packets: it divides the data into appropriately sized packets, adds checksums, and so on. It is in the protocol stack that TCP, UDP, and IP are handled. The protocol stack is also responsible for the details of routing, the firewall, and auxiliary protocols such as ARP. Packets destined for external hosts end up in the interface layer of the BSD network stack. The interface layer in turn plugs into the network interface classes in the I/O Kit, which communicate with a physical network device through its driver.

Four key data structures are used in the BSD network stack:

  • The socket structure represents open sockets in user space or kernel space and is accessed using file descriptors from user space.
  • The domain structure is used to describe protocol families, such as IP version 4 (PF_INET), IP version 6 (PF_INET6), or the local domain (PF_LOCAL/PF_UNIX).
  • The protosw structure describes the individual protocol handler for each supported protocol, such as IPv4, IPv6, TCP, UDP, ICMP, IGMP, or RAW. Protocols accessible through the sockets interface, such as TCP and UDP, are referred to by the identifiers SOCK_STREAM and SOCK_DGRAM, respectively, when an AF_INET socket is used.
  • The ifnet structure describes a network interface. Each interface listed by the command ifconfig, such as en0, en1, and lo0, is backed by an ifnet structure. An ifnet structure is also defined for each I/O Kit network driver. An I/O Kit driver doesn't need to interface with the structure directly, as the IONetworkInterface class provides an abstraction for it.
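
From user space, the domain and protocol structures show up only indirectly, as the arguments passed to socket(). The following sketch, an illustration rather than part of the original text, opens a socket for a given domain and type and then asks the kernel, through getsockopt() with SO_TYPE, which type it actually created. The helper name socket_type_roundtrip() is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens a socket in the given domain and of the given type, then asks the
 * kernel which type it created. Returns the type, or -1 on error. */
int socket_type_roundtrip(int domain, int type)
{
    /* Protocol 0 selects the default protocol for this domain and type,
     * for example TCP for AF_INET with SOCK_STREAM. */
    int fd = socket(domain, type, 0);
    if (fd < 0)
        return -1;

    int actual = -1;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &actual, &len) != 0)
        actual = -1;
    else
        assert(len == sizeof(actual));

    close(fd);
    return actual;
}
```

Calling socket_type_roundtrip(AF_INET, SOCK_DGRAM), for instance, should report SOCK_DGRAM, the type that is served by the UDP protocol handler in the AF_INET domain.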

Another feature of the XNU kernel is the network kernel extensions (NKE) mechanism. NKE allows filters to be inserted at various levels of the network stack, such as in the sockets layer or IP layer. The NKE architecture allows you to write custom routing algorithms, and implement new protocols and virtual network interfaces. It can also be used for packet filtering and logging. Furthermore, the kernel supports the Berkeley Packet Filter (BPF), which allows raw network traffic to be routed to user space for analysis with tools such as tcpdump. We will look at the NKE system in more detail later in this chapter, as well as how to implement drivers for network devices in the I/O Kit.

To get the most out of this chapter, it is necessary that you have some understanding of networking, of concepts such as TCP/IP and Ethernet, and that you are familiar with the layers of the OSI model.

Network Memory Buffers

The network memory buffer, or mbuf, is a fundamental data structure in BSD UNIX systems, including Mac OS X and iOS. While it is mostly a concept of the BSD network layer, you will also encounter the mbuf data structure when writing I/O Kit network drivers. The structure is used to represent network packets and their metadata, and it is not exposed to user space. The mbuf structure is shown in Listing 13-1.

Listing 13-1. The mbuf Data Structure

struct mbuf {
    struct m_hdr m_hdr;
    union {
        struct {
            struct pkthdr MH_pkthdr;           /* M_PKTHDR set */
            union {
                struct m_ext MH_ext;           /* M_EXT set */
                char         MH_databuf[_MHLEN];
            } MH_dat;
        } MH;
        char M_databuf[_MLEN];                 /* !M_PKTHDR, !M_EXT */
    } M_dat;
};

The complete mbuf structure has a fixed size, currently 256 bytes, which includes both the header and the data held by the structure. The number of bytes available for data storage in a plain mbuf is therefore 256 - sizeof(struct m_hdr). To describe larger packets, multiple mbufs are linked together in a linked list, as shown in Figure 13-2.
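
The size arithmetic can be made concrete with a small user space model. The sketch below is an assumption-laden illustration: struct toy_hdr is a hypothetical stand-in for struct m_hdr, and its field layout is invented, so the resulting byte counts are not those of the real kernel structure. Only the total size of 256 bytes comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MSIZE 256   /* total size of one buffer, as stated in the text */

struct toy_mbuf;        /* forward declaration for the header pointers */

/* Hypothetical stand-in for struct m_hdr; the real field layout differs. */
struct toy_hdr {
    struct toy_mbuf *next;     /* next buffer in the same chain */
    struct toy_mbuf *nextpkt;  /* next packet in a list or queue */
    char            *data;     /* start of valid data */
    int32_t          len;      /* bytes of valid data in this buffer */
    int16_t          type;
    int16_t          flags;
};

/* Whatever the header does not use is available for data. */
#define TOY_MLEN (TOY_MSIZE - sizeof(struct toy_hdr))

struct toy_mbuf {
    struct toy_hdr hdr;
    char           databuf[TOY_MLEN];  /* internal storage */
};

/* Header plus storage must add up to exactly one fixed-size buffer. */
static_assert(sizeof(struct toy_mbuf) == TOY_MSIZE, "toy mbuf must be 256 bytes");

/* Number of buffers needed to hold payload_len bytes in internal storage. */
size_t toy_buffers_needed(size_t payload_len)
{
    return (payload_len + TOY_MLEN - 1) / TOY_MLEN;
}
```

A payload that is one byte larger than TOY_MLEN already needs a second buffer, which is why larger packets are described by several linked buffers.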

Figure 13-2. A chain of mbuf structures

A list of mbufs is called a chain. Figure 13-2 shows a chain of three mbufs, each describing a packet. Each of these mbufs may in turn head a chain of other mbufs that together make up the complete network packet.

To reduce overhead with large packets, an mbuf can point to an external buffer instead of using its internal storage. An mbuf structure with an external buffer is referred to as a cluster. The MH_ext field is used to describe the external buffer.

The mbuf header (m_hdr) is located at the start of the structure and contains the length of the mbuf's data, which is stored in the mh_len field. The header also contains the pointers to the next buffer in the chain and to the next entry in a list or queue, which usually represents a new packet; however, mbufs can also be used to store other control information. The mh_type and mh_flags fields determine the type and options of an mbuf, for example, whether it has an associated external buffer. If an mbuf represents the start of a packet, the M_PKTHDR flag will be set, and if the mbuf has external data, the M_EXT flag will be set. Only when the corresponding flag is set is it safe to access the mbuf's MH_pkthdr or MH_ext structure, respectively.
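
The two kinds of links and the role of the flags can be sketched with a simplified user space model. Everything prefixed with toy_ or TOY_ below is hypothetical, including the flag values and the pkt_len field; the real structure keeps the packet length inside the packet header. The sketch only illustrates the rule that the packet header is read after checking the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy flag values for illustration; not taken from the kernel headers. */
#define TOY_F_EXT    0x1  /* data lives in an external buffer */
#define TOY_F_PKTHDR 0x2  /* this buffer starts a packet */

struct toy_mbuf {
    struct toy_mbuf *next;     /* next buffer of the same packet */
    struct toy_mbuf *nextpkt;  /* first buffer of the next packet */
    size_t           len;      /* valid bytes in this buffer */
    int              flags;
    size_t           pkt_len;  /* total packet length; valid only with TOY_F_PKTHDR */
};

/* Sums the per-buffer lengths of one packet (one chain). */
size_t toy_chain_len(const struct toy_mbuf *m)
{
    size_t total = 0;
    for (; m != NULL; m = m->next)
        total += m->len;
    return total;
}

/* Walks a queue of packets and counts those whose header length matches
 * the sum of their buffers. The header field is read only if the flag is set. */
size_t toy_count_consistent_packets(const struct toy_mbuf *queue)
{
    size_t ok = 0;
    for (const struct toy_mbuf *p = queue; p != NULL; p = p->nextpkt) {
        if ((p->flags & TOY_F_PKTHDR) && p->pkt_len == toy_chain_len(p))
            ok++;
    }
    return ok;
}
```

The outer loop follows the packet links while toy_chain_len() follows the buffer links, which mirrors the distinction between the next entry in a queue and the next buffer in a chain.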

Working with Memory Buffers

While the mbuf structure is found in many UNIX variants, the programming interface for working with it differs between platforms. The XNU kernel offers the mbuf KPI for working with mbufs. The idea of the KPI is to treat the mbuf as an opaque structure that is manipulated only through KPI functions, never by accessing structure fields directly. This allows the mbuf implementation to change under the hood while remaining binary and source compatible with code that uses the KPI. For this reason, when manipulating mbufs, we do not use the mbuf structure directly but rather use the handle mbuf_t as a reference.

Tip: The mbuf KPI header file is bsd/sys/kpi_mbuf.h. The full documentation for the KPI can be found at http://developer.apple.com/library/mac/#documentation/Darwin/Reference/KernelIOKitFramework/kpi_mbuf_h/.

Getting data in and out of mbufs can be achieved with the following functions:

errno_t mbuf_copydata(const mbuf_t mbuf, size_t offset, size_t length, void *out_data);
errno_t mbuf_copyback(mbuf_t mbuf, size_t offset, size_t length, const void *data, mbuf_how_t how);

It is not always possible to use bcopy() or similar functions directly, because the data in an mbuf chain may be scattered over several mbufs or external buffers. The preceding functions simplify this task significantly. The mbuf_copydata() function copies length bytes, starting at offset, from an mbuf (chain) to the memory location pointed to by the out_data parameter, which must be large enough to hold length bytes. If the data is known to be contiguous, the mbuf_data() function can be used instead to obtain a pointer to the data area of the mbuf.

The mbuf_copyback() function does the reverse and allows you to copy data back to an mbuf. If the mbuf is not large enough, the function will grow the buffer by appending more mbufs to form a chain. The last parameter, how, should be either MBUF_WAITOK or MBUF_DONTWAIT, which tells the function whether it is allowed to block while allocating memory. In an interrupt routine or performance-critical path, MBUF_DONTWAIT must be used, and in general MBUF_DONTWAIT is preferred wherever possible.
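
Why a single bcopy() is not enough can be seen in a user space model of a chain. The sketch below is not the kernel implementation; struct toy_seg and toy_copydata() are hypothetical names, and the EINVAL return value for a chain that is too short is a choice made for this sketch. It gathers a byte range that may span several segments, which is the kind of work mbuf_copydata() does on behalf of its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an mbuf: one segment of a packet. */
struct toy_seg {
    struct toy_seg *next;  /* next segment in the chain */
    const char     *data;  /* start of this segment's bytes */
    size_t          len;   /* number of valid bytes */
};

/* Copies length bytes, starting offset bytes into the chain, to out.
 * Returns 0 on success or EINVAL if the chain is too short. */
int toy_copydata(const struct toy_seg *seg, size_t offset, size_t length, void *out)
{
    char *dst = out;

    assert(out != NULL || length == 0);

    /* Skip whole segments that lie before the requested offset. */
    while (seg != NULL && offset >= seg->len) {
        offset -= seg->len;
        seg = seg->next;
    }

    /* Gather the bytes segment by segment. */
    while (length > 0) {
        if (seg == NULL)
            return EINVAL;
        size_t avail = seg->len - offset;
        size_t n = (length < avail) ? length : avail;
        memcpy(dst, seg->data + offset, n);
        dst += n;
        length -= n;
        offset = 0;          /* later segments are read from their start */
        seg = seg->next;
    }
    return 0;
}
```

With three segments holding "hell", "o wo", and "rld", a request for 4 bytes at offset 3 has to take one byte from the first segment and three from the second, something a single contiguous copy cannot express.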

The mbuf KPI offers several ways to construct new mbufs as shown here:

errno_t mbuf_allocpacket(mbuf_how_t how, size_t packetlen, unsigned int *maxchunks, mbuf_t *mbuf);
errno_t mbuf_allocpacket_list(unsigned int numpkts, mbuf_how_t how,
                              size_t packetlen, unsigned int *maxchunks, mbuf_t *mbuf);
errno_t mbuf_tag_allocate(mbuf_t mbuf, mbuf_tag_id_t module_id,
                          mbuf_tag_type_t type, size_t length, mbuf_how_t how, void **data_p);

Following is a brief description of the preceding functions:

  • mbuf_allocpacket() allocates a chain of mbufs with a leading packet header, large enough to hold packetlen bytes. maxchunks is an input/output parameter that specifies the maximum number of buffers the chain may consist of. If NULL is specified, there is no limit.
  • mbuf_allocpacket_list() works like mbuf_allocpacket() but allocates a list of numpkts mbuf chains instead.
  • mbuf_tag_allocate() does not allocate an mbuf itself. It attaches additional data (a tag) of the given length to an existing mbuf, and the tag is passed along with the mbuf as it travels through the stack. The tag can be retrieved again by using the mbuf_tag_find() function.

Besides allocating and copying data in and out of an mbuf, a common operation is to iterate through an mbuf chain using the mbuf_next() function:

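#include <sys/kpi_mbuf.h>

/* Visit every mbuf in the chain that starts at mbuf_head. */
void walk_mbuf(mbuf_t mbuf_head)
{
    mbuf_t mb;
    unsigned char* data;
    size_t len;

    for (mb = mbuf_head; mb; mb = mbuf_next(mb))
    {
        data = (unsigned char*)mbuf_data(mb); // get pointer to data
        len = mbuf_len(mb);                   // get length of this segment
        // process len bytes starting at data here
        (void)data;
        (void)len;
    }
}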