Network support in the kernel is implemented primarily in the BSD layer. The BSD flavors of UNIX are renowned for their robust and secure networking support. Consequently, code from the BSD networking stack has made its way into a wide variety of operating systems, including Mac OS X and iOS. While the networking support is primarily in the BSD layer, it has hooks into I/O Kit, which provides the interface for building hardware-based network drivers. A conceptual view of the kernel network architecture is shown in Figure 13-1.
Figure 13-1. Conceptual view of the kernel network architecture
From a user space application's perspective, networking services are accessed through the BSD/POSIX socket API, with functions such as connect(), listen(), and bind(). However, the socket API is not only about networking; it also handles various forms of inter-process communication (IPC), such as UNIX domain sockets. Unlike most BSD versions, the XNU kernel also implements an in-kernel socket API (KPI). This KPI allows the kernel and KEXTs to use sockets in much the same way as user space applications do. The key difference is that functions in the socket KPI are named with a “sock_” prefix. For example, the connect() function is named sock_connect() in the kernel KPI.
Higher-level APIs, such as Core Foundation or Cocoa, build their network support on top of the socket API. The socket API communicates with the kernel through the standard system call interface. The socket layer shares many commonalities with the file system APIs; indeed, a socket is just a special type of file descriptor, so the read() and write() system calls can be used on socket descriptors as well.
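To illustrate that a socket is just another file descriptor, the following user-space sketch creates a connected pair of UNIX domain sockets with socketpair() and moves a message across it using plain write() and read(). The helper name echo_over_socketpair() is hypothetical, and the sketch assumes that a short message arrives in a single read() call, which is typical for a local socket pair but not guaranteed for stream sockets in general.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg into one end of a UNIX domain socket pair
 * and read it back from the other end. Returns the number of bytes read
 * into out (NUL-terminated), or -1 on error. */
ssize_t echo_over_socketpair(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    ssize_t n = -1;

    if (outlen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    size_t len = strlen(msg);
    /* write() and read() operate on socket descriptors just as on files.
     * Assumption: a short message is returned by a single read(). */
    if (write(fds[0], msg, len) == (ssize_t)len)
        n = read(fds[1], out, outlen - 1);   /* leave room for the NUL */

    if (n >= 0)
        out[n] = '\0';

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The same descriptors could also be passed to send() and recv(); using read() and write() here simply emphasizes the file descriptor nature of sockets.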
The kernel part of the socket API is responsible for queuing data and routing it to and from the appropriate protocol handler in the protocol stack. The protocol stack handles tasks such as constructing network packets, dividing the data into appropriately sized packets, and adding checksums. It is in the protocol stack that TCP, UDP, and IP are handled. The protocol stack is also responsible for the details of routing, the firewall, and auxiliary protocols such as ARP. Packets destined for external hosts end up in the interface layer of the BSD network stack. The interface layer in turn plugs into the network interface classes in the I/O Kit, which communicate with the physical network device through its driver.
Four key data structures are used in the BSD network stack:
The socket structure represents an open socket in user space or kernel space. From user space, it is accessed through a file descriptor.
The domain structure describes a protocol family, such as IP version 4 (PF_INET), IP version 6 (PF_INET6), or the local domain (PF_LOCAL/PF_UNIX).
The protosw structure describes the handler for an individual protocol, such as IPv4, IPv6, TCP, UDP, ICMP, IGMP, or RAW. When an AF_INET socket is created, protocols accessible through the sockets interface, such as TCP and UDP, are selected by the socket types SOCK_STREAM and SOCK_DGRAM, respectively.
The ifnet structure describes a network interface. Each interface listed by the ifconfig command, such as en0, en1, and lo0, is backed by an ifnet structure. An ifnet structure is also defined for each I/O Kit network driver, although the driver doesn't need to interface with the structure directly, because the IONetworkInterface class provides an abstraction for it.
Another feature of the XNU kernel is the network kernel extension (NKE) mechanism. NKEs allow filters to be inserted at various levels of the network stack, such as the socket layer or the IP layer. The NKE architecture also allows you to implement custom routing algorithms, new protocols, and virtual network interfaces, and it can be used for packet filtering and logging. Furthermore, the kernel supports the Berkeley Packet Filter (BPF), which allows raw network traffic to be passed to user space for analysis with tools such as tcpdump. We will look at the NKE system in more detail later in this chapter, as well as at how to implement drivers for network devices in the I/O Kit.
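The way a socket's type and protocol select a protosw entry can be observed from user space. The following sketch, built around a hypothetical helper try_inet_socket(), requests an AF_INET socket with an explicit type and protocol. A consistent pair such as SOCK_STREAM with IPPROTO_TCP should succeed, whereas a mismatched pair such as SOCK_STREAM with IPPROTO_UDP is rejected because no protocol handler matches; the exact error code is an assumption and may differ between systems.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to create an AF_INET socket for the given
 * (type, protocol) pair. Returns 0 if the kernel found a matching
 * protocol handler, otherwise the errno value reported by socket(). */
int try_inet_socket(int type, int protocol)
{
    int fd = socket(AF_INET, type, protocol);
    if (fd == -1)
        return errno;   /* typically EPROTONOSUPPORT or EPROTOTYPE for a mismatch */
    close(fd);
    return 0;
}
```

Passing 0 as the protocol lets the kernel pick the default handler for the requested type, which for AF_INET is TCP for SOCK_STREAM and UDP for SOCK_DGRAM.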
To get the most out of this chapter, you should have a basic understanding of networking concepts such as TCP/IP and Ethernet, and be familiar with the layers of the OSI model.
The network memory buffer, or mbuf, is a fundamental data structure in BSD UNIX systems, including Mac OS X and iOS. While it is mostly a concept of the BSD network layer, you will also encounter the mbuf data structure when writing I/O Kit network drivers. The structure is used to represent network packets and their metadata, and it is not exposed to user space. The mbuf structure is shown in Listing 13-1.
Listing 13-1. The mbuf Data Structure
struct mbuf {
    struct m_hdr m_hdr;
    union {
        struct {
            struct pkthdr MH_pkthdr;        /* M_PKTHDR set */
            union {
                struct m_ext MH_ext;        /* M_EXT set */
                char MH_databuf[_MHLEN];
            } MH_dat;
        } MH;
        char M_databuf[_MLEN];              /* !M_PKTHDR, !M_EXT */
    } M_dat;
};
The complete mbuf structure has a fixed size, currently 256 bytes. This size includes both the header and the data held by the structure, so the number of bytes available for data storage is 256 - sizeof(struct m_hdr), which corresponds to _MLEN in Listing 13-1. If the mbuf carries a packet header, the packet header takes up additional space, leaving _MHLEN bytes for data. To describe larger packets, multiple mbufs are linked together in a linked list, as shown in Figure 13-2.
Figure 13-2. A chain of mbuf structures
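The data-size arithmetic described before Figure 13-2 can be sketched in user space. The structure below is a hypothetical mirror of m_hdr that uses the field names discussed in this chapter; the field order and types are assumptions and may not match any particular XNU release, so the resulting number should not be taken as the kernel's actual value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of m_hdr, for illustration only. Field order and
 * types are assumptions; the real layout can differ between kernel
 * versions, which is why KEXTs should rely on the mbuf KPI instead. */
struct model_m_hdr {
    void     *mh_next;     /* next mbuf in this chain */
    void     *mh_nextpkt;  /* next packet in a list or queue */
    char     *mh_data;     /* start of valid data */
    int32_t   mh_len;      /* bytes of data in this mbuf */
    uint16_t  mh_type;     /* type of data in this mbuf */
    uint16_t  mh_flags;    /* flags such as M_PKTHDR and M_EXT */
};

#define MODEL_MSIZE 256    /* total size of one mbuf, header included */

/* Internal data bytes of an mbuf without a packet header (like _MLEN). */
size_t model_mlen(void)
{
    return MODEL_MSIZE - sizeof(struct model_m_hdr);
}
```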
A list of linked mbufs is called a chain. Figure 13-2 shows three mbufs, each describing the start of a packet. Each of these mbufs may in turn be followed by a chain of further mbufs that together make up the complete network packet.
To reduce overhead with large packets, an mbuf can point to an external buffer instead of using its own internal storage. Such an external buffer is referred to as a cluster, and the MH_ext field is used to describe it. The mbuf header (m_hdr) is located at the start of the structure and contains the length of the mbuf's data, which is stored in the mh_len field. The header also contains a pointer to the next mbuf in the chain (mh_next) and a pointer to the next entry in a list or queue (mh_nextpkt), which usually represents a new packet; however, mbufs can also be used to store other control information. The mh_type and mh_flags fields determine the type and options of an mbuf, for example, whether it has an associated external buffer. If an mbuf represents the start of a packet, the M_PKTHDR flag will be set, and if the mbuf has external data, the M_EXT flag will be set. These flags indicate when it is safe to access the mbuf's MH_pkthdr or MH_ext structures, respectively.
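The difference between the next pointer (within a packet) and the next-packet pointer (between packets) is easiest to see in a walk over a packet queue. The following user-space model is a deliberately simplified, hypothetical stand-in for the real structure: model_mbuf, MODEL_M_PKTHDR, and the flag value chosen for it are assumptions for illustration only, and kernel code should use the mbuf KPI instead.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_M_PKTHDR 0x0002   /* assumed value: mbuf starts a packet */

/* Hypothetical, simplified user-space model of an mbuf; NOT the kernel type. */
struct model_mbuf {
    struct model_mbuf *next;     /* like mh_next: next mbuf in this packet */
    struct model_mbuf *nextpkt;  /* like mh_nextpkt: head of the next packet */
    size_t             len;      /* like mh_len: bytes held by this mbuf */
    uint16_t           flags;    /* like mh_flags */
};

/* Follow nextpkt from packet to packet and next within each packet.
 * Returns the number of queue entries whose first mbuf carries
 * MODEL_M_PKTHDR; *total_bytes receives the sum of len over every mbuf. */
size_t model_count_packets(const struct model_mbuf *queue, size_t *total_bytes)
{
    size_t packets = 0, bytes = 0;

    for (const struct model_mbuf *pkt = queue; pkt != NULL; pkt = pkt->nextpkt) {
        if (pkt->flags & MODEL_M_PKTHDR)
            packets++;
        for (const struct model_mbuf *m = pkt; m != NULL; m = m->next)
            bytes += m->len;
    }
    *total_bytes = bytes;
    return packets;
}
```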
While the mbuf structure is found in many UNIX variants, the programming interface for working with it differs between platforms. The XNU kernel offers the mbuf KPI for working with mbufs. The idea of the KPI is to treat the mbuf as an opaque structure that is manipulated only through KPI functions rather than by accessing structure fields directly. This allows the mbuf implementation to change under the hood while remaining binary and source compatible with code that uses the KPI. For this reason, when manipulating mbufs, we do not use the mbuf structure directly but instead use the mbuf_t handle as a reference.
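The opaque-handle idea behind mbuf_t is a general C pattern. The following sketch uses a hypothetical model_buf_t handle with accessor functions loosely analogous to mbuf_data() and mbuf_len(); it is not the mbuf KPI itself, only an illustration of why callers that stick to accessors keep working when the structure layout changes.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Public part: callers only ever see this handle type. */
typedef struct model_buf *model_buf_t;

/* Private part: in a real design this definition would live only in the
 * implementation file, so its layout could change freely. */
struct model_buf {
    size_t        len;
    unsigned char data[64];
};

model_buf_t model_buf_alloc(void) { return calloc(1, sizeof(struct model_buf)); }
void model_buf_free(model_buf_t b) { free(b); }
size_t model_buf_len(model_buf_t b) { return b->len; }
void *model_buf_data(model_buf_t b) { return b->data; }

/* Copy n bytes into the buffer; EINVAL if they do not fit. */
int model_buf_copyin(model_buf_t b, const void *src, size_t n)
{
    if (n > sizeof(b->data))
        return EINVAL;
    memcpy(b->data, src, n);
    b->len = n;
    return 0;
}
```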
Tip The mbuf KPI header file is bsd/sys/kpi_mbuf.h. The full documentation for the KPI can be found at http://developer.apple.com/library/mac/#documentation/Darwin/Reference/KernelIOKitFramework/kpi_mbuf_h/.
Getting data in and out of mbufs can be achieved with the following functions:
errno_t mbuf_copydata(const mbuf_t mbuf, size_t offset, size_t length, void *out_data);
errno_t mbuf_copyback(mbuf_t mbuf, size_t offset, size_t length, const void *data, mbuf_how_t how);
It is not always possible to use bcopy() or similar functions directly, because data in mbufs may be scattered over several structures or external buffers. The preceding functions simplify this task significantly. However, if the buffer is known to be contiguous, the mbuf_data() function can retrieve a pointer to the data area of the mbuf. The mbuf_copydata() function copies length bytes, starting at offset, from an mbuf (chain) to the memory location pointed to by the out_data parameter, which must be large enough to hold length bytes.
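Conceptually, mbuf_copydata() has to gather bytes that may be spread across several mbufs. The following user-space sketch models that behavior on a hypothetical model_mbuf chain; it is not the kernel implementation, and returning EINVAL when the chain holds fewer than offset + length bytes is an assumption made for this model.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of an mbuf chain; not the kernel type. */
struct model_mbuf {
    struct model_mbuf   *next;   /* next segment in the chain */
    const unsigned char *data;   /* this segment's bytes */
    size_t               len;    /* number of valid bytes */
};

/* Copy length bytes starting at offset from a possibly scattered chain
 * into out_data. Returns 0 on success, EINVAL if the chain is too short. */
int model_copydata(const struct model_mbuf *m, size_t offset,
                   size_t length, void *out_data)
{
    unsigned char *out = out_data;

    /* Skip segments that lie entirely before offset. */
    while (m != NULL && offset >= m->len) {
        offset -= m->len;
        m = m->next;
    }
    /* Gather from each following segment until length bytes are copied. */
    while (m != NULL && length > 0) {
        size_t n = m->len - offset;
        if (n > length)
            n = length;
        memcpy(out, m->data + offset, n);
        out += n;
        length -= n;
        offset = 0;
        m = m->next;
    }
    return length == 0 ? 0 : EINVAL;
}
```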
The mbuf_copyback() function does the reverse and allows you to copy data back into an mbuf. If the mbuf is not large enough, the function grows it by appending more mbufs to form a chain. The last parameter, how, should be either MBUF_WAITOK or MBUF_DONTWAIT, which tells the function whether it is allowed to block while allocating memory. In an interrupt routine or performance-critical path, MBUF_DONTWAIT must be used, and it is generally preferred wherever possible.
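Likewise, the growth behavior of mbuf_copyback() can be modeled in user space. This sketch is loosely patterned after the classic BSD m_copyback() routine: it overwrites data within each segment's existing length and appends new, zero-filled segments of a hypothetical fixed capacity when the chain is too short. The model_mbuf type, MODEL_SEGSIZE, and model_copyback() are hypothetical, and the how argument is omitted; a failed calloc() stands in for an allocation failure under MBUF_DONTWAIT.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_SEGSIZE 8   /* hypothetical capacity of one segment */

/* Hypothetical user-space model of an mbuf; not the kernel type. */
struct model_mbuf {
    struct model_mbuf *next;
    size_t             len;                  /* valid bytes in data */
    unsigned char      data[MODEL_SEGSIZE];  /* internal storage */
};

/* Append a zero-filled segment holding up to `want` bytes after m. */
static struct model_mbuf *append_seg(struct model_mbuf *m, size_t want)
{
    struct model_mbuf *n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->len = want < MODEL_SEGSIZE ? want : MODEL_SEGSIZE;
    m->next = n;
    return n;
}

/* Write length bytes at offset, growing the chain as needed.
 * Returns 0 on success or ENOMEM if a new segment cannot be allocated. */
int model_copyback(struct model_mbuf *m, size_t offset, size_t length,
                   const void *data)
{
    const unsigned char *src = data;

    while (offset > m->len) {            /* find the segment holding offset */
        offset -= m->len;
        if (m->next == NULL && append_seg(m, offset + length) == NULL)
            return ENOMEM;
        m = m->next;
    }
    while (length > 0) {
        size_t n = m->len - offset;      /* room within this segment's data */
        if (n > length)
            n = length;
        memcpy(m->data + offset, src, n);
        src += n;
        length -= n;
        offset = 0;
        if (length == 0)
            break;
        if (m->next == NULL && append_seg(m, length) == NULL)
            return ENOMEM;
        m = m->next;
    }
    return 0;
}
```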
The mbuf KPI offers several functions for allocating new mbufs and attaching data to them, as shown here:
errno_t mbuf_allocpacket(mbuf_how_t how, size_t packetlen, unsigned int *maxchunks, mbuf_t *mbuf);
errno_t mbuf_allocpacket_list(unsigned int numpkts, mbuf_how_t how,
size_t packetlen, unsigned int *maxchunks, mbuf_t *mbuf);
errno_t mbuf_tag_allocate(mbuf_t mbuf, mbuf_tag_id_t module_id,
mbuf_tag_type_t type, size_t length, mbuf_how_t how, void **data_p);
Following is a brief description of the preceding functions:
mbuf_allocpacket() allocates a chain of mbufs, starting with a packet header, large enough to hold a packet of packetlen bytes. maxchunks is an input/output parameter: on input, it specifies the maximum number of mbufs in the chain; on output, it holds the number actually allocated. If NULL is specified, there is no limit.
mbuf_allocpacket_list() is identical to mbuf_allocpacket() but allocates numpkts packets and returns them as a list of mbuf chains.
mbuf_tag_allocate() does not allocate an mbuf itself; it allocates a tag of length bytes, identified by module_id and type, and attaches it to an existing mbuf. The tag carries additional data that is passed along with the mbuf as it travels through the stack, and it can be retrieved again with the mbuf_tag_find() function.
Besides allocating mbufs and copying data in and out of them, a common operation is to iterate through an mbuf chain using the mbuf_next() function:
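#include <sys/kpi_mbuf.h>

/* Visit every mbuf in a chain by following the next pointers. */
void walk_mbuf(mbuf_t mbuf_head)
{
    mbuf_t mb;
    unsigned char *data;
    size_t len;

    for (mb = mbuf_head; mb != NULL; mb = mbuf_next(mb)) {
        data = (unsigned char *)mbuf_data(mb); // pointer to this segment's data
        len = mbuf_len(mb);                    // length of this segment
        /* process len bytes starting at data here */
    }
}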