As the computer world becomes more networked, network-aware applications are increasingly important. Linux provides the Berkeley socket API, which has become the standard networking API. We discuss the basics of using Berkeley sockets for both TCP/IP networking and simple interprocess communication (IPC) through Unix-domain sockets.
This chapter is not intended to be a complete guide to network programming. Network programming is a complicated topic, and we recommend dedicated network programming books for programmers who intend to do serious work with sockets [Stevens, 2004]. This chapter should be sufficient to allow you to write simple networked applications, however.
The Berkeley socket API was designed as a gateway to multiple protocols. Although this does necessitate extra complexity in the interface, it is much easier than inventing (or learning) a new interface for every new protocol you encounter. Linux uses the socket API for many protocols, including TCP/IP (both version 4 and version 6), AppleTalk, and IPX.
We discuss using sockets for two of the protocols available through Linux’s socket implementation. The most important protocol that Linux supports is TCP/IP,[1] which is the protocol that drives the Internet. We also cover Unix domain sockets, an IPC mechanism restricted to a single machine. Although they do not work across networks, Unix domain sockets are widely used for applications that run on a single computer.
Protocols normally come in groups, or protocol families. The popular TCP/IP protocol family includes the TCP and UDP protocols (among others). Making sense of the various protocols requires you to know a few networking terms.
Most users consider networking protocols to provide the equivalent of Unix pipes between machines. If a byte (or sequence of bytes) goes in one end of the connection, it is guaranteed to come out the other end. Not only is it guaranteed to come out the other end, but it also comes out right after the byte that was sent before it and immediately before the byte that was sent after it. Of course, all of these bytes should be received exactly as they were sent; no bytes should change. Also, no other process should be able to interject extra bytes into the conversation; it should be restricted to the original two parties.
A good visualization of this idea is the telephone. When you speak to your friends, you expect them to hear the same words you speak, and in the order you speak them.[2] Not only that, you do not expect your mother to pick up her phone (assuming she is not in the same house as you) and start chatting away happily to you and your friend.
Although this may seem pretty basic, it is not at all how underlying computer networks work. Networks tend to be chaotic and random. Imagine a first-grade class at recess, except they are not allowed to speak to each other and they have to stay at least five feet apart. Now, chances are those kids are going to find some way to communicate—perhaps even with paper airplanes!
Imagine that whenever students want to send letters to one another they simply write the letters on pieces of paper, fold them into airplanes, write the name of the intended recipient on the outside, and hurl them toward someone who is closer to the final recipient than the sender is. This intermediary looks at the airplane, sees who the intended target is, and sends it toward the next closest person. Eventually, the intended recipient will (well, may) get the airplane and unfold it to read the message.
Believe it or not, this is almost exactly how computer networks operate.[3] The intermediaries are called routers and the airplanes are called packets, but the rest is the same. Just as in the first-grade class, some of those airplanes (or packets) are going to get lost. If a message is too long to fit in a single packet, it must be split across multiple ones (each of which may be lost). All the students in between can read the packets if they like[4] and may simply throw the message away rather than try to deliver it. Also, anyone can interrupt your conversation by sending new packets into the middle of it.
Confronted with the reality of millions of paper airplanes, protocol designers endeavor to present a view of the network more on par with the telephone than the first-grade class. Various terms have evolved to describe networking protocols.
Connection-oriented protocols have two endpoints, like a telephone conversation. The connection must be established before any communication takes place, just as you answer the phone by saying “hello” rather than just talking immediately. Other users cannot (or should not be able to) intrude into the connection. Protocols that do not have these characteristics are known as connectionless.
Protocols provide sequencing if they ensure the data arrives in the same order it was sent.
Protocols provide error control if they automatically discard messages that have been corrupted and arrange to retransmit the data.
Streaming protocols recognize only byte boundaries. Sequences of bytes may be split up and are delivered to the recipient as the data arrives.
Packet-based protocols handle packets of data, preserving the packet boundaries and delivering complete packets to the receiver. Packet-based protocols normally enforce a maximum packet size.
Although each of these attributes is independent of the others, two major types of protocols are commonly used by applications. Datagram protocols are packet-oriented transports that provide neither sequencing nor error control; UDP, part of the TCP/IP protocol family, is a widely used datagram protocol. Stream protocols, such as the TCP portion of TCP/IP, are streaming protocols that provide both sequencing and error control.
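The difference between packet boundaries and byte streams can be observed on a single machine. The following is a minimal sketch rather than a definitive demonstration: it relies on socketpair(), read(), and write(), which are covered later in this chapter, and the helper name first_read_size() is our own invention. Unix domain datagrams are reliable, so the sketch shows only the boundary behavior, not packet loss or reordering.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes two 3-byte messages into one end of a
   connected Unix domain socket pair of the given type, then performs
   a single read() on the other end. Returns the number of bytes that
   read() delivered, or -1 on error. */
ssize_t first_read_size(int type) {
    int fds[2];
    char buf[16];
    ssize_t got;

    if (socketpair(PF_UNIX, type, 0, fds) < 0)
        return -1;

    if (write(fds[0], "abc", 3) != 3 || write(fds[0], "def", 3) != 3) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* the buffer is large enough to hold both messages */
    got = read(fds[1], buf, sizeof(buf));

    close(fds[0]);
    close(fds[1]);
    return got;
}
```

With SOCK_DGRAM the read delivers exactly one 3-byte message, because the packet boundary is preserved. With SOCK_STREAM the read may deliver bytes from both writes at once, because a stream recognizes only byte boundaries.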
Although datagram protocols, such as UDP, can be useful,[5] we focus on using stream protocols because they are easier to use for most applications. More information on protocol design and the differences between various protocols is available from many books [Stevens, 2004] [Stevens, 1994].
As every protocol has its own definition of a network address, the sockets API must abstract addresses. It uses a struct sockaddr as the basic form of an address; its contents are defined differently for each protocol family. Whenever a struct sockaddr is passed to a system call, the process also passes the size of the address that is being passed. The type socklen_t is defined as a numeric type large enough to hold the size of any socket address used by the system.
All struct sockaddr types conform to the following definition:
```c
#include <sys/socket.h>

struct sockaddr {
    unsigned short sa_family;
    char sa_data[MAXSOCKADDRDATA];
};
```
The first two bytes (the size of a short) specify the address family this address belongs to. A list of the common address families that Linux applications use is in Table 17.1, on page 413.
Table 17.1. Protocol and Address Families
Address | Protocol | Protocol Description |
---|---|---|
AF_UNIX | PF_UNIX | Unix domain |
AF_INET | PF_INET | TCP/IP (version 4) |
AF_INET6 | PF_INET6 | TCP/IP (version 6) |
AF_AX25 | PF_AX25 | AX.25, used by amateur radio |
AF_IPX | PF_IPX | Novell IPX |
AF_APPLETALK | PF_APPLETALK | AppleTalk DDP |
AF_NETROM | PF_NETROM | NetROM, used by amateur radio |
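As a minimal sketch of how protocol-independent code can use the sa_family member, the following function inspects an address through the generic struct sockaddr view only. The function names family_name() and example_family() are hypothetical, and only three of the families from Table 17.1 are handled.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: maps the sa_family member of a generic socket
   address to a printable name. Only a few families are handled. */
const char * family_name(const struct sockaddr * addr) {
    switch (addr->sa_family) {
    case AF_UNIX:  return "AF_UNIX";
    case AF_INET:  return "AF_INET";
    case AF_INET6: return "AF_INET6";
    default:       return "unknown";
    }
}

/* Builds a Unix domain address, then looks at it only through a
   struct sockaddr pointer, as protocol-independent code would. */
const char * example_family(void) {
    struct sockaddr_un un;

    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;

    return family_name((const struct sockaddr *) &un);
}
```

The cast in example_family() is the same one the socket system calls require; the family member always sits at the start of the structure, whatever the rest of the address looks like.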
All of the examples in this section use two functions, copyData() and die(). copyData() reads data from a file descriptor and writes it to another as long as data is left to be read. die() calls perror() and exits the program. We put both of these functions in the file sockutil.c to keep the example programs a bit cleaner. For reference, here is the implementation of these two functions:
```c
/* sockutil.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include "sockutil.h"

/* issue an error message via perror() and terminate the program */
void die(char * message) {
    perror(message);
    exit(1);
}

/* Copies data from file descriptor 'from' to file descriptor
   'to' until nothing is left to be copied. Exits if an error
   occurs. This assumes both from and to are set for blocking
   reads and writes. */
void copyData(int from, int to) {
    char buf[1024];
    int amount;

    while ((amount = read(from, buf, sizeof(buf))) > 0) {
        if (write(to, buf, amount) != amount) {
            die("write");
            return;
        }
    }
    if (amount < 0)
        die("read");
}
```
Like most other Linux resources, sockets are implemented through the file abstraction. They are created through the socket() system call, which returns a file descriptor. Once the socket has been properly initialized, that file descriptor may be used for read() and write() requests, like any other file descriptor. When a process is finished with a socket, it should be closed with close() to free the resources associated with it.
This section presents the basic system calls for creating and initializing sockets for any protocol. It is a bit abstract due to this protocol independence, and does not contain any examples for the same reason. The next two sections of this chapter describe how to use sockets with two different protocols, Unix Domain and TCP/IP, and those sections include full examples of how to use most of the system calls introduced here.
New sockets are created by the socket() system call, which returns a file descriptor for the uninitialized socket. The socket is tied to a particular protocol when it is created, but it is not connected to anything. As it is not connected, it cannot yet be read from or written to.
```c
#include <sys/socket.h>

int socket(int domain, int type, int protocol);
```
Like open(), socket() returns a value less than 0 on error and a file descriptor, which is greater than or equal to 0, on success. The three parameters specify the protocol to use.
The first parameter specifies the protocol family that should be used and is usually one of the values specified in Table 17.1.
The next parameter, type, is SOCK_STREAM, SOCK_DGRAM, or SOCK_RAW.[6] SOCK_STREAM specifies a protocol from the specified family that provides a stream connection, whereas SOCK_DGRAM specifies a datagram protocol from the same family. SOCK_RAW provides the ability to send packets directly to a network device driver, which enables user space applications to provide networking protocols that are not understood by the kernel.
The final parameter specifies which protocol is to be used, subject to the constraints specified by the first two parameters. Usually this parameter is 0, letting the kernel use the default protocol of the specified type and family. For the PF_INET protocol family, Table 17.2 lists some of the protocols allowed, with IPPROTO_TCP being the default stream protocol and IPPROTO_UDP the default datagram protocol.
Table 17.2. IP Protocols
Protocol | Description |
---|---|
IPPROTO_ICMP | Internet Control Message Protocol for IPv4 |
IPPROTO_ICMPV6 | Internet Control Message Protocol for IPv6 |
IPPROTO_IPIP | IPIP tunnels |
IPPROTO_IPV6 | IPv6 headers |
IPPROTO_RAW | Raw IP packets |
IPPROTO_TCP | Transmission Control Protocol (TCP) |
IPPROTO_UDP | User Datagram Protocol (UDP) |
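To make the three parameters concrete, here is a minimal sketch that creates a socket with the default protocol and then asks the kernel which type of socket it actually created. The helper name created_socket_type() is hypothetical, and it uses getsockopt() with SO_TYPE, a real call that this section does not otherwise cover.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a socket using the default protocol
   for the given family and type, and returns the socket type the
   kernel reports for it (SOCK_STREAM, SOCK_DGRAM, and so on), or
   -1 on error. The socket is closed before returning. */
int created_socket_type(int family, int type) {
    int sock;
    int actual;
    socklen_t length = sizeof(actual);

    /* a protocol of 0 selects the default for this family and type */
    if ((sock = socket(family, type, 0)) < 0)
        return -1;

    if (getsockopt(sock, SOL_SOCKET, SO_TYPE, &actual, &length) < 0)
        actual = -1;

    close(sock);
    return actual;
}
```

For example, created_socket_type(PF_INET, SOCK_STREAM) creates a TCP socket on a machine with IPv4 support, because IPPROTO_TCP is the default stream protocol for PF_INET.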
After you create a stream socket, it needs to be connected to something before it is of much use. Establishing socket connections is an inherently asymmetric task; each side of the connection does it differently. One side gets its socket ready to be connected to something and then waits for someone to connect to it. This is usually done by server applications that are started and continuously run, waiting for other processes to connect to them.
Client processes instead create a socket, tell the system which address they want to connect it to, and then try to establish the connection. Once the server (which has been waiting for a client) accepts the connection attempt, the connection is established between the two sockets. After this happens, the socket may be used for bidirectional communication.
Both server and client processes need to tell the system which address to use for the socket. Attaching an address to the local side of a socket is called binding the socket and is done through the bind() system call.
```c
#include <sys/socket.h>

int bind(int sock, struct sockaddr * my_addr, socklen_t addrlen);
```
The first parameter is the socket being bound, and the other parameters specify the address to use for the local endpoint.
After creating a socket, server processes bind() the socket to the address they are listening to. After the socket is bound to an address, the process tells the system it is willing to let other processes establish connections to that socket (at the specified address) by calling listen(). Once a socket is bound to an address, the kernel is able to handle processes’ attempts to connect to that address. However, the connection is not immediately established. The listening process must first accept the connection attempt through the accept() system call. New connection attempts that have been made to addresses that are being listened to are called pending connections until they have been accepted.
Normally, accept() blocks until a client process tries to connect to it. If the socket has been marked as nonblocking through fcntl(), accept() instead fails with errno set to EAGAIN if no client connection is pending.[7] The select(), poll(), and epoll system calls may also be used to determine whether a connection to a socket is pending (those calls mark the socket as ready to be read from).[8]
Here are the prototypes of listen() and accept().
```c
#include <sys/socket.h>

int listen(int sock, int backlog);
int accept(int sock, struct sockaddr * addr, socklen_t * addrlen);
```
Both of these functions expect the socket’s file descriptor as the first parameter. The other parameter to listen(), backlog, specifies how many connections may be pending on the socket before further connection attempts are refused. Network connections are not established until the server has accepted the connection; until the accept(), the incoming connection is considered pending. By providing a small queue of pending connections, the kernel relaxes the need for server processes to be constantly prepared to accept() connections. Applications have historically set the maximum backlog to five, although a larger value may sometimes be necessary. listen() returns zero on success and nonzero on failure.
The accept() call changes a pending connection to an established connection. The established connection is given a new file descriptor, which accept() returns. The new file descriptor inherits its attributes from the socket that was listened to. One unusual feature of accept() is that it returns networking errors that are pending as errors from accept().[9] Servers should not abort when accept() returns an error if errno is one of ECONNABORTED, ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, or ENETUNREACH. All of these should be ignored, with the server just calling accept() once more.
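The retry rule above can be sketched as a small wrapper around accept(). This is only one way to arrange it; the names accept_error_is_transient() and accept_retrying() are our own, and the wrapper discards the client address for brevity.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Returns 1 if err is one of the pending network errors that a
   server should ignore when accept() fails, and 0 otherwise. */
int accept_error_is_transient(int err) {
    switch (err) {
    case ECONNABORTED:
    case ENETDOWN:
    case EPROTO:
    case ENOPROTOOPT:
    case EHOSTDOWN:
    case ENONET:
    case EHOSTUNREACH:
    case EOPNOTSUPP:
    case ENETUNREACH:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical wrapper: calls accept() again whenever it fails with
   one of the errors above. Returns the new file descriptor, or a
   value less than 0 (with errno set) for any other failure. */
int accept_retrying(int sock) {
    int conn;

    for (;;) {
        conn = accept(sock, NULL, NULL);
        if (conn >= 0 || !accept_error_is_transient(errno))
            return conn;
    }
}
```

A server's main loop could call accept_retrying() in place of accept() and treat any negative return as a real failure.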
The addr and addrlen parameters point to data that the kernel fills in with the address of the remote (client) end of the connection. Initially, addrlen should point to an integer containing the size of the buffer addr points to. accept() returns a file descriptor, or less than zero if an error occurs, just like open().
Like servers, clients may bind() the local address to the socket immediately after creating it. Usually, the client does not care what the local address is and skips this step, allowing the kernel to assign it any convenient local address.
After the bind() step (which may be omitted), the client connects to a server with connect().
```c
#include <sys/socket.h>

int connect(int sock, struct sockaddr * servaddr, socklen_t addrlen);
```
The process passes to connect() the socket that is being connected, followed by the address to which the socket should be connected.
Figure 17.1 shows the system calls usually used to establish socket connections, and the order in which they occur.
After a connection has been established, applications can find the addresses for the remote and local ends of a socket by using getpeername() and getsockname(), respectively.
```c
#include <sys/socket.h>

int getpeername(int s, struct sockaddr * addr, socklen_t * addrlen);
int getsockname(int s, struct sockaddr * addr, socklen_t * addrlen);
```
Both functions fill in the structures pointed to by their addr parameters with addresses for the connection used by socket s. The address for the remote side is returned by getpeername(), while getsockname() returns the address for the local part of the connection. For both functions, the integer pointed to by addrlen should be initialized to the amount of space pointed to by addr, and that integer is changed to the number of bytes in the address returned.
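The following sketch shows the addrlen convention in use. The helper names local_family() and peer_family() are hypothetical. The sketch stores the result in a struct sockaddr_storage, a type from <sys/socket.h> that this chapter has not introduced; we assume it here only because it is large enough for an address of any family.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the address family of the local end
   of socket s, or -1 on error. */
int local_family(int s) {
    struct sockaddr_storage addr;
    socklen_t length = sizeof(addr);  /* in: buffer size, out: address size */

    memset(&addr, 0, sizeof(addr));
    if (getsockname(s, (struct sockaddr *) &addr, &length) < 0)
        return -1;

    return addr.ss_family;
}

/* Hypothetical helper: the same query for the remote end. */
int peer_family(int s) {
    struct sockaddr_storage addr;
    socklen_t length = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    if (getpeername(s, (struct sockaddr *) &addr, &length) < 0)
        return -1;

    return addr.ss_family;
}
```

Note that length is reset before each call; reusing a value that an earlier call has already shrunk would tell the kernel the buffer is smaller than it really is.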
Unix domain sockets are the simplest protocol family available through the sockets API. They do not actually represent a network protocol; they can connect only to sockets on the same machine. Although this restricts their usefulness, they are used by many applications because they provide a flexible IPC mechanism. Their addresses are pathnames that are created in the file system when a socket is bound to the pathname. Socket files, which represent Unix domain addresses, can be examined with stat() but cannot be opened through open(); the socket API must be used instead.
The Unix domain provides both datagram and stream interfaces. The datagram interface is rarely used and is not discussed here. The stream interface, which is discussed here, is similar to named pipes. Unix domain sockets are not identical to named pipes, however.
When multiple processes open a named pipe, any of the processes may read a message sent through the pipe by another process. Each named pipe is like a bulletin board. When a process posts a message to the board, any other process (with sufficient permission) may take the message from the board.
Unix domain sockets are connection-oriented; each connection to the socket results in a new communication channel. The server, which may be handling many simultaneous connections, has a different file descriptor for each. This property makes Unix domain sockets much better suited to many IPC tasks than are named pipes. This is the primary reason they are used by many standard Linux services, including the X Window System and the system logger.
Addresses for Unix domain sockets are pathnames in the file system. If the file does not already exist, it is created as a socket-type file when a socket is bound to the pathname through bind(). If a file (even a socket) exists with the pathname being bound, bind() fails with EADDRINUSE. bind() sets the permissions of newly created socket files to 0666, as modified by the current umask.
To connect() to an existing socket, the process must have read and write permissions for the socket file.[10]
Unix domain socket addresses are passed through a struct sockaddr_un structure.
```c
#include <sys/socket.h>
#include <sys/un.h>

struct sockaddr_un {
    unsigned short sun_family;          /* AF_UNIX */
    char sun_path[UNIX_PATH_MAX];       /* pathname */
};
```
In the Linux 2.6.7 kernel, UNIX_PATH_MAX is 108, but that may change in future versions of the Linux kernel.
The first member, sun_family, must contain AF_UNIX to indicate that the structure contains a Unix domain address. The sun_path member holds the pathname to use for the connection. When the size of the address is passed to and from the socket-related system calls, the passed length should be the number of characters in the pathname plus the size of the sun_family member. The sun_path does not need to be '\0' terminated, although it usually is.
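The length rule just described can be wrapped in a helper that also guards against pathnames too long for sun_path. This is a sketch under our own assumptions: the name unix_address() is hypothetical, and returning 0 for an oversized pathname is simply a convention we chose.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills in *address for path and returns the
   length to pass to bind() or connect(). Returns 0 if path does not
   fit in sun_path along with a terminating '\0'. */
socklen_t unix_address(const char * path, struct sockaddr_un * address) {
    size_t pathLength = strlen(path);

    if (pathLength >= sizeof(address->sun_path))
        return 0;

    memset(address, 0, sizeof(*address));
    address->sun_family = AF_UNIX;
    memcpy(address->sun_path, path, pathLength + 1);

    /* number of characters in the pathname plus the size of the
       sun_family member */
    return sizeof(address->sun_family) + pathLength;
}
```

The bounds check matters because copying an unchecked pathname into sun_path with strcpy() overruns the structure when the pathname is longer than the array.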
Listening for a connection to be established on a Unix domain socket follows the procedure we described earlier: Create the socket, bind() an address to the socket, tell the system to listen() for connection attempts, and then accept() the connection.
Here is a simple server that repeatedly accepts connections on a Unix domain socket (the file sample-socket in the current directory) and reads all the data available from the socket, displaying it on standard output:
```c
/* userver.c */

/* Waits for a connection on the ./sample-socket Unix domain
   socket. Once a connection has been established, copy data
   from the socket to stdout until the other end closes the
   connection, and then wait for another connection to the
   socket. */

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include "sockutil.h"          /* some utility functions */

int main(void) {
    struct sockaddr_un address;
    int sock, conn;
    socklen_t addrLength;

    if ((sock = socket(PF_UNIX, SOCK_STREAM, 0)) < 0)
        die("socket");

    /* Remove any preexisting socket (or other file) */
    unlink("./sample-socket");

    address.sun_family = AF_UNIX;       /* Unix domain socket */
    strcpy(address.sun_path, "./sample-socket");

    /* The total length of the address includes the sun_family
       element */
    addrLength = sizeof(address.sun_family) +
                 strlen(address.sun_path);

    if (bind(sock, (struct sockaddr *) &address, addrLength))
        die("bind");

    if (listen(sock, 5))
        die("listen");

    /* accept() expects the size of the address buffer on entry */
    addrLength = sizeof(address);
    while ((conn = accept(sock, (struct sockaddr *) &address,
                          &addrLength)) >= 0) {
        printf("---- getting data\n");
        copyData(conn, 1);
        printf("---- done\n");
        close(conn);
        addrLength = sizeof(address);
    }

    if (conn < 0)
        die("accept");

    close(sock);
    return 0;
}
```
Although this program is small, it illustrates how to write a simple server process. This server is an iterative server because it handles one client at a time. Servers may also be written as concurrent servers, which handle multiple clients simultaneously.[11]
Notice the unlink() call before the socket is bound. Because bind() fails if the socket file already exists, this allows the program to be run more than once without requiring that the socket file be manually removed.
The server code typecasts the struct sockaddr_un pointer passed to both bind() and accept() to a (struct sockaddr *). All the various socket-related system calls are prototyped as taking a pointer to struct sockaddr; the typecast keeps the compiler from complaining about pointer type mismatches.
Connecting to a server through a Unix domain socket consists of creating a socket and calling connect() with the desired address. Once the socket is connected, it may be treated like any other file descriptor.
The following program connects to the same socket that the example server uses and copies its standard input to the server:
```c
/* uclient.c */

/* Connect to the ./sample-socket Unix domain socket, copy stdin
   into the socket, and then exit. */

#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include "sockutil.h"          /* some utility functions */

int main(void) {
    struct sockaddr_un address;
    int sock;
    socklen_t addrLength;

    if ((sock = socket(PF_UNIX, SOCK_STREAM, 0)) < 0)
        die("socket");

    address.sun_family = AF_UNIX;       /* Unix domain socket */
    strcpy(address.sun_path, "./sample-socket");

    /* The total length of the address includes the sun_family
       element */
    addrLength = sizeof(address.sun_family) +
                 strlen(address.sun_path);

    if (connect(sock, (struct sockaddr *) &address, addrLength))
        die("connect");

    copyData(0, sock);

    close(sock);

    return 0;
}
```
The client is not much different from the server. The only changes were replacing the bind(), listen(), accept() sequence with a single connect() call and copying a slightly different set of data.
The previous two example programs, one a server and the other a client, are designed to work together. Run the server from one terminal, then run the client from another terminal (but in the same directory). As you type lines into the client, they are sent through the socket to the server. When you exit the client, the server waits for another connection. You can transmit files through the socket by redirecting the input to the client program.
Because Unix domain sockets have some advantages over pipes (such as being full duplex), they are often used as an IPC mechanism. To facilitate this, the socketpair() system call was introduced.
```c
#include <sys/socket.h>

int socketpair(int domain, int type, int protocol, int sockfds[2]);
```
The first three parameters are the same as those passed to socket(). The final parameter, sockfds, is filled in by socketpair() with two file descriptors, one for each end of the socket. A sample application of socketpair() is shown on page 425.
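As a quick illustration of the full-duplex property, the sketch below sends one message in each direction across a single socket pair, all within one process. The function name duplex_demo() is hypothetical; a real application would normally fork() and give one descriptor to each process.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: sends "ping" from the first socket to the
   second, then "pong" back the other way. Returns 0 if both
   messages arrive intact and -1 otherwise. */
int duplex_demo(void) {
    int sockfds[2];
    char buf[8];
    int result = -1;

    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sockfds) < 0)
        return -1;

    if (write(sockfds[0], "ping", 4) == 4 &&
        read(sockfds[1], buf, sizeof(buf)) == 4 &&
        !memcmp(buf, "ping", 4) &&
        write(sockfds[1], "pong", 4) == 4 &&
        read(sockfds[0], buf, sizeof(buf)) == 4 &&
        !memcmp(buf, "pong", 4))
        result = 0;

    close(sockfds[0]);
    close(sockfds[1]);
    return result;
}
```

A pipe created with pipe() would need two separate pipes to carry the reply, because each pipe moves data in one direction only.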
Unix domain sockets have a unique ability: File descriptors can be passed through them. No other IPC mechanism supports this facility. It allows a process to open a file and pass the file descriptor to another—possibly unrelated—process. All the access checks are done when the file is opened, so the receiving process gains the same access rights to the file as the original process.
File descriptors are passed as part of a more complicated message that is sent using the sendmsg() system call and received using recvmsg().
```c
#include <sys/socket.h>

ssize_t sendmsg(int fd, const struct msghdr * msg, int flags);
ssize_t recvmsg(int fd, struct msghdr * msg, int flags);
```
The fd parameter is the file descriptor through which the message is transmitted; the second parameter is a pointer to a structure describing the message. The flags are not usually used and should be set to zero for most applications. More advanced network programming books discuss the available flags [Stevens, 2004].
A message is described by the following structure:
```c
#include <sys/socket.h>
#include <sys/un.h>

struct msghdr {
    void * msg_name;                /* optional address */
    unsigned int msg_namelen;       /* size of msg_name */
    struct iovec * msg_iov;         /* scatter/gather array */
    unsigned int msg_iovlen;        /* number of elements in msg_iov */
    void * msg_control;             /* ancillary data */
    unsigned int msg_controllen;    /* ancillary data buffer len */
    int msg_flags;                  /* flags on received message */
};
```
The first two members, msg_name and msg_namelen, are not used with stream protocols. Applications that send messages across stream sockets should set msg_name to NULL and msg_namelen to zero.
msg_iov and msg_iovlen describe a set of buffers that are sent or received. Scatter/gather reads and writes, as well as struct iovec, are discussed on pages 290-291. The final member of the structure, msg_flags, is not currently used and should be set to zero.
The two members we skipped over, msg_control and msg_controllen, provide the file descriptor passing ability. The msg_control member points to an array of control message headers; msg_controllen specifies how many bytes the array contains. Each control message consists of a struct cmsghdr followed by extra data.
```c
#include <sys/socket.h>

struct cmsghdr {
    unsigned int cmsg_len;      /* length of control message */
    int cmsg_level;             /* SOL_SOCKET */
    int cmsg_type;              /* SCM_RIGHTS */
    int cmsg_data[0];           /* file descriptor goes here */
};
```
The size of the control message, including the header, is stored in cmsg_len. The only type of control message currently defined is SCM_RIGHTS, which passes file descriptors.[12] For this message type, cmsg_level and cmsg_type must be set to SOL_SOCKET and SCM_RIGHTS, respectively. The final member, cmsg_data, is an array of size zero. This is a gcc extension that allows an application to copy data to the end of the structure (see the following program for an example of this).
Receiving a file descriptor is similar. Enough buffer space must be left for the control message, and a new file descriptor follows each struct cmsghdr that arrives.
To illustrate the use of these nested structures, we wrote an example program that is a fancy cat. It takes a file name as its sole argument, opens the specified file in a child process, and passes the resulting file descriptor to the parent through a Unix domain socket. The parent then copies the file to standard output. The file name is sent along with the file descriptor for illustrative purposes.
```c
/* passfd.c */

/* We behave like a simple /bin/cat, which only handles one
   argument (a file name). We create Unix domain sockets through
   socketpair(), and then fork(). The child opens the file whose
   name is passed on the command line, passes the file descriptor
   and file name back to the parent, and then exits. The parent
   waits for the file descriptor from the child, then copies data
   from that file descriptor to stdout until no data is left. The
   parent then exits. */

#include <alloca.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

#include "sockutil.h"          /* simple utility functions */

/* The child process. This sends the file descriptor. */
int childProcess(char * filename, int sock) {
    int fd;
    struct iovec vector;        /* some data to pass w/ the fd */
    struct msghdr msg;          /* the complete message */
    struct cmsghdr * cmsg;      /* the control message, which */
                                /* will include the fd */

    /* Open the file whose descriptor will be passed. */
    if ((fd = open(filename, O_RDONLY)) < 0) {
        perror("open");
        return 1;
    }

    /* Send the file name down the socket, including the trailing
       '\0' */
    vector.iov_base = filename;
    vector.iov_len = strlen(filename) + 1;

    /* Put together the first part of the message. Include the
       file name iovec */
    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    msg.msg_iov = &vector;
    msg.msg_iovlen = 1;
    msg.msg_flags = 0;

    /* Now for the control message. We have to allocate room for
       the file descriptor. */
    cmsg = alloca(sizeof(struct cmsghdr) + sizeof(fd));
    cmsg->cmsg_len = sizeof(struct cmsghdr) + sizeof(fd);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;

    /* copy the file descriptor onto the end of the control
       message */
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

    msg.msg_control = cmsg;
    msg.msg_controllen = cmsg->cmsg_len;

    if (sendmsg(sock, &msg, 0) != (ssize_t) vector.iov_len)
        die("sendmsg");

    return 0;
}

/* The parent process. This receives the file descriptor. */
int parentProcess(int sock) {
    char buf[80];               /* space to read file name into */
    struct iovec vector;        /* file name from the child */
    struct msghdr msg;          /* full message */
    struct cmsghdr * cmsg;      /* control message with the fd */
    int fd;

    /* set up the iovec for the file name */
    vector.iov_base = buf;
    vector.iov_len = 80;

    /* the message we're expecting to receive */

    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    msg.msg_iov = &vector;
    msg.msg_iovlen = 1;
    msg.msg_flags = 0;

    /* dynamically allocate so we can leave room for the file
       descriptor */
    cmsg = alloca(sizeof(struct cmsghdr) + sizeof(fd));
    cmsg->cmsg_len = sizeof(struct cmsghdr) + sizeof(fd);
    msg.msg_control = cmsg;
    msg.msg_controllen = cmsg->cmsg_len;

    /* a return of zero or less means nothing was received */
    if (recvmsg(sock, &msg, 0) <= 0)
        return 1;

    printf("got file descriptor for '%s'\n",
           (char *) vector.iov_base);

    /* grab the file descriptor from the control structure */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));

    copyData(fd, 1);

    return 0;
}

int main(int argc, char ** argv) {
    int socks[2];
    int status;

    if (argc != 2) {
        fprintf(stderr, "only a single argument is supported\n");
        return 1;
    }

    /* Create the sockets. The first is for the parent and the
       second is for the child (though we could reverse that
       if we liked). */
    if (socketpair(PF_UNIX, SOCK_STREAM, 0, socks))
        die("socketpair");

    if (!fork()) {
        /* child */
        close(socks[0]);
        return childProcess(argv[1], socks[1]);
    }

    /* parent */
    close(socks[1]);
    parentProcess(socks[0]);

    /* reap the child */
    wait(&status);

    if (WEXITSTATUS(status))
        fprintf(stderr, "child failed\n");

    return 0;
}
```
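The program above sizes the control message by hand. As an alternative sketch, and not the approach the program takes, the same exchange can be written with the CMSG_FIRSTHDR(), CMSG_LEN(), and CMSG_SPACE() macros from <sys/socket.h>, which account for any alignment padding between the header and the data. The names send_fd(), recv_fd(), fd_pass_demo(), and union fd_control are our own.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor; the union forces
   the alignment that struct cmsghdr requires. */
union fd_control {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Sends fd over the Unix domain socket sock together with one byte
   of ordinary data. Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd) {
    char byte = 'F';
    struct iovec vector;
    union fd_control control;
    struct msghdr msg;
    struct cmsghdr * cmsg;

    vector.iov_base = &byte;
    vector.iov_len = 1;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &vector;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives a descriptor sent by send_fd(). Returns the new
   descriptor, or -1 if none arrived. */
int recv_fd(int sock) {
    char byte;
    struct iovec vector;
    union fd_control control;
    struct msghdr msg;
    struct cmsghdr * cmsg;
    int fd;

    vector.iov_base = &byte;
    vector.iov_len = 1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &vector;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(fd)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}

/* Passes the write end of a pipe across a socketpair() and writes
   through the received copy. Returns 0 if the byte comes out of the
   pipe's read end. Descriptors are left open for brevity. */
int fd_pass_demo(void) {
    int socks[2], pipefds[2], passed;
    char c = 0;

    if (socketpair(PF_UNIX, SOCK_STREAM, 0, socks) || pipe(pipefds))
        return -1;
    if (send_fd(socks[0], pipefds[1]) || (passed = recv_fd(socks[1])) < 0)
        return -1;
    if (write(passed, "x", 1) != 1 || read(pipefds[0], &c, 1) != 1)
        return -1;
    return c == 'x' ? 0 : -1;
}
```

The received descriptor is a new descriptor number that refers to the same open file as the one that was sent, which is why writing through it makes data appear at the pipe's read end.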
The primary use for sockets is to allow applications running on different machines to talk to one another. The TCP/IP protocol family [Stevens, 1994] is the protocol used on the Internet, the largest set of networked computers in the world. Linux provides a complete, robust TCP/IP implementation that allows it to act as both a TCP/IP server and client.
The most widely deployed version of TCP/IP is version 4 (IPv4). Version 6 of TCP/IP (IPv6) has become available for most operating systems and network infrastructure products, although IPv4 is still dominant. We concentrate here on writing applications for IPv4, but we touch on the differences for IPv6 applications, as well as for programs that need to support both.
TCP/IP networks are usually heterogeneous; they include a wide variety of machines and architectures. One of the most common differences between architectures is how they store numbers.
Computer numbers are made up of a sequence of bytes. C integers are commonly 4 bytes (32 bits), for example. There are quite a few ways of storing those four bytes in memory. Big-endian architectures store the most significant byte at the lowest hardware address, and the other bytes follow in order from most significant to least significant. Little-endian machines store multibyte values in exactly the opposite order: The least significant byte is stored at the smallest memory address. Other machines store bytes in different orders yet.
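To make the difference concrete, here is a minimal sketch that inspects how the host stores a 4-byte value. The function name host_is_little_endian() is hypothetical, chosen only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte at the
   lowest address (little-endian), 0 otherwise. */
int host_is_little_endian(void) {
    uint32_t value = 0x01020304;
    unsigned char bytes[sizeof(value)];

    /* copy the in-memory representation so we can look at each byte */
    memcpy(bytes, &value, sizeof(value));

    /* 0x04 is the least significant byte of value */
    return bytes[0] == 0x04;
}
```

On a big-endian machine bytes[0] holds 0x01, the most significant byte, so the function returns 0.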
Because multibyte quantities are needed as part of the TCP/IP protocol, the protocol designers adopted a single standard for how multibyte values are sent across the network.[13] TCP/IP mandates that big-endian byte order be used for transmitting protocol information and suggests that it be used for application data as well (although no attempt is made to enforce the format of an application’s data stream).[14] The ordering used for multibyte values sent across the network is known as the network byte order.
Four functions are available for converting between host byte order and network byte order:
#include <netinet/in.h>

unsigned int htonl(unsigned int hostlong);
unsigned short htons(unsigned short hostshort);
unsigned int ntohl(unsigned int netlong);
unsigned short ntohs(unsigned short netshort);
Although each of these functions is prototyped for unsigned quantities, they all work fine for signed quantities, as well.
The first two functions, htonl() and htons(), convert longs and shorts, respectively, from host order to network order. The final two, ntohl() and ntohs(), convert longs and shorts from network order to the host byte ordering.
Although we use the term long in the descriptions, that is a misnomer. htonl() and ntohl() both expect 32-bit quantities, not values that are C longs. We prototype both functions as manipulating int values, as all Linux platforms currently use 32-bit integers.
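As a small sketch of how these conversions are typically used, the helpers below store a 32-bit value into a byte buffer in network byte order and read it back. The names put_u32() and get_u32() are hypothetical, and the fixed-width uint32_t type is used here in place of the int prototypes shown above.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Store host_value into buf (at least 4 bytes) in network byte order. */
void put_u32(unsigned char * buf, uint32_t host_value) {
    uint32_t net_value = htonl(host_value);

    memcpy(buf, &net_value, sizeof(net_value));
}

/* Read a 32-bit value stored in network byte order and return it in
   host byte order. */
uint32_t get_u32(const unsigned char * buf) {
    uint32_t net_value;

    memcpy(&net_value, buf, sizeof(net_value));
    return ntohl(net_value);
}
```

Whatever the host's own byte order, put_u32(buf, 0x01020304) leaves the bytes 0x01, 0x02, 0x03, 0x04 in the buffer, and get_u32() returns the original value.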
An IPv4 connection is identified by a 4-tuple of (local host, local port, remote host, remote port). Each part of the tuple must be determined before a connection can be established. Local host and remote host are each IPv4 addresses. IPv4 addresses are 32-bit (4-byte) numbers unique across the entire connected network. Usually they are written as aaa.bbb.ccc.ddd, with each element in the address being the decimal representation of one of the bytes in the machine’s address. The left-most number in the address corresponds to the most significant byte in the address. This format for IPv4 addresses is known as dotted-decimal notation.
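The relationship between the 32-bit number and its dotted-decimal form can be sketched as follows. The function name format_dotted_decimal() is hypothetical, and the address is assumed to be supplied in host byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the dotted-decimal form of addr (a 32-bit IPv4 address in
   host byte order) into out, which holds out_len bytes. Sixteen
   bytes is enough for any address plus the terminating NUL. */
void format_dotted_decimal(uint32_t addr, char * out, size_t out_len) {
    snprintf(out, out_len, "%u.%u.%u.%u",
             (unsigned) ((addr >> 24) & 0xff),  /* most significant */
             (unsigned) ((addr >> 16) & 0xff),
             (unsigned) ((addr >> 8) & 0xff),
             (unsigned) (addr & 0xff));         /* least significant */
}
```

For example, the address 0xC0A8010A is written as 192.168.1.10, because 0xC0 is 192, 0xA8 is 168, 0x01 is 1, and 0x0A is 10.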
As most machines need to run multiple concurrent TCP/IP applications, an IP number does not provide a unique identification for a connection on a single machine. Port numbers are 16-bit numbers that uniquely identify one endpoint of a connection on a single host. The combination of an IPv4 address and a port number identifies a connection endpoint anywhere on a single TCP/IP network (the Internet is a single TCP/IP network). Two connection endpoints form a TCP connection, so two IP number/port number pairs uniquely identify a TCP/IP connection on a network.
Which port numbers to use for various protocols is determined by a part of the Internet standards known as well-known port numbers, maintained by the Internet Assigned Numbers Authority (IANA).[15] Common Internet protocols, such as ftp, telnet, and http, are each assigned a port number. Most servers provide those services at the assigned numbers, making them easy to find. Some servers are run at alternate port numbers, usually to allow multiple services to be provided by a single machine.[16] As well-known port numbers do not change, Linux uses a simple mapping between protocol names (commonly called services) and port numbers through the /etc/services file.
Although port numbers range from 0 to 65,535, Linux divides them into two classes. The reserved ports, numbered from 0 to 1,023, may be used only by processes running as root; the remaining ports are available to any process. This allows client programs to trust that a program running on a server's reserved port is not a Trojan horse started by an ordinary user.[17]
IPv4 addresses are stored in struct sockaddr_in, which is defined as follows:

#include <sys/socket.h>
#include <netinet/in.h>

struct sockaddr_in {
    short int sin_family;           /* AF_INET */
    unsigned short int sin_port;    /* port number */
    struct in_addr sin_addr;        /* IP address */
};

The first member must be AF_INET, indicating that this is an IP address. The next member is the port number in network byte order. The final member is the IP number of the machine for this TCP address. The IP number, stored in sin_addr, should be treated as an opaque type and not accessed directly.
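A minimal sketch of filling in this structure follows. The helper name make_ipv4_address() is hypothetical, and the use of the standard inet_pton() function to convert dotted-decimal text is an assumption of this example rather than something the text above prescribes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill in addr from a dotted-decimal string and a port number given
   in host byte order. Returns 0 on success, -1 if dotted is not a
   valid IPv4 address. */
int make_ipv4_address(struct sockaddr_in * addr, const char * dotted,
                      unsigned short port) {
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);   /* must be in network byte order */

    /* let the library fill in sin_addr rather than poking at it */
    if (inet_pton(AF_INET, dotted, &addr->sin_addr) != 1)
        return -1;

    return 0;
}
```

A call such as make_ipv4_address(&addr, "127.0.0.1", 80) produces an address suitable for passing to bind() or connect() after a cast to struct sockaddr *.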
If either sin_port or sin_addr is filled with