An echo protocol

Before we write our first client and server programs, we need to decide how they are going to interact with each other; that is, we need to design a protocol for their communication.

Our echo server should listen until a client connects and sends a bytes string, and then echo that string back to the client. We only need a few basic rules for this:

  1. Communication will take place over TCP.
  2. The client will initiate an echo session by creating a socket connection to the server.
  3. The server will accept the connection and listen for the client to send a bytes string.
  4. The client will send a bytes string to the server.
  5. Once it sends the bytes string, the client will listen for a reply from the server.
  6. When it receives the bytes string from the client, the server will send the bytes string back to the client.
  7. When the client has received the bytes string from the server, it will close its socket to end the session.

These steps are straightforward enough. The missing element here is how the server and the client will know when a complete message has been sent. Remember that an application sees a TCP connection as an endless stream of bytes, so we need to decide what in that byte stream will signal the end of a message.

Framing

This problem is called framing, and there are several approaches that we can take to handle it. The main ones are described here:

  1. Make it a protocol rule that only one message will be sent per connection, and once a message has been sent, the sender will immediately close the socket.
  2. Use fixed-length messages. The receiver will read that fixed number of bytes and know that it has the whole message.
  3. Prefix the message with the length of the message. The receiver will read the length of the message from the stream first, then it will read the indicated number of bytes to get the rest of the message.
  4. Use special character delimiters for indicating the end of a message. The receiver will scan the incoming stream for a delimiter, and the message comprises everything up to the delimiter.

Option 1 is a good choice for very simple protocols. It's easy to implement and it doesn't require any special handling of the received stream. However, it requires the setting up and tearing down of a socket for every message, and this can impact performance when a server is handling many messages at once.
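As a rough illustration of option 1, the receiver can simply keep reading until recv() returns an empty bytes string, which is how a closed connection shows up to the reader. The function names here (recv_until_closed, demo_one_per_connection) are made up for this sketch, and socket.socketpair() stands in for a real client/server connection:

```python
import socket

def recv_until_closed(sock, bufsize=4096):
    """Read until the peer closes its end; all bytes read form one message."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b'' means the sender has closed the connection
            break
        chunks.append(chunk)
    return b''.join(chunks)

def demo_one_per_connection(payload=b'hello'):
    """Hypothetical demo: send one message, close, and read it back."""
    sender, receiver = socket.socketpair()  # stand-in for a TCP connection
    sender.sendall(payload)
    sender.close()  # closing the socket marks the end of the message
    try:
        return recv_until_closed(receiver)
    finally:
        receiver.close()
```

Note that no parsing of the received data is needed; the cost is one connection per message.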

Option 2 is also simple to implement, but it only makes efficient use of the network when our data comes in neat, fixed-length blocks. In a chat server, for example, message lengths vary, so we would have to use a special character, such as the null byte, to pad messages to the block size. This only works when we know for sure that the padding character will never appear in the actual message data. There is also the additional issue of how to handle messages longer than the block length.
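A minimal sketch of the padding idea for option 2 follows. The block size of 16 and the helper names pad_message and unpad_message are assumptions chosen for illustration; the sketch simply rejects messages that are too long or that contain the padding byte, which are the two problems described above:

```python
BLOCK_SIZE = 16  # hypothetical fixed message length, in bytes

def pad_message(data, block_size=BLOCK_SIZE, pad=b'\0'):
    """Pad data with the pad byte so it fills exactly one block."""
    if len(data) > block_size:
        raise ValueError('message is longer than the block size')
    if pad in data:
        raise ValueError('padding byte appears in the message data')
    return data.ljust(block_size, pad)

def unpad_message(block, pad=b'\0'):
    """Strip the trailing padding from a received block."""
    return block.rstrip(pad)
```

Every message costs a full block on the wire, however short the actual data is.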

Option 3 is usually considered one of the best approaches. Although it can be more complex to code than the other options, the implementations are still reasonably straightforward, and it makes efficient use of bandwidth. The overhead imposed by including the length of each message is usually small compared to the message length. It also avoids the need for any additional processing of the received data, which may be needed by certain implementations of option 4.
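To make option 3 concrete, here is one possible sketch of length-prefixing. The 4-byte big-endian header is an assumption made for this example (other widths work equally well), and the helper names recv_exact, send_prefixed and recv_prefixed are hypothetical:

```python
import socket
import struct

HEADER = struct.Struct('!I')  # assumed header: 4-byte unsigned int, network order

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed mid-message')
        buf.extend(chunk)
    return bytes(buf)

def send_prefixed(sock, payload):
    """Send the payload length, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_prefixed(sock):
    """Read the length header, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Because the receiver always knows how many bytes to expect, the payload can contain any byte values at all, and several messages can share one connection.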

Option 4 is the most bandwidth-efficient option, and is a good choice when we know that only a limited set of characters, such as the ASCII alphanumeric characters, will be used in messages. If this is the case, then we can choose a delimiter character, such as the null byte, which will never appear in the message data, and then the received data can be easily broken into messages as this character is encountered. Implementations are usually simpler than option 3. Although it is possible to employ this method for arbitrary data, that is, where the delimiter could also appear as a valid character in a message, this requires the use of character escaping, which needs an additional round of processing of the data. Hence, in these situations, it's usually simpler to use length-prefixing.
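The scanning step in option 4 can be sketched as a small function that splits whatever has been received so far into complete messages plus a leftover fragment. The name split_messages is made up for this example, and it assumes the delimiter never occurs inside a message, so no escaping is handled:

```python
def split_messages(buffer, delimiter=b'\0'):
    """Split received bytes into complete messages and an unfinished remainder.

    Everything after the last delimiter is an incomplete message and should
    be kept until more data arrives.
    """
    *messages, rest = buffer.split(delimiter)
    return messages, rest
```

For example, split_messages(b'one\0two\0thr') returns ([b'one', b'two'], b'thr'): two complete messages, with b'thr' held back until its delimiter arrives.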

For our echo and chat applications, we'll be using the UTF-8 character set to send messages. In UTF-8, the byte 0x00 never appears in the encoding of any character other than the null character itself, so as long as our messages don't contain the null character, it makes a good delimiter. Thus, we'll be using option 4 with the null byte as the delimiter to frame our messages.

So, our rule number 8 will become:

Messages will be encoded in the UTF-8 character set for transmission, and they will be terminated by the null byte.
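As a rough sketch of what this rule implies in code (not necessarily how the final programs will be written), a sender encodes the text and appends the null byte, and a receiver reads until the data ends with the null byte. The names send_msg and recv_msg are chosen for this illustration, and the receiver assumes only one message is in flight at a time, as in the echo exchange described above:

```python
import socket

def send_msg(sock, text):
    """Encode text as UTF-8, append the null terminator, and send it."""
    sock.sendall(text.encode('utf-8') + b'\0')

def recv_msg(sock, bufsize=4096):
    """Read until a null terminator arrives, then decode the message.

    Assumes the peer sends a single message and then waits for a reply.
    """
    data = bytearray()
    while not data.endswith(b'\0'):
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError('socket closed before message was complete')
        data.extend(chunk)
    return data[:-1].decode('utf-8')  # drop the terminator before decoding
```

Decoding only after the terminator has arrived avoids trying to decode a multi-byte character that has been split across two recv() calls.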

Now, let's write our echo programs.
