Handling data on persistent connections

A new problem that our persistent connection approach raises is that we can no longer assume that the data returned by a socket.recv() call comes from only one message. In our echo server, because of how we have defined the protocol, we know that as soon as we see a null byte, the message that we have received is complete, and that the sender won't be sending anything further. That is, everything we read in the last socket.recv() call is a part of that message.

In our new setup, we'll be reusing the same connection to send an indefinite number of messages, and these won't be synchronized with the chunks of data that we will pull from each socket.recv(). Hence, it's quite possible that the data from one recv() call will contain data from multiple messages. For example, if we send the following:

caerphilly,
illchester,
brie

Then on the wire they will look like this:

caerphilly\0illchester\0brie\0

Due to the vagaries of network transmission though, a set of successive recv() calls may receive them as:

recv 1: caerphil
recv 2: ly\0illches
recv 3: ter\0brie\0

Notice that recv 1 and recv 2, taken together, contain a complete message, but they also contain the beginning of the next message. Clearly, we need to update our parsing. One option is to read data from the socket one byte at a time, that is, use recv(1), and check every byte to see if it's a null byte. This is a dismally inefficient way to use a network socket, though. We want to read as much data in each call to recv() as we can. Instead, when the received data ends partway through a message, we can cache the leftover bytes and prepend them to the data from our next call to recv(). Let's do this. Add these functions to the tincanchat.py file:

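def parse_recvd_data(data):
    """ Break up raw received data into messages, delimited
        by null byte """
    # Everything before a null byte is a complete message
    parts = data.split(b'\0')
    msgs = parts[:-1]
    # The final part is an incomplete message (empty if the
    # data ended on a delimiter)
    rest = parts[-1]
    return (msgs, rest)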

def recv_msgs(sock, data=bytes()):
    """ Receive data and break into complete messages on null byte
        delimiter. Block until at least one message received, then
        return received messages """
    msgs = []
    # Keep reading until the accumulated data holds a complete message
    while not msgs:
        recvd = sock.recv(4096)
        if not recvd:
            # An empty read means the other end closed the connection
            raise ConnectionError()
        data = data + recvd
        (msgs, rest) = parse_recvd_data(data)
    msgs = [msg.decode('utf-8') for msg in msgs]
    return (msgs, rest)

From now on, we'll be using recv_msgs() wherever we were using recv_msg() before. So, what are we doing here? Starting with a quick scan through recv_msgs(), you can see that it's similar to recv_msg(). We make repeated calls to recv() and accumulate the received data as before, but now we will be using parse_recvd_data() to parse it, with the expectation that it may contain multiple messages. When parse_recvd_data() finds one or more complete messages in the received data, it splits them into a list and returns them, and if there is anything left after the last complete message, then it additionally returns this using the rest variable. The recv_msgs() function then decodes the messages from UTF-8, and returns them and the rest variable.
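To make the splitting behaviour concrete, here is a minimal sketch that runs parse_recvd_data() on three sample byte strings. The function body is a condensed copy of the one above, and the sample inputs are illustrative only.

```python
def parse_recvd_data(data):
    """Split raw bytes into complete messages plus any trailing partial one."""
    parts = data.split(b'\0')
    return (parts[:-1], parts[-1])

# No delimiter yet: nothing is complete, everything is left over.
partial = parse_recvd_data(b'caerphil')
# One complete message followed by the start of the next.
one_and_a_bit = parse_recvd_data(b'caerphilly\0illches')
# Data ending on a delimiter: two complete messages, nothing left over.
two_complete = parse_recvd_data(b'illchester\0brie\0')

print(partial)        # ([], b'caerphil')
print(one_and_a_bit)  # ([b'caerphilly'], b'illches')
print(two_complete)   # ([b'illchester', b'brie'], b'')
```

Because bytes.split() always returns at least one element, the last element can safely be treated as the leftover, even when it is empty.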

The rest value is important because we will feed it back to recv_msgs() next time we call it, and it will be prefixed to the data from the recv() calls. In this way, the leftover data from the last recv_msgs() call won't be lost.
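The following sketch shows one way a caller might feed rest back in on each call. It is not code from tincanchat.py: the use of socket.socketpair() in place of a real client and server, the send_in_pieces() helper, and the short sleeps between sends are all assumptions made so that the example can run on its own.

```python
import socket
import threading
import time

def parse_recvd_data(data):
    parts = data.split(b'\0')
    return (parts[:-1], parts[-1])

def recv_msgs(sock, data=bytes()):
    msgs = []
    while not msgs:
        recvd = sock.recv(4096)
        if not recvd:
            raise ConnectionError()
        data = data + recvd
        (msgs, rest) = parse_recvd_data(data)
    msgs = [msg.decode('utf-8') for msg in msgs]
    return (msgs, rest)

def send_in_pieces(sock):
    """Hypothetical sender: writes three messages in awkward pieces."""
    for piece in (b'caerphil', b'ly\0illches', b'ter\0brie\0'):
        sock.sendall(piece)
        time.sleep(0.05)  # give the receiver a chance to see partial data
    sock.close()

# A connected pair of sockets stands in for a client and a server.
sender, receiver = socket.socketpair()
thread = threading.Thread(target=send_in_pieces, args=(sender,))
thread.start()

received = []
rest = bytes()  # nothing is left over before the first call
try:
    while True:
        # Pass in the leftover bytes from the previous call
        (msgs, rest) = recv_msgs(receiver, rest)
        received.extend(msgs)
except ConnectionError:
    pass  # the sender closed the connection
finally:
    receiver.close()
    thread.join()

print(received)  # ['caerphilly', 'illchester', 'brie']
```

However the operating system happens to divide the data between recv() calls, the messages come out whole and in order, because any partial message is carried over in rest.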

So, in our preceding example, parsing the messages would take place as shown here:

recv_msgs call   data argument   recv result      Accumulated data         msgs                     rest
1                -               'caerphil'       'caerphil'               []                       'caerphil'
1                -               'ly\0illches'    'caerphilly\0illches'    ['caerphilly']           'illches'
2                'illches'       'ter\0brie\0'    'illchester\0brie\0'     ['illchester', 'brie']   b''

Here, we can see that the first recv_msgs() call doesn't return after its first iteration. It loops again because msgs is still empty. This is why the recv_msgs call numbers are 1, 1, and 2.
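This walk-through can be reproduced without a network by using a stand-in for the socket. In the sketch below, FakeSocket is a hypothetical helper, not part of tincanchat.py or the standard library. It simply replays a fixed list of chunks from recv() and counts how many times it is called.

```python
def parse_recvd_data(data):
    parts = data.split(b'\0')
    return (parts[:-1], parts[-1])

def recv_msgs(sock, data=bytes()):
    msgs = []
    while not msgs:
        recvd = sock.recv(4096)
        if not recvd:
            raise ConnectionError()
        data = data + recvd
        (msgs, rest) = parse_recvd_data(data)
    msgs = [msg.decode('utf-8') for msg in msgs]
    return (msgs, rest)

class FakeSocket:
    """Hypothetical stand-in for a socket: recv() replays preset chunks."""
    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.recv_calls = 0

    def recv(self, bufsize):
        # bufsize is ignored; each call hands back the next preset chunk
        self.recv_calls += 1
        return self.chunks.pop(0) if self.chunks else b''

sock = FakeSocket([b'caerphil', b'ly\0illches', b'ter\0brie\0'])

# First call: loops twice, because the first chunk has no delimiter
(first_msgs, first_rest) = recv_msgs(sock)
calls_after_first = sock.recv_calls

# Second call: the leftover bytes are passed back in
(second_msgs, second_rest) = recv_msgs(sock, first_rest)
calls_after_second = sock.recv_calls

print(first_msgs, first_rest, calls_after_first)     # ['caerphilly'] b'illches' 2
print(second_msgs, second_rest, calls_after_second)  # ['illchester', 'brie'] b'' 3
```

The first recv_msgs() call consumes two chunks before it has a complete message to return, and the second call needs only one.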
