Using a process to add a timeout to a function

Most, if not all, libraries that expose functions for making HTTP requests let you specify a timeout. If the request hasn't completed after X seconds (X being the timeout), the whole operation is aborted and execution resumes from the next instruction. Not all functions expose this feature, though, so when a function doesn't provide a way to be interrupted, we can use a process to simulate that behavior. In this example, we'll be trying to translate a hostname into an IPv4 address.

The gethostbyname function, from the socket module, doesn't allow us to put a timeout on the operation, so we use a process to do that artificially. The code that follows might not be so straightforward, so I encourage you to spend some time going through it before you read on for the explanation:

# hostres/util.py
import socket
from multiprocessing import Process, Queue

def resolve(hostname, timeout=5):
    exitcode, ip = resolve_host(hostname, timeout)
    if exitcode == 0:
        return ip
    else:
        # Fall back to the hostname when the lookup failed or timed out
        return hostname

def resolve_host(hostname, timeout):
    queue = Queue()
    proc = Process(target=gethostbyname, args=(hostname, queue))
    proc.start()
    # Wait for the process to finish, but no longer than timeout seconds
    proc.join(timeout=timeout)

    if queue.empty():
        # No result arrived in time: stop the process
        proc.terminate()
        ip = None
    else:
        ip = queue.get()
    return proc.exitcode, ip

def gethostbyname(hostname, queue):
    # Runs in the child process and hands the result back via the queue
    ip = socket.gethostbyname(hostname)
    queue.put(ip)

Let's start from resolve. It simply takes a hostname and a timeout, and calls resolve_host with them. If the exit code is 0 (which means the process terminated correctly), it returns the IPv4 address that corresponds to that host. Otherwise, it returns the hostname itself, as a fallback mechanism.
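To make the exit code check more concrete, here is a minimal sketch of what Process.exitcode reports in the two situations that matter here. The quick, stuck, and exit_code_after names are hypothetical helpers that are not part of hostres, and time.sleep merely stands in for a call that hangs. The sketch also assumes that preferring the fork start method is acceptable, so that it runs as a plain script; where fork is unavailable, the calls would need to sit under an if __name__ == "__main__": guard.

```python
import multiprocessing as mp
import time

# Assumption: prefer the "fork" start method where it exists, so that the
# sketch runs as a plain script without a __main__ guard.
ctx = (
    mp.get_context("fork")
    if "fork" in mp.get_all_start_methods()
    else mp.get_context()
)


def quick():
    """Return immediately, so the process exits cleanly."""


def stuck():
    """Stand-in for a call that hangs far longer than we want to wait."""
    time.sleep(60)


def exit_code_after(target, timeout):
    """Run target in a process, wait up to timeout seconds, return its exit code."""
    proc = ctx.Process(target=target)
    proc.start()
    proc.join(timeout=timeout)
    if proc.is_alive():
        proc.terminate()
        # Wait for the termination to complete, so that exitcode is populated
        proc.join()
    return proc.exitcode
```

Calling exit_code_after(quick, 5) gives 0, while exit_code_after(stuck, 0.2) gives a non-zero value, because the process was terminated instead of finishing on its own. While a process is still running, exitcode is None, which is also different from 0.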

Next, let's talk about gethostbyname. It takes a hostname and a queue, and calls socket.gethostbyname to resolve the hostname. When the result is available, it is put into the queue. Now, this is where the issue lies. If the call to socket.gethostbyname takes longer than the timeout we want to assign, we need to kill it.

The resolve_host function does exactly this. It receives the hostname and the timeout, and, at first, it simply creates a queue. Then it spawns a new process that takes gethostbyname as the target, and passes the appropriate arguments. Then the process is started and joined on, but with a timeout.

Now, the successful scenario is this: the call to socket.gethostbyname succeeds quickly, the IP is in the queue, the process terminates well before the timeout expires, and when we get to the if block, the queue is not empty. We fetch the IP from it, and return it, alongside the process exit code.

In the unsuccessful scenario, the call to socket.gethostbyname takes too long, so the join returns when the timeout expires, while the process is still running. Because the call hasn't completed, no IP has been inserted in the queue, and therefore it will be empty. In the if block, we therefore terminate the process, set the IP to None, and return as before. The resolve function will find that the exit code is not 0 (as the process didn't terminate happily, but was killed instead), and will correctly return the hostname instead of the IP, which we couldn't get anyway.
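Both scenarios can be reproduced without touching the network. The following is a rough sketch that generalizes the same pattern to any function: call_with_timeout and _worker are hypothetical names that do not appear in hostres, and time.sleep plays the part of the slow call. As an assumption, the sketch prefers the fork start method where it exists, so that it runs as a plain script. Unlike resolve_host, it also joins the process after terminating it, so that the exit code is populated when it returns.

```python
import multiprocessing as mp
import time

# Assumption: prefer the "fork" start method where it exists, so that the
# sketch runs as a plain script without a __main__ guard.
ctx = (
    mp.get_context("fork")
    if "fork" in mp.get_all_start_methods()
    else mp.get_context()
)


def _worker(func, args, queue):
    """Run func in the child process and hand its result back via the queue."""
    queue.put(func(*args))


def call_with_timeout(func, args=(), timeout=5):
    """Return (exitcode, result); result is None if func did not finish in time."""
    queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(func, args, queue))
    proc.start()
    # Wait for the child, but no longer than timeout seconds
    proc.join(timeout=timeout)

    if queue.empty():
        # Nothing arrived in time: stop the child and wait for it to go away
        proc.terminate()
        proc.join()
        return proc.exitcode, None
    return proc.exitcode, queue.get()
```

With these definitions, call_with_timeout(pow, (2, 10)) returns an exit code of 0 together with the computed value, whereas call_with_timeout(time.sleep, (60,), timeout=0.2) gives up after a fraction of a second and returns a non-zero exit code and None. The sketch is meant for small results: a very large return value may not fit in the underlying pipe until the parent starts reading it.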

In the source code of the book, in the hostres folder of this chapter, I have added some tests to make sure this behavior is actually correct. You can find instructions on how to run them in the README.md file in the folder. Make sure you check the test code too; it should be quite interesting.
