Handling failure

All sorts of exceptions can occur while executing a Celery task. In the absence of a well-defined exception handling and retry mechanism, they can go undetected. Often, the cause of a job failure is temporary, such as an unresponsive API (which is beyond our control) or the worker running out of memory. In such cases, it is better to wait and retry the task.

In Celery, you can choose to retry automatically or manually. Celery makes it easy to fine-tune its automatic retry mechanism. In the following example, we specify multiple retry parameters:

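from celery import shared_task

# GatewayError is assumed to be an exception class defined in your application.
@shared_task(autoretry_for=(GatewayError,),
             retry_backoff=60,
             retry_kwargs={'max_retries': 5},
             retry_jitter=True)
def fetch_feed(feed_id):
    ...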

The autoretry_for argument is a tuple of the exceptions for which Celery should automatically retry. In this case, it is just the GatewayError exception. You may also pass the base Exception class here to retry automatically on all exceptions.

The retry_backoff argument specifies the wait period before the first retry, that is, 60 seconds. Each time a retry fails, the waiting period doubles, so it becomes 120, 240, and then 480 seconds, and so on until the maximum retry limit of 5 is reached. Celery also caps each waiting period at retry_backoff_max, which defaults to 600 seconds, so the fifth wait would be 600 rather than 960 seconds.

This technique of waiting longer and longer for a retry is called exponential backoff. This is ideal for interacting with an external server as we are giving it sufficient time to recover in case of a server overload.
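To make the arithmetic concrete, here is a minimal sketch of the doubling schedule in plain Python. It does not call Celery; backoff_schedule is a hypothetical helper written for illustration, and its default cap of 600 seconds is an assumption that mirrors Celery's default retry_backoff_max. Jitter is ignored here.

```python
def backoff_schedule(factor, max_retries, maximum=600):
    """Return the un-jittered delay, in seconds, before each retry."""
    # Retry n (counting from 0) waits factor * 2**n seconds, capped at `maximum`.
    return [min(maximum, factor * 2 ** n) for n in range(max_retries)]


print(backoff_schedule(60, 5))  # [60, 120, 240, 480, 600]
```

Without the cap, the fifth delay would be 960 seconds; the cap keeps a long run of failures from pushing a retry arbitrarily far into the future.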

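A random jitter is added to avoid the problem of thundering herds. If a large number of tasks have the same retry pattern and request a resource at the same time, they can overwhelm it and make it unusable.

Hence, the waiting period is randomized so that such collisions do not occur. With retry_jitter=True, Celery treats the computed backoff as a maximum and picks the actual delay at random between zero and that maximum.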
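The effect of jitter can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Celery's own code: jittered_delay and its parameters are hypothetical names, and the sketch assumes the "full jitter" behavior in which the computed backoff is an upper bound and the actual delay is drawn uniformly between zero and that bound.

```python
import random


def jittered_delay(factor, retries, maximum=600, rng=random):
    """Pick a random delay between 0 and the capped exponential backoff."""
    # Upper bound for this attempt: factor * 2**retries, capped at `maximum`.
    ceiling = min(maximum, factor * 2 ** retries)
    # randrange excludes its stop value, so add 1 to make the ceiling reachable.
    return rng.randrange(ceiling + 1)


# Three tasks that all failed twice no longer retry at the same instant.
rng = random.Random(42)  # seeded only to make the demonstration repeatable
delays = [jittered_delay(60, 2, rng=rng) for _ in range(3)]
print(delays)  # three values, each between 0 and 240 seconds
```

Because each task draws its own delay, retries that would otherwise arrive together are spread across the whole interval.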

Here's an example of manually retrying in case of an exception:

from celery import shared_task

@shared_task(bind=True)
def fetch_feed(self, feed_id):
    ...
    try:
        ...
    except GatewayError as exc:
        # Re-queue the task; exc is re-raised once the retries are exhausted
        raise self.retry(exc=exc)

Note the bind=True argument to the task decorator and the new self argument to the task function, which will be the task instance. If an exception occurs, you can call the self.retry method to attempt a retry manually. The exc argument passes along the original exception, which is recorded in the logs and re-raised if the retry limit is exceeded.

Last but not least, ensure that you log all your exceptions. You can use the standard Python logging module or the print function (which will be redirected to logs) for this. Use a tool such as Sentry to track and automate error handling.
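As a rough sketch of the logging advice, the following uses only the standard logging module. The names GatewayError, fetch, and fetch_feed_logged are hypothetical stand-ins defined here so that the example runs on its own; inside a real Celery task you would more likely obtain the logger from get_task_logger in celery.utils.log.

```python
import logging

logger = logging.getLogger(__name__)


class GatewayError(Exception):
    """Hypothetical stand-in for the exception used in this section."""


def fetch(feed_id):
    # Hypothetical helper that always fails, to exercise the error path.
    raise GatewayError(f"feed {feed_id} is unavailable")


def fetch_feed_logged(feed_id):
    try:
        return fetch(feed_id)
    except GatewayError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Fetching feed %s failed", feed_id)
        # Re-raise so the caller (or a retry mechanism) still sees the failure.
        raise
```

Logging and then re-raising keeps the failure visible in the logs without hiding it from whatever retry policy wraps the task.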
