Retry pattern with transient failures

Transient failures are common in cloud-based architectures and are often caused by the nature of the network itself (loss of connectivity, request timeouts, and so on).

A common way to react to these failures is to resend the request to the target application. To do this, you need mechanisms that automatically retry a request when a failure occurs, optionally with an increasing delay between attempts. This is commonly known as the Retry pattern.

In the Retry pattern, when a request to a cloud service fails, the source application can react by taking one of the following actions:

Retry: The source application can immediately resend the request to the cloud service. This is common when the failure is classified as rare and the probability of success when repeating the request is very high.
Retry after a delay: The source application can resend the request to the cloud service after a period of time (which normally increases exponentially with each attempt). This is common practice when the failure has a cause such as the cloud service being busy.
Cancel: The source application can cancel the request to the cloud service and throw (or log) the error. This is common practice when the failure is not transient and there's a low probability that resending the request will succeed.
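
The choice between these three actions can be sketched as a small classification function. This is only an illustration: the RetryAction and FailureClassifier names are hypothetical, and the mapping from HTTP status codes to actions is an assumption that you should adapt to the service you are calling.

```csharp
using System;
using System.Net;

// Hypothetical names used for illustration; not part of any SDK.
public enum RetryAction { RetryImmediately, RetryAfterDelay, Cancel }

public static class FailureClassifier
{
    // Picks one of the three actions for a failed attempt.
    // 'attempt' is 1-based; 'maxAttempts' is the total number of tries allowed.
    public static RetryAction Classify(
        HttpStatusCode status, int attempt, int maxAttempts)
    {
        // No attempts left: give up regardless of the failure type.
        if (attempt >= maxAttempts)
            return RetryAction.Cancel;

        switch (status)
        {
            // Assumed to be a rare glitch: retry once right away, then back off.
            case HttpStatusCode.RequestTimeout:       // 408
                return attempt == 1
                    ? RetryAction.RetryImmediately
                    : RetryAction.RetryAfterDelay;

            // The service is busy or throttling: wait before retrying.
            case (HttpStatusCode)429:                 // Too Many Requests
            case HttpStatusCode.ServiceUnavailable:   // 503
            case HttpStatusCode.GatewayTimeout:       // 504
                return RetryAction.RetryAfterDelay;

            // Anything else (for example 400 or 404) is treated as non-transient.
            default:
                return RetryAction.Cancel;
        }
    }
}
```

For example, FailureClassifier.Classify(HttpStatusCode.ServiceUnavailable, 1, 3) returns RetryAction.RetryAfterDelay, while a 404 Not Found response is cancelled on the first attempt.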

The following diagram shows a schema of the Retry pattern:

The Azure client SDK provides automatic retries by default for many different Azure services. For more information, see the official Microsoft documentation: https://docs.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific.

When implementing this pattern for your application, be careful to fine-tune the retry policy. A policy that retries too frequently could impact your application, making it busy or unresponsive. One way to improve the reliability of this pattern is to combine it with the Circuit Breaker pattern, which we'll see in one of the following sections.

As previously described, an exponential backoff strategy is usually a better choice than a simple retry with a fixed delay. With exponential backoff, the wait time between retries increases after each failed attempt.
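
As a minimal sketch of how such a delay could be computed, the following helper doubles the wait time after each failed attempt and caps it at a maximum. The Backoff class, the ExponentialDelay method, and the baseDelay and maxDelay parameters are hypothetical names chosen for this example, and doubling is only one possible growth factor.

```csharp
using System;

// Hypothetical helper used for illustration; not part of any SDK.
public static class Backoff
{
    // Returns baseDelay * 2^(attempt - 1), capped at maxDelay.
    // 'attempt' is the 1-based number of the attempt that just failed.
    public static TimeSpan ExponentialDelay(
        int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt <= 0)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Limit the exponent so very large attempt counts stay well-behaved.
        var exponent = Math.Min(attempt - 1, 30);
        var ticks = baseDelay.Ticks * Math.Pow(2, exponent);

        // Clamp to the cap before converting back to a TimeSpan.
        return ticks >= maxDelay.Ticks
            ? maxDelay
            : TimeSpan.FromTicks((long)ticks);
    }
}
```

With a base delay of 2 seconds and a cap of 30 seconds, the waits after the first five failures are 2, 4, 8, 16, and 30 seconds. In practice, a small random jitter is often added to each delay so that many clients do not retry at the same moment.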

Always remember that this pattern is useful for handling transient faults, not faults caused by internal application exceptions (these should be handled in a different way). Remember also that if your application receives too many faults because the destination is busy, this could be a sign that your cloud service must be scaled up.

An example of how you can implement a retry pattern in C# is as follows (console application):

using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace RetryPattern
{
    class Program
    {
        // An async Main (C# 7.1 or later) lets the program wait for the
        // request to complete. Calling an async void method without awaiting
        // it would let the process exit before the request finishes.
        static async Task Main(string[] args)
        {
            //Sending a request to a cloud service
            await SendRequest();
        }

        static async Task SendRequest()
        {
            HttpClient httpClient = new HttpClient();
            var maxRetryAttempts = 3;
            var pauseBetweenFailures = TimeSpan.FromSeconds(2);
            // Retry only when the request fails with an HttpRequestException
            await RetryPattern.RetryOnExceptionAsync<HttpRequestException>(
                maxRetryAttempts, pauseBetweenFailures, async () =>
                {
                    var response = await httpClient.GetAsync(
                        "https://mycloudservice.com/api/items/1");
                    // Throws HttpRequestException on a non-success status code
                    response.EnsureSuccessStatusCode();
                });
        }
    }
}

The Main function sends requests to a cloud service. If the request fails, the RetryPattern class handles the retry policy.

The class that implements the retry policy is as follows:

// Place this class in the RetryPattern namespace, next to Program.
// It needs: using System; using System.Threading.Tasks;
public static class RetryPattern
{
    // Retries the operation on any exception type.
    public static async Task RetryOnExceptionAsync(
        int times, TimeSpan delay, Func<Task> operation)
    {
        await RetryOnExceptionAsync<Exception>(times, delay, operation);
    }

    // Runs the operation up to 'times' times, retrying only on TException.
    public static async Task RetryOnExceptionAsync<TException>(
        int times, TimeSpan delay, Func<Task> operation)
        where TException : Exception
    {
        if (times <= 0)
            throw new ArgumentOutOfRangeException(nameof(times));
        var attempts = 0;
        do
        {
            try
            {
                attempts++;
                await operation();
                break;
            }
            catch (TException ex)
            {
                // No attempts left: rethrow the last exception
                if (attempts == times)
                    throw;
                await CreateDelayForException(times, attempts, delay, ex);
            }
        } while (true);
    }

    private static Task CreateDelayForException(
        int times, int attempts, TimeSpan delay, Exception ex)
    {
        // Wait for the increasing delay of this attempt, but never less
        // than the delay supplied by the caller.
        var increasingDelay =
            TimeSpan.FromSeconds(IncreasingDelayInSeconds(attempts));
        return Task.Delay(increasingDelay > delay ? increasingDelay : delay);
    }

    internal static int[] DelayPerAttemptInSeconds =
    {
        //Delay management for retry (increasing with each failed attempt)
        (int) TimeSpan.FromSeconds(10).TotalSeconds,
        (int) TimeSpan.FromSeconds(40).TotalSeconds,
        (int) TimeSpan.FromMinutes(2).TotalSeconds,
        (int) TimeSpan.FromMinutes(10).TotalSeconds,
        (int) TimeSpan.FromMinutes(30).TotalSeconds
    };

    static int IncreasingDelayInSeconds(int failedAttempts)
    {
        if (failedAttempts <= 0)
            throw new ArgumentOutOfRangeException(nameof(failedAttempts));
        // failedAttempts is 1-based; clamp it to the last entry of the table
        var index =
            Math.Min(failedAttempts, DelayPerAttemptInSeconds.Length) - 1;
        return DelayPerAttemptInSeconds[index];
    }
}

The RetryPattern class handles the retry of requests by counting the number of attempts and applying an increasing delay (wait time), taken from the DelayPerAttemptInSeconds table, before each retry.

An interesting open source framework for implementing the Retry pattern (but also the Circuit Breaker pattern that we'll see in the next sections) is Polly. You can find more documentation here: https://github.com/App-vNext/Polly.
