Retries and timeouts

Linkerd can retry failed requests automatically so that transient application failures are handled gracefully. Poorly bounded automatic retries, however, can make a problem worse by causing a retry storm: the retried requests add load to a service that is already overloaded, applying backpressure, or returning negative acknowledgments.

Linkerd limits this risk by using retry budgets rather than a fixed number of retries per request. With a retry budget of 10%, retries may add at most 10% to the original request load, so the number of retries stays bounded and cannot escalate into a retry storm. The retry budget and timeouts are specified in a service profile, which describes a service and its routes.

Retries are enabled per route in the service profile. The following route definition for /api/annotations turns on retries by setting isRetryable: true:

...
spec:
  routes:
  - name: GET /api/annotations
    condition:
      method: GET
      pathRegex: /api/annotations
    isRetryable: true
...

The retry budget is defined in the service profile through a retry ratio (expressed as a fraction of the original requests), a minimum number of retries per second, and a time-to-live (ttl) parameter. The following specification allows retries to add up to 20% to the request load, plus a baseline allowance of 10 retries per second. The ttl of 15 seconds is the window over which requests are counted when the retry ratio is calculated:

spec:
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 15s
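
The route and retry budget fragments above live in the same ServiceProfile resource. As a minimal sketch of how they might fit together, the following manifest combines them. The service name annotations and the default namespace are assumptions made for illustration and do not come from the original example:

```yaml
# Sketch only: the service name and namespace below are hypothetical.
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # A service profile is named after the fully qualified DNS name
  # of the service it describes.
  name: annotations.default.svc.cluster.local  # hypothetical service
  namespace: default                           # hypothetical namespace
spec:
  routes:
  - name: GET /api/annotations
    condition:
      method: GET
      pathRegex: /api/annotations
    # Opt this route in to retries.
    isRetryable: true
  # The budget applies to the service as a whole, not to a single route.
  retryBudget:
    retryRatio: 0.2          # retries may add up to 20% extra load
    minRetriesPerSecond: 10  # baseline allowance on top of the ratio
    ttl: 15s                 # window used when calculating the ratio
```

A manifest like this would typically be applied with kubectl apply -f, as with any other Kubernetes resource.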

Timeouts can be configured through the service profile, as shown in the following example:

...
spec:
  routes:
  - condition:
      method: HEAD
      pathRegex: /authors/[^/]*\.json
    name: HEAD /authors/{id}.json
    timeout: 300ms
...

With the preceding specification, the Linkerd proxy waits at most 300 milliseconds for a response to a HEAD request whose path matches /authors/{id}.json. If the timeout is reached, the proxy cancels the request and returns a 504 response.
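
A route can carry both a timeout and the retry flag. The following is a minimal sketch of such a route, assuming a hypothetical authors service in the default namespace. As far as the Linkerd documentation describes it, the route timeout bounds the total wait for a response, including any retries, so treat the interaction shown in the comments as an assumption to verify against your Linkerd version:

```yaml
# Sketch only: the service name and namespace below are hypothetical.
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: authors.default.svc.cluster.local  # hypothetical service
  namespace: default                       # hypothetical namespace
spec:
  routes:
  - name: HEAD /authors/{id}.json
    condition:
      method: HEAD
      # The dot is escaped so that it matches a literal "." only.
      pathRegex: /authors/[^/]*\.json
    # Assumed to cap the overall wait, retries included.
    timeout: 300ms
    # HEAD requests are idempotent, so retrying them is generally safe.
    isRetryable: true
```

With a short timeout such as 300ms, only a small number of retries can complete before the proxy gives up, which is one more way of keeping retry load bounded.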
