retry

automatic retries

Retries the step until it succeeds. If you do not set retry, pypyr will not retry the step automatically. When you do set retry, pypyr will retry whatever step it is without you having to do anything else.

The retry iteration counter is retryCounter. You can use this as usual for any context value in a formatting string expression as {retryCounter}.

These are all the available configuration parameters for retry:

retry: # optional. Retry step until it doesn't raise an error.
  max: 1 # max times to retry. integer. Defaults None (infinite).
  sleep: 0 # sleep between retries, in seconds. Decimals allowed. Defaults 0.
  stopOn: ['ValueError', 'MyModule.SevereError'] # Stop retry on these errors. Defaults None (retry all).
  retryOn: ['TimeoutError'] # Only retry these errors. Defaults None (retry all).
  
  # back-off configuration
  backoff: fixed # optional. Default 'fixed'.
  backoffArgs: # optional. User defined args for custom back-off. Default None.
    arg1: value 1
    arg2: value 2
  sleepMax: 123 # optional. Max sleep in seconds. Default None (infinite).
  jrc: 123.45 # optional. Jitter Range Coefficient. Default 0.

All inputs support substitutions, so you can assign values to these properties dynamically at run-time.

If you reach max retry count while the step still errors, pypyr will raise the last error and stop further pipeline processing.

If you set swallow to True, pypyr will swallow the error after max retries exhausts.

retry with sleep

Use sleep to pause in between retries. Sleep is a decimal with the unit seconds. If you do not set sleep, the default wait between retries is 0s.

# ./retry-sleep.yaml
steps:
  - name: pypyr.steps.echo
    in:
      echoMe: BEGIN
  - name: pypyr.steps.assert
    comment: assert will fail until the 3rd retry.
             will sleep 0.5s between retries.
             on 3rd retry, step succeeds, and 
             pypyr proceeds to next step.
    retry:
      max: 4
      sleep: 0.5
    in:
      assert: !py retryCounter == 3
  - name: pypyr.steps.echo
    in:
      echoMe: END. you'll see me only after previous step succeeds.

$ pypyr retry-sleep
BEGIN
retry: ignoring error because retryCounter < max.
AssertionError: assert retryCounter == 3 evaluated to False.
retry: ignoring error because retryCounter < max.
AssertionError: assert retryCounter == 3 evaluated to False.
END. you'll see me only after previous step succeeds.

You can also use a list of numbers for sleep, where each retry will sleep for the duration in seconds of the next value from the list:

- name: pypyr.steps.assert
  comment: assert will fail until the 3rd retry.
           will sleep 2s on 1st retry,
           sleep 4s after 2nd retry,
           on 3rd retry, step succeeds, and 
           pypyr proceeds to next step.
  retry:
    max: 4
    sleep: [2, 4, 6]
  in:
    assert: !py retryCounter == 3

If you want to vary the sleep interval between retries based on a formula you can use different retry backoff algorithms. The list sleep input option is available on the default fixed retry backoff strategy.

infinite retry

Set max to 0 or null to retry indefinitely until the step succeeds.

It’s up to you to interrupt an infinite loop. If you’re using the cli, you can hit CTRL+C to interrupt and exit pypyr.

# ./retry-infinite.yaml
steps:
  - name: pypyr.steps.assert
    description: CTRL+C to exit
    comment: 0 or null means infinite retry.
             since sleep not set, will retry
             immediately with no delay between
             retries.
    in:
      assert: False
    retry:
      max: 0 # 0 or null means indefinitely
  - name: pypyr.steps.echo
    in:
      echoMe: this won't run because the previous step always errors.

$ pypyr retry-infinite
pypyr.steps.assert: CTRL+C to exit
retry: ignoring error because retryCounter < max.
AssertionError: assert False evaluated to False.
retry: ignoring error because retryCounter < max.
AssertionError: assert False evaluated to False
^C

$ echo $?
130

retry entire sequence of steps

You can retry any step, whether it is your own or a built-in pypyr step. If you retry on a call step, you retry an entire sequence of steps inside a step-group. This lets you retry multiple steps as a unit.

Similarly, if you invoke another pipeline from a parent pipeline with pype, you can retry the entire child pipeline if it raises an error.

# ./retry-call
steps:
  - name: pypyr.steps.echo
    in:
      echoMe: begin
  - name: pypyr.steps.call
    comment: all of retry_me will retry 3X if it returns an error.
    retry:
      max: 3
    in:
      call:
        groups: retry_me
  - name: pypyr.steps.echo
    in:
      echoMe: you won't see me, because the previous step always errs.

retry_me:
    - name: pypyr.steps.echo
      in:
        echoMe: running retry {retryCounter}
    - name: pypyr.steps.assert
      in:
        assert: False
    - name: pypyr.steps.echo
      in:
        echoMe: you won't see me, because the previous step always errs.

This will result in:

$ pypyr retry-call
begin
running retry 1
Error while running step pypyr.steps.assert at pipeline yaml line: 20, col: 7
retry: ignoring error because retryCounter < max
AssertionError: assert False evaluated to False.
running retry 2
Error while running step pypyr.steps.assert at pipeline yaml line: 20, col: 7
retry: ignoring error because retryCounter < max
AssertionError: assert False evaluated to False.
running retry 3
Error while running step pypyr.steps.assert at pipeline yaml line: 20, col: 7
Error while running step pypyr.steps.call at pipeline yaml line: 6, col: 5
Something went wrong. Will now try to run on_failure.

AssertionError: assert False evaluated to False.

backoff algorithms

You can increase (or decrease) the waiting times between retries by using different retry back-off strategies & apply a random jitter to the calculated sleep interval to reduce contention.

retry:
  max: 1 # max times to retry. integer. Defaults None (infinite).
  sleep: 0 # sleep between retries, in seconds. Decimals allowed. Defaults 0.
  
  # back-off configuration
  backoff: <<backoff-algorithm-name-here>>
  backoffArgs: # optional. User defined args for custom back-off. Default None.
    arg1: value 1
    arg2: value 2
  sleepMax: 123 # optional. Max sleep in seconds. Default None (infinite).
  jrc: 123.45 # optional. Jitter Range Coefficient. Default 0.

pypyr has the obvious common backoff retry strategies built in, but you can also code your own custom retry back-off algorithm without too much fuss.

n is the iteration counter.

backoff	formula
fixed	`sleep` (single or list)
jitter	`random_between(jrc*sleep, sleep)`
linear	`n*sleep`
linearjitter	`random_between(jrc*linear(n), linear(n)`
exponential	`(base*n)sleep`
exponentialjitter	`random_between(jrc*exponential(n), exponential(n))`

If you don’t explicitly specify a backoff strategy, pypyr will default to fixed. You can also change the default backoff strategy in config.

For all jitter strategies, the resulting sleep value is a random number between the calculated interval, and the product of the calculated interval and the jrc (jitter range coefficient). If you do not set jrc, by default it is 0. This means that by default jitter algorithms will pick a random value anywhere between 0 and the calculated interval. If you do not want your random range to retry too soon by picking a number too close to 0, you can use the jrc to control the lower bound - for example, given a 10 second calculated sleep interval:

jrc of 0 (the default) will randomize between 0 and 10.
jrc of 0.2 will randomize between 2 and 10

If you want to restrict the growth of the sleep interval, set sleepMax to the maximum sleep interval you want to allow in seconds. This is handy especially for exponential growth, which could well get too large too soon for optimal throughput. If you do not set sleepMax, by default the sleep interval will grow infinitely. (okay, strictly speaking until Python runs out of numbers, but if you’re bumping into this limit you got other probs.)

All of the builtin backoff strategies restrict the sleep interval to less than sleepMax, except in the case of the jitter style backoff strategies, where jrc > 1 can potentially result in an interval exceeding sleepMax. This is because pypyr calculates jitter AFTER it applies the sleepMax limit to the calculated value. This allows you to use the jrc to jitter the retry sleep interval over a range higher or lower than the calculated value.

fixed

The default backoff strategy uses fixed, or static, values. pypyr sleeps for a fixed interval between retries with no calculation to increment or decrement that interval.

You can use two different types of input for sleep with the fixed backoff strategy:

single fixed interval

In this example pypyr will retry the command 3 times, each time sleeping for the same fixed interval.

- name: pypyr.steps.cmd
  retry:
    max: 3
    sleep: 1.5
  in:
    cmd: curl xyz://manifestly-wrong-url

sleep is a single float number.
Each retry iteration will sleep for this same number in seconds.

You do not need to set backoff if you want to use fixed, because it is the default if you don’t specify anything. You can if you want to, though:

- name: pypyr.steps.cmd
  retry:
    backoff: fixed
    max: 3
    sleep: 1.5
  in:
    cmd: curl xyz://manifestly-wrong-url

fixed list of sleep intervals

In this example pypyr will retry the command 5 times, each time sleeping for the next value in the list:

- name: pypyr.steps.cmd
  retry:
    max: 5
    sleep: [0, 1, 1.5, 2, 3, 5]
  in:
    cmd: curl httpz://manifestly-wrong-url

sleep is a list of floats/numbers.
Each retry interval will sleep for the next number from the list
When the input list runs out of numbers, pypyr will keep on re-using the last number on the list for subsequent iterations.

jitter

Jitter adds randomization on top of fixed.

- name: pypyr.steps.cmd
  comment: retry 3X, sleep on a random interval between 0-1.5s each try.
  retry:
    backoff: jitter
    max: 3
    sleep: 1.5
  in:
    cmd: curl xyz://manifestly-wrong-url

Since jitter extends fixed, you can also use the fixed list style input for sleep:

  - name: pypyr.steps.cmd
    comment: retry 7X, sleep on a random interval between 0-n each try,
             where n is the next number from the sleep list.
             n is, in turn - 2, 4, 6, 8, 8, 8
    retry:
      backoff: jitter
      max: 7
      sleep: [2, 4, 6, 8]
    in:
      cmd: curl xyz://manifestly-wrong-url

Be default jitter picks a random decimal number between 0 and the value for sleep for that retry iteration. If you want to narrow this randomization range, to prevent retries happening too close to each other if random ends up giving a number very close to 0, you can use the Jitter Range Coefficient (jrc).

Simply put: pypyr will pause for a random number between jrc*sleep and sleep.

- name: pypyr.steps.cmd
  comment: retry 3X, sleep randomly between 2-10s on each retry.
  retry:
    backoff: jitter
    max: 3
    sleep: 10
    jrc: 0.2
  in:
    cmd: curl xyz://manifestly-wrong-url

When you use jrc with a list style input, the jrc*sleep applies to the sleep interval for the current iteration:

- name: pypyr.steps.cmd
  comment: retry 7X, sleep randomly between 0.1*sleep and sleep,
           where sleep is the next value from the list for each retry.
  retry:
    backoff: jitter
    max: 7
    sleep: [2, 4, 6, 8]
    jrc: 0.5
  in:
    cmd: curl httpz://manifestly-wrong-url

In this case the sleep interval (in seconds) for each retry in turn jitters between:

retry #1: 1 -> 2
retry #2: 2 -> 4
retry #3: 3 -> 6
retry #4: 4 -> 8
retry #5: 4 -> 8
retry #6: 4 -> 8

Notice that when the list runs out of numbers, pypyr re-uses the last number in the list for the subsequent sleep intervals.

linear

The linear backoff algorithm sleeps for sleep * n in between retries, where n is the iteration counter.

- name: pypyr.steps.cmd
  comment: retry 5X, at interval (sleep * n)
           retry wait time is, in turn - 2, 4, 6, 8.
  retry:
    backoff: linear
    max: 5
    sleep: 2
  in:
    cmd: curl httpz://manifestly-wrong-url

restrict linear backoff growth

You can restrict the growth of the effective sleep interval with sleepMax:

- name: pypyr.steps.cmd
  comment: retry 5X, with interval between retries = (sleep * n)
           retry wait time is, in turn - 3, 6, 8.5, 8.5
  retry:
    backoff: linear
    max: 5
    sleep: 3
    sleepMax: 8.5
  in:
    cmd: curl httpz://manifestly-wrong-url

linear jitter

Jitter on top of the linear backoff algorithm. Each retry pauses for a random interval between jrc*linear(n) and linear(n), where n is the iteration counter.

- name: pypyr.steps.cmd
  comment: retry 5X, sleep randomly between 0 and (sleep * n) on each retry
  retry:
    backoff: linearjitter
    max: 5
    sleep: 4
  in:
    cmd: curl httpz://manifestly-wrong-url

In this case the sleep interval (in seconds) for each retry in turn jitters between:

retry #1: 0 -> 4
retry #2: 0 -> 8
retry #3: 0 -> 12
retry #4: 0 -> 16

You can combine linearjitter with sleepMax to restrict the growth of the effective sleep interval. Use jrc to change the jitter range. You can use both sleepMax and jrc, just one or the other, or neither.

- name: pypyr.steps.cmd
  comment: retry 6X, sleep randomly between 0 and (sleep * n) on each retry.
           do not go beyond 14s sleep between retries.
           each sleep interval jitters between (0.5*(5*n)) and (5*n).
  retry:
    backoff: linearjitter
    max: 6
    sleep: 5
    sleepMax: 14
    jrc: 0.5
  in:
    cmd: curl httpz://manifestly-wrong-url

In this case the sleep interval (in seconds) for each retry in turn jitters between:

retry #1: 2.5 -> 5
retry #2: 5 -> 10
retry #3: 7 -> 14
retry #4: 7 -> 14
retry #5: 7 -> 14

exponential

Retry with an exponential backoff on the sleep coefficient, increasing the wait time between retries exponentially as given by the formula (base**n)*sleep, where n is the iteration counter.

By default the exponential base is 2.

- name: pypyr.steps.cmd
  comment: retry 5X, with interval between retries growing base 2 exponential
            sleep in seconds between retries, in turn, is - 2, 4, 8, 16. 
  retry:
    backoff: exponential
    max: 5
    sleep: 1
  in:
    cmd: curl httpz://manifestly-wrong-url

restrict exponential backoff growth

Use sleepMax to restrict the effective maximum retry sleep interval. This lets you configure your retries to achieve optimal throughput, because often a pure exponential will result in too long backoff intervals too soon.

- name: pypyr.steps.cmd
  comment: retry 5X, with interval between retries (2**n)*2.5).
            sleep in seconds between retries, in turn, is - 5, 10, 20, 21.
            do not allow sleep more than 21s.
  retry:
    backoff: exponential
    max: 5
    sleep: 2.5
    sleepMax: 21
  in:
    cmd: curl httpz://manifestly-wrong-url

change exponent base

By default the exponent base is 2. You can change the base for exponentiation using the base input argument:

- name: pypyr.steps.cmd
  comment: retry 5X, with interval between retries (3**n)*sleep).
            do not allow sleep more than 41s.
            sleep in seconds between retries, in turn, is - 3, 9, 27, 41.
  retry:
    backoff: exponential
    max: 5
    sleep: 1
    sleepMax: 41
    backoffArgs:
      base: 3
  in:
    cmd: curl httpz://manifestly-wrong-url

exponential jitter

Use the ready-made exponential backoff with randomized jitter on any retrying command or step without having to write code.

This adds jitter on top of the exponential backoff strategy, so all the inputs to the underlying exponential such as sleepMax and base are available to exponentialjitter also.

For each retry iteration, pypyr will, in order:

calculate the [raw exponential result] with formula (base**n)*sleep, where n is the iteration counter
proceed with the lesser of sleepMax or the [raw exponential result]. Call this the [capped result]
jitter by picking a random interval between jrc*[capped result] and the [capped result].

- name: pypyr.steps.cmd
  comment: retry 5X.
           sleep randomly between 0 and ((2**n)*sleep)) on each retry.
  retry:
    backoff: exponentialjitter
    max: 5
    sleep: 1
  in:
    cmd: curl httpz://manifestly-wrong-url

You can use the Jitter Range Coefficient jrc to narrow the randomization range and prevent intervals too close to 0:

- name: pypyr.steps.cmd
  comment: retry 5X.
            sleep randomly between 0 and (2**n)*sleep) on each retry.
  retry:
    backoff: exponentialjitter
    max: 5
    sleep: 2
    jrc: 0.5
  in:
    cmd: curl httpz://manifestly-wrong-url

In this case the sleep interval (in seconds) for each retry in turn jitters between:

retry #1: 2 -> 4
retry #2: 4 -> 8
retry #3: 8 -> 16
retry #4: 16 -> 32

Use a jrc greater than 1 to jitter over a range higher than the capped result.

only retry specific error types

When you set neither stopOn nor retryOn, all types of errors will retry. This is the default behavior.

If you specify stopOn, errors listed in stopOn will stop retry processing and raise an error. Errors not listed in stopOn will retry.

If you specify retryOn, ONLY errors listed in retryOn will retry.

max evaluates before stopOn and retryOn.

stopOn supersedes retryOn.

For builtin python errors, specify the bare error name for stopOn and retryOn, e.g ValueError, KeyError.

For all other errors, use module.errorname, e.g mypackage.mymodule.myerror

only retry on specific errors

# ./retry-retryon.yaml
steps:
  - name: pypyr.steps.py
    description: this step will retry infinitely until the cmd does not return 
                 an error unless an unexpected error that is NOT in retryOn 
                 occurs.
    comment: retries 1 & 2 raise ValueError, 
             retry 3 raise KeyError.
             KeyError is not in retryOn, so will quit step reporting failure.
    retry:
      # will ONLY retry these errors. All other stop processing.
      retryOn: ['KeyError', 'pypyr.errors.ContextError']
    in:
      py: |
        if retryCounter == 3:
          raise ValueError("this won't retry!")
        else:
          raise KeyError('this will retry')        
  - name: pypyr.steps.echo
    in:
      echoMe: this won't run because the previous step always errors.

This will result in:

 $ pypyr retry-retryon
this step will retry infinitely until the cmd does not return an error.
retry: ignoring error because retryCounter < max.
KeyError: 'this will retry'
retry: ignoring error because retryCounter < max.
KeyError: 'this will retry'
ValueError not in retryOn. Raising error and exiting retry.
Error while running step pypyr.steps.py at pipeline yaml line: 4, col: 7
Something went wrong. Will now try to run on_failure.

ValueError: Booom!

$ echo $?
255

stop processing on specific error type

# pypyr retry-stopon
steps:
  - name: pypyr.steps.py
    description: this step will retry infinitely until the cmd does not return 
                 an error or until a StopOn error happens.
    retry:
      # will stop retry on these errors. All others will carry on retry processing.
      stopOn: ['ValueError', 'pypyr.errors.ContextError']
    in:
      py: |
        if retryCounter == 3:
          raise ValueError('this will STOP processing!')
        else:
          raise KeyError('this will retry')        
  - name: pypyr.steps.echo
    in:
      echoMe: this won't run because the previous step always errors.

This will result in:

$ pypyr retry-stopon
this step will retry infinitely until the cmd does not return an error.
retry: ignoring error because retryCounter < max.
KeyError: 'this will retry'
retry: ignoring error because retryCounter < max.
KeyError: 'this will retry'
ValueError in stopOn. Raising error and exiting retry.
Error while running step pypyr.steps.py at pipeline yaml line: 4, col: 7
Something went wrong. Will now try to run on_failure.

ValueError: this will STOP processing!

common & dynamic retry configuration

You can use the same retry configuration for multiple steps, or set retry configuration dynamically based on run-time values, or do both at the same time.

format expressions

Use pypyr format substitutions to assign values to retry dynamically based on run-time parameters:

- name: pypyr.steps.set
  in:
    set:
      my_sleep: [2, 4, 8]
      my_max: 3
      my_sleep_max: 6

- name: pypyr.steps.assert
  retry:
    sleep: '{my_sleep}'
    max: '{my_max}'
    sleepMax: '{my_sleep_max}'
  in:
    assert: !py retryCounter == 3

yaml anchors & references

You can create a yaml anchor with an & for a retry strategy and then re-use that in multiple steps with a yaml reference with a *:

In this example, both the steps will use the same retry configuration as given in &commonRetryWithFixedList.

common:
  retry1: &commonRetryWithFixedList
    sleep: [2, 4, 8]
    max: 3
    sleepMax: 6

steps:
  - name: pypyr.steps.assert
    retry: *commonRetryWithFixedList
    in:
      assert: !py retryCounter == 3

  - name: pypyr.steps.assert
    retry: *commonRetryWithFixedList
    in:
      assert: !py retryCounter == 2

You can create and reference multiple different retry strategies in the same pipeline this way.

combine yaml anchors & format expressions

You can share common retry configuration and still use format substitution expressions to configure those values dynamically at run-time.

In this example, the first set step sets the values that the shared retry configuration under &commonRetryWithSubstitutions will substitute at runtime:

common:
  retry1: &commonRetryFixedList
    sleep: [2, 4, 8]
    max: 3
    sleepMax: 6

  retry2: &commonRetryWithSubstitutions
    sleep: '{base3_exponential_retry[sleep]}'
    max: '{base3_exponential_retry[max]}'
    sleepMax: '{base3_exponential_retry[sleepMax]}'
    jrc: '{base3_exponential_retry[jrc]}'
    backoff: '{base3_exponential_retry[backoff]}'
    backoffArgs:
      base: '{base3_exponential_retry[base]}'

steps:
  - name: pypyr.steps.set
    in:
      set:
        base3_exponential_retry:
          sleep: 1
          max: 3
          sleepMax: 8.5
          jrc: 0.5
          backoff: exponentialjitter
          base: 3

  - name: pypyr.steps.assert
    retry: *commonRetryWithSubstitutions
    in:
      assert: !py retryCounter == 3

  - name: pypyr.steps.contextmerge
    comment: change sleep for next retry.
             all other retry values stay the same.
    in:
      contextMerge:
        base3_exponential_retry:
          sleep: 2

  - name: pypyr.steps.assert
    retry: *commonRetryWithSubstitutions
    in:
      assert: !py retryCounter == 2

If you change any of the values under base3_exponential_retry subsequent retries will use the new updated values, so you can parameterize the same base retry configuration with different values for different steps in the same pipeline.

retry permalink

automatic retries permalink

retry with sleep permalink

infinite retry permalink

retry entire sequence of steps permalink

backoff algorithms permalink

fixed permalink

single fixed interval permalink

fixed list of sleep intervals permalink

jitter permalink

linear permalink

restrict linear backoff growth permalink

linear jitter permalink

exponential permalink

restrict exponential backoff growth permalink

change exponent base permalink

exponential jitter permalink

only retry specific error types permalink

only retry on specific errors permalink

stop processing on specific error type permalink

common & dynamic retry configuration permalink

format expressions permalink

yaml anchors & references permalink

combine yaml anchors & format expressions permalink

see also

retry

automatic retries

retry with sleep

infinite retry

retry entire sequence of steps

backoff algorithms

fixed

single fixed interval

fixed list of sleep intervals

jitter

linear

restrict linear backoff growth

linear jitter

exponential

restrict exponential backoff growth

change exponent base

exponential jitter

only retry specific error types

only retry on specific errors

stop processing on specific error type

common & dynamic retry configuration

format expressions

yaml anchors & references

combine yaml anchors & format expressions