To make a setup more resilient we should allow for certain actions to be retried before they fail. We should not "hammer" our underlying systems, so it is wise to wait a bit before a retry (exponential backoff). Let's see how we can make a function that implements a "retry and exponential backoff". Note: this only works if actions are idempotent and you can afford to wait.
Backoff & retry
Let's create a function that retries when an exception is raised. I've added typings, if you need something without typings, look here.
import random, time
from typing import TypeVar, Callable
T = TypeVar('T')
def retry_with_backoff(
fn: Callable[[], T],
retries = 5,
backoff_in_seconds = 1) -> T:
x = 0
while True:
try:
return fn()
except:
if x == retries:
raise
sleep = (backoff_in_seconds * 2 ** x +
random.uniform(0, 1))
time.sleep(sleep)
x += 1
De default number of retries is 5.
Exponential backoff
So what is exponential backoff? Wikipedia says:
In a variety of computer networks, binary exponential backoff or truncated binary exponential backoff refers to an algorithm used to space out repeated retransmissions of the same block of data, often to avoid network congestion.
Wikipedia
I've ended up implementing the algorithm specified by Google Cloud IOT Docs: Implementing exponential backoff. The default backoff time is 1 second. So when the function call fails, it will retry 5 times: after +1, +2, +4, +8 and +16 seconds. If the call still fails, the error will be raised.
Visual example
To test our resilient setup, we need a function that sometimes throws an exception. In the following example we're increasing the value of global i
. If it less than 4 or uneven, we throw an example. Our retry_with_backoff
will retry and backoff:
import random, time
from datetime import datetime
def tprint(msg):
timestamp=datetime.now().strftime('%M:%S.%f')[:-3]
print(timestamp, msg)
def retry_with_backoff(fn, retries=5, backoff_in_seconds=1):
x = 0
while True:
try:
return fn()
except:
if x == retries:
tprint("raise")
raise
sleep = (backoff_in_seconds * 2**x + random.uniform(0, 1))
tprint(f"sleep: {sleep}")
time.sleep(sleep)
x += 1
i = 0
def f() -> int:
global i
i = i + 1
tprint(f"i={i}")
if i < 4 or i % 2 != 0:
raise Exception("Invalid number.")
tprint("ready")
return i
tprint(f"i={i}")
print()
tprint("Starting first test, should sleep 3 times:")
x = retry_with_backoff(f)
print()
tprint("Starting second test, should sleep 1 time:")
x = retry_with_backoff(f)
i = 0
print()
tprint(f"i={i}")
print()
tprint("Starting third test, should crash after 2 retries:")
x = retry_with_backoff(f, retries=2)
Here we can see the code in action:
A decorator?
You can also implement this mechanism as a decorator. The code for the decorator looks like this:
def retry_with_backoff(retries = 5, backoff_in_seconds = 1):
def rwb(f):
def wrapper(*args, **kwargs):
x = 0
while True:
try:
return f(*args, **kwargs)
except:
if x == retries:
raise
sleep = (backoff_in_seconds * 2 ** x +
random.uniform(0, 1))
time.sleep(sleep)
x += 1
return wrapper
return rwb
You can implement the decorator like this:
@retry_with_backoff(retries=6)
def f() -> int:
global i
i = i + 1
print(" i :", i);
if i < 6 or i % 2 != 0:
raise Exception("Invalid number.")
return i
I'm not 100% sure if the decorator is the best solution. The main advantage is that you tie the mechanism to your function, so your caller does not need to implement it. But that is also its weakness, your caller cannot influence the defaults you've set. It heavily depends on your use case if you want to use a decorator.
Conclusion
You see: it is not so hard to implement retry and exponential backoff in Python. It will make your setup way more resilient!
Without typings
If you're not a fan of typings or need something small and simple, you can use this code:
import random, time
def retry_with_backoff(fn, retries = 5, backoff_in_seconds = 1):
x = 0
while True:
try:
return fn()
except:
if x == retries:
raise
sleep = (backoff_in_seconds * 2 ** x +
random.uniform(0, 1))
time.sleep(sleep)
x += 1
Changelog
- 2022-09-13 Improved the visual demonstration with a clearer example.
- 2022-09-10 Added demo repl as an illustration.
- 2021-03-11 Initial article.