At Wehkamp we use Redis a lot. It is fast, highly available, and runs as a managed AWS service called ElastiCache. Sometimes we need to extract data from Redis, and I usually use redis-cli to interact with it from the command line. But what if you need to get the values of 400k+ keys? What would you do? Is there an effective way to query multiple keys/values from Redis?
- Intro
- Only redis-cli+bash is sloooooow
- Python-scripting to the rescue
- Introducing: redis-mass-get cli
- Conclusion
- Improvements
Only redis-cli+bash is sloooooow
When you use redis-cli and bash, your script might look a bit like this:
URL=redis://my.redis.url
echo "KEYS product:*" | \
redis-cli -u $URL | \
sed 's/^/GET /' | \
redis-cli -u $URL >> test.txt
For small queries this works like a charm. Note that only the values are written to the file, not their keys, which - depending on your use case - might be okay. The biggest problem I have with this approach is not that it is slow, but that it does not show any progress. I have no clue when the process will finish!
Python-scripting to the rescue
Let's write a small Python-script that uses KEYS and MGET to write the key and value to a file while showing the progress.
First, we need to install the Python Redis client:
pip install redis
Our script will do the following:
- Get all the keys that match the query.
- Partition the list of keys into chunks of 10,000 items.
- Perform an MGET per chunk of 10,000 items.
- Zip the keys and values and write them line by line to a file.
- Show the progress.
When we turn it into a script, it looks like this:
#!/usr/bin/env python3
import redis

file = 'result.txt'
url = 'redis://my.redis.url'
query = 'product:*'

print('Reading keys... ', end='')
client = redis.StrictRedis.from_url(url, decode_responses=True)
keys = client.keys(query)
print(f'{len(keys):,} keys found.')

def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

partitions = list(chunks(keys, 10000))

with open(file, 'w', newline='\n', encoding='utf-8') as f:
    for i, partition in enumerate(partitions):
        progress = ((i + 1) / len(partitions)) * 100
        print(f'\rProcessing values... {progress:.2f}%', end='')
        values = client.mget(partition)
        for key, value in zip(partition, values):
            f.write(key)
            f.write('\n')
            f.write(value)
            f.write('\n')

print('\nDone!')
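The chunks generator is what keeps each MGET call bounded. It can be sanity-checked on its own with a small list (the sizes below are illustrative only):

```python
def chunks(lst, n):
    """Yield successive n-sized slices of lst; the last slice may be shorter."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# 25 fake keys, partitioned into chunks of 10
partitions = list(chunks(list(range(25)), 10))
print([len(p) for p in partitions])  # → [10, 10, 5]
```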
The script shows the following progress:
Reading keys... 1,069,715 keys found.
Processing values... 55.14%
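That overwriting progress line is nothing more than a carriage return plus an f-string format spec; in isolation it works like this (the numbers are made up):

```python
done, total = 5514, 10000
progress = (done / total) * 100
line = f'\rProcessing values... {progress:.2f}%'
print(line, end='')  # \r moves the cursor to the start, so the line is redrawn in place
```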
But we can do one better...
Introducing: redis-mass-get cli
The script above generates two lines per key/value. This might not suit your use case, especially when the value contains newlines as well. Sometimes CSV or JSON output is better. Based on this Python script I've created the Python redis-mass-get CLI, which can be installed with:
pip install redis-mass-get
JSON format
It can even parse the JSON value (-jd) before writing its output to a file, all while showing the progress:
redis-mass-get -d results.json -jd redis://my.redis.url product:*
CSV format
Since working with Spark, I've seen CSV taking flight again. The CLI can also generate a CSV file with a key,value header:
redis-mass-get -d results.csv redis://my.redis.url product:*
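Such a file is easy to consume with Python's standard csv module. A minimal sketch, assuming the key,value header described above (the sample rows are made up):

```python
import csv
import io

# stand-in for a results.csv produced by the CLI (hypothetical rows)
sample = 'key,value\r\nproduct:1,"{""name"": ""shoe""}"\r\nproduct:2,"{""name"": ""sock""}"\r\n'

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row['key'], row['value'])
```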
Pipeline CLI commands
Sometimes you want to pipe the output to another program. This example shows how to pipe the key/values in the CSV format, ignoring the CSV header (-och):
redis-mass-get -f csv -och redis://my.redis.url product:* | less
When no destination is specified, the data is written to stdout.
Conclusion
Querying multiple keys/values from Redis is easy using KEYS and MGET. If you need to write keys/values to a JSON, TXT or CSV file, just use the Python redis-mass-get CLI. Quick and easy.
Improvements
2020-08-17: added the Pipeline CLI commands section
2020-08-16: Initial article