How we switched 150 stores from Redis to Valkey, and nobody cared!

Ever since we started our move from the on-premises data center to the cloud, delivery teams haven't had to worry about their infrastructure. We've built a platform that allows delivery teams to focus on solving business problems. At Wehkamp, teams don't have to worry about Kubernetes, network infrastructure, or writing Terraform code to provision data stores. They don't even need to know about AWS! Team infrastructure is easy to provision through our Slack bot. On the platform team we say: we've been running serverless since 2013!

Having such a platform comes with a peculiar downside: when teams don't have to care about infrastructure, they usually... don't!

But what if we want to migrate our AWS ElastiCache Redis stores to Valkey? In this article we'll explain how we've switched 150+ AWS ElastiCache stores from Redis to Valkey.

A big shout-out to Klaas Talsma and Chris Vahl for working together on this story.

We started this migration at the beginning of 2025, so this article is long overdue.

Why?

Back in 2024, Redis switched to another license, and that got developers around the world thinking about a fork. Valkey was born in no time! AWS was heavily involved from the start. Honestly, the results were impressive. And they didn't stop there, as Valkey 8.1 was released with even more power.

When we read it was 20% cheaper to run Valkey than Redis — with even better performance — the decision was a no-brainer. So, we threw together some slides, gave a Tech Talk, and started to work on adding Valkey to the platform.

And this was only the beginning, as these slides are from early 2025.

To sum up: it is faster && cheaper => let's migrate.

First things first

Whenever we introduce new technology, we try to work according to a two-step principle:

Stop the bleeding: make sure newly provisioned infrastructure uses the new technology by default.
Migration: present a migration path and let teams move at their own pace.

Our delivery teams don't have to write Terraform code (they can if they want to); we provide a Slack bot so teams can use guides to provision data sources. We swapped out the provisioning of Redis resources for that of Valkey. In the beginning, we even kept the Redis icon, but over time that got swapped as well, as the name caused some confusion.

The Provisioner provides an easy dialog for Valkey provisioning. In an earlier version you could select Redis or Valkey engines.

Our Provisioner provisions Valkey according to Tech Hub standards, including TLS, authentication, and multi-AZ. It generates Terraform code, commits it to the IaC Git repository, creates a PR, and watches it roll out to production, all while keeping the user informed on Slack.

Next step: tell the delivery teams

Again, we made some slides:

We made TLS mandatory for our new Valkey setups, which might need an application change.
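To give an idea of what such an application change can look like, here is a minimal sketch in Python. It assumes a client that accepts connection URLs, where the `rediss://` scheme (with a double "s") switches on TLS. The helper name `build_valkey_url`, the host name, and the token are hypothetical and only there for illustration.

```python
from typing import Optional
from urllib.parse import quote


def build_valkey_url(host: str, port: int = 6379,
                     auth_token: Optional[str] = None, tls: bool = True) -> str:
    """Build a connection URL for a Valkey or Redis endpoint.

    Hypothetical helper: the rediss:// scheme asks URL-aware clients for TLS,
    the plain redis:// scheme does not.
    """
    scheme = "rediss" if tls else "redis"
    # Percent-encode the token so characters such as '@' or '/' cannot break the URL.
    credentials = f":{quote(auth_token, safe='')}@" if auth_token else ""
    return f"{scheme}://{credentials}{host}:{port}/0"


if __name__ == "__main__":
    # Placeholder endpoint and token, not real values.
    print(build_valkey_url("cache.example.internal", auth_token="not-a-real-token"))
```

With a client such as redis-py, the resulting URL can be passed to `redis.Redis.from_url(...)`. For services that already connect this way, the change may come down to a new scheme and a token. Services that build connections by hand need a bit more work.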

The usual suspects moved right away, but the bulk of the teams did not migrate 😱. So, we talked to the teams to find the main blockers:

Some older services did not use AUTH or a TLS connection, and the teams did not want to refactor those applications to include both features.
Some teams still had their service on a single fascia environment, so they needed to move to a multi-fascia environment before they could benefit from the new setup.
Some teams had very busy backlogs, so they couldn't prioritize it.

Now what?

What could we, as a platform team, do? If we wanted to reap the performance and financial benefits, we needed to make an in-place upgrade possible (Valkey is meant as a drop-in replacement).

However, we hit a technical hurdle: our latest infrastructure standards mandate encryption and TLS, but applying those specific enhancements to existing environments would have made an in-place engine upgrade impossible.

Fortunately, there was a pragmatic solution: we rewrote an older version of the Terraform module.

By testing this extensively on clusters with at least two instances, we confirmed we could swap the engine under the hood with zero downtime. The result was a process so streamlined that teams could migrate to Valkey by updating just a few lines of code.
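The change itself lived in our Terraform module. As a rough illustration of what such an engine swap amounts to at the AWS API level, here is a sketch using boto3. It assumes the replication group description reports its `Engine`, and that `"8.0"` is the target version; both are assumptions for this example, as are the helper names. The two-node check mirrors the condition we tested under.

```python
from typing import Any, Dict


def is_safe_for_in_place_swap(group: Dict[str, Any]) -> bool:
    """Return True for a Redis replication group with at least two member nodes."""
    members = group.get("MemberClusters", [])
    return group.get("Engine") == "redis" and len(members) >= 2


def build_engine_swap_request(group_id: str, engine_version: str = "8.0") -> Dict[str, Any]:
    """Build the keyword arguments for ElastiCache's modify_replication_group call."""
    return {
        "ReplicationGroupId": group_id,
        "Engine": "valkey",               # switch the engine in place
        "EngineVersion": engine_version,  # assumed target version
        "ApplyImmediately": True,         # do not wait for the maintenance window
    }


def swap_engine(group_id: str) -> None:
    """Check the group and request the engine swap (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("elasticache")
    response = client.describe_replication_groups(ReplicationGroupId=group_id)
    group = response["ReplicationGroups"][0]
    if not is_safe_for_in_place_swap(group):
        raise RuntimeError(f"{group_id} is not a multi-node Redis group; skipping")
    client.modify_replication_group(**build_engine_swap_request(group_id))
```

Keeping the check and the request builder free of AWS calls means they can be verified on a laptop before anything touches a real store.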

A New Year's Resolution

The next step was to talk to Koen Roumen, Head of Technology for the Store teams. We explained that there was no reason not to move, and we came up with a plan: either the teams move before a certain date, or we would make the move for them. Maybe it was the optimism that comes with starting a new year, or maybe the stars aligned, but we all agreed on a pretty tight timeline.

Again, some teams woke up and moved, but most teams were like: if you're sure there is no impact, go for it! And so we did. On the morning of the 27th, we migrated the remaining 47 Redis instances for 6 delivery teams across 3 AWS development environments to Valkey. It went incredibly smoothly. Yes, it took AWS a while to provision the stores (no idea why), but by 9 o'clock, we were done. Our delivery teams did not report any problems with the result.

Production was scheduled for a week later. As dev went so smoothly, we dropped our requirement to send a backend engineer to the call, and we decided not to begin at 08:00, but at 09:00. Again: one smooth transition.

Right sizing

Now that the migration to Valkey is behind us, we’ve shifted our focus from "making it work" to "making it right". Because our in-place migration kept the exact same instance types that were previously running Redis, we now find ourselves in a position where much of our fleet is likely over-provisioned. Valkey’s improved performance and smaller footprint mean that the "safety margins" teams originally selected for Redis may now be unnecessary overhead.

To address this, we’ve started a deep-dive analysis leveraging the power of LLMs and the AWS CLI to parse through two weeks of CloudWatch metrics collected since the migration to Valkey. This data-driven approach has already identified significant 'ghost capacity' across our fleet. By identifying these oversized clusters now, we can rightsize our infrastructure before committing to new long-term reservations. It’s a pragmatic approach to cloud spending: optimize the baseline first, so we only pay for the performance we actually use.
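The core of such a check can be sketched in a few lines of Python. This is a simplified illustration, not our actual analysis. It assumes datapoints shaped like the output of CloudWatch's `get-metric-statistics` (a list of entries with a `Maximum` value) for two ElastiCache metrics, `EngineCPUUtilization` and `DatabaseMemoryUsagePercentage`. The function name and both thresholds are made up for the example.

```python
from typing import Dict, List

Datapoints = List[Dict[str, float]]


def is_oversized(cpu: Datapoints, memory: Datapoints,
                 cpu_limit: float = 20.0, memory_limit: float = 30.0) -> bool:
    """Flag a store whose peak CPU and peak memory both stay below the limits.

    The limits are percentages and purely illustrative; pick values that match
    your own safety margins.
    """
    if not cpu or not memory:
        return False  # no data, no verdict
    peak_cpu = max(point["Maximum"] for point in cpu)
    peak_memory = max(point["Maximum"] for point in memory)
    return peak_cpu < cpu_limit and peak_memory < memory_limit


if __name__ == "__main__":
    # Made-up datapoints standing in for two weeks of metrics.
    quiet = [{"Maximum": 4.0}, {"Maximum": 9.5}]
    busy = [{"Maximum": 4.0}, {"Maximum": 85.0}]
    print(is_oversized(quiet, quiet))  # candidate for a smaller instance type
    print(is_oversized(busy, quiet))   # peak CPU too high, leave as is
```

Looking at peaks rather than averages is the cautious choice here: a store that is idle most of the day but spikes during a sale should not be downsized on the strength of its average.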

In the end, that's the beauty of this setup: 155 instances migrated, zero downtime, and zero complaints. We've switched the engine, saved the money, and—just as we intended—nobody even noticed.

So what did we learn?

A lot!

Valkey is a great replacement for Redis.
Our delivery teams are very busy, and that's great!
That should not stop us as a platform team from moving forward.
With Infrastructure as Code (Terraform) and AWS, we, as the platform team, can still help teams to move forward.
There is even more power ahead in Valkey 9.1!