Using the S3P API to copy 1.3M of 5M of AWS S3 keys

This week we had to exfil some data out of a bucket with 5M+ of keys. After doing some calculations and testing with a Bash script that used AWS CLI, we went a more performant route and used S3P and a small Node.js script. S3P claims to be 5-50 times faster than AWS CLI 😊.

Big shout out to Kunal, Vincent and Ionut for participating on the project.

Input file

Our input file is a simple text file that contains all case IDs that should be copied. Fortunately, all the data of a case is stored under an S3 key that begins with its case ID: {Case ID}/{Entity}/{Entity Key}. The file looks like this:


It contains 411K+ of case IDs.


In order for us to copy (or sync) the data, we need to inspect every key in our source bucket. Here is where S3P shines: it has a fancy listing algorithm that uses massive parallel workers to retrieve pages with keys.

We'll inspect every key of the bucket to see if it starts with a case id that is in the file. S3P will do the rest: balance list and copy actions.


Let's turn the file into a Set and use it as a filter on every key:

import fs from "fs"
import s3p from "s3p"

const { cp, sync } = s3p

let dataFilePath = "./keys.txt"

let sourceBucket = `{source-bucket}`
let destinationBucket = `{destination-bucket}`

let operation = sync // or cp
let destinationPrefix = "cases/"

let keys = new Set(
    .map(x => x.trim())
    .filter(x => x != "")

  bucket: sourceBucket,
  toBucket: destinationBucket,
  addPrefix: destinationPrefix,
  filter: ({ Key }) => {
    let k = Key.split("/")[0]
    return keys.has(k)

I love that S3P can be used as a CLI tool (by running npx s3p) and as an API. When using the API, you can use a filter that is a bit more complex, as shown here.

How to run it?

Unfortunately, I haven't found a way to directly run this script using node index.mjs, because it has a dependency — so you'll need to create a small Node.js application. Fortunately, it is pretty easy to do: first you create a new directory and save the code to an index.mjs file. Next, you'll need to create a file named package.json and paste the following in there:

  "type": "module",
  "scripts": {
    "start": "node index.mjs"
  "dependencies": {
    "s3p": "latest"

Now you can execute the script like this:

# only need to run this once to install s3p
$ npm install
# start the script:
$ npm start

Easy peasy 🤓.


I like the fact that S3P helps you to understand what it can do. It ships a nice bit of help documentation with the CLI:

$ npx s3p cp --help

It shows:

Screenshot of the npx s3p cp --help command showing help documentation.
Screenshot of the npx s3p cp --help command. If you scroll down, they even provide you a nice list of examples.

The CLI can even help you write your code:

$ npx s3p cp --bucket bucket-a --to-bucket bucket-b --add-prefix cases/ --api-example 
   bucket: "bucket-a",  
   toBucket: "bucket-b",
   addPrefix: "cases/"  
// > Promise

That's service!


On my local machine and home Wi-Fi, I got a listing performance of 5-16K keys per second. This performance can be further increased by running the script from an EC2 machine. I think the number of copies influences the speed -- if I need to copy less keys, my avg. items/second increases greatly.

The S3P API will echo its status every second to your screen, so it is easy to track what it is doing:

VSCode Terminal screen with s3p copy output showing a per second overview of the copy progress.
Screenshot of s3p output. The number of list workers are throttled to speed up copying.


2022-05-23: Added the transfer speed improvement hypothesis to the Performance section after running the script for a big bucket (5M+) with only 3K matches. The list performance went from 5K items p/s to 15K 😱.