Harmonizing Team tags on AWS S3 Buckets

At Wehkamp we use many - many - buckets! To do FinOps correctly, we need to be able to determine which team owns which bucket. In this article I'll discuss how to detect Team tags that are not correct and how to apply the correct ones, using a combination of Bash, the AWS CLI, CSV and jq.

Process

Harmonizing the Team tags involves three phases: extract the current data from the AWS S3 API into a CSV file, correct the Team values in that file, and apply the result back to the buckets.

So we need to create two scripts: one to extract the tags from the AWS S3 API into a CSV and one to apply the changed CSV.

Get the Team tags from S3

To extract the data from the AWS S3 API, we need to:

  • Use the AWS CLI to query all the S3 buckets in the account.
  • Check if the Team tag of each bucket is on the allowed_team_tags list.
  • If it is not, check whether the bucket is empty and write the result to a CSV file.

Let's call the file extract.sh:

#!/bin/bash
set -e

csv_filename="${1:-s3_team_tags.csv}"

# Set the list of allowed team tags
declare -a allowed_team_tags=(
    "apps"
    "brands-recommendations"
    "..."
    "workplace"
    "pathfinders"
)

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Create CSV file with header row
printf "name,team,empty\n" > "$csv_filename"

# Loop through all S3 buckets in the AWS account
for bucket_name in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    
    printf "Bucket ${YELLOW}%s${NC} " "$bucket_name"
    
    # Get the Team tag for the bucket
    set +e
    team=$(aws s3api get-bucket-tagging --bucket "$bucket_name" --query 'TagSet[?Key==`Team`].Value' --output text)
    set -e
    if [ -z "$team" ]; then
        team="no_team_tag"
    fi
    
    printf "has tag ${YELLOW}%s${NC}: " "$team"
    
    # Check if the bucket has an allowed team tag (exact match: the padding
    # spaces stop a partial value like "app" from matching "apps")
    if [[ " ${allowed_team_tags[*]} " == *" $team "* ]]; then
        echo -e "${GREEN}valid${NC}"
        continue
    fi
    
    echo -e "${RED}invalid${NC}"
    
    # Check if the bucket is empty
    if [[ "$(aws s3api list-objects-v2 --bucket "$bucket_name" --max-items 1)" == "" ]]; then
        is_empty="yes"
    else
        is_empty="no"
    fi
    
    # Add the bucket information to the CSV file
    echo "$bucket_name,$team,$is_empty" >> "$csv_filename"
done

# Count CSV lines minus header
lines=$(wc -l < "$csv_filename")
lines=$((lines-1))

echo ""
echo "CSV file $csv_filename has $lines lines"
echo ""

To monitor the progress, the script will output what it is doing.
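
The allow-list check needs an exact match: a plain substring comparison would accept `app` because `apps` is on the list. Here is a minimal sketch of the same check written as a loop. The function name `is_allowed_team` is hypothetical and the list is shortened for illustration:

```shell
#!/bin/bash
# Shortened allow-list, for illustration only.
declare -a allowed_team_tags=("apps" "workplace" "pathfinders")

# Hypothetical helper: succeeds only when $1 equals one list entry exactly.
is_allowed_team() {
    local candidate="$1" allowed
    for allowed in "${allowed_team_tags[@]}"; do
        [[ "$allowed" == "$candidate" ]] && return 0
    done
    return 1
}

for team in "apps" "app" "no_team_tag"; do
    if is_allowed_team "$team"; then
        echo "$team: valid"
    else
        echo "$team: invalid"
    fi
done
```

Only `apps` is reported as valid; the partial match `app` and the `no_team_tag` placeholder are both rejected.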

Apply the Team tags

Now applying the Team tags was not as straightforward as I hoped. The put-bucket-tagging call replaces the complete tag set of a bucket, so every tag that is not part of the request is lost. To correct for that, the script reads the existing tags first and uses jq to merge the Team tag into them.
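
To show the merge step on its own, here is a minimal sketch that works on a JSON string instead of a live bucket. The function name `merge_team_tag` and the sample tags are hypothetical. It assumes the input has the `{"TagSet": [...]}` shape that get-bucket-tagging returns, and treats an empty string as a bucket without tags:

```shell
#!/bin/bash
# Hypothetical helper: prints a TagSet document in which the Team tag is set
# to $2 and every other tag from the JSON in $1 is kept.
merge_team_tag() {
    local existing_tags="${1:-}" team="$2"
    # Treat an empty input (bucket without tags) as an empty TagSet.
    [ -z "$existing_tags" ] && existing_tags='{"TagSet": []}'
    # --arg passes the team name as a jq variable instead of splicing it into
    # the program text; -c prints compact, single-line JSON.
    jq -c --arg team "$team" \
        '{TagSet: ((.TagSet // []) | map(select(.Key != "Team")) + [{Key: "Team", Value: $team}])}' \
        <<< "$existing_tags"
}

# Made-up sample input: one unrelated tag and an outdated Team tag.
merge_team_tag '{"TagSet": [{"Key": "Env", "Value": "prod"}, {"Key": "Team", "Value": "old"}]}' "apps"
```

The unrelated `Env` tag survives and the old Team value is replaced. The output is a single line, so it can be passed directly to `--tagging`.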

Here is the apply.sh:

#!/bin/bash

csv_filename="${1:-s3_team_tags.csv}"

GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

{
    # skip header
    read -r
    
    # Read the CSV file and iterate over each row
    while IFS=',' read -r bucket team _; do
        
        echo -e "Setting ${YELLOW}$bucket${NC} to ${GREEN}$team${NC}"
        
        # Retrieve the current set of tags for the bucket
        existing_tags=$(aws s3api get-bucket-tagging --bucket "$bucket")
        
        # An empty result means the bucket has no tags yet
        if [ -z "$existing_tags" ]; then
            existing_tags='{"TagSet": []}'
        fi

        # Drop any existing Team tag and append the new one. --arg passes the
        # value to jq safely and -c emits compact, single-line JSON.
        new_tags=$(echo "$existing_tags" | jq -c --arg team "$team" \
            '.TagSet |= (map(select(.Key != "Team")) + [{Key: "Team", Value: $team}])')
        
        aws s3api put-bucket-tagging --bucket "$bucket" --tagging "$new_tags"
        
    done
    
} < "$csv_filename"
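
Because apply.sh writes whatever is in the second column, a row that still holds the `no_team_tag` placeholder from extract.sh would end up as a real tag value. A small pre-flight check can list those rows before anything is applied. This is a sketch: the function name `find_unedited_rows` and the bucket names are made up, and stripping carriage returns assumes the CSV may have been saved with Windows line endings:

```shell
#!/bin/bash
# Hypothetical pre-flight check: prints the bucket name of every row whose
# team column is empty or still holds the "no_team_tag" placeholder.
find_unedited_rows() {
    local csv_filename="$1" bucket team
    # Strip carriage returns, skip the header row, then read row by row.
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    tr -d '\r' < "$csv_filename" | tail -n +2 |
    while IFS=',' read -r bucket team _ || [ -n "$bucket" ]; do
        [ -z "$bucket" ] && continue
        if [ -z "$team" ] || [ "$team" = "no_team_tag" ]; then
            echo "$bucket"
        fi
    done
}

# Demo with made-up bucket names.
demo_csv=$(mktemp)
printf 'name,team,empty\nbucket-a,apps,no\nbucket-b,no_team_tag,yes\nbucket-c,,no\n' > "$demo_csv"
find_unedited_rows "$demo_csv"
rm -f "$demo_csv"
```

In the demo, `bucket-b` and `bucket-c` are listed, and `bucket-a` is not because it already has a team assigned.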

Final thoughts

I struggled a bit to get the tags "merged". Hopefully AWS will provide a better API to update single tags. I don't think it should be this hard.

I had some problems running these scripts on Windows when combining AWS Vault with Bash / WSL, so I wrote a short blog post about it.

Changelog

  • 2023-02-20: removed the double checking of tags by querying for the Team tag directly, which makes the script a bit faster.