Read the AWS Archives at KeesTalksTech

Harmonizing Team tags on AWS S3 Buckets

At Wehkamp we use many – many – buckets! To do FinOps correctly, it is important that we’re able to determine which teams own which buckets. In this article I’ll discuss how to detect Team tags that are not correct and apply the correct ones. We’re using a combination of Bash, the AWS CLI, CSV and jq.

AWS, aws-vault, PowerShell

Share AWS Vault session with Bash (WSL)

I’m on Windows and I use AWS Vault to connect to AWS using an MFA token. It works wonderfully, unless you need to execute some Bash scripts. I love using Bash on Windows, as WSL makes it really easy to write my scripts. But, alas, the AWS environment variables set by AWS Vault are not […]

Amazon S3, Automation, Node.js, Scripting

Using the S3P API to copy 1.3M of 5M of AWS S3 keys

This week we had to exfiltrate some data from a bucket with 5M+ keys. After doing some calculations and testing with a Bash script that used the AWS CLI, we decided to take a more performant route and use s3p. Its authors claim it is 5-50 times faster than the AWS CLI 😊.

AWS, AWS Athena, SQL

ALB access logs + Athena: identify target problems

It is pretty easy to write your AWS ALB access logs to S3, but if you want to do something with them, you might want to add them to AWS Athena, so you can query them using plain old SQL. Let’s investigate how we can see which upstreams / targets are misbehaving.

AWS, aws-vault, bash, PowerShell, Windows, WSL

Thoughts on AWS Vault config on Windows

I’ve installed and configured AWS Vault on Windows. In this blog I’ll show how to set up MFA, automate token rotation, share AWS Vault with Windows Subsystem for Linux (WSL) and do an ECR login on Docker.

Amazon S3, Databricks / Spark, Kafka, PySpark

Streaming a Kafka topic in a Delta table on S3 using Spark Structured Streaming

Our data strategy specifies that we should store data on S3 for further processing. Raw S3 data is not the best way of dealing with data on Spark, though. In this blog I’ll show how you can use Spark Structured Streaming to write JSON records of a Kafka topic into a Delta table.

Amazon S3, Databricks / Spark, PySpark

Caching resized images on S3 with Databricks

When you are training a machine learning image classification model, you often need to resize the images in your dataset into smaller ones. When you retrain your model on new data, you resize the images once more. In this blog I’ll share how S3 can be used to cache the resized images.

Amazon S3, Automation, AWS, Python

Trigger Lambda for large S3 Bucket with SQS

At Wehkamp we use AWS Lambda to classify images on S3. The Lambda is triggered when a new image is uploaded to the S3 bucket. Currently we have over 6,400,000 images in the bucket. Now we would like to run the Lambda for all images in the bucket. In this blog I’ll show how we did this with a Python 3.6 script.

Amazon S3, AWS Lambda, bash, Python

AWS Lambda Size: PIL+TF+Keras+Numpy?

At Wehkamp we’ve been using machine learning for a while now. We’re training models in Databricks (Spark) and Keras. This produces a Keras file that we use to make the actual predictions. Training is one thing, but getting them to production is quite another!

The main problem we faced was that it was too big to actually fit into a Lambda. This blog shows how we’ve dealt with that problem.