Amazon Web Services (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms to individuals, companies, and governments on a metered pay-as-you-go basis. In aggregate, these web services provide a set of primitive, abstract technical infrastructure and distributed computing building blocks and tools.
At Wehkamp we use many – many – buckets! To do FinOps correctly, it is important that we can determine which teams own which buckets. In this article I’ll discuss how to detect incorrect Team tags and apply the correct ones. We’re using a combination of Bash, the AWS CLI, CSV and JQ.
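Here’s a minimal sketch of that idea in Python with boto3, rather than the Bash/JQ combination the post uses; the `bucket_teams.csv` mapping file and its columns are hypothetical stand-ins:

```python
# Sketch only: compare each bucket's Team tag against a CSV of expected owners
# and fix mismatches. Assumes a hypothetical "bucket_teams.csv" with columns
# bucket,team; the post itself does this with Bash, the AWS CLI and JQ.
import csv
import boto3

s3 = boto3.client("s3")

def get_team_tag(bucket):
    """Return the bucket's current Team tag, or None if it has none."""
    try:
        tags = s3.get_bucket_tagging(Bucket=bucket)["TagSet"]
    except s3.exceptions.ClientError:
        return None  # no TagSet on this bucket (or it is not accessible)
    return next((t["Value"] for t in tags if t["Key"] == "Team"), None)

with open("bucket_teams.csv") as f:
    for row in csv.DictReader(f):
        bucket, expected = row["bucket"], row["team"]
        if get_team_tag(bucket) == expected:
            continue
        print(f"fixing {bucket}: Team should be {expected!r}")
        # put_bucket_tagging replaces the whole TagSet, so merge the old tags
        try:
            tags = s3.get_bucket_tagging(Bucket=bucket)["TagSet"]
        except s3.exceptions.ClientError:
            tags = []
        tags = [t for t in tags if t["Key"] != "Team"]
        tags.append({"Key": "Team", "Value": expected})
        s3.put_bucket_tagging(Bucket=bucket, Tagging={"TagSet": tags})
```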
I’m on Windows and I use AWS Vault to connect to AWS using an MFA token. It works wonderfully, unless you need to execute some Bash scripts. I love using Bash on Windows, as WSL makes it really easy to write my scripts. But, alas, the AWS environment variables set by AWS Vault are not […]
This week we had to exfil some data out of a bucket with 5M+ keys. After doing some calculations and testing with a Bash script that used the AWS CLI, we decided to go a more performant route and use s3p. They claim to be 5-50 times faster than the AWS CLI 😊.
It is pretty easy to write your AWS ALB access logs to S3, but if you want to do something with them, you’ll want to add them to AWS Athena, so you can query them using plain old SQL. Let’s investigate how we can see which upstreams / targets are misbehaving.
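As a hedged sketch of the querying part: assuming an `alb_logs` table created with the DDL from the AWS Athena documentation, a query like the one below surfaces targets returning 5xx responses. The database name and results bucket are placeholders:

```python
# Count 5xx responses per target in the ALB access logs via Athena.
# Assumes an "alb_logs" table (per the AWS docs DDL) and hypothetical
# database / results-bucket names.
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT target_ip, target_status_code, count(*) AS hits
FROM alb_logs
WHERE target_status_code LIKE '5%'
GROUP BY target_ip, target_status_code
ORDER BY hits DESC
"""

resp = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},                      # assumption
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical
)
print("started query:", resp["QueryExecutionId"])
```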
I’ve installed and configured AWS Vault in Windows. In this blog I’ll show how to set up MFA, automate token rotation, share AWS Vault with Windows Subsystem for Linux (WSL) and how to do an ECR login on Docker.
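For the ECR part, here is a hedged Python equivalent of the usual `aws ecr get-login-password | docker login` pipeline; run it under `aws-vault exec <profile> --` so boto3 picks up the temporary credentials:

```python
# Sketch: log Docker in to ECR using credentials provided by AWS Vault.
# ECR's authorization token is base64("AWS:<password>").
import base64
import subprocess
import boto3

auth = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
user, _, password = base64.b64decode(auth["authorizationToken"]).decode().partition(":")
registry = auth["proxyEndpoint"]  # e.g. https://<account>.dkr.ecr.<region>.amazonaws.com

# pass the password on stdin so it never shows up in the process list
subprocess.run(
    ["docker", "login", "--username", user, "--password-stdin", registry],
    input=password.encode(),
    check=True,
)
```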
Our data strategy specifies that we should store data on S3 for further processing. Raw S3 data is not the best format for Spark to work with, though. In this blog I’ll show how you can use Spark Structured Streaming to write JSON records from a Kafka topic into a Delta table.
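A minimal PySpark sketch of that pipeline; the topic name, record schema and S3 paths are assumptions, not the values from the post:

```python
# Stream JSON records from a Kafka topic into a Delta table.
# On Databricks the `spark` session and the Delta format are already available.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()

schema = StructType([                  # hypothetical record schema
    StructField("id", StringType()),
    StructField("timestamp", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")  # assumption
       .option("subscribe", "events")                    # hypothetical topic
       .option("startingOffsets", "earliest")
       .load())

# Kafka exposes binary key/value columns; parse the value as JSON
parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("json"))
          .select("json.*"))

(parsed.writeStream
 .format("delta")
 .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")  # hypothetical
 .outputMode("append")
 .start("s3://my-bucket/delta/events"))                               # hypothetical
```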
When you are training a machine learning image classification model, you often need to resize the images in your dataset into smaller ones. When you retrain your model on new data, you resize the images once more. In this blog I’ll share how S3 can be used to cache the resized images.
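The gist of the caching trick, as a sketch (bucket, prefix and target size are hypothetical): look for a resized copy under a cache prefix first, and only fetch and resize the original on a miss:

```python
# Use S3 itself as a cache for resized images: key the cache on the target size.
import io
import boto3
from PIL import Image

s3 = boto3.client("s3")
BUCKET = "my-images"   # hypothetical bucket
SIZE = (224, 224)      # hypothetical target size

def get_resized(key):
    cache_key = f"cache/{SIZE[0]}x{SIZE[1]}/{key}"
    try:
        # cache hit: an earlier run already resized this image
        return s3.get_object(Bucket=BUCKET, Key=cache_key)["Body"].read()
    except s3.exceptions.NoSuchKey:
        pass
    # cache miss: fetch the original, resize it and store the result
    original = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    img = Image.open(io.BytesIO(original)).convert("RGB").resize(SIZE)
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    s3.put_object(Bucket=BUCKET, Key=cache_key, Body=buf.getvalue())
    return buf.getvalue()
```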
At Wehkamp we use AWS Lambda to classify images on S3. The Lambda is triggered when a new image is uploaded to the S3 bucket. Currently we have over 6,400,000 images in the bucket. Now we would like to run the Lambda for all images in the bucket. In this blog I’ll show how we did this with a Python 3.6 script.
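In outline, such a script pages through every key and fires the Lambda asynchronously with a synthetic S3 event; the bucket and function names below are hypothetical, and the post’s actual script may batch and throttle differently:

```python
# Re-run the classification Lambda for every existing object in the bucket.
import json
import boto3

s3 = boto3.client("s3")
lam = boto3.client("lambda")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-images"):      # hypothetical bucket
    for obj in page.get("Contents", []):
        # mimic the S3 "ObjectCreated" event the Lambda normally receives
        event = {"Records": [{"s3": {
            "bucket": {"name": "my-images"},
            "object": {"key": obj["Key"]},
        }}]}
        lam.invoke(
            FunctionName="classify-image",               # hypothetical function
            InvocationType="Event",                      # async, fire and forget
            Payload=json.dumps(event),
        )
```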
At Wehkamp we’ve been using machine learning for a while now. We’re training models in Databricks (Spark) and Keras. This produces a Keras file that we use to make the actual predictions. Training is one thing, but getting those models to production is quite another!
The main problem we faced was that the model was too big to fit into a Lambda. This blog shows how we dealt with that problem.
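One common workaround for oversized models, sketched below and not necessarily the one the post lands on, is to keep the model out of the deployment package, download it from S3 into /tmp on a cold start and cache it across warm invocations; all names are hypothetical:

```python
# Load a Keras model from S3 at runtime instead of bundling it in the package.
import os
import boto3

MODEL_BUCKET = "my-models"         # hypothetical
MODEL_KEY = "classifier/model.h5"  # hypothetical
MODEL_PATH = "/tmp/model.h5"       # /tmp is the Lambda's writable disk

_model = None  # module-level cache survives warm invocations

def get_model():
    global _model
    if _model is None:
        if not os.path.exists(MODEL_PATH):
            boto3.client("s3").download_file(MODEL_BUCKET, MODEL_KEY, MODEL_PATH)
        from keras.models import load_model  # import late to keep cold starts cheap
        _model = load_model(MODEL_PATH)
    return _model

def handler(event, context):
    model = get_model()
    # ... preprocess the event payload and call model.predict(...) here ...
```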