Archive of 2019 - KeesTalksTech

Kafka, Spark and schema inference

At Wehkamp we use Apache Kafka in our event-driven service architecture. It handles high loads of messages really well. We use Apache Spark to run analysis. From time to time, I need to read a Kafka topic into my Databricks notebook. In this article, I’ll show what I use to read from a Kafka topic that has no schema attached to it. We’ll also dive into how we can render the JSON schema in a human-readable format.
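To give an idea of what rendering a schema in a human-readable format can look like, here is a minimal sketch in plain Python. It assumes the schema is available in the JSON form that Spark produces for a `StructType`; the function name `render_type` and the sample schema are hypothetical and not taken from the article.

```python
import json


def render_type(data_type, indent: int = 0) -> list:
    """Return indented lines describing a Spark data type given in its JSON form."""
    pad = "  " * indent
    if isinstance(data_type, str):
        # Primitive types are plain strings, e.g. "string" or "long".
        return [f"{pad}{data_type}"]
    kind = data_type["type"]
    if kind == "struct":
        lines = []
        for field in data_type["fields"]:
            field_type = field["type"]
            if isinstance(field_type, str):
                lines.append(f"{pad}{field['name']}: {field_type}")
            else:
                # Complex field: print its kind, then its details one level deeper.
                lines.append(f"{pad}{field['name']}: {field_type['type']}")
                lines.extend(render_type(field_type, indent + 1))
        return lines
    if kind == "array":
        return [f"{pad}element:"] + render_type(data_type["elementType"], indent + 1)
    # Other complex types (such as maps) are only named in this sketch.
    return [f"{pad}{kind}"]


# Hypothetical schema; in a notebook it could come from json.loads(df.schema.json()).
sample = json.loads("""
{"type": "struct", "fields": [
  {"name": "id", "type": "long", "nullable": true, "metadata": {}},
  {"name": "tags", "type": {"type": "array", "elementType": "string", "containsNull": true},
   "nullable": true, "metadata": {}},
  {"name": "user", "type": {"type": "struct", "fields": [
    {"name": "name", "type": "string", "nullable": true, "metadata": {}}]},
   "nullable": true, "metadata": {}}
]}
""")
print("\n".join(render_type(sample)))
```

The sketch only covers structs, arrays and primitive types, which is enough to show the idea of walking the schema recursively.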

Read article

Chatops, Databricks / Spark, Python, Slack

Simple Python code to send messages to a Slack channel (without packages)

Last week I was working on a Databricks script that needed to produce a Slack message as its final outcome. I lifted some code that used a Slack client that was pip-installed. Unfortunately, I could not use the package on my cluster. Fortunately, the Slack API is so simple that you don’t really need a package to post a simple message to a channel. In this blog I’ll show you the simplest way of producing awesome messages in Slack.
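As a minimal sketch of the package-free idea, the snippet below posts a message with only the Python standard library. It assumes a Slack incoming webhook; the webhook URL is a placeholder and the helper names `build_slack_request` and `send_slack_message` are hypothetical. The article itself may use a different Slack endpoint.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request that carries a simple Slack message as JSON."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code of the response."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder URL (an assumption); replace it with your own webhook URL
# before calling send_slack_message.
example = build_slack_request(
    "https://hooks.slack.com/services/T000/B000/XXXX", "Job finished"
)
print(example.get_method(), example.data.decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect without touching the network.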

Read article

Amazon S3, Databricks / Spark

Caching resized images on S3 with Databricks

When you are training a machine learning image classification model, you often need to resize the images in your dataset into smaller ones. When you retrain your model on new data, you resize the images once more. In this blog I’ll share how S3 can be used to cache the resized images.

Read article

Databricks / Spark

Sorting an array of a complex data type in Spark

Today we’ll be looking at sorting and reducing an array of a complex data type. I’m using Databricks to run Spark, but the code should be compatible with other Spark environments. I’ll be using Spark SQL to show the steps. I’ve tried to keep the data as simple as possible. The example should apply to scenarios that are more complex.

Read article

Databricks / Spark, Python

Adding True/False and list value widgets to your Databricks notebook

As an engineer, I love to parametrise my applications. That’s why I love the widget feature of Databricks notebooks, which allows me to do this with a nice UI. In this blog I’ll explore how to build a True/False widget and a list widget. I also show how to validate the values of required fields.
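Databricks widgets hand their values back as strings, so a True/False or list widget needs a small conversion step. The sketch below shows one way to do that in plain Python. The helper names `parse_bool`, `parse_list` and `require` and the widget names `dry_run` and `countries` are hypothetical, and the `dbutils` calls appear only in comments because they exist only inside a Databricks notebook.

```python
def parse_bool(value: str) -> bool:
    """Convert the string value of a True/False dropdown widget to a bool."""
    normalized = value.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(f"Expected 'True' or 'False', got {value!r}")
    return normalized == "true"


def parse_list(value: str, separator: str = ",") -> list:
    """Split a text widget value into a list of trimmed, non-empty items."""
    return [item.strip() for item in value.split(separator) if item.strip()]


def require(name: str, value: str) -> str:
    """Fail early when a required widget was left empty."""
    if not value.strip():
        raise ValueError(f"Widget '{name}' is required")
    return value


# In a Databricks notebook the raw strings would come from the widget API, e.g.:
#   dbutils.widgets.dropdown("dry_run", "True", ["True", "False"])
#   raw = dbutils.widgets.get("dry_run")
# Here plain strings stand in for those values.
print(parse_bool("True"))
print(parse_list(require("countries", "nl, be, de")))
```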

Read article

bash

Investigate problems due to User-Agent using Bash

Last week we had some problems with the Google Ads bot. It was not able to crawl a bunch of URLs while the browser had no problem getting through. The only difference was the User-Agent. This sent us on a debugging journey through Cloudflare, gateways and micro-sites. To assist us, we’ve created a small bash script to visit a URL and show some debug info.

Read article

Amazon S3, Automation, AWS, Python

Trigger Lambda for large S3 Bucket with SQS

At Wehkamp we use AWS Lambda to classify images on S3. The Lambda is triggered when a new image is uploaded to the S3 bucket. Currently we have over 6,400,000 images in the bucket. Now we would like to run the Lambda for all images in the bucket. In this blog I’ll show how we did this with a Python 3.6 script.
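One way to feed millions of existing objects to a Lambda is to put one message per S3 key on an SQS queue. The sketch below shows the batching part of that idea and is not the script from the article. The helper names `batches` and `enqueue_keys` and the message body layout are assumptions. The SQS client is passed in, so with boto3 it would be `boto3.client("sqs")`.

```python
import json
from typing import Iterable, Iterator, List


def batches(items: Iterable[str], size: int = 10) -> Iterator[List[str]]:
    """Yield lists of at most `size` items; SQS accepts up to 10 messages per batch call."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def enqueue_keys(sqs_client, queue_url: str, bucket: str, keys: Iterable[str]) -> int:
    """Send one message per S3 key to the queue and return the number of messages sent."""
    sent = 0
    for batch in batches(keys):
        entries = [
            # The Id only has to be unique within a single batch request.
            {"Id": str(index), "MessageBody": json.dumps({"bucket": bucket, "key": key})}
            for index, key in enumerate(batch)
        ]
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(entries)
    return sent


# With boto3, the keys could be read page by page, for example:
#   paginator = boto3.client("s3").get_paginator("list_objects_v2")
#   for page in paginator.paginate(Bucket=bucket):
#       keys = [obj["Key"] for obj in page.get("Contents", [])]
```

This sketch ignores the `Failed` entries that `send_message_batch` can report in its response; a real run over millions of keys would need to retry those.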

Read article

Tips

My Little List of Tools for Prototyping

As a developer I love to prototype to see if an idea works. Thinking big and starting small is actually one of Wehkamp’s principles. And, let’s face it, that’s not easy!

Usually it starts by getting an idea of the core concept that should be validated. Especially when working with teams, communication is key. This list of tools has helped me over the years to draw or code out some of these concepts and get a discussion started.

Every tool on this list is free and online.

Read article

Automation, Node.js

Convert JsFiddle to SVG using Node.js

I love SVGs, but sometimes they are hard to create, especially when you need to visualize diagrams. HTML is way easier to program. So why not combine them? Can we use HTML to generate an SVG? And can we use JsFiddle to generate that HTML?

Read article

Amazon S3, bash, Python

AWS Lambda Size: PIL+TF+Keras+Numpy?

At Wehkamp we’ve been using machine learning for a while now. We’re training models in Databricks (Spark) and Keras. This produces a Keras file that we use to make the actual predictions. Training is one thing, but getting the models to production is quite another!

The main problem we faced was that it was too big to actually fit into a Lambda. This blog shows how we’ve dealt with that problem.

Read article