My Synology disk crashed and so did my Docker setup. Basically, the CI/CD pipeline for my programs no longer existed. The wonderful thing about an awful crash like this is that I could rethink my setup. The result is what I would call “a poor man’s CI/CD”: just Git, Docker, Docker Compose and Cron. It is easy to set up and it might be all you need.
For some blogs I need to capture terminal screens. Recording these types of screens has different requirements than normal application or website recordings. The bottom of the video is the most important part: the letters need to be crisp and readable for the end user.
Our data strategy specifies that we should store data on S3 for further processing. Raw S3 data is not the best way of dealing with data in Spark, though. In this blog I’ll show how you can use Spark Structured Streaming to write JSON records from a Kafka topic into a Delta table.
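A minimal sketch of what such a streaming job can look like. The broker address, topic name, schema and S3 paths below are placeholders, not the ones from the post:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema for the JSON payloads -- substitute your own.
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
    # Kafka delivers the value as bytes; cast to string and parse the JSON.
    .select(from_json(col("value").cast("string"), schema).alias("json"))
    .select("json.*")
)

# Stream the parsed records into a Delta table on S3,
# tracking progress in a checkpoint location.
(
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://bucket/checkpoints/events")
    .start("s3://bucket/delta/events")
)
```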
There is a lot of code that needs to make a selection based on a maximum value. One example is Kafka reads: we only want the latest offset for each key, because that’s the latest record. What is the fastest way of doing this?
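One common candidate (not necessarily the winner of the post’s comparison) is a window function that ranks records per key by descending offset and keeps the top row. A sketch with toy data:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = SparkSession.builder.getOrCreate()

# Toy frame standing in for Kafka records: several offsets per key.
df = spark.createDataFrame(
    [("a", 1, "old"), ("a", 3, "new"), ("b", 2, "only")],
    ["key", "offset", "value"],
)

# Rank the records of each key by descending offset and keep the top row,
# i.e. the latest record per key.
w = Window.partitionBy("key").orderBy(col("offset").desc())
latest = (
    df.withColumn("rn", row_number().over(w))
      .filter(col("rn") == 1)
      .drop("rn")
)
latest.show()
```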
Tired of the dull Python syntax highlighting in Databricks? Just copy this code into your Magic CSS editor, tweak it to your own style, pin it & enjoy!
At Wehkamp we use Apache Kafka in our event-driven service architecture. It handles high loads of messages really well. We use Apache Spark to run our analyses. From time to time, I need to read a Kafka topic into my Databricks notebook. In this article, I’ll show what I use to read from a Kafka topic that has no schema attached to it. We’ll also dive into how we can render the JSON schema in a human-readable format.
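A rough sketch of the idea, assuming a hypothetical broker and topic: read the raw values as strings in a bounded batch, let Spark infer the JSON schema, then parse with it and print the schema. (Passing an RDD of strings to `spark.read.json` is deprecated in recent Spark versions but still works.)

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json

spark = SparkSession.builder.getOrCreate()

# Hypothetical broker/topic; a bounded batch read keeps the example simple.
raw = (
    spark.read
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "some-topic")
    .option("startingOffsets", "earliest")
    .load()
    .select(col("value").cast("string").alias("json"))
)

# With no schema attached to the topic, let Spark infer one from the payloads.
inferred = spark.read.json(raw.rdd.map(lambda r: r.json))

# Parse the raw strings with the inferred schema and inspect it.
parsed = raw.select(from_json("json", inferred.schema).alias("data")).select("data.*")
parsed.printSchema()  # tree view; inferred.schema.simpleString() for a compact one
```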