At Wehkamp we use decoration a lot. Decoration is a nice way of separating concerns from the actual code. Most of our repositories need the same set of decorators: exception logging, latency metrics and Jaeger spans. In this article I’ll be joining these 3 types of decorator into a single Swiss Army Knife decorator: one decorator to rule them all.
This week we’ve been working on processing the access logs from Cloudflare with Databricks (Spark). We now have a job that generates a huge CSV file (+1GB) and sends it on towards by FTP for further processing with an external tool. Creating a DataFrame with the right data was easy. Now, let’s explore how to do a CSV export, secrets management and an FTP transfer!
I operate from the Netherlands and that makes my time zone Central European Summer Time (CEST). The data I handle is usually stored in UTC time. Whenever I need to crunch some data with Spark I struggle to do the right date conversion, especially around summer or winter time (do I need to add 1 or 2 hours?). In this blog, I’ll show how to handle these time zones properly in PySpark.
A Table of Contents helps users navigate (long) blog posts. I use them on both posts and post. The desktop version always shows the table on the right side in the sidebar (using a text-widget with a shortcode). On mobile, I’ll only show it on long articles, using a shortcode under the first paragraph.