This week we’ve been working on processing the access logs from Cloudflare with Databricks (Spark). We now have a job that generates a huge CSV file (1 GB+) and sends it via FTP to an external tool for further processing. Creating a DataFrame with the right data was easy. Now, let’s explore how to do a CSV export, secrets management and an FTP transfer!
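The FTP step can be sketched with Python's standard `ftplib`. This is a minimal sketch, not necessarily how the job does it. It assumes the CSV has already been written as a single file on a path the driver can read (for example after `df.coalesce(1).write.csv(path, header=True)`). The host, credentials and file names are placeholders. On Databricks, the credentials would normally come from a secret scope via `dbutils.secrets.get(scope=..., key=...)` rather than being hard-coded. The `upload_via_ftp` helper and its `ftp_factory` parameter are hypothetical names introduced here so the connection can be swapped out, for example in a test.

```python
import ftplib
from typing import Callable


def upload_via_ftp(
    host: str,
    user: str,
    password: str,
    local_path: str,
    remote_name: str,
    ftp_factory: Callable[[str], ftplib.FTP] = ftplib.FTP,
    blocksize: int = 1024 * 1024,
) -> None:
    """Upload local_path to the FTP server as remote_name, in binary mode.

    Hypothetical helper: host, user, password and remote_name are placeholders.
    ftp_factory defaults to ftplib.FTP and can be replaced, e.g. in tests.
    """
    # FTP objects are context managers; the connection is closed on exit.
    with ftp_factory(host) as ftp:
        ftp.login(user=user, passwd=password)
        # storbinary reads the file in `blocksize` chunks, so a 1 GB+ export
        # is streamed to the server instead of being loaded into memory.
        with open(local_path, "rb") as fh:
            ftp.storbinary(f"STOR {remote_name}", fh, blocksize=blocksize)
```

Binary mode (`storbinary`) sends the file byte for byte, so the CSV arrives exactly as Spark wrote it.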
I work from the Netherlands, so my time zone is Central European Time (CET, UTC+1) in winter and Central European Summer Time (CEST, UTC+2) in summer. The data I handle is usually stored in UTC. Whenever I need to crunch some data with Spark, I struggle to get the date conversion right, especially around the switch to summer or winter time (do I need to add 1 or 2 hours?). In this blog, I’ll show how to handle these time zones properly in PySpark.
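The "1 or 2 hours" question goes away if you convert to a named time zone instead of adding a fixed offset. The sketch below shows the idea in plain Python (3.9+, using the standard `zoneinfo` module and assuming the system time zone database is available) rather than Spark, just to make the arithmetic visible. In PySpark, `pyspark.sql.functions.from_utc_timestamp(col, "Europe/Amsterdam")` does the same kind of lookup on a column, though the post may take a different route. The function name `utc_to_amsterdam` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def utc_to_amsterdam(ts: datetime) -> datetime:
    """Convert a UTC timestamp to Dutch local time (CET or CEST).

    Naive datetimes are assumed to be UTC, matching how the data is stored.
    The tz database decides whether the offset is +1 or +2 for each moment.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(AMSTERDAM)


print(utc_to_amsterdam(datetime(2021, 1, 15, 12, 0)).isoformat())  # 2021-01-15T13:00:00+01:00
print(utc_to_amsterdam(datetime(2021, 7, 15, 12, 0)).isoformat())  # 2021-07-15T14:00:00+02:00
```

The same 12:00 UTC becomes 13:00 in January (CET) and 14:00 in July (CEST), without any hand-coded offset.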
A table of contents helps readers navigate (long) blog posts. I use one on both posts and pages. The desktop version always shows the table on the right side, in the sidebar (using a text widget with a shortcode). On mobile, I only show it on long articles, using a shortcode under the first paragraph.
Recently, I worked on my theme for KeesTalksTech. To improve performance, I wanted to rely less on plugins, so I needed a simple way to show short lists of posts in my sidebar.
I’ve created two shortcodes: one that shows recent posts, used in the new section, and one that shows specific posts, used in the highlights section.
At Wehkamp, we use Redis a lot. It is fast, highly available, and offered as a managed AWS service called ElastiCache. Sometimes we need to extract data from Redis, and I usually use redis-cli to interact with it from the command line. But what if you need to get the values of 400k+ keys? What would you do? Is there an efficient way to query multiple keys and values from Redis?
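One common way to avoid 400k+ separate round trips (not necessarily the answer this post arrives at) is Redis's `MGET` command, which returns the values of many keys in one request, in the same order as the keys, with nil for missing keys. The sketch below batches the keys and calls `MGET` once per batch. `chunked`, `fetch_all` and the batch size of 1000 are hypothetical choices for illustration. The `mget` argument is any callable with the shape of redis-py's `Redis.mget`, so the logic can be shown without a live server.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Sequence


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fetch_all(
    keys: Iterable[str],
    mget: Callable[[List[str]], Sequence[Optional[str]]],
    batch_size: int = 1000,
) -> Dict[str, Optional[str]]:
    """Fetch values for many keys using one MGET round trip per batch.

    MGET answers in request order (None for missing keys), so zip()
    pairs each key with its value. Duplicate keys simply overwrite.
    """
    values: Dict[str, Optional[str]] = {}
    for batch in chunked(keys, batch_size):
        values.update(zip(batch, mget(batch)))
    return values
```

With redis-py, this could be wired up as `r = redis.Redis(host=..., decode_responses=True)` followed by `fetch_all(r.scan_iter(match="prefix:*"), r.mget)`, where the host and key pattern are placeholders. For 400,000 keys and a batch size of 1000, that is 400 round trips instead of 400,000. This assumes a non-cluster endpoint: in Redis Cluster mode, multi-key commands such as `MGET` fail unless all keys map to the same hash slot.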