Anything can be technology. But the dictionary defines it as:
1: a: the practical application of knowledge especially in a particular area; b: a capability given by the practical application of knowledge.
2: a manner of accomplishing a task especially using technical processes, methods, or knowledge.
3: the specialized aspects of a particular field of endeavor.
Yesterday we encountered a curious problem: we needed to use Spark to parse JSON data that was produced by AWS Kinesis. Unfortunately, the data was in a concatenated JSON format. Spark does not support that format, so we had to come up with something ourselves.
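To give an idea of what splitting concatenated JSON can look like, here is a minimal sketch in plain Python. It is not necessarily the approach from the post: the function name `split_concatenated_json` is made up for illustration, and it assumes each record arrives as one string of JSON documents written back to back. It relies on `json.JSONDecoder.raw_decode`, which parses one value and reports where it stopped.

```python
import json


def split_concatenated_json(text):
    """Yield every JSON value found in a string of back-to-back documents.

    Hypothetical helper for illustration; assumes `text` contains only
    valid JSON values, optionally separated by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one value and get the index right after it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


records = list(split_concatenated_json('{"a": 1}{"b": 2} {"c": [3]}'))
```

In Spark, a function like this could be wrapped in a UDF or used in a `flatMap` over the raw strings, but that wiring depends on how the data is loaded.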
While working in Databricks, I needed to plot some images. I had written some code that does this in IPython notebooks, but nothing that works on a DataFrame. I decided to change the code a bit, so it works in Databricks. This solution uses PIL and Matplotlib.
Last week we wanted to parse some XML data with Spark. We have a column with unstructured XML documents and we need to extract text fields from it. This article shows how you can extract data in Spark using a UDF and ElementTree, the Python XML parser.
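As a rough sketch of the kind of function such a UDF could wrap, the snippet below pulls the text of one element out of an XML string with ElementTree. The name `extract_text` and the sample document are assumptions for illustration, not taken from the article. Returning `None` for malformed input keeps a single bad row from failing the whole job.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string, tag):
    """Return the text of the first descendant element named `tag`.

    Hypothetical helper for illustration. Returns None when the XML
    cannot be parsed or when no matching descendant exists.
    """
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError:
        return None
    # ".//tag" searches all descendants of the root, not the root itself.
    node = root.find(f".//{tag}")
    return node.text if node is not None else None


title = extract_text("<doc><meta><title>Hello</title></meta></doc>", "title")
```

In PySpark this could be registered with `pyspark.sql.functions.udf` and a `StringType` return type, assuming the XML lives in a string column.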
This week we’ve been working on processing the access logs from Cloudflare with Databricks (Spark). We now have a job that generates a huge CSV file (over 1 GB) and sends it on by FTP for further processing with an external tool. Creating a DataFrame with the right data was easy. Now, let’s explore how to do a CSV export, secrets management and an FTP transfer!
I operate from the Netherlands and that makes my time zone Central European Summer Time (CEST). The data I handle is usually stored in UTC time. Whenever I need to crunch some data with Spark I struggle to do the right date conversion, especially around summer or winter time (do I need to add 1 or 2 hours?). In this blog, I’ll show how to handle these time zones properly in PySpark.
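The 1-or-2-hours question can be illustrated outside Spark with Python's standard `zoneinfo` module (Python 3.9+). This is only a sketch of the underlying idea, not the PySpark code from the post: the helper name `utc_to_amsterdam` is made up, and it assumes the system has the IANA time zone database available (on some platforms that requires the `tzdata` package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def utc_to_amsterdam(dt_utc):
    """Convert a timezone-aware UTC datetime to Dutch local time.

    Hypothetical helper for illustration. The named zone carries the
    daylight saving rules, so the offset is looked up per timestamp.
    """
    return dt_utc.astimezone(AMSTERDAM)


winter = utc_to_amsterdam(datetime(2021, 1, 15, 12, 0, tzinfo=timezone.utc))
summer = utc_to_amsterdam(datetime(2021, 7, 15, 12, 0, tzinfo=timezone.utc))

# Offset from UTC in hours for each timestamp.
winter_offset = winter.utcoffset().total_seconds() / 3600
summer_offset = summer.utcoffset().total_seconds() / 3600
```

Using a region name such as `Europe/Amsterdam` instead of a fixed offset is what removes the guesswork. PySpark's `from_utc_timestamp` accepts the same kind of region name.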
It is pretty easy to write your AWS ALB access logs to S3, but if you want to do something with them, you might want to add them to AWS Athena, so you can query them using plain old SQL. Let’s investigate how we can see which upstreams / targets are misbehaving.
I’ve installed and configured AWS Vault in Windows. In this blog I’ll show how to set up MFA, automate token rotation, share AWS Vault with Windows Subsystem for Linux (WSL) and how to do an ECR login on Docker.
There are a myriad of tutorials on Redis in almost every programming language. Many will cover how to make a to-do list, so why write another one? Well, I want to write a tutorial that is language agnostic and only uses Redis commands and Lua scripts (the built-in scripting language of Redis).
A Table of Contents helps users navigate (long) blog posts. I use them on both posts and pages. The desktop version always shows the table on the right side in the sidebar (using a text widget with a shortcode). On mobile, I’ll only show it on long articles, using a shortcode under the first paragraph.
In WordPress you have two main taxonomies: categories and tags. I use categories as a taxonomy tree. That is why I want to show the submenu on the category page. It does not come out of the box, so I created something that renders the submenu items for me.