Databricks / Spark

Databricks / Spark

Databricks is a company founded by the original creators of Apache Spark. Databricks develops a web-based platform for working with Spark, that provides automated cluster management and IPython-style notebooks.

Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Tip: pimp your Python syntax highlighting in Databricks!

Databricks / Spark, Kafka, PySpark

Kafka, Spark and schema inference

At Wehkamp we use Apache Kafka in our event driven service architecture. It handles high loads of messages really well. We use Apache Spark to run analysis. From time to time, I need to read a Kafka topic into my Databricks notebook. In this article, I’ll show what I use to read from a Kafka topic that has no schema attached to it. We’ll also dive into how we can render the JSON schema in a human-readable format.

Chatops, Databricks / Spark, Python, Slack

Simple Python code to send message to Slack channel (without packages)

Last week I was working on a Databricks script that needed to produce a Slack message as its final outcome. I lifted some code that used a Slack client that was PIP-installed. Unfortunately, I could not use the package on my cluster. Fortunately, the Slack API is so simple, that you don’t really need a package to post a simple message to a channel. In this blog I’ll show you the simplest way of producing awesome messages in Slack.