Adding True/False and list value widgets to your Databricks notebook

As an engineer, I love to add parameters to my applications. That's why I love the widget feature of Databricks notebooks. It lets me parameterize a notebook through a nice UI.

A Databricks notebook with 5 widgets. By default, they stick on top of the notebook.

You can add widgets to a notebook by specifying them in the first cells of the notebook. There are four flavors: text, dropdown, combobox, and multiselect. It is even possible to specify widgets in SQL, but I'll be using Python today.

I use them to influence the settings of my notebook, so I can easily change them and run the notebook again. I also love the fact that I can specify a default value for a widget.

Adding a True/False widget

Unfortunately, there is no checkbox widget in Databricks. We'll be using a dropdown with the values True and False:

dbutils.widgets.dropdown("skip_send_to_bucket", "False", 
                         ["True", "False"], "Skip send to bucket")

dbutils.widgets.dropdown("force_reclassification", "False",
                         ["True", "False"], "Force Reclassification")

This will render the following UI:

Two Databricks dropdown widgets that show True/False options. The widgets are ordered alphabetically.

Let's say these values are optional. The UI will always provide the values, but when other notebooks are calling this notebook, they might not care to specify all arguments. First, we'll create a function that will read the value or provide a default:

def get_argument_value_or_default(name, default):
  value = getArgument(name)
  if len(value) < 1:
    return default
  return value

Next, we must convert the value into a Boolean. Did you know that bool("False") and bool("0") both return True? This is because any non-empty string is truthy. We'll create a function to parse a string value into a Boolean value:

def str_to_bool(value):
  FALSE_VALUES = ['false', 'no', '0']
  TRUE_VALUES = ['true', 'yes', '1']
  lvalue = str(value).lower()
  if lvalue in (FALSE_VALUES): return False
  if lvalue in (TRUE_VALUES):  return True
  raise Exception("String value should be one of {}, but got '{}'.".format(FALSE_VALUES + TRUE_VALUES, value))
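To make the difference with the built-in bool() concrete, here is a small sketch that runs in plain Python, outside Databricks. The function is repeated so the block stands on its own; the `comparison` and `rejected` names are only there for illustration.

```python
def str_to_bool(value):
    # Same logic as the function above, repeated so this block runs on its own.
    FALSE_VALUES = ['false', 'no', '0']
    TRUE_VALUES = ['true', 'yes', '1']
    lvalue = str(value).lower()
    if lvalue in FALSE_VALUES:
        return False
    if lvalue in TRUE_VALUES:
        return True
    raise Exception("String value should be one of {}, but got '{}'.".format(
        FALSE_VALUES + TRUE_VALUES, value))

# bool() only looks at emptiness; str_to_bool looks at the content.
comparison = {text: (bool(text), str_to_bool(text))
              for text in ["True", "False", "0", "YES", "no"]}

for text, (builtin, parsed) in comparison.items():
    print("{!r}: bool() -> {}, str_to_bool() -> {}".format(text, builtin, parsed))

# Anything outside the two lists is rejected instead of silently guessed.
try:
    str_to_bool("maybe")
    rejected = False
except Exception:
    rejected = True
```

Every sample string is truthy for bool(), while str_to_bool distinguishes them by content and raises on values it does not recognise.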

For our example we could just check on 'True' and 'False', but I like my functions to give me more options. Now let's parse the widget values into variables:

force_reclassification = str_to_bool(get_argument_value_or_default("force_reclassification", False))
skip_send_to_bucket = str_to_bool(get_argument_value_or_default("skip_send_to_bucket", False))
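If you want to try the fallback behaviour without a Databricks cluster, a rough sketch like the one below can help. Note the assumptions: `fake_widget_values` and the stubbed `getArgument` are made up for this example and merely stand in for what the notebook runtime would provide.

```python
# Hypothetical widget values, standing in for what Databricks would supply.
fake_widget_values = {"force_reclassification": "True", "skip_send_to_bucket": ""}

def getArgument(name):
    # Stub for illustration only; in a notebook the Databricks runtime provides this.
    return fake_widget_values[name]

def get_argument_value_or_default(name, default):
    # Same helper as above, repeated so this block runs on its own.
    value = getArgument(name)
    if len(value) < 1:
        return default
    return value

# A filled-in value is returned as-is; an empty one falls back to the default.
provided = get_argument_value_or_default("force_reclassification", "False")
fallback = get_argument_value_or_default("skip_send_to_bucket", "False")
print(provided, fallback)
```

Here the default is passed as the string "False", so the helper returns a string in both cases. Passing the Boolean False also works in the article's code, because str_to_bool calls str() on its input before comparing.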

What about required values?

When a value is required, you should terminate notebook execution if the value is not provided. Let's create a function that helps us do so:

def validate_required_argument_and_return_value(name):
  value = getArgument(name)
  if len(value) < 1:
    dbutils.notebook.exit("'{}' argument value is required.".format(name))
  return value

This will cause the notebook to exit with the following message:

Notebook exited: 'delivery_time_codes_to_skip' argument value is required.
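One thing to keep in mind: as far as I know, dbutils.notebook.exit ends the notebook normally and hands its message back to the caller, so a calling notebook or job will not register a failure. If you would rather have the run fail, raising an exception is an alternative. Below is a minimal sketch of that idea; `require_value` is a hypothetical helper (not part of dbutils) that takes the already-read widget value, so it can be tried in plain Python.

```python
def require_value(name, value):
    """Hypothetical alternative: raise instead of calling dbutils.notebook.exit."""
    # Unlike the function above, whitespace-only input is treated as missing too.
    if len(value.strip()) < 1:
        raise ValueError("'{}' argument value is required.".format(name))
    return value

# Plain-Python demonstration with made-up values.
ok = require_value("delivery_time_codes_to_skip", "A1, B2")
try:
    require_value("delivery_time_codes_to_skip", "")
    message = None
except ValueError as error:
    message = str(error)
print(message)
```

Which of the two you pick depends on whether a missing argument should count as a clean stop or as a failed run.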

Comma-separated list widget?

I like to specify lists in the form of a comma-separated list of values. I'll be using a text widget for these types of values:

dbutils.widgets.text("mcc1_codes_to_reclassify", "F9, G4, J9, W2, G9, M7", "MCC1 for Reclassification")

This renders:

Databricks text field widget with a comma separated list of strings.
The rendered text field is small for a list.

Let's create a helper function for parsing the string into a list:

import re

def split_into_string_list(value):
  # split on ; , [ ] " and '
  lst = re.split(';|,|\\[|\\]|"|\'', value)
  # trim surrounding whitespace from every item
  lst = map(lambda x: x.strip(), lst)
  # drop empty items
  lst = filter(lambda x: len(x) > 0, lst)
  return list(lst)

We're splitting on everything in the book: commas, semicolons, square brackets, and single and double quotes. This split parses weird values like "[\"my\", 1, ' ', 'weird-list',]" into ['my', '1', 'weird-list']. It will skip empty values and strip surrounding spaces from values. You might not want or need such an opinionated split.
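To see the split in action, here is a short sketch that runs in plain Python. The helper is repeated so the block stands on its own, and the "a|b" input is a made-up sample. Note that the "|" characters in the pattern only separate the alternatives, so a pipe in the input is not treated as a delimiter.

```python
import re

def split_into_string_list(value):
    # Same helper as above, repeated so this block runs on its own.
    lst = re.split(';|,|\\[|\\]|"|\'', value)
    lst = map(lambda x: x.strip(), lst)
    lst = filter(lambda x: len(x) > 0, lst)
    return list(lst)

# The widget default from this article.
codes = split_into_string_list("F9, G4, J9, W2, G9, M7")
# The weird value: brackets, quotes, a blank item and a trailing comma.
weird = split_into_string_list("[\"my\", 1, ' ', 'weird-list',]")
# A pipe is kept inside the item, because it is not one of the delimiters.
piped = split_into_string_list("a|b")
print(codes, weird, piped)
```

All items come back as strings, so numeric codes would still need an explicit conversion afterwards.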

Let's combine split_into_string_list with validate_required_argument_and_return_value:

def validate_required_argument_and_return_list_value(name):
  value = validate_required_argument_and_return_value(name)
  value = split_into_string_list(value)
  if len(value) == 0:
    dbutils.notebook.exit("'{}' list argument has no items.".format(name))
  return value

Let's call it:

mcc1_codes_to_reclassify = validate_required_argument_and_return_list_value("mcc1_codes_to_reclassify")

Running a parametrised notebook

Running a notebook from another notebook is easy:

dbutils.notebook.run(
    'Image Classification Pipeline', # path of the notebook to run
    60 * 10, # timeout in seconds (10 minutes)
    # parameters:
    { 'mcc1_codes_to_reclassify': 'KLM, CKB, W0T',
      'skip_send_to_bucket': 'False',
      'force_reclassification': 'True' })
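Because widget values arrive as strings, the calling notebook has to pass strings as well. If the caller holds real Python values, a small conversion step keeps both sides in sync. The sketch below assumes a hypothetical helper named `to_widget_arguments`; it is not part of dbutils, and it only covers the Boolean and list conventions used in this article.

```python
def to_widget_arguments(params):
    """Hypothetical helper: turn Python values into the strings the widgets expect."""
    arguments = {}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            # Lists become a comma-separated string, matching the text widget.
            arguments[name] = ", ".join(str(item) for item in value)
        else:
            # Booleans become "True" / "False", matching the dropdown choices.
            arguments[name] = str(value)
    return arguments

arguments = to_widget_arguments({
    "mcc1_codes_to_reclassify": ["KLM", "CKB", "W0T"],
    "skip_send_to_bucket": False,
    "force_reclassification": True,
})
print(arguments)
```

The resulting dictionary can then be passed as the parameters argument when running the notebook.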

Or use %run:

%run /path/to/notebook $mcc1_codes_to_reclassify="KLM, CKB, W0T" $skip_send_to_bucket="False" $force_reclassification="True" 


That's it. It is easy to add a True/False widget to your notebook using a dropdown. One big advantage is that you don't have to use True or False as a caption: you can give the widget a more descriptive caption and still use a Boolean value in your notebook.
The list value widgets are harder but will give the end-user great control over feeding a list into your notebook.