Lately I’ve become fascinated with the Latin language. I’m working on a project that converts photographs of Latin inscriptions on medieval statues into translated text. One of the challenges is parsing years, usually expressed in the form of Roman Numerals.

After building a parser class, I noticed that it touches a lot of nice language features: parsing, operator overloading, and implicit conversions. It is a really nice way to play around with C#.

Read Part 2 here: Calculations with Roman Numerals using C#.

Let’s define some numerals

When you read about Roman Numerals you’ll notice that there are different styles. Last week I visited the Gros Horloge in Rouen, France. Look at the numerals of the clock:

The Big Clock of Rouen, France. The numbers are expressed as Roman Numerals.

Notice how the numeral for 4 isn’t IV, but IIII. It turns out that this is the (perfectly fine) long notation called the additive form. Long notations like VIIII are a bit harder to read, which is why the subtractive notation IX is used in many cases. In some documents both long and short notations are used: V might be written as IIIII and LX as XXXXXX.

Did you know there’s no zero in Roman Numerals? It wasn’t invented yet (more on that here). The writers used NULLA, which is Latin for nothing.

Oh… and it turns out that 18 can be written as both XIIX and IIXX – you’ve got to love those Romans.

So let’s define these values as constants:
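As a rough illustration, such constants could be kept in a lookup table. This is only a sketch: the names `RomanNumeralValues`, `Nulla` and `Values` are assumptions made for this example, and the later medieval numerals are left out here.

```csharp
using System.Collections.Generic;

public static class RomanNumeralValues
{
    // The word used in place of zero.
    public const string Nulla = "NULLA";

    // Hypothetical lookup table: numeral text -> numerical value.
    public static readonly IReadOnlyDictionary<string, int> Values =
        new Dictionary<string, int>
        {
            ["M"] = 1000, ["CM"] = 900, ["D"] = 500, ["CD"] = 400,
            ["C"] = 100,  ["XC"] = 90,  ["L"] = 50,  ["XL"] = 40,
            ["X"] = 10,   ["IX"] = 9,   ["V"] = 5,   ["IV"] = 4,
            ["I"] = 1,
            // The two irregular spellings of 18 mentioned above.
            ["XIIX"] = 18, ["IIXX"] = 18,
            [Nulla] = 0,
        };
}
```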

By the way, in the Middle Ages and the Renaissance more numerals were added. They are not used anymore, but might be useful in parsing – that’s why they are included.


First we need an object to store the data and perform operations on. Let’s create something simple that is nullable (a class).
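A minimal sketch of such a class might look like the following. The class name `RomanNumeral` and the `Number` property are assumptions for illustration, not necessarily the names used in the full class.

```csharp
using System;

// A reference type, so a variable of this type can be null
// (for example when parsing fails).
public sealed class RomanNumeral
{
    // The numerical value this numeral represents.
    public int Number { get; }

    public RomanNumeral(int number)
    {
        // Assumption: Roman Numerals have no negative values.
        if (number < 0)
            throw new ArgumentOutOfRangeException(nameof(number));

        Number = number;
    }
}
```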

Parsing a Roman Numeral string into a number

A Roman Numeral is an ordered string of characters. The values in the string are in descending order. So XI means 10 + 1, and MDCIX means 1000 + 500 + 100 + 9. Notice that IX does not mean 1 + 10, but 10 - 1 (it’s subtractive!). Let’s build an array with all the numerals in descending order of their numerical value:

We can use this array to parse a given string. We need to do the following:

  1. Start with resultNumber with value 0.
  2. Find the first numeral in the NUMERAL_OPTION array that matches the start of the string.
  3. If no numeral can be found, the input string is invalid: return null and terminate the routine.
  4. Otherwise add the value of the numeral (stored in the VALUES array) to the resultNumber
  5. Remove the numeral from the input string.
  6. Repeat until the input string is empty. Make sure that only the same or lower numerals are used in the repetitions. Higher numerals are invalid.

The code

This will result in the following code:
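The steps above can be sketched as follows. This is a minimal version under a few assumptions: the names `RomanNumeralParser`, `NumeralOptions` and `Parse` are made up for this example, only the standard numerals are listed, and the method returns a nullable `int` rather than an instance of the numeral class.

```csharp
using System;

public static class RomanNumeralParser
{
    // Numerals in descending order of value. Subtractive pairs come
    // before the single letter they start with, so "IX" is tried before "I".
    private static readonly string[] NumeralOptions =
        { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };

    private static readonly int[] Values =
        { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };

    public static int? Parse(string input)
    {
        if (input == null) return null;

        int resultNumber = 0;   // step 1
        int position = 0;       // how much of the input has been consumed
        int startIndex = 0;     // only this numeral or lower ones may follow

        while (position < input.Length)              // step 6
        {
            int match = -1;
            for (int i = startIndex; i < NumeralOptions.Length; i++)
            {
                // Step 2: does this numeral match the start of the rest?
                if (input.Substring(position)
                         .StartsWith(NumeralOptions[i], StringComparison.Ordinal))
                {
                    match = i;
                    break;
                }
            }

            if (match < 0) return null;              // step 3: invalid input

            resultNumber += Values[match];           // step 4
            position += NumeralOptions[match].Length; // step 5
            startIndex = match;                      // higher numerals are now invalid
        }

        return resultNumber;
    }
}
```

With this sketch, `Parse("MDCIX")` gives 1609, `Parse("IIII")` gives 4, and `Parse("IM")` gives null because M is higher than I. The check is deliberately lenient: repeating a subtractive pair, as in IXIX, is still accepted.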

Let’s be nice

To harden the code we could add some extra processing rules:

  • An empty string is numeral 0.
  • The input string should be uppercased.
  • The string NULLA is numeral 0.
  • The character U can be converted to V.
  • Some medical recipes will end in a J instead of an I. Replace this end-character.
  • Check if the whole string is in the VALUES array and if so, return that value (small optimization).

Just add the following to the method:
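A sketch of these rules as a separate clean-up step could look like this. The helper name `Normalize` and its class are assumptions for this example; it returns an empty string for zero, and the whole-string lookup from the last rule is left out.

```csharp
using System;

public static class RomanNumeralInput
{
    // Cleans up raw input before it is parsed.
    // An empty result stands for the number 0.
    public static string Normalize(string input)
    {
        // Assumption: null is treated the same as an empty string.
        if (string.IsNullOrEmpty(input)) return "";

        string text = input.ToUpperInvariant();

        // Check NULLA before replacing U, because the word contains a U.
        if (text == "NULLA") return "";

        // U is an alternative way of writing V.
        text = text.Replace('U', 'V');

        // A trailing J stands for a final I.
        if (text.EndsWith("J", StringComparison.Ordinal))
            text = text.Substring(0, text.Length - 1) + "I";

        return text;
    }
}
```

For example, `Normalize("viij")` gives `"VIII"` and `Normalize("nulla")` gives an empty string.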

How about writing Roman Numerals?

First we need to distinguish between the two notations – the additive and the subtractive:

Notice that the values of the numerals in both arrays are in descending order. With these ordered arrays we can devise the following algorithm:

  1. Use the array of the notation.
  2. Traverse each item of the array.
  3. If the value of a numeral is smaller than or equal to the number, subtract that value from the number and add the numeral to the result string. Try the same numeral again.
  4. If a numeral does not fit, move on to the next.
  5. Repeat until the number is 0.
  6. Return the result as the full Roman Numeral.

ToString() implementation

Converted into code the algorithm will look like this:
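As a hedged sketch, the algorithm could be written like this. The names `RomanNumeralWriter`, `Write` and the `additive` parameter are assumptions for this example, and writing zero as NULLA follows the convention described earlier.

```csharp
using System;
using System.Text;

public static class RomanNumeralWriter
{
    // Both notations are listed in descending order of value.
    private static readonly (string Numeral, int Value)[] Subtractive =
    {
        ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
        ("C", 100),  ("XC", 90),  ("L", 50),  ("XL", 40),
        ("X", 10),   ("IX", 9),   ("V", 5),   ("IV", 4), ("I", 1),
    };

    private static readonly (string Numeral, int Value)[] Additive =
    {
        ("M", 1000), ("D", 500), ("C", 100), ("L", 50),
        ("X", 10),   ("V", 5),   ("I", 1),
    };

    public static string Write(int number, bool additive = false)
    {
        if (number < 0) throw new ArgumentOutOfRangeException(nameof(number));
        if (number == 0) return "NULLA";

        var notation = additive ? Additive : Subtractive;   // step 1
        var result = new StringBuilder();

        foreach (var (numeral, value) in notation)          // step 2
        {
            // Step 3: keep using the numeral for as long as it fits.
            while (number >= value)
            {
                result.Append(numeral);
                number -= value;
            }
            // Step 4: it no longer fits, so move on to the next numeral.
        }

        return result.ToString();                           // step 6
    }
}
```

Here `Write(1609)` gives `"MDCIX"`, while `Write(9, additive: true)` gives `"VIIII"`.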

The advantage of working through the ordered notation array is the way that subtractive numerals are handled: XL is matched before X. This prevents the algorithm from reading XL as 10 + 50 instead of 40. It also has the nice advantage of discarding wrongly formed numerals.

Let’s see it work

I’ve created some unit tests to show how the Roman Numeral implementation works:


I’ve shown how one can parse Roman Numerals with an easy C# class. You can download the full class from GitHub and use it in your projects. In the next blog post I’ll show you how to implement calculation operators (+, -, *, /, %) and implicit casting (to int and string).