Lately I’ve become fascinated with the Latin language. I’m working on a project that converts photographs of Latin inscriptions on medieval statues into translated text. One of the challenges is parsing years, usually expressed in the form of Roman Numerals.

After building a parser class I noticed that it had a lot of nice characteristics: parsing, operator overloading, implicit conversions. A really nice way to play around with C#.

Read here Part 2: Calculations with Romand Numerals using C#.

## Let’s define some numerals

When you read about Roman Numerals you’ll notice that there are different styles. Last week I visited the Gros Horloge in Rouen, France. Look at the numerals of the clock:

Notice how the numeral for 4 isn’t IV, but IIII. It turns out that it is the (perfectly fine) long notation called the *additive form*. Long notations like VIIII are a bit harder to read, that’s why the subtractive notation IX is used in many cases. In some documents both long and short notations are used. The V might be written as IIIII and LX as XXXXXX.

Did you know there’s no *zero* in Roman Numerals. It wasn’t invented yet (for more on that here). The writers used *NULLA,* which is latin for *nothing.*

Ow… and it turns out that 18 can be written as both XIIX and IIXX – you got to love those Romans.

So let’s define these values as constants:

//0, nothing, nada. The zero wasn't invented yet ;-) public const string NULLA = "NULLA"; //values - a readonly dictionary where the numerals are the keys to values public static readonly IReadOnlyDictionary<string, int> VALUES = new ReadOnlyDictionary<string, int>(new Dictionary<string, int> { {"I", 1 }, {"IV", 4 }, {"V", 5 }, {"IX", 9 }, {"X", 10 }, {"XIIX", 18 }, {"IIXX", 18 }, {"XL", 40 }, {"L", 50 }, {"XC", 90 }, {"C", 100 }, {"CD", 400 }, {"D", 500 }, {"CM", 900 }, {"M", 1000 }, //alternatives from Middle Ages and Renaissance {"O", 11 }, {"F", 40 }, {"P", 400 }, {"G", 400 }, {"Q", 500 } });

By the way, in the Middle Ages and the Renaissance more numerals were added. They are not used anymore, but might be useful in parsing – that’s why they are included.

## Storage

First we need an object to store the data and perform operations on. Let’s create something simple that is nullable (a **class**).

public class RomanNumeral { private readonly int _number; public int Number => _number; public RomanNumeral(int number) { if (number < 0) { throw new ArgumentOutOfRangeException(nameof(number), "Number should be positive."); } _number = number; } }

## Parsing a Roman Numeral string into a number

A Roman Numeral is an ordered string of characters. The values of the string are in a descending order. So XI means 10 + 1. MDCIX means 1000 + 500 + 100 + 9. Notice that IX does not mean 1+10, but 10-1 (it’s subtractive!). Let’s build an array with all the numerals a descending order of their numerical value:

//all the options that are used for parsing, in their order of value public static readonly string[] NUMERAL_OPTIONS = { "M", "CM", "D", "Q", "CD", "P", "G", "C", "XC", "L", "F", "XL", "IIXX", "XIIX", "O", "X", "IX", "V", "IV", "I" };

We can use this array to parse a given string. We need to do the following:

- Start with
**resultNumber**with value**0**. - Find the first numeral from the
**NUMERAL_OPTION**that matches the start of the string - If the numeral cannot be found, the input string is invalid and
**NULL**must be returned (and the routine must be terminated) - Otherwise add the value of the numeral (stored in the
**VALUES**array) to the**resultNumber** - Remove the numeral from the input string.
- Repeat until the input string is empty. Make sure that
*only*the same or lower numerals are used in the repetitions. Higher numerals are invalid.

### Tha code

This will result in the following code:

public static RomanNumeral Parse(string str) { var strToRead = str; var resultNumber = 0; //start with M and iterate through the options var numeralOptionPointer = 0; //continue to read until the string is empty or the numeral options pointer has exceeded all options while (!String.IsNullOrEmpty(strToRead) && numeralOptionPointer < NUMERAL_OPTIONS.Length) { //select the current numeral var numeral = NUMERAL_OPTIONS[numeralOptionPointer]; //read numeral -> check if the numeral is used, otherwise move on to the next option if (!strToRead.StartsWith(numeral)) { numeralOptionPointer++; continue; } //add the vaue of the found numeral var value = VALUES[numeral]; resultNumber += value; //remove the letters of the numeral from the string strToRead = strToRead.Substring(numeral.Length); } //if the whole string is read, return the numeral if (String.IsNullOrEmpty(strToRead)) { return new RomanNumeral(resultNumber); } //string is invalid return null; }

### Let’s be nice

To harden the code we could add some extra processing rules:

- An empty string is numeral
**0**. - The input string should be uppercased.
- The string
**NULLA**is numeral**0**. - The character
**U**can be converted to**V**. - Some medical recipes will end in a
**J**instead of an**I**. Replace this end-character. - Check if the whole string is in the
**VALUES**array and if so, return that value (small optimization).

Just add the following to the method:

if (String.IsNullOrEmpty(str)) { return new RomanNumeral(0); } //upper case the string var strToRead = str.ToUpper(); //nulla? means nothing 0 wasn't invented yet ;-) if (strToRead == NULLA) { return new RomanNumeral(0); } //if ends in J -> replace it to I (used in medicine) if (strToRead.EndsWith("J")) { strToRead = strToRead.Substring(0, strToRead.Length - 1) + "I"; } //if a U is present, assume a V strToRead = strToRead.Replace("U", "V"); //check simple numbers directly in dictionary if (VALUES.ContainsKey(str)) { return new RomanNumeral(VALUES[str]); }

## How about writing Roman Numerals?

First we need to distinguish between the two notations – the additive and the subtractive:

//subtractive notation uses these numerals public static readonly string[] SUBTRACTIVE_NOTATION = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" }; //the addative notation uses these numerals public static readonly string[] ADDITIVE_NOTATION = { "M", "D", "C", "L", "X", "V", "I" };

Notice that the values of the numerals in the arrays are both in descended order. With these ordered arrays we can device the following algorithm:

- Use the array of the notation.
- Traverse each item of the array.
- If a numeral item
*value*is smaller than the number,*subtract*the number and add the numeral to the result string. Try again. - If a numeral does not fit, move on to the next.
- Repeat until the number is
**0**. - Return the result as the full Roman Numeral.

### ToString() implementation

Converted into code the algorithm will look like this:

public override string ToString() { return ToString(RomanNumeralNotation.Substractive); } public string ToString(RomanNumeralNotation notation) { if (Number == 0) { return NULLA; } //check notation for right set of characters string[] numerals; switch (notation) { case RomanNumeralNotation.Additive: numerals = ADDITIVE_NOTATION; break; default: numerals = SUBTRACTIVE_NOTATION; break; } var resultRomanNumeral = ""; //start with the M and iterate back var position = 0; //substract till the number is 0 var value = Number; do { var numeral = numerals[position]; var numeralValue = VALUES[numeral]; //check if the value is in the number if (value >= numeralValue) { //substract from the value value -= numeralValue; //add the numeral to the string resultRomanNumeral += numeral; //subtractive numeral? advance position because IVIV does not exist bool isSubtractiveNumeral = numeral.Length > 1; if(isSubtractiveNumeral) { position++; } continue; } position++; } while (value != 0); return resultRomanNumeral; }

The advantage of parsing the notation array is the way that subtractive numerals are handled. **XL** is parsed before **X**. This prevents the algorithm from reading 10+50 instead of 40. It also has the nice advantage of discarding wrongly formed numerals.

## Let’s see it work

I’ve created some unit tests to show how the Roman Numeral implementation works:

Assert.AreEqual(1910, RomanNumeral.Parse("MDCCCCX").Number); Assert.AreEqual(1910, RomanNumeral.Parse("MCMX").Number); Assert.AreEqual("CXVIIII", new RomanNumeral(119).ToString(RomanNumeralNotation.Additive)); Assert.AreEqual("CXIX", new RomanNumeral(119).ToString());

## Summary

I’ve shown how one can parse Roman Numerals with an easy C# class. You can download the full class from GitHub and use it in your projects. In the next blog I’ll show you how to implement calculation operators (**+**, **–**, *****, **/**, **%**) and implicit casting (to **int** and **string**).