Lately I’ve become fascinated with the Latin language. I’m working on a project that converts photographs of Latin inscriptions on medieval statues into translated text. One of the challenges is parsing years, usually expressed in the form of Roman Numerals.

After building a parser class, I noticed that it had a lot of nice characteristics: parsing, operator overloading and implicit conversions. It’s a really nice way to play around with C#.

## Let’s define some numerals

When you read about Roman Numerals you’ll notice that there are different styles. Last week I visited the Gros Horloge in Rouen, France. Look at the numerals of the clock:

Notice how the numeral for 4 isn’t IV, but IIII. It turns out that this is the (perfectly fine) long notation, called the additive form. Long notations like VIIII are a bit harder to read; that’s why the subtractive notation IX is used in many cases. Some documents even mix long and short notations: V might be written as IIIII and LX as XXXXXX.
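To make the difference between the two forms concrete, here is a small sketch. `NotationDemo.ToAdditive` is a hypothetical helper, not part of the parser class built in this post: it assumes a well-formed subtractive numeral and simply expands the six standard subtractive pairs into their additive equivalents.

```csharp
public static class NotationDemo
{
    // The six standard subtractive pairs and their additive (long) equivalents,
    // listed from largest to smallest value.
    private static readonly (string Subtractive, string Additive)[] Pairs =
    {
        ("CM", "DCCCC"), // 900
        ("CD", "CCCC"),  // 400
        ("XC", "LXXXX"), // 90
        ("XL", "XXXX"),  // 40
        ("IX", "VIIII"), // 9
        ("IV", "IIII")   // 4
    };

    // Hypothetical helper: rewrites a well-formed subtractive numeral in additive form,
    // e.g. XIV becomes XIIII. Malformed input is not validated.
    public static string ToAdditive(string numeral)
    {
        foreach (var (subtractive, additive) in Pairs)
        {
            // string.Replace(string, string) performs an ordinal replacement
            numeral = numeral.Replace(subtractive, additive);
        }
        return numeral;
    }
}
```

For example, `ToAdditive("MCMXCIV")` returns `MDCCCCLXXXXIIII`; both strings stand for 1994.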

Did you know there’s no zero in Roman Numerals? It hadn’t been invented yet (more on that here). Writers used NULLA, which is Latin for “nothing”.

Oh… and it turns out that 18 can be written as both XIIX and IIXX. You’ve got to love those Romans.

So let’s define these values as constants:

```csharp
//needs: using System.Collections.Generic; using System.Collections.ObjectModel;

//0, nothing, nada. The zero wasn't invented yet
public const string NULLA = "NULLA";

//values - a read-only dictionary where the numerals are the keys to values
public static readonly IReadOnlyDictionary<string, int> VALUES = new ReadOnlyDictionary<string, int>(new Dictionary<string, int>
{
    {"I",       1 },
    {"IV",      4 },
    {"V",       5 },
    {"IX",      9 },
    {"X",       10 },
    {"XIIX",    18 },
    {"IIXX",    18 },
    {"XL",      40 },
    {"L",       50 },
    {"XC",      90 },
    {"C",       100 },
    {"CD",      400 },
    {"D",       500 },
    {"CM",      900 },
    {"M",       1000 },

    //alternatives from Middle Ages and Renaissance
    {"O",       11 },
    {"F",       40 },
    {"P",       400 },
    {"G",       400 },
    {"Q",       500 }
});
```

By the way, more numerals were added in the Middle Ages and the Renaissance. They are no longer used, but they can be useful when parsing, which is why they are included.

## Storage

First we need an object to store the data and perform operations on. Let’s create something simple that can be null (a class rather than a struct), so that parsing can return null for invalid input.

```csharp
using System;

public class RomanNumeral
{
    private readonly int _number;

    public int Number => _number;

    public RomanNumeral(int number)
    {
        if (number < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(number), "Number should not be negative.");
        }

        _number = number;
    }
}
```

## Parsing a Roman Numeral string into a number

A Roman Numeral is an ordered string of characters whose values appear in descending order. So XI means 10 + 1, and MDCIX means 1000 + 500 + 100 + 9. Notice that IX does not mean 1 + 10, but 10 - 1 (it’s subtractive!). Let’s build an array with all the numerals in descending order of their numerical value:

```csharp
//all the options that are used for parsing, in their order of value
public static readonly string[] NUMERAL_OPTIONS =
{
    "M", "CM", "D", "Q", "CD", "P", "G", "C", "XC", "L", "F", "XL", "IIXX", "XIIX", "O", "X", "IX", "V", "IV", "I"
};
```
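Because the parser relies on this ordering, it can be worth checking it once, for example in a unit test. The sketch below is a hypothetical check, not part of the original class: `OrderCheck` copies the `VALUES` entries and `NUMERAL_OPTIONS` from above into a standalone class and verifies that each option’s value is no larger than the one before it.

```csharp
using System.Collections.Generic;

public static class OrderCheck
{
    // Copies of the VALUES dictionary and NUMERAL_OPTIONS array shown above.
    public static readonly Dictionary<string, int> VALUES = new Dictionary<string, int>
    {
        {"I", 1}, {"IV", 4}, {"V", 5}, {"IX", 9}, {"X", 10}, {"XIIX", 18}, {"IIXX", 18},
        {"XL", 40}, {"L", 50}, {"XC", 90}, {"C", 100}, {"CD", 400}, {"D", 500},
        {"CM", 900}, {"M", 1000},
        {"O", 11}, {"F", 40}, {"P", 400}, {"G", 400}, {"Q", 500}
    };

    public static readonly string[] NUMERAL_OPTIONS =
    {
        "M", "CM", "D", "Q", "CD", "P", "G", "C", "XC", "L", "F", "XL", "IIXX", "XIIX", "O", "X", "IX", "V", "IV", "I"
    };

    // Returns true when every option has a value no larger than the option before it.
    public static bool IsDescending(string[] options, IReadOnlyDictionary<string, int> values)
    {
        for (var i = 1; i < options.Length; i++)
        {
            if (values[options[i]] > values[options[i - 1]])
            {
                return false;
            }
        }
        return true;
    }
}
```

The same check can be pointed at the arrays used for writing numerals later in this post, since that algorithm assumes the same descending order.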

We can use this array to parse a given string. We need to do the following:

1. Find the first numeral in NUMERAL_OPTIONS that matches the start of the string.
2. If no numeral matches, the input string is invalid: return null and stop.
3. Otherwise add the value of the numeral (stored in the VALUES dictionary) to resultNumber.
4. Remove the numeral from the start of the input string.
5. Repeat until the input string is empty. Only the same or lower numerals may be used in the repetitions; higher numerals are invalid.
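To see these steps in action, here is how they play out for MCMX:

1. M matches the start of MCMX: add 1000, leaving CMX.
2. M no longer matches, but CM does: add 900, leaving X.
3. Every option from D down to O fails; X matches: add 10, leaving an empty string.
4. The string is empty, so the result is 1000 + 900 + 10 = 1910.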

### The code

This will result in the following code:

```csharp
public static RomanNumeral Parse(string str)
{
    var resultNumber = 0;

    //the part of the string that still has to be read
    var strToRead = str;

    var numeralOptionPointer = 0;

    //continue to read until the string is empty or the numeral options pointer has exceeded all options
    while (!String.IsNullOrEmpty(strToRead) && numeralOptionPointer < NUMERAL_OPTIONS.Length)
    {
        //select the current numeral
        var numeral = NUMERAL_OPTIONS[numeralOptionPointer];

        //read numeral -> check if the numeral is used, otherwise move on to the next option
        if (!strToRead.StartsWith(numeral, StringComparison.Ordinal))
        {
            numeralOptionPointer++;
            continue;
        }

        //add the value of the found numeral
        var value = VALUES[numeral];
        resultNumber += value;

        //remove the letters of the numeral from the string
        strToRead = strToRead.Substring(numeral.Length);
    }

    //if the whole string is read, return the numeral
    if (String.IsNullOrEmpty(strToRead))
    {
        return new RomanNumeral(resultNumber);
    }

    //string is invalid
    return null;
}
```

### Let’s be nice

To harden the code we could add some extra processing rules:

• An empty string is numeral 0.
• The input string is converted to upper case.
• The string NULLA is numeral 0.
• The character U can be converted to V.
• Some medical prescriptions end in a J instead of an I. Replace this last character with an I.
• Check if the whole string is in the VALUES dictionary and if so, return that value (a small optimization).

Just add the following to the start of the method, before the parsing loop:

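```csharp
if (String.IsNullOrEmpty(str))
{
    return new RomanNumeral(0);
}

//upper case the string
str = str.ToUpperInvariant();

//nulla? means nothing, 0 wasn't invented yet
if (str == NULLA)
{
    return new RomanNumeral(0);
}

//if it ends in J -> replace it with I (used in medicine)
if (str.EndsWith("J", StringComparison.Ordinal))
{
    str = str.Substring(0, str.Length - 1) + "I";
}

//if a U is present, assume a V
str = str.Replace("U", "V");

//check simple numbers directly in the dictionary
if (VALUES.TryGetValue(str, out var directValue))
{
    return new RomanNumeral(directValue);
}
```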
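With these rules in place, a few inputs that the bare parsing loop would reject now parse:

• nulla is upper-cased to NULLA and returns 0.
• xij becomes XIJ, then XII, which is 12.
• xu becomes XU, then XV, which is 15.

The order of the checks matters: NULLA contains a U, so it has to be compared before U is replaced by V. Otherwise it would turn into NVLLA and fail to parse.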

## How about writing Roman Numerals?

First we need to distinguish between the two notations – the additive and the subtractive:

```csharp
//subtractive notation uses these numerals
public static readonly string[] SUBTRACTIVE_NOTATION =
{
    "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"
};

//the additive notation uses these numerals
public static readonly string[] ADDITIVE_NOTATION =
{
    "M", "D", "C", "L", "X", "V", "I"
};
```

Notice that both arrays list their numerals in descending order of value. With these ordered arrays we can devise the following algorithm:

1. Use the array of the chosen notation.
2. Traverse the items of the array in order.
3. If the value of a numeral is smaller than or equal to the remaining number, subtract its value from the number and add the numeral to the result string. Try the same numeral again (a subtractive numeral such as IV is used at most once, so move on after using it).
4. If a numeral does not fit, move on to the next.
5. Repeat until the number is 0.
6. Return the result as the full Roman Numeral.
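For example, writing 1994 in subtractive notation goes like this:

1. M (1000) fits: the result is M and 994 remains.
2. M no longer fits, but CM (900) does: the result is MCM and 94 remains. CM is a subtractive numeral, so it is not tried again.
3. D, CD and C are too large; XC (90) fits: the result is MCMXC and 4 remains.
4. L, XL, X, IX and V are too large; IV (4) fits: the result is MCMXCIV and nothing remains.

In the additive notation the same number becomes MDCCCCLXXXXIIII: M, D, four times C, L, four times X and four times I.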

### ToString() implementation

Converted into code the algorithm will look like this:

```csharp
//the available notations (the member names are assumed)
public enum RomanNumeralNotation
{
    Subtractive,
    Additive
}

public override string ToString()
{
    //assumed default: the subtractive notation
    return ToString(RomanNumeralNotation.Subtractive);
}

public string ToString(RomanNumeralNotation notation)
{
    if (Number == 0)
    {
        return NULLA;
    }

    //check notation for right set of characters
    string[] numerals;
    switch (notation)
    {
        case RomanNumeralNotation.Additive:
            numerals = ADDITIVE_NOTATION;
            break;
        default:
            numerals = SUBTRACTIVE_NOTATION;
            break;
    }

    var resultRomanNumeral = "";

    var position = 0;

    //subtract till the number is 0
    var value = Number;

    do
    {
        var numeral = numerals[position];
        var numeralValue = VALUES[numeral];

        //check if the value is in the number
        if (value >= numeralValue)
        {
            //subtract from the value
            value -= numeralValue;

            //add the numeral to the string
            resultRomanNumeral += numeral;

            //subtractive numeral? advance position because IVIV does not exist
            bool isSubtractiveNumeral = numeral.Length > 1;
            if (isSubtractiveNumeral)
            {
                position++;
            }

            continue;
        }

        position++;
    }
    while (value != 0);

    return resultRomanNumeral;
}
```

The advantage of working through an ordered numeral array is the way subtractive numerals are handled: XL is tried before X. This prevents the algorithm from reading 10 + 50 instead of 40. It also has the nice side effect of rejecting wrongly formed numerals.
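Two quick parsing traces show both effects:

• XL: none of the options from M down to F match, but XL does, so the result is 40 and the whole string has been read.
• IM: only I matches, leaving M. Because M comes before I in NUMERAL_OPTIONS and the parser never goes back to a higher option, nothing can match the remaining M. The string is not empty, so Parse returns null.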

## Let’s see it work

I’ve created some unit tests to show how the Roman Numeral implementation works:

```csharp
Assert.AreEqual(1910, RomanNumeral.Parse("MDCCCCX").Number);
Assert.AreEqual(1910, RomanNumeral.Parse("MCMX").Number);