# Parsing Roman Numerals using C#

Lately I’ve become fascinated with the Latin language. I’m working on a project that converts photographs of Latin inscriptions on medieval statues into translated text. One of the challenges is parsing years, usually expressed in the form of Roman Numerals.

After building a parser class I noticed that it had a lot of nice characteristics: parsing, operator overloading, implicit conversions. A really nice way to play around with C#. Read here Calculations with Romand Numerals using C#.

- Intro
- Let's define some numerals
- Storage
- Parsing a Roman Numeral string into a number
- How about writing Roman Numerals?
- Let's see it works
- Summary
- Comments

## Let’s define some numerals

When you read about Roman Numerals you’ll notice that there are different styles. Last week I visited the Gros Horloge in Rouen, France. Look at the numerals of the clock:

Notice how the numeral for 4 isn’t IV, but IIII. It turns out that it is the (perfectly fine) long notation called the *additive form*. Long notations like VIIII are a bit harder to read, that’s why the subtractive notation IX is used most times. In some documents, both long and short notations are used. The V might be written as IIIII and LX as XXXXXX.

Did you know there’s no *zero* in Roman Numerals? It wasn’t invented yet (for more on that here). The writers used *NULLA,* which is Latin for *nothing.*

Ow… and it turns out that 18 can be written as both XIIX and IIXX – you got to love those Romans.

### Constants

So let’s define these values as constants:

```
// 0, nothing, nada. The zero wasn't invented yet ;-)
public const string NULLA = "NULLA";
// numerals are the keys to values
public static readonly IReadOnlyDictionary<string, int> VALUES =
new ReadOnlyDictionary<string, int>(new Dictionary<string, int>
{
{"I", 1 },
{"IV", 4 },
{"V", 5 },
{"IX", 9 },
{"X", 10 },
{"XIIX", 18 },
{"IIXX", 18 },
{"XL", 40 },
{"L", 50 },
{"XC", 90 },
{"C", 100 },
{"CD", 400 },
{"D", 500 },
{"CM", 900 },
{"M", 1000 },
// alternatives from Middle Ages and Renaissance
{"O", 11 },
{"F", 40 },
{"P", 400 },
{"G", 400 },
{"Q", 500 }
});
```

In the Middle Ages and the Renaissance, more numerals were added. They are not used anymore, but might be useful in parsing – that’s why they are included.

## Storage

First, we need an object to store the data and perform operations on. Let’s create something simple that is nullable (a `class`

).

```
public class RomanNumeral
{
private readonly int _number;
public int Number => _number;
public RomanNumeral(int number)
{
if (number < 0)
{
throw new ArgumentOutOfRangeException(
nameof(number),
"Number should be positive."
);
}
_number = number;
}
}
```

## Parsing a Roman Numeral string into a number

A Roman Numeral is an ordered string of characters. The values of the string are in descending order. So XI means 10 + 1. MDCIX means 1000 + 500 + 100 + 9. Notice that IX does not mean 1+10, but 10-1 (it’s subtractive!). Let’s build an array with all the numerals a descending order of their numerical value:

```
// all the options that are used for parsing, in their order of value
public static readonly string[] NUMERAL_OPTIONS =
{
"M", "CM", "D", "Q", "CD", "P", "G", "C", "XC", "L",
"F", "XL", "IIXX", "XIIX", "O", "X", "IX", "V", "IV", "I"
};
```

We can use this array to parse a string. We need to do:

- Start with
`resultNumber`

with value`0`

. - Find the first numeral from the
`NUMERAL_OPTION`

that matches the start of the string. - If the numeral cannot be found, the input string is invalid and
`NULL`

must be returned (and the routine must be terminated). - Otherwise add the value of the numeral (stored in the
`VALUES`

array) to the`resultNumber`

. - Remove the numeral from the input string.
- Repeat until the input string is empty. Make sure that
*only*the same or lower numerals are used in the repetitions. Higher numerals are invalid.

### The code

This will result in the following code:

```
public static RomanNumeral Parse(string str)
{
var strToRead = str;
var resultNumber = 0;
// start with M and iterate through the options
var numeralOptionPointer = 0;
// continue to read until the string is empty
// or the numeral options pointer has exceeded all options
while (
!String.IsNullOrEmpty(strToRead) &&
numeralOptionPointer < NUMERAL_OPTIONS.Length)
{
// select the current numeral
var numeral = NUMERAL_OPTIONS[numeralOptionPointer];
// read numeral -> check if the numeral is used,
// otherwise move on to the next option
if (!strToRead.StartsWith(numeral))
{
numeralOptionPointer++;
continue;
}
// add the vaue of the found numeral
var value = VALUES[numeral];
resultNumber += value;
// remove the letters of the numeral from the string
strToRead = strToRead.Substring(numeral.Length);
}
// if the whole string is read, return the numeral
if (String.IsNullOrEmpty(strToRead))
{
return new RomanNumeral(resultNumber);
}
// string is invalid
return null;
}
```

### Let’s be nice

To harden the code, we could add some extra processing rules:

- An empty string is numeral
`0`

. - The input string should be uppercased.
- The string
`NULLA`

is numeral`0`

. - The character
`U`

can be converted to`V`

. - Some medical recipes will end in a
`J`

instead of an`I`

. Replace this end-character. - Check if the whole string is in the
`VALUES`

array and if so, return that value (small optimization).

Just add the following to the method:

```
if (String.IsNullOrEmpty(str))
{
return new RomanNumeral(0);
}
// upper case the string
var strToRead = str.ToUpper();
// nulla? means nothing 0 wasn't invented yet ;-)
if (strToRead == NULLA)
{
return new RomanNumeral(0);
}
// if ends in J -> replace it to I (used in medicine)
if (strToRead.EndsWith("J"))
{
strToRead = strToRead.Substring(0, strToRead.Length - 1) + "I";
}
// if a U is present, assume a V
strToRead = strToRead.Replace("U", "V");
// check simple numbers directly in dictionary
if (VALUES.ContainsKey(str))
{
return new RomanNumeral(VALUES[str]);
}
```

## How about writing Roman Numerals?

First, we need to distinguish between the two notations – the additive and the subtractive:

```
// subtractive notation uses these numerals
public static readonly string[] SUBTRACTIVE_NOTATION =
{
"M", "CM", "D", "CD", "C", "XC", "L",
"XL", "X", "IX", "V", "IV", "I"
};
// the additive notation uses these numerals
public static readonly string[] ADDITIVE_NOTATION =
{
"M", "D", "C", "L", "X", "V", "I"
};
```

Notice that the values of the numerals in the arrays are both in descending order. With these ordered arrays, we can device the following algorithm:

- Use the array of the notation.
- Traverse each item of the array.
- If a numeral item
*value*is smaller than the number,*subtract*the number and add the numeral to the result string. Try again. - If a numeral does not fit, move on to the next.
- Repeat until the number is
`0`

. - Return the result as the full Roman Numeral.

### ToString() implementation

Converted into code the algorithm will look like this:

```
public override string ToString()
{
return ToString(RomanNumeralNotation.Substractive);
}
public string ToString(RomanNumeralNotation notation)
{
if (Number == 0)
{
return NULLA;
}
// check notation for right set of characters
string[] numerals;
switch (notation)
{
case RomanNumeralNotation.Additive:
numerals = ADDITIVE_NOTATION;
break;
default:
numerals = SUBTRACTIVE_NOTATION;
break;
}
var resultRomanNumeral = "";
// start with the M and iterate back
var position = 0;
// substract till the number is 0
var value = Number;
do
{
var numeral = numerals[position];
var numeralValue = VALUES[numeral];
// check if the value is in the number
if (value >= numeralValue)
{
// substract from the value
value -= numeralValue;
// add the numeral to the string
resultRomanNumeral += numeral;
// subtractive numeral?
// advance position because IVIV does not exist
bool isSubtractiveNumeral = numeral.Length > 1;
if(isSubtractiveNumeral)
{
position++;
}
continue;
}
position++;
}
while (value != 0);
return resultRomanNumeral;
}
```

The advantage of parsing the notation array is how subtractive numerals are handled. `XL`

is parsed before** **`X`

. This prevents the algorithm from reading 10+50 instead of 40. It also has the advantage of discarding wrongly formed numerals.

## Let’s see it works

I’ve created some unit tests to show how the Roman Numeral implementation works:

```
Assert.AreEqual(1910, RomanNumeral.Parse("MDCCCCX").Number);
Assert.AreEqual(1910, RomanNumeral.Parse("MCMX").Number);
Assert.AreEqual("CXVIIII", new RomanNumeral(119).ToString(RomanNumeralNotation.Additive));
Assert.AreEqual("CXIX", new RomanNumeral(119).ToString());
```

## Summary

I’ve shown how one can parse Roman Numerals with an easy C# class. You can download the full class from GitHub and use it in your projects. In the next blog I’ll show you how to implement calculation operators (`+`

, `-`

, `*`

, `/`

, `%`

) and implicit casting (to `int`

and `string`

).