Regular Expression Groups: .NET vs JavaScript

As a developer, I love to solve common string problems with regular expressions. Sure, they are sometimes hard to read, but you can do so much with such a small expression! It is nice that many languages support them, but sometimes it feels like every language creates its own dialect. Let's look at the difference between regular expression groups in .NET and JavaScript.

Example string

Things work better with an example, so let's use the following HTML:

<ul>
  <li><a href="https://images.wehkamp.nl/i/wehkamp/16375003_eb_02/1.jpg">Style 1</a></li>
  <li><a href="https://images.wehkamp.nl/i/wehkamp/16374993_eb_02/2.jpg">Style 2</a></li>
  <li><a href="https://images.wehkamp.nl/i/wehkamp/16374973_eb_01/3.jpg">Style 3</a></li>
</ul>

We will parse the HTML and wrap an image in each link, using the link's URL as the image source. We'll also add the text of the link as the title and alt attributes.

.NET: Regex replace

In .NET we use the Regex class. Let's assume the HTML code is in the variable input:

var input = "{{the html code}}";
var regex = new System.Text.RegularExpressions.Regex("<a href=\"(.*?)\">(.*?)</a>");
var replacement = "<a href=\"$1\"><img src=\"$1\" alt=\"$2\" title=\"$2\" /></a>";
var output = regex.Replace(input, replacement);

JavaScript: RegExp replace

In JavaScript, the regular expression class is called RegExp. We will use the literal notation, which is shorter. The g flag makes replace process every match instead of only the first one. Again, let's assume the HTML code is in the input variable:

var input = "{{the html code}}";
var regex = /<a href="(.*?)">(.*?)<\/a>/g;
var replacement = '<a href="$1"><img src="$1" alt="$2" title="$2" /></a>';
var output = input.replace(regex, replacement);
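
To make the result concrete, here is a minimal, self-contained sketch that runs the same replacement against the example HTML from above. The variable name firstOnly is only there for illustration; it shows what I believe is the one thing to watch out for: without the g flag, JavaScript replaces only the first match, while Regex.Replace in .NET replaces all matches by default.

```javascript
// The example HTML from the article, joined into a single string.
const input = [
  '<ul>',
  '  <li><a href="https://images.wehkamp.nl/i/wehkamp/16375003_eb_02/1.jpg">Style 1</a></li>',
  '  <li><a href="https://images.wehkamp.nl/i/wehkamp/16374993_eb_02/2.jpg">Style 2</a></li>',
  '  <li><a href="https://images.wehkamp.nl/i/wehkamp/16374973_eb_01/3.jpg">Style 3</a></li>',
  '</ul>',
].join('\n');

const replacement = '<a href="$1"><img src="$1" alt="$2" title="$2" /></a>';

// With the g flag every link is rewritten.
const output = input.replace(/<a href="(.*?)">(.*?)<\/a>/g, replacement);

// Without the g flag only the first link is rewritten.
const firstOnly = input.replace(/<a href="(.*?)">(.*?)<\/a>/, replacement);

console.log(output);
console.log(firstOnly);
```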

So... what's the difference? Named Groups?

There is not much difference as long as you stay close to the general regular expression syntax that most languages implement. So what about named groups in .NET? You can give your groups a name and use that name in your replacement string. .NET developers use them quite frequently! The code looks like this:

var input = "{{the html code}}";
var regex = new System.Text.RegularExpressions.Regex("<a href=\"(?<link>.*?)\">(?<text>.*?)</a>");
var replacement = "<a href=\"${link}\"><img src=\"${link}\" alt=\"${text}\" title=\"${text}\" /></a>";
var output = regex.Replace(input, replacement);

Can we use them in JavaScript? Sure!

var input = "{{the html code}}";
var regex = /<a href="(?<link>.*?)">(?<text>.*?)<\/a>/g;
var replacement = '<a href="$<link>"><img src="$<link>" alt="$<text>" title="$<text>" /></a>';
var output = input.replace(regex, replacement);

The only change is in the replacement string: instead of ${name} we write $<name> in JavaScript.
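
Named groups are also available outside of the replacement string. The sketch below is not part of the original example; it assumes a single link as input and shows two ways to reach the groups: through match.groups when iterating with matchAll, and through the groups object that replace passes as the last argument to a replacement function.

```javascript
const input = '<a href="https://images.wehkamp.nl/i/wehkamp/16375003_eb_02/1.jpg">Style 1</a>';
const regex = /<a href="(?<link>.*?)">(?<text>.*?)<\/a>/g;

// matchAll yields one match per link; named groups live on match.groups.
const found = [];
for (const match of input.matchAll(regex)) {
  found.push({ link: match.groups.link, text: match.groups.text });
}

// With named groups in the pattern, the last callback argument is the groups object.
const output = input.replace(regex, (...args) => {
  const { link, text } = args[args.length - 1];
  return `<a href="${link}"><img src="${link}" alt="${text}" title="${text}" /></a>`;
});

console.log(found);
console.log(output);
```

A replacement function is handy when the replacement needs logic, such as trimming or escaping the captured text, that a plain replacement string cannot express.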

One difference: named index groups

I found one tiny, pesky difference: named index groups. In our first .NET example we used parentheses to identify the groups. The first group was captured as $1 and the second group as $2. In .NET you can override this numbering by giving a group an explicit number. Technically, this code is correct:

var input = "{{the html code}}";
var regex = new System.Text.RegularExpressions.Regex("<a href=\"(?<2>.*?)\">(?<1>.*?)</a>");
var replacement = "<a href=\"$2\"><img src=\"$2\" alt=\"$1\" title=\"$1\" /></a>";
var output = regex.Replace(input, replacement);

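This is handy if you want to control which number a group gets, regardless of its position in the pattern. JavaScript does not have this feature: a group name must be a valid identifier, so it cannot start with a digit and (?<1>.*?) is rejected as a syntax error. So be aware of this when you bring your .NET regular expression magic powers to JavaScript!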
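
A small sketch to see this for yourself. The helper compiles is a hypothetical name I made up for this check; it just tries to build a RegExp and reports whether the engine accepts the pattern. Regular named groups are, as far as I can tell, the closest JavaScript alternative, because they also decouple the replacement from the order of the groups.

```javascript
// Hypothetical helper: reports whether the JavaScript engine accepts a pattern.
function compiles(pattern) {
  try {
    new RegExp(pattern, 'g');
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const dotNetStyle = '<a href="(?<2>.*?)">(?<1>.*?)</a>';
const portable = '<a href="(?<link>.*?)">(?<text>.*?)</a>';

console.log(compiles(dotNetStyle)); // false
console.log(compiles(portable));    // true

// Named groups can be used in any order, or skipped, in the replacement.
const input = '<a href="https://images.wehkamp.nl/i/wehkamp/16375003_eb_02/1.jpg">Style 1</a>';
const output = input.replace(new RegExp(portable, 'g'), '<img src="$<link>" alt="$<text>" />');
console.log(output);
```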

Final thoughts

So JavaScript and .NET are very similar in how they handle regular expression groups. Who would have thought?

The original article was written in 2010, and in that article I used a specific (?<1>.*?) group, which was the main problem. Upon revisiting the article 10 years later, I discovered the problem with these groups and decided to rewrite it.

Got to say: after all these years I still love regular expressions!

Improvements

2020-06-05 rewrote the article using the latest JavaScript code and .NET Core 3.1.
2010-11-14 initial article.

Kees C. Bakker says:

2014-02-15 at 22:36

Regex is a great tool, but it can get complicated very quickly! Check http://stackoverflow.com/questions/800813/what-is-the-most-difficult-challenging-regular-expression-you-have-ever-written

1. Kees C. Bakker says:
  
  2014-02-15 at 23:08
  
  More about regex complexity can be found here: http://stackoverflow.com/questions/4378455/what-is-the-complexity-of-regular-expression
  