# Regular Expression Groups in PowerShell (for .NET people)

**Date:** 2010-11-17  
**Author:** Kees C. Bakker  
**Categories:** .NET / C#, PowerShell  
**Original:** https://keestalkstech.com/regex-replacement-problem-powershell/

![Aerial photo of a dam.](https://keestalkstech.com/wp-content/uploads/2010/11/john-gibbons-E2TVn-NpCU4-unsplash-scaled-scaled.jpg)

---

PowerShell is built on .NET, so it is no surprise that it is popular with .NET developers. But it is a scripting language first, so some things behave differently than you might expect. I ran into this when I tried to parse some HTML with PowerShell: I could not get a replacement with regular expression groups to work! It turned out that my .NET knowledge was working against me...

## TL;DR

When creating a regular expression or a replacement string, use single-quoted strings and you'll avoid a world of pain! Also make sure you use the proper regular expression options.

## Let's create some data

Anything is better with an example, *so let's use PowerShell to download a blog and extract the article content using a regular expression*. First, we'll download a blog into a string, like this:

```powershell
$ErrorActionPreference = "Stop"

# download article
$domain = "https://keestalkstech.com"
$url = "$domain/2020/05/plotting-a-grid-of-pil-images-in-jupyter/"
$article = Invoke-WebRequest $url -UseBasicParsing

# simple convert to string :-)
$article = "$article"
```

## Fail on the first try

My first attempt was the following code:

```powershell
$article = $article -replace ".*<article.*?>\s*(.*)\s*<\/article>.*", "$1"
```

It compiles. It looks good to me as a .NET developer... but... it does not do anything! I end up with exactly the same string I had...

## Regular expression options: (?s)

First, we need to understand how regular expression matching works in PowerShell: by default, `.` does not match newline characters. To change this [behavior](https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options) to *single-line mode*, add `(?s)` to the start of your expression, like this:

```powershell
$article = $article -replace "(?s).*<article.*?>\s*(.*)\s*<\/article>.*", "$1"
```

Again: it compiles. It looks good to me... but now I end up with an empty string!
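To see the effect of `(?s)` in isolation, here is a minimal sketch on a made-up two-line string. The `$text` value and the variable names are my own and not part of the downloaded article. It also passes the .NET `RegexOptions.Singleline` flag through `[regex]::Replace`, which should behave the same as the inline option:

```powershell
# Hypothetical two-line sample; `n is PowerShell's newline escape in double quotes.
$text = "first`nsecond"

# Default mode: '.' stops at the newline, so each line is matched separately.
$default = $text -replace '.+', 'X'

# Single-line mode: '.' also matches the newline, so the whole string is one match.
$singleLine = $text -replace '(?s).+', 'X'

# Same option, passed through the .NET API instead of inline.
$viaOption = [regex]::Replace($text, '.+', 'X', [System.Text.RegularExpressions.RegexOptions]::Singleline)

$default      # X, newline, X
$singleLine   # X
$viaOption    # X
```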

## Quotation matters!

The main problem has to do with [quotation](https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_quoting_rules?view=powershell-7). To me as a .NET developer, double quotes delimit a string (`"hello"`) and single quotes a char (`'c'`). But in PowerShell, a double-quoted string supports variable expansion: `"hello $name"`. PowerShell expands `$1` before the regular expression engine ever sees it, and since we do not have a variable named `$1`, our article is replaced by an empty string.

The following is more PowerShell-esque and actually works:

```powershell
$article = $article -replace '(?s).*<article.*?>\s*(.*)\s*<\/article>.*', '$1'
```
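The difference is easy to check on a tiny input. The sketch below uses a made-up `$sample` string and assumes that no variable named `1` exists in the session and that strict mode is off:

```powershell
# Assumes no variable named 1 is defined and strict mode is off.
$expanded = "$1"   # PowerShell expands $1 here: the result is an empty string
$literal  = '$1'   # single quotes: the characters $ and 1 reach the regex engine

# Hypothetical input, just for illustration.
$sample = '<b>hello</b>'

$wrong = $sample -replace '<b>(.*)</b>', "$1"   # replacement is '', so the match is deleted
$right = $sample -replace '<b>(.*)</b>', '$1'   # replacement is the content of group 1

$expanded.Length   # 0
$literal.Length    # 2
$wrong             # (empty)
$right             # hello
```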

### But I love my double quotes...

If you are adamant about using double quotes, you must [escape](https://ss64.com/ps/syntax-esc.html#quotes) your dollar signs with a backtick (`` ` ``):

```powershell
$article = $article -replace "(?s).*<article.*?>\s*(.*)\s*<\/article>.*", "`$1"
```
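Where double quotes do earn their keep is when you want to mix a real PowerShell variable with a group reference in the same replacement string. A minimal sketch, with `$prefix` and `$html` as made-up values:

```powershell
# Hypothetical values, just for illustration.
$prefix = 'Title: '
$html   = '<h1>Hello</h1>'

# $prefix is expanded by PowerShell; `$1 survives as a literal $1 for the regex engine.
$result = $html -replace '<h1>(.*)</h1>', "$prefix`$1"

$result   # Title: Hello
```

Concatenating instead, as in `$prefix + '$1'`, should give the same result without the backtick.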

### What about named capture replacement?

Sometimes named capture groups improve the readability of your code. As they also use the dollar sign, they "suffer" from the same problem, so use single quotes:

```powershell
$article = $article -replace '(?s).*<article.*?>\s*(?<content>.*)\s*<\/article>.*', '${content}'
```

Did you know that `${1}` also works?
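The braces are handy when a group reference is directly followed by a digit, because `${1}` makes it explicit where the group number ends. A small sketch with made-up inputs, showing both named groups and the braced numeric form:

```powershell
# Hypothetical date string, reordered with named groups.
$date = '2010-11-17'
$reordered = $date -replace '(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})', '${day}/${month}/${year}'

# ${1} makes the group number explicit when a digit follows the reference.
$padded = 'v7' -replace 'v(\d)', 'v${1}0'

$reordered   # 17/11/2010
$padded      # v70
```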

## Final thoughts

Don't assume PowerShell and .NET are the same! Scripting needs differ from application programming needs. To be on the safe side: use single quotes and your regular expression groups will work fine in PowerShell.

Funnily enough it was not the first time I had problems with regular expressions that looked similar to .NET; [read this article about regular expression groups in JavaScript](https://keestalkstech.com/2010/11/how-to-use-regex-groups-in-javascript/).

## Improvements

2020-06-06: rewrote the article to reflect the problems with quotation and new-line matching.
