# Little life saver: parsing HTML entities

**Date:** 2014-10-29  
**Author:** Kees C. Bakker  
**Categories:** JavaScript  
**Tags:** Html  
**Original:** https://keestalkstech.com/little-life-saver-parsing-html-entities/

![Little life saver: parsing HTML entities](https://keestalkstech.com/wp-content/uploads/2014/10/photo-1416339442236-8ceb164046f8.jpg)

---

Recently I had the pleasure of building a calculator as an example exercise. Being a good programmer, I used HTML entities as values on the buttons: ×, ÷ and ±. It turned out to be quite difficult to parse them with native JavaScript. It is not so hard with Lodash or jQuery, but I wanted to do it *natively*.

## Parse entity

I ended up using the following script I got from a [StackOverflow answer](http://stackoverflow.com/a/23649560/201482):

```js
var PLUSMINUS = getHtmlEntityString('&plusmn;')
var DIVIDE = getHtmlEntityString('&divide;')
var TIMES = getHtmlEntityString('&times;')

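function getHtmlEntityString(str) {
    // let the browser's HTML parser decode the entities in a detached element
    // (innerHTML parses any markup in str, so only pass trusted strings)
    const d = document.createElement("div")
    d.innerHTML = str
    // read the decoded text back; innerText is a fallback for old browsers
    return d.textContent || d.innerText
}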
```

Ouch!
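The function above needs a `document`, so it only works in a browser. As a rough sketch for environments without a DOM, numeric character references can be decoded with `String.fromCodePoint`, and named ones with a lookup table. The names `decodeEntities` and `NAMED_ENTITIES` are made up for this example, and the table only covers the three entities used in this post; the HTML specification defines many more, so this is not a full replacement for the browser's parser.

```javascript
// Hypothetical lookup table: only the three named entities used in this post.
const NAMED_ENTITIES = {
  plusmn: "\u00b1",
  divide: "\u00f7",
  times: "\u00d7",
}

// Hypothetical helper: decodes &#123;, &#x7B; and the names in the table above.
function decodeEntities(str) {
  return str.replace(
    /&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi,
    (match, body) => {
      if (body[0] === "#") {
        const isHex = body[1] === "x" || body[1] === "X"
        const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
        // leave references beyond the Unicode range untouched
        return code <= 0x10ffff ? String.fromCodePoint(code) : match
      }
      // entity names are case-sensitive; unknown names are left as-is
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
        ? NAMED_ENTITIES[body]
        : match
    }
  )
}

console.log(decodeEntities("2 &times; 3 &divide; &#177;&#xF7; &nope;"))
/* renders: 2 × 3 ÷ ±÷ &nope; */
```

Anything the sketch does not recognize is returned unchanged, which makes gaps in the table easy to spot.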

## Encode HTML

The next step is to encode text before writing it into HTML. I ended up borrowing some of the code of [js-htmlencode](https://github.com/emn178/js-htmlencode/blob/master/src/htmlencode.js#L295-L298):

```js
const htmlEncoders = [
    [/&/g, "&amp;"],
    [/"/g, "&quot;"],
    [/'/g, "&#39;"],
    [/</g, "&lt;"],
    [/>/g, "&gt;"],
]

const htmlEncode = str =>
    htmlEncoders.reduce(
        (acc, enc) => acc.replace(enc[0], enc[1]),
        str
    )
```

It uses an [arrow function](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions) and can be called just like any function:

```js
function writeTag(tag, contents) {
  tag = htmlEncode(tag)
  contents = htmlEncode(contents)
  document.write(`<${tag}>${contents}</${tag}>`)
}
```

Native JavaScript, no libs needed.
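One detail worth spelling out is the order of the replacements: `&` comes first, because every other replacement introduces new ampersands. The sketch below repeats the encoder so it can run on its own and compares it with a deliberately wrong ordering. The names `wrongOrder`, `brokenEncode` and `input` are made up for this illustration.

```javascript
// Same replacement table as above; "&" is deliberately first.
const htmlEncoders = [
  [/&/g, "&amp;"],
  [/"/g, "&quot;"],
  [/'/g, "&#39;"],
  [/</g, "&lt;"],
  [/>/g, "&gt;"],
]

const htmlEncode = str =>
  htmlEncoders.reduce((acc, enc) => acc.replace(enc[0], enc[1]), str)

// Hypothetical variant with "&" moved to the end, for comparison only.
const wrongOrder = htmlEncoders.slice(1).concat([htmlEncoders[0]])
const brokenEncode = str =>
  wrongOrder.reduce((acc, enc) => acc.replace(enc[0], enc[1]), str)

const input = `<b title="x">Tom & Jerry's</b>`

console.log(htmlEncode(input))
/* renders: &lt;b title=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/b&gt; */

console.log(brokenEncode("<b>"))
/* renders: &amp;lt;b&amp;gt; (double encoded) */
```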

## Replace Non Alpha-Numeric chars

Sometimes you'll need to replace any non-alphanumeric characters in a string. Again, we can use a regular expression. Don't forget to escape the replacement string before you embed it in the pattern:

```js
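function replaceNonAlpha(str, replaceBy = "_") {
  // escape regex metacharacters so replaceBy is matched literally
  const escaped = replaceBy.replace(
    /[.*+?^${}()|[\]\\]/g,
    "\\$&"
  )
  // also match replaceBy itself, so repeated runs collapse into one
  const r = new RegExp(`(\\W|${escaped})+`, "g")
  // insert the original string; a function avoids "$" replacement patterns
  return str.replace(r, () => replaceBy)
}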
```

Here are some results:

```js
console.log(replaceNonAlpha("replace me"))
/* renders: replace_me */

console.log(replaceNonAlpha("replace me!"))
/* renders: replace_me_ */

console.log(replaceNonAlpha("what!? #7 is awful!"))
/* renders: what_7_is_awful_ */

console.log(replaceNonAlpha("hey, _ is alpha"))
/* renders: hey_is_alpha */

console.log(replaceNonAlpha("één te veel"))
/* renders: _n_te_veel */

console.log(
  replaceNonAlpha(
    "één te veel"
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "")
  )
)
/* renders: een_te_veel */
```

Note that `\W` means: *matches any character that is not a word character from the basic Latin alphabet* (source: [MDN Character Classes](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes)). So you might need [to do some extra work](https://stackoverflow.com/a/37511463/201482) if you don't want to replace characters like *é*.
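One possible form of that extra work is to swap `\W` for Unicode property escapes, which need the `u` flag. The sketch below is a hypothetical variant (the name `replaceNonAlphaUnicode` is made up here) that keeps letters, combining marks and digits from any script. Unlike `\W`, it does not treat `_` as a word character, but since `_` is the default replacement that makes no difference in these examples.

```javascript
// Hypothetical Unicode-aware variant of replaceNonAlpha.
function replaceNonAlphaUnicode(str, replaceBy = "_") {
  // escape regex metacharacters so replaceBy is matched literally
  const escaped = replaceBy.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  // \p{L} = letters, \p{M} = combining marks, \p{N} = numbers
  const r = new RegExp(`([^\\p{L}\\p{M}\\p{N}]|${escaped})+`, "gu")
  return str.replace(r, () => replaceBy)
}

console.log(replaceNonAlphaUnicode("één te veel"))
/* renders: één_te_veel */

console.log(replaceNonAlphaUnicode("what!? #7 is awful!"))
/* renders: what_7_is_awful_ */
```

This keeps accented characters intact instead of stripping the accents, so which approach fits depends on whether the result has to be plain ASCII.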

## Changelog

2021-11-22 Added the [Encode HTML](#encode-html) and [Replace Non Alpha-Numeric chars](#replace-non-alpha-numeric-chars) sections. Improved code from the original.  
2014-10-29 Initial article.
