Little life saver: parsing HTML entities

Recently I had the pleasure of building a calculator example exercise. Being a good programmer, I used some HTML entities as values on the buttons: &times; (×), &divide; (÷) and &plusmn; (±). It turned out to be quite difficult to parse them with native JavaScript. It is not so hard with Lodash or jQuery, but I wanted to do it natively.

Parse entity

I ended up using the following script I got from a StackOverflow answer:

var PLUSMINUS = getHtmlEntityString('&plusmn;')
var DIVIDE = getHtmlEntityString('&divide;')
var TIMES = getHtmlEntityString('&times;')

function getHtmlEntityString(str) {
    let d = document.createElement("div")
    d.innerHTML = str
    return d.textContent || d.innerText
}

Ouch!
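The script above needs a DOM. Where none is available (Node.js, for example), a small lookup table can stand in for it. The following is only a sketch under that assumption: decodeEntities and NAMED_ENTITIES are hypothetical names, and the table covers just the few entities the calculator needs, not the full list of named references that HTML defines.

```javascript
// Hypothetical lookup table: only the entities needed here, not the full HTML list.
const NAMED_ENTITIES = {
  plusmn: "\u00b1", // ±
  divide: "\u00f7", // ÷
  times: "\u00d7", // ×
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
}

// Decodes &name;, &#123; and &#x7b; style references.
// Unknown names and out-of-range code points are left untouched.
function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X"
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      return code <= 0x10ffff ? String.fromCodePoint(code) : match
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match
  })
}

console.log(decodeEntities("6 &divide; 3 &#215; 2"))
/* logs: 6 ÷ 3 × 2 */
```

In a browser the DOM version is still the simpler choice, because it knows every entity without a table.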

Encode HTML

The next step is encoding text before it is written as HTML. I ended up borrowing some of the code from js-htmlencode:

const htmlEncoders = [
    [/&/g, "&amp;"], // must run first, or later entities get double-encoded
    [/"/g, "&quot;"],
    [/'/g, "&#39;"],
    [/</g, "&lt;"],
    [/>/g, "&gt;"],
]


let htmlEncode = str =>
    htmlEncoders.reduce(
        (str, enc) => str.replace(enc[0], enc[1]),
        str
)

It uses an arrow function and can be called just like any function:

function writeTag(tag, contents) {
  tag = htmlEncode(tag)
  contents = htmlEncode(contents)
  document.write(`<${tag}>${contents}</${tag}>`)
}

Native JavaScript, no libs needed.
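To see what the encoder produces, and why the ampersand rule has to come first, here is a small self-contained sketch. The encoder list is repeated so the snippet runs on its own; the use of &#39; for the single quote and the helper names ampLast and encodeAmpLast are assumptions made for this illustration.

```javascript
const htmlEncoders = [
  [/&/g, "&amp;"],
  [/"/g, "&quot;"],
  [/'/g, "&#39;"],
  [/</g, "&lt;"],
  [/>/g, "&gt;"],
]

// Apply every [pattern, replacement] pair in order.
const htmlEncode = str =>
  htmlEncoders.reduce((s, [pattern, replacement]) => s.replace(pattern, replacement), str)

console.log(htmlEncode('<b>Tom & "Jerry"</b>'))
/* logs: &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt; */

// Hypothetical wrong order: the ampersand rule moved to the end.
const ampLast = [...htmlEncoders.slice(1), htmlEncoders[0]]
const encodeAmpLast = str =>
  ampLast.reduce((s, [pattern, replacement]) => s.replace(pattern, replacement), str)

console.log(encodeAmpLast("<b>"))
/* logs: &amp;lt;b&amp;gt; */
```

With the ampersand rule last, the & of the freshly written &lt; and &gt; is encoded a second time, so the browser would show the literal text &lt;b&gt; instead of <b>.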

Replace Non Alpha-Numeric chars

Sometimes you'll need to replace any non-alphanumeric characters in a string. Again, we can use a regular expression. Don't forget to escape the replacement string before using it inside the pattern:

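function replaceNonAlpha(str, replaceBy = "_") {
  // Escape regex metacharacters so replaceBy is matched literally in the pattern
  const escaped = replaceBy.replace(
    /[.*+?^${}()|[\]\\]/g,
    "\\$&"
  )
  const r = new RegExp(`(\\W|${escaped})+`, "g")
  // Insert the unescaped string; a function keeps "$" from being interpreted
  return str.replace(r, () => replaceBy)
}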

Here are some results:

console.log(replaceNonAlpha("replace me"))
/* renders: replace_me */

console.log(replaceNonAlpha("replace me!"))
/* renders: replace_me_ */

console.log(replaceNonAlpha("what!? #7 is awful!"))
/* renders: what_7_is_awful_ */

console.log(replaceNonAlpha("hey, _ is alpha"))
/* renders: hey_is_alpha */

console.log(replaceNonAlpha("één te veel"))
/* renders: _n_te_veel */

console.log(
  replaceNonAlpha(
    "één te veel"
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "")
  )
)
/* renders: een_te_veel */

Note that \W matches any character that is not a word character from the basic Latin alphabet, that is, anything outside [A-Za-z0-9_] (source: MDN Character Classes). So you need to do some extra work, such as the normalize step in the last example, if you don't want to lose characters like é.
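If the accented letters should be kept rather than stripped, Unicode property escapes are an alternative to \W. This is a sketch that assumes an engine with support for \p{...} and the u flag (ES2018 and later); replaceNonAlphaUnicode is a hypothetical name, not part of the code above.

```javascript
// Treats every Unicode letter (\p{L}) and number (\p{N}) as "alphanumeric".
// Everything else, including "_", is collapsed into a single replaceBy.
function replaceNonAlphaUnicode(str, replaceBy = "_") {
  return str
    .normalize("NFC") // compose "e" + combining accent into one letter first
    .replace(/[^\p{L}\p{N}]+/gu, () => replaceBy)
}

console.log(replaceNonAlphaUnicode("één te veel"))
/* logs: één_te_veel */

console.log(replaceNonAlphaUnicode("hey, _ is alpha"))
/* logs: hey_is_alpha */
```

The NFC step matters because a decomposed é is a plain e followed by a combining mark, and that mark is not a letter, so it would otherwise be replaced.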

Changelog

2014-10-29 Initial article
2021-11-22 Added the Encode HTML and Replace Non Alpha-Numeric chars sections. Improved code from the original.
