Recently I had the pleasure of building a calculator as an example exercise. Being a good programmer, I used HTML entities as values on the buttons: &times;, &divide; and &plusmn;. It turned out to be quite difficult to parse them with native JavaScript. It is not so hard with Lodash or jQuery, but I wanted to do it natively.
Parse entity
I ended up using the following script I got from a StackOverflow answer:
var PLUSMINUS = getHtmlEntityString('&plusmn;')
var DIVIDE = getHtmlEntityString('&divide;')
var TIMES = getHtmlEntityString('&times;')

// Let the browser decode the entity by round-tripping it through a detached element
function getHtmlEntityString(str) {
  let d = document.createElement("div")
  d.innerHTML = str
  return d.textContent || d.innerText
}
Ouch!
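If no DOM is available (in Node, for example), the same decoding can be approximated with a regular expression. The sketch below is not from the StackOverflow answer: decodeEntities and the NAMED_ENTITIES table are hypothetical names, and the table only covers the handful of entities a calculator would need, plus numeric references.

```javascript
// Hypothetical lookup table: only the named entities this example needs
const NAMED_ENTITIES = {
  plusmn: "\u00B1", // plus-minus sign
  divide: "\u00F7", // division sign
  times: "\u00D7", // multiplication sign
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
}

// Hypothetical helper: decodes numeric references and the names listed above
function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X"
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      // Leave out-of-range code points untouched instead of throwing
      return code <= 0x10ffff ? String.fromCodePoint(code) : match
    }
    // Unknown names are returned unchanged
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match
  })
}

console.log(decodeEntities("2 &times; 3 &#247; 4"))
```

Unlike the DOM approach, this only knows the entities you list, so it trades completeness for portability.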
Encode HTML
The next step is encoding text before it is written into HTML. I ended up borrowing some of the code from js-htmlencode:
// The ampersand must come first so later replacements are not double-encoded
const htmlEncoders = [
  [/&/g, "&amp;"],
  [/"/g, "&quot;"],
  [/'/g, "&#39;"],
  [/</g, "&lt;"],
  [/>/g, "&gt;"],
]

let htmlEncode = str =>
  htmlEncoders.reduce(
    (str, enc) => str.replace(enc[0], enc[1]),
    str
  )
It uses an arrow function and can be called just like any function:
function writeTag(tag, contents) {
  tag = htmlEncode(tag)
  contents = htmlEncode(contents)
  document.write(`<${tag}>${contents}</${tag}>`)
}
Native JavaScript, no libs needed.
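One detail worth spelling out: the order of the encoders matters. The ampersand has to be handled first, otherwise the ampersands introduced by the later replacements get encoded a second time. A small sketch to illustrate, where encodeWith, rightOrder and wrongOrder are hypothetical names used only for this demonstration:

```javascript
const rightOrder = [
  [/&/g, "&amp;"],
  [/</g, "&lt;"],
  [/>/g, "&gt;"],
]
// Hypothetical broken variant: the ampersand rule runs last
const wrongOrder = [...rightOrder].reverse()

// Same reduce as htmlEncode, but with the encoder list passed in
const encodeWith = (encoders, str) =>
  encoders.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    str
  )

console.log(encodeWith(rightOrder, "<b> & </b>"))
/* renders: &lt;b&gt; &amp; &lt;/b&gt; */
console.log(encodeWith(wrongOrder, "<b> & </b>"))
/* renders: &amp;lt;b&amp;gt; &amp; &amp;lt;/b&amp;gt; */
```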
Replace Non Alpha-Numeric chars
Sometimes you'll need to replace every non-alphanumeric character in a string. Again, we can use a regular expression. Don't forget to escape the replacement string before using it inside the pattern:
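function replaceNonAlpha(str, replaceBy = "_") {
  // Escape regex metacharacters so replaceBy is matched literally in the pattern
  let escaped = replaceBy.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  // Also match runs of replaceBy itself, so repeats collapse into one
  let r = new RegExp(`(\\W|${escaped})+`, "g")
  // Insert the unescaped replaceBy; a function keeps "$" literal
  return str.replace(r, () => replaceBy)
}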
Here are some results:
console.log(replaceNonAlpha("replace me"))
/* renders: replace_me */
console.log(replaceNonAlpha("replace me!"))
/* renders: replace_me_ */
console.log(replaceNonAlpha("what!? #7 is awful!"))
/* renders: what_7_is_awful_ */
console.log(replaceNonAlpha("hey, _ is alpha"))
/* renders: hey_is_alpha */
console.log(replaceNonAlpha("één te veel"))
/* renders: _n_te_veel */
console.log(
  replaceNonAlpha(
    "één te veel"
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "")
  )
)
/* renders: een_te_veel */
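To see why the escaping step matters, consider a replacement string that is also a regex metacharacter, such as a dot. The sketch below uses a hypothetical helper, buildPattern, that builds the same pattern with and without escaping:

```javascript
// Hypothetical helper: builds the pattern with or without the escaping step
function buildPattern(replaceBy, escape) {
  const inner = escape
    ? replaceBy.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    : replaceBy
  return new RegExp(`(\\W|${inner})+`, "g")
}

// Unescaped, "." matches any character, so the whole string collapses
console.log("a b.c".replace(buildPattern(".", false), () => "."))
/* renders: . */

// Escaped, only non-word characters and literal dots are replaced
console.log("a b.c".replace(buildPattern(".", true), () => "."))
/* renders: a.b.c */
```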
Note: \W matches any character that is not a word character from the basic Latin alphabet (source: MDN Character Classes), which makes it equivalent to [^A-Za-z0-9_]. So you might need to do some extra work if you don't want to replace characters like é.
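One possible way to do that extra work is to use Unicode property escapes instead of \W. This is a sketch under assumptions, not part of the original code: replaceNonAlphaUnicode is a hypothetical name, and it relies on the "u" flag being supported by your JavaScript engine.

```javascript
// Hypothetical variant of replaceNonAlpha that keeps letters and digits from any script
function replaceNonAlphaUnicode(str, replaceBy = "_") {
  // \p{L} is any letter, \p{N} is any number; the "u" flag enables \p{...}
  const r = /[^\p{L}\p{N}]+/gu
  // Compose accents first so a decomposed "e" + combining mark counts as one letter
  // A function replacement keeps "$" in replaceBy literal
  return str.normalize("NFC").replace(r, () => replaceBy)
}

console.log(replaceNonAlphaUnicode("één te veel"))
/* renders: één_te_veel */
console.log(replaceNonAlphaUnicode("hey, _ is alpha"))
/* renders: hey_is_alpha */
```

Note that the underscore is not a letter or a number, so here it is replaced like any other separator.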
Changelog
2014-10-29 Initial article
2021-11-22 Added the Encode HTML and Replace Non Alpha-Numeric chars sections. Improved code from the original.