Regex for validating the HTML id Attribute

When we first started building our Chrome extension, we needed a way to validate the id attributes used on the page. We would sometimes capture an id attribute in a recording, only to find that it failed when we used it in a test because it didn’t meet the specification. For instance, some websites use an id that starts with a number, like this:

<div id="5-answer">...</div>

That is technically invalid, at least in the HTML4 specification:

HTML4 Specification
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

The HTML5 specification is a little more lax. Its main requirement is that “the value must not contain any space characters” (it must also contain at least one character). However, many websites are still technically using HTML4, so sometimes these ids still need to be validated against the stricter rule. Luckily, you can test for these requirements with a simple regular expression. Here’s the function in JavaScript:

function isCssIdValid (id) {
    // A letter first, then any number of word characters, hyphens, colons or periods
    const re = /^[A-Za-z][\w\-:.]*$/
    return re.test(id)
}

First, we make sure that the id starts with a letter, then we ensure that every remaining character is alphanumeric, an underscore, a hyphen, a colon or a period. Note that \w is the equivalent of [A-Za-z0-9_].
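
To see how the two rule sets differ in practice, here is a small sketch that runs a few sample ids through both checks. The isHtml5IdValid function is a hypothetical helper that is not part of our extension, and it assumes that "space characters" means space, tab, line feed, form feed and carriage return. The isCssIdValid pattern is an equivalent form of the one above.

```javascript
// HTML4 rule: a letter first, then letters, digits, underscores,
// hyphens, colons or periods.
function isCssIdValid(id) {
    return /^[A-Za-z][\w\-:.]*$/.test(id);
}

// Hypothetical HTML5 check (an assumption, not from the original post):
// at least one character, and none of the space characters
// (space, tab, line feed, form feed, carriage return).
function isHtml5IdValid(id) {
    return /^[^ \t\n\f\r]+$/.test(id);
}

const samples = ['answer-5', '5-answer', 'nav:item.1', 'my id', ''];

// Collect both verdicts for each sample so they can be compared side by side.
const results = samples.map((id) => ({
    id: id,
    html4: isCssIdValid(id),
    html5: isHtml5IdValid(id),
}));

for (const r of results) {
    console.log(JSON.stringify(r.id), 'html4:', r.html4, 'html5:', r.html5);
}
```

An id like 5-answer passes the HTML5 check but fails the HTML4 one, which is exactly the case that tripped up our recordings.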

Author: Justin Klemm

Justin is the founder and tech lead at Ghost Inspector. He's a seasoned developer with a passion for innovation. When he's not tinkering with the latest web frameworks, Justin enjoys world traveling, good eats and lots of outdoor activity.
