Regex for validating the HTML id Attribute

When we first started building our test recorder, we needed a way to validate the id attributes being used on the page. We would sometimes capture an id attribute in a recording, only to find that it failed when we used it in a test because it didn’t meet the specification. For instance, websites would sometimes use an id that starts with a number, like this:

<div id="5-answer"><!-- … --></div>

That is technically invalid, at least in the HTML4 specification:

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (“-”), underscores (“_”), colons (“:”), and periods (“.”).

The HTML5 specification is a little more lax. Apart from the id being unique and containing at least one character, its only requirement is that “the value must not contain any space characters.” However, many websites are still technically using HTML4, so sometimes ids still need to be validated against the stricter rule. Luckily, you can test for these requirements with a simple regular expression. Here’s the function in JavaScript:

function isCssIdValid(id) {
  // First character must be a letter; the rest may be letters, digits, "_", "-", ":" or "."
  const re = /^[A-Za-z][\w\-:.]*$/
  return re.test(id)
}

First, we make sure that the id starts with a letter; then we ensure that every remaining character is a letter, a digit, an underscore, a hyphen, a colon, or a period. Note that \w is equivalent to [A-Za-z0-9_].
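
For comparison, here is a minimal sketch of what the looser HTML5 rule might look like next to the HTML4 one. The names isHtml5IdValid and isHtml4IdValid are hypothetical and not part of our recorder. The sketch assumes that “space characters” means the five ASCII whitespace characters the HTML specification lists (space, tab, line feed, form feed, and carriage return), and it does not check that the id is unique on the page.

```javascript
// Hypothetical helper: HTML5-style check.
// Assumption: "space characters" are space, tab, line feed, form feed
// and carriage return. The id must also contain at least one character.
function isHtml5IdValid(id) {
  return typeof id === 'string' && /^[^ \t\n\f\r]+$/.test(id)
}

// HTML4-style check, repeated here so the snippet runs on its own.
function isHtml4IdValid(id) {
  return /^[A-Za-z][\w\-:.]*$/.test(id)
}

console.log(isHtml4IdValid('5-answer'))  // false: starts with a digit
console.log(isHtml5IdValid('5-answer'))  // true: non-empty, no whitespace
console.log(isHtml5IdValid('my answer')) // false: contains a space
```

An id such as 5-answer passes the HTML5-style check but fails the HTML4-style one, which is the kind of mismatch that can trip up a recorded test.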