Using Character Classes in JavaScript Regular Expressions

This snippet demonstrates how to use character classes in JavaScript regular expressions to match specific sets of characters within a string. Character classes provide a concise way to define character sets, making regular expressions more readable and efficient.

Introduction to Character Classes

Character classes are a fundamental part of regular expressions. They allow you to define a set of characters that you want to match. Instead of specifying each character individually, you can use a character class to represent the entire set. This makes your regular expressions more concise and easier to understand. We will cover several commonly used character classes and demonstrate how they work in JavaScript.

Matching Digits (\d)

The \d character class matches any digit (0-9). The + quantifier means 'one or more occurrences'. The g flag ensures that all matches in the string are returned. This code finds all sequences of digits within the input string.

const text = 'There are 123 apples and 456 oranges.';
const regex = /\d+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['123', '456']
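
One detail worth keeping in mind is that, in JavaScript, \d covers only the ASCII digits 0-9, with or without the u flag. The sketch below contrasts it with the Unicode property escape \p{Nd}, which requires the u flag. The sample string and variable names are illustrative assumptions, not part of the original snippet.

```javascript
// Sample string mixing ASCII digits with Arabic-Indic digits (U+0664, U+0665, U+0666).
const mixedDigits = 'Order 123 and \u0664\u0665\u0666';

// \d only recognises the ASCII digits 0-9 in JavaScript.
const asciiDigits = mixedDigits.match(/\d+/g);

// \p{Nd} (decimal digit in any script) requires the u flag.
const unicodeDigits = mixedDigits.match(/\p{Nd}+/gu);

console.log(asciiDigits);          // ['123']
console.log(unicodeDigits.length); // 2 (the ASCII run and the Arabic-Indic run)
```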

Matching Non-Digits (\D)

The \D character class matches any character that is not a digit. In this example, the regular expression extracts all sequences of non-digit characters.

const text = 'Room 42 is on floor 2.';
const regex = /\D+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['Room ', ' is on floor ', '.']

Matching Whitespace (\s)

The \s character class matches any whitespace character, including spaces, tabs (\t), newlines (\n), and carriage returns (\r). This example extracts all whitespace sequences.

const text = 'Hello\t World\n!';
const regex = /\s+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['\t ', '\n']

Matching Non-Whitespace (\S)

The \S character class matches any character that is not a whitespace character. Here, it extracts the words 'Hello', 'World', and '!'.

const text = 'Hello  World !';
const regex = /\S+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['Hello', 'World', '!']

Matching Word Characters (\w)

The \w character class matches any word character (alphanumeric characters and underscores). It's equivalent to [a-zA-Z0-9_]. The example splits the sentence into individual words.

const text = 'The quick brown fox jumps over the lazy dog.';
const regex = /\w+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
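
Because \w is limited to [a-zA-Z0-9_], it splits words at accented letters. The following sketch shows that behaviour and one possible workaround using Unicode property escapes with the u flag. The sample text and variable names are assumptions chosen for illustration.

```javascript
// Accented letters are written as escapes so the source stays ASCII-only.
const accented = 'caf\u00e9 na\u00efve'; // "café naïve"

// \w stops at é and ï because they fall outside [a-zA-Z0-9_].
const asciiWords = accented.match(/\w+/g);

// \p{L} (any letter) and \p{N} (any number) require the u flag.
const unicodeWords = accented.match(/[\p{L}\p{N}_]+/gu);

console.log(asciiWords);   // ['caf', 'na', 've']
console.log(unicodeWords); // ['café', 'naïve']
```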

Matching Non-Word Characters (\W)

The \W character class matches any character that is not a word character. This example extracts the punctuation and space characters.

const text = 'Hello, world!';
const regex = /\W+/g;
const matches = text.match(regex);
console.log(matches); // Output: [', ', '!']

Matching Any Character (.)

The . (dot) matches any single character except line terminators (\n, \r, \u2028, and \u2029), unless the s (dotAll) flag is set. In this example, .+ cannot cross the '\n', so with the g flag match() returns each line as a separate match.

const text = 'abc\ndef';
const regex = /.+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['abc', 'def']
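
To let the dot cross line breaks, the s (dotAll) flag, available since ES2018, can be added. A minimal sketch comparing the two behaviours on the same input, with illustrative variable names:

```javascript
const multiline = 'abc\ndef';

// Without the s flag, . stops at the line terminator, so there are two matches.
const perLine = multiline.match(/.+/g);

// With the s (dotAll) flag, . also matches \n, so the whole string is one match.
const wholeString = multiline.match(/.+/gs);

console.log(perLine);            // ['abc', 'def']
console.log(wholeString.length); // 1
```

In environments without the s flag, the class [\s\S] is a common substitute for "any character, including newlines".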

Character Classes in Brackets ([])

You can define your own character classes using square brackets []. For example, [0-9-] matches any digit or a hyphen. This is useful for matching specific patterns not covered by the predefined character classes.

const text = 'My phone number is 555-123-4567.';
const regex = /[0-9-]+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['555-123-4567']
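
In [0-9-], the trailing hyphen is literal because it does not sit between two characters. Several ranges can also be combined in one class. As a further sketch, assuming a made-up CSS-like string, the class [0-9a-fA-F] matches one hexadecimal digit:

```javascript
// Hypothetical input: the last value contains 'G', which is not a hex digit.
const css = 'color: #1A2b3C; background: #ffffff; border: #12345G;';

// '#' followed by exactly six characters from the combined ranges.
const hexColor = /#[0-9a-fA-F]{6}/g;

const colors = css.match(hexColor);
console.log(colors); // ['#1A2b3C', '#ffffff']
```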

Real-Life Use Case: Validating User Input

Character classes are frequently used to validate user input in forms. For example, you can use /^[0-9]+$/ to ensure that a field contains only digits, or /^[a-zA-Z\s]+$/ to allow only letters and whitespace in a name field. Note that \s also accepts tabs and newlines; use a literal space in the class if those should be rejected. Character classes are also useful for sanitizing data.
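
A minimal sketch of such checks using RegExp.prototype.test(). The helper names isDigitsOnly and isLettersAndSpaces are hypothetical and exist only for this illustration:

```javascript
// Hypothetical helpers wrapping the two patterns mentioned above.
// The ^ and $ anchors force the whole string to match the class.
const isDigitsOnly = (value) => /^[0-9]+$/.test(value);
const isLettersAndSpaces = (value) => /^[a-zA-Z\s]+$/.test(value);

console.log(isDigitsOnly('12345'));              // true
console.log(isDigitsOnly('123a5'));              // false
console.log(isDigitsOnly(''));                   // false (+ needs at least one character)
console.log(isLettersAndSpaces('Ada Lovelace')); // true
console.log(isLettersAndSpaces('Ada_Lovelace')); // false (underscore is not in the class)
```

The patterns are written without the g flag on purpose: test() on a global regex keeps state in lastIndex between calls, which can produce surprising results when the same regex object is reused.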

Best Practices

Use character classes to simplify complex regular expressions.
Choose the most specific character class for your needs. Using \d instead of . when expecting a digit makes the intent clearer and avoids accidental matches.
Combine character classes with quantifiers for flexible matching.
Test your regular expressions thoroughly with different inputs.
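
As a sketch of combining character classes with quantifiers, the pattern below looks for date-shaped substrings (four digits, two digits, two digits). The log line is invented for the example:

```javascript
// Hypothetical log line; the middle date has a one-digit month and should not match.
const log = 'Deployed 2024-03-15, rolled back 2024-3-16, redeployed 2024-03-17.';

// {4} and {2} require an exact number of digits.
const isoDate = /\d{4}-\d{2}-\d{2}/g;

const dates = log.match(isoDate);
console.log(dates); // ['2024-03-15', '2024-03-17']
```

This only checks the shape of the text. A value such as 2024-99-99 would also match, so real date validation needs an additional step.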

Interview Tip

Be prepared to explain the common character classes (\d, \D, \s, \S, \w, \W, .) and how to define custom character classes using brackets ([]). Practice writing regular expressions using character classes to solve common problems like validating email addresses or phone numbers.

When to Use Them

Use character classes when you need to match a set of characters that share a common property (e.g., all digits, all whitespace characters, all alphanumeric characters). They are particularly useful when you need to validate data, parse text, or extract specific information from strings.

Memory Footprint

Character classes themselves don't typically introduce a significant memory footprint. However, complex regular expressions with many character classes and quantifiers can consume more memory, especially when used on very large strings. Keep your regular expressions as simple and efficient as possible to minimize memory usage.

Alternatives

While character classes are powerful, there are alternatives. A range such as [a-c] can be written as an explicit list ([abc]) or as an alternation (/a|b|c/), but the class is generally more concise and readable, especially for common character sets. String methods such as substring and indexOf can handle very simple matching scenarios, but regular expressions with character classes offer more flexibility and power.
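
The equivalence can be checked directly. In this sketch the sample word and variable names are illustrative; all three patterns produce the same matches, and indexOf covers the simple fixed-substring case:

```javascript
const sample = 'cabbage';

// Three ways to match one of the letters a, b, or c.
const byList = sample.match(/[abc]/g);
const byRange = sample.match(/[a-c]/g);
const byAlternation = sample.match(/a|b|c/g);

console.log(byList);        // ['c', 'a', 'b', 'b', 'a']
console.log(byRange);       // ['c', 'a', 'b', 'b', 'a']
console.log(byAlternation); // ['c', 'a', 'b', 'b', 'a']

// For a fixed substring, a plain string method is enough.
console.log(sample.indexOf('bb')); // 2
```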

Pros

Conciseness: Character classes represent entire sets of characters with a short token such as \d or [a-z].
Readability: They make regular expressions easier to understand.
Efficiency: They are often more efficient than explicitly listing all characters in a set.

Cons

Complexity: Complex regular expressions using many character classes can be difficult to debug.
Potential for Overuse: Sometimes, simpler string manipulation techniques might be more appropriate than complex regular expressions.

FAQ

What is the difference between \d and [0-9]?

In JavaScript, \d and [0-9] are equivalent: \d matches only the ASCII digits 0-9, even when the u flag is set. In some other regular expression engines, \d can also match digits from other scripts when Unicode matching is enabled.

How do I negate a character class?

You can negate a character class using the caret (^) inside the square brackets. For example, [^0-9] matches any character that is not a digit. Note that [^\d] is equivalent to \D.
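
A short sketch of the three equivalent spellings, using an invented sample string. It also shows that ^ outside the brackets is a start-of-input anchor rather than a negation:

```javascript
const mixed = 'a1b22c333';

// Inside brackets, a leading ^ negates the class.
const negatedClass = mixed.match(/[^0-9]+/g);
const negatedEscape = mixed.match(/[^\d]+/g);
const shorthand = mixed.match(/\D+/g);

console.log(negatedClass);  // ['a', 'b', 'c']
console.log(negatedEscape); // ['a', 'b', 'c']
console.log(shorthand);     // ['a', 'b', 'c']

// Outside brackets, ^ anchors the match to the start of the string.
const startsWithDigit = /^[0-9]/.test(mixed);
console.log(startsWithDigit); // false ('a1b22c333' starts with a letter)
```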
