Using Character Classes in JavaScript Regular Expressions
This snippet demonstrates how to use character classes in JavaScript regular expressions to match specific sets of characters within a string. Character classes provide a concise way to define character sets, making regular expressions more readable and efficient.
Introduction to Character Classes
Character classes are a fundamental part of regular expressions. They allow you to define a set of characters that you want to match. Instead of specifying each character individually, you can use a character class to represent the entire set. This makes your regular expressions more concise and easier to understand. We will cover several commonly used character classes and demonstrate how they work in JavaScript.
Matching Digits (\d)
The \d character class matches any digit (0-9). The + quantifier means 'one or more occurrences'. The g flag ensures that all matches in the string are returned. This code finds all sequences of digits within the input string.
const text = 'There are 123 apples and 456 oranges.';
const regex = /\d+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['123', '456']
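To make the role of the g flag concrete, here is a minimal sketch that runs the same pattern with and without it. The variable names (first, all, positions) are illustrative and not part of the original snippet.

```javascript
const text = 'There are 123 apples and 456 oranges.';

// Without the g flag, match() returns only the first match, plus details
// such as the index where it was found.
const first = text.match(/\d+/);
console.log(first[0]);    // '123'
console.log(first.index); // 10

// With the g flag, match() returns every matched substring.
const all = text.match(/\d+/g);
console.log(all); // ['123', '456']

// matchAll() requires the g flag and yields each match with its index.
const positions = [...text.matchAll(/\d+/g)].map((m) => [m[0], m.index]);
console.log(positions); // [['123', 10], ['456', 25]]
```

Note that match() returns null rather than an empty array when nothing matches, so real code usually guards against that case.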
Matching Non-Digits (\D)
The \D character class matches any character that is not a digit. In this example, the regular expression extracts all sequences of non-digit characters.
const text = 'Room 42 is on floor 2.';
const regex = /\D+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['Room ', ' is on floor ', '.']
Matching Whitespace (\s)
The \s character class matches any whitespace character, including spaces, tabs (\t), newlines (\n), and carriage returns (\r). This example extracts all whitespace sequences.
const text = 'Hello\t World\n!';
const regex = /\s+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['\t ', '\n']
Matching Non-Whitespace (\S)
The \S character class matches any character that is not a whitespace character. Here, it extracts the three non-whitespace runs 'Hello', 'World', and '!'.
const text = 'Hello World !';
const regex = /\S+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['Hello', 'World', '!']
Matching Word Characters (\w)
The \w character class matches any word character (alphanumeric characters and underscores). It's equivalent to [a-zA-Z0-9_]. The example splits the sentence into individual words.
const text = 'The quick brown fox jumps over the lazy dog.';
const regex = /\w+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
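Because \w is equivalent to [a-zA-Z0-9_], it does not match accented or non-Latin letters. The sketch below illustrates this with a made-up input string; the Unicode-aware alternative using \p{L} and \p{N} with the u flag is one possible approach, shown under the assumption that matching letters from any script is what you want.

```javascript
// Sample input with precomposed accented letters: 'café_1 naïve'
const text = 'caf\u00e9_1 na\u00efve';

// \w covers only [a-zA-Z0-9_], so accented letters split the words.
const asciiWords = text.match(/\w+/g);
console.log(asciiWords); // ['caf', '_1', 'na', 've']

// With the u flag, \p{L} matches any letter and \p{N} any number.
const unicodeWords = text.match(/[\p{L}\p{N}_]+/gu);
console.log(unicodeWords); // ['café_1', 'naïve']
```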
Matching Non-Word Characters (\W)
The \W character class matches any character that is not a word character. This example extracts the punctuation and space characters.
const text = 'Hello, world!';
const regex = /\W+/g;
const matches = text.match(regex);
console.log(matches); // Output: [', ', '!']
Matching Any Character (.)
The . (dot) character class matches any single character except line terminators (\n, \r, \u2028, and \u2029). Because . cannot cross the '\n' in this example, /.+/g finds two separate matches, one on each side of the newline.
const text = 'abc\ndef';
const regex = /.+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['abc', 'def']
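If you do want the dot to cross line breaks, JavaScript provides the s (dotAll) flag. The following sketch compares the default behavior with the s flag and with the [\s\S] idiom; the variable names are illustrative.

```javascript
const text = 'abc\ndef';

// Default: . stops at the newline, so there are two matches.
const perLine = text.match(/.+/g);
console.log(perLine); // ['abc', 'def']

// With the s flag, . also matches line terminators: a single match.
const dotAll = text.match(/.+/gs);
console.log(dotAll); // ['abc\ndef']

// Without the s flag, [\s\S] is a common way to match any character at all.
const anyChar = text.match(/[\s\S]+/g);
console.log(anyChar); // ['abc\ndef']
```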
Character Classes in Brackets ([])
You can define your own character classes using square brackets []. For example, [0-9-] matches any digit or a hyphen. This is useful for matching specific patterns not covered by the predefined character classes.
const text = 'My phone number is 555-123-4567.';
const regex = /[0-9-]+/g;
const matches = text.match(regex);
console.log(matches); // Output: ['555-123-4567']
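The hyphen in [0-9-] plays two roles, which the sketch below tries to make visible: between two characters it forms a range, while at the start or end of the brackets (or when escaped) it is a literal hyphen. The second sample string is invented for illustration.

```javascript
const text = 'My phone number is 555-123-4567.';

// Three equivalent ways to write "a digit or a literal hyphen".
const hyphenLast = text.match(/[0-9-]+/g);
const hyphenFirst = text.match(/[-0-9]+/g);
const hyphenEscaped = text.match(/[0-9\-]+/g);
console.log(hyphenLast, hyphenFirst, hyphenEscaped); // each is ['555-123-4567']

// Inside brackets, . is a literal dot rather than "any character".
const decimals = '3.14 and 3x14'.match(/\d[.]\d+/g);
console.log(decimals); // ['3.14']
```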
Real-Life Use Case: Validating User Input
Character classes are frequently used to validate user input in forms. For example, you can use /^[0-9]+$/ to ensure that a field only contains digits or /^[a-zA-Z\s]+$/ to allow only letters and spaces in a name field. They are also beneficial when sanitizing data.
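The two validation patterns above can be wrapped in small helper functions. This is a minimal sketch: isDigitsOnly and isLettersAndSpaces are hypothetical names chosen for illustration, and the sample inputs are made up to show where these simple patterns fall short.

```javascript
// Hypothetical helpers built around the two patterns from the text.
const isDigitsOnly = (value) => /^[0-9]+$/.test(value);
const isLettersAndSpaces = (value) => /^[a-zA-Z\s]+$/.test(value);

console.log(isDigitsOnly('12345')); // true
console.log(isDigitsOnly('12a45')); // false
console.log(isDigitsOnly(''));      // false: + requires at least one digit

console.log(isLettersAndSpaces('Ada Lovelace')); // true
console.log(isLettersAndSpaces("O'Brien"));      // false: apostrophe not allowed
console.log(isLettersAndSpaces('Jos\u00e9'));    // false: a-zA-Z is ASCII only
console.log(isLettersAndSpaces('   '));          // true: whitespace alone passes
```

The last three results suggest that a real name field would likely need a more permissive pattern and a separate check for blank input.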
Best Practices
Be specific: using \d instead of . when you expect a digit makes the intent clear, rejects unintended input, and can reduce backtracking.
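As a small illustration of this practice, the sketch below checks a time-like string with a loose pattern built from . and a stricter one built from \d. The pattern names and sample inputs are assumptions made for the example.

```javascript
// Loose: any two characters, a colon, any two characters.
const loose = /^..:..$/;
// Strict: exactly two digits, a colon, two digits.
const strict = /^\d\d:\d\d$/;

console.log(loose.test('12:30'));  // true
console.log(strict.test('12:30')); // true

// The loose pattern also accepts input that is clearly not a time.
console.log(loose.test('ab:cd'));  // true
console.log(strict.test('ab:cd')); // false
```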
Interview Tip
Be prepared to explain the common character classes (\d, \D, \s, \S, \w, \W, .) and how to define custom character classes using brackets ([]). Practice writing regular expressions using character classes to solve common problems like validating email addresses or phone numbers.
When to Use Them
Use character classes when you need to match a set of characters that share a common property (e.g., all digits, all whitespace characters, all alphanumeric characters). They are particularly useful when you need to validate data, parse text, or extract specific information from strings.
Memory Footprint
Character classes themselves don't typically introduce a significant memory footprint. However, complex regular expressions with many character classes and quantifiers can consume more memory, especially when used on very large strings. Keep your regular expressions as simple and efficient as possible to minimize memory usage.
Alternatives
While character classes are powerful, there are other ways to express the same match. Instead of a range, you can list characters explicitly within brackets (e.g., [abc] instead of [a-c]) or use alternation (/a|b|c/). However, ranges and predefined character classes are generally more concise and readable, especially for common character sets. String methods like substring and indexOf can be alternatives for very simple matching scenarios, but regular expressions with character classes offer more flexibility and power.
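The following sketch compares these alternatives on invented sample strings: the three regular expression forms produce the same matches, and indexOf works for a fixed substring but cannot express "any digit" the way a character class can.

```javascript
const text = 'a cab back';

// An explicit list, a range, and alternation all match the same characters.
const listed = text.match(/[abc]/g);
const ranged = text.match(/[a-c]/g);
const alternated = text.match(/a|b|c/g);
console.log(listed); // ['a', 'c', 'a', 'b', 'b', 'a', 'c']

// indexOf finds a fixed substring...
const room = 'Room 42';
const fixedIndex = room.indexOf('42');
console.log(fixedIndex); // 5

// ...while search() with \d finds the first digit, whatever it is.
const firstDigitIndex = room.search(/\d/);
console.log(firstDigitIndex); // 5
```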
Pros
Cons
FAQ
What is the difference between \d and [0-9]?
In JavaScript, \d and [0-9] are equivalent: both match only the ASCII digits 0-9, even when the u flag is set. \d is simply more concise. In some other regular expression engines, \d can also match digits from other scripts, depending on Unicode settings.
How do I negate a character class?
You can negate a character class by placing a caret (^) immediately after the opening square bracket. For example, [^0-9] matches any character that is not a digit. Note that [^\d] is equivalent to \D.
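Both FAQ answers can be checked with a short sketch. The Arabic-Indic digit and the 'a1b22c' string are sample inputs chosen for illustration; \p{Nd} with the u flag is shown as one way to match decimal digits from any script when that is what you need.

```javascript
// U+0663 is ARABIC-INDIC DIGIT THREE, a decimal digit outside ASCII.
const arabicIndicThree = '\u0663';

// In JavaScript, \d matches only 0-9, with or without the u flag.
const plainD = /\d/.test(arabicIndicThree);
const unicodeD = /\d/u.test(arabicIndicThree);
console.log(plainD, unicodeD); // false false

// \p{Nd} with the u flag matches decimal digits in any script.
const anyScriptDigit = /\p{Nd}/u.test(arabicIndicThree);
console.log(anyScriptDigit); // true

// Negation: [^0-9], [^\d], and \D all match runs of non-digits.
const sample = 'a1b22c';
const negatedRange = sample.match(/[^0-9]+/g);
const negatedClass = sample.match(/[^\d]+/g);
const shorthand = sample.match(/\D+/g);
console.log(negatedRange, negatedClass, shorthand); // each is ['a', 'b', 'c']
```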