JavaScript RegExp (?!…): Negative Lookahead Explained

The negative lookahead (?!...) is an advanced feature in JavaScript Regular Expressions that allows you to match a pattern only if it is not followed by a specific sequence of characters. It’s a non-capturing group, meaning it doesn’t capture the matched text, but only asserts whether a match is possible. This is particularly useful for refining your pattern matching and excluding unwanted matches.

Purpose of Negative Lookahead

The main purpose of a negative lookahead is to ensure that a certain pattern does not exist immediately after the current match. This allows for more precise and flexible pattern matching, especially when you need to exclude specific cases.

Syntax

The syntax for negative lookahead in JavaScript RegExp is:

(?!pattern)

Where pattern is the sequence of characters that you want to ensure does not follow the preceding part of your regular expression.

Examples

Let’s dive into some examples to illustrate how negative lookahead works in practice.

Basic Example: Matching Words Not Followed by “ing”

This example demonstrates how to match words that are not followed by the suffix “ing”.

const text1 = "run running jump jumping";
const regex1 = /\b\w+(?!ing\b)\b/g;
const matches1 = text1.match(regex1);

console.log(matches1);

Output:

['run', 'jump']

In this example:

  • \b asserts a word boundary.
  • \w+ matches one or more word characters.
  • (?!ing\b) is the negative lookahead, ensuring that the matched word is not followed by “ing” and another word boundary.
  • g flag makes the search global, finding all matches.

Matching Filenames Without a Specific Extension

This example shows how to match filenames that do not have a “.txt” extension.

const text2 = "document.pdf report.txt image.jpg";
const regex2 = /\b\w+(?!\.txt\b)\.\w+\b/g;
const matches2 = text2.match(regex2);

console.log(matches2);

Output:

['document.pdf', 'image.jpg']

In this example:

  • \b asserts a word boundary.
  • \w+ matches one or more word characters (the filename).
  • (?!\.txt\b) is the negative lookahead, ensuring the filename is not followed by “.txt” and a word boundary.
  • \.\w+ matches the file extension (e.g., “.pdf”, “.jpg”).
  • \b asserts another word boundary.

Excluding Specific Patterns in Email Validation

This example illustrates how to exclude specific patterns when validating email addresses.

const text3 = "[email protected] [email protected]";
const regex3 = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.(?!invalid$)[a-zA-Z]{2,}$/g;
const matches3 = text3.match(regex3);

console.log(matches3);

Output:

['[email protected]']

Here:

  • ^[a-zA-Z0-9._%+-]+ matches the username part of the email.
  • @ matches the “@” symbol.
  • [a-zA-Z0-9.-]+ matches the domain name.
  • \. matches the dot before the domain extension.
  • (?!invalid$) is the negative lookahead, ensuring the domain extension is not “invalid” at the end of the string.
  • [a-zA-Z]{2,}$ matches the domain extension (at least 2 characters).

Password Validation: Excluding Common Words

This example demonstrates how to validate a password, excluding common words to improve security.

const text4 = "P@ssword123 MyPassword WeakPassword";
const regex4 = /^(?!.*(?:password|12345)).{8,}$/g;
const matches4 = text4.match(regex4);

console.log(matches4);

Output:

['P@ssword123', 'MyPassword']

In this example:

  • ^ asserts the start of the string.
  • (?!.*(?:password|12345)) is the negative lookahead, ensuring the password does not contain “password” or “12345”.
  • .{8,} matches at least 8 characters.
  • $ asserts the end of the string.

Matching HTML Tags Without Specific Attributes

This example shows how to match HTML tags that do not contain a specific attribute.

const text5 = '<div class="container"> <span id="mySpan"> <p> ';
const regex5 = /<(?!\w+\s+id=).+?>/g;
const matches5 = text5.match(regex5);

console.log(matches5);

Output:

['<div class="container">', '<p>']

Here:

  • < matches the opening angle bracket of an HTML tag.
  • (?!\w+\s+id=) is the negative lookahead, ensuring the tag does not contain an id attribute.
  • .+? matches any characters within the tag.
  • > matches the closing angle bracket of an HTML tag.

Advanced Example: Data Extraction with Exclusions

Consider extracting data from a log file, excluding entries with specific error codes.

const logData = `
  [INFO] 2023-01-01: System started
  [ERROR 404] 2023-01-01: File not found
  [INFO] 2023-01-02: User logged in
  [ERROR 500] 2023-01-02: Server error
`;

const regex6 = /^(?!.*ERROR 404).*$/gm;
const matches6 = logData.match(regex6);

console.log(matches6);

Output:

[
  '\n  [INFO] 2023-01-01: System started',
  '\n  [INFO] 2023-01-02: User logged in',
  '  [ERROR 500] 2023-01-02: Server error',
  ''
]

In this example:

  • ^ asserts the start of each line (due to the m flag).
  • (?!.*ERROR 404) is the negative lookahead, ensuring the line does not contain “ERROR 404”.
  • .*$ matches the entire line.
  • g flag makes the search global, finding all matches.
  • m flag enables multiline mode, so ^ and $ match the start and end of each line.

Tips and Best Practices

  • Specificity: Ensure your negative lookahead pattern is specific enough to avoid unintended exclusions.
  • Performance: Complex negative lookaheads can impact performance, so test and optimize as needed.
  • Readability: Use comments to explain complex regular expressions, especially those with negative lookaheads.
  • Testing: Always test your regular expressions with a variety of inputs to ensure they work as expected.

Conclusion

The negative lookahead (?!...) is a powerful tool for refining your JavaScript Regular Expressions. By ensuring that a pattern is not followed by a specific sequence, you can create more precise and flexible matching rules. Understanding and utilizing negative lookaheads can significantly enhance your ability to extract, validate, and manipulate text using regular expressions.