Regex Lookahead

In this article you will learn about positive and negative lookahead assertions in regular expressions, their syntax, and their usage, with examples.

Lookahead assertions are an important tool for building practical regular expressions. They belong to a group called lookarounds, which inspect the text around your match, that is, the elements before it or after it, without making that text part of the match. Lookarounds consist of lookahead and lookbehind assertions. As the name suggests, a lookahead checks the element that comes after your match and accepts or rejects the match on that basis.

Explanation

A lookahead is used when the element that follows your match must meet a requirement described in the regex. The engine actually tries to match that element, but after doing so it gives the element up and reports only whether the attempt succeeded or failed. That is why lookaheads are called assertions: they only assert, yes or no, whether a match with the given condition is possible in the test string.

The element being checked may be a single character, several characters, or a group, and it must come immediately after the current match. The assertion consumes none of that text; it only decides whether the current match is kept.

There are two types of lookahead:

i. Positive lookahead.

ii. Negative lookahead.

Positive lookahead:

In this type, the regex engine checks for a particular element, which may be a character, several characters, or a group, immediately after the item matched. If that element is present, the engine accepts the match; otherwise it rejects it.

Syntax

The syntax of a positive lookahead is

/match(?=element)/

If element follows match, the match succeeds; otherwise it is rejected. The positive lookahead is a kind of group, enclosed in parentheses. Inside the parentheses the expression starts with a question mark immediately followed by an equals sign, and then the element to look ahead for.

Example

Now look at the expression /a(?=b)/. It matches an a that is followed by a b, so it matches the a in ab, abc, and abz, but it finds no match in ba, bax, or bat. In each case only the a is part of the match; the b is checked but not included.
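The behaviour described above can be tried out with a minimal sketch. It assumes Python and its standard re module, which the article itself does not name; the helper matched_text is a hypothetical name introduced only for this illustration.

```python
import re

# Positive lookahead: match an "a" only when a "b" comes right after it.
pattern = re.compile(r"a(?=b)")


def matched_text(text):
    """Return the text the pattern matched, or None when there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


samples = ["ab", "abc", "abz", "ba", "bax", "bat"]
results = {s: matched_text(s) for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched!r}")
```

For ab, abc, and abz the matched text is just a, never ab, because the lookahead checks the b without adding it to the match. The other three samples give None.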

Now let's see what happens inside the regex engine, to get a better understanding of the positive lookahead assertion.

Test string: This is a car.
Regex: /a(?=r)/

This regex matches an a that is immediately followed by an r.

The engine scans the string from left to right looking for an a. The first a it finds is the word a that comes after is. Having matched it, the engine enters the positive lookahead and tries to match what is inside it, an r. The character after this first a is a space, not an r, so the lookahead fails and the engine moves on to the right, looking for another a.

It finds the next one after the c in car. The engine enters the lookahead again and looks for an r, which this time is there, so the assertion succeeds. The engine reports this a as the match. The r is not part of the match; only the a is, because it is the a that is immediately followed by an r.
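The walkthrough can be checked with a short sketch, again assuming Python's standard re module as the regex engine. It lists where each a sits in the test string and which one the lookahead accepts.

```python
import re

text = "This is a car."

# Positions of every "a" in the test string, for comparison.
all_a_positions = [m.start() for m in re.finditer(r"a", text)]

# The same search, but only an "a" that is followed by "r" is accepted.
match = re.search(r"a(?=r)", text)

print(all_a_positions)             # the "a" after "is", and the "a" in "car"
print(match.start(), match.end())  # the match spans a single character
print(repr(match.group(0)))        # only "a" is matched, not "ar"
print(repr(text[match.end()]))     # the "r" is still unconsumed after the match
```

The match starts at the second a and is one character long, which shows that the r was examined but not consumed.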

Now for a more practical application. Suppose you want to match every occurrence of USD that is immediately followed by a number. For example, you want to match

USD 100

USD 350

USD 12,345

but you don't want to match

USD currency

USD rate

and so on.

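The regex for this is

/USD(?=\s\d+?,?\d+)/gm

The engine first searches for an upper-case U, then checks that it is followed by S and then by D. Once USD has matched, the engine enters the positive lookahead. Inside the lookahead there is a whitespace character (\s), then one or more digits (\d+?, where the trailing ? makes the quantifier lazy rather than optional), then an optional comma (,?), and finally one or more digits (\d+). Because both digit groups need at least one digit, the lookahead as written requires at least two digits after the space, so it accepts USD 100 and USD 12,345 but would reject USD 5. The regex matches only the word USD; the number is checked but is not part of the match. If single-digit amounts should also count, the shorter lookahead (?=\s\d) is enough, since it only has to confirm that a digit follows.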
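As a rough check, the pattern can be run over the sample lines listed earlier. This sketch assumes Python's standard re module; the variable names are illustrative only.

```python
import re

# The article's pattern: "USD" followed by whitespace and a number.
pattern = re.compile(r"USD(?=\s\d+?,?\d+)")

lines = ["USD 100", "USD 350", "USD 12,345", "USD currency", "USD rate"]

# Keep only the lines in which the pattern finds a match.
matching_lines = [line for line in lines if pattern.search(line)]

# What is actually matched on a successful line: only the currency code.
matched_text = pattern.search("USD 12,345").group(0)

print(matching_lines)
print(repr(matched_text))
```

Only the three lines with amounts are kept, and the matched text is USD alone, which is convenient when you want to replace or highlight the currency code without touching the number.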

Negative lookahead:

In this type of lookahead, the regex engine checks for a particular element, which may be a character, several characters, or a group, immediately after the item matched. If that element is not present, the engine accepts the match; otherwise it rejects it.

Syntax

The syntax is

/match(?!element)/gm

where match is the item to match and element is the item that must not immediately follow it. The match is accepted only if it is not followed by the given element. This pattern is useful for matching items on the condition that they are not immediately followed by a certain character, group of characters, or regex group.

For example, /a(?!b)/ matches every a that is not followed by b, so it matches the a in ad, ae, and az, but finds no match in ab.
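A minimal sketch of the same pattern, assuming Python's standard re module. It also covers one case the text does not mention: an a at the very end of the string.

```python
import re

# Negative lookahead: match an "a" only when a "b" does NOT come right after it.
pattern = re.compile(r"a(?!b)")

samples = ["ad", "ae", "az", "ab"]
results = {s: bool(pattern.search(s)) for s in samples}

# An "a" at the very end of the string also matches: nothing follows it,
# so in particular no "b" follows it.
matches_at_end = bool(pattern.search("a"))

print(results)
print(matches_at_end)
```

A negative lookahead succeeds whenever the forbidden element is absent, including when there is no text left at all.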

Example

Let's suppose you have a list, part of which looks like this:

S.no    Vehicle    Status
1       car        sold
2       car        not sold
3       car        sold
4       car        Repair

From the full list you want to match every car whose status is not sold; the status may be anything else. The regex for that is
/car(?!\s+sold)/
This regex matches the word car on every row where it is not followed by whitespace and then sold, that is, on the rows with the statuses not sold and Repair. In this way you can use regular expressions to match certain words that are not followed by particular elements, which may be a character, several characters, or a group.
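The list example can be sketched as follows, assuming Python's standard re module. The table is retyped here with even spacing, and the pattern is applied one row at a time so that \s+ cannot run across a line break.

```python
import re

table = """\
S.no  Vehicle  Status
1     car      sold
2     car      not sold
3     car      sold
4     car      Repair
"""

# "car" is accepted only when it is NOT followed by whitespace and then "sold".
pattern = re.compile(r"car(?!\s+sold)")

# Keep the rows on which the pattern finds a match.
unsold_rows = [row for row in table.splitlines() if pattern.search(row)]

for row in unsold_rows:
    print(row)
```

Rows 2 and 4 are kept. Row 2 passes because the text after the whitespace is not sold rather than sold, so the lookahead's inner pattern cannot match there.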
