Regex Literal Characters
In this article you will learn about literal character matching in regular expressions: what the search pattern for a literal looks like and how a group of literals is matched.
The simplest match in regex is a literal match, and the simplest literal match of all is a single character. Literal means that a character, or a group of characters, stands for exactly what it appears to be and nothing more.
A literal match can be a single character, two characters or more. If there is more than one character, it is usually a word. The word need not be meaningful. If you want to match car, that is a proper word with a meaning. A group of characters such as razcyl has no meaning and is not a proper word, yet you will be able to match it just the same. Regex is not a language expert. It works on characters and combinations of characters.
When you try to match a literal character in standard mode, the regex engine matches the first occurrence of that character in the string and returns that match. In global mode, however, it matches all occurrences of the literal character or group of characters. An important thing to note is that the regex engine searches from left to right. If it finds the character at the first position in the string, it stops the search and returns the result, even if the string consists of hundreds or thousands of lines. It simply matches the first occurrence and then stops searching.
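The difference between the two modes can be sketched in Python. This is only an illustration: Python has no global flag as such, so here re.search stands in for standard mode and re.finditer for global mode, and the test string is the one used throughout this article.

```python
import re

text = "This is my car. I like it very much."

# Standard mode: re.search scans left to right and stops at the first match.
first_position = re.search("i", text).start()

# Global-style matching: re.finditer continues after each match
# and yields every occurrence in the string.
all_positions = [m.start() for m in re.finditer("i", text)]

print(first_position)
print(all_positions)
```

Positions are counted from zero, so the i in This is at position 2.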
Let's say you want to match a, the first letter of the English alphabet, in a test string.
This is my car. I like it very much.
When the regex engine runs, it will match the a in the word car. The a is in the middle of the word car, but that does not matter to regex. If you want to match a single character only where it stands alone, you will have to use word boundaries by placing \b before and after the character you want to match.
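Here is a small Python sketch of the word boundary idea. The sentence used is an assumption made up for this example, because the test string above contains no standalone a.

```python
import re

# Hypothetical test string, chosen because it contains both
# standalone "a" and "a" inside other words.
text = "I have a car and a cat."

# Plain literal: matches every "a", including those inside words.
plain_positions = [m.start() for m in re.finditer("a", text)]

# With word boundaries: matches "a" only where it stands alone.
# The raw string (r"...") keeps \b from being read as a backspace.
standalone_positions = [m.start() for m in re.finditer(r"\ba\b", text)]

print(plain_positions)
print(standalone_positions)
```

The plain pattern finds six matches, while the pattern with \b on both sides finds only the two standalone ones.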
Literal Character Search Pattern:
Here is a little explanation of how regex works. Suppose a is the literal character you want to search for in the text string
This is my car. I like it very much.
The regex engine starts by checking the first character of the string from the left. That character is T, the capital T in the word This. Of course this is not a match. The engine moves to the next character, which is h. Again it is not a match, so the engine proceeds to i, still no match, and then to s, again no match. The engine keeps searching through the string. After the character s there is a space. A space is called a whitespace character and it is treated like any other character in regular expressions. It is not a match either, so the engine moves on through i, s, space, m, y, space and c. After c it moves one step further to the right, finds the character a, and here is the match. The regex engine reports a match for your literal character a, and it usually also reports the position of the match. After this match the engine ignores the rest of the test string. This is how a regex engine works: character by character, from left to right.
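The walk described above can be imitated with a few lines of Python. This is only a simplified model for learning, not how a real regex engine is implemented, and the function name find_first_literal is made up for this example.

```python
import re

def find_first_literal(text, literal_char):
    """Check each character from left to right and stop at the first match."""
    for position, current in enumerate(text):
        if current == literal_char:
            return position  # match found, the rest of the text is ignored
    return -1  # reached the end of the text without a match

text = "This is my car. I like it very much."

manual_position = find_first_literal(text, "a")
engine_position = re.search("a", text).start()

print(manual_position, engine_position)
```

Both the hand-written loop and the regex engine report the same position, the a in car.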
You can make the regex engine search further for a in the string. This is usually done with global mode or with some code in the programming language you are working in. The regex implementation of a particular language or tool is usually known as its regex flavor. You are probably familiar with the search next feature in text editors, which does a similar thing.
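A search next can be sketched in Python as a loop that restarts the search just after the previous match. The test string here is borrowed from the examples at the end of this article, and the loop is only one possible way to do it (re.finditer would give the same positions).

```python
import re

text = "This is my car. I like this car. A car is for transportation."
pattern = re.compile("car")

positions = []
start_at = 0
while True:
    # The second argument tells the engine where to begin searching.
    match = pattern.search(text, start_at)
    if match is None:
        break  # no further occurrence
    positions.append(match.start())
    start_at = match.end()  # continue right after the match just found

print(positions)
```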
Group of Literal Characters Search:
If you want to match a group of characters instead of a single character, the regex engine will do that for you too. Let's say you want to match car in the string. The regex engine first searches for c. It does not search for the whole word car at once, but starts with the first character of the group. It keeps looking for c in the test string, and when it finds c, it checks whether that c is immediately followed by a. If c is not followed by a, the engine moves on and starts searching for c again. If c is followed by a, the engine then checks whether that a is immediately followed by r. If a is not followed by r, the engine moves to the right and starts searching for c again. If a is followed by r, the engine declares a full match, stops its search and does not proceed any further to the right in the string.
Hence car means: match c, which is immediately followed by a, which is immediately followed by r. Only when all three conditions are met is it a complete match. In global mode the regex engine searches for all occurrences of car in the test string. The search for each occurrence of car works as described here, and the search for each character works as discussed earlier.
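The steps for a group of characters can also be written out as a small Python sketch. Again this is a simplified model and not the real engine, and the name find_first_group is made up for this example.

```python
import re

def find_first_group(text, literal):
    """Try each start position from left to right, comparing one character at a time."""
    for start in range(len(text) - len(literal) + 1):
        matched = 0
        # Check c, then a, then r: each must immediately follow the previous one.
        while matched < len(literal) and text[start + matched] == literal[matched]:
            matched += 1
        if matched == len(literal):
            return start  # full match, stop searching
        # Otherwise move one position to the right and start again with c.
    return -1

text = "This is my car. I like it very much."

manual_position = find_first_group(text, "car")
engine_position = re.search("car", text).start()

print(manual_position, engine_position)
```

The c in much is also tried by the loop when you search for a word like cat, but it is not followed by a, so no match is reported there.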
One thing to note here is that a literal search is case sensitive. car will match car only, and it will not match Car. You can make car match Car, caR, cAr or CAR simply by using the case insensitive flag, but by default a regex search is case sensitive.
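In Python the case insensitive flag is re.IGNORECASE. Other regex flavors spell the flag differently, so take this as an illustration in one flavor only.

```python
import re

samples = ["car", "Car", "caR", "cAr", "CAR"]

# Default behaviour: the search is case sensitive.
sensitive = [s for s in samples if re.search("car", s)]

# With the case insensitive flag every spelling matches.
insensitive = [s for s in samples if re.search("car", s, re.IGNORECASE)]

print(sensitive)
print(insensitive)
```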
For characters from other languages, you can either type the characters directly into the pattern, where they will work just like English letters, or you can use Unicode escapes in your regex for special characters.
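Both options can be tried in Python. The German sentence is an assumption made up for this example, and \u00fc is the Unicode escape for the letter ü. The example assumes the typed ü is stored as a single code point, which is the usual case.

```python
import re

# Hypothetical test string; the two ü characters are written as escapes
# so that the text is the same no matter how this file is saved.
text = "Ich fahre \u00fcber die Br\u00fccke."

# Option 1: type the character directly into the pattern.
direct_positions = [m.start() for m in re.finditer("ü", text)]

# Option 2: use the Unicode escape for the same character.
escaped_positions = [m.start() for m in re.finditer("\u00fc", text)]

print(direct_positions)
print(escaped_positions)
```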
Here are some examples of literal character matches.
Regex pattern: /e/
Test string: This is my pen.
e will match the e in the word pen.
Regex pattern: /i/
Test string: Here is a list of things for election.
i will match the first occurrence of i, which is in the word is.
Regex pattern: /car/
Test string: This is my car. I like this car. A car is for transportation.
car will match the first occurrence of car, the one after the word my.
I hope you now know what literal characters are in regex, how a single literal character or a group of them is matched, and how a regular expression engine works. These things might seem trivial, but they are very important in setting the foundation of regex. Complex regex patterns are simply built from these simple characters.