Optional Items (Page 12)
The question mark (?) makes the preceding token in a regular expression optional. This means that the regex engine will try to match the token if it is present, but it won’t fail if the token is absent.
Basic Usage
For example:
colou?r
This pattern matches both "colour" and "color." The u is optional due to the question mark.
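A quick way to check this behaviour is to run the pattern against a few candidate words. The sketch below assumes Python's standard re module; the word list and variable names are illustrative only and do not come from the tutorial.

```python
import re

# 'u?' makes only the single preceding token (the letter u) optional.
pattern = re.compile(r"colou?r")

# Hypothetical test words: two valid spellings and two near misses.
words = ["color", "colour", "colr", "colouur"]

# fullmatch() requires the whole word to match, not just a part of it.
matches = {word: bool(pattern.fullmatch(word)) for word in words}

for word, ok in matches.items():
    print(f"{word!r}: {'match' if ok else 'no match'}")
```

Using fullmatch() rather than search() keeps the check strict, so a word with two u characters is rejected instead of being partially matched.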
You can make multiple tokens optional by grouping them with round brackets and placing a question mark after the closing bracket:
Nov(ember)?
This regex matches both "Nov" and "November."
You can use multiple optional groups to match more complex patterns. For instance:
Feb(ruary)? 23(rd)?
This pattern matches:
- "February 23rd"
- "February 23"
- "Feb 23rd"
- "Feb 23"
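The four combinations listed above can be verified mechanically. This is a minimal sketch in Python's re module; the candidate list, including the deliberately invalid "Febr 23", is an assumption added for illustration.

```python
import re

# Each parenthesised group is optional as a whole.
pattern = re.compile(r"Feb(ruary)? 23(rd)?")

# Hypothetical inputs: the four expected forms plus one that should fail.
candidates = ["February 23rd", "February 23", "Feb 23rd", "Feb 23", "Febr 23"]

# Keep only the strings that the pattern matches in full.
accepted = [text for text in candidates if pattern.fullmatch(text)]

print(accepted)
```

"Febr 23" is rejected because the group (ruary) is either matched completely or skipped completely; a partial "r" is not allowed.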
Important Concept: Greediness
The question mark is a greedy operator. This means that the regex engine will first try to match the optional part. It will skip the optional part only if that part cannot be matched at the current position, or if matching it causes the rest of the regex to fail.
For example:
Feb 23(rd)?
When applied to the string "Today is Feb 23rd, 2003," the engine will match "Feb 23rd" rather than "Feb 23" because it tries to match as much as possible.
You can make the question mark lazy by adding another question mark after it:
Feb 23(rd)??
In this case, the regex will match "Feb 23" instead of "Feb 23rd."
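The difference between the greedy and lazy forms can be seen side by side. The following is a small sketch using Python's re module, which supports the ?? syntax; the third pattern, with a trailing comma, is an added illustration and not part of the original text.

```python
import re

text = "Today is Feb 23rd, 2003"

# Greedy: the engine tries to match "rd" first and keeps it.
greedy = re.search(r"Feb 23(rd)?", text).group(0)

# Lazy: the engine tries to skip "rd" first; nothing forces it to go back.
lazy = re.search(r"Feb 23(rd)??", text).group(0)

# Lazy, but followed by a comma: skipping "rd" makes the comma fail,
# so the engine backtracks and matches "rd" after all.
lazy_forced = re.search(r"Feb 23(rd)??,", text).group(0)

print(greedy)       # Feb 23rd
print(lazy)         # Feb 23
print(lazy_forced)  # Feb 23rd,
```

Laziness only changes the order in which the engine tries the two options. It never prevents a match that would otherwise succeed.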
Looking Inside the Regex Engine
Let’s see how the regex engine processes the pattern:
colou?r
when applied to the string "The colonel likes the color green."
- The engine starts by matching the literal c with the c in "colonel."
- It continues matching o, l, and o.
- It then tries to match u, but fails when it reaches n in "colonel."
- The question mark makes u optional, so the engine skips it and moves to r.
- r does not match n, and there are no other options left to try, so the attempt that started at this c fails. The engine then retries from each following position until it reaches the next c in the string.
The engine eventually matches color in "color green." It matches the entire word because the u was skipped, and the remaining characters matched successfully.
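The walkthrough above can be reproduced with Python's re module, as a rough sketch. Pattern.match() with a start position is used here to imitate a single match attempt at one position; that is a convenience for illustration, not a view into the engine's internals, and the variable names are assumptions.

```python
import re

text = "The colonel likes the color green."
pattern = re.compile(r"colou?r")

# A single attempt anchored at the c of "colonel" fails:
# c, o, l, o match, u is skipped, then r is compared with n.
colonel_start = text.index("colonel")
attempt_at_colonel = pattern.match(text, colonel_start)

# search() keeps advancing through the string until an attempt succeeds.
found = pattern.search(text)

print(attempt_at_colonel)             # None
print(found.group(0), found.start())  # the matched word and where it starts
```

The successful match begins at the c of "color", which is the first position after "colonel" where all remaining tokens can be matched.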
Summary
The question mark is a versatile operator that allows you to make parts of a regex optional. It is greedy by default, but you can make it lazy by using ??.
Understanding how the regex engine processes optional items is essential for creating efficient and accurate patterns.