
Possessive Quantifiers (Page 18)
Tutorials · Jessica Brown · 01/10/25 10:28 AM
How Possessive Quantifiers Work
Possessive quantifiers are a feature of some modern regex engines, including JGsoft, Java, and PCRE. These quantifiers behave like greedy quantifiers by attempting to match as many characters as possible. However, once a match is made, possessive quantifiers lock in the match and refuse to give up characters during backtracking.
You can make a quantifier possessive by adding a + after it:
* (greedy) matches zero or more times.
*? (lazy) matches as few times as possible.
*+ (possessive) matches zero or more times but refuses to backtrack.
Other possessive quantifiers include ++, ?+, and {n,m}+.
Example of Possessive Quantifiers in Action
Consider the regex pattern "[^"]*+" applied to the string "abc":
The first " matches the opening quote.
The [^\"]*+ matches the characters abc within the quotes.
The final " matches the closing quote.
In this case, the possessive quantifier behaves similarly to a greedy quantifier. However, if the string lacks a closing quote, the regex will fail faster with a possessive quantifier because there are no backtracking steps to try.
For instance, when applied to the string "abc, the possessive quantifier prevents the regex engine from backtracking to try alternate matches, immediately resulting in a failure when it encounters the missing closing quote. In contrast, a greedy quantifier would continue backtracking unnecessarily, trying to find a match.
When Possessive Quantifiers Matter
Possessive quantifiers are particularly useful for optimizing regex performance by preventing excessive backtracking. This is especially valuable in cases where:
You expect a match to fail.
The pattern includes nested quantifiers.
By using possessive quantifiers, you can reduce or eliminate catastrophic backtracking, which can slow down your regex significantly.
How Possessive Quantifiers Can Change Match Results
Possessive quantifiers can alter the outcome of a match. For example:
The pattern ".*" applied to the string "abc"x will match "abc".
The pattern ".*+" applied to the same string will fail to match because the possessive quantifier locks in the entire string, including the extra character x, preventing the second quote from matching.
This demonstrates that possessive quantifiers should be used carefully. The part of the pattern that follows the possessive quantifier must not be able to match any characters already consumed by the quantifier.
Using Atomic Grouping Instead of Possessive Quantifiers
Atomic groups offer a similar function to possessive quantifiers. They prevent backtracking within the group, making them a useful alternative for regex flavors that don’t support possessive quantifiers.
To create an atomic group, use the syntax (?>X*) instead of X*+. For example:
(?:a|b)*+ is equivalent to (?>(?:a|b)*).
The key difference is that the quantified token and the quantifier must be inside the atomic group for the effect to be the same. If the atomic group only surrounds the alternation (e.g., (?>a|b)*), the behavior will differ.
Example Comparison
Consider the following examples:
(?:a|b)*+b and (?>(?:a|b)*)b will both fail to match the string b because the possessive quantifier or atomic group prevents the pattern from backtracking.
In contrast, (?>a|b)*b will match b. The atomic group ensures that each alternation (a or b) doesn’t backtrack, but the outer greedy quantifier allows backtracking to match the final b.
Practical Tip for Conversion
When converting a regex from a flavor that supports possessive quantifiers to one that doesn’t, you can replace possessive quantifiers with atomic groups. For instance:
Replace X*+ with (?>(X*)).
Replace (?:a|b)*+ with (?>(?:a|b)*).
Using 3rd party tools can automate this conversion process and ensure compatibility across different regex flavors.
Table of Contents
Regular Expression Tutorial
Different Regular Expression Engines
Literal Characters
Special Characters
Non-Printable Characters
First Look at How a Regex Engine Works Internally
Character Classes or Character Sets
The Dot Matches (Almost) Any Character
Start of String and End of String Anchors
Word Boundaries
Alternation with the Vertical Bar or Pipe Symbol
Optional Items
Repetition with Star and Plus
Grouping with Round Brackets
Named Capturing Groups
Unicode Regular Expressions
Regex Matching Modes
Possessive Quantifiers
Understanding Atomic Grouping in Regular Expressions
Understanding Lookahead and Lookbehind in Regular Expressions (Lookaround)
Testing Multiple Conditions on the Same Part of a String with Lookaround
Understanding the \G Anchor in Regular Expressions
Using If-Then-Else Conditionals in Regular Expressions
XML Schema Character Classes and Subtraction Explained
Understanding POSIX Bracket Expressions in Regular Expressions
Adding Comments to Regular Expressions: Making Your Regex More Readable
Free-Spacing Mode in Regular Expressions: Improving Readability
- Read more...
- 0 comments
- 122 views
-