In addition to the question mark, regex provides two more repetition operators: the asterisk (*) and the plus (+).


Basic Usage

The * (star) matches the preceding token zero or more times. The + (plus) matches the preceding token one or more times.

For example:

<[A-Za-z][A-Za-z0-9]*>

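This pattern matches HTML opening tags without attributes:

  • <[A-Za-z] matches the opening angle bracket followed by the first letter of the tag name.

  • [A-Za-z0-9]* matches zero or more alphanumeric characters after the first letter.

  • > matches the closing angle bracket.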

This regex will match tags like:

  • <B>

  • <HTML>

If you used + instead of *, the regex would require at least one alphanumeric character after the first letter, so it would match:

  • <HTML> but not <B>.

Neither version matches <1>, because the first character after < must be a letter.
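The difference between * and + is easy to check by running both patterns over a few candidate tags. The following is a minimal sketch using Python's built-in re module; the sample strings and variable names are illustrative and not part of the original article.

```python
import re

# Same pattern twice, differing only in the quantifier on the second class.
star = re.compile(r"<[A-Za-z][A-Za-z0-9]*>")
plus = re.compile(r"<[A-Za-z][A-Za-z0-9]+>")

# Illustrative inputs: a one-letter tag, longer tags, and an invalid tag.
samples = ["<B>", "<HTML>", "<H1>", "<1>"]

star_hits = [s for s in samples if star.fullmatch(s)]
plus_hits = [s for s in samples if plus.fullmatch(s)]

print(star_hits)  # ['<B>', '<HTML>', '<H1>']
print(plus_hits)  # ['<HTML>', '<H1>']
```

With +, the one-letter tag <B> drops out because nothing follows the first letter. <1> is rejected by both patterns, since the first character class only accepts letters.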


Limiting Repetition

Modern regex flavors allow you to limit repetitions using curly braces ({}).

Syntax:

{min,max}
  • min: Minimum number of matches.

  • max: Maximum number of matches.

Examples:

  • {0,} is equivalent to *.

  • {1,} is equivalent to +.

  • {3} matches exactly three repetitions.

Example:

\b[1-9][0-9]{3}\b

This pattern matches numbers between 1000 and 9999.

\b[1-9][0-9]{2,4}\b

This pattern matches numbers between 100 and 99999.

The word boundaries (\b) ensure that only complete numbers are matched.
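To see the curly-brace quantifiers and the word boundaries at work, here is a short sketch in Python's re module. The test string is made up for illustration; it mixes numbers inside and outside the stated ranges, plus one with a leading zero.

```python
import re

# Illustrative input, not from the original article.
text = "7 42 100 1000 9999 12345 99999 100000 0123"

# Exactly three digits after the first: 1000 through 9999.
four_digits = re.findall(r"\b[1-9][0-9]{3}\b", text)

# Two to four digits after the first: 100 through 99999.
three_to_five = re.findall(r"\b[1-9][0-9]{2,4}\b", text)

print(four_digits)    # ['1000', '9999']
print(three_to_five)  # ['100', '1000', '9999', '12345', '99999']

# {0,} behaves like * and {1,} behaves like +.
probe = "a ab abbb"
same_as_star = re.findall(r"ab{0,}", probe) == re.findall(r"ab*", probe)
same_as_plus = re.findall(r"ab{1,}", probe) == re.findall(r"ab+", probe)
print(same_as_star, same_as_plus)  # True True
```

Note that 100000 and 0123 are skipped entirely: the word boundaries stop the pattern from matching just a slice of a longer run of digits.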


Watch Out for Greediness!

All repetition operators (*, +, and {}) are greedy by default. This means the regex engine will try to match as much text as possible.

Example:

Consider the pattern:

<.+>

When applied to the string:

This is a <EM>first</EM> test.

You might expect it to match <EM> and </EM> separately. However, it will match <EM>first</EM> instead.

This happens because the + is greedy and matches as many characters as possible.
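The greedy result can be reproduced directly. This is a small sketch with Python's re module, using the same test string as above; the variable names are only for illustration.

```python
import re

text = "This is a <EM>first</EM> test."

# Greedy .+ runs as far as it can, so there is one long match.
greedy_first = re.search(r"<.+>", text).group()
greedy_all = re.findall(r"<.+>", text)

print(greedy_first)  # <EM>first</EM>
print(greedy_all)    # ['<EM>first</EM>']
```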


Looking Inside the Regex Engine

The first token in the regex is <, which matches the first < in the string.

The next token is the . (dot), which matches any character except newlines. The + causes the dot to repeat as many times as possible:

  1. The dot matches E, then M, and so on.

  2. It continues matching until the end of the string.

  3. At this point, the > token fails to match because there are no more characters left.

The engine then backtracks: the dot gives up one character at a time until > can match. That first succeeds at the last > in the string, the one that closes </EM>.

The final match is <EM>first</EM>.
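The walkthrough above can be traced by hand in a few lines of Python. This is a deliberately simplified sketch of the idea, not how any real regex engine is implemented: the function trace_greedy_tag is a hypothetical helper that only handles the single pattern <.+> starting at a known position.

```python
import re

def trace_greedy_tag(text: str, start: int):
    """Simplified trace of <.+> with the '<' already matched at text[start]."""
    assert text[start] == "<"
    first = start + 1
    # Step 1: the greedy dot consumes everything up to the end of the line.
    end = first
    while end < len(text) and text[end] != "\n":
        end += 1
    backtracks = 0
    # Step 2: give back one character at a time until '>' matches at `end`.
    # The + requires the dot to keep at least one character (end > first).
    while end > first:
        if end < len(text) and text[end] == ">":
            return text[start:end + 1], backtracks
        end -= 1
        backtracks += 1
    return None, backtracks

text = "This is a <EM>first</EM> test."
matched, backtracks = trace_greedy_tag(text, text.index("<"))

print(matched)     # <EM>first</EM>
print(backtracks)  # 7
print(matched == re.search(r"<.+>", text).group())  # True
```

The seven give-backs are the six characters of " test." plus the final >, which the dot had swallowed before the > token could claim it.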


Laziness Instead of Greediness

To fix this issue, make the quantifier lazy by adding a question mark (?):

<.+?>

This tells the engine to match as few characters as possible.

  1. The < matches the first <.

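  2. The . matches E, the single character that + requires.

  3. The engine tries > against M and fails, so the dot expands by one character and matches M.

  4. The engine tries > again, and this time it matches the > right after EM.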

The final match is <EM>, which is what we intended.
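Here is the lazy version run against the same string, again as a short sketch in Python's re module. The last two lines use a made-up digit string to show that the trailing ? works on the curly-brace quantifier as well.

```python
import re

text = "This is a <EM>first</EM> test."

# Lazy .+? stops at the first '>' it can reach.
lazy_first = re.search(r"<.+?>", text).group()
lazy_all = re.findall(r"<.+?>", text)

print(lazy_first)  # <EM>
print(lazy_all)    # ['<EM>', '</EM>']

# The same suffix makes {min,max} lazy: it settles for the minimum.
greedy_digits = re.search(r"[0-9]{2,4}", "12345").group()
lazy_digits = re.search(r"[0-9]{2,4}?", "12345").group()
print(greedy_digits, lazy_digits)  # 1234 12
```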


An Alternative to Laziness

Instead of using lazy quantifiers, you can use a negated character class:

<[^>]+>

This pattern matches <, then one or more characters that are not >, then >. Because the character class can never run past the closing >, the engine has little or no backtracking to do.

Example:

Given the string:

This is a <EM>first</EM> test.

The regex <[^>]+> will match:

  • <EM>

  • </EM>

This approach is more efficient because it reduces backtracking, which can significantly improve performance in large datasets or tight loops.
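The negated character class can be tried out the same way. This sketch uses Python's re module; the second input string, "a <> b>", is an invented edge case that shows the two approaches are not always interchangeable.

```python
import re

text = "This is a <EM>first</EM> test."
tag = re.compile(r"<[^>]+>")

tags = tag.findall(text)
spans = [m.span() for m in tag.finditer(text)]

print(tags)   # ['<EM>', '</EM>']
print(spans)  # [(10, 14), (19, 24)]

# On this input the lazy pattern gives the same result.
print(tags == re.findall(r"<.+?>", text))  # True

# Edge case (illustrative): the lazy dot may itself consume a '>',
# while the negated class never can.
odd = "a <> b>"
lazy_odd = re.findall(r"<.+?>", odd)
negated_odd = tag.findall(odd)
print(lazy_odd)     # ['<> b>']
print(negated_odd)  # []
```

The edge case is a reminder that the negated class is stricter about what may appear between the brackets, which is usually what you want when matching tags.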

The *, +, and {} quantifiers control repetition in regex. They are greedy by default, but you can make them lazy by adding a question mark (?). Using a negated character class is another way to handle repetition, and it often needs little or no backtracking.
