Alternation with the Vertical Bar or Pipe Symbol (Page 11)


Previously, we explored how character classes allow you to match a single character out of several possible options. Alternation, on the other hand, enables you to match one of several possible regular expressions.

The vertical bar or pipe symbol (|) is used for alternation. It acts as an OR operator within a regex.


Basic Syntax

To search for either "cat" or "dog," use the pattern:

cat|dog

You can add more options as needed:

cat|dog|mouse|fish

The regex engine will match any of these options. For example:

Regex                 String                      Matches
cat|dog|mouse|fish    "I have a cat and a dog"    ✅ Yes
cat|dog|mouse|fish    "I have a fish"             ✅ Yes
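
The examples above can be checked with a short script. This is a minimal sketch that assumes Python's built-in re module; the tutorial itself is engine-neutral, and the variable names are only illustrative.

```python
import re

# The pattern from the examples above: any one of four literal words.
pattern = re.compile(r"cat|dog|mouse|fish")

# search() returns the first (leftmost) match, or None if nothing matches.
first = pattern.search("I have a cat and a dog")
print(first.group())  # cat

# findall() returns every non-overlapping match, scanning left to right.
all_matches = pattern.findall("I have a cat and a dog")
print(all_matches)  # ['cat', 'dog']

fish = pattern.search("I have a fish")
print(fish.group())  # fish
```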


Precedence and Grouping

The alternation operator has the lowest precedence of all regex operators. The engine therefore treats everything to the left of a vertical bar as one alternative and everything to the right of it as another. To limit the scope of the alternation, group the alternatives with round brackets, ( and ).

Example:

Without grouping:

\bcat|dog\b

This regex will match:

  • A word boundary followed by "cat"

  • "dog" followed by a word boundary

With grouping:

\b(cat|dog)\b

This regex will match:

  • A word boundary, then either "cat" or "dog," followed by another word boundary.

Regex            String               Matches
\bcat|dog\b      "I saw a cat dog"    ✅ Yes
\b(cat|dog)\b    "I saw a cat dog"    ✅ Yes
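
Both patterns match the string above, so the difference only shows up when "cat" or "dog" is part of a longer word. The sketch below assumes Python's re module, and the test string "category hotdog" is an illustrative choice that does not appear in the original examples.

```python
import re

ungrouped = re.compile(r"\bcat|dog\b")
grouped = re.compile(r"\b(cat|dog)\b")

text = "category hotdog"  # hypothetical test string

# Ungrouped: "\bcat" matches the start of "category",
# and "dog\b" matches the end of "hotdog".
ungrouped_hits = [m.group() for m in ungrouped.finditer(text)]
print(ungrouped_hits)  # ['cat', 'dog']

# Grouped: both boundaries apply to whichever word is chosen,
# so neither partial word matches.
grouped_hits = [m.group() for m in grouped.finditer(text)]
print(grouped_hits)  # []
```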


Understanding Regex Engine Behavior

The regex engine is eager, meaning it stops searching as soon as it finds a valid match. Because the alternatives are tried from left to right, their order matters.

Consider the pattern:

Get|GetValue|Set|SetValue

When applied to the string "SetValue," the engine will:

  1. Try to match Get, but fail.

  2. Try GetValue, but fail.

  3. Match Set and stop.

The result is that the engine matches "Set," but not "SetValue." This happens because the engine found a valid match early and stopped.
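
The steps above can be reproduced with a backtracking engine. This sketch assumes Python's re module, which tries alternatives from left to right in the way described here.

```python
import re

# Alternatives are tried left to right at each starting position.
match = re.search(r"Get|GetValue|Set|SetValue", "SetValue")

print(match.group())  # Set
print(match.span())   # (0, 3)
```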


Solutions to Eagerness

There are several ways to address this behavior:

1. Change the Order of Options

By changing the order of options, you can ensure longer matches are attempted first:

GetValue|Get|SetValue|Set

This way, the engine attempts "SetValue" before "Set," so the longer word wins whenever it is present.
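
A quick check of the reordered pattern, again as a sketch that assumes Python's re module:

```python
import re

# Longer alternatives are listed before their shorter prefixes.
reordered = re.compile(r"GetValue|Get|SetValue|Set")

long_match = reordered.search("SetValue").group()
short_match = reordered.search("Set").group()

print(long_match)   # SetValue
print(short_match)  # Set
```

The shorter alternatives are still available, so a plain "Set" continues to match.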

2. Use Optional Groups

You can combine related options and use ? to make parts of them optional:

Get(Value)?|Set(Value)?

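Because ? is greedy, the engine first tries to include the optional "Value" and only gives it up if the rest of the match fails. As a result, "GetValue" is matched in preference to "Get," and "SetValue" in preference to "Set."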
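
The optional-group pattern can be tried against all four words. This is a minimal sketch assuming Python's re module; the inputs list is only there for illustration.

```python
import re

optional = re.compile(r"Get(Value)?|Set(Value)?")

inputs = ["Get", "GetValue", "Set", "SetValue"]

# Map each input to the text the pattern actually matched.
results = {text: optional.search(text).group() for text in inputs}
print(results)
# {'Get': 'Get', 'GetValue': 'GetValue', 'Set': 'Set', 'SetValue': 'SetValue'}
```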

3. Use Word Boundaries

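To match whole words only, use word boundaries. When the engine matches "Set" in "SetValue," the closing \b fails because "V" follows, so the engine backtracks and tries the remaining alternatives until it reaches "SetValue":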

\b(Get|GetValue|Set|SetValue)\b

Alternatively, use:

\b(Get(Value)?|Set(Value)?)\b

Or even better:

\b(Get|Set)(Value)?\b

This pattern is more concise, and it leaves the engine fewer alternatives to try.
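
The three word-boundary patterns should behave the same way on whole words. The sketch below assumes Python's re module; the test strings "call SetValue now" and "Settings" are illustrative and not taken from the original text.

```python
import re

patterns = [
    r"\b(Get|GetValue|Set|SetValue)\b",
    r"\b(Get(Value)?|Set(Value)?)\b",
    r"\b(Get|Set)(Value)?\b",
]

outcomes = []
for p in patterns:
    rx = re.compile(p)
    whole = rx.search("call SetValue now")  # whole word is present
    partial = rx.search("Settings")         # "Set" is only a prefix here
    outcomes.append((whole.group(), partial))

print(outcomes)
# [('SetValue', None), ('SetValue', None), ('SetValue', None)]
```

Even the first pattern, which lists "Set" before "SetValue," returns the full word, because the closing boundary rejects the shorter alternative.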


POSIX Regex Behavior

Unlike most regex engines, POSIX-compliant regex engines always return the longest possible match, regardless of the order of alternatives. In a POSIX engine, applying Get|GetValue|Set|SetValue to "SetValue" will return "SetValue," not "Set." The POSIX standard requires this: the match must start at the leftmost possible position, and among the matches that start there, the longest one is returned.
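
Python's re module is not a POSIX engine, so the following is only an emulation of the leftmost-longest rule for a list of literal alternatives. The helper name leftmost_longest is hypothetical, and the approach does not cover alternatives that are themselves regex patterns.

```python
import re

def leftmost_longest(alternatives, text):
    """Emulate the POSIX leftmost-longest rule for literal alternatives."""
    best = None
    for alt in alternatives:
        # re.escape treats each alternative as literal text.
        m = re.search(re.escape(alt), text)
        if m is None:
            continue
        # Prefer the earliest start; break ties with the longest match.
        key = (m.start(), -len(m.group()))
        if best is None or key < best[0]:
            best = (key, m.group())
    return None if best is None else best[1]

words = ["Get", "GetValue", "Set", "SetValue"]
posix_like = leftmost_longest(words, "SetValue")
print(posix_like)  # SetValue
```

The order of the entries in words does not affect the result, which is the point of the POSIX rule.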


Summary

Alternation is a powerful feature in regex that allows you to match one of several possible patterns. However, due to the eager behavior of most regex engines, it's essential to order your alternatives carefully and use grouping to ensure accurate matches. By understanding how the engine processes alternation, you can write more effective and optimized regex patterns.
