Non-Printable Characters (Page 5)
Regular expressions can also match non-printable characters using special sequences. Here are some common examples:
- «\t»: Tab character (ASCII 0x09)
- «\r»: Carriage return (ASCII 0x0D)
- «\n»: Line feed (ASCII 0x0A)
- «\a»: Bell (ASCII 0x07)
- «\e»: Escape (ASCII 0x1B)
- «\f»: Form feed (ASCII 0x0C)
- «\v»: Vertical tab (ASCII 0x0B)
Keep in mind that Windows text files terminate lines with «\r\n», while UNIX text files use «\n» alone.
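As a minimal sketch of how these sequences behave in practice, the Python snippet below uses the standard `re` module to split text on «\r?\n», so that both line-ending styles are handled, and then splits each line on «\t». The sample strings and the `parse` helper are made up for illustration. Support for the less common tokens varies by flavor: Python's `re`, for instance, does not accept «\e», though «\x1B» can stand in for the escape character there.

```python
import re

# Hypothetical sample data: the same two lines with Windows and UNIX line endings.
windows_text = "name\tage\r\nAda\t36\r\n"
unix_text = "name\tage\nAda\t36\n"

# An optional carriage return followed by a line feed matches both conventions.
line_break = re.compile(r"\r?\n")

def parse(text):
    """Split text into lines, then split each non-empty line on tab characters."""
    return [re.split(r"\t", line) for line in line_break.split(text) if line]

windows_rows = parse(windows_text)
unix_rows = parse(unix_text)
print(windows_rows == unix_rows)
```

Making the carriage return optional keeps one pattern usable for files from either platform.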
Hexadecimal and Unicode Characters
You can include any character in your regex using its hexadecimal or Unicode code point. For example:
- «\x09»: Matches a tab character (same as «\t»).
- «\xA9»: Matches the copyright symbol (©) in the Latin-1 character set.
- «\u20AC»: Matches the euro currency sign (€) in Unicode.
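The three tokens above can be tried out with a short Python sketch. It assumes Python's `re` module with ordinary string (not bytes) patterns, where «\xHH» and «\uHHHH» are both accepted. The sample strings are invented for illustration, and other flavors may spell the same thing differently (Perl and PCRE, for example, write «\x{20AC}»).

```python
import re

# «\x09» and «\t» describe the same character.
tab_match = re.fullmatch(r"\x09", "\t")

# «\xA9» is the copyright sign; «\u20AC» is the euro sign.
copyright_match = re.search(r"\xA9", "Copyright © Example Corp")
euro_match = re.search(r"\u20AC", "Total: 20 €")

print(tab_match is not None, copyright_match.group(), euro_match.group())
```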
Additionally, many regex flavors support control characters using the syntax «\cA» through «\cZ», which correspond to Control+A (ASCII 0x01) through Control+Z (ASCII 0x1A). For example:
- «\cM»: Matches a carriage return, equivalent to «\r».
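The mapping behind «\cA» through «\cZ» is simple arithmetic: the control character's code is the letter's ASCII code minus 0x40. Not every engine accepts the «\cX» syntax (Python's `re` module, for instance, does not recognize «\cM»), so the sketch below computes the equivalent «\xHH» token instead. The `control_char` helper is hypothetical and exists only for illustration.

```python
import re

def control_char(letter):
    # Hypothetical helper: Control+A..Control+Z map to code points 1..26,
    # i.e. the letter's ASCII code minus 0x40.
    upper = letter.upper()
    if len(upper) != 1 or not "A" <= upper <= "Z":
        raise ValueError("expected a single letter A-Z")
    return chr(ord(upper) - 0x40)

ctrl_m = control_char("M")

# Build the equivalent «\xHH» token for engines without «\cX» support.
hex_token = "\\x{:02X}".format(ord(ctrl_m))
matches_cr = re.fullmatch(hex_token, "\r") is not None
print(hex_token, matches_cr)
```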
In XML Schema regex, the token «\c» is a shorthand for matching any character allowed in an XML name.
If your regex engine supports Unicode, prefer the four-digit «\uFFFF» notation over the two-digit «\xFF» notation when inserting a Unicode character, since «\xFF» can only express code points up to 0xFF.