
Non-Printable Characters (Page 5)


Regular expressions can also match non-printable characters using special escape sequences. Most flavors support the following, though «\a» and «\e» are missing from some:

  • «\t»: Tab character (ASCII 0x09)
  • «\r»: Carriage return (ASCII 0x0D)
  • «\n»: Line feed (ASCII 0x0A)
  • «\a»: Bell (ASCII 0x07)
  • «\e»: Escape (ASCII 0x1B)
  • «\f»: Form feed (ASCII 0x0C)
  • «\v»: Vertical tab (ASCII 0x0B)
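Support for these escapes differs between engines, so it is worth testing the one you use. The sketch below uses Python's standard `re` module, chosen here only for illustration. It checks that each escape Python accepts matches the expected character, and that Python rejects «\e». The helper names `escape_matches` and `supports_escape` are hypothetical.

```python
import re

# Escapes that Python's re module understands, mapped to the character each should match.
ESCAPES = {
    r"\t": "\x09",  # tab
    r"\r": "\x0d",  # carriage return
    r"\n": "\x0a",  # line feed
    r"\a": "\x07",  # bell
    r"\f": "\x0c",  # form feed
    r"\v": "\x0b",  # vertical tab
}

def escape_matches(pattern: str, char: str) -> bool:
    """Return True if the regex escape matches exactly the given single character."""
    return re.fullmatch(pattern, char) is not None

def supports_escape(pattern: str) -> bool:
    """Return False if the re module refuses to compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

if __name__ == "__main__":
    for pattern, char in ESCAPES.items():
        print(pattern, escape_matches(pattern, char))
    # \e is not a valid escape in Python's re module, so compiling it fails.
    print(r"\e", supports_escape(r"\e"))
```

Running the same kind of check against another engine quickly shows which escapes it supports.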

Keep in mind that Windows text files terminate lines with «\r\n» (a carriage return followed by a line feed), while UNIX text files use «\n» alone.
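Because of this difference, a pattern that splits text into lines can make the carriage return optional so that it handles both conventions. Here is a minimal Python sketch. The function name `split_lines` is hypothetical.

```python
import re

# \r?\n matches a Windows line break (\r\n) or a UNIX one (\n).
LINE_BREAK = re.compile(r"\r?\n")

def split_lines(text: str) -> list:
    """Split text on either Windows or UNIX line terminators."""
    return LINE_BREAK.split(text)

print(split_lines("first\r\nsecond\nthird"))  # ['first', 'second', 'third']
```

Matching only «\n» would leave a stray «\r» at the end of every line from a Windows file.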

Hexadecimal and Unicode Characters

You can include any character in your regex using its hexadecimal or Unicode code point. For example:

  • «\x09»: Matches a tab character (same as «\t»).
  • «\xA9»: Matches the copyright symbol (©) in the Latin-1 character set.
  • «\u20AC»: Matches the euro currency sign (€) in Unicode.
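The three examples above can be checked directly. This short sketch uses Python's `re` module, which accepts both «\xhh» and «\uhhhh» in string patterns. The sample strings are made up for illustration.

```python
import re

# \x takes exactly two hex digits; \u takes exactly four (str patterns only).
TAB = re.compile(r"\x09")        # same character as \t
COPYRIGHT = re.compile(r"\xA9")  # U+00A9, the copyright sign
EURO = re.compile(r"\u20AC")     # U+20AC, the euro sign

print(bool(TAB.fullmatch("\t")))                          # True
print(bool(COPYRIGHT.search("Copyright © Example Corp")))  # True
print(bool(EURO.search("Price: 5 €")))                    # True
```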

Many regex flavors, including Java, JavaScript, and PCRE, also support control characters written as «\cA» through «\cZ», which correspond to Control+A through Control+Z (code points 0x01 through 0x1A). For example:

  • «\cM»: Matches a carriage return, equivalent to «\r».
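Each «\cX» maps to the letter's code with only its lowest five bits kept. For example, M is 0x4D, and 0x4D AND 0x1F is 0x0D, the carriage return. Python's `re` module has no «\c» escape, so the following sketch uses a hypothetical helper, `expand_control_escapes`, that rewrites «\cA» through «\cZ» into the equivalent «\xHH» before compiling. It is an illustration of the mapping, not a standard API.

```python
import re

# Matches either an escaped backslash (\\) or a control escape \cA..\cZ.
# Consuming \\ first keeps a literal backslash followed by "cM" untouched.
_TOKEN = re.compile(r"\\\\|\\c([A-Z])")

def expand_control_escapes(pattern: str) -> str:
    """Replace each \\cX (X in A-Z) with \\xHH, where HH is the code of X AND 0x1F."""
    def repl(m):
        letter = m.group(1)
        if letter is None:
            return m.group(0)  # escaped backslash: leave as-is
        return "\\x%02X" % (ord(letter) & 0x1F)  # e.g. M (0x4D) -> 0x0D
    return _TOKEN.sub(repl, pattern)

print(expand_control_escapes(r"\cM\cJ"))                         # \x0D\x0A
print(bool(re.fullmatch(expand_control_escapes(r"\cM"), "\r")))  # True
```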

In XML Schema regex, the token «\c» is a shorthand for matching any character allowed in an XML name.

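If your regex engine supports Unicode, use the «\uFFFF» notation (exactly four hexadecimal digits) for characters above 0xFF. The «\xFF» notation accepts only two hexadecimal digits, so it can reach only the first 256 code points. Some flavors, such as Perl and PCRE, write the same character as «\x{FFFF}» instead.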
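Two-digit parsing is easy to trip over. In this Python sketch, «\x20AC» is not the euro sign. It is «\x20» (a space) followed by the literal letters "AC", while «\u20AC» is the euro sign. Python is used here only as an example engine.

```python
import re

# \x consumes exactly two hex digits, so r"\x20AC" means a space followed by "AC".
two_digit = re.compile(r"\x20AC")
# \u consumes four hex digits and can reach any character in the Basic Multilingual Plane.
four_digit = re.compile(r"\u20AC")

print(bool(two_digit.fullmatch(" AC")))  # True: \x20 is a space
print(bool(two_digit.fullmatch("€")))    # False
print(bool(four_digit.fullmatch("€")))   # True
```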
