-
Blog Entries
-
by: Bill Dyer
Before Reddit, before GitHub, and even before the World Wide Web went online, there was Usenet.
This decentralized network of discussion groups was a main line of communication of the early internet - ideas were exchanged, debates raged, research conducted, and friendships formed.
For those of us who experienced it, Usenet was more than just a communication tool; it was a community, a center of innovation, and a proving ground for ideas.
It is also culturally and historically significant in that it popularized concepts and terms such as "LOL" (first used in a newsgroup in 1990), "FAQ," "flame," "spam," and "sockpuppet."
Oldest network that is still in use
Usenet is one of the oldest computer network communications systems still in widespread use. It went online in 1980, at the University of North Carolina at Chapel Hill and Duke University.
1980 is significant in that it was pre-World Wide Web by over a decade. In fact, the "Internet" was basically a network of privately owned ARPANet sites. Usenet was created to be the network for the general public - before the general public even had access to the Internet. I see Usenet as the first social network.
Over time, Usenet grew to include thousands of discussion groups (called newsgroups) and millions of users. Users read and write posts, called articles, using software called a newsreader.
In the 1990s, early Web browsers and email programs often had a built-in newsreader. Topics were many; if you could imagine a topic, there was probably a group made for it and, if a group didn't exist, one could be made.
The culture of Usenet: Learning the ropes
While I say that Usenet was the first social network, it was never really organized to be one. Each group owner could - and usually did - set their own rules.
Before participating in discussions, it was common advice to “lurk” for a while - read the group without posting - to learn the rules, norms, and tone of the community. Every Usenet group had its own etiquette, usually unwritten, and failing to follow it could lead to a “flaming.” These public scoldings were often harsh, but they reinforced the importance of respecting the group’s culture.
For groups like comp.std.doc and comp.text, lurking was essential to understand the technical depth and specificity of the conversations. Jumping in without preparation wasn’t just risky - it was almost a rite of passage to survive the initial corrections from seasoned members. Yet, once you earned their respect, you became part of a tightly knit network of expertise and camaraderie - you belonged.
Usenet and the birth of Linux
One newsgroup, comp.os.minix, became legendary when Linus Torvalds posted about a new project he was working on. In August 1991, Linus announced the creation of Linux, a hobby project of a free operating system.
Usenet's structure - decentralized, threaded, and open - can be seen as the first demonstration of the values of open-source development. Anyone with a connection and a bit of technical know-how could hop on and join in a conversation. For developers, Usenet quickly became the main tool for keeping up with rapidly evolving programming languages, paradigms, and methodologies.
It's not a stretch to see how Usenet also became an essential platform for code collaboration, bug tracking, and intellectual exchange - it thrived in this ecosystem.
The discussions were sometimes messy - flame wars and off-topic posts were common - but they were also authentic. Problems were solved not in isolation but through collective effort. Without Usenet, the early growth of Linux may well have been much slower.
A personal memory: Helping across continents
My own experience with Usenet wasn’t just about reading discussions or solving technical problems. It became a bridge to collaboration and friendship. I remember one particular interaction well: a Finnish academic working on her doctoral dissertation on documentation standards posted a query to a group I frequented. By chance, I had the information she needed.
At the time, I spent a lot of my time in groups like comp.std.doc and comp.text, where discussions about documentation standards and text processing were common. She was working with SGML standards, while I was more focused on HTML. Despite our different areas of expertise, Usenet made it easy for the two of us to connect and share knowledge. She later wrote back to say my input had helped her complete her dissertation.
This took place in the mid-1990s, and that brief collaboration turned into a friendship that lasts to this day. Although we may go long periods without writing, we’ve always kept in touch. It’s a testament to how Usenet didn’t just encourage innovation but also created a lasting friendship across continents.
The decline and legacy of Usenet
As the internet evolved, Usenet's use faded. The rise of web-based forums, social media, and version-control platforms like GitHub made Usenet feel clunky and outdated, and there are concerns that it is now used largely for spam, flame wars, and binary (non-text) exchanges.
On February 22, 2024, Google Groups ended its Usenet support for these reasons. Users can no longer post, subscribe, or view new content there. Historical content posted before the cut-off date can still be viewed, however.
This doesn't mean that Usenet is dead; far from it. Providers such as Giganews and Newsdemon are still running, if you are interested in looking into this. Both require a subscription, but Eternal September provides free access.
If Usenet's use has been declining, then why look into it? Its archives. The archives hold detailed discussions, insights, questions and answers, and snippets of code - a good deal of which is still relevant to today’s software hurdles.
Conclusion
I would guess that, for those of us who were there, Usenet remains a nostalgic memory. It does for me. The quirks of its culture - from FAQs to "RTFM" responses - were part of its charm. It was chaotic, imperfect, and sometimes frustrating, but it was also a place where real work got done and real connections were made.
Looking back, my time on Usenet was one of the foundational chapters in my journey through technology. Helping a stranger across the globe complete a dissertation might seem like a small thing, but it’s emblematic of what Usenet stood for: collaboration without boundaries. It bears repeating: It was a place where knowledge was freely shared and where the seeds of ideas could grow into something great. And in this case, it helped create a friendship that continues to remind me of Usenet’s unique power to connect people.
As Linux fans, we owe a lot to Usenet. Without it, Linux might have remained a small hobby project instead of becoming the force of computing that it has become. So, the next time you’re diving into a Linux distro or collaborating on an open-source project, take a moment to appreciate the platform that helped make it all possible.
-
By Jessica Brown in Jessica Brown
Most regular expression engines discussed in this tutorial support the following four matching modes:
Modifier  Description
/i        Makes the regex case-insensitive.
/s        Enables "single-line mode," making the dot (.) match newlines.
/m        Enables "multi-line mode," allowing caret (^) and dollar ($) to match at the start and end of each line.
/x        Enables "free-spacing mode," where whitespace is ignored, and # can be used for comments.
Specifying Modes Inside The Regular Expression
You can specify these modes within a regex using mode modifiers. For example:
(?i) turns on case-insensitive matching.
(?s) enables single-line mode.
(?m) enables multi-line mode.
(?x) enables free-spacing mode.
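To make the four modes concrete, here is a minimal sketch in Python's re module, chosen only for illustration. Python spells the modes as the flags re.I, re.S, re.M, and re.X rather than /i, /s, /m, and /x, and it also accepts the inline modifiers at the start of a pattern. The sample text is invented.

```python
import re

text = "first line\nSecond LINE"  # sample input, invented for illustration

# (?i) inside the pattern and the re.I flag are two spellings of the same mode.
inline_hit = re.search(r"(?i)second", text)
flag_hit = re.search(r"second", text, re.I)

# Without single-line mode the dot stops at the newline, so this fails...
dot_default = re.search(r"first.*LINE", text)
# ...and with (?s) the dot crosses the line break.
dot_single_line = re.search(r"(?s)first.*LINE", text)

# Multi-line mode lets ^ match at the start of every line.
line_starts = re.findall(r"(?m)^\w+", text)

# Free-spacing mode ignores whitespace in the pattern and allows # comments.
date = re.search(r"""(?x)
    (\d{4})  # year
    -
    (\d{2})  # month
""", "released 2025-01")

print(bool(inline_hit), bool(flag_hit))
print(dot_default, bool(dot_single_line))
print(line_starts)
print(date.groups())
```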
Example:
(?i)hello matches "HELLO"
Turning Modes On and Off for Only Part of the Regex
Modern regex flavors allow you to apply modifiers to specific parts of the regex:
(?i-sm) turns on case-insensitive mode while turning off single-line and multi-line modes.
To apply a modifier to only a part of the regex, you can use the following syntax:
(?i)word(?-i)Word
This pattern makes "word" case-insensitive but "Word" case-sensitive.
Modifier Spans
Modifier spans apply modes to a specific section of the regex:
(?i:word) makes "word" case-insensitive.
(?i:case)(?-i:sensitive) applies mixed modes within the regex.
Example:
(?i:ignorecase)(?-i:casesensitive)
Summary
Understanding matching modes is essential for writing efficient and accurate regex patterns. By leveraging modes like case-insensitivity, single-line, multi-line, and free-spacing, you can create more flexible and maintainable regular expressions.
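The modifier-span example above can be tried in Python as a quick check. This is a sketch that assumes Python 3.6 or later, where scoped inline flags are available. As far as I know, Python's re module accepts only the scoped (?-i:...) form, not a bare (?-i) toggle in the middle of a pattern, so only the span syntax is shown. The test strings are invented.

```python
import re

# Pattern taken from the modifier-span example above.
pattern = re.compile(r"(?i:ignorecase)(?-i:casesensitive)")

# The first span ignores case; the second span must match exactly.
mixed_ok = pattern.fullmatch("IgNoReCaSecasesensitive")
mixed_bad = pattern.fullmatch("ignorecaseCASESENSITIVE")

print(bool(mixed_ok))
print(mixed_bad)
```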
-
By Jessica Brown in Jessica Brown
Unicode regular expressions are essential for working with text in multiple languages and character sets. As the world becomes more interconnected, supporting Unicode is increasingly important for ensuring that software can handle diverse text inputs.
What is Unicode?
Unicode is a standardized character set that encompasses characters and glyphs from all human languages, both living and dead. It aims to provide a consistent way to represent characters from different languages, eliminating the need for language-specific character sets.
Challenges with Unicode in Regular Expressions
Working with Unicode introduces unique challenges:
Characters, Code Points, and Graphemes:
A single character (grapheme) may be represented by multiple code points. For example, the letter "à" can be represented as:
A single code point: U+00E0
Two code points: U+0061 ("a") + U+0300 (grave accent)
Regular expressions that treat code points as characters may fail to match graphemes correctly.
Combining Marks:
Combining marks are code points that modify the preceding character. For example, U+0300 (grave accent) is a combining mark that can be applied to many base characters.
Matching Unicode Graphemes
To match a single Unicode grapheme (character), use:
Perl, RegexBuddy, PowerGREP: \X
Java, .NET: \P{M}\p{M}*
Example:
\X matches a grapheme
\P{M}\p{M}* matches a base character followed by zero or more combining marks
Matching Specific Code Points
To match a specific Unicode code point, use:
JavaScript, .NET, Java: \uFFFF (FFFF is the hexadecimal code point)
Perl, PCRE: \x{FFFF}
Unicode Character Properties
Unicode defines properties that categorize characters based on their type. You can match characters belonging to specific categories using:
Positive Match: \p{Property}
Negative Match: \P{Property}
Common Properties:
\p{L} - Letter
\p{Lu} - Uppercase Letter
\p{Ll} - Lowercase Letter
\p{N} - Number
\p{P} - Punctuation
\p{S} - Symbol
\p{Z} - Separator
\p{C} - Other (Control Characters)
Unicode Scripts and Blocks
Unicode groups characters into scripts and blocks:
Scripts: Collections of characters used by a particular language or writing system.
Blocks: Contiguous ranges of code points.
Example Scripts:
\p{Latin}
\p{Greek}
\p{Cyrillic}
Example Blocks:
\p{InBasic_Latin}
\p{InGreek_and_Coptic}
\p{InCyrillic}
Best Practices for Unicode Regex
Use \X to match graphemes when supported.
Be aware of different ways to encode characters.
Normalize input to avoid mismatches due to different encodings.
Use Unicode properties to match character categories.
Use scripts and blocks to match specific writing systems.
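The advice about normalization and combining marks can be illustrated with Python's standard library. Python's built-in re module does not implement \X or \p{...} (the third-party regex package does), so this sketch uses the unicodedata module instead. The graphemes helper is a hypothetical, simplified stand-in for \P{M}\p{M}* and not a full grapheme-cluster implementation.

```python
import unicodedata

precomposed = "\u00e0"   # "à" as a single code point
decomposed = "a\u0300"   # "a" followed by a combining grave accent

# The two spellings look identical but compare unequal as raw strings.
same_before = precomposed == decomposed

# Normalizing both sides to the same form removes the mismatch.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

# General categories: "Ll" is a lowercase letter, "Mn" a non-spacing mark.
categories = [unicodedata.category(ch) for ch in decomposed]


def graphemes(text):
    # Simplified take on \P{M}\p{M}*: a base character plus any marks after it.
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print(same_before, nfc_equal, nfd_equal)
print(categories)
print(len(graphemes(decomposed + "b")))
```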
By Jessica Brown in Jessica Brown
Named capturing groups allow you to assign names to capturing groups, making it easier to reference them in complex regular expressions. This feature is available in most modern regular expression engines.
Why Use Named Capturing Groups?
In traditional regular expressions, capturing groups are referenced by their numbers (e.g., \1, \2). As the number of groups increases, it becomes harder to manage and understand which group corresponds to which part of the match. Named capturing groups solve this problem by allowing you to reference groups by descriptive names.
Example (Traditional):
(\d{4})-(\d{2})-(\d{2})
In this pattern, you would reference the year as \1, the month as \2, and the day as \3.
Example (Named):
(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
Now, you can reference the year as year, the month as month, and the day as day, making the regex more readable and maintainable.
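As a small illustration of the difference, the named date pattern above can be used from Python roughly like this. The input sentence is invented, and the variable names are only for the example.

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = date_pattern.search("Posted on 2025-01-09.")  # sample input

# Named groups can be read by name...
year = match.group("year")
# ...and are still reachable by number.
by_number = match.group(1)
# groupdict() returns every named group at once.
parts = match.groupdict()

print(year, by_number)
print(parts)
```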
Named Capture Syntax by Flavor
Python, PCRE, and PHP
These flavors use the following syntax for named capturing groups:
(?P<name>group)
To reference the named group inside the regex, use:
(?P=name)
To reference it in replacement text, use:
\g<name>
Example:
(?P<word>\w+)\s+(?P=word)
This pattern matches doubled words like "the the".
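Here is a short Python sketch that exercises both references from this section: (?P=word) inside the pattern and \g<word> in the replacement text. The sample sentence is made up.

```python
import re

doubled = re.compile(r"(?P<word>\w+)\s+(?P=word)")

sample = "Paris in the the spring"  # invented sample text

# (?P=word) must repeat exactly what the group named "word" captured.
found = doubled.search(sample).group(0)

# \g<word> reinserts the captured text in the replacement.
fixed = doubled.sub(r"\g<word>", sample)

print(found)
print(fixed)
```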
.NET Framework
The .NET regex engine uses its own syntax for named capturing groups:
(?<name>group) or (?'name'group)
To reference the named group inside the regex, use:
\k<name> or \k'name'
In replacement text, use:
${name}
Example:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
This pattern matches a date in YYYY-MM-DD format. You can reference the named groups in replacement text like:
${year}/${month}/${day}
Multiple Groups with the Same Name
In the .NET framework, you can have multiple capturing groups with the same name. This is useful when you have different patterns that should capture the same kind of data.
Example:
a(?<digit>[0-5])|b(?<digit>[4-7])
In this pattern, both groups are named digit. The capturing group will contain the matched digit, regardless of which alternative was matched.
Note:
Python does not allow multiple groups with the same name, and PCRE rejects them by default. Attempting to do so will result in a compilation error.
Numbering of Named Groups
The way capturing groups are numbered varies between regex flavors:
Python and PCRE
Both named and unnamed capturing groups are numbered from left to right.
(a)(?P<x>b)(c)(?P<y>d)
In this pattern:
Group 1: (a)
Group 2: (?P<x>b)
Group 3: (c)
Group 4: (?P<y>d)
In replacement text, you can reference these groups as \1, \2, \3, and \4.
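The left-to-right numbering can be checked directly in Python. This sketch uses the pattern above on the string "abcd", which is the only input it can match in full.

```python
import re

pattern = re.compile(r"(a)(?P<x>b)(c)(?P<y>d)")
match = pattern.fullmatch("abcd")

# groups() lists captures in numbering order, named and unnamed together.
numbered = match.groups()

# groupindex maps each group name to its number.
index_of = dict(pattern.groupindex)

# Numbered references in replacement text follow the same order.
swapped = pattern.sub(r"\4\3\2\1", "abcd")

print(numbered)
print(index_of)
print(swapped)
```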
.NET Framework
The .NET framework handles named groups differently. Named groups are numbered after all unnamed groups.
(a)(?<x>b)(c)(?<y>d)
In this pattern:
Group 1: (a)
Group 2: (c)
Group 3: (?<x>b)
Group 4: (?<y>d)
In replacement text, you would reference the groups as:
$1 for (a)
$2 for (c)
$3 for (?<x>b)
$4 for (?<y>d)
To avoid confusion, it’s best to reference named groups by their names rather than their numbers in the .NET framework.
Best Practices
To ensure compatibility across different regex flavors and avoid confusion, follow these best practices:
Do not mix named and unnamed groups. Use either all named groups or all unnamed groups.
Use non-capturing groups for parts of the regex that don’t need to be captured: (?:group)
Use descriptive names for capturing groups to make your regex more readable.
JGsoft Engine
The JGsoft regex engine (used in tools like EditPad Pro and PowerGREP) supports both Python-style and .NET-style named capturing groups.
Python-style named groups are numbered along with unnamed groups.
.NET-style named groups are numbered after unnamed groups.
Multiple groups with the same name are allowed.
Summary
Named capturing groups make regular expressions more readable and maintainable. Different regex flavors have varying syntaxes and behaviors for named groups. To write portable and efficient regex patterns:
Use named groups to improve readability.
Avoid mixing named and unnamed groups.
Use non-capturing groups when capturing is unnecessary.
By understanding how different regex engines handle named groups, you can write more robust and compatible regex patterns across various programming languages and tools.
-
By Jessica Brown in Jessica Brown
In regular expressions, round brackets (()) are used for grouping. Grouping allows you to apply operators to multiple tokens at once. For example, you can make an entire group optional or repeat the entire group using repetition operators.
Basic Usage
For example:
Set(Value)?
This pattern matches:
"Set"
"SetValue"
The round brackets group "Value", and the question mark makes it optional.
Note:
Square brackets ([]) define character classes.
Curly braces ({}) specify repetition counts.
Only round brackets (()) are used for grouping.
Backreferences
Round brackets not only group parts of a regex but also create backreferences. A backreference stores the text matched by the group, allowing you to reuse it later in the regex or replacement text.
Example:
Set(Value)?
If "SetValue" is matched, the backreference \1 will contain "Value". If only "Set" is matched, the backreference will be empty.
To prevent creating a backreference, use non-capturing parentheses:
Set(?:Value)?
The (?: ... ) syntax disables capturing, making the regex more efficient when backreferences are not needed.
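A quick way to see the difference between the two forms is to compare them in Python. One flavor-specific detail in this sketch: Python reports a group that did not take part in the match as None rather than as an empty string.

```python
import re

capturing = re.compile(r"Set(Value)?")
non_capturing = re.compile(r"Set(?:Value)?")

# The capturing form stores "Value" when the optional part is present.
with_value = capturing.fullmatch("SetValue").group(1)
# When only "Set" matches, Python reports the unused group as None.
without_value = capturing.fullmatch("Set").group(1)

# The .groups attribute counts capturing groups; (?: ... ) adds none.
group_counts = (capturing.groups, non_capturing.groups)

print(with_value, without_value)
print(group_counts)
```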
Using Backreferences in Replacement Text
Backreferences are often used in search-and-replace operations. The exact syntax for using backreferences in replacement text varies between tools and programming languages.
For example, in many tools:
\1 refers to the first capturing group.
\2 refers to the second capturing group, and so on.
In replacement text, you can use these backreferences to reinsert matched text:
Find: (\w+)\s+\1
Replace: \1
This pattern finds doubled words like "the the" and replaces them with a single instance.
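The find-and-replace pair above translates to Python's re.sub as sketched below. The second pattern adds \b word boundaries and is included to show a pitfall: without boundaries, the pattern can match inside words. The sample sentences are invented.

```python
import re

loose = re.compile(r"(\w+)\s+\1")       # the Find pattern from above
strict = re.compile(r"\b(\w+)\s+\1\b")  # same idea with word boundaries

# Both remove a genuine doubled word.
loose_fix = loose.sub(r"\1", "the the cat")
strict_fix = strict.sub(r"\1", "the the cat")

# Without \b, "is is" is found inside "this is" and wrongly collapsed.
loose_damage = loose.sub(r"\1", "this is fine")
strict_safe = strict.sub(r"\1", "this is fine")

print(loose_fix, "|", strict_fix)
print(loose_damage, "|", strict_safe)
```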
Using Backreferences in the Regex
Backreferences can also be used within the regex itself to match the same text again.
Example:
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
This pattern matches an HTML tag and its corresponding closing tag. The opening tag name is captured in the first backreference, and \1 is used to ensure the closing tag matches the same name.
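To see the backreference at work, here is a minimal Python sketch of the tag pattern. The first input is a short HTML fragment with nested tags; the second, with a mismatched closing tag, is invented to show the failure case.

```python
import re

tag_pair = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>")

text = "Testing <B><I>bold italic</I></B> text"
match = tag_pair.search(text)

# group(1) holds the tag name; group(0) is the whole matched element.
tag_name = match.group(1)
element = match.group(0)

# A closing tag with a different name cannot satisfy \1.
mismatched = tag_pair.search("<B>bold</I>")

print(tag_name)
print(element)
print(mismatched)
```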
Numbering Backreferences
Backreferences are numbered based on the order of opening brackets in the regex:
The first opening bracket creates backreference \1.
The second opening bracket creates backreference \2.
Non-capturing groups do not count toward the numbering.
Example:
([a-c])x\1x\1
This pattern matches:
"axaxa"
"bxbxb"
"cxcxc"
If a group is optional and not matched, the backreference will be empty, but the regex will still work.
Looking Inside the Regex Engine
Let’s see how the regex engine processes the following pattern:
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
when applied to the string:
Testing <B><I>bold italic</I></B> text
The engine matches <B> and stores "B" in the first backreference.
It skips over the text until it finds the closing </B>.
The backreference \1 ensures the closing tag matches the same name as the opening tag.
The entire match is <B><I>bold italic</I></B>.
Backreferences to Failed Groups
There’s a difference between a backreference to a group that matched nothing and one to a group that did not participate at all:
Example:
(q?)b\1
This pattern matches "b" because the optional q? matched nothing.
In contrast:
(q)?b\1
This pattern fails to match "b" because the group (q) did not participate in the match at all.
In most regex flavors, a backreference to a non-participating group causes the match to fail. However, in JavaScript, backreferences to non-participating groups match an empty string.
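Python behaves like the majority of flavors here, so the two patterns above make a convenient test. This is a sketch of the distinction in Python only; other engines may differ.

```python
import re

# The group takes part in the match but captures an empty string.
empty_group = re.search(r"(q?)b\1", "b")

# The group is skipped entirely, so \1 has nothing to refer to.
skipped_group = re.search(r"(q)?b\1", "b")

print(repr(empty_group.group(1)))
print(skipped_group)
```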
Forward References and Invalid References
Some modern regex flavors, like .NET, Java, and Perl, allow forward references. A forward reference is a backreference to a group that appears later in the regex.
Example:
(\2two|(one))+
This pattern matches "oneonetwo". The forward reference \2 fails at first but succeeds when the group is matched during repetition.
In most flavors, referencing a group that doesn’t exist results in an error. In JavaScript and Ruby, such references result in a zero-width match.
Repetition and Backreferences
The regex engine doesn’t permanently substitute backreferences in the regex. Instead, it uses the most recent value captured by the group.
Example:
([abc]+)=\1
This pattern matches "cab=cab".
In contrast:
([abc])+\1
This pattern does not match "cab" because the backreference holds only the last value captured by the group (in this case, "b").
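The "most recent value" rule is easy to observe in Python. In this sketch the extra input "cabb" is an invented addition, included to show a string the second pattern does accept.

```python
import re

# The quantifier is inside the group, so the group captures all of "cab".
whole = re.fullmatch(r"([abc]+)=\1", "cab=cab").group(1)

# The quantifier is outside the group, so only the last iteration is kept.
last_only = re.fullmatch(r"([abc])+", "cab").group(1)

# With the backreference added, "cab" can no longer match...
no_match = re.search(r"([abc])+\1", "cab")
# ...but a string whose final letter is doubled can.
doubled_tail = re.search(r"([abc])+\1", "cabb").group(0)

print(whole, last_only)
print(no_match, doubled_tail)
```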
Useful Example: Checking for Doubled Words
You can use the following regex to find doubled words in a text:
\b(\w+)\s+\1\b
In your text editor, replace the doubled word with \1 to remove the duplicate.
Example:
Input: "the the cat"
Output: "the cat"
Limitations
Round brackets have no special meaning inside character classes. For example:
[(a)b]
This pattern matches the literal characters "a", "b", "(", and ")".
Backreferences also cannot be used inside character classes. In most flavors, \1 inside a character class is treated as an octal escape sequence. Example:
(a)[\1b]
This pattern matches "a" followed by either \x01 (an octal escape) or "b".
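Both limitations can be confirmed with a few lines of Python, which is one of the flavors that reads \1 inside a character class as an octal escape. The test strings are invented.

```python
import re

# Inside a character class, parentheses are ordinary literal characters.
literal_parens = re.findall(r"[(a)b]", "(ab)c")

# Inside a character class, \1 means the character with code 1.
octal_class = re.compile(r"(a)[\1b]")

matches_control_char = octal_class.fullmatch("a\x01") is not None
matches_b = octal_class.fullmatch("ab") is not None
# If \1 were a backreference to "a", the string "aa" would match; it does not.
matches_aa = octal_class.fullmatch("aa") is not None

print(literal_parens)
print(matches_control_char, matches_b, matches_aa)
```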
Summary
Grouping with round brackets allows you to:
Apply operators to entire groups of tokens.
Create backreferences for reuse in the regex or replacement text.
Use non-capturing groups (?: ... ) to avoid creating unnecessary backreferences and improve performance.
Be mindful of the limitations and differences in behavior across various regex flavors.