Regular Expressions Tutorial Table of Contents
The internet is deeply embedded in modern life, serving as a platform for communication, commerce, education, and entertainment. However, the Dead Internet Theory questions the authenticity of this digital ecosystem. Proponents suggest that much of the internet is no longer powered by genuine human activity but by bots, AI-generated content, and automated systems. This article delves into the theory, its claims, evidence, counterarguments, and broader implications.
The Dead Internet Theory posits that a substantial portion of online activity is generated not by humans but by automated scripts and artificial intelligence. This transformation, theorists argue, has turned the internet into an artificial space designed to simulate engagement, drive corporate profits, and influence public opinion.
Bots Dominate the Internet: Automated accounts are said to generate a large share of web traffic, posts, and engagement metrics.
AI-Generated Content: Articles, images, and videos are increasingly produced by machines rather than people.
Decline in Human Interaction: Genuine person-to-person conversation is allegedly being crowded out by automated noise.
Corporate and Government Manipulation: Bots and algorithms are deployed to steer opinion and manufacture consensus.
The Internet "Died" in the Mid-2010s: Proponents date the tipping point, when automated activity allegedly overtook human activity, to around the mid-2010s.
While intriguing, the Dead Internet Theory has several weaknesses that critics are quick to point out:
Bots Are Present but Contained: Bot traffic is real, but platforms actively detect and remove much of it.
Human Behavior Drives Patterns: Repetitive, low-effort content often reflects human habits and incentives, not automation.
AI Content Is Transparent: Much machine-generated content is labeled or recognizable as such.
The Internet’s Complexity: A network this vast and decentralized would be nearly impossible to simulate wholesale.
Algorithms, Not Deception, Shape Content: Recommendation systems explain the sameness of feeds without requiring a conspiracy.
Cognitive Biases Shape Perceptions: People overestimate bot prevalence because spam and repetition are more memorable than ordinary interaction.
The Human or Not website offers a practical way to explore the boundary between human and artificial interactions. Users engage in chats and guess whether their conversational partner is a human or an AI bot. For example, a bot might respond to a question about hobbies with, "I enjoy painting because it’s calming." While this seems plausible, deeper engagement often reveals limitations in nuance or context, exposing the bot.
In another instance, a human participant might share personal anecdotes, such as a memory of painting outdoors during a childhood trip, which adds emotional depth and a specific context that most bots currently struggle to replicate. Similarly, a bot might fail to provide meaningful responses when asked about abstract topics like "What does art mean to you?" or "How do you interpret the role of creativity in society?"
This platform highlights how advanced AI systems have become and underscores the challenge of distinguishing between genuine and artificial behavior—a core concern of the Dead Internet Theory.
The Dead Internet Theory inevitably invokes the legacy of Alan Turing, a pioneer in computing and artificial intelligence. Turing’s contributions extended far beyond theoretical ideas; he laid the groundwork for modern computing with the invention of the Turing Machine, a conceptual framework for algorithmic processes that remains a foundation of computer science.
One of Turing’s most enduring legacies is the Turing Test, a method designed to evaluate a machine’s ability to exhibit behavior indistinguishable from a human. In this test, a human evaluator interacts with both a machine and a human through a text-based interface. If the evaluator cannot reliably differentiate between the two, the machine is said to have "passed" the test. While the Turing Test is not a perfect measure of artificial intelligence, it set the stage for the development of conversational agents and the broader study of machine learning.
Turing’s work was instrumental in breaking the German Enigma code during World War II, an achievement that significantly influenced the outcome of the war. His efforts at Bletchley Park showcased the practical applications of computational thinking, blending theoretical insights with real-world problem-solving.
Beyond his technical achievements, Turing’s life story has inspired countless discussions about the ethics of AI and human rights. Despite his groundbreaking contributions, Turing faced persecution due to his sexuality, a tragic chapter that underscores the importance of inclusion and diversity in the scientific community.
Turing’s vision continues to inspire advancements in AI, sparking philosophical debates about intelligence, consciousness, and the ethical implications of creating machines that mimic human behavior. His legacy reminds us that the questions surrounding AI—both its possibilities and its risks—are as relevant today as they were in his time.
The Dead Internet Theory reflects growing concerns about authenticity and manipulation in digital spaces. As AI technologies become more sophisticated, fears about artificial content displacing genuine human voices intensify. The theory also taps into frustrations with the commercialization of the internet, where algorithms prioritize profit over meaningful interactions.
For many, the theory is a metaphor for their disillusionment. The internet, once a space for creativity and exploration, now feels dominated by ads, data harvesting, and shallow content.
The Dead Internet Theory raises valid questions about the role of automation and AI in shaping online experiences. However, the internet remains a space where human creativity, community, and interaction persist. The challenges posed by bots and AI are real, but they are counterbalanced by ongoing efforts to ensure authenticity and transparency.
Whether the theory holds merit or simply reflects anxieties about the digital age, it underscores the need for critical engagement with the technologies that increasingly mediate our lives online. The future of the internet depends on our ability to navigate these complexities and preserve the human element in digital spaces.
As someone who has worked with numerous hosting providers over the years, I can confidently say that IONOS stands out as a superior choice for web hosting. Their servers are not only robust but also incredibly cost-effective, offering features and performance that rival much pricier competitors. Let me share why I’ve been so impressed with their services and why you might want to consider them for your own projects.
IONOS provides a wide range of hosting solutions tailored to meet various needs, from small personal blogs to large e-commerce platforms. Their offerings include:
IONOS offers a referral program where both you and your friends can benefit. By signing up through my referral links, you can earn rewards like cash bonuses and free services, all while supporting sustainability efforts with tree planting.
Here are some of the popular IONOS services you can explore:
From the moment I signed up, I’ve experienced nothing but excellent support and performance. Setting up my website was a breeze thanks to their user-friendly interface. Their customer service team has been quick and knowledgeable whenever I’ve had questions.
If you’re searching for reliable and affordable web hosting, look no further than IONOS. With incredible performance, eco-friendly initiatives, and lucrative referral rewards, it’s an easy choice for businesses and individuals alike.
Use my referral links to start your journey with IONOS and enjoy top-tier hosting with amazing benefits:
Make the switch to IONOS today—you won’t regret it!
The Linux operating system has continually evolved from a niche platform for tech enthusiasts into a critical pillar of modern technology. As the backbone of everything from servers and supercomputers to mobile devices and embedded systems, Linux drives innovation across industries. Looking ahead to 2025, several key developments and trends are set to shape its future.
As the foundation of cloud infrastructure, Linux distributions such as Ubuntu Server, CentOS Stream, and Debian are integral to cloud-native environments. In 2025, advancements in container orchestration and microservices will further optimize Linux for the cloud. Additionally, edge computing, spurred by IoT and 5G, will rely heavily on lightweight Linux distributions tailored for constrained hardware. These distributions are designed to provide efficient operation in environments with limited resources, ensuring smooth integration of devices and systems at the network's edge.
With cyber threats growing in complexity, Linux distributions will focus on enhancing security. Tools like SELinux, AppArmor, and eBPF will see tighter integration. SELinux and AppArmor provide mandatory access control, significantly reducing the risk of unauthorized system access. Meanwhile, eBPF, a technology for running sandboxed programs in the kernel, will enable advanced monitoring and performance optimization. Automated vulnerability detection, rapid patching, and robust supply chain security mechanisms will also become key priorities, ensuring Linux's resilience against evolving attacks.
Linux's role in AI development will expand as industries increasingly adopt machine learning technologies. Distributions optimized for AI workloads, such as Ubuntu with GPU acceleration, will lead the charge. Kernel-level optimizations ensure better performance for data processing tasks, while tools like TensorFlow and PyTorch will be enhanced with more seamless integration into Linux environments. These improvements will make AI and ML deployments faster and more efficient, whether on-premises or in the cloud.
Wayland continues to gain traction as the default display protocol, promising smoother transitions from X11. This shift reduces latency and improves rendering, offering a better user experience for developers and gamers alike. Improvements in gaming and professional application support, coupled with enhancements to desktop environments like GNOME, KDE Plasma, and XFCE, will deliver a refined and user-friendly interface. These developments aim to make Linux an even more viable choice for everyday users.
Immutable Linux distributions such as Fedora Silverblue and openSUSE MicroOS are rising in popularity. By employing read-only root filesystems, these distributions enhance stability and simplify rollback processes. This approach aligns with trends in containerization and declarative system management, enabling users to maintain consistent system states. Immutable systems are particularly beneficial for developers and administrators who prioritize security and system integrity.
With initiatives like Valve's Proton and increasing native Linux game development, gaming on Linux is set to grow. Compatibility improvements in Proton allow users to play Windows games seamlessly on Linux. Additionally, hardware manufacturers are offering better driver support, making gaming on Linux an increasingly appealing choice for enthusiasts. The Steam Deck's success underscores the potential of Linux in the gaming market, encouraging more developers to consider Linux as a primary platform.
Long favored by developers, Linux will see continued enhancements in tools, containerization, and virtualization. For instance, Docker and Podman will likely introduce more features tailored to developer needs. CI/CD pipelines will integrate more seamlessly with Linux-based workflows, streamlining software development and deployment. Enhanced support for programming languages and frameworks ensures that developers can work efficiently across diverse projects.
As environmental concerns drive the tech industry, Linux will lead efforts in green computing. Power-saving optimizations, such as improved CPU scaling and kernel-level energy management, will reduce energy consumption without compromising performance. Community-driven solutions, supported by the open-source nature of Linux, will focus on creating systems that are both powerful and environmentally friendly.
The Linux community is set to make the operating system more accessible to a broader audience. Improvements in assistive technologies, such as screen readers and voice navigation tools, will empower users with disabilities. Simplified interfaces, better multi-language support, and comprehensive documentation will make Linux easier to use for newcomers and non-technical users.
Debian
Debian's regular two-year release cycle ensures a steady stream of updates, with version 13 (“Trixie”) expected in 2025, following the 2023 release of “Bookworm.” Debian 13 will retain support for 32-bit processors but drop very old i386 CPUs in favor of i686 or newer. This shift reflects the aging of these processors, which date back over 25 years. Supporting modern hardware allows Debian to maintain its reputation for stability and reliability. As a foundational distribution, Debian's updates ripple across numerous derivatives, including antiX, MX Linux, and Tails, ensuring widespread impact in the Linux ecosystem.
Ubuntu
Support for Ubuntu 20.04 ends in April 2025 unless users opt for Extended Security Maintenance (ESM) via Ubuntu Pro. Systems still running this version will no longer receive security updates, potentially leaving them vulnerable to threats. Upgrading to Ubuntu 24.04 LTS is recommended for server systems to ensure continued support and improved features, such as better hardware compatibility and performance optimizations.
openSUSE
openSUSE Leap 16 will adopt an “immutable” Linux architecture, focusing on a write-protected base system for enhanced security and stability. Software delivery via isolated containers, such as Flatpaks, will align the distribution with cloud and automated management trends. While this model enhances security, it may limit flexibility for desktop users who prefer customizable systems. Nevertheless, openSUSE's focus on enterprise and cloud environments ensures it remains a leader in innovation for automated and secure Linux systems.
NixOS
NixOS introduces a unique concept of declarative configuration, enabling precise system reproduction and rollback capabilities. By isolating dependencies akin to container formats, NixOS minimizes conflicts and ensures consistent system behavior. This approach is invaluable for cloud providers and desktop users alike. The ability to roll back to previous states effortlessly provides added security and convenience, especially for administrators managing complex environments.
In 2025, Linux will continue to grow, adapt, and innovate. From powering cloud infrastructure and advancing AI to providing secure and stable desktop experiences, Linux remains an indispensable part of the tech ecosystem. The year ahead promises exciting developments that will reinforce its position as a leader in the operating system landscape. With a vibrant community and industry backing, Linux will continue shaping the future of technology for years to come.
Uploading large files to a website can fail due to server-side limitations on file size. This issue is typically caused by default configurations of web servers like Nginx or Apache, or by PHP settings for sites using PHP.
This guide explains how to adjust these settings and provides detailed examples for common scenarios.
Nginx limits the size of client requests using the client_max_body_size directive. If a request exceeds this value, Nginx returns a 413 Request Entity Too Large error.
Locate the Nginx Configuration File
The main configuration file is typically /etc/nginx/nginx.conf, with site-specific configuration in /etc/nginx/sites-available/ or /etc/nginx/conf.d/.
Adjust the client_max_body_size Directive
Add or modify the directive in the appropriate http, server, or location block. Examples:
Increase upload size globally:
http {
client_max_body_size 100M; # Set to 100 MB
}
Increase upload size for a specific site:
server {
server_name example.com;
client_max_body_size 100M;
}
Increase upload size for a specific directory:
location /uploads/ {
client_max_body_size 100M;
}
Restart Nginx to apply the changes:
sudo systemctl restart nginx
Verify Changes
Test a large upload and check /var/log/nginx/error.log if problems persist.
Apache restricts file uploads using the LimitRequestBody directive. If PHP is in use, uploads may also be restricted by post_max_size and upload_max_filesize.
Locate the Apache Configuration File
The main configuration file is /etc/httpd/conf/httpd.conf (CentOS/Red Hat) or /etc/apache2/apache2.conf (Ubuntu/Debian). Virtual host configurations live in /etc/httpd/sites-available/ or /etc/apache2/sites-available/.
Adjust LimitRequestBody
Modify or add the directive in the <Directory> or <VirtualHost> block.
Increase upload size globally:
<Directory "/var/www/html">
LimitRequestBody 104857600 # 100 MB
</Directory>
Increase upload size for a specific virtual host:
<VirtualHost *:80>
ServerName example.com
DocumentRoot /var/www/example.com
<Directory "/var/www/example.com">
LimitRequestBody 104857600 # 100 MB
</Directory>
</VirtualHost>
Update PHP Settings (if applicable)
Edit the php.ini file (often in /etc/php.ini or /etc/php/7.x/apache2/php.ini).
Modify these values:
upload_max_filesize = 100M
post_max_size = 100M
Restart Apache to apply changes:
sudo systemctl restart apache2 # For Ubuntu/Debian
sudo systemctl restart httpd # For CentOS/Red Hat
Verify Changes
Test an upload and check /var/log/apache2/error.log if issues remain.
Allow Large File Uploads to a Specific Directory (Nginx): To allow uploads up to 200 MB in the directory /var/www/uploads/:
location /uploads/ {
client_max_body_size 200M;
}
Allow Large File Uploads for a Subdomain (Apache): For a subdomain uploads.example.com:
<VirtualHost *:80>
ServerName uploads.example.com
DocumentRoot /var/www/uploads.example.com
<Directory "/var/www/uploads.example.com">
LimitRequestBody 209715200 # 200 MB
</Directory>
</VirtualHost>
Allow Large POST Requests (PHP Sites): Ensure PHP settings align with web server limits; post_max_size should be at least as large as upload_max_filesize. For example, to allow 150 MB uploads:
upload_max_filesize = 150M
post_max_size = 150M
max_execution_time = 300 # Allow enough time for the upload
max_input_time = 300
Handling Large API Payloads (Nginx): If your API endpoint needs to handle JSON payloads up to 50 MB:
location /api/ {
client_max_body_size 50M;
}
Consider enabling gzip or other compression techniques for file transfers.
Vue.js is a versatile and progressive JavaScript framework for building user interfaces. Its simplicity and powerful features make it an excellent choice for modern web applications. In this article, we will walk through creating a VueJS application from scratch on both Windows and Linux.
Before starting, ensure you have the following tools installed on your system:
Verify Node.js and npm, then install and verify the Vue CLI:
node -v
npm -v
npm install -g @vue/cli
vue --version
Node.js and npm
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
Replace 18.x with the desired Node.js version.
node -v
npm -v
Terminal
Vue CLI
npm install -g @vue/cli
vue --version
Curl
Code Editor (Optional)
Install an extension such as Vetur or Vue Language Features for enhanced development.
npm is installed alongside Node.js.
node -v
npm -v
npm install -g @vue/cli
vue --version
cd path\to\your\project
vue create my-vue-app
cd my-vue-app
npm run serve
Open http://localhost:8080 in your browser to view your app.
sudo apt update
sudo apt upgrade
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
Replace 18.x with the desired Node.js version.
node -v
npm -v
npm install -g @vue/cli
vue --version
cd ~/projects
vue create my-vue-app
cd my-vue-app
npm run serve
Open http://localhost:8080 in your browser to view your app.
Create a new component, HelloWorld.vue, in the src/components directory:
<template>
  <div>
    <h1>Hello, VueJS!</h1>
  </div>
</template>

<script>
export default {
  name: "HelloWorld",
};
</script>

<style scoped>
h1 {
  color: #42b983;
}
</style>
Import and use the component in src/App.vue:
<template>
  <div id="app">
    <HelloWorld />
  </div>
</template>

<script>
import HelloWorld from "./components/HelloWorld.vue";

export default {
  name: "App",
  components: {
    HelloWorld,
  },
};
</script>
The Model-View-ViewModel (MVVM) architecture separates the graphical user interface from the business logic and data. Here's an example:
Define a data structure in the Vue component:
export default {
  data() {
    return {
      message: "Welcome to MVVM with VueJS!",
      counter: 0,
    };
  },
  methods: {
    incrementCounter() {
      this.counter++;
    },
  },
};
Bind the data to the template:
<template>
  <div>
    <h1>{{ message }}</h1>
    <p>Counter: {{ counter }}</p>
    <button @click="incrementCounter">Increment</button>
  </div>
</template>
The data and methods act as the ViewModel, connecting the template (View) with the business logic (Model).
The Model-View-ViewModel (MVVM) architectural pattern is widely used in modern software development for creating applications with a clean separation between user interface (UI) and business logic. Originating from Microsoft's WPF (Windows Presentation Foundation) framework, MVVM has found applications in various programming environments, including web development frameworks like Vue.js, Angular, and React (when combined with state management libraries).
The MVVM pattern organizes code into three distinct layers:
The Model is responsible for managing the application's data and business logic. It represents real-world entities and operations without any concern for the UI.
The View is the visual representation of the data presented to the user. It is responsible for displaying information and capturing user interactions.
The ViewModel acts as a mediator between the Model and the View. It binds the data from the Model to the UI and translates user actions into commands that the Model can understand.
Adopting the MVVM pattern offers several benefits:
Separation of Concerns: UI, presentation logic, and data remain independent, so changes in one layer rarely ripple into the others.
Reusability: Models and ViewModels can be reused across different Views or even different applications.
Testability: Business logic can be unit-tested without instantiating any UI components.
Scalability: Clear layer boundaries make it easier to grow both the codebase and the team working on it.
A simple counter application where users can increment a number by clicking a button.
Defines the data and business logic:
export default {
  data() {
    return {
      counter: 0,
    };
  },
  methods: {
    incrementCounter() {
      this.counter++;
    },
  },
};
The template displays the UI:
<template>
  <div>
    <h1>Counter: {{ counter }}</h1>
    <button @click="incrementCounter">Increment</button>
  </div>
</template>
Binds the Model to the View:
export default {
  name: "CounterApp",
  data() {
    return {
      counter: 0,
    };
  },
  methods: {
    incrementCounter() {
      this.counter++;
    },
  },
};
Keep Layers Independent: The Model should know nothing about the View, and the View should interact only with the ViewModel.
Leverage Data Binding: Let the framework synchronize the View and ViewModel instead of writing manual DOM updates.
Minimize ViewModel Complexity: Keep ViewModels thin; push reusable business rules down into the Model.
Test Each Layer Separately: Unit-test Models and ViewModels in isolation, and reserve UI tests for the View.
MVVM is ideal for:
Applications with rich, data-driven user interfaces.
Projects where designers and developers work in parallel.
Codebases that require thorough unit testing of presentation logic.
The MVVM pattern is a robust architectural solution for creating scalable, maintainable, and testable applications. By clearly separating responsibilities into Model, View, and ViewModel layers, developers can build applications that are easier to develop, debug, and extend. Whether you're working on a desktop application or a modern web application, understanding and implementing MVVM can significantly enhance the quality of your codebase.
Start applying MVVM in your projects today and experience the difference it can make in your development workflow!
List By: Miko Pawlikowski
Descriptions By: Jessica Brown
Published: December 29, 2024
Software engineering is a discipline that balances technical precision, creativity, and collaboration. These 17 subtle rules provide insights to improve the quality of code, foster teamwork, and guide sustainable practices.
0. Stop Falling in Love with Your Own Code
When you become too attached to your code, you may resist valuable feedback or overlook its flaws. Always prioritize the quality of the solution over personal pride. It's common for engineers to feel a sense of ownership over their code. While this passion is commendable, it can lead to bias, making it hard to see where improvements or simplifications are needed. Detach emotionally and view feedback as an opportunity to improve, not a critique of your skills.
1. You Will Regret Complexity When On-Call
Overly complex systems are hard to debug, especially during emergencies. Strive for simplicity, making it easier for others (and your future self) to understand and maintain. Complexity often creeps in unnoticed, through clever solutions or layers of abstraction. However, when systems fail, it's the simpler designs that are easier to troubleshoot. Use complexity judiciously and only when it's absolutely necessary to meet requirements.
2. Everything is a Trade-Off. There's No "Best"
Every design decision involves compromises. The "best" solution depends on the context, constraints, and goals of the project. Choosing a database, framework, or algorithm involves balancing speed, scalability, maintainability, and cost. Recognize that no solution excels in every category. Acknowledge the trade-offs and ensure your choices align with the project's priorities.
3. Every Line of Code You Write is a Liability
Code requires maintenance, testing, and updates. Write only what is necessary and consider the long-term implications of every addition. Each line of code introduces potential bugs, security vulnerabilities, or technical debt. Minimize code by reusing existing libraries, automating where possible, and ensuring that each addition has a clear purpose.
4. Document Your Decisions and Designs
Good documentation saves time and prevents confusion. Capture the reasoning behind decisions, architectural diagrams, and usage guidelines. Documentation acts as a map for future developers. Without it, even straightforward systems can become inscrutable. Write with clarity and ensure that your documentation evolves alongside the code.
5. Everyone Hates Code They Didn't Write
Familiarity breeds fondness. Review others' code with empathy, recognizing the constraints they faced and the decisions they made. It's easy to criticize unfamiliar code. Instead, approach it with curiosity: Why were certain decisions made? What challenges were faced? Collaborative and constructive feedback fosters a more supportive team environment.
6. Don't Use Unnecessary Dependencies
Dependencies add risk and complexity. Evaluate whether you truly need an external library or if a simpler, in-house solution will suffice. While dependencies can save development time, they may introduce vulnerabilities, licensing concerns, or compatibility issues. Regularly audit your dependencies and remove any that are redundant or outdated.
7. Coding Standards Prevent Arguments
Adhering to established coding standards reduces debates over style, allowing teams to focus on substance. Standards provide consistency, making code easier to read and maintain. Enforce them with tools like linters and code formatters, ensuring that discussions focus on logic and architecture rather than aesthetics.
8. Write Meaningful Commit Messages
Clear commit messages make it easier to understand changes and the rationale behind them. They are essential for effective collaboration and debugging. A commit message should explain the "why" behind a change, not just the "what." This helps future developers understand the context and reduces time spent deciphering history during troubleshooting.
9. Don't Ever Stop Learning New Things
Technology evolves rapidly. Stay curious and keep up with new tools, frameworks, and best practices to remain effective. The software industry is dynamic, with innovations appearing regularly. Make continuous learning a habit, through courses, conferences, or simply experimenting with new technologies.
10. Code Reviews Spread Knowledge
Code reviews are opportunities to share knowledge, identify improvements, and maintain consistency across the codebase. Reviews aren't just for catching bugs; they're a chance to mentor junior developers, share context about the codebase, and learn from peers. Encourage a culture where reviews are collaborative, not adversarial.
11. Always Build for Maintainability
Prioritize readability and modularity. Write code as if the next person maintaining it is a less experienced version of yourself. Maintainable code is self-explanatory, well-documented, and structured in a way that modifications don't introduce unintended side effects. Avoid shortcuts that save time now but create headaches later.
12. Ask for Help When You're Stuck
Stubbornness wastes time and energy. Leverage your team's knowledge to overcome challenges more efficiently. No one has all the answers, and seeking help is a sign of strength, not weakness. Asking for assistance early can prevent wasted effort and lead to better solutions.
13. Fix Root Causes, Not Symptoms
Patchwork fixes lead to recurring problems. Invest the time to identify and resolve the underlying issues. Quick fixes may address immediate symptoms but often exacerbate underlying problems. Use tools like root cause analysis to ensure long-term stability.
14. Software is Never Completed
Software evolves with changing requirements and environments. Embrace updates and refactorings as a natural part of the lifecycle. Even after release, software requires bug fixes, feature enhancements, and adjustments to new technologies. Treat software as a living entity that needs regular care.
15. Estimates Are Not Promises
Treat estimates as informed guesses, not guarantees. Communicate uncertainties and assumptions clearly. Overpromising can erode trust. Instead, explain what factors might affect timelines and provide regular updates as the project progresses.
16. Ship Early, Iterate Often
Releasing early and frequently allows you to gather feedback, address issues, and refine your product based on real-world usage. Getting a minimal viable product (MVP) into users' hands quickly provides valuable insights. Iterative development helps align the product more closely with user needs and reduces the risk of large-scale failures.
These rules aren't hard-and-fast laws but guiding principles to help software engineers navigate the complexities of their craft. Adopting them can lead to better code, smoother collaborations, and more resilient systems.
In today’s digital landscape, the role of a System Administrator (SysAdmin) extends far beyond server uptime and software updates. With cyber threats evolving daily, understanding key information security standards like ISO/IEC 27001:2022 is no longer optional; it’s essential. This international standard provides a robust framework for establishing, implementing, maintaining, and continuously improving an Information Security Management System (ISMS). For SysAdmins, mastering ISO/IEC 27001 isn’t just about compliance; it’s about safeguarding critical infrastructure, protecting sensitive data, and enhancing organizational resilience.
ISO/IEC 27001:2022 is the latest revision of the globally recognized standard for information security management systems. It outlines best practices for managing information security risks, ensuring the confidentiality, integrity, and availability of data. This version revises:
ISO/IEC 27001:2013
ISO/IEC 27001:2013/Cor1:2014
ISO/IEC 27001:2013/Cor2:2015
While the core principles remain, the 2022 update refines requirements to address the evolving cybersecurity landscape, making it even more relevant for today’s IT environments.
Proactive Risk Management
ISO/IEC 27001 equips SysAdmins with a structured approach to identifying, assessing, and mitigating risks. Instead of reacting to security incidents, you’ll have a proactive framework to prevent them.
Enhanced Security Posture
Implementing ISO/IEC 27001 controls helps strengthen the organization’s overall security, from server configurations to user access management.
Compliance and Legal Requirements
Many industries, especially those handling sensitive data (e.g., healthcare, finance), require compliance with ISO/IEC 27001. Understanding the standard ensures your systems meet these legal and regulatory demands.
Career Advancement
Knowledge of ISO/IEC 27001 is highly valued in the IT industry. It demonstrates a commitment to best practices and can open doors to higher-level roles in security and compliance.
ISO/IEC 27001 isn’t a standalone standard. It’s part of a broader ecosystem of ISO standards that address various aspects of information security, risk management, and quality control. Here are some key packages where ISO/IEC 27001 is bundled with other complementary standards:
Information Technology - Security Techniques Package
ISO 27799 / ISO/IEC 27001 / ISO/IEC 27002 - Protected Health Information Security Management Package
ISO 31000 / ISO/IEC 27001 / ISO/IEC 27002 - Information Technology Risk Management Package
ISO 9001 / ISO 14001 / ISO/IEC 27001 / ISO 31000 / ISO 55001 / ISO 22301 - ISO Requirements Collection
ISO/IEC 20000-1 / ISO/IEC 27001 / ISO 9001 - Information Technology Quality Management Package
ISO/IEC 27000 Information Technology Security Techniques Collection
ISO/IEC 27001 / 27002 / 27005 / 27006 - IT Security Techniques Package
ISO/IEC 27001 / ISO 9001 - Information Technology Quality Management Set
ISO/IEC 27001 / ISO/IEC 27002 / ISO/IEC 27005 - Information and Cybersecurity Package
ISO/IEC 27001 / ISO/IEC 27002 / ISO/IEC 27017 - IT Security Control Code of Practice Package
ISO/IEC 27001 / ISO/IEC 27005 - Information Security Management and Risk Set
ISO/IEC 27001 / ISO/IEC 27018 / BS 10012 - General Data Protection Regulation Package
ISO/IEC 27001 and 27002 IT Security Techniques Package
ISO/IEC 27007 / ISO/IEC 27009 / ISO/IEC 27014 / ISO/IEC 27001 - Cybersecurity And Privacy Protection Package
ISO/IEC 27018 / ISO/IEC 29100 / ISO/IEC 27001 - Public Clouds Privacy Framework Package
ISO/IEC 27701 / ISO/IEC 27001 / ISO/IEC 27002 - IT Security Techniques Privacy Information Package
ISO/IEC 27701 / ISO/IEC 27001 / ISO/IEC 27002 / ISO/IEC 29100 - IT Privacy Information System Package
ISO/IEC 30100 / ISO/IEC 27001 - IT Home Network Security Management Package
IT Identity Theft Security Techniques Package
Understanding these related standards provides a more comprehensive view of information security and IT management, allowing SysAdmins to implement more holistic security strategies.
Access Control Management
ISO/IEC 27001 outlines best practices for managing user access, ensuring that only authorized personnel have access to sensitive information.
Incident Response Planning
The standard emphasizes the importance of having a structured incident response plan, which is critical for minimizing the impact of security breaches.
Data Encryption and Protection
It provides guidelines on data encryption, secure data storage, and transmission, all of which are crucial responsibilities for SysAdmins.
Continuous Monitoring and Improvement
ISO/IEC 27001 promotes a cycle of continuous monitoring, auditing, and improvement, essential for maintaining robust security over time.
For those interested in diving deeper into ISO/IEC 27001:2022, the official standard is available for purchase. Get the standard here to start enhancing your organization’s security posture today.
How has your organization implemented ISO/IEC 27001? What challenges have you faced in aligning with this standard? Share your experiences and join the conversation on our forum.
By understanding and applying ISO/IEC 27001:2022, SysAdmins can play a pivotal role in strengthening their organization’s information security framework, ensuring both compliance and resilience in an increasingly complex digital world.
On Thursday, February 6, 2025, multiple Cloudflare services, including R2 object storage, experienced a significant outage lasting 59 minutes. This incident resulted in complete operational failures against R2 and disruptions to dependent services such as Stream, Images, Cache Reserve, Vectorize, and Log Delivery. The root cause was traced to human error and inadequate validation safeguards during routine abuse remediation procedures.
Incident Duration: 08:14 UTC to 09:13 UTC (primary impact), with residual effects until 09:36 UTC.
Primary Issue: Disabling of the R2 Gateway service, responsible for the R2 API.
Data Integrity: No data loss or corruption occurred within R2.
R2: 100% failure of operations (uploads, downloads, metadata) during the outage. Minor residual errors (<1%) post-recovery.
Stream: Complete service disruption during the outage.
Images: Full impact on upload/download; delivery minimally affected (97% success rate).
Cache Reserve: Increased origin requests, impacting <0.049% of cacheable requests.
Log Delivery: Delays and data loss (up to 4.5% for non-R2, 13.6% for R2 jobs).
Durable Objects: 0.09% error rate spike post-recovery.
Cache Purge: 1.8% error rate increase, 10x latency during the incident.
Vectorize: 75% query failures, 100% insert/upsert/delete failures during the outage.
Key Transparency Auditor: Complete failure of publish/read operations.
Workers & Pages: Minimal deployment failures (0.002%) for projects with R2 bindings.
08:12 UTC: R2 Gateway service inadvertently disabled.
08:14 UTC: Service degradation begins.
08:25 UTC: Internal incident declared.
08:42 UTC: Root cause identified.
08:57 UTC: Operations team begins re-enabling the R2 Gateway.
09:10 UTC: R2 starts to recover.
09:13 UTC: Primary impact ends.
09:36 UTC: Residual error rates recover.
10:29 UTC: Incident officially closed after monitoring.
The incident stemmed from human error during a phishing site abuse report remediation. Instead of targeting a specific endpoint, actions mistakenly disabled the entire R2 Gateway service. Contributing factors included:
Lack of system-level safeguards.
Inadequate account tagging and validation.
Limited operator training on critical service disablement risks.
Content Delivery Networks (CDNs) play a vital role in improving website performance, scalability, and security. However, relying heavily on CDNs for critical systems can introduce significant risks when outages occur:
Lost Revenue: Downtime on e-commerce platforms or SaaS services can result in immediate lost sales and financial transactions, directly affecting revenue streams.
Lost Data: Although R2 did not suffer data loss in this incident, disruptions in data transmission processes can lead to lost or incomplete data, especially in logging and analytics services.
Lost Customers: Extended or repeated outages can erode customer trust and satisfaction, leading to churn and damage to brand reputation.
Operational Disruptions: Businesses relying on real-time data processing or automated workflows may face cascading failures when critical CDN services are unavailable.
Immediate Actions:
Deployment of additional guardrails in the Admin API.
Disabling high-risk manual actions in the abuse review UI.
In-Progress Measures:
Improved internal account provisioning.
Restricting product disablement permissions.
Implementing two-party approval for critical actions.
Enhancing abuse checks to prevent internal service disruptions.
Cloudflare acknowledges the severity of this incident and the disruption it caused to customers. We are committed to strengthening our systems, implementing robust safeguards, and ensuring that similar incidents are prevented in the future.
For more information about Cloudflare's services or to explore career opportunities, visit our website.
Before proceeding, ensure the following components are in place:
Verify BackupNinja is installed on your Linux server.
Command:
sudo apt update && sudo apt install backupninja
If the package is not found, enable the universe repository first:
sudo apt update
sudo add-apt-repository universe
SMB Share Configured on the Windows Machine
Create and share a folder on the Windows machine (e.g., BackupShare).
Gather the necessary credentials for your databases (MySQL/PostgreSQL). Verify that the user has sufficient privileges to perform backups.
SHOW GRANTS FOR 'backupuser'@'localhost';
psql -U postgres -c "\du"
The cifs-utils package is essential for mounting SMB shares.
Command:
sudo apt install cifs-utils
The /etc/backup.d Directory
Navigate to the directory:
cd /etc/backup.d/
Back Up /var/www
Create the backup task file:
sudo nano /etc/backup.d/01-var-www.rsync
[general]
when = everyday at 02:00

[rsync]
source = /var/www/
destination = //WINDOWS-MACHINE/BackupShare/www/
options = -a --delete
smbuser = windowsuser
smbpassword = windowspassword
Replace WINDOWS-MACHINE with the hostname or IP address of the Windows machine (e.g., //192.168.1.100/BackupShare/www/).
Credential File Method:
To keep the password out of the task file, create a credentials file:
sudo nano /etc/backup.d/smb.credentials
username=windowsuser
password=windowspassword
Then reference it from the task file:
smbcredentials = /etc/backup.d/smb.credentials
For MySQL:
sudo nano /etc/backup.d/02-databases.mysqldump
Example Configuration:
[general]
when = everyday at 03:00

[mysqldump]
user = backupuser
password = secretpassword
host = localhost
databases = --all-databases
compress = true
destination = //WINDOWS-MACHINE/BackupShare/mysql/all-databases.sql.gz
smbuser = windowsuser
smbpassword = windowspassword
For PostgreSQL:
sudo nano /etc/backup.d/02-databases.pgsql
Example Configuration:
[general]
when = everyday at 03:00

[pg_dump]
user = postgres
host = localhost
all = yes
compress = true
destination = //WINDOWS-MACHINE/BackupShare/pgsql/all-databases.sql.gz
smbuser = windowsuser
smbpassword = windowspassword
Run a configuration check:
sudo backupninja --check
Check /var/log/backupninja.log for any errors.
Run the backup manually to test it:
sudo backupninja --run
Verify the Backup on the Windows Machine:
Check the BackupShare folder for your /var/www and database backups.
If the share cannot be reached, test mounting it manually:
sudo mount -t cifs //WINDOWS-MACHINE/BackupShare /mnt -o username=windowsuser,password=windowspassword
Check /var/log/syslog or /var/log/messages for SMB-related errors.
BackupNinja automatically sets up cron jobs based on the when parameter.
Verify cron jobs:
sudo crontab -l
If necessary, restart the cron service:
sudo systemctl restart cron
To encrypt backups before they leave the server, you can use GPG.
Example GPG Command:
gpg --encrypt --recipient 'your-email@example.com' backup-file.sql.gz
Regularly check BackupNinja logs for any errors:
tail -f /var/log/backupninja.log
Add the SMB share to /etc/fstab to automatically mount it at boot.
Example entry in /etc/fstab:
//192.168.1.100/BackupShare /mnt/backup cifs credentials=/etc/backup.d/smb.credentials,iocharset=utf8,sec=ntlm 0 0
Restrict permissions on the smb.credentials file:
sudo chmod 600 /etc/backup.d/smb.credentials
Welcome to this comprehensive guide on Regular Expressions (Regex). This tutorial is designed to equip you with the skills to craft powerful, time-saving regular expressions from scratch. We'll begin with foundational concepts, ensuring you can follow along even if you're new to the world of regex. However, this isn't just a basic guide; we'll delve deeper into how regex engines operate internally, giving you insights that will help you troubleshoot and optimize your patterns effectively.
At its core, a regular expression is a pattern used to match sequences of text. The term originates from formal language theory, but for practical purposes, it refers to text-matching rules you can use across various applications and programming languages.
You'll often encounter abbreviations like regex or regexp. In this guide, we'll use "regex" as it flows naturally when pluralized as "regexes." Throughout this manual, regex patterns will be displayed within guillemets: «pattern». This notation clearly differentiates the regex from surrounding text or punctuation.
For example, the simple pattern «regex» is a valid regex that matches the literal text "regex." The term match refers to the segment of text that the regex engine identifies as conforming to the specified pattern. Matches will be highlighted using double quotation marks, such as "match."
Let's consider a more complex pattern:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
This regex describes an email address pattern. Breaking it down:
\b: Denotes a word boundary to ensure the match starts at a distinct word.
[A-Z0-9._%+-]+: Matches one or more letters, digits, dots, underscores, percentage signs, plus signs, or hyphens.
@: The literal at-sign.
[A-Z0-9.-]+: Matches the domain name.
\.: A literal dot (escaped with a backslash).
[A-Z]{2,4}: Matches the top-level domain (TLD) consisting of 2 to 4 letters.
\b: Ensures the match ends at a word boundary.
With this pattern, you can:
Search text files to identify email addresses.
Validate whether a given string resembles a legitimate email address format.
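To see this pattern in action, here is a minimal sketch using Python's re module (an illustrative choice; re.IGNORECASE is needed so the A-Z ranges also match lowercase letters):

import re

# The email pattern from above, compiled case-insensitively.
pattern = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

text = "Contact sales@example.com or admin@test.org for details."
print(pattern.findall(text))  # ['sales@example.com', 'admin@test.org']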
In this tutorial, we'll refer to the text being processed as a string. This term is commonly used by programmers to describe a sequence of characters. Strings will be denoted using regular double quotes, such as "example string."
Regex patterns can be applied to any data that a programming language or software application can access, making them an incredibly versatile tool in text processing and data validation tasks.
Next, we'll explore how to construct regex patterns step by step, starting from simple character matches to more advanced techniques like capturing groups and lookaheads. Let's dive in!
A regular expression engine is a software component that processes regex patterns, attempting to match them against a given string. Typically, you won’t interact directly with the engine. Instead, it operates behind the scenes within applications and programming languages, which invoke the engine as needed to apply the appropriate regex patterns to your data or files.
As is often the case in software development, not all regex engines are created equal. Different engines support different regex syntaxes, often referred to as regex flavors. This tutorial focuses on the Perl 5 regex flavor, widely considered the most popular and influential. Many modern engines, including the open-source PCRE (Perl-Compatible Regular Expressions) engine, closely mimic Perl 5’s syntax but may introduce slight variations. Other notable engines include:
.NET Regular Expression Library
Java’s Regular Expression Package (included from JDK 1.4 onwards)
Whenever significant differences arise between flavors, this guide will highlight them, ensuring you understand which features are specific to Perl-derived engines.
You can start experimenting with regular expressions in any text editor that supports regex functionality. One recommended option is EditPad Pro, which offers a robust regex engine in its evaluation version.
To try it out:
Copy and paste the text from this page into EditPad Pro.
From the menu, select Search > Show Search Panel to open the search pane at the bottom.
In the Search Text box, type «regex».
Check the Regular expression option.
Click Find First to locate the first match. Use Find Next to jump to subsequent matches. When there are no more matches, the Find Next button will briefly flash.
Let’s take it a step further. Try searching for the following regex pattern:
«reg(ular expressions?|ex(p|es)?)»
This pattern matches all variations of the term "regex" used on this page, whether singular or plural. Without regex, you’d need to perform five separate searches to achieve the same result. With regex, one pattern does the job, saving you significant time and effort.
For instance, in EditPad Pro, select Search > Count Matches to see how many times the regex matches the text. This feature showcases the power of regex for efficient text processing.
For programmers, regexes offer both performance and productivity benefits:
Efficiency: Even a basic regex engine can outperform state-of-the-art plain text search algorithms by applying a pattern once instead of running multiple searches.
Reduced Development Time: Checking if a user’s input resembles a valid email address can be accomplished with a single line of code in languages like Perl, PHP, Java, or .NET, or with just a few lines when using libraries like PCRE in C.
By incorporating regex into your workflows and applications, you can achieve faster, more efficient text processing and validation tasks.
The simplest regular expressions consist of literal characters. A literal character is a character that matches itself. For example, the regex «a» will match the first occurrence of the character "a" in a string. Consider the string "Jack is a boy": this pattern will match the "a" after the "J".
It’s important to note that the regex engine doesn’t care where the match occurs within a word unless instructed otherwise. If you want to match entire words, you’ll need to use word boundaries, a concept we’ll cover later.
Similarly, the regex «cat» will match the word "cat" in the string "About cats and dogs." This pattern consists of three literal characters in sequence: c, a, and t. The regex engine looks for these characters in the specified order.
By default, most regex engines are case-sensitive. This means that the pattern cat will not match "Cat" unless you explicitly configure the engine to perform a case-insensitive search.
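This default is easy to observe in code. A quick sketch in Python's re module, used here purely for illustration:

import re

# Case-sensitive by default: «cat» does not match "Cats".
print(re.search(r"cat", "About Cats and dogs"))  # None

# Passing re.IGNORECASE makes the same pattern match "Cat".
print(re.search(r"cat", "About Cats and dogs", re.IGNORECASE).group())  # Cat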
To go beyond matching literal text, regex engines reserve certain characters for special functions. These are known as metacharacters. The following characters have special meanings in most regex flavors discussed in this tutorial:
[ \ ^ $ . | ? * + ( )
If you need to use any of these characters as literals in your regex, you must escape them with a backslash (\). For instance, to match "1+1=2", you would write the regex as:
1\+1=2
Without the backslash, the plus sign would be interpreted as a quantifier, causing unexpected behavior. For example, the regex «1+1=2» would match "111=2" in the string "123+111=234" because the plus sign is interpreted as "one or more of the preceding characters."
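A small Python sketch (for illustration) makes the difference concrete:

import re

# Unescaped: + is a quantifier ("one or more 1s"), so the match is "111=2".
print(re.search(r"1+1=2", "123+111=234").group())  # 111=2

# Escaped: \+ is a literal plus sign, so the pattern matches "1+1=2" itself.
print(re.search(r"1\+1=2", "What is 1+1=2 about?").group())  # 1+1=2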
To escape a metacharacter, simply prepend it with a backslash (\). For example:
«\.» matches a literal dot.
«\*» matches a literal asterisk.
«\+» matches a literal plus sign.
Most regex flavors also support the \Q...\E escape sequence. This treats everything between \Q and \E as literal characters. For example:
\Q*\d+*\E
This pattern matches the literal text "*\d+*". If the \E is omitted at the end, it is assumed. This syntax is supported by many engines, including Perl, PCRE, Java, and JGsoft, but it may have quirks in older Java versions.
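Python's re module does not support \Q...\E, but its re.escape function achieves the same goal by escaping every metacharacter for you; a brief sketch under that assumption:

import re

literal = r"*\d+*"             # the text we want to match literally
pattern = re.escape(literal)   # escapes the *, \ and + metacharacters
m = re.search(pattern, r"the token *\d+* appears here")
print(m.group())  # *\d+*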
If you're a programmer, you might expect characters like single and double quotes to be special characters in regex. However, in most regex engines, they are treated as literal characters.
In programming, you must be mindful of characters that your language treats specially within strings. These characters will be processed by the compiler before being passed to the regex engine. For instance:
To use the regex «1\+1=2» in C++ code, you would write it as "1\\+1=2". The compiler converts the double backslash into a single backslash for the regex engine.
To match a Windows file path like "c:\temp", the regex is «c:\\temp» (the backslash itself must be escaped), and in C++ code it would be written as "c:\\\\temp".
Refer to the specific language documentation to understand how to handle regex patterns within your code.
Regular expressions can also match non-printable characters using special sequences. Here are some common examples:
\t: Tab character (ASCII 0x09)
\r: Carriage return (ASCII 0x0D)
\n: Line feed (ASCII 0x0A)
\a: Bell (ASCII 0x07)
\e: Escape (ASCII 0x1B)
\f: Form feed (ASCII 0x0C)
\v: Vertical tab (ASCII 0x0B)
Keep in mind that Windows text files use "\r\n" to terminate lines, while UNIX text files use "\n".
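These escapes make it straightforward to normalize line endings, for example. A short Python sketch:

import re

windows_text = "first line\r\nsecond line\r\n"
# Replace each CRLF pair with a single LF.
unix_text = re.sub(r"\r\n", "\n", windows_text)
print(unix_text == "first line\nsecond line\n")  # True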
You can include any character in your regex using its hexadecimal or Unicode code point. For example:
\x09: Matches a tab character (same as \t).
\xA9: Matches the copyright symbol (©) in the Latin-1 character set.
\u20AC: Matches the euro currency sign (€) in Unicode.
Additionally, most regex flavors support control characters using the syntax \cA through \cZ, which correspond to Control+A through Control+Z. For example:
\cM: Matches a carriage return, equivalent to \r.
In XML Schema regex, the token «\c» is a shorthand for matching any character allowed in an XML name.
When working with Unicode regex engines, it’s best to use the \uFFFF notation to ensure compatibility with a wide range of characters.
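A short Python illustration of the hexadecimal and Unicode notations (note that Python resolves the \u20AC escape in an ordinary string literal before the regex engine sees it):

import re

print(re.search(r"\x41", "ABC").group())          # A (code point 0x41)
print(re.search("\u20AC", "price: 5 €").group())  # €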
Understanding how a regex engine processes patterns can significantly improve your ability to write efficient and accurate regular expressions. By learning the internal mechanics, you’ll be better equipped to troubleshoot and refine your regex patterns, reducing frustration and guesswork when tackling complex tasks.
There are two primary types of regex engines:
Text-Directed Engines (also known as DFA - Deterministic Finite Automaton)
Regex-Directed Engines (also known as NFA - Non-Deterministic Finite Automaton)
All the regex flavors discussed in this tutorial utilize regex-directed engines. This type is more popular because it supports features like lazy quantifiers and backreferences, which are not possible in text-directed engines.
Some well-known tools that use text-directed engines include:
awk
egrep
flex
lex
MySQL
Procmail
Note: Some versions of awk and egrep use regex-directed engines.
To determine whether a regex engine is text-directed or regex-directed, you can apply a simple test using the pattern:
regex|regex not
Apply this pattern to the string "regex not":
If the result is "regex", the engine is regex-directed.
If the result is "regex not", the engine is text-directed.
The difference lies in how eager the engine is to find matches. A regex-directed engine is eager and will report the leftmost match, even if a better match exists later in the string.
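You can run this test yourself. In Python, whose engine is regex-directed, the result looks like this:

import re

# The alternation tries "regex" first; it matches, so the engine stops there.
m = re.match(r"regex|regex not", "regex not")
print(m.group())  # regex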
A crucial concept to grasp is that a regex-directed engine will always return the leftmost match. This behavior is essential to understand because it affects how the engine processes patterns and determines matches.
When applying a regex to a string, the engine starts at the first character of the string and tries every possible permutation of the regex at that position. If all possibilities fail, the engine moves to the next character and repeats the process.
For example, consider applying the pattern «cat» to the string:
"He captured a catfish for his cat."
Here’s a step-by-step breakdown:
The engine starts at the first character "H" and tries to match "c" from the pattern. This fails.
The engine moves to "e", then space, and so on, failing each time until it reaches the fourth character "c".
At "c", it tries to match the next character "a" from the pattern with the fifth character of the string, which is "a". This succeeds.
The engine then tries to match "t" with the sixth character, "p", but this fails.
The engine backtracks and resumes at the next character "a", continuing the process.
Finally, at the 15th character in the string, it matches "c", then "a", and finally "t", successfully finding a match for "cat".
The engine reports the first valid match it finds, even if a better match could be found later in the string. In this case, it matches the first three letters of "catfish" rather than the standalone "cat" at the end of the string.
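Python's re module reproduces this walkthrough exactly (a sketch for illustration; start() reports a zero-based index, so the 15th character appears as 14):

import re

m = re.search(r"cat", "He captured a catfish for his cat.")
print(m.start(), m.group())  # 14 cat — the "cat" inside "catfish"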
At first glance, the behavior of the regex-directed engine may seem similar to a basic text search routine. However, as we introduce more complex regex tokens, you’ll see how the internal workings of the engine have a profound impact on the matches it returns.
Understanding this behavior will help you avoid surprises and leverage the full power of regex for more effective and efficient text processing.
Character classes, also known as character sets, allow you to define a set of characters that a regex engine should match at a specific position in the text. To create a character class, place the desired characters between square brackets. For instance, to match either an a or an e, use the pattern [ae]. This can be particularly useful when dealing with variations in spelling, such as in the regex gr[ae]y, which will match both "gray" and "grey."
A character class matches only a single character.
The order of characters inside a character class does not affect the outcome.
For example, gr[ae]y will not match "graay" or "graey," as the class only matches one character from the set at a time.
You can specify a range of characters within a character class by using a hyphen (-). For example:
[0-9] matches any digit from 0 to 9.
[a-fA-F] matches any letter from a to f, regardless of case.
You can also combine multiple ranges and individual characters within a character class:
[0-9a-fxA-FX] matches any hexadecimal digit or the letter X in either case.
Again, the order of characters inside the class does not matter.
Here are some practical use cases for character classes:
sep[ae]r[ae]te: Matches "separate" or "seperate" (common spelling errors).
li[cs]en[cs]e: Matches "license" or "licence."
[A-Za-z_][A-Za-z_0-9]*: Matches identifiers in programming languages.
0[xX][A-Fa-f0-9]+: Matches C-style hexadecimal numbers.
By adding a caret (^) immediately after the opening square bracket, you create a negated character class. This instructs the regex engine to match any character not in the specified set.
For example:
q[^u]: Matches a q followed by any character except u.
However, it’s essential to remember that a negated character class still requires a character to follow the initial match. For instance, q[^u] will match the q and the space in "Iraq is a country," but it will not match the q in "Iraq" by itself.
To ensure that the q is not followed by a u, use negative lookahead: q(?!u). We will cover lookaheads later in this tutorial.
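A minimal Perl sketch contrasting the two patterns (the sample strings are illustrative):
for my $s ("Iraq is a country", "Iraq") {
    printf "q[^u]  on '%s': %s\n", $s, $s =~ /q[^u]/  ? "matches '$&'" : "no match";
    printf "q(?!u) on '%s': %s\n", $s, $s =~ /q(?!u)/ ? "matches '$&'" : "no match";
}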
Inside character classes, most metacharacters lose their special meaning. However, a few characters retain their special roles:
Closing bracket (])
Backslash (\)
Caret (^) (only if it appears immediately after the opening bracket)
Hyphen (-) (only if placed between characters to specify a range)
To include these characters as literals:
Backslash (\) must be escaped with another backslash: [\\] matches a single literal backslash.
Caret (^) can appear anywhere except right after the opening bracket.
Closing bracket (]) can be placed right after the opening bracket or caret.
Hyphen (-) can be placed at the start or end of the class.
Examples:
[x^] matches x or ^.
[]x] matches ] or x.
[^]x] matches any character that is not ] or x.
[-x] matches x or -.
Shorthand character classes are predefined character sets that simplify your regex patterns. Here are the most common shorthand classes:
Shorthand | Meaning | Equivalent Character Class |
---|---|---|
\d | Any digit | [0-9] |
\w | Any word character | [A-Za-z0-9_] |
\s | Any whitespace character | [ \t\r\n] |
\d matches digits from 0 to 9.
\w includes letters, digits, and underscores.
\s matches spaces, tabs, and line breaks. In some flavors, it may also include form feeds and vertical tabs.
The characters included in these shorthand classes may vary depending on the regex flavor. For example:
JavaScript treats \d and \w as ASCII-only but includes Unicode characters for \s.
XML handles \d and \w as Unicode but limits \s to ASCII characters.
Python allows you to control what the shorthand classes match using specific flags.
Shorthand character classes can be used both inside and outside of square brackets:
\s\d matches a whitespace character followed by a digit.
[\s\d] matches a single character that is either whitespace or a digit.
For instance, when applied to the string "1 + 2 = 3":
\s\d matches the space and the digit 2.
[\s\d] matches the digit 1.
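In Perl, for example (a minimal sketch):
my $str = "1 + 2 = 3";
$str =~ /\s\d/;   print "'$&'\n";   # prints ' 2': a whitespace character, then a digit
$str =~ /[\s\d]/; print "'$&'\n";   # prints '1': one character from the combined set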
The shorthand [\da-fA-F] matches a hexadecimal digit and is equivalent to [0-9a-fA-F].
The primary shorthand classes also have negated versions:
\D: Matches any character that is not a digit. Equivalent to [^\d].
\W: Matches any character that is not a word character. Equivalent to [^\w].
\S: Matches any character that is not whitespace. Equivalent to [^\s].
Be careful when using negated shorthand inside square brackets. For example:
[\D\S] is not the same as [^\d\s].
[\D\S] will match any character, including digits and whitespace, because a digit is not whitespace and whitespace is not a digit.
[^\d\s] will match any character that is neither a digit nor whitespace.
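A minimal Perl sketch makes the difference visible:
my @either  = "a 1" =~ /[\D\S]/g;   # every character matches: each is a non-digit or a non-space
my @neither = "a 1" =~ /[^\d\s]/g;  # only 'a' matches
print scalar(@either), " vs ", scalar(@neither), "\n";   # prints "3 vs 1"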
You can repeat a character class using quantifiers like ?, *, or +:
[0-9]+: Matches one or more digits and can match "837" as well as "222".
If you want to repeat the matched character instead of the entire class, you need to use backreferences:
([0-9])\1+: Matches repeated digits, like "222," but not "837."
Applied to the string "833337," this regex matches "3333."
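A minimal Perl sketch of both behaviors:
"837"    =~ /[0-9]+/     and print "$&\n";   # prints 837: each repetition may match a different digit
"833337" =~ /([0-9])\1+/ and print "$&\n";   # prints 3333: the backreference repeats one captured digit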
If you want more control over repeated matches, consider using lookahead and lookbehind assertions, which we will explore later in the tutorial.
As previously discussed, the order of characters inside a character class does not matter. For instance, gr[ae]y can match both "gray" and "grey."
Let’s see how the regex engine processes gr[ae]y step by step:
Given the string:
"Is his hair grey or gray?"
The engine starts at the first character and fails to match g until it reaches the 13th character.
At the 13th character, g matches.
The next token r matches the following character.
The character class [ae] gives the engine two options:
First, it tries a, which fails.
Then, it tries e, which matches.
The final token y matches the next character, completing the match.
The engine returns "grey" as the match result and stops searching, even though "gray" also exists in the string. This is because the regex engine is eager to report the first valid match it finds.
Understanding how the regex engine processes character classes helps you write more efficient patterns and predict match results more accurately.
The dot, or period, is one of the most versatile and commonly used metacharacters in regular expressions. However, it is also one of the most misused.
The dot matches any single character except for newline characters. In most regex flavors discussed in this tutorial, the dot does not match newlines by default. This behavior stems from the early days of regex when tools were line-based and processed text line by line. In such cases, the text would not contain newline characters, so the dot could safely match any character.
In modern tools, you can enable an option to make the dot match newline characters as well. For example, in tools like RegexBuddy, EditPad Pro, or PowerGREP, you can check a box labeled "dot matches newline."
In Perl, the mode that makes the dot match newline characters is called single-line mode. You can activate this mode by adding the s flag to the regex, like this:
m/^regex$/s;
Other languages and regex libraries, such as the .NET framework, have adopted this terminology. In .NET, you can enable single-line mode by using the RegexOptions.Singleline option:
Regex.Match("string", "regex", RegexOptions.Singleline);
In most programming languages and libraries, enabling single-line mode only affects the behavior of the dot. It has no impact on other aspects of the regex.
However, some languages like JavaScript and VBScript do not have a built-in option to make the dot match newlines. In such cases, you can use a character class like [\s\S] to achieve the same effect. This class matches any character that is either whitespace or non-whitespace, effectively matching any character.
The dot is a powerful metacharacter that can make your regex very flexible. However, it can also lead to unintended matches if not used carefully. It is easy to write a regex with a dot and find that it matches more than you intended.
Consider the following example:
If you want to match a date in mm/dd/yy format, you might start with the regex:
\d\d.\d\d.\d\d
This regex appears to work at first glance, as it matches "02/12/03". However, it also matches "02512703", where the dots match digits instead of separators.
A better solution is to use a character class to specify valid date separators:
\d\d[- /.]\d\d[- /.]\d\d
This regex matches dates with dashes, spaces, dots, or slashes as separators. Note that the dot inside a character class is treated as a literal character, so it does not need to be escaped.
This regex is still not perfect, as it will match "99/99/99". To improve it further, you can use:
[0-1]\d[- /.][0-3]\d[- /.]\d\d
This regex ensures that the month and day parts are within valid ranges. How perfect your regex needs to be depends on your use case. If you are validating user input, the regex must be precise. If you are parsing data files from a known source, a less strict regex might be sufficient.
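A minimal Perl sketch of the stricter pattern, anchored so the whole input must be a date (the sample inputs are illustrative):
for my $s ("02/12/03", "02512703", "99/99/99") {
    print "$s: ", ($s =~ m{^[0-1]\d[- /.][0-3]\d[- /.]\d\d$} ? "accepted" : "rejected"), "\n";
}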
Using the dot can sometimes result in overly broad matches. Instead, consider using negated character sets to specify what characters you do not want to match.
For example, to match a double-quoted string, you might be tempted to use:
".*"
At first, this regex seems to work well, matching "string" in:
Put a "string" between double quotes.
However, if you apply it to:
Houston, we have a problem with "string one" and "string two". Please respond.
The regex will match:
"string one" and "string two"
This is not what you intended. The dot matches any character, and the star (*) quantifier allows it to match across multiple strings, leading to an overly greedy match.
To fix this, use a negated character set instead of the dot:
"[^"]*"
This regex matches any sequence of characters that are not double quotes, enclosed within double quotes. If you also want to prevent matching across multiple lines, use:
"[^"\r\n]*"
This regex ensures that the match does not include newline characters.
By using negated character sets instead of the dot, you can make your regex patterns more precise and avoid unintended matches.
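A minimal Perl sketch contrasting the greedy dot with the negated class:
my $s = 'Houston, we have a problem with "string one" and "string two". Please respond.';
$s =~ /".*"/;    print "$&\n";   # "string one" and "string two" (greedy overshoot)
$s =~ /"[^"]*"/; print "$&\n";   # "string one" (stops at the first closing quote)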
In previous sections, we explored how literal characters and character classes operate in regular expressions. These match specific characters in a string. Anchors, however, are different. They match positions in the string rather than characters, allowing you to "anchor" your regex to the start or end of a string or line.
The Caret (^) Anchor
The caret (^) matches the position before the first character of the string. For example:
^a applied to "abc" matches "a."
^b does not match "abc" because "b" is not the first character of the string.
The caret is useful when you want to ensure that a match occurs at the very beginning of a string.
Regex | String | Matches |
---|---|---|
^a | "abc" | Yes |
^b | "abc" | No |
The Dollar Sign ($) Anchor
The dollar sign ($) matches the position after the last character of the string. For example:
c$ matches "c" in "abc."
a$ does not match "abc" because "a" is not the last character.
Regex | String | Matches |
---|---|---|
c$ | "abc" | Yes |
a$ | "abc" | No |
Anchors are essential for validating user input. For instance, if you want to ensure a user inputs only an integer number, using \d+ will accept any input containing digits, even if it includes letters (e.g., "abc123"). Instead, use ^\d+$ to enforce that the entire string consists only of digits from start to finish.
if ($input =~ /^\d+$/) {
    print "Valid integer";
} else {
    print "Invalid input";
}
To handle potential leading or trailing whitespace, use:
^\s+ to match leading whitespace.
\s+$ to match trailing whitespace.
In Perl, you can trim whitespace like this:
$input =~ s/^\s+|\s+$//g;
If your string contains multiple lines, you might want to match the start or end of each line instead of the entire string. Multi-line mode changes the behavior of the anchors:
^ matches at the start of each line.
$ matches at the end of each line.
Given the string:
first line
second line
^s matches "s" in "second line" when multi-line mode is enabled.
In Perl, use the m flag:
m/^regex$/m;
In .NET, specify RegexOptions.Multiline:
Regex.Match("string", "regex", RegexOptions.Multiline);
In tools like EditPad Pro, GNU Emacs, and PowerGREP, multi-line mode is enabled by default.
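For example, in Perl (a minimal sketch):
my $text = "first line\nsecond line";
my @words = $text =~ /^\w+/mg;   # with /m, the caret matches at the start of every line
print "@words\n";                # prints "first second"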
The anchors \A and \Z match the start and end of the string, respectively, regardless of multi-line mode:
\A: Matches only at the start of the string.
\Z: Matches only at the end of the string, before any trailing newline.
\z: Matches only at the very end of the string, after any trailing newline.
For example:
Regex | String | Matches |
---|---|---|
\Aabc\Z | "abc" | Yes |
\Aabc\Z | "abc\n" | Yes |
\Aabc\z | "abc\n" | No |
Some regex flavors, like JavaScript, POSIX, and XML, do not support \A and \Z. In such cases, use the caret (^) and dollar sign ($) instead.
Anchors match positions rather than characters, resulting in zero-length matches. For example:
^ matches the position at the start of a string.
$ matches the position at the end of a string.
Using ^\d*$ to validate a number will accept an empty string: the star permits zero digits, so the regex finds a zero-length match between the start and end of the string.
To avoid this, ensure your regex accounts for actual input:
^\d+$
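A minimal Perl sketch of the pitfall:
print "star accepts empty\n" if "" =~ /^\d*$/;   # prints: the star allows a zero-length match
print "plus accepts empty\n" if "" =~ /^\d+$/;   # prints nothing: + requires at least one digit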
In some scenarios, you may want to add a prefix to each line of a multi-line string. For example, to prepend a "> " to each line in an email reply, use multi-line mode:
Dim Quoted As String = Regex.Replace(Original, "^", "> ", RegexOptions.Multiline)
This regex matches the start of each line and inserts the prefix "> " without removing any characters.
There is an exception to how $ and \Z behave. If the string ends with a line break, $ and \Z match before the line break, not at the very end of the string.
For example:
The string "joe\n" will match ^[a-z]+$
and \A[a-z]+\Z
.
However, \A[a-z]+\z
will not match because \z
requires the match to be at the very end of the string, including after the newline.
Use \z
to ensure a match at the absolute end of the string.
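A minimal Perl sketch of this exception:
print "\\Z matches\n" if "joe\n" =~ /\A[a-z]+\Z/;   # prints: \Z tolerates the trailing newline
print "\\z matches\n" if "joe\n" =~ /\A[a-z]+\z/;   # prints nothing: \z demands the absolute end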
Let’s see what happens when we apply ^4$ to the string:
749
486
4
In multi-line mode, the regex engine processes the string as follows:
The engine starts at the first character, "7". The ^
matches the position before "7".
The engine advances to 4
, and ^
cannot match because it is not preceded by a newline.
The process continues until the engine reaches the final "4", which is preceded by a newline.
The ^
matches the position before "4", and the engine successfully matches 4
.
The engine attempts to match $
at the position after "4", and it succeeds because it is the end of the string.
The regex engine reports the match as "4" at the end of the string.
When working with anchors, be mindful of zero-length matches. For example, $ can match the position after the last character of the string. Querying String[Regex.MatchPosition] may then cause an access violation or segmentation fault, because the match position points past the end of the string. Handle these cases carefully in your code.
The \b metacharacter is an anchor, like the caret (^) and dollar sign ($). It matches a zero-length position called a word boundary. Word boundaries allow you to perform “whole word” searches in a string using patterns like \bword\b.
A word boundary occurs at three possible positions in a string:
Before the first character if it is a word character.
After the last character if it is a word character.
Between two characters where one is a word character and the other is a non-word character.
A word character includes letters, digits, and the underscore ([a-zA-Z0-9_]). Non-word characters are everything else.
The pattern \bword\b matches the word "word" only if it appears as a standalone word in the text.
Regex | String | Matches |
---|---|---|
\b4\b | "There are 44 sheets" | No |
\b4\b | "Sheet number 4 is here" | Yes |
Digits are considered word characters, so \b4\b will match a standalone "4" but not a "4" that is part of "44."
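In Perl, for instance:
print "whole word\n" if "Sheet number 4 is here" =~ /\b4\b/;   # prints: the 4 stands alone
print "embedded\n"   if "There are 44 sheets"    =~ /\b4\b/;   # prints nothing: 4 is part of 44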
The \B metacharacter is the negated version of \b. It matches any position that is not a word boundary.
Regex | String | Matches |
---|---|---|
is\B | "This is a test" | No |
is\B | "This island is beautiful" | Yes |
is\B matches "is" only when it is not at the end of a word: it fails in "This is a test" because both occurrences of "is" are followed by a space, but it matches the "is" at the start of "island." The fully embedded form \Bis\B requires word characters on both sides, so it matches the "is" inside "crisis" but neither the standalone word "is" nor the "is" in "island," which is preceded by a word boundary.
Let’s see how the regex \bis\b works on the string "This island is beautiful":
The engine starts with \b at the first character "T." Since \b is zero-width, it checks the position before "T." It matches because "T" is a word character and the position before it is the start of the string.
The engine then checks the next token, i, which does not match "T," so it moves to the next position.
At the "is" in "This" the first \b fails, because "h" and "i" are both word characters. At the "is" in "island" the first \b and the letters match, but the final \b fails because "l" follows.
Only at the standalone "is" do both boundaries match: the first \b matches after the space, and the final \b matches before the following space, confirming a complete match.
Most regex flavors use \b for word boundaries. However, Tcl uses a different syntax:
\y matches a word boundary.
\Y matches a non-word boundary.
\m matches only at the start of a word.
\M matches only at the end of a word.
For example, in Tcl:
\mword\M matches "word" as a whole word.
In most other flavors, you can achieve the same with \bword\b.
If your regex flavor supports lookahead and lookbehind, you can emulate Tcl’s \m and \M:
(?<!\w)(?=\w): Emulates \m.
(?<=\w)(?!\w): Emulates \M.
For flavors without lookbehind, use:
\b(?=\w) to emulate \m.
\b(?!\w) to emulate \M.
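A minimal Perl sketch of the lookaround emulation (Perl supports both lookahead and lookbehind):
my $text = "cat dog";
$text =~ s/(?<!\w)(?=\w)/</g;   # insert a marker at every word start, as \m would match
$text =~ s/(?<=\w)(?!\w)/>/g;   # insert a marker at every word end, as \M would match
print "$text\n";                # prints "<cat> <dog>"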
GNU extensions to POSIX regular expressions support \b and \B. Additionally, GNU regex introduces:
\<: Matches the start of a word (like Tcl’s \m).
\>: Matches the end of a word (like Tcl’s \M).
These additional tokens provide flexibility when working with word boundaries in GNU-based tools.
Word boundaries are crucial for identifying standalone words in text. They prevent partial matches within larger words and make your patterns more precise. Understanding how to use \b, \B, and their equivalents in various regex flavors will help you craft better, more accurate regular expressions.
Previously, we explored how character classes allow you to match a single character out of several possible options. Alternation, on the other hand, enables you to match one of several possible regular expressions.
The vertical bar or pipe symbol (|) is used for alternation. It acts as an OR operator within a regex.
To search for either "cat" or "dog," use the pattern:
cat|dog
You can add more options as needed:
cat|dog|mouse|fish
The regex engine will match any of these options. For example:
Regex | String | Matches |
---|---|---|
cat|dog|mouse|fish | "I have a cat and a dog" | ✅ Yes |
cat|dog|mouse|fish | "I have a fish" | ✅ Yes |
The alternation operator has the lowest precedence of all regex operators, so the engine treats everything to the left and right of the vertical bar as a complete alternative. If you need to limit the scope of the alternation, use round brackets (()) to group expressions.
Without grouping:
\bcat|dog\b
This regex will match either:
A word boundary followed by "cat", or
"dog" followed by a word boundary.
With grouping:
\b(cat|dog)\b
This regex will match:
A word boundary, then either "cat" or "dog," followed by another word boundary.
Regex | String | Matches |
---|---|---|
\bcat|dog\b | "I saw a cat dog" | ✅ Yes |
\b(cat|dog)\b | "I saw a cat dog" | ✅ Yes |
The regex engine is eager, meaning it stops searching as soon as it finds a valid match. The order of alternatives matters.
Consider the pattern:
Get|GetValue|Set|SetValue
When applied to the string "SetValue," the engine will:
Try to match Get, which fails.
Try GetValue, which also fails.
Match Set and stop.
The result is that the engine matches "Set," but not "SetValue." This happens because the engine found a valid match early and stopped.
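You can watch this happen in Perl:
"SetValue" =~ /Get|GetValue|Set|SetValue/;
print "$&\n";   # prints "Set": the engine stops at the first alternative that succeeds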
There are several ways to address this behavior:
By changing the order of options, you can ensure longer matches are attempted first:
GetValue|Get|SetValue|Set
This way, "SetValue" will be matched before "Set."
You can combine related options and use ? to make parts of them optional:
Get(Value)?|Set(Value)?
This pattern ensures "GetValue" is matched before "Get," and "SetValue" before "Set."
To ensure you match whole words only, use word boundaries:
\b(Get|GetValue|Set|SetValue)\b
Alternatively, use:
\b(Get(Value)?|Set(Value)?)\b
Or even better:
\b(Get|Set)(Value)?\b
This pattern is more efficient and concise.
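A quick Perl check of the combined pattern:
"SetValue" =~ /\b(Get|Set)(Value)?\b/;
print "$&\n";   # prints "SetValue": the greedy optional group grabs the longer form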
Unlike most regex engines, POSIX-compliant regex engines always return the longest possible match, regardless of the order of alternatives. In a POSIX engine, applying Get|GetValue|Set|SetValue to "SetValue" will return "SetValue," not "Set." This behavior comes from the POSIX standard, which mandates the leftmost-longest match.
Alternation is a powerful feature in regex that allows you to match one of several possible patterns. However, due to the eager behavior of most regex engines, it’s essential to order your alternatives carefully and use grouping to ensure accurate matches. By understanding how the engine processes alternation, you can write more effective and optimized regex patterns.
The question mark (?) makes the preceding token in a regular expression optional. This means that the regex engine will try to match the token if it is present, but it won’t fail if the token is absent.
For example:
colou?r
This pattern matches both "colour" and "color." The u is optional due to the question mark.
You can make multiple tokens optional by grouping them with round brackets and placing a question mark after the closing bracket:
Nov(ember)?
This regex matches both "Nov" and "November."
You can use multiple optional groups to match more complex patterns. For instance:
Feb(ruary)? 23(rd)?
This pattern matches:
"February 23rd"
"February 23"
"Feb 23rd"
"Feb 23"
The question mark is a greedy operator. This means that the regex engine will first try to match the optional part. It will only skip the optional part if matching it causes the entire regex to fail.
For example:
Feb 23(rd)?
When applied to the string "Today is Feb 23rd, 2003," the engine will match "Feb 23rd" rather than "Feb 23" because it tries to match as much as possible.
You can make the question mark lazy by adding another question mark after it:
Feb 23(rd)??
In this case, the regex will match "Feb 23" instead of "Feb 23rd."
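A minimal Perl sketch of greedy versus lazy:
my $s = "Today is Feb 23rd, 2003";
$s =~ /Feb 23(rd)?/;  print "$&\n";   # prints "Feb 23rd": the optional part is tried first
$s =~ /Feb 23(rd)??/; print "$&\n";   # prints "Feb 23": the optional part is skipped first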
Let’s see how the regex engine processes the pattern:
colou?r
when applied to the string "The colonel likes the color green."
The engine starts by matching the literal c with the c in "colonel."
It continues matching o, l, and o.
It then tries to match u against the n in "colonel," which fails.
The question mark makes u optional, so the engine skips it and tries r instead.
r does not match n either, so the engine abandons this attempt and restarts the search at the next occurrence of c in the string.
The engine eventually matches color in "color green": the optional u is skipped, and the remaining characters match successfully.
The question mark is a versatile operator that allows you to make parts of a regex optional. It is greedy by default, but you can make it lazy by using ??.
Understanding how the regex engine processes optional items is essential for creating efficient and accurate patterns.
In addition to the question mark, regex provides two more repetition operators: the asterisk (*) and the plus (+).
The * (star) matches the preceding token zero or more times. The + (plus) matches the preceding token one or more times.
For example:
<[A-Za-z][A-Za-z0-9]*>
This pattern matches HTML tags without attributes:
<[A-Za-z] matches the opening bracket and the first letter.
[A-Za-z0-9]* matches zero or more alphanumeric characters after the first letter.
This regex will match tags like:
<B>
<HTML>
If you used + instead of *, the regex would require at least one alphanumeric character after the first letter: it would still match <HTML>, but no longer <B>. Neither version matches <1>, because the first character after the bracket must be a letter.
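A minimal Perl sketch of the difference (the sample string is illustrative):
my $html = "<B> bold, <HTML> page, <1> not a tag";
print "star: $&\n" while $html =~ /<[A-Za-z][A-Za-z0-9]*>/g;   # prints <B> and <HTML>
print "plus: $&\n" while $html =~ /<[A-Za-z][A-Za-z0-9]+>/g;   # prints <HTML> only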
Modern regex flavors allow you to limit repetition using curly braces ({}):
{min,max}
min: Minimum number of repetitions.
max: Maximum number of repetitions.
Examples:
{0,} is equivalent to *.
{1,} is equivalent to +.
{3} matches exactly three repetitions.
\b[1-9][0-9]{3}\b
This pattern matches numbers between 1000 and 9999.
\b[1-9][0-9]{2,4}\b
This pattern matches numbers between 100 and 99999.
The word boundaries (\b) ensure that only complete numbers are matched.
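For example, in Perl (the sample numbers are illustrative):
my $text = "7 42 835 1000 99999 123456";
print "$&\n" while $text =~ /\b[1-9][0-9]{3}\b/g;     # prints 1000
print "$&\n" while $text =~ /\b[1-9][0-9]{2,4}\b/g;   # prints 835, 1000, and 99999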
All repetition operators (*, +, and {}) are greedy by default. This means the regex engine will try to match as much text as possible.
Consider the pattern:
<.+>
When applied to the string:
This is a <EM>first</EM> test.
You might expect it to match <EM> and </EM> separately. However, it will match <EM>first</EM> instead. This happens because the + is greedy and matches as many characters as possible.
The first token in the regex is <, which matches the first < in the string.
The next token is the dot (.), which matches any character except newlines. The + causes the dot to repeat as many times as possible:
The dot matches E, then M, and so on, continuing until the end of the string.
At this point, the > token fails to match because there are no characters left.
The engine then backtracks, giving back one character at a time, until > matches the last > in the string.
The final match is <EM>first</EM>.
To fix this issue, make the quantifier lazy by adding a question mark (?):
<.+?>
This tells the engine to match as few characters as possible:
The < matches the first <.
The dot matches E, and the engine checks for > after every character it adds.
Once the dot has matched EM, the > matches, giving the final match <EM>, which is what we intended.
Instead of using lazy quantifiers, you can use a negated character class:
<[^>]+>
This pattern matches any sequence of characters that are not >, followed by >. It avoids backtracking and improves performance.
Given the string:
This is a <EM>first</EM> test.
The regex <[^>]+> will match:
<EM>
</EM>
This approach is more efficient because it reduces backtracking, which can significantly improve performance in large datasets or tight loops.
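A minimal Perl sketch comparing the three approaches:
my $s = "This is a <EM>first</EM> test.";
$s =~ /<.+>/    and print "greedy:  $&\n";   # <EM>first</EM>
$s =~ /<.+?>/   and print "lazy:    $&\n";   # <EM>
$s =~ /<[^>]+>/ and print "negated: $&\n";   # <EM>, found without backtracking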
The *, +, and {} quantifiers control repetition in regex. They are greedy by default, but you can make them lazy by adding a question mark (?). Using negated character classes is another way to handle repetition efficiently without backtracking.
In regular expressions, round brackets (()) are used for grouping. Grouping allows you to apply operators to multiple tokens at once. For example, you can make an entire group optional or repeat the entire group using repetition operators.
For example:
Set(Value)?
This pattern matches:
"Set"
"SetValue"
The round brackets group "Value", and the question mark makes it optional.
Note:
Square brackets ([]) define character classes.
Curly braces ({}) specify repetition counts.
Only round brackets (()) are used for grouping.
Round brackets not only group parts of a regex but also create backreferences. A backreference stores the text matched by the group, allowing you to reuse it later in the regex or replacement text.
Set(Value)?
If "SetValue" is matched, the backreference \1
will contain "Value". If only "Set" is matched, the backreference will be empty.
To prevent creating a backreference, use non-capturing parentheses:
Set(?:Value)?
The (?: ... ) syntax disables capturing, making the regex more efficient when backreferences are not needed.
Backreferences are often used in search-and-replace operations. The exact syntax for using backreferences in replacement text varies between tools and programming languages.
For example, in many tools:
\1 refers to the first capturing group.
\2 refers to the second capturing group, and so on.
In replacement text, you can use these backreferences to reinsert matched text:
Find: (\w+)\s+\1
Replace: \1
This pattern finds doubled words like "the the" and replaces them with a single instance.
Backreferences can also be used within the regex itself to match the same text again.
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
This pattern matches an HTML tag and its corresponding closing tag. The opening tag name is captured by the first group, and \1 is used to ensure the closing tag matches the same name.
Backreferences are numbered based on the order of opening brackets in the regex:
The first opening bracket creates backreference \1.
The second opening bracket creates backreference \2.
Non-capturing groups do not count toward the numbering.
([a-c])x\1x\1
This pattern matches:
"axaxa"
"bxbxb"
"cxcxc"
If a group is optional and not matched, the backreference will be empty, but the regex will still work.
Let’s see how the regex engine processes the following pattern:
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
when applied to the string:
Testing <B><I>bold italic</I></B> text
The engine matches <B> and stores "B" in the first capturing group.
It lazily expands .*? until it can match a closing tag.
The backreference \1 ensures the closing tag carries the same name as the opening tag, so </I> is rejected and the engine continues to </B>.
The entire match is <B><I>bold italic</I></B>.
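In Perl, the whole walkthrough fits in two lines:
"Testing <B><I>bold italic</I></B> text" =~ m{<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>};
print "$&\n";   # prints <B><I>bold italic</I></B>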
There’s a difference between a backreference to a group that matched nothing and one to a group that did not participate at all:
(q?)b\1
This pattern matches "b" because the optional q?
matched nothing.
In contrast:
(q)?b\1
This pattern fails to match "b" because the group (q) did not participate in the match at all.
In most regex flavors, a backreference to a non-participating group causes the match to fail. However, in JavaScript, backreferences to non-participating groups match an empty string.
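A minimal Perl sketch of the distinction (Perl behaves like most flavors here):
print "empty group matches\n"  if "b" =~ /(q?)b\1/;   # prints: the group captured an empty string
print "absent group matches\n" if "b" =~ /(q)?b\1/;   # prints nothing: the group never participated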
Some modern regex flavors, like .NET, Java, and Perl, allow forward references. A forward reference is a backreference to a group that appears later in the regex.
(\2two|(one))+
This pattern matches "oneonetwo". The forward reference \2
fails at first but succeeds when the group is matched during repetition.
In most flavors, referencing a group that doesn’t exist results in an error. In JavaScript and Ruby, such references result in a zero-width match.
The regex engine doesn’t permanently substitute backreferences in the regex. Instead, it uses the most recent value captured by the group.
([abc]+)=\1
This pattern matches "cab=cab".
In contrast:
([abc])+=\1
This pattern does not match "cab=cab" because the backreference holds only the last value captured by the group (in this case, "b").
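A minimal Perl sketch, anchored so the whole string must match:
print "$&\n"    if     "cab=cab" =~ /^([abc]+)=\1$/;   # prints cab=cab
print "fails\n" unless "cab=cab" =~ /^([abc])+=\1$/;   # prints fails: \1 holds only the last capture, "b"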
You can use the following regex to find doubled words in a text:
\b(\w+)\s+\1\b
In your text editor, replace the doubled word with \1 to remove the duplicate.
Input: "the the cat"
Output: "the cat"
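In Perl substitutions, the replacement text refers to the group as $1 rather than \1 (a minimal sketch):
my $text = "the the cat";
$text =~ s/\b(\w+)\s+\1\b/$1/g;
print "$text\n";   # prints "the cat"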
Round brackets cannot be used inside character classes. For example:
[(a)b]
This pattern matches the literal characters "a", "b", "(", and ")".
Backreferences also cannot be used inside character classes. In most flavors, \1 inside a character class is treated as an octal escape sequence.
(a)[\1b]
This pattern matches "a" followed by either \x01
(an octal escape) or "b".
Grouping with round brackets allows you to:
Apply operators to entire groups of tokens.
Create backreferences for reuse in the regex or replacement text.
Use non-capturing groups (?: ... ) to avoid creating unnecessary backreferences and improve performance. Be mindful of the limitations and differences in behavior across the various regex flavors.