Regex Explained: A Practical Guide to Regular Expressions
Master regular expressions with practical examples. Learn regex syntax, common patterns, and how to use regex for validation, search, and data extraction.
Regular expressions (regex) are one of the most powerful tools in a developer's arsenal. They allow you to search, match, and manipulate text with incredible precision — from validating email addresses to parsing log files and extracting structured data from unstructured text.
This guide will take you from regex beginner to confident user, covering syntax, real-world patterns, multiple programming languages, performance pitfalls, and debugging techniques.
What is a Regular Expression?
A regular expression is a sequence of characters that defines a search pattern. It's used to:
- Search for text matching a pattern
- Validate input (emails, phone numbers, passwords)
- Extract specific parts of text
- Replace text based on patterns
Basic Regex Syntax
Literal Characters
The simplest regex is a literal string:
/hello/
This matches the exact text "hello".
Character Classes
Square brackets define a set of characters to match:
/[aeiou]/ → matches any vowel
/[0-9]/ → matches any digit
/[a-zA-Z]/ → matches any letter
Shorthand Character Classes
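| Pattern | Meaning | Equivalent |
|---|---|---|
| \d | Digit | [0-9] |
| \w | Word character | [a-zA-Z0-9_] |
| \s | Whitespace | [ \t\n\r\f\v] |
| \D | Non-digit | [^0-9] |
| \W | Non-word | [^a-zA-Z0-9_] |
| \S | Non-whitespace | [^ \t\n\r\f\v] |

The equivalents shown are the ASCII forms. In Unicode-aware modes, such as the default for Python 3 str patterns, \d, \w, and \s also match non-ASCII digits, letters, and whitespace.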
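To see these classes on real text, here is a minimal Python sketch using the standard re module. The sample string is invented for illustration.

```python
import re

sample = "Order 66 shipped to Bay_7 at 09:30"

# [aeiou] matches any single lowercase vowel
vowels = re.findall(r"[aeiou]", "regex")

# \d+ matches runs of digits
digit_runs = re.findall(r"\d+", sample)

# \w+ matches runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", sample)

# \S+ matches runs of non-whitespace characters
chunks = re.findall(r"\S+", sample)

print(vowels)      # ['e', 'e']
print(digit_runs)  # ['66', '7', '09', '30']
print(words)       # ['Order', '66', 'shipped', 'to', 'Bay_7', 'at', '09', '30']
print(chunks)      # ['Order', '66', 'shipped', 'to', 'Bay_7', 'at', '09:30']
```

Note how \w+ splits 09:30 at the colon while \S+ keeps it whole, because a colon is neither a word character nor whitespace.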
Quantifiers
Quantifiers specify how many times to match:
| Pattern | Meaning |
|---|---|
| * | Zero or more |
| + | One or more |
| ? | Zero or one |
| {3} | Exactly 3 |
| {2,5} | Between 2 and 5 |
| {3,} | 3 or more |
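The quantifiers in the table can be checked one at a time. The sketch below is in Python; the helper name matches is an assumption made for this example, and it uses re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` fits `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"ab*", "a"))          # True:  * allows zero "b"s
print(matches(r"ab+", "a"))          # False: + needs at least one "b"
print(matches(r"colou?r", "color"))  # True:  ? makes the "u" optional
print(matches(r"\d{3}", "1234"))     # False: {3} means exactly three digits
print(matches(r"\d{2,5}", "1234"))   # True:  four digits is within 2 to 5
print(matches(r"\d{3,}", "12"))      # False: {3,} needs three or more
```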
Common Regex Patterns
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown:
- ^ → Start of string
- [a-zA-Z0-9._%+-]+ → Username (one or more allowed characters)
- @ → At symbol
- [a-zA-Z0-9.-]+ → Domain name
- \. → Literal dot
- [a-zA-Z]{2,} → Top-level domain (2+ letters)
- $ → End of string
URL Matching
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
Phone Number (US)
^(\+\d{1,2}\s?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$
Matches formats like:
- (123) 456-7890
- 123-456-7890
- 123.456.7890
- +1 (123) 456-7890
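As a quick check, the pattern can be run against the formats listed above. This is a minimal Python sketch; the function name is_us_phone is an assumption made for the example.

```python
import re

US_PHONE = re.compile(r"^(\+\d{1,2}\s?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")

def is_us_phone(text: str) -> bool:
    """Hypothetical helper: True when `text` fits the US phone pattern."""
    return US_PHONE.match(text) is not None

for candidate in ["(123) 456-7890", "123-456-7890", "123.456.7890",
                  "+1 (123) 456-7890", "123-45-6789", "(123 456-7890"]:
    print(candidate, is_us_phone(candidate))
```

The last candidate, with an unbalanced parenthesis, is accepted because each parenthesis is optional on its own. If that matters for your input, a stricter pattern or a follow-up check is needed.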
Password Strength
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Requires:
- At least one lowercase letter
- At least one uppercase letter
- At least one digit
- At least one special character
- Minimum 8 characters
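Each lookahead in the pattern corresponds to one requirement. A single regex only answers yes or no, so one possible alternative is to check each rule separately and report which ones failed. The sketch below is in Python; the rule names and the failed_rules helper are assumptions made for illustration.

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# Hypothetical rule names, each paired with the sub-pattern it stands for
RULES = {
    "lowercase letter": r"[a-z]",
    "uppercase letter": r"[A-Z]",
    "digit": r"\d",
    "special character": r"[@$!%*?&]",
    "8+ allowed characters": r"^[A-Za-z\d@$!%*?&]{8,}$",
}

def failed_rules(password: str) -> list[str]:
    """Return the names of the rules that `password` does not satisfy."""
    return [name for name, pattern in RULES.items()
            if not re.search(pattern, password)]

print(bool(PASSWORD.match("Passw0rd!")))  # True
print(failed_rules("Passw0rd!"))          # []
print(failed_rules("password"))           # ['uppercase letter', 'digit', 'special character']
```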
IP Address (IPv4)
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
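The alternation inside each octet encodes the 0 to 255 range directly in the pattern. A possible alternative, sketched below in Python, is to match only the shape with a simpler regex and check the range numerically. The name is_ipv4_by_range is an assumption made for this example.

```python
import re

IPV4 = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

# Shape only: four groups of one to three ASCII digits separated by dots
IPV4_SHAPE = re.compile(r"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$")

def is_ipv4_by_range(text: str) -> bool:
    """Hypothetical helper: match the shape, then check each octet is <= 255."""
    m = IPV4_SHAPE.match(text)
    return m is not None and all(int(octet) <= 255 for octet in m.groups())

for candidate in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3", "01.02.03.04"]:
    print(candidate, IPV4.match(candidate) is not None, is_ipv4_by_range(candidate))
```

Both approaches accept octets with leading zeros such as 01.02.03.04, which some tools reject or interpret differently.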
Regex in JavaScript
Testing for a Match
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const isValid = emailRegex.test('user@example.com'); // true
Finding All Matches
const text = 'Contact us at info@example.com or support@example.com';
const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const emails = text.match(emailRegex);
// ['info@example.com', 'support@example.com']
Replacing Text
const phone = '123-456-7890';
const formatted = phone.replace(/(\d{3})-(\d{3})-(\d{4})/, '($1) $2-$3');
// '(123) 456-7890'
Extracting Groups
const url = 'https://www.example.com/path';
const match = url.match(/^(https?):\/\/(www\.)?(.+)$/);
// match[1] = 'https'
// match[2] = 'www.'
// match[3] = 'example.com/path'
Named Capture Groups (ES2018+)
Named groups make regex self-documenting:
const dateRegex = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const match = '2024-02-15'.match(dateRegex);
console.log(match.groups.year); // '2024'
console.log(match.groups.month); // '02'
console.log(match.groups.day); // '15'
Regex in Python
Python's re module provides a powerful regex engine:
import re
# Test for a match
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
is_valid = bool(re.match(email_pattern, 'user@example.com'))
# Find all matches
text = 'Contact info@example.com or support@example.com'
emails = re.findall(r'[\w.+-]+@[\w.-]+\.\w{2,}', text)
# ['info@example.com', 'support@example.com']
# Replace
phone = '123-456-7890'
formatted = re.sub(r'(\d{3})-(\d{3})-(\d{4})', r'(\1) \2-\3', phone)
# '(123) 456-7890'
# Named groups
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2024-02-15')
print(match.group('year')) # '2024'
More Real-World Patterns
Date Matching (ISO 8601)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
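Matches well-formed dates like 2024-02-15 and rejects out-of-range values like 2024-13-45. It does not check month lengths or leap years, so 2024-02-31 still matches.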
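A regex can check the shape of a date, but calendar rules are easier to leave to a date library. The Python sketch below pairs the pattern with datetime.date; the year is wrapped in a capture group here (an addition to the pattern above) so all three parts can be read back, and is_real_date is a name assumed for the example.

```python
import re
from datetime import date

# Same pattern as above, with the year also captured
ISO_DATE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_date(text: str) -> bool:
    """Hypothetical helper: shape check by regex, calendar check by datetime."""
    m = ISO_DATE.match(text)
    if m is None:
        return False
    year, month, day = (int(part) for part in m.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

print(ISO_DATE.match("2024-02-31") is not None)  # True: the shape is fine
print(is_real_date("2024-02-31"))                # False: February has no 31st
print(is_real_date("2024-02-29"))                # True: 2024 is a leap year
print(is_real_date("2023-02-29"))                # False
```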
Credit Card (basic pattern)
^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$
Matches Visa (starts with 4), Mastercard (51-55), and Amex (34/37).
HTML Tag Extraction
<(\w+)([^>]*)>(.*?)<\/\1>
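Captures the tag name, attributes, and content. The lazy quantifier .*? stops at the first matching closing tag instead of running on to a later one. This works for simple, non-nested markup; for arbitrary HTML, use a proper parser.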
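The three capture groups can be inspected with a short Python sketch. The HTML snippets are invented for illustration.

```python
import re

TAG = re.compile(r"<(\w+)([^>]*)>(.*?)</\1>")

html = '<a href="/docs">Docs</a> <b>bold</b>'
found = [m.groups() for m in TAG.finditer(html)]
print(found)
# [('a', ' href="/docs"', 'Docs'), ('b', '', 'bold')]

# Nested tags of the same name show the limit of this approach:
# the lazy match ends at the first closing tag, not the matching one.
nested = TAG.search("<div><div>x</div></div>")
print(nested.groups())
# ('div', '', '<div>x')
```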
Log File Parsing
^\[(?<timestamp>[^\]]+)\]\s+(?<level>\w+):\s+(?<message>.+)$
Extracts timestamp, log level, and message from lines like:
[2024-02-15 14:30:00] ERROR: Connection timeout
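The pattern above uses the (?<name>...) spelling for named groups. Python's re module spells them (?P<name>...), so a Python version of the same parser might look like the sketch below. The function name parse_line is an assumption made for the example.

```python
import re

LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<message>.+)$"
)

def parse_line(line: str):
    """Hypothetical helper: return the named fields as a dict, or None."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

print(parse_line("[2024-02-15 14:30:00] ERROR: Connection timeout"))
# {'timestamp': '2024-02-15 14:30:00', 'level': 'ERROR', 'message': 'Connection timeout'}
print(parse_line("not a log line"))
# None
```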
Advanced Regex Features
Lookahead and Lookbehind
(?=pattern) → Positive lookahead (must be followed by)
(?!pattern) → Negative lookahead (must NOT be followed by)
(?<=pattern) → Positive lookbehind (must be preceded by)
(?<!pattern) → Negative lookbehind (must NOT be preceded by)
Example — match "$" only when followed by digits:
const text = 'Price: $100, Code: ABC';
const matches = text.match(/\$(?=\d+)/g); // ['$']
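The other three lookaround forms work the same way: they test the surrounding text without consuming it. A minimal Python sketch, with sample strings invented for illustration:

```python
import re

# Positive lookbehind: digits that come right after a "$"
amounts = re.findall(r"(?<=\$)\d+", "Price: $100, Code: ABC, Refund: $25")
print(amounts)  # ['100', '25']

# Negative lookahead: "foo" not followed by "bar"
starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz foo")]
print(starts)  # [7, 14]

# Negative lookbehind: whole numbers not preceded by "$"
# The \b keeps the engine from matching the "00" inside "$100".
plain = re.findall(r"(?<!\$)\b\d+", "$100 and 200 items")
print(plain)  # ['200']
```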
Non-Capturing Groups
(?:pattern)
Groups without capturing — useful for alternation:
const regex = /(?:https?|ftp):\/\/(.+)/;
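The effect on group numbering is easiest to see side by side. In this Python sketch, the same URL is matched with a capturing and a non-capturing scheme group; the URL is made up for the example.

```python
import re

url = "ftp://files.example.com"

capturing = re.match(r"(https?|ftp)://(.+)", url)
non_capturing = re.match(r"(?:https?|ftp)://(.+)", url)

print(capturing.groups())      # ('ftp', 'files.example.com')
print(non_capturing.groups())  # ('files.example.com',)
```

With the non-capturing form, the part after the scheme is group 1, so adding or removing alternation groups later does not shift the numbers of the groups you actually use.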
Common Mistakes
Greedy vs Lazy
By default, quantifiers are greedy (match as much as possible):
const text = '<b>bold</b> and <i>italic</i>';
const greedy = text.match(/<.*>/)[0]; // '<b>bold</b> and <i>italic</i>'
const lazy = text.match(/<.*?>/g); // ['<b>', '</b>', '<i>', '</i>']
Forgetting to Escape
Special characters must be escaped:
// Wrong: the unescaped dot matches any character
const loose = /file.txt/; // Matches 'file.txt', 'fileatxt', 'file-txt', etc.
// Right: the escaped dot matches only a literal '.'
const strict = /file\.txt/; // Matches only 'file.txt'
Performance: Catastrophic Backtracking
Poorly written regex can cause exponential time complexity. Consider:
// DANGEROUS: nested quantifiers can hang on non-matching input
const bad = /^(a+)+$/.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab');
// SAFE: drop the nested quantifier; ^a+$ accepts the same strings in linear time
// (JavaScript has no atomic groups or possessive quantifiers)
const good = /^a+$/.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab');
Rules to avoid catastrophic backtracking:
- Avoid nested quantifiers like (a+)+
- Be specific: use [a-z]+ instead of .+
- Use atomic groups (?>...) when available
- Test with long non-matching input before deploying
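The last rule can be turned into a small timing experiment. The Python sketch below times the nested and the flat pattern on growing non-matching inputs; the helper name time_match is an assumption, the input sizes are kept small on purpose, and the exact timings will vary by machine. Python's re module gained atomic groups and possessive quantifiers in version 3.11, but this sketch avoids them so it runs on older versions too.

```python
import re
import time

def time_match(pattern: str, text: str) -> float:
    """Hypothetical helper: seconds taken by one re.match call."""
    start = time.perf_counter()
    re.match(pattern, text)
    return time.perf_counter() - start

for n in (12, 15, 18):
    text = "a" * n + "b"  # never matches: the trailing "b" breaks ^...$
    nested = time_match(r"^(a+)+$", text)
    flat = time_match(r"^a+$", text)
    print(f"n={n:2d}  nested={nested:.5f}s  flat={flat:.6f}s")
```

On a backtracking engine, the nested timings should grow roughly by a constant factor for every extra character, while the flat pattern stays near zero.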
Debugging Regex
Common Debugging Techniques
- Use online testers: regex101.com, regexr.com — they show step-by-step matching
- Break complex patterns into parts: test each sub-pattern separately
- Use comments in verbose mode (Python):
pattern = re.compile(r'''
^ # Start of string
[a-zA-Z0-9._%+-]+ # Username
@ # At symbol
[a-zA-Z0-9.-]+ # Domain
\. # Dot
[a-zA-Z]{2,} # TLD
$ # End
''', re.VERBOSE)
- Test edge cases: empty strings, very long input, special characters, Unicode
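Edge-case testing is easier to keep up when the cases live in a table next to the pattern. The Python sketch below runs the email pattern from this guide against a handful of inputs; the case list and the failing_cases helper are assumptions made for illustration, not a complete test suite.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# (input, expected result) pairs covering the edge cases listed above
CASES = [
    ("user@example.com", True),
    ("", False),                           # empty string
    ("no-at-sign.example.com", False),     # missing @
    ("user@localhost", False),             # no dot in the domain
    ("a" * 5000 + "@example.com", True),   # very long input
    ("üser@example.com", False),           # non-ASCII letter
]

def failing_cases(pattern, cases):
    """Return the cases where the pattern disagrees with the expectation."""
    return [(text, expected) for text, expected in cases
            if (pattern.match(text) is not None) != expected]

print(failing_cases(EMAIL, CASES))  # [] when every case behaves as expected

# A Python-specific surprise: $ also matches just before a trailing newline
print(EMAIL.match("user@example.com\n") is not None)      # True
print(EMAIL.fullmatch("user@example.com\n") is not None)  # False
```

If trailing newlines should be rejected, re.fullmatch or the \Z anchor is the safer choice in Python.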
Conclusion
Regular expressions are indispensable for text processing. With practice, you'll be able to write complex patterns confidently. The key is to start simple, test incrementally, and always consider edge cases and performance.