Tutorial · 2024-02-15

Regex Explained: A Practical Guide to Regular Expressions

Master regular expressions with practical examples. Learn regex syntax, common patterns, and how to use regex for validation, search, and data extraction.

#regex #pattern-matching #validation #web-development #tutorial

Regular expressions (regex) let you search, match, and manipulate text with compact, precise patterns, from validating email addresses to parsing log files and extracting structured data from unstructured text.

This guide will take you from regex beginner to confident user, covering syntax, real-world patterns, multiple programming languages, performance pitfalls, and debugging techniques.

What is a Regular Expression?

A regular expression is a sequence of characters that defines a search pattern. It's used to:

  • Search for text matching a pattern
  • Validate input (emails, phone numbers, passwords)
  • Extract specific parts of text
  • Replace text based on patterns

Basic Regex Syntax

Literal Characters

The simplest regex is a literal string:

/hello/

This matches the exact text "hello" wherever it appears in the input. Matching is case-sensitive by default, so "Hello" does not match.

Character Classes

Square brackets define a set of characters to match:

/[aeiou]/     → matches any vowel
/[0-9]/       → matches any digit
/[a-zA-Z]/    → matches any ASCII letter
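
As a quick check of the three classes above, here is a minimal Python sketch that lists every character each class picks out. The sample string is made up for illustration.

```python
import re

# Made-up sample text; each findall returns the single characters the class matches.
sample = "Room 42, Floor B"

vowels = re.findall(r"[aeiou]", sample)     # lowercase vowels only
digits = re.findall(r"[0-9]", sample)       # any digit
letters = re.findall(r"[a-zA-Z]", sample)   # any ASCII letter, either case

print(vowels)            # ['o', 'o', 'o', 'o']
print(digits)            # ['4', '2']
print("".join(letters))  # RoomFloorB
```

Note that a class always matches exactly one character; repetition comes from quantifiers.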

Shorthand Character Classes

Pattern   Meaning           ASCII equivalent
\d        Digit             [0-9]
\w        Word character    [a-zA-Z0-9_]
\s        Whitespace        [ \t\n\r\f\v]
\D        Non-digit         [^0-9]
\W        Non-word          [^a-zA-Z0-9_]
\S        Non-whitespace    [^ \t\n\r\f\v]
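
The shorthand classes are easiest to understand by running them against one line of text. The following is a small Python sketch; the input line is invented for the example.

```python
import re

# Invented log-like line containing a tab, an underscore, and digits.
line = "user_42 logged in\tat 09:15"

numbers = re.findall(r"\d+", line)   # runs of digits
words = re.findall(r"\w+", line)     # runs of word characters (underscore included)
fields = re.split(r"\s+", line)      # split on any whitespace, including the tab
only_digits = re.sub(r"\D", "", line)  # delete everything that is not a digit

print(numbers)      # ['42', '09', '15']
print(words)        # ['user_42', 'logged', 'in', 'at', '09', '15']
print(fields)       # ['user_42', 'logged', 'in', 'at', '09:15']
print(only_digits)  # 420915
```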

Quantifiers

Quantifiers specify how many times to match:

Pattern   Meaning
*         Zero or more
+         One or more
?         Zero or one
{3}       Exactly 3
{2,5}     Between 2 and 5
{3,}      3 or more
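
To see each quantifier accept or reject input, this Python sketch uses a small hypothetical helper, matches(), that requires the whole string to match:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"ab*", "a"))          # True: zero b's is allowed
print(matches(r"ab+", "a"))          # False: at least one b required
print(matches(r"colou?r", "color"))  # True: the u is optional
print(matches(r"\d{3}", "1234"))     # False: exactly three digits
print(matches(r"\d{2,5}", "1234"))   # True: four is within 2..5
print(matches(r"\d{3,}", "12"))      # False: fewer than three digits
```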

Common Regex Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:

  • ^ — Start of string
  • [a-zA-Z0-9._%+-]+ — Username (one or more allowed characters)
  • @ — At symbol
  • [a-zA-Z0-9.-]+ — Domain name
  • \. — Literal dot
  • [a-zA-Z]{2,} — Top-level domain (2+ letters)
  • $ — End of string
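
A short Python sketch of this pattern in use follows. The function name is_plausible_email is an assumption for illustration, and "plausible" is deliberate: the pattern is a pragmatic filter, not a full check of what mail servers accept.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_plausible_email(value: str) -> bool:
    # fullmatch is used because in Python "$" also matches just before a
    # trailing newline, so match() alone would accept "user@example.com\n".
    return EMAIL_RE.fullmatch(value) is not None

print(is_plausible_email("user@example.com"))                # True
print(is_plausible_email("first.last+tag@mail.example.org")) # True
print(is_plausible_email("user@localhost"))                  # False: no dot + TLD
print(is_plausible_email("user@example.c"))                  # False: TLD too short
print(is_plausible_email("a@b..com"))                        # True: the pattern is loose
```

The last case shows a known gap: consecutive dots in the domain still pass, so treat a match as "worth sending a confirmation email to" rather than proof of validity.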

URL Matching

^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
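
Here is a minimal Python sketch that runs the URL pattern against a few hand-picked inputs. The helper name looks_like_url and the sample URLs are assumptions for illustration.

```python
import re

URL_RE = re.compile(
    r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}"
    r"\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$"
)

def looks_like_url(value: str) -> bool:
    """Hypothetical helper: True if the whole string fits the pattern."""
    return URL_RE.fullmatch(value) is not None

print(looks_like_url("https://www.example.com/path?x=1"))  # True
print(looks_like_url("http://example.org"))                # True
print(looks_like_url("ftp://example.com"))                 # False: only http/https
print(looks_like_url("https://example"))                   # False: no dot + suffix
print(looks_like_url("https://example.com/a b"))           # False: space not allowed
```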

Phone Number (US)

^(\+\d{1,2}\s?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$

Matches formats like:

  • (123) 456-7890
  • 123-456-7890
  • 123.456.7890
  • +1 (123) 456-7890
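
The formats listed above can be checked with a short Python sketch. The extra samples beyond the four listed ones are assumptions added to show where the pattern is stricter or looser than you might expect.

```python
import re

US_PHONE_RE = re.compile(r"^(\+\d{1,2}\s?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")

samples = [
    "(123) 456-7890",
    "123-456-7890",
    "123.456.7890",
    "+1 (123) 456-7890",
    "1234567890",      # separators are all optional
    "123-456-789",     # one digit short
    "(123 456-7890",   # unbalanced parenthesis
]

results = {s: US_PHONE_RE.fullmatch(s) is not None for s in samples}
for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
```

The unbalanced "(123 456-7890" is accepted because the opening and closing parentheses are each optional on their own; the pattern does not require them to appear as a pair.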

Password Strength

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Requires:

  • At least one lowercase letter
  • At least one uppercase letter
  • At least one digit
  • At least one special character
  • Minimum 8 characters
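
The single pattern answers only yes or no. If you want to tell the user which requirement failed, one option is to check each rule separately. The sketch below does both in Python; unmet_requirements and the rule labels are hypothetical names, not part of any library.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# One simple pattern per requirement (labels are illustrative).
CHECKS = [
    ("lowercase letter", re.compile(r"[a-z]")),
    ("uppercase letter", re.compile(r"[A-Z]")),
    ("digit", re.compile(r"\d")),
    ("special character", re.compile(r"[@$!%*?&]")),
]
ALLOWED_RE = re.compile(r"[A-Za-z\d@$!%*?&]{8,}")

def unmet_requirements(password: str) -> list:
    """Return the labels of every rule the password fails."""
    unmet = [label for label, rx in CHECKS if not rx.search(password)]
    if not ALLOWED_RE.fullmatch(password):
        unmet.append("8+ characters from the allowed set")
    return unmet

print(unmet_requirements("Passw0rd!"))   # []
print(unmet_requirements("password1!"))  # ['uppercase letter']
print(unmet_requirements("Passw0rd!#"))  # ['8+ characters from the allowed set']
```

The third case is worth noticing: "#" is not in the final character class, so the original pattern rejects passwords containing any symbol outside @$!%*?&.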

IP Address (IPv4)

^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
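
A minimal Python sketch of the IPv4 pattern follows, next to the standard library's ipaddress module as an alternative that avoids hand-written regex. The two helper function names are assumptions for illustration.

```python
import ipaddress
import re

IPV4_RE = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def is_ipv4_regex(value: str) -> bool:
    return IPV4_RE.fullmatch(value) is not None

def is_ipv4_stdlib(value: str) -> bool:
    # IPv4Address raises a ValueError subclass for anything it cannot parse.
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True

print(is_ipv4_regex("192.168.1.1"))   # True
print(is_ipv4_regex("256.1.1.1"))     # False: 256 is out of range
print(is_ipv4_regex("1.2.3"))         # False: only three octets
print(is_ipv4_regex("01.02.03.004"))  # True: leading zeros are accepted
print(is_ipv4_stdlib("192.168.1.1"))  # True
print(is_ipv4_stdlib("256.1.1.1"))    # False
```

If leading zeros should be rejected, the regex needs tightening; the pattern as written allows them.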

Regex in JavaScript

Testing for a Match

const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const isValid = emailRegex.test('user@example.com'); // true

Finding All Matches

const text = 'Contact us at info@example.com or support@example.com';
const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const emails = text.match(emailRegex);
// ['info@example.com', 'support@example.com']

Replacing Text

const phone = '123-456-7890';
const formatted = phone.replace(/(\d{3})-(\d{3})-(\d{4})/, '($1) $2-$3');
// '(123) 456-7890'

Extracting Groups

const url = 'https://www.example.com/path';
const match = url.match(/^(https?):\/\/(www\.)?(.+)$/);
// match[1] = 'https'
// match[2] = 'www.'
// match[3] = 'example.com/path'

Named Capture Groups (ES2018+)

Named groups make regex self-documenting:

const dateRegex = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const match = '2024-02-15'.match(dateRegex);

console.log(match.groups.year);  // '2024'
console.log(match.groups.month); // '02'
console.log(match.groups.day);   // '15'

Regex in Python

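Python's built-in re module covers the same operations: testing, finding all matches, replacing, and named groups. Note that Python writes named groups as (?P<name>...) rather than (?<name>...):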

import re

# Test for a match
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
is_valid = bool(re.match(email_pattern, 'user@example.com'))

# Find all matches
text = 'Contact info@example.com or support@example.com'
emails = re.findall(r'[\w.+-]+@[\w.-]+\.\w{2,}', text)
# ['info@example.com', 'support@example.com']

# Replace
phone = '123-456-7890'
formatted = re.sub(r'(\d{3})-(\d{3})-(\d{4})', r'(\1) \2-\3', phone)
# '(123) 456-7890'

# Named groups
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2024-02-15')
print(match.group('year'))  # '2024'

More Real-World Patterns

Date Matching (ISO 8601)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

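Matches well-formed dates like 2024-02-15 and rejects out-of-range values like 2024-13-45. It checks the month and day ranges independently, so impossible dates such as 2024-02-31 still pass.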
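
A regex can only check the shape of a date. One way to also reject calendar-impossible dates is to hand the captured numbers to datetime.date, as in this Python sketch. The capture group around the year and the function name is_real_date are additions made for the example.

```python
import re
from datetime import date

# Same pattern as above, with the year wrapped in a group so it can be extracted.
ISO_DATE_RE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_date(text: str) -> bool:
    m = ISO_DATE_RE.fullmatch(text)
    if m is None:
        return False
    year, month, day = (int(part) for part in m.groups())
    try:
        date(year, month, day)  # raises ValueError for dates like Feb 31
    except ValueError:
        return False
    return True

print(ISO_DATE_RE.fullmatch("2024-02-31") is not None)  # True: shape is fine
print(is_real_date("2024-02-31"))                       # False: no such day
print(is_real_date("2024-02-29"))                       # True: 2024 is a leap year
print(is_real_date("2023-02-29"))                       # False
```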

Credit Card (basic pattern)

^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$

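Matches Visa (starts with 4; 13 or 16 digits), Mastercard (starts with 51-55; 16 digits), and Amex (starts with 34 or 37; 15 digits). It checks prefix and length only, not whether the number is actually valid.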
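
Splitting the alternation into one pattern per network makes it possible to report which network matched. The Python sketch below does that; card_network and the NETWORKS table are hypothetical names, and the inputs are sample numbers, not real cards.

```python
import re

# One pattern per branch of the combined regex above.
NETWORKS = {
    "visa": re.compile(r"4[0-9]{12}(?:[0-9]{3})?"),
    "mastercard": re.compile(r"5[1-5][0-9]{14}"),
    "amex": re.compile(r"3[47][0-9]{13}"),
}

def card_network(raw: str):
    """Return the network name, or None if no pattern fits."""
    digits = re.sub(r"[\s-]", "", raw)  # tolerate spaces and dashes in the input
    for name, rx in NETWORKS.items():
        if rx.fullmatch(digits):
            return name
    return None

print(card_network("4111 1111 1111 1111"))  # visa
print(card_network("5555-5555-5555-4444"))  # mastercard
print(card_network("378282246310005"))      # amex
print(card_network("6011111111111117"))     # None
```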

HTML Tag Extraction

<(\w+)([^>]*)>(.*?)<\/\1>

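Captures the tag name, attributes, and content. The lazy quantifier .*? stops at the first matching closing tag instead of running on to the last one. The pattern does not handle nested tags of the same name.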
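
The following Python sketch applies the pattern to two small, made-up HTML snippets: one where it works and one where nesting trips it up.

```python
import re

TAG_RE = re.compile(r"<(\w+)([^>]*)>(.*?)</\1>")

flat = '<p class="x">Hello</p><b>bold</b>'
nested = "<div><div>inner</div></div>"

# With three groups, findall returns (tag, attributes, content) tuples.
flat_tags = TAG_RE.findall(flat)
nested_tags = TAG_RE.findall(nested)

print(flat_tags)    # [('p', ' class="x"', 'Hello'), ('b', '', 'bold')]
print(nested_tags)  # [('div', '', '<div>inner')]
```

The nested case pairs the outer opening tag with the inner closing tag, which is why a real parser (for example Python's html.parser module) is usually the safer choice for arbitrary HTML.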

Log File Parsing

^\[(?<timestamp>[^\]]+)\]\s+(?<level>\w+):\s+(?<message>.+)$

Extracts timestamp, log level, and message from lines like:

[2024-02-15 14:30:00] ERROR: Connection timeout
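
In Python the same pattern needs the (?P<name>...) spelling for named groups. This is a minimal sketch; parse_line is a hypothetical helper name.

```python
import re

# Python spelling of the named groups: (?P<name>...) instead of (?<name>...).
LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<message>.+)$"
)

def parse_line(line: str):
    """Return a dict of the named groups, or None if the line does not fit."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

entry = parse_line("[2024-02-15 14:30:00] ERROR: Connection timeout")
print(entry["timestamp"])  # 2024-02-15 14:30:00
print(entry["level"])      # ERROR
print(entry["message"])    # Connection timeout
print(parse_line("no brackets here"))  # None
```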

Advanced Regex Features

Lookahead and Lookbehind

(?=pattern)    → Positive lookahead (must be followed by)
(?!pattern)    → Negative lookahead (must NOT be followed by)
(?<=pattern)   → Positive lookbehind (must be preceded by)
(?<!pattern)   → Negative lookbehind (must NOT be preceded by)

Example — match "$" only when followed by digits:

const text = 'Price: $100, Code: ABC';
const matches = text.match(/\$(?=\d+)/g); // ['$']
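
All four lookarounds can be tried in Python as well. The sketch below uses invented sample strings; note that Python's re module requires lookbehind patterns to have a fixed width, which the ones here do.

```python
import re

prices = "Price: $100, Qty: 3, Fee: $25"
discount = "20% off, 5 items"

after_dollar = re.findall(r"(?<=\$)\d+", prices)       # digits preceded by "$"
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)  # numbers NOT preceded by "$"
before_percent = re.findall(r"\d+(?=%)", discount)      # digits followed by "%"
not_before_percent = re.findall(r"\b\d+\b(?!%)", discount)  # numbers NOT followed by "%"

print(after_dollar)        # ['100', '25']
print(not_after_dollar)    # ['3']
print(before_percent)      # ['20']
print(not_before_percent)  # ['5']
```

The \b anchors in the negative cases matter: without them the engine can backtrack to a shorter run of digits (for example "2" out of "20") that technically satisfies the negative condition.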

Non-Capturing Groups

(?:pattern)

Groups without capturing — useful for alternation:

const regex = /(?:https?|ftp):\/\/(.+)/;
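
The effect on group numbering is easy to see by comparing the two forms side by side. This Python sketch uses a made-up FTP URL:

```python
import re

url = "ftp://files.example.com/pub"

non_capturing = re.match(r"(?:https?|ftp)://(.+)", url)
capturing = re.match(r"(https?|ftp)://(.+)", url)

print(non_capturing.groups())  # ('files.example.com/pub',)
print(capturing.groups())      # ('ftp', 'files.example.com/pub')
```

With (?:...) the part you care about stays in group 1; with plain parentheses the scheme takes group 1 and everything else shifts by one.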

Common Mistakes

Greedy vs Lazy

By default, quantifiers are greedy (match as much as possible):

const text = '<b>bold</b> and <i>italic</i>';
const greedy = text.match(/<.*>/)[0];   // '<b>bold</b> and <i>italic</i>'
const lazy = text.match(/<.*?>/g);      // ['<b>', '</b>', '<i>', '</i>']

Forgetting to Escape

Special characters must be escaped:

// Wrong: the unescaped dot matches any character
const loose = /file.txt/;   // Matches 'file.txt', 'fileatxt', etc.

// Right: the escaped dot matches only a literal '.'
const exact = /file\.txt/;  // Matches only 'file.txt'
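
When the text to match comes from a variable rather than being typed into the pattern, escaping by hand is error-prone. In Python, re.escape does it for you, as this small sketch shows (the filename is just an example value):

```python
import re

filename = "file.txt"

unescaped = re.compile(filename)            # "." still means "any character"
escaped = re.compile(re.escape(filename))   # "." is now a literal dot

loose = unescaped.fullmatch("fileatxt") is not None
strict = escaped.fullmatch("fileatxt") is not None
exact = escaped.fullmatch("file.txt") is not None

print(loose)   # True
print(strict)  # False
print(exact)   # True
```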

Performance: Catastrophic Backtracking

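A pattern with nested or overlapping quantifiers can take exponential time on input that almost matches, because a backtracking engine retries every way of dividing the text between the quantifiers before giving up. Consider: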

// DANGEROUS: can hang on non-matching input
const bad = /^(a+)+$/.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab');

// SAFE: drop the nested quantifier; a+ alone matches the same strings
const good = /^a+$/.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab');

Rules to avoid catastrophic backtracking:

  1. Avoid nested quantifiers like (a+)+
  2. Be specific: use [a-z]+ instead of .+
  3. Use atomic groups (?>...) where the engine supports them (JavaScript does not; Python does from 3.11)
  4. Test with long non-matching input before deploying
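
Rule 4 can be sketched in Python as a tiny timing harness. The helper name timed and the input sizes are assumptions; the sizes are kept deliberately small, since on a backtracking engine each extra character typically makes the nested pattern noticeably slower.

```python
import re
import time

NESTED = re.compile(r"^(a+)+$")  # nested quantifier from the example above
FLAT = re.compile(r"^a+$")       # accepts the same strings, no nesting

def timed(pattern, text):
    """Return (matched, seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

results = []
for n in (8, 12, 16):  # small sizes on purpose; do not try this with large n
    text = "a" * n + "b"  # almost matches, then fails on the final character
    nested_ok, nested_s = timed(NESTED, text)
    flat_ok, flat_s = timed(FLAT, text)
    results.append((n, nested_ok, flat_ok))
    print(f"n={n:2d}  nested={nested_s:.6f}s  flat={flat_s:.6f}s")
```

Both patterns return the same answer on every input; only the time taken differs. Actual timings depend on the machine and engine, so compare the trend rather than the absolute numbers.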

Debugging Regex

Common Debugging Techniques

  1. Use online testers: regex101.com, regexr.com — they show step-by-step matching
  2. Break complex patterns into parts: test each sub-pattern separately
  3. Use comments in verbose mode (Python):
import re
pattern = re.compile(r'''
    ^                   # Start of string
    [a-zA-Z0-9._%+-]+  # Username
    @                   # At symbol
    [a-zA-Z0-9.-]+     # Domain
    \.                  # Dot
    [a-zA-Z]{2,}       # TLD
    $                   # End
''', re.VERBOSE)
  4. Test edge cases: empty strings, very long input, special characters, Unicode
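
Technique 2 can be sketched in Python by naming each sub-pattern, checking it on its own, and only then assembling the full pattern. The PARTS dictionary and its keys are illustrative names.

```python
import re

# Each piece of the email pattern, named so it can be tested in isolation.
PARTS = {
    "user": r"[a-zA-Z0-9._%+-]+",
    "domain": r"[a-zA-Z0-9.-]+",
    "tld": r"[a-zA-Z]{2,}",
}

def part_matches(name: str, text: str) -> bool:
    return re.fullmatch(PARTS[name], text) is not None

print(part_matches("user", "first.last+tag"))  # True
print(part_matches("domain", "mail.example"))  # True
print(part_matches("tld", "org"))              # True
print(part_matches("tld", "c"))                # False: needs 2+ letters

# Assemble the full pattern once every piece behaves as expected.
EMAIL_RE = re.compile(PARTS["user"] + "@" + PARTS["domain"] + r"\." + PARTS["tld"])
print(EMAIL_RE.fullmatch("first.last+tag@mail.example.org") is not None)  # True
```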

Conclusion

Regular expressions are indispensable for text processing. With practice, you'll be able to write complex patterns confidently. The key is to start simple, test incrementally, and always consider edge cases and performance.

Build and test regex patterns instantly with our free Regex Generator — real-time matching with visual feedback, completely in your browser with no data sent to any server.
