Python re Module

The re module in Python provides support for working with regular expressions, which are patterns used to match character combinations in strings. Regular expressions are used for searching, matching, and manipulating strings based on specific patterns.

Table of Contents

  1. Introduction
  2. re Module Functions
    • re.compile
    • re.search
    • re.match
    • re.fullmatch
    • re.split
    • re.findall
    • re.finditer
    • re.sub
    • re.subn
  3. Regular Expression Syntax
  4. Examples
    • Basic Usage
    • Using Groups and Capturing
    • Using Flags
    • Advanced Substitution
  5. Real-World Use Case
  6. Conclusion

Introduction

The re module is part of Python's standard library, so it needs no separate installation. It lets you describe a pattern once and then use it to search for text, check whether a string has a given shape, split strings, and replace substrings.

re Module Functions

re.compile

Compiles a regular expression pattern into a pattern object, which can be reused for matching without compiling the pattern again.

import re

pattern = re.compile(r'\d+')  # matches one or more digits
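
A compiled pattern offers the same operations as the module-level functions, exposed as methods. The following is a minimal sketch of reusing one compiled pattern; the sample strings and variable names are made up for illustration.

```python
import re

# Compile once, reuse many times.
digits = re.compile(r'\d+')

# The compiled object has search, findall, sub, and so on as methods.
first = digits.search('Order 66 shipped')
all_numbers = digits.findall('10 apples, 20 pears')
masked = digits.sub('#', 'Room 101')

print(first.group())   # 66
print(all_numbers)     # ['10', '20']
print(masked)          # Room #
```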

re.search

Scans the string for the first location where the pattern matches. Returns a match object, or None if there is no match anywhere in the string.

import re

result = re.search(r'\d+', 'Sample123String')
print(result.group())

Output:

123
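
Because re.search returns None when nothing matches, calling .group() on the result directly raises an AttributeError for inputs without a match. A small sketch of guarding against that follows; the helper name first_number is hypothetical.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r'\d+', text)
    if match is None:
        return None
    return match.group()

print(first_number('Sample123String'))  # 123
print(first_number('NoDigitsHere'))     # None

# A match object also records where the match was found.
match = re.search(r'\d+', 'Sample123String')
print(match.span())  # (6, 9)
```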

re.match

Checks for a match only at the beginning of the string. Returns a match object, or None if the string does not start with the pattern.

import re

result = re.match(r'\d+', '123Sample')
print(result.group())

Output:

123
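
The difference from re.search shows up when the digits are not at the start of the string. A short comparison, using an illustrative input:

```python
import re

text = 'Sample123'

at_start = re.match(r'\d+', text)    # None: the string starts with 'S'
anywhere = re.search(r'\d+', text)   # search scans the whole string

print(at_start)           # None
print(anywhere.group())   # 123
```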

re.fullmatch

Matches only if the entire string matches the pattern. Returns a match object, or None if any part of the string is left unmatched.

import re

result = re.fullmatch(r'\d+', '123')
print(result.group())

Output:

123
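
To see how this differs from re.match, consider an input with trailing characters the pattern does not cover. A brief sketch with an illustrative string:

```python
import re

text = '123abc'

prefix = re.match(r'\d+', text)      # matches the leading '123'
whole = re.fullmatch(r'\d+', text)   # None: 'abc' is left over

print(prefix.group())  # 123
print(whole)           # None
```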

re.split

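Splits the string at each occurrence of the pattern and returns the pieces as a list. When the string ends with a match, as in the example below, the last element of the list is an empty string.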

import re

result = re.split(r'\d+', 'Sample123String456Another789')
print(result)

Output:

['Sample', 'String', 'Another', '']
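
Two variations are often useful: limiting the number of splits with maxsplit, and keeping the separators by wrapping the pattern in a capturing group. A minimal sketch on the same input:

```python
import re

text = 'Sample123String456Another789'

# maxsplit limits the number of splits; the rest of the string is kept whole.
limited = re.split(r'\d+', text, maxsplit=1)
print(limited)
# ['Sample', 'String456Another789']

# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split(r'(\d+)', text)
print(with_separators)
# ['Sample', '123', 'String', '456', 'Another', '789', '']
```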

re.findall

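Finds all non-overlapping matches of the pattern in the string. Returns a list of matched strings; if the pattern contains capturing groups, the list holds the group contents instead of the whole matches.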

import re

result = re.findall(r'\d+', 'Sample123String456Another789')
print(result)

Output:

['123', '456', '789']
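
The shape of the returned list depends on how many capturing groups the pattern has. The sketch below illustrates the three cases with a made-up input string:

```python
import re

text = 'width=80, height=24'

# No groups: each item is the whole match.
whole = re.findall(r'\w+=\d+', text)
print(whole)    # ['width=80', 'height=24']

# One group: each item is that group's text.
values = re.findall(r'\w+=(\d+)', text)
print(values)   # ['80', '24']

# Several groups: each item is a tuple of the groups.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)    # [('width', '80'), ('height', '24')]
```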

re.finditer

Finds all non-overlapping matches of the pattern in the string. Returns an iterator yielding match objects.

import re

result = re.finditer(r'\d+', 'Sample123String456Another789')
for match in result:
    print(match.group())

Output:

123
456
789
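
Since re.finditer yields match objects rather than plain strings, it is the natural choice when the position of each match matters. A short sketch collecting the text and span of every match:

```python
import re

text = 'Sample123String456Another789'

# Each match object carries the matched text and its position.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]
print(positions)
# [('123', 6, 9), ('456', 15, 18), ('789', 25, 28)]
```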

re.sub

Replaces every non-overlapping occurrence of the pattern with a replacement string and returns the resulting string.

import re

result = re.sub(r'\d+', '#', 'Sample123String456Another789')
print(result)

Output:

Sample#String#Another#
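
re.sub also accepts a count argument to limit the number of replacements, and the replacement string can refer back to captured groups. A minimal sketch of both, on the same input:

```python
import re

text = 'Sample123String456Another789'

# count limits how many matches are replaced.
first_only = re.sub(r'\d+', '#', text, count=1)
print(first_only)
# Sample#String456Another789

# \1 in the replacement refers to the first captured group.
bracketed = re.sub(r'(\d+)', r'<\1>', text)
print(bracketed)
# Sample<123>String<456>Another<789>
```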

re.subn

Works like re.sub, but returns a tuple containing the new string and the number of replacements made.

import re

result = re.subn(r'\d+', '#', 'Sample123String456Another789')
print(result)

Output:

('Sample#String#Another#', 3)
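
The tuple is usually unpacked straight away, and the count makes it easy to tell whether anything changed. A brief sketch:

```python
import re

new_text, count = re.subn(r'\d+', '#', 'Sample123String456Another789')

if count:
    print(f'{count} replacements made: {new_text}')
else:
    print('Nothing to replace')
# 3 replacements made: Sample#String#Another#
```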

Regular Expression Syntax

Regular expressions use special characters to define patterns. Here are some commonly used special characters:

  • .: Matches any character except a newline (any character at all when the re.DOTALL flag is set).
  • ^: Matches the start of the string (and the start of each line when the re.MULTILINE flag is set).
  • $: Matches the end of the string, or just before a newline at the end of the string.
  • *: Matches 0 or more repetitions of the preceding pattern, as many as possible.
  • +: Matches 1 or more repetitions of the preceding pattern, as many as possible.
  • ?: Matches 0 or 1 repetition of the preceding pattern.
  • {n}: Matches exactly n repetitions of the preceding pattern.
  • {n,}: Matches n or more repetitions of the preceding pattern.
  • {n,m}: Matches between n and m repetitions (inclusive) of the preceding pattern.
  • []: Matches any one of the characters inside the brackets.
  • |: Matches either the pattern before or the pattern after the |.
  • (): Groups part of the pattern and captures the text it matches for later extraction or substitution.
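
Several of the special characters above can be seen in action in the following sketch. The sample strings are illustrative only, and \d (a digit) is used as it is elsewhere in this article.

```python
import re

# ^ and $ anchor the pattern; {4} and {2} require exact repetition counts.
print(bool(re.search(r'^\d{4}-\d{2}$', '2024-05')))   # True
print(bool(re.search(r'^\d{4}-\d{2}$', '2024-5')))    # False

# [] matches one character from a set; | chooses between alternatives.
print(re.findall(r'gr[ae]y', 'grey or gray'))         # ['grey', 'gray']
print(re.findall(r'cat|dog', 'cat, dog, bird'))       # ['cat', 'dog']

# ? makes the preceding item optional; . matches any character but a newline.
print(re.findall(r'colou?r', 'color and colour'))     # ['color', 'colour']
print(re.findall(r'b.t', 'bat bit b\nt'))             # ['bat', 'bit']
```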

Examples

Basic Usage

Find every run of digits in a string using a compiled pattern.

import re

pattern = re.compile(r'\d+')
matches = pattern.findall('Sample123String456Another789')
print(matches)

Output:

['123', '456', '789']

Using Groups and Capturing

Use groups to capture parts of the match.

import re

pattern = re.compile(r'(\d+)-(\d+)-(\d+)')
match = pattern.search('Phone number: 123-456-7890')
if match:
    print(match.groups())

Output:

('123', '456', '7890')
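
Groups can also be given names with the (?P<name>...) syntax, which tends to be easier to read than positional indexes. A sketch of the same match with named groups; the names area, prefix, and line are chosen here for illustration.

```python
import re

pattern = re.compile(r'(?P<area>\d+)-(?P<prefix>\d+)-(?P<line>\d+)')
match = pattern.search('Phone number: 123-456-7890')

if match:
    print(match.group('area'))   # 123
    print(match.group(2))        # 456 (numbered access still works)
    print(match.groupdict())
    # {'area': '123', 'prefix': '456', 'line': '7890'}
```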

Using Flags

Use flags to modify the behavior of the pattern.

import re

pattern = re.compile(r'sample', re.IGNORECASE)
matches = pattern.findall('Sample123String456sample789')
print(matches)

Output:

['Sample', 'sample']
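
Flags can be combined with the | operator. The sketch below pairs re.IGNORECASE with re.MULTILINE, which makes ^ and $ match at the start and end of every line; the log-like text is invented for the example.

```python
import re

text = 'error: disk full\nWarning: low memory\nERROR: timeout'

# re.MULTILINE makes ^ and $ work per line; re.IGNORECASE ignores letter case.
pattern = re.compile(r'^error: (.+)$', re.IGNORECASE | re.MULTILINE)
messages = pattern.findall(text)
print(messages)
# ['disk full', 'timeout']
```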

Advanced Substitution

Use a function as the replacement argument in re.sub.

import re

def replace(match):
    return str(int(match.group()) * 2)

result = re.sub(r'\d+', replace, 'Sample123String456Another789')
print(result)

Output:

Sample246String912Another1578
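
The replacement function receives a full match object, so it can also work with captured groups. A small sketch under assumed data: the MONTHS table and the function name expand_date are hypothetical.

```python
import re

# Hypothetical lookup table for this example.
MONTHS = {'01': 'Jan', '02': 'Feb', '03': 'Mar'}

def expand_date(match):
    # The match object gives access to the captured groups.
    year, month = match.group(1), match.group(2)
    # Fall back to the original number for months not in the table.
    return f'{MONTHS.get(month, month)} {year}'

result = re.sub(r'(\d{4})-(\d{2})', expand_date, 'Released 2024-02, patched 2024-03')
print(result)
# Released Feb 2024, patched Mar 2024
```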

Real-World Use Case

Validating Email Addresses

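Use a regular expression to check that a string has the general shape of an email address. The pattern is deliberately simple: it catches obvious mistakes but does not cover every address the email standards allow.

import re

# Compile once at module level so the pattern is not rebuilt on every call.
EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+')

def validate_email(email):
    # fullmatch requires the whole string to match, including any trailing newline.
    return bool(EMAIL_PATTERN.fullmatch(email))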

emails = ['test@example.com', 'invalid-email', 'user@domain.com']
valid_emails = [email for email in emails if validate_email(email)]
print(valid_emails)

Output:

['test@example.com', 'user@domain.com']
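
One detail worth knowing when validating input: a pattern anchored with ^ and $ and applied with match() accepts a string that ends in a newline, because $ also matches just before a final newline. fullmatch() is stricter. A sketch comparing the two, reusing the simplified email pattern from this section:

```python
import re

# The simplified email pattern from this section, without anchors.
core = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+'

anchored = re.compile('^' + core + '$')
unanchored = re.compile(core)

with_newline = 'test@example.com\n'

# $ tolerates the final newline, so this prints True.
print(bool(anchored.match(with_newline)))
# fullmatch must consume the newline too, so this prints False.
print(bool(unanchored.fullmatch(with_newline)))
# A clean address passes: True.
print(bool(unanchored.fullmatch('test@example.com')))
```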

Conclusion

The re module covers the common regular expression tasks in Python: searching and matching with re.search, re.match, and re.fullmatch; collecting matches with re.findall and re.finditer; splitting with re.split; and substituting with re.sub and re.subn. Knowing which function fits which job, together with the basic pattern syntax, goes a long way in everyday text processing.
