The re module in Python provides support for working with regular expressions, which are patterns that describe combinations of characters in strings. Regular expressions are used to search, match, and manipulate strings based on those patterns.
Table of Contents
- Introduction
- re Module Functions
  - re.compile
  - re.search
  - re.match
  - re.fullmatch
  - re.split
  - re.findall
  - re.finditer
  - re.sub
  - re.subn
- Regular Expression Syntax
- Examples
  - Basic Usage
  - Using Groups and Capturing
  - Using Flags
  - Advanced Substitution
- Real-World Use Case
- Conclusion
- References
Introduction
The re module in Python is used for working with regular expressions. Regular expressions let you specify patterns for searching and manipulating strings. With the re module, you can search for patterns, split strings, replace substrings, and more.
re Module Functions
re.compile
Compiles a regular expression pattern into a pattern object whose methods (such as search, match, and findall) can be reused for matching.
import re
# Compile the pattern once; the resulting object can be reused many times.
pattern = re.compile(r'\d+')
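A compiled pattern object exposes the module-level operations as methods, which is convenient when one pattern is applied to many strings. The sample strings below are arbitrary assumptions chosen for illustration.

```python
import re

# Compile once, then reuse the pattern object for several operations.
digits = re.compile(r'\d+')

first = digits.search('Sample123String456')  # first run of digits
all_runs = digits.findall('a1b22c333')       # every run of digits
parts = digits.split('x1y2z')                # split on runs of digits

print(first.group())   # 123
print(all_runs)        # ['1', '22', '333']
print(parts)           # ['x', 'y', 'z']
print(digits.pattern)  # \d+
```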
re.search
Scans the string for the first location where the pattern matches. Returns a match object if found, or None otherwise.
import re
result = re.search(r'\d+', 'Sample123String')
print(result.group())
Output:
123
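Because re.search returns None when nothing matches, calling .group() directly on the result raises AttributeError in that case. A common safeguard is to check the result first; the helper name first_number below is hypothetical, used only for this sketch.

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits, or None."""
    match = re.search(r'\d+', text)
    if match is None:
        # No digits anywhere in the string.
        return None
    return match.group()

print(first_number('Sample123String'))  # 123
print(first_number('NoDigitsHere'))     # None
```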
re.match
Checks for a match only at the beginning of the string. Returns a match object if found, or None otherwise.
import re
result = re.match(r'\d+', '123Sample')
print(result.group())
Output:
123
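The difference between re.match and re.search appears when the pattern occurs later in the string: match only tries position 0, while search scans forward. The input 'Sample123' is an assumed example.

```python
import re

text = 'Sample123'

# re.match only tries position 0, where 'S' is not a digit.
at_start = re.match(r'\d+', text)
# re.search scans the whole string and finds '123'.
anywhere = re.search(r'\d+', text)

print(at_start)          # None
print(anywhere.group())  # 123
```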
re.fullmatch
Succeeds only if the entire string matches the pattern. Returns a match object if found, or None otherwise.
import re
result = re.fullmatch(r'\d+', '123')
print(result.group())
Output:
123
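re.match succeeds when a prefix of the string matches, whereas re.fullmatch requires the pattern to consume the whole string. The comparison below uses an assumed input, '123abc'.

```python
import re

text = '123abc'

prefix = re.match(r'\d+', text)     # matches the leading '123'
whole = re.fullmatch(r'\d+', text)  # fails: 'abc' is left over

print(prefix.group())  # 123
print(whole)           # None
```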
re.split
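Splits the string at each occurrence of the pattern and returns a list of the pieces. When the string ends with a match, as in this example, the list ends with an empty string.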
import re
result = re.split(r'\d+', 'Sample123String456Another789')
print(result)
Output:
['Sample', 'String', 'Another', '']
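Two further details of re.split are worth knowing: if the pattern contains a capturing group, the matched separators are included in the result, and the maxsplit argument limits how many splits are made. This sketch reuses the string from the example above.

```python
import re

text = 'Sample123String456Another789'

# A capturing group keeps the separators in the output list.
with_separators = re.split(r'(\d+)', text)
# maxsplit=1 performs at most one split; the remainder stays intact.
one_split = re.split(r'\d+', text, maxsplit=1)

print(with_separators)  # ['Sample', '123', 'String', '456', 'Another', '789', '']
print(one_split)        # ['Sample', 'String456Another789']
```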
re.findall
Finds all non-overlapping matches of the pattern in the string. Returns a list of matches.
import re
result = re.findall(r'\d+', 'Sample123String456Another789')
print(result)
Output:
['123', '456', '789']
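The shape of the list returned by re.findall depends on the groups in the pattern: with no groups it contains the whole matches, with one group it contains that group's text, and with several groups it contains tuples. The sample text 'x=1, y=22' is an assumption for this sketch.

```python
import re

text = 'x=1, y=22'

no_groups = re.findall(r'\w=\d+', text)       # whole matches
one_group = re.findall(r'\w=(\d+)', text)     # only the group's text
two_groups = re.findall(r'(\w)=(\d+)', text)  # tuples of groups

print(no_groups)   # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```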
re.finditer
Finds all non-overlapping matches of the pattern in the string. Returns an iterator yielding match objects.
import re
result = re.finditer(r'\d+', 'Sample123String456Another789')
for match in result:
    print(match.group())
Output:
123
456
789
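Because re.finditer yields match objects rather than plain strings, each result also records where it was found, through start(), end(), and span(). This sketch collects the positions for the same input as above.

```python
import re

text = 'Sample123String456Another789'

# Collect (matched text, start index, end index) for each run of digits.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]

for matched, start, end in positions:
    print(matched, start, end)
```

This prints 123 6 9, then 456 15 18, then 789 25 28; the end index is exclusive, so text[start:end] recovers the matched substring.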
re.sub
Replaces every occurrence of the pattern with a replacement string and returns the new string.
import re
result = re.sub(r'\d+', '#', 'Sample123String456Another789')
print(result)
Output:
Sample#String#Another#
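The replacement string passed to re.sub can refer to captured groups with backreferences such as \1 and \2, and the count argument caps the number of replacements. The name 'Ada Lovelace' is just sample input.

```python
import re

# Swap 'first last' into 'last, first' using backreferences to two groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2, \1', 'Ada Lovelace')

# Replace only the first run of digits by passing count=1.
first_only = re.sub(r'\d+', '#', 'Sample123String456Another789', count=1)

print(swapped)     # Lovelace, Ada
print(first_only)  # Sample#String456Another789
```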
re.subn
Works like re.sub, but returns a tuple containing the new string and the number of replacements made.
import re
result = re.subn(r'\d+', '#', 'Sample123String456Another789')
print(result)
Output:
('Sample#String#Another#', 3)
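Since re.subn returns a tuple, it is usually unpacked into the new string and the replacement count, which is handy when a program needs to report or check how many substitutions happened. The messages printed here are illustrative.

```python
import re

new_text, replacements = re.subn(r'\d+', '#', 'Sample123String456Another789')

# The count lets the caller detect the "nothing changed" case.
if replacements == 0:
    print('no digits found')
else:
    print(f'{replacements} replacements: {new_text}')  # 3 replacements: Sample#String#Another#
```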
Regular Expression Syntax
Regular expressions use special characters to define patterns. Here are some commonly used special characters:
- . : Matches any character except a newline.
- ^ : Matches the start of the string.
- $ : Matches the end of the string.
- * : Matches 0 or more repetitions of the preceding pattern.
- + : Matches 1 or more repetitions of the preceding pattern.
- ? : Matches 0 or 1 repetition of the preceding pattern.
- {n} : Matches exactly n repetitions of the preceding pattern.
- {n,} : Matches n or more repetitions of the preceding pattern.
- {n,m} : Matches between n and m repetitions of the preceding pattern.
- [] : Matches any one of the characters inside the brackets.
- | : Matches either the pattern before or the pattern after the |.
- () : Creates a group for extracting or manipulating the matched text.
- \d : Matches any decimal digit; \d+ therefore matches a run of one or more digits, as in most examples here.
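The short script below exercises several of these special characters with re.findall and re.search. The sample strings are arbitrary assumptions chosen so that each match is easy to see.

```python
import re

text = 'cat bat rat cab'

char_class = re.findall(r'[cb]at', text)           # [] : 'c' or 'b', then 'at'
alternation = re.findall(r'cat|rat', text)         # |  : either alternative
starts = re.search(r'^cat', text) is not None      # ^  : start of the string
ends = re.search(r'cab$', text) is not None        # $  : end of the string
any_char = re.findall(r'c.b', 'cab crb c\nb')      # .  : any char except newline
star = re.findall(r'ab*', 'a ab abbb')             # *  : zero or more 'b'
plus = re.findall(r'ab+', 'a ab abbb')             # +  : one or more 'b'
optional = re.findall(r'colou?r', 'color colour')  # ?  : optional 'u'
bounded = re.findall(r'\d{2,3}', '1 22 333 4444')  # {n,m} : 2 to 3 digits

print(char_class, alternation, starts, ends)  # ['cat', 'bat'] ['cat', 'rat'] True True
print(any_char, star, plus)                   # ['cab', 'crb'] ['a', 'ab', 'abbb'] ['ab', 'abbb']
print(optional, bounded)                      # ['color', 'colour'] ['22', '333', '444']
```

Note that {2,3} is greedy, so '4444' yields '444' and the leftover single '4' is too short to match.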
Examples
Basic Usage
Find all runs of digits in a string using a compiled pattern.
import re
pattern = re.compile(r'\d+')
matches = pattern.findall('Sample123String456Another789')
print(matches)
Output:
['123', '456', '789']
Using Groups and Capturing
Use groups to capture parts of the match.
import re
pattern = re.compile(r'(\d+)-(\d+)-(\d+)')
match = pattern.search('Phone number: 123-456-7890')
if match:
    print(match.groups())
Output:
('123', '456', '7890')
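Groups can also be read one at a time by number, and named groups written as (?P<name>...) make a pattern easier to read. The group names area, exchange, and line below are illustrative choices, not anything the re module requires.

```python
import re

pattern = re.compile(r'(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})')
match = pattern.search('Phone number: 123-456-7890')

if match:
    print(match.group(0))       # 123-456-7890 (the whole match)
    print(match.group(1))       # 123 (first group, by number)
    print(match.group('line'))  # 7890 (by name)
    print(match.groupdict())    # {'area': '123', 'exchange': '456', 'line': '7890'}
```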
Using Flags
Use flags to modify the behavior of the pattern.
import re
pattern = re.compile(r'sample', re.IGNORECASE)
matches = pattern.findall('Sample123String456sample789')
print(matches)
Output:
['Sample', 'sample']
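Flags can be combined with the | operator. For example, with re.MULTILINE the ^ anchor matches at the start of every line rather than only at the start of the whole string. The three-line sample text is an assumption for this sketch.

```python
import re

text = 'Sample one\nsample two\nother three'

# Without MULTILINE, ^ only matches at the very start of the string.
default = re.findall(r'^sample', text, re.IGNORECASE)
# With MULTILINE, ^ also matches right after each newline.
per_line = re.findall(r'^sample', text, re.IGNORECASE | re.MULTILINE)

print(default)   # ['Sample']
print(per_line)  # ['Sample', 'sample']
```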
Advanced Substitution
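Use a function as the replacement argument in re.sub. The function is called once for each match with the match object, and its return value is used as the replacement text.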
import re
def replace(match):
    # Double the number found in each match and return it as a string.
    return str(int(match.group()) * 2)
result = re.sub(r'\d+', replace, 'Sample123String456Another789')
print(result)
Output:
Sample246String912Another1578
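Because the replacement function receives the full match object, it can also work with groups. This sketch capitalizes each word; the function name capitalize_word is hypothetical, and for this particular task str.title() would be simpler, so treat it only as a demonstration of groups inside a replacement function.

```python
import re

def capitalize_word(match):
    # group(1) is the first letter of the word, group(2) the rest of it.
    return match.group(1).upper() + match.group(2)

result = re.sub(r'\b(\w)(\w*)', capitalize_word, 'hello regular expressions')
print(result)  # Hello Regular Expressions
```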
Real-World Use Case
Validating Email Addresses
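Use a regular expression as a basic sanity check for email addresses. The pattern is deliberately simple and does not accept every address permitted by the email standards.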
import re
def validate_email(email):
    # Local part, '@', a domain label, a dot, then the rest of the domain.
    pattern = re.compile(r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$')
    return bool(pattern.match(email))
emails = ['test@example.com', 'invalid-email', 'user@domain.com']
valid_emails = [email for email in emails if validate_email(email)]
print(valid_emails)
Output:
['test@example.com', 'user@domain.com']
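One subtlety of the ^...$ approach: $ also matches just before a newline at the end of the string, so an address followed by '\n' passes the check. Using re.fullmatch without the anchors rejects that input. This is a sketch with the same simplified pattern, and the function names are hypothetical; it is not a complete email validator.

```python
import re

EMAIL = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

def validate_email_anchored(email):
    # The approach above: match() with ^...$ anchors.
    return bool(re.match(r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$', email))

def validate_email_strict(email):
    # fullmatch() requires the pattern to consume the entire string.
    return EMAIL.fullmatch(email) is not None

print(validate_email_anchored('test@example.com\n'))  # True
print(validate_email_strict('test@example.com\n'))    # False
print(validate_email_strict('user@domain.com'))       # True
```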
Conclusion
The re module in Python provides functions for working with regular expressions. From searching and matching patterns to splitting strings and performing substitutions, the re module covers most everyday text-processing tasks. Understanding regular expressions and the re module can significantly improve your ability to manipulate and analyze string data in Python.