Introduction
Regular expressions (regex) describe patterns in text. In Python they are used to search, replace, and extract data from strings. Python provides the re module to work with regular expressions.
The re Module
To work with regular expressions, you need to import the re module.
import re
Basic Functions
1. re.match()
The re.match() function attempts to match a pattern at the beginning of a string. It returns a match object on success and None otherwise.
Example
import re
pattern = r"hello"
text = "hello world"
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())  # Output: Match found: hello
else:
    print("No match found")
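Because re.match() only looks at the start of the string, a pattern that occurs later does not match. The following is a minimal sketch of that difference, using re.search() and re.fullmatch() (a standard re function not covered above) for contrast; the sample text is illustrative only.

```python
import re

text = "hello world"

# re.match() succeeds only if the pattern matches at position 0.
at_start = re.match(r"world", text)    # None: "world" is not at the start
# re.search() scans forward until it finds a match.
anywhere = re.search(r"world", text)   # match object for "world"
# re.fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch(r"hello", text)   # None: " world" is left over

print(at_start)          # None
print(anywhere.group())  # world
print(whole)             # None
```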
2. re.search()
The re.search() function searches the entire string for a pattern and returns the first match.
Example
pattern = r"world"
text = "hello world"
search = re.search(pattern, text)
if search:
    print("Search found:", search.group())  # Output: Search found: world
else:
    print("No search found")
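The match object returned by re.search() carries more than the matched text. As a small sketch, the example below reads the match position through start(), end(), and span(), and shows that a failed search returns None; the second pattern is a made-up example.

```python
import re

m = re.search(r"world", "hello world")
if m:
    print(m.group())  # world
    print(m.start())  # 6
    print(m.end())    # 11
    print(m.span())   # (6, 11)

# When nothing matches, re.search() returns None, so always check first.
missing = re.search(r"planet", "hello world")
print(missing)  # None
```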
3. re.findall()
The re.findall() function returns a list of all non-overlapping matches of a pattern in a string.
Example
pattern = r"\d+"
text = "There are 2 apples and 5 bananas"
findall = re.findall(pattern, text)
print("Find all:", findall)  # Output: Find all: ['2', '5']
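One detail worth knowing is that the shape of the re.findall() result depends on the capturing groups in the pattern. The sketch below reuses the sentence from the example above with three illustrative patterns: no groups, one group, and two groups.

```python
import re

text = "There are 2 apples and 5 bananas"

# No groups: a list of the full matches.
whole = re.findall(r"\d+ \w+", text)
# One group: a list of that group's text only.
counts = re.findall(r"(\d+) \w+", text)
# Several groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\d+) (\w+)", text)

print(whole)   # ['2 apples', '5 bananas']
print(counts)  # ['2', '5']
print(pairs)   # [('2', 'apples'), ('5', 'bananas')]
```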
4. re.finditer()
The re.finditer() function returns an iterator yielding match objects for all non-overlapping matches of a pattern in a string.
Example
pattern = r"\d+"
text = "There are 2 apples and 5 bananas"
finditer = re.finditer(pattern, text)
for match in finditer:
    print("Find iter:", match.group())
# Output:
# Find iter: 2
# Find iter: 5
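Since re.finditer() yields match objects rather than plain strings, it is a natural choice when the position of each match matters. A brief sketch, collecting the text and the start and end index of every number in the same sentence:

```python
import re

text = "There are 2 apples and 5 bananas"

# Each item is a match object, so start() and end() are available.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(positions)  # [('2', 10, 11), ('5', 23, 24)]
```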
5. re.sub()
The re.sub() function replaces occurrences of a pattern in a string with a replacement string.
Example
pattern = r"apples"
text = "There are 2 apples and 5 bananas"
replacement = "oranges"
sub = re.sub(pattern, replacement, text)
print("Sub:", sub)  # Output: Sub: There are 2 oranges and 5 bananas
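Beyond a fixed replacement string, re.sub() also accepts backreferences to captured groups, a function that computes each replacement, and a count argument that limits how many replacements are made. The following sketch illustrates all three; the patterns and replacement formats are illustrative choices, not part of the example above.

```python
import re

text = "There are 2 apples and 5 bananas"

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d+) (\w+)", r"\2 x\1", text)

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

# count=1 replaces only the first occurrence.
first_only = re.sub(r"\d+", "N", text, count=1)

print(swapped)     # There are apples x2 and bananas x5
print(doubled)     # There are 4 apples and 10 bananas
print(first_only)  # There are N apples and 5 bananas
```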
Regular Expression Patterns
Metacharacters
.: Matches any character except a newline.
^: Matches the start of the string.
$: Matches the end of the string, or just before a newline at the end of the string.
*: Matches 0 or more repetitions of the preceding pattern.
+: Matches 1 or more repetitions of the preceding pattern.
?: Matches 0 or 1 repetition of the preceding pattern.
{m}: Matches exactly m repetitions of the preceding pattern.
{m,n}: Matches from m to n repetitions of the preceding pattern.
[]: Matches any single character listed within the brackets.
|: Matches either the pattern before or the pattern after the |.
(): Groups patterns together and captures the matched text.
Example
pattern = r"[A-Za-z]+ \d{1,2}, \d{4}"
text = "Today's date is June 18, 2023."
match = re.search(pattern, text)
if match:
    print("Match found:", match.group())  # Output: Match found: June 18, 2023
else:
    print("No match found")
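To show what grouping adds, the sketch below wraps the parts of the same date pattern in parentheses so each part can be read separately, and adds two small illustrative patterns for ? and |. The sample words are assumptions chosen for the demonstration.

```python
import re

text = "Today's date is June 18, 2023."

# Parentheses capture the month, day, and year as separate groups.
m = re.search(r"([A-Za-z]+) (\d{1,2}), (\d{4})", text)
parts = m.groups() if m else None
print(parts)  # ('June', '18', '2023')

# ? makes the preceding "u" optional, so both spellings match.
spellings = re.findall(r"colou?r", "color and colour")
print(spellings)  # ['color', 'colour']

# | matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)  # ['cat', 'dog']
```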
Special Sequences
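\d: Matches any decimal digit (equivalent to [0-9] when the re.ASCII flag is set; by default, str patterns also match other Unicode digits).
\D: Matches any non-digit.
\s: Matches any whitespace character (space, tab, newline, and similar).
\S: Matches any non-whitespace character.
\w: Matches any word character: letters, digits, and the underscore (equivalent to [a-zA-Z0-9_] when the re.ASCII flag is set).
\W: Matches any non-word character.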
Example
pattern = r"\w+@\w+\.\w+"
text = "Please contact us at support@example.com."
match = re.search(pattern, text)
if match:
    print("Match found:", match.group())  # Output: Match found: support@example.com
else:
    print("No match found")
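For str patterns in Python 3, \d and \w are Unicode-aware by default, which can surprise code that expects only ASCII. The sketch below compares the default behavior with the re.ASCII flag and uses \s+ with re.split() (a standard re function not covered above) to break text on runs of whitespace. The sample strings are illustrative.

```python
import re

# \u0663 is ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
text = "room 42, floor \u0663"

unicode_digits = re.findall(r"\d+", text)
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['42']

# \s+ matches any run of spaces, tabs, or newlines.
words = re.split(r"\s+", "a  b\tc\nd")
print(words)  # ['a', 'b', 'c', 'd']
```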
Compiling Regular Expressions
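If you use the same pattern many times, you can compile it once with re.compile() and reuse the resulting pattern object. The re module also caches recently used patterns, so the speed gain is often modest; the main benefit is cleaner, reusable code.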
Example
pattern = re.compile(r"\d+")
text = "There are 2 apples and 5 bananas"
matches = pattern.findall(text)
print("Find all:", matches)  # Output: Find all: ['2', '5']
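A compiled pattern is most useful when the same expression is applied to many strings. As a minimal sketch, the example below compiles the pattern once and reuses it across a list of inputs; the list contents and the name number are hypothetical.

```python
import re

# Compile once, then call methods on the pattern object.
number = re.compile(r"\d+")

lines = ["2 apples", "no fruit", "5 bananas and 12 kiwis"]
results = [number.findall(line) for line in lines]

print(results)         # [['2'], [], ['5', '12']]
print(number.pattern)  # \d+
```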
Flags
You can modify the behavior of regular expressions using flags. Some common flags include:
re.IGNORECASE (re.I): Makes the pattern case-insensitive.
re.MULTILINE (re.M): Makes ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
re.DOTALL (re.S): Makes . match any character, including a newline.
Example
pattern = re.compile(r"hello", re.IGNORECASE)
text = "Hello world"
match = pattern.search(text)
if match:
    print("Match found:", match.group())  # Output: Match found: Hello
else:
    print("No match found")
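The effect of re.MULTILINE and re.DOTALL is easiest to see on a string that contains a newline. The sketch below runs the same patterns with and without each flag, and combines two flags with the | operator; the two-line sample text is illustrative.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the start of the string.
default_starts = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(default_starts)  # ['first']
print(line_starts)     # ['first', 'second']

# Without re.DOTALL, . stops at the newline, so this search fails.
no_dotall = re.search(r"first.*second", text)
with_dotall = re.search(r"first.*second", text, flags=re.DOTALL)
print(no_dotall)                  # None
print(with_dotall is not None)    # True

# Flags can be combined with |.
combined = re.findall(r"^second", "First\nSECOND", flags=re.IGNORECASE | re.MULTILINE)
print(combined)  # ['SECOND']
```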
Conclusion
Regular expressions are a compact tool for string manipulation in Python. The re module provides functions for matching, searching, replacing, and iterating over patterns in strings. By understanding the various patterns and metacharacters, you can effectively use regular expressions to handle complex string operations in your programs.