Python Regular Expressions

Introduction

Regular expressions (regex) describe patterns in text. In Python they are used to search strings, replace parts of them, and extract data from them. The standard library provides the re module to work with regular expressions.

The re Module

To work with regular expressions, you need to import the re module.

import re

Basic Functions

1. re.match()

The re.match() function attempts to match a pattern at the beginning of a string. It returns a match object if the pattern matches there, and None otherwise.

Example

import re

pattern = r"hello"
text = "hello world"

match = re.match(pattern, text)
if match:
    print("Match found:", match.group())  # Output: Match found: hello
else:
    print("No match found")

2. re.search()

The re.search() function scans through the string and returns a match object for the first location where the pattern matches, or None if the pattern does not match anywhere.

Example

pattern = r"world"
text = "hello world"

search = re.search(pattern, text)
if search:
    print("Search found:", search.group())  # Output: Search found: world
else:
    print("No search found")
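The difference between re.match() and re.search() is easiest to see when the pattern occurs somewhere other than the start of the string. The following is a minimal sketch; the variable names at_start and anywhere are chosen only for illustration.

```python
import re

text = "hello world"

# re.match() only tries at position 0, so "world" is not found.
at_start = re.match(r"world", text)
print(at_start)  # None

# re.search() scans forward and returns the first occurrence.
anywhere = re.search(r"world", text)
print(anywhere.group(), anywhere.span())  # world (6, 11)
```

Because both functions return None on failure, check the result before calling methods such as group() on it.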

3. re.findall()

The re.findall() function returns a list of all non-overlapping matches of a pattern in a string.

Example

pattern = r"\d+"
text = "There are 2 apples and 5 bananas"

findall = re.findall(pattern, text)
print("Find all:", findall)  # Output: Find all: ['2', '5']
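One detail worth knowing is that the shape of the list returned by re.findall() depends on the groups in the pattern. With no groups it contains whole matches; with two or more groups it contains tuples of the group contents. A short sketch, reusing the sentence from the example above (the names numbers and pairs are illustrative):

```python
import re

text = "There are 2 apples and 5 bananas"

# No groups: each item is the whole match.
numbers = re.findall(r"\d+", text)

# Two groups: each item is a tuple holding the text captured by each group.
pairs = re.findall(r"(\d+) (\w+)", text)

print(numbers)  # ['2', '5']
print(pairs)    # [('2', 'apples'), ('5', 'bananas')]
```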

4. re.finditer()

The re.finditer() function returns an iterator yielding match objects for all non-overlapping matches of a pattern in a string.

Example

pattern = r"\d+"
text = "There are 2 apples and 5 bananas"

finditer = re.finditer(pattern, text)
for match in finditer:
    print("Find iter:", match.group())  # Output: Find iter: 2
                                        # Output: Find iter: 5
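Since re.finditer() yields match objects rather than plain strings, it is the natural choice when you also need to know where each match occurred. The sketch below collects the matched text together with its start and end index; the name positions is illustrative.

```python
import re

text = "There are 2 apples and 5 bananas"

# Each match object exposes the matched text and its position in the string.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(positions)  # [('2', 10, 11), ('5', 23, 24)]
```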

5. re.sub()

The re.sub() function replaces all non-overlapping occurrences of a pattern in a string with a replacement and returns the resulting string. The original string is not modified.

Example

pattern = r"apples"
text = "There are 2 apples and 5 bananas"
replacement = "oranges"

sub = re.sub(pattern, replacement, text)
print("Sub:", sub)  # Output: Sub: There are 2 oranges and 5 bananas
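The replacement passed to re.sub() does not have to be a fixed string. It can refer back to captured groups with \1, \2, and so on, or it can be a function that receives each match object. The optional count argument limits how many replacements are made. A small sketch of all three, where the helper name double is hypothetical:

```python
import re

text = "There are 2 apples and 5 bananas"

# Backreferences: \1 is the number, \2 is the word that follows it.
swapped = re.sub(r"(\d+) (\w+)", r"\2 x\1", text)
print(swapped)  # There are apples x2 and bananas x5

# Callable replacement: the function is called once per match.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)
print(doubled)  # There are 4 apples and 10 bananas

# count=1 replaces only the first match.
first_only = re.sub(r"\d+", "#", text, count=1)
print(first_only)  # There are # apples and 5 bananas
```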

Regular Expression Patterns

Metacharacters

  • .: Matches any character except a newline.
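  • ^: Matches the start of the string (and, with re.MULTILINE, the start of each line).
  • $: Matches the end of the string or just before a newline at the end of the string (and, with re.MULTILINE, the end of each line).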
  • *: Matches 0 or more repetitions of the preceding pattern.
  • +: Matches 1 or more repetitions of the preceding pattern.
  • ?: Matches 0 or 1 repetition of the preceding pattern.
  • {m}: Matches exactly m repetitions of the preceding pattern.
  • {m,n}: Matches from m to n repetitions of the preceding pattern.
  • []: Matches any single character within the brackets.
  • |: Matches either the pattern before or the pattern after the |.
  • (): Groups patterns together and captures the text they match.

Example

pattern = r"[A-Za-z]+ \d{1,2}, \d{4}"
text = "Today's date is June 18, 2023."

match = re.search(pattern, text)
if match:
    print("Match found:", match.group())  # Output: Match found: June 18, 2023
else:
    print("No match found")
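To illustrate grouping and alternation from the list above, the date pattern can be extended with parentheses so that the month, day, and year are captured separately. This is a minimal sketch; the names date and fruits are illustrative.

```python
import re

text = "Today's date is June 18, 2023."

# Parentheses capture the month, day, and year as separate groups.
date = re.search(r"([A-Za-z]+) (\d{1,2}), (\d{4})", text)
if date:
    month, day, year = date.groups()
    print(month, day, year)  # June 18 2023
    print(date.group(0))     # June 18, 2023 (group 0 is the whole match)

# | matches either alternative.
fruits = re.findall(r"apples|bananas", "2 apples and 5 bananas")
print(fruits)  # ['apples', 'bananas']
```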

Special Sequences

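  • \d: Matches any decimal digit. For string patterns this includes Unicode digits; it is equivalent to [0-9] only when the re.ASCII flag is used.
  • \D: Matches any non-digit.
  • \s: Matches any whitespace character.
  • \S: Matches any non-whitespace character.
  • \w: Matches any word character: letters, digits, and the underscore. For string patterns this includes Unicode letters and digits; it is equivalent to [a-zA-Z0-9_] only when the re.ASCII flag is used.
  • \W: Matches any non-word character.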

Example

pattern = r"\w+@\w+\.\w+"
text = "Please contact us at support@example.com."

match = re.search(pattern, text)
if match:
    print("Match found:", match.group())  # Output: Match found: support@example.com
else:
    print("No match found")
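For string (str) patterns in Python 3, \d and \w are Unicode-aware by default, and the re.ASCII flag restricts them to the ASCII ranges. The sketch below shows the difference on a made-up sample string; the variable names are illustrative.

```python
import re

# "café", the Arabic-Indic digit three (U+0663), and "42"
text = "caf\u00e9 \u0663 42"

# Default behavior: \w and \d also match non-ASCII letters and digits.
unicode_words = re.findall(r"\w+", text)
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \w means [a-zA-Z0-9_] and \d means [0-9].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(len(unicode_words), len(unicode_digits))  # 3 2
print(ascii_words)   # ['caf', '42']
print(ascii_digits)  # ['42']
```

If the input is expected to contain only ASCII digits, passing re.ASCII or writing [0-9] explicitly avoids accepting other digit characters.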

Compiling Regular Expressions

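If you use the same pattern many times, you can compile it once with re.compile() and reuse the resulting pattern object. The re module caches recently used patterns, so the speed gain is usually modest; the main benefit is that the pattern is defined and named in one place.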

Example

pattern = re.compile(r"\d+")
text = "There are 2 apples and 5 bananas"

matches = pattern.findall(text)
print("Find all:", matches)  # Output: Find all: ['2', '5']
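A compiled pattern object offers the same operations as the module-level functions (search, findall, sub, and so on) as methods, which makes it convenient to apply one pattern to many strings. A short sketch, where the names number, lines, and counts are illustrative:

```python
import re

# Compile once, then reuse the pattern object's methods.
number = re.compile(r"\d+")

lines = ["2 apples", "no fruit here", "5 bananas and 12 cherries"]

# Apply the same compiled pattern to every line.
counts = [number.findall(line) for line in lines]
print(counts)  # [['2'], [], ['5', '12']]

# Other methods mirror re.sub() and re.search().
masked = number.sub("#", lines[2])
print(masked)                            # # bananas and # cherries
print(number.search(lines[1]) is None)   # True
```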

Flags

You can modify the behavior of regular expressions using flags. Some common flags include:

  • re.IGNORECASE (re.I): Makes the pattern case-insensitive.
  • re.MULTILINE (re.M): Makes ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
  • re.DOTALL (re.S): Makes the . match any character, including newline.

Example

pattern = re.compile(r"hello", re.IGNORECASE)
text = "Hello world"

match = pattern.search(text)
if match:
    print("Match found:", match.group())  # Output: Match found: Hello
else:
    print("No match found")
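The effect of re.MULTILINE and re.DOTALL shows up only when the text contains a newline. The sketch below runs the same patterns with and without each flag on a made-up two-line string; the variable names are illustrative.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the start of the string.
single = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
multi = re.findall(r"^\w+", text, re.MULTILINE)
print(single)  # ['first']
print(multi)   # ['first', 'second']

# Without re.DOTALL, . cannot cross the newline.
no_dotall = re.search(r"first.*line", text).group()
# With re.DOTALL, .* can run past the newline to the last "line".
dotall = re.search(r"first.*line", text, re.DOTALL).group()
print(repr(no_dotall))  # 'first line'
print(repr(dotall))     # 'first line\nsecond line'

# Flags can be combined with the | operator.
combined = re.findall(r"^\w+", "First\nSECOND", re.IGNORECASE | re.MULTILINE)
print(combined)  # ['First', 'SECOND']
```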

Conclusion

Regular expressions are a compact way to describe text patterns in Python. The re module provides functions for matching, searching, replacing, and iterating over patterns in strings, and flags and compiled patterns let you adjust and reuse them. By understanding the metacharacters and special sequences, you can use regular expressions to handle complex string operations in your programs.
