Regular Expressions (RegEx)

What Is RegEx?

  • RegEx, short for Regular Expression, is a sequence of characters that forms a search pattern. It's used to check if a string contains the specified search pattern.

RegEx Module in Python:

  • Python has a built-in package called re for working with regular expressions.

    python
    import re
    

Example: Using RegEx in Python:

  • Check if a string starts with "The" and ends with "Spain":

    python
    import re
    
    txt = "The rain in Spain"
    x = re.search("^The.*Spain$", txt)
    print("Match found!" if x else "Match not found")  # Output: Match found!
    

RegEx Functions:

  • The re module offers several functions to work with regular expressions:
    • findall(): Returns a list of all matches.
    • search(): Returns a Match object for the first match anywhere in the string, or None if there is no match.
    • split(): Returns a list where the string has been split at each match.
    • sub(): Replaces every match (or, optionally, a limited number of matches) with a given string.

Metacharacters:

  • Special characters with specific meanings, such as [], \, ., ^, $, *, +, ?, {}, |, ().
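
  • The following is a minimal sketch of a few of these metacharacters in action. The sample strings other than "The rain in Spain" are made up for illustration, and the variable names are arbitrary.

```python
import re

txt = "The rain in Spain"

# "." matches any single character except a newline.
dot_matches = re.findall(r"ai.", txt)
print(dot_matches)  # ['ain', 'ain']

# "^" anchors the pattern to the start of the string, "$" to the end.
starts_with_the = re.search(r"^The", txt) is not None
ends_with_spain = re.search(r"Spain$", txt) is not None
print(starts_with_the, ends_with_spain)  # True True

# "|" means either/or.
either = re.findall(r"rain|Spain", txt)
print(either)  # ['rain', 'Spain']

# "?" makes the preceding character optional (made-up sample string).
optional_u = re.findall(r"colou?r", "color and colour")
print(optional_u)  # ['color', 'colour']

# "\" escapes a metacharacter so it is matched literally.
escaped_dot = re.findall(r"in\.", "The rain in Spain.")
print(escaped_dot)  # ['in.']
```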

Special Sequences:

  • Special sequences start with a backslash (\) followed by a character, like \A, \b, \B, \d, \D, \s, \S, \w, \W, \Z.
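
  • As a rough illustration, the sketch below tries a handful of these sequences. The trailing "2024" in the sample string is an addition made here so that \d has something to match. Patterns are written as raw strings (r"...") so that Python does not interpret the backslashes itself.

```python
import re

txt = "The rain in Spain 2024"  # "2024" added for illustration

# \d matches a digit; \d+ matches a run of digits.
digits = re.findall(r"\d+", txt)
print(digits)  # ['2024']

# \w matches a word character (letters, digits, underscore).
words = re.findall(r"\w+", txt)
print(words)  # ['The', 'rain', 'in', 'Spain', '2024']

# \b matches at a word boundary, here the start of a word.
s_words = re.findall(r"\bS\w+", txt)
print(s_words)  # ['Spain']

# \A matches only at the start of the string, \Z only at the end.
starts_with_the = re.search(r"\AThe", txt) is not None
ends_with_2024 = re.search(r"2024\Z", txt) is not None
print(starts_with_the, ends_with_2024)  # True True
```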

Sets:

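  • A set is a group of characters inside square brackets [] that matches any one of those characters. For example, [a-n] matches any lowercase letter from a to n, [0-9] matches any digit, and [a-zA-Z] matches any letter, lowercase or uppercase.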
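
  • The three sets named above can be tried out as follows. This is only a small sketch; the second sample string is made up for illustration.

```python
import re

txt = "The rain in Spain"

# [a-n] matches any single lowercase letter from a to n.
a_to_n = re.findall(r"[a-n]", txt)
print(a_to_n)  # ['h', 'e', 'a', 'i', 'n', 'i', 'n', 'a', 'i', 'n']

room = "Room 101, floor 3"  # made-up sample string

# [0-9]+ matches a run of one or more digits.
numbers = re.findall(r"[0-9]+", room)
print(numbers)  # ['101', '3']

# [a-zA-Z]+ matches a run of letters in either case.
letters = re.findall(r"[a-zA-Z]+", room)
print(letters)  # ['Room', 'floor']
```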

Example Functions:

  1. The findall() Function:
    • Returns a list of all matches.

      python
      import re
      
      txt = "The rain in Spain"
      x = re.findall("ai", txt)
      print(x)  # Output: ['ai', 'ai']
      
  2. The search() Function:
    • Searches for a match and returns a Match object.

      python
      import re
      
      txt = "The rain in Spain"
      x = re.search(r"\s", txt)  # raw string keeps the backslash intact
      print("The first white-space character is located in position:", x.start())  # Output ends with: 3
      
  3. The split() Function:
    • Splits the string at each match.

      python
      import re
      
      txt = "The rain in Spain"
      x = re.split(r"\s", txt)  # raw string keeps the backslash intact
      print(x)  # Output: ['The', 'rain', 'in', 'Spain']
      
  4. The sub() Function:
    • Replaces matches with a specified string.

      python
      import re
      
      txt = "The rain in Spain"
      x = re.sub(r"\s", "9", txt)  # raw string keeps the backslash intact
      print(x)  # Output: The9rain9in9Spain
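
  • Both sub() and split() accept an optional limit on how many matches they act on. The sketch below passes these limits through the count and maxsplit keyword arguments of the re module; the variable names are arbitrary.

```python
import re

txt = "The rain in Spain"

# count=2 replaces only the first two white-space characters.
first_two = re.sub(r"\s", "9", txt, count=2)
print(first_two)  # The9rain9in Spain

# maxsplit=1 splits only at the first white-space character.
head_and_rest = re.split(r"\s", txt, maxsplit=1)
print(head_and_rest)  # ['The', 'rain in Spain']
```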
      

Match Object:

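  • A Match object contains information about the search and its result. Useful members include the methods .span() and .group() and the attribute .string. If there is no match, search() returns None instead of a Match object.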

Examples:

  1. Print the position of the first match:

    python
    import re
    
    txt = "The rain in Spain"
    x = re.search(r"\bS\w+", txt)
    print(x.span())  # Output: (12, 17)
    
  2. Print the string passed into the function:

    python
    import re
    
    txt = "The rain in Spain"
    x = re.search(r"\bS\w+", txt)
    print(x.string)  # Output: The rain in Spain
    
  3. Print the part of the string where there was a match:

    python
    import re
    
    txt = "The rain in Spain"
    x = re.search(r"\bS\w+", txt)
    print(x.group())  # Output: Spain
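
  • Because search() returns None when nothing matches, calling .span() or .group() on the result without checking it first raises an AttributeError. One possible way to guard against that is sketched below; the helper name first_word_starting_with is hypothetical and not part of the re module.

```python
import re

def first_word_starting_with(letter, text):
    """Return the first word in text that starts with letter, or None.

    Hypothetical helper for illustration only.
    """
    # re.escape() makes sure the letter is treated literally in the pattern.
    match = re.search(r"\b" + re.escape(letter) + r"\w+", text)
    if match is None:
        return None
    return match.group()

txt = "The rain in Spain"
found = first_word_starting_with("S", txt)
missing = first_word_starting_with("P", txt)
print(found)    # Spain
print(missing)  # None
```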
    

Exercises:

  1. Check if a string contains the word "rain":

    python
    import re
    
    txt = "The rain in Spain"
    x = re.search("rain", txt)
    print("Match found!" if x else "Match not found")
    
  2. Extract all numbers from a string:

    python
    import re
    
    txt = "There are 12 apples and 34 bananas."
    numbers = re.findall(r"\d+", txt)  # raw string keeps the backslash intact
    print(numbers)  # Output: ['12', '34']
    
  3. Replace all spaces with underscores:

    python
    import re
    
    txt = "The rain in Spain"
    result = re.sub(r"\s", "_", txt)  # raw string keeps the backslash intact
    print(result)  # Output: The_rain_in_Spain
    

Summary:

  • Python RegEx allows you to create search patterns to find, match, and manipulate strings. The re module provides essential functions for working with regular expressions. Practice using RegEx to become proficient in pattern matching and string manipulation!