How To Use Regular Expressions In Python

A regular expression is a special sequence of characters that lets you check whether a string matches a pattern. Python has included the re module since version 1.5; it provides Perl-style regular expression patterns and gives Python full regular expression support.

1. Python re module introduction.

  1. The Python re module provides the compile(pattern, flags=0) function, which takes a pattern string as its first parameter. It generates a regular expression object from the pattern string and the optional flags. The regular expression object has a series of methods for matching and substitution.
  2. In this article, we will introduce some methods for searching and finding strings using Python regular expressions. Then we will discuss how to use grouping to work with the sub-parts of the match object we find in the string.
  3. Before you start, import the Python re module with the statement import re so that you can use its functions.
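
The compile() function described above has no example in this section, so here is a minimal sketch. The variable name dog_pattern and the sample text are assumptions chosen for illustration; re.IGNORECASE is one of the optional flags the re module defines.

```python
import re

# Compile the pattern once; re.IGNORECASE is an optional flag.
dog_pattern = re.compile(r'dog', re.IGNORECASE)

# The compiled object offers the same operations as the module-level functions.
first = dog_pattern.search('cat Dog and dog')
print(first.group(0))                               # Dog
print(dog_pattern.findall('cat Dog and dog'))       # ['Dog', 'dog']
print(dog_pattern.sub('bird', 'cat Dog and dog'))   # cat bird and bird
```

Compiling is mostly a convenience when the same pattern is reused many times; for one-off calls the module-level functions used in the rest of this article work the same way.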

2. Raw type string in Python.

  1. A raw string is created by adding the character ‘r’ before the opening quotation mark of a normal string.
  2. When a string is raw, Python does not process backslash escape sequences in it: a backslash stays in the string as a literal character. This is why raw strings are commonly used for regular expression patterns, which often contain backslashes.
  3. Python normal string vs raw string example.
    # Define a normal python string.
    >>> normal_string = 'This is a\nnormal string'
    >>> 
    # The escape character \n in the normal string will take effect.
    >>> print(normal_string)
    This is a
    normal string
    >>> 
    
    # Adding r in front of a Python string literal makes it a raw string.
    >>> raw_string = r'and this is a\nraw string'
    >>> 
    # The \n does not take effect in the python raw type string. 
    >>> print(raw_string)
    and this is a\nraw string
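
To see why this matters for regular expressions, the following sketch compares a normal string and a raw string used as a pattern. The sample text is an assumption for illustration. In a normal string, \b is the backspace character, while in a regular expression \b means a word boundary.

```python
import re

text = 'cat dog'

# In a normal string '\b' is a single backspace character (U+0008).
print(len('\bdog'), len(r'\bdog'))          # 4 5

# The normal-string pattern looks for a backspace followed by 'dog'.
print(re.search('\bdog', text))             # None

# The raw-string pattern passes \b to the re module as a word boundary.
print(re.search(r'\bdog', text).group(0))   # dog
```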

3. Python re module match function example.

  1. The match() function returns a match object only when the pattern matches at the beginning of the searched string.
    >>> import re
    >>> 
    # The pattern is 'dog' and the searched string 'dog cat dog' starts with 'dog', so the match function returns a match object.
    >>> re.match(r'dog', 'dog cat dog')
    <re.Match object; span=(0, 3), match='dog'>
    >>> 
    >>> match = re.match(r'dog', 'dog cat dog')
    >>> 
    >>> match.group(0)
    'dog'
    >>> 
    # The pattern is 'dog', the searched string is 'cat dog' which does not start with 'dog', so the match function returns None. 
    >>> re.match(r'dog', 'cat dog')
    >>> 
    >>> match = re.match(r'dog', 'cat dog')
    >>> 
    >>> print(match)
    None
    >>>
    >>> match = re.match(r'dog', 'dog cat dog')
    >>>
    # The match object's start() method returns the index where the match begins.
    >>> match.start()
    0
    >>>
    # The match object's end() method returns the index just past the last matched character.
    >>> match.end()
    3
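
Because match() returns None when there is no match, calling group() or start() on the result without checking it first raises an AttributeError. A minimal sketch of the usual guard follows; the helper name describe_match is hypothetical and not part of the re module.

```python
import re

def describe_match(text):
    # Hypothetical helper: report whether 'dog' is found at the start of text.
    match = re.match(r'dog', text)
    if match is None:
        # No match at position 0, so there is no match object to query.
        return 'no match at the start of ' + repr(text)
    return 'matched ' + repr(match.group(0)) + ' at ' + str(match.span())

print(describe_match('dog cat dog'))  # matched 'dog' at (0, 3)
print(describe_match('cat dog'))      # no match at the start of 'cat dog'
```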
    

4. Python re module search function example.

  1. The search() function is similar to match(), but the search() function does not restrict us to looking for a match only from the beginning of a string.
    >>> import re
    >>> 
    >>> match = re.search(r'dog', 'cat dog')
    >>> 
    >>> print(match)
    <re.Match object; span=(4, 7), match='dog'>
    >>>
    >>> match.group(0)
    'dog'
    
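  2. However, the search() function stops after it finds the first match, so searching our example string for ‘dog’ returns only the first occurrence. Note that group(1) refers to the first capturing group in the pattern, not to a second match; the pattern below defines no groups, so group(1) raises an IndexError.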
    >>> import re
    >>>
    >>> match = re.search(r'dog', 'cat dog and dog')
    >>> 
    # Only return the first match.
    >>> match.group(0)
    'dog'
    >>> 
    >>> match.group(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: no such group

5. Python re module findall function example.

  1. When we call the findall() function, we get a list of all matched strings, unlike the match() and search() functions, which return at most one match object.
    >>> import re
    >>>
    >>> re.findall(r'dog', 'cat dog and dog')
    ['dog', 'dog']

6. Python re module finditer function example.

  1. The re module’s finditer(pattern, string, flags=0) function scans the whole string and returns an iterator over all non-overlapping matches of the pattern. Each element of the iterator is a re.Match object. The pattern parameter is the regular expression, the string parameter is the string to be searched, and the flags parameter holds the matching flags of the regular expression.
  2. As the description above shows, findall and finditer do similar work; the difference is in their return values. The findall function returns a list of the matching substrings, while the finditer function returns an iterator of match objects, one per match.
  3. Below is an example of the python re module finditer function.
    import re
    
    def regexp_finditer_function(pattern, string):
        
        # finditer() returns an iterator of re.Match objects.
        matches = re.finditer(pattern, string)
        
        for match in matches:
            print(match)
            start = match.start()
            end = match.end()
            matched_str = string[start:end]
            print('start :' + str(start) + ', end :' + str(end) + ', ' + matched_str)
    
    if __name__ == '__main__':
        
        regexp_finditer_function('dog', 'cat dog and dog')
  4. The example above produces the following output.
    <re.Match object; span=(4, 7), match='dog'>
    start :4, end :7, dog
    <re.Match object; span=(12, 15), match='dog'>
    start :12, end :15, dog
    

7. Python re module fullmatch function example.

  1. The re module’s fullmatch(pattern, string, flags=0) function requires the whole string to match the pattern. If it matches, it returns a re.Match object containing the match information. Otherwise, it returns None.
    import re
    
    def regexp_fullmatch_function(pattern, string):
        
        match = re.fullmatch(pattern, string)
        
        print(match)
        
        if match is not None:
            start = match.start()
            end = match.end()
            match_str = string[start:end]
            print('start :' + str(start) + ', end :' + str(end) + ', ' + match_str)
    
    if __name__ == '__main__':
        
        regexp_fullmatch_function('cat dog and dog', 'cat dog and dog')
    
  2. The example above produces the following output.
    <re.Match object; span=(0, 15), match='cat dog and dog'>
    start :0, end :15, cat dog and dog
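
The difference between match() and fullmatch() shows up when the string has extra text after the part that matches. The sketch below assumes a simple date pattern and sample strings chosen for illustration.

```python
import re

date_pattern = r'\d{4}-\d{2}-\d{2}'

# match() only anchors at the start, so trailing text is accepted.
print(re.match(date_pattern, '2021-01-01 and more') is not None)      # True

# fullmatch() requires the entire string to match the pattern.
print(re.fullmatch(date_pattern, '2021-01-01 and more') is not None)  # False
print(re.fullmatch(date_pattern, '2021-01-01') is not None)           # True
```

This makes fullmatch() a natural fit for validating that an input consists of the expected format and nothing else.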
    

8. Python re module sub function example.

  1. The re module’s sub(pattern, repl, string, count=0, flags=0) function replaces the matches of the pattern in the string with repl and returns the new string. The repl parameter can be either a replacement string or a function. The count parameter sets the maximum number of substitutions; the default value 0 means replace all matches.
    import re
    
    def regexp_sub_function(pattern, repl, string):
        
        # Replace all matches.
        replaced_all = re.sub(pattern, repl, string)
        print(replaced_all)
        
        # Replace only the first match. Passing count as a keyword is clearer,
        # and passing it positionally is deprecated since Python 3.13.
        replaced_once = re.sub(pattern, repl, string, count=1)
        print(replaced_once)
        
    if __name__ == '__main__':
        
        regexp_sub_function('-', '/', '2021-01-01')
  2. The example above produces the following output.
    2021/01/01
    2021/01-01
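
The description above mentions that repl can also be a function. In that case the function is called once per match with the re.Match object and must return the replacement string. A minimal sketch follows; the function name double_number and the sample text are assumptions for illustration.

```python
import re

def double_number(match):
    # repl receives a re.Match object and must return the replacement string.
    return str(int(match.group(0)) * 2)

print(re.sub(r'\d+', double_number, '3 cats and 12 dogs'))  # 6 cats and 24 dogs
```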

9. The match object group method examples.

  1. The match object’s group() method returns the text matched by a group, which you select by its index. Index 0 is the whole match, and indexes 1 and up are the parenthesized groups in the pattern.
    >>> import re
    >>> 
    >>> user_data = 'Jerry, Zhao: 13901234567'
    >>> 
    >>> re.search(r'\w+, \w+: \S+', user_data)
    <re.Match object; span=(0, 24), match='Jerry, Zhao: 13901234567'>
    >>> 
    # Use () to define groups in the pattern.
    >>> match = re.search(r'(\w+), (\w+): (\S+)', user_data)
    >>> 
    >>> match.group(0)
    'Jerry, Zhao: 13901234567'
    >>> 
    >>> match.group(1)
    'Jerry'
    >>> 
    >>> match.group(2)
    'Zhao'
    >>> match.group(3)
    '13901234567'
    
  2. Assign a name to each group to retrieve its value by name.
    >>> import re
    >>> 
    >>> user_data = 'Jerry, Zhao: 13901234567'
    >>> 
    # The first group element name is 'last', the second group element name is 'first', the third group element name is 'phone'.
    >>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', user_data)
    >>> 
    # Get the group element value by its name.
    >>> match.group('last')
    'Jerry'
    >>> match.group('first')
    'Zhao'
    >>> match.group('phone')
    '13901234567'
    
  3. Although the findall() function does not return match objects, it can also use grouping. When the pattern contains more than one group, findall() returns a list of tuples, where the Nth element in each tuple corresponds to the Nth group in the regular expression.
    >>> import re
    >>> 
    >>> user_data = 'Jerry, Zhao: 13901234567'
    >>> 
    >>> re.findall(r'(\w+), (\w+): (\S+)', user_data)
    [('Jerry', 'Zhao', '13901234567')]
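
The list-of-tuples result is easier to see when the string holds more than one record. In the sketch below, the second record is made-up sample data added for illustration, and the loop variable names are assumptions.

```python
import re

# The second record is made-up sample data added for illustration.
user_data = 'Jerry, Zhao: 13901234567\nTom, Lee: 13807654321'

records = re.findall(r'(\w+), (\w+): (\S+)', user_data)
print(records)
# [('Jerry', 'Zhao', '13901234567'), ('Tom', 'Lee', '13807654321')]

# Each tuple can be unpacked directly in a loop.
for name_1, name_2, phone in records:
    print(name_1, name_2, phone)
```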
    
