How To Use Regular Expressions In Python

A regular expression is a special sequence of characters that lets you check whether a string matches a pattern. Python has included the re module since version 1.5. The module provides Perl-style regular expression patterns and gives Python full regular expression support.

1. Python re module introduction.

  1. The re module provides functions that take a pattern string as their first parameter. For example, the compile function builds a regular expression object from a pattern string and optional flag parameters. That object has a series of methods for regular expression matching and substitution.
  2. In this article, we will introduce some methods for searching strings with Python regular expressions. Then we will discuss how to use grouping to work with the subgroups of the match object found in the string.

Before you can use these functions, import the module with the statement import re.
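
The compile function described above can be sketched as follows. This is a minimal illustration, not the only way to use the module; the variable names and the sample text are assumptions made for the example, and re.IGNORECASE is just one of the optional flags.

```python
import re

# Compile the pattern once; re.IGNORECASE is an optional flag.
dog_pattern = re.compile(r'dog', re.IGNORECASE)

# The compiled object exposes matching and substitution methods.
first = dog_pattern.search('Cat DOG and dog')
replaced = dog_pattern.sub('bird', 'Cat DOG and dog')

print(first.group(0))   # DOG
print(replaced)         # Cat bird and bird
```

Compiling is mainly useful when the same pattern is applied many times; for one-off calls, the module-level functions used in the rest of this article work the same way.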

2. Raw type string in Python.

  1. A raw string is created by adding the character 'r' directly before the opening quotation mark of a normal string.
  2. In a raw string, Python does not process backslash escape sequences such as \n. The backslash and the character after it are kept exactly as written. In essence, you’re telling Python not to interfere with the backslashes in your string.
  3. Python normal string vs raw string example.
    # Define a normal python string.
    >>> normal_string = 'This is a\nnormal string'
    >>> 
    # The escape character \n in the normal string will take effect.
    >>> print(normal_string)
    This is a
    normal string
    >>> 
    
    # Adding r in front of a Python string makes it a raw string.
    >>> raw_string = r'and this is a\nraw string'
    >>> 
    # The \n is not treated as a newline in the raw string.
    >>> print(raw_string)
    and this is a\nraw string
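
Raw strings matter for regular expressions because patterns use backslashes heavily. The sketch below, with a made-up sample text, shows one case where the difference changes the result: \b is the backspace character in a normal string but a word boundary in a regular expression.

```python
import re

text = 'hotdog and dog'

# In a normal string, '\b' is the backspace character (ASCII 8),
# so this pattern looks for backspace + 'dog' + backspace.
normal_pattern = '\bdog\b'

# In a raw string, the backslash reaches the re module unchanged,
# where \b means "word boundary".
raw_pattern = r'\bdog\b'

print(re.findall(normal_pattern, text))  # []
print(re.findall(raw_pattern, text))     # ['dog']
print(len('\b'), len(r'\b'))             # 1 2
```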

3. Python re module match function example.

  1. The match() method returns a match object only when the beginning of the searched string matches the pattern. Otherwise, it returns None.
    >>> import re
    >>> 
    # The pattern is 'dog' and the searched string 'dog cat dog' starts with 'dog', so the match function returns a match object.
    >>> re.match(r'dog', 'dog cat dog')
    <re.Match object; span=(0, 3), match='dog'>
    >>> 
    >>> match = re.match(r'dog', 'dog cat dog')
    >>> 
    >>> match.group(0)
    'dog'
    >>> 
    # The pattern is 'dog', the searched string is 'cat dog' which does not start with 'dog', so the match function returns None. 
    >>> re.match(r'dog', 'cat dog')
    >>> 
    >>> match = re.match(r'dog', 'cat dog')
    >>> 
    >>> print(match)
    None
    >>>
    >>> match = re.match(r'dog', 'dog cat dog')
    >>>
    # The match object's start() method returns the index where the match begins.
    >>> match.start()
    0
    >>>
    # The match object's end() method returns the index just past the last matched character.
    >>> match.end()
    3
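
Because match() returns None when the string does not start with the pattern, calling group() directly on the result can fail. A small sketch of a guard follows; the helper name leading_word is hypothetical and not part of the re module.

```python
import re

def leading_word(pattern, text):
    """Return (matched_text, start, end) if text starts with pattern, else None."""
    match = re.match(pattern, text)
    if match is None:
        return None
    return match.group(0), match.start(), match.end()

print(leading_word(r'dog', 'dog cat dog'))  # ('dog', 0, 3)
print(leading_word(r'dog', 'cat dog'))      # None
```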
    

4. Python re module search function example.

  1. The search() method is similar to match(), but the search() method does not restrict us to looking for a match only from the beginning of a string.
    >>> import re
    >>> 
    >>> match = re.search(r'dog', 'cat dog')
    >>> 
    >>> print(match)
    <re.Match object; span=(4, 7), match='dog'>
    >>>
    >>> match.group(0)
    'dog'
    
  2. However, the search() method stops after it finds a match, so searching our example string for ‘dog’ returns only the first occurrence.
    >>> import re
    >>>
    >>> match = re.search(r'dog', 'cat dog and dog')
    >>> 
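    # search() returns only the first match.
    >>> match.group(0)
    'dog'
    >>> 
    # group(1) refers to the first parenthesized group in the pattern, not to a second match. This pattern has no groups, so it raises IndexError.
    >>> match.group(1)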
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: no such group
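
To make the IndexError above easier to interpret, the following sketch contrasts a pattern without parentheses against one with a single group. The second pattern, (d)og, is an assumption chosen only to illustrate group numbering.

```python
import re

text = 'cat dog and dog'

# No parentheses in the pattern: only group 0 (the whole match) exists.
plain = re.search(r'dog', text)
print(plain.group(0), plain.span())      # dog (4, 7)

# One pair of parentheses: group 1 now exists, but it is still part of the
# first occurrence, not the second 'dog' in the text.
grouped = re.search(r'(d)og', text)
print(grouped.group(1), grouped.span())  # d (4, 7)
```

In both cases the span is (4, 7), which confirms that search() reports the first occurrence only.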

5. Python re module findall function example.

  1. Unlike match() and search(), the findall() method returns a list of all matched strings.
    >>> import re
    >>>
    >>> re.findall(r'dog', 'cat dog and dog')
    ['dog', 'dog']
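
Since findall() returns an ordinary list, it can be counted or tested for emptiness without a None check. A brief sketch, using 'bird' as an assumed pattern that does not occur in the text:

```python
import re

text = 'cat dog and dog'

dogs = re.findall(r'dog', text)
birds = re.findall(r'bird', text)

print(dogs)        # ['dog', 'dog']
print(len(dogs))   # 2
print(birds)       # []
```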

6. The match object group method examples.

  1. The match object’s group() method returns the text matched by a group. You select the group by its index, and index 0 is the whole match.
    >>> import re
    >>> 
    >>> user_data = 'Jerry, Zhao: 13901234567'
    >>> 
    >>> re.search(r'\w+, \w+: \S+', user_data)
    <re.Match object; span=(0, 24), match='Jerry, Zhao: 13901234567'>
    >>> 
    # Use () to define groups in a pattern string.
    >>> match = re.search(r'(\w+), (\w+): (\S+)', user_data)
    >>> 
    >>> match.group(0)
    'Jerry, Zhao: 13901234567'
    >>> 
    >>> match.group(1)
    'Jerry'
    >>> 
    >>> match.group(2)
    'Zhao'
    >>> match.group(3)
    '13901234567'
    
  2. Assign a name to each match group element to retrieve its value.
    >>> import re
    >>> 
    >>> user_data = 'Jerry, Zhao: 13901234567'
    >>> 
    # The first group element name is 'last', the second group element name is 'first', the third group element name is 'phone'.
    >>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', user_data)
    >>> 
    # Get the group element value by its name.
    >>> match.group('last')
    'Jerry'
    >>> match.group('first')
    'Zhao'
    >>> match.group('phone')
    '13901234567'
    
  3. Although the findall() method does not return match objects, it also supports grouping. When the pattern contains more than one group, findall() returns a list of tuples, where the Nth element in each tuple corresponds to the Nth group in the regular expression.
    >>> import re
    >>> 
    >>> user_data = 'Jerry, Zhao: 13901234567'
    >>> 
    >>> re.findall(r'(\w+), (\w+): (\S+)', user_data)
    [('Jerry', 'Zhao', '13901234567')]
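
The grouping examples in this section can be combined in one short sketch. The second record (Tom, Lee) is hypothetical sample data added for illustration. The groups() and groupdict() methods used here are standard match object methods that return all groups at once.

```python
import re

# Hypothetical sample data: two records in the article's format.
records = 'Jerry, Zhao: 13901234567\nTom, Lee: 13609876543'

pattern = r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)'

# findall() returns one tuple per matching record.
rows = re.findall(pattern, records)
print(rows)
# [('Jerry', 'Zhao', '13901234567'), ('Tom', 'Lee', '13609876543')]

# A match object offers the same data through groups() and groupdict().
match = re.search(pattern, records)
print(match.groups())     # ('Jerry', 'Zhao', '13901234567')
print(match.groupdict())
# {'last': 'Jerry', 'first': 'Zhao', 'phone': '13901234567'}
```

Named groups tend to read better when a pattern has several fields, while plain tuples from findall() are convenient for processing many records in a loop.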
    