Special cases with groups

Python provides some special forms of groups that can modify how a regular expression behaves, or even match a pattern only when a previous group exists in the match, much like an if statement.

Flags per group

There is a way to apply the flags we've seen in Chapter 2, Regular Expressions with Python, using a special form of grouping: (?iLmsux), where one or more of the following letters can be used:

Letter    Flag
i         re.IGNORECASE
L         re.LOCALE
m         re.MULTILINE
s         re.DOTALL
u         re.UNICODE
x         re.VERBOSE
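Several letters can be combined in a single group. The following sketch (Python 3; the sample text is made up for illustration) turns on MULTILINE and IGNORECASE at once and checks that the result is the same as passing the flags to re.compile() in the usual way:

```python
import re

text = "Hello world\nhello again"  # hypothetical sample text

# (?mi) turns on re.MULTILINE and re.IGNORECASE for the whole pattern
inline = re.compile(r"(?mi)^hello")
# The same flags passed as the usual argument to re.compile()
explicit = re.compile(r"^hello", re.MULTILINE | re.IGNORECASE)

print(inline.findall(text))    # ['Hello', 'hello']
print(explicit.findall(text))  # ['Hello', 'hello']

# Inline flags are folded into the compiled pattern's .flags attribute
print(inline.flags == explicit.flags)  # True
```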

For example:

>>> re.findall(r"(?u)\w+", ur"ñ")
[u'\xf1']

The above example is the same as:

>>> re.findall(r"\w+", ur"ñ", re.U)
[u'\xf1']
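The sessions above use Python 2 (note the ur"..." literal and the u'\xf1' output). A rough Python 3 equivalent, assuming a UTF-8 source file, looks like this. In Python 3 the ur prefix is a syntax error, and str patterns are already Unicode-aware, so (?u) and re.U are accepted but redundant:

```python
import re

word = "ñ"

# Inline flag form
inline_result = re.findall(r"(?u)\w+", word)
# Flag-argument form
argument_result = re.findall(r"\w+", word, re.U)
# No flag at all: Python 3 str patterns match Unicode word characters by default
default_result = re.findall(r"\w+", word)

print(inline_result, argument_result, default_result)  # ['ñ'] ['ñ'] ['ñ']
```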

We've seen what these examples do several times in the previous chapter.

Remember that a flag set this way is applied to the whole expression, not just to the group in which it is written.
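To illustrate that scope, the sketch below (Python 3, made-up input) places (?i) at the very start of a pattern with two groups; case is ignored in both groups, not only in the first one. Note that in Python 3.11 and later a global flag group like this must appear at the start of the pattern, otherwise re.compile() raises an error.

```python
import re

# (?i) is written once, at the start, yet it affects both groups
pattern = re.compile(r"(?i)(ab)(cd)")
pairs = pattern.findall("ABcd abCD")

print(pairs)  # [('AB', 'cd'), ('ab', 'CD')]
```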

yes-pattern|no-pattern

This is a very useful kind of group. It tries to match one pattern if a previous group was found and, optionally, a different pattern if that group was not found. In short, it works like an if-else statement. The syntax for this operation is as follows:

(?(id/name)yes-pattern|no-pattern)

This expression means: if the group with this ID or name has already been matched, then yes-pattern has to match at this point of the string. If the group hasn't been matched, then no-pattern has to match instead.
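The condition can refer to a group by name as well as by number. As a minimal sketch, assuming Python 3 and a hypothetical group called quote, the following pattern accepts a word that is either bare or wrapped in double quotes, but rejects a quote that is opened and never closed (or closed without being opened):

```python
import re

# Group "quote" optionally captures an opening double quote.
# (?(quote)") then requires a closing quote only if the opening one matched.
quoted_word = re.compile(r'(?P<quote>")?\w+(?(quote)")')

for candidate in ['"hello"', 'hello', '"hello', 'hello"']:
    # fullmatch() makes the whole string take part in the match
    print(candidate, bool(quoted_word.fullmatch(candidate)))
# "hello" True
# hello True
# "hello False
# hello" False
```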

Let's see how it works, continuing with our trite example. We have a list of products, but in this case the ID can take two different forms:

  • The country code (two digits), a dash, three or four alphanumeric characters, a dash, and the area code (two digits). For example: 34-adrl-01.
  • Three or four alphanumeric characters. For example: adrl.

So, when there is a country code, we also need to match the area code:

>>> pattern = re.compile(r"(\d\d-)?(\w{3,4})(?(1)(-\d\d))")
>>> pattern.match("34-erte-22")
<_sre.SRE_Match at 0x10f68b7a0>
>>> pattern.search("erte")
<_sre.SRE_Match at 0x10f68b828>

As you can see in the previous example, there is a match when we have both a country code and an area code, and also when we have neither. Note that when there is a country code but no area code, there is no match:

>>> print(pattern.match("34-erte"))
None
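Looking at the captured groups makes the conditional visible. This Python 3 sketch reuses the same pattern and shows that group 1 is None when there is no country code; group 3 (the area code) is then None as well, because only the empty no-pattern was used:

```python
import re

pattern = re.compile(r"(\d\d-)?(\w{3,4})(?(1)(-\d\d))")

with_country = pattern.match("34-erte-22").groups()
without_country = pattern.search("erte").groups()

print(with_country)                # ('34-', 'erte', '-22')
print(without_country)             # (None, 'erte', None)
print(pattern.match("34-erte"))    # None
```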

And what's no-pattern for? Let's add another constraint to the previous example: if there is no country code, there has to be a name at the end of the string:

  • The country code (two digits), a dash, three or four alphanumeric characters, a dash, and the area code (two digits). For example: 34-adrl-01.
  • Three or four alphanumeric characters, followed by a dash and a name of three or four lowercase letters. For example: adrl-sala.

Let's see it in action:

>>> pattern = re.compile(r"(\d\d-)?(\w{3,4})-(?(1)(\d\d)|[a-z]{3,4})$")
>>> pattern.match("34-erte-22")
<_sre.SRE_Match at 0x10f6ee750>

As expected, if there is a country code and an area code, there is a match.

>>> print(pattern.match("34-erte"))
None

In the preceding example, we do have a country code, but there is no area code, so there is no match.

>>> pattern.match("erte-abcd")
<_sre.SRE_Match at 0x10f6ee880>

And finally, when there is no country code, there must be a name, so we have a match.
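Putting the three cases together, plus two extra inputs invented for illustration, a short Python 3 loop shows which product IDs the second pattern accepts. erte-22 is rejected because, without a country code, the no-pattern [a-z]{3,4} requires letters; 34-erte-ab is rejected because, with a country code, the yes-pattern requires two digits:

```python
import re

pattern = re.compile(r"(\d\d-)?(\w{3,4})-(?(1)(\d\d)|[a-z]{3,4})$")

# The first three IDs come from the text; the last two are hypothetical extras
ids = ["34-erte-22", "34-erte", "erte-abcd", "erte-22", "34-erte-ab"]
results = {product_id: bool(pattern.match(product_id)) for product_id in ids}

for product_id, matched in results.items():
    print(f"{product_id:<12} {matched}")
# Only 34-erte-22 and erte-abcd match
```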

Note that no-pattern is optional, so in the first example, we've omitted it.
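Omitting no-pattern behaves like an empty one. As a quick check, assuming Python 3 and a few sample inputs taken from the text, the first pattern can be compared with a variant that spells out an empty no-pattern after the |:

```python
import re

without_no = re.compile(r"(\d\d-)?(\w{3,4})(?(1)(-\d\d))")
empty_no = re.compile(r"(\d\d-)?(\w{3,4})(?(1)(-\d\d)|)")


def groups_or_none(match):
    """Return the captured groups, or None when there was no match."""
    return match.groups() if match else None


samples = ["34-erte-22", "erte", "34-erte"]
# Both patterns should fail together or capture exactly the same groups
same = all(
    groups_or_none(without_no.match(s)) == groups_or_none(empty_no.match(s))
    for s in samples
)
print(same)  # True
```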
