Hour 16. Regular Expressions

What You’ll Learn in This Hour:

What regular expressions are

Defining regular expression patterns

How to use regular expressions in your scripts

Manipulating string data is one of the most common tasks in Python scripts, and Python is known for its ability to search and modify strings easily. Regular expressions are one of the features that support this kind of string parsing. In this hour, you’ll see what regular expressions are, how to define them, and how to use them in your own Python scripts.

What Are Regular Expressions?

Many people have a hard time understanding what regular expressions are. The first step to understanding them is defining exactly what they are and what they can do for you. The following sections explain what a regular expression is and describe how Python uses regular expressions to help with your string manipulations.

Definition of Regular Expressions

A regular expression is a pattern you create to filter text. A program or script matches the regular expression pattern you create against data as the data flows through the program. If the data matches the pattern, it’s accepted for processing. If the data doesn’t match the pattern, it’s rejected. Figure 16.1 shows how it works.

FIGURE 16.1 Matching data against a regular expression.

While you are probably familiar with normal text searching, regular expressions provide a lot more than that. A regular expression pattern uses wildcard characters to represent one or more characters in the data stream. You can use a number of special characters in a regular expression to define a specific pattern for filtering data. This gives you a lot of flexibility in how you define your string patterns.
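As a minimal sketch of this filtering idea, the snippet below keeps only the lines that match a pattern. The sample lines and the pattern are made up for illustration; the dot acts as a wildcard that stands for any single character.

```python
import re

# Hypothetical sample data; any iterable of strings would work.
lines = ["The cat sat", "A cut above", "No match here", "cot for sale"]

# "c.t" matches "c", then any one character, then "t".
pattern = "c.t"

# Keep only the lines in which the pattern is found somewhere.
accepted = [line for line in lines if re.search(pattern, line)]
print(accepted)
```

Lines that do not contain the pattern are simply left out of the result, which mirrors the accept-or-reject flow described above.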

Types of Regular Expressions

The biggest problem with using regular expressions is that there isn’t just one set of them. Different applications use different types of regular expressions. These include such diverse things as programming languages (for example, Java, Perl, Python), Linux utilities (such as the sed editor, the gawk program, and the grep utility), and mainstream applications (such as the MySQL and PostgreSQL database servers).

A regular expression is implemented using a regular expression engine. A regular expression engine is the underlying software that interprets regular expression patterns and uses those patterns to match text.

In the open source software world, there are two popular regular expression engines:

The POSIX Basic Regular Expression (BRE) engine

The POSIX Extended Regular Expression (ERE) engine

Most open source programs at a minimum conform to the POSIX BRE engine specifications, recognizing all the pattern symbols it defines. Unfortunately, some utilities (such as the sed editor) only conform to a subset of the BRE engine specifications. This is due to speed constraints, as the sed editor attempts to process text in the data stream as quickly as possible.

The POSIX ERE engine is often found in programming languages that rely on regular expressions for text filtering. It provides advanced pattern symbols as well as special symbols for common patterns, such as matching digits, words, and alphanumeric characters. Python's re module does not use a POSIX engine as such; it has its own engine with Perl-style syntax, which supports ERE-style pattern symbols such as the question mark, the plus sign, braces, the pipe symbol, and grouping, and adds extensions of its own.
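The special symbols for common patterns can be sketched as follows. The sample string is invented for illustration; \d and \w are the re module's symbols for a digit and for a word character (a letter, a digit, or an underscore).

```python
import re

text = "Order 66 shipped to bay 7"  # made-up sample text

# \d+ matches runs of one or more digits.
digits = re.findall(r"\d+", text)

# \w+ matches runs of one or more word characters.
words = re.findall(r"\w+", text)

print(digits)
print(words)
```

The r prefix creates a raw string, so Python passes the backslashes through to the re module unchanged.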

Working with Regular Expressions in Python

Before you can start writing regular expressions to filter data in your Python scripts, you need to know how to use them. The Python language provides the re module to support regular expressions. The re module is included in the Raspbian Python default installation, so you don’t need to do anything special to start using regular expressions in your scripts, other than import the re module at the start of a script:

import re

The re module provides two different ways to define and use regular expressions. The following sections discuss how to use both methods.
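In rough terms, the two approaches look like this. This is only a sketch: the pattern and the test string are placeholders, and re.compile() is the standard re module call that builds a reusable pattern object.

```python
import re

text = "regular expressions in Python"  # placeholder text

# Way 1: call a module-level function, passing the pattern each time.
direct = re.search("Python", text)

# Way 2: compile the pattern once, then call methods on the pattern object.
compiled = re.compile("Python")
via_object = compiled.search(text)

# Both approaches locate the same span of text.
print(direct.span(), via_object.span())
```

Compiling is mostly a convenience when the same pattern is applied many times; for a one-off test, the module-level function is usually enough.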

Regular Expression Functions

The easiest way to use regular expressions in Python is to directly use the regular expression functions provided by the re module. Table 16.1 lists the functions that are available.

TABLE 16.1 The re Module Functions

The matching functions in the re module take two required parameters. The first parameter is the regular expression pattern, and the second parameter is the text string to apply the pattern to.

The match() and search() regular expression functions return a match object if the text string matches the regular expression pattern, or None if it doesn’t. A match object evaluates as true and None evaluates as false, which makes these functions ideal for use in if statements.
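Here is a short sketch of using the result as a condition. The helper name and test strings are hypothetical. A successful call yields a match object, which counts as true, and a failed call yields None, which counts as false.

```python
import re

def contains_word(pattern, text):
    """Return True if pattern is found anywhere in text (hypothetical helper)."""
    result = re.search(pattern, text)
    if result:  # a match object is truthy; None is falsy
        return True
    return False

print(contains_word("test", "This is a test"))
print(contains_word("quiz", "This is a test"))
```

When the matched text itself is needed, keep the match object in a variable and call its group() method instead of reducing it to a Boolean.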

The match() Function

The match() function does what it says: it tries to match the regular expression pattern to a text string. It is a little tricky in that it applies the pattern only at the start of the text string. Here’s an example:
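A minimal sketch of that behavior, with invented test strings: the same kind of pattern succeeds when it sits at the start of the string and fails when it appears later.

```python
import re

# The pattern is at the very start of the string, so match() succeeds.
at_start = re.match("this", "this is a test")

# The pattern appears later in the string, so match() returns None.
later = re.match("test", "this is a test")

print(at_start is not None)
print(later is None)
```

The second call fails even though the word appears in the string, because match() never looks past the first position.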
