Using regular expressions

Regular expressions (REGEXP) help you search for string patterns in your data. REGEXP expands upon the patterns you can use in the LIKE operator, which we covered in Chapter 6, Querying a Single Table. LIKE allowed you to use the percent sign (%) wildcard to represent zero or more characters or the underscore (_) wildcard to represent one or more characters to match patterns on strings in the WHERE clause. Regular expressions have far more wildcard characters, thereby giving you far more options for matching patterns. The downside to regular expressions is that they can be complicated and hard to understand. Regular expressions can also reduce query performance, depending on how complex they get. 

The following table outlines some of the commonly used regular expression characters: 

Character How it works
^ Matches the beginning of a string
$ Matches the end of a string
. Matches any character
[...] Matches any character inside square brackets
[^...] Matches any character not inside square brackets
p1|p2|p3 Matches any of the patterns specified 
+ Matches a sequence of characters one or more times

 

To use REGEXP in a MySQL query, you need to add the REGEXP keyword to a WHERE clause, as shown in the following code sample: 

SELECT col1, col2, col3
FROM tablename
WHERE stringcolumn REGEXP pattern;

You can also use NOT REGEXP to search for the opposite of the pattern. Let's go through some examples of how to use the REGEXP characters:

  • Caret (^): Use this to match at the beginning of a string. This can be seen in the following query, which will return any playerid starting with a or A:
USE lahmansbaseballdb; 
SELECT playerid
FROM people
WHERE playerid REGEXP '^a';
  • Dollar sign ($): Use this to match at the end of a string. This can be seen in the following query, which will return any birthcity values ending with y or Y:
USE lahmansbaseballdb;
SELECT birthcity
FROM people
WHERE birthcity REGEXP 'y$';
  • Period (.): Use this to match any character in a string, as shown in the following query: 
USE lahmansbaseballdb;
SELECT birthyear
FROM people
WHERE birthyear REGEXP '199.';

The preceding query will return any birthyear values that start with 199. It will return values such as 1990 through 1999. 

  • Square brackets ([...]): Use this to match any character placed inside the square brackets, as shown in the following query: 
USE lahmansbaseballdb;
SELECT playerid
FROM people
WHERE playerid REGEXP '^[C]'
ORDER BY playerid;

The preceding query will return any playerid values that start with C or c.

Use the following query to match any character not placed inside the square brackets:

USE lahmansbaseballdb;
SELECT playerid
FROM people
WHERE playerid REGEXP '^[^C]'
ORDER BY playerid;

 The preceding query will return any playerid values that don't start with C or c.

  • Pipe delimiter (|): Use this to match any of the patterns specified, as shown in the following query: 
USE lahmansbaseballdb;
SELECT playerid
FROM people
WHERE playerid REGEXP '^[C|D|E]'
ORDER BY playerid;

The preceding query will return any playerid values that start with C or c, D or d, and E or e.

  • Plus sign (+): Use this to match any of the patterns specified, as shown in the following query: 
USE lahmansbaseballdb;
SELECT birthcity
FROM people
WHERE birthcity REGEXP 'son+';

The preceding query will return any birthcity values that have the string, son, in them.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.27.45