Matching a regular expression within a string

Regular expressions provide a powerful way to locate an arbitrarily complex pattern within a string. The regexp command is similar to the Find function in a text editor: you search a given string for the character or pattern of characters you are looking for, and the command returns a Boolean value that indicates success or failure and populates an optional list of variables with any matched strings. The -indices and -inline switches modify this behavior: -indices stores index pairs instead of the matched text, and -inline returns the matched data instead of storing it in variables. But it doesn't stop there; further switches control other aspects of how regexp matches. The switches are as follows:

Switch	Behavior
-about	No actual matching is performed. Instead, regexp returns a list containing information about the regular expression: the first element is a subexpression count and the second is a list of property names describing various attributes of the expression.
-expanded	Allows the use of the expanded regular expression syntax, in which whitespace and comments are ignored.
-indices	Stores in each match variable a list of two decimal strings, the indices in the string of the first and last characters of the matching range, instead of the matched characters themselves.
-line	Enables newline-sensitive matching; equivalent to passing both the -linestop and -lineanchor switches.
-linestop	Changes the behavior of negated ([^...]) bracket expressions and the "." character so that they stop at newline characters.
-lineanchor	Changes the behavior of the ^ and $ anchors so that they match the beginning and end of a line, respectively.
-nocase	Treats uppercase characters in the string as lowercase during matching.
-all	Causes the command to match as many times as possible and return the count of the matches found. Any match variables hold the information for the last match only.
-inline	Causes regexp to return a list of the data that would otherwise have been placed in match variables. Match variables may NOT be used if -inline is specified.
-start index	Allows us to specify a character index in the string from which searching should start.
--	Denotes the end of the switches being passed to regexp. Any argument following this switch is treated as the expression, even if it starts with a "-".
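A few of these switches can be tried out in a short script. The following is a minimal sketch rather than part of the recipe; the sample text and the variable names (text, hit, count, first, second, dash) are made up for illustration.

```tcl
# Sample data, invented for this illustration.
set text "Tcl and tcl and TCL"

# -nocase: the lowercase pattern matches the uppercase string.
set hit [regexp -nocase {tcl} "TCL"]
puts $hit                                   ;# 1

# -all: the return value is the number of matches, not just 0 or 1.
set count [regexp -all -nocase {tcl} $text]
puts $count                                 ;# 3

# -indices: the match variable receives the start and end indices
# of the match instead of the matched text.
regexp -indices {and} $text first
puts $first                                 ;# 4 6

# -start: begin searching at character index 5, which skips the
# first "and"; the reported indices still count from the start
# of the whole string.
regexp -indices -start 5 {and} $text second
puts $second                                ;# 12 14

# --: required here because the expression itself begins with "-".
set dash [regexp -- {-v} "tclsh -v"]
puts $dash                                  ;# 1
```

Without the -- switch, the last command would fail, because regexp would try to interpret -v as one of its own switches.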

Now that we have a background in switches, let's look at the command:

	regexp switches expression string matchvar submatchvar submatchvar…

The regexp command determines whether the expression matches part or all of the string and returns 1 if a match exists or 0 if none is found. If variables (for example, myNumber or myData) are passed after the string, the first variable receives the text of the complete match, and each subsequent variable receives the text matched by the corresponding parenthesized subexpression. Keep in mind that if the -inline switch has been passed, no return variables should be included in the command.
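The difference between passing variables and passing -inline can be seen in a small script. This is an illustrative sketch only; the sample string and the variable names (line, found, whole, key, value, parts, miss) are assumptions, not part of the recipe.

```tcl
# Sample input, invented for this illustration.
set line "width=640"

# With variables: regexp returns 1 or 0 and fills the variables.
# "whole" receives the complete match; "key" and "value" receive
# the text matched by the two parenthesized subexpressions.
set found [regexp {([a-z]+)=([0-9]+)} $line whole key value]
puts "$found $whole $key $value"            ;# 1 width=640 width 640

# With -inline: no variables are given, and the same data comes
# back as the command's result (a list) instead.
set parts [regexp -inline {([a-z]+)=([0-9]+)} $line]
puts $parts                                 ;# width=640 width 640

# When nothing matches, the plain form returns 0.
set miss [regexp {([a-z]+)=([0-9]+)} "no pairs here"]
puts $miss                                  ;# 0
```

Bracing the expression keeps Tcl from treating [a-z] and [0-9] as command substitutions before regexp ever sees them.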

Getting ready

To complete the following example, you will need to create a Tcl script file in your working directory. Open the text editor of your choice and follow the next set of instructions.

How to do it…

A common use for regexp is to accept a string made up of several parts and split it into its constituent parts. In the following example, we will create a string containing an IP address and assign its octet values to named variables. Enter the following commands:


% set ip 192.168.1.65
192.168.1.65
% regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} \
	$ip all first second third fourth
1
% puts "$all
$first
$second
$third
$fourth"
192.168.1.65
192
168
1
65

How it works…

As you can see, the IP address has been split into its individual octet values. What regexp has done is match four groups of decimal characters [0-9], each with a length of 1 to 3 characters {1,3}, delimited by a "." character. Each pair of parentheses marks a subexpression whose matched text is captured. The original IP address is assigned to the first variable (all), while the octet values are assigned to the remaining variables (first, second, third, and fourth).
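The same pattern can be wrapped in a small helper. The sketch below goes slightly beyond the recipe and makes several assumptions: the procedure name splitIp is hypothetical, the ^ and $ anchors are added so that the whole string must be an address, and the check that each octet is at most 255 is an extra step the pattern alone does not perform. In a regular expression an unescaped "." matches any character, so the sketch writes \. to require a literal dot.

```tcl
# Hypothetical helper: returns the four octets of a dotted-quad
# address as a list, or an empty list if the string does not
# look like a valid address.
proc splitIp {address} {
    # Braces stop Tcl from substituting [0-9] or the backslashes.
    set pattern {^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$}

    # -inline returns the full match followed by the four captures,
    # or an empty list when there is no match.
    set parts [regexp -inline -- $pattern $address]
    if {[llength $parts] == 0} {
        return {}
    }

    # Drop the full match and keep only the captured octets.
    set octets [lrange $parts 1 end]
    foreach octet $octets {
        # scan with %d reads the digits as decimal, so a leading
        # zero is not mistaken for an octal prefix.
        if {[scan $octet %d] > 255} {
            return {}
        }
    }
    return $octets
}

puts [splitIp 192.168.1.65]                 ;# 192 168 1 65
puts [llength [splitIp 192.168.1.999]]      ;# 0
puts [llength [splitIp 192x168x1x65]]       ;# 0
```

Returning a list rather than setting variables lets the caller decide what to do with the octets, for example passing the result to lassign.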
