Match Groups

You can also use a regular expression to capture one or more substrings of a match. To do this, put part of the regular expression between parentheses. Here I have two groups (sometimes called captures): the first tries to match the string “hi”, and the second tries to match a string starting with “h”, followed by any three characters (a dot means “match any single character,” so the three dots here match any three consecutive characters), and ending with “o”:

groups.rb

/(hi).*(h...o)/ =~ "The word 'hi' is short for 'hello'."

When a regular expression containing groups matches, Ruby assigns the text matched by each group to a numbered variable, one variable per group. These variables take the form of a $ followed by a number: $1, $2, $3, and so on. After executing the previous code, I can access the variables $1 and $2 like this:

print( $1, " ", $2, "\n" )        #=> hi hello

Note that if the regular expression as a whole fails to match, none of the group variables receives a value, even if one group could have matched on its own. This would be the case if, for example, “hi” were in the string but “hello” were not. Both group variables would then be nil.
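The sketch below runs the groups.rb pattern twice: once against the original string and once against a made-up string that contains “hi” but nothing matching h...o after it. The second string is purely illustrative; everything else is core Ruby.

```ruby
pattern = /(hi).*(h...o)/

# Successful match: both groups are captured
pattern =~ "The word 'hi' is short for 'hello'."
matched = [$1, $2]
puts matched.join(" ")          # prints: hi hello

# Illustrative string containing "hi" but no h...o after it,
# so the whole pattern fails and =~ returns nil
result = pattern =~ "Say 'hi' to everyone."
p result                        # prints: nil
p [$1, $2]                      # prints: [nil, nil]
```

The failed match resets the group variables, so the values from the earlier successful match are not carried over.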

Here is another example, which contains three groups, indicated by pairs of parentheses (()), each of which contains a single dot that matches one character: (.). Groups $1 and $3 are then displayed:

/(.)(.)(.)/ =~ "abcdef"
print( $1, " ", $3, "\n" )        #=> a c
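The numbered variables are shortcuts into a MatchData object that Ruby also stores in $~. As a sketch, the same groups can be read from that object, or from the MatchData returned directly by String#match:

```ruby
# =~ sets $~ (the last MatchData) as well as $1, $2, $3
/(.)(.)(.)/ =~ "abcdef"
puts "#{$1} #{$3}"           # prints: a c
p $~.captures                # prints: ["a", "b", "c"]

# String#match returns a MatchData object (or nil if there is no match)
m = "abcdef".match(/(.)(.)(.)/)
puts "#{m[1]} #{m[3]}"       # prints: a c
p m[0]                       # prints: "abc" (the whole match)
```

Index 0 holds the text matched by the entire expression; indexes 1 and up correspond to $1 and up.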

Here is a new version of the comment-matching program that was given earlier (regex3a.rb). It has been adapted to use the value of a group containing a dot followed by an asterisk, (.*), which returns all the characters (zero or more) following the text matched by the preceding part of the regular expression, ^\s*#. This new version reads the text from the specified file and, on each line, matches zero or more whitespace characters (\s*) from the start of the line (^) followed by a hash mark (#).

regex3b.rb

File.foreach( 'regex1.rb' ){ |line|
    if line =~ /^\s*#(.*)/ then
        puts( $1 )
    end
}

The end result is that only lines in which the first non-whitespace character is # are matched; $1 holds the text of those lines following the # character, so printing it displays each comment minus the # itself. As you will see shortly, this simple technique provides the basis of a useful tool for extracting documentation from a Ruby file.
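Because regex3b.rb depends on a file named regex1.rb, here is a self-contained sketch of the same technique that scans a block of Ruby source held in a heredoc instead. The sample source text is hypothetical; the pattern is /^\s*#(.*)/, as discussed above.

```ruby
# Hypothetical sample source, used instead of reading regex1.rb from disk
source = <<~'CODE'
  # This is a full-line comment
  x = 10 # a trailing comment, not matched
      # an indented comment
  puts x
CODE

comments = []
source.each_line do |line|
  # ^ anchors at the line start, \s* skips leading whitespace,
  # and (.*) captures everything after the # (up to, not including, the newline)
  if line =~ /^\s*#(.*)/
    comments << $1
  end
end

puts comments
```

Only the first and third lines are collected. The trailing comment on the x = 10 line is skipped because code, not whitespace, precedes its #.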

You aren’t limited merely to extracting and displaying characters verbatim; you can also modify text. This example displays the text from a Ruby file but changes all Ruby line-comment characters (#) preceding full-line comments to C-style line comments (//):

regex4.rb

File.foreach( 'regex1.rb' ){ |line|
    line = line.sub(/(^\s*)#(.*)/, '\1//\2')
    puts( line )
}

In this example, the sub method of the String class has been used. It takes a regular expression as its first argument (/(^\s*)#(.*)/) and a replacement string as its second argument ('\1//\2'). The replacement string may contain numbered placeholders such as \1 and \2, which refer to the groups in the regular expression; here there are two groups between parentheses: (^\s*) and (.*). The sub method returns a new string in which the first match of the regular expression is replaced by the replacement string, with each placeholder filled in by the text its group captured. Any part of the match that is not inside a group (here the # character) is dropped. Note that the replacement string is written in single quotes: in a double-quoted string, \1 would be treated as an escape sequence rather than a placeholder. So, for example, let’s assume that the following comments are found in the input file:

# aStr = "hello world"
# aStr = "Hello World"

After substitution using our regular expression, the displayed output is as follows:

// aStr = "hello world"
// aStr = "Hello World"
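To try the substitution without a source file, the sketch below applies the same sub call to a few hypothetical lines held in an array. It also shows that indentation captured by the first group is preserved, and that a line whose # is preceded by code is left unchanged, because (^\s*) allows only whitespace before the #.

```ruby
# Hypothetical input lines, each ending in a newline as File.foreach yields them
lines = [
  "# aStr = \"hello world\"\n",
  "   # aStr = \"Hello World\"\n",
  "aStr = \"hello\" # trailing comment\n"
]

converted = lines.map do |line|
  # '\1' re-inserts the leading whitespace, '\2' the comment text.
  # Single quotes keep \1 and \2 as placeholders, not escape sequences.
  line.sub(/(^\s*)#(.*)/, '\1//\2')
end

puts converted
```

Running it prints the first two lines with // in place of #, and the third line unchanged.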