Matching Against Patterns

If we want to check whether two strings are identical, we can of course use the equality operator:

 
if "something else" == "something"
  # we'll never get here
end

To check whether a string matches a regular expression, Ruby offers the =~ operator. Some people call it the “matches” operator, or the “is like” operator. If the string matches the pattern defined in the regular expression, it returns the index at which the match begins, which is always truthy in Ruby (even when that index is 0); otherwise it returns nil. That means we can use it in an if check. If we wanted to check whether a URL used https rather than http, we could use =~:

 
url = "https://example.com/"

# \A anchors the pattern to the start of the string
if url =~ /\Ahttps:/
  puts "The URL is https"
else
  puts "The URL is not https"
end
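
As a small sketch of what =~ actually hands back, the following prints its return value for a few strings. The sample URLs and variable names are illustrative assumptions, not part of the original listing:

```ruby
# =~ returns the index where the match starts, or nil when nothing matches.
secure   = "https://example.com/" =~ /\Ahttps:/      # matches at the very start
insecure = "http://example.com/" =~ /\Ahttps:/       # no match
later    = "see https://example.com/" =~ /https:/    # unanchored, matches mid-string

p secure    # 0 (still truthy in Ruby)
p insecure  # nil
p later     # 4
```

Because 0 is truthy in Ruby, a match at the very start of the string still passes an if check.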

We can also use the match method for this. It exists on both String and Regexp, so we can call either:

 
url.match(/\Ahttps:/)

or:

 
/\Ahttps:/.match(url)

If the string doesn’t match the regular expression, match will return nil, so you can use it in an if statement just like =~. If it does match, match will return a MatchData object containing information about the match. MatchData objects behave like collections, so you can access named or numbered capture groups with [], check the number of elements (the whole match plus each capture group) with .length, and so on.
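
To make the collection-like behavior concrete, here is a minimal sketch using named capture groups. The date string and the group names are hypothetical, chosen only for illustration:

```ruby
# Hypothetical input and pattern; (?<name>...) defines a named capture group.
date = "2024-03-15"
m = date.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)

if m
  puts m[0]         # the whole matched text: 2024-03-15
  puts m[:year]     # named group, looked up by symbol: 2024
  puts m["month"]   # named group, looked up by string: 03
  puts m[3]         # the same groups are also numbered: 15
  puts m.length     # whole match plus three groups: 4
  p m.captures      # just the groups: ["2024", "03", "15"]
end
```

Named groups can make longer patterns easier to read, since the code that consumes the match no longer depends on the position of each group.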

Let’s write a pattern to match the protocol and hostname of a given URL. That means that, given the URL https://example.com/, we’d like to extract https as the protocol and example.com as the hostname. We can use match for this:

regex/matchdata.rb
 
url = "https://example.com"

# The slashes must be escaped inside a /.../ literal; \w matches word characters
matches = url.match(/([a-z]+):\/\/([\w.]+)/)
# => #<MatchData "https://example.com" 1:"https" 2:"example.com">

if matches
  puts matches[0]
  puts matches[1]
  puts matches[2]
end

# >> https://example.com
# >> https
# >> example.com

Our pattern has two capture groups. The first matches a sequence of lowercase letters, therefore capturing the protocol. (We’re not restricting our pattern to either http or https; we’ll match any URL protocol made up of lowercase letters.) The second group matches a sequence of word characters or periods, and so captures our hostname. Between the two is the ://, which we match but don’t capture.

Accessing the zeroth element of the match data returns the whole portion of the string that was matched by the pattern. In this case, that’s the whole string, but that’s not always the case; we might only be matching a tiny portion of it. The first element is the first capture group (in this case, our protocol) and the second is the second capture group.
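
As a sketch of the case where the pattern matches only part of the string, the example below runs the same pattern against a longer, made-up sentence. The sample text is an assumption for illustration; pre_match and post_match are standard MatchData methods that return the text on either side of the match:

```ruby
# Hypothetical input: the URL is embedded in surrounding text.
text = "Docs live at https://example.com/guide for now"
matches = text.match(/([a-z]+):\/\/([\w.]+)/)

puts matches[0]          # only the matched portion: https://example.com
puts matches[1]          # https
puts matches[2]          # example.com
p matches.pre_match      # "Docs live at "
p matches.post_match     # "/guide for now"
```

Here the zeroth element is a substring of the input, which is why it is safer to think of it as "the matched text" than as "the string".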

The match method also accepts a block, which is executed only if the string matches the expression. The block is passed the MatchData as an argument. This is especially useful because it means you can avoid checking whether the pattern actually matched, and therefore whether the return value from match was nil:

regex/matchdata-block.rb
 
url = "https://example.com"

# The block runs only when the pattern matches
url.match(/([a-z]+):\/\/([\w.]+)/) do |matches|
  puts matches[0]
  puts matches[1]
  puts matches[2]
end

# >> https://example.com
# >> https
# >> example.com

Here we avoid the if, which makes things a little neater.
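
One further detail worth sketching: when match is given a block, it returns the block's value on a match and nil otherwise, so the block form can feed directly into a return value. The helper name hostname_of below is hypothetical, introduced only for this example:

```ruby
# Hypothetical helper: returns the hostname, or nil if the input
# does not look like a URL.
def hostname_of(url)
  # On a match, match returns whatever the block returns;
  # with no match, the block is skipped and match returns nil.
  url.match(/([a-z]+):\/\/([\w.]+)/) { |matches| matches[2] }
end

p hostname_of("https://example.com")   # "example.com"
p hostname_of("not a url")             # nil
```

The caller still has to handle nil, but the method body itself needs no explicit conditional.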
