Chapter 8
Regular Expressions Basics

We’ve now looked at a variety of ways that you can get text into your Ruby programs: from the user, from a pipe, from files. The next step, naturally, is to actually do something with that input.

When processing text, our aims can generally be split into three broad categories: testing, where we check whether the data matches a particular pattern; extraction, where we try to pull interesting information out of the data; and transformation, where we convert the data from one format to another. Often, we’ll want to do some combination of all three of these things: checking for particular formats, extracting key pieces of information, and then transforming them (converting them to a different text format, perhaps, or calculating an average, or converting them from a certain set of units to another).

Programmers of every sort, whether they’re programming in object-oriented or functional languages, whether they’re working with powerful servers or humble desktop machines, will often—too often, perhaps—reach for regular expressions as a tool of first resort for both extraction and transformation of text. There’s good reason for this: regular expressions are enormously powerful, and a relatively standard syntax for them has grown up, which means that once you’ve learned them in one language, there’s not a great deal of adjustment when working with them in another.

But what exactly are they? Well, in layman’s terms at least, a regular expression (almost always abbreviated to regex) is a way of describing a pattern in text, which can then be used to test for, extract, or replace text that matches that pattern.

So you might have a regular expression that matches the format of a URL. You could use that pattern to check whether a given string is a valid URL (testing). You could use it to extract all the URLs in a given piece of text (extraction). And you could use that same pattern in a replacement, for example to turn all the URLs in the text into HTML links so that they can be clicked on (transformation).
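
As a rough sketch of those three uses in Ruby, the code below assumes a deliberately simplified URL pattern. The names URL_PATTERN, url?, extract_urls, and linkify are made up for illustration, and the pattern is nowhere near a complete URL validator; it only shows the shape of each operation.

```ruby
# Hypothetical, simplified pattern: "http://" or "https://" followed by
# one or more characters that are not whitespace, quotes, or angle brackets.
URL_PATTERN = %r{https?://[^\s<>"]+}

# Testing: is the entire string a single URL?
# \A and \z anchor the pattern to the start and end of the string.
def url?(string)
  string.match?(/\A#{URL_PATTERN}\z/)
end

# Extraction: collect every substring that matches the pattern.
def extract_urls(text)
  text.scan(URL_PATTERN)
end

# Transformation: wrap each matched URL in an HTML link.
def linkify(text)
  text.gsub(URL_PATTERN) { |url| %(<a href="#{url}">#{url}</a>) }
end

sample = "Docs are at https://example.com/docs and http://example.org too."

puts url?("https://example.com/docs") # prints true
puts url?(sample)                     # prints false
puts extract_urls(sample).inspect
puts linkify(sample)
```

Note that the same pattern drives all three methods; only the String method applied to it (match?, scan, or gsub) changes.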

It has to be said that few topics in programming are as feared by new programmers as regular expressions. Their power, combined with their frighteningly terse syntax, makes them seem impenetrable and mysterious. Writing them can often seem closer to incanting a magical spell than to programming.

But it needn’t be like this. If you’re processing text, then regular expressions are your friend, and the quicker you dive into them the better.
