Chapter 4. Regular Expressions

In this chapter, we will see recipes related to regular expressions as used in ABAP. We will look at:

  • Using regex as an IF statement
  • Removal of characters from a string
  • Converting date into internal date type
  • Validation of format (telephone number)
  • Removing repeated words from text string
  • Inserting commas in an amount string
  • Removing comments from program code
  • Interpreting HTML stream

Introduction

As of release 7.0, ABAP supports regular expressions based on POSIX standard 1003.2. Regular expressions may be specified after the addition REGEX within the FIND and REPLACE statements.
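
As a first orientation, the following is a minimal sketch of the REGEX addition in both statements. The report name ZREGEX_INTRO and the sample text are hypothetical and serve only as illustration.

```abap
REPORT zregex_intro.

DATA: lv_text  TYPE string,
      lv_count TYPE i.

lv_text = 'Invoice 2024 for order 4711'.

* Count every run of digits in the text.
FIND ALL OCCURRENCES OF REGEX '\d+' IN lv_text
     MATCH COUNT lv_count.
WRITE: / 'Digit groups found:', lv_count.

* Replace each run of digits with a hash sign.
REPLACE ALL OCCURRENCES OF REGEX '\d+' IN lv_text WITH '#'.
WRITE: / lv_text.
```

With this sample text, FIND reports two digit groups, and after REPLACE the text reads Invoice # for order #.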

A complete description of the topic would fill a book by itself. This chapter therefore discusses only the most important and commonly used regular expression requirements.

In the introduction, we will cover the important operators used in regular expression processing within ABAP. These will be used in the various recipes in this chapter. We will then see recipes for writing programs that use regular expressions for pattern matching and validation, as well as for conversion and extraction of data from a given text stream. Throughout the chapter, the terms "Regular Expression" and "Regex" will be used interchangeably.

In addition to FIND and REPLACE statements, SAP provides classes CL_ABAP_REGEX and CL_ABAP_MATCHER for regex processing. However, the coding in the recipes will comprise regular expressions used within FIND and REPLACE statements.

For this chapter, I assume that the reader has basic knowledge of regular expressions and is familiar with the basic FIND and REPLACE statements. We will use screenshots of the standard program DEMO_REGEX for illustration, where required.

For more information on regex, see the following link:

http://help.sap.com/abapdocu_70/en/ABENREGULAR_EXPRESSIONS.htm

Regular expressions

A regular expression comprises literals and operators. The operators are special characters used for a particular purpose and have special meanings when we need to search for any pattern within a text stream. Before the recipes, we will have a look at some useful operators available in ABAP for regular expressions.

Operator    Purpose

.           Matches any single character.
?           Denotes either no or a single occurrence of a character or set of characters.
*           Denotes any number of occurrences (0, 1, or more) of a character or set of characters.
+           Matches one or more occurrences of a character or set of characters.
\<          Matches the start of a word.
\>          Matches the end of a word.
^           Denotes negation when used inside box brackets; otherwise, marks the start of a line.
?=          Used as a preview condition.
?!          Used as a negated preview condition.
\1, \2      Placeholders for subgroup registers (also called the back-referencing operator). For replacement, $1 and $2 represent the subgroup registers in the replacement string (this will be discussed in a recipe ahead).
$           Denotes the end of a line.
\d          Denotes a digit (0-9).
\w          Denotes an alphanumeric character.
\u          Matches a single uppercase letter.
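
To show several of these operators working together, here is a small sketch that combines the word markers \< and \>, a subgroup register \1, and the replacement placeholder $1. The report name and the sample sentence are hypothetical.

```abap
REPORT zregex_subgroup.

DATA lv_text TYPE string.

lv_text = 'this is is a test'.

* (\w+) stores a whole word in register 1, and \1 demands the same
* word again. \< and \> keep the match on word boundaries, so the
* letters is at the end of this are not treated as a word.
REPLACE ALL OCCURRENCES OF REGEX '\<(\w+) \1\>' IN lv_text WITH '$1'.

WRITE: / lv_text.
```

After the statement, the doubled word is collapsed and the text reads this is a test.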

The three operators *, +, and ? must be placed after a character or a character sequence specification. Box brackets denote a set of possible characters, any one of which may occur at that position. Round brackets, on the other hand, group a specific sequence of characters that is matched as a unit. For example, the regex [01]? will match 0, 1, or an empty string. The expression [01]* will match 01, 11, 0101, and so on. In contrast, (01)* will match 01, 0101, or an empty string. Similarly, \w+ denotes one or more alphanumeric characters.
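
The difference between box brackets and round brackets can be tried out with a sketch like the following. The report name and variable names are hypothetical, and the markers ^ and $ are added so that the whole text, not just a part of it, has to fit the pattern.

```abap
REPORT zregex_quantifiers.

DATA: lv_text   TYPE string,
      lv_rc_set TYPE sy-subrc,
      lv_rc_seq TYPE sy-subrc.

lv_text = '11'.

* [01]* accepts any mix of the characters 0 and 1, so 11 matches.
FIND REGEX '^[01]*$' IN lv_text.
lv_rc_set = sy-subrc.

* (01)* accepts only repetitions of the sequence 01, so 11 fails.
FIND REGEX '^(01)*$' IN lv_text.
lv_rc_seq = sy-subrc.

WRITE: / '[01]* on 11, return code:', lv_rc_set,
       / '(01)* on 11, return code:', lv_rc_seq.
```

The first FIND sets SY-SUBRC to 0 (found), whereas the second sets it to 4 (not found).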

The special characters that are used in regex may also need to be searched in a given text. For searching them, we must precede them with a backslash (Escape Character for special characters). Some of the examples are shown in the following table:

Searching for special characters

.   +   "   ?   $   ^   (   )   [   ]

When preceded by a backslash, these are treated as literals rather than operators.
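
The effect of the escape character can be seen in a short sketch such as this one, which counts matches for the dot with and without the backslash. The report name and the sample text are hypothetical.

```abap
REPORT zregex_escape.

DATA: lv_text TYPE string,
      lv_any  TYPE i,
      lv_dots TYPE i.

lv_text = 'Total: 3.50 USD'.

* Unescaped dot: an operator that matches every single character.
FIND ALL OCCURRENCES OF REGEX '.' IN lv_text MATCH COUNT lv_any.

* Escaped dot: a literal that matches only the decimal point.
FIND ALL OCCURRENCES OF REGEX '\.' IN lv_text MATCH COUNT lv_dots.

WRITE: / 'Unescaped:', lv_any, 'Escaped:', lv_dots.
```

The unescaped pattern counts all 15 characters of the text, while the escaped pattern counts only the single decimal point.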

The ?= operator is a preview condition. For example, if we write a regex in the form a(?=s), the expression behaves like an IF statement condition. The pattern a will be matched only if the substring that immediately follows it matches the condition specified by s. The previewed substring itself does not become part of the match.
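
A preview condition of this form might be tried out as follows. This is only a sketch; the report name and the sample text are hypothetical.

```abap
REPORT zregex_preview.

DATA: lv_text   TYPE string,
      lv_offset TYPE i,
      lv_length TYPE i.

lv_text = 'bat bass'.

* Match the letter a only where it is directly followed by s.
FIND REGEX 'a(?=s)' IN lv_text
     MATCH OFFSET lv_offset
     MATCH LENGTH lv_length.

IF sy-subrc = 0.
  WRITE: / 'Offset:', lv_offset, 'Length:', lv_length.
ENDIF.
```

The a in bat is skipped because it is followed by t. The match is the a in bass at offset 5, and its length is 1 because the previewed s is not included.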

The ^ operator behaves differently inside box brackets [] than it does with round brackets (). When ^ is the first character inside box brackets, it negates the set: the brackets then match any single character that is not listed. For example, [^ab] matches any character other than a and b, so [^ab]+ will match strings such as cd, ch, hh, and so on. In contrast, ^ used with round brackets (or without any brackets) marks the beginning of a string. For example, ^(ab) or ^ab will match at the start of all strings beginning with ab, such as abc, abd, abbbbb, and so on.
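
Both meanings of ^ can be compared in a sketch like the one below. The report name, variable names, and sample texts are hypothetical.

```abap
REPORT zregex_caret.

DATA: lv_text   TYPE string,
      lv_offset TYPE i,
      lv_rc     TYPE sy-subrc.

lv_text = 'abc'.

* Inside box brackets: first character that is neither a nor b.
FIND REGEX '[^ab]' IN lv_text MATCH OFFSET lv_offset.
WRITE: / 'First character other than a or b at offset', lv_offset.

* Outside box brackets: ab must stand at the start of the text.
lv_text = 'cab'.
FIND REGEX '^ab' IN lv_text.
lv_rc = sy-subrc.
WRITE: / '^ab on cab, return code:', lv_rc.
```

The first FIND locates the c at offset 2. The second FIND fails with return code 4 because cab contains ab but does not start with it.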

For simplicity's sake, our examples will specify the regex as a literal directly within the FIND or REPLACE statement. In this case, the syntax checker identifies any error in the regex, along with the exact position at which the error occurs, as shown in the following screenshot. For this reason, no error handling will be shown in this chapter.

[Screenshot: syntax check message for an invalid regex, showing the position of the error]

If error handling is required, however, the exception class CX_SY_REGEX may be caught within a TRY ... CATCH block.
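
Such error handling becomes relevant when the regex is held in a variable, where a faulty pattern is detected only at runtime. The following is a minimal sketch under that assumption; the report name, variable names, and the deliberately faulty pattern are hypothetical.

```abap
REPORT zregex_error.

DATA: lv_pattern TYPE string,
      lv_text    TYPE string,
      lv_message TYPE string,
      lx_regex   TYPE REF TO cx_sy_regex.

lv_pattern = '(\d+'.          " opening bracket is never closed
lv_text    = 'abc 123'.

TRY.
    FIND REGEX lv_pattern IN lv_text.
  CATCH cx_sy_regex INTO lx_regex.
    " Read the error text from the exception object.
    lv_message = lx_regex->get_text( ).
    WRITE: / 'Invalid regex:', lv_message.
ENDTRY.
```

Because the pattern is not a literal here, the program passes the syntax check and the exception is raised only when the FIND statement is executed.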

For the REPLACE statement, the return code SY-SUBRC is set to 0 when at least one replacement has been made. This may be used to check whether the REPLACE statement was successful.
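
A check of the return code could look like the following sketch. The report name and the sample text are hypothetical, and the optional addition REPLACEMENT COUNT is used here only to show how many replacements took place.

```abap
REPORT zregex_replace_rc.

DATA: lv_text  TYPE string,
      lv_count TYPE i.

lv_text = 'A-B-C'.

REPLACE ALL OCCURRENCES OF REGEX '-' IN lv_text WITH '/'
        REPLACEMENT COUNT lv_count.

IF sy-subrc = 0.
  WRITE: / 'Replaced', lv_count, 'occurrence(s):', lv_text.
ELSE.
  WRITE: / 'Nothing to replace.'.
ENDIF.
```

Here both hyphens are replaced, so the text becomes A/B/C. If the pattern had not been found, SY-SUBRC would be 4 and the text would remain unchanged.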
