Removing comments from program code

In this recipe, we will see how the replace statement can be combined with a suitable regex to remove comments from a program. For the sake of this recipe, we assume that the program whose comments are to be removed is syntactically correct. Like the program created for the Dynamic program generation recipe in Chapter 2, Dynamic Programming, the program for this recipe removes every line that begins with an asterisk (*) and, on any other line, everything from the first double quote (") onward.

Getting ready

For this recipe, we will make a copy of the program created in Chapter 2, Dynamic Programming, which reads the source code of a program specified by user input. The previous program used two internal tables and looped over the first table, itab, to delete comments. In this recipe, we replace the loop with a single replace statement and use only one internal table, itab. The source code of the specified program is read into this table, and the comments are then removed from it.

How to do it...

To adjust the program, proceed as follows:

  1. The main part of the new portion is a replace statement. This replace statement contains (^*.*)|([^"]*)("*.*) as the regular expression. It uses the in table addition, so that it works on the lines of the internal table itab, and specifies $2 as the replacement.
  2. A delete statement is then used to delete all blank rows from the internal table.
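The two steps above can be tried outside an ABAP system. The following is a minimal sketch in Python, used here only as a stand-in for the ABAP statements; it is not the recipe's listing. The names COMMENT_PATTERN and strip_comments are hypothetical, the sample lines are invented for illustration, and the leading asterisk is written as \* because Python's re module does not accept an unescaped ^*. It assumes Python 3.5 or later, where an unmatched subgroup in the replacement yields an empty string.

```python
import re

# The recipe's pattern. The asterisk is escaped so that it is matched literally.
COMMENT_PATTERN = re.compile(r'(^\*.*)|([^"]*)("*.*)')


def strip_comments(lines):
    """Return the lines with comments removed and blank rows dropped."""
    # Step 1: replace each match with subgroup 2, the code before any double quote.
    replaced = [COMMENT_PATTERN.sub(r'\2', line) for line in lines]
    # Step 2: drop rows that are now empty or contain only whitespace.
    return [line for line in replaced if line.strip()]


source = [
    '* Full-line comment',
    'DATA counter TYPE i. "inline comment',
    '',
    'counter = counter + 1.',
]
cleaned = strip_comments(source)
print(cleaned)  # ['DATA counter TYPE i. ', 'counter = counter + 1.']
```

The blank between the code and the removed comment is kept, because it belongs to the part captured before the double quote.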

How it works...

Three subgroups are used in this recipe. An OR condition (|) separates the first subgroup from the other two, so the regular expression consists of two alternatives. Let us consider each of them.

  • (^*.*): This part tries to match lines whose first character is an asterisk (*), that is, lines that are commented out entirely. The match found is stored in subgroup register 1.
  • ([^"]*)("*.*): This pattern divides a given program line into two parts: the first is a run of characters containing no double quote ("), and the second begins at the double quote and runs to the end of the line. The first part, before the double quote, is stored in subgroup register 2 and is denoted by $2. The second part is stored in subgroup register 3.
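To see which register receives which part of a line, the captures can be inspected directly. This is a small Python sketch under the same caveats as any port from ABAP: the helper name subgroups and the two sample lines are hypothetical, and the asterisk is escaped as \* for Python's re module.

```python
import re

pattern = re.compile(r'(^\*.*)|([^"]*)("*.*)')


def subgroups(line):
    """Return the contents of subgroup registers 1 to 3 for a program line."""
    # The second alternative can match an empty string, so match() always succeeds.
    return pattern.match(line).groups()


full_line = subgroups('* obsolete block')
inline = subgroups('WRITE total. "print the total')

print(full_line)  # ('* obsolete block', None, None)
print(inline)     # (None, 'WRITE total. ', '"print the total')
```

For the fully commented line only register 1 is filled; for the line with a trailing comment, register 2 holds the code and register 3 holds the comment.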

Since we need neither the lines beginning with an asterisk nor the part of a line that starts at the double quote, the placeholders $1 and $3 are not used in the replace statement. Only the second subgroup register, $2, is used as the replacement. This trims the comments from the code.

If a line starting with an asterisk (*) is reached, the second register is empty, so the entire code line is replaced by a blank line. If a line containing some code followed by a comment starting with a double quote (") is reached, only the code part is kept.

We finally call the delete statement in order to remove any blank lines from the program's internal table.

(For simplicity's sake, we have specified three subgroups with parentheses. The parentheses around the third subgroup may be omitted without affecting how the code works.)
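The remark about the third pair of parentheses can be checked with a quick comparison. This is a hedged Python sketch, not ABAP: the constant names and sample lines are hypothetical, and the asterisk is escaped as \* for Python's re module. Since only $2 is used in the replacement, dropping the third subgroup leaves the result unchanged for these samples.

```python
import re

WITH_THIRD_GROUP = re.compile(r'(^\*.*)|([^"]*)("*.*)')
WITHOUT_THIRD_GROUP = re.compile(r'(^\*.*)|([^"]*)"*.*')

samples = [
    '* Full-line comment',
    'DATA counter TYPE i. "inline comment',
    'counter = counter + 1.',
    '',
]

# Apply both variants and compare the results line by line.
with_group = [WITH_THIRD_GROUP.sub(r'\2', s) for s in samples]
without_group = [WITHOUT_THIRD_GROUP.sub(r'\2', s) for s in samples]
outputs_agree = with_group == without_group

print(outputs_agree)  # True
```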
