Character-based translation using tr

Another useful shell command is tr, which translates, squeezes, or deletes characters read from standard input and writes the result to standard output. Its syntax is as follows:

tr [OPTION]... SET1 [SET2]

The options of the tr command are explained in the following table:

Option    Description

-c, -C    Uses the complement of SET1, that is, every character not in SET1.

-d        Deletes the characters specified in SET1.

-s        Replaces each run of consecutive repeated characters from SET1 with a single occurrence.

-t        Truncates SET1 to the length of SET2. Any extra characters in SET1 are not considered for translation.
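The effect of -t is easiest to see when SET2 is shorter than SET1. The following is a minimal sketch that assumes GNU tr (other implementations may not accept -t or may treat a short SET2 differently); the input string is a made-up sample:

```shell
# Assumption: GNU coreutils tr. Without -t, GNU tr extends SET2 by
# repeating its last character, so c, d, and e all map to 'y'.
padded=$(printf 'abcde\n' | tr 'abcde' 'xy')
echo "$padded"

# With -t, SET1 is truncated to the length of SET2, so only
# a -> x and b -> y are translated; c, d, and e pass through.
truncated=$(printf 'abcde\n' | tr -t 'abcde' 'xy')
echo "$truncated"
```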

Each SET is a string of characters that can be specified using the following:

  • A character class: [:alnum:], [:digit:], [:alpha:], and so on
  • A character range: 'a-z', 'A-Z', and '0-9'
  • An escape sequence: \\, \b, \n, \r, \f, \v, and \t

Because tr reads only from standard input, we use the file redirection operators to read input from a file and write output to a file: < (less than, for input) and > (greater than, for output).
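As a rough illustration of the three ways of writing a SET, together with redirection, consider the following sketch. The sample string and the temporary files are hypothetical and exist only for this example:

```shell
# Hypothetical sample input; any text would do.
sample='hello world 2024'

# Character range: translate lowercase letters to uppercase.
upper=$(printf '%s\n' "$sample" | tr 'a-z' 'A-Z')
echo "$upper"

# Character class: the same translation written with POSIX classes.
upper_class=$(printf '%s\n' "$sample" | tr '[:lower:]' '[:upper:]')
echo "$upper_class"

# Escape sequence: turn every space into a newline.
one_per_line=$(printf '%s\n' "$sample" | tr ' ' '\n')
echo "$one_per_line"

# Redirection: read from one file and write to another.
tmp_in=$(mktemp)
tmp_out=$(mktemp)
printf '%s\n' "$sample" > "$tmp_in"
tr 'a-z' 'A-Z' < "$tmp_in" > "$tmp_out"
from_file=$(cat "$tmp_out")
echo "$from_file"
rm -f "$tmp_in" "$tmp_out"
```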

Deleting input characters

Sometimes we need to remove unwanted characters from the input text. For example, suppose our input text is in the tr.txt file:

$ cat tr.txt
This is a text file for demonstrating
tr command.
This input file contains digit 2 3 4 and 5
as well.
THIS IS CAPS LINE
this a lowercase line

Suppose we want to remove all the uppercase letters from this file. We can use the -d option with SET1 as 'A-Z':

$ tr -d 'A-Z' < tr.txt
his is a text file for demonstrating
tr command.
his input file contains digit 2 3 4 and 5
as well.
   
this a lowercase line

We see that the output doesn't contain any uppercase letters. We can also remove newlines and spaces from a file as follows:

$ tr -d ' \n' < tr.txt > tr_out1.txt

Here, we have redirected the output to tr_out1.txt:

$ cat tr_out1.txt
Thisisatextfilefordemonstratingtrcommand.Thisinputfilecontainsdigit234and5aswell.THISISCAPSLINEthisalowercaseline
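The same -d option works with character classes and escape sequences. The following sketch uses made-up input strings to show two common cases: stripping digits and removing carriage returns from text with DOS-style line endings:

```shell
# Delete every digit using a character class.
# The input string is a hypothetical sample.
no_digits=$(printf 'order-2024-id-17\n' | tr -d '[:digit:]')
echo "$no_digits"

# Delete carriage returns (\r), leaving plain newline line endings.
unix_text=$(printf 'line one\r\nline two\r\n' | tr -d '\r')
echo "$unix_text"
```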

Squeezing to a single occurrence

The -s option is useful when we don't want to delete a character from the input text entirely, but instead want to squeeze each run of consecutive occurrences of that character down to a single occurrence.

One use case is when there are multiple spaces between words and we want exactly one space between any two words or strings in the input text. Consider the tr1.txt file as an example:

$ cat tr1.txt
India            China              Canada
USA    Japan               Russia
Germany        France               Italy
Australia   Nepal

Looking at this file, it's clear that the text is not properly aligned: the words are separated by varying numbers of spaces. We can squeeze each run of spaces into a single space using tr with the -s option:

$ tr -s ' ' < tr1.txt 
India China Canada
USA Japan Russia
Germany France Italy
Australia Nepal
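Squeezing can also be combined with translation, in which case tr translates first and then squeezes the resulting characters. The sketch below reuses one row of tr1.txt as an inline string, so it does not depend on the file being present; the comma-separated form is just one possible target format:

```shell
# One row from tr1.txt, supplied inline for the example.
row='USA    Japan               Russia'

# Squeeze runs of spaces into a single space.
squeezed=$(printf '%s\n' "$row" | tr -s ' ')
echo "$squeezed"

# Translate spaces to commas and squeeze the repeated commas in one pass.
csv=$(printf '%s\n' "$row" | tr -s ' ' ',')
echo "$csv"

# Squeeze consecutive newlines to drop blank lines.
compact=$(printf 'a\n\n\nb\n' | tr -s '\n')
echo "$compact"
```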

Inverting a character set to be translated

The tr command also provides the -c and -C options to complement (invert) the character set to be translated. This is useful when we know which characters should not be translated.

For example, suppose we want to keep only alphanumeric characters, newlines, and spaces in the text. Everything else should be deleted from the input text. Here, it's easier to specify what not to delete than what to delete.

Consider the tr2.txt file, whose content is as follows:

$ cat tr2.txt
This is an input file.
It contains special character like ?, ! etc
&^var is an invalid shell variable.
_var1_ is a valid shell variable

To delete characters other than alphanumeric characters, newlines, and spaces, we can run the following command:

$ tr -cd '[:alnum:] \n' < tr2.txt
This is an input file
It contains special character like   etc
var is an invalid shell variable
var1 is a valid shell variable
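The complement option combines with -d and -s in the same way. As a small sketch with a made-up input string, the first command keeps only the digits, and the second turns every run of non-alphanumeric characters into a single newline so that each word lands on its own line:

```shell
# Hypothetical sample input.
text='Call 555-0199, ext. 42!'

# Keep only digits: delete everything that is NOT in [:digit:].
digits=$(printf '%s' "$text" | tr -cd '[:digit:]')
echo "$digits"

# Translate every character that is NOT alphanumeric into a newline,
# squeezing repeated newlines, so each word appears on its own line.
words=$(printf '%s\n' "$text" | tr -cs '[:alnum:]' '\n')
echo "$words"
```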