7.10. awk Built-In Functions

7.10.1. String Functions

The sub and gsub Functions

The sub function finds the leftmost, longest substring in the record that matches the regular expression and replaces that substring with the substitution string. If a target string is specified, the match and the replacement take place in the target string instead. If a target string is not specified, the entire record ($0) is used.

Format

sub(regular expression, substitution string)
sub(regular expression, substitution string, target string)

Example 7.62.
1 % awk ' {sub(/Mac/, "MacIntosh");print} ' filename
2 % awk ' {sub(/Mac/, "MacIntosh", $1); print}' filename
						

Explanation

  1. The first time the regular expression Mac is matched in the record ($0), it is replaced with the string "MacIntosh". The replacement is made only on the first occurrence of a match on the line. (See gsub for multiple occurrences.)

  2. The first time the regular expression Mac is matched in the first field of the record, it is replaced with the string "MacIntosh". The replacement is made only on the first occurrence of a match in the target string.

The gsub function substitutes a string for a regular expression globally, that is, for every occurrence where the regular expression is matched in the record ($0) or, if a target string is given, in the target string.

Format

gsub(regular expression, substitution string)
gsub(regular expression, substitution string, target string)

Example 7.63.
1 % awk '{ gsub(/CA/, "California"); print }' datafile
2 % awk '{ gsub(/[Tt]om/, "Thomas", $1 ); print  }' filename
						

Explanation

  1. Everywhere the regular expression CA is found in the record ($0), it will be replaced with the string "California".

  2. Everywhere the regular expression Tom or tom is found in the first field, it will be replaced with the string "Thomas".
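
The difference between sub and gsub is easiest to see when both are run on the same input. The following is a minimal sketch, assuming a POSIX shell and any modern awk; the input line and the shell variable names are made up for illustration.

```shell
# Hypothetical input line in which "Mac" occurs twice.
line='Mac and Mac'

# sub replaces only the first (leftmost) match in the record.
first=$(printf '%s\n' "$line" | awk '{ sub(/Mac/, "MacIntosh"); print }')

# gsub replaces every match and returns the number of substitutions made.
all=$(printf '%s\n' "$line" | awk '{ n = gsub(/Mac/, "MacIntosh"); print n ": " $0 }')

printf '%s\n%s\n' "$first" "$all"
```

The first line printed is MacIntosh and Mac; the second is 2: MacIntosh and MacIntosh.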

The index Function

The index function returns the first position where a substring is found in a string. Offset starts at position 1.

Format

index(string, substring)

Example 7.64.
% awk '{ print index("hollow", "low") }' filename
							4
						

Explanation

The number returned is the position where the substring low is found in hollow, with the offset starting at one.
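
When the substring does not occur in the string, index returns 0. A small sketch of both cases follows; it uses a BEGIN block so that no input file is needed, and the shell variable names are illustrative only.

```shell
# Substring present: the result is its 1-based starting position.
found=$(awk 'BEGIN { print index("hollow", "low") }')

# Substring absent: the result is 0.
missing=$(awk 'BEGIN { print index("hollow", "high") }')

echo "$found $missing"
```

This prints 4 0.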

The length Function

The length function returns the number of characters in a string. Without an argument, the length function returns the number of characters in the record.

Format

length ( string )
length 

Example 7.65.
% awk ' { print length("hello") }' filename
							5
						

Explanation

The length function returns the number of characters in the string hello.
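
The two forms of length can be compared on a single record. This is a sketch only; the input text is invented, and it assumes an awk that accepts length without parentheses, as the Format section shows.

```shell
# length with no argument counts the characters in the whole record ($0);
# length($1) counts only the characters in the first field.
out=$(echo 'hello world' | awk '{ print length, length($1) }')

echo "$out"
```

The record has 11 characters, including the space, and the first field has 5, so the output is 11 5.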

The substr Function

The substr function returns the substring of a string starting at a given position, where the first position is one. If the length of the substring is given, only that many characters are returned. If the length is omitted, or if it extends past the end of the string, everything from the starting position to the end of the string is returned.

Format

substr(string, starting position)
substr(string, starting position, length of substring)

Example 7.66.
   % awk ' { print substr("Santa Claus", 7, 6 )} ' filename
							Claus
						

Explanation

In the string "Santa Claus", print the substring starting at position 7 with a length of 6 characters.
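
The three cases (no length, an exact length, and a length that runs past the end of the string) can be sketched as follows. The shell variable names are illustrative only.

```shell
# No length: everything from position 7 to the end of the string.
rest=$(awk 'BEGIN { print substr("Santa Claus", 7) }')

# Exact length: the first five characters.
head=$(awk 'BEGIN { print substr("Santa Claus", 1, 5) }')

# Length past the end: the result simply stops at the end of the string.
long=$(awk 'BEGIN { print substr("Santa Claus", 7, 100) }')

echo "$rest $head $long"
```

This prints Claus Santa Claus.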

The match Function

The match function returns the index where the regular expression is found in the string, or zero if it is not found. The match function sets the built-in variable RSTART to the starting position of the matched substring within the string, and RLENGTH to the length of the matched substring. These variables can be used with the substr function to extract the matched text. (Works only with awk and gawk.)

Format

match(string, regular expression) 

Example 7.67.
% awk 'END{start=match("Good ole USA", /[A-Z]+$/); print start}' filename
							10
						

Explanation

The regular expression, /[A-Z]+$/, says search for consecutive uppercase letters at the end of the string. The substring "USA" is found starting at the tenth character of the string "Good ole USA". If the string cannot be matched, 0 is returned.

Example 7.68.
  1 % awk 'END{start=match("Good ole USA", /[A-Z]+$/);
           print RSTART, RLENGTH}' filename
							10 3
  2 % awk 'BEGIN{ line="Good ole USA"};
           END{ match( line, /[A-Z]+$/);
           print substr(line, RSTART,RLENGTH)}' filename
							USA
						

Explanation

  1. The RSTART variable is set by the match function to the starting position of the regular expression matched. The RLENGTH variable is set to the length of the substring.

  2. The substr function is used to extract a substring from the variable line, and uses the RSTART and RLENGTH values (set by the match function) as the beginning position and length of the substring.
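
It is also worth seeing what match leaves behind when nothing matches: RSTART is set to 0 and RLENGTH to -1. The sketch below shows a hit and a miss; the second test string is invented for the purpose, and the shell variable names are illustrative.

```shell
# A match: RSTART is the starting position, RLENGTH the length of the match.
hit=$(awk 'BEGIN { match("Good ole USA", /[A-Z]+$/); print RSTART, RLENGTH }')

# No match: the string ends in digits, so match returns 0,
# RSTART is 0, and RLENGTH is -1.
miss=$(awk 'BEGIN { r = match("Good ole 123", /[A-Z]+$/); print r, RSTART, RLENGTH }')

echo "$hit"
echo "$miss"
```

The first line of output is 10 3 and the second is 0 0 -1. Testing the return value (or RSTART) before calling substr avoids extracting from a failed match.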

The toupper and tolower Functions (gawk only)

The toupper function returns a string with all the lowercase characters translated to uppercase, and leaves nonalphabetic characters unchanged. Likewise, the tolower function translates all uppercase letters to lowercase. String literals must be quoted.

Format

toupper (string)
tolower (string)

Example 7.69.
% awk 'BEGIN{print toupper("linux"), tolower("BASH 2.0")}'
							LINUX bash 2.0
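
Both functions also accept fields and the results of other string functions, not just quoted literals. As a sketch, assuming an awk that provides toupper, the following capitalizes the first letter of one field and uppercases another; the input text is invented.

```shell
# toupper(substr($1, 1, 1)) uppercases the first character of field 1;
# substr($1, 2) appends the rest of the field unchanged.
out=$(echo 'linux shells' | awk '{ print toupper(substr($1, 1, 1)) substr($1, 2), toupper($2) }')

echo "$out"
```

This prints Linux SHELLS.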
						

The split Function

The split function splits a string into an array using whatever field separator is designated as the third parameter. If the third parameter is not provided, awk will use the current value of FS.

Format

split (string, array, field separator)
split (string, array)

Example 7.70.
% awk 'BEGIN{split("12/25/99",date,"/");print date[2]}' filename
							25
						

Explanation

The split function splits the string "12/25/99" into an array, called date, using the forward slash as the separator. The array subscript starts at 1. The second element of the date array is printed.
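
The split function also returns the number of elements it created, which is convenient as a loop bound. A minimal sketch, with illustrative variable names:

```shell
out=$(awk 'BEGIN {
    # n receives the number of elements stored in the date array.
    n = split("12/25/99", date, "/")
    for (i = 1; i <= n; i++)
        printf "%d=%s ", i, date[i]
    print "count=" n
}')

echo "$out"
```

This prints 1=12 2=25 3=99 count=3.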

The sprintf Function

The sprintf function returns an expression in a specified format. It allows you to apply the format specifications of the printf function.

Format

variable = sprintf("string with format specifiers", expr1, expr2, ..., exprn)

Example 7.71.
% awk '{line = sprintf ( "%-15s %6.2f ", $1 , $3 );
       print line}' filename
						

Explanation

The first and third fields are formatted according to the printf specifications (a left-justified, 15-space string and a right-justified, 6-character floating point number). The result is assigned to the user-defined variable line. See "The printf Function" on Page 202.
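
To make the padding visible, the same two format specifications can be wrapped in brackets. This is a sketch with made-up values standing in for $1 and $3.

```shell
# %-15s left-justifies the string in a 15-character field;
# %6.2f right-justifies the number in 6 characters with 2 decimal places.
out=$(awk 'BEGIN { line = sprintf("[%-15s][%6.2f]", "Tom", 3.14159); print line }')

echo "$out"
```

In the output, Tom is followed by twelve spaces inside the first pair of brackets, and 3.14 is preceded by two spaces inside the second.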

7.10.2. Time Functions

Gawk provides two functions for getting the time and formatting time stamps. They are the systime and strftime functions.

The systime function

The systime function returns the current time of day as the number of seconds since the Epoch (midnight UTC, January 1, 1970), not counting leap seconds.

Format

systime()

Example 7.72.
% awk 'BEGIN{now=systime(); print now}'
							939515282
						

Explanation

The return value of the systime function is assigned to a user-defined variable, now.

The value is the number of seconds since January 1, 1970, not counting leap seconds.

The strftime function

The strftime function formats a timestamp using the C library strftime function. The format specifications take the form %T, %D, and so on (see Table 7.3). The timestamp is in the same form as the return value from systime. If the timestamp is omitted, the current time of day is used as the default.

Table 7.3. Date and Time Format Specifications

Assume the current date and time are:

Date: Sunday, October 17, 1999
Time: 15:26:26 PDT

Date Format  Definition
%a           Abbreviated weekday name (Sun)
%A           Full weekday name (Sunday)
%b           Abbreviated month name (Oct)
%B           Full month name (October)
%c           Date and time for locale (Sun Oct 17 15:26:26 1999)
%d           Day of month in decimal (17)
%D           Date as 10/17/99 [1]
%e           Day of the month, padded with a space if only one digit (17) [1]
%H           Hour for a 24-hour clock in decimal (15)
%I           Hour for a 12-hour clock in decimal (03)
%j           Day of the year since January 1 in decimal (290)
%m           Month in decimal (10)
%M           Minute in decimal (26)
%p           AM/PM notation assuming a 12-hour clock (PM)
%S           Second as a decimal number (26)
%U           Week number of the year (with the first Sunday as the first day of week one) as a decimal number (42)
%w           Weekday (Sunday is 0) as a decimal number (0)
%W           Week number of the year (with the first Monday as the first day of week one) as a decimal number (41)
%x           Date representation for locale (10/17/99)
%X           Time representation for locale (15:26:26)
%y           Year as two digits in decimal (99)
%Y           Year with century (1999)
%Z           Time zone (PDT)
%%           A literal percent sign (%)

[1] %D and %e are available only on some versions of gawk.

Format

strftime([format specification][,timestamp])

Example 7.73.
% awk 'BEGIN{now=strftime("%D", systime()); print now}'
							10/09/99

% awk 'BEGIN{now=strftime("%T"); print now}'
							17:58:03

% awk 'BEGIN{now=strftime("%m/%d/%y"); print now}'
							10/09/99
						

Explanation

The strftime function formats the time and date according to the format specification provided as its first argument. (See Table 7.3.) If systime() is given as the second argument, or if the second argument is omitted, the current time is used. If a second argument is given, it must be in the same form as the return value from the systime function.
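
Passing an explicit timestamp makes the result repeatable. The sketch below formats timestamp 0, the Epoch itself, and assumes that awk on the system is gawk or another awk that provides systime and strftime. Setting TZ=UTC for the command is an addition made here so that the output does not depend on the local time zone.

```shell
# Format timestamp 0 (the Epoch) with specifiers from Table 7.3.
# TZ=UTC pins the time zone for this one command.
epoch=$(TZ=UTC awk 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S", 0) }')

# systime() returns the current time as a count of seconds since the Epoch.
now=$(awk 'BEGIN { print systime() }')

echo "$epoch"
echo "$now"
```

The first line printed is 1970-01-01 00:00:00; the second is a large integer that changes from second to second.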

7.10.3. Built-In Arithmetic Functions

Table 7.4 lists the built-in arithmetic functions where x and y are arbitrary expressions.

Table 7.4. Arithmetic Functions
Name        Value Returned
atan2(y,x)  Arctangent of y/x in the range -π to π
cos(x)      Cosine of x, with x in radians
exp(x)      Exponential function of x, e^x
int(x)      Integer part of x; truncated toward 0
log(x)      Natural (base e) logarithm of x
rand()      Random number r, where 0 <= r < 1
sin(x)      Sine of x, with x in radians
sqrt(x)     Square root of x
srand(x)    x is a new seed for rand() [a]

[a] From Aho, Kernighan, Weinberger. The AWK Programming Language. Addison-Wesley, 1988, p. 19.
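
A few of these functions can be exercised in one BEGIN block. The following is a sketch only; the argument values are chosen here for illustration, and printf is used to fix the number of decimal places shown.

```shell
# atan2(0, -1) is pi; exp(1) is e; int truncates toward zero,
# so int(-3.9) is -3 rather than -4.
out=$(awk 'BEGIN { printf "%.4f %.4f %.1f %d\n", atan2(0, -1), exp(1), sqrt(16), int(-3.9) }')

echo "$out"
```

This prints 3.1416 2.7183 4.0 -3.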

7.10.4. Integer Function

The int function truncates any digits to the right of the decimal point to create a whole number. There is no rounding off.

Example 7.74.
1 % awk 'END{print 31/3}' filename
						10.3333
2 % awk 'END{print int(31/3)}' filename
						10
					

Explanation

  1. In the END block, the result of the division is printed as a floating-point number.

  2. In the END block, the int function causes the result of the division to be truncated at the decimal point. A whole number is displayed.
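
Because int only truncates, rounding has to be done by hand. One common idiom, which is not taken from the text above and works for non-negative values only, is to add 0.5 before truncating. A sketch:

```shell
# 32/3 is 10.6667: int(x) drops the fraction, giving 10;
# int(x + 0.5) rounds to the nearest whole number, giving 11.
out=$(awk 'BEGIN { x = 32/3; print int(x), int(x + 0.5) }')

echo "$out"
```

This prints 10 11.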

7.10.5. Random Number Generator

The rand Function

The rand function generates a pseudorandom floating point number greater than or equal to zero and less than one.

Example 7.75.
% awk '{print rand()}'  filename
							0.513871
							0.175726
							0.308634

% awk '{print rand()}' filename
							0.513871
							0.175726
							0.308634
						

Explanation

Each time the program runs, the same set of numbers is printed. The srand function can be used to seed the rand function with a new starting value. Otherwise, rand produces the same sequence each time the program is run.

The srand Function

The srand function without an argument uses the time of day to generate the seed for the rand function. srand(x) uses x as the seed. Normally, x should differ from one run of the program to the next.

Example 7.76.
% awk 'BEGIN{srand()};{print rand()}' filename
							0.508744
							0.639485
							0.657277

% awk 'BEGIN{srand()};{print rand()}' filename
							0.133518
							0.324747
							0.691794
						

Explanation

The srand function sets a new seed for rand. The starting point is the time of day. Each time the program is run, a new sequence of numbers is printed.
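
A fixed seed has the opposite effect: srand(x) with the same x reproduces the same sequence, which is useful for repeatable tests. In the sketch below the seed value 42 is arbitrary, and the shell variable names are illustrative.

```shell
# Two separate runs with the same explicit seed.
a=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')
b=$(awk 'BEGIN { srand(42); print rand(), rand(), rand() }')

# The two sequences are identical because the seed is identical.
if [ "$a" = "$b" ]; then echo "same sequence"; else echo "different"; fi
```

The actual numbers depend on the awk implementation, so only the comparison is shown.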

Example 7.77.
% awk 'BEGIN{srand()};{print 1 + int(rand() * 25)}' filename
							6
							24
							14
						

Explanation

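The srand function sets a new seed for rand. The starting point is the time of day. The expression rand() * 25 yields a number greater than or equal to 0 and less than 25; the int function truncates it to a whole number from 0 to 24, and adding 1 gives a random integer from 1 to 25.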
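
The range of the expression 1 + int(rand() * 25) can be checked empirically by drawing many values and tracking the smallest and largest. This is a sketch; the sample size of 1000 is an arbitrary choice.

```shell
out=$(awk 'BEGIN {
    srand()
    min = 26; max = 0
    for (i = 0; i < 1000; i++) {
        n = 1 + int(rand() * 25)    # whole number from 1 to 25
        if (n < min) min = n
        if (n > max) max = n
    }
    print min, max
}')

echo "$out"
```

The smallest value seen is never below 1 and the largest is never above 25; with 1000 draws the output is almost always 1 25.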
