awk(1)                                                               awk(1)

NAME
     awk - pattern-directed scanning and processing language

SYNOPSIS

     awk [-Ffs] [-v var=value] [program | -f progfile ...] [file ...]

DESCRIPTION
     awk scans each input file for lines that match any of a set of
     patterns specified literally in program or in one or more files
     specified as -f progfile.  With each pattern there can be an
     associated action that is to be performed when a line in a file
     matches the pattern.  Each line is matched against the pattern portion
     of every pattern-action statement, and the associated action is
     performed for each matched pattern.  The file name - means the
     standard input.  Any file of the form var=value is treated as an
     assignment, not a filename.  An assignment is evaluated at the time it
     would have been opened if it were a filename, unless the -v option is
     used.
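     The following sketch shows when operand assignments take effect.  It
     assumes a POSIX sh and a POSIX-conforming awk on PATH (not necessarily
     HP-UX awk); the shell function names are hypothetical.

```sh
# Hypothetical helper: tag each line with x.  x=1 is applied just before
# file $1 is opened, and x=2 just before file $2 is opened.
tag_lines() {
  awk '{ print x, $0 }' x=1 "$1" x=2 "$2"
}

# A BEGIN action runs before any operand is processed, so x is still
# unset here and the output is "[]".
begin_sees_operand() {
  awk 'BEGIN { print "[" x "]" }' x=1
}

# With -v, the assignment happens before BEGIN, so the output is "[1]".
begin_sees_v() {
  awk -v x=1 'BEGIN { print "[" x "]" }'
}
```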

     An input line is made up of fields separated by white space, or by
     regular expression FS.  The fields are denoted $1, $2, ...; $0 refers
     to the entire line.
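     As a minimal illustration, assuming the default field splitting and a
     POSIX awk, the hypothetical helper below reports NF, $1, $NF and $0 for
     each line.  Leading blanks do not create an empty first field, while
     $0 keeps the original spacing.

```sh
# Hypothetical helper: field count, first field, last field, whole line.
field_summary() {
  awk '{ print NF ": first=" $1 " last=" $NF " line=" $0 }'
}
```

     For example, printf '  alpha beta   gamma\n' | field_summary prints
     "3: first=alpha last=gamma line=  alpha beta   gamma".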

   Options
     awk recognizes the following options and arguments:

          -F fs          Specify regular expression used to separate
                         fields.  The default is to recognize space and tab
                         characters, and to discard leading spaces and
                         tabs.  If the -F option is used, leading input
                         field separators are no longer discarded.

          -f progfile    Specify an awk program file.  Up to 100 program
                         files can be specified.  The pattern-action
                         statements in these files are executed in the same
                         order as the files were specified.

          -v var=value   Cause var=value assignment to occur before the
                         BEGIN action (if it exists) is executed.
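     A short sketch of -F and -v, assuming a POSIX sh and awk.  It contrasts
     default splitting (leading blanks discarded) with -F: (a leading
     separator yields an empty first field).  The helper names and the col
     variable are hypothetical.

```sh
# Default splitting: "  a b" has 2 fields.
count_default() { awk '{ print NF }'; }

# With -F:, ":a:b" has 3 fields; the first one is empty.
count_colon() { awk -F: '{ print NF }'; }

# Hypothetical helper: sum the column given as $1 of colon-separated input.
# -v sets col before BEGIN, so it is ready when the first line is read.
sum_column() { awk -F: -v col="$1" '{ s += $col } END { print s + 0 }'; }
```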

   Statements
     A pattern-action statement has the form:

          pattern { action }

     A missing { action } means print the line; a missing pattern always
     matches.  Pattern-action statements are separated by new-lines or
     semicolons.
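     The two defaults can be sketched as follows, assuming a POSIX sh and
     awk; the helper names and the re variable are hypothetical.  A pattern
     with no action behaves like grep, and an action with no pattern runs
     for every line.

```sh
# Pattern only: lines whose text matches the regular expression in $1
# are printed unchanged.
match_lines() { awk -v re="$1" '$0 ~ re'; }

# Action only: executed for every input line; here it counts them.
count_lines() { awk '{ n++ } END { print n + 0 }'; }
```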

     An action is a sequence of statements.  A statement can be one of the
     following:

          if(expression) statement [else statement]
          while(expression) statement
          for(expression;expression;expression) statement
          for(var in array) statement
          do statement while(expression)
          break
          continue
          {[statement ...]}
          expression                   # commonly var=expression
          print [expression-list] [> expression]
          printf format [, expression-list] [> expression]
          return [expression]
          next           # skip remaining patterns on this input line.
          delete array[expression]           # delete an array element.
          exit [expression]      # exit immediately; status is expression.
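     Several of these statements are combined in the sketch below, which
     assumes a POSIX sh and awk; the helper names and the STOP marker are
     hypothetical.  It uses next, a counting for loop, the ?: operator, and
     exit with a status.

```sh
# Skip comment lines with next, then print the fields of every other line
# in reverse order, separated by OFS and terminated by ORS.
reverse_fields() {
  awk '
    /^#/ { next }                        # ignore comment lines entirely
    {
      for (i = NF; i >= 1; i--)
        printf "%s%s", $i, (i == 1 ? ORS : OFS)
    }'
}

# Print lines until one equals "STOP", then exit with status 3.
stop_at_marker() {
  awk '$0 == "STOP" { exit 3 } { print }'
}
```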

     Statements are terminated by semicolons, newlines or right braces.  An
     empty expression-list stands for $0.  String constants are quoted
     (""), with the usual C escapes recognized within.  Expressions take on
     string or numeric values as appropriate, and are built using the
     operators +, -, *, /, %, ^ (exponentiation), and concatenation
     (indicated by a blank).  The operators ++, --, +=, -=, *=, /=, %=, ^=,
     **=, >, >=, <, <=, ==, !=, and ?: are also available in expressions.
     Variables can be scalars, array elements (denoted x[i]) or fields.
     Variables are initialized to the null string, which acts as zero in
     numeric contexts.  Array subscripts can be any string, not necessarily
     numeric (this allows for a form of associative memory).  Multiple
     subscripts such as [i,j,k] are permitted.  The constituents are
     concatenated, separated by the value of SUBSEP.
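     A sketch of associative and multi-subscript arrays, assuming a POSIX sh
     and awk plus sort(1); the helper names are hypothetical.  Because
     "for (k in a)" visits keys in an unspecified order, the output is
     sorted for repeatability.

```sh
# Count word occurrences; n[$i] starts uninitialized, which acts as 0.
count_words() {
  awk '{ for (i = 1; i <= NF; i++) n[$i]++ }
       END { for (w in n) print w, n[w] }' | sort
}

# Count ($1, $2) pairs with a two-subscript array, then split each
# combined key back into its parts on SUBSEP.
count_pairs() {
  awk '{ c[$1, $2]++ }
       END {
         for (k in c) {
           split(k, p, SUBSEP)
           print p[1], p[2], c[k]
         }
       }' | sort
}
```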

     The print statement prints its arguments on the standard output (or on
     a file if >file or >>file is present or on a pipe if |cmd is present),
     separated by the current output field separator, and terminated by the
     output record separator.  file and cmd can be literal names or
     parenthesized expressions.  Identical string values in different
     statements denote the same open file.  The printf statement formats
     its expression list according to the format (see printf(3)).
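     The sketch below, assuming a POSIX sh and awk, shows OFS, printf and
     >> redirection; the helper names and the out variable are hypothetical.
     The first helper relies on the standard POSIX rule that assigning to a
     field rebuilds $0 using OFS.

```sh
# Rebuild each record with OFS = "," to turn blank-separated input into
# comma-separated output.
to_csv() { awk 'BEGIN { OFS = "," } { $1 = $1; print }'; }

# printf adds no newline of its own, so "\n" is written explicitly.
# $1 is left-justified in 5 columns, $2 right-justified in 8 with 2 decimals.
fmt_row() { awk '{ printf "%-5s%8.2f\n", $1, $2 }'; }

# Append every line to the file named by $1; each print naming the same
# string value refers to the same open file.
append_to() { awk -v out="$1" '{ print >> out } END { close(out) }'; }
```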

   Built-In Functions
     The built-in function close(expr) closes the file or pipe expr opened
     by a print or printf statement or a call to getline with the same
     string-valued expr.  This function returns zero if successful and
     non-zero otherwise.
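     Two uses of close are sketched below, assuming a POSIX sh and awk plus
     sort(1); the helper names and the f variable are hypothetical.  The
     first closes an output pipe so that its output is complete before awk
     exits; the second closes a file so that getline can read it again from
     the start.

```sh
# Send upper-cased lines through a single "sort" pipe, then close it.
sorted_upper() {
  awk '{ print toupper($0) | "sort" }
       END { close("sort") }'
}

# Read the file named by $1 twice; without close(f), the second loop
# would hit end of file immediately.
count_twice() {
  awk -v f="$1" 'BEGIN {
    while ((getline l < f) > 0) n++
    close(f)
    while ((getline l < f) > 0) n++
    print n + 0
  }'
}
```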

     The customary functions exp, log, sqrt, sin, cos, atan2 are built in.
     Other built-in functions are:

        blength[([s])]    Length of its associated argument (in bytes)
                          taken as a string, or of $0 if no argument.

        length[([s])]     Length of its associated argument (in characters)
                          taken as a string, or of $0 if no argument.

        rand()            Returns a random number x such that 0 <= x < 1.

        srand([expr])     Sets the seed value for rand, and returns the
                          previous seed value.  If no argument is given,
                          the time of day is used as the seed value;
                          otherwise, expr is used.

        int(x)            Truncates x to an integer value.

        substr(s, m[, n]) Return the at most n-character substring of s
                          that begins at position m, numbering from 1.  If
                          n is omitted, the substring is limited by the
                          length of string s.

        index(s, t)       Return the position, in characters, numbering
                          from 1, in string s where string t first occurs,
                          or zero if it does not occur at all.

        match(s, ere)     Return the position, in characters, numbering
                          from 1, in string s where the extended regular
                          expression ere occurs, or 0 if it does not.  The
                          variables RSTART and RLENGTH are set to the
                          position and length of the matched string.

        split(s, a[, fs]) Splits the string s into array elements a[1],
                          a[2], ..., a[n], and returns n.  The separation
                          is done with the regular expression fs, or with
                          the field separator FS if fs is not given.

        sub(ere, repl [, in])
                          Substitutes repl for the first occurrence of the
                          extended regular expression ere in the string in.
                          If in is not given, $0 is used.

        gsub(ere, repl [, in])
                          Same as sub except that all occurrences of the
                          regular expression are replaced; sub and gsub
                          return the number of replacements.

        sprintf(fmt, expr, ...)
                          String resulting from formatting expr ...
                          according to the printf(3S) format fmt.

        system(cmd)       Executes cmd and returns its exit status.

        toupper(s)        Converts the argument string s to uppercase and
                          returns the result.

        tolower(s)        Converts the argument string s to lowercase and
                          returns the result.
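     A sketch exercising several of these functions on one string, assuming
     a POSIX sh and awk (blength is HP-UX specific and is not used).  The
     helper name is hypothetical, and the comments give the line each
     statement prints.

```sh
string_demo() {
  awk 'BEGIN {
    s = "hello, world"
    print length(s), index(s, "world"), substr(s, 1, 5)   # 12 8 hello
    if (match(s, /o, w/))
      print RSTART, RLENGTH                               # 5 4
    n = split("a:b:c", parts, ":")
    print n, parts[3]                                     # 3 c
    t = s
    c = gsub(/o/, "0", t)                                 # t is changed in place
    print c, t                                            # 2 hell0, w0rld
    print toupper(substr(s, 8)), int(3.7), sprintf("%05.1f", 3.14159)
    srand(42)
    r = rand()
    ok = (r >= 0 && r < 1)
    print ok                                              # 1
  }'
}
```

     The line printed by the toupper statement is "WORLD 3 003.1".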

     The built-in function getline sets $0 to the next input record from
     the current input file; getline < file sets $0 to the next record from
     file.  getline x sets variable x instead.  Finally, cmd | getline
     pipes the output of cmd into getline; each call of getline returns the
     next line of output from cmd.  In all cases, getline returns 1 for a
     successful input, 0 for end of file, and -1 for an error.
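     Two getline forms are sketched below, assuming a POSIX sh and awk; the
     helper names and the cmd and f variables are hypothetical.  Testing
     the result with "> 0" stops the loop on both end of file (0) and
     error (-1).

```sh
# Run the shell command in $1, count its output lines with
# cmd | getline var, and report the count and the last line read.
count_cmd_lines() {
  awk -v cmd="$1" 'BEGIN {
    while ((cmd | getline line) > 0)
      n++
    close(cmd)
    print n + 0, line
  }'
}

# Print the first line of the file named by $1 using getline var < file.
first_line() {
  awk -v f="$1" 'BEGIN { if ((getline l < f) > 0) print l; close(f) }'
}
```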

   Patterns
     Patterns are arbitrary Boolean combinations (with ! || &&) of regular
     expressions and relational expressions.  awk supports Extended Regular
     Expressions as described in regexp(5).  Isolated regular expressions
     in a pattern apply to the entire line.  Regular expressions can also
     occur in relational expressions, using the operators ~ and !~.  /re/
     is a constant regular expression; any string (constant or variable)
     can be used as a regular expression, except in the position of an
     isolated regular expression in a pattern.

     A pattern can consist of two patterns separated by a comma; in this
     case, the action is performed for all lines from an occurrence of the
     first pattern through the next occurrence of the second, inclusive.

     A relational expression is one of the following:

          expression matchop regular-expression
          expression relop expression
          expression in array-name
          (expr,expr,...) in array-name

     where a relop is any of the six relational operators in C, and a
     matchop is either ~ (matches) or !~ (does not match).  A conditional
     is an arithmetic expression, a relational expression, or a Boolean
     combination of the two.

     The special patterns BEGIN and END can be used to capture control
     before the first input line is read and after the last.  BEGIN and END
     do not combine with other patterns.
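     The pattern forms above are sketched below, assuming a POSIX sh and
     awk; the helper names and the start/stop markers are hypothetical.

```sh
# Range pattern: from a line "start" through the next line "stop".
between_markers() { awk '/^start$/, /^stop$/'; }

# Boolean combination of a match and a relational expression:
# print $1 when $2 is all digits and greater than 10.
big_numbers() { awk '$2 ~ /^[0-9]+$/ && $2 + 0 > 10 { print $1 }'; }

# "in" test: print each distinct line once, in first-seen order.
dedupe() { awk '!($0 in seen) { print; seen[$0] = 1 }'; }

# BEGIN and END each run once, before and after all input.
bracket() { awk 'BEGIN { print "first" } { print } END { print "last" }'; }
```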

   Special Characters
     The following special escape sequences are recognized by awk in both
     regular expressions and strings:

          Escape    Meaning
            \a      alert character
            \b      backspace character
            \f      form-feed character
            \n      new-line character
            \r      carriage-return character
            \t      tab character
            \v      vertical-tab character
            \nnn    1- to 3-digit octal value nnn
            \xhhh   1- to n-digit hexadecimal number
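     A brief sketch of escapes in a string and in a regular expression,
     assuming a POSIX sh and an awk using ASCII (so octal \101 is "A"); the
     helper names are hypothetical.

```sh
# \t is a tab, \101 is octal for "A" in ASCII, \n ends each line.
escape_demo() { awk 'BEGIN { printf "a\tb\n\101\n" }'; }

# The same escape inside a regular expression: match lines containing a tab.
has_tab() { awk '/\t/ { print "tab" }'; }
```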

   Variable Names
     Variable names with special meanings are:

          FS                Input field separator regular expression; a
                            space character by default; also settable by
                            option -Ffs.

          NF                The number of fields in the current record.

          NR                The ordinal number of the current record from
                            the start of input. Inside a BEGIN action the
                            value is zero. Inside an END action the value
                            is the number of the last record processed.

          FNR               The ordinal number of the current record in the
                            current file. Inside a BEGIN action the value
                            is zero. Inside an END action the value is the
                            number of the last record processed in the last
                            file processed.

          FILENAME          A pathname of the current input file.

          RS                The input record separator; a newline character
                            by default.

          OFS               The print statement output field separator; a
                            space character by default.

          ORS               The print statement output record separator; a
                            newline character by default.

          OFMT              Output format for numbers (default %.6g).  If
                            the value of OFMT is not a floating-point
                            format specification, the results are
                            unspecified.

          CONVFMT           Internal conversion format for numbers (default
                            %.6g).  If the value of CONVFMT is not a
                            floating-point format specification, the
                            results are unspecified.

          SUBSEP            The subscript separator string for multi-
                            dimensional arrays; the default value is "\034".

          ARGC              The number of elements in the ARGV array.

          ARGV              An array of command line arguments, excluding
                            options and the program argument, numbered from
                            zero to ARGC-1.

                            The arguments in ARGV can be modified or added
                            to; ARGC can be altered. As each input file
                            ends, awk will treat the next non-null element
                            of ARGV, up to the current value of ARGC-1,
                            inclusive, as the name of the next input file.
                            Thus, setting an element of ARGV to null means
                            that it will not be treated as an input file.
                            The name - indicates the standard input. If an
                            argument matches the format of an assignment
                            operand, this argument will be treated as an
                            assignment rather than a file argument.

          ENVIRON           Array of environment variables; subscripts are
                            names.  For example, if environment variable
                            V=thing, ENVIRON["V"] produces thing.

          RSTART            The starting position of the string matched by
                            the match function, numbering from 1. This is
                            always equivalent to the return value of the
                            match function.

          RLENGTH           The length of the string matched by the match
                            function.
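     The sketch below illustrates FNR, NR, FILENAME, ENVIRON and ARGV,
     assuming a POSIX sh and awk; the helper names and the variable V are
     hypothetical.

```sh
# FNR restarts at 1 in each file, so this prints FILENAME and the first
# line of every file, then the total record count NR.
first_lines() {
  awk 'FNR == 1 { print FILENAME ": " $0 }
       END      { print "records:", NR }' "$@"
}

# ENVIRON exposes the environment; V is set only for this command.
env_demo() { V=thing awk 'BEGIN { print ENVIRON["V"] }'; }

# Setting ARGV[1] to null in BEGIN makes awk skip that file operand.
skip_first_file() {
  awk 'BEGIN { ARGV[1] = "" } { print FILENAME ": " $0 }' "$@"
}
```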

     Functions can be defined (at the position of a pattern-action
     statement) as follows:

          function foo(a, b, c) { ...; return x }

     Parameters are passed by value if scalar, and by reference if array
     name.  Functions can be called recursively.  Parameters are local to
     the function; all other variables are global.
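     A sketch of user-defined functions, assuming a POSIX sh and awk; the
     function names are hypothetical.  It shows recursion and an array
     modified through a by-reference parameter.  Declaring i as an extra,
     never-passed parameter is a common convention for making it local.

```sh
function_demo() {
  awk '
    function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }
    function fill_squares(arr, n,    i) {   # arr is passed by reference
      for (i = 1; i <= n; i++)
        arr[i] = i * i
      return n
    }
    BEGIN {
      print fact(5)                           # 120
      split("", sq)                           # make sq an empty array
      fill_squares(sq, 3)
      print sq[1], sq[2], sq[3]               # 1 4 9
    }'
}
```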

     Note that if pattern-action statements are used in an HP-UX command
     line as an argument to the awk command, the pattern-action statement
     must be enclosed in single quotes to protect it from the shell.  For
     example, to print lines longer than 72 characters, the pattern-action
     statement as used in a script (-f progfile command form) is:

          length > 72

     The same pattern action statement used as an argument to the awk
     command is quoted in this manner:

          awk 'length > 72'

EXTERNAL INFLUENCES
   Environment Variables
     LANG           Provides a default value for the internationalization
                    variables that are unset or null.  If LANG is unset or
                    null, the default value of "C" (see lang(5)) is used.
                    If any of the internationalization variables contains
                    an invalid setting, awk will behave as if all
                    internationalization variables are set to "C".  See
                    environ(5).

     LC_ALL         If set to a non-empty string value, overrides the
                    values of all the other internationalization variables.


     LC_CTYPE       Determines the interpretation of text as single and/or
                    multi-byte characters, the classification of characters
                    as printable, and the characters matched by character
                    class expressions in regular expressions.

     LC_NUMERIC     Determines the radix character used when interpreting
                    numeric input, performing conversion between numeric
                    and string values and formatting numeric output.
                    Regardless of locale, the period character (the
                    decimal-point character of the POSIX locale) is the
                    decimal-point character recognized in processing awk
                    programs (including assignments in command-line
                    arguments).

     LC_COLLATE     Determines the locale for the behavior of ranges,
                    equivalence classes and multi-character collating
                    elements within regular expressions.

     LC_MESSAGES    Determines the locale that should be used to affect the
                    format and contents of diagnostic messages written to
                    standard error and informative messages written to
                    standard output.

     NLSPATH        Determines the location of message catalogues for the
                    processing of LC_MESSAGES.

     PATH           Determines the search path when looking for commands
                    executed by system(cmd), or input and output pipes.

     In addition, all environment variables will be visible via the awk
     variable ENVIRON.

   International Code Set Support
     Single- and multi-byte character code sets are supported except that
     variable names must contain only ASCII characters and regular
     expressions must contain only valid characters.

DIAGNOSTICS
     awk supports up to 199 fields ($1, $2, ..., $199) per record.

EXAMPLES
     Print lines longer than 72 characters:

          length > 72

     Print first two fields in opposite order:

          { print $2, $1 }

     Same, with input fields separated by comma and/or blanks and tabs:

          BEGIN { FS = ",[ \t]*|[ \t]+" }
                { print $2, $1 }

     Add up first column, print sum and average:

                  { s += $1 }
          END     { print "sum is", s, "average is", s/NR }

     Print all lines between start/stop pairs:

          /start/, /stop/

     Simulate echo command (see echo(1)):

          BEGIN   {                             # Simulate echo(1)
                  for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                  printf "\n"
                  exit }

AUTHOR
     awk was developed by AT&T, IBM, OSF, and HP.

SEE ALSO
     lex(1), sed(1).
     A. V. Aho, B. W. Kernighan, P. J. Weinberger: The AWK Programming
     Language, Addison-Wesley, 1988.

STANDARDS CONFORMANCE
     awk: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
