awk(1)								       awk(1)



NAME
  awk, nawk - Finds lines in files and makes specified changes to them

SYNOPSIS

  awk [-F ere] -f program_file [-f program_file]...  [-v var=val]...
      [argument]...

  awk [-F ere] [-v var=val]...  'program_text' [argument]...

  The awk and nawk commands are synonyms for the same program.

FLAGS

  -F ere
      Defines ere (extended regular expression) as the value of the input
      field separator before any input is read.  Using this option is
      comparable to assigning a value to the builtin variable FS.

  -f program_file
      Specifies the pathname (program_file) of a file containing an awk
      program.  If multiple instances of this option are specified, the
      concatenation of the files specified as program_file, in the order
      specified, is the awk program.  The awk program can alternatively be
      specified on the command line as a single argument (program_text).

  -v var=val
      The var=val argument is an assignment operand that specifies a value
      (val) for a variable (var).  The specified variable assignment occurs
      prior to executing the awk program, including the actions associated
      with BEGIN patterns (if any are in the program).  Multiple occurrences
      of the -v option can be specified on the awk command line.

DESCRIPTION

  The awk command executes programs written in the awk programming language,
  a powerful pattern matching utility for textual data manipulation.  An awk
  program is a sequence of patterns and corresponding actions that are
  carried out when an input record matching a pattern is read.  The awk
  command is a more powerful tool for text manipulation than either sed or
  grep.  The awk command differs from the commands oawk and gawk in that awk
  conforms to the X/Open Portability Guide, Issue 4 (XPG4).  The awk command
  is therefore capable of handling multibyte characters that occur in coded
  character sets defined for some native languages.

  The awk command:

    +  Performs convenient numeric processing

    +  Allows variables within actions

    +  Allows general selection of patterns

    +  Allows control flow in the actions

    +  Does not require any compiling of programs

  The pattern-matching and action statements of the awk language can be
  specified either on the command line or in a program file.  In either case,
  the awk command first reads all program statements.

  If -f program_file is not specified, the first operand to awk is
  program_text, delimited by single quotation (') characters.

  Execution of an awk program starts by executing the actions associated with
  all BEGIN patterns in the order they occur in the program.  Then, each
  input_file operand (or standard input if no input file is specified) is
  processed in turn by:

    +  Reading input data until a record separator is seen (a newline
       character by default)

    +  Splitting the current record into fields using the current value of FS

    +  Evaluating each pattern in the program in the order of occurrence

    +  Executing the action associated with each pattern that matches the
       current record

  The action for a matching pattern is executed before subsequent patterns
  are evaluated.  After all input has been read, the actions associated with
  all END patterns are executed in program order.
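  A minimal sketch of this order follows.  It assumes a POSIX-conforming
  awk and a POSIX shell, and the input lines are made up.  The second
  command shows that a later pattern is tested against a $0 that an earlier
  action for the same record has already changed.

```shell
# Sketch: BEGIN actions, then per-record actions, then END actions.
order=$(printf 'a\nb\n' | awk '
  BEGIN { print "begin" }
        { print "record " NR ": " $0 }
  END   { print "end after " NR " records" }')
printf '%s\n' "$order"

# The first action rewrites $0; the second pattern is then tested
# against the modified record, not the original one.
chained=$(printf 'x\nz\n' | awk '/x/ { $0 = "y" }  /y/ { print "second pattern saw: " $0 }')
printf '%s\n' "$chained"
```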

  The following two types of argument can be intermixed:

  input_file
      A pathname of a file that contains the input to be read, which is
      matched against the set of patterns in the program.  If no input_file
      operands are specified, or if the input_file argument is -, standard
      input is used.

  var=val
      The characters before the = represent the name of an awk variable.  If
      that name is an awk reserved word, the behavior is undefined.  The
      characters following the = are interpreted as if they appeared in the
      awk program preceded and followed by a double quotation (") character,
      in other words, as a string value.  If the value is considered a
      numeric string, the variable is also assigned a numeric value.  Each
      such variable assignment occurs just prior to the processing of the
      following input_file, if any.  Thus, an assignment before the first
      input_file argument is executed after the BEGIN actions (if any), while
      an assignment after the last input_file argument occurs before the END
      actions (if any).  If there are no input_file arguments, assignments
      are executed before processing the standard input.
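  The following sketch, under the same assumptions, places var=val operands
  between two input files.  The scratch directory and the file names f1 and
  f2 are hypothetical and exist only for the demonstration.

```shell
# Sketch: each var=val operand takes effect just before the next input_file.
# The scratch directory and files are hypothetical, created for this demo.
dir=${TMPDIR:-/tmp}/awk_assign_demo.$$
mkdir "$dir"
printf 'one\n' > "$dir/f1"
printf 'two\n' > "$dir/f2"

tagged=$(awk '{ print tag ": " $0 }' tag=first "$dir/f1" tag=second "$dir/f2")
rm -r "$dir"
printf '%s\n' "$tagged"
```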

  Refer to the EXAMPLES section for an example that demonstrates the results
  of specifying a variable assignment as a flag argument or command argument
  in different positions on the awk command line.

  The awk command reads input data in the order stated on the command line.
  If you specify input_file as a - (dash) or do not specify a filename, awk
  reads standard input.

  The awk command reads input data from any of the following sources:

    +  Any input_file operands or their equivalents, which can be affected by
       modifying the awk variables ARGV and ARGC

    +  Standard input, in the absence of any input_file operands

    +  Arguments to the getline function

  Input files must be text files.  When the builtin variable RS is set to a
  value other than a newline character, awk supports records terminated with
  the specified separator up to LINE_MAX bytes.

  Pattern-action statements on the command line are enclosed in single
  quotation (') characters to protect them from interpretation by the shell.
  Consecutive pattern-action statements within one set of quotation
  delimiters are separated by a ; (semicolon) or a newline character.

  When you expect awk to read standard input, and you want to assign a value
  to a variable on the command line, you must specify - after the variable
  assignment.

  By default, the awk command treats each input line as a record and splits
  each record into fields separated by spaces and tabs, or by a field
  separator you set with the FS variable.  (When FS is a single space, the
  default, any sequence of spaces and tabs is recognized as a single
  separator, and leading and trailing spaces and tabs are ignored.)  Fields
  are referenced as $1, $2, and so on.  The reference $0 specifies the
  entire record (by default, a line).
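  A short sketch of the default splitting rules, using a made-up input line
  with extra spaces and a tab:

```shell
# Sketch: with the default FS, runs of blanks separate fields and
# leading/trailing blanks are ignored. The input line is made up.
fields=$(printf '  alpha   beta\tgamma  \n' | awk '{ print NF; print $2; print "[" $1 "]" }')
printf '%s\n' "$fields"
```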

  Program Structure


  An awk program is composed of pairs of the form:

       pattern { action }

  Either the pattern or the action (including the enclosing brace characters)
  can be omitted.

  If a pattern lacks a corresponding action, awk writes each record that
  matches the pattern to standard output.  If an action lacks a
  corresponding pattern, awk applies the action to every record.
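  For example, with made-up input data, a pattern alone selects records and
  an action alone runs for every record:

```shell
# Sketch: pattern without action vs. action without pattern (made-up data).
input='apple 3
banana 12
cherry 7'
matched=$(printf '%s\n' "$input" | awk '$2 > 5')          # prints whole matching records
firsts=$(printf '%s\n' "$input" | awk '{ print $1 }')    # runs for every record
printf '%s\n' "$matched" "$firsts"
```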

  Actions


  An action is a sequence of statements that follow C language syntax.  Any
  single statement can be replaced by a statement list enclosed in braces.
  When a statement is a list of statements, they must be separated by
  newline characters or semicolons, and are executed sequentially in order
  of appearance.  Statements in the awk language include:

  break

  continue

  delete array [ expression ]

  exit [ expression ]

  for (expression;expression;expression) statement

  for (variable in array) statement

  if (expression) statement [ else statement ]

  next

  print [ expression_list ] [ >file | >>file | | command ]

  printf format [ , expression_list ] [ >file | >>file | | command ]

  while (expression) statement

  variable=expression

  Statements can end with a semicolon, a newline character, or the right
  brace enclosing the action:

  { [ statement ... ] }

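  Expressions can have string or numeric values and are built using the
  arithmetic operators +, -, *, /, %, and ^ (exponentiation); a space for
  string concatenation; the increment, decrement, and assignment operators
  ++, --, =, +=, -=, *=, /=, %=, and ^=; the conditional operator ?:; the
  relational operators >, >=, <, <=, ==, and !=; the logical operators ||,
  &&, and !; the field reference operator $; parentheses for grouping; the
  match operators ~ and !~; and the array membership operator in.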
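  The following sketch exercises several of these operators in a BEGIN
  action, so no input is read.  The values are arbitrary.

```shell
# Sketch: arithmetic, concatenation, conditional, and assignment operators.
exprs=$(awk 'BEGIN {
  x = 7; y = 2
  print x % y, x ^ y, x / y             # remainder, exponentiation, division
  s = "ab" "cd"                         # adjacent operands are concatenated
  print s, (x > y ? "bigger" : "smaller")
  x += 3; x++                           # compound assignment, then increment
  print x
}')
printf '%s\n' "$exprs"
```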

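  When output is built from fields (for example, print $1, $2), or when a
  field is assigned a value so that $0 is rebuilt using OFS, the white space
  of the original input is not preserved in the output.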
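  A sketch of the effect, with a made-up input line: printing $0 unchanged
  keeps its spacing, while assigning to a field rebuilds $0 with OFS.

```shell
# Sketch: $1 = $1 forces awk to rebuild $0 from the fields using OFS.
rebuilt=$(printf 'a   b    c\n' | awk '{
  print $0        # unchanged record keeps its original spacing
  $1 = $1         # rebuild $0 with the default OFS (a single space)
  print $0
  OFS = "-"; $1 = $1
  print $0
}')
printf '%s\n' "$rebuilt"
```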

  The file and command arguments in awk statements can be string constants
  enclosed in double quotation (") characters or expressions that evaluate to
  strings.  Identical string values in different statements refer to the
  same open file.
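  The following sketch, which writes to a hypothetical scratch file, shows
  that two print statements using the same string value write to one open
  file: only the first opening truncates it.

```shell
# Sketch: identical string values refer to the same open file.
# The scratch directory below is hypothetical and removed afterwards.
dir=${TMPDIR:-/tmp}/awk_redirect_demo.$$
mkdir "$dir"
awk -v f="$dir/out.txt" 'BEGIN {
  print "first" > f    # first use of this file name opens and truncates it
  print "second" > f   # same string value: written to the already-open file
  close(f)
}'
written=$(cat "$dir/out.txt")
rm -r "$dir"
printf '%s\n' "$written"
```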

  The print statement writes its arguments to standard output (or to a file
  if > file or >> file is present), separated by the current output field
  separator and terminated by the current output record separator.

  The printf statement formats its expression list according to the format
  of the printf subroutine and writes the result to standard output.  Unlike
  print, printf does not insert the output field separator between its
  arguments or append the output record separator; include any separators
  and a trailing newline (\n) in the format.  You can redirect the output
  into a file using the print ... > file or printf( ... ) > file statements.
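  A sketch contrasting print and printf; OFS is set to | here only to make
  the separator visible.

```shell
# Sketch: print uses OFS and ORS; printf uses only its format string.
formatted=$(awk 'BEGIN {
  OFS = "|"
  print "a", "b"                   # joined with OFS, followed by ORS
  printf "%s%s\n", "a", "b"        # no OFS; the newline comes from the format
  printf "%-5s|%5.2f\n", "pi", 3.14159
}')
printf '%s\n' "$formatted"
```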

  Variables


  Variables can be scalars, array elements (denoted x[i]), or fields.  With
  the exception of function parameters, variables are not explicitly
  declared.

  Variable names can consist of uppercase and lowercase alphabetic letters,
  the underscore character, the digits (0 to 9), and extended characters.
  Variable names cannot begin with a digit. Field variables are designated by
  $ (dollar sign), followed by a number or numerical expression.  The effect
  of the field number expression evaluating to anything other than a non-
  negative integer is unspecified.

  Uninitialized variables have the null string as their string value and 0
  (zero) as their numeric value.  Array subscripts can be any string; they do
  not have to be numeric.  This allows for a form of associative memory.
  Enclose string constants in expressions in double quotation (")
  characters.
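  A word-count sketch with made-up input shows string subscripts, the in
  operator, and the values of a variable that is never assigned (unset is a
  hypothetical name).

```shell
# Sketch: associative arrays keyed by strings (made-up input words).
counts=$(printf 'red\nblue\nred\ngreen\nred\n' | awk '
  { count[$1]++ }                     # a new element starts as 0 / null string
  END {
    print count["red"], count["blue"], ("green" in count), ("purple" in count)
    print "[" unset "]", unset + 0    # never-assigned variable: "" and 0
  }')
printf '%s\n' "$counts"
```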

  There are several variables with special meaning to awk.  They include:

  ARGC
      The number of elements in the ARGV array.

  ARGV
      An array of command line arguments, excluding options and the
      program_file arguments, numbered from zero to ARGC-1.

      The arguments in ARGV can be modified or added to; ARGC can be altered.
      As each input file ends, awk treats the next non-null element of ARGV,
      up to and including the current value of ARGC-1, as the name of the
      next input file.  Therefore, setting an element of ARGV to null means
      that it is not treated as an input file.  When the element is the
      character -, standard input is specified.  When the element matches the
      format for an assignment (variable=value), the element is treated as an
      assignment rather than as the name of an awk input file.

  CONVFMT
      The printf format for converting numbers to strings (except for output
      statements, where OFMT is used); %.6g by default.

  ENVIRON
      An array representing the value of the environment.  The indexes of
      the array are strings consisting of the names of the environment
      variables, and the value of each array element is a string consisting
      of the value of that variable.

  FILENAME
      The name of the current input file.  Inside a BEGIN action, the
      FILENAME value is undefined.  Inside an END action, the value is the
      name of the last input file processed.

  FNR The ordinal number of the current input line (record) in the current
      file.  Inside a BEGIN action, the value is zero.  Inside an END action,
      the value is the number of the last record processed in the last file
      processed.

  FS  Input field separator (default is a space). If it is a space, then any
      number of spaces and tabs can separate fields.

  NF  The number of fields in the current input line (record) with a limit of
      99.

  NR  The number of the current input line (record), counted across all
      input files.

  OFS The print statement output field separator (default is a space).

  ORS The print statement output record separator (default is a newline char-
      acter).

  OFMT
      The output format for converting numbers to strings in print
      statements (default is %.6g).

  RLENGTH
      The length of the string matched by the match function.

  RS  Input record separator (default is a newline character).

  RSTART
      The starting position of the string matched by the match function,
      numbering from 1.  This is always equivalent to the return value of the
      match function.

  SUBSEP
      The subscript separator string for multi-dimensional arrays.
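  The following sketch exercises several of the variables above.  It
  assumes a POSIX-conforming awk, in which SUBSEP defaults to \034; the
  scratch files and the environment variable GREETING are hypothetical.

```shell
# Sketch: NR/FNR, ARGV/ARGC, ENVIRON, OFMT/CONVFMT, and SUBSEP.
# Scratch files are hypothetical and created only for this demo.
dir=${TMPDIR:-/tmp}/awk_vars_demo.$$
mkdir "$dir"
printf 'a\nb\n' > "$dir/f1"
printf 'c\n' > "$dir/f2"

# NR counts records across all files; FNR restarts at 1 for each file.
counters=$(awk '{ print NR, FNR, $0 }  END { print NR, FNR }' "$dir/f1" "$dir/f2")

# Setting an ARGV element to the null string skips that input file.
argv_out=$(awk 'BEGIN { print "ARGC=" ARGC; ARGV[1] = "" }
                { print $0 }' "$dir/f1" "$dir/f2")
rm -r "$dir"

# ENVIRON exposes the environment; GREETING is set only for this command.
env_out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')

# print converts numbers with OFMT; other conversions use CONVFMT.
conv_out=$(awk 'BEGIN {
  x = 3.14159265
  OFMT = "%.2f";    print x
  CONVFMT = "%.3g"; s = x ""; print s
  print 17 ""                      # integer values convert as integers
}')

# grid[1, 2] is stored under the single key 1 SUBSEP 2.
subsep_out=$(awk 'BEGIN {
  grid[1, 2] = "x"
  for (k in grid) { split(k, part, SUBSEP); print part[1], part[2], grid[k] }
  found = ((1, 2) in grid)
  print found, (SUBSEP == "\034")
}')

printf '%s\n' "$counters" "$argv_out" "$env_out" "$conv_out" "$subsep_out"
```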

  Functions


  There are a variety of built-in functions that can be used in awk actions.

  Arithmetic Functions


  The arithmetic functions, except for int, are based on the ISO C standard.
  The behavior is undefined in cases where the ISO C standard specifies that
  an error be returned or that the behavior is undefined.

  atan2 (y,x)
      Return arctangent of y/x.

  cos (x)
      Return cosine of x, where x is in radians.

  sin (x)
      Return sine of x where x is in radians.

  exp (x)
      Return the exponential function of x (e raised to the power x).

  log (x)
      Return the natural logarithm of x.

  sqrt (x)
      Return the square root of x.

  int (x)
      Truncate its argument to an integer.  The value is truncated toward 0,
      so int(3.7) is 3 and int(-3.7) is -3.

  rand ()
      Return a random number n, such that 0 <= n < 1.

  srand([expr])
      Set the seed value for rand to expr or use the time of day if expr is
      omitted.  The previous seed value is returned.
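  A sketch of the arithmetic functions with fixed arguments; atan2(0, -1)
  is a common way to obtain pi.  The seed passed to srand is arbitrary, so
  only the range of rand() is checked.

```shell
# Sketch: built-in arithmetic functions evaluated in a BEGIN action.
math=$(awk 'BEGIN {
  printf "%.5f\n", atan2(0, -1)      # pi
  print int(3.7), int(-3.7)          # truncation toward zero
  print sqrt(16), exp(0), log(1)
  srand(1)                           # fixed seed for a repeatable sequence
  r = rand()
  print (r >= 0 && r < 1)            # 1 when r lies in [0, 1)
}')
printf '%s\n' "$math"
```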

  String Functions


  gsub(ere, repl[, in])
      Behave like sub (see below), except replace all occurrences of the reg-
      ular expression (like the ed utility global substitute) in $0 or in the
      in argument, when specified.

  index(s, t)
      Return the position, in characters, numbering from 1, in string s where
      string t first occurs, or zero if it does not occur at all.

  length[([s])]
      Return the length, in characters, of its argument taken as a string, or
      of the whole record, $0, if there is no argument.

  match(s, ere)
      Return the position, in characters, numbering from 1, in string s where
      the extended regular expression ere occurs, or zero if it does not
      occur at all.  RSTART is set to the starting position, zero if no match
      is found; RLENGTH is set to the length of the matched string, -1 if no
      match is found.

  split(s, a[, fs])
      Split the string s into array elements a[1], a[2], ..., a[n], using the
      extended regular expression fs, or the field separator FS if fs is not
      given, and return n.  Each array element has a string value when
      created.  If the string assigned to any array element, with any
      occurrence of the decimal point character from the current locale
      changed to a period character, would be considered a numeric string,
      the array element also has the numeric value of the numeric string.
      The effect of a null string as the value of fs is unspecified.

  sprintf(fmt, expr, expr, ...)
      Format the expressions according to the printf format given by fmt and
      return the resulting string.

  sub(ere, repl[, in])
      Substitute the string repl in place of the first instance of the
      extended regular expression ere in string in and return the number of
      substitutions (0 or 1).  An ampersand (&) appearing in the string repl
      is replaced by the string from in that matches the regular expression.
      For each occurrence of backslash (\) encountered when scanning the
      string repl from beginning to end, the next character is taken
      literally and loses its special meaning (for example, \& is interpreted
      as a literal ampersand character).  Except for & and \, it is
      unspecified what the special meaning of any such character is.  If in
      is specified and it is not an lvalue, the behavior is undefined.  If in
      is omitted, awk substitutes in the current record ($0).

  substr(s, m[,n])
      Return the at most n character substring of s that begins at position
      m, numbering from 1.  If n is missing, the length of the substring is
      limited by the length of the string s.

  tolower(s)
      Return a string based on the string s.  Each character in s that is an
      upper case letter specified to have a tolower mapping by the LC_CTYPE
      category of the current locale is replaced in the returned string by
      the lower case letter specified by the mapping.  Other characters in s
      are unchanged in the returned string.

  toupper(s)
      Return a string based on the string s.  Each character in s that is a
      lower case letter specified to have a toupper mapping by the LC_CTYPE
      category of the current locale is replaced in the returned string by
      the upper case letter specified by the mapping.  Other characters in s
      are unchanged in the returned string.
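  The following sketch calls most of the string functions above on made-up
  strings.  LC_ALL=C is set so that the case mappings of toupper and
  tolower are those of the C locale.

```shell
# Sketch: match, substr, split, gsub, sub, toupper, tolower, index, length.
str_out=$(LC_ALL=C awk 'BEGIN {
  s = "order #4521 shipped"
  p = match(s, /[0-9]+/)                  # also sets RSTART and RLENGTH
  print p, RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
  q = match(s, /xyz/)
  print q, RSTART, RLENGTH                # no match: RSTART 0, RLENGTH -1

  n = split("2024-01-15", d, "-")         # returns the number of elements
  print n, d[1], d[3], d[2] + 0           # "01" is a numeric string

  t = "cat hat bat"
  c = gsub(/at/, "[&]", t)                # & stands for the matched text
  print c, t
  u = "a.b.c"
  sub(/\./, "\\&", u)                     # "\\&" in a string literal gives \&
  print u

  w = "Hello, World"
  print toupper(w), tolower(w)
  print index(w, "World"), index(w, "xyz"), length(w)
}')
printf '%s\n' "$str_out"
```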

  Input/Output and General Functions


  close(expression)
     Close the file or pipe opened by a print or printf statement or a call
     to getline with the same string-valued expression.  If the close was
     successful, the function returns zero; otherwise, it returns non-zero.

  expression | getline [var]
     Read a record of input from a stream piped from the output of a command.
     The stream is created if no stream is currently open with the value of
     expression as its command name.  The stream created is equivalent to one
     created by a call to the popen function with the value of expression as
     the command argument and a value of r as the mode argument.  As long as
     the stream remains open, subsequent calls in which expression evaluates
     to the same string read subsequent records from the stream.  The stream
     remains open until the close function is called with an expression that
     evaluates to the same string value.  At that time, the stream is closed
     as if by a call to the pclose function.  If var is missing, $0 and NF
     are set; otherwise, var is set.

  getline
     Set $0 to the next input record from the current input file.  This form
     of getline sets the NF, NR, and FNR variables.

  getline var
     Set variable var to the next input record from the current input file.
     This form of getline sets the FNR and NR variables.

  getline [var] < expression
     Read the next record of input from a named file.  The expression is
     evaluated to produce a string that is used as a full pathname.  If the
     file of that name is not currently open, it is opened.  As long as the
     stream remains open, subsequent calls in which expression evaluates to
     the same string value read subsequent records from the file.  The file
     remains open until the close function is called with an expression that
     evaluates to the same string value.  If var is missing, $0 and NF are
     set; otherwise, var is set.

  system(expression)
     Execute the command given by expression in a manner equivalent to the
     system function and return the exit status of the command.

  All forms of getline return 1 for successful input, zero for end of file,
  and -1 for an error.

  The getline function sets $0 to the next input record from the current
  input file; getline < file sets $0 to the next record from file.  The
  function getline x sets variable x instead.  Finally, command | getline
  pipes the output of command into getline.  Each call of getline returns
  the next line of output from command.  In all cases, getline returns 1 for
  a successful input, 0 (zero) for End-of-File, and -1 for an error.

  Where strings are used as the name of a file or pipeline, the strings must
  be textually identical to refer to the same file or pipeline.  The phrase
  "same string value" means that strings that differ in any way, even only
  by space characters, represent different files.

  User-defined Functions


  The awk language also provides user-defined functions.  Such functions can
  be defined as:

  function name(args,...) { statements }

  A function can be referred to anywhere in an awk program; in particular,
  the function's use can precede the function definition.  The scope of a
  function is global.

  Function arguments can be either scalars or arrays; the behavior is
  undefined if an array name is passed as an argument that the function uses
  as a scalar, or if a scalar expression is passed as an argument that the
  function uses as an array.  Function arguments are passed by value if
  scalar and by reference if array name.  Argument names are local to the
  function; all other variable names are global.  The same name must not be
  used as both an argument name and as the name of a function or special awk
  variable.  The same name must not be used both as a variable name with
  global scope and as the name of a function.  The same name must not be
  used within the same scope both as a scalar variable and as an array.

  The number of parameters in the function definition need not match the
  number of parameters in the function call.  Excess formal parameters can be
  used as local variables.  If fewer arguments are supplied in a function
  call than are in the function definition, the extra parameters that are
  used in the function body as scalars are initialized with a string value of
  the null string and a numeric value of zero, and the extra parameters that
  are used in the function body as arrays are initialized as empty arrays.
  If more arguments are supplied in a function call than are in the function
  definition, the behavior is undefined.

  When invoking a user-defined function, no white space can be placed
  between the function name and the opening parenthesis.  Function calls can
  be nested, and functions can call themselves recursively.  Upon return from
  any nested or recursive function call, the values of all the calling
  function's parameters are unchanged, except for array parameters passed by
  reference.  The return statement can be used to return a value.
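  A sketch of user-defined functions; the names fact and fill are made up.
  The extra parameter i in fill is used as a local variable, so the global
  i is not changed.

```shell
# Sketch: recursion, array passed by reference, extra parameter as a local.
fn_out=$(awk '
function fact(n) {                 # recursive function
  return (n <= 1) ? 1 : n * fact(n - 1)
}
function fill(arr, count,    i) {  # i is an extra parameter used as a local
  for (i = 1; i <= count; i++)
    arr[i] = i * i                 # changes are visible to the caller
  return count
}
BEGIN {
  i = "global"
  split("", sq)                    # make sq an empty array before passing it
  fill(sq, 3)
  print fact(5), sq[3], i          # the global i is unchanged
}')
printf '%s\n' "$fn_out"
```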

  Patterns



  Patterns are arbitrary Boolean combinations of patterns and relational
  expressions (the !, ||, and && operators and parentheses for grouping).
  Regular expression constants must start and end with slashes (/).  You can
  use regular expressions as described for grep, including the following
  special characters:

  +   One or more occurrences of the pattern.

  ?   Zero or one occurrence of the pattern.

  |   Either of two regular expressions (alternation).

  ( ) Grouping of expressions.

  Isolated regular expressions in a pattern apply to the entire line.
  Regular expressions can occur in relational expressions.  Any string
  (constant or variable) can be used as a regular expression, except in the
  position of an isolated regular expression in a pattern.

  If two patterns are separated by a comma, the action is performed on all
  lines from an occurrence of the first pattern through the next occurrence
  of the second, inclusive.
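  For example, with made-up input, the range includes both boundary lines,
  and a range that is never closed continues to the end of input:

```shell
# Sketch: range pattern /start/,/stop/ with no action (prints the records).
ranges=$(printf 'a\nstart\nb\nstop\nc\nstart\nd\n' | awk '/start/,/stop/')
printf '%s\n' "$ranges"
```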

  There are two types of relational expressions that you can use.  The first
  type has the form:

  expression  match_operator  regular_expression

  where match_operator is either ~ (for contains) or !~ (for does not
  contain).

  The second type has the form:

  expression  relational_operator  expression

  where relational_operator is any of the six C relational operators: <, >,
  <=, >=, ==, and !=.  An expression can be an arithmetic expression, a rela-
  tional expression, or a Boolean combination of these.
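  A sketch combining both forms of relational expression, with made-up
  input:

```shell
# Sketch: match operators and numeric comparison combined with &&.
rel=$(printf 'alice 42\nbob 7\namy 19\n' | awk '
  $1 ~ /^a/ && $2 > 20 { print "match:", $1 }
  $1 !~ /^a/           { print "other:", $1 }')
printf '%s\n' "$rel"
```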

  Special Patterns


  You can use the BEGIN and END special patterns to capture control before
  the first and after the last input line is read, respectively.

  Each BEGIN pattern is matched once and its associated action executed
  before the first record of input is read and before any var=val assignment
  operands are processed (assignments made with -v are done before the BEGIN
  actions).  Each END pattern is matched once and its associated action
  executed after the last record of input has been read.  BEGIN and END
  patterns must have associated actions.

  BEGIN and END do not combine with other patterns.  Multiple BEGIN and END
  patterns are allowed.  The actions associated with the BEGIN patterns are
  executed in the order specified in the program, as are the END actions.
  An END pattern can precede a BEGIN pattern in a program.
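  A sketch showing that BEGIN and END actions run in program order even
  when an END pattern appears first:

```shell
# Sketch: multiple BEGIN and END patterns, with an END written first.
order2=$(printf 'x\n' | awk '
  END   { print "end 1" }
  BEGIN { print "begin 1" }
  BEGIN { print "begin 2" }
  END   { print "end 2" }')
printf '%s\n' "$order2"
```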

  You have two ways to designate an extended regular expression other than
  white space to separate fields.  You can use the -F ere flag on the command
  line, or you can assign a string containing the expression to the builtin
  variable FS.  Either action changes the field separator to ere.
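  Both ways of setting the field separator, sketched with made-up input
  lines:

```shell
# Sketch: an ERE field separator via -F, and a separator assigned to FS.
fs_flag=$(printf 'a,b;c\n' | awk -F '[,;]' '{ print NF, $3 }')
fs_var=$(printf 'a:b\n' | awk 'BEGIN { FS = ":" } { print $2 }')
printf '%s\n' "$fs_flag" "$fs_var"
```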

  There are no explicit conversions between numbers and strings.  To force an
  expression to be treated as a number, add 0 to it.  To force it to be
  treated as a string, append a null string ("").
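  A sketch of these conversions: two string constants compare as strings,
  so "10" sorts before "9", while adding 0 forces a numeric comparison.

```shell
# Sketch: forcing numeric or string treatment of values.
conv2=$(awk 'BEGIN {
  x = "10"; y = "9"
  as_string = (x < y)                # string comparison: "10" < "9"
  as_number = (x + 0 < y + 0)        # numeric comparison: 10 < 9 is false
  print as_string, as_number
  z = "1" "2"                        # concatenation yields the string "12"
  print z + 1, length(z)
}')
printf '%s\n' "$conv2"
```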

EXAMPLES

   1.  To display the file lines that are longer than 72 bytes, enter:


	    % awk  'length  >72'  chapter1

       This command selects each line of the file chapter1 that is longer
       than 72 bytes.  The command then writes these lines to standard output
       because no action is specified.

   2.  To display all lines between the words start and stop, enter:


	    % awk  '/start/,/stop/'  chapter1

   3.  To run an awk program (sum2.awk) that processes a file (chapter1),
       enter:


	    % awk  -f  sum2.awk	 chapter1

   4.  The following awk program computes the sum and average of the numbers
       in the second column of the input file:


		    {
			    sum += $2
		    }
	    END	    {
		    print "Sum: ", sum;
		    print "Average:", sum/NR;
		    }

       The first action adds the value of the second field of each line to
       the sum variable.  The awk command initializes sum, and all variables,
       to 0 (zero) before starting.  The keyword END before the second action
       causes awk to perform that action after all of the input file is read.
       The NR variable, which is used to calculate the average, is a special
       variable containing the number of records (lines) that were read.

   5.  To print the names of the users who have the C shell as the initial
       shell, enter:


	    % awk  -F: '$7 ~ /csh/ {print $1}' /etc/passwd

   6.  To print the first two fields in reversed order, enter:


	    % awk '{ print $2, $1 }'

   7.  The following awk program prints the first two fields of the input
       file in reversed order, with input fields separated by a comma, then
       adds up the first column and prints the sum and average:


	    BEGIN   { FS = "," }
		    { print $2, $1}
		    { s += $1 }
	    END	    { print "sum is", s, "average is", s/NR }

   8.  The following example shows how command line assignments synchronize
       with awk program statements.

       Consider the following set of awk statements that make up a program
       named test_program:


	    BEGIN { if (RS == ":")
		    print "Assignment in effect for BEGIN statements"
		  }
		  { if (RS == ":")
		    print "Assignment in effect for middle statements"
		  }
	    END	  { if (RS == ":")
		    print "Assignment in effect for END statements"
		  }

       Notice the different results that are produced by different ways of
       assigning a value to RS on the awk command line.  The file text_file
       contains the line "Hello, Hello".


	    % awk -f test_program -v RS=: text_file
	    Assignment in effect for BEGIN statements
	    Assignment in effect for middle statements
	    Assignment in effect for END statements


	    % awk -f test_program RS=: text_file
	    Assignment in effect for middle statements
	    Assignment in effect for END statements


	    % awk -f test_program text_file RS=:
	    Assignment in effect for END statements

RELATED INFORMATION

  Commands:  gawk(1), grep(1), oawk(1), sed(1).

  Functions:  printf(3).

  Programming Support Tools