sort(1)								      sort(1)



NAME
  sort - Sorts or merges files

SYNOPSIS

  sort [-m] [-o output_file] [-Abdfinru] [-k keydef] ...  [-t character]
  [-T directory] [-y][kilobytes] [-z record_size] ...  file ...

  sort -c [-u] [-Abdfinru] [-k keydef] ...  [-t character] [-T directory]
  [-y][kilobytes] [-z record_size] ...	file ...

  The following older syntax is now maintained for backward compatibility,
  but may be withdrawn in future issues:

  sort [-Abcdfimnru] [-o output_file] [-t character] [-T directory]
  [-y][kilobytes] [-z record_size] [+fskip][.cskip] [-fskip][.cskip]
  [-bdfinr] ...	 file ...

FLAGS

  The -d, -f, -i, -n, and -r flags override the default ordering rules.	 When
  ordering flags appear independent of any key field specifications, the
  requested field ordering rules are applied globally to all sort keys.	 When
  attached to a specific key (see -k), the specified ordering flags override
  all global ordering flags for that key.  In the obsolescent forms, if one
  or more of these flags follows a +fskip flag, it affects only the key field
  specified by that preceding flag.

  -A  Sorts on a byte-by-byte basis using each character's encoded value.  On
      some systems, extended characters will be considered negative values,
      and so sort before ASCII characters.  If you are sorting ASCII
      characters in a non-C/POSIX locale, sorting with this flag is much
      faster.

  -b  Ignores leading spaces and tabs when determining the starting and end-
      ing positions of a restricted sort key.  If the -b flag is specified
      before the first -k flag, the -b flag is applied to all -k flags on the
      command line; otherwise, the -b flag can be independently attached to
      each -k field_start or field_end argument.
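
      The effect of the b modifier can be sketched as follows, assuming a
      POSIX shell and the C locale (the two input lines are made up for
      illustration).  Without b, the blanks that precede the second field
      are part of the key, so the line with two blanks sorts first.

```shell
# Hypothetical two-line input; fields are separated by runs of blanks.
data=$(printf 'a  2\nb 1\n')

# Without b, the key for field 2 includes the blanks that precede it.
without_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2)

# With the b modifier, the key starts at the first nonblank character.
with_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2b)

printf '%s\n---\n%s\n' "$without_b" "$with_b"
```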

  -c  Checks that the input is sorted according to the ordering rules speci-
      fied in the flags and the collating sequence of the current locale.  No
      output is produced; only the exit code is affected.
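
      Because -c reports its result only through the exit code, a script
      would typically capture that code.  A minimal sketch, assuming a
      POSIX shell and the C locale; standard error is discarded because
      some implementations print a diagnostic there, and the variable names
      are illustrative:

```shell
# Sorted input: sort -c is expected to exit with status 0.
printf 'a\nb\n' | LC_ALL=C sort -c 2>/dev/null && sorted_status=0 || sorted_status=$?

# Unsorted input: sort -c is expected to exit with status 1.
printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null && unsorted_status=0 || unsorted_status=$?

echo "sorted: $sorted_status, unsorted: $unsorted_status"
```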

  -d  Specifies that only spaces and alphanumeric characters (according to
      the current setting of LC_CTYPE) are significant in comparisons.

  -f  Treats all lowercase characters as their uppercase equivalents (accord-
      ing to the current setting of LC_CTYPE) for the purposes of comparison.

  -i  Sorts only by printable characters (according to the current setting of
      LC_CTYPE).

  -k keydef
      Specifies one or more (up to 10) restricted sort key field definitions.
      This flag replaces the obsolescent +fskip.cskip and -fskip.cskip flags.
      A field comprises a maximal sequence of non-separating characters and,
      in the absence of the -t flag, any preceding field separator.

      The format of a key field definition is as follows:
	   field_start[type][,field_end[type]]
      where the field_start and field_end arguments define a key field that
      is restricted to a portion of the line, and type is a modifier speci-
      fied by b, d, f, i, n, or r.  The b modifier behaves like the -b flag,
      but applies only to the field_start or field_end argument to which it
      is attached.  The other modifiers behave like their corresponding
      flags, but apply only to the key field to which they are attached;
      these modifiers have this effect if specified with field_start,
      field_end or both.

      Modifiers attached to a field_start or field_end argument override any
      specifications made by the flags.	 A missing field_end argument means
      the last character of the line.

      The field_start portion of the keydef argument takes the following
      form:
	   field_number[.first_character]
      Fields and characters within fields are numbered starting with 1.	 The
      field_number and first_character pieces, interpreted as positive
      decimal integers, specify the character to be used as part of a sort
      key.  If first_character is not specified, the default is the first
      character of the field.

      The field_end portion of the keydef argument takes the following form:
	   field_number[.last_character]
      The field_number is the same as that described for field_start.  The
      last_character argument, interpreted as a nonnegative decimal integer,
      specifies the last character to be used as part of the sort key.	If
      last_character evaluates to 0 (zero) or is not specified, the default
      is the last character of the field specified by field_number.

      If -b is in effect, characters within a field are counted from the
      first nonspace character in the field.  (This applies separately to
      first_character and last_character.)

      If -k is not specified, the default sort key is the entire line.
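
      As an illustration of character positions within a key, the following
      sketch (assuming a POSIX shell and the C locale, with made-up input)
      restricts the key to the second character of the first field, and
      compares the result with the default whole-line key.

```shell
# Hypothetical input: the lines differ in their first and second characters.
data=$(printf 'xbz\nyaz\n')

# Default key: the entire line, so "xbz" sorts before "yaz".
whole_line=$(printf '%s\n' "$data" | LC_ALL=C sort)

# Key restricted to field 1, character 2 only: "a" sorts before "b".
second_char=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1.2,1.2)

printf '%s\n---\n%s\n' "$whole_line" "$second_char"
```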

      When there are multiple key fields, later keys are compared only after
      all earlier keys compare as equal.  Except when the -u flag is speci-
      fied, lines that otherwise compare as equal are ordered as though none
      of the flags -d, -f, -i, -n, or -k were present (but with -r still in
      effect, if it was specified) and with all bytes in the lines signifi-
      cant to the comparison.
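
      The last-resort comparison described above can be observed with a
      small sketch (assuming a POSIX shell and the C locale; the input is
      made up).  Two lines share the numeric key 1, so their relative order
      is decided by comparing the whole lines.

```shell
# Hypothetical input: "b 1" and "a 1" have equal keys in field 2.
data=$(printf 'b 1\na 1\nc 0\n')

# Numeric key on field 2 only; ties fall back to a whole-line comparison.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n)

printf '%s\n' "$by_number"
```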

      The algorithm for the -k flag can be summarized as follows:
	   /*
	    * -ka.b,c.d = if d==0 then +(a-1).(b-1) -c.d
	    *		   else +(a-1).(b-1) -(c-1).d
	    */
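
      The same rule can be written as a small shell function.  This is only
      a sketch of the arithmetic in the comment above; the function name
      k_to_old is hypothetical, and the function prints the obsolescent
      form without running sort.

```shell
# k_to_old A B C D
# Prints the obsolescent +fskip.cskip -fskip.cskip form of -k A.B,C.D,
# following the summary rule in the -k description.
k_to_old() {
    a=$1 b=$2 c=$3 d=$4
    if [ "$d" -eq 0 ]; then
        # d == 0: the end position is the end of field c.
        printf '+%d.%d -%d.%d\n' "$((a - 1))" "$((b - 1))" "$c" "$d"
    else
        # d != 0: skip c-1 fields, then d characters.
        printf '+%d.%d -%d.%d\n' "$((a - 1))" "$((b - 1))" "$((c - 1))" "$d"
    fi
}

k_to_old 2 1 2 0
k_to_old 1 3 2 4
```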

  -m  Merges only (assumes sorted input).
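
      A minimal sketch of a merge, assuming a POSIX shell, the C locale, and
      the widely available (but not POSIX) mktemp utility; the file names
      first and second are made up, and both files are already sorted.

```shell
# Create two presorted files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a\nc\ne\n' > "$tmpdir/first"
printf 'b\nd\n' > "$tmpdir/second"

# Merge without re-sorting; the inputs must already be in order.
merged=$(LC_ALL=C sort -m "$tmpdir/first" "$tmpdir/second")
printf '%s\n' "$merged"

rm -rf "$tmpdir"
```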

  -n  Sorts by arithmetic value any initial numeric string, which consists of
      optional spaces, an optional minus sign, and zero (0) or more digits
      with an optional radix character and thousands separators, as defined
      by the current locale.  An empty digit string is treated as zero;
      leading zeros and signs on zeros do not affect ordering.  Only one
      period (.) is recognized in a numeric string; any later period and all
      characters to its right are ignored.
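
      The difference between the default ordering and -n can be seen in a
      short sketch (assuming a POSIX shell and the C locale; the numbers
      are made up for illustration).

```shell
# Hypothetical input containing a negative number and a two-digit number.
data=$(printf '10\n9\n-3\n')

# Default ordering compares characters, so "10" sorts before "9".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort)

# With -n the lines are ordered by arithmetic value.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -n)

printf '%s\n---\n%s\n' "$lexical" "$numeric"
```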

  -o output_file
      Directs output to output_file instead of standard output.	 The
      output_file can be the same as one of the input files.
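
      A sketch of sorting a file in place, assuming a POSIX shell, the C
      locale, and the mktemp utility (the file contents are made up).

```shell
# Create a scratch file with unsorted contents.
workfile=$(mktemp)
printf 'pear\napple\nfig\n' > "$workfile"

# Sort the file onto itself; -o lets the output replace an input file.
LC_ALL=C sort -o "$workfile" "$workfile"

result=$(cat "$workfile")
printf '%s\n' "$result"
rm -f "$workfile"
```

      Redirecting with > to the input file is not equivalent, because the
      shell truncates the file before sort reads it.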

  -r  Reverses the order of the specified sort.

  -t character
      Sets the field separator character to character.	The character argu-
      ment is not considered to be part of a field (although it can be
      included in a sort key).	Each occurrence of character is significant
      (for example, two consecutive occurrences of character delimit an empty
      field).  To specify the tab character as the field separator, you must
      enclose it in ' ' (single quotes).

      The default field separator is one or more spaces.
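
      To illustrate that each occurrence of the separator is significant,
      the following sketch (assuming a POSIX shell and the C locale, with
      made-up input) sorts on the third colon-separated field.  In the line
      x::b the second field is empty, so b is still the third field.

```shell
# Hypothetical input; the first line has an empty second field.
data=$(printf 'x::b\ny:z:a\n')

# Sort on field 3 onward: "a" sorts before "b".
by_third=$(printf '%s\n' "$data" | LC_ALL=C sort -t : -k 3)

printf '%s\n' "$by_third"
```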

  -T directory
      Places all the temporary files that are created in directory.

  -u  Suppresses all but one in each set of equal lines (that is, lines
      whose sort keys match exactly).  Ignored characters, such as leading
      tabs and spaces, and characters outside of sort keys are not considered
      in this type of comparison.

      If used with the -c flag, -u checks that there are no lines with dupli-
      cate keys, in addition to checking that the input file is sorted.
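
      Both uses of -u can be sketched as follows, assuming a POSIX shell and
      the C locale (the input and variable names are made up).  Only the
      number of surviving lines is checked, because which of the equal
      lines is kept is not specified here.

```shell
# Hypothetical input: two lines share the key "a" in field 1.
data=$(printf 'a:1\nb:2\na:3\n')

# With -u, one line per distinct key survives.
unique_count=$(printf '%s\n' "$data" | LC_ALL=C sort -t : -k 1,1 -u | wc -l | tr -d ' ')

# -c alone accepts sorted input that contains equal keys ...
printf 'a:1\na:3\n' | LC_ALL=C sort -t : -k 1,1 -c 2>/dev/null && plain=0 || plain=$?

# ... while -c -u also rejects duplicate keys.
printf 'a:1\na:3\n' | LC_ALL=C sort -t : -k 1,1 -c -u 2>/dev/null && strict=0 || strict=$?

echo "lines kept: $unique_count, -c: $plain, -c -u: $strict"
```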

  -y [kilobytes]
      Starts the sort command using kilobytes of main storage and adds
      storage as needed.  (If kilobytes is less than the minimum storage size
      or greater than the maximum, the minimum or maximum is used instead.)
      If the -y flag is omitted, the sort command starts with the default
      storage size; -y 0 starts with minimum storage, and -y (with no value)
      starts with the maximum storage.	The amount of storage used by the
      sort command has a significant impact on performance.  Sorting a small
      file in a large amount of storage is wasteful.

  -z record_size
      Prevents abnormal termination if lines being sorted are longer than the
      default buffer size can handle.  When the -c or -m flags are specified,
      the sorting phase is omitted and a system default size buffer is used.
      If sorted lines are longer than this size, sort terminates abnormally.
      The -z option specifies that the longest line be recorded in the sort
      phase so that adequate buffers can be allocated in the merge phase.
      The record_size argument must be a value in bytes equal to or greater
      than the number of bytes in the longest line to be merged.

  +fskip.cskip
      Specifies the start position of a key field.  See the -k flag for a
      description of the current way to perform this operation.	 (Obsoles-
      cent)

      The fskip variable specifies the number of fields to skip from the
      beginning of the input line, and the cskip variable specifies the
      number of additional characters to skip to the right beyond that point.
      For both the starting point (+fskip.cskip) and the ending point
      (-fskip.cskip) of a sort key, fskip is measured from the beginning of
      the input line, and cskip is measured from the last field skipped.  If
      you omit .cskip, .0 (zero) is assumed.  If you omit fskip, 0 (zero) is
      assumed.	If you omit the ending field specifier (-fskip.cskip), the
      end of the line is the end of the sort key.

      You can supply more than one sort key by repeating +fskip.cskip and
      -fskip.cskip.  In cases where you specify more than one sort key, keys
      specified further to the right on the command line are compared only
      after all earlier keys are sorted.  For example, if the first key is to
      be sorted in numerical order and the second according to the collating
      sequence, all strings that start with the number 1 are sorted according
      to the collating order before the strings that start with the number 2.
      Lines that are identical in all keys are sorted with all characters
      significant.  You can also specify different flags for different sort
      keys in multiple sort keys.

  -fskip.cskip
      Specifies the end position of a key field.  See the -k flag for a
      description of the current way to perform this operation.	 (Obsoles-
      cent)

DESCRIPTION

  The sort command sorts lines in its input files and writes the result to
  standard output.

  The sort command performs one of the following functions:

   1.  Sorts lines of all the named files together and writes the result to
       the specified output.

   2.  Merges lines of all the named (presorted) files together and writes
       the result to the specified output.

   3.  Checks that a single input file is correctly presorted.

  Comparisons are based on one or more sort keys extracted from each line of
  input (or the entire line if no sort keys are specified), and are performed
  using the collating sequence of the current locale.

  The sort command treats all of its input files as one file when it performs
  the sort.  A - (dash) in place of a filename specifies standard input.  If
  you do not specify a filename, it sorts standard input.
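
  For instance, a named file and standard input can be sorted together.  The
  following is a minimal sketch, assuming a POSIX shell, the C locale, and
  the mktemp utility; the file contents are made up.

```shell
# Create a scratch input file.
tmpfile=$(mktemp)
printf 'a\nc\n' > "$tmpfile"

# The - operand stands for standard input, which supplies the line "b".
combined=$(printf 'b\n' | LC_ALL=C sort "$tmpfile" -)

printf '%s\n' "$combined"
rm -f "$tmpfile"
```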

  The sort command can handle a variety of collation rules typically used in
  Western European languages, including primary/secondary sorting, one-to-two
  character mapping, N-to-one character mapping, and ignore-character map-
  ping.	 To summarize briefly:

  Primary/Secondary Sorting

  In this system, a group of characters all sort to the same primary loca-
  tion.	 If there is a tie, a secondary sort is applied.  For example, in
  French, the plain and accented a's all sort to the same primary location.
  If two strings collate to the same primary location, the secondary sort
  goes into effect.  These words are in correct French order:

       abord
       âpre
       après
       âpreté
       azur


  One-to-Two Character Mappings

  This system requires that certain single characters be treated as if they
  were two characters.  For example, in German, the ß (scharfes-S) is collated
  as if it were ss.

  N-to-One Character Mappings

  Some languages treat a string of characters as if it were one single col-
  lating element.  For example, in Spanish, the ch and ll sequences are
  treated as their own elements within the alphabet.  (ch comes between c and
  d in the alphabet, and ll comes between l and m.)




  Ignore-Character Mappings

  In some cases, certain characters may be ignored in collation.  For exam-
  ple, if - were defined as an ignore-character, the strings re-locate and
  relocate would sort to the same place.

  The results that you get from sort depend on the collating sequence as
  defined by the current setting of the LC_COLLATE environment variable.  The
  configuration files for collation and character classification information
  are /usr/lib/nls/loc/src/locale.src.

  A field is one or more characters bounded by the beginning of a line and
  the current field separator, or one or more characters bounded by a field
  separator on either side.  The space character is the default field separa-
  tor.

  Lines longer than 1024 bytes are truncated by sort.  The maximum number of
  fields on a line is 10.

EXAMPLES

  The following examples apply to the C locale, unless it is specifically
  stated otherwise.

   1.  To perform a simple sort, enter:
	    sort  fruits


       This displays the contents of fruits sorted in ascending lexicographic
       order.  This means that the characters in each column are compared one
       by one, including spaces, digits, and special characters.

       For instance, if fruits contains the text:
	    banana
	    orange
	    Persimmon
	    apple
	    %%banana
	    apple
	    ORANGE


       then sort fruits displays:
	    %%banana
	    ORANGE
	    Persimmon
	    apple
	    apple
	    banana
	    orange


       This order follows from the fact that in the ASCII collating sequence,
       symbols (such as %) precede uppercase letters, and all uppercase
       letters precede the lowercase letters.  If you are using a different
       collating order, your results may be different.

   2.  To group lines that contain uppercase and special characters with
       similar lowercase lines, and remove duplicate lines, enter:
	    sort  -d  -f  -u  fruits


       The -u flag tells sort to remove duplicate lines, making each line of
       the file unique.	 This displays:
	    apple
	    %%banana
	    orange
	    Persimmon


       Note that not only was the duplicate apple removed, but banana and
       ORANGE were removed as well.  The -d flag told sort to ignore symbols,
       so %%banana and banana were considered to be duplicate lines and
       banana was removed.  The -f flag told sort not to differentiate
       between uppercase and lowercase, so ORANGE and orange were considered
       to be duplicate lines and ORANGE was removed.

       When the -u flag is used with input that contains nonidentical lines
       that are considered by sort (due to other flags) to be duplicates,
       there is no way to predict which lines sort will keep and which it
       will remove.

   3.  To sort as in Example 2, but remove duplicates unless capitalized or
       punctuated differently, enter:
	    sort -u -k 1df -k 1	 fruits


       Flags appearing between sort key specifiers apply only to the specif-
       ier preceding them.  There are two sorts specified in this command
       line.  The -k 1df argument specifies the first sort, of the same type
       done with -d -f in Example 2.  Then -k 1 performs another comparison
       to distinguish lines that are not actually identical.  This prevents
       -u, which applies to both sorts because it precedes the first sort key
       specifier, from removing lines that are not exactly identical to other
       lines.

       Given the fruits file shown in Example 1, the added -k 1 distinguishes
       %%banana from banana and ORANGE from orange.  However, the two
       instances of apple are exactly identical, so one of them is deleted.
	    apple
	    %%banana
	    banana
	    ORANGE
	    orange
	    Persimmon


   4.  To specify a new field separator, enter:
	    sort  -t : -k 2  vegetables


       This sorts vegetables, comparing the text that follows the first colon
       on each line.  The -t : option tells sort that colons separate fields.
       The -k 2 argument tells sort to ignore the first field and to compare
       from the start of the second field to the end of the line.  If veget-
       ables contains:
	    yams:104
	    turnips:8
	    potatoes:15
	    carrots:104
	    green beans:32
	    radishes:5
	    lettuce:15


       then sort -t : -k 2 vegetables displays:
	    carrots:104
	    yams:104
	    lettuce:15
	    potatoes:15
	    green beans:32
	    radishes:5
	    turnips:8


       Note that the numbers are not in ascending order.  This is because a
       lexicographic sort compares each character from left to right.  In
       other words, 3 comes before 5 so 32 comes before 5.

   5.  To sort on more than one field, enter:
	    sort  -t : -k 2n  -k 1r  vegetables


       This performs a numeric sort on the second field (-k 2n) and then,
       within that ordering, sorts the first field in reverse collating order
       (-k 1r).	 The output looks like this:
	    radishes:5
	    turnips:8
	    potatoes:15
	    lettuce:15
	    green beans:32
	    yams:104
	    carrots:104


       The lines are sorted in numeric order; when two lines have the same
       number, they appear in reverse collating order.

   6.  To replace the original file with the sorted text, enter:
	    sort  -o  vegetables  vegetables


       The -o vegetables flag stores the sorted output into the file veget-
       ables.

   7.  To collate using Spanish rules, set the LC_COLLATE (or LANG)
       environment variable to a Spanish locale, and then use sort in the
       regular way.  For example, enter:
	    sort sp.words


       Suppose an input file named sp.words contains the following Spanish
       words:
	    dama
	    loro
	    chapa
	    canto
	    mover
	    chocolate
	    curioso
	    llanura


       The sorted file looks like this:
	    canto
	    curioso
	    chapa
	    chocolate
	    dama
	    loro
	    llanura
	    mover


       If you sort the file in the default C locale, the output looks like
       this:
	    canto
	    chapa
	    chocolate
	    curioso
	    dama
	    llanura
	    loro
	    mover


FILES

  /usr/lib/nls/loc/src/locale.src
		    Configuration files.

EXIT VALUES

  The sort command returns the following exit values:

  0   All input files were output successfully, or -c was specified and the
      input file was correctly sorted.

  1   Under the -c flag, the file was not ordered as specified, or if the -c
      and -u flags were both specified, two input lines were found with equal
      keys.

  >1  An error occurred.

RELATED INFORMATION

  Commands:  comm(1), join(1), uniq(1).

  Functions:  setlocale(3).

  Files:  locale(4).