lex(1)								       lex(1)



NAME
  lex - Generates a C Language program that matches patterns for simple lexi-
  cal analysis of an input stream

SYNOPSIS

  lex [-cnrtv] [-V] [-Qy|-Qn] [file ...]

  The lex command reads file or standard input and generates a compilable C
  Language program, which it writes to a file named lex.yy.c.

FLAGS

  If the environment variable CMD_ENV is set to svr4, all flags listed in the
  synopsis are legal.  Otherwise, -n, -t, and -v are the only legal flags, and
  they can be specified in uppercase or lowercase.

  -c  Writes C code to the file lex.yy.c. This is the default.

  -n  Suppresses the statistics summary.  When you set your own table sizes
      for the finite state machine, lex automatically produces this summary
      if you do not select this flag.

  -r  Writes RATFOR code to the file lex.yy.r. Note: there is no RATFOR com-
      piler for DEC OSF/1.

  -t  Writes to standard output instead of to a file.

  -v  Provides a summary of the generated finite state machine statistics.

  -V  Outputs lex version number to standard error. Requires the environment
      variable CMD_ENV to be set to svr4.

  -Q[y|n]
      Determines whether the lex version number is written to the output
      file.  -Qy writes it; -Qn, the default, does not.  Requires the
      environment variable CMD_ENV to be set to svr4.

DESCRIPTION

  The lex command uses the rules and actions contained in file to generate a
  program, lex.yy.c, which can be compiled with the cc command.  That program
  can then receive input, break the input into the logical pieces defined by
  the rules in file, and run program fragments contained in the actions in
  file.

  The generated program is a C Language function called yylex().  The lex
  command stores yylex() in a file named lex.yy.c.  You can use yylex() alone
  to recognize simple, one-word input, or you can use it with other C Language
  programs to perform more complex input analysis.  For example, you can use
  lex to generate a program that tokenizes an input stream before sending it
  to a parser program generated by the yacc command.
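
  The division of labor between a tokenizer and its caller can be sketched
  in plain C.  This is only an illustration: next_token() is a hypothetical,
  hand-written stand-in for a generated yylex(), the token codes are invented
  for this example, and count_numbers() plays the role of a parser that asks
  for one token at a time until it receives 0 at end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; a real yacc grammar would supply its own. */
enum { TOK_END = 0, TOK_NUMBER = 257, TOK_WORD = 258 };

/* Position of the hand-written tokenizer in its input string. */
static const char *cursor;

/*
 * Hypothetical stand-in for yylex(): skip white space, then return the
 * code of the next token, or 0 (TOK_END) when the input is exhausted.
 */
int next_token(void)
{
    assert(cursor != NULL);
    while (isspace((unsigned char)*cursor))
        cursor++;
    if (*cursor == '\0')
        return TOK_END;
    if (isdigit((unsigned char)*cursor)) {
        while (isdigit((unsigned char)*cursor))
            cursor++;
        return TOK_NUMBER;
    }
    while (*cursor != '\0' && !isspace((unsigned char)*cursor))
        cursor++;
    return TOK_WORD;
}

/* Caller in the role of a parser: pull tokens until 0, count the numbers. */
int count_numbers(const char *input)
{
    int tok;
    int numbers = 0;

    cursor = input;
    while ((tok = next_token()) != TOK_END)
        if (tok == TOK_NUMBER)
            numbers++;
    return numbers;
}
```

  In this sketch, count_numbers("12 apples 7 pears") sees four tokens, two of
  which are numbers.  A parser generated by yacc calls yylex() in the same
  pull-one-token-at-a-time fashion.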

  The yylex() function analyzes the input stream using a program structure
  called a finite state machine.  The program is in exactly one state (or
  condition) at a time, and the number of possible states is finite.  The
  rules in file determine how the program moves from one state to another
  based on the input the program receives.
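
  To make the idea of a finite state machine concrete, the following
  hand-written C sketch recognizes an optionally signed integer.  It is not
  the code that lex generates (lex builds table-driven machines from the
  rules); the state names and the functions step() and is_integer() are
  assumptions made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* The machine is always in exactly one of these states. */
enum state { START, SIGN, DIGITS, REJECT };

/* Move from one state to the next based on a single input character. */
static enum state step(enum state s, char c)
{
    switch (s) {
    case START:
        if (c == '+' || c == '-')
            return SIGN;
        if (isdigit((unsigned char)c))
            return DIGITS;
        return REJECT;
    case SIGN:
    case DIGITS:
        return isdigit((unsigned char)c) ? DIGITS : REJECT;
    default:
        return REJECT;
    }
}

/* Return 1 if text is an optionally signed run of digits, otherwise 0. */
int is_integer(const char *text)
{
    enum state s = START;

    assert(text != NULL);
    for (; *text != '\0' && s != REJECT; text++)
        s = step(s, *text);
    return s == DIGITS;    /* DIGITS is the only accepting state */
}
```

  Here is_integer("-42") returns 1, while is_integer("4x2") returns 0
  because the character x drives the machine into the REJECT state.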

  The lex command reads its skeleton finite state machine from the file
  /usr/ccs/lib/ncform. Use the environment variable LEXER to specify another
  location for lex to read from.

  If you do not specify a file, lex reads standard input.  It treats multiple
  files as a single file.

  Input File Format

  The input file can contain three sections:  definitions, rules, and user
  subroutines.  Each section must be separated from the others by a line
  containing only the delimiter, %%.  The format is as follows:

       definitions
       %%
       rules
       %%
       user_subroutines

  The purpose and format of each are described in the following sections.

  Definitions

  If you want to use variables in rules, you must define them in this
  section.  The variables make up the left column, and their definitions make
  up the right column.  For example, to define D as a numerical digit, enter:

       D       [0-9]


  You can use a defined variable in the rules section by enclosing the
  variable name in braces, for example {D}.

  In the definitions section, you can also set table sizes for the resulting
  finite state machine.  The default sizes are large enough for small
  programs.  You may want to set larger sizes for more complex programs.

  %p  number
	  Number of positions is number (default 5000)

  %n  number
	  Number of states is number (default 2500)

  %e  number
	  Number of parse tree nodes is number (default 2000)

  %a  number
	  Number of transitions is number (default 5000)

  %k  number
	  Number of packed character classes is number (default 1000)

  %o  number
	  Number of output slots is number (default 5000)

  If extended characters appear in regular expression strings, you may need
  to increase the output array size with the %o parameter (possibly to a
  value in the range 10,000 to 20,000).  The larger size accommodates the much
  larger number of extended characters relative to the number of ASCII
  characters.

  Rules

  Once you have defined your terms, you can write the rules section.  In this
  section, the left column contains the pattern to be recognized in an input
  file to yylex().  The right column contains the C program fragment executed
  when that pattern is recognized.  This section is required, and it must be
  preceded by the %% delimiter, whether or not you have a definitions
  section.  The lex command does not recognize rules without this delimiter.

  Patterns can include extended characters with one exception: these charac-
  ters may not appear in range specifications within character class expres-
  sions surrounded by brackets.

  The columns are separated by a tab.  For example, to search files for the
  word LEAD and replace it with GOLD, perform the following steps:

  Create a file called transmute.l containing the lines:

       %%
       (LEAD)  printf("GOLD");


  Then issue the following commands to the shell:

       lex transmute.l
       cc -o transmute lex.yy.c -ll


  You can test the resulting program with the command:

       transmute
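
  The behavior of the transmute program can be approximated by the following
  hand-written C sketch.  It is not the code in lex.yy.c; the function
  transmute() and its buffer-based interface are assumptions made for this
  illustration.  Text that matches LEAD is replaced with GOLD, and all other
  text is copied unchanged, which mirrors the default lex action of echoing
  unmatched input.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy in to out, replacing each occurrence of "LEAD" with "GOLD".
 * Copying stops early if out (of size outsize) would overflow.
 * Returns the number of replacements made.
 */
int transmute(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    int count = 0;

    assert(in != NULL && out != NULL && outsize > 0);
    while (*in != '\0') {
        if (strncmp(in, "LEAD", 4) == 0) {
            if (n + 4 >= outsize)
                break;              /* no room for GOLD plus terminator */
            memcpy(out + n, "GOLD", 4);
            n += 4;
            in += 4;
            count++;
        } else {
            if (n + 1 >= outsize)
                break;              /* no room for char plus terminator */
            out[n++] = *in++;       /* unmatched input is copied as is */
        }
    }
    out[n] = '\0';
    return count;
}
```

  Because the pattern is matched wherever it occurs, the input
  "LEAD pipe, MISLEADING" becomes "GOLD pipe, MISGOLDING".  Lowercase "lead"
  is left alone, since the pattern is case sensitive.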