yacc(1)								      yacc(1)

  yacc - Generates an LR(1) parsing program from input consisting of a
  context-free grammar specification


  yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix] [-P pathname]
       grammar

  The yacc command converts a context-free grammar specification into a set
  of tables for a simple automaton that executes an LR(1) parsing algorithm.


  -b prefix
      Uses prefix instead of y as the prefix for all output filenames
      (prefix.tab.c, prefix.tab.h, and prefix.output).

  -d  Produces the y.tab.h file, which contains the #define statements that
      associate the yacc-assigned token codes with your token names.  This
      allows source files other than y.tab.c to access the token codes by
      including this header file.

  -l  Includes no #line constructs in y.tab.c.	Use this only after the gram-
      mar and associated actions are fully debugged.

  -N number
      Provides yacc with extra storage for building its LALR tables, which
      may be necessary when compiling very large grammars.  The number
      should be larger than 40,000 when you use this flag.

  -p symbol_prefix
      Allows multiple yacc parsers to be linked together.  Use symbol_prefix
      instead of yy to prefix global symbols.

  -P pathname
      Specifies an alternative parser (instead of /usr/ccs/lib/yaccpar).  The
      pathname specifies the filename of the skeleton to be used in place of
      the default.

  -s  Breaks the yyparse() function into several smaller functions.  Because
      its size is somewhat proportional to that of the grammar, it is
      possible for yyparse() to become too large to compile, optimize, or
      execute efficiently.

  -t  Compiles run-time debugging code.	 By default, this code is not
      included when y.tab.c is compiled.  If YYDEBUG has a nonzero value, the
      C compiler (cc) includes the debugging code, whether or not the -t flag
      was used.  Without compiling this code, yyparse() will run more
      quickly.

  -v  Produces the y.output file, which contains a readable description of
      the parsing tables and a report on conflicts generated by grammar
      ambiguities.


  The yacc grammar can be ambiguous; specified precedence rules are used to
  break ambiguities.

  You must compile the y.tab.c output file with a C language compiler to pro-
  duce the yyparse() function.	This function must be loaded with a yylex
  lexical analyzer function, as well as main() and yyerror(), an error-
  handling routine (you must provide these routines).  The lex command is
  useful for creating lexical analyzers usable by yacc.

  The yacc program reads its skeleton parser from the file
  /usr/ccs/lib/yaccpar.	 Use the environment variable PARSER to specify
  another location for yacc to read from.

  Syntax for yacc Input

  This section contains a formal description of the yacc input file (or gram-
  mar file), which is normally named with a .y suffix.	The section provides
  a listing of the special values, macros, and functions recognized by yacc.

  The general format of the yacc input file is:

       [ definitions ]
       %%
       rules
       [ %%
       [ user functions ] ]


  definitions
	    Is the section where you define the variables to be used later in
	    the grammar, such as in the rules section.	It is also where
	    files are included (#include) and processing conditions are
	    defined.  This section is optional.

  rules	    Is the section that contains grammar rules for the parser.	A
	    yacc input file must have a rules section.

  user functions
	    Is the section that contains user-supplied functions that can be
	    used by the actions in the rules section.  This section is
	    optional.

  The NULL character must not be used in grammar rules or literals.  Each
  line in the definitions can be:


  %{
  %}	    When placed on lines by themselves, these enclose C code to be
	    passed into the global definitions of the output file.  Such
	    lines commonly include preprocessor directives and declarations
	    of external variables and functions.

  %token [<type>] name [number] [name [number]]...
	    Lists tokens or terminal symbols to be used in the rest of the
	    input file.  This line is needed for tokens that do not appear in
	    other % definitions. If <type> is present, the C type for all
	    tokens on this line is declared to be the type referenced by
	    type. If a positive integer number follows a token, that value is
	    assigned to the token.

  %left [<type>] name [number] [name [number]]...
	    Indicates that each token is an operator, that all tokens in this
	    definition have equal precedence, and that a succession of the
	    operators listed in this definition is evaluated left to right.

  %right [<type>] name [number] [name [number]]...
	    Indicates that each token is an operator, that all tokens in this
	    definition have equal precedence, and that a succession of the
	    operators listed in this definition is evaluated right to left.

  %nonassoc [<type>] name [number] [name [number]]...
	    Indicates that each token is an operator, and that the operators
	    listed in this definition cannot appear in succession; that is,
	    the tokens cannot be used associatively.

  %start symbol
	    Indicates the highest-level production rule to be reduced; in
	    other words, the rule where the parser can consider its work done
	    and terminate.  If this definition is not included, the parser
	    uses the first production rule.  The symbol must be non-terminal
	    (not a token).

  %type <type> symbol [ symbol ... ]
	    Defines each symbol as data type type, to resolve ambiguities. If
	    this construct is present, yacc performs type checking;
	    otherwise, it assumes all symbols to be of type integer.

  %union union-def
	    Defines the yylval global variable as a union, where union-def is
	    a standard C definition in the format:
		 { type member ; [ type member ; ... ] }

	    At least one member should be an int.  Any valid C data type can
	    be defined, including structures.  When you run yacc with the -d
	    option, the definition of yylval is placed in the y.tab.h file
	    and can be referred to in a lex input file.

  Every token (terminal symbol) must be listed in one of the preceding %
  definitions.  Multiple tokens can be separated by white space or commas.
  All the tokens in %left, %right, and %nonassoc definitions are assigned a
  precedence, with tokens in later definitions having precedence over those
  in earlier definitions.
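
  The effect of these definitions can be illustrated by hand.  The following
  C sketch is not the table-driven code that yacc generates; it is a small
  precedence-climbing evaluator, written under the assumption that the
  grammar declared %left '+' '-' followed by %left '*' '/'.  The names
  prec_of, parse_primary, parse_expr, and evaluate are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical precedence levels mirroring the definitions
 *     %left '+' '-'
 *     %left '*' '/'
 * The later definition receives the higher level. */
static int prec_of(int op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* not a binary operator */
    }
}

static const char *cursor;          /* next unread input character */

/* Read one unsigned decimal operand. */
static int parse_primary(void)
{
    int value = 0;

    assert(isdigit((unsigned char)*cursor));
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

/* Parse a run of operators whose level is at least min_prec (>= 1). */
static int parse_expr(int min_prec)
{
    int left = parse_primary();

    while (prec_of(*cursor) >= min_prec) {
        int op = *cursor++;
        /* Left associativity: the right operand may contain only
         * operators of strictly higher precedence. */
        int right = parse_expr(prec_of(op) + 1);

        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': left /= right; break;   /* no divide-by-zero check */
        }
    }
    return left;
}

/* Evaluate an expression containing no white space or parentheses. */
int evaluate(const char *text)
{
    cursor = text;
    return parse_expr(1);
}
```

  With these levels, evaluate("2-3-4") groups as (2-3)-4 and
  evaluate("2+3*4") groups as 2+(3*4), which is the grouping the %left
  definitions request from the generated parser.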

  In addition to symbols, a token can be a literal character enclosed in single
  quotes.  (Multibyte characters are recognized by the lexical analyzer and
  returned as tokens.) The following special characters can be used, just as
  in C programs:

  \a Alert

  \n Newline

  \t Tab

  \v Vertical tab

  \r Carriage Return

  \b Backspace

  \f Form Feed

  \\ Backslash

  \' Single Quote

  \? Question mark

  \nnn One to three octal digits specifying the integer value of the
     character

  The rules section consists of a series of production rules that the parser
  tries to reduce.  The format of each production rule is:

symbol : symbol-sequence [ action ] [ | symbol-sequence [ action ] ... ]  ;

  where symbol-sequence consists of zero or more symbols separated by white
  space.  The first symbol must be the first character of the line, but new-
  lines and other white space can appear anywhere else in the rule.  All ter-
  minal symbols must be declared in %token definitions.

  Each symbol-sequence represents an alternative way of reducing the rule.  A
  symbol can appear recursively in its own rule.  Always use left-recursion
  (where the recursive symbol appears before the terminating case in
  symbol-sequence).

  The specific sequence:

       %prec token

  indicates that the current sequence of symbols is to be preferred over
  others, at the level of precedence assigned to token in the definitions
  section.

  The specially defined token error matches any unrecognized sequence of
  input.  This token causes the parser to invoke the yyerror function.	By
  default, the parser tries to synchronize with the input and continue pro-
  cessing it by reading and discarding all input up to the symbol following
  error.  (You can override this behavior through the yyerrok action.)	If no
  error token appears in the yacc input file, the parser exits with an error
  message upon encountering unrecognized input.

  The parser always executes action after encountering the symbol that pre-
  cedes it.  Thus, an action can appear in the middle of a symbol-sequence,
  after each symbol-sequence, or after multiple instances of symbol-sequence.
  In the last case, action is executed when the parser matches any of the
  sequences.

  The action consists of standard C code within braces and can also take the
  following values, variables, and keywords.

  yylval    If the token returned by the yylex function is associated with a
	    significant value, yylex should place the value in this global
	    variable.  By default, yylval is of type int.  The definitions
	    section can include a %union definition to associate with other
	    data types, including structures.  If you run yacc with the -d
	    option, the full yylval definition is passed into the y.tab.h
	    file for access by lex.

  yyerrok   Causes the parser to start parsing tokens immediately after an
	    erroneous sequence, instead of performing the default action of
	    reading and discarding tokens up to a synchronization token.  The
	    yyerrok action should appear immediately after the error token.

  $[<type>]n
	    Refers to symbol n, a token index in the production, counting
	    from the beginning of the production rule, where the first symbol
	    after the colon is $1.  The type variable is the name of one of
	    the union members listed in the %union directive in the
	    declaration section.  The <type> syntax (non-standard) allows the
	    value to be cast to a specific data type.  Note that you will
	    rarely need to use the type syntax.

  $[<type>]$
	    Refers to the value returned by the matched symbol-sequence and
	    used for the matched symbol when reducing other rules.  The
	    symbol-sequence generally assigns a value to $$.  The type
	    variable is the name of one of the union members listed in the
	    %union directive in the declaration section.  The <type> syntax
	    (non-standard) allows the value to be cast to a specific data
	    type.  Note that you will rarely need to use the type syntax.

  The user functions section contains user-supplied programs.  If you supply
  a lexical analyzer (yylex) to the parser, it must be contained in the user
  functions section.

  The following functions, which are contained in the user functions section,
  are invoked within the yyparse function generated by yacc.

  yylex()   The lexical analyzer called by yyparse to recognize each token of
	    input.  Usually this function is created by lex.  yylex reads
	    input, recognizes expressions within the input, and returns a
	    token number representing the kind of token read.  The function
	    returns an int value.  A return value of 0 (zero) means the end
	    of input.

	    If the parser and yylex do not agree on these token numbers,
	    reliable communication between them cannot occur. For (one char-
	    acter) literals, the token is simply the numeric value of the
	    character in the current character set. The numbers for other
	    tokens can either be chosen by yacc, or by the user. In either
	    case, the #define construct of C is used to allow yylex() to
	    return these numbers symbolically. The #define statements are put
	    into the code file, and the header file if that file is
	    requested. The set of characters permitted by yacc in an
	    identifier is larger than that permitted by C. Token names found
	    to contain such characters will not be included in the #define
	    declarations.
	    If the token numbers are chosen by yacc, the tokens other than
	    literals are assigned numbers greater than 256, although no
	    order is implied. A token can be explicitly assigned a number by
	    following its first appearance in the declaration section with a
	    number. Names and literals not defined this way retain their
	    default definition. All assigned token numbers are unique and
	    distinct from the token numbers used for literals. If duplicate
	    token numbers cause conflicts in parser generation, yacc reports
	    an error; otherwise, it is unspecified whether the token assign-
	    ment is accepted or an error is reported.

	    The end of the input is marked by a special token called the end-
	    marker that has a token number that is zero or negative. All lex-
	    ical analyzers return zero or negative as a token number upon
	    reaching the end of their input. If the tokens up to, but not
	    including, the endmarker form a structure that matches the start
	    symbol, the parser accepts the input.  If the endmarker is seen
	    in any other context, it is considered an error.

  yyerror(string)
	    The function that the parser calls upon encountering an input
	    error.  The default function, defined in liby.a, simply prints
	    string to the standard error.  The user can redefine the func-
	    tion.  The function's type is void.
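
  The token-number conventions described for yylex can be sketched as a
  small hand-written lexical analyzer in C.  This is an illustration only:
  the token codes 257 and 258 are assumed values (a real build takes them
  from the y.tab.h file), and lex_set_input with its in-memory buffer is a
  hypothetical helper that stands in for reading standard input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed token codes; in a real build they come from the y.tab.h
 * file that yacc -d writes. */
#define DIGIT  257
#define LETTER 258

int yylval;                         /* value of the token just returned */

static const char *lex_input = "";  /* hypothetical in-memory input */

/* Point the analyzer at a new NUL-terminated input string. */
void lex_set_input(const char *text)
{
    assert(text != NULL);
    lex_input = text;
}

int yylex(void)
{
    int c;

    while (*lex_input == ' ')       /* skip blanks */
        lex_input++;

    c = (unsigned char)*lex_input;
    if (c == '\0')
        return 0;                   /* endmarker: end of input */

    lex_input++;
    if (islower(c)) {
        yylval = c - 'a';           /* index of the letter, 0 through 25 */
        return LETTER;
    }
    if (isdigit(c)) {
        yylval = c - '0';           /* numeric value of the digit */
        return DIGIT;
    }
    return c;                       /* literal: its own character value */
}
```

  Named tokens are returned through their #define names with the value left
  in yylval, one-character literals are returned as their own character
  codes, and 0 is returned on every call made after the input is exhausted.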

  The liby.a library contains default main() and yyerror() functions.  These
  look like the following, respectively:

       int main(void)
       {
	    setlocale(LC_ALL, "");
	    (void) yyparse();
	    return (0);
       }

       int yyerror(char *s)
       {
	    fprintf(stderr, "%s\n", s);
	    return (0);
       }

  Comments, in C syntax, can appear anywhere in the user functions or defini-
  tions sections.  In the rules section, comments can appear wherever a sym-
  bol is allowed.  Blank lines or lines consisting of white space can be
  inserted anywhere in the file, and are ignored.


  This section describes the example programs for the lex and yacc commands,
  which together create a simple desk calculator program that performs addi-
  tion, subtraction, multiplication, and division operations.  The calculator
  program also allows you to assign values to variables (each designated by a
  single lowercase ASCII letter), and then use the variables in calculations.
  The files that contain the program are as follows:

  calc.l
      The lex specification file that defines the lexical analysis rules.

  calc.y
      The yacc grammar file that defines the parsing rules and calls the
      yylex() function created by lex to provide input.

  The remaining text expects that the current directory is the directory that
  contains the lex and yacc example program files.

  Compiling the Example Program

  Perform the following steps to create the example program using lex and
  yacc:

   1.  Process the yacc grammar file using the -d flag.	 The -d flag tells
       yacc to create a file that defines the tokens it uses in addition to
       the C language source code.
	    yacc -d calc.y

   2.  The following files are created:

       y.tab.c
	   The C language source file that yacc created for the parser.

       y.tab.h
	   A header file containing #define statements for the tokens used by
	   the parser.

   3.  Process the lex specification file:
	    lex calc.l

   4.  The following file is created:

       lex.yy.c
	   The C language source file that lex created for the lexical
	   analyzer.

   5.  Compile and link the two C language source files:
	    cc -o calc y.tab.c lex.yy.c

   6.  The following files are created (the *.o files are created temporarily
       and then removed):

       y.tab.o
	   The object file for y.tab.c.

       lex.yy.o
	   The object file for lex.yy.c.

       calc
	   The executable program file.

       You can then run the program directly by entering:

	    calc


       Then enter numbers and operators in calculator fashion.	After you
       press <Return>, the program displays the result of the operation.  If
       you assign a value to a variable as follows, the cursor moves to the
       next line:

	    m=4 <Return>

       You can then use the variable in calculations and it will have the
       value assigned to it:

	    m+5 <Return>
	    9

  The Parser Source Code

  The text that follows shows the contents of the file calc.y.	This file has
  entries in all three of the sections of a yacc grammar file:	declarations,
  rules, and programs.


       %{
       #include <stdio.h>

       int regs[26];
       int base;

       int yylex(void);
       int yyparse(void);
       int yyerror(char *s);
       %}

       %start list

       %token DIGIT LETTER

       %left '|'
       %left '&'
       %left '+' '-'
       %left '*' '/' '%'
       %left UMINUS    /* supplies precedence for unary minus */

       %%      /* beginning of rules section */

       list    :       /* empty */
               |       list stat '\n'
               |       list error '\n'
                       {       yyerrok;        }
               ;

       stat    :       expr
                       {       printf("%d\n",$1);      }
               |       LETTER '=' expr
                       {       regs[$1] = $3;  }
               ;

       expr    :       '(' expr ')'
                       {       $$ = $2;        }
               |       expr '*' expr
                       {       $$ = $1 * $3;   }
               |       expr '/' expr
                       {       $$ = $1 / $3;   }
               |       expr '%' expr
                       {       $$ = $1 % $3;   }
               |       expr '+' expr
                       {       $$ = $1 + $3;   }
               |       expr '-' expr
                       {       $$ = $1 - $3;   }
               |       expr '&' expr
                       {       $$ = $1 & $3;   }
               |       expr '|' expr
                       {       $$ = $1 | $3;   }
               |       '-' expr %prec UMINUS
                       {       $$ = -$2;       }
               |       LETTER
                       {       $$ = regs[$1];  }
               |       number
               ;

       number  :       DIGIT
                       {       $$ = $1; base = ($1==0) ? 8:10; }
               |       number  DIGIT
                       {       $$ = base * $1 + $2;    }
               ;

       %%      /* beginning of programs section */

       int main(void)
       {
               return (yyparse());
       }

       int yyerror(char *s)
       {
               fprintf(stderr, "%s\n", s);
               return (0);
       }

       int yywrap(void)
       {
               return (1);
       }


  Declarations Section

  This section contains entries that perform the following functions:

    +  Includes standard I/O header file.

    +  Defines global variables.

    +  Defines the list rule as the place to start processing.

    +  Defines the tokens used by the parser.

    +  Defines the operators and their precedence.

  Rules Section

  The rules section defines the rules that parse the input stream.
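
  The two alternatives of the number rule fold a run of DIGIT tokens into one
  value, choosing base 8 when the first digit is 0 and base 10 otherwise.
  The following C function is a sketch of that arithmetic outside the parser;
  the name number_value and its array argument are hypothetical and exist
  only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fold a sequence of DIGIT values the way the number rule does. */
int number_value(const int *digits, size_t count)
{
    int base, value;
    size_t i;

    assert(digits != NULL && count > 0);

    /* number : DIGIT
     * A leading 0 selects octal; any other first digit selects decimal. */
    value = digits[0];
    base = (digits[0] == 0) ? 8 : 10;

    /* number : number DIGIT
     * Applied once for each additional digit, left to right. */
    for (i = 1; i < count; i++) {
        assert(digits[i] >= 0 && digits[i] <= 9);
        value = base * value + digits[i];
    }
    return value;
}
```

  For the digits 0 1 7 the result is 15 (octal 017), and for the digits 1 2
  it is 12.  Like the grammar, the sketch does not reject the digits 8 and 9
  in an octal number.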

  Programs Section

  The programs section contains the following routines.	 Because these rou-
  tines are included in this file, you do not need to use the yacc library
  when processing this file.

  main()     The required main program that calls yyparse() to start the
	     program.

  yyerror(s) This error handling routine only prints a syntax error message.

  yywrap()   The wrap-up routine that returns a value of 1 when the end of
	     input occurs.

  The Lexical Analyzer Source Code

  The following shows the contents of the file calc.l.  This file contains
  include statements for standard input and output, as well as for the
  y.tab.h file.  The yacc program generates that file from the yacc grammar
  file information, if you use the -d flag with the yacc command.  The file
  y.tab.h contains definitions for the tokens that the parser program uses.
  In addition, calc.l contains the rules used to generate the tokens from
  the input stream.


       %{
       #include <stdio.h>
       #include "y.tab.h"
       int c;
       extern YYSTYPE yylval;
       %}
       %%
       " "     ;
       [a-z]   {
                       c = yytext[0];
                       yylval = c - 'a';
                       return (LETTER);
               }
       [0-9]   {
                       c = yytext[0];
                       yylval = c - '0';
                       return (DIGIT);
               }
       [^a-z 0-9]      {
                       c = yytext[0];
                       return (c);
               }


  y.output   A readable description of parsing tables and a report on con-
	     flicts generated by grammar ambiguities.

  y.tab.c    Output file.

  y.tab.h    Definitions for token names.

  yacc.tmp   Temporary file.

  yacc.debug Temporary file.

  yacc.acts  Temporary file.

  /usr/ccs/lib/yaccpar
	     Default skeleton parser for C programs.

  liby.a
	     yacc library.


  Commands:  lex(1).

  Programming Support Tools