Jacc Grammar File Format

This page describes the format used to specify a Jacc grammar, pointing out its similarities to and differences from UNIX yacc.

A grammar file must be of the form:
             <declarations>
             %%
             <rules>
             %%
             <classes>

The <declarations> and <classes> sections may be empty; if the <classes> section is empty, the second %% may be omitted. Java comments (C-style or C++-style) may be used anywhere in a grammar file and are ignored. Javadoc-style comments are also supported and may be used to document a grammar (by invoking Jacc with the -doc option).
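
To make the three-part layout concrete, here is a minimal Java sketch that splits grammar text into its <declarations>, <rules>, and <classes> sections. It is not Jacc's own reader. The class name GrammarSections is hypothetical, and the sketch assumes that each %% separator stands alone on its own line and never occurs inside a comment or inside Java code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Jacc): splits grammar text into its
 * <declarations>, <rules>, and <classes> sections. Assumes every %%
 * separator stands alone on its own line.
 */
public class GrammarSections {

    public static String[] split(String grammar) {
        List<StringBuilder> sections = new ArrayList<>();
        sections.add(new StringBuilder());
        for (String line : grammar.split("\n", -1)) {
            // Only the first two separators start a new section; any later
            // line (even another %%) is kept as part of <classes>.
            if (line.trim().equals("%%") && sections.size() < 3) {
                sections.add(new StringBuilder());
            } else {
                sections.get(sections.size() - 1).append(line).append('\n');
            }
        }
        if (sections.size() < 2) {
            throw new IllegalArgumentException("missing %% before the <rules> section");
        }
        // The second %% may be omitted when <classes> is empty.
        while (sections.size() < 3) {
            sections.add(new StringBuilder());
        }
        return new String[] {
            sections.get(0).toString(),
            sections.get(1).toString(),
            sections.get(2).toString()
        };
    }

    public static void main(String[] args) {
        String[] s = split("%token NUM\n%%\nexp : NUM ;\n");
        System.out.println("declarations: " + s[0].trim());
        System.out.println("rules: " + s[1].trim());
        System.out.println("classes empty: " + s[2].isEmpty());
    }
}
```

The sample grammar in main has no second %%. The split therefore leaves the <classes> section empty, matching the rule that the second separator may be omitted.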

The contents of each section are detailed next (see also the sorted index, which lists all of the Jacc commands).

The <declarations> section:

This contains commands that define various parameters of the grammar. Each command is of the form %<command> and usually, but not necessarily, appears at the start of a line. Jacc supports most of the traditional yacc commands, including %token, %left, %right, and %nonassoc.

The foregoing commands essentially behave as they do in yacc. The yacc %union command is not supported, since it does not make sense for Java. Consequently, union member tags may not be used in %token, %left, %right, and %nonassoc. The yacc %type command is not implemented either: it is rendered useless by the %nodeclass command, which achieves similar effects and more.
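
These restrictions can be checked mechanically. The following lint pass is hypothetical: DeclarationCheck is not part of Jacc. It scans the raw text of a <declarations> section and reports three things: %union, %type, and union member tags such as <ival> after %token, %left, %right, or %nonassoc. Because it works on raw text, a % sequence inside %{...%} Java code could produce false positives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical lint pass (not part of Jacc) over the raw text of a
 * <declarations> section, flagging yacc features Jacc does not accept.
 */
public class DeclarationCheck {
    // yacc commands that Jacc does not support.
    private static final Set<String> UNSUPPORTED = Set.of("union", "type");
    // Any command: '%' immediately followed by a name, e.g. %token.
    private static final Pattern COMMAND = Pattern.compile("%([A-Za-z]+)");
    // A yacc union member tag such as <ival> right after a declaration command.
    private static final Pattern TAGGED =
            Pattern.compile("%(token|left|right|nonassoc)\\s*<\\w+>");

    public static List<String> problems(String declarations) {
        List<String> problems = new ArrayList<>();
        Matcher command = COMMAND.matcher(declarations);
        while (command.find()) {
            if (UNSUPPORTED.contains(command.group(1))) {
                problems.add("%" + command.group(1) + " is not supported");
            }
        }
        Matcher tagged = TAGGED.matcher(declarations);
        while (tagged.find()) {
            problems.add("union member tag in %" + tagged.group(1));
        }
        return problems;
    }

    public static void main(String[] args) {
        String decls = "%union { int i; }\n%token <ival> NUM\n%left '+'\n";
        System.out.println(problems(decls));
    }
}
```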

In addition to the commands above, Jacc provides new commands of its own, such as %nodeclass and %include.

The <rules> section:

This contains the grammar rules. As in yacc, a rule is of the form:
             <head> : <body> ;
where <head> is a non-terminal symbol of the grammar, and <body> is a (possibly empty) sequence of grammar symbols (terminal or non-terminal) and semantic actions. A terminal symbol is either an identifier declared in the <declarations> section with the %token command, or any character sequence enclosed in single or double quotes. A non-terminal symbol is any identifier that appears in a rule's <head> or <body> and is not a terminal symbol. A semantic action is anything of the form { <Java statements> }.
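
The definitions of terminal and non-terminal symbols translate almost directly into code. In this sketch, the class SymbolKinds and its Rule type are hypothetical. Rule bodies are assumed to be already split into items, and any item starting with { is taken to be a semantic action.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the terminal / non-terminal classification (hypothetical names). */
public class SymbolKinds {

    /** A rule "head : body ;" whose body is already split into items. */
    static final class Rule {
        final String head;
        final List<String> body;

        Rule(String head, List<String> body) {
            this.head = head;
            this.body = body;
        }
    }

    static boolean isAction(String item) {
        return item.startsWith("{");
    }

    /** Terminal: declared with %token, or a single- or double-quoted literal. */
    static boolean isTerminal(String item, Set<String> tokens) {
        return tokens.contains(item) || item.startsWith("'") || item.startsWith("\"");
    }

    /** All %token identifiers plus every quoted literal used in a body. */
    static Set<String> terminals(List<Rule> rules, Set<String> tokens) {
        Set<String> result = new LinkedHashSet<>(tokens);
        for (Rule r : rules) {
            for (String item : r.body) {
                if (!isAction(item) && isTerminal(item, tokens)) {
                    result.add(item);
                }
            }
        }
        return result;
    }

    /** Identifiers appearing in a head or body that are not terminals. */
    static Set<String> nonTerminals(List<Rule> rules, Set<String> tokens) {
        Set<String> result = new LinkedHashSet<>();
        for (Rule r : rules) {
            if (!isTerminal(r.head, tokens)) {
                result.add(r.head);
            }
            for (String item : r.body) {
                if (!isAction(item) && !isTerminal(item, tokens)) {
                    result.add(item);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> tokens = Set.of("NUM");
        List<Rule> rules = List.of(
                new Rule("exp", List.of("exp", "'+'", "term", "{ $$ = $1; }")),
                new Rule("exp", List.of("term")),
                new Rule("term", List.of("NUM")));
        System.out.println("terminals: " + terminals(rules, tokens));
        System.out.println("non-terminals: " + nonTerminals(rules, tokens));
    }
}
```

In the sample rules, NUM and '+' are terminals. The identifiers exp and term are non-terminals because they are not declared with %token.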

As in yacc, the shorthand:

             <head> : <body>1 | ... | <body>n ;
may be used instead of:
             <head> : <body>1 ;
             <head> : ...
             <head> : <body>n ;
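
The shorthand expansion is essentially a split on | at the top level of the rule body. The sketch below uses the hypothetical name AlternativeExpansion. It ignores | inside quoted literals and inside { ... } actions, but as a stated simplification it does not handle escaped quotes or comments.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (hypothetical name): expands "head : b1 | ... | bn" into one rule
 * per alternative. '|' inside quotes or { ... } actions is not a separator;
 * escaped quotes and comments are not handled.
 */
public class AlternativeExpansion {

    /** Expands the text between ':' and the final ';' of a rule. */
    public static List<String> expand(String head, String bodies) {
        List<String> rules = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0;  // nesting depth of { ... } actions
        char quote = 0;      // active quote character, or 0 if none
        for (char c : bodies.toCharArray()) {
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;                   // closing quote
                }
            } else if (c == '\'' || c == '"') {
                quote = c;                       // opening quote
            } else if (c == '{') {
                braceDepth++;
            } else if (c == '}') {
                braceDepth--;
            } else if (c == '|' && braceDepth == 0) {
                rules.add(rule(head, current));  // end of one alternative
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        rules.add(rule(head, current));          // last alternative
        return rules;
    }

    private static String rule(String head, CharSequence body) {
        String b = body.toString().trim();
        return b.isEmpty() ? head + " : ;" : head + " : " + b + " ;";
    }

    public static void main(String[] args) {
        for (String r : expand("exp", "exp '+' term | exp '|' term | term")) {
            System.out.println(r);
        }
    }
}
```

The second alternative in main keeps its quoted '|' token intact. An empty alternative is emitted as head : ; (an empty body).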

As in yacc, pseudo-variables of the form $$, $1, ..., $n, ..., where n is a number, may be used in a semantic action's statements. $$ stands for the value returned by the rule's semantic action, and $n stands for the value returned by the parse of the n-th element of the rule's <body> to the left of the semantic action in which it appears. Again as in yacc, if n is 0 or negative, the pseudo-variable refers to the value returned by a rule recognized before this rule ($0 for the rule immediately preceding, $-1 for the one before that, etc.). The value returned by a rule is selected as follows:

As in yacc, the precedence and associativity of a rule are, by default, those of the rightmost terminal symbol occurring in it. If the rule contains no terminal symbol, its precedence is set to the minimum value and its associativity to none. To override this default, the %prec command may be used (at most once) anywhere in the body of a rule. Writing %prec token gives the rule the precedence and associativity of that token. Writing %prec looseness specifier gives it those determined by a looseness number and a specifier, as when defining dynamic operators.
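
The pseudo-variable and precedence conventions described above can both be sketched in a few lines. The class RuleSemantics is hypothetical and shows yacc-style bookkeeping, not Jacc's generated code. Its method stackIndex locates $n on a semantic stack for an action with k body elements to its left. Its method rulePrecedence applies the rightmost-terminal default or a %prec token override. Three things are assumptions of the sketch: the numeric levels (with 0 as the minimum), giving undeclared tokens the minimum precedence, and leaving the looseness form of %prec unmodeled.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of $n addressing and default rule precedence (hypothetical names). */
public class RuleSemantics {

    /**
     * Index of $n in a stack stored as a list whose last element is the top,
     * for an action with k body elements to its left. $1..$k are those
     * elements; $0, $-1, ... reach below them.
     */
    static int stackIndex(int stackSize, int k, int n) {
        if (n > k) {
            throw new IllegalArgumentException("$" + n + " is not to the left of the action");
        }
        int index = stackSize - k + n - 1;
        if (index < 0) {
            throw new IllegalArgumentException("$" + n + " is below the bottom of the stack");
        }
        return index;
    }

    enum Assoc { LEFT, RIGHT, NONE }

    static final class Prec {
        final int level;
        final Assoc assoc;

        Prec(int level, Assoc assoc) {
            this.level = level;
            this.assoc = assoc;
        }

        @Override
        public String toString() {
            return level + "/" + assoc;
        }
    }

    /** Minimum precedence with no associativity (level 0 is an assumption). */
    static final Prec MIN = new Prec(0, Assoc.NONE);

    /** precToken is the token named by %prec, or null if the rule has none. */
    static Prec rulePrecedence(List<String> body, Set<String> terminals,
                               Map<String, Prec> declared, String precToken) {
        if (precToken != null) {
            return declared.getOrDefault(precToken, MIN);     // %prec override
        }
        for (int i = body.size() - 1; i >= 0; i--) {          // rightmost terminal wins
            if (terminals.contains(body.get(i))) {
                return declared.getOrDefault(body.get(i), MIN);
            }
        }
        return MIN;                                           // no terminal at all
    }

    public static void main(String[] args) {
        Map<String, Prec> declared = Map.of(
                "'+'", new Prec(1, Assoc.LEFT),
                "'*'", new Prec(2, Assoc.LEFT),
                "UMINUS", new Prec(3, Assoc.RIGHT));
        Set<String> terminals = Set.of("'+'", "'*'", "'-'", "NUM", "UMINUS");
        System.out.println(rulePrecedence(List.of("exp", "'*'", "exp"), terminals, declared, null));
        System.out.println(rulePrecedence(List.of("'-'", "exp"), terminals, declared, "UMINUS"));
        System.out.println(stackIndex(5, 3, 1));
    }
}
```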

The <classes> section:

This contains Java code needed by the parser. Unlike code specified in the <declarations> section with %{...%} or %include, this section's code is not part of the generated parser's class. This section is therefore where ancillary classes implementing objects needed by the parser are specified. The only command recognized in this section is %include, which may name one or more files; the contents of these files are likewise not part of the generated parser's class. Anything else in this section is copied verbatim.
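
Processing the <classes> section amounts to a line-by-line copy in which %include lines are replaced by file contents. The sketch below makes two assumptions. First, it uses a hypothetical %include syntax: one line naming whitespace-separated files. Second, it reads those files from an in-memory map rather than the file system so that it stays self-contained. ClassesSection is not a Jacc class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch (hypothetical name): copies the <classes> section verbatim, except
 * that a line of the assumed form "%include f1 f2 ..." is replaced by the
 * contents of the named files, looked up in an in-memory map.
 */
public class ClassesSection {

    public static String expand(String classes, Map<String, String> files) {
        List<String> out = new ArrayList<>();
        for (String line : classes.split("\n", -1)) {
            String[] words = line.trim().split("\\s+");
            if (words[0].equals("%include")) {
                for (int i = 1; i < words.length; i++) {
                    String contents = files.get(words[i]);
                    if (contents == null) {
                        throw new IllegalArgumentException("cannot include " + words[i]);
                    }
                    out.add(contents);
                }
            } else {
                out.add(line);   // anything else is copied verbatim
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of("Node.java", "class Node {}");
        System.out.print(expand("%include Node.java\nclass Helper {}\n", files));
    }
}
```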

The type of the value returned by a semantic action is ParseNode, the class of the objects pushed on the semantic evaluation stack. It comes with two predefined attributes that may be used in a rule's semantic action: a numeric one, nvalue, of type double, and a symbolic one, svalue, of type String. If attributes of other types are needed, one may use the %nodeclass command for the appropriate symbol. This makes the type of the value that symbol returns a subclass of ParseNode containing the desired class members.
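
The effect of %nodeclass can be pictured with plain Java subclassing. In this sketch, ParseNode and ExpNode are stand-ins written for illustration, not Jacc's actual classes. The symbol exp is a hypothetical non-terminal whose values need an extra member (a list of free variables) alongside the inherited nvalue and svalue attributes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative stand-ins only; not Jacc's actual ParseNode or generated code. */
public class NodeClassSketch {

    /** Stand-in for ParseNode with its two predefined attributes. */
    static class ParseNode {
        double nvalue;  // numeric attribute
        String svalue;  // symbolic attribute
    }

    /** What a %nodeclass for the hypothetical symbol "exp" might add. */
    static class ExpNode extends ParseNode {
        final List<String> freeVars = new ArrayList<>();
    }

    /** Pushes an ExpNode for "x + 1" onto a ParseNode-typed stack. */
    static Deque<ParseNode> demo() {
        Deque<ParseNode> stack = new ArrayDeque<>();
        ExpNode e = new ExpNode();
        e.svalue = "x + 1";    // inherited predefined attribute
        e.freeVars.add("x");   // member added by the subclass
        stack.push(e);
        return stack;
    }

    public static void main(String[] args) {
        ParseNode top = demo().peek();
        // The stack here is typed ParseNode, so the extra member needs a cast.
        List<String> vars = ((ExpNode) top).freeVars;
        System.out.println(top.svalue + " has free variables " + vars);
    }
}
```

Because the stack in this sketch is typed ParseNode, main reaches the subclass member through a cast. How Jacc's generated code exposes such members is not shown here.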

Hassan Aït-Kaci
© all rights reserved by the author
Last modified on Wed Jan 30 07:43:55 2013 by hak