% Copyright 2012-2024, Alexander Shibakov % Copyright 2002-2014 Free Software Foundation, Inc. % This file is part of SPLinT % % SPLinT is free software: you can redistribute it and/or modify % it under the terms of the GNU General Public License as published by % the Free Software Foundation, either version 3 of the License, or % (at your option) any later version. % % SPLinT is distributed in the hope that it will be useful, % but WITHOUT ANY WARRANTY; without even the implied warranty of % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the % GNU General Public License for more details. % % You should have received a copy of the GNU General Public License % along with SPLinT. If not, see . \ifbootstrapmode % this is a bootstrap run to extract the states \message{bootstrapping \jobname.tex ...}% \input limbo.sty \def\optimization{5} \input yy.sty \modebootstrap \fi @**The \eatone{bison}\bison\ parser stack. {% \newdimen\halfhsize \newdimen\preskip \halfhsize=\hsize \divide\halfhsize by2 \def\mypar{% \parshape 6 0pt \hsize 0pt \hsize 0pt \hsize 0pt \hsize 0pt \hsize \halfhsize \halfhsize }% The input language for \bison\ loosely follows the {\sc BNF} notation, with a few enhancements, such as the syntax for {\em actions}, to implement the syntax-directed translation@^syntax-directed translation@>, as well as various declarations for tokens, nonterminals, etc. On the one hand, the language is relatively easy to handle and is nearly whitespace agnostic; on the other, a primitive parser is required for some basic setup even at a very early stage, so the design must be carefully thought out. This {\em bootstrapping\/}@^bootstrapping@> step is discussed in more detail further down. The path chosen here is by no means optimal. What it lacks in efficiency, though, it may amply gain in practicality, as we reuse the original grammar used by \bison\ to produce the parser(s) for both pretty printing and bootstrapping. 
Some minor subtleties arising from this approach are explained in later sections. As was described in the \ifbootstrapmode\else\locallink{parser.stacks} discussion of parser stacks \endlink\fi@^parser stack@> above, to pretty print a variety of grammar fragments, one may employ a {\em parser stack\/} derived from the original grammar. The most common unit of a \bison\ grammar is a set of productions. It is thus natural to begin our discussion of the parsers in the \bison\ stack with the parser responsible for processing individual rules. One should note that the productions below are not directly concerned with the typesetting of the grammar. Instead, this task is delegated to the macros in \.{yyunion.sty} and its companions. The first pass of the parser merely constructs an `executable abstract syntax tree' (or \EAST\footnote{One may argue that \EAST\ is still merely a syntactic construct requiring a proper macro framework for its execution and should be called a `weak executable syntax tree' or \WEST. This acronym extravaganza is heading south so we shall stop here.}) which can serve very diverse purposes: from collecting token declarations in the bootstrapping pass to typesetting the grammar rules. This allows for a great deal of flexibility in where and when the parsing results are used. A clear divide between the parsing step and the typesetting step provides for better debugging facilities, as well as more reliable macro design. It would be impossible to completely avoid the question of the visual presentation of the \bison\ input, however. It has already been pointed out that the syntax adopted by \bison\ is nearly insensitive to whitespace. This makes {\em writing\/} \bison\ grammars easier. On the other hand, {\em presenting\/} a grammar is best done using a variety of typographic devices that take advantage of the meaningful positioning of text on the page: skips, indents, etc. 
Therefore, the macros for \bison\ pretty printing trade a number of \bison\ syntax elements (such as \.{\yl}, \.{;}, action braces, etc.) for the careful placement of each fragment of the input on the page. The syntax tree generated by the parsers in the \bison\ stack is not fully {\em faithful\/} in that it does not preserve every syntactic element from the original input. Thus, e.g.\ optional semicolons (\prodstyle{semi.opt}) never find their way into the tree and their original position is lost\footnote{The opposite is true about the {\em whitespace\/} the parser sees (or {\em stash\/} as it is called in this document): all of it is carefully packaged into streams, as was described \locallink{parser.streams}earlier\endlink.}. Let's take a short break for a broad overview of the input file. The basic structure is that of an ordinary \bison\ file that produces plain \Cee\ output. The \Cee\ actions, however, are programmed to output \TeX. The \bison\ sections (separated by \.{\%\%}, pretty printed as \prodstyle{\%\%} below) appear between the successive dotted lines. A number of sections are empty, since the generated \Cee\ is rather trivial. }% %\checktabletrue @(bg.yy@>= @G Switch to generic mode. %{@> @ @=%} @> @ @= %union {@> @ @=} %{@> @ @=%} @> @ @= %% @> @ @= @> @ @= @> @ @= %% @g @*1 Bootstrapping. %\checktablefalse The bootstrap\namedspot{bootstrapping}@^bootstrapping@> parser is defined next. The purpose of the bootstrapping parser is to collect a minimal amount of information to `spool up' the `production' parsers. To understand its inner workings and the reasons behind it, consider what happens following a declaration such as \.{\%token TOKEN "token"} (or, as it would be typeset by the macros in this package `\prodstyle{\%token} \.{TOKEN} \.{token}'; see the index entries for more details)% \idxinline{TOKEN}\idxinline{token}. The two names for the same token are treated very differently. 
\.{TOKEN} becomes an |enum| constant in the \Cee\ parser generated by \bison. Even when that parser becomes part of the `driver' program that outputs the \TeX\ version of the parser tables, there is no easy way to output the {\it names\/} of the appropriate |enum| constants. The other name (\.{"token"}) becomes an entry in the |yytname| array. These names can be output by either the `driver' or \TeX\ itself after the \.{\\yytname} table has been input. The scanner, on the other hand, will use the first version (\.{TOKEN}). Therefore, it is important to establish an equivalence between the two versions of the name. In the `real' parser, the token values are output in a special header file. Hence, one has to either parse the header file to establish the equivalences or find some other means to find out the numerical values of the tokens. One approach is to parse the file containing the {\it declarations\/} and extract the equivalences between the names from it. This is precisely the function of the bootstrap parser. Since the lexer is reused, some token values need to be known in advance (and the rest either ignored or replaced by some `made up' values). These tokens are `hard coded' into the parser file generated by \bison\ and output using a special function. The switch `|@[#define@]@; BISON_BOOTSTRAP_MODE|' tells the `driver' program to output the hard coded token values. @q Bizarre looking way of typing #define is due to the awkward way@> @q \CWEB\ treats switching in and out of $-mode in inline \Cee@> Note that the equivalence of the two versions of token names would have to be established every time a `string version' of a token is declared in the \bison\ file and the `macro name version' of the token is used by the corresponding scanner. 
To establish this equivalence, however, the bootstrapping parser below is not always necessary (see the \.{xxpression} example, specifically, the file \.{xxpression.w} in the \.{examples} directory for an example of using a different parser for this purpose). The reason it is necessary here is that a parser for an appropriate subset of the \bison\ syntax is not yet available (indeed, {\it any\/} functional parser for a \bison\ syntax subset would have to use the same scanner (unless you want to write a custom scanner for it), which would need to know how to output tokens, for which it would need a parser for a subset of \bison\ syntax $\ldots$ it is a genuine `chicken and egg' problem). Hence the need for `bootstrap'. Once a functional parser for a large enough subset of the \bison\ input grammar is operational, {\it it\/} can be used to pair up the token names. The bootstrap parser is not strictly minimal in that it is also capable of parsing the \prodstyle{\%nterm} declarations. This ability is not utilized by the parsers in \splint, however (nor is the accompanying bootstrap lexer designed to output the \prodstyle{\%nterm} tokens), and was added for the scenarios other than bootstrapping. The second, perhaps even more important function of the bootstrap process is to collect information about the scanner's states. The mechanism is slightly different from that for token definition gathering. While the token equivalences are collected purely in `\TeX\ mode', the bootstrap mode parser collects all the state names into a special \Cee\ header file. The reason is simple: unlike the token values, the numerical values of the scanner states are not passed to the `driver' program in any data structure (the |yytname| array) and are instead defined as ordinary (\Cee) macros. The header file is the information the `driver' file needs to output the state values for the use by the lexer. 
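The need for a header that spells out the state names can be illustrated outside of \splint. What follows is a minimal sketch in \Cee, not the code \splint\ itself uses: the state names other than |INITIAL|, the helper |format_state| and the macro |FORMAT_STATE| are all made up for this illustration. The point it makes is that a macro name does not survive preprocessing, so a program can print a name next to its value only if the name is spelled out in the source, which is what a generated header would supply.

```c
#include <stdio.h>

/* Stand-ins for the start condition macros that a flex-generated scanner
   defines; SC_COMMENT and SC_STRING are hypothetical names. */
#define INITIAL 0
#define SC_COMMENT 1
#define SC_STRING 2

/* Hypothetical helper: write "NAME=value" into buf. */
static int format_state(char *buf, size_t size, const char *name, int value)
{
    return snprintf(buf, size, "%s=%d", name, value);
}

/* The # operator captures the spelling of the argument before it is
   expanded, while the last use of the argument expands to its value. */
#define FORMAT_STATE(buf, size, state) \
    format_state((buf), (size), #state, (state))
```

In this sketch a driver would contain one |FORMAT_STATE| call per state name, which is roughly the role played by the list of names collected during the bootstrap pass.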
Naturally, to accomplish their task, the lexer and the parser employed in state gathering need the state and token information, as well. Fortunately, the parser is a subset of the \flex\ input parser that does not define any `string' names for its tokens. Similarly, the lexer collects all the necessary tokens in the \flexsnstyle{INITIAL} state\footnote{An additional subtlety is the necessity to gracefully handle (and, in some cases, cause) the multiple possible {\em failures\/} for which the lexer redefines \inlineTeXx{/yyBEGIN}\ to fail immediately when attempting to switch states. Note that the bootstrap mode parser looks at sections other than those where the declarations reside and must fail quickly and quietly in such cases.}. To reiterate a point made in the middle of this section, the bootstrapping process described here is necessary to `spool up' the \bison\ and \flex\ input parsers. A simpler procedure may be followed while designing other custom parsers where the programmer uses, say, the full \bison\ parser to collect information about the token equivalences (whether such information is needed to make the parser operational or just to facilitate the typesetting of the token names). By adding custom `bootstrapping' macros to the ones defined in \.{yyunion.sty}, a number of different preprocessing tasks can be accomplished. @(bb.yy@>= @G Switch to generic mode. %{ @> @ @= @> @/#define BISON_BOOTSTRAP_MODE @= %} @> @ @= %union {@> @ @=} %{@> @ @=%} @> @ @= %% @> @ @= @> @ @= %% @g @*1 Prologue and full parsers. The prologue parser is responsible for parsing various grammar declarations as well as parser options. @(bd.yy@>= @G Switch to generic mode. %{@> @ @=%} @> @ @= %union {@> @ @=} %{@> @ @=%} @> @ @= %% @> @ @= @> @ @= @> @ @= %% @g @ The full \bison\ input parser is used when a complete \bison\ file is expected. It is also capable of parsing a `skeleton' of such a file, similar to the one that follows this paragraph. 
As a stopgap measure, the skeleton of a \flex\ scanner is also parsed by this parser, as they have an almost identical structure. This is not a perfect arrangement, however, since it precludes one from putting the constructs that this parser does not recognize into the outline. To give an example, one cannot put \flex\ specific options into such a `skeleton'. @(bf.yy@>= @G Switch to generic mode. %{@> @ @=%} @> @ @= %union {@> @ @=} %{@> @ @=%} @> @ @= %% @> @ @= @> @ @= @> @ @= @> @ @= %% @g @ \namedspot{bison.options}The first two options below are essential for the parser operation as each of them makes \bison\ produce additional tables (arrays) used in the operation (or bootstrapping) of \bison\ parsers. The start symbol can be set implicitly by listing the appropriate production first. Modern \bison\ also allows specifying the kind of parsing algorithm to be used (provided the supplied grammar is in the appropriate class): {\sc LALR}($n$), {\sc LR}($n$), {\sc GLR}, etc. The default is to use the {\sc LALR}($1$) algorithm (with the corresponding assumption about the grammar) which can also be set explicitly by putting\gtextidx{\bison\ options example}{bison options example}{\bisonidxdomain}% \medskip \beginprod \%define lr.type lalr \endprod \medskip \noindent in with the rest of the options. Using other types of grammars will wreak havoc on the parsing algorithm hardcoded into \splint\ (see \.{yyparse.sty}) as well as on the production of \.{\\stashed} and \.{\\format} streams. @= @G %token-table %debug %start input @g @*1 Token declarations. Most of the original comments present in the grammar file used by \bison\ itself have been preserved and appear in {\it italics\/} at the beginning of the appropriate section. To facilitate the {\it bootstrapping\/} of the parser (see above), some declarations have been separated into their own sections. 
Also, a number of new rules have been introduced to create a hierarchy of `subparsers' that parse subsets of the grammar. We begin by listing most of the tokens used by the grammar. Only the string versions are kept in the |yytname| array, which, in part is the reason for a special bootstrapping parser as explained earlier. \iffalse \checktrailingstashtrue % see what is left at the end \checktabletrue % display the table \fi @= @G %token GRAM_EOF 0 "end of file" %token STRING "string" %token PERCENT_TOKEN "%token" %token PERCENT_NTERM "%nterm" %token PERCENT_TYPE "%type" %token PERCENT_DESTRUCTOR "%destructor" %token PERCENT_PRINTER "%printer" %token PERCENT_LEFT "%left" %token PERCENT_RIGHT "%right" %token PERCENT_NONASSOC "%nonassoc" %token PERCENT_PRECEDENCE "%precedence" %token PERCENT_PREC "%prec" %token PERCENT_DPREC "%dprec" %token PERCENT_MERGE "%merge" @g @@; @ We continue with the list of tokens below, following the layout of the original parser. \iffalse \checktrailingstashfalse \checktablefalse \fi @= @G %token PERCENT_CODE "%code" PERCENT_DEFAULT_PREC "%default-prec" PERCENT_DEFINE "%define" PERCENT_DEFINES "%defines" PERCENT_ERROR_VERBOSE "%error-verbose" PERCENT_EXPECT "%expect" PERCENT_EXPECT_RR "%expect-rr" PERCENT_FLAG "%" PERCENT_FILE_PREFIX "%file-prefix" PERCENT_GLR_PARSER "%glr-parser" PERCENT_INITIAL_ACTION "%initial-action" PERCENT_LANGUAGE "%language" PERCENT_NAME_PREFIX "%name-prefix" PERCENT_NO_DEFAULT_PREC "%no-default-prec" PERCENT_NO_LINES "%no-lines" PERCENT_NONDETERMINISTIC_PARSER "%nondeterministic-parser" PERCENT_OUTPUT "%output" PERCENT_REQUIRE "%require" PERCENT_SKELETON "%skeleton" PERCENT_START "%start" PERCENT_TOKEN_TABLE "%token-table" PERCENT_VERBOSE "%verbose" PERCENT_YACC "%yacc" ; %token BRACED_CODE "{...}" %token BRACED_PREDICATE "%?{...}" %token BRACKETED_ID "[identifier]" %token CHAR "char" %token EPILOGUE "epilogue" %token EQUAL "=" %token ID "identifier" %token ID_COLON "identifier:" %token PERCENT_PERCENT "%%" 
%token PIPE "|" %token PROLOGUE "%{...%}" %token SEMICOLON ";" %token TAG "" %token TAG_ANY "<*>" %token TAG_NONE "<>" %token INT "integer" %token PERCENT_PARAM "%param"; @g @*1 Grammar productions. We are ready to describe the top levels of the parse tree. The first `sub parser' we consider is a `full' parser, that is the parser that expects a full grammar file, complete with the prologue, declarations, etc. This parser can be used to extract information from the grammar that is otherwise absent from the executable code generated by \bison. This includes, for example, the `name' part of \.{\$}\.{[}{\rm name}\.{]}. This parser is therefore used to generate the `symbolic switch' to provide support for symbolic term names similar to the `genuine' \bison's \.{\$}\.{[}$\ldots$\.{]} syntax. The action of the parser in this case is simply to separate the accumulated `parse tree' from the auxiliary information carried by the parser on the stack. \saveparseoutputfalse \checktablefalse \tracenamesfalse @= @G @t}\vb{\inline}{@> input: prologue_declarations "%%" grammar epilogue.opt {@> @ @=} ; @g @ @= @[TeX_( "/finishlist{/expandafter/yyfirstoftwo/the/yy(3)}" );@]@; /* complete the list */ @[TeX_( "/table/expandafter{/romannumeral0" );@]@; @[TeX_( " /executelistat{/expandafter/yyfirstoftwo/the/yy(3)}{0}}" );@]@; @ Another subgrammar deals with the syntax of isolated \bison\ rules. This is the most commonly used `subparser' since a rules cluster is the most natural `unit' to include in a \CWEB\ file. 
@= @G @t}\vb{\inline}{@> input: grammar epilogue.opt {@> @ @=} ; @g @ @= @[TeX_( "/finishlist{/expandafter/yyfirstoftwo/the/yy(1)}" );@]@; /* complete the list */ @[TeX_( "/table/expandafter{/romannumeral0" );@]@; @[TeX_( " /executelistat{/expandafter/yyfirstoftwo/the/yy(1)}{0}}" );@]@; @ The bootstrap parser has a very narrow set of goals: it is concerned with \prodstyle{\%token} declarations only in order to supply the token information to the lexer (since, as noted above, such information is not kept in the |yytname| array). The parser can also parse \prodstyle{\%nterm} declarations but the bootstrap lexer ignores the \prodstyle{\%nterm} token, since the \bison\ grammar does not use one. It also extends the syntax of a \prodstyle{grammar_declaration} by allowing a declaration with or without a semicolon at the end (the latter is only allowed in the prologue). This works since the token declarations have been carefully separated from the rest of the grammar in different \CWEB\ sections. The range of tokens output by the bootstrap lexer is limited, hence most of the other rules are ignored. @= @G @t}\vb{\inline}{@> input: grammar_declarations {@> TeX_( "/table=/yy(1)" ); @=} ; @t}\vb{\resetf}{@> grammar_declarations: symbol_declaration semi.opt {@> @ @=} | grammar_declarations symbol_declaration semi.opt {@> TeX_( "/yy0{/the/yy(1)/the/yy(2)}" ); @=} ; @t}\vb{\inline\flatten}{@> semi.opt: {} | ";" {}; @g @ The following is perhaps the most common action performed by the parser. It is done automatically by the parser code but this feature is undocumented so we supply an explicit action in each case. @= @[TeX_( "/yy0{/the/yy(1)}" );@]@; @ Next comes a subgrammar for processing prologue declarations. Finer differentiation is possible but the `subparsers' described here work pretty well and impose a mild style on the grammar writer. 
Note that these rules are not part of the official \bison\ input grammar and are added to make the typesetting of `file outlines' (e.g.~|@(bb.yy@>| above) possible. @= @G @t}\vb{\inline}{@> input: prologue_declarations epilogue.opt {@> @ @=} | prologue_declarations "%%" "%%" EPILOGUE {@> @ @=} | prologue_declarations "%%" "%%" {@> @ @=} ; @g @ @= @[TeX_( "/finishlist{/expandafter/yyfirstoftwo/the/yy(1)}" );@]@; /* complete the list */ @[TeX_( "/table/expandafter{/romannumeral0" );@]@; @[TeX_( " /executelistat{/expandafter/yyfirstoftwo/the/yy(1)}{0}}" );@]@; @ {\it Declarations: before the first \prodstyle{\%\%}}. We are now ready to deal with the specifics of the declarations themselves. @= @G prologue_declarations: {@> @ @=} | prologue_declarations prologue_declaration {@> @ @=} ; @g @ @= @[TeX_( "/initlist{/prologuedeclarationsprefix prologue_declarations}" );@]@; @[TeX_( "/yy0{{/prologuedeclarationsprefix prologue_declarations}{/nx/empty}}" );@]@; @[TeX_( "/edef/prologuedeclarationsprefix{./prologuedeclarationsprefix}" );@]@; @ @= @@; @ Here is a list of most kinds of declarations that can appear in the prologue. The scanner returns the `stream pointers' for all the keywords so the declaration `structures' pass on those pointers to the grammar list. The original syntax has been left intact even though for the purposes of this parser some of the inline rules are unnecessary. 
\eraselocalformattrue @= @G prologue_declaration: grammar_declaration {@> @ @=} | "%{...%}" {@> TeX_( "/yy0{/nx/prologuecode/the/yy(1)}" ); @=} | "%" {@> TeX_( "/yy0{/nx/optionflag/the/yy(1)}" ); @=} | "%define" variable value {@> TeX_( "/yy0{/nx/vardef{/the/yy(2)}{/the/yy(3)}/the/yy(1)}" ); @=} | "%defines" {@> TeX_( "/yy0{/nx/optionflag{defines}{}/the/yy(1)}" ); @=} | "%defines" STRING {@> @t}\vb{\stashed{\Xmark prologue.decls:\Xmark}}{@> @= @> @[TeX_( "/toksa{defines}" );@]@+@ @=} | "%error-verbose" {@> TeX_( "/yy0{/nx/optionflag{error verbose}{}/the/yy(1)}" ); @=} | "%expect" INT {@> @t}\vb{\stashed{\Xmark prologue.decls(g):\Xmark}}{@> @= @> @[TeX_( "/toksa{expect}" );@]@+@ @=} | "%expect-rr" INT {@> @t}\vb{\stashed{\Xmark prologue.decls(g):\Xmark}}{@> @= @> @[TeX_( "/toksa{expect-rr}" );@]@+@ @=} | "%file-prefix" STRING {@> @[TeX_( "/toksa{file prefix}" );@]@+@ @=} | "%glr-parser" {@> TeX_( "/yy0{/nx/optionflag{glr parser}{}/the/yy(1)}" ); @=} | "%initial-action" "{...}" {@> TeX_( "/yy0{/nx/initaction/the/yy(2)}" ); @=} | "%language" STRING {@> @[TeX_( "/toksa{language}" );@]@+@ @=} | "%name-prefix" STRING {@> @[TeX_( "/toksa{name prefix}" );@]@+@ @=} | "%no-lines" {@> TeX_( "/yy0{/nx/optionflag{no lines}{}/the/yy(1)}" ); @=} | "%nondeterministic-parser" {@> TeX_( "/yy0{/nx/optionflag{nondet. 
parser}{}/the/yy(1)}" ); @=} | "%output" STRING {@> @t}\vb{\stashed{\Xmark prologue.decls:\Xmark}}{@> @= @> @[TeX_( "/toksa{output}" );@]@+@ @=} @t}\vb{\flatten}{@> | "%param" {@> @t}\vb{\stashed{\rm (we simply return pointers below)}}{@> @=} params {@> TeX_( "/yy0{/nx/paramdef{/the/yy(3)}/the/yy(1)}" ); @=} @t}\vb{\fold}{@> | "%require" STRING {@> @t}\vb{\stashed{\Xmark prologue.decls:\Xmark}}{@> @= @> @[TeX_( "/toksa{require}" );@]@+@ @=} | "%skeleton" STRING {@> @[TeX_( "/toksa{skeleton}" );@]@+@ @=} | "%token-table" {@> TeX_( "/yy0{/nx/optionflag{token table}{}/the/yy(1)}" ); @=} | "%verbose" {@> TeX_( "/yy0{/nx/optionflag{verbose}{}/the/yy(1)}" ); @=} | "%yacc" {@> TeX_( "/yy0{/nx/optionflag{yacc}{}/the/yy(1)}" ); @=} | ";" {@> TeX_( "/yy0{/nx/empty}" ); @=} ; params: params "{...}" {@> TeX_( "/yy0{/the/yy(1)/nx/braceit/the/yy(2)}" ); @=} | "{...}" {@> TeX_( "/yy0{/nx/braceit/the/yy(1)}" ); @=} ; @g @ This is a typical parser action: encapsulate the `type' of the construct just parsed and attach some auxiliary info, in this case the stream pointers. \eraselocalformatfalse \smallskip \rulereferencex{\showlastactionfalse}{\nx\inline\nx\flatten}{prologue.decls} \smallskip \noindent The productions above are typical examples. @= @[TeX_( "/yy0{/nx/oneparametricoption{/the/toksa}{/nx/stringify/the/yy(2)}/the/yy(1)}" );@]@; @ A variation on the theme above where the parameter is not a \prodstyle{STRING}. \smallskip \rulereferencex{\showlastactionfalse}{\nx\inline\nx\flatten}{prologue.decls(g)} \smallskip \noindent A sample of the rules to which the code below applies is given above. @= @[TeX_( "/yy0{/nx/oneparametricoption{/the/toksa}{/the/yy(2)}/the/yy(1)}" );@]@; @ {\it Grammar declarations}. These declarations can appear in both the prologue and the rules sections. Their treatment is very similar to the prologue-only options. 
@= @G grammar_declaration: precedence_declaration {@> @ @=} | symbol_declaration {@> @ @=} | "%start" symbol {@> @t}\vb{\stashed{\Xmark prologue.decls(g):\Xmark}}{@> @= @> @[TeX_( "/toksa{start}" );@]@+@ @=} | code_props_type "{...}" generic_symlist {@> @ @=} | "%default-prec" {@> TeX_( "/yy0{/nx/optionflag{default prec.}{}/the/yy(1)}" ); @=} | "%no-default-prec" {@> TeX_( "/yy0{/nx/optionflag{no default prec.}{}/the/yy(1)}" ); @=} | "%code" "{...}" {@> TeX_( "/yy0{/nx/codeassoc{code}{}/the/yy(2)/the/yy(1)}" ); @=} | "%code" ID "{...}" {@> TeX_( "/yy0{/nx/codeassoc{code}{/nx/idit/the/yy(2)}/the/yy(3)/the/yy(1)}" ); @=} ; code_props_type: "%destructor" {@> TeX_( "/yy0{{destructor}/the/yy(1)}" ); @=} | "%printer" {@> TeX_( "/yy0{{printer}/the/yy(1)}" ); @=} ; @g @ @= @[TeX_( "/getfirst{/yy(1)}/to/toksa" );@]@; /* name of the property */ @[TeX_( "/getfirst{/yy(2)}/to/toksb" );@]@; /* contents of the braced code */ @[TeX_( "/getsecond{/yy(2)}/to/toksc" );@]@; /* braced code format pointer */ @[TeX_( "/getthird{/yy(2)}/to/toksd" );@]@; /* braced code stash pointer */ @[TeX_( "/getsecond{/yy(1)}/to/tokse" );@]@; /* code format pointer */ @[TeX_( "/getthird{/yy(1)}/to/toksf" );@]@; /* code stash pointer */ @[TeX_( "/yy0{/nx/codepropstype{/the/toksa}{/the/toksb}{/the/yy(3)}{/the/toksc}{/the/toksd}{/the/tokse}{/the/toksf}}" );@]@; @ @= @G %token PERCENT_UNION "%union"; @g @ @= @G @t}\vb{\inline\flatten}{@> union_name: {@> TeX_( "/yy0{}" ); @=} | ID {@> @ @=} ; grammar_declaration: "%union" union_name "{...}" {@> @ @=} ; symbol_declaration: "%type" TAG symbols.1 {@> @ @=} ; @t}\vb{\resetf\flatten}{@> precedence_declaration: precedence_declarator tag.opt symbols.prec {@> @ @=} ; precedence_declarator: "%left" {@> TeX_( "/yy0{/nx/preckind{left}/the/yy(1)}" ); @=} | "%right" {@> TeX_( "/yy0{/nx/preckind{right}/the/yy(1)}" ); @=} | "%nonassoc" {@> TeX_( "/yy0{/nx/preckind{nonassoc}/the/yy(1)}" ); @=} | "%precedence" {@> TeX_( "/yy0{/nx/preckind{precedence}/the/yy(1)}" ); @=} ; 
@t}\vb{\inline}{@> tag.opt: {@> TeX_( "/yy0{}" ); @=} | TAG {@> @ @=} ; @t}\vb{\insertraw{\beginfoldedsections}}{@> @g @ @= @[TeX_( "/yy0{/nx/codeassoc{union}{/the/yy(2)}/the/yy(3)/the/yy(1)}" );@]@; @ @= @[TeX_( "/yy0{/nx/typedecls{/nx/tagit/the/yy(2)}{/the/yy(3)}/the/yy(1)}" );@]@; @t}\endfoldedsections{@> @ @= @[TeX_( "/getthird{/yy(1)}/to/toksa" );@]@; /* format pointer */ @[TeX_( "/getfourth{/yy(1)}/to/toksb" );@]@; /* stash pointer */ @[TeX_( "/getsecond{/yy(1)}/to/toksc" );@]@; /* kind of precedence */ @[TeX_( "/yy0{/nx/precdecls{/the/toksc}{/the/yy(2)}{/the/yy(3)}{/the/toksa}{/the/toksb}}" );@]@; @ The bootstrap grammar forms the smallest subset of the full grammar. @= @@; @ @= @[TeX_( "/yy0{/nx/tagit/the/yy(1)}" );@]@; @ These are the two most important rules for the bootstrap parser. The reasons for the~\prodstyle{\%token} declarations to be collected during the bootstrap pass are outlined in the \locallink{bootstrapping}section on bootstrapping\endlink. The~\prodstyle{\%nterm} declarations are not strictly necessary for bootstrapping the parsers included in \splint\ but they are added for the cases when the bootstrap mode is used for purposes other than bootstrapping \splint. @= @G @t}\vb{\flatten}{@> symbol_declaration: "%nterm" {} symbol_defs.1 {@> TeX_( "/yy0{/nx/ntermdecls{/the/yy(3)}/the/yy(1)}" ); @=} @t}\vb{\fold\flatten}{@> | "%token" {} symbol_defs.1 {@> TeX_( "/yy0{/nx/tokendecls{/the/yy(3)}/the/yy(1)}" ); @=} ; @g @ {\it Just like \prodstyle{symbols.1} but accept \prodstyle{INT} for the sake of \POSIX}. Perhaps the only point worth mentioning here is the inserted separator (% \texrefx{/hspace}{other}% \.{\{}$p_0$\.{\}\{}$p_1$\.{\}}, typeset as |TeXa("/hspace"); TeXao(@t\TeXlit"\{\hbox{$p_0$}\}\{\hbox{$p_1$}\}\hbox{$\!$}"@>);|). @q A string "..." 
is a syntactic unit in \CWEB\ so it is impossible@> @q to insert \TeX\ material in the middle of the string directly@> Like any other separator, it takes two parameters, the stream pointers $p_0$ and~$p_1$. In this case, however, both pointers are null since there seems to be no other meaningful assignment. If any formatting or stash information is needed, it can be extracted by the symbols themselves. @= @G symbols.prec: symbol.prec {@> @ @=} | symbols.prec symbol.prec {@> TeX_( "/yy0{/the/yy(1)/nx/hspace{0}{0}/the/yy(2)}" ); @=} ; symbol.prec: symbol {@> TeX_( "/yy0{/nx/symbolprec{/the/yy(1)}{}}" ); @=} | symbol INT {@> TeX_( "/yy0{/nx/symbolprec{/the/yy(1)}{/the/yy(2)}}" ); @=} ; @g @ {\it One or more symbols to be \prodstyle{\%type}'d}. @= @G %type symbols.1 symbol; symbols.1: symbol {@> @ @=} | symbols.1 symbol {@> TeX_( "/yy0{/the/yy(1)/nx/hspace{0}{0}/the$[symbol]}" ); @=} ; generic_symlist: generic_symlist_item {@> @ @=} | generic_symlist generic_symlist_item {@> TeX_( "/yy0{/the/yy(1)/nx/hspace{0}{0}/the/yy(2)}" ); @=} ; @t}\vb{\flatten\inline}{@> generic_symlist_item: symbol {@> @ @=} | tag {@> @ @=} ; tag: TAG {@> @ @=} | "<*>" {@> @ @=} | "<>" {@> @ @=} ; @g @ {\it One token definition}. @= @G symbol_def: TAG {@> @ @=} @t}\vb{\flatten}{@> | id {@> TeX_( "/yy0{/nx/onesymbol{/the/yy(1)}{}{}}" ); @=} | id INT {@> TeX_( "/yy0{/nx/onesymbol{/the/yy(1)}{/the/yy(2)}{}}" ); @=} | id string_as_id {@> TeX_( "/yy0{/nx/onesymbol{/the/yy(1)}{}{/the/yy(2)}}" ); @=} | id INT string_as_id {@> TeX_( "/yy0{/nx/onesymbol{/the/yy(1)}{/the/yy(2)}{/the/yy(3)}}" ); @=} ; @g @ {\it One or more symbol definitions}. 
@= @G symbol_defs.1: symbol_def {@> @ @=} | symbol_defs.1 symbol_def {@> @ @=} ; @g @ @= @[TeX_( "/getsecond{/yy(2)}/to/toksa" );@]@; /* the identifier */ @[TeX_( "/getfourth{/toksa}/to/toksb" );@]@; /* the format pointer */ @[TeX_( "/getfifth{/toksa}/to/toksc" );@]@; /* the stash pointer */ @[TeX_( "/yy0{/the/yy(1)/nx/hspace{/the/toksb}{/the/toksc}/the/yy(2)}" );@]@; @ {\it The grammar section: between the two \prodstyle{\%\%}'s}. Finally, the following few short sections define the syntax of \bison's rules. @= @G grammar: rules_or_grammar_declaration {@> @ @=} | grammar rules_or_grammar_declaration {@> @ @=} ; @g @*2 Rules syntax. {\it As a \bison\ extension, one can use the grammar declarations in the body of the grammar}. What follows is the syntax of the right hand side of a grammar rule. The type declarations for various non-terminals are used exclusively by the postprocessor whenever the `native' \bison\ term references are used (see elsewhere for details). @= @G %type rhs id_colon named_ref.opt rhses.1 "|"; rules_or_grammar_declaration: rules {@> @
@=} | grammar_declaration ";" {@> @ @=} | error ";" {@> TeX_( "/errmessage{parsing error!}" ); @=} ; @t}\vb{\flatten\inline}{@> rules: id_colon named_ref.opt {@> @t}\vb{\stashed{\rm (we simply return pointers below)}}{@> @=} rhses.1 {@> @ @=} ; @t}\vb{\resetf}{@> rhses.1[o]: rhs {@> @ @=} | rhses.1[rhses] "|"[mid] {@> @ @=}[c] rhs[d] {@> @ @=} | rhses.1 ";" {@> @ @=} ; @g @ The next few actions describe what happens when a left hand side is attached to a rule. @= @[TeX_( "/initlist{/grammarprefix grammar}" );@]@; @[TeX_( "/getfirst{/yy(1)}/to/toksa" );@]@; /* type of the last addition */ @[TeX_( "/yy0{{/grammarprefix grammar}{/the/toksa}}" );@]@; @[TeX_( "/appendtolistx{/grammarprefix grammar}{/the/yy(1)}" );@]@; @[TeX_( "/edef/grammarprefix{./grammarprefix}" );@]@; @ @= @[TeX_( "/getsecond{/yy(1)}/to/toksa" );@]@; /* type of the last rule */ @[TeX_( "/getfirst{/yy(1)}/to/toksc" );@]@; /* pointer to the accumulated rules */ @[TeX_( "/getfirst{/yy(2)}/to/toksb" );@]@; /* type of the new rule */ @[TeX_( "/let/default/positionswitchdefault" );@]@; @[TeX_( "/switchon{/the/toksb}/in/positionswitch" );@]@; /* determine the position of the first token in the group */ @; /* determine the spacing between sections */ @[TeX_( "/edef/next{/the/toksa}" );@]@; @[TeX_( "/edef/default{/the/toksb}" );@]@; /* reuse \.{\\default} */ @[TeX_( "/ifx/next/default" );@]@; @[TeX_( " /let/default/separatorswitchdefaulteq" );@]@; @[TeX_( " /switchon{/the/toksa}/in/separatorswitcheq" );@]@; @[TeX_( "/else" );@]@; @[TeX_( " /concat/toksa/toksb" );@]@; @[TeX_( " /let/default/separatorswitchdefaultneq" );@]@; @[TeX_( " /switchon{/the/toksa}/in/separatorswitchneq" );@]@; @[TeX_( "/fi" );@]@; @[TeX_( "/appendtolistx{/the/toksc}{/the/postoks/the/toksd/the/yy(2)}" );@]@; @[TeX_( "/yy0{{/the/toksc}{/the/toksb}}" );@]@; @ @= @[TeX_( "/getsecond{/yy(1)}/to/toksa" );@]@; /* \.{\\prodheader} */ @[TeX_( "/getsecond{/toksa}/to/toksb" );@]@; /* \.{\\idit} */ @[TeX_( "/getfourth{/toksb}/to/toksc" );@]@; /* 
format stream pointer */ @[TeX_( "/getfifth{/toksb}/to/toksd" );@]@; /* stash stream pointer */ @[TeX_( "/getthird{/yy(1)}/to/toksb" );@]@; /* \.{\\rules} */ @[TeX_( "/yy0{/nx/oneproduction{/the/toksa/the/toksb}{/the/toksc}{/the/toksd}}" );@]@; @ Several productions for a given nonterminal are collected in a `production cluster': \smallskip \thisrulereference{}% \smallskip \noindent The inline action does nothing at the moment and is omitted in the main text. @= @[TeX_( "/getfourth{$[id_colon]}/to/toksa" );@]@; /* format stream pointer */ @[TeX_( "/getfifth{$[id_colon]}/to/toksb" );@]@; /* stash stream pointer */ @[TeX_( "/finishlist{/the$[rhses.1]}" );@]@; /* complete the list of rules */ @[TeXb_( "/yy0{/nx/pcluster{/nx/prodheader{/the$[id_colon]}{/the$[named_ref.opt]}" );@]@; @[TeXfo_( " {/the/toksa}{/the/toksb}}{/nx/rules{/nx/executelist{/the$[rhses.1]}}}}" );@]@; @ It is important to format the right hand side properly, since we would like to indicate that an action is inlined by an indentation. \smallskip \thisrulereference{\nx\inline\nx\flatten} \smallskip \noindent The `layout' of the \texref{/rhs} `structure' includes a `boolean' to indicate whether the right hand side ends with an action. Since the action can be implicit, this decision has to be postponed until, say, a semicolon is seen. No formatting or stash pointers are added for implicit actions. 
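The postponed decision described above can be modelled in a few lines of \Cee. This is only a sketch under stated assumptions: the structure |rhs| with its |ends_with_action| flag, the function |complete_rhs| and the placeholder text for an empty right hand side are all hypothetical and merely mimic the logic of the \TeX\ action that follows, where the actual work is done by macros.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the rhs structure: the accumulated terms and
   a flag that records whether the right hand side ends with an action. */
struct rhs {
    char terms[64];
    bool ends_with_action;
};

/* Called once the end of the rule is seen: if no explicit action closes
   the right hand side, an empty action is appended and the flag is set. */
static void complete_rhs(struct rhs *r)
{
    if (r->ends_with_action)
        return;                          /* explicit action, nothing to add */
    if (r->terms[0] == '\0')
        strcpy(r->terms, "<empty>");     /* placeholder for an empty rule */
    strncat(r->terms, " {}", sizeof r->terms - strlen(r->terms) - 1);
    r->ends_with_action = true;
}
```

In this model, as in the action below, a right hand side that already ends with an action is passed through unchanged, while the others receive an empty action with no formatting or stash information attached.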
@= @[TeX_( "/initlist{/rhsesoneprefix rhses1}" );@]@; @[TeX_( "/yy0{/rhsesoneprefix rhses1}" );@]@; @[TeX_( "/edef/rhsesoneprefix{./rhsesoneprefix}" );@]@; @[TeX_( "/rhsbool{$[rhs]}/to/toksa /the/toksa" );@]@; @[TeX_( "/ifrhsfull" );@]@; @[TeX_( " /appendtolistx{/the/yyval}{/the$[rhs]}" );@]@; @[TeX_( "/else" );@]@; /* right hand side does not end with an action, fake one */ @[TeX_( " /rhscont{$[rhs]}/to/toksa" );@]@; /* rules */ @[TeX_( " /yytoksempty/toksa{/toksa{/emptyterm}}{}" );@]@; @[TeXb( " /appendtolistx{/the/yyval}{/nx/rhs{/the/toksa/nx/rarhssep{0}{0}" );@]@; @[TeXfo( " /nx/actbraces{}{}{0}{0}/nx/bdend}{}{/nx/rhsfulltrue}}" );@]@; @[TeX_( "/fi" );@]@; @ Using standard notation, here is what the middle action does. The part of the rule this action applies to is given below for reference. This action may have been omitted altogether but it serves as a good illustration of how `inline actions' work. \smallskip \rulereference{\nx\inline\nx\flatten}{|@|}% \smallskip \noindent The terms are counted from left (deeper in the value stack) to right (on top of the value stack) although \texref{/yy(0)} (which is the same as \texref{/yyval}) is the {\it right\/}most term, i.e.\ the implicit action itself. What the parser sees at this point are the first two terms on the stack (i.e.\ \prodstyle{rhses.1} and {\toksa\expandafter{\expandafter'\vl'}\expandafter\prodtstyle\expandafter{\the\toksa}}) and is ready to make a reduction which will push the value of the term corresponding to the inline action (i.e.\ |@|) on the stack. The way \bison\ does this is by introducing a new grammar term (named \prodstyle{bogus_inline} for some integer $n$) for each inline action and adding a new rule that reduces an empty sequence of terms to \prodstyle{bogus_inline}. The action for this rule is the inline action. 
In our case this would read as \begingroup \medskip \def\skipalltoks#1\par{} \def\preparsefallbacktext#1{% \let\postparse\relax \message{#1}% \skipalltoks } \extendswitch\multicharswitch\at\stashed\by\PB\to\multicharswitchadjust \let\multicharswitch\multicharswitchadjust \def\textproductionsetup{% \rulereftextproductionsetup \let\acharswitch\texcharadjust \let\onecharswitch\texcsadjust }% \beginprod \inline\flatten bogus_inline: \{|@|\} \endprod \medskip \endgroup \noindent$\ldots$except the parser knows what the state of the stack is at this point and thus the code inside |@| can now refer to the terms on the stack as described above. @= @[TeX_( "/appendtolistx{/the/yy(1)}{/nx/midf/the/yy(2)}" );@]@; @ However, if the length of the rule preceding the inline action is not known to the parser in advance (as is the case for the parsers \splint\ generates using any version of \bison\ that is $\geq3.0$) a different way of accessing the stack is necessary. This notation is also more natural as it counts the terms from right to left, i.e.\ `into the depths of the stack' (for example \texref{/bb2\{\}} is the register holding the value of~\prodstyle{rhses1}). It is worth noting that in this case \texref{/yy(0)} and \texref{/yyval} are still the same register, the one that holds the value of the term corresponding to the inline action itself. @= @[TeX_( "/bb2{/toksa}/bb1{/toksb}" );@]@; @[TeX_( "/appendtolistx{/the/toksa}{/nx/midf/the/toksb}" );@]@; @ Finally, using the `native' way of referring to term values results in the most natural code. In this case, one can mix numeric and symbolic references for both implicit and explicit rules. 
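\smallskip \noindent The two numeric schemes described earlier differ only in the end of the stack from which the count starts. The \Cee\ fragment below is a minimal sketch of this arithmetic and not the code used by \bison\ or \splint: the |value_stack| type and the functions |vs_push|, |vs_left| and |vs_right| are hypothetical names introduced for the example, and plain integers stand in for term values. Counting from the left requires the length $n$ of the part of the rule that precedes the inline action, while counting from the right does not.

```c
#include <assert.h>

#define VS_MAX 16

/* Hypothetical value stack; slot[depth - 1] is the top. */
typedef struct {
    int slot[VS_MAX];
    int depth;                  /* number of values on the stack */
} value_stack;

static void vs_push(value_stack *s, int v)
{
    assert(s->depth < VS_MAX);
    s->slot[s->depth++] = v;
}

/* Left-to-right numbering: term i (counted from 1) of a rule prefix
   whose length n must be known in advance. */
static int vs_left(const value_stack *s, int n, int i)
{
    assert(1 <= i && i <= n && n <= s->depth);
    return s->slot[s->depth - n + (i - 1)];
}

/* Right-to-left numbering: k = 1 is the top of the stack; the length
   of the prefix is not needed. */
static int vs_right(const value_stack *s, int k)
{
    assert(1 <= k && k <= s->depth);
    return s->slot[s->depth - k];
}
```

\noindent With the values of \prodstyle{rhses.1} and the vertical bar on top of the stack, |vs_left(s, 2, 1)| and |vs_right(s, 2)| select the same slot, just as \texref{/yy(1)} and \texref{/bb2\{\}} name the same term in the two versions of the action above.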
@= @[TeX_( "/appendtolistx{/the$[rhses]}{/nx/midf/the$[mid]}" );@]@; @ Productions are collected in a `productions cluster' (not an official term) by the following action: \smallskip \thisrulereference{\nx\inline\nx\flatten} \smallskip\noindent As can be seen in the code below, no pointers are provided for an {\it implicit\/} action (since there are no tokens associated with it). Processing a set of rules involves a large number of reexpansions. This seems to be a good place to use a list to store the nodes (see \.{yycommon.sty} for details on list macros). While providing a noticeable speedup, this technique significantly complicates the debugging of the grammar. In particular, inspecting a parsed table supplies very little information if the list is not expanded. The macros in \.{yyunion.sty} provide a special debugging namespace where the expansion of the parser-produced control sequences may be modified to safely expand the generated table. The code below relies on the inline action |@| above to store the relevant information from \texref{/yy(1)} (corresponding to \prodstyle{rhses1}) in \texref{/yy(3)} (which is the inline action `term' \inlineactionsymbol\ in the production above). @q Note that one cannot use \prodstyle{...} above to display \inlineactionsymbol@> @q since the \prodstyle{...} macro relies on the name parser. See yyunion.sty @> @q for further details about the special terms like this. 
@> @= @[TeX_( "/rhsbool{/yy(4)}/to/toksa /the/toksa" );@]@; @[TeX_( "/ifrhsfull" );@]@; @[TeX_( " /appendtolistx{/the/yy(1)}{/nx/rrhssep/the/yy(2)/the/yy(4)}" );@]@; @[TeX_( "/else" );@]@; @[TeX_( " /rhscont{/yy(4)}/to/toksa" );@]@; @[TeX_( " /yytoksempty/toksa{/toksa{/emptyterm}}{}" );@]@; @[TeXb( " /appendtolistx{/the/yy(1)}{/nx/rrhssep/the/yy(2)" );@]@; @[TeXf( " /nx/rhs{/the/toksa/nx/rarhssep{0}{0}" );@]@; /* streams have already been grabbed */ @[TeXfo( " /nx/actbraces{}{}{0}{0}/nx/bdend}{}{/nx/rhsfulltrue}}" );@]@; @[TeX_( "/fi" );@]@; @[TeX_( "/yy0{/the/yy(1)}" );@]@; @ @= @G %token PERCENT_EMPTY "%empty"; @g @ The centerpiece of the grammar is the syntax of the right hand side of a production. Various `precedence hints' must be attached to an appropriate portion of the rule, just before an action (which can be inline, implicit or both in this case). \saveparseoutputtrue @= @G(b) rhs: {@> @ @=} | rhs symbol named_ref.opt {@> @ @=} | rhs "{...}" named_ref.opt {@> @ @=} | rhs "%?{...}" {@> @ @=} | rhs "%empty" {@> @ @=} | rhs "%prec" symbol {@> @ @=} | rhs "%dprec" INT {@> @ @=} | rhs "%merge" TAG {@> @ @=} ; named_ref.opt: {@> @ @=} | BRACKETED_ID {@> @ @=} ; @g @ The simplest form of the right hand side is an empty rule. In this case the parser must make a reduction based on the lookahead only (or the current state), i.e.\ no tokens are consumed from the input. \saveparseoutputfalse @= @[TeX_( "/yy0{/nx/rhs{}{}{/nx/rhsfullfalse}}" );@]@; @ Adding a \bison\ term to the right hand side involves collecting several pieces of information. One of them is the (optional) symbolic name that can be used by the action code to refer to the place on the value stack that is allocated for this term. \smallskip \thisrulereference{\nx\inline\nx\flatten}% \smallskip \noindent The space between the term and the preceding part of the rule may depend on the type of rule element that appears at the end of the rule parsed so far. 
@= @[TeX_( "/rhscont{/yy(1)}/to/toksa" );@]@; @[TeX_( "/rhscnct{/yy(1)}/to/toksb" );@]@; @[TeX_( "/yytoksempty/toksb{}{" );@]@; @[TeX_( " /getfourth{/yy(2)}/to/toksc" );@]@; @[TeX_( " /getfifth{/yy(2)}/to/toksd" );@]@; @[TeX_( " /appendr/toksb{{/the/toksc}{/the/toksd}}" );@]@; @[TeX_( "}" );@]@; @[TeXb( "/yy0{/nx/rhs{/the/toksa/the/toksb" );@]@; @[TeXao( "/nx/termname{/the/yy(2)}{/the/yy(3)}}{/nx/hspace}{/nx/rhsfullfalse}}" );@]@; @ Action processing is somewhat complicated since the action can be either inline or terminal, affecting the typesetting. \smallskip \thisrulereference{\nx\inline\nx\flatten}% \smallskip \noindent Additionally, an action may follow an empty rule in which case a special term must be added to aid the reader. @= @[TeX_( "/rhscont{/yy(1)}/to/toksa" );@]@; @[TeX_( "/rhsbool{/yy(1)}/to/toksb /the/toksb" );@]@; @[TeX_( "/ifrhsfull" );@]@; /* the first half ends with an action */ @[TeX_( " /appendr/toksa{/nx/arhssep{0}{0}/nx/emptyterm}" );@]@; /* no pointers to streams */ @[TeX_( "/fi" );@]@; @[TeX_( "/yytoksempty/toksa{/toksa{/emptyterm}}{}" );@]@; @[TeX_( "/getfirst{/yy(2)}/to/toksb" );@]@; /* the contents of the braced code */ @[TeX_( "/getsecond{/yy(2)}/to/toksc" );@]@; /* the format stream pointer */ @[TeX_( "/getthird{/yy(2)}/to/toksd" );@]@; /* the stash stream pointer */ @[TeXb( "/yy0{/nx/rhs{/the/toksa/nx/rarhssep{/the/toksc}{/the/toksd}" );@]@; @[TeXf( " /nx/actbraces{/the/toksb}{/the/yy(3)}{/the/toksc}{/the/toksd}/nx/bdend}" );@]@; @[TeXfo( " {/nx/arhssep}{/nx/rhsfulltrue}}" );@]@; @ @= @[TeX_( "/rhscont{/yy(1)}/to/toksa" );@]@; @[TeX_( "/rhsbool{/yy(1)}/to/toksb /the/toksb" );@]@; @[TeX_( "/ifrhsfull" );@]@; /* the first half ends with an action */ @[TeX_( " /appendr/toksa{/nx/arhssep{0}{0}/nx/emptyterm}" );@]@; /* no pointers to streams */ @[TeX_( "/fi" );@]@; @[TeX_( "/yytoksempty/toksa{/toksa{/emptyterm}}{}" );@]@; @[TeX_( "/getfirst{/yy(2)}/to/toksb" );@]@; /* the contents of the braced code */ @[TeX_( 
"/getsecond{/yy(2)}/to/toksc" );@]@; /* the format stream pointer */ @[TeX_( "/getthird{/yy(2)}/to/toksd" );@]@; /* the stash stream pointer */ @[TeXb( "/yy0{/nx/rhs{/the/toksa/nx/rarhssep{/the/toksc}{/the/toksd}" );@]@; @[TeXf( " /nx/bpredicate{/the/toksb}{}{/the/toksc}{/the/toksd}/nx/bdend}" );@]@; @[TeXao( "{/nx/arhssep}{/nx/rhsfulltrue}}" );@]@; @ An empty right hand side may be specified explicitly by using \prodstyle{\%empty} as the sole token in the production. This will increase the readability of the grammar by making the programmer's intentions more transparent. @= @[TeX_( "/rhscont{/yy(1)}/to/toksa" );@]@; @[TeX_( "/rhscnct{/yy(1)}/to/toksb" );@]@; @[TeX_( "/yytoksempty/toksb{}{" );@]@; @[TeX_( " /getfourth{/yy(2)}/to/toksc" );@]@; @[TeX_( " /getfifth{/yy(2)}/to/toksd" );@]@; @[TeX_( " /appendr/toksb{{/the/toksc}{/the/toksd}}" );@]@; @[TeX_( "}" );@]@; @[TeXb( "/yy0{/nx/rhs{/the/toksa/the/toksb" );@]@; @[TeXao( "/nx/emptyterm}{/nx/hspace}{/nx/rhsfullfalse}}" );@]@; @ @= @[TeX_( "/rhscont{/yy(1)}/to/toksa" );@]@; @[TeX_( "/rhscnct{/yy(1)}/to/toksb" );@]@; @[TeX_( "/rhsbool{/yy(1)}/to/toksc /the/toksc" );@]@; @[TeX_( "/ifrhsfull" );@]@; @[TeX_( " /yy0{/nx/sprecop{/the/yy(3)}/the/yy(2)}" );@]@; /* reuse \.{\\yyval} */ @[TeX_( " /supplybdirective/toksa/yyval" );@]@; /* the directive is `absorbed' by the action */ @[TeX_( " /yy0{/nx/rhs{/the/toksa}{/the/toksb}{/nx/rhsfulltrue}}" );@]@; @[TeX_( "/else" );@]@; @[TeXb( " /yy0{/nx/rhs{/the/toksa" );@]@; @[TeXao( "/nx/sprecop{/the/yy(3)}/the/yy(2)}{/the/toksb}{/nx/rhsfullfalse}}" );@]@; @[TeX_( "/fi" );@]@; @ @= @[TeX_( "/rhscont{/yy(1)}/to/toksa" );@]@; @[TeX_( "/rhscnct{/yy(1)}/to/toksb" );@]@; @[TeX_( "/rhsbool{/yy(1)}/to/toksc /the/toksc" );@]@; @[TeX_( "/ifrhsfull" );@]@; @[TeX_( " /yy0{/nx/dprecop{/the/yy(3)}/the/yy(2)}" );@]@; /* reuse \.{\\yyval} */ @[TeX_( " /supplybdirective/toksa/yyval" );@]@; /* the directive is `absorbed' by the action */ @[TeX_( " 
/yy0{/nx/rhs{/the/toksa}{/the/toksb}{/nx/rhsfulltrue}}" );@]@; @[TeX_( "/else" );@]@; @[TeXb( " /yy0{/nx/rhs{/the/toksa" );@]@; @[TeXao( "/nx/dprecop{/the/yy(3)}/the/yy(2)}{/the/toksb}{/nx/rhsfullfalse}}" );@]@; @[TeX_( "/fi" );@]@; @ @= @[TeX_( "/rhscont{/yy(1)}/to/toksa" );@]@; @[TeX_( "/rhscnct{/yy(1)}/to/toksb" );@]@; @[TeX_( "/rhsbool{/yy(1)}/to/toksc /the/toksc" );@]@; @[TeX_( "/ifrhsfull" );@]@; @[TeX_( " /yy0{/nx/mergeop{/nx/tagit/the/yy(3)}/the/yy(2)}" );@]@; /* reuse \.{\\yyval} */ @[TeX_( " /supplybdirective/toksa/yyval" );@]@; /* the directive is `absorbed' by the action */ @[TeX_( " /yy0{/nx/rhs{/the/toksa}{/the/toksb}{/nx/rhsfulltrue}}" );@]@; @[TeX_( "/else" );@]@; @[TeXb( " /yy0{/nx/rhs{/the/toksa" );@]@; @[TeXao( "/nx/mergeop{/nx/tagit/the/yy(3)}/the/yy(2)}{/the/toksb}{/nx/rhsfullfalse}}" );@]@; @[TeX_( "/fi" );@]@; @t}\beginfoldedsections{@> @ @= @[TeX_( "/yy0{}" );@]@; @ @= @@; @t}\endfoldedsections{@> @*2 Identifiers and other symbols. {\it Identifiers are returned as |uniqstr| values by the scanner. Depending on their use, we may need to make them genuine symbols}. We, on the other hand, simply copy the values returned by the scanner. @= @G id: ID {@> @ @=} | CHAR {@> @ @=} ; @g @ @= @@; @ @= @G symbol: id {@> @ @=} | string_as_id {@> @ @=} ; @g @ @= @G @t}\vb{\inline}{@> id_colon: ID_COLON {@> @ @=} ; @g @ A string used as an \prodstyle{ID}. @= @G @t}\vb{\inline}{@> string_as_id: STRING {@> @ @=} ; @g @ The remainder of the action code is trivial but we reserved the placeholders for the appropriate actions in case the parser gains some sophistication in processing low level types (or starts expecting different types from the scanner). \beginfoldedsectionshere @= @[TeX_( "/yy0{/nx/idit/the/yy(1)}" );@]@; @ @= @[TeX_( "/yy0{/nx/charit/the/yy(1)}" );@]@; @ @= @@; @ @= @@; @ @= @@; @ @= @[TeX_( "/yy0{/nx/stringify/the/yy(1)}" );@]@; @t}\endfoldedsections{@> @ {\it Variable and value. 
The \prodstyle{STRING} form of variable is deprecated and is not \.{M4}-friendly. For example, \.{M4} fails for \.{\%define "[" "value"}.} @= @G @t}\vb{\flatten\inline}{@> variable: ID {@> @ @=} | STRING {@> @ @=} ; value: {@> TeX_( "/yy0{}" ); @=} | ID {@> @ @=} | STRING {@> @ @=} | "{...}" {@> TeX_( "/yy0{/nx/bracedvalue/the/yy(1)}" ); @=} ; @g @ @= @G @t}\vb{\flatten\inline}{@> epilogue.opt: {@> TeX_( "/yy0{}" ); @=} | "%%" EPILOGUE {} ; @g @ \Cee\ preamble for the grammar parser. In this case, there are no `real' actions that our grammar performs, only \TeX\ output, so this section is empty. @= @ \Cee\ postamble for the grammar parser. It is tricky to insert function definitions that use \bison's internal types, as they have to be inserted in a place that is aware of the internal definitions but before said definitions are used. @= @ @= @@; @@; @ @= void bootstrap_tokens( char *bootstrap_token_format ) { #define _register_token_d(name) @[fprintf( tables_out, bootstrap_token_format, #name, name, #name );@; @@; #undef _register_token_d@; } @ \namedspot{bootstraptokens}Here is the minimal list of tokens needed to make the lexer operational just enough to extract the rest of the token information from the grammar. @= _register_token_d(ID)@; _register_token_d(PERCENT_TOKEN)@; _register_token_d(STRING)@; @q The tokens below are not required to make a minimal bootstrapping parser work @> @q but they do appear in the rules the parser will encounter while extracting @> @q token information. @> @q _register_token_d(INT) /* only encountered in GRAM_EOF definition which is never used */ @> @q _register_token_d(CHAR) /* \bison\ never declares character tokens */ @> @q _register_token_d(SEMICOLON) /* can be omitted in prologue */ @> @q _register_token_d(TAG) /* only encountered in the definition of PERCENT_PARAM */ @> @*1 Union of types. This section of the \bison\ input lists the types that may appear on the value stack. 
Since \TeX\ does not provide any mechanism for type checking (nor is it clear how to translate a \Cee\ |union| into any data structure usable in \TeX), this section is left (nearly) empty. The reason for the lonely type below is the postprocessor that facilitates the use of \bison\ `native' term references (see elsewhere). In order to translate such references into appropriate \TeX\ code, the postprocessor must let \bison\ calculate offsets into the value stack, which requires assigning types to various terminals and non-terminals. The specific type has no significance. @= int intval;