|
Home |
Download |
ANTLRWorks |
Wiki |
About ANTLR |
Feedback |
Support |
Bugs |
v2
|
|
|
Latest version is 3.0.1 Download now! » |
|
|
ANTLR Code generation
March 25, 2006Updated June 3, 2006 as I try to finish implementation; added eot/eof tables. Currently I generate a bunch of class definitions in order to build
the state machines for lookahead prediction. The problem as that you
end up with hundreds of .class files in the directory, which is highly
annoying. Further, it requires a huge amount of time and to load these
dynamically a runtime. The more traditional state machine
implementation has a transition function as follows:
It is very fast and uses a very dense representation of the DFA graph (compared to the horde of classes I generate now). It avoids method calls too, which are very expensive in a polymorphic language. The reason I did not use those for ANTLR is because of unicode characters. For n states, it requires n*65536 integers for the obvious implementation. Also, some state transitions require semantic predicate expressions that cannot be encoded in this simple strategy. I have now a very simple compression strategy that should be able to use the above strategy for implementating the vast majority of the states, only moving to a switch statement when predicates are required or the strategy is unable to compress the data far enough to fit in the array. This strategy requires the following arrays:
I have not tried this out, but the DFA simulation algorithm should
look like the following:
In the Mantra language parser there are about 700 lines out of 8726 of class definitions for DFA states. In the lexer there were 2051 lines out of 5670. There are about 400 .class files in the directory just for DFA states (from 1 parser and 3 tree parsers). Ick. October 17, 2005A followup from yesterday. Ok, I yanked out all the automatic tree
creation stuff from Java.stg into JavaAST.stg (well, the templates).
Modified the CodeGenerator to look for <language>AST.stg if you use
output=AST. I still needed to alter existing templates like alt:
So it didn't have any AST stuff in it. First problem would be: get
rid of autoAST parameter in Java code. Then need to override alt in
JavaAST.stg. I tried
Which works, but I realized it duplicates the curlies on the outside
so then I decided I needed to leave holes in the Java.stg version:
Hmm...it will get an error when those holes have no values... Define
default parameters:
Looks good except for the JavaAST.stg case:
Gross! You can't see the template defs as default arguments! Ok, so
I decided we needed to try inheritance for this:
That is pretty annoying with the "alt_" prefix to make it unique and
that you have to define them even though they are empty!
Then in the subgroup, you do this:
Note the alt_ is like the scope override for a method definition in C++. Ok, the problems with this straight inheritance scheme:
Maybe we need the "aspect" approach after all. Currently with
inheritance, we define a group with new templates and template
overrides. What if we could have finer granularity than template?
We could define regions of a template that could be overridden but
the syntax could be nasty:
Most of the time the region would be empty. Seems like a special kind
of hole is what we want:
These are essentially attributes that don't generate an error if they
have no value and we could have special syntax to set them in a
subgroup. Now a subgroup could define new templates, define template
overrides, and define supergroup template aspects:
It's like overriding template regions or setting the aspects of a
template. I suppose you could allow setting of aspects even within a
group:
This would be identical to
except that a subgroup could not remove the "int count=0;". When
@alt.declarations is a separate aspect, the subgroup could remove
the decl:
or add to it:
where <...> would mean "previous aspect value". Or,
using the "list creation" syntax. Seems cleaner. Just to be clear, any aspect definitions in a group are not overridden in a subgroup. Just like constructors invoke their super ctors before they do any work, the supergroup templates must be complete before they are overridden; complete in the sense that all aspects within that group have been applied. Then the overridden method can ref super.template() or whatever normally. The subgroup aspects are applied after the subgroup template has been created. Confusing language: will have to clean this up. Oh, and seems like code cannot set aspects as they are output snippets not data. RegionsAh! I just talked to Ric Klaren again...let's go with regions as it's
consistent. You are just defining chunks of templates that you can
override. All we need is some syntax. How about:
Or, if we want angle brackets around it:
Can't have isolated <@cleanup> as it wouldn't know whether to seek an <@end>. Perhaps another tag for shorthand <@cleanup@>. One can imagine an implementation of this as simply syntactic sugar that converted regions into actual templates so StringTemplate could remain the same. The parser group syntax parser would simply treat this @ stuff as a bunch of macros. With regions we don't need special syntax to add to an existing region etc... Can just refer to <@super.declarations> or something similar to get the supergroup's region definition. Have to be careful about the syntax though so it's not ambiguous between define a region and refer to a region. October 16, 2005ANTLR2005 workshop is this week and Ric Klaren is in town early for code generation hacking! The code generation templates have become a complete mess due to the introduction of code gen aspects such as tree construction, debugging, etc... I brought this up and we discussed solutions. First the problem. Each template by itself is fine like:
Then, I need to add functionality for a token ref with a += label. Then automatic tree construction with += label. Then tree rewriting with += label. Etc... The template got totally unreadable with all the IF conditionals! Ick. So, as much as I hated to do it, I started creating very specialized templates. I currently have the following huge list of templates to handle all possible combinations of label types and tree construction:
Note that I think we should still factor out the various operations like tokenRefASTRootAndListLabel as it represents a different grammar element rather than an option change like "build ast". But the template should say how it's different from a normal tokenRefASTRoot template for example (with whatever mechanism we choose) to avoid duplication in the templates (maintenance nightmare). It's only 93 lines of template code, but it's the repeated template snippets within templates that is really bad...I suppose the proliferation of template names is a necessary evil, though unpleasant. We discussed using template inheritance but we thought it would be
almost as cumbersome:
A subgroup would then override tokenRefBEGIN template etc... What we need is some simple form of composition so we thought about
using an "aspect" like mechanism. The "AST aspect" would fill in
holes in the tokenRef template by just setting attributes:
That would be the normal template and in a separate aspect file you'd
say
You'd register an aspect on the normal group something like this:
Executing toString() would first get the template and then apply all
aspects, setting begin and end attributes before actually
rendering. It would behave as if you had simply done these extra
lines:
before the toString(). That was a really interesting approach until I realized that we need to do more than modify existing templates--we need to add new templates. You don't have the normal Java output templates to know anything about the need for template tokenRefAST() for example. That template should be defined in a different group. Again, we're back to inheritance. You need to have group JavaAST
inherit from Java group to add templates...we could define aspects
to allow new templates, but then it's duplicated functionality. Ok,
let's try again with inheritance, but using the default parameter
functionality to refine and set parametes. I use this default
parameter thing already for making parser and treeParser refine
or override or whatever you want to call it like this:
Ok, now we need to define what a token reference with += label looks
like. Here is what it looks like now:
which share identical snippet:
with tokenRef. Seems like we want to define tokenRefAndListLabel
as it differs from tokenRef. This has nothing to do with trees so
should be in the same template group. Together the two would look like:
where I have reused template tokenRef by simply referring to it and using (...) as the arg list to say "allow tokenRef to see all of tokenRefAndListLabel's parameters. I also don't need begin and end holes because I can always surround the other content with literals as I've done here. What about altering the parameters or defining what goes in a hole?
The following template adds an 's' to the label tokenRef will use to
define the label and it sets the middle hole to be extra:
So, within a group, I can define one template as it differs from another pretty easily. That is within a group. What about a new group like JavaAST where we want to encapsulate all AST stuff? You want to perhaps:
To add a template in a subgroup and define it as it differs from
another is easy and looks the same as tokenRef vs
tokenRefAndListLabel:
We are saying that to add this token to an AST, add that stuff after a
normal tokenRef. Actually, it seems we could also override
tokenRef to have the AST stuff when we use group JavaAST:
That makes sense. Remove the "AST" component of the name as we're now in the JavaAST group. This will simply the code generator code also because it doesn't have to add or omit that word when computing a template name. Ok, I just refactored all the templates. Dropped from 93 to 18 (normal) + 59 (AST) == 77 lines, but there is essentially no duplicated template content! 20% drop represents the duplicated content. The normal template file looks very clean now for tokenRef:
where listLabel is reused in the JavaAST subgroup:
May 8, 2005On Friday, I had a beautiful "StringTemplate code generation moment" when working on ANTLR v3. Background on multiple phase translationsComputer language translation is a matter of making 1 or more passes over the input to collect information and then generating the necessary output with a final pass. Only the simplest, syntax directed, translations get away with a single pass over the input integrating information collection and generation. For example, the simple notion of forward references causes lots of trouble, requiring at least two passes over the input to discover references to undefined language elements like functions. Older languages such as C and Pascal require the human programmer to declare functions before they are seen lexically in a program in order to simplify compiler construction. Using a multiple pass approach is an excellent programming technique to simplify a difficult translation into manageable phases, but it often demands more complicated data structures to hold phase results. Consider translating from a language that allows implicitly-defined
variables like Fortran and Python to a language that requires
definitions at the start of a block like C. Among all of the other
work, it is nice to have a pass that explicitly defines any
implicitly-defined variables. For example, in Python the following
code
would be translated to the following C code:
The define implicitly-defined scalars phase would collect a list of all scalars without declarations by visiting all assignment statements. Later, in the final code generation phase, that list of scalars must be available and the generation phase must be changed every time this phase alters the scalar lists or other data structures in any way. It would be nice to isolate each phase from the others as much as possible. Using StringTemplate in multi-phase translatorsStringTemplate is to output languages as ANTLR is to input languages. In other words, a group of StringTemplate templates represents an output (generational) grammar. The StringTemplate engine feeds off of attributes while an ANTLR parser feeds off of a token stream. The two are duals of each other, which is how I showed (roughly) that even a template engine restricted to enforce model-view separation can generate the context-free languages. Translators should use a parser generator for recognizing input and a template group as a code generator. Aside from the nice symmetry, using templates for code generation helps to isolate translation phases. In fact, there may be no final translation phase that combines the results from the internal phases, hence, there are usually fewer inter-phase dependencies. The outermost template's toString() method renders the template to text. A StringTemplate group describes the overall structure of the output
language and provides an infrastructure or scaffolding. A phase
merely fills in some of the holes. For example, here is a template
definition for C functions:
The arg:argument(); separator="," notation indicates that template argument is to be applied to each arg attribute (which must be an aggregate with type and name properties) with a comma in between multiple values. Aside from being extremely clean and informative, a StringTemplate template provides a form of lazy evaluation, an incredibly important characteristic that allows a template to refer to values that have not yet been defined. This permits a translator to compute values when it finds it efficient or convenient. An output sentence is represented via a highly-nested tree of templates and expressions (referring to attributes) analogous to a parse tree associated with input parsing. The function template lazily evaluates expressions <decls> and <stats> in the sense that many phases can add declarations or statements as needed and in any phase order. The StringTemplate.toString() method forces evaluation of expressions, but until that time templates may refer to any value(s) they want. Phases may define the attributes in any order. The implicitly-defined scalar phase can walk the assignment statements directly adding to the decls attribute without affecting any other phases that need to declare variables. Altering the pastLast Friday I was trying to implement an ANTLR feature where actions
may refer to tokens or rules in a production if the reference is
unique. For example, I wanted the following to work:
The translation should behave as if the programmer had typed:
A non-StringTemplate-based translator might have trouble with this situation. The problem is that when the translator was translating the action, and discovered the $ID, it would already have generated code for the ID token reference. The output may already have been written to a file. By evaluating output early, the translator cannot affect previously-seen language constructs such as the ID. The template for a token reference is simply:
An isolated ID yields match(ID); and x=ID yields:
Using a template engine with lazy evaluation, a phase that translates actions may affect output for prior elements such as the ID reference by simply, in this case, defining the label attribute. The ANTLR ActionTranslator asks for the code chunk (a template instance of "type" tokenRef) associated with the ID reference and just sets the label attribute to a new, unique label name. Information can flow the other direction as well. The
ActionTranslator may ask if the ID reference already has a label
associated with it. This simultaneously prevents multiple $ID
references in the action from trying to create multiple labels on the
ID and also allows the reuse of an existing user-defined label. The
following ANTLR grammar
would be interpreted as
In other words, the generated code would refer to the id label not an unfamiliar automatically generated label conjured up by ANTLR. Now that is cool. October 27, 2004I just reread the RE2C scanner generator by Bumbulis and Cowan. Generates hardcoded DFA like I intend to do. Tom Pennello did this for LR parsing back in the 80's. A very fast method in general. Must faster than simulating DFA with tables. My notes from the paper: OptimizationsThey have an analysis showing IFs are better than SWITCHes sometimes depending on the density of the edge labels. When they generate IFs, they order the edge label values to make simpler expressions. Can also do binary search on the labels. They use bitsets but they have 8 bit char vs 16 bit UNICODE. When the labels are latin char etc... though I can consider doing bitsets for complicated edge transitions. Also, for chains of states (such as generated for "abc"), you can
avoid a goto by generating stuff like this:
Just let the CPU program counter fall through to the next state. Buffering IOWhile I think just about all applications will be able to buffer the entire input, some very large input will require buffering. They mention a "sentinel" technique where some illegal char like ASCII 0 signals when to reload. So if you buffer 1024 char, then add 0 to the end and then have a state transition on 0 for every state that goes to a special "buffer me again" state. An easier mechanism is apparently to simply buffer lines since \n is unlikely to exist in the middle of a token (I've never seen it do that, so probably ok). October 19, 2004For the Java target of ANTLR 3.0, I've been concerned about the hideous nature of the code generated for DFA machines. Without a goto instruction in Java, you have to use a massive switch (slow as death) or use method calls (slow); plus you have to create like an object for each of thousands of states when the Lexer or Parser is loaded. Ick. Sriram Srinivasan at the ANTLR2004 workshop suggested generating byte code directly. There is a goto bytecode of course. :) The beauty of these suckers is that the CPU program counter will be used (once compiled to native code) to traverse the states of the DFA. That will be insanely fast; about the same speed as the C DFAs Ric generates for 3.0. So, I will generate byte code assembly for the DFAs from the ANTLR code generator using StringTemplate so people can tweak the byte codes easily. For example, you might want to inline some of the char fetch stuff. That will haul ass! The only problem was that I didn't want to use BCEL or Jasmin or any other package since it means ANTLR depends on those packages (i hate that). So, I spent the weekend and yesterday building a .class file writer and then a byte code assembler. Both are tiny as they are very specific to my needs (only handles like 50 bytecodes). Nonetheless, 900 total lines of code is nice. :) An excellent price to pay to avoid other people's libraries. Here is a sample (partial) state in my assembly code
I'm guessing that a minimal state will be about 20-ish bytes, allowing for a max of about 3000 states in any specific DFA. Not a serious limitation probably. I also hope to use this code in my CS652 grad prog lang class whenever I teach that again. A nice real world problem. June 14, 2004Loring Craymer convinced me yesterday (supported by Ric Klaren's comments on the list today) that making some static objects is a better solution than a text-based DFA spec. Agreed. So, playing around today, I get the following possible solution: 1. Base DFA class with a trivial State definition that says how to
transition:
Then a specific DFA could be a singleton object derived from DFA. For example, here is a DFA predicting rule nextToken given rules:
Here is the specific DFA:
Note that this is not optimized in any way. For example, s1 and s2 are totally unnecessary. Of course, this LL(1) decision wouldn't even result in a DFA, but it illustrates the point. ;) Anyway, finally, you create the singleton:
and then use it like this:
This further does not take into consideration the optimizations I plan on doing for the lexer to avoid re-examining chars. Size? Hmm...doesn't it seem that the simple DFA encoded as static objects is still pretty damn big even though it's mostly random java syntax crap and the actual generated bytecodes is probably small. Generating bytecodes would be the tightest, but not really readable. Can't use goto bytecodes as they won't decompile into Java. :( June 13, 2004Concerning the new parsers and lexers generated by ANTLR 3.0, I'm
trying to decide how to encode the prediction DFAs. Most decisions
are LL(1) in which case I can generate a normal looking thing like:
However, when the lookahead gets k>1 (or possibly even arbitrarily
large), I need to generate a state machine that figures out which
alternative will eventually match. Consider that k fixed lookahead is
really just a state machine of fixed depth k (i.e., exactly k states
examined for each alternative and no loops). For arbitrary lookahead,
clearly the DFA needs to allow loops etc... These can get sort of
complicated looking in implementation, but will provide some dramatic
power. For example, the following will freak out any fixed lookahead
LL-based parser generator:
Naturally, this isolated case could be left-factored but it would be less natural etc... ANTLR 3.0's "LL-Star" analyzer will gladly accept this at the cost of run-time speed. A DFA will zip ahead looking for the ';' or '{' that uniquely predicts which alt will eventually match. This is like automatic backtracking, but with a fast DFA instead of the full-parse. My goal is to provide a tool that allows newbies to just type something in and not have it puke all over the place with ambiguity warnings. More experienced users can build more tightly tuned grammars. The more LL(k) it is, the faster it will go. :) So, to my big question. Java doesn't have gotos which makes it a pain
in the butt to generate decently looking state machines. You have to
use nested switches like this:
Eewww. Even in C, though, the DFA encoding could be rather large. We also have the problem in Java that methods can only be 64k and some DFAs would break that. Instead, for non-LL(1) decisions, I wonder if we should just encode the DFA in a terse (serialized) syntax and then build the DFA in memory (instead of using freeze-dried code) at parser initialization time. The cost is linear in the size of the DFA so there is no serious cost. I am leaning more and more to having decent symbolic information at parse time such as the original grammar so that great errors and recovery could be done (development would be easier too). Building DFAs dynamically fits in this direction. It turns out that I already have a readable DFA description syntax
that I use for unit testing the analyzer. For example, for grammar g,
the DFA would be the string in "expecting":
Graphically, the DFA would look like:
where s2 and s3 are stop states. So, the general form of a parsing
decision would look like:
The start up cost would be pretty small and the resulting DFA would be cached by the dfa factory. My main question revolves around readability. One of the big draws of ANTLR is its readability and I don't want to screw that up. One could argue that just the notion of a DFA is less readable, but from my own experience the key is being able to watch the parser route control flow in the debugger--not necessarily verifying the prediction expressions. In fact, the expressions are often references to sets that are totally opaque to the programmer. In summary, the common LL(1) decisions can be handled with simple switch statements, but the more complicated DFAs should be handled not with hard-coded DFAs but with DFAs dynamically created from simple Human-readable descriptions. This has the added benefit that interpreted parsers and code generated parsers would be using the exact same mechanisms. January 04, 2004Thinking more about the parser intermediate language. Rather than generate a tree directly from the NFA+DFA stage, there are advantages to generating text and then converting to a ParseLang tree:
You could view this format as a program-like grammar with
explicitly-specified lookahead computations. For example, the simple
grammar a : A | B ; might look like this:
The lookahead could get rather complicated. Decisions would assume
k=1 unless otherwise specified, but for k>1 you'd need options for
approximate lookahead, full LL(k) lookahead, syntactic predicate
lookahead, semantic predicates and their syntactic contexts, and
finally LL-regular lookahead state machines. Here is a simple k=2
decision with an approximate and full LL lookahead decision:
One could express this as:
For syntactic predicates, the lookahead would look like another block
to match:
You might gen something like this:
Semantic predicates such as:
have both a lookahead and action component:
Hmm...The syntax is inconsistent or at least the {...} is highly overloaded to mean set, code block, recognition block, etc... Anyway, subrules are different select blocks. For example,
would yield:
A star loop would be similar:
yields
Note: code generators working off this format could optimize this as simply "while lookahead is A" or "while lookahead is not B". No plus loop:
yields
This syntax makes repeated structures like
pretty easy and consistent:
Now, for LL-regular grammar, we'll have to be able to specify a state
machine somehow. It seems like a waste to convert the internal DFA to
a text regex and then back to a regex. Hmm..perhaps the DFA testing
format could be used (I manually specify what the DFA should look like
and then compare it against what is generated by the tool). I guess
just a dump of the states and transitions will be what we need like:
might result in:
On the other hand, this all seems like a lot of work to code even if really useful. Hmm... November 18, 2003[Medical doctors are always late by about 1 hour minimum, hence, I brought a notepad to his office...results herein] A primary goal is easy code gen for a variety of languages. This
necessarily requires a multi-phase translation approach so people can
jump in wherever they want in the code gen process. Here is what I'm
thinking
where ParseLang is some antlr parser intermediate form and NIL is some "neutral imperative language". NIL (just a random name now) would also be presumably the ANTLR grammar action language as well. Given a NIL translator to language L, you could translate actions as well as generate parses in L. If your language is sort of an imperative language, but perhaps with higher level recognition capabilities, you would probably generate code from the ParseLang stage as imperative method calls etc.. would be too low level. If you have a truly non-imperative language such as LISP or other functional language, you would branch off the grammar representation containing the lookahead analysis: NFA+DFA stage. ParseLang would look like the AST for a high-level language. The reason I don't go straight to NIL or Java for example is that ANTLR has to break apart some alternative lists due to lookahead and to insert tree actions. There should be an easy way to dump this from the internal grammar NFA representation. From there, it's a simple matter of translating to a imperative-language neutral form and then to final instruction selection. ParseLang would also be a convenient place to interpret ANTLR for prototyping. As an example, consider
The ParseLang tree might be
Note that there would be no infrastrutcure variable definition like AST pointers and labels etc... That would only appear in an actual output language or in the NIL format. From ParseLang you would translate the SELECT to a SWITCH in a NIL format etc... Using translation templates, you could even go straight to a programming language like this: #(SELECT #(CASE expr slist)) ==> "switch (LA(1)) <cases:{"case <expr> : <slist>; break;"> }" #(INVOKE ID #(ARGS (arg)+)) => "<ID>(<arg>);" Or, more likely, these output templates could be used as a convenient way to generate NIL trees by running the NIL parser over the output templates. This lets you use text as humans like to read as the output of a translation phase while still getting out a trees. :) Cute. From the NIL tree you could translate to any imperative language easily. I'm getting more and more fond of the notion that ANTLR will have a simple standard imperative language as it's action language format. This way all grammars are portable to all languages even with symbol table actions etc.... :) Woohoo! November 15, 2003Here is an upcoming paper by Bryan Ford to a conference that formally defines grammars (PEGs he calls them) that operate like ANTLR: where alternatives have an implied order. Nicely formalizes the strategy of predicated parsing etc... http://www.brynosaurus.com//pub/lang/peg.pdf Just as we have the syntactic predicate that says what the lookahead context must be, he has a "not predicate" that says what the lookahead cannot be. This is very useful for such things as saying "match this rule, but only if not following in context by this next thing." Really powerful. moreover, he has a parser that demonstrates linear backtracking parsing by use of dynamic programming. I will be studying his technique very closely before code gen for ANTLR. :) October 27, 2003One of the nice things about antlr is that there are multiple back ends. The problem is that you can't easily retarget a grammar if it has embedded actions. Since I'm going to have an intermediate "langauge" for auto-generation stuff like "build me some trees", perhaps I should make it visible to the users? I expected a simple intermediate thing that was of the form:
like
meaning (in C, C++, Java, C#):
and then the instruction selector would simply find the assign template for your language and dump the output like a sophisticated macro preprocessor. Now, consider that translating the actions of languages is pretty easy; it's the declarations that are hard. Well, what I mean is that looking at the actions in grammars I see simple things like IF, FOR, function call, assignment etc... What if I made a generic little language for these actions that could do most stuff you want assuming you do the declarations properly in your language outside the grammar? I mean, it's a drag not being able to reuse a grammar for C++ output just because it has an assignment action in C# in there. Whatever. I would really like to be able to have a grammar that built symbol table information, for example, that could be reused as-is in C++, C, Java, etc... It's just some pops, pushes, and function calls, right? ;) October 12, 2003My first antlr code-gen experiment working with templates. I.e., how to build output generators for ANTLR. I added a few actions to my antlr AST tree walker in order to make it create templates and fill them with data. The result is complete rule (or set of rules) in Java. :) I'm honing my understanding of what should be where in terms of MVC. I really think that we can do C, C++, Java, C# by mostly modifying some templates sitting on the disk. StringTemplate knows how to inherit templates at this point so you could just subclass Java's templates dir to implement C# as it differs. Heh heh heh. This only works for basic rule ref and token ref stuff and I faked the analysis just to test the templates. Verrrrry interesting. I'm officially back working on ANTLR 3. What do you think of the BLOG idea for ANTLR 3? Could someone with more time on their hands (read that the javacc guys) beat us to the punch with all of these cool ideas? General stuffGoal: make it as easy as possible to generate a new code-generator. To be able to do this w/o Java knowledge if possible; i.e., use templates. This may require an intermediate serialized form people can read in. General idea: Provide templates for common abstractions like program, procedure, assignment, switch, choice, add-child, create-node, and so on. Then, have a controller map AST construct to the appropriate StringTemplate just like PageDispatcher maps URLs to page body templates. Like the sample I did in OR, the tree grammar sometimes maps to a template (for "concepts" rather than intermediate rules) and sometimes just fills things in in the right order. Controller will have to publish a list of attributes it will fill in so that overall program templates can reference them like:
Controller can be rewritten for really bizarre (i.e., non algol-derivatives) or overridden. Each controller defines a set of templates it knows how to use and a set of attributes per template. May want to extend StringTemplate to allow group of templates in one file. LL-regular machine: They will have to code the LL-regular state machine natively however. Takes machine, token buffer as input and yields an integer alternative number or -1 if nothing matched. SAMPLES
Can they define a Java object somehow to handle the truly bizarre cases? A mapping to that instead of a template, that is? Interesting...must fill bottom up. Walk top-down until you hit the leaves and then start with the token refs etc... to build up more and more complicated structures. This implies that you don't really need to apply templates to attributes...that is trying to make the template walk. Make the controller fill in. Use template refs for formatting issues not walking downwards in the tree. In fact, it can't: you are not passing in the tree. Must separate the data types, structure of the AST from the template which is used to just spit code. So then the result of each conceptual chunk like atom is a StringTemplate. Same for alt:
Then moving upwards:
Then have to move template literals into file(s) so you can change the code-generator.
Questions: How to handle generating include files and others? With a second pass? |
|||||||||||||||||||||||||||||||||||||||||||||||