Home | Download | ANTLRWorks | Wiki | About ANTLR | Feedback | Support | Bugs | v2


Latest version is 3.0.1
Download now! »

Download
» Home
» Download
» ANTLRWorks
» News
»Using ANTLR
» Documentation
» FAQ
» Articles
» Grammars
» File Sharing
» Runtime API
» Tech Support
» Bug Tracking
»About ANTLR
» What is ANTLR
» Why use ANTLR
» Showcase
» Testimonials
» Getting Started
» Software License
» ANTLR WebLogs
» ANTLR Workshops
»StringTemplate
»ANTLR v2
»Feedback
»Credits
»Contact


Support StringTemplate, ANTLR Project by making a donation! Terence often pays for things like the antlr.org server, conference travel, and this site design (that alone cost US$1000). Buy him a beer and pizza remotely ;)

Search



Attribute grammar notes

RSS Feed

July 18, 2003

Seems that you collect attributes from a parse or parse tree, build dependency graph, topologically sort the graph (what does this mean?), and evaluate attribute statements in sorted order. I guess you find a single walk thru the attribute statements that doesn't break the data-flow dependencies.

Recall synthesized attributes (return values), inherited attributes (parameters), and ???

Example

X ::= Y Z X.s := new Node(Y.s, Z.s)

in yacc is

x : y z $$=new Node($1,$2); ;

Synthesized: value depends on value of attributes at child nodes (note sure how tight this is).

Inherited: value depends on values of parents/siblings.

Aho/Sethi/Ullman: for A->alpha b:=f(c1,...,ck)

  1. f is some function

2 either b is synthesized attribute of A; ci are attributes of A or alpha

3 or, b is inherited attribute of some symbol in alpha; ci are attributes of A and alpha.

Differences from our cabal ideas

Is there an assumption that n passes can occur? Our stuff would still assume one pass.

Attribute grammars don't seem to type the attributes. Context defines the type.

They seem to force saying rule.attribute instead of just attribute. We might allow scope override, but I'm not sure rule.attr is always right. Might be outputTemplate.attribute as scope override.

If they force rule.attr, it's like saying you can only use parameter names. You cannot just say "funcName" and have it set the most recent. You have to say incomingRule.funcName and keep passing stuff down. It doesn't seem I can reference an invoking rule--only the enclosing rule. Yep, confirmed can only use "local" info. people extend with reference attribute grammars...see below.

decl : type dtor  dtor.type = type.type ; // like setting parameter (dtor.type).

dtor : ID  ID.type = dtor.type ;

It doesn't seem like they can just access type.type directly whereas we can do this:

decl : type:=type dtor ;

dtor : ID => ID.type = type ;

where ID.type means tokens have attributes too! Interesting...then all tokens (or trees) are essentially heterogeneous. Perhaps we would define them above with

token {
	attributes;
}

AST {
	attributes;
}

to be explicit like a programming language not an attribute grammar, but then it blows the hetergeneity, right?

Seems you have to use parse trees to hold attributes as you use rules to "hold" the attributes. The rule nodes in the parse tree would have a hashtable of (id,value) pairs.

Constructing attribute dependency graphs

Each node in parse tree has attribute set.

  1. nodes of dependence graph are the occurrences of attributes
  2. arc from attr ref a to attr ref b if b depends on a in a rule.

Looks like a data-flow graph actually whereas in a dependence graph I expect b->a, but oh well.

Order of evaluation

  • If arc from a to b then value of as has to be computed before b.
  • well-formed if no cycles in dependency graph for any parse tree.
  • if no cycles then perform topological sort to find order (use Ian's algorithm).

Not efficient 'cause of sort. Preference is to do walk of parse tree depth first, but not always possible. They say ideally do parse/eval at same time if you can. L-attributed (no "forward" references) can be done this way obviously. S-attribute are like yacc "return value only" type attribute grammars.

Ouch. Seems like L-attributed are easy otherwise can try to seek more complex traversal or worst case use sorting. This magic is why people don't use it I suppose. You can't really decide how efficient it will be. I like the idea of giving programmers a single-pass mechanism and then let them decide how many phases to do!

Demand-driven attribute evaluation

All you have to do is make $foo be getFoo() and then have getFoo() be the computation for foo's rhs. In other words, getFoo() will call getBar() for every $bar reference on rhs. This may recompute stuff (i.e., nonlinear), but is guaranteed to terminate for non-circular graphs. Actually, can cache the value the first call to getFoo() making everything completely linear!

See pdf file in ~/Documents/languages or http://www.cs.lth.se/~gorel/AG99/DemEvalJava.pdf

DIY attribute grammars

You specify grammar plus operations to do in multiple phases. All the same except what is executed for each phase. Similar to what I do except you put everything into a single grammar spec.

Examples

Code generation

Note in this example, we need the synthesized attribute from E to be an aggregate. If a reference is unique, you can just reference it (i.e., <id>).

S : id ASSIGN code:=E => "<code> <id>=<reg>"} .
E : reg:=Atom (PLUS b:=Atom => {r1=newreg();} "<r1>=<reg>+<b.reg>")*
  ;

Atom
attributes {String reg} // hmm...can these be seen outside?
     :   ID  {reg=newreg();} => "load <reg>"
     |   INT {reg=INT.text}  // calls INT.getText()?
     ;

Email I sent to loring/monty

Yeah. I am almost done walking (pardon the pun) thru the attribute grammar stuff. Seems to me the big problem with them is that you can define assignments like a:=b at create a giant data-flow graph which you have to evaluate correctly. To correctly do this worst case requires a topological sort to eval in order. Antlr currently insists on the only single-pass (L-attributed) kind (args and return values). You cannot read a forward-defined value like this:

rule : r2 #{r3.foo} r3 ;  // at #{r3.foo}, r3 has not been executed yet.

So we are always going for this easy-to-understand single-pass concept and let programmers do multiple passes if necessary. I'm not really in favor of putting multiple pass actions within a single grammar and then saying "pass=2" or whatever to get the right mode.

I'm currently wrestling with attributes versus output template "holes" and how they relate. Are they the same? Do the holes simply read attribute values? Do attributes for rules / tokens need to even be manually defined or are they implicitly defined as lhs of an assignment? Does ID.type := type imply that tokens have attributes in addition to the overall definition of a token? Tokens are all heterogeneous then? Same for AST associated with the token?

Oh, attribute grammars don't seem as convenient as our "dynamic scoping" thing where we can ref value of "type" attribute way below the rule where it's set (and w/o the rule name). They seem to have to pass down and down until they hit the rule where you need "type" and then have to say enclosingRule.type. They also can only set the return value for a rule like enclosingRule.registerName := foo and then keep bubbling that back up. Ick! The advantage they have of course is that their names don't have to be that unique (like ID) whereas we could get a confused spec if we use ID everywhere.

In my examples so far it seems SO much better to allow referencing attributes of a token or rule if it's unique in the like this:

a : ID ASSIGN expr => "<ID> := <expr>" ;

Also we have the => "template" notion or perhaps => #(template) notion too, but what about returning multiple attributes rather than just a template or is the template just another attribute? For example, how do you return a bit of code and a register number from a rule when doing code gen? Can the template have attributes and we do this:

assign : ID ASSIGN expr => "<expr.code>\nstore <expr.reg>, <ID.text>" ;
atom : ID => reg = getRegister(), code = "load <reg>, <ID.text>" ;

Reference attribute grammars

Hedin started this stuff? 1999?

Second Workshop on Attribute Grammars and their Applications, WAGA'99, pages 153-172, Amsterdam, The Netherlands, March 1999. INRIA rocquencourt.

An extension to canonical attribute grammars is introduced, permitting attributes to be references to arbitrary nodes in the syntax tree, and attributes to be accessed via the reference attributes. Important practical problems such as name and type analysis for object-oriented languages can be expressed concisely in these grammars, and an optimal evaluation algorithm is available. The proposed formalism and algorithm have been implemented in an interactive language development tool.

http://www-rocq.inria.fr/oscar/www/fnc2/WAGA99/proceedings/hedin/hedin2.pdf

or see ~/Documents/languages/ for a paper on ref attr grammars.

Due to the references, static dependence graph is not possible. can still do an on-demand lazy eval to get properly sorted expr eval, however.

Weird though. They give an example of how to access a program's constants anywhere in the syntax tree by passing down the ref to everyone. Not that handy. use dynamic scoping to simply make attribute "programConstants" available everywhere.

Heh, do they expect a new object to be created for each alt? That must be how they get a virtual function to compute the stuff. Or, they do it on the node in the syntax tree. In oo implementations they have a class for each all and rule I think.

More stuff:

http://www.cs.lth.se/Research/ProgEnv/rags/index.shtml

Superset of attr grammars where you can reference arbitrary nodes in the AST. I guess this is the attribute thing we are thinking about via the dynamic scoping, right? Well, perhaps it's like Loring's AST label actions. Hmm...seems like they are really just allowing references from one ast to another. They started out wanting this for connecting subclasses to superclasses when doing OO analysis. not sure one needs an abstraction for this ;)

Summary

Attribute grammars allow implicitly defined attributes to be associated with any rule (or non-leaf node in a parse tree).

Attribute grammars allow computation of attributes in the local sense. Can define the return value and push parameters down into rule invocations and ask for return values. With dynamic scoping we can have non local refs.

Attributes have "value semantics" in that you can only have values not references to other attributes or nodes in a parse tree.

Use implies type of attribute.

Dependencies imply order of computation.

Attribute grammars are slow w/o on-demand lazy eval / caching. Hard to describe evaluation to programmers. I prefer a simple single-pass action execution model of computation/translation.

It seems that these attribute guys like to put everything needed into attributes. I prefer a nice separate symbol table object etc... I don't like mixing it all in and using complicated attribute assignments. Breaking out various entities like sym tab into separate data structure (use OO tech not random variables). Also breaking out things into phases is a good factorization. Attributes are associated with token and rule entities pretty much assuming you are going to build a tree if more than S-attributed.

Programmers understand order of operation as a fundamental principle from cs101. To break that and say "order is the correct order" rather than "the order procedures work", is a problem. Goal is to make programmers more productive so we have to give them what they understand. Making them do the topological sort manually is actually what they want, believe it or not. They are willing to make multiple passes to get correct overall behavior. They understand iterative refinement etc... They also don't like random spooky data structures that appear as a result of a parse (attribute trees) and don't like black boxes, which is why ANTLR is successful I'd say.

int x = y;
int y = 3;

Example that might blow minds:

decl : "var" ID {$ID.type=t;} COLON t=type ;

because of the weird set before defined issue.

Reference Attribute Grammars allow references to nodes, but still seem to force local references to keep pushing refs down/up parse tree or as parameters/return values in parser. These can be efficiently done with lazy/cached eval as well.

If you can store anything you want into a token node, do you make a wrapper or do you have many hetero token defs? What about a simmple TOkenStream interface? Can't be too weird or complicated (comm. with lexer).

July 15, 2003

Thoughts after cabal

If I have to allocated scopes of variables (i.e., attributes), why not provide that functionality generally for use in "raw" mode by users:

rule returns [int foo] attributes [int x, AST y, String z]
    :  stuff
    ;

then any foo:=element does an assignment to an attribute. We'd rquire functionality to set within an action, but not the syntax (to avoid $foo=labelOnElement; actually that ain't bad).

Then for trees or string output templates we'd have it auto define the attributes:

stats : (slist:=stat)+ => #(SLIST <slist>) ;

or

stats : (slist:=stat)+ => "{ <slist>}" ;

Note that the tree output template doesn't require a template class as you have all information before you create the tree; recall that we set all attributes and then construct the output item, whether it be tree, tokens, or text. :)

Example use of templates for C to Java decls

decl : type dtor ;

type : "int" | ID ;

dtor : ID ("[]")* ;

How to get "int a[][]" to become "int[][] a" given that stuff is spread across multiple rules and the order is diff?

decl : type dtor => "<t><array> <varid>" ;

type : t:="int" | t:=ID ;

dtor : varid:=ID (array:="[]")* ;

Ok, I don't like the name "t", but "type" is worse as it collides with the rulename. Damn! Got to be in the same space if you ask me. You don't like "t" particularly because it's not unique. Perhaps you use the rulename where the output template is defined as a scope override like decl.t:=ID? That would work except I'd need to track the name of the scope in Scope.java and then walk looking for it as it would not always be unique for nested structures. you'd have to find the most recent decl scope.

Further, don't forget that you may need syntax that access attributes with actions for set/write.

You may also need notation to access decl[0], the outermost decl scope or decl[-1], the previous decl scope. If decl[-1] is null, you know you are the outermost scope. INteresting, but must be done within an action I'd say. perhaps we will need syntax to set attributes like $foo:= and $foo. Ick; don't like translating antlr actions though.

Also, i don't like having to repeat the assignment here:

type : t:="int" | t:=ID ;

but, I don't want to create a template just to get a return template like this:

type : ("int" | ID) => "<LA(1)>" ;  // not it but you get the idea

Perhaps the answer is to support the notion that given no other translation info, each rule has access to the text or token stream matched for it. You could then do this:

decl : t:=type dtor => "<t.text><array> <varid>" ;

where t get the value(s) of type and then you can access the text or token stream or what about the tree? Well, a label might be better. We have $t and #t at the moment...perhaps we need another to access more info about the rule. Lots of people ask for the text matched in a rule etc... How does this fit with the idea that you can generate an output template (return value?) for a rule?