Home | Download | ANTLRWorks | Wiki | About ANTLR | Feedback | Support | Bugs | v2


Latest version is 3.0.1
Download now! »

Download
» Home
» Download
» ANTLRWorks
» News
»Using ANTLR
» Documentation
» FAQ
» Articles
» Grammars
» File Sharing
» Runtime API
» Tech Support
» Bug Tracking
»About ANTLR
» What is ANTLR
» Why use ANTLR
» Showcase
» Testimonials
» Getting Started
» Software License
» ANTLR WebLogs
» ANTLR Workshops
»StringTemplate
»ANTLR v2
»Feedback
»Credits
»Contact


Support StringTemplate, ANTLR Project by making a donation! Terence often pays for things like the antlr.org server, conference travel, and this site design (that alone cost US$1000). Buy him a beer and pizza remotely ;)

Search



Semantic Predicate Hoisting

RSS Feed

March 15, 2004

Conversation with smart guy Tom Moog (maintainer of PCCTS) about predicates.

Tom wrote about a combined "virtual token" that incorporates token + predicate:

#token TypeName "[a-z]+" && isTypeName(self)

Ter said:

Interesting...i like the syntax :) Can we get both using existing or enhanced version of hoisting to get this? Or is it hard to know when you want it to add the context guard? Perhaps we check to see if they use LA(i)...wait...even for the && style above, you need to either have the guard or have the predicate evaluated as the token is matched and it just uses getText() instead of lookahead. So these are done in the parser so you can use the parser's info, but they are applied to a single token, creating a new kind of token so you don't need a guard all over?

Tom spake:

Right. A guard makes someone write the test manually anywhere that a virtual token might be of interest. By combining the lexical and the semantic into a definition it becomes semi-automatic because the user just treats TypeName like any other kind of token. More importantly, it allows the error routines to be aware of the semantic aspect. So instead of getting the message "expected ID found ID" it could say, "expected TypeName found ID". Thinking about this a little more I realize that this might not be so usefl as the same think could be accomplished more easily by changing the lexical routine to return different tokens.

Ter:

True. Unless the predicate requires prior context information as well like "in a table" or something.

Tom's Note #143:

#143. Use #pred statement to describe the logical relationship of related predicates A problem with predicates is that each one is regarded as unique and capable of disambiguating cases where two alternatives have identical lookahead. For example, consider:

< rule : <<pred(LATEXT(1))>>? A | <<pred(LATEXT(1))>>? A ; >>

Even though the two alternatives are identical, this will not cause any error messages or warnings to be issued by the original versions of PCCTS. One could compare the text of the predicate, but this is not a robust or satisfactory solution.

The #pred statement allows one to give a symbolic name to a "predicate literal" or a "predicate expression" in order to refer to it in other predicate expressions or in the rules of the grammar.

The predicate literal associated with a predicate symbol is C or C++ code which can be used to test the condition. A predicate expression defines a predicate symbol in terms of other predicate symbols using "!", "&&", and "||". A predicate symbol can be defined in terms of a predicate literal, a predicate expression, or both.

When a predicate symbol is defined with both a predicate literal and a predicate expression, the predicate literal is used to generate code, but the predicate expression is used to check for two alternatives with identical predicates in both alternatives.

Here are some examples of #pred statements:

#pred  IsLabel       <<isLabel(LATEXT(1))>>?
#pred  IsLocalVar    <<isLocalVar(LATEXT(1))>>?
#pred  IsGlobalVar   <<isGlobalVar(LATEXT(1)>>?
#pred  IsVar         <<isVar(LATEXT(1))>>?  IsLocalVar || IsGlobalVar 
#pred  IsScoped      <<isScoped(LATEXT(1))>>?    IsLabel || IsLocalVar

The predicate IsLocalVar is related to IsGlobalVar (See the definition
of IsVar).  The #pred attempts to capture this for use in analyzing
the predicates appearing that appear in prediction expressions.

#### May 2003

This seemingly simple grammar

<<
a   :   var
    |   ID
    ;

var :   {isFOO(LATEXT(1))}? ID
    |   {isBAR(LATEXT(1))}? INT
    |   FLOAT
    ;

results in a complicated rule 'a' decision (from PCCTS output):

        if ( (LA(1)==ID || LA(1)==INT ||
        LA(1)==FLOAT)&&(!(((LA(1)==ID))||((LA(1)==INT)))||
(((((LA(1)==ID)))&&(isFOO(LATEXT(1))))||((((LA(1)==INT)))&&(isBAR(LATEXT(1))))))
        ) {
        var();
    }    
    else {    
        if ( (LA(1)==ID) ) {
            zzmatch(ID); zzCONSUME;
        }
        else
        {zzFAIL(1,zzerr1,&zzMissSet,&zzMissText,&zzBadTok,&zzBadText,&zzErrk);
        goto fail;}
    }

For the life of me, I can't remember the principles ;) I know you must AND and OR predicates together if you see more than one and you must grab their lookahead. That IF looks like:

if ( lookahead predicates var &&
     (lookahead does not predict either predicated alt ||
      (lookahead predicates isFOO && isFOO || lookahead predicts isBAR
     && isBAR) ) ) {
   ...
}

Another problem from Tom Moogs notes:

#117. (Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code

      The hoisting of predicates into rules to create prediction
      expressions is a problem in antlr.  Consider the following
      example (k=1 with -prc on):

        start   : (a)* "@" ;
        a       : b | c ;
        b       : <<isUpper(LATEXT(1))> >? A ;
        c       : A ;

      Prior to 1.33MR10 the code generated for "start" would resemble:

        while {
            if (LA(1)==A &&
                    (!LA(1)==A || isUpper())) {
              a();
            }
        };

      This code is wrong because it makes rule "c" unreachable from
      "start".  The essence of the problem is that antlr fails to
      recognize that there can be a valid alternative within "a" even
      when the predicate <<isUpper(LATEXT(1))> >? is false.

      In 1.33MR10 with -mrhoist the hoisting of the predicate into
      "start" is suppressed because it recognizes that "c" can
      cover all the cases where the predicate is false:

        while {
            if (LA(1)==A) {
              a();
            }
        };