Suppose we are using this simple grammar for arithmetic expressions:
Tokens: identifier = ['a'..'z']+; integer_literal = ['0'..'9']+; plus = '+'; minus = '-'; star = '*'; div = '/'; lparen = '('; rparen = ')'; Productions: exp = plusexp; plusexp = plusexp plus mulexp | plusexp minus mulexp | mulexp; mulexp = mulexp star unexp | mulexp div unexp | unexp; unexp = minus unexp | primary; primary = identifier | lparen exp rparen | integer_literal;
Now let the possible inputs to our language be given by the regular expression: a * (5|(6 + b)) (where the bold-faced symbols are control characters).
Tokenizing the expression with the abstract lexer gives us the automaton:
Feeding that to the abstract parser now yields the – rather inaccurate – graph of parse trees:
Okay so you probably can't make out the text on the nodes due the image being kinda small here on BlogSpot – but anyway, the gray nodes are tokens, the white boxes are productions, and the diamond-shaped nodes are choices.
Unfortunately the abstract stack representation is much too weak to create a precise syntax graph. The graph above can unfold to the correct syntax tree, but also the parse trees corresponding to "a", "6 + b", "6 + a", and just "6". This will obviously do us no good in the long run.
Once I manage to fix the precision issues, there is still the hurdle of performing the abstract equivalent of converting a parse tree to an abstract syntax tree. What can I call that process? Abstract, abstract syntax tree construction? Yikes.
No comments:
Post a Comment