As a rather pathetic example, consider the JavaScript code below:
var x = 23;
var y = 10;
var op = usePlus ? "+" : "-";
var z = eval(x + " " + op + " " + y);
Obviously the result is a number and the eval call has no side effects, but how can a program analysis conclude this? String analysis of the eval argument may deduce that it belongs to the regular language [1-9][0-9]* (+|-) [1-9][0-9]*. Using abstract lexing with JavaScript's lexical grammar, we can then deduce that every string in that language tokenizes to NUMBER (MINUS|PLUS) NUMBER once whitespace is discarded. The language of token strings is represented by a two-level automaton: a construct similar to a finite-state automaton that represents each token both by its name and by the concrete text it consists of.
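To make the lexing step concrete, here is a minimal sketch in plain JavaScript (not the actual analysis, which works on string languages rather than individual strings): a toy lexer whose token names and patterns are my own illustrative choices, showing that any member of the language above lexes to the token-name sequence NUMBER (MINUS|PLUS) NUMBER.

```javascript
// Toy lexer for the string language [1-9][0-9]* (+|-) [1-9][0-9]*.
// Each token keeps both its name and its concrete text, mirroring the
// two levels of the automaton described above.
function tokenize(s) {
  const tokens = [];
  const patterns = [
    [/^[0-9]+/, "NUMBER"],
    [/^\+/, "PLUS"],
    [/^-/, "MINUS"],
    [/^\s+/, null],          // whitespace is discarded
  ];
  while (s.length > 0) {
    let matched = false;
    for (const [re, name] of patterns) {
      const m = re.exec(s);
      if (m) {
        if (name) tokens.push({ name: name, text: m[0] });
        s = s.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("lexical error at: " + s);
  }
  return tokens;
}

// Every string in the language yields the same token-name sequence:
tokenize("23 + 10").map(t => t.name); // ["NUMBER", "PLUS", "NUMBER"]
```

The abstract version would run this kind of lexer over an automaton representing the whole string language at once, rather than over one string at a time.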
From this point, an abstract parser can hopefully construct some kind of syntax graph representing the set of syntax trees that may result from parsing such a language of token strings. From that syntax graph, we should be able to perform abstract type checking and begin to reason about the behaviour of the eval call and the type of the value it returns.
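As a rough illustration of those last two steps (again a sketch over a single token string, with hypothetical node and function names, not the abstract analysis itself): a parser for the token sequence NUMBER (MINUS|PLUS) NUMBER, and a trivial type-checker that concludes the result is always a number.

```javascript
// Parse the token sequence NUMBER (MINUS|PLUS) NUMBER into a syntax tree.
function parseExpr(tokens) {
  const [a, op, b] = tokens;
  if (!a || !op || !b || tokens.length !== 3 ||
      a.name !== "NUMBER" || b.name !== "NUMBER" ||
      (op.name !== "PLUS" && op.name !== "MINUS"))
    throw new Error("parse error");
  return { kind: "BinOp", op: op.name, left: a, right: b };
}

// A number plus or minus a number is always a number, with no side effects.
function typeOf(node) {
  if (node.kind === "BinOp") return "number";
  throw new Error("unknown node kind: " + node.kind);
}

const tokens = [
  { name: "NUMBER", text: "23" },
  { name: "PLUS",   text: "+"  },
  { name: "NUMBER", text: "10" },
];
typeOf(parseExpr(tokens)); // "number"
```

In the abstract setting, the parser would consume the whole two-level automaton and the syntax graph would summarize every tree it can derive; the type-checker then reasons over that graph instead of a single tree.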
I admit the example above isn't particularly motivating, but I don't like long posts. Information about the string analysis preceding the lexical analysis can be found at the JSA website.