Please note that this article describes an initial implementation of the AST. If you want to browse the code while reading the article, make sure that you have switched to branch dsl_v1.
An abstract syntax tree (AST) is, simply put, a symbolic representation of the program in a form of a tree.
While the textual representation of a program is great for us, humans, computers have hard time dealing with it. Because of that, a special part of any interpreter/compiler, called parser, reads the input and converts it into a computer-readable format - AST. This tree can then be used for multiple purposes. We can, for example, feed to into an interpreter which will then run the program for us, or we can feed it into a compiler to generate an executable program or cross-compiler to generate an equivalent program in a different programming language.
In reality, this process is typically even more complex. A parser usually uses a specialized input code called tokenizer to read the input and compiler usually doesn't create executable directly, but emits multiple files which are linked into a final program.
In the case of my toy compiler project, parser uses a separate tokenizer, while all output processes (such as compiler) generate the code without any additional steps (linking).
It is pretty obvious that the AST is central to the whole project and that's why I decided to present it before the tokenizer and the parser.
In my case (and let me remind you again that all the following description applies to branch dsl_v1), AST for a program starts with a very simple interface ISimpleDSLAST. (I'll be showing all types in a shortened forms without getters and setters.)
IASTFunctions
=
interface
function
Add(const
func:
IASTFunction):
integer;
function
Count:
integer;
function
IndexOf(const
name:
string):
integer;
property
Items[idxFunction:
integer]:
IASTFunction
read
GetItems;
default;
end;
{ IASTFunctions }
ISimpleDSLAST
=
interface
property
Functions:
IASTFunctions
read
GetFunctions;
end;
A program in "The Language" is nothing more than a collection of functions and that interface reflects that.
Each function has a name, a list of parameters, and a body.
TParameterList
=
TList<string>;
IASTFunction
=
interface
['{FA4F603A-FE89-40D4-8F96-5607E4EBE511}']
property
Name:
string
read
GetName
write
SetName;
property
ParamNames:
TParameterList
read
GetParamNames;
property
Body:
IASTBlock
read
GetBody
write
SetBody;
end;
A function body is nothing more than a collection of statements.
TStatementList
=
TList;
IASTBlock
=
interface
property
Statements:
TStatementList
read
GetStatements;
end;
A statement is either an if statement or a return statement. Nothing else.
TASTStatementType
=
(stIf,
stReturn);
IASTStatement
=
interface
end;
{ IASTStatement }
IASTStatement is just a parent interface for all statement interfaces and is never instantiated as such.
A return statement has only one part - an expression which has to be evaluated to calculate the value which is then returned as a function result.
IASTReturnStatement
=
interface(IASTStatement)
property
Expression:
IASTExpression
read
GetExpression
write
SetExpression;
end;
An if statement is more complicated. It has a condition (which is in itself an expression) followed by then and else blocks (and, as we already know, a block is just collection of statements and so on).
IASTIfStatement
=
interface(IASTStatement)
property
Condition:
IASTExpression
read
GetCondition
write
SetCondition;
property
ThenBlock:
IASTBlock
read
GetThenBlock
write
SetThenBlock;
property
ElseBlock:
IASTBlock
read
GetElseBlock
write
SetElseBlock;
end;
An expression can be either a simple term or a binary operation of two terms. Only three operations are supported at the moment: addition, subtraction, and comparison. "The Language" has no unary operands so you cannot write a statement such as return -3; but you have to use more convoluted form return 0-3;
TBinaryOperation
=
(opNone,
opAdd,
opSubtract,
opCompareLess);
IASTExpression
=
interface
property
Term1:
IASTTerm
read
GetTerm1
write
SetTerm1;
property
Term2:
IASTTerm
read
GetTerm2
write
SetTerm2;
property
BinaryOp:
TBinaryOperation
read
GetBinaryOp
write
SetBinaryOp;
end;
A term can be either a constant, a variable, or a function call.
TASTTermType
=
(termConstant,
termVariable,
termFunctionCall);
IASTTerm
=
interface
end;
Similarly to IASTStatement, an IASTTerm is a parent interface which is never instantiated.
A constant just contains a constant numeric value, calculated by the parser.
IASTTermConstant
=
interface(IASTTerm)
property
Value:
integer
read
GetValue
write
SetValue;
end;
A variable is actually a misnomer. The Language doesn't support variables as such. The code can only refer to a function parameter. As there is no support for nested functions, each parameter can be represented by its index in the parameter list (calculated again by the parser).
IASTTermVariable
=
interface(IASTTerm)
property
VariableIdx:
integer
read
GetVariableIdx
write
SetVariableIdx;
end;
And the last part of our AST, a function call contains an index (into the ISimpleDSLAST.Functions property) of the function we are calling and a list of parameters we are passing to the function. Each parameter in turn is an expression.
TExpressionList
=
TList;
IASTTermFunctionCall
=
interface(IASTTerm)
property
FunctionIdx:
integer
read
GetFunctionIdx
write
SetFunctionIdx;
property
Parameters:
TExpressionList
read
GetParameters;
end;
No comments:
Post a Comment