Please note that this article describes an initial implementation of the compiler. If you want to browse the code while reading the article, make sure that you have switched to branch dsl_v1.
In my toy compiler framework, a compiler (or codegen, as it is called internally) is a piece of code that implements the ISimpleDSLCodegen interface. This interface exposes only one function, Generate, which takes an abstract syntax tree and converts it into an object implementing the ISimpleDSLProgram interface, which allows you to call any function in a compiled program by name.
type
  TParameters = TArray<integer>;
  TFunctionCall = reference to function (const parameters: TParameters): integer;

  ISimpleDSLProgram = interface ['{2B93BEE7-EF20-41F4-B599-4C28131D6655}']
    function Call(const functionName: string; const params: TParameters; var return: integer): boolean;
  end;

  ISimpleDSLCodegen = interface ['{C359C174-E324-4709-86EF-EE61AFE3B1FD}']
    function Generate(const ast: ISimpleDSLAST; var runnable: ISimpleDSLProgram): boolean;
  end;
The default compiler is implemented by the TSimpleDSLCodegen class in unit SimpleDSLCompiler.Compiler. The methods in this class mostly deal with reading and understanding the AST, while the actual code is created by methods in unit SimpleDSLCompiler.Compiler.Codegen.
This compiler creates a program which is an instance of the TSimpleDSLProgram class (also stored in SimpleDSLCompiler.Compiler).
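To show how calling a compiled function by name could work, here is a minimal sketch. TDemoProgram and TryCallDemo are hypothetical stand-ins, not the real TSimpleDSLProgram; the function type is simplified (no execution context), and returning False for an unknown name is an assumption about how Call behaves.

```delphi
program CallByNameDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  TParameters = TArray<integer>;
  // Simplified for this sketch: no execution context parameter.
  TFunction = reference to function (const params: TParameters): integer;

  // Hypothetical stand-in for TSimpleDSLProgram.
  TDemoProgram = class
  strict private
    FFunctions: TArray<TFunction>;
    FNames    : TDictionary<string, integer>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure DeclareFunction(idx: integer; const functionName: string;
      const code: TFunction);
    function Call(const functionName: string; const params: TParameters;
      var return: integer): boolean;
  end;

constructor TDemoProgram.Create;
begin
  inherited Create;
  FNames := TDictionary<string, integer>.Create;
end;

destructor TDemoProgram.Destroy;
begin
  FNames.Free;
  inherited;
end;

procedure TDemoProgram.DeclareFunction(idx: integer; const functionName: string;
  const code: TFunction);
begin
  // Grow the function table if needed, then remember the name -> index mapping.
  if Length(FFunctions) <= idx then
    SetLength(FFunctions, idx + 1);
  FFunctions[idx] := code;
  FNames.AddOrSetValue(functionName, idx);
end;

function TDemoProgram.Call(const functionName: string; const params: TParameters;
  var return: integer): boolean;
var
  idx: integer;
begin
  // Assumption: an unknown name is reported by returning False.
  Result := FNames.TryGetValue(functionName, idx);
  if Result then
    return := FFunctions[idx](params);
end;

// Hypothetical helper: builds a one-function program and calls it by name.
function TryCallDemo(const functionName: string; argument: integer;
  out value: integer): boolean;
var
  demo: TDemoProgram;
begin
  value := 0;
  demo := TDemoProgram.Create;
  try
    demo.DeclareFunction(0, 'inc',
      function (const params: TParameters): integer
      begin
        Result := params[0] + 1;
      end);
    Result := demo.Call(functionName, TParameters.Create(argument), value);
  finally
    demo.Free;
  end;
end;

var
  value: integer;

begin
  if TryCallDemo('inc', 41, value) then
    Writeln(value);                     // prints 42
  if not TryCallDemo('missing', 0, value) then
    Writeln('unknown function');
end.
```

The index passed to DeclareFunction matters because compiled code refers to functions by index, while external callers only know the name.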
The functioning of the compiler is very similar to the compiler presented in Intermezzo, with one critical difference. Expressions in my toy language can use function parameters as terms. Because of that, the expression evaluator needs access to the parameters of the current function.
The story starts in TSimpleDSLCodegen.Generate which for each function in the tree firstly compiles the function body (CompileBlock) and secondly generates the function wrapper for that body (CodegenFunction).
function TSimpleDSLCodegen.Generate(const ast: ISimpleDSLAST; var runnable: ISimpleDSLProgram): boolean;
var
  block      : TStatement;
  i          : integer;
  runnableInt: ISimpleDSLProgramEx;
begin
  Result := false; //to keep compiler happy
  FAST := ast;
  runnable := TSimpleDSLProgram.Create;
  runnableInt := runnable as ISimpleDSLProgramEx;
  for i := 0 to ast.Functions.Count - 1 do begin
    if not CompileBlock(ast.Functions[i].Body, block) then
      Exit;
    runnableInt.DeclareFunction(i, ast.Functions[i].Name, CodegenFunction(block));
  end;
  Result := true;
end;
type
  PExecContext = ^TExecContext;
  TExecContext = record
    Functions: TArray<TFunction>;
  end;

  TParameters = TArray<integer>;

function CodegenFunction(const block: TStatement): TFunction;
begin
  Result :=
    function (execContext: PExecContext; const params: TParameters): integer
    var
      context: TContext;
    begin
      context.Exec := execContext;
      context.Params := params;
      context.Result := 0;
      block(context);
      Result := context.Result;
    end;
end;
Moving one level down ... Function TSimpleDSLCodegen.CompileBlock compiles each statement in the block by calling CompileStatement and then calls CodegenBlock to wrap compiled statements in a block.
function CodegenBlock(const statements: TStatements): TStatement;
begin
  Result :=
    procedure (var context: TContext)
    var
      stmt: TStatement;
    begin
      for stmt in statements do
        stmt(context);
    end;
end;
This continues on and on. Most of the code is pretty dull and predictable. For example, this is the method which generates code for an if statement.
function CodegenIfStatement(const condition: TExpression; const thenBlock, elseBlock: TStatement): TStatement;
begin
  Result :=
    procedure (var context: TContext)
    begin
      if condition(context) <> 0 then
        thenBlock(context)
      else
        elseBlock(context);
    end;
end;
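As a rough illustration of how such a generated if statement behaves at run time, the sketch below wires CodegenIfStatement to two trivial branches. CodegenSetResult and RunIsNonZero are hypothetical helpers invented for this example, and TContext is simplified (the Exec field is left out because no function calls are made).

```delphi
program IfDemo;

{$APPTYPE CONSOLE}

type
  TParameters = TArray<integer>;

  // Simplified context for this sketch: no Exec field.
  TContext = record
    Params: TParameters;
    Result: integer;
  end;

  TExpression = reference to function (var context: TContext): integer;
  TStatement = reference to procedure (var context: TContext);

function CodegenIfStatement(const condition: TExpression;
  const thenBlock, elseBlock: TStatement): TStatement;
begin
  Result :=
    procedure (var context: TContext)
    begin
      if condition(context) <> 0 then
        thenBlock(context)
      else
        elseBlock(context);
    end;
end;

function CodegenVariable(varIndex: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := context.Params[varIndex];
    end;
end;

// Hypothetical helper: a statement that stores a fixed value as the result.
function CodegenSetResult(value: integer): TStatement;
begin
  Result :=
    procedure (var context: TContext)
    begin
      context.Result := value;
    end;
end;

// Hypothetical helper: runs "if param 0 is nonzero then 1 else 0".
function RunIsNonZero(value: integer): integer;
var
  code   : TStatement;
  context: TContext;
begin
  code := CodegenIfStatement(CodegenVariable(0),
    CodegenSetResult(1), CodegenSetResult(0));
  context.Params := TParameters.Create(value);
  context.Result := -1;
  code(context);
  Result := context.Result;
end;

begin
  Writeln(RunIsNonZero(7));  // then branch, prints 1
  Writeln(RunIsNonZero(0));  // else branch, prints 0
end.
```

Note that the condition is an integer expression and any nonzero value selects the then branch.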
Things get interesting once we want to compile a term. A term can represent either an (integer) constant, a parameter (called a variable in the codegen, as real variables may be supported at some point in the future) or a function call.
function TSimpleDSLCodegen.CompileTerm(const astTerm: IASTTerm; var codeTerm: TExpression): boolean;
var
  termConst   : IASTTermConstant;
  termFuncCall: IASTTermFunctionCall;
  termVar     : IASTTermVariable;
begin
  Result := true;
  if Supports(astTerm, IASTTermConstant, termConst) then
    codeTerm := CodegenConstant(termConst.Value)
  else if Supports(astTerm, IASTTermVariable, termVar) then
    codeTerm := CodegenVariable(termVar.VariableIdx)
  else if Supports(astTerm, IASTTermFunctionCall, termFuncCall) then
    Result := CompileFunctionCall(termFuncCall, codeTerm)
  else
    Result := SetError('*** Unexpected term');
end;
function CodegenConstant(value: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := value;
    end;
end;

function CodegenVariable(varIndex: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := context.Params[varIndex];
    end;
end;

function CodegenFunctionCall(funcIndex: integer; const params: TFuncCallParams): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    var
      funcParams: TParameters;
      iParam    : Integer;
    begin
      SetLength(funcParams, Length(params));
      for iParam := Low(params) to High(params) do
        funcParams[iParam] := params[iParam](context);
      Result := context.Exec.Functions[funcIndex](context.Exec, funcParams);
    end;
end;
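The following self-contained sketch shows how a compiled function call might find its target through the execution context at run time. The type declarations are reconstructed from the way they are used in this article, and WrapExpression and RunQuad are hypothetical helpers; function 0 is written by hand rather than compiled.

```delphi
program FuncCallDemo;

{$APPTYPE CONSOLE}

type
  TParameters = TArray<integer>;
  PExecContext = ^TExecContext;
  TFunction = reference to function (execContext: PExecContext;
    const params: TParameters): integer;
  TExecContext = record
    Functions: TArray<TFunction>;
  end;
  TContext = record
    Exec  : PExecContext;
    Params: TParameters;
    Result: integer;
  end;
  TExpression = reference to function (var context: TContext): integer;
  TFuncCallParams = TArray<TExpression>;

function CodegenVariable(varIndex: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := context.Params[varIndex];
    end;
end;

function CodegenFunctionCall(funcIndex: integer;
  const params: TFuncCallParams): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    var
      funcParams: TParameters;
      iParam    : integer;
    begin
      // Evaluate the arguments in the caller's context ...
      SetLength(funcParams, Length(params));
      for iParam := Low(params) to High(params) do
        funcParams[iParam] := params[iParam](context);
      // ... then look the callee up by index only when the call executes.
      Result := context.Exec.Functions[funcIndex](context.Exec, funcParams);
    end;
end;

// Hypothetical helper: wraps a single expression as a callable function.
function WrapExpression(const expression: TExpression): TFunction;
begin
  Result :=
    function (execContext: PExecContext; const params: TParameters): integer
    var
      context: TContext;
    begin
      context.Exec := execContext;
      context.Params := params;
      context.Result := 0;
      Result := expression(context);
    end;
end;

// Hypothetical helper: function 1 computes double(double(i)) via function 0.
function RunQuad(value: integer): integer;
var
  exec        : TExecContext;
  inner, outer: TExpression;
begin
  SetLength(exec.Functions, 2);
  exec.Functions[0] :=
    function (execContext: PExecContext; const params: TParameters): integer
    begin
      Result := params[0] * 2;
    end;
  inner := CodegenFunctionCall(0, TFuncCallParams.Create(CodegenVariable(0)));
  outer := CodegenFunctionCall(0, TFuncCallParams.Create(inner));
  exec.Functions[1] := WrapExpression(outer);
  Result := exec.Functions[1](@exec, TParameters.Create(value));
end;

begin
  Writeln(RunQuad(5));  // 5 doubled twice, prints 20
end.
```

Because the callee is resolved through context.Exec.Functions when the call runs, a call can be compiled before its target function has been stored in the table.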
For example, this minimal program ...
inc(i) { return i+1 }
... generates something like the following monstrosity. In reality the code is even weirder as it has to handle captured variables.
function (execContext: PExecContext; const params: TParameters): integer
var
  context: TContext;
begin
  context.Exec := execContext;
  context.Params := params;
  context.Result := 0;
  (procedure (var context: TContext)
   var
     stmt: TStatement;
   begin
     for stmt in [
       procedure (var context: TContext)
       begin
         context.Result :=
           (function (var context: TContext): integer
            begin
              Result :=
                (function (var context: TContext): integer
                 begin
                   Result := context.Params[0];
                 end)(context)
                +
                (function (var context: TContext): integer
                 begin
                   Result := 1;
                 end)(context);
            end)(context);
       end
     ]
     do
       stmt(context);
   end)(context);
  Result := context.Result;
end;
It certainly isn't appropriate for the faint of heart but - hey! - you don't have to look into the compiled code (unless you are debugging the compiler, of course).
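If you would like to see the same nesting built step by step and actually run, here is a minimal sketch of inc(i) assembled by hand from closures. CodegenAdd, CodegenReturn and RunInc are hypothetical helpers named for this example only, and TContext again omits the Exec field because nothing here calls another function.

```delphi
program IncDemo;

{$APPTYPE CONSOLE}

type
  TParameters = TArray<integer>;

  // Simplified context for this sketch: no Exec field.
  TContext = record
    Params: TParameters;
    Result: integer;
  end;

  TExpression = reference to function (var context: TContext): integer;
  TStatement = reference to procedure (var context: TContext);

function CodegenConstant(value: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := value;
    end;
end;

function CodegenVariable(varIndex: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := context.Params[varIndex];
    end;
end;

// Hypothetical helper: adds the values of two compiled expressions.
function CodegenAdd(const leftTerm, rightTerm: TExpression): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := leftTerm(context) + rightTerm(context);
    end;
end;

// Hypothetical helper: stores an expression value as the function result.
function CodegenReturn(const expression: TExpression): TStatement;
begin
  Result :=
    procedure (var context: TContext)
    begin
      context.Result := expression(context);
    end;
end;

// Hypothetical helper: builds "return i+1" and runs it with one parameter.
function RunInc(value: integer): integer;
var
  body   : TStatement;
  context: TContext;
begin
  body := CodegenReturn(CodegenAdd(CodegenVariable(0), CodegenConstant(1)));
  context.Params := TParameters.Create(value);
  context.Result := 0;
  body(context);
  Result := context.Result;
end;

begin
  Writeln(RunInc(41));  // prints 42
end.
```

Each Codegen call runs once, at compile time, and returns a closure; only the closures run when the compiled program is executed.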
You may wonder about the speed of such code. Not very fast, I must admit. I'll give you more specific numbers in the next instalment in this series which will describe an interpreter for this language.