My quixotic quest to build my own programming language or die trying has lead my in a few different directions. In my last installment, I was going to bring my Python-based lexer and parser into the .NET world with IronPython. After pursuing that angle for awhile I soon grew tired of waiting 10 seconds every time I wanted to run a Python script under IronPython. Every time. I don't really know what was going on each time I ran "ipy" in a terminal, but it would take about 10 seconds to do it. Regular old CPython might take a second the first time you fire it up, but after that you can run scripts from the command line with negligible delay. The more I thought about it, the less I wanted to tie myself to the DLR if it was going to be such a hog. I instead ported my lexer and parser back to Flex and Bison C-based code.
Here is the lexer. The syntax highlighter doesn't really handle flex/lex syntax, so it might be pretty bad.
%option noyywrap
%{
#include "gc/gc.h"
#include "simple.tab.h"
#include "simple.h"
%}
ID [a-zA-Z_][a-zA-Z0-9_]*
QUOTE \"[^\"\\]*(?:\\.[^\"\\]*)*\"
%%
-?[0-9]+ { yylval.i = atoi(yytext); return(NUMBER); }
{QUOTE} { yylval.c = GC_strdup(yytext); return(STRING); }
print { return(PRINT); }
{ID} { yylval.c = GC_strdup(yytext); return(getkeyword(yytext,IDENTIFIER));}
[ \t\n]+ /* blank, tab, new line: eat up whitespace */
-> { return(ARROW); }
\. { return(DOT); }
\* { return(STAR); }
\/ { return(SLASH); }
\+ { return(PLUS); }
\- { return(MINUS); }
\( { return(LPAREN); }
\) { return(RPAREN); }
\{ { return(LCURLY); }
\} { return(RCURLY); }
= { return(EQUALS); }
%%
And the Bison parser.
%{
#include
#include
#include "simple.h"
// #include "block.h"
void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}
%}
%union {
struct node * n;
int i;
const char * c;
}
%token <i> NUMBER
%token <c> STRING IDENTIFIER
%token PRINT FUNC IF ELIF ELSE FOR IN
%token EQUALS STAR SLASH PLUS MINUS LPAREN RPAREN LCURLY RCURLY ARROW DOT
%type <n> script expression explist block ifexp identlist
%right EQUALS
%left PLUS MINUS
%left STAR SLASH
%left DOT
%%
script : explist
{ scriptnode = $1 }
;
explist : /* empty */
{ $$ = newlistnode(nt_EXPLIST); }
| explist expression
{ listnode_add($1,$2); $$ = $1; }
;
expression : NUMBER
{ $$ = newnumber($1); }
| STRING
{ $$ = newstring($1); }
| IDENTIFIER
{ $$ = newident($1); }
| expression STAR expression
{ $$ = newnode(nt_STAR,$1,$3); }
| expression SLASH expression
{ $$ = newnode(nt_SLASH,$1,$3); }
| expression PLUS expression
{ $$ = newnode(nt_PLUS,$1,$3); }
| expression MINUS expression
{ $$ = newnode(nt_MINUS,$1,$3); }
| expression EQUALS expression
{ $$ = newnode(nt_EQUALS,$1,$3); }
| LPAREN expression RPAREN
{ $$ = $2 }
| FUNC LPAREN identlist RPAREN block
{ $$ = newnode(nt_FUNC,$3,$5); }
| expression DOT LPAREN explist RPAREN
{ $$ = newnode(nt_FUNCCALL,$1,$4) }
| PRINT DOT LPAREN explist RPAREN
{ $$ = newnode(nt_PRINT,$4,NULL) }
| ifexp ELSE block
{ $$ = newnode(nt_ELSE, $1, $3); }
| ifexp
{ $$ = $1; }
| FOR identlist IN expression block
{ $$ = newnode(nt_FOR, newnode(nt_FOR_ITER, $2, $4), $5); }
;
identlist : /* empty */
{ $$ = newlistnode(nt_IDENTLIST); }
| identlist IDENTIFIER
{ listnode_add($1,newident($2)); $$ = $1; }
;
block : LCURLY explist RCURLY
{ $$ = newnode(nt_BLOCK, $2, NULL); }
;
ifexp : IF expression block
{ $$ = newnode(nt_IF, $2, $3); }
| ifexp ELIF expression block
{ $$ = newnode(nt_ELIF, $1, newnode(nt_IF,$3, $4)); }
;
There is a bunch of other C code that this parser is calling to build nodes in the syntax tree, mainly with the newnode() function which looks like this:
simplenode nn()
{
simplenode n;
n = GC_MALLOC(sizeof(struct node));
n->L = NULL;
n->R = NULL;
return n;
}
simplenode newnode(int type,simplenode L, simplenode R)
{
simplenode n = nn();
n->type = type;
n->L = L;
n->R = R;
return n;
}
There is of course a bunch of other code, but honestly, it is a bit tiring to do all this tree manipulation in C. It's a fine language, but a bit verbose and fiddly, what with all the pointer madness, which I've attempted to simplify as much as possible. In the preceding code, the simplenode type is really just a typedef of struct node *. And this is what a node really is:
struct node
{
int type;
union {
simplenode L;
Array arr;
int number;
simplestring string;
};
union {
simplenode R;
Block block;
};
};
As you can see, I really just want a node to be a dynamic container that can hold other tree nodes, or data based on type, but then I feel like I am trying to turn C into something it isn't. I'm declaring types so I can use them, but it turns out that very little is enforced at compile time but more importantly run-time. I like my dynamic languages to have strong typing so when the code is running an integer isn't happily treated as a pointer, corrupting some random memory leading to hard to reproduce bugs. I have found some useful tools.
One of those tools is gdb the Gnu Debugger. As long as I compile with -g, I can set breakpoints or run the code from gdb and see what is happening when the program segfaults. Development of this code has lead to a lot of segfaults. It's a painful development style. I guess it stopped being fun, so I've set this whole thing aside to take a look at other code. That lead me to another runtime that is already doing something very similar, Lua. The Lua runtime is dynamic with strong typing, written in portable C, fast, as is a pretty decent language, with first class functions and other nice features. It has a lot of moving parts, like it's own lexer, parser, and garbage collection. I had already punted on all three, utilizing flex, bison, and Boehm GC.
Then, something else happened. I remembered all those unbelievably over-confident Lisp guys, even Paul Graham raves about Lisp. I remembered vaguely that it had garbage collection and bunch of other things decades before they became used in mainstream programming. I thought I had better see what all the fuss is about before I go off and try to build my own programming language. Well, I got more than I bargained for. Happily, all you need is an internet connection and a computer to find some highly recommended books on Lisp. I am currently working my way through Practical Common Lisp by Peter Seibel. The complete text is available on Seibel's webpage. Neat, huh?
I am about 9 chapters in and starting to get a feel for the language, but it hasn't been easy. I have this to say about Lisp before I say anything else, it is not very approachable. Even though Seibel is a great writer (loved Coders at Work, I own an ink-on-wood-pulp version), it takes intense concentration to understand what the heck is going on. Lisp is different. It's different in the way that animal species on Australia are different. It's like it grew up as a language in isolation from other species of language. The genetic lines are so different between Lisp and the Algol/C family of languages that I feel intensely uncomfortable at times. I have to stop, go back and reread a section when I get to certain code samples, or sit and stare and try to puzzle out what is going on in the code.
Most blobs of code are basically prose. You can read through them and get a feel for what is happening quickly, but Lisp is more like poetry. There is a lot of meaning packed into each line and symbol. The ability to stack up abstractions is mind boggling. The Common Lisp flavor of Lisp, which is basically a standardization of a bunch of production Lisps is not a pure functional language. It is far from it, though it does have a lot of functional bits and with some restraint can be used like one might use a functional language. It's data structures are all very mutable. Lisp allows a variety of programming styles and paradigms. The crazy syntax is just a generic nested list (tree) syntax making the data and code pretty much the same thing. This has some heavy implications for meta-programming. I'll leave it at that for now. I'm going to keep investigating this crazy Lisp world for awhile before I do anything else.

