Posted by postfuturist on 2010-04-01 22:14:02
As you may remember from Part 1 of this series, I have embarked on the task of creating my own programming language. I wrote a lexer and parser for my theoretical language in Python using PLY. That is all well and good. The next part I need is a good runtime. Here are some of my options:
- Interpret the parse tree in Python.
- Generate machine code and build native executables.
- Target LLVM.
- Generate C code to be compiled by GCC.
- Target the JVM.
- Target the .NET/Mono CLR.
- Build it with Parrot.
- Build it with the Dynamic Language Runtime which sits on top of the CLR.
There are plenty of other options. I've already implemented a subset of the language directly in Python, the first one. I don't really want to spend a lot of time doing that because being interpreted inside Python is going to be laughably slow. I still might, as a reference implementation, but it doesn't sound like much fun. I don't really want to build a language that compiles natively, I want the language to be more dynamic than that. The best options look like JVM (weak dynamic language support, still), Parrot (a bit of a moving target), and the DLR. Now, I'm not a huge Microsoft fan, but they have certainly put a lot of effort into the .NET framework, and the DLR, to their credit, seems to be a fantastic bit of engineering. Since it is open source and available in the repos of some bleeding-edge Linux distros (I'm using Ubuntu 10.04 beta), I decided to go that route, or at least try.
I need to build my parse tree in some .NET language. Fortunately, IronPython exists to run my Python code inside the Mono environment. Unfortunately, PLY chokes when run on IronPython. I tried debugging the issue, and once I got past one problem, I found another. So, I've had to start over a bit. The first thing I did was rewrite the lexer in a subset of Python that runs on both CPython and IronPython. Here it is:
# lexer2.py | a simple lexer in python
import re
# Token class. This class is stolen from PLY
class LexToken(object):
    def __str__(self):
        return "LexToken(%s,%r,%d,%d)" % (self.type,self.value,self.lineno,self.lexpos)
    def __repr__(self):
        return str(self)
class lexer():
    def __init__(self,input):
        self.input = input
        self.pos = 0
        self.input_length = len(input)
        self.lineno = 0
        self.charno = 0
        self.reserved = {
            'print': 'PRINT',
            'func': 'FUNC',
            'if': 'IF',
            'elif': 'ELIF',
            'else': 'ELSE',
            'for': 'FOR',
            'in': 'IN',
        }
        # multi-character symbols come first so '->' wins over '-'
        self.symbols = [
            ('->', 'ARROW'),
            ('=', 'EQUALS'),
            ('*', 'STAR'),
            ('/', 'SLASH'),
            ('+', 'PLUS'),
            ('-', 'MINUS'),
            ('(', 'LPAREN'),
            (')', 'RPAREN'),
            ('{', 'LCURLY'),
            ('}', 'RCURLY'),
        ]
        self.complex = [
            (r'[a-zA-Z_][a-zA-Z0-9_]*', 'IDENTIFIER'),
            (r'"[^"]*"', 'STRING'),
            (r'-?\d+', 'NUMBER'),
        ]
        # list() keeps this working where dict.values() is not a plain list
        self.tokens = list(self.reserved.values()) + [x[1] for x in self.symbols + self.complex]
        self.complex_compiled = [(re.compile(r), t) for r,t in self.complex]
        self.symbols_compiled = [(s,t,len(s)) for s,t in self.symbols]
        self.ignore = " \t\n"
    def getTokens(self):
        return self.tokens
    def token(self):
        # skip ignored characters
        while(self.pos < self.input_length and self.input[self.pos] in self.ignore):
            if(self.input[self.pos] == "\n"):
                self.lineno += 1
                self.charno = 0
            else:
                self.charno += 1
            self.pos += 1
        if(self.pos == self.input_length):
            return None
        # regex-based tokens: identifiers, strings, numbers
        for cr,ttype in self.complex_compiled:
            mo = cr.match(self.input, self.pos)
            if(mo):
                value = mo.group()
                pvalue, pttype = self.process(value, ttype)
                tok = LexToken()
                tok.value = pvalue
                tok.type = pttype
                tok.lineno = self.lineno
                tok.lexpos = self.pos
                self.pos += len(value)
                self.charno += len(value)
                return tok
        # fixed-string symbols
        for sym,ttype,length in self.symbols_compiled:
            # <= (not <) so a symbol at the very end of the input still matches
            if self.pos + length <= self.input_length and self.input[self.pos:self.pos + length] == sym:
                tok = LexToken()
                tok.value = sym
                tok.type = ttype
                tok.lineno = self.lineno
                tok.lexpos = self.pos
                self.pos += length
                self.charno += length
                return tok
        print("invalid character : %s at %d , %d" % (self.input[self.pos], self.lineno, self.charno))
        return None
    def process(self, text, ttype):
        # turn reserved words into their own token types and convert literal values
        if(ttype == 'IDENTIFIER'):
            ttype = self.reserved.get(text, 'IDENTIFIER')
        elif(ttype == 'NUMBER'):
            text = int(text)
        elif(ttype == 'STRING'):
            text = text[1:-1]
        return text, ttype
if __name__ == "__main__":
    data = '''
a = 3 + 4 * 10
b = a + -20 *2
{ print(300) }
c = "hello"
print(b)
there->(now)
'''
    # Give the lexer some input
    lex = lexer(data)
    # Tokenize
    while True:
        tok = lex.token()
        if not tok: break # No more input
        print(tok)
It's a pretty simple lexer, but it produces exactly the same output as the last one, so it's good enough for now. I've had my head buried in compiler books and articles for the last few days. Parsing is more complex, so stay tuned for that.
Posted by postfuturist on 2010-03-24 23:55:37
In an earlier blog post I encouraged folks to write their own programming language for fun and education. It may not end up being useful, but you'll probably have fun and almost certainly learn something along the way. That said, I've been making my own baby-steps towards creating a programming language of my own. I don't have a name for it yet (other than "simple," because that's what it is) but it is coming along. And, yes, this is the fun part, designing the syntax. A warning for all the ninja hackers who write DSLs in their sleep: this is noob territory, so don't expect any new theories on tracing JITs or whatever is hip in the world of programming language nerds.
I'm using the terrific PLY, a Python implementation of Lex and Yacc. At first I started with Flex and Bison, the spiritual successors to Lex and Yacc, but I got bogged down with picking the right C data structures and related memory management. What a waste of time. The Python workflow is much faster. I ported my Flex and Bison code to PLY in an evening and then added code which builds a simple AST and code to "run" a sequence of expressions--all in the same evening. I am not really compiling anything, per se. I am just building an abstract syntax tree and then interpreting the tree. There is no machine code or even byte-code. Since it is Python interpreting the code, it is probably a couple orders of magnitude slower than would be reasonable for a production programming language. But it is a toy language and I'm just having fun, so here goes.
This is simplelex.py, the scanner (or lexer) which merely interprets a string of characters and emits a string of tokens based mostly on some regular expressions. Nothing here is new, most of it is straight out of the PLY documentation:
# simplelex.py | a simple lexer in python using ply
import ply.lex as lex
reserved = {
    'print' : 'PRINT',
    'func' : 'FUNC',
}
tokens = [
    'IDENTIFIER',
    'STRING',
    'NUMBER',
    'EQUALS',
    'STAR',
    'SLASH',
    'PLUS',
    'LPAREN',
    'RPAREN',
    'LCURLY',
    'RCURLY',
] + list(reserved.values())
t_EQUALS = r'='
t_STAR = r'\*'
t_SLASH = r'/'
t_PLUS = r'\+'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_LCURLY = r'\{'
t_RCURLY = r'\}'
def t_STRING(t):
    r'"[^"]*"'
    # strip the surrounding double quotes
    t.value = t.value[1:-1]
    return t
def t_IDENTIFIER(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    # reserved words get their own token type
    t.type = reserved.get(t.value,'IDENTIFIER')
    return t
def t_NUMBER(t):
    r'-?\d+'
    t.value = int(t.value)
    return t
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
t_ignore = ' \t'
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
lexer = lex.lex()
OK, I'm using some pretty naive string matching and silly naming of "*" and "/" operators, for fun. Remember, programming language design is fun. If you don't know what's going on in any of this code, look at the relevant PLY docs (the section on lexing) and it will all make sense. Basically, you can define tokens with a simple regular expression or a function that starts with a regular expression and can munge the input a bit before spitting it out, like stripping the double quotes off of a string literal.
Step 2 is a little more interesting, though still relatively trivial. That's the parsing stage. I'll give it to you in chunks:
# simpleyacc.py | a simple parser in Python using PLY
import ply.yacc as yacc
from simplelex import tokens
precedence = (
    ('right', 'EQUALS'),
    ('left', 'PLUS'),
    ('left', 'STAR', 'SLASH'),
)
def p_error(e):
    print('error: %s'%e)
def p_explist(p):
    '''explist :
               | explist expression'''
    if(len(p) < 3):
        p[0] = ('EXPLIST', [])
    else:
        p[1][1].append(p[2])
        p[0] = p[1]
The parser needs to know about the tokens defined by the lexer, so those get pulled in from simplelex.py. The precedence bit gives PLY some knowledge about the binary operators (the ones that operate with two values). They are listed from lowest to highest priority. Enough about that.
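To make the effect of that table concrete, here is a minimal sketch of a hand-written precedence-climbing parser that uses the same levels and associativity. This is not how PLY works internally (PLY generates LALR tables); the `PRECEDENCE` dictionary, the `parse_expression` function, and the `(type, value)` token pairs are all names invented for this illustration.

```python
# Priority level and associativity, mirroring the precedence tuple
# (lowest priority first). Hypothetical names, for illustration only.
PRECEDENCE = {
    'EQUALS': (1, 'right'),
    'PLUS':   (2, 'left'),
    'STAR':   (3, 'left'),
    'SLASH':  (3, 'left'),
}

def parse_expression(tokens, pos=0, min_level=1):
    """Parse tokens[pos:] into a tuple tree and return (tree, next_pos).

    tokens is a list of (type, value) pairs; operands are assumed to be
    NUMBER or IDENTIFIER tokens.
    """
    left = tokens[pos]
    pos += 1
    while pos < len(tokens):
        op = tokens[pos][0]
        if op not in PRECEDENCE:
            break
        level, assoc = PRECEDENCE[op]
        if level < min_level:
            break
        # For a left-associative operator the right operand may only
        # contain operators of strictly higher priority.
        next_min = level + 1 if assoc == 'left' else level
        right, pos = parse_expression(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos

# a = 3 + 4 * 10
tokens = [('IDENTIFIER', 'a'), ('EQUALS', '='), ('NUMBER', 3),
          ('PLUS', '+'), ('NUMBER', 4), ('STAR', '*'), ('NUMBER', 10)]
tree, _ = parse_expression(tokens)
print(tree)
```

Because STAR sits below PLUS in the table, the multiplication ends up deeper in the tree than the addition, and the assignment, being lowest, ends up at the root.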
The "explist" function defines the top-level nonterminal. A file or string of text will essentially be reduced to a list of expressions. In this example, they are formed left recursively, starting with the empty list and adding expressions on to the right. The doc-string for the function is basically Yacc/Bison-style BNF. The function itself gets passed an array (or indexable sequence of some sort) which represents the bits of the rule. p[0] always represents the left side of the rule, and you can assign any value to it that you like; it is what the parser will return. For this code, I am just having the parser rules return tuples where the first element is a name and the other parts are the important pieces of the code. In this first rule, the code either constructs a tuple with an empty list, or adds the expression to the existing explist and assigns that as the value of the outer explist.
What's interesting about this is that I am not defining the need for any kind of delimiter between expressions (save whitespace). I may retract this later, requiring new-lines (which will have to be reported instead of ignored by the lexer) or semicolons if you want to stack multiple expressions on the same line. That's a decision for later; right now it is perfectly legal to do something like:
a = 10 print(a).
I don't use the term "statement" to describe expressions. I may recant this later, too, but I honestly don't see the difference. It makes things simpler if everything has a value, even if that value is null. Here is some more of the simpleyacc.py file:
def p_expression_number(p):
    'expression : NUMBER'
    p[0] = ('NUMBER', p[1])
def p_expression_string(p):
    'expression : STRING'
    p[0] = ('STRING', p[1])
def p_expression_identifier(p):
    'expression : IDENTIFIER'
    p[0] = ('IDENTIFIER', p[1])
def p_expression_binaryop(p):
    '''expression : expression STAR expression
                  | expression SLASH expression
                  | expression PLUS expression
                  | expression EQUALS expression'''
    if(p[2] == '*'):
        p[0] = ('STAR', p[1], p[3])
    elif(p[2] == '/'):
        p[0] = ('SLASH', p[1], p[3])
    elif(p[2] == '+'):
        p[0] = ('PLUS', p[1], p[3])
    elif(p[2] == '='):
        p[0] = ('EQUALS', p[1], p[3])
def p_expression_print(p):
    'expression : PRINT LPAREN expression RPAREN'
    if(p[1] == 'print'):
        p[0] = ('PRINT', p[3])
def p_expression_parenthesized(p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]
All of these rules create the same thing: expressions. There is plenty of recursion going on, as expressions are defined in terms of other expressions as well as terminals (tokens from the lexer). It's a lot of BNF and constructing a tree of tuples. So far, so good. This code runs, so let's see the output from some input code. Here's the input:
a = 10 + 2
b = a * 3 print(b)
c = (b + 10) * 3
and this is the parse tree that is constructed:
('EXPLIST',
 [('EQUALS',
   ('IDENTIFIER', 'a'),
   ('PLUS', ('NUMBER', 10), ('NUMBER', 2))),
  ('EQUALS',
   ('IDENTIFIER', 'b'),
   ('STAR', ('IDENTIFIER', 'a'), ('NUMBER', 3))),
  ('PRINT', ('IDENTIFIER', 'b')),
  ('EQUALS',
   ('IDENTIFIER', 'c'),
   ('STAR',
    ('PLUS', ('IDENTIFIER', 'b'), ('NUMBER', 10)),
    ('NUMBER', 3)))
 ]
)
The code to "run" this AST is pretty straightforward: iterate through the expressions of the "explist" and get the values recursively, storing variables in a dictionary. That code will appear in later installments of this series when it is in a less embarrassing state. Let's look at some other things we can create with the parser and a problem I am having with my syntax.
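Until then, here is a minimal sketch of the idea, not the real implementation: a recursive `evaluate` function (a name made up for this example) that walks the tuple tree shown above and keeps variables in a plain dictionary. Treating SLASH as integer division and returning the value of the last expression from an EXPLIST are assumptions of the sketch.

```python
def evaluate(node, env):
    """Recursively evaluate a tuple-based AST node; env maps names to values."""
    kind = node[0]
    if kind == 'EXPLIST':
        result = None
        for expression in node[1]:
            result = evaluate(expression, env)
        return result                      # value of the last expression, or None
    if kind in ('NUMBER', 'STRING'):
        return node[1]
    if kind == 'IDENTIFIER':
        return env[node[1]]                # KeyError if the name is undefined
    if kind == 'EQUALS':
        target = node[1]
        if target[0] != 'IDENTIFIER':
            raise SyntaxError('can only assign to an identifier')
        env[target[1]] = evaluate(node[2], env)
        return env[target[1]]
    if kind == 'PRINT':
        print(evaluate(node[1], env))
        return None                        # everything has a value, even if it is null
    if kind in ('PLUS', 'STAR', 'SLASH'):
        left = evaluate(node[1], env)
        right = evaluate(node[2], env)
        if kind == 'PLUS':
            return left + right
        if kind == 'STAR':
            return left * right
        return left // right               # integer division: an assumption
    raise ValueError('unknown node type: %r' % (kind,))

# The parse tree from above.
tree = ('EXPLIST',
        [('EQUALS', ('IDENTIFIER', 'a'),
          ('PLUS', ('NUMBER', 10), ('NUMBER', 2))),
         ('EQUALS', ('IDENTIFIER', 'b'),
          ('STAR', ('IDENTIFIER', 'a'), ('NUMBER', 3))),
         ('PRINT', ('IDENTIFIER', 'b')),
         ('EQUALS', ('IDENTIFIER', 'c'),
          ('STAR', ('PLUS', ('IDENTIFIER', 'b'), ('NUMBER', 10)),
           ('NUMBER', 3)))])

variables = {}
evaluate(tree, variables)    # prints 36
print(variables)
```

After the run, the dictionary holds a = 12, b = 36 and c = 138.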
When I got to this point in my code, I realized that I wanted to add functions. The astute reader would have already noticed the 'LCURLY', 'RCURLY' and 'FUNC' tokens in the lexer that aren't used by the parser. I don't want C-style, stand-alone functions, I want functions to be first class. I want them to be expressions, too. So I came up with this pretty straight-forward syntax:
f = func (x y) { z = 10 print(x + y + z) }
In this example, "func" is a reserved keyword kind of like "function" in JavaScript or "lambda" in Python. It is followed by a list of identifiers inside of parentheses which is followed by a curly-brace delimited block which contains a list of expressions. Here's the code for that:
def p_identlist(p):
    '''identlist :
                 | identlist IDENTIFIER'''
    if(len(p) < 3):
        p[0] = ('IDENTLIST', [])
    else:
        p[1][1].append(p[2])
        p[0] = p[1]
def p_expression_block(p):
    'block : LCURLY explist RCURLY'
    p[0] = ('BLOCK', p[2])
def p_expression_function(p):
    'expression : FUNC LPAREN identlist RPAREN block'
    p[0] = ('FUNC', p[3], p[5])
The "block" could have just been part of the definition of function expression, but I made it separate as there is more than one use for a block, and I'll probably need it later. Also, if I want to complicate my life I may replace the curly brace syntax with a Python-style indentation system, or make it optional. Here's the parse tree for the trivial case (func () {}) :
('EXPLIST', [('FUNC', ('IDENTLIST', []), ('BLOCK', ('EXPLIST', [])))])
Here's where my initial plan shows some cracks. Let's say I add the following rule to add function calls to the language syntax:
def p_expression_func_call(p):
    'expression : IDENTIFIER LPAREN explist RPAREN'
    p[0] = ('FUNCCALL', p[1], p[3])
PLY complains:
Generating LALR tables
WARNING: 1 shift/reduce conflict
The problem is obvious: the grammar is ambiguous. Right now this input can be parsed two ways:
f (x)
It is either a single function call or two separate expressions, the second of which is just parenthesized. One obvious solution is to require something other than white space, like a new line or semicolon, between the expressions in an expression list; that is how Python solves this dilemma. I can think of other solutions:
- Use something other than parentheses to delimit function args :
f<x>
- Make a requirement that separate expressions have to have some white space between them :
f(x) vs. f (x).
- Some extra symbol or keyword for function calls :
f->(x) or f.call(x).
I'll go with the arrow "->" operator. I'll leave adding that symbol to the lexer as an exercise for the reader. Here's the parse rule, redone:
def p_expression_func_call(p):
    'expression : expression ARROW LPAREN explist RPAREN'
    p[0] = ('FUNCCALL', p[1], p[4])
In addition to the new syntax, which is unambiguous, I changed it so you can make a function call from any expression, since any expression may contain the value of a function. That means that this:
f = func (x) { print(x) } f->("Hello, World")
can be replaced with this:
func (x) { print(x) } -> ("Hello, World")
I really need to change the "print" builtin for consistency to this:
func (x) { print->(x) } -> ("Hello, World")
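For the curious, here is a rough sketch of how FUNC and FUNCCALL nodes could be interpreted once functions are first-class values. None of this is settled: the `evaluate` function, the `'CLOSURE'` tuple, and the scoping rule (the body runs in a copy of the defining environment, so its local variables do not leak out) are assumptions made only for this example, and only a handful of node types are handled.

```python
def evaluate(node, env):
    """Evaluate a small subset of the tuple AST, including function values."""
    kind = node[0]
    if kind == 'EXPLIST':
        result = None
        for expression in node[1]:
            result = evaluate(expression, env)
        return result
    if kind == 'BLOCK':
        return evaluate(node[1], env)      # a block wraps an explist
    if kind == 'NUMBER':
        return node[1]
    if kind == 'IDENTIFIER':
        return env[node[1]]
    if kind == 'PLUS':
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == 'EQUALS':
        env[node[1][1]] = evaluate(node[2], env)
        return env[node[1][1]]
    if kind == 'FUNC':
        # A function value: parameter names, body, and the defining scope.
        return ('CLOSURE', node[1][1], node[2], env)
    if kind == 'FUNCCALL':
        callee = evaluate(node[1], env)    # any expression may yield a function
        if not (isinstance(callee, tuple) and callee[0] == 'CLOSURE'):
            raise TypeError('value is not callable')
        _, params, body, defining_env = callee
        args = [evaluate(arg, env) for arg in node[2][1]]
        if len(args) != len(params):
            raise TypeError('expected %d arguments, got %d'
                            % (len(params), len(args)))
        local_env = dict(defining_env)     # copy: locals stay local (assumption)
        local_env.update(zip(params, args))
        return evaluate(body, local_env)
    raise ValueError('unknown node type: %r' % (kind,))

# f = func (x y) { x + y } f->(3 4)
program = ('EXPLIST', [
    ('EQUALS', ('IDENTIFIER', 'f'),
     ('FUNC', ('IDENTLIST', ['x', 'y']),
      ('BLOCK', ('EXPLIST', [('PLUS', ('IDENTIFIER', 'x'),
                              ('IDENTIFIER', 'y'))])))),
    ('FUNCCALL', ('IDENTIFIER', 'f'),
     ('EXPLIST', [('NUMBER', 3), ('NUMBER', 4)])),
])
result = evaluate(program, {})
print(result)
```

Because the callee is itself evaluated as an expression, calling an anonymous function directly needs no extra code.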
This is all good fun, but syntax is just the surface of a language. If Python had a C-like syntax on the surface, I'd still use it. It would still be duck typed, with powerful, built-in data structures, garbage collection, and a strong standard library. With Python you come for the syntax and stay for everything else. I don't think PHP is a poorer language because it is C-like or uses dollar signs at the beginning of variable names. I think it is poorer because of the weak typing, inconsistencies, and gotchas that permeate the language.
The programming language I am creating needs control flow, a type system, a runtime of some sort, and a basic library of essential functionality. Stay tuned for more on this.
Posted by postfuturist on 2010-02-15 02:10:26
Once upon a time I answered a question on LinuxQuestions, a slow, ugly website that uses vBulletin--your average, run-of-the-mill, commercial, PHP-based forum software. I couldn't just answer the question, I had to create an account, and give them an email address. Without warning, they started sending me weekly emails, reminding me how awesome they are, what's "happening" with the site and why I should return. Well, I've never been back to the site. There are a few reasons why I haven't been back to the site.
- It's ugly.
- It requires me to remember a password, or go through the hassle of resetting the password through email verification.
- I don't like to hang out on forums. Or, at least, I don't anymore.
- I actively avoid the site's links when they show up in search results.
vBulletin is bad. It powers a lot of forums in the world, and you can recognize it a mile away. I looked up vBulletin. Amazingly, people pay money for it. I could enumerate its sins for a long time, but I need to focus on one. In order to not receive weekly emails, you have to change your account settings. To change your account settings, you have to log into your account. If, like me, you don't remember passwords for sites you rarely visit, you have to reset the password through email. This is a familiar song and dance to many, but it angers me immensely. It is trivially easy to generate a single, safe, instant unsubscribe link for mailing list emails. Why don't LinuxQuestions emails have a one-click unsubscribe? Easy, they are idiots. They are stupid for using crappy, commercial forum software. They are idiots for spamming my inbox and forcing me to jump through a bunch of hoops to unsubscribe.
Sites like Stack Overflow use my OpenID login, which I know well. I can remember a single username / password for multiple sites without the fear of one site secretly stealing my password and using it to log in to another site as me. Trust me, more sites than you want to know store your password as plain text in their databases. Why am I picking on this one website? They are trying to get more visitors who provide more content, so they can extract advertising money or satisfaction that they are helping Linux users everywhere. They want to be a popular destination for a certain niche, but they completely fail to create a compelling experience in any way at all. It is so bad, I actively ignore search hits for their site.
Software matters. Usability matters. You can't just punt on the software. Invest in it. Pay a real designer and get a real programmer on your team. Unless you are independently wealthy, you need a software developer on your team. This software developer will want to make the site awesome, since he is personally invested in the success of the site. People want to know what makes a website successful. Ultimately, it's the folks writing the code who make or break a site. The software developer needs access to the code, so don't even dream of buying a closed-source, commercial solution. Start with the best open source solution (let the software developer pick), pay for a good design (don't do a design competition, that's exploitative), and let the software developer merge the two. Your software developer uses social web software all the time. He knows what's good. Force him to eat his own dogfood (use the site himself), and he'll work harder at improving the experience. Elicit feedback from your users and implement popular features.
The tragedy is that these old, broken forums end up loaded with useful information donated by the users despite the shortcomings of the software the site rests on. Don't post useful information on these sites. Please, don't encourage them.
Posted by postfuturist on 2010-02-09 01:53:09
I spent some time tonight working on the compiler I am writing for my own education. I started this project while only understanding bits and pieces of what it takes to write a compiler, but I am quickly learning more. Various texts on compilers start with creating a simple calculator program that takes a string like "10 + 2 * 6 - 4 / 2" and calculates the result. For this example the problem can be broken down into a few simple steps which are merely transformations:
- Transform the input into a string of discrete tokens. "10 + 3 * 4" becomes "10", "+", "3", "*", "4". This is achieved with a scanner/lexer like lex, or the newer flex.
- Transform the token stream into a parse tree using a BNF description of the grammar with a parser tool like yacc or the newer bison. Using prefix notation, our stream becomes ("+", "10", ("*", "3", "4")).
- Reduce the tree by walking it and calculating the value of each node, depth first, until you have calculated the value of the root node which is the value of the whole tree.
  - Evaluate left side of "+" operation. It is the constant value 10.
  - Evaluate right side of "+" operation.
    - Evaluate left side of "*" operation. It is the constant value 3.
    - Evaluate right side of "*" operation. It is the constant value 4.
    - Reduce: 3 * 4 is 12
    It is the numerical value 12.
  - Reduce: 10 + 12 is 22
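The reduction step can be sketched in a few lines of Python (my real code is C generated with Flex and Bison; this is only an illustration). The `reduce_tree` function and the `OPERATIONS` table are names invented here, and integer division for "/" is an assumption.

```python
import operator

# Map each operator token to the function that reduces it.
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.floordiv,   # integer division: an assumption
}

def reduce_tree(node):
    """Depth-first reduction of a prefix-notation tree to a single number."""
    if isinstance(node, str):
        return int(node)                  # a leaf is a constant
    op, left, right = node
    left_value = reduce_tree(left)        # evaluate the left side first
    right_value = reduce_tree(right)      # then the right side
    return OPERATIONS[op](left_value, right_value)

print(reduce_tree(("+", "10", ("*", "3", "4"))))   # prints 22
```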
The whole thing is straight-forward and almost insultingly easy. The tools I've been using are Flex and Bison which generate the scanner and parser, respectively, in C. While writing these parts to generate a parse tree, which could also be called an abstract syntax tree or AST, I had a realization. Code generation, whether it is byte-code or machine-code, is simple. The code does the same tree-walk, in the same order, but instead of evaluating and reducing the elements, just generate byte-code or machine-code which will do those simple operations at runtime. The program spits out a flat sequence of code. You are merely writing code that flattens the tree.
To put it another way, the sequence of mathematical operations that reduce the parse tree to a single value is the same ordered sequence of operations that the compiled code should represent. It's so obvious, it's painful. I just hadn't thought about it before. I thought there would be some deep voodoo involved in compilation, but there isn't. Your program is likely to be a series of expressions, not just one. All you have to do is evaluate them in order, tacking the byte-code or machine code for each expression on to the end of the code.
Here is a series of transformations that does what I am talking about:
- Input : "x = 10 + y * 3"
- Scanner : "x", "=", "10", "+", "y", "*", "3"
- Parser : ("=", "x", ("+", "10", ("*", "y", "3")))
- Tree walk :
  - Store the location of "x" in R0
  - Store "10" in R1.
  - Store the value at location "y" in R2
  - Store "3" in R3
  - Store the product of R2 and R3 in R4
  - Store the sum of R1 and R4 in R5
  - Store R5 at the location specified in R0
It's just one more step to take the list of operations and turn them into a list of coded instructions. At that point you are done.
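As a sketch of that flattening, assuming an unlimited supply of virtual registers and a made-up instruction set (`ADDRESS_OF`, `LOAD_CONST`, `LOAD_VAR`, `STORE` and the arithmetic opcodes are invented for this example, as is the `CodeGenerator` class), the same tree walk can emit instructions instead of values:

```python
OPCODES = {'+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV'}

class CodeGenerator:
    """Walks a prefix-notation tree and emits a flat instruction list."""

    def __init__(self):
        self.code = []
        self.next_register = 0

    def new_register(self):
        name = 'R%d' % self.next_register
        self.next_register += 1
        return name

    def generate(self, node):
        """Emit code for node; return the register holding its value."""
        if isinstance(node, str):
            register = self.new_register()
            if node.isdigit():
                self.code.append(('LOAD_CONST', register, int(node)))
            else:
                self.code.append(('LOAD_VAR', register, node))
            return register
        op, left, right = node
        if op == '=':
            address = self.new_register()
            self.code.append(('ADDRESS_OF', address, left))
            value = self.generate(right)
            self.code.append(('STORE', address, value))
            return value
        left_register = self.generate(left)
        right_register = self.generate(right)
        result = self.new_register()
        self.code.append((OPCODES[op], result, left_register, right_register))
        return result

generator = CodeGenerator()
generator.generate(("=", "x", ("+", "10", ("*", "y", "3"))))
for instruction in generator.code:
    print(instruction)
```

The seven instructions come out in the same order as the tree-walk steps listed above, using registers R0 through R5.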
One of the things I wanted with my compiler is type inference, so when I get to an "=" or "assignment" node of my tree, I evaluate the right side first, which gives me a type; then I know what the type of the identifier on the left should be. The first time an identifier is used should always be an assignment, and the type of the identifier is set at that time. If another assignment to the same identifier is made with a different type, the program should fail to compile. Here is the messy / incomplete code that handles this right now:
int genAssignByteCode(simpleblock b, simplenode n)
{
    if(n->R == NULL) {
        printf("No R-value in assignment!\n");
        exit(EXIT_FAILURE);
    }
    if(n->L == NULL) {
        printf("No L-value in assignment!\n");
        exit(EXIT_FAILURE);
    }
    if(n->L->type != IDENTIFIER) {
        printf("L-value is not an identifier!\n");
        exit(EXIT_FAILURE);
    }
    // evaluate the right side first; its register tells us the type
    int reg_num = genByteCode(b, n->R);
    if(reg_num == -1) {
        printf("The r-value has no value. Lame.\n");
        exit(EXIT_FAILURE);
    }
    enum reg_type rval_type = get_reg_type(b, reg_num);
    simplesymbol inmap = in_map(b, n->L->data.string);
    struct byte_code bc;
    if(inmap == NULL) {
        // new value with big, scary TYPE INFERENCE
        int newr = new_reg(b,rval_type);
        // register the identifier on the left side, the same name looked up above
        new_symbol(b, n->L->data.string, newr);
        if(rval_type == rt_int) {
            bc.t = BCI_STORE_INT;
        } else if(rval_type == rt_string) {
            bc.t = BCI_STORE_STR;
        }
        bc.reg_a = newr;
        bc.reg_b = reg_num;
    } else {
        // existing value, better do some static type checks
        enum reg_type lval_type = get_reg_type(b,inmap->reg_num);
        if(rval_type != lval_type) {
            printf("You got the wrong type, pal.\n");
            exit(EXIT_FAILURE);
        }
        if(rval_type == rt_int) {
            bc.t = BCI_STORE_INT;
        } else if (rval_type == rt_string) {
            bc.t = BCI_STORE_STR;
        }
        bc.reg_a = inmap->reg_num;
        bc.reg_b = reg_num;
    }
    new_code(b,bc);
    return bc.reg_a;
}
Basically, simpleblock is a typedef'ed pointer to a struct that represents a block of code. It keeps an array of allocated virtual registers, an array of instructions, and a linked list of symbols (identifiers). This code generates the byte-code for the assignment operation. It is called from the genByteCode function, which it calls recursively to evaluate the right side of the statement. The line new_code(b,bc); actually appends the byte-code bc to the array of instructions in block instance b. This is C code, and as such, it is really ugly. Also, the error messages are unhelpful and quite rude. This is a rough draft of sorts. There are a lot of parts not in the code that would help explain it, too.
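Stripped of the register bookkeeping, the inference rule itself is tiny. Here is a sketch of just that rule in Python rather than C, using tuple nodes such as ('EQUALS', ('IDENTIFIER', 'a'), ('NUMBER', 10)). The `infer_type` function, the `TypeMismatch` exception, and the type names 'int' and 'string' are all invented for this illustration and do not appear in the C code.

```python
class TypeMismatch(Exception):
    """Raised when an identifier is reassigned with a different type."""

def infer_type(node, symbols):
    """Return the type of node, recording identifier types in symbols."""
    kind = node[0]
    if kind == 'NUMBER':
        return 'int'
    if kind == 'STRING':
        return 'string'
    if kind == 'IDENTIFIER':
        if node[1] not in symbols:
            raise NameError('%s used before assignment' % node[1])
        return symbols[node[1]]
    if kind == 'EQUALS':
        target = node[1]
        if target[0] != 'IDENTIFIER':
            raise SyntaxError('L-value is not an identifier')
        right_type = infer_type(node[2], symbols)   # right side first
        name = target[1]
        if name not in symbols:
            symbols[name] = right_type              # first assignment fixes the type
        elif symbols[name] != right_type:
            raise TypeMismatch('%s is %s, cannot assign %s'
                               % (name, symbols[name], right_type))
        return right_type
    raise ValueError('unknown node type: %r' % (kind,))

symbols = {}
infer_type(('EQUALS', ('IDENTIFIER', 'a'), ('NUMBER', 10)), symbols)
infer_type(('EQUALS', ('IDENTIFIER', 'b'), ('IDENTIFIER', 'a')), symbols)
print(symbols)

try:
    infer_type(('EQUALS', ('IDENTIFIER', 'a'), ('STRING', 'hello')), symbols)
except TypeMismatch as error:
    print('compile error: %s' % error)
```

Both a and b end up typed as int, and the attempt to assign a string to a is rejected.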
While constructing this part, I realized that it is very easy to do type inference. Java and C# took years and years to acquire type inference of any kind, but I don't see what all the fuss is about, it is quite simple to implement, really. Compilers are not scary, go write one of your own already. Go create that perfect programming language. Do it now, or else you'll spend the rest of your life writing software in programming languages that you don't like and you'll only have yourself to blame.
Posted by postfuturist on 2010-02-07 21:52:12
In an earlier post I wrote my wish-list for the perfect programming language (for me). I believe most programmers have a perfect programming language in mind. It usually goes something like this: "I really like the object model of language W and the functional aspects of language X and the syntax of language Y. It would be great if it ran on Z runtime / virtual machine, too." Then there are the language zealots who say that language Foo is perfect, or at least better than all the other commonly used languages and with such a large user base and library of reusable code, that even if they could fix the bits of the language they don't like, it wouldn't be worth it because it would break existing code. In a sense, they've settled, put their tent pegs in the ground and encourage others to set up camp where they are, so they can all share code. There is certainly a lot of value in just spending a lot of time on one language so you can go deep with it and develop a strong set of idioms and patterns. It also leads to a large number of very exclusive camps. To be fair, many programmers dabble in multiple languages and paradigms, but most do not.
I often am offended by programming language zealots. Just like religious zealots, they are myopic, self-important and insulting towards outsiders. I love much of the writings of Paul Graham, but I find his statements about Lisp and all that which is not Lisp to be difficult and often insulting. In his article "Beating the Averages," Graham talks about a hypothetical language Blub that a programmer gets attached to. This programmer sees less powerful programming languages for what they are, but doesn't understand the abstractions and power of better programming languages. The programmer thinks he is using the best programming language, but he isn't. He isn't using the best programming language, because... he isn't using Lisp. Period. Graham states explicitly, "Lisp is so great not because of some magic quality visible only to devotees, but because it is simply the most powerful language available. And the reason everyone doesn't use it is that programming languages are not merely technologies, but habits of mind as well, and nothing changes slower." Again, I like Paul Graham's writings for the most part, but saying that he is correct about this because he seems to be correct about so many other things would be a fallacy as much as saying that he is wrong because everything else he says is wrong.
I agree with many points in Graham's article, more or less, but I don't think that most programmers are small-minded Blub programmers. I agree that Lisp macros are powerful, that it is a powerful concept to write programs that write programs, etc., but Lisp is hardly the only language that does that. Boo, a language that actually fulfills my wish-list, has macros which are very similar in functionality to Lisp macros. You get access to the actual parse tree and can create or change or augment language features in the language itself. It is a powerful feature, but it is not the most powerful feature, at least not for me. It's not that I have a small mind, either. I understand that Lisp macros are awesome, but it is not an abstraction I need in day to day programming to get things done. I'm a big fan of getting things done, and building systems to help get them done faster and better. Graham argues that he used Lisp with lots of macros in a business that did well, with software that delivered functionality not present in the competitor's projects, and that this was all because of Lisp macros, which comprised at least 20% of his product's code. I'm sure there was a correlation between having features and being successful, but the 20% macros thing may just be indicative of the fact that he had to add a lot of things to Lisp to be able to build the software he wanted. It could also be indicative of the fact that he knows how to write macros. Just because the competitors did not deliver the same features does not mean that they didn't or couldn't because they used some less powerful programming language. The software he wrote could likely have been written in any number of languages by any number of programmers. The programmers that toil away in so-called Blub languages oftentimes are quite good at what they do, and can deliver features if they can think them up or are allowed to add them or whatever.
Look at Facebook, they rock the world with the power of PHP, a pretty awful language by the estimation of many. I just doubt that Lisp was the reason for Paul Graham's success.
I said all that to say this: Lisp worship aside, it is a good idea for programmers to learn things about programming languages that they don't understand. It is healthy to stretch your mind with different perspectives. I believe that it is a universally accepted truth that looking at a problem from different mental perspectives causes us to have breakthroughs of new ideas and understanding. One excellent way to expand your mind as a programmer is to create your own programming language--not just a theoretical grammar, but an actual working compiler and/or interpreter.
I have to admit, I've been a little lackadaisical regarding hacking code in my free time. I think I haven't been able to get excited about a project. I've announced, prematurely, some projects in the past that have been since abandoned. Perhaps it is that I spend so much time at work "getting things done" that I relish the freedom to not get things done outside work. It's true, I like messing around with different programming languages and not committing to projects. However, since the programming language wish-list post, I've been drawn to my newest project over and over. I am working on a new programming language. It is really in the very earliest of toy language stages right now. I am really just learning things as I go. I am learning how to use flex and bison, the descendants of lex and yacc, what an abstract syntax tree is, and many other things. I've read a lot of articles about different aspects of compiler / interpreter design and techniques. Some things have inspired me a great deal, like V8, the JavaScript engine that Google created. I've been interested and inspired by byte-code interpreters, JIT compilation, garbage collection, stack-less interpreters, coroutines, closures, and many other things.
As you can imagine, I have a laundry-list of things I would like to implement in my very own programming language, but this project is mostly about learning, not the end product. I know most folks with a BSCS have taken a course on compilers and have written a compiler of some form or other. With all the instruction on compilers that has taken place, why do we settle for such mediocre languages? Python 3 was a very small step forward from Python 2.x. It was such a small step that I wonder what the point was. They cleaned up the syntax and some of the libraries a little bit, but for a backwards compatibility breaking change, they sure didn't do much to fix the inadequacies of the language. The "hot" programming languages these days are incredibly old--at least in internet years. Ruby is 15 years old. Python is 19 years old. Why, after developing a science around compilers, do we still use languages that require semicolons on nearly every line? If Ruby and Python are each at least a decade and a half old, why do so many companies write code in C#, Java, PHP, and so many other languages that have vestigial syntax from C? Perhaps it is the slowness of businesses. I don't think so. I think it is laziness.
Software developers have the ability to write their own compilers. Yet, they do not. It is intellectual laziness. We are blacksmiths. We can forge our own tools, yet we use the crappy ones handed down to us from old. There are established techniques for creating new tools, yet we forge on with the tools given to us. For shame. What if every developer wrote their own programming language? Sure, most would fall by the wayside, and we need developers to write libraries, too, and to research other concerns like concurrency that don't necessarily need to be solved at the language level. There are millions of software developers in the world.
Python is dead. If you have the chance, fire up the python interpreter some time and type in "import this" and hit enter. You will discover a creed, a philosophy that underpins many of the decisions surrounding the creation and maintenance of Python. It is a philosophy I disagree with on many points. Not only that, but the Python language fails to fulfill a good number of them. "In the face of ambiguity, refuse the temptation to guess." Python does not allow type annotations; everything is guesswork. "foo = bar" could mean a lot of things: it could be changing the value referenced by "bar" or it could be creating a new reference. "bar" could be anything: a number, a string, an object. Looking at Python code is often a lot of guesswork. Duck typing is the only typing available. When you create a function, you can only name the inputs, not define the type. It seems to fail here. Here is another: "There should be one-- and preferably only one --obvious way to do it." That line gets a lot of deserved criticism. It's not even true. Python is a very powerful language that offers a very large number of ways to solve any problem, and usually several of them could be considered obvious, depending on the style of code you normally write. Regardless, even if you could write a language that only had one obvious way to solve each problem, you would have a very poor language. "If the implementation is hard to explain, it's a bad idea." CPython has the GIL, both hard to explain and a very bad idea.
I encourage every developer to write their own programming language. It will break us out of the complacency that allows us to toil day-in and day-out using the same leaky, broken abstractions. We can write our own tools. I've seen the HTTP 1.1 RFC; it isn't that complicated, and I think the tools we use now to implement server-side code for web development are quite broken. PHP is just awful. Python is a little better, as is Ruby. V8 / JavaScript / Node.js has some promise, but JavaScript is a pretty broken language, too. It's got that semicolon problem, plus weird object / array / function rules that lead to ambiguity and confusion. At the end of the day, you can't have everything in one language, though they are certainly trying to do that with Perl 6. We can have better choices, or at least break through the complacency.
I don't want to sound completely critical. Change is happening. People who are stuck with certain runtimes, like the JVM or .NET, have created or ported better languages for the platform. There is quite a bit of excitement about new compiler tools, runtimes, front-ends, back-ends, etc. The LLVM project has gained a lot of attention lately, with its Clang compiler becoming completely self-hosting.
My own project is just barely getting going. I have the code constructing a complete AST, and I am in the process of writing the part that generates byte-code. I've learned so much already about grammars, tree manipulation, and parsing. I have ideas about register allocation and other things. It is quite the experience to get my hands dirty writing C code again. I am taking advantage of tools I didn't know about in the past, like the Boehm garbage collector. I've spent too long in garbage collected languages to have to worry about freeing every bit of memory by hand now. That and generous use of typedefs make the experience not entirely unpleasant.
Posted by postfuturist on 2010-02-03 01:51:08
There is an article on the Postabon Blog titled "Make Lisp 15x faster than Python or 4x faster than Java." In his benchmark, which is a tightly nested loop doing some trigonometric calculations, he achieves speed similar to Java with unoptimized Lisp and a 4X speed increase over that with some Lisp optimizations. He also compares it with Python which is silly because Python's slowness is a known and insurmountable problem with the language.
I've been looking for a programming language with some of the feel of Python but with static, inferred typing. There are a couple, but Genie caught my eye so I wrote his algorithm in Genie. Basically, the differences between Genie and Python I noticed in porting the code are that functions need type declarations (something I have yet to see inferred in any statically typed language) and that variables are initialized with the "var" keyword. Genie does away with Python's use of the colon, which is redundant, and instead uses it for type declarations.
Here is the new code.
[indent=4]

uses
    GLib

def radians(n : double) : double
    return n * 0.0174532925

def distance(latA : double, lngA : double, latB : double, lngB : double) : double
    var radius = 6371.0
    var latAr = radians(latA)
    var lngAr = radians(lngA)
    var latBr = radians(latB)
    var lngBr = radians(lngB)
    var deltaLat = latBr - latAr
    var deltaLng = lngBr - lngAr
    return radius * 2 * Math.asin(Math.sqrt(Math.pow(Math.sin(deltaLat/2),2.0) \
        + Math.cos(latAr) * Math.cos(latBr) * (Math.pow(Math.sin(deltaLng/2),2.0))))

def bench()
    var increment = 2.5
    var latA = -90.0
    while(latA <= 90.0)
        var lngA = -180.0
        while(lngA <= 180.0)
            var latB = -90.0
            while(latB <= 90.0)
                var lngB = -180.0
                while(lngB <= 180.0)
                    distance(latA, lngA, latB, lngB)
                    lngB = lngB + increment
                latB = latB + increment
            lngA = lngA + increment
        latA = latA + increment

init
    bench()
This code runs on my laptop in about 35 seconds, which as far as I can tell from the blog post, is about as fast as the optimized Lisp. Of course, Genie is translating the code into pure C, and then compiling it to an executable. Syntax is a matter of personal taste, and I like the way Genie works. Since it compiles to a pure binary, there is no runtime. It leverages existing C libraries, using GLib objects as its native object system. The C API is universal and Genie can emit C header files too, so I could see myself writing Genie code that integrated with almost any software development platform.
For comparison, here is the somewhat verbose C code generated by valac, the Genie / Vala compiler.
#include
#include
#include
#include
#include
#include
double radians (double n);
double distance (double latA, double lngA, double latB, double lngB);
void bench (void);
void _main (char** args, int args_length1);

double radians (double n) {
    double result;
    result = n * 0.0174532925;
    return result;
}

double distance (double latA, double lngA, double latB, double lngB) {
    double result;
    double radius;
    double latAr;
    double lngAr;
    double latBr;
    double lngBr;
    double deltaLat;
    double deltaLng;
    radius = 6371.0;
    latAr = radians (latA);
    lngAr = radians (lngA);
    latBr = radians (latB);
    lngBr = radians (lngB);
    deltaLat = latBr - latAr;
    deltaLng = lngBr - lngAr;
    result = (radius * 2) * asin (sqrt (pow (sin (deltaLat / 2), 2.0) + ((cos (latAr) * cos (latBr)) * pow (sin (deltaLng / 2), 2.0))));
    return result;
}

void bench (void) {
    double increment;
    double latA;
    increment = 2.5;
    latA = -90.0;
    while (TRUE) {
        double lngA;
        if (!(latA <= 90.0)) {
            break;
        }
        lngA = -180.0;
        while (TRUE) {
            double latB;
            if (!(lngA <= 180.0)) {
                break;
            }
            latB = -90.0;
            while (TRUE) {
                double lngB;
                if (!(latB <= 90.0)) {
                    break;
                }
                lngB = -180.0;
                while (TRUE) {
                    if (!(lngB <= 180.0)) {
                        break;
                    }
                    distance (latA, lngA, latB, lngB);
                    lngB = lngB + increment;
                }
                latB = latB + increment;
            }
            lngA = lngA + increment;
        }
        latA = latA + increment;
    }
}

void _main (char** args, int args_length1) {
    bench ();
}

int main (int argc, char ** argv) {
    g_type_init ();
    _main (argv, argc);
    return 0;
}
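For readers who want to see what the benchmark actually computes, here is a rough Python sketch of the same calculation, which is the haversine great-circle distance. The snake_case names and the `increment` parameter on `bench` are my own additions for illustration, and the 6371.0 radius is assumed to be in kilometres. It is not the code from the Postabon post.

```python
import math

EARTH_RADIUS = 6371.0  # same constant as the Genie code; assumed to be km


def radians(n):
    # Same degree-to-radian constant as the Genie version.
    return n * 0.0174532925


def distance(lat_a, lng_a, lat_b, lng_b):
    """Great-circle distance between two points via the haversine formula."""
    lat_ar, lng_ar = radians(lat_a), radians(lng_a)
    lat_br, lng_br = radians(lat_b), radians(lng_b)
    delta_lat = lat_br - lat_ar
    delta_lng = lng_br - lng_ar
    h = (math.sin(delta_lat / 2) ** 2
         + math.cos(lat_ar) * math.cos(lat_br) * math.sin(delta_lng / 2) ** 2)
    return EARTH_RADIUS * 2 * math.asin(math.sqrt(h))


def bench(increment=2.5):
    """Run the four nested loops of the benchmark; return the call count."""
    calls = 0
    lat_a = -90.0
    while lat_a <= 90.0:
        lng_a = -180.0
        while lng_a <= 180.0:
            lat_b = -90.0
            while lat_b <= 90.0:
                lng_b = -180.0
                while lng_b <= 180.0:
                    distance(lat_a, lng_a, lat_b, lng_b)
                    calls += 1
                    lng_b += increment
                lat_b += increment
            lng_a += increment
        lat_a += increment
    return calls
```

With the default increment of 2.5 the loops make over a hundred million calls, so expect the pure Python version to take a long time; passing a coarser increment such as `bench(90.0)` is enough to check that it runs.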
Posted by postfuturist on 2010-02-01 02:51:32
I am beginning to question the appropriateness of visual aids in professional software development. When I only occasionally had to look at database structures and test out SQL queries, I used tools like phpMyAdmin and MySQL Workbench which provide GUI visualizations and click-friendly database management. I now look back at those times in the same way that a young adult reader might look back at a time when he cherished picture books. Once I managed to overcome the intimidating nature of the mysql command-line tool, I found myself spending more and more time in that tool and less and less in the visual tools. Visual design tools have always been a crutch--useful for getting started, but eventually a hindrance to getting things done quickly and efficiently.
Using the mysql command line tool helps me be a better database designer. First of all, I have to communicate with the tool in SQL with a few helper functions. It responds by printing out information in ASCII formatted tables. I only need to know a handful of basic commands to get started.
USE dbname -- sets the active database
SHOW TABLES -- lists the tables in the active database
DESC tablename -- shows the structure of the table
SHOW CREATE TABLE tablename -- shows SQL to create table
The SHOW CREATE TABLE command is ultimately more useful than the DESC command because it gives you a better idea of how the table was created, and how to create similar tables in the future. I learned quite a bit about MySQL by using this command to inspect various tables.
Using command-line tools to interact with systems like databases puts me closer to the action. After a while I feel like I am getting my hands dirty, and I get a feel for the data. At times this feeling can be negative. Working with MyISAM tables can be painful when the database allows me to put the data into invalid states. I get a visceral sense for the inadequacy and brittleness of MyISAM when I manipulate it by hand. InnoDB tables feel a little more constricting at times, being much stricter about data integrity, but this strictness also brings confidence in the safety of an ACID compliant database.
MySQL is just one example of how native command-line interfaces, however daunting, are often more powerful tools for interacting with computing systems than the fancy GUI tools. Perhaps, with concerted effort, the visual tools could include some of the efficiencies of the command-line tools, but for the most part they feel slow and stupid. Other tools I use more on the command line than anywhere else are revision control tools and testing frameworks.
Posted by postfuturist on 2010-01-27 01:43:42
I work at a very small software company that has as clients other small businesses. We do custom software. In 2010 that means we do web development. We bid on projects, complete the projects we get, fix any bugs for free for 6 months, and charge a very competitive hourly rate for new features. We don't have failed projects, per se. Sometimes a customer might pay us to create something that doesn't end up helping their business, but the software we create is of reasonably high quality and does what it claims to do. It's tough times economically, so when a customer wants us to create a new feature on their web page, we generally try to get it done in hours, not days. Some days I close several tickets. I feel like I am delivering quite a bit of value to our customers. Now we support our products with a 6 month guarantee of free bug fixes, which means if I spend 2 hours on a small feature, I need to be reasonably certain that it works correctly, because if it comes back with a major issue two weeks later, we don't charge the customer for the time it takes to fix. The more time I spend on a feature, the less perceived value the client is getting because it is costing them more. However, if I do the feature in less time, and it turns out to have a bug, any cost savings the customer might have seen go away the moment that the feature fails in production and they suffer real business losses. There are books upon books written on this topic, but I'm just going to share a few things that I've learned in my short career thus far in software development.
Software development doesn't need to be slow to be high quality. I have been a slow convert to the unit testing religion, but now I am just about there. These days when a client wants a new feature, I find the test classes that correlate to the class I'm messing with, write some tests for the new functionality which fail. I type in some code, run the tests and repeat until they are all passing. I run all the unit tests for the project (which currently runs very quickly as we haven't been doing this for a long time) to make sure I didn't inadvertently break anything else and ship it. It's fast and I see very few bugs slip through this way.
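The write-a-failing-test-first loop described above looks roughly like this. The sketch uses Python's `unittest` instead of the PHP test classes from our actual projects, and `apply_discount` is a made-up feature invented purely for illustration.

```python
import unittest


def apply_discount(total, percent):
    """Hypothetical feature under test: take a percentage off an order total."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (100 - percent) / 100.0, 2)


class ApplyDiscountTest(unittest.TestCase):
    # These tests are written first and fail until apply_discount exists
    # and behaves as specified.
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_zero_percent_changes_nothing(self):
        self.assertEqual(apply_discount(50.0, 0), 50.0)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)
```

Saved to a file, this can be run with `python -m unittest`, first for the one class and then for the whole project before shipping.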
Develop a small set of simple, high quality tools. Most of our work is in PHP. After you are done groaning about how awful PHP is, read on. A lot of our work consists of rescue operations. Small business owners and budding entrepreneurs, bless their little hearts, don't usually have a lot of cash to throw around, so they take the lowest bid for their website work. The lowest bid is usually a company from India or Romania or some other exotic locale where US dollars are still worth quite a bit. These thrifty folks end up with a half-working, extremely buggy piece of software which is basically a single folder with a whole bunch of files in it ending with ".php". The internals of these files would make any software developer worth his or her salt cringe. You and I know that the best thing to do would be to throw the whole thing out, and start from scratch, preferably with a language like Python or Ruby. Well, the client doesn't know that, they come to us to fix the code. Well, we have a tool, it is a very lightweight PHP framework that can integrate nicely with existing code. I have used it so much that I am at a point where it feels natural to use it to solve problems. Our clients do pay us to fix their code which we do gladly, and efficiently, usually replacing the broken parts completely with clean, MVC style code.
Imperative, object-oriented code is efficient for writing web software. I've read a bunch about heavy OO patterns and this and that, and I think most of it is crap. Nobody needs all that. What customers need is to create related hunks of data, which I call "rows" from a database perspective, or "objects" from a programming perspective. I have created my very own simple, Active Record style ORM for PHP and MySQL. Basically, you give me an existing database table. In about 2 minutes I can slap together about 20 lines of boilerplate code (mostly informing the ORM of the name and data type of each column) which gives me the ability to easily do CRUD operations (create, read, update, delete) on your database table. The ORM generates all the SQL in the background, so the only code I have to write is PHP. The columns of your database become public members of the class instances (objects). Now, according to the rules of OO, this is a big mistake. Well, really, it's not. You've got collections of data to move around and use, so why hide everything behind a bunch of pointless accessors? All that extra data-hiding cruft just slows things down. What the customer wants is a bunch of data that gets dumped into the correct parts of the HTML. The customer doesn't care about "proper OO" and neither should you. They are not paying you hourly to implement the latest design pattern you read about.
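To give a feel for the idea, here is a minimal sketch of an Active Record style base class. It is not my PHP ORM: it uses Python and an in-memory SQLite database as stand-ins, and the `Row` and `Customer` classes and the `customers` table are invented for the example. Columns are plain public attributes, and the SQL is generated behind the scenes.

```python
import sqlite3


class Row:
    """Tiny Active Record style base class (illustrative only)."""
    table = None   # table name, set by each subclass
    columns = ()   # column names other than the primary key "id"
    db = None      # shared sqlite3 connection

    def __init__(self, **values):
        self.id = None
        for name in self.columns:
            setattr(self, name, values.get(name))  # columns are public members

    def save(self):
        values = [getattr(self, c) for c in self.columns]
        if self.id is None:  # create
            marks = ", ".join("?" for _ in self.columns)
            sql = "INSERT INTO %s (%s) VALUES (%s)" % (
                self.table, ", ".join(self.columns), marks)
            self.id = self.db.execute(sql, values).lastrowid
        else:                # update
            sets = ", ".join("%s = ?" % c for c in self.columns)
            sql = "UPDATE %s SET %s WHERE id = ?" % (self.table, sets)
            self.db.execute(sql, values + [self.id])

    @classmethod
    def find(cls, row_id):   # read
        sql = "SELECT %s FROM %s WHERE id = ?" % (
            ", ".join(cls.columns), cls.table)
        found = cls.db.execute(sql, (row_id,)).fetchone()
        if found is None:
            return None
        obj = cls(**dict(zip(cls.columns, found)))
        obj.id = row_id
        return obj

    def delete(self):
        self.db.execute("DELETE FROM %s WHERE id = ?" % self.table, (self.id,))
        self.id = None


class Customer(Row):
    # The per-table boilerplate: just the table name and its columns.
    table = "customers"
    columns = ("name", "email")


Row.db = sqlite3.connect(":memory:")
Row.db.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
```

Usage is then `c = Customer(name="Ada", email="ada@example.com"); c.save()`, followed by `Customer.find(c.id)` to read the row back.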
Use a dynamic, "scripting" language like PHP, Python, or Ruby (or Perl if you are a masochist). I now point Apache directly at my project folder so when I'm done typing, I hit a few keys which save my work, switch to a console to run a test or switch to a browser and reload the page. I see the results of what I typed within about one second of when I finished typing. There is no deploy step, no compile step, no waiting, no distraction, just instant feedback. It's amazing, and it has revolutionized my workflow. Remember, our clients are paying hourly rates, which when you break it down into minutes is generally some amount of money greater than $1 per minute. A five minute compile/deploy step costs the client several dollars each time. I remember writing code in Visual Studio, I believe the F5 button was the compile/run shortcut. Hitting that on a reasonably complex piece of software was like an invitation to go get a snack, start up a conversation with a coworker or get distracted with interesting articles on the internet. The big secret is of course that every programming language gets compiled, it's just that C++ compiles really slowly, has an extra pre-processor step and has to be linked with a bunch of other code in a time consuming way. Things like Python get compiled as they are read into the interpreter into byte code which is then executed by the interpreter immediately. That just happens to be really fast, plus the byte code is cached to disk oftentimes. In the .01% of the time when the simplest possible code in your scripting language of choice isn't fast enough for your client's need, first check to see if you can fix it with a better algorithm, if not, rewrite the critical, innermost loop in C.
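The claim that Python compiles to byte code before running is easy to check from the interpreter itself. This is a small sketch using the built-in `compile` and `exec` functions and the standard `dis` module; the one-line `source` string is just an arbitrary example.

```python
import dis

source = "total = sum(n * n for n in range(10))"

# Source text is compiled into a code object holding byte code...
code = compile(source, "<example>", "exec")

# ...which the interpreter then executes immediately.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # prints 285

# Show the byte code instructions the interpreter actually runs.
dis.dis(code)
```

The exact instructions printed by `dis.dis` vary between Python versions, which is why none are shown here.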
In conclusion, I may use a lame tool (PHP) and butcher "best OO practices" and make babies cry by completely ignoring "functional" programming, but I get things done. Not only that, but I get things done well, with a low defect rate, and quickly. The code I write is easily maintainable and ships with good test coverage. The "good, fast, cheap: pick two" thing is a lie. With a familiar, well-honed collection of tools any software developer can do all three.
Posted by postfuturist on 2010-01-14 02:13:03
I write custom software for a living. It's usually fun, sometimes challenging, often tedious, but overall gratifying in a number of ways. One of these ways is that I get to be creative and build my own tools that help me write better software that is quicker to write, easier to support, and is more stable. Software is often boring, and the boring parts get all the tools, abstractions, DSL's, and the like. Programming languages provide the abstraction layer to build these things. We use programming languages like tools in a tool belt, pulling out the "right tool, for the right job." Many of these "right tools" are just general purpose programming languages that happen to have a feature or abstraction that another general purpose programming language doesn't. Even though we've been developing programming languages for the last 60 years, software is still being written in languages that are so severely lacking that programmers are little more than data entry drones.
I don't propose that any particular language is THE ANSWER. Such a thing does not exist, at least not yet. There aren't even any particularly fantastic ones. The more I investigate other languages, the more I see features that I wish existed in the languages I use every day to get things done. So, I'd like to compile a list. There's no reason why these things can't coexist in the same programming language. Most of the programming languages we use are very old and as certain ideas drift into other languages, the need for backwards compatibility causes new features to be implemented in strange ways. For example, object oriented programming being tacked onto old programming languages has created all sorts of syntactical abominations: C++, Perl 5 blessed references, and Objective-C to name a few.
Last rant before the list: I've been told that every Computer Science student takes a class on compilers and yet, we still write programs in PHP, Java, and C++. I'm just a lowly English major but I think we can do better. I've been reading Coders at Work, a collection of interviews with important software developers. Many of the interviews are with designers or people influential in the creation or design of popular or important programming languages such as JavaScript, Haskell, Erlang, Common Lisp, and Smalltalk. It's really interesting stuff, but it shows how most of these languages have flashes of brilliance, pieces that add to the puzzle, but are also missing out on a lot of innovations. That said, here is my list:
1. Classes and namespaces. It doesn't need to be object oriented, as that implies that everything has to be an object, even the entry point of your program. Java is object oriented, and I consider it a fault. Too many objects and too many classes. That said, classes of objects are nice abstractions that allow the programmer to arrange code and data in useful ways. Similarly, namespaces are a nice way of avoiding name collisions and organizing code. I want to be able to put stand-alone or helper functions in namespaces, and not be forced to create static functions as part of class definitions. Not everything has to be part of a class. Python gets this right (modules as namespaces).
2. Arbitrary data structures. Sometimes data enters a program from an external source like a database, or from a web browser, or a file. It is very handy to be able to take something like a string of JSON or XML and just convert it into a schema-less blob of data, native to your programming environment. JavaScript does this particularly well with dynamic objects and arrays that can nest in arbitrary ways. The literal notation for such lists and objects is so simple and straightforward in JavaScript that it is the de facto standard for data exchange, thanks in part to Doug Crockford who codified it. Most languages we would consider "dynamically typed" do a decent job of this, like Python, Perl, Ruby, and even PHP. Most languages we consider "statically typed" do not. Do I even have to name names?
3. First class functions. Functions need to be able to move around, breathe, combine and create other functions. This allows the programmer to create innumerable abstractions. C++, Java, and PHP just completely fail at this. Python does a half-assed job--sorry, but Python just kind of sucks this way. JavaScript, Perl, and many other languages do a better job.
4. Static and dynamic typing. I want it all. There are compelling reasons for both. I see no reason why I can't have both. Some languages already have this, like Boo, C# 4.0, Delphi, and maybe PHP. Maybe what I want is called optional static typing. That's not quite right, as it really is a matter of perspective. It could be called optional dynamic typing, too, I suppose. In any event, it's not voodoo. I don't know why stubborn languages like Java and Python will not allow optional "other-style" typing. A lot of energy is spent with statically typed languages dealing with arbitrary data structures in sane ways or initializing objects with excessive typing: "Circle mycircle = new Circle()". Conversely, people in the purely dynamic languages camp spend a lot of time agonizing over compiler and runtime optimization when it would be so simple for the programmer to just let a profiler tell him or her where the "hotspots" are so he or she can just go in and add a few hints or type constraints into the code which allow the compiler or runtime to generate much faster code. In Python, you generally "drop down to C" to optimize the bits of code that run too slow. That's a shame, since it can destroy the cross-platform capability of your code and/or tie it to a single language implementation. It is also needlessly complex. Boo is really the best language I've seen in this space. It looks like a dynamically typed language but it actually uses type inference and is statically typed by default but allows explicit dynamic typing.
5. Good Concurrency Options. I like Erlang's actor model. I don't care much for the rest of the language. Much has been written about concurrency recently so I won't say much. There needs to be a way to share data between different threads and/or processes in a safe, sane way and the ability to effectively utilize multiple processing cores efficiently.
6. Clean syntax. I normally can adapt pretty well, but some languages just drive me bonkers with their awful syntax. Perl is one of those languages. I am not a fan of Lisp and its many direct offspring. S-expressions are so mundane as to be mind-erasingly boring. Those languages sorely need some variation--some different operators or something. There is a healthy balance, leaning toward the simple. Python and Ruby are both decent. CoffeeScript, a mini-language that compiles to JavaScript, has some neat ideas in this space. Using different symbols and operators allows languages to be expressive, and makes it easier to catch the meaning of a bit of code at a glance, but they are often superfluous. Most commas, semicolons, and curly braces you see are completely useless. 99% of the time, statements are separated by new lines, so why do you need a semicolon, too? You don't. Items in a list are often separated by a comma and some white space, usually one space character. Do you need a comma? You don't. Usually when you see curly brackets, only one of them is adding any meaning to the code: the last one.
There are more things that I want from programming languages, but those are some big ones. If I could find a language that does them all, I'd be a happier programmer. Ultimately, software development is about getting things done and no profession has a perfect set of tools, but I still think that we can do much better.
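To make items 2 and 3 from the list a little more concrete, here is a short Python sketch. The JSON string and the `compose` helper are invented for illustration. It shows external data landing in a schema-less native structure, and functions being passed around and combined into new functions.

```python
import json

# Item 2: a JSON string becomes nested dicts and lists, no schema declared.
raw = '{"user": {"name": "Ada", "tags": ["admin", "dev"]}, "visits": 3}'
blob = json.loads(raw)
first_tag = blob["user"]["tags"][0]


# Item 3: functions are values that can be combined to build new functions.
def compose(*funcs):
    """Return a function that applies funcs to a value, left to right."""
    def composed(value):
        for f in funcs:
            value = f(value)
        return value
    return composed


shout = compose(str.strip, str.upper, lambda s: s + "!")
print(first_tag, shout("  hello "))
```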
Posted by postfuturist on 2009-12-04 21:47:49
OK, by skeletal, I mean absolutely no fat whatsoever. What you are about to witness is merely an illustration of a thought. Here's the thought: I want to do serious server-side web development with JavaScript and Node.js. Why? Node.js is a convergence. The creation of Node.js was precipitated by several phenomena including the rise of JavaScript as the central programming language of the web, increased interest in functional programming, and advances in event-based programming such as the epoll system call in Linux 2.6.
Without further ado, here is Root.js in primordial form:
var sys = require("sys");
var http = require("http");
var posix = require("posix");

// Load every .js file in a folder as a module, keyed by file name.
function loadfolder (name)
{
    var data = {};
    var output = sys.exec("ls " + name).wait();
    var files = output[0].split("\n");
    for(var j in files)
    {
        if(files[j].match(/\.js$/))
        {
            var modname = files[j].replace(/\.js$/, "");
            sys.puts("Loading " + name + "/" + modname);
            data[modname] = require("./" + name + "/" + modname);
        }
    }
    return data;
}

var models = loadfolder("models");
var views = loadfolder("views");
var controllers = loadfolder("controllers");

// Give every controller access to the models and views.
for(var x in controllers)
{
    controllers[x].load(models,views);
}

// Routes are tried in order; the first matching pattern wins.
var routemap =
[
    [/^\/other(.*)$/, "other"],
    [/^\/(.*)$/, "default"]
];

http.createServer( function (request, response) {
    for(var i in routemap)
    {
        var route = routemap[i];
        var matches = route[0].exec(request.uri.path);
        if(matches != null)
        {
            // Pass the captured groups to the matching controller.
            var c = controllers[route[1]];
            c.handle(matches.slice(1),request,response);
            return;
        }
    }
}).listen(8000);

sys.puts("Server running at http://127.0.0.1:8000/");
This file expects a few things. It expects there will be folders named models, views, and controllers. In its current configuration the routemap will expect to find files named default.js and other.js in the controllers folder. Any controllers should export a function called load that provides a way to pass in the models and views during initialization and a function called handle that is called when there is an incoming request. Here is a simple example controller (default.js):
var models = null;
var views = null;

exports.load = function (m,v) { models = m; views = v; };

exports.handle = function (matches, request, response)
{
    models.mymodel.getData( function (data) {
        views.defaultview.show(response,data);
    });
}
This controller calls the getData function exported by the mymodel model which would be a file named mymodel.js in the models folder looking something like this:
var data =
{
    title : "Page Title",
    message : "Hello, JS World"
};

exports.getData = function(callback)
{
    callback(data);
}
In this case, the model does not need to do an asynchronous call to get the data, but in case it did, the controller passes in a callback that the model can pass the data to asynchronously, in typical Node.js fashion. The last thing we need is the view (views/defaultview.js):
var posix = require("posix");
var jsontemplate = require("../json-template").jsontemplate;

var templates = {};

// Read and compile the template once, at module load time.
var loadtemplates = function()
{
    templates['basic'] = jsontemplate.Template(
        posix.cat("./templates/basic.html").wait());
}

exports.show = function(response, data)
{
    response.sendHeader(200, {"Content-Type":"text/html"});
    var text = templates['basic'].expand(data);
    response.sendBody(text);
    response.finish();
}

loadtemplates();
Oh yes, and the template (templates/basic.html):
{title}
{message}
This example uses JSON Template which I briefly discussed here.
The preceding code actually works. It's missing the controllers/other.js controller, but it's enough to get the general idea. I could sit down with this as a starting point and create a web application. In fact, I just may do this. This framework has basically zero helper functions. I've noticed that the best helper functions and other abstractions of useful functionality spring forth naturally from the act of programming and active refactoring. When I notice myself repeating work, or pondering the idea of copy-pasting code from one part of an application to another, that's generally the time I do the work of building infrastructure. If I decide to use this framework to do something useful, then it will develop naturally. Right now it is a blank canvas.
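The routing idea at the heart of Root.js is small enough to sketch in a few lines of any language. Here is the same first-match-wins route table in Python. The `dispatch` function and the lambda controllers are hypothetical stand-ins; this illustrates the technique and is not a port of the Node.js code.

```python
import re

# Same two routes as the routemap above; order matters.
ROUTES = [
    (re.compile(r"^/other(.*)$"), "other"),
    (re.compile(r"^/(.*)$"), "default"),
]


def dispatch(path, controllers):
    """Hand the path to the first controller whose pattern matches."""
    for pattern, name in ROUTES:
        match = pattern.match(path)
        if match is not None:
            # Captured groups play the role of matches.slice(1) in Root.js.
            return controllers[name](match.groups(), path)
    return None  # no route matched


# Stand-in controllers that just report which one handled the request.
controllers = {
    "other": lambda groups, path: ("other", groups),
    "default": lambda groups, path: ("default", groups),
}
```

Because the catch-all pattern is last, a path like "/other/x" reaches the "other" controller while everything else starting with "/" falls through to "default".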
I recently used Redis to solve some problems at work. There is already a node.js library for accessing Redis here, so I'm thinking about building something with that. Redis is essentially a persistent key-value store that supports simple data structures with atomic operations such as lists and sets. The "list" solved a very specific problem for us which had previously been partially solved (in an ultimately broken way) by building a queue on top of Memcached using a locking system built on the atomic increment and decrement operations. We dropped in the Redis list and everything started working perfectly.
If I decide that I need a traditional SQL backend, I would probably use DBSlayer, an HTTP/JSON interface for MySQL, though Ryan Dahl has promised native support for some traditional databases in future versions of Node.js. Cool.
Copyright 2009, 2010 by Stephen A. Goss