
My .vimrc
Posted by postfuturist on 2010-06-28 13:37:00
This is mostly for my own benefit, so I can copy/paste it when I'm setting up a new system:
set nocompatible
set tabstop=4
set shiftwidth=4
set smarttab
set expandtab
set softtabstop=4
set autoindent
set smartindent

set showcmd
set hlsearch
set incsearch
set ruler
set visualbell t_vb=
set nobackup
set ignorecase
set ttyfast
set sm
syntax on
set background=dark
set virtualedit=all

set backspace=indent,eol,start

set dir=~/.vim/swap

com Q q
com W w
com Wq wq
com WQ wq

Notes On Running Ubuntu Linux on an Old Computer
Posted by postfuturist on 2010-06-27 17:42:55

These are some packages that I found useful running Ubuntu 10.04 on a 10+ year old laptop. This laptop is so awesome it has a CDROM and a floppy drive built-in! It's maxed out at 384 MB of RAM which is pretty nice considering it shipped with 64. It's a perfectly capable machine, as long as you pick the right software.

  • Window Manager: Fluxbox
  • Wireless: install wicd and remove network-manager
  • Terminal: rxvt-unicode
  • PDF Reader: epdfview
  • Web Browser: epiphany
  • Text Editing: vim
  • Video Game: nethack

Kindle For PC Beta works on Linux
Posted by postfuturist on 2010-06-17 00:08:22
The latest versions of Kindle For PC don't run using Wine in Linux, but a few forum posts explain that the old beta does, so long as Wine is configured to run in Windows 98 mode. It took me a while to track down a live link for the beta version, so I thought I would post it on my blog for future seekers. Download the Kindle For PC Beta here!

JavaScript Benchmark: Opera 10.60 vs. Google Chrome 5.0.375
Posted by postfuturist on 2010-06-14 14:49:09

I ran the SunSpider JavaScript benchmark on the latest Opera 10.60 64-bit Linux build from here. Apparently, Opera's JavaScript engine not only rivals Chrome's but looks to be a little faster, at least on 64-bit Linux. The full suite ran in 411 ms in Opera and 477 ms in Chrome. In the comparison below, the FROM column is Chrome and the TO column is Opera.

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.16x as fast     477.2ms +/- 4.9%   411.2ms +/- 3.1%     significant

=============================================================================

  3d:                  1.57x as fast      86.0ms +/- 21.5%    54.8ms +/- 8.7%     significant
    cube:              2.06x as fast      31.8ms +/- 12.8%    15.4ms +/- 15.7%     significant
    morph:             1.58x as fast      31.2ms +/- 24.0%    19.8ms +/- 6.9%     significant
    raytrace:          -                  23.0ms +/- 35.2%    19.6ms +/- 24.0% 

  access:              *1.11x as slow*    47.2ms +/- 7.1%    52.6ms +/- 2.1%     significant
    binary-trees:      *2.36x as slow*     2.8ms +/- 19.9%     6.6ms +/- 21.5%     significant
    fannkuch:          *1.28x as slow*    18.4ms +/- 12.3%    23.6ms +/- 6.0%     significant
    nbody:             1.41x as fast      20.0ms +/- 6.2%    14.2ms +/- 19.0%     significant
    nsieve:            ??                  6.0ms +/- 41.4%     8.2ms +/- 29.2%     not conclusive: might be *1.37x as slow*

  bitops:              1.75x as fast      37.4ms +/- 9.6%    21.4ms +/- 9.7%     significant
    3bit-bits-in-byte: 2.20x as fast       4.4ms +/- 37.9%     2.0ms +/- 0.0%     significant
    bits-in-byte:      1.68x as fast       9.4ms +/- 24.0%     5.6ms +/- 19.9%     significant
    bitwise-and:       5.92x as fast      14.2ms +/- 13.0%     2.4ms +/- 28.4%     significant
    nsieve-bits:       ??                  9.4ms +/- 22.1%    11.4ms +/- 9.8%     not conclusive: might be *1.21x as slow*

  controlflow:         ??                  4.8ms +/- 33.8%     6.0ms +/- 14.7%     not conclusive: might be *1.25x as slow*
    recursive:         ??                  4.8ms +/- 33.8%     6.0ms +/- 14.7%     not conclusive: might be *1.25x as slow*

  crypto:              *1.15x as slow*    26.2ms +/- 10.3%    30.0ms +/- 11.7%     significant
    aes:               *1.28x as slow*    13.8ms +/- 11.7%    17.6ms +/- 14.6%     significant
    md5:               ??                  6.8ms +/- 43.7%     7.4ms +/- 19.2%     not conclusive: might be *1.09x as slow*
    sha1:              -                   5.6ms +/- 25.3%     5.0ms +/- 0.0% 

  date:                1.55x as fast      80.8ms +/- 6.4%    52.2ms +/- 12.2%     significant
    format-tofte:      -                  29.2ms +/- 16.6%    25.6ms +/- 14.4% 
    format-xparb:      1.94x as fast      51.6ms +/- 17.2%    26.6ms +/- 15.4%     significant

  math:                1.55x as fast      63.0ms +/- 8.3%    40.6ms +/- 4.1%     significant
    cordic:            2.19x as fast      22.8ms +/- 18.7%    10.4ms +/- 21.7%     significant
    partial-sums:      1.35x as fast      28.0ms +/- 9.4%    20.8ms +/- 13.0%     significant
    spectral-norm:     1.30x as fast      12.2ms +/- 19.6%     9.4ms +/- 11.8%     significant

  regexp:              *1.32x as slow*    14.2ms +/- 11.4%    18.8ms +/- 11.8%     significant
    dna:               *1.32x as slow*    14.2ms +/- 11.4%    18.8ms +/- 11.8%     significant

  string:              *1.15x as slow*   117.6ms +/- 4.6%   134.8ms +/- 2.0%     significant
    base64:            ??                 14.8ms +/- 16.2%    16.6ms +/- 10.0%     not conclusive: might be *1.12x as slow*
    fasta:             *1.15x as slow*    20.8ms +/- 9.8%    24.0ms +/- 8.2%     significant
    tagcloud:          *1.40x as slow*    27.6ms +/- 5.1%    38.6ms +/- 5.4%     significant
    unpack-code:       1.19x as fast      33.6ms +/- 5.0%    28.2ms +/- 3.7%     significant
    validate-input:    *1.32x as slow*    20.8ms +/- 10.7%    27.4ms +/- 4.1%     significant
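
The COMPARISON column is consistent with a plain ratio of the two mean times. The sketch below is only a check of that arithmetic, not SunSpider's own code (which also decides whether a difference is significant); the function name is made up for illustration.

```python
def comparison(from_ms, to_ms):
    """Format the ratio of two mean times the way the table above does."""
    if to_ms <= from_ms:
        # TO (Opera) took less time than FROM (Chrome)
        return "%.2fx as fast" % (from_ms / to_ms)
    # TO took longer than FROM
    return "*%.2fx as slow*" % (to_ms / from_ms)

print(comparison(477.2, 411.2))  # the TOTAL row
print(comparison(2.8, 6.6))      # the binary-trees row
```

Feeding in the TOTAL and binary-trees means gives the same factors the table lists for those rows.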

Serving up Django with Tornado
Posted by postfuturist on 2010-06-06 20:35:55

This article is mainly of interest to web developers who might want to publish a site built with the Django web framework in a lightweight fashion (without involving Apache).

First, let me enumerate the stack that is serving this page to you right now.

  1. Hosting: VPS Hosting from ARP Networks
  2. Operating System: Ubuntu Server Edition 10.04
  3. Reverse proxy server (and file server): Nginx
  4. Web server: Tornado
  5. Web framework: Django
  6. Database: MySQL

Here is my Tornado script which I partially plagiarized from somewhere:

#! /usr/bin/env python

import os
import tornado.httpserver
import tornado.ioloop
import tornado.wsgi
import sys
import django.core.handlers.wsgi
sys.path.append('/some/folder/here/myapp')

def main():
    os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
    application = django.core.handlers.wsgi.WSGIHandler()
    container = tornado.wsgi.WSGIContainer(application)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(8001, "127.0.0.1")
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()
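
What the script hands to Tornado is just a WSGI application: a callable that takes an environ dict and a start_response function. As a rough illustration of that interface, here is a sketch using only the standard library. The hello_app and call_app names are hypothetical stand-ins, not part of Django or Tornado, and nothing here starts a server.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    """Hypothetical stand-in for Django's WSGIHandler: any WSGI callable."""
    body = b"Hello from behind the proxy\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app, path="/"):
    """Invoke a WSGI app once, without a server, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body

status, body = call_app(hello_app)
print(status)
```

Calling the application directly like this can be a quick way to check it before putting a container and a proxy in front of it.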

I'm running this script under Supervisor. I created a config file in /etc/supervisor/conf.d/ called myapp.conf, and it looks like this:

[program:myapp]
command=/some/folder/here/scripts/mytornadoscript.py

This is the nginx config:

server {
    listen      80;
    server_name deliciousrobots.com;
    rewrite     ^ http://blog.deliciousrobots.com$request_uri?;
}
server {
    listen      80;

    server_name blog.deliciousrobots.com;
    root        /some/folder/here;

    location / {
        proxy_pass       http://127.0.0.1:8001;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /static/ {
        expires 30d;
    }

    location /wp-content/ {
        expires 30d;
    }

    location /media/ {
        root /usr/share/pyshared/django/contrib/admin;
    }
}

I needed the wp-content from my old wordpress blog because of some image I had uploaded. That's it!
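
One consequence of proxying: the application sees every connection as coming from 127.0.0.1, which is why the config passes the visitor's address along in X-Real-IP. Under WSGI that header arrives in the environ as HTTP_X_REAL_IP. A minimal sketch of reading it follows; client_ip is a hypothetical helper, and trusting the header is only reasonable because the app listens on localhost behind nginx.

```python
def client_ip(environ):
    """Return the address nginx put in X-Real-IP, else the socket peer."""
    return environ.get("HTTP_X_REAL_IP") or environ.get("REMOTE_ADDR", "")

# Example environ as the proxied app might see it (addresses are made up).
proxied = {"REMOTE_ADDR": "127.0.0.1", "HTTP_X_REAL_IP": "203.0.113.7"}
print(client_ip(proxied))
```

In a Django view the same value would be looked up in request.META.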

This Blog Now Django-Powered
Posted by postfuturist on 2010-06-03 23:53:25

This blog is now my first public site that I've built with the Python programming language and Django web framework. I needed some motivation to build a project that I could show people, so for the last week or so I've used a few bits of my personal time to build a Django-powered replacement for my WordPress / PHP blog. Django has proved to be simpler and more powerful than I had guessed. I spent likely no more than 15 or 20 hours total working on it, including time spent creating the style sheet and writing a database migration to get my old posts, categories, and comments into the new database.

It's still a bit rough around the edges, and missing a lot of the features of WordPress, but that just gives me things to work on when I'm on the bus to and from work (where most of the code for this blog was written). I haven't added category pages yet. So far there is just the front page with the latest 10 posts, individual pages for each post with working comments (all comments are moderated for the time being, so they will not show up right away), a sitemap.xml file for Google's sake, and an RSS feed.

Again, I've decided to shy away from Apache, even though it is the "easy" way to deploy webpages, especially Django apps. For the WordPress version of the blog, I had been running Nginx on the front, delegating requests to php-cgi instances. I kept Nginx as the frontend server, so all static files are served super fast, and regular page requests are proxied to a Tornado instance. It's bloody fast--even on this $10 a month VPS, and I'm not caching anything, yet. I'm using Supervisor to manage my Tornado script.

I know some things are broken, like images on old posts. I'll get to those, eventually. It's enough that I got it running for right now.

The Reluctant Sysadmin
Posted by postfuturist on 2010-04-25 12:12:04

As a software developer, I work on top of abstraction layers. There are a number of black boxes I build on top of. Compilers and servers fall into this category. The less time I think about the mundane details of how the code I write gets run, the more time I can spend dealing with the higher level abstractions, like "what is this application supposed to do, exactly?" That level of ignorance is helpful at times, but ultimately not healthy to maintain absolutely. That's one reason why I've been steeping myself in the black arts of compiler construction. The other seedy underworld I have placed myself in recently is that of server administration.

I'll cop to this: I suck as a sysadmin. My first act, after putting myself in charge of my very own server (a VPS, actually), was to lock myself out of administrative access. Well, the best way to learn something is by taking it apart, and sometimes you break things this way. When I was just a young thing, I would quickly grow tired of playing with toys. Phillips screwdrivers were my favorite tool; they allowed me to take apart almost any electronic toy or piece of equipment. Sometimes I broke things. I loved electric motors. I would take them out of my toys and wire them up to batteries for fun or slightly nefarious purposes. But mostly I turned perfectly good toys into piles of parts. Once I figured out that my desk lamp had quite a bit of electricity flowing through it. I learned this by taking out the bulb and poking my finger into the bare socket. I had seen someone causing water to separate into hydrogen and oxygen (O2) gas by applying electricity, so I placed two wires into a glass of water and connected the other two ends to the parts of the same lamp socket. This produced a terrific flash of light and melted some of the wires into balls of molten copper.

Me having a server is like a ten year old boy having 110 volts of electricity at his disposal. I don't really have a lot of experience with server security and I might get a shock here or there, but I probably won't burn the house down. Since you are reading this very blog, I have managed to successfully migrate it to the new server, as well as my wife's blog and a private git repo. I've even made some improvements. Apache isn't even installed on my server. I'm running Nginx which is directly interacting with php running through a CGI interface. I'm using normal WP Cache for page caching and APC for PHP opcode caching and it seems to be pretty snappy. Now that I have a server playground to work with, I can experiment with other technologies, like CouchDB, MongoDB, Node.js, Django, Rails, the oddly-named Hunchentoot, and whatever else I'd like.

All in all, I've had fun with the new server, setting up file permissions, making sure only the services I want running are running, no extra open ports, and all that. Getting Nginx to do my bidding is a bit of a challenge, but worth it compared to the massive hulk that is Apache. All this is possible through extremely inexpensive Linux VPS services provided by ARP Networks. Apparently, these guys don't spend any money advertising, they just are awesome and get business through word of mouth. That's probably why they are so inexpensive. The introductory level VPS is only $10 a month, compared to $20 for Linode or Slicehost for similar service.

Don't believe every blog you read - a cautionary tale.
Posted by postfuturist on 2010-04-19 19:45:02

Right now, I am using DreamHost shared hosting to host my blog. They pretty much do what they say they do. There is an advantage to easy-click installs in the land of web hosting. I'm really not a server administrator. I'm a programmer. I don't like to think about deploying to servers, or monitoring servers, or configuring servers. However, I decided that I wanted a VPS to host my stuff on. I want to run different things like a git-server, maybe a node-js server, OpenID and possibly even email. There is the idea that I shouldn't be beholden to any company for my personal data. I should be able to host the services that are important to me on a server that, for the most part, I control. After shopping around, I decided to go with ARP Networks. They seem to have really competitive pricing for Linux and Free/OpenBSD VPS hosting. So far, I feel like that is probably a good decision.

I didn't really feel like running full-blown Apache on the VPS, and according to a little research, I could even port my WordPress blog over to a VPS with just Nginx, PHP, and MySQL. I found a few blog entries detailing how their authors got this combination running. Well, there is one that I had gleaned some information from, but it turns out that information was wrong. Here was the command I was supposed to run, assuming that "jsmith" is my user name.

sudo usermod -G webmasters jsmith
That command is supposed to add my user to the webmasters group. Only problem is that it also removes you from every other group. The proper command is:
sudo usermod -a -G webmasters jsmith
That makes the group assignment an append, not a replace. Here is the important bit of the man page for usermod:
-G, --groups GROUP1[,GROUP2,...[,GROUPN]]] A list of supplementary groups which the user is also a member of. Each group is separated from the next by a comma, with no intervening whitespace. The groups are subject to the same restrictions as the group given with the -g option. If the user is currently a member of a group which is not listed, the user will be removed from the group. This behaviour can be changed via the -a option, which appends the user to the current supplementary group list.
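
The difference between the two commands can be modeled with sets. This is only an illustration of the behaviour the man page describes, not how usermod is implemented, and the starting groups are hypothetical.

```python
def usermod_groups(current, requested, append=False):
    """Model the supplementary groups after `usermod -G` (or `-a -G`)."""
    requested = set(requested)
    if append:
        # -a -G: add the listed groups to what the user already has
        return set(current) | requested
    # -G alone: the listed groups replace the whole supplementary list
    return requested

before = {"admin", "adm"}  # hypothetical groups jsmith started with
print(sorted(usermod_groups(before, ["webmasters"])))
print(sorted(usermod_groups(before, ["webmasters"], append=True)))
```

Without -a, whatever group granted sudo rights is gone from the result, which is how the lockout happens.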
OK, now here is the page with the faulty instructions. It's been up since August of 2008, that's nearly two years.

Hopefully, the folks at ARP Networks will be kind and help me out, or else I'm locked out of my VPS permanently. Don't believe every blog you read, even if it looks legit and there are a bunch of comments at the bottom stating how useful and awesome the information is. Really, though, it is my fault for running commands that I didn't fully understand. Anything run as superuser should be carefully inspected.

Update: The ARP Networks folks bailed me out of my ignorance. Apparently, I could have fixed it myself with their out-of-band access tools. Today, I wear the scarlet N (for noob.)

A Simple Compiler, Part 3: Back to Flex and Bison and C and then Lisp?
Posted by postfuturist on 2010-04-15 19:29:52

My quixotic quest to build my own programming language or die trying has led me in a few different directions. In my last installment, I was going to bring my Python-based lexer and parser into the .NET world with IronPython. After pursuing that angle for a while, I soon grew tired of waiting 10 seconds every time I wanted to run a Python script under IronPython. Every time. I don't really know what was going on each time I ran "ipy" in a terminal, but it would take about 10 seconds to do it. Regular old CPython might take a second the first time you fire it up, but after that you can run scripts from the command line with negligible delay. The more I thought about it, the less I wanted to tie myself to the DLR if it was going to be such a hog. I instead ported my lexer and parser back to C-based Flex and Bison code.

Here is the lexer. The syntax highlighter doesn't really handle flex/lex syntax, so it might be pretty bad.

%option noyywrap
%{
#include "gc/gc.h"
#include "simple.tab.h"
#include "simple.h"
%}
ID      [a-zA-Z_][a-zA-Z0-9_]*
QUOTE   \"[^\"\\]*(?:\\.[^\"\\]*)*\"
%%
-?[0-9]+    { yylval.i = atoi(yytext); return(NUMBER);   }
{QUOTE}     { yylval.c = GC_strdup(yytext); return(STRING);   }
print       { return(PRINT);    }
{ID}        { yylval.c = GC_strdup(yytext); return(getkeyword(yytext,IDENTIFIER));}
[ \t\n]+ /* blank, tab, new line: eat up whitespace */
->          { return(ARROW); }
\.          { return(DOT); }
\*          { return(STAR); }
\/          { return(SLASH); }
\+          { return(PLUS); }
\-          { return(MINUS); }
\(          { return(LPAREN); }
\)          { return(RPAREN); }
\{          { return(LCURLY); }
\}          { return(RCURLY); }
=           { return(EQUALS); }
%%

And the Bison parser.

%{
#include <stdio.h>  /* for fprintf in yyerror */
#include 
#include "simple.h"
// #include "block.h"

void yyerror(const char *str)
{
    fprintf(stderr,"error: %s\n",str);
}

%}

%union {
    struct node * n;
    int i;
    const char * c;
}

%token <i> NUMBER 
%token <c> STRING IDENTIFIER
%token PRINT FUNC IF ELIF ELSE FOR IN
%token EQUALS STAR SLASH PLUS MINUS LPAREN RPAREN LCURLY RCURLY ARROW DOT
%type <n> script expression explist block ifexp identlist

%right EQUALS
%left PLUS MINUS
%left STAR SLASH
%left DOT
%%

script : explist
    { scriptnode = $1; }
;
explist : /* empty */ 
    { $$ = newlistnode(nt_EXPLIST); }
| explist expression
    {  listnode_add($1,$2); $$ = $1; }
;
expression : NUMBER 
    { $$ = newnumber($1); }
| STRING 
    { $$ = newstring($1); }
| IDENTIFIER
    { $$ = newident($1); }
| expression STAR expression
    { $$ = newnode(nt_STAR,$1,$3); }
| expression SLASH expression
    { $$ = newnode(nt_SLASH,$1,$3); }
| expression PLUS expression
    { $$ = newnode(nt_PLUS,$1,$3); }
| expression MINUS expression
    { $$ = newnode(nt_MINUS,$1,$3); }
| expression EQUALS expression
    { $$ = newnode(nt_EQUALS,$1,$3); }
| LPAREN expression RPAREN
    { $$ = $2; }
| FUNC LPAREN identlist RPAREN block
    { $$ = newnode(nt_FUNC,$3,$5); }
| expression DOT LPAREN explist RPAREN
    { $$ = newnode(nt_FUNCCALL,$1,$4); }
| PRINT DOT LPAREN explist RPAREN
    { $$ = newnode(nt_PRINT,$4,NULL); }
| ifexp ELSE block
    { $$ = newnode(nt_ELSE, $1, $3); }
| ifexp
    { $$ = $1; }
| FOR identlist IN expression block
    { $$ = newnode(nt_FOR, newnode(nt_FOR_ITER, $2, $4), $5); }
;
identlist : /* empty */
    { $$ = newlistnode(nt_IDENTLIST); }
| identlist IDENTIFIER
    { listnode_add($1,newident($2)); $$ = $1; }
;
block : LCURLY explist RCURLY
    { $$ = newnode(nt_BLOCK, $2, NULL); }
;
ifexp : IF expression block
    { $$ = newnode(nt_IF, $2, $3); }
| ifexp ELIF expression block
    { $$ = newnode(nt_ELIF, $1, newnode(nt_IF,$3, $4)); }
;
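
The %right and %left lines in the grammar decide how an expression such as a = 3 + 4 * 10 groups: later declarations bind tighter, so STAR and SLASH win over PLUS and MINUS, and EQUALS binds loosest and groups to the right. Bison applies those rules inside its generated parser tables. The sketch below reaches the same grouping with a different, hand-written technique (precedence climbing) over a flat token list, purely to show the effect; the function and the tuple tree format are made up for this example.

```python
# Binding strength mirroring the declarations above, lowest first.
PRECEDENCE = {
    "=": (1, "right"),
    "+": (2, "left"), "-": (2, "left"),
    "*": (3, "left"), "/": (3, "left"),
}

def parse_expression(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] and return (tree, next_pos)."""
    left = tokens[pos]  # an operand: a number or an identifier
    pos += 1
    while pos < len(tokens):
        op = tokens[pos]
        if op not in PRECEDENCE:
            break
        prec, assoc = PRECEDENCE[op]
        if prec < min_prec:
            break
        # Left-associative operators must not swallow an equal-strength
        # operator on their right; right-associative ones may.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expression(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos

tree, _ = parse_expression(["a", "=", 3, "+", 4, "*", 10])
print(tree)
```

Each tuple is (operator, left, right), roughly what newnode(type, L, R) builds on the C side.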

There is a bunch of other C code that this parser is calling to build nodes in the syntax tree, mainly with the newnode() function which looks like this:

simplenode nn()
{
    simplenode n;
    n = GC_MALLOC(sizeof(struct node));
    n->L = NULL;
    n->R = NULL;
    return n;
}

simplenode newnode(int type,simplenode L, simplenode R)
{
    simplenode n = nn();
    n->type = type;
    n->L = L;
    n->R = R;
    return n;
}

There is of course a bunch of other code, but honestly, it is a bit tiring to do all this tree manipulation in C. It's a fine language, but a bit verbose and fiddly, what with all the pointer madness, which I've attempted to simplify as much as possible. In the preceding code, the simplenode type is really just a typedef of struct node *. And this is what a node really is:

struct node
{
    int type;
    union {
        simplenode L;
        Array arr;
        int number;
        simplestring string;
    };
    union {
        simplenode R;
        Block block;
    };
};

As you can see, I really just want a node to be a dynamic container that can hold other tree nodes, or data based on type, but then I feel like I am trying to turn C into something it isn't. I'm declaring types so I can use them, but it turns out that very little is enforced at compile time and, more importantly, nothing is enforced at run time. I like my dynamic languages to have strong typing, so that when the code is running an integer isn't happily treated as a pointer, corrupting some random memory and leading to hard-to-reproduce bugs. I have found some useful tools.
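
For contrast, this is roughly what the same tagged node looks like in a dynamic language with strong typing. It is a Python sketch, not part of the C project: the NT_* tags and the new_number and evaluate helpers are hypothetical analogues of the nt_* constants and node constructors used above.

```python
NT_NUMBER, NT_PLUS, NT_STAR = range(3)  # hypothetical type tags

class Node(object):
    """Python analogue of struct node: a type tag plus two slots."""
    def __init__(self, type, L=None, R=None):
        self.type = type
        self.L = L
        self.R = R

def new_number(value):
    # Reject a bad payload when the node is built.
    if not isinstance(value, int):
        raise TypeError("NUMBER node needs an int, got %r" % (value,))
    return Node(NT_NUMBER, value)

def evaluate(node):
    # A stray integer where a node belongs is an error, not a wild pointer.
    if not isinstance(node, Node):
        raise TypeError("expected a Node, got %r" % (node,))
    if node.type == NT_NUMBER:
        return node.L
    if node.type == NT_PLUS:
        return evaluate(node.L) + evaluate(node.R)
    if node.type == NT_STAR:
        return evaluate(node.L) * evaluate(node.R)
    raise ValueError("unknown node type %r" % (node.type,))

tree = Node(NT_PLUS, new_number(3), Node(NT_STAR, new_number(4), new_number(10)))
print(evaluate(tree))
```

Passing a bare integer where a node is expected raises a TypeError at the point of use instead of corrupting memory somewhere else.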

One of those tools is gdb, the GNU Debugger. As long as I compile with -g, I can set breakpoints or run the code from gdb and see what is happening when the program segfaults. Development of this code has led to a lot of segfaults. It's a painful development style. I guess it stopped being fun, so I've set this whole thing aside to take a look at other code. That led me to another runtime that is already doing something very similar: Lua. The Lua runtime is dynamic with strong typing, written in portable C, and fast, and Lua is a pretty decent language, with first-class functions and other nice features. It has a lot of moving parts, like its own lexer, parser, and garbage collection. I had already punted on all three, utilizing flex, bison, and Boehm GC.

Then, something else happened. I remembered all those unbelievably over-confident Lisp guys; even Paul Graham raves about Lisp. I remembered vaguely that it had garbage collection and a bunch of other things decades before they became used in mainstream programming. I thought I had better see what all the fuss is about before I go off and try to build my own programming language. Well, I got more than I bargained for. Happily, all you need is an internet connection and a computer to find some highly recommended books on Lisp. I am currently working my way through Practical Common Lisp by Peter Seibel. The complete text is available on Seibel's webpage. Neat, huh?

I am about 9 chapters in and starting to get a feel for the language, but it hasn't been easy. I have this to say about Lisp before I say anything else: it is not very approachable. Even though Seibel is a great writer (loved Coders at Work, I own an ink-on-wood-pulp version), it takes intense concentration to understand what the heck is going on. Lisp is different. It's different in the way that animal species in Australia are different. It's like it grew up as a language in isolation from other species of language. The genetic lines are so different between Lisp and the Algol/C family of languages that I feel intensely uncomfortable at times. I have to stop, go back and reread a section when I get to certain code samples, or sit and stare and try to puzzle out what is going on in the code.

Most blobs of code are basically prose. You can read through them and get a feel for what is happening quickly, but Lisp is more like poetry. There is a lot of meaning packed into each line and symbol. The ability to stack up abstractions is mind-boggling. The Common Lisp flavor of Lisp, which is basically a standardization of a bunch of production Lisps, is not a pure functional language. It is far from it, though it does have a lot of functional bits and with some restraint can be used like one might use a functional language. Its data structures are all very mutable. Lisp allows a variety of programming styles and paradigms. The crazy syntax is just a generic nested list (tree) syntax, making the data and code pretty much the same thing. This has some heavy implications for meta-programming. I'll leave it at that for now. I'm going to keep investigating this crazy Lisp world for a while before I do anything else.
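
To make "data and code are the same thing" a little more concrete, here is a loose imitation in Python, with nested lists standing in for Lisp forms. It is not Lisp and says nothing about how Common Lisp evaluates or expands macros; eval_form and swap_operator are made-up names for this sketch.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_form(form):
    """Evaluate a nested list like ["+", 1, ["*", 2, 3]]."""
    if not isinstance(form, list):
        return form  # a plain number evaluates to itself
    args = [eval_form(arg) for arg in form[1:]]
    result = args[0]
    for arg in args[1:]:
        result = OPS[form[0]](result, arg)
    return result

def swap_operator(form, old, new):
    """Treat the program as data: rewrite every `old` operator to `new`."""
    if not isinstance(form, list):
        return form
    head = new if form[0] == old else form[0]
    return [head] + [swap_operator(arg, old, new) for arg in form[1:]]

program = ["+", 1, ["*", 2, 3]]
print(eval_form(program))
print(eval_form(swap_operator(program, "*", "+")))
```

Because the program is an ordinary list, a second function can rewrite it before it runs, which is the kernel of the meta-programming idea.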

A Simple Compiler, Part 2: Write My Own Lexer in Python
Posted by postfuturist on 2010-04-01 22:14:02

As you may remember from Part 1 of this series, I have embarked on the task of creating my own programming language. I wrote a lexer and parser for my theoretical language in Python using PLY. That is all well and good. The next part I need is a good runtime. Here are some of my options:

  • Interpret the parse tree in Python.
  • Generate machine code and build native executables.
  • Target LLVM.
  • Generate C code to be compiled by GCC.
  • Target the JVM.
  • Target the .NET/Mono CLR.
  • Build it with Parrot.
  • Build it with the Dynamic Language Runtime which sits on top of the CLR.

There are plenty of other options. I've already implemented a subset of the language using the first option, interpreting the parse tree directly in Python. I don't really want to spend a lot of time doing that because being interpreted inside Python is going to be laughably slow. I still might, as a reference implementation, but it doesn't sound like much fun. I don't really want to build a language that compiles natively; I want the language to be more dynamic than that. The best options look like the JVM (weak dynamic language support, still), Parrot (a bit of a moving target), and the DLR. Now, I'm not a huge Microsoft fan, but they have certainly put a lot of effort into the .NET framework, and the DLR, to their credit, seems to be a fantastic bit of engineering. Since it is open source and available in the repos of some bleeding-edge Linux distros (I'm using Ubuntu 10.04 beta), I decided to go that route, or at least try.

I need to build my parse tree in some .NET language. Fortunately, IronPython exists to run my Python code inside the Mono environment. Unfortunately, PLY chokes when run on IronPython. I tried debugging the issue, and once I got past one problem, I found another. So, I've had to start over a bit. The first thing I did was rewrite the lexer in a subset of Python that runs on both CPython and IronPython. Here it is:

# lexer2.py | a simple lexer in python
import re

# Token class.  This class is stolen from PLY
class LexToken(object):
    def __str__(self):
        return "LexToken(%s,%r,%d,%d)" % (self.type,self.value,self.lineno,self.lexpos)
    def __repr__(self):
        return str(self)

class lexer():
    def __init__(self,input):
        self.input = input
        self.pos = 0
        self.input_length = len(input)
        self.lineno = 0
        self.charno = 0
        self.reserved = {
            'print': 'PRINT',
            'func': 'FUNC',
            'if': 'IF',
            'elif': 'ELIF',
            'else': 'ELSE',
            'for': 'FOR',
            'in': 'IN',
        }
        self.symbols = [
            ('->', 'ARROW'),
            ('=', 'EQUALS'),
            ('*', 'STAR'),
            ('/', 'SLASH'),
            ('+', 'PLUS'),
            ('-', 'MINUS'),
            ('(', 'LPAREN'),
            (')', 'RPAREN'),
            ('{', 'LCURLY'),
            ('}', 'RCURLY'),
        ]
        self.complex = [
            (r'[a-zA-Z_][a-zA-Z0-9_]*', 'IDENTIFIER'),
            (r'"[^"]*"', 'STRING'),
            (r'-?\d+', 'NUMBER'),
        ] 
        self.tokens = list(self.reserved.values()) + [x[1] for x in self.symbols + self.complex]
        self.complex_compiled = [(re.compile(r), t) for r,t in self.complex]
        self.symbols_compiled = [(s,t,len(s)) for s,t in self.symbols]
        self.ignore = " \t\n"

    def getTokens(self):
        return self.tokens

    def token(self):
        # skip ignored characters
        while(self.pos < self.input_length and self.input[self.pos] in self.ignore):
            if(self.input[self.pos] == "\n"):
                self.lineno += 1
                self.charno = 0
            else:
                self.charno += 1
            self.pos += 1
        if(self.pos == self.input_length):
            return None
        for cr,ttype in self.complex_compiled:
            mo = cr.match(self.input, self.pos)
            if(mo):
                value = mo.group()
                pvalue, pttype = self.process(value, ttype)
                tok = LexToken()
                tok.value = pvalue
                tok.type = pttype
                tok.lineno = self.lineno
                tok.lexpos = self.pos
                self.pos += len(value)
                self.charno += len(value)
                return tok
        for sym,ttype,length in self.symbols_compiled:
            # <= (not <) so a symbol that ends exactly at the end of the input still matches
            if self.pos + length <= self.input_length and self.input[self.pos:self.pos + length] == sym:
                tok = LexToken()
                tok.value = sym
                tok.type = ttype
                tok.lineno = self.lineno
                tok.lexpos = self.pos
                self.pos += length
                self.charno += length
                return tok
        print("invalid character : %s at %d , %d" % (self.input[self.pos], self.lineno, self.charno))
        return None
    
    def process(self, text, ttype):
        if(ttype == 'IDENTIFIER'):
            ttype = self.reserved.get(text, 'IDENTIFIER')
        elif(ttype == 'NUMBER'):
            text = int(text)
        elif(ttype == 'STRING'):
            text = text[1:-1]
        return text, ttype
        

if __name__ == "__main__":
    data = '''
    a = 3 + 4 * 10
    b = a + -20 *2
    { print(300) }
    c = "hello"
    print(b)
    there->(now)
    '''

    # Give the lexer some input
    lex = lexer(data)

    # Tokenize
    while True:
        tok = lex.token()
        if not tok: break      # No more input
        print(tok)

It's a pretty simple lexer, but it produces the same exact output as the last one, so it's good enough for now. I've had my head buried in compiler books and articles for the last few days. Parsing is more complex, so stay tuned for that.
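
For comparison, the same token set can be written as one combined regular expression with a named group per token type, similar to the tokenizer example in the Python re documentation. This is an alternative sketch rather than the lexer above: it returns plain (type, value) tuples instead of LexToken objects, tracks no line numbers, and the tokenize name is made up.

```python
import re

RESERVED = {"print": "PRINT", "func": "FUNC", "if": "IF", "elif": "ELIF",
            "else": "ELSE", "for": "FOR", "in": "IN"}
SYMBOLS = [("ARROW", "->"), ("EQUALS", "="), ("STAR", "*"), ("SLASH", "/"),
           ("PLUS", "+"), ("MINUS", "-"), ("LPAREN", "("), ("RPAREN", ")"),
           ("LCURLY", "{"), ("RCURLY", "}")]
# Same order as the lexer above: complex patterns first, then symbols.
TOKEN_SPEC = ([("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
               ("STRING", r'"[^"]*"'),
               ("NUMBER", r"-?\d+")]
              + [(name, re.escape(sym)) for name, sym in SYMBOLS]
              + [("SKIP", r"[ \t\n]+")])
MASTER = re.compile("|".join("(?P<%s>%s)" % (name, pattern)
                             for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (type, value) tuples for the whole input."""
    tokens = []
    pos = 0
    while pos < len(text):
        mo = MASTER.match(text, pos)
        if mo is None:
            raise ValueError("invalid character %r at %d" % (text[pos], pos))
        kind, value = mo.lastgroup, mo.group()
        pos = mo.end()
        if kind == "SKIP":
            continue  # whitespace produces no token
        if kind == "IDENTIFIER":
            kind = RESERVED.get(value, kind)  # keywords look like identifiers
        elif kind == "NUMBER":
            value = int(value)
        elif kind == "STRING":
            value = value[1:-1]  # strip the surrounding quotes
        tokens.append((kind, value))
    return tokens

print(tokenize("there->(now)"))
```

The trade-off is that alternation order now carries the priority rules, so NUMBER has to appear before MINUS for -20 to come out as a single token.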