Standard ML vs. Python
Posted by postfuturist on 2012-01-19 23:03:49

I've been interested more and more in functional programming languages. Recently, I read through the first 6 chapters of Learn You a Haskell and I was intrigued but the lazy everything and functional purity seem a bit extreme for my tastes at the moment. Not long after I was taking a peek at Standard ML and recognized some similarities with Haskell, mainly the syntax. I also noticed that it is strict (as opposed to lazy) and impure functional. That happens to match my tastes at the moment as I've been into Clojure which is also strict (though sequences tend to be lazy by default) and impure functional. I'll get back to Haskell. For the moment, it's too many new things at once, so Standard ML seems like maybe a good stepping stone to Haskell.

I found a great free book by one of the designers of Standard ML: Programming in ML. I went through the first 12 or so chapters of that book easily. I almost didn't make it through the first chapter, because the author introduces some rather confusing code examples and speaks about them at a high level. Just trying to figure out what the code does without knowing how Standard ML works is very confusing. It's like an introduction to Chinese textbook starting with an analysis of a Chinese poem without translation. The good news is that you can skip that chapter. After the confusing high level analysis in the first chapter, the book systematically builds up the language piece by piece so normal folks like me can understand it.

I don't know why, but the language really strikes a chord with me. It is statically typed, but type-inferred, so you rarely need to specify types in the actual code. This makes a lot of sense to me. When I write code in a dynamic language, like Python, I know what the types are 99% of the time anyways. It's nice to have the compiler figure those out and then let me know when I've violated that. That said, the type system isn't quite as flexible as Python's duck typing so it takes a bit more work.

I wrote a simple version of Conway's Game of Life using a 5 by 5 grid stored as a Standard ML Vector which is, nicely, an immutable data structure. You can see the code here: https://github.com/deliciousrobots/sml_life. The code is sloppy and not optimized. After getting it working in Standard ML, I translated the code to Python 2.7 to compare the experience.

Code length: The SML version is 66 lines and the Python version is 58 lines. Neither one has comments, and some of the lines in both are blank. Python has a slight edge here, and the syntax is flexible enough in both languages that both solutions could have been shortened quite a bit (at the expense of clarity). SML's pattern matching syntax which isn't available in Python helped it stay competitive in this measurement, but idiomatic SML has the "end" on its own line, and lot of code blocks start with a "let" on its own line, too.

Functional purity: The SML version is completely functional. The language allows "mutation" by having a mutable "ref" type as well as arrays which have mutable cells. But normal variables are only bound once, and immutable. For the grid itself, I used the Vector type which is immutable. To calculate 100000 generations of the grid, I used a tail-recursive function in SML. When I first translated the code to Python, I forgot that Python doesn't optimize for tail recursion, so it blew the stack pretty quickly, I had to change it to an imperative loop, reassigning the same internal function variable.

Efficiency: The algorithm I use is pretty naive, but similar between the two implementations. That said, I Vector in the SML code and therefore had to use Vector specific functions like Vector.map. In the Python code, I just used a list and the builtin map function. That was nice, but when it came time to optimize the Python code, I couldn't really do anything to make it run faster. I tried replacing the list with an "array" of long integer using the "array" from Python's standard library, but the map function just outputs a normal list, there's no special "array.map" function which will process an array to an array. I ran the SML code with MLton, an optimizing compiler. 100000 iterations takes 0.7 seconds. CPython 2.7.2 does the same work in an embarrassingly long 26.8 seconds. My attempts at giving the code some type hinting by using the typed arrays from Python's standard library did not make the code any faster. It was actually a bit slower. I think the typed arrays in Python are probably more for memory consumption than algorithmic speed. PyPy 1.7 fared much better than CPython, and completed the benchmark in 6.4 seconds, a full 4 times faster than CPython, but still 9 times slower than MLton.

Ease of Development: This was close. I had to be a little more careful about type issues with the SML code. While writing the SML code, I would write some code, the compiler would complain, and I'd fix it, and without fail, when the compiler was happy, the output of the code was correct. I never had a runtime exception although they are certainly possible with SML. While writing the Python version, I rarely had compilation or syntax errors, but I runtime errors from mistyped variable or function names or other problems like the stack overflow I mentioned earlier. Python doesn't care if you reassign a variable to a different type than it was intially, it just explodes later when the code tries to do an invalid operation on the object. All in all, it felt like a similar experience, seeing the compiler error in SML and the uncaught exception in Python. I feel like in a larger codebase the compiler error right now is much mor helpful than the uncaught exception 10 minutes from now when I run the unit tests, or 2 hours later after I've written new unit tests that break, or 2 weeks later when the code breaks in production because of an edge case the tests didn't catch.

Final thoughts: I really like Python for getting things done. The duck typing and completely dynamic nature means that you have to test very carefully and not take anything for granted. After dealing with very large production systems written in dynamic languages, I'm ready for a little type-safety built into my programming languages. Standard ML is pretty nice to code in, despite the sometimes oppressive type system. It is fairly terse and powerful. Where functional features like the lambda syntax feels tacked on and insufficient in Python, it seems very natural and designed in SML. Python doesn't really have great support for immutable data structures and its a slow language. However, Python has an infinitely larger ecosystem so its pretty easy to get things done with it. Standard ML seems nearly dead, which is a shame. But as far as the ML family of languages go, I think Ocaml and Haskell are alive and kicking. Stay tuned.


blog comments powered by Disqus