Monday, September 28, 2009

The Essence of Scheme

Note that throughout this post "Scheme" could be replaced with "Lisp" without changing it's meaning.

My Introduction to Scheme
My first exposure to Scheme was in college at the University of Utah. At that time they taught a Programming Languages course that was based on the book Structure and Interpretation of Computer Programs, and the Dr. Scheme programming environment.  While I enjoyed the class I never understood why some people thought Scheme was so special.  In my defense the class really didn't teach us Scheme they just used a small subset of the language to teach us about programming language features such as exceptions, continuations, closures, loops, recursion, conditionals, first class functions etc.  It really was a great class, you learned about these features by writing and modifying interpreters that implemented these features.  The only problem is that many students including myself left that class thinking they knew Scheme, but we didn't.  I didn't get re-introduced to Scheme again until a few years later when I was learning Perl.  I kept on learning about features of Perl that I really liked, and then I realized that all of these features were also in Scheme.  I decided that I needed to go back and tinker with  Scheme.


Learning Scheme
I started learning Scheme by writing programs and reading a lot of blogs, and articles in an attempt to get my brain wrapped around the programming language.  Almost everyone talked about how powerful Scheme is, but it took me a long time before I really understood why Scheme is such an expressive language.  Because it takes a lot of work and study to reach Scheme enlightenment, many developers simply give up on the language, before they really understand it.  I don't know if it's possible to reach Scheme enlightenment, without all the hard work, but I hope this post will at least will give you a peek at why Scheme is so expressive, and why those that really learn it have a tendency to turn into Smug Lisp Weenies.  In short this is the blog post that I wish I could have read when I was working towards Scheme enlightenment.


A shortcut to Scheme enlightenment
Lets get started.  Consider the following snippet:
(if val
  0
  1)

Now if you're an experienced programmer or a Schemer you were probably tempted to just glance at the above snippet and move on, but take a minute and really think about what you're looking at.  As a Schemer I can see three different interpretations of the snippet.

The most obvious interpretation of the above snippet is an 'if' expression. The expressions condition is the 'val' value, the consequence is the integer '0' and the alternate is the integer '1'.

The second interpretation that I see is a linked-list1 of 4 elements.  The first elements in the list are the symbols2 'if' and 'val' followed by the integers '0' and '1'.

The third interpretation is a tree data structure. A linked-list can represent a tree structure by putting the first element in the list as the root and all other elements in the list as child notes of the root. Because Scheme allows a linked-list to have another linked-list as an element arbitrary tree structures can be represented using linked-lists. So the above snippet could also represent the following tree:

This tree also happens to be the Abstract Syntax Tree for the previously mentioned 'if' expression.  At this point you may be wondering why this is important, and we'll get to that but for know just consider that every expression in Scheme is an expression, a linked-list and the AST for that expression. Now lets explore the consequences of that fact.


Scheme and linked-lists
Scheme is one of many dialects of Lisp. The name List is short for "List Processing". Although Lisp dialects do support other data-structures, they have very powerful constructs that make them an ideal language for processing linked-lists3. Because every Scheme expression is also a linked-list and Scheme is an ideal language for processing linked-lists we can conclude that Scheme is an ideal language for processing Scheme expressions4.

Compilers and AST
When you write a compiler for a language one of the first things you may do will be to build an AST for the tokenized and parsed text. An AST is a simplified representation of the code that is in a format that is easily to manipulate.  When the code has been transformed to an AST it is easy for the compiler to perform optimizations, static analysis and transform complex language features into more basic language features.

Bringing it all together
So we've discussed that all Scheme expressions are linked-lists and Scheme is a great language for manipulating linked-lists. We've also discusses that all Scheme expressions are also the AST for that expression, and AST's are a great data-structure for compiler writers to do complex transformations. It shouldn't be too hard to see that Scheme is an excellent language for performing complex transformations on Scheme expressions. In fact Scheme allows you to easily extend the programming language in ways that are only reserved for language designers and compiler writers in other languages. When I learn about some new programming language feature in Python that I wish was in C++, my only choices is to hope that the great C++ language design committee will include the feature in some future version of the C++ spec and that g++ will implement that feature quickly, and my distro will ship the updated version, and ... If I want to use that same feature in Scheme, I can simply implement it and I don't need to ask the language design committee for permission, I don't need to modify the Scheme compiler/interpreter, I don't need to modify any Makefiles, I just implement the feature and start using it immediately!  That IMHO is the feature that makes Scheme such a powerful tool!  In my next post we'll look at some examples of how easy it is to grow your own language with Scheme.


[1] I know that for this to be treated as a list in code it would need to be quoted, but if it was simply read in from a file with a 'read' call it would come in as a list.
[2] Scheme symbols are like read-only strings that can be efficiently compared for equality by simply comparing their pointers. Symbols are often used like enums in C++.
[3] A proof of this statement is left as an exercise for the reader.
[4] Yep, you should prove this one too while you're at it.

Wednesday, September 16, 2009

A(nother) counter intuitive C++ (mis)feature

I've been using C++ for about 12 years now.  Which means I don't know if very well.  Early on in my C++ career I thought I had a good grasp of the language, but C++ is one of those languages that the more you learn the more you understand how hopeless it is to fully understand it.  Every month or so I'm confronted with some new corner case that I didn't know about.  Last months mis-feature had to due with overloading and overriding.  
Consider the following code:
class Bar
{
    public:
        virtual void method(int a)
        {
            std::cout << "Bar " << a << std::endl;
        }
};

class Baz : public Bar
{
    public:
        virtual void foo(std::string a)
        {
            std::cout << "Baz " << a << std::endl;
        }
};
int main()
{
    int i = 9;
    std::string s = "9";

    Baz *b1 = new Baz();

    b1->method(i);
    b1->method(s);
}

If you compile this code (with g++) you'll get the following errors:
error: invalid conversion from ‘int’ to ‘const char*’
error:   initializing argument 1 of ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’


This caught me by surprise.  I had assumed that the compiler would match the method signatures after the method names were mangled, and thus method(std::string) would not hide method(int).  It turns out that methods are matched only by the name, and after the match has been found (in this case finding the method(std::string) the type checker validates the parameter types.  While I can understand this could make the compiler more efficient, it seem that by default the C++ compiler does the wrong (non-intuitive) thing.

So how do you make the compiler do what you want? You just need to add a "using" line right before Baz's method as follows:
...
using Bar::method;
virtual void method(std::string a)
{
...
}
...

Of course you could have guessed that. Right?