Friday, November 27, 2009

Google's Go trying to be too clever.

I've discovered a couple of cases where Google's Go is trying to be too clever. There are two expressions who's behavior changes based upon what is done with evaluated results. Lets look at a bit of code.

package main

import . "fmt"

// Define a simple struct
type foo struct {
        a int;
}

// Define an interface
type plus1 interface {
        add1() int;
}

func any(a interface{}) {
        v := a.(plus1).add1();
        Printf("%v\n", v);
}

func main() {
        f := foo{1};
        any(f);
}

So this code defines a simple struct "foo" with a single member a. It defines a "plus1" interface that says that any struct that implements the interface must implement an add1 method that returns an int. The main function instantiates a "foo" and passes it onto the any function. Because "foo" doesn't implement the "plus1" interface the first expression in the function "any" will fail with a runtime exception. the "a.(plus1)" is called a type assertion and it fails if the "a" doesn't implement the "plus1" interface. Now we can switch this code so the type assertion won't assert by assigning the results of the type expression to some variables. Consider the following:

package main

import . "fmt"

// Define a simple struct
type foo struct {
        a int;
}

// Define an interface
type plus1 interface {
        add1() int;
}

func any(a interface{}) {
        a1, ok := a.(plus1);
        if ok {
                v := a1.add1();
                Printf("%v\n", v);
        }
 }

func main() {
        f := foo{1};
        any(f);
}

The only difference in this code is that the type assertion will not assert but instead return two values (yes Go can return multiple values!). The second value is of type bool which indicates whether the first value is "a" cast to "plus1". If "a" isn't of type "plus1" then instead of asserting the type assertion will just return false for the second parameter. While this is clever, it feels like assigning the return value to a variable magically alters the behavior of the expression. I think this will just make the language too difficult to learn. The second operation has to do with go-routines and I'll talk about it next time.

Wednesday, November 18, 2009

Letting the compiler find the bugs

Lets start by looking at a bit of contrived C/C++ code. See if you can spot the (hopefully obvious) bug.
#include <stdio.h>

int main()
{
    float fieldLength = 120; // yards
    float fieldWidth  = 48.77; // meters
    float yardsInAMile = 1760; // yards per mile

    printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLength * 2) + (fieldWidth * 2))));

    return 0;
}

It probably shouldn't take you too long to realize that in the above code I'm mixing units. I'm adding yards to meters. While bugs like this are obvious in trivial code they can be difficult to track down in more complex software. What can we do to prevent bugs like this from creeping into code? One solution is to use coding standards to reduce ambiguity. For example you could decide to use metric units for all data in your project. You could also use 'typedefs' and variable naming conventions to make errors even more obvious. Consider the following:

#include <stdio.h>
typedef float yards;
typedef float meters;

int main()
{
    yards fieldLengthYards = 120; // yards
    meters fieldWidthMeters  = 48.77; // meters
    float yardsInAMile = 1760; // yards per mile

    printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLengthYards * 2) + (fieldWidthMeters * 2))));

    return 0;
}

By including the units in the variable names and types it makes the programmers intentions clear. It's also easier to spot bugs in code audits. For even more safety you could define Meters and Yards classes that encapsulate the floats and prevent mixing of types. The advantage of defining new types/classes is the compiler can now ensure that you don't mix types. The disadvantage is that you'll end up writing a class for each type, with lots of overloaded operators. Writing a class for every unit feels to burdensome for all but the most critical code (nukes, airplane firmware, surgery robots, etc.). The problem with the above 'typedef' solution is that C/C++'s 'typedef' really only defines an alias for the type. Both the type-checker and the compiler will treat 'meters' the exact same way it treats 'yards', we want the type-checker to treat 'meters' and 'yards' as distinct types and the compiler to treat them both as floats. Google's new system language Go lets you do just that. Lets reproduce the original bug in Google Go.

package main

import "fmt"

func main() {
    var fieldLength float = 120; // yards
    var fieldWidth  float = 48.77; // meters
    var yardsInAMile float = 1760; // yards per mile

    fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLength * 2) + (fieldWidth * 2))));
}

This has the same bug as the C/C++ code above. The code should be readable to a C/C++ developer, the only really "weird" thing is that the type follows the variable name. Now let's use Go's strong typing to let the compiler catch the bug for us.

package main

import "fmt"

type yards float
type meters float

func main() {
        var fieldLength yards = 120;
        var fieldWidth meters = 48.77;
        var yardsInAMile float = 1760; // yards per mile

        fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n",
                26.2*(yardsInAMile/((fieldLength*2)+(fieldWidth*2))));
}

Notice that all we did is define two new types 'yards' and 'meters' all which should act like 'floats', but should be treated differently by the type-checker. When we compile we get the following error:
invalid operation: fieldLength * 2 + fieldWidth * 2 (type yards + meters)

The most important part of the error is at the end where it tells us we're trying to add 'yards' with 'meters'. The type checker found the bug for us! So how do we fix it? We need some conversion routines. So lets add some methods to the types and fix the bugs.

package main

import "fmt"

type yards float
type meters float
type miles float

func (m meters) toYards() yards { return yards(m * 1.0936133) }
func (y yards) toMiles() miles  { return miles(1760.0 / y) }

func main() {
        var fieldLength yards = 120;
        var fieldWidth meters = 48.77;

        fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n",
                26.2*((fieldLength*2)+(fieldWidth.toYards()*2)).toMiles());
}

With the corrected program we see that you only have to run around the field 133 times instead of 136.6 times!

Go isn't the first nor the only programming language that allows you to encode units as types, but it's close enough to C/C++ for a good comparison. So what's the runtime overhead of the change? Well there are a couple of method calls (toYards(), toMiles()) where the original version did the conversions inline. The error checking happens at compile time because Go is statically typed so there's no runtime performance hit. Personally I'd much rather wait for my program to call a couple of functions than to run around an American football field 3.6 more times.

By carefully using Go's (or any other language with a good type system) type system you can spend more time writing code and less time tracking down bugs.

Monday, November 9, 2009

More fun with C/C++

In order to write a program it's useful (but not necessary) to have an accurate mental model of how the language works, so you can mentally interpret your code before handing it off the compiler. So if you're a C/C++ programmer take a look at the following code and see if you can correctly identify the output:
#include <stdio.h>

int foo(int a, int b, int c, int d)
{
    if (a>b)
    {
        return c;
    }
    else
    {
        return d;
    }
}

int main(int argc, char * argv[])
{
    int x = 2;
    int y = 1;

    printf("Result: %d\n", foo(x+=y, y+=x, x+=y, y+=x));

    return 0;
}
In this case the output is non-deterministic. While C/C++ specify what order arguments will be pushed onto the stack it doesn't specify what order they'll be evaluated in. So there are several possibilities for the output. This flexibility allows compiler writers to order the evaluation of operations in the most efficient way. If you want to avoid confusing/broken code don't mutate variables in argument positions. The person that ends up maintaining your code will thank you for it.

Wednesday, October 21, 2009

Coding Challenge

Here is a little coding challenge. Learning how to solve this probably won't make you a better programmer (it may make you worse) but it's fun anyway. You are given a file named "helloworld.cxx" that contains the following code.

#include <stdio.h>
#include "muck.h"

int 
main(int argc, char **argv)
{
    printf("%s\n", "Hello World!");
}

You can't edit this file! Your job is to write the "muck.h" header file. When you're done compile "helloworld.cxx" and run it and the output should be "Goodbye!" followed by a newline. Nothing more nothing less. Remember you can only modify "muck.h" and you must compile "helloworld.cxx" unmodified. My solution and it's Makefile.

Tuesday, October 6, 2009

The Essence of Scheme (part two)

This post is a continuation of part one. You probably want to read that post first. So in my previous post I walked you through a nifty little though experiment on why Scheme is so expressive, and why those pesky ('s and )'s are actually useful. In this post I hope to provide some concrete examples of using Scheme's macros to do interesting things. So lets start with a simple example. The Scheme programming language (R5RS) doesn't have a built in "while" loop. It does have a "do" loop which is kind of like a "for" loop but I don't think it gets used much. Most Schemers use recursion instead. Lets say you're just learning Scheme and recursion hurts your brain and you really really just want to use a "while" loop just this one time and you promise that you'll pay better attention when you CS professor talks about recursion. It turns out it's quite easy to implement a "while" loop as a Scheme macro1.
;; A while loop
(define-syntax while
  (syntax-rules ()
    [(_ (cnd ...) body ...)
     (letrec ([tmp (lambda () (if (cnd ...) (begin body ... (tmp)) (void)))])
       (tmp))]))
If you want to try it out here is a bit of code that uses the while loop to count from 10 down to 1.
(define count 10)
(while (> count 0)
       (printf "~a\n" count)
       (set! count (- count 1)))
So in 10 lines of code you can define a "while" loop that works just like the "while" loops built into C/C++/Java/Python/Insert_Your_Favorite_Language_Here.

So that doesn't seem to hard. If C didn't have a built in "while" loop you could always create you're own with a C style macro as follows.
#define WHILE(cond) for(;cond;)
Lets try something more difficult. Scheme also doesn't have a C/C++ "#include" directive. So lets write our own.
;; Define include
(define-syntax include
  (lambda (stx)
    (define (read-all port)
      (let ([exp (read port)])
        (if (eof-object? exp)
            '()
            (cons exp
                  (read-all port)))))
    (define read-file
      (lambda (fn ctx)
        (let ([port (open-input-file fn)])
          (begin0
            (datum->syntax ctx (read-all port))
            (close-input-port port)))))
    (syntax-case stx ()
      ((ctx filename)
       (with-syntax ([(exp ...) (read-file (syntax->datum #'filename) #'ctx)])
         #'(begin exp ...))))))
Once you've defined this you can use it as follows.
(include "extern.ss")
Once you've defined the "include" macro any include expression will get evaluated at compile time which opens up the included file reads in the text and injects it into the the source file at the location of the "include" call. Spend a little time thinking about that. A Scheme programmer can write code that gets evaluated at compile time and that code can open, close, and read from files. So if Scheme's macro language allows you to read from files what else can it do? It turns out anything that a Scheme program can do. The above macro can read from files because any old PLT-Scheme program can read from files. A Scheme macro is just a chunk of regular old Scheme code that gets executed at compile time. If your Scheme allows you to open up a TCP/IP socket then you can write a macro to do this at compile time2. If your Scheme allows you to rotate images then you can do it at compile time. If your Scheme allows you to play music then you can play music at compile time3. OK so lets look at some more examples.
(define current-catch 
  (make-parameter 
   (lambda (exn) 
     (printf exn) 
     (exit 1))))

(define (throw exn)
  ((current-catch) exn))

(define-syntax try
  (syntax-rules (catch throw)
    [(_ body ... (catch exn cbody ...))
     (let/cc k
       (parameterize ([current-catch (lambda (exn) cbody ... (k (void)))])
         (begin body ...)))]))
This chunk of code allows users to throw and catch exceptions using "try", "catch" and "throw"4. Scheme doesn't have a built in exception mechanism, but in 13 lines of code you can write your own. The following code will use the above macro and functions.
(try 
 (printf "Yes\n") 
 (throw "Error\n")
 (printf "No\n")
 (catch exn 
   (printf exn)))

It will print "Yes" followed by "Error" the "No" will not be displayed because of the "Error" exception will be thrown. OK lets look at one more.
(define-syntax define-generator
  (lambda (stx)
    (syntax-case stx ()
      ((_ (name args ...) body ...)
       (with-syntax ([yield (datum->syntax #'name 'yield)])
         #'(define (name args ...)
             (define yield-k #f)
             (define return-k void)
             (define (yield retval)
               (let/cc k
                 (set! yield-k k)
                 (return-k retval)))
             (lambda ()
               (let/cc ret
                 (set! return-k ret)
                 (if yield-k
                     (yield-k '())
                     (begin
                       body ...
                       (set! yield-k (lambda (x) (void)))))))))))))
This allow you to define Python style generators. Here is a simple example that defines a "range" generator that will reuturn 0 he first time it's called and 1 the next.
(define-generator (range init limit)
  (let loop ([i init])
    (yield i)
    (when (< i limit)
      (loop (+ i 1)))))

(define foo (range 0 1))
(foo)
(foo)
I hope these examples give some insight at how expressive and malleable the Scheme programming language is. There is one more issue I'd like to address. If you're a Python programmer you may be thinking to yourself. "Hey Python has loops, exceptions, generators and import which is kind of like #include. If python has these built in then why should I use Scheme?" That is a valid but slightly myopic question. The point of Scheme's macro system is not to create another Python with funnier syntax. The real point is that you the developer can extend and grow the language to meet your needs. You may extend the language by adding features that are found in other programming languages, but you may also add features that don't exist in any language, and that don't have a name, and that may not be generally useful but really make the current program you're working on really really easy to write. In Python, Perl and Java users have the power to create and extend libraries, in Scheme users have the power to create and extend languages5. Power to the people!

[1] All examples tested in in PLT-Scheme 4.1.3.
[2] Implementing "http-include" is left as an exercise for the reader.
[3] Playing music at compile time is possible but not recommended by the author.
[4] Anyone want to try implementing exceptions in C?
[5] There is one major downside to learning Scheme. Eventually you'll find yourself writing a program in C/C++/Python/Java and you'll realize that you could solve some problem really elegantly if you could just extend the language just a little bit. Then you'll realize that you can't and you'll spend the next hour wishing you we're never enlightened.

Monday, September 28, 2009

The Essence of Scheme

Note that throughout this post "Scheme" could be replaced with "Lisp" without changing it's meaning.

My Introduction to Scheme
My first exposure to Scheme was in college at the University of Utah. At that time they taught a Programming Languages course that was based on the book Structure and Interpretation of Computer Programs, and the Dr. Scheme programming environment.  While I enjoyed the class I never understood why some people thought Scheme was so special.  In my defense the class really didn't teach us Scheme they just used a small subset of the language to teach us about programming language features such as exceptions, continuations, closures, loops, recursion, conditionals, first class functions etc.  It really was a great class, you learned about these features by writing and modifying interpreters that implemented these features.  The only problem is that many students including myself left that class thinking they knew Scheme, but we didn't.  I didn't get re-introduced to Scheme again until a few years later when I was learning Perl.  I kept on learning about features of Perl that I really liked, and then I realized that all of these features were also in Scheme.  I decided that I needed to go back and tinker with  Scheme.


Learning Scheme
I started learning Scheme by writing programs and reading a lot of blogs, and articles in an attempt to get my brain wrapped around the programming language.  Almost everyone talked about how powerful Scheme is, but it took me a long time before I really understood why Scheme is such an expressive language.  Because it takes a lot of work and study to reach Scheme enlightenment, many developers simply give up on the language, before they really understand it.  I don't know if it's possible to reach Scheme enlightenment, without all the hard work, but I hope this post will at least will give you a peek at why Scheme is so expressive, and why those that really learn it have a tendency to turn into Smug Lisp Weenies.  In short this is the blog post that I wish I could have read when I was working towards Scheme enlightenment.


A shortcut to Scheme enlightenment
Lets get started.  Consider the following snippet:
(if val
  0
  1)

Now if you're an experienced programmer or a Schemer you were probably tempted to just glance at the above snippet and move on, but take a minute and really think about what you're looking at.  As a Schemer I can see three different interpretations of the snippet.

The most obvious interpretation of the above snippet is an 'if' expression. The expressions condition is the 'val' value, the consequence is the integer '0' and the alternate is the integer '1'.

The second interpretation that I see is a linked-list1 of 4 elements.  The first elements in the list are the symbols2 'if' and 'val' followed by the integers '0' and '1'.

The third interpretation is a tree data structure. A linked-list can represent a tree structure by putting the first element in the list as the root and all other elements in the list as child notes of the root. Because Scheme allows a linked-list to have another linked-list as an element arbitrary tree structures can be represented using linked-lists. So the above snippet could also represent the following tree:

This tree also happens to be the Abstract Syntax Tree for the previously mentioned 'if' expression.  At this point you may be wondering why this is important, and we'll get to that but for know just consider that every expression in Scheme is an expression, a linked-list and the AST for that expression. Now lets explore the consequences of that fact.


Scheme and linked-lists
Scheme is one of many dialects of Lisp. The name List is short for "List Processing". Although Lisp dialects do support other data-structures, they have very powerful constructs that make them an ideal language for processing linked-lists3. Because every Scheme expression is also a linked-list and Scheme is an ideal language for processing linked-lists we can conclude that Scheme is an ideal language for processing Scheme expressions4.

Compilers and AST
When you write a compiler for a language one of the first things you may do will be to build an AST for the tokenized and parsed text. An AST is a simplified representation of the code that is in a format that is easily to manipulate.  When the code has been transformed to an AST it is easy for the compiler to perform optimizations, static analysis and transform complex language features into more basic language features.

Bringing it all together
So we've discussed that all Scheme expressions are linked-lists and Scheme is a great language for manipulating linked-lists. We've also discusses that all Scheme expressions are also the AST for that expression, and AST's are a great data-structure for compiler writers to do complex transformations. It shouldn't be too hard to see that Scheme is an excellent language for performing complex transformations on Scheme expressions. In fact Scheme allows you to easily extend the programming language in ways that are only reserved for language designers and compiler writers in other languages. When I learn about some new programming language feature in Python that I wish was in C++, my only choices is to hope that the great C++ language design committee will include the feature in some future version of the C++ spec and that g++ will implement that feature quickly, and my distro will ship the updated version, and ... If I want to use that same feature in Scheme, I can simply implement it and I don't need to ask the language design committee for permission, I don't need to modify the Scheme compiler/interpreter, I don't need to modify any Makefiles, I just implement the feature and start using it immediately!  That IMHO is the feature that makes Scheme such a powerful tool!  In my next post we'll look at some examples of how easy it is to grow your own language with Scheme.


[1] I know that for this to be treated as a list in code it would need to be quoted, but if it was simply read in from a file with a 'read' call it would come in as a list.
[2] Scheme symbols are like read-only strings that can be efficiently compared for equality by simply comparing their pointers. Symbols are often used like enums in C++.
[3] A proof of this statement is left as an exercise for the reader.
[4] Yep, you should prove this one too while you're at it.

Wednesday, September 16, 2009

A(nother) counter intuitive C++ (mis)feature

I've been using C++ for about 12 years now.  Which means I don't know if very well.  Early on in my C++ career I thought I had a good grasp of the language, but C++ is one of those languages that the more you learn the more you understand how hopeless it is to fully understand it.  Every month or so I'm confronted with some new corner case that I didn't know about.  Last months mis-feature had to due with overloading and overriding.  
Consider the following code:
class Bar
{
    public:
        virtual void method(int a)
        {
            std::cout << "Bar " << a << std::endl;
        }
};

class Baz : public Bar
{
    public:
        virtual void foo(std::string a)
        {
            std::cout << "Baz " << a << std::endl;
        }
};
int main()
{
    int i = 9;
    std::string s = "9";

    Baz *b1 = new Baz();

    b1->method(i);
    b1->method(s);
}

If you compile this code (with g++) you'll get the following errors:
error: invalid conversion from ‘int’ to ‘const char*’
error:   initializing argument 1 of ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’


This caught me by surprise.  I had assumed that the compiler would match the method signatures after the method names were mangled, and thus method(std::string) would not hide method(int).  It turns out that methods are matched only by the name, and after the match has been found (in this case finding the method(std::string) the type checker validates the parameter types.  While I can understand this could make the compiler more efficient, it seem that by default the C++ compiler does the wrong (non-intuitive) thing.

So how do you make the compiler do what you want? You just need to add a "using" line right before Baz's method as follows:
...
using Bar::method;
virtual void method(std::string a)
{
...
}
...

Of course you could have guessed that. Right?