Friday, November 27, 2009

Google's Go trying to be too clever.

I've discovered a couple of cases where Google's Go is trying to be too clever. There are two expressions whose behavior changes depending on what is done with their results. Let's look at a bit of code.

package main

import . "fmt"

// Define a simple struct
type foo struct {
        a int;
}

// Define an interface
type plus1 interface {
        add1() int;
}

func any(a interface{}) {
        v := a.(plus1).add1();
        Printf("%v\n", v);
}

func main() {
        f := foo{1};
        any(f);
}

This code defines a simple struct "foo" with a single member "a". It also defines a "plus1" interface, which says that any type implementing the interface must have an add1 method that returns an int. The main function instantiates a "foo" and passes it to the "any" function. Because "foo" doesn't implement the "plus1" interface, the first expression in "any" fails with a runtime panic. The expression "a.(plus1)" is called a type assertion, and it panics if the value in "a" doesn't implement "plus1". We can change this code so that the type assertion won't panic by assigning its results to two variables. Consider the following:

package main

import . "fmt"

// Define a simple struct
type foo struct {
        a int;
}

// Define an interface
type plus1 interface {
        add1() int;
}

func any(a interface{}) {
        a1, ok := a.(plus1);
        if ok {
                v := a1.add1();
                Printf("%v\n", v);
        }
}

func main() {
        f := foo{1};
        any(f);
}

The only difference in this code is that the type assertion no longer panics; instead it returns two values (yes, Go can return multiple values!). The second value is a bool that indicates whether the first value is "a" converted to "plus1". If "a" doesn't implement "plus1", the type assertion just returns false as the second value (and nil as the first) instead of panicking. While this is clever, it feels like assigning the result to a pair of variables magically alters the behavior of the expression. I think this will just make the language too difficult to learn. The second operation has to do with goroutines, and I'll talk about it next time.
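To see both forms side by side, here is a minimal sketch. The "bar" type and the "describe" and "mustAdd1" helpers are hypothetical names that aren't part of the code above; "bar" implements "plus1" so that the success path runs too. The one-value form is wrapped in recover only so the example can report the panic instead of crashing.

```go
package main

import "fmt"

// plus1 mirrors the interface from the examples above.
type plus1 interface {
	add1() int
}

// foo has no add1 method, so it does not implement plus1.
type foo struct{ a int }

// bar is a hypothetical type that does implement plus1.
type bar struct{ a int }

func (b bar) add1() int { return b.a + 1 }

// describe uses the two-value form, which never panics.
func describe(v interface{}) string {
	if p, ok := v.(plus1); ok {
		return fmt.Sprintf("add1 = %d", p.add1())
	}
	return "does not implement plus1"
}

// mustAdd1 uses the one-value form and turns the resulting panic into an error.
func mustAdd1(v interface{}) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("type assertion failed: %v", r)
		}
	}()
	return v.(plus1).add1(), nil
}

func main() {
	fmt.Println(describe(foo{1}))
	fmt.Println(describe(bar{1}))
	_, err := mustAdd1(foo{1})
	fmt.Println(err != nil)
}
```

The same expression, "v.(plus1)", appears in both helpers; only the number of values on the left-hand side decides whether a failed assertion panics.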

Wednesday, November 18, 2009

Letting the compiler find the bugs

Let's start by looking at a bit of contrived C/C++ code. See if you can spot the (hopefully obvious) bug.

#include <stdio.h>

int main()
{
    float fieldLength = 120; // yards
    float fieldWidth  = 48.77; // meters
    float yardsInAMile = 1760; // yards per mile

    printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLength * 2) + (fieldWidth * 2))));

    return 0;
}

It probably shouldn't take you too long to realize that in the above code I'm mixing units. I'm adding yards to meters. While bugs like this are obvious in trivial code they can be difficult to track down in more complex software. What can we do to prevent bugs like this from creeping into code? One solution is to use coding standards to reduce ambiguity. For example you could decide to use metric units for all data in your project. You could also use 'typedefs' and variable naming conventions to make errors even more obvious. Consider the following:

#include <stdio.h>
typedef float yards;
typedef float meters;

int main()
{
    yards fieldLengthYards = 120; // yards
    meters fieldWidthMeters  = 48.77; // meters
    float yardsInAMile = 1760; // yards per mile

    printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLengthYards * 2) + (fieldWidthMeters * 2))));

    return 0;
}

Including the units in the variable names and types makes the programmer's intentions clear. It's also easier to spot bugs in code audits. For even more safety you could define Meters and Yards classes that encapsulate the floats and prevent mixing of types. The advantage of defining new types/classes is that the compiler can now ensure that you don't mix types. The disadvantage is that you'll end up writing a class for each type, with lots of overloaded operators. Writing a class for every unit feels too burdensome for all but the most critical code (nukes, airplane firmware, surgery robots, etc.).

The problem with the above 'typedef' solution is that C/C++'s 'typedef' really only defines an alias for the type. Both the type-checker and the compiler will treat 'meters' exactly the same way they treat 'yards'. What we want is for the type-checker to treat 'meters' and 'yards' as distinct types and for the compiler to treat them both as floats. Google's new systems language Go lets you do just that. Let's reproduce the original bug in Google Go.

package main

import "fmt"

func main() {
    var fieldLength float64 = 120; // yards
    var fieldWidth  float64 = 48.77; // meters
    var yardsInAMile float64 = 1760; // yards per mile

    fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLength * 2) + (fieldWidth * 2))));
}

This has the same bug as the C/C++ code above. The code should be readable to a C/C++ developer; the only really "weird" thing is that the type follows the variable name. Now let's use Go's strong typing to let the compiler catch the bug for us.

package main

import "fmt"

type yards float64
type meters float64

func main() {
        var fieldLength yards = 120;
        var fieldWidth meters = 48.77;
        var yardsInAMile float64 = 1760; // yards per mile

        fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n",
                26.2*(yardsInAMile/((fieldLength*2)+(fieldWidth*2))));
}

Notice that all we did was define two new types, 'yards' and 'meters', both of which should act like floats but be treated as distinct types by the type-checker. When we compile we get the following error:

invalid operation: fieldLength * 2 + fieldWidth * 2 (type yards + meters)

The most important part of the error is at the end, where it tells us we're trying to add 'yards' to 'meters'. The type checker found the bug for us! So how do we fix it? We need some conversion routines. So let's add some methods to the types and fix the bug.

package main

import "fmt"

type yards float64
type meters float64
type miles float64

func (m meters) toYards() yards { return yards(m * 1.0936133) }
func (y yards) toMiles() miles  { return miles(y / 1760.0) }

func main() {
        var fieldLength yards = 120;
        var fieldWidth meters = 48.77;

        // 26.2 miles divided by the perimeter in miles gives the number of laps.
        fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n",
                26.2/((fieldLength*2)+(fieldWidth.toYards()*2)).toMiles());
}

With the corrected program we see that you only have to run around the field 133 times instead of 136.6 times!
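Both figures can be checked with a short sketch. The "laps" helper and the constant names are hypothetical, introduced only for this check; the conversion factor is the one used in toYards above. The first call passes the width in meters as if it were yards, which is the original bug.

```go
package main

import "fmt"

const (
	yardsPerMile  = 1760.0
	yardsPerMeter = 1.0936133
	marathonMiles = 26.2
)

// laps returns how many trips around a field with the given sides
// (both in yards) add up to a marathon.
func laps(lengthYards, widthYards float64) float64 {
	perimeter := 2*lengthYards + 2*widthYards
	return marathonMiles * yardsPerMile / perimeter
}

func main() {
	fmt.Printf("mixed units: %.1f laps\n", laps(120, 48.77))
	fmt.Printf("all yards:   %.1f laps\n", laps(120, 48.77*yardsPerMeter))
}
```

Run as written, this prints 136.6 laps for the mixed-unit call and 133.0 laps once the width is converted to yards.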

Go isn't the first nor the only programming language that allows you to encode units as types, but it's close enough to C/C++ for a good comparison. So what's the runtime overhead of the change? Well there are a couple of method calls (toYards(), toMiles()) where the original version did the conversions inline. The error checking happens at compile time because Go is statically typed so there's no runtime performance hit. Personally I'd much rather wait for my program to call a couple of functions than to run around an American football field 3.6 more times.

By carefully using the type system in Go (or in any other language with a good one) you can spend more time writing code and less time tracking down bugs.
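The difference between a typedef-style alias and a genuinely new type can also be seen inside Go itself, which has had alias declarations of its own since Go 1.9. In this sketch "feet" and "inches" are hypothetical names chosen for illustration: "feet" is declared as an alias with "=", so it behaves like the C typedef, while "inches" is a defined type that needs an explicit conversion.

```go
package main

import "fmt"

// feet is an alias: the type checker treats it as float64 itself.
type feet = float64

// inches is a defined type: same representation, but distinct to the type checker.
type inches float64

// toFeet converts explicitly from inches to feet.
func (i inches) toFeet() feet { return feet(i) / 12 }

func main() {
	var a feet = 3
	var raw float64 = 2
	var b inches = 24

	fmt.Println(a + raw) // compiles: feet and float64 are the same type
	// fmt.Println(a + b) // would not compile: the operand types differ
	fmt.Println(a + b.toFeet())
}
```

Only the defined type gets the compile-time check described in this post; the alias offers documentation value and nothing more.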

Monday, November 9, 2009

More fun with C/C++

In order to write a program it's useful (but not necessary) to have an accurate mental model of how the language works, so you can mentally interpret your code before handing it off to the compiler. So if you're a C/C++ programmer, take a look at the following code and see if you can correctly identify the output:

#include <stdio.h>

int foo(int a, int b, int c, int d)
{
    if (a>b)
    {
        return c;
    }
    else
    {
        return d;
    }
}

int main(int argc, char * argv[])
{
    int x = 2;
    int y = 1;

    printf("Result: %d\n", foo(x+=y, y+=x, x+=y, y+=x));

    return 0;
}

In this case the output isn't defined by the language. The calling convention determines how the arguments are passed, but neither C nor C++ specifies the order in which they're evaluated, and here every argument modifies a variable that another argument reads or modifies. So there are several possible outputs, and in C the conflicting unsequenced modifications make the behavior undefined outright. This flexibility lets compiler writers order the evaluation of operations in the most efficient way. If you want to avoid confusing or broken code, don't mutate variables in argument positions. The person who ends up maintaining your code will thank you for it.
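One way to follow that advice is to give every mutation its own statement, so the order is spelled out in the source. The sketch below does this in Go and assumes a left-to-right reading of the original call; that reading is only an assumption, since the C code doesn't guarantee it. Go avoids the problem in a different way: "x += y" is a statement rather than an expression, so it can't appear in an argument list at all. The "run" helper is a hypothetical name.

```go
package main

import "fmt"

// foo mirrors the C function above: it returns c when a > b, otherwise d.
func foo(a, b, c, d int) int {
	if a > b {
		return c
	}
	return d
}

// run performs the four updates one statement at a time, left to right,
// and captures each intermediate value before the call.
func run(x, y int) int {
	x += y
	a := x
	y += x
	b := y
	x += y
	c := x
	y += x
	d := y
	return foo(a, b, c, d)
}

func main() {
	fmt.Printf("Result: %d\n", run(2, 1)) // prints "Result: 11"
}
```

With the order made explicit there is exactly one answer, and a reader can trace it by hand.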