Tuesday, March 16, 2010

Developer Etiquette

I created a list of etiquette guidelines for Software Engineers.  These etiquette guidelines are not for social interactions, but are instead for writing code.  Like most etiquette guidelines these are designed to minimize the discomfort for others.  So here they are.

Before committing I will:

  1. Review the patch.
  2. Review the list of files that are not being committed, but have been modified or are new.
  3. Compile the code.
  4. Run any affected unit-tests.
  5. Test the code to make sure it does not break core functionality.
The reason behind #1 is simple.  It's very easy to accidentally commit code that's not ready, or that contains printf debug code.  It's quite obnoxious to have your terminal sprayed with hundreds of "XXX Hey I'm executing function foo()."  By following guideline number #1 you will rarely accidentally commit half-baked code.  The git add --patch command makes this really easy.

Guideline #2 saves you from forgetting to commit some critical patch, or more commonly forgetting to add a new file to the repository.  I've found that most of the time when code compiles on the developer's computer but not anyone else's, it's because they forgot to add a file.  The git status command helps here.

Guideline #3 is simple.  Committing code that doesn't compile turns a O(1) problem into a O(n) where n is the number of developers on your team.  Don't ever check in code that doesn't compile.

Guideline #4 ensures that you don't accidentally break the unit-tests.  It is much easier to fix bugs earlier rather than later.  So if you break the tests it's better to find out before you commit.

Guideline #5 is similar to #4 but assumes that not all of your code is covered by unit-tests.  Even if you test every line of code with unit-tests, your unit-tests won't tell you that you just made your GUI look like puke.  So run the code and make sure nothing broke.

If you follow these guidelines you can still write horribly broken code and be a terrible developer, but at least you won't prevent your co-workers from being competent.

Wednesday, January 20, 2010

Easy to Use

I've noticed that when choosing a tool for software development one of the criteria that's often proposed is the tool should be "easy to use". While I agree that ease of use should be a criteria for choosing a tool I've come to realize that I have a different opinion on what "easy to use" means than others. When I say a tool is "easy to use" I mean it literally. A tool that is easy to use should make it really easy for me to do what I want. Here are some of the development tools that I use that I consider easy to use:

  1. vim
  2. git
  3. The command line (bash)
All three of these tools let me do my job as quickly and painlessly as possible.  If you asked others for a list of similar tools that are easy to use you may get the following:

  1. notepad
  2. svn
  3. GUI's
By my definition the above tools are not easier to use than my list (they don't allow me to do my job as quickly or painlessly) but they are easier to learn.  I'm not sure if easy to learn and easy to use must be mutually exclusive but I can't think of any tools that I use frequently that have both qualities.  There is nothing wrong with choosing a tool that's easier to learn over one that's easier to use as long as you use the tool infrequently.  Investing years to fully learn vim is a waste of time if you only edit text files for a few minutes every month.  If your going to use a tool frequently over a long period of time then you should spend the time to learn the tool that's easy to use even if it isn't easy to learn.

Tuesday, January 5, 2010

Recursion and Inductive Proofs

I've discovered that programmers frequently stumble when writing recursive algorithms. There are some simple rules that programmers can steal from mathematics to make writing recursive algorithms easy. I should point out that I'm only talking about writing recursive algorithms for interacting with recursive data-structures (lists, trees, etc.). Let's get started by reviewing some basic steps for proving a statement holds for all natural numbers n using an inductive proof.
  1. Base case:  Show the statement holds for a value of n (usually n=1 or n=0).
  2. Inductive step:  Assuming the statement holds for any value of n then show it also holds for a value of n+1
Rule 2.5 would be to make sure that the inductive step reduces the problem space.  Using a base case of n=0 and then and inductive step of n-1 wouldn't be a valid proof because the inductive step does not really reduce the problem space (remember we're talking about natural numbers for this example).  So let's use an inductive proof to show that 0+1+2...n=(n(n+1))/2.

Base case: Show the statement holds for n=0
0=0(0+1)/2
This equation clearly holds.

Inductive step:  Assuming the statement holds for any value of n then show it will also hold for n+1.
0+1+2...n+(n+1) = ((n+1)((n+1)+1)/2
If we assume that the statement holds for n then we can reduce the expression.
(n(n+1))/2 + (n+1) = ((n+1)((n+1)+1)/2
Using some basic algebra we can reduce both sides and discover that the inductive step holds.

Now we can apply same (perhaps slightly tweaked) rules to writing a recursive algorithm for finding the length of a linked-list (a recursive data-structure).

Let's start out with the function signature.
int length(Node *node)
{
  // Do stuff here
}
Base case: Write the algorithm for n=0 (the linked-list with 0 nodes).
int length(Node *node)
{
  if (NULL == node)
  {
    return 0;
  }
  // Do more stuff here
}
Congratulations! If you've make it this far you're above average. You've solved the base case without dumping core.

Now onto the inductive step: Assume that your length function will work for lists of length n now write code to make it work for lists of length n + 1.
int length(Node *node)
{
  if (NULL == node)
  {
    return 0;
  }
  else
  {
    return 1 + length(node->next);
  }
}
Notice that if we assume that "length(...)" works for lists of length n then for lists of length n + 1 all we need to do is add 1 to the length. You should also notice that we passed node->next to the "length(...)" function not just node because rule 2.5 states that the inductive step must reduce the problem space and just passing node would not reduce the problem space. Congratulations! If you've make it this far you're now well above average. Now not only will your algorithm not core dump, but it won't hang either.

So let's review what we've done. We've used the rules for inductive proofs to guide us in writing a recursive algorithm. Does this mean we've proved that this algorithm is correct? The answer is, not really or at least, not formally. Even though we haven't proved this algorithm is correct we can be much more confident that it will work.

If we were to write unit-tests for this algorithm (which you should *always* do before writing the algorithm) we could use the rules of induction to guide our unit tests. We could write a test for testing the base case and another test for testing the inductive case.

Since math has been around longer than computers, it makes sense that we could borrow a few ideas from it to make our jobs easier.