Real World Academia: Unit testing isn't enough. You need static typing too.

Wednesday, June 13, 2012

Unit testing isn't enough. You need static typing too.

When I was working on my research for my Master's degree I promised myself that I would publish my paper online under a free license as soon as I had graduated. Unfortunately there seems to be an unwritten rule of graduate school research: you spend so much time focusing on a single topic of study that by the time you graduate you are sick of it. So more than a year later I'm finally putting my paper online. For those who don't want to read the full paper (it's not terribly long for a research paper at 60 pages, but it's no tweet either) I'll include a shorter summary below. The summary omits some important information, so if you would like to provide constructive or destructive feedback I ask that it be directed towards the full paper and not the quick summary.

For my research I wanted to test the frequently cited claim by proponents of dynamically typed programming languages that static typing is not needed for detecting bugs in programs. The core of this claim is as follows:

1. Static typing is insufficient for detecting bugs, and so unit testing is required.

2. Once you have unit testing static type checking is redundant.

3. Because static typing rejects some valid programs static typing is harmful.

Despite the fact that I had heard and read this claim many times I couldn't find any research to back this claim up. So I decided to conduct an experiment to see if in practice unit tests really did obviate static typing for error detection. I also wanted to see if developers frequently use dynamic constructs that can't be expressed in a statically typed programming language.

My experiment would consist of finding examples of open source, unit tested programs written in a dynamically typed programming language and manually translating them into a statically typed programming language. I would then quantify how many (if any) defects were detected by the type checker, and how many dynamic constructs couldn't be directly expressed due to being rejected by the static type checker. I should emphasize that for this experiment I would *not* be simply rewriting the program, but doing a direct line by line translation from one programming language to another. I would not count defects that were not detected by the type checker, nor any defects that could not be reproduced in the original program.

Before starting the experiment I needed to choose a dynamically typed programming language that I would translate programs from. I also needed to choose a statically typed programming language that I would translate those programs to. The criteria for the dynamically typed programming language were as follows:

The language should be dynamically typed
The language should have support for and a culture of unit testing
The language should have a large corpus of open source software for studying
The language should be well known and considered a good language among dynamic typing proponents

With these criteria in mind I selected Python. The next step was to choose the statically typed programming language. For this selection I used the following criteria:

The language should be statically typed
The language should execute on the same platform as Python
The language should be strongly typed
The language should be considered a good language among static typing proponents

I selected Haskell for the statically typed programming language.

The next step was to choose some unit tested programs to translate from Python into Haskell. I randomly picked four projects (the Python NMEA Toolkit, MIDIUtil, GrapeFruit and PyFontInfo) from the https://code.google.com/ and https://bitbucket.org source code hosting sites.

The Python NMEA Toolkit

The translation of the Python NMEA Toolkit from Python to Haskell led to the discovery of nine type errors. Three of them could be triggered by malformed input and the other six by incorrect usage of the API. Only one of the type errors would have been guaranteed to be discovered had full unit test coverage been employed. Additionally there was one run time error that could be eliminated once static typing was applied. Two unit tests could have been eliminated as their only function was to perform type checking. No dynamic constructs were used that could not be directly translated into Haskell.
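
As a hypothetical illustration of this class of defect, consider a parser that returns None for malformed input. The function names and the unit conversion below are assumptions made for the example and are not taken from the NMEA Toolkit. The unchecked variant passes a happy-path unit test yet fails at run time on bad input; with type hints, a checker such as mypy would be expected to flag the unguarded use, much as Haskell's Maybe forces the caller to handle the missing case.

```python
from typing import Optional


def parse_speed(field: str) -> Optional[float]:
    """Return the speed in knots, or None if the field is malformed."""
    try:
        return float(field)
    except ValueError:
        return None


def speed_kmh_unchecked(field: str) -> float:
    # Works for well-formed input, but raises TypeError on malformed
    # input because None cannot be multiplied by a float.
    return parse_speed(field) * 1.852


def speed_kmh(field: str) -> Optional[float]:
    # The None case is handled explicitly, so the types line up.
    knots = parse_speed(field)
    return None if knots is None else knots * 1.852
```

A test that only feeds well-formed fields to speed_kmh_unchecked gives full line coverage without ever exercising the failing path.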

MIDIUtil

The translation of MIDIUtil led to the discovery of two type errors. Only one of them would certainly have been caught had full unit test coverage been employed. An additional run time error could also be eliminated by static typing. None of the unit tests only tested for type safety, so none of them could be eliminated. The MIDIUtil code did use struct.pack and struct.unpack, which could not be directly translated because both rely on format strings that determine the types of their arguments and return values. However, in all cases the format strings were hard-coded, so the Haskell version could use hard-coded functions in place of the hard-coded format strings with no loss in expressiveness. Had the MIDIUtil code stored these format strings in external configuration files, the program would likely have required a redesign to express it in a statically typed language.
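
The idea of replacing hard-coded format strings with hard-coded functions can be sketched in Python itself. This is a minimal sketch: the wrapper names are hypothetical and not taken from MIDIUtil or its Haskell translation. Each wrapper fixes the format string, so its argument and return types no longer depend on a string interpreted at run time.

```python
import struct


def pack_u16_be(value: int) -> bytes:
    """Pack an unsigned 16-bit integer, big-endian (format '>H')."""
    return struct.pack(">H", value)


def unpack_u16_be(data: bytes) -> int:
    """Unpack exactly two bytes into an unsigned 16-bit integer."""
    (value,) = struct.unpack(">H", data)
    return value


def pack_u32_be(value: int) -> bytes:
    """Pack an unsigned 32-bit integer, big-endian (format '>L')."""
    return struct.pack(">L", value)
```

Because every call site names the width and byte order through the function it calls, a type checker can verify the arguments without understanding the format string mini-language.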

GrapeFruit

The translation of GrapeFruit to Haskell did not result in the discovery of any type errors. A single run time error could be eliminated by static typing. Additionally a single unit test could have been eliminated that only tested for type safety. No dynamic constructs were used that could not be directly translated into Haskell.

PyFontInfo

The translation of PyFontInfo resulted in the discovery of six type errors. Two run time errors could be eliminated by static typing. A single unit test could have been eliminated. The PyFontInfo code also used struct.pack and struct.unpack, which cannot be directly translated, but a simple workaround exists.

Results

The translation of these projects revealed that all of these projects could have been written in a statically typed programming language with only minor code changes. Furthermore, unit testing did not seem to be an adequate replacement for static type checking. A total of seventeen type errors were discovered. All of the type errors that were discovered were the result of bugs in the original Python code that were not discovered by the unit tests. Many of the bugs existed in code that did have unit test coverage.

Conclusion

The results of this experiment indicate that unit testing is not an adequate replacement for static typing for defect detection. While unit testing does catch many errors, it is difficult to construct unit tests that will detect the kinds of defects that would be programmatically detected by static typing. The application of static type checking to many programs written in dynamically typed programming languages would catch many defects that were not detected with unit testing, and would not require significant redesign of the programs.

Future Work

The translation of these four projects does provide an interesting data point on the effectiveness of unit testing for defect detection. I hope that others will try to conduct similar experiments on more samples of dynamically typed programs.

The full length paper is located here.

The original Python code and the Haskell translation are here.

124 comments:

Anonymous, June 18, 2012 at 6:06 PM
Sadly, if the code was trivial to rewrite in another language, then the code is trivial, and not likely to have significant logic bugs.

In all the years I've spent as a maintenance coder 99% of all the bugs I've found and fixed were logic bugs that had nothing to do with type errors. And there is no compiler on earth that could have found this type of logic error.

My problem with most discussions about static typing isn't in the utility of having static typing, it's that focusing on a subset of all bugs that is so small leads people to believe that the problem is larger than it really is. Let me say that again: thinking about static type bugs makes static type bugs seem more important than they really are.

What's worse, is that programmers in static typed languages tend to get the most feedback about these types of bugs, making them seem even more prevalent than the bugs that their compiler can't detect.

It's like the beginning programmer who struggles just to get their syntax right, and they think that the only bugs that exist are stray semicolons, or mismatched quotes.

Sam Tobin-Hochstadt, June 20, 2012 at 7:22 AM
My dissertation on Typed Racket [1] contains somewhat more data, with similar conclusions. I found, like you, that failure to handle error cases is a big source of type mistakes.

[1] http://www.ccs.neu.edu/racket/pubs/dissertation-tobin-hochstadt.pdf

Matt R, June 20, 2012 at 7:25 AM
How did you gauge the quality of the unit test suites of the projects you translated?

Tim Ruffles, June 20, 2012 at 7:47 AM
Thanks for an excellent summary and for attacking such an interesting open question! I'll have to read the source at length.

Tim M, June 20, 2012 at 7:52 AM
I don't want to dismiss your work or your sharing of your results, but I'd wonder if you've actually tested your hypothesis.

Haven't you instead shown that bugs not detected in one way can sometimes be caught in other ways?

The programs you chose were already developed: after the event it's hard to know, but I'd suggest the unit tests have previously caught a large number of bugs during development - how many of these would have been caught by static typing, and how many would have slipped past the type checker?

All code, in practice, has bugs... the more *ways* you try to find them (unit tests, inspection, static analysis), the more you find, so testing any project in "a new way" is likely to expose some bugs....

Anonymous, June 20, 2012 at 8:05 AM
Types and tests are one and the same. Types provide universal quantification: a property is satisfied across a program. A test provides existential quantification: a certain property holds in a certain situation.

Tests are weaker than types, but can be used to test a much broader class of problems.

riffraff, June 20, 2012 at 8:20 AM
this is an interesting paper, but I cannot really understand how you state that full test coverage (by which I assume C4) would not have detected these errors, especially from a cursory look at the first two examples, e.g. the _parse_GSV and get_velocity examples.

Anonymous, June 20, 2012 at 8:47 AM
Bullshit

Foudres, June 20, 2012 at 8:48 AM
I read your post, and your conclusion has, I think, one important point, at least for me:

"The translation of these projects revealed that all of these projects could have been written in a statically typed programming language with only minor code changes."

That's a bold statement. I would assume that one uses dynamic typing to avoid managing all the types and to concentrate instead on getting the job done.

So I expect to see a significantly smaller code size when using dynamic typing than static typing, thus making the code easier to understand, maintain...

That's why I'm very interested in the Lisp family of languages and why I think [Lisp] macros are so important.

Your finding intrigues me. I would hope that the dynamic version would come with at least 2 times fewer lines of code than the statically typed version. Isn't that the case?

And if not, do you have an idea why?

Joel Parker Henderson, June 20, 2012 at 8:53 AM
Superb research, thank you for your excellent hard work. The static type / unit test distinction is going to get very interesting with languages like Dart that have optional type safety.

Julian, June 20, 2012 at 9:56 AM
I started reading the original document, and an important question jumped out.

Have you fixed the sprinklers yet?

jdavid.net, June 20, 2012 at 9:57 AM
You miss another strong point, and that is how programmers get paid. Unfortunately for the industry we don't get paid to make things bug free, we get paid to make things work. It's a constant battle of features vs. bugs. To say the least, shipping code wins.

Not only does this approach skip out on logic bugs, but it misses something larger, it misses design bugs, or UX bugs.

A lot of what code is, is a way to sketch out ideas and see if they work. Once something has passed that very bar then it starts moving up the pipe and collects patterns that encourage consistency.

As team size grows so does the number of programming styles. I see unit testing and static typing as another component of keeping code consistent between programmers.

Static typing also works best when you have deep object graphs, which in the web world pushes you towards patterns that are very bad for web scale applications. Ex. you probably don't want a table for both apples and oranges; in fact you probably don't want a table for either, you probably want a table called entities.

I'd much rather the programming world focus on putting a weight on function cost and showing a developer in real time what their code is costing. In JavaScript, I am always finding people doing some sort of blocking DOM operation on a user event, when they could have just as easily performed that event on a separate pseudo thread. It's stuff like that that eats at your time. Fixing a type error takes like 2 seconds in a debugger with a unit test.

Andriy Tyurnikov, June 20, 2012 at 10:05 AM
Rewriting program from static language into dynamic type language cause type checking causing type-related bugs.
Well - sure.
Keep the type model in mind when writing the code.

Garbage in - garbage out

Andriy Tyurnikov, June 20, 2012 at 10:07 AM
If rewriting from Java to C causes memory management bugs, will that prove the uselessness of garbage collectors?

Anonymous, June 20, 2012 at 10:25 AM
I think that this is a fantastic research paper. Thank you for sharing it with us on Hacker News. I am greatly enjoying the discussion that has ensued around your conclusion.

I'm currently in the process of learning Haskell, and reading your findings has increased my desire to become fluent in this language.

I only have a few, short, comments about your research.

I read a few comments on Hacker News that stated that your research paper has a small sample size. I don't think that is the case. I think that the sample size should be counted as the lines of code and not the number of projects. From that perspective, the sample size is quite significant.

I've also read a few comments that question the importance of the bugs found. I agree with your own comment on this issue - any bug with the potential to crash an application is very important.

The only thing in your paper that could've possibly been expanded on was the nature of static typing. In other words, not all statically typed programming languages are created equal. I strongly doubt that Java has as good of a static typing system as Haskell. You could have made a stronger case in favor of using Haskell or another language with a comparably admirable implementation of static typing.

All in all, I find it interesting that we are still having a discussion about this. Unit testing is definitely not a comparable replacement for a static type system. To me, tribalism is the only possible explanation for why there are any doubts about this at this stage.

I'm a strong proponent of dynamic languages. I've used Ruby, Perl, Python, etc. However, to me it makes sense that static typing would catch bugs that unit testing could possibly miss.

Unit tests are written by human beings, and, like the code they test, they can be incomplete and they can contain bugs. A static typing system helps to avoid a certain class of bug that tends to be insidious and can result in runtime error. Even the best unit tests will not be guaranteed to catch all possible type errors.

Then there is something else to consider as well. Unit tests actually take time to write. Comprehensive unit tests can take a remarkable amount of time to write - almost as much as the code that they test. Static typing is a boon to programmers because it saves the trouble of having to write those unit tests. This is not to say that unit tests are not useful within a statically typed language. They can still be used to test for adherence to a specification and logic flaws. However, not needing to write unit tests to verify types is still an undeniable bonus.

Unknown, June 20, 2012 at 1:33 PM
What I found most interesting is that the projects you picked don't really highlight the design uses of duck typing. The code is fairly trivial and so it's easier to implement in a statically typed language.

I agree with the thought that unit tests are not enough, but I don't think statically typed languages are the answer. I like a blend of unit tests and static analysis (like Pylint) to help find these sorts of bugs.

Wai Yip Tung, June 20, 2012 at 2:08 PM
I have scanned your paper. I think we have some disagreement on what is considered a bug. For example, regarding the PyFontInfo project you say

While the parse method does call parseChildren after the self.data member has been created, if the parseChildren method were to be called directly without calling parse first, Python would raise an AttributeError when self.data was referenced. It is likely that the original developer intended for the parseChildren methods to only be called from the parse method, but neglected to enforce the restriction.

This looks like a completely legitimate program to me. It is used according to the implicit assumption. Unit testing is sufficient to verify that parse is called before parseChildren. I don't believe static type checking will do anything here. With Java you simply get a NullPointerException.

If the Haskell compiler finds anything, it must be that it is doing some flow analysis, not static type checking. In any case it will be a false alarm because parseChildren is indeed invoked as the last step of parse.
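
For illustration, the pattern under discussion can be reproduced with a small, hypothetical class. The names below are invented for this sketch and are not PyFontInfo's code. Calling parse_children before parse raises AttributeError, while the standalone variant takes the data as an explicit argument and so has no ordering dependency.

```python
class Table:
    def parse(self, raw: bytes) -> list:
        # self.data only exists after parse() has run.
        self.data = raw
        return self.parse_children()

    def parse_children(self) -> list:
        # Raises AttributeError if called before parse().
        return list(self.data)


def parse_children_of(data: bytes) -> list:
    # Standalone variant: the required data is an explicit argument,
    # so there is no way to call it "too early".
    return list(data)
```

Whether the first form counts as a bug or as a legitimate implicit contract is exactly the point of disagreement in this comment; the sketch only shows the behaviour being debated.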

Tartley, June 20, 2012 at 5:04 PM
I don't yet understand the assertion that half-decent testing wouldn't have caught those bugs.

Most projects do a poor job of testing, and these projects are reflective of that. This is still an interesting result, but does it really apply to "testing done right"? (e.g. proper TDD with unit vs system testing)

Anonymous, June 20, 2012 at 6:16 PM
This is a nice start and I think it does a fair job of addressing its goals. It raises the question: is the volume of type errors "significant" in terms of the code volume? Other than "<2KLoC" I don't think you detail the size of the codebases. It seems to me that several handfuls of errors in such small apps *does* represent a significant quality shortfall, but perhaps others would disagree. It's also a nice jumping-off point for the dual experiment of translating a type-checked codebase into a dynamic language and seeing if a typical practitioner would write sufficient unit-tests to capture the lost semantics.

Unknown, June 20, 2012 at 7:42 PM
An interesting read.

I think your claim that it's relatively easy to rewrite programs from dynamically typed languages to statically typed is true. The biggest challenges are finding a statically typed language that is flexible and expressive enough; and all the extra overhead managing these explicit types. Your choice of Python makes sense, but Haskell doesn't represent a mainstream statically typed language. Mainstream statically typed languages fall well short of Haskell's expressiveness.

I've often said the only language I'd consider a replacement for Python, is Haskell! I think others would fall in this camp too. However Haskell comes with significant challenges of its own. Had this study been done using C++, C#, or Java, the story may have been very different (and much longer :P ).

Another commenter has mentioned the other big issue here: That typing bugs aren't necessarily the biggest problem. Sure you may have discovered some typing issues, but runtime and logical errors may still abound. From my own experience with Python, probably over half of errors are typing related, but they only occur during development. Logical and runtime errors definitely dominate in production code.

Tiago Albineli Motta, June 20, 2012 at 8:31 PM
I think the big value of unit testing your code is the better design that the tests guide you towards. To keep your code free of bugs, I think integration and end-to-end tests are more effective. Even though they are slow and harder to debug, I cannot live without them.

Ricky Clarkson, June 20, 2012 at 10:12 PM
To the guys saying that type errors aren't the dominant problem in production code, that all depends on how you use types. A simple example, sqrt(-1) is a runtime error in most languages including typed languages, but with a type like PositiveNumber that could be a type error.

That's overly simplistic, but if you delve further you'll find that everything short of requirement misinterpretations can be found statically given sophisticated enough type systems.
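
A rough Python sketch of that idea follows. The NonNegative class and safe_sqrt function are hypothetical names (non-negative rather than strictly positive, since the square root of zero is fine). Python only enforces the check when the value is constructed, not at compile time, but with type hints a checker such as mypy would be expected to reject a call that passes a plain float.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NonNegative:
    """A float that is guaranteed to be >= 0 once constructed."""
    value: float

    def __post_init__(self) -> None:
        if self.value < 0:
            raise ValueError(f"expected a non-negative number, got {self.value}")


def safe_sqrt(x: NonNegative) -> float:
    # No domain error is possible here: the invariant lives in the type.
    return math.sqrt(x.value)
```

The validation happens once, at the boundary where the value enters the program, instead of inside every function that consumes it.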

Anonymous, June 21, 2012 at 12:35 AM
I find it interesting how eager people are to point out that bugs caught by the type system aren't the *biggest* errors.

Is that really how you write your programs? "I'm not going to even consider this bug, because I already fixed another, even bigger one"?

The results here are extremely interesting.

In the end, it's always down to the individual team to decide if they need static typing: do they need to catch these last couple of errors that snuck past the unit tests?

The really interesting part here is that those bugs *do exist*. That if you choose to rely only on unit testing, you should be aware that you will likely fail to discover a handful of type errors. That doesn't mean no project can be successful if it doesn't rely on static type analysis. It is just something to keep in mind when you decide that you don't need static type analysis.

Anonymous, June 21, 2012 at 1:26 AM
After coding in C/C++ for almost 20 years, and Perl for a year somewhere in between, I switched to PHP last year and coded in it extensively. Due to the nature of the project, I had to check into SVN every time I made a program change so I could debug the code (slightly tedious). I was diligent about commenting the source of each error. About 70% of the mistakes were variable name typos, case sensitivity typos, and SQL query errors that could not be caught by any language as they were embedded in strings. The time taken in detecting the errors is non-trivial, although the fixes themselves ARE trivial. I much preferred a static type checking system, mainly for basic fat-finger errors that a compiler detects and immediately provides feedback for. Not saying that all dynamically typed languages are inferior or anything of the sort, just that a hybrid approach (linting, or using Haxe or a similar statically type checked language to generate other languages) isn't a bad consideration.

Anonymous, June 21, 2012 at 9:45 AM
Evan, thanks for this effort. To follow up on James Iry's reply briefly, I recommend you also have a look at http://idris-lang.org for a dependently-typed language that attempts to do totality checking, but is self-consciously a programming language, rather than a theorem prover such as Coq. Idris is also interesting to a Haskell programmer by being syntactically inspired by Haskell, so it may be easier to pick up than some of the alternatives.

Anonymous, June 21, 2012 at 3:45 PM
So you tested to see if statically typed languages or dynamic languages are better at checking types and found out that the ones that enforce types are better at it.

In other news, it has recently been discovered that 1 = 1.

Have you considered using your time for anything useful?

Chris, June 21, 2012 at 3:46 PM
Evan, thanks for writing this. I like the idea of your research, but I have a few concerns.

First, I wonder how great the unit testing culture of these projects is. For example in the first you mention a malformed input error that is caught by static typing, I would have assumed that you would have test cases to check for conditions like that (same goes for a statically typed language). Further, are the bugs you found customer facing, or do they have no impact on the project? Not that this is good practice, but it does minimize the impact. Also -- What was the ratio of bugs you found to the project size?

In my experience, one of the problems is that popular statically typed languages (ie: Java) aren't that flexible. And not exclusively because of their type system. As opposed to say, JavaScript, which is very expressive and happens to have weak, dynamic typing. Maybe the best answer is in more flexible statically typed languages.

But I concede that with a dynamically typed language you may need more well-defined code standards and better test suite, but I don't think that's a reason to ditch the language.

Chris, June 22, 2012 at 12:15 PM
Interesting work Evan, just wanted to mention a little typo in your paper, Chapter 3 Fig 3.:

class car():
    # The __init__ method
    def __init__(self, color):
        self.color = color

    # The color method
    def color(self):
        return self.color

When you create the object the method "color" will be shadowed by the new variable "color" of type str. While this is legal Python it's generally not the desired behavior and the instance variable would be usually called something like _color.
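
The shadowing can be demonstrated directly. In this sketch, Car reproduces the behaviour described above, and FixedCar (a hypothetical name) stores the value under _color as suggested.

```python
class Car:
    def __init__(self, color):
        # The instance attribute shadows the method of the same name.
        self.color = color

    def color(self):
        return self.color


class FixedCar:
    def __init__(self, color):
        # A distinct attribute name leaves the method reachable.
        self._color = color

    def color(self):
        return self._color
```

With Car, attribute access yields the string, so calling it as a method raises TypeError; FixedCar("red").color() returns the stored value.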

Anonymous, June 24, 2012 at 3:47 AM
Evan, thanks for the good stimulation of thought. I wonder if there's a problem with your methodology.

Your hypothesis is that "Unit testing isn't good enough, therefore static typing is necessary." And that's a summary of your blog post title.

I don't think you proved that by a long shot. I think what you proved is that unit testing (as defined by 100% code coverage) is inadequate. I think you also proved that static type checking is useful for finding bugs. But you haven't proven the linkage that static typing is the only way to find intractable bugs.

In other words, you proved "A" to be a true statement and you proved "B" to be true as well. But you didn't prove "Because A we must use B". "Because of the failure of unit tests we must use static typing" is still an unproven statement.

You can prove that by providing an example of a common intractable bug that is solved efficiently by using static typing, and not as efficiently with extra testing, static code analysis, peer review, etc.

In other words, you have to prove that static typing is really the only way to solve certain kinds of intractable problems. Because if I can solve the problem through extra testing, then I don't have to use static typing, and your hypothesis is false.

I'm not saying you're wrong. Just that you haven't proven your claim.

Unknown, June 28, 2012 at 1:50 AM
Greetings!

I've translated ( :) ) your blog notes into Russian: https://docs.google.com/document/d/1eMc5CbCy0ihCEbbIFUoM6_2eP55vuFyZdr1yAo-lzn8/edit

Anonymous, June 28, 2012 at 2:49 PM
Interesting read. And it's nice to see *any* research in this area. Like you, I've been disappointed by the lack of same.

However, I think you misstated the claims on the dynamic side. The claim isn't so much that unit testing will replace static typing, but that the time gained by not having to provide type information and otherwise manage types more than makes up for the time lost by not detecting errors earlier.

This argument is typically aimed at the Java/C++ language family. As such, Haskell has two advantages over those: less work managing types, and stronger type checking.

Alex R, June 29, 2012 at 6:56 AM
Very late to the party, but I feel like I have something to contribute anyway.

I'm on a project using Haskell to implement yet another programming language, and coming up with the theory (i.e. semantics, type rules, etc.) behind it more or less concurrently (the implementation lags by about a month sometimes).

Over the past year, Haskell's typechecker has found at least 5 (probably closer to 10, but I don't want to overestimate) _logic_ bugs in the theory for us.

Unknown, July 8, 2012 at 10:13 AM
First, I'd like to congratulate you on a well-done study. I hope the research community will attempt to replicate it, to measure the effect of variables such as source and destination language, size of a project, and the application of metaprogramming techniques (which are more commonly used in some languages rather than others).

I blogged a few more observations here: http://blog.rafaelferreira.net/2012/07/types-and-bugs.html.

Ben Hutchison, July 8, 2012 at 7:01 PM
If you made your program translations available on GitHub or similar, it would facilitate easy browsing & comparison of the source and translated programs for the interested passer-by. As it is, they are bundled up in a tar.gz file download, something of a barrier for the casual reader.

You could then hyperlink to the source where some of the errors were found, rather than simply assert you found them, which would make the post more persuasive.

Perhaps that's why so few of the comments refer to your actual translated code.

Evan, July 9, 2012 at 9:52 PM
Unknown. All of the bugs are described in the paper. I think for the casual reader the paper would provide more context for discussing the individual bugs. I published the code for the benefit of any (non-casual readers) who want to verify the translation and ensure that the translation is correct.

Julian, August 3, 2012 at 7:07 AM
Yesterday, I gave a talk to the Sydney Python User's Group (SYPY), about this paper. The session was not recorded, but my slides (including speaker's notes) are available on-line:

http://somethinkodd.com/sypy/farrer.pdf

Big thanks to Evan Farrer for giving me something interesting (and a little bit different) to talk about.

Stephen Paul Weber, August 14, 2012 at 6:54 PM
A lot of these comments are based on the fact that this post (and possibly the paper) only talk about static typing.

a) Haskell's typing is both static and strong. Unlike C++ types, which are mostly weak
b) Haskell's typing is *very* static. Unlike Java, which is largely dynamic as well (supporting both up and down casts)
c) A lot of people seem unaware that static typing *does not* mean type annotation. Type inference needs to be better advocated in general

Dave, December 20, 2012 at 8:30 AM
Evan,

Your experiment is very interesting and informative. Thanks for sharing it publicly.

It seems to me your results illustrate another aspect of the way code tends to be written in practice, aside from the relative merits of static typing and unit testing. You touch on it in one of your comments on the blog:

"My interest was on whether unit testing obviated static typing *in practice*. Obviously in theory unit testing can obviate static typing (you could encode a static type checker as a unit test), but if no one does this then in practice it really doesn't matter."

My experience supports your finding that unit test suites aren't necessarily well-crafted. I do not conclude from this that unit testing as such is a flawed technique, however. In my work I like to make full use of whatever capabilities the tools offer. When the language has static typing, I want to use it to advantage; the stronger the typing system, the more effort it saves me toward the goal of delivering code people can understand, modify, and feel confident in. When the language doesn't have strong typing, or if it is dynamically typed, then the burden falls to me to ensure the unit test suite covers everything necessary. That is just part of the job. Different tools have different capabilities; when a tool lacks a capability, we have to fill in the blanks. Some other commenters on your post have said similar things.

I don't see this as an either-or question of strong typing versus unit testing. A strongly typed language leaves us free to devote proportionally more time to crafting meaningful and useful test cases to cover whatever the type system doesn't cover.

Unfortunately, many people who get paid to write software just pump out code any way they can. You say this "really doesn't matter," and for the purposes of your study that's quite right.

In the larger scheme of things, I think it does matter. I think it's a deep problem in our line of work. Maybe the quality of unit test suites could be the focus of a future study based on real code, following your example of how to set up such a study.

Anyway, great work. I'll be pointing others to it.

Paddy3118, February 16, 2013 at 10:54 AM
"No dynamic constructs were used that could not be directly translated into Haskell."

So your proof is that translating Python into a statically typed loose analogue fails static type checks.

Unknown, September 2, 2013 at 12:32 PM
I have been writing code since I was 12 years old. I've spent a lot of time working in both statically and dynamically typed languages. Which is better? In my opinion, which is better depends on the problem domain that you're attempting to solve.

In my experience, most problem domains fall into one of two general categories:

1) The application must behave as expected under known conditions.
2) The application must behave as expected under unknown conditions.

Dynamically typed languages will deliver a higher ROI in case 1, while statically typed languages will deliver a higher ROI in case 2.