Linked by Thom Holwerda on Mon 16th Apr 2012 02:08 UTC
In the News "Modern science relies upon researchers sharing their work so that their peers can check and verify success or failure. But most scientists still don't share one crucial piece of information - the source codes of the computer programs driving much of today's scientific progress." Pretty crazy this isn't the norm yet.
RE[3]: Yes and no
by j-kidd on Mon 16th Apr 2012 08:19 UTC in reply to "RE[2]: Yes and no"

Do you seriously think it is a good idea for logic bugs to propagate through hundreds of research projects derived from the same code?


The alternative is to have hundreds of research projects derived from hundreds of different codebases, each with its own bag of logic bugs. All code is inherently buggy, and scientists, lacking basic training in software engineering (e.g. code reuse, unit testing), tend to write buggier code.

Last year, I had to do some data deduplication work using string metrics such as Jaro–Winkler distance. I found three open source libraries, one in Java and two in Python. All three implement the formula differently, resulting in significantly different scores. The good thing is that, because the code is open, I could submit patches to the maintainers. Some bugs got fixed, some did not (but the bug reports are publicly available nevertheless).
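For reference, here is a minimal Python sketch of one common textbook formulation of Jaro–Winkler similarity. It is not the code from any of the three libraries mentioned above, and the comment does not say where they diverged. The function names, the prefix scale of 0.1, and the prefix cap of 4 are assumptions chosen for illustration. Details such as how the match window is rounded, whether the prefix bonus is only applied above a similarity threshold, and how the prefix is capped are the kind of choices that can plausibly differ between implementations.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two characters count as a match only if they are equal and
    # no further apart than this window (assumed rounding: floor).
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order; each
    # position where they disagree is half of a transposition.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)
```

With these assumptions, `jaro_winkler("MARTHA", "MARHTA")` comes out at roughly 0.961 and `jaro_winkler("DWAYNE", "DUANE")` at roughly 0.84. An implementation that rounds the window differently or skips the prefix cap would report other values for some inputs, which is enough to change which records a deduplication threshold treats as duplicates.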

One of these libraries, Febrl (Freely Extensible Biomedical Record Linkage), was released by the Australian National University as part of its research. I owe a great deal to the authors, in particular for their willingness to put the code out for scrutiny.
