Linked by Thom Holwerda on Mon 16th Apr 2012 02:08 UTC
Thread beginning with comment 514319
"Do you seriously think it is a good idea for logic bugs to propagate through hundreds of research projects derived from the same code?
The alternative is to have hundreds of research projects derived from hundreds of different codebases, each with its own bag of logic bugs. All code is inherently buggy, and scientists, lacking basic training in software engineering (e.g. code reuse, unit testing, etc.), tend to write buggier code."
Yes, that's the point.
The more independent implementations of the same formula you have, the better your chance of finding problems with the actual formula.
You do understand that there is a development methodology for designing and writing robust code by implementing the same thing independently, often in different languages (N-version programming), don't you?
In fact, some CPUs do something similar, notably lockstep designs in safety-critical systems: a calculation takes place twice and the results are compared at the end to verify that the calculation was correct. What I'm suggesting is analogous: having multiple cleanroom implementations of a formula.
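To make the idea concrete, here is a minimal sketch in Python of cross-checking independent implementations of one formula. Everything in it is an illustrative assumption, not something from the discussion: the formula (sample variance), the function names, and the `cross_check` helper are all hypothetical. The third implementation deliberately divides by n instead of n - 1, standing in for a different reading of the same formula.

```python
import math
import random

def variance_two_pass(xs):
    """Sample variance: compute the mean first, then the squared deviations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def variance_welford(xs):
    """Sample variance using Welford's one-pass update."""
    mean, m2 = 0.0, 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

def variance_divides_by_n(xs):
    """Deliberately divergent reading of the formula: divides by n, not n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def cross_check(implementations, inputs, rel_tol=1e-9):
    """Return (input, results) pairs on which the implementations disagree."""
    disagreements = []
    for xs in inputs:
        results = {f.__name__: f(xs) for f in implementations}
        values = list(results.values())
        if not all(math.isclose(values[0], v, rel_tol=rel_tol, abs_tol=1e-12)
                   for v in values):
            disagreements.append((xs, results))
    return disagreements

# Random test inputs with a fixed seed so the run is repeatable.
rng = random.Random(0)
samples = [[rng.uniform(-100, 100) for _ in range(rng.randint(2, 20))]
           for _ in range(50)]

# The two faithful implementations should agree within tolerance.
agree = cross_check([variance_two_pass, variance_welford], samples)
# Adding the divergent one should surface a disagreement on every input.
differ = cross_check(
    [variance_two_pass, variance_welford, variance_divides_by_n], samples)
print(len(agree), len(differ))
```

Note that a disagreement only tells you that at least one implementation (or the formula's specification) is off. Someone still has to go back to the formula to decide which one.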




The alternative is to have hundreds of research projects derived from hundreds of different codebases, each with its own bag of logic bugs. All code is inherently buggy, and scientists, lacking basic training in software engineering (e.g. code reuse, unit testing, etc.), tend to write buggier code.
Last year, I had to do some data deduplication work using string metrics such as the Jaro–Winkler distance. I found three open source libraries, one in Java and two in Python. All three implement the formula differently, producing significantly different scores. The good thing is that, because the code is open, I could submit patches to the maintainers. Some bugs got fixed, some did not (but the bug reports are publicly available nevertheless).
One of these libraries, Febrl (Freely Extensible Biomedical Record Linkage), was released by the Australian National University as part of its research. I owe a great deal to the authors, in particular for their willingness to put the code out for scrutiny.
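For readers who have not met the metric, the sketch below shows one common formulation of Jaro–Winkler similarity in Python and how a single implementation choice changes the score. It is not the code of Febrl or of any other library mentioned above. The parameter names (`p`, `max_prefix`, `boost_threshold`) are made up for illustration, and the boost threshold is only one plausible place where implementations could diverge, not necessarily where those three did.

```python
def jaro(s1, s2):
    """Jaro similarity, using the usual window of max(len) // 2 - 1."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)   # which characters of s2 are already matched
    matched1 = []              # matched characters, in s1 order
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    matched2 = [s2[j] for j in range(len(s2)) if used[j]]  # in s2 order
    # Transpositions: half the number of matched characters out of order.
    t = sum(a != b for a, b in zip(matched1, matched2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4, boost_threshold=0.0):
    """Jaro similarity plus a bonus for a shared prefix.

    boost_threshold is the assumed point of divergence: some formulations
    apply the prefix bonus only when the Jaro score reaches a threshold.
    """
    j = jaro(s1, s2)
    if j < boost_threshold:
        return j
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))   # 0.9611
print(round(jaro_winkler("DIXON", "DICKSONX"), 4))  # 0.8133

# Same inputs, two readings of the formula, two different scores.
always_boost = jaro_winkler("abcdef", "abzzzz")
thresholded = jaro_winkler("abcdef", "abzzzz", boost_threshold=0.7)
print(always_boost, thresholded)
```

In a deduplication job that compares scores against a fixed cutoff, a difference like the last one is enough to flip a pair between "duplicate" and "distinct".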