Linked by Thom Holwerda on Mon 16th Apr 2012 02:08 UTC
In the News "Modern science relies upon researchers sharing their work so that their peers can check and verify success or failure. But most scientists still don't share one crucial piece of information - the source codes of the computer programs driving much of today's scientific progress." Pretty crazy this isn't the norm yet.
RE[3]: Yes and no
by j-kidd on Mon 16th Apr 2012 08:19 UTC in reply to "RE[2]: Yes and no"

"Do you seriously think it is a good idea for logic bugs to propagate through hundreds of research projects derived from the same code?"


The alternative is to have hundreds of research projects derived from hundreds of different codebases, each with its own bag of logic bugs. All code is inherently buggy, and scientists, who generally lack basic training in software engineering practices (e.g. code reuse, unit testing), tend to write buggier code than trained developers do.
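To be concrete about what I mean by basic practices, here is a minimal sketch. The normalized_similarity function below is a made-up stand-in, not taken from any real library; the point is only that a handful of assertions catches the obvious edge cases before a metric ever touches real data:

import unittest


def normalized_similarity(a, b):
    """Toy similarity metric: fraction of aligned positions that agree.

    This is only a stand-in so the tests below have something to run against.
    """
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


class TestNormalizedSimilarity(unittest.TestCase):
    def test_identical_strings_score_one(self):
        self.assertEqual(normalized_similarity("MARTHA", "MARTHA"), 1.0)

    def test_empty_against_nonempty_scores_zero(self):
        self.assertEqual(normalized_similarity("", "MARTHA"), 0.0)

    def test_symmetry(self):
        self.assertEqual(
            normalized_similarity("DWAYNE", "DUANE"),
            normalized_similarity("DUANE", "DWAYNE"),
        )

    def test_score_stays_in_unit_interval(self):
        score = normalized_similarity("DIXON", "DICKSONX")
        self.assertGreaterEqual(score, 0.0)
        self.assertLessEqual(score, 1.0)


if __name__ == "__main__":
    unittest.main()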

Last year I had to do some data deduplication work using string metrics such as the Jaro–Winkler distance. I found three open source libraries, one in Java and two in Python, and all three implemented the formula differently, producing significantly different scores. The good thing is that, because the code is open, I could submit patches to the maintainers. Some bugs got fixed, some did not (but the bug reports are publicly available nevertheless).
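To show where the ambiguity creeps in, here is a rough sketch of the Jaro–Winkler formula as it is usually stated in the literature. This is my own code, not taken from any of the three libraries; the prefix bonus and the boost threshold are the kind of details where implementations commonly diverge:

def jaro(s1, s2):
    """Jaro similarity, following the usual textbook statement."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0

    # Characters match if they are equal and no farther apart than
    # half the length of the longer string, minus one.
    window = max(len(s1), len(s2)) // 2 - 1
    s1_flags = [False] * len(s1)
    s2_flags = [False] * len(s2)

    matches = 0
    for i, c in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not s2_flags[j] and s2[j] == c:
                s1_flags[i] = s2_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order.
    transpositions = 0
    j = 0
    for i, flagged in enumerate(s1_flags):
        if flagged:
            while not s2_flags[j]:
                j += 1
            if s1[i] != s2[j]:
                transpositions += 1
            j += 1
    transpositions //= 2

    return (matches / len(s1)
            + matches / len(s2)
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1, boost_threshold=0.7):
    """Jaro-Winkler: boost the score for a common prefix of up to 4 chars.

    The prefix scale and the boost threshold (some implementations drop
    the threshold entirely) are typical points of divergence.
    """
    sim = jaro(s1, s2)
    if sim <= boost_threshold:
        return sim
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)


print(jaro_winkler("MARTHA", "MARHTA"))  # roughly 0.961
print(jaro_winkler("DWAYNE", "DUANE"))   # roughly 0.840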

One of these libraries, Febrl (Freely Extensible Biomedical Record Linkage), was released by the Australian National University as part of a research project. I owe a great deal to the authors, in particular for their willingness to put their code out for scrutiny.

Score: 8

RE[4]: Yes and no
by kwan_e on Mon 16th Apr 2012 08:53 UTC in reply to "RE[3]: Yes and no"

"Do you seriously think it is a good idea for logic bugs to propagate through hundreds of research projects derived from the same code?


The alternative is to have hundreds of research projects derived from hundreds of different codebases, each with its own bag of logic bugs. All code is inherently buggy, and scientists, who generally lack basic training in software engineering practices (e.g. code reuse, unit testing), tend to write buggier code than trained developers do.
"

Yes, that's the point.

If you have multiple independent implementations of the same formula, you have a better chance of finding problems with the formula itself.

You do understand that there is an established development methodology (N-version programming) for building robust software by writing independent implementations, sometimes in different languages, and comparing their results, don't you?

In fact, many CPUs built for safety-critical work do something similar: the same calculation is performed twice, in lockstep, and the results are compared at the end to verify that it was carried out correctly. What I'm suggesting is analogous: multiple cleanroom implementations of the same formula, checked against one another.
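A rough sketch of the cross-checking idea, using sample variance as a stand-in formula only because it is easy to implement twice in genuinely different ways: run both implementations over the same inputs and treat any disagreement as a signal that the code, or the formula, needs a second look.

import random


def variance_two_pass(xs):
    # Implementation A: the textbook two-pass sample variance.
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def variance_welford(xs):
    # Implementation B: Welford's streaming update, written independently.
    mean = 0.0
    m2 = 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


def cross_check(trials=1000, tolerance=1e-9):
    """Run both implementations on random inputs and count disagreements."""
    disagreements = 0
    for _ in range(trials):
        xs = [random.gauss(0.0, 1.0) for _ in range(random.randint(2, 50))]
        a = variance_two_pass(xs)
        b = variance_welford(xs)
        if abs(a - b) > tolerance * max(1.0, abs(a)):
            disagreements += 1
    return disagreements


if __name__ == "__main__":
    print("disagreements:", cross_check())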

Score: 1