Linked by Thom Holwerda on Mon 16th Apr 2012 02:08 UTC
In the News "Modern science relies upon researchers sharing their work so that their peers can check and verify success or failure. But most scientists still don't share one crucial piece of information - the source codes of the computer programs driving much of today's scientific progress." Pretty crazy this isn't the norm yet.
Thread beginning with comment 514307
RE: Yes and no
by cyrilleberger on Mon 16th Apr 2012 06:02 UTC in reply to "Yes and no"
Member since: 2006-02-01

There is a danger that, by releasing the source code, other scientists would use it with all its bugs, causing errors to propagate undetected through derived research.


Instead, they should each write their own implementation with its own set of bugs... right? Rather than fixing the bugs in the original implementation and contributing improvements...

Reply Parent Score: 10

RE[2]: Yes and no
by kwan_e on Mon 16th Apr 2012 06:27 in reply to "RE: Yes and no"
Member since: 2007-02-18

Instead of that, they should write their own implementation with its own set of bugs... right?


Yes. It's another way to check that the theories in the original code are correctly implemented.

Say you want to write some code to verify that the Hockey Stick Graph is correct. If you reuse the original source code, chances are you won't spot all the bugs in its implementation, and you'll likely end up with the same graph, which does not fulfil the goal of independent verification.

We're talking scientific formulae, not a Linux desktop environment here. The most important thing is the data.

Instead of fixing the bugs in the original implementation and bringing improvements...


The only useful improvements for scientific research are corrections to formulae and theories. Those can be expressed outside of code, and are probably better served by being outside of code.

Do you seriously think it is a good idea for logic bugs to propagate through hundreds of research projects derived from the same code?

Reply Parent Score: 3

RE[3]: Yes and no
by j-kidd on Mon 16th Apr 2012 08:19 in reply to "RE[2]: Yes and no"
Member since: 2005-07-06

Do you seriously think it is a good idea for logic bugs to propagate through hundreds of research projects derived from the same code?


The alternative is to have hundreds of research projects derived from hundreds of different codebases, each with its own bag of logic bugs. All code is inherently buggy, and scientists, for lack of basic training in software engineering (e.g. code reuse, unit testing), tend to write buggier code than most.

Last year, I had to do some data-deduplication work using string metrics such as the Jaro–Winkler distance. I found three open source libraries, one in Java and two in Python, and all three implemented the formula differently, resulting in significantly different metrics. The good thing is that, because the code is open, I could submit patches to the maintainers. Some got fixed, some did not (but the bug reports are publicly available nevertheless).
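That kind of divergence is easy to produce: the Jaro–Winkler formula leaves several details under-specified (the size of the matching window, how transpositions are counted, the cap on the prefix bonus), and implementations decide them differently. A minimal Python sketch of the textbook formula, purely illustrative and not the code of any of the libraries mentioned:

```python
def jaro(s1, s2):
    """Jaro similarity: mean of match ratios and transposition ratio."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == c:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    t, k = 0, 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by a common-prefix bonus (capped at 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(jaro_winkler("MARTHA", "MARHTA"))  # ~0.961, the classic textbook example
```

Change the matching window to `max(len1, len2) // 2`, or cap the prefix bonus at a different length, and the scores shift, which is exactly the kind of disagreement that showed up across the three libraries.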

One of these libraries, Febrl (Freely Extensible Biomedical Record Linkage), was released by the Australian National University as part of their research. I owe a great deal to the authors, in particular for their willingness to put their code out for scrutiny.

Reply Parent Score: 8