Linked by Thom Holwerda on Mon 16th Apr 2012 02:08 UTC
In the News "Modern science relies upon researchers sharing their work so that their peers can check and verify success or failure. But most scientists still don't share one crucial piece of information - the source codes of the computer programs driving much of today's scientific progress." Pretty crazy this isn't the norm yet.
Thread beginning with comment 514305
Yes and no
by kwan_e on Mon 16th Apr 2012 05:43 UTC
kwan_e
Member since:
2007-02-18

There is a danger that by releasing the source code, other scientists would use it, bugs and all, causing errors to propagate undetectably through derived research.

Ostensibly, scientists can check the source code to find bugs, but it's never going to be complete.

There is something to be said for scientists having to recreate the source code in a clean-room environment, because errors in either the code or the hypothesis are easier to expose that way.

Reply Score: 1

RE: Yes and no
by cyrilleberger on Mon 16th Apr 2012 06:02 in reply to "Yes and no"
cyrilleberger Member since:
2006-02-01

There is a danger that by releasing the source code, other scientists would use it, bugs and all, causing errors to propagate undetectably through derived research.


Instead of that, they should write their own implementation with its own set of bugs... right? Instead of fixing the bugs in the original implementation and bringing improvements...

Reply Parent Score: 10

RE[2]: Yes and no
by kwan_e on Mon 16th Apr 2012 06:27 in reply to "RE: Yes and no"
kwan_e Member since:
2007-02-18

Instead of that, they should write their own implementation with its own set of bugs... right?


Yes. It's another way to check that the theories in the original code are correctly implemented.

Say you want to write some code to verify the Hockey Stick Graph is correct. If you use the original source code, chances are, you're not going to spot all the bugs in the implementation and you'll likely end up with the same graph, which does not fulfil the goal of independent verification.

We're talking scientific formulae, not a Linux desktop environment here. The most important thing is the data.
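The independent-verification idea can be made concrete with a toy sketch (the data and function names here are hypothetical, not taken from any real climate code): two separately written estimators of a linear trend are run against the same shared data set and their outputs are cross-checked. Agreement lends confidence; disagreement flags a bug or a methodological difference in one of the implementations.

```python
# Toy sketch of independent verification: two implementations of the
# same least-squares slope, written from different derivations of the
# formula, compared on the same shared data.

def trend_naive(years, temps):
    """Least-squares slope from the textbook centred-sums formula."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den

def trend_incremental(years, temps):
    """The same slope via running sums -- an independent re-derivation."""
    n = sx = sy = sxx = sxy = 0
    for x, y in zip(years, temps):
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

if __name__ == "__main__":
    years = list(range(1900, 2000))
    temps = [0.01 * (y - 1900) for y in years]  # synthetic warming signal
    a = trend_naive(years, temps)
    b = trend_incremental(years, temps)
    # Independent implementations should agree to within rounding error;
    # a larger gap would point at a bug in one of them.
    assert abs(a - b) < 1e-9
    print(f"slope estimates: {a:.4f} vs {b:.4f}")
```

A shared bug in copied code would never trip this check, which is the whole point of writing the second implementation from scratch rather than reusing the first.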

Instead of fixing the bugs in the original implementation and bringing improvements...


The only useful improvements for scientific research are corrections to formulae and theories. That can be done outside of code, and probably better served by being outside of code.

Do you seriously think it is a good idea for logic bugs to propagate through hundreds of research projects derived from the same code?

Reply Parent Score: 3

RE: Yes and no
by looncraz on Mon 16th Apr 2012 07:43 in reply to "Yes and no"
looncraz Member since:
2005-07-24

During peer review, the code would be checked to verify any unexpected results.

With open code, any meaningful problem would be found and solved, and old studies could be easily re-run and verified or discarded.

With closed code, the bugs are never found, and the authors have no reason to repair it if they get what they think are sound results.

--The loon

Reply Parent Score: 10

RE[2]: Yes and no
by kwan_e on Mon 16th Apr 2012 08:44 in reply to "RE: Yes and no"
kwan_e Member since:
2007-02-18

With closed code, the bugs are never found, and the authors have no reason to repair it if they get what they think are sound results.

--The loon


It doesn't matter, because it's the published results that matter, and if the results are wrong, someone can verify them independently once they're published. If you use the original source code to verify them, the verification is no longer independent.

In a research organization where hundreds of people are pulling in open source code, you cannot guarantee that nobody pulled code from the original base, leading to a compromised verification of the data.

Reply Parent Score: 1

RE[2]: Yes and no
by thomasg76 on Mon 16th Apr 2012 15:00 in reply to "RE: Yes and no"
thomasg76 Member since:
2012-04-16

Well, it can also work quite easily in the opposite direction: the source code is taken, with little or no review, and new data are run through it, confirming the original result.

I am sure that this happens. Not too long ago I ran into this issue while looking at studies done in the field of psychology. They run most of their studies through SPSS to do a factor analysis, to get something out of the data. With everybody using the same software and the same way of conducting the study, of course they confirm each other's results. Most of the conclusions drawn are simply wrong, because less than half of the data actually supports the result.
Now, since most psychologists aren't statisticians, they just take the work of others as a template for their own, and so a wrong method or piece of software propagates.

The same is going to happen with opening the source code for all research. If the code is critical to the research, then it should be implemented independently to confirm the results, based on the same data. If the code is auxiliary to the problem, then who cares anyway?

Also, I know of professors who stopped publishing altogether because of that requirement. Now what do you gain?
The good thing about all the published work is that we KNOW certain things work/exist, so they can be re-discovered and independently verified.

Reply Parent Score: 1

RE: Yes and no
by renox on Mon 16th Apr 2012 08:32 in reply to "Yes and no"
renox Member since:
2005-07-06

There is something to be said about scientists having to recreate source code in a clean room environment because errors in either code or hypothesis is easier to expose.


I'm not so sure: there was a time when a popular idea for producing safe code (for avionics and the like) was to have several independent teams code the same software, so that they would have different bugs.
A study then discovered that independent teams produced quite a few identical bugs, so the approach became much less popular!

Reply Parent Score: 4

RE[2]: Yes and no
by kwan_e on Mon 16th Apr 2012 08:59 in reply to "RE: Yes and no"
kwan_e Member since:
2007-02-18

A study then discovered that independent teams produced quite a few identical bugs, so the approach became much less popular!


Yes, but with scientific research spread all over the world, we can have more independent teams than any single organization could afford.

And again, I refer people to the Climategate non-scandal. What if it had turned out that everyone who verified the data was using the same code, or at least derived versions of it? Think about the fallout from that. Even if the bugs were mostly identical, do we want to risk being wrong?

Reply Parent Score: 1