That only works if you have relatively small amounts of data that are not modified frequently. If you have a large operation with frequently changing data, the potential for failure increases significantly.
I am not a fan of cheap storage because what most people think they are getting in savings they end up paying for in terms of management, performance and reliability. I don't care for SATA storage because I am not convinced that it will work reliably in the long run as opposed to SAS or Fibre Channel.
While the possibility of a massive failure for most cases is slight, it depends entirely on how the solution is deployed and what mechanisms are in place to protect the data. Having redundant storage to prevent data loss can get real expensive.
Edited 2009-09-18 15:33 UTC
"It is not the end of the world. The mainframe solved this one decades ago."
You seem to have missed the point. If a drive fails in a raid, it takes hours to rebuild the raid. The larger the drives, the longer the rebuild takes. With 1TB drives it takes maybe 10 hours. 2TB may take 24 hours, 4TB maybe 2 days. 8TB drives may take a week? Because drives don't get much faster, only larger.
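As a rough back-of-the-envelope sketch of that argument: if the sustained rebuild rate stays flat while capacity grows, rebuild time grows at least linearly with drive size. The 30 MB/s rate below is an assumption picked so that 1TB lands near the "maybe 10 hours" guess, not a measured figure, and a busy array would rebuild slower than this.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one whole drive at a constant sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / rebuild_mb_per_s / 3600


# Assumed sustained rebuild rate (hypothetical; real arrays vary a lot).
ASSUMED_RATE_MB_S = 30.0

for tb in (1, 2, 4, 8):
    print(f"{tb} TB drive: about {rebuild_hours(tb, ASSUMED_RATE_MB_S):.1f} hours")
```

The point is only the trend: doubling the drive size doubles the rebuild window unless the drives also get faster.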
At some point, it will take a long time to rebuild the raid. A frighteningly long time. Say it takes one week. Rebuilding a raid stresses the other discs heavily, to the point that it is common for another disc to break! This happens more often than you think. If another disc fails, you are screwed.
Therefore you use raid-6, which allows two discs to fail. But there is still some likelihood that two more drives fail during the rebuild. At some point in the future, the discs will get so large that another disc will fail as fast as you can rebuild a broken one. This is due to larger and larger drives.
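To put a rough number on the risk of losing another disc mid-rebuild, here is a minimal sketch. It assumes independent drives with a constant failure rate derived from an annual failure rate (AFR); the 3% AFR, the 7 surviving drives and the `stress_factor` knob for rebuild stress are all hypothetical illustration values, not data from the article.

```python
import math


def p_additional_failure(n_remaining: int, afr: float, rebuild_hours: float,
                         stress_factor: float = 1.0) -> float:
    """Probability that at least one surviving drive fails during the rebuild.

    Assumes independent drives with a constant (exponential) failure rate.
    stress_factor > 1 models the extra load of a rebuild (an assumption).
    """
    rate_per_hour = -math.log(1.0 - afr) / 8760.0  # 8760 hours per year
    return 1.0 - math.exp(-n_remaining * rate_per_hour * stress_factor * rebuild_hours)


# Hypothetical array: 7 surviving drives, 3% AFR each.
for hours in (10, 48, 168):
    p = p_additional_failure(7, 0.03, hours)
    print(f"rebuild of {hours} h: P(another drive fails) = {p:.4f}")
```

Under this model the risk grows roughly in proportion to the rebuild window, so a week-long rebuild is many times more exposed than a ten-hour one.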
A decade ago, mainframes didn't have drives this large. Rebuilding a raid was no problem; it went very quickly. Today, it takes a very long time. Therefore, you are wrong: mainframes have not solved this problem. This is why people say raid-5 will soon be obsolete. This is what the article is about.
Also, enter Silent Corruption. Discs will read and write bits erroneously without even noticing it! You will not get a notification that there was an error. This is a BAD thing. 20% of a disc's surface is dedicated to error correcting codes, and the codes cannot fix every error, nor even detect every error. There are lots and lots of errors in every read and write that get corrected on the fly. But sometimes there will be errors that cannot be corrected by the disc. Nor even detected. It is like the lamp on the oven saying it is turned off when the oven is in fact turned on: the HW doesn't detect this, so it lies to you. Look at the spec of a new drive, it says "unrecoverable error: 1 in 10^14". And there are errors that don't even get detected. But ZFS detects them, and also recovers the data. The "1 error in 10^14" doesn't apply with ZFS, because ZFS detects and corrects them.
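To see why the "1 in 10^14" spec figure matters at today's drive sizes, here is a small sketch. It takes the spec at face value and assumes errors are independent per bit, which is a simplification; the read sizes are just illustrative.

```python
import math


def p_unrecoverable_error(bytes_read: float, bit_error_rate: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error in bytes_read bytes.

    Computes 1 - (1 - ber)^bits in a numerically stable way.
    Assumes independent errors per bit (a simplifying assumption).
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


for tb in (1, 4, 12):
    p = p_unrecoverable_error(tb * 1e12)
    print(f"reading {tb} TB: P(at least one unrecoverable error) = {p:.2f}")
```

A rebuild has to read every surviving disc end to end, so the bigger the array, the more likely the rebuild itself trips over one of these errors.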
SUN knows about these problems, and ZFS does fix them. ZFS also allows other, safer configurations than raid-5 and raid-6, which makes ZFS less susceptible to the problem of long raid rebuilds. For instance, three discs are allowed to fail in raidz3 configs. Or you can mirror lots of discs and combine them into flexible raid configs. The best thing is that ZFS does NOT like HW raid; it only disturbs ZFS. Therefore, ditch HW raid while you can get a fair price, and use ZFS to get a cheaper and safer solution. 48 SATA 7200 rpm discs read 2-3GB/sec and write 1GB/sec. And the data is safe too.
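The general idea of detecting and repairing silent corruption can be sketched like this. This is a toy illustration, not ZFS's actual code or on-disk format: the `MirroredBlock` class is hypothetical, and SHA-256 is used here simply as a convenient checksum. The key point is that the checksum is kept apart from the data, so a copy that went bad silently can be spotted and rewritten from a good one.

```python
import hashlib


class MirroredBlock:
    """Toy two-way mirror with a checksum stored separately from the data."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.checksum = hashlib.sha256(data).digest()

    def _is_good(self, copy: bytearray) -> bool:
        return hashlib.sha256(copy).digest() == self.checksum

    def read(self) -> bytes:
        """Return verified data, repairing any copy that fails its checksum."""
        for copy in self.copies:
            if self._is_good(copy):
                good = bytes(copy)
                # Self-heal: overwrite every bad copy with the verified one.
                for i, other in enumerate(self.copies):
                    if not self._is_good(other):
                        self.copies[i] = bytearray(good)
                return good
        raise IOError("all copies fail checksum verification")


block = MirroredBlock(b"important data")
block.copies[0][0] ^= 0x01          # flip one bit silently on the first disc
print(block.read())                 # the read still returns the verified data
print(bytes(block.copies[0]) == b"important data")  # bad copy was repaired
```

A plain mirror without the checksum would have no way of knowing which of the two copies to trust.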
CERN did a study on silent corruption, and the moral was that one error in 10^14 is not correct. This article is not correct, according to studies at CERN. The errors occur more frequently, in practice:
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
Your data is at risk. Silent Corruption and bit rot eat your data. Silently. Without the HW telling you. The HW doesn't even notice.
Clearly, something has to be done to cope with these errors that large drives and future filesystems will face. The main architect of ZFS explains some of the future problems that will be more and more common.
http://queue.acm.org/detail.cfm?id=1317400
Edited 2009-09-19 09:54 UTC





What the article states might be true, but the cost of storage goes down far faster than the problems rise.
So you can just throw more disks and a permanently running replication daemon at the problem, and you will have a working solution.
It is not the end of the world. The mainframe solved this one decades ago.