Maybe if you are the CEO of a company with a critical system, worth billions of dollars or thousands of lives, you are willing to take any measure to minimize the risk of losing data? And ZFS doesn't require extra, specialized hardware. Just a SATA controller card with no raid functionality plus 7200 rpm SATA discs.
RAID: Redundant Array of Inexpensive Discs. With ZFS that becomes true. A good HW raid card costs a lot. What happens if the vendor goes bankrupt? Where do you find a new HW card? You are locked in.
ZFS code is open and you can do whatever you want with it. ZFS is future proof. And it doesn't cost anything. Move your discs to a Mac OS X computer, or a FreeBSD computer, or Solaris SPARC, or Solaris x86, run "zpool import" with the name of your pool, and you are done with the migration. All data is stored endian neutral.
To me it is a no-brainer to use ZFS. It is better, safer, easy to administer and free. I've heard that creating a raid with Linux and LVM takes something like 30 commands. With ZFS you write "zpool create tank raidz1 disc0 disc1 disc2 disc3" (where "tank" is whatever name you pick for the pool) and you are done. No formatting. Copy your data immediately. There is no fsck. All data is always online.
But the great advantages that ZFS gives are nothing unusual for Sun's technology. DTrace is as good as ZFS. And Niagara SPARC. Zones. Etc. And they are all open tech. And GOOD tech.
Here is a Linux guy building his first ZFS storage server. Well researched and a good read:
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
If you're going to talk like that, then why worry about anything? Except maybe your job, because you get fired after all your data is lost to corruption.
"It is not the end of the world. The mainframe solved this one decades ago."
You seem to have missed the point. If a drive fails in a raid, it takes hours to rebuild the raid. The larger the drives, the longer the rebuild takes. With 1TB drives it takes maybe 10 hours. 2TB may take 24 hours. 4TB maybe 2 days. 8TB drives may take a week? Because drives don't get much faster, only larger.
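The scaling argument here is just drive capacity divided by sustained rebuild rate. A rough Python sketch, assuming the rebuild runs at a constant effective rate. The 28 MB/s figure is an assumption chosen only because it roughly matches "1TB in 10 hours", and the function name is made up for illustration; real rebuild rates depend on load, controller and layout.

```python
def rebuild_hours(capacity_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to rewrite one drive of the given size at a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    capacity_bytes = capacity_tb * 1e12     # decimal terabytes, as on drive labels
    rate_bytes_per_s = rate_mb_per_s * 1e6  # decimal megabytes per second
    return capacity_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    ASSUMED_RATE_MB_S = 28.0  # assumption, not a measured value
    for size_tb in (1, 2, 4, 8):
        hours = rebuild_hours(size_tb, ASSUMED_RATE_MB_S)
        print(f"{size_tb} TB drive: about {hours:.0f} hours")
```

With a flat rate the rebuild time grows in step with capacity, so this constant-rate model is the optimistic case; a busy array that rebuilds more slowly only makes it worse.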
At some point, rebuilding the raid will take a long time. A scarily long time. Say it takes one week. A rebuild stresses the other discs heavily, to the point that it is common for another disc to break! This happens more often than you think. And if another disc fails during the rebuild, you are screwed.
Therefore you use raid-6, which allows two discs to fail. But there is still a chance that two drives fail during the rebuild. At some point in the future, the discs will be so large that another disc fails before you have finished rebuilding the broken one. This is due to larger and larger drives.
A decade ago, mainframes didn't have drives this large. Rebuilding a raid was no problem, it went very quickly. Today, it takes a very long time. Therefore you are wrong: mainframes have not solved this problem. This is the reason people say that raid-5 will soon be obsolete. This is what the article is about.
Also, enter Silent Corruption. Discs will read/write bits erroneously without even noticing it! You will not get a notification that there was an error. This is a BAD thing. 20% of a disc's surface is dedicated to error correcting codes, and the codes cannot fix every error, nor even detect every error. There are lots and lots of errors in every read and write that get corrected on the fly. But sometimes there will be errors that the disc cannot correct. Or even detect. It is like the lamp on the oven saying it is turned off when the oven is in fact turned on: the HW doesn't detect this, so it lies to you. Look at the spec of a new drive, it says "unrecoverable error: 1 in 10^14". And there are errors that don't even get detected. But ZFS detects them with its checksums and, when the pool has redundancy, also recovers the data. The "1 error in 10^14" doesn't apply with ZFS, because ZFS detects and corrects them.
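To put the "1 in 10^14" spec figure in perspective, here is a small Python sketch of the chance of hitting at least one unrecoverable read error while reading a given amount of data. It assumes the spec is a per-bit rate and that errors are independent of each other, which is a simplification. The function name and the example sizes are made up for illustration.

```python
import math


def p_at_least_one_ure(terabytes_read: float, ber: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error over the read volume.

    ber is the assumed per-bit error rate taken from the drive spec sheet.
    """
    bits = terabytes_read * 1e12 * 8
    # Same as 1 - (1 - ber)**bits, but numerically stable for a tiny ber.
    return -math.expm1(bits * math.log1p(-ber))


if __name__ == "__main__":
    for tb in (1, 4, 12):
        print(f"reading {tb} TB: {p_at_least_one_ure(tb):.1%} chance of an error")
```

Under these assumptions, reading 10^14 bits (12.5 TB) gives roughly a 63% chance of at least one such error, so a rebuild that has to read many terabytes from the surviving discs is far from safe.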
Sun knows about these problems and ZFS fixes them. ZFS also allows other, safer configurations than raid-5 and raid-6, which make ZFS less vulnerable to long rebuild times. For instance, three discs are allowed to fail in raidz3 configs. Or you can mirror lots of discs and combine them into flexible raid configs. The best thing is that ZFS does NOT like HW raid; it only gets in ZFS's way. Therefore, ditch your HW raid while you can still get a fair price for it, and use ZFS to get a cheaper and safer solution. 48 SATA 7200 rpm discs read 2-3GB/sec and write 1GB/sec. And the data is safe too.
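The detect-and-repair idea can be shown with a toy model in Python. This is not ZFS's actual code or on-disk format, only an illustration under simple assumptions: two in-memory "discs" holding mirrored blocks, with a SHA-256 checksum of each block kept apart from the data itself. All names here are made up.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Two mirrored in-memory 'discs' plus checksums stored separately."""

    def __init__(self) -> None:
        self.discs = [{}, {}]  # block id -> bytes, one dict per disc
        self.sums = {}         # block id -> expected checksum

    def write(self, block_id: int, data: bytes) -> None:
        for disc in self.discs:
            disc[block_id] = data
        self.sums[block_id] = checksum(data)

    def read(self, block_id: int) -> bytes:
        expected = self.sums[block_id]
        for disc in self.discs:
            data = disc[block_id]
            if checksum(data) == expected:
                # Found a good copy: overwrite any copy that fails its checksum.
                for other in self.discs:
                    if checksum(other[block_id]) != expected:
                        other[block_id] = data
                return data
        raise IOError(f"block {block_id}: every copy fails its checksum")


if __name__ == "__main__":
    pool = ToyMirror()
    pool.write(0, b"important data")
    pool.discs[0][0] = b"important dbta"  # simulate silent corruption on one disc
    print(pool.read(0))                   # the good copy is returned
    print(pool.discs[0][0])               # the bad copy has been repaired
```

The point of keeping the checksum apart from the block is that a disc returning wrong data cannot also vouch for it. If every copy is bad, the read fails loudly instead of handing back garbage.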
CERN did a study on silent corruption, and the moral was that one error in 10^14 is not correct. This article is not correct, according to the studies at CERN. In practice, the errors occur more frequently:
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
Your data is at risk. Silent Corruption and bit rot eat your data. Silently. Without the HW telling you. The HW doesn't even notice.
Clearly, something has to be done to cope with the errors that large drives and future filesystems will face. The main architect of ZFS explains some of the future problems that will become more and more common:
http://queue.acm.org/detail.cfm?id=1317400
Edited 2009-09-19 09:54 UTC