Linked by Pobrecito Hablador on Mon 2nd Nov 2009 21:19 UTC
Sun Solaris, OpenSolaris
One of the advantages of ZFS is that it doesn't need a fsck. Replication, self-healing and scrubbing are a much better alternative. After a few years of ZFS life, can we say it was the correct decision? The reports in the mailing list are a good indicator of what happens in the real world, and it appears that once again, reality beats theory. The author of the article analyzes the implications of not having a fsck tool and tries to explain why he thinks Sun will add one at some point.
Thread beginning with comment 392536
ZFS needs real openness and Linux
by TheRealNelson on Tue 3rd Nov 2009 15:45 UTC

Lots of filesystems do a great job of maintaining their own consistency. It's really external errors that bother a lot of us. Say you drop a backup drive on the floor: I'm going to fsck it before I even attempt to mount it. It sounds like ZFS has a fsck or scrub that can run. If it can guarantee that the filesystem is mountable, then you should be able to bring a drive up far enough to run that check and verify the integrity of the filesystem against external errors. This likely pushes some of the problems elsewhere. Because it's a background process on a live system, other software could fail if it touches broken parts of the filesystem that were assumed valid, although other filesystems have similar problems with damaged data. Fscking just makes you stop everything else while you do it, so when you find problems you find them in fsck and not when your database crashes for some unknown reason because the blocks on the disk were screwed up.

What ZFS really needs is to run under Linux and probably Windows, and to do that it probably requires some license changes and some substantial attitude changes within Sun. Until that happens, it's at best a bit player. The "bad hardware" problems are pretty weak as well, I can't recall hearing NTFS devs or Ext3 devs complaining about it. Part of that is that Sun's management needed each and every home run they could get as they shopped the company around, and for some reason they chose to roll a filesystem out with the kind of visibility that they did. Actual support will always trump hype. If it's so perfect, then give it to the rest of the world and the rest of the world will adopt it.

Reply Score: 1

Kebabbert

"The "bad hardware" problems are pretty weak as well, I can't recall hearing NTFS devs or Ext3 devs complaining about it."

But you fail to notice that SUN does Enterprise storage. That is a completely different thing from the commodity hard drives used with Windows and Linux, which don't obey standards, as Jeff Bonwick explains. Enterprise storage has much higher requirements, and therefore you will hear complaints from Enterprise storage people. For Linux and Windows, which do not have those high demands and are not capable of handling them, anything will do. Windows and Linux are not used in the Enterprise storage area. That is the reason you don't hear NTFS or ext3 devs complain about it.

Here you see that Linux does not handle Enterprise Storage, according to a storage expert. Maybe he is wrong, maybe he knows more about Enterprise Storage than most people.

Regarding "that attitude change that SUN needs", maybe you will see it quite soon, as Oracle is buying SUN. SUN is the company that has released the most open source code, and last year it ranked 30th among those who contributed the most code to the Linux kernel. We will see if Oracle closes SUN's tech and charges a lot, or if Oracle continues in the same vein as SUN. But SUN was in the process of open sourcing _everything_, so we have to see if Oracle will also open source everything they own.

Reply Parent Score: 1

Oliver

>I can't recall hearing NTFS devs or Ext3 devs complaining about it.

Usually ext3/4 devs are complaining about various applications (like KDE) that should do their own homework. In other words, the devs claim those applications don't have any clue what they're actually doing. When it comes to reliable filesystems, Linux is a huge disappointment. Apart from XFS, but that's another story.

Reply Parent Score: 1

dvzt

The "bad hardware" problems are pretty weak as well, I can't recall hearing NTFS devs or Ext3 devs complaining about it.

You're not listening carefully enough, then. Linux has the same problems if a disk does not honor barriers. Even funnier, on Linux barriers don't work at all even with properly working disks when LVM is in use. ZFS does not need Linux, but it seems that Linux does need ZFS.

Oh, and ZFS is 100% open (after all, it's in FreeBSD and other operating systems). Too bad Linux isn't, and therefore can't be integrated with foreign code the way others can. ;)

Reply Parent Score: 2

blu28

The "bad hardware" problems are pretty weak as well, I can't recall hearing NTFS devs or Ext3 devs complaining about it.

Of course not. Suppose that this kind of bad hardware accounted for 1% of FS corruption. It is unlikely that anybody even knows about it because it is in the noise. But now ZFS comes along and gets rid of the other 99%. Now bad hardware is responsible for 100% of ZFS file system corruption, on a FS that is designed to have none. That's a big deal. In reality, the bad disks are probably responsible for even more corruption on other FS's, but since you have already accepted a bit of corruption with each crash, you can't see the difference. Remember, fsck does not get you back what you lost, and arbitrarily large amounts of data corruption can still occur. But in general the amount lost and corrupted is small and everybody has learned to live with it. But now we have ZFS, and it guarantees consistent and complete data, possibly a few milliseconds out of date after a crash, assuming the underlying disks follow the standards. Compare that to fsck, where one file may be up to date, a bunch more are a few milliseconds behind, another is corrupted and another is deleted.
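
To put numbers on that, here is a minimal sketch of the arithmetic. The event counts are the hypothetical 1% and 99% from the paragraph above, not measurements, and `share_from_bad_hardware` is just an illustrative helper name.

```python
def share_from_bad_hardware(bad_hw_events: int, other_events: int) -> float:
    """Fraction of all corruption events that are caused by bad hardware."""
    total = bad_hw_events + other_events
    return bad_hw_events / total if total else 0.0


# Hypothetical: out of 100 corruption events on a traditional filesystem,
# 1 comes from disks that ignore flush requests and 99 from everything else.
traditional = share_from_bad_hardware(bad_hw_events=1, other_events=99)

# Same single bad-hardware event, but the other 99 causes are eliminated.
zfs_like = share_from_bad_hardware(bad_hw_events=1, other_events=0)

print(f"traditional FS: {traditional:.0%} of corruption is bad hardware")
print(f"ZFS-like FS:    {zfs_like:.0%} of corruption is bad hardware")
```

The absolute number of bad-hardware events is the same in both cases. Only its share of what is left changes, which is why it suddenly becomes visible.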

Up until now with ZFS, that 1% caused by bad hardware left the FS unusable. But with the zpool recovery just added, in that 1% of cases you end up losing a couple of seconds of data and the file system recovers virtually instantaneously, instead of scanning and rescanning and patching to get back to an inconsistent state, with some data from a few seconds ago and some up to date, as happens with fsck.
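
A toy model of that kind of recovery, for illustration only: this is not ZFS code or its on-disk format. It assumes each commit is stored with a checksum, and that recovery simply falls back to the newest commit whose checksum still verifies. The names `TxGroup`, `commit` and `rewind` are made up for this sketch.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TxGroup:
    txg: int        # commit number, increasing over time
    payload: bytes  # stand-in for the state written by this commit
    checksum: int   # CRC32 recorded when the commit was written


def commit(txg: int, payload: bytes) -> TxGroup:
    """Record a commit together with the checksum of its payload."""
    return TxGroup(txg, payload, zlib.crc32(payload))


def rewind(history: List[TxGroup]) -> Optional[TxGroup]:
    """Return the newest commit whose payload still matches its checksum."""
    for group in sorted(history, key=lambda g: g.txg, reverse=True):
        if zlib.crc32(group.payload) == group.checksum:
            return group
    return None  # nothing verifiable is left


history = [commit(1, b"a"), commit(2, b"b"), commit(3, b"c")]

# Simulate a disk that claimed the last write was flushed when it was not:
# the payload on disk no longer matches the recorded checksum.
history[2].payload = b"X"

recovered = rewind(history)
print(recovered.txg)  # 2: the last commit is dropped, the rest is consistent
```

The point of the sketch is the shape of the trade: you give up the most recent commit or two, and in exchange everything you get back belongs to one consistent point in time, with no scanning or patching pass.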

Reply Parent Score: 1