Linked by Thom Holwerda on Fri 6th Nov 2009 23:42 UTC, submitted by poundsmack
Sun Solaris, OpenSolaris "There is a discussion at osnews.com about a simple question: "Should ZFS Have a fsck Tool?". The answer is simple: No. I could stop now, as this answer is pretty obvious when you work a while with ZFS, but i want to explain my position. And i want to ask a different question at the end."
Thread beginning with comment 393320
Contradictory post...
by diegocg on Sat 7th Nov 2009 01:15 UTC
diegocg Member since:
2005-07-08

"To allow ZFS to be crash proof, there must be certain really basic mechanisms implemented in a way that adheres to specifications and standards."

Which doesn't always happen in the real world, be it because the devices are buggy or because the devices are great but have a firmware bug, or because they are old and stop working properly. So there will be a very very small, but existent, group of users who will have problems.

The cause of the problem is not ZFS' fault, it's the hardware's fault, but the lack of a tool to fix the filesystem or recover data from it is the filesystem's fault, not the hardware's fault.

The whole post explains very well why ZFS reliability depends a tiny bit on hardware behaviour - that is equivalent to saying that ZFS doesn't rely exclusively on its own design to avoid every kind of problem. However, the "ZFS doesn't need fsck" attitude assumes that the ZFS design can avoid every kind of problem... that's somewhat contradictory.

The need for helpers to fix things is clearly there, just take a look at the last month in the ZFS lists. Here's http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033... from 6 days ago. And also the second paragraph of http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033... which pretty much says what I just wrote but in fewer words: "I've no objection to deciding how much recovery tools are needed based on experience rather than wide-eyed kool-aid ranting or presumptions from earlier filesystems, but so far experience says the recovery work was really needed"


"BTW: Linux has similar problems with problematic hard- and software like components not honoring write barriers."

But it has an fsck, which makes Linux (and Solaris UFS) users think "hey, something can get corrupted due to a bad disk that doesn't handle the sync cache commands correctly, but if I hit the problem at least I can try to fix it".

"Some may call the results of PSARC 2009/479 something like an fsck tool, but it isn't."

The disk state is inconsistent and the tool fixes it - it's an fsck. The uberblock is a part of the filesystem metadata; a wrong uberblock is a filesystem inconsistency. Just because that tool is transaction based doesn't mean it isn't fixing something.

Besides, the kind of corruption that you can hit with bad hardware is not necessarily an uberblock that points to something that hasn't been written to the disk due to bad cache handling - it can be other things. When hardware fails, the resulting behaviour is undefined.


"At first you should throw the sub-sub-substandard hardware in the next available trash bin after copying the data to a storage subsystem of better quality and wiping the old disks."

Well, one of the most common ZFS catchphrases is that you can do reliable storage with very cheap disks - so it's quite probable that users and enterprises will do exactly that, don't you think?

Edit: Trying to fix links...

Edited 2009-11-07 01:34 UTC

RE: Contradictory post...
by Dryhte on Sat 7th Nov 2009 05:54 in reply to "Contradictory post..."
Dryhte Member since:
2008-02-05


"At first you should throw the sub-sub-substandard hardware in the next available trash bin after copying the data to a storage subsystem of better quality and wiping the old disks."

"Well, one of the most common ZFS catchphrases is that you can do reliable storage with very cheap disks - so it's quite probable that users and enterprises will do exactly that, don't you think?"


What would also be interesting is a sort of HCL, or a set of criteria, with which the average user can decide which of his hard disks he should not use ZFS on. It's not like there are that many hard disk manufacturers, so how can we decide which of their hard disks we can trust our data to?

Score: 1

RE[2]: Contradictory post...
by c0t0d0s0 on Sat 7th Nov 2009 13:55 in reply to "RE: Contradictory post..."
c0t0d0s0 Member since:
2008-10-16

The point is: you shouldn't use such devices with other filesystems either. Just say NO to such disks. With ZFS you simply notice those errors. Since I've been running regular scrubs over the datasets on my home fileservers, I'm pretty disappointed by the quality of SOHO drives.

BTW: When you are using disks directly with SATA or SAS, you won't see such problems. Those disks are reasonably free of the biggest mistakes. The problems start when you have some cheap SATA/PATA to Firewire or USB converters.

Score: 1

RE: Contradictory post...
by c0t0d0s0 on Sat 7th Nov 2009 15:32 in reply to "Contradictory post..."
c0t0d0s0 Member since:
2008-10-16

Many concepts in ZFS are pretty different from those of any other filesystem. I think this is the problem when people talk about ZFS and try to impose the concepts of other filesystems on it.

For example, the transaction rollback doesn't check and doesn't fix anything. It just imports the pool at a different transaction group number. That's pretty much the complete story. If you are still paranoid, you can scrub your pool afterwards and check that your data is correct. But you don't have to.
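To make the "import at a different transaction group" idea concrete, here is a minimal toy model in Python. It is only a sketch under loud assumptions: `ToyPool`, `commit`, `import_pool` and the `lost_in_cache` flag are hypothetical names invented for illustration, and nothing here reflects the real ZFS on-disk format or code. It only shows the principle that old states are never overwritten, so an import can fall back to the newest transaction group whose root block still verifies.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used to validate a block (SHA-256 chosen for this toy only)."""
    return hashlib.sha256(data).hexdigest()


class ToyPool:
    """Hypothetical copy-on-write pool: every commit adds a new uberblock
    and never overwrites the blocks of an earlier transaction group."""

    def __init__(self):
        self.blocks = {}      # address -> bytes, stands in for the disk
        self.uberblocks = []  # (txg, root address, expected root checksum)

    def commit(self, txg, data, lost_in_cache=False):
        addr = f"blk-{txg}"
        # A lying device acknowledges the write but never persists the block.
        if not lost_in_cache:
            self.blocks[addr] = data
        self.uberblocks.append((txg, addr, checksum(data)))

    def import_pool(self, allow_rollback=False):
        # Try the newest uberblock first, then older ones if rollback is allowed.
        for txg, addr, expected in sorted(self.uberblocks, reverse=True):
            data = self.blocks.get(addr)
            if data is not None and checksum(data) == expected:
                return txg, data
            if not allow_rollback:
                raise IOError(f"txg {txg}: root block missing or corrupt")
        raise IOError("no consistent transaction group found")


pool = ToyPool()
pool.commit(1, b"state A")
pool.commit(2, b"state B")
pool.commit(3, b"state C", lost_in_cache=True)  # simulated bad cache flush

print(pool.import_pool(allow_rollback=True))  # (2, b'state B')
```

Nothing is rewritten or repaired in this model; the import simply selects an older, still valid state, which is the distinction being drawn from a classic fsck.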

Both are quite different from the concept of fsck. The transaction rollback does nothing that an fsck would do, and the scrub goes much further than an fsck, as it checks the checksums of all blocks. Of course you could call it fsck, but it has nothing in common with an fsck for ext4 or xfs.

Regarding the "cleaning up after bugs": I'm not sure if fsck is the correct place for such logic; perhaps it's better to integrate code that is able to live with the buggy state and rewrite it correctly as soon as the data changes. The other interesting point: what if the state is correct on disk, but it's read incorrectly? How do you repair such a problem with fsck? As the logic of the fsck is similar to the code that reads the data, the same problem would obviously exist in both parts.

For further explanation I'll just cite the ZFS FAQ:

"Why doesn't ZFS have an fsck-like utility?

There are two basic reasons to have an fsck-like utility:

* Verify file system integrity - Many times, administrators simply want to make sure that there is no on-disk corruption within their file systems. With most file systems, this involves running fsck while the file system is offline. This can be time consuming and expensive. Instead, ZFS provides the ability to 'scrub' all data within a pool while the system is live, finding and repairing any bad data in the process. There are future plans to enhance this to enable background scrubbing.
* Repair on-disk state - If a machine crashes, the on-disk state of some file systems will be inconsistent. The addition of journalling has solved some of these problems, but failure to roll the log may still result in a file system that needs to be repaired. In this case, there are well known pathologies of errors, such as creating a directory entry before updating the parent link, which can be reliably repaired. ZFS does not suffer from this problem because data is always consistent on disk.
A more insidious problem occurs with faulty hardware or software. Even file systems or volume managers that have per-block checksums are vulnerable to a variety of other pathologies that result in valid but corrupt data. In this case, the failure mode is essentially random, and most file systems will panic (if it was metadata) or silently return bad data to the application. In either case, an fsck utility will be of little benefit. Since the corruption matches no known pathology, it will likely be unrepairable. With ZFS, these errors will be (statistically) nonexistent in a redundant configuration. In a non-redundant config, these errors are correctly detected, but will result in an I/O error when trying to read the block. It is theoretically possible to write a tool to repair such corruption, though any such attempt would likely be a one-off special tool. Of course, ZFS is equally vulnerable to software bugs, but the bugs would have to result in a consistent pattern of corruption to be repaired by a generic tool. During the 5 years of ZFS development, no such pattern has been seen."
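The scrub behaviour described in the FAQ (verify every block against its checksum, repair from a good copy when redundancy exists, report an error when it doesn't) can be sketched in a few lines of Python. This is a toy illustration, not ZFS code: the function name `scrub`, the two dictionaries standing in for mirror sides, and the separate `checksums` table are all assumptions made for the example. Real ZFS keeps each block's checksum in the parent block pointer rather than in a side table.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used to validate a block (SHA-256 chosen for this toy only)."""
    return hashlib.sha256(data).hexdigest()


def scrub(mirror_a, mirror_b, checksums):
    """Verify every block on both sides of a hypothetical two-way mirror.

    Returns (repaired, unrecoverable) lists of block ids. A bad copy is
    overwritten from the good side; if both copies fail, the block can
    only be reported, which corresponds to an I/O error on read.
    """
    repaired, unrecoverable = [], []
    for block_id, expected in checksums.items():
        ok_a = checksum(mirror_a[block_id]) == expected
        ok_b = checksum(mirror_b[block_id]) == expected
        if ok_a and ok_b:
            continue
        if ok_a:
            mirror_b[block_id] = mirror_a[block_id]  # heal side B from A
            repaired.append(block_id)
        elif ok_b:
            mirror_a[block_id] = mirror_b[block_id]  # heal side A from B
            repaired.append(block_id)
        else:
            unrecoverable.append(block_id)
    return repaired, unrecoverable


good = {0: b"alpha", 1: b"beta", 2: b"gamma"}
checksums = {block_id: checksum(data) for block_id, data in good.items()}
side_a, side_b = dict(good), dict(good)

side_b[1] = b"bXta"                 # silent corruption on one side only
side_a[2], side_b[2] = b"?", b"!"   # both copies damaged

repaired, unrecoverable = scrub(side_a, side_b, checksums)
print(repaired, unrecoverable)  # [1] [2]
```

Unlike a metadata-only fsck, every data block is checked here, and the check needs no knowledge of specific corruption patterns: a block either matches its checksum or it doesn't.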



For almost all failure modes ZFS protects the data; there is just one left: components lying about the sequence and state of write operations. No filesystem can defend against such problems. The advantage of ZFS in conjunction with the mentioned PSARC putback is that at least you can jump back to a state that's consistent and has validated integrity. From my point of view, that's much more important than pressing the data into a form that's expected by the filesystem, where some blocks are old, some are new, and some are deleted after an fsck. In the end the data is the important stuff, not the filesystem. The filesystem is just a helper construct.

Score: 4

RE[2]: Contradictory post...
by Kebabbert on Sat 7th Nov 2009 17:14 in reply to "RE: Contradictory post..."
Kebabbert Member since:
2007-07-27

fsck only checks the metadata, but it doesn't check the actual data, right?

Score: 3