Linked by Thom Holwerda on Fri 6th Nov 2009 23:42 UTC, submitted by poundsmack
Sun Solaris, OpenSolaris "There is a discussion at osnews.com about a simple question: "Should ZFS Have a fsck Tool?". The answer is simple: No. I could stop now, as this answer is pretty obvious when you work a while with ZFS, but i want to explain my position. And i want to ask a different question at the end."
Thread beginning with comment 393458
RE[2]: It sounds like...
by phoenix on Sun 8th Nov 2009 03:14 UTC in reply to "RE: It sounds like..."
phoenix Member since:
2005-07-11

Don't think of ZFS as a regular filesystem. It's a combination of a volume manager and a filesystem. The filesystem is just one view of a data pool via the ZFS POSIX Layer. But emulated block devices can exist in the same pool via the ZFS Emulated Volume Layer.


Sun did the computing world a *huge* disservice by calling it "ZFS, the filesystem". They really should have called it what it actually is: "ZSMS, the Zettabyte Storage Management System". That would have solved so many of these kinds of issues for people.

Once you start looking at it from a storage management position, instead of an "it's just a fancy fs" position, it becomes a lot simpler to understand and work with.

Unfortunately, it's too late now, and these kinds of misunderstandings and misconceptions are just going to continue to get worse. ;)


ZFS "the filesystem" doesn't need an fsck tool. It has features that make sure data is either written correctly, or not written at all. And if a specific block can't be read or doesn't match its checksum, ZFS reads it from a redundant copy instead.
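To make the "pull it from a different copy" part concrete, here's a toy sketch in Python. It is not ZFS code: the function names are made up, the copies are just in-memory byte arrays, and SHA-256 stands in for whichever checksum the pool is configured to use. The only idea it shows is that the expected checksum is kept apart from the data, so a bad copy can be detected and overwritten from a good one.

```python
import hashlib


def checksum(data):
    # Stand-in checksum for this sketch (assumption: SHA-256).
    return hashlib.sha256(bytes(data)).hexdigest()


def self_healing_read(copies, expected):
    """Return the block contents, repairing any copy that fails its checksum.

    copies   -- list of bytearray, one per redundant copy (hypothetical model)
    expected -- checksum recorded separately from the data itself
    """
    good = None
    for copy in copies:
        if checksum(copy) == expected:
            good = bytes(copy)
            break
    if good is None:
        # No redundancy left to heal from: report the error, don't return junk.
        raise IOError("no copy of the block matches its checksum")
    for copy in copies:
        if checksum(copy) != expected:
            copy[:] = good  # overwrite the damaged copy in place
    return good
```

With a single copy there is nothing to heal from, so the sketch raises instead of handing back corrupt data, which is the same "detect, don't guess" behaviour described above.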

ZFS "the storage pool manager" manages all the storage transactions. If something goes wrong, it can lead to an unimportable storage pool (i.e., all the filesystems and volumes above it are inaccessible). Previously, one had to manually muck around with dd, zdb, and voodoo to tell the storage pool to load from a previous transaction group. Now, that can be done automatically. No filesystem checking is done. It just picks an older point in time (transaction group) and loads from there. All your data (up to that point in time) is intact.
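Roughly, "pick an older transaction group" looks like the Python sketch below. Everything in it is a simplification for illustration: the Uberblock class, its fields, and newest_importable() are hypothetical, and the real validity check involves far more than one checksum comparison. The point is only that recovery means walking backwards through recorded points in time until one validates, not checking filesystem metadata.

```python
import hashlib
from dataclasses import dataclass


def digest(data):
    # Stand-in checksum for this sketch (assumption: SHA-256).
    return hashlib.sha256(data).hexdigest()


@dataclass
class Uberblock:
    txg: int        # transaction group number; higher means newer
    root: bytes     # toy stand-in for the pool state this txg points at
    checksum: str   # checksum recorded when the txg was written


def newest_importable(uberblocks):
    """Return the newest uberblock whose contents still validate, or None."""
    for ub in sorted(uberblocks, key=lambda u: u.txg, reverse=True):
        if digest(ub.root) == ub.checksum:
            return ub
    return None
```

If the newest transaction group is damaged, the sketch falls back to the one before it, and anything written after that point is lost, which matches the "all your data up to that point in time" caveat.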

Once the pool is imported, and all the filesystems and volumes are available, you have the option of running a background scrub on the pool (the entire pool, not individual filesystems and volumes) to make sure that the data is intact. The scrub will verify the checksum of every single block in the pool, and repair any that are bad via redundant copies.
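As a rough picture of what a scrub does, here's a toy model in Python. The pool is just a dict mapping a block id to its expected checksum and a list of copies; that layout, the scrub() function, and the use of SHA-256 are all assumptions made for the example, not how ZFS stores anything.

```python
import hashlib


def digest(data):
    # Stand-in checksum for this sketch (assumption: SHA-256).
    return hashlib.sha256(data).hexdigest()


def scrub(pool):
    """Verify every block in the pool and repair bad copies where possible.

    pool -- dict: block_id -> (expected_checksum, [copy, copy, ...])
    Returns (number_of_copies_repaired, [ids_of_unrecoverable_blocks]).
    """
    repaired = 0
    unrecoverable = []
    for block_id, (expected, copies) in pool.items():
        good = next((c for c in copies if digest(c) == expected), None)
        if good is None:
            # Every copy is bad: nothing to repair from, so just report it.
            unrecoverable.append(block_id)
            continue
        for i, copy in enumerate(copies):
            if digest(copy) != expected:
                copies[i] = good  # rewrite the bad copy from the good one
                repaired += 1
    return repaired, unrecoverable
```

Note that the loop runs over the whole pool, not over one filesystem's metadata, which is the difference from fsck described above.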

Thus, a filesystem-specific tool that checks one filesystem's on-disk metadata (aka fsck) is not needed. Tools are already available that give a better end result ... just from a different direction.

Edited 2009-11-08 03:25 UTC

Score: 3

RE[3]: It sounds like...
by phoenix on Sun 8th Nov 2009 05:57 in reply to "RE[2]: It sounds like..."
phoenix Member since:
2005-07-11

Another way to look at it is to ask whether or not LVM needs an fsck, since that's the layer in the ZFS storage system that's being worked on.

ZFS filesystems themselves rarely need fixing (I've never come across one, and haven't read about any online, but I've only been using ZFS for a year). They take care of that automatically using self-healing via checksums and redundancy, transactions, and copy-on-write.

The storage pool could become unimportable, but was usually fixable via arcane voodoo magic commands. Now, it's made a lot simpler (via the code implemented in the PSARC mentioned above -- PSARC is like a support case, or bug report, in Sun-speak).

There are tools for fixing LVM, though. And now there are tools to fix things at the storage pool layer in ZFS.

Asking for "fsck" doesn't make sense, though, as that's the wrong layer in the stack.

Score: 2

RE[4]: It sounds like...
by c0t0d0s0 on Sun 8th Nov 2009 12:57 in reply to "RE[3]: It sounds like..."
c0t0d0s0 Member since:
2008-10-16

PSARC has nothing to do with support cases or bug reports. PSARC stands for Platform Software Architecture Review Committee. That's a group of people in the OpenSolaris design process who discuss and vote on new additions to Solaris whenever a change alters external interfaces or opens new ones (ABI, command-line commands, etc.). It looks bureaucratic at first, but in the end it's responsible for things like the effectiveness of the binary compatibility guarantee and systemic features like the tight coupling of containers, ZFS snapshots and the new networking stack (aka Crossbow), for example.

Score: 1

RE[4]: It sounds like...
by Kebabbert on Sun 8th Nov 2009 12:57 in reply to "RE[3]: It sounds like..."
Kebabbert Member since:
2007-07-27

Yeah. The problem/blessing with ZFS is that it detects many more errors than other filesystems, as it is end-to-end. ZFS being more sensitive than other filesystems is a good thing. Which filesystem could have caught this?
http://blogs.sun.com/elowe/entry/zfs_saves_the_day_ta
And the problem was not ZFS's fault. Instead, ZFS is the messenger. Don't shoot the messenger?

Score: 3