Linked by Pobrecito Hablador on Mon 2nd Nov 2009 21:19 UTC
Sun Solaris, OpenSolaris
One of the advantages of ZFS is that it doesn't need a fsck. Replication, self-healing and scrubbing are a much better alternative. After a few years of ZFS life, can we say it was the correct decision? The reports on the mailing list are a good indicator of what happens in the real world, and it appears that once again, reality beats theory. The author of the article analyzes the implications of not having a fsck tool and tries to explain why he thinks Sun will add one at some point.
Thread beginning with comment 392326
You are wrong.
by Burana on Mon 2nd Nov 2009 21:34 UTC
Burana Member since:
2009-01-26

Most of your listed problems are related to the problem of buggy hardware, resulting in failed transactions.

This PSARC http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-supp... will solve this.

We don't need no stinking fsck.

Score: 6

RE: You are wrong.
by sbergman27 on Mon 2nd Nov 2009 21:39 in reply to "You are wrong."
sbergman27 Member since:
2005-07-24

We don't need no stinking fsck.

Because ZFS is unsinkable.

Score: 1

RE: You are wrong.
by phoenix on Mon 2nd Nov 2009 21:42 in reply to "You are wrong."
phoenix Member since:
2005-07-11

That, plus background scrubbing, gives you the same end result as an fsck.

No real need for a separate fsck. If needed, one can just alias "zpool scrub <poolname>" to fsck and be done with it. ;)
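A minimal Python sketch of that alias idea, assuming only the standard `zpool scrub <pool>` and `zpool status -v <pool>` subcommands; the wrapper name `fsck_zfs` and the `dry_run` switch are hypothetical. By default it only builds the command line:

```python
import subprocess


def scrub_command(pool: str) -> list[str]:
    """Argv that starts a scrub of an imported pool; the scrub itself runs in the background."""
    return ["zpool", "scrub", pool]


def status_command(pool: str) -> list[str]:
    """Argv that shows scrub progress and lists files with unrecoverable errors, if any."""
    return ["zpool", "status", "-v", pool]


def fsck_zfs(pool: str, dry_run: bool = True) -> list[str]:
    """Hypothetical 'fsck' for ZFS in the spirit of the alias: just kick off a scrub.

    With dry_run=True (the default) nothing is executed and the argv is only returned.
    """
    cmd = scrub_command(pool)
    if not dry_run:
        # Needs the zpool utility on PATH and sufficient privileges.
        subprocess.run(cmd, check=True)
    return cmd
```

A scrub needs an imported pool, so this covers the routine check, not the case of a pool that refuses to import.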

Score: 5

Did you read the article?
by JoeBuck on Mon 2nd Nov 2009 22:42 in reply to "You are wrong."
JoeBuck Member since:
2006-01-11

The article states directly that the problems causing the corruption were related to bad hardware! Bad hardware is a fact of life, and the existence of bad hardware is the reason why some fsck-like tool is needed.

Score: 3

RE: Did you read the article?
by c0t0d0s0 on Mon 2nd Nov 2009 23:00 in reply to "Did you read the article?"
c0t0d0s0 Member since:
2008-10-16

Given the bit error rate (BER) of normal hard disks, SATA cabling and all the other components involved in storing data (a fact of life, too), it's a miracle that people are still using filesystems without checksums ;)
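To put rough numbers on the BER argument, here is a back-of-the-envelope Python sketch. It assumes the one-error-per-10^14-bits figure often printed on consumer drive datasheets and treats errors as independent (a Poisson approximation); both are illustrative assumptions, and the drive figure leaves out the cabling and controller errors the comment also mentions:

```python
import math

# Assumption: 1 error per 1e14 bits read, a figure often quoted on consumer drive datasheets.
CONSUMER_BER = 1e-14
TB = 10**12  # decimal terabyte, as drive vendors count capacity


def expected_bit_errors(bytes_read: float, ber: float) -> float:
    """Expected number of bit errors when reading `bytes_read` bytes at bit error rate `ber`."""
    return bytes_read * 8 * ber


def prob_at_least_one_error(bytes_read: float, ber: float) -> float:
    """P(one or more errors), treating errors as independent (Poisson approximation)."""
    return 1.0 - math.exp(-expected_bit_errors(bytes_read, ber))


for size_tb in (1, 2, 12):
    n = expected_bit_errors(size_tb * TB, CONSUMER_BER)
    p = prob_at_least_one_error(size_tb * TB, CONSUMER_BER)
    print(f"reading {size_tb:>2} TB: {n:.2f} expected errors, P(at least one) = {p:.1%}")
```

Reading 1 TB is 8 × 10^12 bits, so about 0.08 expected errors, or a chance of just under 8% of hitting at least one; the expected count grows linearly with the amount read.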

But back to your comment: you don't fight bad hardware with an inadequate tool like fsck. Scrub in conjunction with the PSARC 2009/479 transaction rollback code is a much better solution.
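A toy Python model of what a scrub does on a mirror, to show why checksums plus redundancy can repair data rather than only detect damage. The `MirroredBlock` structure and the use of SHA-256 are assumptions for illustration (ZFS keeps each block's checksum in the parent block pointer and supports several algorithms); this is not ZFS code:

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for whichever checksum the pool is configured with.
    return hashlib.sha256(data).digest()


@dataclass
class MirroredBlock:
    """One logical block: its copies on each mirror side, plus the checksum held by its parent."""
    copies: list[bytearray]
    expected: bytes


def scrub(blocks: list[MirroredBlock]) -> dict[str, int]:
    """Verify every copy; rewrite bad copies from a verified one, count what cannot be healed."""
    stats = {"repaired": 0, "unrecoverable": 0}
    for blk in blocks:
        good = next((c for c in blk.copies if checksum(bytes(c)) == blk.expected), None)
        if good is None:
            stats["unrecoverable"] += 1  # no copy verifies: report it, nothing to heal from
            continue
        for i, copy in enumerate(blk.copies):
            if checksum(bytes(copy)) != blk.expected:
                blk.copies[i] = bytearray(good)  # self-heal from the verified copy
                stats["repaired"] += 1
    return stats


# Usage: one block with a single bad side, one block damaged on both sides.
data = b"important data"
one_bad = MirroredBlock([bytearray(data), bytearray(data)], checksum(data))
one_bad.copies[1][0] ^= 0xFF  # silent bit flips on one mirror side
both_bad = MirroredBlock([bytearray(b"junk"), bytearray(b"trash")], checksum(data))
result = scrub([one_bad, both_bad])
print(result)
```

The second block shows where redundancy runs out: the scrub can report the damage but has no good copy to repair from.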

Score: 2

RE: You are wrong.
by WereCatf on Mon 2nd Nov 2009 23:57 in reply to "You are wrong."
WereCatf Member since:
2006-02-15

Most of your listed problems are related to the problem of buggy hardware, resulting in failed transactions.

That was the whole point of the article here. Bad hardware exists and is actually very very very widely used because it's cheap.

So, ZFS might not usually need fsck or similar, but what do you do when you can't mount it? For example, what if the hardware has corrupted the ZFS headers so that you can't mount your volume, and the self-healing and correction facilities never get a chance to run? Yes, that's right: you need an offline tool to get it into a state where you can mount it, i.e. fsck or similar.
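For context on the "corrupted headers" case: ZFS writes four copies of each vdev label, two at the start and two at the end of the device. Below is a toy Python sketch of the first thing an offline tool might do, namely look for any label copy whose checksum still verifies. The label layout used here (a SHA-256 digest followed by the payload) is invented for illustration and is not the real on-disk format:

```python
import hashlib
from typing import Optional

DIGEST_LEN = 32  # bytes in a SHA-256 digest


def make_label(payload: bytes) -> bytes:
    """Hypothetical label layout: SHA-256 digest of the payload, then the payload."""
    return hashlib.sha256(payload).digest() + payload


def parse_label(raw: bytes) -> Optional[bytes]:
    """Return the payload if the stored digest matches it, else None."""
    if len(raw) < DIGEST_LEN:
        return None
    digest, payload = raw[:DIGEST_LEN], raw[DIGEST_LEN:]
    return payload if hashlib.sha256(payload).digest() == digest else None


def find_usable_label(copies: list[bytes]) -> Optional[tuple[int, bytes]]:
    """Return (index, payload) of the first copy that verifies, or None if all are damaged."""
    for i, raw in enumerate(copies):
        payload = parse_label(raw)
        if payload is not None:
            return i, payload
    return None  # nothing verifies: the case where a real offline repair tool is needed


# Usage: four copies, the first two damaged (zeroed and truncated by bad hardware).
labels = [make_label(b"pool=tank") for _ in range(4)]
labels[0] = bytes(len(labels[0]))
labels[1] = labels[1][:10]
print(find_usable_label(labels))
```

If every copy fails, the function returns `None`, which is exactly the situation where something beyond online self-healing is required.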

Score: 4

RE[2]: You are wrong.
by c0t0d0s0 on Tue 3rd Nov 2009 08:02 in reply to "RE: You are wrong."
c0t0d0s0 Member since:
2008-10-16

You don't even need something similar to a fsck, you just need a transaction rollback. The rest is done by scrubbing ...
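A toy Python model of the transaction rollback idea: keep several recent uberblocks (root pointers, one per transaction group) and, on import, pick the newest one whose root block still verifies, discarding the later transaction groups. The names `Uberblock` and `rollback` and the `max_rewind` limit are hypothetical, and a real implementation would verify far more than the root block:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Uberblock:
    txg: int   # transaction group number
    root: str  # checksum of the root block written in that txg (also its address here)


def verifies(store: dict[str, bytes], root: str) -> bool:
    """True if the root block is present and hashes to the recorded checksum."""
    block = store.get(root)
    return block is not None and h(block) == root


def rollback(uberblocks: list[Uberblock], store: dict[str, bytes],
             max_rewind: int = 3) -> Optional[Uberblock]:
    """Newest uberblock whose root verifies, looking back at most `max_rewind` txgs."""
    newest_first = sorted(uberblocks, key=lambda u: u.txg, reverse=True)
    for ub in newest_first[: max_rewind + 1]:
        if verifies(store, ub.root):
            return ub  # everything written after this txg is discarded
    return None


# Usage: three committed txgs, then bad hardware loses txg 3's root and mangles txg 2's.
store: dict[str, bytes] = {}
ubs = []
for txg, content in enumerate([b"state-1", b"state-2", b"state-3"], start=1):
    store[h(content)] = content
    ubs.append(Uberblock(txg, h(content)))
del store[ubs[2].root]           # write never reached the disk
store[ubs[1].root] = b"garbage"  # silently corrupted
chosen = rollback(ubs, store)
print(chosen.txg if chosen else "no usable uberblock")
```

After such a rewind, a scrub would then check the rest of the pool.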

Score: 1

RE: You are wrong.
by segedunum on Wed 4th Nov 2009 22:04 in reply to "You are wrong."
segedunum Member since:
2005-07-06

Most of your listed problems are related to the problem of buggy hardware...

As the article states quite clearly, other filesystems like NTFS, ext and XFS have been dealing with the same 'buggy' hardware for years. Granted, they usually run on systems whose storage drivers are far, far better developed and tested across such varied hardware than Solaris's will ever be, and that's probably where quite a few of the unseen problems are happening. ZFS doesn't seem to handle these issues well because it assumes the system is working as it expects.

...resulting in failed transactions.

I fail to see how transactions can fail and bork the system. They either succeed or they don't. If this isn't the case then you need to be looking at where the problem is in your own stack.
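On whether transactions can "fail": copy-on-write commits are all-or-nothing only if the device writes things in the order the filesystem asks for, which means honouring cache-flush requests. Here is a toy Python model, with a hypothetical `Disk` class whose crash behaviour is invented for illustration, of how a drive that ignores flushes can leave a committed root pointing at data that never reached the platter:

```python
class Disk:
    """Toy disk with a volatile write cache (hypothetical model, not a real driver)."""

    def __init__(self, honours_flush: bool):
        self.honours_flush = honours_flush
        self.stable: dict[str, bytes] = {}        # survives power loss
        self.cache: list[tuple[str, bytes]] = []  # lost on power loss unless written out

    def write(self, addr: str, data: bytes) -> None:
        self.cache.append((addr, data))

    def flush(self) -> None:
        if self.honours_flush:
            for addr, data in self.cache:
                self.stable[addr] = data
            self.cache.clear()
        # A drive that ignores flushes returns immediately and keeps everything cached.

    def power_loss(self) -> None:
        # Model a reordering cache: only the most recent write happened to reach the platter.
        if self.cache:
            addr, data = self.cache[-1]
            self.stable[addr] = data
        self.cache.clear()


def commit(disk: Disk, blocks: dict[str, bytes], new_root: str) -> None:
    """Copy-on-write commit: write new blocks, flush, then point the uberblock at them."""
    for addr, data in blocks.items():
        disk.write(addr, data)
    disk.flush()  # ordering barrier: new blocks must be stable before the root moves
    disk.write("uberblock", new_root.encode())
    disk.flush()


def consistent(disk: Disk) -> bool:
    """Consistent if there is no uberblock yet (old state) or it points at a block on disk."""
    ub = disk.stable.get("uberblock")
    return ub is None or ub.decode() in disk.stable


honest, liar = Disk(honours_flush=True), Disk(honours_flush=False)
for disk in (honest, liar):
    commit(disk, {"blk-7": b"new data"}, new_root="blk-7")
    disk.power_loss()
print(consistent(honest), consistent(liar))
```

The filesystem code is identical in both runs; only the drive's handling of flushes differs.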

It's highly ironic that ZFS was specifically designed and hyped by Sun to bring 'storage to the masses' on commodity hardware... and when a problem turns up, that same 'buggy' hardware, which Sun said you could use with confidence with ZFS, is blamed for it.

In the quote there, Jeff told us exactly why Apple can't use ZFS, or why it can't be used in desktop scenarios until it is optimised for the purpose. Single pools will even be quite common in large storage scenarios, as ZFS will sit on large LUNs with their own redundancy alongside filesystems used by other operating systems.

We don't need no stinking fsck.

Well yes you do, because fsck merely stands for 'filesystem check'. All it does is make sure that the filesystem is in a state that can be used before you mount it. On different filesystems those will consist of different checks, so yes, this is a fsck for ZFS. It should be checking consistency on every mount. The only difference with ZFS is that the fsck should take far less time than on other filesystems.

I'm not entirely sure what you or a few other people around here think 'fsck' stands for.

Score: 2