Linked by Pobrecito Hablador on Mon 2nd Nov 2009 21:19 UTC
One of the advantages of ZFS is that it doesn't need a fsck. Replication, self-healing and scrubbing are a much better alternative. After a few years of ZFS life, can we say it was the correct decision? The reports on the mailing list are a good indicator of what happens in the real world, and it appears that once again, reality beats theory. The author of the article analyzes the implications of not having a fsck tool and tries to explain why he thinks Sun will add one at some point.
You are wrong.
by Burana on Mon 2nd Nov 2009 21:34 UTC
Burana
Member since:
2009-01-26

Most of your listed problems are related to the problem of buggy hardware, resulting in failed transactions.

This PSARC http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-supp... will solve this.

We don't need no stinking fsck.

Reply Score: 6

RE: You are wrong.
by sbergman27 on Mon 2nd Nov 2009 21:39 UTC in reply to "You are wrong."
sbergman27 Member since:
2005-07-24

We don't need no stinking fsck.

Because ZFS is unsinkable.

Reply Score: 1

RE: You are wrong.
by phoenix on Mon 2nd Nov 2009 21:42 UTC in reply to "You are wrong."
phoenix Member since:
2005-07-11

That, plus background scrubbing, gives you the same end result as an fsck.

No real need for a separate fsck. If needed, one can just alias "zpool scrub <poolname>" to fsck and be done with it. ;)
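A sketch of that alias idea, purely illustrative: aliases can't take arguments, so a tiny function does the job instead. The pool name "tank" and the ZPOOL override are examples, not part of any real interface.

```shell
ZPOOL=${ZPOOL:-zpool}   # override point so the sketch can be dry-run

fsck() {
    # "checking" a ZFS pool just means scrubbing it: re-read and
    # verify every allocated block against its checksum
    "$ZPOOL" scrub "$1"
}

# usage: fsck tank
```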

Reply Score: 5

Did you read the article?
by JoeBuck on Mon 2nd Nov 2009 22:42 UTC in reply to "You are wrong."
JoeBuck Member since:
2006-01-11

The article states directly that the problems causing the corruption were related to bad hardware! Bad hardware is a fact of life, and the existence of bad hardware is the reason why some fsck-like tool is needed.

Reply Score: 3

RE: Did you read the article?
by c0t0d0s0 on Mon 2nd Nov 2009 23:00 UTC in reply to "Did you read the article?"
c0t0d0s0 Member since:
2008-10-16

Given the BER of normal hard disks, SATA cabling and all the components participating in the job of storing data (a fact of life, too), it's a miracle that people are still using filesystems without checksums ;)

But back to your comment: you don't fight bad hardware with an inadequate tool like fsck ... scrub in conjunction with the PSARC 2009/479 transaction rollback code is a much better solution.

Reply Score: 2

RE: You are wrong.
by WereCatf on Mon 2nd Nov 2009 23:57 UTC in reply to "You are wrong."
WereCatf Member since:
2006-02-15

Most of your listed problems are related to the problem of buggy hardware, resulting in failed transactions.

That was the whole point of the article here. Bad hardware exists and is actually very, very widely used, because it's cheap.

So, ZFS might not usually need fsck or similar, but what do you do in the case where you can't mount it? For example, the hardware has corrupted the ZFS headers, so you can't mount your volume, and as such the self-healing and correction facilities can never run. Yes, that's right; you need an off-line tool to get it into a state where you can mount it, i.e. fsck or similar.

Reply Score: 4

RE[2]: You are wrong.
by c0t0d0s0 on Tue 3rd Nov 2009 08:02 UTC in reply to "RE: You are wrong."
c0t0d0s0 Member since:
2008-10-16

You don't even need something similar to a fsck, you just need a transaction rollback. The rest is done by scrubbing ...

Reply Score: 1

RE[3]: You are wrong.
by WereCatf on Tue 3rd Nov 2009 11:11 UTC in reply to "RE[2]: You are wrong."
WereCatf Member since:
2006-02-15

You don't even need something similar to a fsck, you just need a transaction rollback. The rest is done by scrubbing ...

But as said, can you do that if you can't even mount it?

Reply Score: 3

RE[4]: You are wrong.
by phoenix on Tue 3rd Nov 2009 16:25 UTC in reply to "RE[3]: You are wrong."
phoenix Member since:
2005-07-11

Yes, that code has just been checked in. And no, it's not called fsck. ;)

Reply Score: 2

RE[4]: You are wrong.
by c0t0d0s0 on Tue 3rd Nov 2009 19:34 UTC in reply to "RE[3]: You are wrong."
c0t0d0s0 Member since:
2008-10-16

You want to look at the result of PSARC 2009/479 ( http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-... ). And no, it isn't an fsck ;)

Reply Score: 1

RE[5]: You are wrong.
by WereCatf on Tue 3rd Nov 2009 19:51 UTC in reply to "RE[4]: You are wrong."
WereCatf Member since:
2006-02-15

And no it isn't an fsck

You guys are hanging too much on the word fsck, you know? Try to read the whole post instead of clinging to a single word you might not like that much. I was only talking about a way of getting the ZFS volume and/or pool into a sane state, not necessarily a tool called 'fsck' or similar.

Reply Score: 2

RE: You are wrong.
by segedunum on Wed 4th Nov 2009 22:04 UTC in reply to "You are wrong."
segedunum Member since:
2005-07-06

Most of your listed problems are related to the problem of buggy hardware...

As the article has specified quite clearly, other filesystems like NTFS, ext, XFS etc. have been dealing with the same 'buggy' hardware for years. Granted, they usually run on systems with far, far better developed and tested storage drivers, on more varied hardware, than Solaris will ever have, and that's where quite a few of the unseen problems are probably happening. ZFS doesn't seem to handle these issues well because it assumes a system working as it expects.

...resulting in failed transactions.

I fail to see how transactions can fail and bork the system. They either succeed or they don't. If that isn't the case, then you need to look at where the problem is in your own stack.

It's highly ironic that ZFS was specifically designed and hyped by Sun to bring 'storage to the masses' with commodity hardware... and when there turns out to be a problem, that same 'buggy' hardware that Sun said you could use with confidence is blamed for the problems.

In the quote there, Jeff told us exactly why Apple can't use ZFS, or why it can't be used in desktop scenarios until it is optimised for the purpose. Single pools will even be quite common in large storage scenarios, as ZFS will sit on large LUNs with their own redundancy, alongside filesystems used by other operating systems.

We don't need no stinking fsck.

Well, yes, you do, because fsck merely stands for 'filesystem check'. All it does is make sure that the filesystem is in a state where it can be used before you mount it. On different filesystems that will consist of different checks, so yes, this is a fsck for ZFS. It should be checking consistency on every mount. The only difference with ZFS is that the fsck should take far less time than on other filesystems.

I'm not entirely sure what you or a few other people around here think 'fsck' stands for.

Reply Score: 2

I have seen more technical insight...
by fernandotcl on Mon 2nd Nov 2009 21:51 UTC
fernandotcl
Member since:
2007-08-12

...in cake recipes...

Reply Score: 0

fsck isn't the end all of file system repair
by jrash on Mon 2nd Nov 2009 22:15 UTC
jrash
Member since:
2008-10-28

I doubt that a ZFS fsck would be able to recover the trashed filesystems in those posts/bug reports. Filesystems like Ext2/3 have simple designs that can be easily repaired by an fsck; however, ZFS is an extremely complex filesystem, and I don't see what an fsck would do that the filesystem doesn't do already.

Reply Score: 1

.
by renhoek on Mon 2nd Nov 2009 22:19 UTC
renhoek
Member since:
2007-04-29

And what is this fsck supposed to check and fix? As soon as somebody can answer this, a fsck tool will be made, I think.

Reply Score: 4

RE: .
by tobyv on Tue 3rd Nov 2009 00:56 UTC in reply to "."
tobyv Member since:
2008-08-25

And what is this fsck supposed to check and fix? As soon as somebody can answer this, a fsck tool will be made, I think.


Fixing/detecting corrupted SHA256 block hashes for the deduplication feature, for one. I've relied on file systems in the past that worked on a similar concept.

Nothing more terrifying than learning that a block of the root fs has a hash of zero!

The fs will need to be offline and the hash values are metadata, so it falls into the 'fsck' category IMHO.
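To make the concern above concrete, here is an illustrative per-block checksum computation: hashing one 128K block (of zeros, as a stand-in for real data) the way dedup metadata stores a SHA256 per block. A stored hash that itself read as all zeros would be exactly the red flag described.

```shell
# Hash one 128K block; in ZFS the analogous SHA256 lives in the
# block pointer / dedup table, not in a userland pipeline like this.
block_hash=$(dd if=/dev/zero bs=128k count=1 2>/dev/null | sha256sum | awk '{print $1}')
echo "$block_hash"
```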

Reply Score: 2

RE[2]: .
by c0t0d0s0 on Tue 3rd Nov 2009 08:00 UTC in reply to "RE: ."
c0t0d0s0 Member since:
2008-10-16

That opens an interesting question: which is the correct one, the checksum or the data? ;) Furthermore: dedup uses the already computed checksums of the filesystem. You don't have to sync them separately with your data.

Reply Score: 1

RE[3]: .
by cerbie on Tue 3rd Nov 2009 09:13 UTC in reply to "RE[2]: ."
cerbie Member since:
2006-01-02

The solution to that would be parity data saved inline with normal data, sacrificing a little bit of space. Then, some small % of data could be hosed, yet still recovered, whether it was the data, hash, or parity.

But, since server people want better drives and more backups, us cheapskates want all of that 1TB our $80 paid for, and we all want faster storage...I don't see it happening ;) .

Reply Score: 2

RE[2]: .
by renhoek on Tue 3rd Nov 2009 22:27 UTC in reply to "RE: ."
renhoek Member since:
2007-04-29

"zpool scrub" does this. And taking the filesystem offline for this is a waste of time.

Reply Score: 2

ZFS was a good first step
by kragil on Mon 2nd Nov 2009 22:39 UTC
kragil
Member since:
2006-01-04

FS devs learned a few lessons. FS development became sexy again and everyone saw the need for a new filesystem.

That said, ZFS will be the pioneer with arrows in his back. Other FSs will offer more features and better performance soonish, with less resource usage and a more elegant design. (And I don't think Oracle will stop Solaris' negative growth rates.)

Reply Score: 1

RE: ZFS was a good first step
by c0t0d0s0 on Mon 2nd Nov 2009 23:08 UTC in reply to "ZFS was a good first step"
c0t0d0s0 Member since:
2008-10-16

Oh ... can't wait to see sync dedup in another filesystem really ready for prime-time ... let's say in the next 5-10 years ;)

Reply Score: 2

ZFS Is a rock solid.
by Troydm on Mon 2nd Nov 2009 22:43 UTC
Troydm
Member since:
2009-04-03

It won't fail that easily. I have a home server based on ZFS, and even frequent power losses don't render the data useless, proven by 2 years of 24/7 always-online usage. Before that I had a UFS-based setup, and every sixth to eighth power loss corrupted the data so badly that even fsck couldn't fix the problem. ZFS is as solid as rock.

Reply Score: 3

RE: ZFS Is a rock solid.
by Lennie on Mon 2nd Nov 2009 23:18 UTC in reply to "ZFS Is a rock solid."
Lennie Member since:
2007-09-22

Just an observation: I think 24/7 means no power loss. ;-)

Reply Score: 1

RE[2]: ZFS Is a rock solid.
by tylerdurden on Tue 3rd Nov 2009 08:24 UTC in reply to "RE: ZFS Is a rock solid."
tylerdurden Member since:
2009-03-17

24/7 usually means they don't turn them off.

Reply Score: 2

ZFS doesn't need a fsck
by c0t0d0s0 on Mon 2nd Nov 2009 22:54 UTC
c0t0d0s0
Member since:
2008-10-16

At first: a fsck doesn't solve a lot of problems. It checks the filesystem, but not the data. It's called fsck and not datack for a reason. So we end up with a mountable filesystem, but the data in it ... that's a different story.

With ZFS you can tackle the problem from a different perspective. At first you have to keep two things in mind (sorry, simplifications ahead): ZFS works with transaction groups, and ZFS is copy-on-write. Furthermore you have to know that there isn't one uberblock, there are 128 of them (transaction group number modulo 128 is the uberblock used for a certain transaction group).

Given these points, there is a good chance that you have a consistent state of your filesystem from shortly before the crash, and that it hasn't been overwritten since, thanks to the COW.

So you just have to roll back the transaction groups until you have a state that can be scrubbed without errors ... and you have a recovered state that is consistent and has validated integrity. You just lost the last few transactions. That can't be done with a fsck tool: you can't guarantee the integrity of the data after the system reports back to you that the filesystem has been recovered.
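On builds with the PSARC 2009/479 support, this rollback-then-scrub flow can be sketched roughly as below, using the "zpool import -F" rewind. The pool name and the ZPOOL override are illustrative only, not part of any real interface.

```shell
ZPOOL=${ZPOOL:-zpool}   # override point so the sketch can be dry-run

recover_pool() {
    pool=$1
    "$ZPOOL" import -Fn "$pool"   # dry run: report what a rewind would discard
    "$ZPOOL" import -F "$pool"    # roll back to the newest good txg and import
    "$ZPOOL" scrub "$pool"        # then re-verify every checksum in the pool
}
```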

You may call the result of PSARC 2009/479 something like an fsck tool, but it isn't. It just leverages the transactional behaviour of ZFS to enable other tools to do their work ( http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-... )

Just to end it here: ZFS doesn't need a fsck tool, because a fsck doesn't solve the real problem. ZFS needs something better, and with all the features of ZFS in conjunction with PSARC 2009/479, it will deliver something better.

Reply Score: 4

RE: ZFS doesn't need a fsck
by Dryhte on Tue 3rd Nov 2009 07:57 UTC in reply to "ZFS doesn't need a fsck"
Dryhte Member since:
2008-02-05

Everyone who quotes that link implicitly _agrees_ with the article's premise, which is not that ZFS needs 'fsck', but *that ZFS needs a way to fix unmountable volumes to the point where they can be imported/mounted again*, so that the filesystem's self-healing capacities can kick in. Please try to read between the lines instead of criticising the author for using the word 'fsck'.

Reply Score: 4

fsck not needed
by tomifumi on Tue 3rd Nov 2009 08:50 UTC
tomifumi
Member since:
2009-11-03

First of all, I speak as someone who has already experienced what looked like a corrupted zpool, thanks to a failing drive + a bug in the sata driver + a bug in the zfs release of the time: a really bad and rare case.

The pool consisted of a raidz of 46 500GB drives, taking approximately 18TB.

Fscking the filesystem would have just been a nightmare. In our case restoring the data from a validated and consistent source was both the fastest and the easiest option.

If you can't do a fast restore from a valid backup source and/or don't have any redundancy in your storage and machines, that means you just don't care at all about your data and your business. So I don't see why you should ask for an fsck tool in the first place.

Edited 2009-11-03 08:54 UTC

Reply Score: 4

ZFS needs real openness and Linux
by TheRealNelson on Tue 3rd Nov 2009 15:45 UTC
TheRealNelson
Member since:
2009-11-03

Lots of filesystems do a great job of maintaining their own consistency. It's really external errors that bother a lot of us: say you drop a backup drive on the floor, I'm going to fsck it before I even attempt to mount it. It sounds like ZFS has a scrub that can fill the fsck role. If it can guarantee that the pool is mountable, then you should be able to bring a drive up just far enough to run that check and verify the integrity of the filesystem against external errors. This likely pushes some of the problems elsewhere: since the scrub is a background process on a live system, you could have other software failures when something touches broken parts of the filesystem that were assumed valid, though other filesystems have similar problems with damaged data. Fscking just makes you stop everything else while you do it; when you find problems, you find them in fsck, and not when your database crashes for some unknown reason because the blocks on the disk were screwed up.

What ZFS really needs is to run under Linux and probably Windows, and to do that it probably requires some license changes and some substantial attitude changes within Sun. Until that happens, it's at best a bit player. The "bad hardware" problems are pretty weak as well; I can't recall hearing NTFS devs or Ext3 devs complaining about them. Part of that is that Sun's management needed each and every home run they could get as they shopped the company around, and for some reason they chose to roll out a filesystem with the kind of visibility that they did. Actual support will always trump hype; if it's so perfect, then give it to the rest of the world and the rest of the world will adopt it.

Reply Score: 1

Kebabbert Member since:
2007-07-27

"The "bad hardware" problems are pretty weak as well, I can't recall hearing NTFS devs or Ext3 devs complaining about it."

But you fail to notice that SUN does Enterprise storage. That is a completely different thing from commodity hard drives for Windows and Linux that don't obey standards, as Jeff Bonwick explains. Enterprise storage has much higher requirements, and therefore you will hear complaints from Enterprise storage people. For Linux and Windows, which do not have those high demands, nor are capable of handling such demands, anything will do. Windows and Linux are not used in the Enterprise storage area. That is the reason you don't hear NTFS or ext3 devs complain about it.

Here you see that Linux does not handle Enterprise Storage, according to a storage expert. Maybe he is wrong, maybe he knows more about Enterprise Storage than most people.
http://www.enterprisestorageforum.com/sans/features/article.php/374...

http://www.enterprisestorageforum.com/sans/features/article.php/374...

Regarding "that attitude change that SUN needs": maybe you will see it quite soon, as Oracle is buying SUN. SUN is the company that has released the most open source code, and last year ranked 30th among those who contributed the most code to the Linux kernel. We will see if Oracle will close SUN tech and charge a lot, or if Oracle will continue in the same vein as SUN. But SUN was in the process of open sourcing _everything_; we have to see if Oracle will also open source everything they own.

Reply Score: 1

Oliver Member since:
2006-07-15

>I can't recall hearing NTFS devs or Ext3 devs complaining about it."

Usually ext3/4 devs are complaining about various applications (like KDE) that should do their very own homework. So to speak, they don't have any clue what they're actually doing. When it comes to reliable filesystems, Linux is a huge disappointment. Apart from XFS, but that's another story.

Reply Score: 1

dvzt Member since:
2008-10-23

The "bad hardware" problems are pretty weak as well, I can't recall hearing NTFS devs or Ext3 devs complaining about it.


You're not listening carefully enough, then. Linux has the same problems if a disk does not honor barriers. Even funnier, on Linux barriers don't work at all when LVM is in use, even with properly working disks. ZFS does not need Linux, but it seems that Linux does need ZFS.

Oh, and ZFS is 100% open (after all, it's in FreeBSD and other operating systems); too bad Linux isn't, and therefore can't integrate foreign code the way others can. ;)

Reply Score: 2

blu28 Member since:
2009-11-05

The "bad hardware" problems are pretty weak as well, I can't recall hearing NTFS devs or Ext3 devs complaining about it.


Of course not. Suppose that this kind of bad hardware accounted for 1% of FS corruption. It is unlikely that anybody even knows about it, because it is in the noise. But now ZFS comes along and gets rid of the other 99%. Now bad hardware is responsible for 100% of ZFS filesystem corruption, on a FS that is designed to have none. That's a big deal. In reality, the bad disks are probably responsible for even more corruption on other FSs, but since you have already accepted a bit of corruption with each crash, you can't see the difference. Remember, fsck does not get you back what you lost, and arbitrarily large amounts of data corruption can still occur. But in general the amount lost and corrupted is small, and everybody has learned to live with it. Now we have ZFS, and it guarantees consistent and complete data, possibly a few milliseconds out of date after a crash, assuming the underlying disks follow the standards. Compare that to fsck, where one file may be up to date, a bunch more are a few milliseconds behind, another is corrupted, and another is deleted.

Up until now with ZFS, that 1% caused by bad hardware left the FS unusable. But with the zpool recovery just added, in that 1% of cases you end up losing a couple of seconds of data, and the filesystem recovers virtually instantaneously, instead of scanning and rescanning and patching to get back to an inconsistent state, with some data from a few seconds ago and some up to date, as happens with fsck.

Reply Score: 1

Comment by DOSguy
by DOSguy on Wed 4th Nov 2009 11:39 UTC
DOSguy
Member since:
2009-07-27

if you want reliability, you need hardware which is not that crappy.

Well, how do I know what hardware meets ZFS's requirements? Does something like a ZFS HDD 'compatibility' list exist?

Reply Score: 1

RE: Comment by DOSguy
by Kebabbert on Wed 4th Nov 2009 12:06 UTC in reply to "Comment by DOSguy"
Kebabbert Member since:
2007-07-27

Here is hardware compatibility list with OpenSolaris:
http://www.sun.com/bigadmin/hcl/data/os/

For ZFS to play well with your hardware, I don't know of such a list. To be truly sure which hardware plays by the rules and doesn't break any standards, you have to buy Enterprise stuff.

Or, you could look at which components SUN's storage servers use, and buy those components.

Reply Score: 2

RE[2]: Comment by DOSguy
by dvzt on Wed 4th Nov 2009 21:02 UTC in reply to "RE: Comment by DOSguy"
dvzt Member since:
2008-10-23

To be truly sure, which hardware plays by the rules and dont breaks any standards, you have to buy Enterprise stuff.


No, you don't. You can use any kind of disks, assuming they are not broken.

Reply Score: 2

RE[3]: Comment by DOSguy
by Kebabbert on Wed 4th Nov 2009 22:23 UTC in reply to "RE[2]: Comment by DOSguy"
Kebabbert Member since:
2007-07-27

Yes, I use ordinary Samsung 1TB spinpoint and they work fine.

However, some of this cheap hardware does not adhere to standards. Then it can be a problem, if you do something more unusual than just using the hardware normally. Maybe hot swapping discs, etc. Hot swapping discs must be supported by both the drives and the card, and there were some other issues too; I can not remember right now.

What I am trying to say is that if you just use your hardware as normal, and do not try to use unusual functionality (without confirming it first), then everything is fine. But, for instance, that crazy guy who installed OpenSolaris in VirtualBox, on top of Windows XP, and then created a ZFS raid with 10TB - I do not consider that normal usage. First of all, running everything on top of WinXP is just a bad idea. And on top of that, VirtualBox is slightly unstable and has various unusual quirks that ZFS does not expect. That is the reason he lost his data. Many levels of fail.

If you want to use unusual functionality, first confirm it follows the standards, etc. If you use normal plain functionality, everything is fine.

Reply Score: 2

RE: Comment by DOSguy
by dvzt on Wed 4th Nov 2009 20:57 UTC in reply to "Comment by DOSguy"
dvzt Member since:
2008-10-23

Well, how do I to know what hardware meets ZFS's requirements. Does something like a ZFS HDD 'compatibility' list exist?


You totally misunderstood. The mentioned "crappy" disks are bad. Malfunctioning. Broken. If you have such a disk, you should take it back to the shop and reclaim your money. ZFS doesn't have any special requirements for disks.

Reply Score: 2

No need for fsck. Period.
by Kebabbert on Wed 4th Nov 2009 22:46 UTC
Kebabbert
Member since:
2007-07-27
RE: No need for fsck. Period.
by Kebabbert on Thu 5th Nov 2009 18:36 UTC in reply to "No need for fsck. Period."
Kebabbert Member since:
2007-07-27

Two reasons:
"One: The user has never tried another filesystem that tests for end-to-end data integrity, so ZFS notices more problems, and sooner.

Two: If you lost data with another filesystem, you may have overlooked it and blamed the OS or the application, instead of the inexpensive hardware."

Reply Score: 2

At the end ...
by c0t0d0s0 on Thu 5th Nov 2009 22:00 UTC
c0t0d0s0
Member since:
2008-10-16

... I don't like the concept of an fsck for a completely different reason: with fsck you press your data into the form your filesystem expects. When you are lucky, it's the same form your data was in before, but most often it isn't.

When you rule out bit rot by checksums, cheap and crappy hardware by transaction rollback, and power failure with the ZIL and an always-consistent on-disk state, this leaves just software bugs for an fsck. But I think such problems should be handled in the filesystem itself, e.g. by enabling the code to read the buggy structure and fix it simply by rewriting it correctly the next time, not with a sideband tool.

The advantage: an fsck just presses the data into the expected form, while a bug fix to the code understands the problem and can do exactly the right steps to fix the bug in the structure, not just press it into the expected form.

BTW: I've written a rather long piece on this topic in my blog: http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-...

Reply Score: 1

Comment by kernpanic
by kernpanic on Thu 5th Nov 2009 23:22 UTC
kernpanic
Member since:
2008-03-15

It seems to me the transactional nature of ZFS, the checksums, the 'scrub' command and the recently added zpool recovery feature all negate the need for a fsck utility.

Reply Score: 1

RE: Comment by kernpanic
by zlynx on Sat 7th Nov 2009 00:28 UTC in reply to "Comment by kernpanic"
zlynx Member since:
2005-07-20

It seems to me that a script called fsck.zfs containing a zpool restore and a scrub command would satisfy everyone.
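A minimal sketch of such a script, with one caveat: there is no literal "zpool restore" subcommand, so "zpool import -F" (the PSARC 2009/479 rewind) stands in for the restore step. The function name, pool name, and ZPOOL override are illustrative only.

```shell
#!/bin/sh
# Hypothetical fsck.zfs: rewind the pool to a consistent transaction
# group, then scrub it to verify every checksum.
ZPOOL=${ZPOOL:-zpool}   # override point so the sketch can be dry-run

fsck_zfs() {
    pool=$1
    "$ZPOOL" import -F "$pool" || return 1   # get back to a consistent txg
    "$ZPOOL" scrub "$pool"                   # verification runs in the background
}

# usage: fsck_zfs tank
```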

Reply Score: 2

RE[2]: Comment by kernpanic
by Dryhte on Sat 7th Nov 2009 06:00 UTC in reply to "RE: Comment by kernpanic"
Dryhte Member since:
2008-02-05

Including the OP.

Reply Score: 1