ZFS–the last word in file systems

Eugenia Loli 2004-09-15 Solaris 43 Comments

ZFS, the breakthrough file system in Solaris 10, delivers virtually unlimited capacity, provable data integrity, and near-zero administration.

2004-09-15 10:43 am
I thought DBFS were the last word in file systems?

2004-09-15 10:55 am
Sounds really sweet, no need for volume managers, great!

2004-09-15 11:12 am
No database / metadata features are mentioned. So, they are right: this is the last word of the old-fashioned filesystem. The rest of the world is moving to DBFS…

2004-09-15 11:20 am
Isn’t this dbfs stuff a bit overrated? IMHO a dbfs should be an option for folders where it makes sense, and implemented in user-space. Where is the sense in indexing folders like /var? It only takes away performance and comes with no additional benefit – still, for /home/xyz it could be nice…

2004-09-15 11:34 am
After skimming the article on ZFS I have to admit that as a sysadmin I want this filesystem. If it really lives up to the “sunshine” that Sun is pushing in their article, it could mean that life as a sysadmin is going to get a lot better.

2004-09-15 11:41 am
The vision of DBFSs is not restricted to the desktop; it extends much further. Think of a network as one whole server, where the information could be shared across all machines without the need for replication… that’s great. I see the idea behind DBFSs going much further, applying the same concept in areas besides files.
> Where is the sense in indexing folders like /var?
Don’t activate indexing for the filesystems where you don’t want it.

2004-09-15 11:47 am
This is indeed strange. They might claim that they can move disks between boxes of different endianness, but the data on those disks is surely another matter? How can moving disks cost nothing?
Surely they don’t keep every pointer in the filesystem in both endians at all times? It is hard to imagine how they can claim zero cost, although negligible cost would be easily defensible.

2004-09-15 12:05 pm
We’ll know once the patent is published…

2004-09-15 12:22 pm
Sun’s outdated approach to their market will make sure ZFS never gets as far as it should. No one wants to spend money on truly reliable software and hardware. Plus, support is a throw-in nowadays… not something you’d PAY for.

2004-09-15 12:26 pm
DBFS will never replace normal filesystems. By DBFS I mean complex, database-prone filesystems like WinFS or GnomeFS. The key to a successful filesystem is simplicity! MacOS resource forks have failed, the BeOS db-filesystem has failed, and the M$ registry (a special “config-filesystem”) is failing every day. I think only Reiser4 is moving in the right direction by making everything a file and trying to remove features instead of adding them.

2004-09-15 12:38 pm
> The vision of DBFSs is not restricted to the desktop; it extends much further. Think of a network as one whole server, where the information could be shared across all machines without the need for replication… that’s great. I see the idea behind DBFSs going much further, applying the same concept in areas besides files.
>
> > Where is the sense in indexing folders like /var?
>
> Don’t activate indexing for the filesystems where you don’t want it.
Of course you can do nifty stuff with dbfs’s over the net, but the point is that there is no sense in sharing / indexing unimportant stuff like system files… My main point is: store all files on some (high-performance) filesystem, and for specific folders / files (home/data folders) add a transparent database layer on top of it.

2004-09-15 12:39 pm
Endianness is really trivial; they just need a bit in the file system’s disk bootblock to say which endianness the data is laid out in.

2004-09-15 1:12 pm
Sounds very close to WAFL.
An always-consistent disk image, pointers changed only after the data is written, etc.

2004-09-15 1:14 pm
“Neither architecture pays a byte-swapping tax due to Sun’s patent-pending “adaptive endian-ness” technology”. Knowing which endianness a disk has is one thing; doing all access without a ‘byte-swapping tax’ seems to be something else? Although you might move a disk from an architecture of one endianness to another, the applications that use that disk also need to know the endianness of all (binary) data on that disk too?

2004-09-15 1:31 pm
This looks like phase-tree, with my checksum suggestion. Arrrgh!!! Daniel Phillips had this for Linux a couple of years ago. He chickened out due to the remote possibility that the NetApp WAFL patent could somehow be construed to cover the phase-tree algorithm.

2004-09-15 1:32 pm
“Knowing which endianness a disk has is one thing; doing all access without a ‘byte-swapping tax’ seems to be something else?”
That’s called ‘marketing spin’. Who cares really, the point is that the file system supports various endian architectures transparently.
“Although you might move a disk from an architecture of one endianness to another, the applications that use that disk also need to know the endianness of all (binary) data on that disk too?”
That goes without saying. If you’re storing data in, say, XML format, you won’t have a problem. It all depends.

2004-09-15 1:36 pm
Swapping bytes is way cheaper than the “if” you might use to see if swapping is required. Thus, all modern Linux disk filesystems have a fixed on-disk format. Ext3 is kind of funny: the log file is big-endian, while everything else is little-endian. It doesn’t really matter, as long as you choose one. UFS is messed up: every access to the metadata has to check the superblock to determine if byte swapping is needed.

2004-09-15 1:46 pm
Will we be able to use it on anything other than Solaris…?

2004-09-15 2:05 pm
My sources at Sun say that zfs has been renamed dfs, so what gives?
Now that Sun doesn’t have such a cozy relationship with Veritas, they are trying to cut them out of the picture. Surely this project has been going on for some time, but I’m a bit afraid that it is being rushed to market for reasons other than being ready. All that being said, I’m looking forward to trying out zfs/dfs and I’m watching the announcements. As soon as this is available on Solaris Express I plan on being the first person on my block to run it. The pools idea is not exactly new. Dec/HP/Compaq had a similar concept in Tru64’s Advanced File System. Sun’s implementation looks a bit cleaner from its description.

2004-09-15 2:07 pm
> My sources at Sun say that zfs has been renamed dfs, so what gives?
Not likely. Windows now includes “Distributed File System,” or DFS, with their server products. I don’t think Sun could escape that.

2004-09-15 2:46 pm
> I think only Reiser4 is moving in the right direction
> by making everything a file and trying to remove features
> instead of adding them.
I don’t think so. Although DBFS seems more user-oriented and not that useful to the server market, it can bring many advantages. One, as mentioned, is a leap in network filesystems, and the other is the possibility of talking about business objects instead of file records right from the start.

2004-09-15 3:01 pm
My understanding is that the dbfs will sit on top of the existing file system, i.e. xfs, ext3 and so on. Who says dbfs cannot be part of zfs? dbfs seems like a lot of overhead for /var, /boot and so forth; it seems like a decent idea for users but not system-wide. Just mho.

2004-09-15 3:04 pm
> This looks like phase-tree, with my checksum suggestion.
Is this: http://www.ussg.iu.edu/hypermail/linux/kernel/0107.2/0698.html ?

2004-09-15 3:14 pm
It looks like ZFS is a logging filesystem… the idea is not new, just take a look at NetBSD’s LFS. AFAIK Linux has one such thing in development too, but my memory stops there. And by the way, DBFS seems a bad idea to me.
This sort of thing definitely belongs in userland; there is no need to bloat the kernel with such nonsense.

2004-09-15 4:24 pm
> DBFS will never replace normal filesystems.
OS/400; over a decade old…
> By DBFS I mean complex, database-prone filesystems like WinFS or GnomeFS.
That’s because WinFS and GnomeFS are the patch approach to DBFS; that is, the system was designed for a traditional file system, not a DBFS.
-uberpenguin

2004-09-15 4:32 pm
Not to mention DCE DFS, which has been around…

2004-09-15 4:57 pm
There’s an expert exchange on ZFS: http://www.sun.com/expertexchange starting in a few minutes (10am-11am Pacific Time), if you want your questions answered by the engineers working on the project.

2004-09-15 5:02 pm
Nope. The BeOS (may its soul rest in peace) was the first 128-bit filesystem.
Bob
Replies may be directed to “firstname.lastname@example.org”. Talk to me for an invite.

2004-09-15 5:10 pm
> If Moore’s Law holds, in 10 to 15 years people will need a 65th bit
Not likely! I’m not sure where they got that, but since filesystems are just now bumping into the 32-bit limit (2 TB filesystems with 512-byte blocks), it should take 32 doubling cycles of 18 months each to hit a 64-bit limit. By my math that puts us 48 years from now, assuming Moore’s law (which is for processors, not disks) holds up. It’s unlikely that it will, since the demands of the market will not support such large storage systems (there won’t be a lot of demand soon for 1000-terabyte systems, much less systems that are 8 million times that big). Regardless of when we hit the 64-bit limit, no doubt ZFS won’t be around anymore in its current form. Even if it were, I highly doubt it would realistically be capable of managing an 8-million-petabyte storage array. Trying to advertise 128 bits as a feature now is ridiculous. You might as well tout the amazing capacity offered by 256 bits.
I could see building an amazingly huge storage system cost-effectively using this and a bunch of Promise RM15000’s though (drool).

2004-09-15 5:17 pm
I admire OSNews and its readers for the diversity of information I can get here. But sometimes I just don’t get it. For so long Sun has been bashed here for not being innovative. Now they come up with something that at least seems impressive. And instead of being constructive, the majority of the comments say that someone started doing rough sketches of something similar for [magic words]LINUX[/magic words] 5 years ago, and that DBFS (a filesystem that practically does not exist) is much better. Where does that negativity toward Sun come from? Despite some stupid ideas (that Java Desktop), Sun is still producing some fine software and hardware, and I think that Solaris 10 and ZFS are Good Things. And the fact that I cannot afford a Sun machine does not mean that the company is rubbish. BTW, does anyone know if we can read that pending patent somewhere? I’m just curious to see what they’ve done.

2004-09-15 5:26 pm
I gotta agree with pavel. I often find myself bashing corps just because they follow what clients want and are ready to pay for… I think it’s in every human: we gotta bash someone/something – let’s all together bash one company… just kidding – but I still agree with pavel.

2004-09-15 5:26 pm
> Not likely! … filesystems are just now bumping into the 32-bit limit
Remember that ZFS also encompasses volume management, and people are already getting quite close to petabytes. And on-disk filesystem formats last a *long* time.
– jonathan

2004-09-15 6:01 pm
> Sun’s outdated approach to their market will make sure ZFS never gets as far as it should.
> No one wants to spend money on truly reliable software and hardware.
If your company depends on reliable hardware (like webservers), then it will (and does) pay for software. BTW: Isn’t Solaris going to be open source anyway?
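
The 48-year estimate in the 5:10 pm comment can be checked with a few lines of arithmetic. This is only a sketch of that commenter's reasoning: the 512-byte block size and the 18-month doubling period come from the comment, the function names are made up for illustration, and none of it is a figure published by Sun.

```python
# Assumptions taken from the comment (not from Sun): 512-byte blocks,
# capacity doubling every 18 months, current limit at 32-bit block addresses.
BLOCK_SIZE = 512
MONTHS_PER_DOUBLING = 18
PIB = 2 ** 50  # one pebibyte, in bytes


def capacity_bytes(address_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Largest volume addressable with the given number of block-address bits."""
    return (2 ** address_bits) * block_size


def years_between(bits_from: int, bits_to: int,
                  months_per_doubling: int = MONTHS_PER_DOUBLING) -> float:
    """Years needed to grow from one address width to another,
    at one extra bit (one doubling) per period."""
    doublings = bits_to - bits_from
    return doublings * months_per_doubling / 12


limit_32 = capacity_bytes(32)       # 2 TiB
limit_64 = capacity_bytes(64)       # 2**73 bytes
limit_64_in_pib = limit_64 // PIB   # about 8 million PiB
years = years_between(32, 64)       # 48.0
```

Under these assumptions the commenter's numbers hold: 32 doublings at 18 months each is 48 years, and the 64-bit limit works out to roughly 8 million pebibytes. Changing the doubling period is the quickest way to see how sensitive the estimate is.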

2004-09-15 7:13 pm
> The BeOS (may its soul rest in peace) was the first 128-bit filesystem.
“The address space in BFS is 64-bit, meaning that the theoretical maximum file size on a BFS volume is 18,000 petabytes” – see http://www.osnews.com/story.php?news_id=421&page=13

2004-09-15 7:18 pm
I don’t know how much OSNews folks know about IBM iSeries (aka AS/400s), but they have been doing this for many years. Every feature mentioned in this article, barring the 128 bits, is already in use. For many years I have been laughing at *nix administrators and pitying them. For them, adding a disk to a system is a major event, whereas all we need to do is add the disk to the system and tell it to use it (and all the while the system is serving online users). Maybe Eugenia should start a section for this great OS.

2004-09-15 7:46 pm
I think the primary problem of layering a DBFS on top of a traditional filesystem is that it prevents the database layer from taking advantage of the on-disk format of the FS layer to optimize searches. Still, I don’t think it’s really a problem for Sun and its ZFS, because DBFS is more of a client thing, not a server thing.

2004-09-15 11:28 pm
One thing I didn’t see mentioned in either the original article or the comments so far is whether or not they fully support memory-mapping (map shared), including concurrent updates to the shared file (all by memory-mapping), assuming the proper barrier/fence instructions are used.

2004-09-16 1:52 am
Given that SunOS 4.x was (IIRC) the first Unix to do mmap/read/write consistency, that VM is still a basic part of SunOS 5.x, and that Sun has a huge commitment to binary compatibility (which zfs would break if it lacked this), it seems quite likely that it will be supported.

2004-09-16 5:09 am
My comment is very simple. reiserfs intends to use a layer above the storage manager for its database, and if Hans Reiser thinks that’s a good idea I’m gonna say it is.
However, when it comes to the maximum amount of storage, I still don’t get why they don’t do something like a recursive block manager (so that if they have a limitation of a certain size, as soon as they reach that size they make another largest block, move away and put a block manager on these blocks. That way, when the block managers get too many to fit in the bits, you simply do it all over again). But I’m just a client and have never worked on something like it before, so I may be completely wrong.

2004-09-16 8:45 am
I like it. ZFS will beat Linux in the market.

2004-09-16 9:06 am
I agree. Linux will need some significant improvement to compete with Solaris 10 in server land. It looks like Sun has some of the best features from several different Linux distros rolled into one package. Now we get to see how long it takes Linux to catch up.

2004-09-16 10:35 am
ZFS will not appear in the first release of Solaris 10 (Gold Master); apparently it will appear later on next year. IIRC they have yet to get it working as root, and apparently getting things working on x86 isn’t exactly a walk in the park due to the fugly design of the x86 architecture, namely the BIOS.

2004-09-16 6:52 pm
I read their whole write-up on this and all I can say is that it looks like a complete copy of Novell NSS FS. Not sure where they get the idea that they are innovating or are the last word in file systems. The only architectural difference I can see is that ZFS is 128-bit whereas NSS is 64-bit. But, as far as all the other concepts are concerned, Novell has been shipping that since Netware 5, and it is currently being ported to Linux. NSS support in Linux should be available by year end.
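
The 12:39 pm suggestion earlier in the thread (a flag in the boot block that records the on-disk byte order) can be sketched roughly as follows. This is not Sun's "adaptive endian-ness", whose details the article does not give; the record layout, the flag values, and the function names are all hypothetical and exist only to illustrate the idea of writing in native order and swapping on read only when the flag says so.

```python
import struct
import sys

# Hypothetical on-disk record: a 1-byte endianness flag followed by one
# 64-bit block pointer stored in the writer's byte order.
FLAG_LITTLE = 0
FLAG_BIG = 1


def write_record(pointer: int, byteorder: str = sys.byteorder) -> bytes:
    """Serialize a block pointer in the writer's byte order and tag it."""
    if byteorder == "little":
        return bytes([FLAG_LITTLE]) + struct.pack("<Q", pointer)
    return bytes([FLAG_BIG]) + struct.pack(">Q", pointer)


def read_record(raw: bytes) -> int:
    """Read the flag, then decode the pointer in the order it was written.

    A reader with the same byte order as the writer does no swapping;
    only a reader with the opposite byte order pays for the conversion.
    """
    fmt = "<Q" if raw[0] == FLAG_LITTLE else ">Q"
    (pointer,) = struct.unpack(fmt, raw[1:9])
    return pointer


# A record written on a big-endian box reads back correctly anywhere.
big_record = write_record(0x0102030405060708, "big")
little_record = write_record(0x0102030405060708, "little")
```

The trade-off raised at 1:36 pm still applies to a scheme like this: every read has to consult the flag, which is exactly the check that a fixed on-disk byte order avoids.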