The days of plain filesystems like FAT32 and ext2 seem to have passed. Newer operating systems offer journaling, 64-bit filesystems with features such as support for file sizes in the terabytes and the ability to attach attribute metadata to files. Today we interview (with a fixed set of questions) the main people behind IBM’s JFS, NameSys’ ReiserFS and SGI’s XFS. Read on to learn about the status of their filesystems, their capabilities, and what they are aiming for in the future.
Steve Best from IBM, for JFS
1. What is the current status of JFS for Linux? Is it 100% ready for a production level environment?
Steve Best: Yes, we did our 1.0.0 release, which was production-level ready, on 6/28/01.
2. What are its biggest differences (good or bad) when compared to ReiserFS and XFS?
Steve Best: Feature-wise, Juan I. Santos Florido wrote a very good article comparing the journaling file systems being developed for Linux. The article, called “Journal File Systems”, was published by the Linux Gazette in 7/2000.
Juan drew some of the JFS information for his article from a white paper that Dave Kleikamp and I wrote describing the layout structures that JFS uses.
3. What are the differences between the Linux version of JFS and the one found on OS/2?
Steve Best: The JFS for Linux is a port from OS/2 and has an OS/2 compatibility option. The OS/2 source was also used for the JFS2 just released on AIX 5L. There is a JFS1 on AIX, but we didn’t use that source base, since the OS/2 source base was a new “ground-up” scalable design started in 1995. The design goals were to use the proven Journaling File System technology that we had developed for 10+ years in AIX and expand on it in the following areas: performance, robustness, and SMP support. Some of the team members for the original JFS designed and developed this file system. The source base behind JFS for Linux is now on the following other operating systems:
OS/2 Warp Server for e-business 4/99
OS/2 Warp Client (fixpack 10/00)
AIX 5L called JFS2 4/01
4. Has JFS made its way to be included as an option on the Linux kernel?
Steve Best: Not yet. Our plan is to submit a patch for the 2.5 kernel tree when that opens up and then backport it to the 2.4.x series of the kernel. This plan might change if the 2.5 development tree for the kernel doesn’t open up soon. JFS is in the process of being included in several different Linux distributions right now, and you will see it shipped in their next releases, which come out in the September 2001 time frame. The ones I know about right now are Mandrake, SuSE and TurboLinux.
5. What is the fragmentation policy of JFS and how is it dealing with it?
Steve Best: JFS uses extents, which reduces fragmentation of the file system. We also have a defragmentation utility that will defragment the file system.
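To illustrate why extents relate to fragmentation, here is a minimal Python sketch that collapses a file's list of disk block numbers into (start, length) extents. It is an illustration only, not JFS code; the function name `blocks_to_extents` and the sample block numbers are made up for this example. A file laid out contiguously needs a single extent, while a scattered file needs several.

```python
def blocks_to_extents(blocks):
    """Collapse an ordered list of disk block numbers into (start, length) extents."""
    extents = []
    for block in blocks:
        # Extend the last extent if this block directly follows it on disk.
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            extents.append((block, 1))
    return extents


# Hypothetical layouts: one contiguous file, one fragmented file.
contiguous = blocks_to_extents([100, 101, 102, 103, 104, 105])
scattered = blocks_to_extents([100, 101, 250, 251, 252, 900])

print(contiguous)  # [(100, 6)]
print(scattered)   # [(100, 2), (250, 3), (900, 1)]
```

The number of extents per file is one simple way to think about how fragmented a file is: the fewer extents, the less seeking is needed to read it.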
6. Does JFS have support or plans to support arbitrary meta-data (multiple-stream attributes as some call them)?
Steve Best: The OS/2 source has extended attributes, which I believe are similar to multiple-stream attributes. We still need to move that support over.
7. One of the qualities found on BeOS’s BFS is “live queries”. Live Queries is a query that sends (automatically) deltas when the result changes. They stay “open” after the first returned result row. Is there a plan for such a support on JFS?
Steve Best: I’m not totally familiar with this BeOS’s “live queries”, but I think this is similar to a change notification mechanism that a file system manager would use to say I want to be notified if a file has been deleted, so if the file system manager was displaying that sub dir it could remove the icon of that file? This type of mechanism would need to be supported by both the virtual file system layer and the file system. Currently I don’t believe Linux supports this. JFS did support this type of mechanism in OS/2 so it won’t be that hard to add it to JFS if and when Linux would support this.
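The change notification mechanism described in this answer can be sketched in a few lines of Python. This is a toy in-memory model, not an OS/2, Linux or JFS interface; the class `WatchedDirectory` and the callback `file_manager` are hypothetical names chosen for the example. The directory calls back every registered watcher when an entry is created or deleted, so a file manager can keep its icons in sync without re-reading the directory.

```python
class WatchedDirectory:
    """Toy directory that notifies registered watchers about changes."""

    def __init__(self):
        self._names = set()
        self._watchers = []

    def watch(self, callback):
        # callback(event, name) is invoked on every change.
        self._watchers.append(callback)

    def _notify(self, event, name):
        for callback in self._watchers:
            callback(event, name)

    def create(self, name):
        self._names.add(name)
        self._notify("created", name)

    def delete(self, name):
        self._names.remove(name)
        self._notify("deleted", name)


# A pretend file manager that shows one icon per file.
icons = set()


def file_manager(event, name):
    if event == "created":
        icons.add(name)
    elif event == "deleted":
        icons.discard(name)


directory = WatchedDirectory()
directory.watch(file_manager)
directory.create("a.txt")
directory.create("b.txt")
directory.delete("a.txt")

print(sorted(icons))  # ['b.txt']
```

As the answer notes, a real implementation needs cooperation from both the virtual file system layer and the file system itself; the sketch only shows the callback pattern.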
8. Which part of JFS would you like to change, update, extend or even re-design? What are the future plans for JFS?
Steve Best: The design for JFS is proven now on several different operating systems and I currently see no major change in this design. We still have performance tuning to do for JFS for Linux and this will be one of our major areas that we will work on in the coming months.
One good part of working in the file system area of Linux right now is that there are 4 journaling file systems being actively developed and all are GPL, so it is possible that each file system can improve by sharing the best design points.
The JFS team has worked together for several years and started the port of JFS to Linux in December of 1999. We took the approach of release early and often. Our first release was done on 2/2/2000 and we have done 41 releases so far. In general we do a new release about every 2 weeks and if possible provide patches to problems as they are fixed. We still have new features to add to JFS and will continually enhance this file system.
One of the goals of the JFS team is to have JFS run on all architectures that Linux supports. With the help of the Linux community JFS has run on ia32, ia64, PowerPC 32 and 64, Alpha, s/390 31 and 64 bit, and ARM. The community members are helping us debug some problems that JFS has on SPARC and PA-RISC, so we should have these architectures running shortly. JFS has no architecture-specific code included in it.
Another goal is to have JFS included in the kernel.org source tree. When we started the port, the team decided that we weren’t going to change any of the kernel, and this should allow JFS to be easily integrated into the kernel.
Hans Reiser from NameSys, for ReiserFS
1. What is the current status of ReiserFS for Linux? Is it 100% ready for a production level environment, or are there distros that still do not “trust” the FS?
Hans Reiser: We are the preferred filesystem for SuSE. I have been told by the author of LVM that 90% of his users on his mailing list use ReiserFS. This surprised me, but he was sure it was correct. LVM users are of course not representative of FS users as a whole, as LVM users tend to be sysadmins of large machines who tend to need journaling. If you need journaling, we are the most stable FS available. We have many more users in Germany, where SuSE is dominant, than
All filesystems and all OSes have bugs. At this point your chances of hitting a software bug in ReiserFS are far less than your chances of bad disk, bad ram, bad cpu, or bad controller corrupting your data. We have patches available on our website that hopefully Linus will put in the main kernel when he gets back from vacation, but I have to admit that I haven’t bothered to put them on my laptop yet :-). We have no outstanding unsolved bugs at this moment, and while I am sure that there are still a few in there somewhere, ReiserFS is getting really quite stable when we can go for weeks with no new bug reports with the number of users we have. I think that we are now finally even a little more stable in 2.4 than we were in 2.2, and whether a ReiserFS user should use 2.4 or 2.2 is not dependent on ReiserFS anymore. As you probably know, there are other layers (VFS, MM, etc.) that are less stable, but I think that things will settle down real soon now. VFS seems to have finally gotten stable in just the last few weeks, and I am sure that the memory management layer will get fixed real fast once Linus is back from vacation in Finland.
We have minor feature improvements (relocatable journal, bad block handling, etc.) that are waiting for 2.5.1, and are in Alan Cox’s tree while we wait. I suppose it is possible that they could end up in 2.4 before 2.5.1 if 2.5.1 is delayed long enough. To my surprise they are not generating bug reports from the -ac series users. I am not sure whether I should conclude from this that they are bug-free; there may be far fewer -ac users than Linux users generally.
With regards to particular unnamed distros…. :-)…. Stability is not the issue; ReiserFS is known to be stable by the people who use it. SuSE is known to worry much more about stability than the unnamed untrusting distros you mention (think of how SuSE waited for 2.4.4 before shipping 2.4 as the default, think gcc…), and we are the SuSE default.
I used to think that politics was the reason why positions in discussions of ReiserFS on linux-kernel, prior to our acceptance by Linus, were predictable by what distro the poster works at, but more and more I am coming to see that the difference is one of style, and that what style the developer embraces is semi-predictable by distro. Different people adopt change at different rates. ReiserFS has at its heart some of the same lust for change that BeFS has. You probably don’t realize how scary it is to most old time Unix filesystem developers to talk about adding new semantics to the filesystem namespace like we describe here, or here, or like BeFS has already done. What many distros want in Linux is simply what Unix has, but free, and nothing much more.
SuSE has an exceptional head of R&D, Markus Rex, who understands the deep things before they are something real yet. They then combine this with a quality assurance team led by Hubert Mantel that is also exceptional in the industry. The result is that with SuSE you tend to get cutting edge technology that works. I think it is in part because they are so fanatical about quality assurance, and good at it, that they have the confidence to adopt change a bit earlier than others who get burned just changing the compiler for an unchanging
Ok, ok, you can tell which style I like, but we always have to be careful to not disrespect the other styles. They also work, and have different advantages. Linux could not have developed as fast as it did without the folks who just copied what worked in Unix, and did a damned good job of making it work in Linux. The beauty of Linux is that users can choose the distro that matches their particular style.
2. What are its biggest differences (good or bad) when compared to JFS and XFS?
Hans Reiser: JFS provides an easy migration path for IBM’s current JFS users who are seeking to migrate to Linux. I think this is the primary objective of the JFS project. It has decent performance, there is nothing bad about JFS, but you should look at the benchmarks before using it for non-migration purposes.
XFS is an excellent file system, and there is an important area where XFS is higher performance than we are. ReiserFS uses the 2.4 generic read and generic write code. Using this made for a better integration into 2.4, but it was a performance mistake for large files. ReiserFS does a complete tree traversal for every 4k block it writes, and then it inserts one pointer at a time into the tree, which means that every 4k write incurs the overhead of a balancing of the tree (which means it moves data around). For this reason, XFS has better very large file performance.
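A rough back-of-the-envelope calculation in Python shows how the per-block cost described here adds up for large files. This is only arithmetic under stated assumptions, not ReiserFS or XFS code: the function names are hypothetical, and the batched alternative (inserting one tree item per run of up to 2048 blocks) is an assumed contrast chosen for illustration.

```python
BLOCK_SIZE = 4096  # the 4k block size mentioned in the interview


def blocks_needed(file_size):
    """Number of 4k blocks needed to hold file_size bytes (rounded up)."""
    return -(-file_size // BLOCK_SIZE)


def per_block_insertions(file_size):
    """One tree traversal and one pointer insertion per 4k block written."""
    return blocks_needed(file_size)


def batched_insertions(file_size, blocks_per_item):
    """Hypothetical alternative: one insertion per run of blocks_per_item blocks."""
    return -(-blocks_needed(file_size) // blocks_per_item)


one_gib = 1024 ** 3
print(per_block_insertions(one_gib))      # 262144
print(batched_insertions(one_gib, 2048))  # 128
```

Under these assumptions, writing a 1 GiB file one block at a time means hundreds of thousands of traversals, each of which may trigger rebalancing, which is consistent with the large-file penalty described above.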
However, they are a bit slow with regards to medium sized and small files. It seems that Chris Mason implemented an exceptionally good journaling implementation for ReiserFS, with much less overhead than other journaling implementations. He likes to say that there is nothing innovative or interesting about his code, but…. he avoided all of the usual performance mistakes in implementing journaling, and I think that is a form of innovation:-)…. XFS is slower than ReiserFS for the typical file size distributions on “typical” file systems, and I encourage you to examine our benchmarks, where you will see that they are faster for very large file writes, and slower for typical file sizes. The benchmarks provide a lot more details on this. The upshot is that whether you should use XFS or ReiserFS depends on what you want to use it for. If you want the most widely tested journaling file system for use with “typical” file sizes, then use ReiserFS. If you want to stream multi-media data for Hollywood style applications, or use ACLs now rather than wait for Reiser4, you might want to use XFS.
This is going to change. XFS is going to go into Linux 2.5/2.6 (they make changes to the kernel that are considered 2.5 material, and thus are not in 2.4), and I just bet you that they will have improved their “typical” file size performance by the time 2.6 ships. You can be 100% sure that Reiser4’s large file performance will be far faster. We are writing the new code now, and, well, I like it….;-) 2.6 is far enough away that you are seeing the first lap of a race to good performance by XFS and ReiserFS; it is too early for the users to know how much our large file performance will increase, and how much their small file performance will increase. I would say that we were lucky to make the code cut-off point for 2.4, except that my guys worked every weekend for 6 months getting the 2.4 port done so that when Linus announced code freeze we could send him a patch immediately.
Our performance is better than ext3 (www.namesys.com/benchmarks.html) because of the great job Chris did with journaling, but ext3 is written by excellent programmers who do good work.
3. What is the fragmentation policy of ReiserFS and how is it dealing with it?
Hans Reiser: This is an area we are still experimenting with. We currently do what ext2 does, and preallocate blocks. What XFS does is much better, they allocate blocknrs to blocks at the time they are flushed to disk, and this allows a much more efficient and optimal allocation to occur. We knew we couldn’t do it the XFS way and make code freeze for 2.4, but reiser4 is being built around delayed allocation, and I’d like to thank the XFS developers for taking the time to personally explain to me why delayed allocation is the way to go.
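The benefit of assigning block numbers at flush time rather than at write time can be seen in a small Python simulation. It is a deliberately simplified sketch, not ReiserFS or XFS code: the allocator hands out the next free block numbers in order, the write-time variant ignores the preallocation that ext2 and ReiserFS actually do, and all names (`Allocator`, `allocate_at_write`, `allocate_at_flush`) are invented for the example.

```python
from collections import defaultdict


class Allocator:
    """Toy allocator that hands out consecutive free block numbers."""

    def __init__(self):
        self.next_free = 0

    def allocate(self, count):
        start = self.next_free
        self.next_free += count
        return list(range(start, start + count))


def allocate_at_write(writes):
    """Assign a disk block as soon as each one-block write arrives."""
    allocator = Allocator()
    layout = defaultdict(list)
    for name in writes:
        layout[name].extend(allocator.allocate(1))
    return dict(layout)


def allocate_at_flush(writes):
    """Delay allocation: only count dirty blocks, then allocate per file at flush."""
    pending = defaultdict(int)
    for name in writes:
        pending[name] += 1
    allocator = Allocator()
    return {name: allocator.allocate(count) for name, count in pending.items()}


# Two files being written concurrently, one block at a time.
writes = ["a", "b", "a", "b", "a", "b"]

print(allocate_at_write(writes))  # {'a': [0, 2, 4], 'b': [1, 3, 5]}
print(allocate_at_flush(writes))  # {'a': [0, 1, 2], 'b': [3, 4, 5]}
```

Because the delayed variant knows each file's total size before choosing blocks, it can give every file one contiguous run, whereas write-time allocation interleaves the two files.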
4. Does ReiserFS have support or plans to support arbitrary meta-data (multiple-stream attributes as some call them)?
Hans Reiser: Yes, these are what we are doing for DARPA in Reiser4.
5. One of the qualities found on BeOS’s BFS is “live queries”. Live Queries is a query that sends (automatically) deltas when the result changes. They stay “open” after the first returned result row. Is there a plan for such a support on ReiserFS?
Hans Reiser: Someday we should implement it, but it will be post version 4.
6. Which part of ReiserFS would you like to change, update, extend or even re-design? What are the future plans for ReiserFS?
Hans Reiser: For version 4 we are gutting the core balancing code, and implementing plugins. We think plugins can do for filesystems what they did for Photoshop. We are making it easy to add new types of security attributes to files. We are implementing ACLs, auditing, and encryption on commit, as example security plugins. We are moving from “Balanced Trees” to “Dancing Trees”. We support inheritance of file bodies and stat data. This page describes this in detail. These are features we will deliver by Sep. 30 of next year.
In the long term, we very much share the BeFS vision of enhancing the file system namespace semantics. We think we have some theoretical advantages in our semantics, you can see our semantics in detail here, but I think a lot of what the BeFS authors have done. I would be curious to hear your personal experiences as to what works and does not work. I understand that they had some performance problems that required them to make some design sacrifices. (You probably know a lot more about this than I, please cut these words if I am wrong.) I think we will have performance that will make it unnecessary to sacrifice such semantic design elegance, but we will do this at the cost of getting our semantic innovations to users later than BeFS did. Right now we only offer a high-performance traditional Unix file system. This is not our goal, adding search engine and database semantic features into the FS namespace is our goal, but we wanted to get good performance for traditional file system usage first before adding the database functionality, and now we find that performance is interesting also, and so it distracts us, and…. version 4 will have some semantic innovations, but most of what we discuss in the whitepaper will wait for a version after that. I can say though that we have laid a nice foundation for that future work.
Nathan Scott from SGI, for XFS
1. What is the current status of XFS for Linux? Is it 100% ready for a production level environment?
Nathan Scott: The current stable version of XFS is 1.0.1. There are many people we know of using XFS in production environments on Linux today, so yes, it is production ready.
There were some good examples on the XFS list just yesterday – see the “production server” thread – here’s one very positive quote, for example:
“I certainly think so; XFS runs on my Compaq/Red Hat 6.1 server with a 30 GB, 7200 RPM IDE drive. The server is a file/web/streaming media server; it runs just fine. Fast as hell with Samba, and the ACL support is great. I use a CVS version from a while back, and that is stable as hell. Never crashed, as a matter of fact, no problems whatsoever.”
It’s very rewarding as a developer to read this sort of stuff!
2. What are its biggest differences (good or bad) when compared to ReiserFS and JFS?
Nathan Scott: Each filesystem will offer a different set of features and different performance characteristics, so naturally one should always choose the filesystem most appropriate for their specific workload. I’m not deeply familiar with the implementations of these other filesystems so can’t really provide a good contrast.
Some of the features of XFS which other filesystems often do not have would be:
– Direct I/O
– Fast recovery after unplanned shutdown
– Extent-based space management with either bulk preallocation or delayed allocation; this maximizes performance and minimizes fragmentation
– Journalled quota subsystem
– Extended attributes
– Access Control Lists (integrated with the latest Samba too)
– XDSM (DMAPI) support for Hierarchical Storage Management
– Scalability (64 bit filesystem, internal data structures and algorithms chosen to scale well)
XFS also has a fairly extensive set of userspace tools – for dumping, restoring, repairing, growing, snapshotting, tools for using ACLs and disk quotas, and a number of other utilities. We were fortunate to have had XFS around for a number of years on IRIX when the Linux port began, so these tools (and indeed the core XFS kernel code) have been extensively used in the field on IRIX and are very mature.
3. What are the differences between the Linux version of XFS and the one found on IRIX?
Nathan Scott: In the original IRIX implementation, the buffer cache was extensively modified to optimally support various features of XFS, in particular for its ability to do journalling and to perform delayed allocation.
This has become a fairly complex chunk of code on IRIX and is very IRIX-centric, so in porting to Linux this interface was redesigned and rewritten from scratch. The Linux “pagebuf” module is the result of this – it provides the interface between XFS and the virtual memory subsystem and also between XFS and the Linux block device layer.
On IRIX XFS supports per-user and per-project disk quota (“projects” are an accounting feature of IRIX). On Linux we had to make some code changes to support per-group instead of per-project quota, as this is the way quota are implemented in Linux filesystems like ext2 (and Linux has no equivalent concept to “projects”).
There are some more esoteric features of XFS on IRIX that provide file system services customized for specific demanding applications (e.g. real-time video serving), and these have not been ported to the Linux version so far.
4. Has XFS made its way to be included as an option on the Linux kernel?
Nathan Scott: No, not at this stage, although this has been our long-stated goal and we are continually working towards inclusion of XFS in both Linus’ tree and the standard Linux distributions.
5. What is the fragmentation policy of XFS and how is it dealing with it?
Nathan Scott: XFS was designed to allocate space as contiguously as possible. It’s an extent-based filesystem, has features like delayed allocation, space preallocation and space coalescing on deletion, and goes to great lengths in attempting to lay out files using the largest extents possible (an “extent” being an offset and a length within a file).
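As an illustration of the extent idea, the following Python sketch maps a block offset within a file to a disk block using a sorted list of extents. It is not XFS's on-disk format or code; the tuple layout (file block offset, starting disk block, length), the function name `lookup` and the sample numbers are assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical extent map for one file, sorted by file block offset.
# Each entry: (file_block_offset, disk_block_start, length_in_blocks)
extent_map = [
    (0, 1000, 8),
    (8, 5000, 4),
    (12, 1008, 20),
]


def lookup(extents, file_block):
    """Return the disk block holding file_block, or None if it is unmapped."""
    starts = [extent[0] for extent in extents]
    index = bisect_right(starts, file_block) - 1
    if index < 0:
        return None
    file_start, disk_start, length = extents[index]
    if file_block >= file_start + length:
        return None  # past the end of the last covering extent
    return disk_start + (file_block - file_start)


print(lookup(extent_map, 0))   # 1000
print(lookup(extent_map, 9))   # 5001
print(lookup(extent_map, 31))  # 1027
print(lookup(extent_map, 32))  # None
```

Three entries describe all 32 blocks of this file; a per-block pointer list would need 32 entries, which is why larger extents mean less metadata as well as less seeking.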
The ability for XFS to _not_ fragment the filesystem is such that XFS on IRIX survived for many years without any filesystem reorganisation tool, but eventually the need surfaced in a specific application environment, and a defragmenter was developed. This tool attempts to redo the allocation for a file and if a more optimal block map can be created, switches the file over to using that block map instead.
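The defragmenter's strategy of redoing the allocation and switching only if the result is better can be sketched as follows in Python. This is a toy model, not the actual XFS tool: block maps are lists of (disk start, length) extents, "more optimal" is assumed here to mean "fewer extents", and the function name `defragment` and the sample data are hypothetical.

```python
def defragment(block_map, free_extents):
    """Try to re-allocate a file contiguously; keep the old map unless the new one is better.

    block_map and free_extents are lists of (disk_start, length) tuples.
    """
    total_blocks = sum(length for _, length in block_map)

    # Look for the first free run large enough to hold the whole file.
    candidate = None
    for start, length in free_extents:
        if length >= total_blocks:
            candidate = [(start, total_blocks)]
            break

    # Switch only if the new block map has fewer extents than the current one.
    if candidate is not None and len(candidate) < len(block_map):
        return candidate
    return block_map


fragmented = [(10, 4), (50, 4), (90, 4)]
free_space = [(200, 8), (300, 16)]

print(defragment(fragmented, free_space))  # [(300, 12)]
print(defragment([(10, 12)], free_space))  # [(10, 12)]
```

A real tool would also copy the data and swap the maps atomically; the sketch only covers the decision of whether the new block map is worth switching to.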
6. Does XFS have support or plans to support arbitrary meta-data (multiple-stream attributes as some call them)?
Nathan Scott: XFS supports “extended attributes” – arbitrary, small key:value pairs which can be associated with individual inodes and which are treated as filesystem metadata. These are used to store ACLs, MAC labels (on IRIX), DMAPI data, etc, and user-defined attributes. In practice they are intended for use to augment the metadata associated with an inode, rather than the more exotic uses that the “non-data fork” is designed for in some filesystems. Multiple data streams within a single file is something else entirely, and XFS does not support that concept.
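For readers who want to see what a small key:value attribute on a file looks like in practice, here is a hedged Python sketch using the standard library's `os.setxattr` and `os.getxattr`, which are available on Linux only. It is not XFS-specific and does not use the XFS userspace tools; the attribute name `user.comment`, its value and the helper `roundtrip_xattr` are made up for the example, and the helper returns None when the platform or the underlying filesystem does not support user attributes.

```python
import os
import tempfile


def roundtrip_xattr(name, value):
    """Set an extended attribute on a temporary file and read it back.

    Returns the stored bytes, or None if extended attributes are unavailable.
    """
    if not hasattr(os, "setxattr"):
        return None  # not a Linux build of Python
    with tempfile.NamedTemporaryFile() as handle:
        try:
            os.setxattr(handle.name, name, value)
            return os.getxattr(handle.name, name)
        except OSError:
            return None  # filesystem does not support user attributes


# Hypothetical attribute: a short user-defined comment attached to the inode.
result = roundtrip_xattr("user.comment", b"reviewed")
print(result)
```

The attribute lives alongside the inode as metadata, separate from the file's data, which matches the distinction drawn above between extended attributes and multiple data streams.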
7. One of the qualities found on BeOS’s BFS is “live queries”. Live Queries is a query that sends (automatically) deltas when the result changes. They stay “open” after the first returned result row. Is there a plan for such a support on XFS?
Nathan Scott: No, XFS does not support anything like this, and I’m not aware of any plans to implement such a feature.
8. Which part of XFS would you like to change, update, extend or even re-design? What are the future plans for XFS?
Nathan Scott: There are a number of areas of development in XFS, on both IRIX and Linux. I’ll talk about Linux only here.
On Linux we are actively working on the IA64 port – the intention is for the 1.0.2 release to support that architecture. There are folk in the open source community working on ports to a variety of other architectures as well (in particular, the PowerPC and Alpha porters seem to have had a great deal of success).
Work is ongoing in the “pagebuf” code which I talked about before – it currently imposes the requirement that the filesystem block size be equal to the pagesize, and that restriction is going to be eased somewhat in a future release.
In the longer term, there is plenty of other work planned too – as one example, there is some work earmarked for the write code path in order to do multi-threaded direct IO writes.
Obviously, being included in Linus’ kernel is an important goal. I know of several distributions that are currently working on supporting XFS natively – SuSE employees post to the XFS mailing list quite frequently; Andi Kleen from SuSE in particular has been very active in stress testing XFS and fixing problems, and Jan Kara (also from SuSE) was instrumental in helping us get the journalled XFS quota support into the base Linux quota tools. Also, the next stable Debian release will contain an XFS kernel patch and the XFS userspace tools.
So, there is still plenty of work to be done, but at this stage I’d say we now have a very stable foundation and a good base for moving forward.