The 2006 Linux Filesystems Workshop

Submitted by jonsmirl 2006-07-22 Linux 40 Comments

The Linux file systems community met in June 2006 to discuss the next 5 years of file system development in Linux. Organized by Val Henson, Zach Brown, and Arjan van de Ven, and sponsored by Intel, Google, Oracle, the Linux File Systems Workshop brought together thirteen Linux file systems developers and experts to share data and brainstorm for three days. Read here, here and here.

About The Author

Eugenia Loli

Ex-programmer, ex-editor in chief at OSNews.com, now a visual artist/filmmaker.

Follow me on Twitter @EugeniaLoli

40 Comments

2006-07-22 1:57 am

butters
If you have any interest in filesystems, this is a fascinating read. It’s amazing that such innovating thinking is going on, and that such productive collaboration is going to forge the future of Linux filesystem. I was especially intrigued by the “chunkfs” concept. Very cool.

Here’s a bunch of filesystem and I/O programmers, including one of the ZFS developers, agreeing that we’re nowhere near where we have to be to scale with filesystem sizes, and that we need to do better. And we’ll do it on Linux.

I wonder what Microsoft’s doing about this? I’m also not surprised that Hans (or anyone else from Namesys) didn’t show up.

2006-07-22 3:18 am

FlipmodePlaya
Listed among participants:

“Chris Mason, SuSE – Reiserfs developer”

I guess he’s not from Namesys per se, but at least he’s a Reiser developer. No idea if he’s involved in Reiser4 or just Reiserfs (3).

2006-07-22 4:55 am

butters
To my knowledge, none of the commercial Linux vendors employ any Reiser4 developers. That’s because (at the moment) Reiser4 is a neat concept, but doesn’t qualify as a “Linux filesystem.” ReiserFS has almost nothing in common with Reiser4, the name being a notable exception.

I’m sure Chris made some useful contributions to the talks. Maybe related to small files (where ReiserFS happens to excel). Besides his filesystem, Hans has nothing useful to contribute to the Linux community, and he’s gone out of his way to prove this over and over again. He doesn’t play well with others.

2006-07-22 3:29 pm

jonsmirl
To my knowledge, none of the commercial Linux vendors employ any Reiser4 developers. That’s because (at the moment) Reiser4 is a neat concept, but doesn’t qualify as a “Linux filesystem.” ReiserFS has almost nothing in common with Reiser4, the name being a notable exception.

This comment seems to be confused. Reiser file systems (namesys.com) get most of their funding from the US DARPA. Reiser file systems offer advanced control and encryption options that are tailored to DARPA needs. Both Reiser3 and Reiser4 are fully functioning Linux filesystems.

More about inclusion of Reiser 4 in the kernel.

http://kerneltrap.org/node/6844

Reiser3 is already in the kernel. AFAIK ReiserFS is the generic term referring to both Reiser3 and Reiser4.
2006-07-22 3:41 pm

nutshell42
That’s because (at the moment) Reiser4 is a neat concept, but doesn’t qualify as a “Linux filesystem.”

I had a machine running Linux on reiser4 (switched back because I didn’t want to compile kernels all the time) if that doesn’t qualify as Linux filesystem what’s the ZFS developer doing there?

Besides his filesystem, Hans has nothing useful to contribute to the Linux community

Yeah, this clearly makes him unsuited for a “contribute to Linux — but no filesystems” workshop like this one.

Perhaps he could take part in the 2006 Linux Filesystems Workshop or something li… oh wait.

You’re probably even right on some points but the specific arguments in your post were nonsense.

2006-07-22 9:14 pm

butters
Yeah, I’m guilty of a litte sensationalism, so what? I had several machines running Reiser4 on Linux for up to 18 months, pre- and post-reiser4progs-1.0, and I was severely disappointed across the board. I know Reiser4 runs on Linux. That’s not my point, nor is my point really about how “good” it is as a filesystem.

When Hans submitted his more-or-less standalone Reiser4 patchset to the kernel devs, he was given a laundry list of things he/Namesys needed to change to make Reiser4 fit nicely into the kernel–both in terms of functionality and organization. Hans turned the dialog nasty, rejecting the requests made by Linux filesystem maintainers such as Christoph Hellwig, and further implying that Linux VFS is a piece of crap. So, back to my statement, Hans has nothing–besides _his_ filesystem, as _he_ implemented it–to offer the Linux community. If his job didn’t absolutely hinge on his ability to get Reiser4 merged into mainline, he would have opted to take his ball and go home.

As for Val Henson, she used to work for Sun on ZFS, but now she works for Intel developing Linux filesystem and networking code. IMHO if the Linux development community had more people (especially women) like Val, we’d get a lot more done with way fewer pissing contests.

Hey, Jon! I see your interests go well beyond graphics these days. The naming conventions might not be 100% consistent, but “Reiser3” is/was always called ReiserFS in every distribution I’ve seen, and Reiser4 is always called just that. I’m not sure that Reiser3 really offers advanced control or encryption technologies (I thought that was unique to Reiser4), and neither Reiser filesystem supports block checksums nor do they remain consistent during writes.

I should point out that from the outset, the kernel devs sought to make sure that Hans/Namesys were clear on the fact that they had no intention of infringing on their right to distribute proprietary plugins for Reiser4. All flamewars aside, they think the plugin architecture is a great idea, so long as it isn’t used to replace existing Linux VFS functionality.
2006-07-22 10:39 pm

jonsmirl
Hey, Jon! I see your interests go well beyond graphics these days. The naming conventions might not be 100% consistent, but “Reiser3” is/was always called ReiserFS in every distribution I’ve seen, and Reiser4 is always called just that. I’m not sure that Reiser3 really offers advanced control or encryption technologies (I thought that was unique to Reiser4), and neither Reiser filesystem supports block checksums nor do they remain consistent during writes.

It has been several years, but as I recall it the Reiser3 encryption was done for DARPA but not released to the community as a whole. That is what spawned the concept of plug-ins. Reiser4 can now easily plug in an array of encryption options. There should be references to this in the ReiserFS mail archives.

I’m not working on local file systems, instead I have been recently interested in wide area, distributed systems.
2006-07-22 11:49 pm

chatz
As for Val Henson, she used to work for Sun on ZFS, but now she works for Intel developing Linux filesystem and networking code. IMHO if the Linux development community had more people (especially women) like Val, we’d get a lot more done with way fewer pissing contests.

That is a ridiculous statement, read some kernel mailing lists and you will see that Val is in there with everyone else attacking one filesystem or another. This workshop was aimed at improving ext3, did you see any mention of trying to add any of these features (if needed) to other filesystems? Did the scope include how to do petabyte filesystems for example?

The workshop writeup is excellent, and some interesting topics discussed, but I have to agree with other replies to this forum, it really didn’t suggest anything really new, it was just how to take some features to make ext3 a bit better. One day they might realise, hey filesystem X actually most of this and more already, maybe we should concentrate on that filesystem and really leap frog ahead…

2006-07-22 8:37 pm

segedunum
Besides his filesystem, Hans has nothing useful to contribute to the Linux community, and he’s gone out of his way to prove this over and over again. He doesn’t play well with others.

Personally, from everything that I’ve read on the lists and elsewhere, this seems to go both ways. Hans has had difficulty getting Reiser4 peer reviewed by others and communicating just what Reiser4 implies before putting code in the kernel. At times, he can be pretty abrasive, but welcome to the vast majority of kernel developers. The issue of support for Reiser3 has been a contentious issue as well, and I think people have rightly been harder this time around.

However, the notion that it’s all Hans’ fault is stretching it a bit. It’s pretty clear now with the issue of the next generation of filesystems and storage devices, and with ZFS thrown into the mix, that a few developers are getting a bit uncomfortable with the status of their babies – particularly the ext3 developers. Quite frankly, ext3 is junk in this day and age. Andrew Morton wants to say that, but he can’t. There was some talk about extending and stretching ext3 still further to very large filesystems, and the reasoning behind them not using something built for the purpose like XFS was the code quality and the LOC count. Reiser wasn’t even involved there.

The problem with Reiser4 was that it had the potential to be seen and used as the universal Linux filesystem. It had, and still has, the potential to be a great desktop and server filesystem pending some painstaking code modifications. I think that’s a bit much to swallow for some people, politically speaking. However, Reiser4 may have been swept under the carpet, but the issue of large storage devices and new filesystems like ZFS are forcing the issue.

To suggest it’s all Hans Reiser’s fault and there is nothing political going on doesn’t 100% reflect what’s gone on.

2006-07-22 2:53 am

Cloudy
all over again.

reminds me of all the random hacks tried out by the various Unix vendors in the 80s.

Pity these guys didn’t even manage to come up with something as simple as Wilkes’ storage management concepts from the mid ’90s, or Cue.

It’ll be interesting to see what happens when NAS and SAN come along and smack these guys up the side of the head with multiple terabyte home storage systems.

2006-07-22 4:40 am

butters
Well, I briefly looked at what John Wilkes’ SSP team is doing at HP, and I couldn’t really get anything out of it besides that it’s like a cluster of SANs. This workshop was specifically _not_ about clustered or distributed filesystems, but it considered their implications for local filesystem design.

Further, the purpose of this workshop was specifically to address the implications of multiple-terabyte filesystems. I would say that they’re doing the best they can to avoid getting smacked upside the head. Could this workshop have been held in the mid 90s? Absolutely not. I don’t even think the Linux community had broad enough industry support to do what they’re trying to do as recently as 2 years ago.

If you’re looking for simple, I think the idea behind chunkfs falls into that category. Sure it started as a “random hack,” but the idea grew organically into something that looks better and better the more you think about it (so says the article).

I don’t understand who you’re criticizing here. None of the UNIX vendors had terribly robust filesystems until the last 3-5 years or so with IBM JFS(2) and HP/Veritas VxFS. Sun ZFS came along very recently, and the workshop (which included a ZFS developer, Val, the author of the article) identified problems with the copy-on-write (COW) design. Who exactly is doing a better job at filesystems?

I don’t know how much better the Linux community can do then to put Christoph, Arjan, and Linus in a room with people from ZFS, Intel, Oracle, EMC, and IBM to talk about the future of Linux filesystems. They obviously covered a lot of ground during the workshop, and I’m sure we’ll all be better off for it when those nasty 8TB disks come along.

2006-07-22 6:06 am

chatz
My reading of the workshop is that it really only focused on two issues, error recovery and small files, and really only for ext3 with much discussion on how to improve fsck times. The small files discussion surprised me, since the preface discussed how much capacity is going to increase.

There are so many issues ext3 (sorry, ext4 now) has to deal with with large filesystems that other filesystems have solved are attempting to solve.

Filesystems like XFS are looking to the future at how to scale to petabyte filesystems with a billion inodes and doing tens of gigabytes a second. Dealing with increased errors is important, but this is not where the real advances in filesystem still lie, ZFS already does checksums and other techniques, I’m sure others will follow.

2006-07-22 6:30 am

Cloudy
My reading of the workshop is that it really only focused on two issues, error recovery and small files, and really only for ext3 with much discussion on how to improve fsck times. The small files discussion surprised me, since the preface discussed how much capacity is going to increase.

The error recovery stuff is what made me think of Wilkes’ work from the mid 90s and gave me a sense of dejavu. When you have a large enough file system you basically have to self-test on an ongoing basis, and that’s tricky to do without impacting performace and responsiveness.

Small files are the bane of an FS designer’s existance, as they pretty much demand exactly the opposite of what makes filesystems reliable and fast. The larger the volume, the more smallfiles, the worse the performance of various algorithm in the FS.

There’s work by IBM from the mid 60s, DEC from the mid 70s, Cray from the mid 80s, and HP from the mid 90s that tackles this problem in the context of the then-available hardware.

What’s missing from XFS, as far as I know, and certainly from EXT3/EXT4, is the concept of adaptive algorithms and on-disk structures. Even the intro to the workshop betrays that the developers have not yet caught on to the need for that.

2006-07-22 9:47 pm

butters
“What’s missing from XFS, as far as I know, and certainly from EXT3/EXT4, is the concept of adaptive algorithms and on-disk structures. Even the intro to the workshop betrays that the developers have not yet caught on to the need for that.”

On this, I completely agree with you. The workshop, at least from the article, seemed to agree that consistency and fast recovery trump throughput and CPU overhead. I sort of agree with them, but I don’t think that the two are in any way mutually exclusive. I think that if we get the right minds together, we can design a filesystem that does both. Perhaps they can even take some pages out of the Reiser4 playbook with dancing B*-trees and such.
2006-07-22 11:54 pm

chatz
“What’s missing from XFS, as far as I know, and certainly from EXT3/EXT4, is the concept of adaptive algorithms and on-disk structures. Even the intro to the workshop betrays that the developers have not yet caught on to the need for that.”

“Perhaps they can even take some pages out of the Reiser4 playbook with dancing B*-trees and such.”

So XFS having a number of different inode structures from completely inline to large btrees is not an adaptive algorithm? And guess where Reiser4 took many of these ideas from?

2006-07-22 6:21 am

Cloudy
Not what they’re doing now, what they did in the mid 90s. A lot of it is relevant to maintaining the integrity of large file systems.

Could a Linux workshop on multiterabyte file systems have been held in the mid 90s? no. Were there already multi-terabyte file systems? definitely. Were there workshops on file systems for them? Absolutely. IEEE held some, under the IEEE working group for mass storage. There was even some work at various Usenix OS workshops.

I am criticizing the claim that there’s any “innovation” represented here. New-to-Linux is not new. The idea of storing part of the file in the inode dates back at least to a Cray UniCos file system from the early ’80s. Storing more than names and inode numbers in the directory structure has been played with in several file systems, usually with disasterous effects because of the harm done to namei cache locality and the fragility caused by making the directories more volatile.

8TB disks are still away off. 8TB arrays have been common for a while though.

I think the workshop was a fine thing, but I think it’s more accurate to describe it as incremental improvement in Linux file systems than as a source of any innovation.

2006-07-22 9:59 am

Soulbender
“It’ll be interesting to see what happens when NAS and SAN come along and smack these guys up the side of the head with multiple terabyte home storage systems.”

Uhmm…but…

A SAN is a distributed network storage and a NAS is network attached storage device. Both of these still need local filesystems on the actual storage devices.

They are not competing storage technologies to local filesystems but complements them.

2006-07-22 8:18 pm

Cloudy
A SAN is a distributed network storage and a NAS is network attached storage device.

Actually, a SAN is a network segment dedicated to the storage needs of a collection of machines. (That’s why it’s called a storage area network)

Both of these still need local filesystems on the actual storage devices.

Do they? In particular, do storage servers attached to SANs? This assumption is part of what’s keeping people from solving the extensible storage problem.

They are not competing storage technologies to local filesystems but complements them.

That’s the way they are now, except in high end implementations of storage servers. It’s not the way they’re going to stay. Neither NFS nor SMB scale well to the needs of very large storage arrays, (nor do any of the other network file systems.) Further, traditional approaches to backup don’t scale well either.

Eventually, these systems will require hierarchical storage with automatic migration, will require separation of storage optimization be separated from file store logical implementation, and will require more adaptive algorithms than are now in use.

i don’t think the work from HP on QoS based file systems done in the 95-97 timeframe was ever published, but if it was, it’s an example of the sort of architectural changes needed to handle the scaling.

2006-07-22 10:29 am

evert
“Soft updates was a refinement to Berkeley FFS which preserved the on-disk format while removing the need to run fsck on the file system before it could be mounted after a crash. Soft updates carefully orders updates to the file system so that if the system crashes at any time, the file system is consistent with the exception that some blocks and inodes are “leaked” – marked allocated when they are free. A background fsck, run on a file system snapshot, finds these unreferenced blocks and marks them free again. The downside of soft updates is mainly that it is extremely complex to understand and implement, and each file system operation requires its own specially designed update code. To our knowledge, there is only one implementation of soft updates in existence.”

Sounds like Soft updates are perfect, but too difficult to implement? I understand that complexity is a danger in itselt, but in this case doesn’t the advantages outweight the costs?

more: http://www.usenix.org/publications/library/proceedings/bsdcon02/mck…

Edited 2006-07-22 10:34

2006-07-22 12:45 pm

Gullible Jones
They’re not perfect… just very good. No matter what you do some things will still be able to mess you up.
2006-07-22 10:06 pm

butters
Berkley FFS was released in 1984 and received a major overhaul in 1991. Soft updates weren’t added until 1999. I think that gives a good sense of how difficult it is to implement.

Besides, most of the issues addressed by soft updates are also addressed by copy-on-update (COW) designs, since they don’t allocate blocks until they are written. This means that in the event of a crash, any partially written blocks remain free. This is pretty simple to understand and implement. With some craftiness, we can even find these blocks and try to recover them.

In all likihood, the next-gen enterprise Linux filesystem will use a variation on the COW design similar to the “doublefs” idea sketched in the article, and the next-gen home/multimedia filesystem with be a chunkfs-style pool of B*-tree-based chunks. At least I hope

2006-07-22 11:52 pm

Cloudy
Berkley FFS was released in 1984 and received a major overhaul in 1991. Soft updates weren’t added until 1999. I think that gives a good sense of how difficult it is to implement.

I’d guess it gives more of a sense of how long it took Matt Dillon to get around to wanting a faster FFS and them implementing soft updates, myself.

Soft updates aren’t as hard to understand as people seem to think, but what’s lacking is a good exposition on the ordering rules necessary to harden file operations.

2006-07-22 1:04 pm

Luis
While on File Systems, it’s also interesting to take a look at the development of new log-structured file systems for Linux:

http://www.nilfs.org/

http://logfs.sourceforge.net/
2006-07-22 3:35 pm

Morin
Very interesting. I liked the chunkfs idea and giving hints about future file accesses. However, I don’t think *guessing* hints will do the job:

> One point of view is that applications are already

> giving us hints about file size, permissions, and other

> attributes: they are called “file names.” If a file is

> named “*.log”, it’s a pretty fair guess it will start

> out small, grow slowly, become very large, and be

> append-only. If a file has the string “lock” in its

> name, it is likely to be zero length, frequently

> stat’d, and deleted in the future.

“syslog” doesn’t match *.log, “clock.jpg” would be identified as a lock file. Why not make these hints explicit?

2006-07-22 3:46 pm

jonsmirl
It doesn’t make sense to keep storing file metadata outside of the file. The Mac had a decent solution for this with resource forks.

If all of the metadata were stored with the file (and the path names the file can be accessed by are metadata) then the entire directory system could be considered a cache. fsck could get the metadata out of each file and rebuild everything if needed.

2006-07-22 7:29 pm

somebody
It doesn’t make sense to keep storing file metadata outside of the file. The Mac had a decent solution for this with resource forks.

Now,… this two contradict them selves. Both are true, but they are opposite.

Mac resource fork, was partialy based in (helluvanonsensebuilt) subdir (never was resource fork a part of file), part resided sometimes in desktop.db, part in FindByContent, part in … This is why Mac forks on network so often crashed. And filesystem was anything but resiliant. True about repair being able to correct all mistakes with ease.

Mac implementation was far from the nicest, more like most insane design of really sane logic.

Nowadays, Mac still uses similiar principles, only logic and approach is a bit more sane (not a lot though).

2006-07-22 7:46 pm

jonsmirl
Think of it from the conceptual viewpoint, not the actual implementation. I think NTFS has resource forks too.

I’ve been reading about OBDs. They have full support for metadata stored with the files.

The argument against is always that file metadata can’t be copied with existing tools like tar, etc. Those tools are going to have to change sooner or later.

2006-07-22 9:05 pm

somebody
The argument against is always that file metadata can’t be copied with existing tools like tar, etc. Those tools are going to have to change sooner or later.

Nope, they won’t be able to solve that problem by them selves. Too much legacy/deps/common problems is lying there.

First problem that exists for your metadata is in kernel. Kernel nowadays doesn’t have basic support for metadata.

But there are problems that arrise with this. All legacy fs would need to support metadata (or faked version of it in some additional file or folder, where support is not possible by default).

Kernel would need something like pull_metadata and put_metadata (add/remove would be recomended also), where fs would need to have support for it. As soon as you would introduce this support into glibc this problem would dissapear. All that it would need afterwards is only making this change apparent in software (for example, kde and gnome would have simple task since *-vfs is mostly used, others I don’t know???)

And only after as this basic support would be introduced, then your problems (tar/gzip) would need to be solved. btw. It would be few liner patch, no more.

Which means, propose metadata to linux vfs in lkml

Edited 2006-07-22 21:08
2006-07-22 10:46 pm

jonsmirl
Linux kernel already has support for extended attributes.

http://en.wikipedia.org/wiki/Extended_file_attributes

Turn it on like this:

/dev/md1 /home ext3 defaults,user_xattr 1 2

The Beagle search app is using them to store metadata.

http://beagle-project.org/Main_Page
2006-07-22 11:44 pm

somebody
Linux kernel already has support for extended attributes.

2006-07-22 8:28 pm

Cloudy
If all of the metadata were stored with the file (and the path names the file can be accessed by are metadata) then the entire directory system could be considered a cache. fsck could get the metadata out of each file and rebuild everything if needed.

This has horrible performance implications and binds a file to its physical location on storage.

You have to recognize that the directory structure in a Unix-like file system is a hierachical namespace of object references, that objects are really a collection of (property, value) pairs, that certain (property, value) pairs are present for all objects of the system and require efficient access (permissions, anyone,) and that the implementation of the storage system that makes objects persistant, should be independent of the abstract representation of the objects.

The directory, i-node, file implementation of Unix-like file systems really is a good way to separate the abstractions. The limitation on “modern” file systems is that they still haven’t gotten around to separating the “file as object” and “file as storage entity” aspects in their implementations. (See the Brevix discussion of name spaces and filesystems for details.)

2006-07-22 8:56 pm

jonsmirl
Directories are an artifact of hierarchical thinking. Google is something like a giant file system and it doesn’t have directories. There can be lots of ways to access files, directory paths are just one way.

Why do directories have to be stored in a hierarchical system of blocks? If you take all of the path names in my system and compress them into a minimal structure they will fit in 8MB. Why can’t I just keep everything in a single 8MB location?

The concept behind ODB is a metadata server(s) which provides a handle to an object. ODB is kind of like Bit Torrent. There is no enforced directory structure but you can build one if you want.

There is more than one way to look at permissions. Why can’t permissions be viewed as a pattern match against the directory path string?

The overall point is that hierarchical organization isn’t the only way to do things.

For example, I have a single directory with 100K images in it. These images all have tags on them which enable searching. These tags can’t be put into a hierarchy since their relationships are not hierarchical. This is an example of where the current file system model doesn’t do what is needed.

2006-07-22 9:40 pm

Cloudy
Directories are an artifact of hierarchical thinking.

Yes. That’s why they’re useful for maintaining hierarchies.

Google is something like a giant file system and it doesn’t have directories. There can be lots of ways to access files, directory paths are just one way.[i]

Google isn’t a file system, it’s an index.

[i]Why do directories have to be stored in a hierarchical system of blocks? If you take all of the path names in my system and compress them into a minimal structure they will fit in 8MB. Why can’t I just keep everything in a single 8MB location?

Performance.

The overall point is that hierarchical organization isn’t the only way to do things.

Yes. In the early days, IBM OSes had many storage access mechanisms, each reflecting a different style of data storge. Over time this has pretty much been winnowed down to what we have now: hierachical namespaces of pointers to objects as “file systems”, and relational databases as storage for data that has multiple indices.

The only problem this doesn’t solve is that of indexing unstructured data. But that’s not a file system problem, that’s an indexing problem, and object data bases, while sometimes useful, are difficult to obtain performance from.

It all comes down to pragmatism. File systems mimic hierachical organization and databases use the relational structured model because they efficiently handle data access in those structures, and, to date, that has covered the vast majority of user needs.

Switching to an an unstructured indexing system would come at a cost in performance and in usability. It is, in effect, optimizing for the least common case.

2006-07-22 10:28 pm

jonsmirl

Switching to an an unstructured indexing system would come at a cost in performance and in usability. It is, in effect, optimizing for the least common case.

I don’t know if this is true anymore. The old model of caching all of the hierarchical directories in RAM for performance doesn’t make any difference on my machine. I have 4GB and the entire Linux kernel source is in memory when I build it.

I’d rather have multiple index options on the source code, one of those being identifier based full-text searching. That would make grep instant.

I’m not convinced that hierarchy is the only model for files. I’m more in the file system is a database camp. If hierarchy was the perfect solution we wouldn’t need file system links (aliases).

2006-07-22 11:46 pm

Cloudy
Switching to an an unstructured indexing system would come at a cost in performance and in usability. It is, in effect, optimizing for the least common case.

I don’t know if this is true anymore. The old model of caching all of the hierarchical directories in RAM for performance doesn’t make any difference on my machine. I have 4GB and the entire Linux kernel source is in memory when I build it.

It’s still true. I’ve been following the ratio of memory to storage since the mid 70s, and every time we’ve hit a situation you’re describing, it lasts for a very short period of time before the ratio tips the other way again.

I’d rather have multiple index options on the source code, one of those being identifier based full-text searching. That would make grep instant.

Actually, that’s a good example of why this sort of data doesn’t generalize. Identifier based search requires knowledge of what an ‘identifier’ is. That means you have to process the source tree with something that understands the language. It’s easier to purpose build a quickly searchable index that’s efficient for the specific purpose than it is to get together a general set of tools to do it.

For what you’re searching source code for, I’d rather have a good context sensitive source browser like source insight, or failing that, cscope, or, failing that, TAGS files.

I’m not convinced that hierarchy is the only model for files. I’m more in the file system is a database camp. If hierarchy was the perfect solution we wouldn’t need file system links (aliases).

There are no perfect solutions. But there is more than 40 years of file system experience and the lesson has always been that the best compromise is hierachical filesystems/relational databases/custom indices.

2006-07-23 1:55 am

jonsmirl
Lately I have become interested in the OBD area. In my fantasy system the OBDs store files keyed by something like the unique 60bit hash used by git. These files also contain their metadata. All of the disk layout stuff is provided by the OBD. OBDs can be local or remote.

To find files you need to know the ID of a well known file. This file will contain an index allowing access to other files. Much the same way git stores directory trees. When an OBD joins the system it may publish some well know IDs. These indexes may take many forms, hierarchical, full-text, etc. As backup these indices can be rebuilt by queuring all of the objects metadata,

When you open a file you implicitly start a tracker service like bit torrent. Distributed hashing is used to locate existing trackers so that things can be coordinated. The tracker provides three functions. Distributed locking, file segmenting when the file is too big for an OBD, and distributed caching.

The system does not use RAID, instead a higher level service enumerates the list of objects on an OBD and makes sure they exist on 1-n other OBD, if not it makes a copy. Refreshes work this way too. Since there is a tracker coordinating everything you can either do simultaneous writes to all copies or declare one copy a master and spool the back up writes.

This is a query based system, there is no fixed directory structure. Queries are run by applying them to the indexes you know. If a new disk is mounted, you ask it for it’s well known IDs and add it to your list of indicies. This works for local and remote disks.

This model can also work inside a single box. A large disk is broken up into multiple OBDs. You still have trackers and queries.

From what I know this system is similar to lustre or zFS (IBM Haifa).

2006-07-23 10:14 am

Pfeifer
You do realise that the so called “ressource fork” of the Macintosh File System (MFS) stored the metadata of a file independent of the actual file data (which was stored in the “data fork”)? The metadata was not stored “inside” of the file, as your post implicites, but alongside the file.

This is just the way all filesystem do it.

The only way to store metadata within a file is the way MP3 and Ogg Vorbis do it; specify a part of the file data to contain metadata. But while this concept may be usefull as long metadata don’t get transfered with the file, it’s useless on a filesystem level. This way a filesystem has to know every possible file format to handle metadata. And that’s quite an effort.

It would be better to improve data exchange mechanisms to include metadata along with the transfered data, just as it happens with, say, the filename.

And, I almost forgot, please, please, please merge all these patches for utilities like tar to include extended attributes. Please.

Edited 2006-07-23 10:34

2006-07-22 3:41 pm

jonsmirl
Object based storage devices (OSD) would eliminate a lot of these problems. I’m surprised they didn’t focus more on these.

Some background articles

http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf

http://www.enterprisestorageforum.com/sans/features/article.php/343…

Lustre is designed around OSDs. http://www.lustre.org

While in transition you can turn an existing block oriented disk into an OSD inside of the disk driver.

2006-07-22 4:29 pm

jonsmirl
I just noticed that IBM has alreay build an OSD emulator for Linux. Note that OSDs don’t have to be connected over a SAN, they work directly attached too. OSD would be a featured added to disk drives like the SATA NCQ feature.

http://www.alphaworks.ibm.com/tech/osdsim/?open&S_TACT=105AGX03&S_C…

2006-07-23 6:21 pm

phoenix
Think of it from the conceptual viewpoint, not the actual implementation. I think NTFS has resource forks too.

NTFS has streams, which are similar to resource forks. However, there are no Windows tools that actually use them. There are several articles on them over at sysinsternals, that cover hiding data in streams to make it invisible to the user and OS.