Linked by Thom Holwerda on Wed 25th Jul 2012 22:18 UTC
The article I'm about to link to, by Oliver Reichenstein, is pretty terrible, but it's a good way for me to bring up something I've been meaning to talk about. First, the article: "Apple has been working on its file system and with iOS it had almost killed the concept of folders - before reintroducing them with a peculiar restriction: only one level! With Mountain Lion it brings its one folder level logic to OSX. What could be the reason for such a restrictive measure?" So, where does this crusade against directory structures (not file systems, as the article aggravatingly keeps stating) come from?
Thread beginning with comment 528446
galvanash Member since:
2006-01-25

"I really don't see a technical reason a hierarchy must be immutable; that's not an existing restriction on any FS I'm aware of. Ext3 can move files with no problem whatsoever. (BTW, it shouldn't be a foregone conclusion that ext3 is the best approach.)"


Not immutable... user immutable.

What I mean by that is that ideally a user should be able to tag, categorize, colorize, hierarchically organize, move within their hierarchy, copy, etc. - basically do their everyday, run-of-the-mill file operations that make them happy if they so choose - but none of these operations should ever change the actual location of the file in the system's storage hierarchy. Its storage location isn't completely immutable, but it is mutable only by the OS - not even applications can move files. It becomes a privileged operation.

Let's pick a complex operation. You have an HTML file that you have manually transformed into XML. You now want to tell the system that its type has changed. You would normally do that by just changing the file extension (or its type metadata), and you are done.

I envision that something like this would involve the following (conceptually speaking):

1. The system is made aware of the change - the OS is always notified of such changes (for metadata it tracks internally), since it has to be asked to make the change anyway.

2. Since it is no longer an HTML document, the file is disassociated from programs that only handle HTML. This is done by removing it from the system's HTML storage directory and moving it to the XML storage directory. Essentially the file is moved from one place to another. File extensions to signify type are completely optional; type is denoted by location, not extension. The actual names of the files, however, are immutable and system-assigned (like an inode). User-visible filenames become metadata.

All applications registered for XML files will now see the file, because they _always_ work with files from the storage location the OS reports for applications that work with XML files. An application's view into storage is based on what types of files it handles - it sees all files of those types automatically and can present them to the user when needed. The OS doesn't have to keep track of which applications are registered for which files, because it is defining where the files are stored by type. All an application has to do to say it handles XML files is ask the OS for a view into the XML storage space.
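
To make those two steps concrete, here is a rough sketch in Python (purely conceptual - the Store class, set_type and view are names I just made up, not a real API):

```python
import uuid

class Store:
    """Toy model of OS-managed, type-keyed storage (hypothetical API)."""

    def __init__(self):
        self.files = {}    # surrogate key -> file record
        self.by_type = {}  # type -> set of keys (the "storage directory")

    def create(self, ftype, display_name, data):
        key = uuid.uuid4().hex  # immutable, system-assigned name
        self.files[key] = {"type": ftype, "name": display_name, "data": data}
        self.by_type.setdefault(ftype, set()).add(key)
        return key

    def set_type(self, key, new_type):
        """Privileged operation: only the OS ever 'moves' the file."""
        record = self.files[key]
        self.by_type[record["type"]].discard(key)          # leaves HTML storage
        self.by_type.setdefault(new_type, set()).add(key)  # enters XML storage
        record["type"] = new_type

    def view(self, *ftypes):
        """What an application asks for: all files of the types it handles."""
        return [key for t in ftypes for key in self.by_type.get(t, ())]

# The HTML -> XML example from the steps above:
store = Store()
key = store.create("html", "notes.html", "<p>hi</p>")
store.set_type(key, "xml")   # step 1: the system is told of the change
print(store.view("xml"))     # step 2: XML-handling apps now see the file
```

The important property is that set_type is the only way a file ever "moves", and applications never see a path at all - only keys handed back by view.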

Anyway, if the _user_ has metadata in this scenario (such as a storage "folder") tied to the file, that might have to be updated so that it points to the new file (depending on technical details of the file system). But a type change is an uncommon operation.

Common operations, like the user moving the file from one "folder" to another, don't require _any_ system-level change - because that is user-level metadata that the system doesn't care about, and changing it does not actually change the file's location. It's invisible to applications. Most such metadata is user-level; the system primarily concerns itself with type and its associated metadata. File location becomes irrelevant to applications - they just ask the OS for views.

I understand and appreciate the argument for using a surrogate key for this. But I personally like the idea of it being human-readable for system admins and developers, i.e. the OS defines how and where files are stored - and that is where they will be. I see the ideal solution as one where directories, which are never exposed directly to users, carry human-readable conventional names as they always did. But file access would be based on inode (or some other surrogate key), and the root view of the file system (for admins, developers, etc.) would only concern itself with this key. Filenames would still exist and still be very useful (for both admins and users), but at the application API level everything is handled by the file's key, not its name. And the key would never change.

No questions asked, no deviations, no exceptions. All user-generated XML files will always be stored in /Users/Whoever/XML. In other words, when you log in as root you see the _actual_ storage system, but when you log in as a normal user you simply see /Users/Whoever/[whatever-primary-organizational-view-you-prefer].

You are, conceptually at least, giving control of the existing directory structure over to the OS - all you have to worry about is creating a metadata layer so that users can have their own preferred views into that structure. You don't actually need a new file system to do this - pretty much any existing file system will do. You just need to add some kind of metadata database to manage all the new metadata.
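
As a rough illustration of that metadata layer over an unmodified file system - SQLite keyed on inode numbers; the schema and function names are invented for the example, and a real implementation would live inside the OS rather than in user code:

```python
import os
import sqlite3

# Hypothetical metadata layer over an ordinary file system: files stay
# wherever the OS put them; organization lives in this database.
db = sqlite3.connect("metadata.db")
db.executescript("""
    CREATE TABLE IF NOT EXISTS file_meta (
        inode INTEGER PRIMARY KEY,   -- the system's immutable key
        type  TEXT NOT NULL,         -- system-level metadata
        name  TEXT NOT NULL          -- user-visible filename, mere metadata
    );
    CREATE TABLE IF NOT EXISTS user_tags (
        inode INTEGER,               -- user-level metadata: "folders", tags...
        tag   TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_type ON file_meta(type);
""")

def register(path, ftype, name):
    """The OS records a file's system-level metadata."""
    db.execute("INSERT OR REPLACE INTO file_meta VALUES (?, ?, ?)",
               (os.stat(path).st_ino, ftype, name))

def tag(inode, label):
    """A user 'moving a file between folders' is just a row change here;
    nothing on disk moves."""
    db.execute("INSERT INTO user_tags VALUES (?, ?)", (inode, label))
```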

What does this buy you:

1. Want to find all the mp3 files a user has? No indirect lookups required - ask the OS for a view on mp3 files and you get them all. Nothing but a directory lookup. Ditto any other type or combination of types (see the query sketch after this list).

2. Most user-level operations don't require any kind of system-level change. Things stay where they are put unless they need to be moved for system-level reasons.

3. Applications can easily create customized and optimized views of files. They can leverage custom user-level metadata to do lots of neat things to make people's lives easier. They can get creative again.

4. Unlike (imo) failed attempts at this type of concept, applications don't own files at all, but they can own metadata. Files can be shared amongst applications without the applications conflicting with each other.
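
The query sketch promised in point 1, against the hypothetical metadata layer sketched above - one indexed lookup, no tree walk, and combinations of types cost the same:

```python
import sqlite3

# Against the hypothetical metadata layer sketched earlier:
db = sqlite3.connect("metadata.db")

# Point 1: every mp3 the user has, in one indexed query.
mp3s = db.execute(
    "SELECT inode, name FROM file_meta WHERE type = ?", ("mp3",)).fetchall()

# "Ditto any combination of types" - e.g. everything an audio app handles:
audio = db.execute(
    "SELECT inode, name FROM file_meta WHERE type IN (?, ?, ?)",
    ("mp3", "ogg", "flac")).fetchall()
```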

"Then comes the new user tagging information. This could either be a layer on top of ext3 (like a separate relational database), or the new metadata/indexes could be incorporated into the file system itself. Using a separate layer might help keep the ext3 FS backwards compatible, but either way it should be feasible to implement the APIs."

"Is there a specific problem that you see?"


As far as how the metadata goes, that is what I was thinking more or less.

Reply Parent Score: 2

Alfman Member since:
2011-01-28

galvanash,

"basically do your everyday run of the mill file operations that make them happy if they so choose - but none of these operations should ever change the actual location of the file in the systems storage hierarchy."

"actual location" as a separate concept from "user location" doesn't make sense to me. I see no reason the "actual location" shouldn't be the "user location". I'm definitely tripping over your terminology, but I'll try to parse what you mean...


"Since it is no longer an HTML document the file is disassociated with programs that only handle HTML. This is done by removing it from the systems HTML storage directory and moving it to the XML storage directory."

"All user generated XML files will always be stored in /Users/Whoever/XML."

Where did the concept of the system's HTML & XML storage directories come from? What purpose does that have? If you want to track files by their type, why not simply create an index on type instead of changing their "actual location"?


"Common operations, like the user moving the file from one 'folder' to another don't required _any_ system level change - because that is user level metadata that the system doesn't care about and changing it does not actually change the files location."

If you're storing path/filename information in a database instead of in the underlying ext3 file system, that's true. However, you still need some kind of index, and that duplicates the functionality of the file system. You might as well map the database to inodes directly and completely bypass ext3 filenames if they don't serve any purpose other than being a transient identifier.
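
Concretely, I'm picturing something like this (schema invented purely for illustration) - the same cheap type lookups, but the user's path stays the real path and there is no second, hidden layout to keep coherent:

```python
import os
import sqlite3

# An index on type, keyed directly on inodes: the user's location IS the
# location; type is just an indexed column.
db = sqlite3.connect("index.db")
db.execute("""CREATE TABLE IF NOT EXISTS files (
                  inode INTEGER PRIMARY KEY,
                  path  TEXT NOT NULL,
                  type  TEXT NOT NULL)""")
db.execute("CREATE INDEX IF NOT EXISTS by_type ON files(type)")

def index_file(path, ftype):
    db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
               (os.stat(path).st_ino, path, ftype))

# Type lookups are as cheap as in the type-directory scheme:
xml_files = db.execute("SELECT path FROM files WHERE type = 'xml'").fetchall()
```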


"I understand and appreciate the argument for using a surrogate key for this. But I personally like the idea of it being human readable for system admins and developers, i.e. the OS defines how and where files are stored - and that is where they will be. I see the ideal solution is that directories, which are never exposed directly to users, carry human readable conventional names as they always did."

What you are describing is that every file type is hard-coded to be located in a folder matching its file extension, but it perplexes me why that would be more useful for either the file system itself or the administrator than just using the "user's location" directly. The intermediate file system representation you've chosen isn't terribly useful; I question why bother having it at all.



"1. Want to find all the mp3 files a user has. No indirect lookups required - ask the OS for a view on mp3 files and you get them all..."

Ok, but what good is this to an administrator who can't see any of the identifying metadata? The only thing I could ascertain from a raw directory listing is how many files the user has and how big they are. I'd have to open every single file to check what it contains.

"2. Most user level operations don't require any type of system level change..."

I don't think "system-level changes" were ever a problem to start with.
The indirection will add some overhead when accessing files and running integrity checks. No biggie though.

"3. Applications can easily create customized and optimized views of files. They can leverage custom user level metadata to do lots of neat things to make peoples lives easier. They can get creative again."

In terms of UI the backend really shouldn't matter.


I'd prefer stronger FS integration myself, and your intermediate representation strikes me as weird, but of course it could be made to work.

Reply Parent Score: 2

galvanash Member since:
2006-01-25

"actual location" as a separate concept from "user location" doesn't make sense to me. I see no reason the "actual location" shouldn't be the "user location". I'm definitely tripping over your terminology, but I'll try to parse what you mean...


That is the very heart of the matter though... A piece of data has to live somewhere - it has to have an address, so to speak. In Linux this is ultimately an inode, but my example is meant to be conceptual, not literal.

I was using existing file system paradigms to describe the goal. Ultimately, all it boils down to is that the system (as in the OS) has a meaningful view into the user's storage area - not simply in the sense that it knows all the inodes of their files; it needs to know certain things about those files and their relationships with each other... in my example, type information. I was using directory structure as an index, but it can be any type of index.

The real objective is to move "user" metadata out of the realm of the OS - metadata, such as arbitrary file organizations and relationship mappings, that doesn't mean anything at all to the OS or its applications. Push it out of the OS and into the hands of the user in a way that allows them (and applications) to create their own views into this information, while maintaining a completely coherent view for the OS to work with.

A simple list of inodes doesn't cut it. The OS has relationships it has to manage too. Current file systems use directory structure for this - I was just trying not to stray too far from the existing paradigm. Ultimately it is an implementation detail - you simply want to _expose_ it as a directory structure and make it human-readable (for admins and developers).

"Where did the concept of the system's HTML & XML storage directories come from? What purpose does that have? If you want to track files by their type, why not simply create an index on type instead of changing their 'actual location'?"


Again, just an example - not meant literally. The point is that an index IS a location, and vice versa. Indexes resolve to an inode; so do paths. The difference is that in current file systems a file can only have one path (excluding tricks like hard links and whatnot), and users can alter it at will.

When I say "actual location", I merely mean system-maintained - as in, the user cannot alter it. Think of it in database terms... the system manages tables; users and apps only work with views.
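
You could even make the analogy literal. A toy sketch (SQLite; the names are invented for the example):

```python
import sqlite3

# The database analogy made literal: the system owns the base table;
# users and applications only ever see views derived from it.
db = sqlite3.connect(":memory:")
db.executescript("""
    -- System-managed: apps and users never write this directly.
    CREATE TABLE storage (
        key  TEXT PRIMARY KEY,   -- immutable, system-assigned
        type TEXT NOT NULL,
        name TEXT NOT NULL       -- user-visible filename, mere metadata
    );

    -- What an XML-handling application gets to work with:
    CREATE VIEW xml_view AS
        SELECT key, name FROM storage WHERE type = 'xml';
""")
```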

"What you are describing is that every file type is hard-coded to be located in a folder matching its file extension, but it perplexes me why that would be more useful for either the file system itself or the administrator than just using the 'user's location' directly. The intermediate file system representation you've chosen isn't terribly useful; I question why bother having it at all."


I was only trying to give an off-the-cuff example of _why_ you would want to push user metadata out of the system realm. I agree it is completely contrived and not a very useful example. I suspect that at the system level you would want lots of indexed views into block storage for different purposes - the real point is keeping them coherent, and to do that users cannot alter them.

You give the impression that you think inodes are fine for the system's purposes, because they are unique, don't change, and users cannot alter them. That is all true, but inodes are not enough - you also need metadata, and you need to establish relationships between files.

Right now the problem is Windows, OSX, Linux, whatever... they all use directory hierarchy as metadata. E.g. config files go in /etc. That is metadata. We keep users from mucking it up by locking them out of it unless they have permissions.

We need the exact same thing for USER data... user data needs system-level metadata too. It lets the OS and apps do a whole lot of very useful things they cannot do efficiently now. The reason we don't have that in current file systems is that users can alter file paths, thus destroying the metadata...

That is all I am really getting at. I don't have a master plan on how to build it right - but I do understand why it is needed.

"Ok, but what good is this to an administrator who can't see any of the identifying metadata? The only thing I could ascertain from a raw directory listing is how many files the user has and how big they are. I'd have to open every single file to check what it contains."


If you have root access you can just use their metadata... In other words, if you have access to the file you would have access to the metadata (since the same user owns both). I'm not trying to keep the OS or admins from seeing users' metadata; I'm trying to keep users from altering metadata that the system generates and uses for dealing with their files.

"I don't think 'system-level changes' were ever a problem to start with."


That is where I totally disagree. If I am a developer and I want to make a program that works with some particular document type...

I do not want to make the user go find the files I work with for me, and I don't want to go searching for them... I just want to ask the OS to hand them to me. I (as the application) can present them to the user in a better manner than the OS can - because I know how I am going to be using them. It boils down to being application-centric, not document-centric.

Establishing conventions for this, e.g. "My Documents", is not good enough... it needs to be strictly enforced and maintained.
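
Reusing the hypothetical Store from my earlier sketch, the whole "hand me my documents" interaction collapses to something like this:

```python
# An editor that handles markdown never shows a file picker; it asks the
# OS for its view and presents the documents however suits the application.
store = Store()  # the toy Store class sketched earlier
store.create("markdown", "todo.md", "# Todo")
store.create("markdown", "notes.md", "# Notes")

for key in store.view("markdown"):
    f = store.files[key]
    print(f"{f['name']}  ({len(f['data'])} bytes)")  # the app's own UI
```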

Reply Parent Score: 2

hhas Member since:
2006-11-28

"Let's pick a complex operation. You have an HTML file that you have manually transformed into XML. You now want to tell the system that its type has changed. You would normally do that by just changing the file extension (or its type metadata), and you are done.

I envision that something like this would involve the following (conceptually speaking):

[...]"


I think at this point you're getting too wrapped up in implementation ideas, which may be good or bad in themselves but either way are distracting from the big picture, i.e. what are the contracts you want the system to make with the user regarding the safety, security and accessibility of their data? Musings on possible behind-the-scenes implementations are way, way down the list by comparison.


Incidentally, representing a single piece of data in multiple file formats is an especially lousy example to use. REST already figured out the correct answer: you continue to present that information as a single resource, publicly announce which representations it is available in along with any hints as to which are optimal and which are lossy, and let the client specify which of those representations it wants. The actual conversion processes - be they manual or automatic, cached or on-the-fly - are then implementation details.
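
In HTTP terms it looks like this (the URL is a placeholder, and I'm assuming a server that actually negotiates):

```python
from urllib.request import Request, urlopen

# One resource, many representations: the client names the representation
# it wants via the Accept header; the server answers with whichever it
# chose. The URL below is a placeholder for the example.
req = Request("http://example.com/document",
              headers={"Accept": "application/xml"})  # or "text/html"
with urlopen(req) as resp:
    print(resp.headers.get("Content-Type"))  # the representation we got
    body = resp.read()
```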

The one useful point that does arise from this sort of example is that a traditional filesystem cannot natively support this sort of presentation, lacking 1. the ability to cache multiple representations at a single location, and 2. any means for the user to negotiate or specify type requirements when accessing data at that location. Whereas a metadata-driven storage system could do stuff like this in its sleep.

Reply Parent Score: 1

galvanash Member since:
2006-01-25

"I think at this point you're getting too wrapped up in implementation ideas, which may be good or bad in themselves but either way are distracting from the big picture, i.e. what are the contracts you want the system to make with the user regarding the safety, security and accessibility of their data? Musings on possible behind-the-scenes implementations are way, way down the list by comparison."


I agree, actually. I was just trying to make an example in the context of existing file system paradigms. Frankly, I have not thought it all the way through.

"The one useful point that does arise from this sort of example is that a traditional filesystem cannot natively support this sort of presentation, lacking 1. the ability to cache multiple representations at a single location, and 2. any means for the user to negotiate or specify type requirements when accessing data at that location. Whereas a metadata-driven storage system could do stuff like this in its sleep."


That was kind of what I was trying to get at. I know what I want, but not how it should work exactly.

In a nutshell, I want the OS to manage and maintain coherent views into user data, so that as an application developer I can simply ask for them. I want the OS to expose metadata facilities that I can use to index this data, but in a way that the system and other applications can leverage if they want or need to. And I want to let users (if they choose to) create and manage their own metadata for their own purposes.

How it should work... I don't know exactly. But I am a firm believer in application-centric interfaces, and this kind of plumbing really is quite critical if you want to be able to do them right (and still maintain data democracy).

Reply Parent Score: 2