Linked by Thom Holwerda on Wed 25th Jul 2012 22:18 UTC
The article I'm about to link to, by Oliver Reichenstein, is pretty terrible, but it's a good way for me to bring up something I've been meaning to talk about. First, the article: "Apple has been working on its file system and with iOS it had almost killed the concept of folders - before reintroducing them with a peculiar restriction: only one level! With Mountain Lion it brings its one folder level logic to OSX. What could be the reason for such a restrictive measure?" So, where does this crusade against directory structures (not file systems, as the article aggravatingly keeps stating) come from?
Alfman, Member since: 2011-01-28

hhas,

You've got some great ideas.

"I see no reason why parent-child and sibling relationships couldn't be expressed via metadata as well. Point is, users should neither be constrained by it nor forced to maintain it themselves."


I didn't mean to imply that a directory structure cannot be considered a kind of metadata. I'd point out that even today, even though we think of directories as being special, the directory entry really IS metadata for the file (which is represented by an inode inside the Linux kernel, not by the filename metadata); it just happens to be the only indexed metadata available to us in *nix. That's what's restrictive. We don't need to get rid of the directory metadata, we just need to add more indexed metadata types. "Directory" metadata would then be optional, like other metadata types. It should not be treated specially. As long as it's available, that's what's important.
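To make that concrete, here's a minimal sketch (a made-up SQLite schema, not any real filesystem's) of what "directory membership as just one more indexed metadata type" could look like - the parent relationship is simply another key that an object may or may not have:

    # Hypothetical sketch: "parent" is just another indexed metadata key,
    # not a privileged structure. This models no real filesystem.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE objects (id INTEGER PRIMARY KEY);      -- roughly like inodes
        CREATE TABLE metadata (
            object_id  INTEGER REFERENCES objects(id),
            meta_key   TEXT,
            meta_value TEXT
        );
        CREATE INDEX idx_meta ON metadata (meta_key, meta_value);  -- every key indexed
    """)

    def add_object(**meta):
        cur = db.execute("INSERT INTO objects DEFAULT VALUES")
        oid = cur.lastrowid
        db.executemany("INSERT INTO metadata VALUES (?, ?, ?)",
                       [(oid, k, v) for k, v in meta.items()])
        return oid

    photos = add_object(name="photos", type="container")
    add_object(name="cat.jpg", parent=str(photos), author="alice")  # 'parent' is optional
    add_object(name="notes.txt", author="alice")                    # no parent at all

    # A directory listing is just one indexed query among many.
    rows = db.execute("SELECT object_id FROM metadata WHERE meta_key='parent' AND meta_value=?",
                      (str(photos),)).fetchall()
    print(rows)   # -> [(2,)] : only cat.jpg "lives inside" photos

The point of the sketch is only that a directory lookup becomes just another indexed query, on equal footing with queries by author, tag, date, or anything else.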


"Once properly decoupled, users and applications can express and view whatever shaped graph(s) they like"

I'm not sure if using a full graph instead of a tree would be more confusing for users. It might break a lot of the recursive directory operations that are possible now. For example, deleting a directory and its children could be extremely dangerous. However, I'm not opposed to the idea and I'd like to see it implemented.
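To spell out that worry with a toy example (plain Python, purely illustrative): in a tree every item has exactly one parent, so recursively deleting a container is unambiguous; in a general graph an item can be reachable from several containers, so a naive tree-style delete destroys data that other "parents" still reference (and a cycle would make it recurse forever without a visited set):

    # Toy illustration of why recursive delete is riskier on a graph than on a tree.
    children = {
        "projects":     ["report", "shared-notes"],
        "archive":      ["shared-notes"],   # "shared-notes" has two parents
        "report":       [],
        "shared-notes": [],
    }

    def naive_delete(node, store):
        # Tree-style delete: fine for trees, wrong for shared or cyclic structures.
        for child in store.get(node, []):
            naive_delete(child, store)
        store.pop(node, None)

    naive_delete("projects", children)
    print("shared-notes" in children)   # False - gone, even though "archive" still referenced it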


"It works better once you start thinking about how to decompose application functionality as well"

I've often wished we had a way to treat all data the same regardless of what it is. Emails shouldn't be managed under a separate roof or filing system from any of my other documents. If I want to create a directory (you probably despise that terminology, can we substitute "container"?) to hold a specific set of emails and other documents, I should be able to - nothing is special about the email. I might even like to do the same thing at a more fine-grained level, using the filing system for email addresses & contacts.

All applications on such a platform should use the standard data types, so contacts wouldn't have to be "synchronized" - they'd simply be shared.
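A rough sketch of that "shared, not synchronised" idea (all names invented for illustration): applications keep references to a single canonical contact item rather than each holding its own copy, so an update in one place is immediately the update everywhere:

    # Hypothetical sketch: one canonical contact record, referenced (not copied)
    # by any application that needs it, so there is nothing to synchronise.
    contacts = {
        "contact:42": {"type": "contact", "name": "Ada Lovelace",
                       "email": "ada@example.org"},
    }

    mail_app_recipients = ["contact:42"]   # the mail client stores a reference
    calendar_attendees  = ["contact:42"]   # the calendar stores the same reference

    # Update the canonical record once...
    contacts["contact:42"]["email"] = "ada.lovelace@example.org"

    # ...and every application sees the change, because they share the item.
    print(contacts[mail_app_recipients[0]]["email"])
    print(contacts[calendar_attendees[0]]["email"])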


I think I'm loving it ;)


"A one-armed straitjacket is still a straitjacket; you've just gotten so used to living with it that it's never occurred you could take it off at any time."

It's not a straitjacket though because it's optional. Having a hammer doesn't mean I always have to use it. Just because a hammer can't do everything doesn't mean it's not worth having.

"Treating one particular idiom/organisation/representation as a special case that is hard-wired into the storage system requires a duplication of effort, and will colour and constrain everything else that the system might want to do."

I never really wanted it to be special; I wanted the concept of hierarchies to be integrated alongside other forms of metadata. There's no reason the computer should not be able to arrange things in hierarchies when that's what the user explicitly wants to do, which is what I've been trying to point out since my first post.


"Meanwhile, since the system now has unimpeded control over how and where everything is physically stored, it can make whatever rule-driven arrangements it deems best:"

I agree with the idea; however, the "problem" was never with directories in the first place, it's with how they're implemented. AFS is a great example of a file system that transparently handles local/remote storage without regard to a file's path. I agree very strongly that a filing system's organization should not drive (or be driven by) its storage requirements.


hhas, Member since: 2006-11-28

"You've got some great ideas."


FWIW, I do have a rather eclectic background (amongst other things, I studied art, not CS). One of its benefits is being able to bring a refreshingly unique perspective to the same old set of problems. OTOH, one of its drawbacks is not owning nearly enough brains to put any proposed solutions into practice myself. ;)

On this occasion, I'll let you in on my secret: I've had quite a bit of experience at wrapping my head around multiple mixed paradigms simultaneously. For example, the idea that you can present a 'virtual', idealised tree (or other shaped) representation of data which is bound together by relationships, not containment, is something I've already seen extensively done by the Apple Event Object Model (the elegant but widely misunderstood RPC+query-driven foundation of 'AppleScriptable' applications). And the notion that you can decouple the roles of data identification, data representation, and the physical data storage behind it is taken directly from my time designing and implementing RESTful HTTP interfaces for a distributed application (REST being another very elegant and also widely misunderstood conceptual model for communicating between data management systems at a very high level).
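For anyone who hasn't run into that decoupling before, here's a very rough sketch (invented names, modelling no real API): the identifier names the thing, the representation is negotiated at access time, and the physical storage behind both is a private detail that can change freely:

    # Rough sketch of decoupling identification, representation, and storage.
    # Everything here is invented for illustration; it models no real API.
    import json

    class ItemStore:
        def __init__(self):
            self._blobs = {}                      # physical storage: a private detail

        def put(self, item_id, data):
            self._blobs[item_id] = data           # could just as well be disk or cloud

        def get(self, item_id, media_type="application/json"):
            data = self._blobs[item_id]
            # The representation is negotiated, not baked into the identifier.
            if media_type == "application/json":
                return json.dumps(data)
            if media_type == "text/plain":
                return ", ".join(f"{k}={v}" for k, v in data.items())
            raise ValueError("unsupported representation")

    store = ItemStore()
    store.put("item:report-2012", {"title": "Q2 report", "author": "alice"})
    print(store.get("item:report-2012"))                    # JSON view
    print(store.get("item:report-2012", "text/plain"))      # plain-text view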

So all I'm really doing here is taking a bunch of ideas I've previously seen conceived and used elsewhere by folks far smarter than me, and synthesising a nice solution to the current predicament out of them. Like I say, it's good to know something of history. (And, hey, on the off-chance somebody here makes their future billions after being inspired by my waffle, I hope they remember theirs too. ;)

So yes, I quite agree they are great ideas... I just can't take personal credit for them is all. ;)


hhas, Member since: 2006-11-28

For anyone still reading - and with profuse apologies in advance for mad length - I've been giving the topic a bit more thought. Being an IPC weenie, I will tend to cast any given solution in terms of "I know, let's use some IPC." (And now we have two problems.) So I'm coming at everything from a "it's a communication problem" perspective, just so you're aware.


First, let's reiterate the root requirements: how to ensure a user's data is safe, secure, and quickly and easily accessible - and all of this in a highly heterogeneous, networked environment - without them having to do any of the tedious clerical work that involves?

If we phrase this in terms of a contract between computers and users, then straight off the top of my head:

1. This has to work across:

- individual user devices (not just general-purpose PCs but also phones, tablets, AV systems, games consoles, and anything else you can think of)
- home and business intranet-based NAS and server boxes
- big internet-based commercial cloud storage services.

2. It'll require behaviours such as:

- automatic replication and redundancy (including all the challenges that come with synchronising user changes across all data stores)
- full revision history
- security and trust (not only must accidental data loss never occur, but all data must be heavily protected against malicious access)
- transparency and automation-based ease of use (since it must cater to users from all walks of life).

Not a comprehensive list by any means, but enough to illustrate. It's no small request either, so any attempt at addressing it is going to have to work extremely hard at keeping a lid on complexity.
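Purely to make the contract above less abstract, here's one speculative shape it could take as a programming interface - the method names and semantics are mine, not any existing system's:

    # Speculative sketch of the user-facing contract as an abstract interface.
    # Method names and semantics are invented here for illustration only.
    from abc import ABC, abstractmethod
    from typing import Optional

    class UserDataService(ABC):
        @abstractmethod
        def put(self, item_id: str, data: bytes) -> str:
            """Store data. Replication and redundancy happen behind the scenes;
            returns a revision identifier, since full history is kept."""

        @abstractmethod
        def get(self, item_id: str, revision: Optional[str] = None) -> bytes:
            """Fetch the latest (or any earlier) revision, wherever it physically
            lives: local device, home NAS, or commercial cloud."""

        @abstractmethod
        def grant(self, item_id: str, principal: str, rights: str) -> None:
            """Security and trust: access is denied unless explicitly granted."""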


Sounds daunting, I know, but this is actually something that the early Unix world had a fantastic knack for: take a large, complicated, scary problem and solve it by the highly creative, insightful and relentlessly parsimonious application of what few crazily limited resources were available to the task. It's only later, as growing wealth and maturity generate comfort and complacency, that such careful habits become non-essential, and the old talent and tricks for efficient, effective problem solving are forgotten or lost in the middle-age spread.

For example, think about what the hierarchical Unix file system [plus its expanded and refined Plan9 descendant] represents. From a default perspective (such as the one Thom is seeing), it's just a simple 1:1 representation of the contents of one or more disk-type storage devices wired to the local system.

However, those old Unix guys were terrific at spotting opportunities for reusing concepts and behaviours, so it also works as a collection of endpoints onto various services - device drivers, network-mounted drives, etc. - thanks to stuff like Unix domain sockets, which are in turn only a tail's shake away from Internet sockets. Imagine the increased load and complexity on early Unix had they simply ploughed in with a wealth of tools and manpower at their disposal, creating a completely new interaction model as each new requirement presented itself. (They still missed some opportunities - hence Plan9 - but for a first crack at the problem it was pretty damn good.)
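As a small, concrete reminder of that reuse: a Unix domain socket is addressed through exactly the same namespace as ordinary files, so the file system doubles as a directory of live service endpoints (standard Python, Unix-like systems only):

    # A Unix domain socket lives in the same namespace as ordinary files:
    # the "file system" doubles as a directory of service endpoints.
    import os
    import socket
    import stat

    path = "/tmp/demo.sock"            # an IPC endpoint, addressed like any other path
    if os.path.exists(path):
        os.unlink(path)

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)                  # the socket now appears in the filesystem namespace
    server.listen(1)

    print(os.path.exists(path))                  # True: `ls /tmp` will list it
    print(stat.S_ISSOCK(os.stat(path).st_mode))  # True: same namespace, different kind of thing

    server.close()
    os.unlink(path)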


This parsimonious, holistic approach to all-over system design seems to have been all too often squandered or forgotten over the years (e.g. the failure of Linux DEs to follow Unix Philosophy [1]), but this is surely the right time to get back to it. Like it or not, the entire concept of what a computing system is and what it should be is rapidly and radically changing.

Just as the original primary (RAM) vs secondary (disk) storage kludge has created all sorts of awkward divisions and clumsiness, so too has local vs remote storage. We've one protocol for accessing local data (the hierarchical file system) and a whole plethora of them for accessing remote data (RESTful HTTP being just one example). These are divides drawn along technological lines - originally out of necessity, but now through inertia, laziness and lack of vision. Why people use technology has got lost due to a microscopic focus on how they currently do it, with no regard to whether it's still optimal or not.

Such well-worn ruts may have become familiar and comfortable - even a position of power for those most practised in negotiating them - but they are going to become a liability that even the most conservative nerds/geeks will not be able to afford to ignore.


What's needed is to step all the way back and try slicing the entire problem along new and different (even novel) lines, i.e. according to [ideal] usage. And then redefine the technology so that in future it fits the way that users should interact with their data, rather than forcing users to adapt themselves to the current technology with all its legacy baggage and myriad complexities and faults.

Change is already underway, of course, but even from my largely uneducated viewpoint it looks completely piecemeal with no clear overarching strategy. For example, Mountain Lion's Core Data (data storage) framework can now use iCloud as its backing store, but this is at the application level, and just one particular framework on one particular OS using one particular cloud. In fact, ML now has no less than three different ways to interact with iCloud.

Now, I am all for 'let a thousand flowers bloom' as far as research projects go, but when it comes to production systems to be used by millions, a coherent overarching strategy is absolutely essential if complexity is to be managed at all. Such as: pushing the functionality as far down in the system as it'll possibly go (i.e. right down in the bowels of the OS, alongside the file and network subsystems), defining [open] standards and protocols to be used all the way throughout, and ruthlessly consolidating and eliminating duplication of effort and needless redundancy (cf. Unix's 'everything is now a file' idea that instantly provided all clients with a powerful, clearly defined IPC mechanism essentially for free).

Obviously, for OSes and applications, the traditional device-centric patterns work well as a whole, providing a more than adequate benefit-vs-drawback ratio, so they will no doubt continue to rely on the existing hierarchical file system to manage their own resources.

OTOH, what's needed for user data is a user data-centric, not device-centric, approach, and that means decoupling user data management from the nitty-gritty implementation-dictated details of file systems, databases, etc. and trying as much as possible to create a single, standard interaction model for accessing user data regardless of how and where it is stored.


The more you think like this, the more you realise just how far beyond the local file system this goes. For example, what is LDAP if not a half-assed reinvention of the 'Unix file system as namespace' principle? And what are the 'everything is a file' and REST interaction philosophies, if not two sides of the same damn coin? All this and more is just crying out for massive consolidation.

So the hierarchical file system will still be required; it just won't be something users interact with directly any more. Instead of accessing file and network subsystems directly, userland processes will talk to a single standard 'data management' subsystem whenever they need to read or write user data. Once that decoupling is complete, the system is free to deliver on all of the wonderful promises made in the contract above. Plus, many of the file system-imposed problems currently bedevilling users (backup hell, iOS data exchange, etc.) simply cease to exist!
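To picture that decoupling, here's a deliberately naive sketch (invented names, corresponding to no real OS interface): applications ask one data-management layer for items by identity, and that layer alone decides where the bytes live - local cache, NAS, or cloud - and how writes are replicated:

    # Naive sketch: one data-management front end, pluggable physical backends.
    # Nothing here corresponds to a real OS interface; it's illustrative only.

    class LocalCache:
        def __init__(self): self._data = {}
        def read(self, key): return self._data.get(key)
        def write(self, key, value): self._data[key] = value

    class CloudStore:
        def __init__(self): self._data = {}
        def read(self, key): return self._data.get(key)
        def write(self, key, value): self._data[key] = value

    class DataManager:
        """Applications talk only to this layer; where bytes live is its problem."""
        def __init__(self, backends):
            self.backends = backends

        def write(self, item_id, value):
            for b in self.backends:          # replicate everywhere (simplistic policy)
                b.write(item_id, value)

        def read(self, item_id):
            for b in self.backends:          # first backend that has the item wins
                value = b.read(item_id)
                if value is not None:
                    return value
            raise KeyError(item_id)

    dm = DataManager([LocalCache(), CloudStore()])
    dm.write("item:thesis-draft", b"chapter one...")
    print(dm.read("item:thesis-draft"))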


Ultimately then, it's all a matter of good communication. Admittedly, before the mechanical machine-to-machine challenges are addressed, some additional effort may still be needed on the geek-to-geek front. But with luck Thom & co will become believers yet... ;)


[1] http://www.faqs.org/docs/artu/ch01s06.html

[2] http://www.osnews.com/thread?528519
