Linked by Thom Holwerda on Wed 25th Jul 2012 22:18 UTC
The article I'm about to link to, by Oliver Reichenstein, is pretty terrible, but it's a good way for me to bring up something I've been meaning to talk about. First, the article: "Apple has been working on its file system and with iOS it had almost killed the concept of folders - before reintroducing them with a peculiar restriction: only one level! With Mountain Lion it brings its one folder level logic to OSX. What could be the reason for such a restrictive measure?" So, where does this crusade against directory structures (not file systems, as the article aggravatingly keeps stating) come from?
hhas
Member since:
2006-11-28

For anyone still reading - and with profuse apologies in advance for mad length - I've been giving the topic a bit more thought. Being an IPC weenie, I will tend to cast any given solution in terms of "I know, let's use some IPC." (And now we have two problems.) So I'm coming at everything from a "it's a communication problem" perspective, just so you're aware.


First, let's reiterate the root requirements: how to ensure a user's data is safe, secure, and quickly and easily accessible - and all of this in a highly heterogeneous, networked environment - without them having to do any of the tedious clerical work that this involves?

If we phrase this in terms of a contract between computers and users, then straight off the top of my head:

1. This has to work across:

- individual user devices (not just general-purpose PCs but also phones, tablets, AV systems, games consoles, and anything else you can think of)
- home and business intranet-based NAS and server boxes
- big internet-based commercial cloud storage services.

2. It'll require behaviours such as:

- automatic replication and redundancy (including all the challenges that come with synchronising user changes across all data stores)
- full revision history
- security and trust (not only must accidental data loss never occur, but all data must be heavily protected against malicious access)
- transparency and automation-based ease of use (since it must cater to users from all walks of life).

Not a comprehensive list by any means, but enough to illustrate. It's no small request either, so any attempt at addressing it is going to have to work extremely hard at keeping a lid on complexity.


Sounds daunting, I know, but this is actually something that the early Unix world had a fantastic knack for: take a large, complicated, scary problem and solve it by the highly creative, insightful and relentlessly parsimonious application of what few crazily limited resources were available to the task. It's only later, as growing wealth and maturity generate comfort and complacency, that such careful habits become non-essential, and the old talent and tricks for efficient, effective problem solving are forgotten or lost in the middle-age spread.

For example, think about what the hierarchical Unix file system [plus its expanded and refined Plan9 descendant] represents. From a default perspective (such as the one Thom is seeing), it's just a simple 1:1 representation of the contents of one or more disk-type storage devices wired to the local system.

However, those old Unix guys were terrific at spotting opportunities for reusing concepts and behaviours, so it also works as a collection of endpoints onto various services - device drivers, network-mounted drives, etc. - thanks to stuff like Unix domain sockets, which are in turn only a tail's shake away from Internet sockets. Imagine the increased load and complexity on early Unix had they simply ploughed in with a wealth of tools and manpower at their disposal, creating a completely new interaction model as each new requirement presented itself. (They still missed some opportunities - hence Plan9 - but for a first crack at the problem it was pretty damn good.)
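To make that reuse concrete, here's a trivial sketch (Python purely for brevity; the paths are the usual Unix ones): the same handful of calls reads an ordinary on-disk file and a kernel service alike, because both are just names in the one namespace.

import os

# The same open/read/close calls work on very different "things",
# because the file-system namespace exposes them all as files.
for path in ("/etc/hosts",      # plain on-disk file
             "/dev/urandom"):   # kernel service hiding behind a filename
    fd = os.open(path, os.O_RDONLY)
    chunk = os.read(fd, 16)     # identical read interface for both
    os.close(fd)
    print(path, "->", chunk)

Swap in a Unix domain socket path and the addressing stays the same; only the open becomes a connect.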


This parsimonious, holistic approach to all-over system design seems to have been all too often squandered or forgotten over the years (e.g. the failure of Linux DEs to follow Unix Philosophy [1]), but this is surely the right time to get back to it. Like it or not, the entire concept of what a computing system is and what it should be is rapidly and radically changing.

Just as the original primary (RAM) vs secondary (disk) storage kludge has created all sorts of awkward divisions and clumsiness, so too has local vs remote storage. We have one protocol for accessing local data (the hierarchical file system) and a whole plethora of them for accessing remote data (RESTful HTTP being just one example). These are divides drawn along technological lines - originally out of necessity, but now through inertia, laziness and lack of vision. The question of why people use technology has been lost amid a microscopic focus on how they currently do it, with no regard to whether that is still optimal.
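A throwaway sketch of that split (the temp file stands in for "my local notes", example.com for a remote copy): conceptually it's the same operation - "give me my data" - but the two sides go through completely unrelated mechanisms, libraries and failure modes.

import os
import tempfile
import urllib.request

# Local data: one mechanism - the hierarchical file system, addressed by path.
with tempfile.NamedTemporaryFile(delete=False) as f:   # stand-in for "my notes"
    f.write(b"my notes")
    local_path = f.name
with open(local_path, "rb") as f:
    local_copy = f.read()
os.unlink(local_path)

# Remote data: a different protocol, API and error model entirely.
with urllib.request.urlopen("http://example.com/") as r:   # stand-in remote copy
    remote_copy = r.read()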

Such well-worn ruts may have become familiar and comfortable - even a position of power for those most practised in negotiating them - but they are going to become a liability that even the most conservative nerds/geeks will not be able to afford to ignore.


What's needed is to step all the way back and try slicing the entire problem along new and different (even novel) lines, i.e. according to [ideal] usage. And then redefine the technology so that in future it fits the way that users should interact with their data, rather than forcing users to adapt themselves to the current technology with all its legacy baggage and myriad complexities and faults.

Change is already underway, of course, but even from my largely uneducated viewpoint it looks completely piecemeal with no clear overarching strategy. For example, Mountain Lion's Core Data (data storage) framework can now use iCloud as its backing store, but this is at the application level, and just one particular framework on one particular OS using one particular cloud. In fact, ML now has no less than three different ways to interact with iCloud.

Now, I am all for 'let a thousand flowers bloom' as far as research projects go, but when it comes to production systems to be used by millions, a coherent overarching strategy is absolutely essential if complexity is to be managed at all. Such as: pushing the functionality as far down in the system as it'll possibly go (i.e. right down in the bowels of the OS, alongside the file and network subsystems), defining [open] standards and protocols to be used all the way throughout, and ruthlessly consolidating and eliminating duplication of effort and needless redundancy (cf. Unix's 'everything is now a file' idea that instantly provided all clients with a powerful, clearly defined IPC mechanism essentially for free).
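To see that 'IPC essentially for free' point in action, a minimal sketch: a named pipe is created through the ordinary file-system calls, and from then on two processes talk through it using nothing but open, read and write (the /tmp path is arbitrary; Unix only).

import os

path = "/tmp/demo_fifo"          # just a name in the file-system namespace
if not os.path.exists(path):
    os.mkfifo(path)              # the IPC channel is created like any other file

if os.fork() == 0:               # child process: one end of the conversation
    with open(path, "w") as w:   # plain open/write - no special IPC API
        w.write("hello from the other process\n")
    os._exit(0)
else:                            # parent process: the other end
    with open(path) as r:        # plain open/read
        print(r.read().strip())
    os.wait()
    os.remove(path)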

Obviously, for OSes and applications, the traditional device-centric patterns work well as a whole, providing a more than adequate benefit-vs-drawback ratio, so they will no doubt continue to rely on the existing hierarchical file system to manage their own resources.

OTOH, what's needed for user data is a user data-centric, not device-centric, approach, and that means decoupling user data management from the nitty-gritty implementation-dictated details of file systems, databases, etc. and trying as much as possible to create a single, standard interaction model for accessing user data regardless of how and where it is stored.


The more you think like this, the more you realise just how far beyond the local file system this goes. For example, what is LDAP if not a half-assed reinvention of the 'Unix file system as namespace' principle? And what are the 'everything is a file' and REST interaction philosophies, if not two sides of the same damn coin? All this and more is just crying out for massive consolidation.
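Put them side by side and the resemblance is hard to miss - the names below are invented, but a Unix path, an LDAP DN and a REST URL are all just hierarchical addresses for a resource:

# Three notations, one underlying idea: a hierarchical namespace.
unix_path = "/org/acme/staff/jsmith/report"                # file-system path
ldap_dn   = "cn=report,uid=jsmith,ou=staff,o=acme"         # LDAP DN (root at the right)
rest_url  = "http://example.com/acme/staff/jsmith/report"  # REST resource URL

# Each names the same logical thing by walking a tree;
# only the syntax and the transport differ.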

So the hierarchical file system will still be required; it just won't be something users interact with directly any more. Instead of accessing file and network subsystems directly, userland processes will talk to a single standard 'data management' subsystem whenever they need to read or write user data. Once that decoupling is completed, the system is free to deliver on all of the wonderful promises made in the contract above. Plus, many of the file system-imposed problems currently bedevilling users (backup hell, iOS data exchange, etc.) simply cease to exist!
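Nothing like this exists today, so treat the following purely as a sketch of the kind of contract I mean - a made-up 'user data' interface that applications would code against instead of the file and network APIs, with placement, replication, history and sharing handled behind it:

from abc import ABC, abstractmethod
from typing import Iterable, List, Optional

class UserDataStore(ABC):
    """Hypothetical 'data management' subsystem interface - pure speculation.

    Applications name and describe their data; where and how it physically
    lives (local disk, NAS, cloud) is the subsystem's problem, not theirs.
    """

    @abstractmethod
    def put(self, name: str, data: bytes, tags: Iterable[str] = ()) -> None:
        """Store a new revision of the named item."""

    @abstractmethod
    def get(self, name: str, revision: Optional[int] = None) -> bytes:
        """Fetch the latest (or a specific historical) revision."""

    @abstractmethod
    def history(self, name: str) -> List[int]:
        """List stored revisions - the contract includes full revision history."""

    @abstractmethod
    def share(self, name: str, with_user: str) -> str:
        """Grant access and return an opaque reference, instead of mailing copies."""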


Ultimately then, it's all a matter of good communication. Admittedly, before the mechanical machine-to-machine challenges are addressed, some additional effort may still be needed on the geek-to-geek front. But with luck Thom & co will become believers yet... ;)


[1] http://www.faqs.org/docs/artu/ch01s06.html

[2] http://www.osnews.com/thread?528519


Alfman Member since:
2011-01-28

hhas,

Still good ideas, but I think you should take a closer look at NT's DFS and the open-source AFS to consider the work already done to separate virtual organization from physical organization. The separation has largely been achieved in these examples; what they are lacking is the integration of indexed metadata.


Since we're throwing ideas around:

We might also want to consider how a "global file system" would work. It'd obviously make use of local caching and maybe sophisticated conflict resolution (as in source control). However, it would put an end to the need to "email" files: you'd only have to give the other user a secure link into the global file system, and they could access it and possibly even work on it with you (avoiding the all-too-common use case of emailing files back and forth).
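Purely as a thought experiment, handing out such a link might look something like this - the GlobalFS client and everything in it is invented for illustration:

# Everything below is invented; no such library exists.
class GlobalFS:
    """Stand-in client for the imagined global file system."""
    def open(self, gfs_url):
        return SharedDocument(gfs_url)

class SharedDocument:
    def __init__(self, url):
        self.url = url
    def grant(self, user, rights):
        # A real system would record the grant and mint a signed, revocable token.
        return "%s?granted_to=%s&rights=%s" % (self.url, user, "+".join(rights))

gfs = GlobalFS()
doc = gfs.open("gfs://alice/projects/budget.ods")
link = doc.grant(user="bob@example.com", rights=("read", "annotate"))
print("Send Bob this link instead of an attachment:", link)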


hhas Member since:
2006-11-28

The TL;DR version...

Imagine you have two pneumatic tubes sitting next to each other. One leads to local (file system) storage, the other to remote (network) storage. Data is stored by inserting it into one or other of these tubes.


Today's systems require that a client decide which of the two tubes they should drop their data down. Furthermore, these tubes are slightly different shapes, so the client has to do some extra wrangling to fit their data into a particular tube.

This arrangement is acceptable enough when the computer is the client. Machines are great at performing tedious, fiddly, repetitive tasks according to a predefined set of rules over and over again without error or omission - it's what they were designed to do.

However, the same approach really sucks for human clients. People already have far better things to do with their time than micromanage thousands of resources across multiple locations with perfect diligence and accuracy, never mind doing it all via such a primitive, labor-intensive interface.

Yet the only reason humans are having to work with the same set of tubes as the computer is that back when all this technology was being invented, this was the absolute best that could be achieved with the seriously limited resources of the time. Nowadays, we have a wealth of resources spilling out of our ears, but we're still doing things the same old way because nobody's yet bothered to build something better.


What users should instead be presented with is a single, standard tube into which they throw all of their data without any additional fiddling or fuss. That tube then leads into a 'user data management' subsystem, and it's up to that automated system to determine which of the two original tubes is appropriate, and perform any additional fiddling required to store and retrieve the data using that. (And change-track, replicate, synchronize, secure, categorise, translate, share, etc, too.)
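Inside that single tube, the routing decision the user no longer has to make could be as simple as the toy dispatcher below - the backends and the 'hints' are pure invention, standing in for whatever the system knows about the item (size, sensitivity, which devices need it):

def write_local(name, data):     # stand-in backends; print instead of store
    print("local disk  <-", name)

def write_cloud(name, data):
    print("cloud store <-", name)

def store(name, data, hints):
    """Toy dispatcher: the data-management layer, not the user, picks the tube."""
    if hints.get("shared") or hints.get("huge"):
        write_cloud(name, data)
    else:
        write_local(name, data)
    # ...plus replication, revision history and sync in anything real.

store("holiday-photos", b"...", {"shared": True})
store("tax-return",     b"...", {"sensitive": True})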

For this to work, though, you must totally decouple the user from the physical means and methods of data storage, because the very concept of 'physical' containment - which is what the hierarchical file system is built on - becomes absolutely meaningless in an environment where data may exist in any location [1] at any time, and frequently in several places at once!

--

[1] Some kinds of data might not even end up in a file system, but in [e.g.] a relational/non-relational database instead. All sorts of fascinating new possibilities could arise once the 'single user data pipe' abstraction is in place; computer scientists should be falling over themselves to explore it.


hhas Member since:
2006-11-28

The STL;SDR version:

The goal is to replace the current imperative model of [user] data storage and management with a declarative one.
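Very roughly, and purely as an illustration of the distinction (the helpers and the declare() call are all hypothetical):

# Imperative (today): the user performs and sequences every step themselves.
def back_up_by_hand(path):
    copy_to_external_drive(path)        # hypothetical helper
    upload_to_cloud_account(path)       # hypothetical helper
    note_new_revision_somewhere(path)   # hypothetical helper

# Declarative (the goal): the user states the outcome; the system owns the steps.
report_policy = {
    "replicas": 2,                       # keep at least two copies, somewhere safe
    "history": "forever",                # never discard an old revision
    "share_with": ["bob@example.com"],   # access by reference, not by mailed copy
}
# declare("report.odt", report_policy)   # a single, hypothetical call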
