Linked by Rufus Hamade on Mon 21st Feb 2005 23:12 UTC
General Development With the recent (or not so recent; I am a very slow writer) interest in database file systems, I've been thinking about what a typical user really wants from such a system. What would they use it for? What would we need to do to help them get the most from it? Are there any precedents that show how useful a database file system could be? If not, could we invent one? This led me to some "gedanken solutions" (like gedanken experiments, just with software) that I thought I'd distract you with.

Would users use their filesystem?
by CrapInMySocks on Mon 21st Feb 2005 23:19 UTC

If their filesystem is based on a database, I presume people would use it implicitly. The point is: would *developers* use it to create cooler apps? Probably. Why should finding files quickly (or content, for that matter) be left to the individual applications or the application layer? I think database technology has advanced far enough for these systems to be integrated in some way.

the BeOS point of view
by jonas.kirilla on Tue 22nd Feb 2005 00:00 UTC

BeOS has tried two different "database" filesystem approaches: BFS, with its extended attributes, indexes and query functionality, and the early BeOS storage database.

http://www.theregister.co.uk/2002/03/29/windows_on_a_database_slice...

Proof of concept
by Che Kristo on Tue 22nd Feb 2005 00:24 UTC

Be's BFS was years ahead of the market and such a pleasure to use. I think file systems will change not so much in concept as in their implementation and usability. Useful for Vfolders and the like, though.

Data shape
by emacs on Tue 22nd Feb 2005 00:35 UTC

If I understand databases correctly, what makes them great tools is their set-theoretical approach to data. Data get stuffed into nice, two-dimensional packages that can be indexed and managed with relative efficiency.
Reality often is more of a graph of arbitrary complexity, with gnarly cycles.
I submit that the challenge of managing the things you'd like to do with a filesystem may be similar to the challenge that a cartographer has when figuring out how to map a three-dimensional object to a two-dimensional space: there will be some 'fibs' told about scale, or the relationship of things within the map to each other to make it work.
If the task were not hard, it would have been done long ago.

Windows FS
by Phil on Tue 22nd Feb 2005 00:50 UTC

Oh... do I miss BeOS for the file system. That was the one thing I loved most about BeOS. I'm not sure how Mac OS X works, but in the latest few demo clips for the next version (I forget the name) it seems that the Mac filesystem might have some sort of DB too?!

Man, I only wish someone out there could make BFS or something similar for Windows. Yeah, yeah, I know... why would you want that? Sorry, but I still use Windows more than any other OS, for many reasons, with work and gaming at the top of the list. Ideally it would be an add-on at a good price. I am very, very, very afraid of WinFS. Very afraid.

RE: Windows FS
by Jon on Tue 22nd Feb 2005 00:53 UTC

> I'm not sure how Mac OSX works

Mac OS X Tiger is BeFS-plus. It supports everything BeOS did back then, plus the ability to search *inside* 20 file formats.

>I am very very very afraid of WinFS.

Why? It was cut from Longhorn anyway in order to make the deadlines. ;)

You don't have a clue...
by Joe User on Tue 22nd Feb 2005 01:19 UTC

Opera + Desktop Search
by e2mtt on Tue 22nd Feb 2005 02:09 UTC

Opera's M2 mail client does almost everything the author talks about, and he links to it at the end of the article. Anyone interested in data storage and retrieval should try it out... it is very tough to go back to an ordinary email program. Of course, some of the others are getting closer.

On a related note, Yahoo Desktop Search (a free version of X1), along with other desktop search tools to varying degrees, can search all your files almost instantly and integrates into Outlook. You can save searches, with many custom parameters, for immediate retrieval. Of course, it builds an index separately from the files, but I think that may be safer, because problems with the index would never damage the files themselves. I think the ability to search the contents of a file basically makes up for the lack of file attributes in Windows.
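To make the "index kept separately from the files" point concrete, here is a minimal inverted-index sketch. It is an illustration of the general idea only, not how Yahoo Desktop Search or X1 actually work; the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real indexer would also handle stemming, stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names containing it.

    The index is a separate structure: rebuilding or losing it never
    touches the documents themselves."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: a document must contain every query word.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample "files" and their text content.
docs = {
    "notes.txt": "Meeting notes about the database file system",
    "todo.txt": "Buy milk; read about BeOS queries",
    "mail/42.eml": "Subject: file system benchmarks",
}
index = build_index(docs)
print(sorted(search(index, "file system")))
```

If the index gets corrupted, the worst case is a rebuild from the untouched documents, which is the safety property the comment describes.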

Reiser4
by Chris on Tue 22nd Feb 2005 02:52 UTC

Doesn't Reiser4 add a lot of this stuff?

Others have done a database fs
by Matt Grab on Tue 22nd Feb 2005 03:14 UTC

One example is Pick. It is an OS, a filesystem and a database all in one, and it is still in use in healthcare systems, among others. If I recall correctly, IBM also made a combined filesystem and database.

Matt

Re: Others have done a database fs
by Luis Masanti on Tue 22nd Feb 2005 03:24 UTC

The IBM AS400 operating system has a file system that was a database circa 1980.

Why NAMED Attributes?
by M Jared Finder on Tue 22nd Feb 2005 04:14 UTC

Why must each attribute have a name and a value? Forcing this structure is theoretically ugly, since you cannot assign categories to the attributes. But more importantly, it's extra, tangential work on the user's end. When creating a grouping, I don't want to think of a classification for it; I just want to create the group!

So why do we need attributes to have names? Why not just allow me to create the group "friends" the group "colleagues" and the group "football players"?
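The unnamed-group idea can be sketched as plain labels attached to files, with no name=value pair at all. This is a hypothetical in-memory model; `TagStore` and its methods are invented for illustration and do not correspond to any real filesystem API.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical store where a 'group' is just a bare label, not a name=value pair."""

    def __init__(self) -> None:
        self._files_by_tag: dict[str, set[str]] = defaultdict(set)

    def tag(self, path: str, *tags: str) -> None:
        # Attach one or more bare labels to a file; no attribute name is required.
        for t in tags:
            self._files_by_tag[t].add(path)

    def files_with(self, *tags: str) -> set[str]:
        # Intersection: files carrying every requested label.
        sets = [self._files_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()


store = TagStore()
store.tag("alice.vcf", "friends", "colleagues")
store.tag("bob.vcf", "friends", "football players")
store.tag("carol.vcf", "colleagues")
print(sorted(store.files_with("friends")))
print(sorted(store.files_with("friends", "colleagues")))
```

The trade-off is that bare labels carry no type or category, so a query can ask "is it in this group?" but not "what is its value for X?".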

I would hate a database file system.
by Jessta on Tue 22nd Feb 2005 05:56 UTC

I would hate a database file system.
I can never remember what my files are called or what they were supposed to be. But I can always remember which directory I put them in.

Re: AS/400 database plus file system
by Doug Smith on Tue 22nd Feb 2005 05:58 UTC

AS/400 (alias iSeries, alias i5) has a built-in database system. If you can live with 10-character table names and 10-character database names (we call them libraries), you get a fairly robust RDBMS. This system existed for years without a name, and is used by all of the other operating system components that need persistent data.

IBM, tired of Oracle kicking sand in its face (anyone remember the Charles Atlas ads?), finally named it DB2/400. This database is actually built on top of a single-level store, a style of virtual memory where requested 4 KB pages from the disk are brought into memory and can be shared by all authorized users, and updated pages get copied back to the disk. This provides many of the advantages of an in-memory database, but IBM doesn't push that in its marketing.

The database is ANSI-compliant and has most of the modern features: triggers, referential integrity constraints, stored procedures, BLOBs, CLOBs, and an SQL interface, as well as VSAM-like record-at-a-time navigation for traditional RPG/COBOL programs. Most shops do not have DBA staff; the system is dirt simple to administer.

In addition, the AS/400 has a large set of alternative file systems, called the IFS (Integrated File System), which treats the RDBMS as just another file system. The "root" file system is UNIX/Windows-like, i.e., hierarchical, with long names, and allows Unicode encodings, whereas the database is classic mainframe EBCDIC. You can link a stream file name to a field in a database record, which locks the file down. This is useful for linking a resume to a personnel record, etc.

Check out Gmail
by Jon Smirl on Tue 22nd Feb 2005 06:18 UTC

Rufus, I sent you a Gmail invite. Check out their filters, search, views, etc. Nothing really gets deleted in Gmail unless you try very hard. Gmail is very similar to what you are proposing: the folders aren't real; they are simply views on the database.
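The "folders are just views" idea can be sketched with SQLite views, where a folder is a stored query rather than a container. This does not describe how Gmail is actually built; the schema, view names and addresses below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT);
CREATE TABLE labels (message_id INTEGER REFERENCES messages(id), label TEXT);

-- A "folder" is not a container; it is a stored query over the messages.
CREATE VIEW folder_work AS
    SELECT m.* FROM messages m JOIN labels l ON l.message_id = m.id
    WHERE l.label = 'work';
CREATE VIEW folder_from_rufus AS
    SELECT * FROM messages WHERE sender LIKE '%rufus%';
""")
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", [
    (1, "rufus@example.com", "Database file systems"),
    (2, "jon@example.com", "Gmail invite"),
    (3, "rufus@example.com", "Draft, part 2"),
])
conn.executemany("INSERT INTO labels VALUES (?, ?)", [(1, "work"), (2, "work"), (3, "personal")])

# Message 1 appears in both "folders" without ever being copied.
print([row[0] for row in conn.execute("SELECT id FROM folder_work ORDER BY id")])
print([row[0] for row in conn.execute("SELECT id FROM folder_from_rufus ORDER BY id")])
```

Deleting a "folder" here means dropping a view, which leaves every message intact, in line with the comment's remark that nothing really gets deleted.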

Also check out Reiser4. http://www.namesys.com/
There are good white papers there. Hans is building a system almost exactly like you describe. One company has implemented a complete XML database on Reiser using plugins.

WinFS - bah!
by Jason on Tue 22nd Feb 2005 07:29 UTC

The last line of the article reads:
"Real soon now, Microsoft will unleash WinFS onto the world and make all other database filesystems obselete. Though details are still a little vague."

What is the author considering to be "real soon?" Longhorn? Ha!

Umm... hate to break it to you, but Longhorn/Longwait will NOT have WinFS.

Here is just one article among many that talk about how WinFS has been cut from Longhorn.
http://www.microsoft-watch.com/article2/0,1995,1640454,00.asp

Even Microsoft has admitted having difficulties with WinFS and has no idea when it will be ready.

http://news.com.com/New+file+system+has+long+road+to+Windows/2100-1...

The link above points out, with comments from a Microsoft exec, that WinFS may not be ready until Blackcomb, which won't arrive for another decade, even though WinFS has already been in the works for a decade. The earliest WinFS will be available is as a test version in late 2006. I'd hardly call that "real soon."

OS/2 had it long before
by Cris on Tue 22nd Feb 2005 08:01 UTC

Hi all,
OS/2 had Extended Attributes in its HPFS file system long before BeOS, and still has them.
OS/2 implements EAs natively on HPFS and JFS, while it supports them on FAT with a non-native approach.
What is nice in BeOS is the "query" concept that resembles DBs much more, even though it has always been possible to do EA-based searches in OS/2.
On the "performance" side, BeOS added indexes, which made queries perform a lot faster.

Bye

Cris
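Why indexes make queries faster can be shown with a sorted in-memory index versus a full scan. BFS keeps its indexes as B+trees on disk; the `AttributeIndex` class and sample data below are a hypothetical, simplified stand-in that only illustrates the difference between scanning every file's attribute and jumping straight to the matching range.

```python
import bisect


class AttributeIndex:
    """Hypothetical index for one integer attribute: (value, path) pairs kept sorted."""

    def __init__(self) -> None:
        self._entries: list[tuple[int, str]] = []

    def add(self, path: str, value: int) -> None:
        bisect.insort(self._entries, (value, path))

    def range_query(self, low: int, high: int) -> list[str]:
        """Paths whose value v satisfies low <= v <= high, in O(log n + k)."""
        i = bisect.bisect_left(self._entries, (low, ""))  # first entry with value >= low
        result = []
        while i < len(self._entries) and self._entries[i][0] <= high:
            result.append(self._entries[i][1])
            i += 1
        return result


def linear_scan(attrs: dict[str, int], low: int, high: int) -> list[str]:
    # Without an index, every file's attribute has to be read and compared.
    return sorted(p for p, v in attrs.items() if low <= v <= high)


sizes = {"a.mp3": 4_000_000, "b.mp3": 9_500_000, "c.txt": 1_200, "d.mp3": 7_000_000}
index = AttributeIndex()
for path, size in sizes.items():
    index.add(path, size)
print(index.range_query(5_000_000, 10_000_000))
```

The price is that every write must also update the index, which is the maintenance cost other commenters raise.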

hmmm
by mmu_man on Tue 22nd Feb 2005 08:22 UTC

The author says he never tried BeOS... maybe it would be nice to do it now so next time he has some background ;)

> "... BeOS was ..."
BeOS *IS*: http://yellowtab.com - http://haiku-os.org

> WinFS
Ugh? AFAIK that has been withdrawn from Longhorn, btw :p

> reiser4
The only problem is the upper layer.
The Linux VFS doesn't have any calls to use the extra features.
It barely has attribute read/write calls.
Yes, Linux now (finally!) supports extended attributes... the only problem with those is that they are *untyped*:
just a name-value pair.
How do you know whether it's a string, an int or whatever???
Of course the application that created it knows, but the other ones will stay ignorant.
In BeOS, attributes have an associated 32-bit type field, which tells whether they are int, int64, string, MIME type, ... or some app-specific type.
That allows Tracker (the file manager) to display them (OK, the MIME database describes those attributes), but also lets any app at least read them in some way (and lets queries search for them correctly).
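The difference between untyped and typed attributes can be sketched by storing a type code next to the raw bytes. The four-character codes below are modeled loosely on BeOS-style type constants but should be treated as illustrative assumptions, as should the attribute names in the usage lines.

```python
import struct

# Illustrative four-character type codes (assumed, loosely BeOS-style).
STRING_TYPE = b"CSTR"
INT32_TYPE = b"LONG"
INT64_TYPE = b"LLNG"


def encode(value) -> tuple[bytes, bytes]:
    """Return (type_code, raw_bytes); the type travels with the value."""
    if isinstance(value, str):
        return STRING_TYPE, value.encode("utf-8") + b"\0"
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return INT32_TYPE, struct.pack("<i", value)
        return INT64_TYPE, struct.pack("<q", value)
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def decode(type_code: bytes, raw: bytes):
    """Any program can interpret the bytes, because the type code says what they are."""
    if type_code == STRING_TYPE:
        return (raw[:-1] if raw.endswith(b"\0") else raw).decode("utf-8")
    if type_code == INT32_TYPE:
        return struct.unpack("<i", raw)[0]
    if type_code == INT64_TYPE:
        return struct.unpack("<q", raw)[0]
    raise ValueError(f"unknown type code {type_code!r}")


# Untyped: these 4 bytes could be an int, a 4-character string, a float...
untyped = struct.pack("<i", 1234)

# Typed: a file manager or query engine can display and compare the values correctly.
attrs = {"Audio:Artist": encode("Tangerine Dream"), "Audio:Track": encode(3)}
for name, (code, raw) in attrs.items():
    print(name, code.decode(), decode(code, raw))
```

A plain name-value xattr corresponds to storing only `raw`; everything a reader needs to interpret it is lost unless every program agrees out of band.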

Exchange / Outlook
by Evert on Tue 22nd Feb 2005 09:16 UTC

Microsoft Outlook, part of their Office suite, uses a Jet database to store its email messages. You can use it to search on attributes, and you can even store normal files in the folder hierarchy.


Their mail server, Exchange, builds on the same database design. The whole WinFS idea is derived from the Exchange storage.

Personally, I prefer Maildir over the Exchange storage because Maildir is less vulnerable to disk errors and is easier to rsync / back up, and Maildir-stored messages are easier to manage / edit / troubleshoot with standard (file manager) tools.

But the DB capabilities of the Exchange Server are often useful, that's for sure.

HFS+
by arwq on Tue 22nd Feb 2005 10:14 UTC

> Mac OS X Tiger is BeFS-plus. It supports everything BeOS did back then, plus the ability to search *inside* 20 file formats.

Only at first glance, not if you look more closely:
BFS keeps all the indices in the filesystem itself, and the implementation is part of the file system. On Tiger, it's just an additional process that's running. In effect, your indices on the drive are not necessarily in sync with the actual data, as you may have changed data while the add-on wasn't running (e.g. when having the same drive mounted in 10.3). There are a few more differences, but the ADC NDA tells me that I have to keep them to myself. Furthermore, all attributes are read-only from the Finder; you can't simply change an MP3's rating in the file manager like you can in BeOS.
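The sync problem with an index maintained by a separate process can be sketched with a toy model: the indexer snapshots metadata, the data changes while it isn't running, and the stale entries have to be found by re-checking files. This is not how Spotlight works internally; `FileState`, `build_index` and `stale_entries` are hypothetical names, and using mtime as the change detector is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    mtime: float  # modification time, used here as a cheap change detector
    title: str    # some attribute the indexer would extract


def build_index(files: dict[str, FileState]) -> dict[str, tuple[float, str]]:
    """Snapshot of each file's metadata as a separate indexer process last saw it."""
    return {path: (st.mtime, st.title) for path, st in files.items()}


def stale_entries(files: dict[str, FileState], index: dict[str, tuple[float, str]]) -> set[str]:
    """Paths changed, added, or deleted since the index was built."""
    changed_or_new = {p for p, st in files.items() if p not in index or index[p][0] != st.mtime}
    deleted = set(index) - set(files)
    return changed_or_new | deleted


files = {
    "a.mp3": FileState(100.0, "Song A"),
    "b.mp3": FileState(100.0, "Song B"),
    "d.mp3": FileState(100.0, "Song D"),
}
index = build_index(files)
fresh = stale_entries(files, index)  # nothing has changed yet

# The drive is modified while the indexer isn't running (e.g. mounted under another OS version).
files["a.mp3"] = FileState(200.0, "Song A (remix)")
del files["b.mp3"]
files["c.mp3"] = FileState(150.0, "Song C")

print(sorted(stale_entries(files, index)))
```

When, as the comment says of BFS, the index is updated by the filesystem itself on every write, there is no window in which this kind of drift can occur.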

Oracle db file system
by Elminster on Tue 22nd Feb 2005 10:47 UTC

Oracle's Internet File System has been around for 5 years or so now (under a number of names), although it is designed more for use on file servers than on individual PCs.

http://www.oracle.com/technology/documentation/ifs_arch.html

As it uses an Oracle RDBMS, you can use much of Oracle's technology; it's part of Internet Application Server these days.

@Exchange / Outlook
by emacs on Tue 22nd Feb 2005 12:51 UTC

A .pst file is really a Jet database?
Does that mean it's really an .mdb, under the hood?
This bears experimentation, because neither a glance at the COM interface nor experience with the software Winnebago that is Outleak ever hinted at such.

while at it...
by mmu_man on Tue 22nd Feb 2005 12:55 UTC

I'd just put my files on a webhost, and let Google index them for me, then I'd just use googlefs (no it's *not* gmailfs)
http://clapcrest.free.fr/revol/beos/shot_googlefs_006.png

>HFS+
So it seems BFS is still better :p

> Oracle's Internet File System
I've read a bit about AFS and Coda, but never saw that thing; I will have a look.

why this doesn't work
by Gabriel on Tue 22nd Feb 2005 13:00 UTC

You have email attributes in a filesystem.

Nice. You can store From, To, Subject, multipart messages, attachments... the whole shebang.

Now, I write an app that marks messages as spam, or does something you didn't think about when you designed the attributes. No matter how intuitive the design is, how nicely it treats extra data or how well thought out the fallback compatibility is... I will not read much of the docs and will save it into the Subject field, breaking all other clients.

That didn't happen to BeFS with its email attributes because there weren't many clients.

Also, it's the same thing that happened with HTML and is happening with XML.

If we had consistent data, Apple's naive solution of simply reading inside the common formats would suffice for that purpose.

Read the subject.

Any chance I could see some screenshots?

@ emacs, RE outlook
by Evert on Tue 22nd Feb 2005 13:54 UTC

Exchange uses a Jet DB for sure.

I'm less sure about Outlook.

http://groups.google.nl/groups?q=outlook+pst+file+jet+database&hl=n...

states it, but

From: Sue Mosher [MVP] (suemvp@slipstick.com)
Subject: Re: PST file format
Newsgroups: microsoft.public.outlook.program vba
Date: 2002-04-15 10:18:30 PST

It's undocumented, and it's not JET. Since only Outlook users can open
PST files and use the data, why not use Outlook to make your
modifications?

--
Sue Mosher, Outlook MVP
Outlook and Exchange Solutions
at http://www.slipstick.com


anyway, .mdb != .pst

re: mmu_man
by stew on Tue 22nd Feb 2005 14:54 UTC

> So it seems BFS is still better :p

It's different. BFS has much better attributes (since OS X is not doing it at the file-system level), but no full-text search.

RE: Opera + Desktop Search
by emagius on Tue 22nd Feb 2005 15:41 UTC

Indeed. Not only does it return better results than any other e-mail client, it's done faster than other clients (e.g., Outlook) return their first result. And while it took me some time to break from the folder method, labels work really well. It's a pity that Opera (as of 8.0b) still restricts the maximum number of labels to such a small value.

Care required.
by Francois Stiglitz on Tue 22nd Feb 2005 16:02 UTC

The idea of a DB filesystem is fairly old. Until recently, it was largely impractical because of the resource requirements (CPU and disk). There are various experimental and even production-level implementations of such things for virtually every platform, some as a dedicated filesystem, others as simple glue that maps a virtual filesystem to an object store / RDBMS. Microsoft's original intent was to remove the whole hierarchical filesystem concept and store all data in an object store instead.

The only problem these things have is the explosion of attributes and indices. There are lots of ways to implement the filesystem database concept. They all have the problem that if the metadata storage permits arbitrary (application-defined) attributes, you soon have so many that some become redundant (CorelDraw uses 'CreationDate' and Adobe uses 'DateCreated', for example) or ambiguous. The overhead of indexing the various attributes, maintaining the indices and syncing metadata with object data becomes prohibitive, and the work is difficult to schedule properly (particularly in a multiprocessing environment).
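One common mitigation for the redundancy problem is an alias table that folds synonymous attribute names onto a canonical key before indexing. The sketch below is hypothetical: the `CANONICAL` table, the canonical name `created`, and the conflict policy are all assumptions, and who maintains such a table across vendors is precisely the hard part the comment points at.

```python
# Hypothetical alias table: vendor-specific names mapped onto one canonical attribute.
CANONICAL = {
    "CreationDate": "created",  # the spelling the comment attributes to CorelDraw
    "DateCreated": "created",   # the spelling the comment attributes to Adobe
}


def normalize(attrs: dict[str, str]) -> dict[str, str]:
    """Fold synonymous attribute names onto canonical keys; unknown names pass through."""
    out: dict[str, str] = {}
    for name, value in attrs.items():
        key = CANONICAL.get(name, name)
        if key in out and out[key] != value:
            # Two synonyms disagree: the ambiguity problem shows up as a hard error.
            raise ValueError(f"conflicting values for {key!r}: {out[key]!r} vs {value!r}")
        out[key] = value
    return out


print(normalize({"CreationDate": "2005-02-21"}))
print(normalize({"DateCreated": "2005-02-21", "Pages": "3"}))
```

With normalization, one index on `created` serves both applications instead of two half-populated indices.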

More to the point, once there's sufficient structure and normalization of the metadata, why maintain the notion of a file at all? Simply move all data into the attributes (perhaps with a special attribute that's a graph of relations between other attributes). That eliminates the concept of the file and completely changes the way everything is stored. It's conceptually so abstract that Joe Average will never understand it, it's fragile (disk errors, etc.), and it would open up entirely new horizons for potential abuse by the implementor in terms of platform/data lock-in and hidden proprietary features. What could be better from a commercial OS provider's point of view?

Mail items as files
by Slobodan Celenkovic on Tue 22nd Feb 2005 16:17 UTC

RE: Francois

Long term, file systems may be replaced by semantic nets that are stored in databases. For now, relational databases don't manage large objects (BLOBs and CLOBs) as well as file systems do. Therefore a transitional period is required where metadata and other attributes are stored in databases, while file content is stored in the file system. In fact, it is conceivable that databases will eventually import some file system characteristics/mechanisms for large data item management.

Where Do DB Filesystems Fit?

If messages were stored in plain files (either ASCII or some other document format: DOC, PDF, ...) while their attributes (from, to, ...) are stored with the files in the DB FS, developers could create many different tools for e-mail. We would no longer need a single massive monolithic app (Outlook) that has to include every command anyone could possibly use.

Instead, we could have a simple reader/writer app, another app to filter spam, another app to search, etc. All of them would use the standard FS e-mail attribute names, and hence would work together without any special effort. That way we would no longer be locked into a specific client, but could easily change it, or customize it by adding/removing other apps, etc.

The actual management of attributes (from, to, ...) should be left to the DB FS because it is common to all apps. E-mail apps use from, to, subject, ...; document apps use author, title, ...; and so on. Today each app has to create its own format, search functions, etc. Instead of replicating this effort and reinventing the wheel, the common commands (read, write, search) ought to be available to all apps from the DB FS.
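The "many small apps over shared attribute names" idea can be sketched as one common query service plus independent tools that only agree on the names. The `MAIL:*` attribute names, the `query` service and the sample mailbox are hypothetical, not any real DB FS interface.

```python
# Hypothetical shared attribute names that every mail tool agrees on.
FROM, TO, SUBJECT = "MAIL:from", "MAIL:to", "MAIL:subject"

# file path -> {attribute name: value}; the message body would live in the file itself.
Mailbox = dict[str, dict[str, str]]


def query(box: Mailbox, attr: str, needle: str) -> list[str]:
    """Common search service the (hypothetical) DB FS offers to every application."""
    return sorted(p for p, a in box.items() if needle.lower() in a.get(attr, "").lower())


# Two independent "apps", built only on the shared names and the query service.
def junk_filter(box: Mailbox) -> list[str]:
    return query(box, SUBJECT, "cheap")


def messages_from(box: Mailbox, who: str) -> list[str]:
    return query(box, FROM, who)


box: Mailbox = {
    "inbox/1": {FROM: "alice@example.com", TO: "me@example.com", SUBJECT: "Database file systems"},
    "inbox/2": {FROM: "spammer@example.net", TO: "me@example.com", SUBJECT: "Cheap watches!!!"},
    "inbox/3": {FROM: "Alice <alice@example.com>", TO: "me@example.com", SUBJECT: "Re: gedanken solutions"},
}
print(junk_filter(box))
print(messages_from(box, "alice"))
```

Neither tool knows about the other; swapping either one out leaves the mailbox and the remaining tool untouched.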

See Dekk at http://www.dekksoft.com/index.html

What about:
http://www.sqldesktop.com/ ?

I am afraid a database system is what it is all about. Keeping the "file" abstraction in a database-based OS is somewhat weird.

@Evert
by emacs on Tue 22nd Feb 2005 16:38 UTC

Yeah, I just confirmed that there was no easy way into the .pst.
Sure is an easy way out, though: Gnus. ;)

Gabriel: why this doesn't work
by Earl Colby Pottinger on Tue 22nd Feb 2005 17:46 UTC

Sorry, but you lost me. I filter my email and Usenet using BeOS's queries all the time. First, the way I avoid your problem is that everything matching the filter just goes to the trash and is deleted. True, I have to use loose filtering so as not to lose anything I want, but that first pass deletes 90% of the junk, and I don't even need to run a program: the query window is always open on screen 2.

Second, what's the problem with marking spam? Just create an additional SPAM flag attribute for email. Programs that don't know about it will not touch it, so it is set only if you want it to be. You are not stuck with only the attributes that come with the system; you can create and add as many new ones as you want without problems or conflicts. BeOS is that great!
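The two approaches in this exchange can be contrasted in a few lines: overloading an existing field (Gabriel's failure mode) versus adding a separate attribute (Earl's approach). The `MAIL:spam` attribute and the reply-threading client are hypothetical stand-ins chosen for illustration.

```python
def mark_spam_in_subject(attrs: dict[str, str]) -> None:
    # The failure mode Gabriel describes: overloading an existing, shared field.
    attrs["MAIL:subject"] = "[SPAM] " + attrs["MAIL:subject"]


def mark_spam_with_new_attribute(attrs: dict[str, str]) -> None:
    # Earl's approach: a separate, hypothetical attribute that other programs can ignore.
    attrs["MAIL:spam"] = "1"


def thread_key(attrs: dict[str, str]) -> str:
    # Some other client groups replies by subject; it knows nothing about MAIL:spam.
    s = attrs["MAIL:subject"]
    while s.lower().startswith("re: "):
        s = s[4:]
    return s


original = {"MAIL:subject": "Re: Lunch", "MAIL:from": "bob@example.com"}
a = dict(original)
mark_spam_in_subject(a)
b = dict(original)
mark_spam_with_new_attribute(b)

print(thread_key(original), "|", thread_key(a), "|", thread_key(b))
```

The overloaded subject breaks the unrelated threading client, while the extra attribute is invisible to it; the catch, as Gabriel notes, is that nothing forces a careless developer to choose the second option.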

Stew: Text search in BeFS
by Earl Colby Pottinger on Tue 22nd Feb 2005 17:59 UTC

http://www.bebits.com/app/454 has always worked for me.

http://www.bebits.com/app/3637 looks interesting, but not finished.

http://www.bebits.com/app/3782 I have never used, but Linux people would probably like it better.

Re: Extended Attributes...
by Rich Steiner on Tue 22nd Feb 2005 18:42 UTC

Don't forget that even the classic Mac OS HFS filesystem had files which each had a resource fork and a data fork, allowing the storage of all kinds of interesting metadata (including icons, the creating application's tag, and other info) along with each file. I don't know if the initial incarnations of the Mac OS had it, but I know it certainly predated OS/2's use of EAs.

re: misc
by stew on Wed 23rd Feb 2005 01:06 UTC

Earl Colby Pottinger
BeIndexed is going in that direction, but the others are all slow brute-force searches.

Rich Steiner
It's a common misconception that type/creator info is stored in HFS' resource fork - it isn't. It's stored in a regular file attribute, just like a file name or date. The resource fork was used for all the things that OS X wants you to put in separate files now, application resources like images, strings or sounds.

Mail databases
by Miles on Wed 23rd Feb 2005 09:41 UTC

Lotus Notes has been doing *all* this ... and more .... for years.

Database mail
by a_dem on Wed 23rd Feb 2005 13:28 UTC


While we are all waiting for a DB FS, here is the DB mail application the author is yearning for.

http://www.dbmail.org/

Good stuff and now some philosophy
by Jody on Wed 23rd Feb 2005 17:05 UTC

I liked the style of the article. As far as details go, there are others more qualified than I, but here are some observations I have made:

1) Any file system can arguably be called a database, depending on how strictly you define the term "database".

2) Entropy rules.

3) One of the most popular computer applications is e-mail.

4) E-mail has become the de-facto file system for many clueless end-users, despite our pain.

5) Sym links are fun for a while.

6) Context people, context.

7) Task oriented file systems, smoke 'em if you got 'em.

8) 01000100011011110111011101101110001000000110010001100101011001010111000000101100
00100000011101000110100001101001011100110010000001101001011100110010000001110111
01101000011000010111010000100000011101000110010101111000011101000010000001110010
01100101011000010110110001101100011110010010000001101100011011110110111101101011
01110011001000000110110001101001011010110110010100101110

9) What did he just say?

10) "There's a message for you."

Sorry, Earl...
by Rich Steiner on Wed 23rd Feb 2005 18:06 UTC

I remembered seeing icons, various strings, and other things living in the resource fork when viewing them via ResEdit on a classic Mac, but I was admittedly guessing on the type/creator string. :-) I sit corrected!

VMS
by Jensen on Wed 23rd Feb 2005 21:46 UTC

VMS has had it for years. Do we need another wheel? Come on.