Database File System for Linux

Submitted by super_science_monkey on 2004-09-06 in Linux

There's what appears to be a well-presented but probably pretty early-phase implementation of a Database File System for Linux. Check it out.

About the author: David Adams (Twitter: @david_adams)

53 Comments

2004-09-06 4:09 pm
…fork Gnome? In my opinion they should focus on the DBFS itself and not worry about desktop integration yet. Pick one thing and do a damn good job of it, and everyone else will adjust. Let the Gnome devs rewrite Gnome to integrate with DBFS. If it's going to be as great as you say, I'm sure they will want to.

2004-09-06 4:48 pm
http://www.gnome.org/~seth/storage/features.html

2004-09-06 4:49 pm
Looks promising, if it works. But the permissions and multiple-users thing still scares me a bit.

2004-09-06 4:56 pm
Yes, so what are you trying to tell us?

2004-09-06 4:56 pm
I'm no expert on Linux filesystems, but isn't ReiserFS an example of a mature DB filesys for Linux? Cheers

2004-09-06 5:00 pm
Why not use WebDAV as a backend for the DBFS? It has attributes and can be accessed via a network share. With a database backend like Catacomb (http://www.webdav.org/catacomb/), searching could be implemented. With DeltaV support, the files could be stored in some kind of versioning system.

2004-09-06 5:02 pm
I didn't look at the page first. I think there is room for ambiguity in the term "Database File System"… I guess I'm a purist 😉

2004-09-06 5:04 pm
http://www.gnome.org/projects/beagle/

2004-09-06 5:04 pm
>> I'm no expert on Linux filesystems, but isn't ReiserFS an example of a mature DB filesys for Linux?

reiser4 can probably do this, but:
1) you are relying on this filesystem being adopted by everyone, which is near impossible;
2) it is not in the main kernel yet; it is in the -mm tree and looks like it will need to be modified before inclusion.

2004-09-06 5:06 pm
Nothing, I just feel like advertising GNOME Storage.
/sarcasm

2004-09-06 5:09 pm
Actually, it's not a bad post, given that ReiserFS4 has recently been released. You can't call it "stable" yet, though, because that is something decided over time. IIRC ReiserFS4 would allow something like this via plugins, but I'm not so sure about it.

2004-09-06 5:12 pm
OK, so this is supposed to target users mostly, right? Is this really an issue in *nix, considering users will only have permission for /home/$USER/*? So what's wrong with ~/Documents, ~/Music, … and having an indexing tool like Beagle?

2004-09-06 5:17 pm
Many people have probably heard about GNOME Storage and Beagle, but what's the point in simply posting references to them? None, probably; it's just the old kiddie game of GNOME vs. KDE. Now, if anyone could tell me why the projects you posted links to are better or worse than, or simply different from, the project this news refers to, that would be interesting. I'm all ears. Oh, and by the way, the DBFS devs are looking for developers who will lend a hand in integrating it into GNOME.

2004-09-06 5:32 pm
Beagle is a product specifically developed for a modified GNOME desktop (Novell Linux Desktop). For a decent Beagle integration, you must modify some GTK widgets like gtk_file_chooser and take some obscure widget developed by (??) for Nautilus (the search utility). Some of this code is not included in the CVS tree, and I think it will never be released until Novell ships their desktop.

2004-09-06 5:36 pm
This is here now, working in KDE (check out the shots), and is desktop-agnostic (all it needs is a small change so that the save/load GUIs use the database). Basically it's WinFS for Linux. It creates a database of metadata covering all user-created files, so that later on you can search the metadata on top of the normal ways to search (filename, date of creation). It's not supposed to cover the entire filesystem of the OS, only the user files. The question is whether it's easy to use.
Can I add metadata to a file by doing a search and then putting the file into the search results by drag and drop (this was an action described for WinFS)?

2004-09-06 5:54 pm
If I can't be bothered to organise my files into folders so they are easy to find, then WHY would I add keywords to every file to make it easier to find? Lovely idea and all, but I can't see the point myself. I'm not organised now; that's not the fault of the system, that's my fault, and no advanced features will change that. Also, there are many thousands of files on my system; where will it put all of the files without metadata that are installed by apps?

2004-09-06 5:57 pm
Now this sounds very interesting. It would definitely take some time to get used to. I think the advantages and disadvantages of having this as an additional layer are equal. The advantage is flexibility: it's here right now, no matter what file system you use. The disadvantages are probably performance and lack of integration with non-compatible apps. Once you use a non-Qt/GTK app or the terminal, you are quite lost, because you have to navigate through a hierarchical file system that you're not familiar with. I would love to try it, but currently I still have no HD. Thanks, Maxtor.

2004-09-06 5:59 pm
Personally, I don't like it. I really do not like the idea of making this a user-level service and creating a schism between the underlying hierarchical model and the search-based user model. Personally, I think Hans Reiser is spot-on with his ideas about what the namespace should look like. I'd much rather have this be implemented as something like a Reiser4 plug-in.

2004-09-06 6:22 pm
>> I'm no expert on Linux filesystems, but isn't ReiserFS an example of a mature DB filesys for Linux?

Which version? 3, yes; 4, no. Version 4 is far too new and untested to be considered mature, and it's not really even based on ReiserFS version 3, so it's more of a revolution than an evolution. It's essentially a different filesystem.
Cheers

2004-09-06 6:25 pm
Wow, this is freakishly close to the project I've been privately developing for a year and a half now. My prototype isn't even finished yet. Though I like the keyword concept, I've taken quite a different approach, and I'm designing it to be OS-independent. Good luck on this project, though; if mine doesn't work out, I might join the efforts of this project.

2004-09-06 6:34 pm
Stick with working on yours; yours may be better.

2004-09-06 6:41 pm
I agree in a way, but I have two arguments against your pessimistic view. First, the reason many people don't organize their files is that the hierarchy system just doesn't fit many of the natural organization schemes people use mentally. I think this is the crucial reason why (a) new users find hierarchies complicated and have trouble finding their files, and (b) many users neglect to organize their files.

My second argument is that with a more relational or keyword-based system, it's easier to tag files with more meaningful data than their location in a hierarchy. Look at the steps to move a newly downloaded file from the desktop into the right folder: select the file and cut it, navigate to the file system entry point (e.g. file browser/explorer/navigator), start a journey to find the folder where this file should go, arrive at that folder, and paste. Yes, there are many other ways to simplify this process, but in a way they are all workarounds for the flawed hierarchy system, which is overly complex for the user. With a keyword type of system, it's as simple as selecting the keyword from a list and applying it to the file at hand… Done!

Also, because it is easier to tag a file with many keywords than to place it in many folders, it becomes much more feasible to let the system automatically index files according to your keywords. An example is indexing files by extension, such as mapping mp3 files to a keyword such as music.
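A minimal sketch of the extension-based auto-tagging described in the comment above, assuming a simple in-memory index. The rule table, the `KeywordIndex` class and its methods are hypothetical illustrations, not part of DBFS or any other project mentioned in this thread.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical rule table: file extension -> keywords applied automatically.
EXTENSION_RULES = {
    ".mp3": {"music"},
    ".ogg": {"music"},
    ".jpg": {"images"},
}


class KeywordIndex:
    """In-memory keyword index: keyword -> set of file paths (illustrative only)."""

    def __init__(self, rules):
        self.rules = rules
        self.by_keyword = defaultdict(set)

    def add_file(self, path, *keywords):
        """Tag a file with user keywords plus any keywords its extension implies."""
        suffix = PurePath(path).suffix.lower()
        tags = set(keywords) | self.rules.get(suffix, set())
        for tag in tags:
            self.by_keyword[tag].add(path)
        return tags

    def find(self, *keywords):
        """Return the paths that carry every given keyword (set intersection)."""
        if not keywords:
            return set()
        matches = [self.by_keyword.get(k, set()) for k in keywords]
        return set.intersection(*matches)
```

For example, adding `track01.mp3` with no keywords tags it `music` automatically, and a PDF score tagged `music` by hand then shows up in the same `find("music")` result. That is the commenter's four separate music folders collapsed into one keyword, while `find("music", "sheet music")` still narrows down to the scores.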
These rules would also be easy for a new user to create using Outlook-style filter rule creation, which is very verbose and flexible. You see, hierarchies are complex because people often end up with many duplicate folders in different branches to help describe the data. I have 4 different music folders right now: one for music I write myself, one for digital rips of my CDs, one for sheet music, and one for my shortcuts that have to do with music. In a keyword system these files could all be tagged with the same music keyword, allowing for multiple roots, which speeds up the task of finding files.

2004-09-06 6:42 pm
Because GNOME Storage does exactly what this project intends to achieve, and more.

2004-09-06 7:29 pm
>> Personally, I think Hans Reiser is spot-on with his ideas about what the namespace should look like. I'd much rather have this be implemented as something like a Reiser4 plug-in.

Spot on. This sort of thing absolutely belongs at the filesystem level, and Hans Reiser is right. A layer on top of the filesystem at the userspace level, as WinFS is, results in a painfully slow, ill-thought-out implementation that is just a nightmare to maintain into the future. Would you implement permissions and ACLs at a userspace level? No. Layers on top of the filesystem are a sure-fire sign that someone isn't communicating with the right people and thinks they can go off and do it all themselves.

2004-09-06 8:18 pm
Wow, "mystilleef," thanks for clearing that up! We appreciate your consistent level of detail and methodical explanations provided in this thread.

2004-09-06 8:57 pm
Not too different from http://kspaces.org/.

2004-09-06 9:20 pm
Please, could you explain to me why this should be slower? OK, yes, that is right: the fewer layers, the faster it is. But three arguments come right to mind:
1. The relational database means drastic search improvements, so it will not be slower than normal filesystems.
Therefore, does it really matter if the layer slows things down?
2. Don't you think some people want to use the relational data with, e.g., ext3 or FAT? The database has a huge flexibility advantage.
3. Filesystem operations (copying and moving files) are done natively either way. I do not think the integrated version has significant speed improvements.

2004-09-06 9:45 pm
Here's a question. Where does the actual file get saved if you do not use a hierarchical structure? Does this mean that all the saved files will be in /home/user/, creating a massive flat directory? Because when my system crashes, or I need to use a different metadata system, etc., I don't really want a disorganised home directory. There also seem to be way too many of these prototypes flying around; I have heard of about 7 now! I think I'll stick with what I have until one or two emerge as victors.

2004-09-06 9:53 pm
Damn, this'll get me flamed… Anyway, isn't this a similar concept to gnome-vfs, in that it just adds another layer to the file system for storing metadata? From Nautilus (production version), I can add notes to any file, whether I can write to it or not, as they are stored in XML files in my home directory. All that's missing is integration into the search tool.

Beyond that, this reminds me of an idea I had a while ago but dismissed, because BeOS was the only viable platform without a lot of work. The plan was to keep all "user" files (that is, not config files or stored music etc., just things the user has made or is working on) in a single heap directory and access them via query folders. Obviously this isn't exactly novel, but by adding the ability to set an attribute on a file by dropping it on a query window, and to remove one by dragging it out (to somewhere undefined), it could be the basis of a logical system where files could be associated with many places without the issues of symbolic linking and the bookkeeping of the actual files that it entails.
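The query-folder idea in the 9:53 pm comment can be sketched roughly as follows. The `HeapFile` and `QueryFolder` names and the one-attribute-per-folder model are hypothetical simplifications for illustration; this is not how BeOS, gnome-vfs or DBFS actually implement queries.

```python
from dataclasses import dataclass, field


@dataclass
class HeapFile:
    """A file in the single 'heap' directory, plus its attributes (hypothetical model)."""
    name: str
    attrs: dict = field(default_factory=dict)


@dataclass
class QueryFolder:
    """A virtual folder that shows every heap file whose attr equals value."""
    attr: str
    value: str

    def contents(self, heap):
        return sorted(f.name for f in heap if f.attrs.get(self.attr) == self.value)

    def drop(self, f):
        # Dropping a file on the query window sets the attribute the query matches.
        f.attrs[self.attr] = self.value

    def drag_out(self, f):
        # Dragging it out removes that attribute, so the file leaves this view only.
        if f.attrs.get(self.attr) == self.value:
            del f.attrs[self.attr]
```

Because folder membership is just attribute state, the same file can appear in several query folders at once with no symlinks to keep in sync. The limitation of this simplified model is that two folders querying the same attribute with different values compete: dropping a file on one overwrites the value the other one matches.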
Anyway, I didn't think about it for long, so who knows.

2004-09-06 10:16 pm
Keep it at the file system level, not as overhead on top of the file system. Then you'll have something worthwhile to show how to do it right when MS implements their crufty version (shudders at the thought). Now for that new HD I need.

2004-09-06 10:18 pm
>> Many people have probably heard about GNOME Storage and Beagle, but what's the point in simply posting references to them? None, probably; it's just the old kiddie game of GNOME vs. KDE.

I think many haven't heard of the alternatives, or of this project (it hit /. today, too). The point of posting them, from my point of view, is to provide alternatives, to allow one to lay these side by side, to get a constructive discussion out of it, etc. Not to generate some flamewar. The fact that different opinions arise (e.g. check out Rayiner's post) is no problem in that regard, so why moderate it down?

If you had read the DBFS FAQ, you'd know that the author is agnostic in any KDE vs. GNOME war and that he allows both DEs to write a frontend for his software. A Good Thing IMO. He's going to concentrate on GNOME, though, and that's his right, just as the HAL/DBUS people concentrate on GNOME but don't have a problem with KDE.

2004-09-06 10:43 pm
>> OK, yes, that is right: the fewer layers, the faster it is.

That's not generally true, if the layers do not overlap in functionality. Layers aren't the reason why this would be slower. There are a couple of reasons, however, why it would be slower:
1) Regular filesystems are optimized for a relatively small number of relatively large files. 1-4 KB is commonly the smallest on-disk size for any file. In a database system, you'd like to support enormous numbers of very small files (a few dozen to a few hundred bytes each). Such files are needed for things like metadata (ID3 tags, EXIF tags, etc.).
2) By acting as a layer on top of the existing filesystem, communicating via the standard POSIX API, the indexing server loses all ability to take advantage of specialized on-disk structures to optimize searches. It also cannot take advantage of these structures to optimize the on-disk layout, say, by keeping metadata files close to their parent files.

2004-09-06 10:46 pm
I don't think locking users into one filesystem is a good idea. Putting it in the Linux kernel VFS is also a bad idea, since both GNOME and KDE support many platforms, like the BSDs, Solaris and Cygwin, and none of them will have a metadata-aware filesystem like Reiser4.

2004-09-06 10:54 pm
Why would I want to search for something all the time, when I know where a file is located on my filesystem? The thing in the screenshot is really confusing and hard to use. What is wrong with the old file chooser? And what about non-KDE applications? Same question if GNOME Storage is used.

2004-09-06 11:19 pm
Initially I was pretty skeptical about this idea, for similar reasons already posted on this board. Namely, why would I bother putting keywords on my files when I don't put them in my file hierarchy? Furthermore, at least for me, a hierarchy seems like a completely natural concept/organization, so it *really* is laziness rather than conceptual discordance keeping me from being organized. Wouldn't this problem be worse with a keyword-based filesystem? After all, Google claimed their email system would change the way I thought about email, and while I do like it more, I still find it necessary to arrange my stuff into folder-type things.

Still, thinking about it some more, I realized that with good application support this could work wonderfully. Even if you were too lazy to add your own keyword information, you could easily access "OpenOffice spreadsheet files edited in the last week" from automatically added metadata.
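A brute-force version of the "OpenOffice spreadsheet files edited in the last week" query shows which automatically available metadata it needs: the file type and the modification time. This sketch simply walks a directory tree. An indexing service would presumably store the same (type, mtime) pairs so the query becomes a lookup instead of a full scan. Treating the `.sxc` (OpenOffice.org 1.x) and `.ods` (OpenDocument) extensions as "spreadsheet" is an assumption, and `recent_spreadsheets` is a hypothetical helper.

```python
import time
from pathlib import Path

# Assumption: these extensions identify OpenOffice spreadsheets.
SPREADSHEET_SUFFIXES = {".sxc", ".ods"}
ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def recent_spreadsheets(root, now=None, max_age=ONE_WEEK):
    """Return spreadsheet files under root modified within max_age seconds of now."""
    now = time.time() if now is None else now
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SPREADSHEET_SUFFIXES:
            # st_mtime is metadata every POSIX filesystem already keeps.
            if now - path.stat().st_mtime <= max_age:
                hits.append(path)
    return sorted(hits)
```

Note that nothing here required the user to add a keyword. That is the 11:19 pm commenter's point: even a lazy user gets useful queries from metadata the system records on its own.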
This sort of thing could be quite useful, and they have the facet system, which gives much of the power of a hierarchical system.

Putting this in kernel space, or even in the base system, seems pretty silly. If it becomes popular, there might be some argument for including optimizations in the kernel (metadata caches, perhaps even a limited in-kernel DB), but this should run in user space simply because the base filesystem shouldn't work like this. The base system benefits from an enforced hierarchy (increased security by securing entire directories, and less chance of file name misunderstandings). Things like your search path require the programs involved to be kept in a strict hierarchy. For instance, what prevents a user in a pure keyword system from saving a file su with the same metadata as the real su (even worse if they do it to a file which happens to be missing)? You need some sort of security measure (stronger than just the permission to create new executables) which is also easily changeable by the users (to use the new version of su in the new directory). A hierarchy does this sort of thing much better.

Since the system files are going to be organized in a hierarchy already, it is just wasteful to include the keyword filesystem at the system level. If we want the kernel to still be good for server systems, this is not the choice to make. In fact, we don't even really want system files to show up in the keyword system at all, only documents and data. Finally, files will be searched for relatively rarely, and desktop systems are quite powerful enough to search these small databases (if optimized) quickly. Everything seems to indicate that this belongs in user space, and that kernel optimizations should be considered only if it becomes really popular/useful but slow.

The Reiser4 extensions question is really quite another matter. I do think an extensible file system interface is a good idea, but I simply don't know enough to say whether Reiser has it right.
It would be nice if a DFS like this could gain system-level support just by loading the correct filesystem module, without requiring all filesystems to support this feature.

2004-09-06 11:40 pm
Just because Microsoft needs such a technology doesn't mean the rest of the world necessarily needs it. Microsoft needs it for the enforcement of digital rights management: the machine decides whether or not a file is visible. And MS needs it for its search engine future. They like their customers to be transparent, and it will be very easy to send a search request to any client. This is actually not that bad, because there is much power in such a technology, but it's Microsoft who will negotiate the requests. They are currently attacking the search engine market, and they need this kind of software installed on every machine. They use their monopoly to spread software which helps MS strike against other search engines, again with proprietary protocols; or do you believe that there will ever be a driver for Linux that can access WinFS shares?

And it's yet another layer of abstraction, and thus another increase in complexity. MS will have to decide whether to stop any kind of innovation in order to keep this balloon (the OS) consistent, or to disintegrate the applications and the OS. The latter will never happen, since it would mean that customers' applications would be less dependent on Windows platforms. How else would you explain why it's so hard for many companies to switch to another OS? The applications they depend on are dependent on MS platforms. This fact is the perfect money-printing machine for MS. With WinFS, everything becomes more and more integrated; there are many dependencies, which make it harder and harder to develop in a platform-independent way. Developing for Windows is a one-way road. It's very easy to develop apps for Windows (with .NET, for example).
But it's almost impossible to turn away from Windows without giving up all your code (especially with .NET). This trap becomes stickier with every "innovative" technology from MS. It's not the technology that's so bad, it's what MS is going to do with it.

The concept of storing metadata with a file is a great idea, but there's absolutely no need to abstract away from the file. I like the idea of ReiserFS4, which takes the "everything is a file" idea all the way. Each file can have meta information in the form of one or more other files. In the end there's still a file, and not some abstract object where you don't know whether it's a file, an address book entry, an email, a file stored on an MS server which you have to pay for, etc.

This abstraction by MS is more a way to keep the user stupid. Nowadays it's almost impossible for newbies to understand what the Internet is when they use Windows XP. Most newbies think that it's impossible to use the Internet without registering with Passport, because that's what the user is asked for, dozens of times per session. Most people cannot distinguish between the Internet and the blue "E" on the desktop. They think that reading mail is something different from surfing the web. That's not because it's too hard to understand, but because of all this abstraction. With WinFS this will become worse. Where would we be if we couldn't distinguish between a mail, a file, and a mail's attachment? This is what worm programmers want: complete confusion.

2004-09-07 1:46 am
>> Why would I want to search for something all the time,
>> when I know where a file is located on my filesystem?
>> The thing in the screenshot is really confusing and hard to
>> use. What is wrong with the old file chooser? And what about
>> non-KDE applications? Same question if GNOME Storage is
>> used.

Actually, you're an idiot for assuming things you don't know anything about. Have you tried Picasa (or iPhoto on a Mac)? No? Well, try it.
There's no way to save anything in this program. The only thing it does is "just work". It searches your HD for images in the background and puts them into categories. The only thing you can do is change the names of categories, move images from one to another, and so on. Now, that may sound limiting, but it's actually quite refreshing. Lately, I've found images on my HD that I had totally forgotten about. Thanks to Picasa for showing them to me.

If this DB-based 'file system' worked like this, I think that would be great, because it actually allows you to find _more_ files on your HD. There's no way you can forget about old files in deep hierarchies somewhere on the disk. So, isn't that a good thing? I think it is. Combined with the use of metadata, I think this system will be a winner.

2004-09-07 2:00 am
One reason for needing a folder-free way of storing files is finding information other than the filename/path. The iPod does this reasonably well. I can reference songs by artist or by predefined playlists. I seldom remember the name of a song since I have so many (was it 'I like that' or 'I like it like that'?), but I remember clearly who sang each. The iPod's file view would be a little better if you could extend the MP3's ID3 tag to describe the members who sang together and search for each individually. This way I could search for all the songs that such-and-such artist sang in (even if he or she was not the main singer). Currently I have to declare this the old-fashioned way: "main singer feat. such-and-such".

In any case, the more files you amass, the more likely it is that such a filesystem becomes useful. You'll be able to find files quickly based on mood (recipes for spicy food, movies with such-and-such theme, documents that were created by my co-worker during the past year). I will admit, though, that some things may be harder to work with this way. How does one create a website based on the idea of folders when the folders have been taken away from you?
Will web servers serve up pages from zipped-up archives? Maybe such a DBFS will only apply to the personal documents section of our HDD. Who knows? Another issue is keeping unwanted parties from finding data as easily as you can. Before, you could bury sensitive data in the catacombs of your hard drive as a DLL or other unnoticeable file. You can even encrypt your personal folder in Mac OS X, but what happens when your files no longer reside in folders? Will the OS need to hide the folder implementation from you while still protecting your data? This is certainly an area where a lot of different approaches will be tried; some will succeed and some will fail. For better or for worse, I can see this taking hold, and I for one welcome it.

2004-09-07 2:09 am
It's good to see more than one group pitch in on storing user data in a database format. I think this group should focus on a shared library for DBFS, so that it will be easy to integrate into other applications down the road. I personally feel that the rest of the system should be transparent to the user and all they will see is the DBFS. That means documents, emails, web pages, SSH servers, Samba shares, FTP servers, images and even applications should all be transparent to the user. This would still allow root users to see the entire *nix system in its pure form, without being constricted to a DBFS image.

2004-09-07 2:13 am
I think DBFS is a good try, but I don't think the author has put in enough features for it to qualify for being packaged with KDE and GNOME. I myself have considered coding a DB-FS-like project; however, it takes a massive amount of planning to do well, and while I think the author has done a decent job of getting the first implementation working, it just doesn't have enough features.
It takes a lot more than just a simple database to do something like this in a way that is remotely competitive with WinFS or Spotlight, and, sorry to say, Storage still looks like the only DB filesystem layer that is up to the task.

2004-09-07 2:23 am
>> Lately, I've found images on my HD that I had totally forgotten about. Thanks to Picasa for showing them to me.

I don't have such a system, but I know the experience of finding something in your tree that you forgot about. That rarely happens to me, though. Reading the discussion, I wonder whether such a DB-based system would indeed make a user more productive. I've never tested such a system, so I can only compare it to DB-based websites with several "keywords". Take the recommendation system of amazon.com, for example. You can dive into unknown information for several hours without being productive. Now imagine somebody on a local network, jumping from one keyword to the next, amazed at what exists in his own or somebody else's shared folders. I admit I'm sceptical. Thus, an implementation that allows you to jump back to the good old tree is worth having. If normal users are just as sceptical as I am, an implementation at the file system level is just plain silly, since very few people will accept the hassle of setting it up. So the central question for me as a user would be: "Would I be able to find my pictures if I decided to give up using something like Picasa?"

2004-09-07 2:31 am
>> It takes a lot more than just a simple database to do something like this in a way that is remotely competitive with WinFS

How the hell is this not competitive with WinFS? WinFS isn't even available yet and won't be for the first version of Longhorn in 2006. By that time, this DBFS or another will be far ahead of WinFS.

2004-09-07 4:21 am
>> The only thing limited is your brain

Yeah, whatever. I'm sure you're actually a very friendly person in real life. Moderate me down, Scotty!
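The 2:23 am comment asks for a way to "jump back to the good old tree" and still find your pictures after giving up a tool like Picasa. One possible approach, sketched here purely as an assumption (nothing in the thread says DBFS or Picasa does this), is to export the keyword index as ordinary folders of symlinks. `export_tag_tree` is a hypothetical helper, and the sketch assumes keywords are plain names without slashes.

```python
import os
from pathlib import Path


def export_tag_tree(tags_by_file, out_dir):
    """Write a keyword index out as ordinary folders of symlinks.

    tags_by_file maps an existing file path to an iterable of keywords.
    The result is out_dir/<keyword>/<file name> -> original file, which any
    file manager or shell can browse without the metadata database running.
    """
    out = Path(out_dir)
    for src, tags in tags_by_file.items():
        target = Path(src).resolve()
        for tag in tags:
            folder = out / tag
            folder.mkdir(parents=True, exist_ok=True)
            link = folder / target.name
            # lexists() also sees broken links; skip clashes instead of overwriting.
            if not os.path.lexists(link):
                link.symlink_to(target)
    return out
```

This is only an escape hatch: it trades the query-folder model for exactly the symlink bookkeeping the 9:53 pm commenter wanted to avoid. In exchange, the tags remain usable from a terminal or a non-Qt/GTK application, which several commenters raised as a concern.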
2004-09-07 6:44 am
Beagle and Storage are largely vaporware right now, and DE-specific. This supports both KDE and GNOME, and it is here now, with working code, clear ideas and a clear roadmap. I say this is the way to go.

2004-09-07 11:16 am
BeOS did it right years ago, by implementing this functionality natively at the filesystem level. I really wonder why everybody nowadays comes up with this idea and thinks it's new. Why has nobody among the GNOME or KDE folks looked at porting OBFS to Linux and integrating it into the desktop (with queries etc.)? See: http://www.haiku-os.org/contribute.php?mode=team_view&id=BFS
Maybe Haiku (formerly known as OpenBeOS) will show how this could work *throughout* the OS/desktop, like BeOS did long ago…
Hugh

2004-09-07 11:38 am
(full of sarcasm) Who's next? PS: Oracle (semi-failed), Microsoft (semi-failed), Apple (?)…

2004-09-07 12:02 pm
The way to do this is in the VFS layer of the kernel, and not in just another daemon that can be bypassed by a simple cp. Reiser4 is a magnificent modern file system, everything Microsoft could ever dream of. If Linux developers refuse to make it a requirement because some backward people still want to use ext2 in 2010, then Linux does not deserve a database file system.

2004-09-07 3:53 pm
>> I really wonder why everybody nowadays comes up with this idea and thinks it's new. Why has nobody among the GNOME or KDE folks looked at porting OBFS to Linux and integrating it into the desktop (with queries etc.)?

You need to understand that filesystem-specific stuff wouldn't work universally; otherwise ReiserFS would do fine. What we really require is a data store that works efficiently with a large amount of metadata.

2004-09-07 4:52 pm
There's no reason to build a database-oriented filesystem unless your file system will be THE filesystem for the next millennium and no better filesystem will EVER EVER EVER be produced. Ever.
Because, you know, if one were, you would be tied to one single filesystem, period. ;0

2004-09-07 6:34 pm
It will be really hard for, let's say, an administrator to force users to use a new system like this. That's why users must have a chance to choose what they want to use, to try it out, and to switch back or over to it. That's why it's better for it to sit on top of the old-fashioned file system and be usable from any desktop environment (or any application). The KDE and GNOME file dialogs and file managers should have modes for both ways of using the system. Also, the metadata could be saved in both modes, making switching easy, and the physical locations assigned in DB mode should probably follow some human-understandable scheme for saving the data, like putting MP3s into ~/Music/Artist/Album/ automatically.

2004-09-07 10:17 pm
I didn't have too much trouble with his idea for this file system, and I definitely would like to see this kind of functionality on the desktop. What intrigued me more was his idea of never saving any file. You only throw away files you don't think you need. I really like this idea.

2004-09-08 12:11 am
Just FYI: the Run Time Access library can give a PostgreSQL interface to your application. It is a better way to get status and statistics while the program is running. RTA is to a DB what /proc is to a file system: a way to view internal variables and structures while the program runs. http://www.runtimeaccess.com (try the live demo).

2004-09-09 12:08 am
I'm really glad that, after reading through people's comments, the DBFS project I'm currently working on seems more valid. All the pessimistic comments are actually helpful for critically analyzing alternative file systems, but in my case it's so simple to counter these arguments that it's not even worth it. It will be easier to just continue development and wait until people can see the difference for themselves.
Regardless, thanks, everybody, for the good discussion and interesting topic.
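The 6:34 pm suggestion that DB mode should still file MP3s in a human-understandable place such as ~/Music/Artist/Album/ could look roughly like this. The function names are hypothetical. The sketch assumes the artist and album values have already been read from the ID3 tag by some other library (not shown), and it only computes the destination path rather than moving anything.

```python
import re
from pathlib import Path


def safe_component(text, fallback):
    """Turn a tag value into a single path component (no slashes, not empty)."""
    cleaned = re.sub(r"[/\x00]", "_", text or "").strip()
    return cleaned if cleaned not in ("", ".", "..") else fallback


def music_path(home, tags, filename):
    """Where an MP3 would be filed in DB mode: ~/Music/<Artist>/<Album>/<file>."""
    artist = safe_component(tags.get("artist"), "Unknown Artist")
    album = safe_component(tags.get("album"), "Unknown Album")
    return Path(home) / "Music" / artist / album / filename
```

Sanitizing each component matters because tag text is user data: an artist such as "AC/DC" would otherwise silently create an extra directory level. Missing tags fall back to placeholder folder names, so the file still lands somewhere predictable that a plain file manager can browse.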