You have probably gathered that the database described in this email is not a typical relational database. I like to think of it as an "ad hoc" database; we can use it to store any old junk.
In our favour we don't have too much information to store. Even the most sociable of users will not have millions of emails. In addition, the attributes we use are pretty straightforward: flags, enumerated values, strings, a category attribute with fields, perhaps dates and numbers.
Having said that, this database may be difficult to implement efficiently. The user can add new attributes to any object at will. Most attributes are optional and some are multivalued. Together, these properties will make it difficult to find an efficient storage scheme, though something like Reiser4 would suffice. Further, we will want to key every attribute, as well as create a "string search" key into the message subject and body.
These difficulties will only get worse as we add more object types to our database, and may be prohibitive in an unconstrained system like a full database filesystem. Perhaps this is why Microsoft has failed to implement one for the last 10 years.
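To make the storage problem a little more concrete, here is a minimal sketch of one possible layout, assuming SQLite as the backing store (my choice for illustration, not something the design requires). Messages live in one table and every attribute is a row in a second table, so optional attributes cost nothing when absent and multivalued attributes are simply repeated rows. The table names and the helper functions (`open_db`, `add_message`, `set_attribute`, `find_by_attribute`, `search_text`) are all hypothetical. The `LIKE` query is only a stand-in for a real "string search" key: it scans every message and treats `%` and `_` in the search term as wildcards.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the two hypothetical tables and an index that keys every attribute."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS message (
            id      INTEGER PRIMARY KEY,
            subject TEXT NOT NULL,
            body    TEXT NOT NULL
        );
        -- One row per (message, attribute, value); repeated rows model
        -- multivalued attributes, missing rows model optional ones.
        CREATE TABLE IF NOT EXISTS attribute (
            message_id INTEGER NOT NULL REFERENCES message(id),
            name       TEXT NOT NULL,
            value      TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS attribute_by_name_value
            ON attribute(name, value);
    """)
    return db


def add_message(db, subject, body):
    cur = db.execute(
        "INSERT INTO message (subject, body) VALUES (?, ?)", (subject, body)
    )
    return cur.lastrowid


def set_attribute(db, message_id, name, value):
    """Attach a user-defined attribute; values are stored as text for simplicity."""
    db.execute(
        "INSERT INTO attribute (message_id, name, value) VALUES (?, ?, ?)",
        (message_id, name, str(value)),
    )


def find_by_attribute(db, name, value):
    rows = db.execute(
        "SELECT DISTINCT message_id FROM attribute "
        "WHERE name = ? AND value = ? ORDER BY message_id",
        (name, str(value)),
    )
    return [row[0] for row in rows]


def search_text(db, needle):
    """Naive substring search over subject and body (a full scan, not an index)."""
    pattern = f"%{needle}%"
    rows = db.execute(
        "SELECT id FROM message WHERE subject LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Storing every value as text keeps the sketch short, but it throws away the distinction between flags, dates and numbers, which a real implementation would presumably want to keep for sorting and range queries.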
Taking it Further
Even though this system is imaginary, it is useful to think about how it will develop over time. I can think of a few directions that could be explored:
Accessing databases remotely
Just as the MAPI and IMAP protocols provide remote access to traditional mail databases, we would need to create a network protocol to provide remote access to our database.
If we were using a DBFS, this could be the filesystem's standard network protocol, if one existed. Conversely, creating a network protocol to access our database would serve as a good prototype for a network DBFS.
Linking together multiple databases
Quite often, people have access to more than one mail database. For example, a company employee typically has access to two: their own database containing personal messages and a shared public database containing messages of general interest.
It is interesting to consider how we could integrate multiple databases in this framework. Will the user see a combined view or will they see two distinct databases? How will access control be implemented? Can the user add private attributes to public entries?
Setting attributes automatically
Some user attributes, e.g., attributes indicating that the message has been forwarded or replied-to, can be automatically set by the mail client. It would be useful to provide some JavaScript-like scripting language to allow the user to automate the maintenance of such attributes.
Taking this one step further, we could use the same techniques used to identify spam, Bayesian filtering for example, to place messages into other categories, although I'm not sure whether users would be willing to allow their messages to be categorised by a machine.
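As an illustration of the Bayesian idea, here is a minimal sketch of a naive Bayes categoriser with add-one smoothing, written in Python. It is a toy under stated assumptions: the class name `NaiveBayesCategoriser`, the `tokenize` helper and the crude word-splitting rule are all mine, and a real spam filter does considerably more. The point is only that nothing in the technique is specific to spam; any category attribute the user has been setting by hand can serve as training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesCategoriser:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.message_counts = Counter()          # category -> messages seen
        self.vocabulary = set()

    def train(self, text, category):
        """Record one message that the user has already placed in a category."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.message_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the category with the highest log-probability for this text."""
        if not self.message_counts:
            raise ValueError("no training data")
        words = tokenize(text)
        total_messages = sum(self.message_counts.values())
        # The extra 1 reserves probability mass for words never seen in training.
        smoothing_total = len(self.vocabulary) + 1
        best_category, best_score = None, -math.inf
        for category, seen in self.message_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(seen / total_messages)  # prior for the category
            for word in words:
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((counts[word] + 1) / (total_words + smoothing_total))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

A mail client built on this would presumably store the prediction as a suggested attribute rather than a final one, which might go some way towards addressing the worry about letting a machine do the filing.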
Adding attributes to outbound messages
A user might want to include attributes within messages sent to other users of database-backed mail clients. For example, the sender might like to set a reply-required flag or a reply-by date. The recipient might like it if the message indicated whether it was a work or a personal email.
The RFC822 standard is flexible enough that it would be quite easy to add these attributes to a message. The difficult bit would be creating a shared schema for all users to use.
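To show how little is needed on the wire, here is a minimal sketch using Python's standard `email` package. It assumes the attributes travel as extra header fields with an invented `X-Attr-` prefix (RFC 822 set aside the `X-` prefix for user-defined fields). The prefix, the attribute names and the helpers `build_message` and `read_attributes` are hypothetical; agreeing on the real names is exactly the shared-schema problem mentioned above.

```python
from email import message_from_string, policy
from email.message import EmailMessage

ATTRIBUTE_PREFIX = "X-Attr-"  # hypothetical prefix, not part of any standard


def build_message(sender, recipient, subject, body, attributes):
    """Build a message whose user attributes travel as extra header fields."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    for name, value in attributes.items():
        msg[ATTRIBUTE_PREFIX + name] = str(value)
    msg.set_content(body)
    return msg


def read_attributes(msg):
    """Collect the attribute headers from a received message into a dict."""
    prefix = ATTRIBUTE_PREFIX.lower()
    return {
        name[len(prefix):]: str(value)
        for name, value in msg.items()
        if name.lower().startswith(prefix)
    }


# Example round trip: serialise the message, then parse it as a recipient would.
outbound = build_message(
    "alice@example.com",
    "bob@example.com",
    "Quarterly report",
    "Please review the attached figures.",
    {"Reply-Required": "yes", "Reply-By": "2005-06-01", "Category": "work"},
)
inbound = message_from_string(outbound.as_string(), policy=policy.default)
received_attributes = read_attributes(inbound)
```

A dict holds only one value per attribute, so this sketch ignores multivalued attributes. Mail clients that do not understand the extra fields should simply ignore them, which is what makes this approach cheap to try.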
Where Do DB Filesystems Fit?
You may have noticed that database filesystems weren't mentioned much in the above. So now it is time to ask: what would such a filesystem give us?
It is not obvious that a database filesystem would implement what we require. For example, would we be able to add attributes at will to database objects? Would the database filesystem allow us to do string searches into the body of objects? Perhaps the database filesystem would just provide an efficient storage layer (e.g., Reiser4) and we would have to do all the indexing ourselves.
Assuming that we have access to a database filesystem that fulfills our requirements, implementing a mail client on top might be little more than implementing a UI. Unfortunately, as I attempted to convey above, I think that this would be one of the harder problems to solve.
In addition to easing the implementation of our mail client, the database filesystem would allow users to manipulate messages using the standard filesystem tools. For example, users could view and edit message attachments with standard utilities – the attachments would appear as if they were just another file in an ordinary file hierarchy. Of course, we don't need a database filesystem for this; we could achieve the same result by exporting the contents of our database as a userspace filesystem using tools like FUSE.
Perhaps most importantly of all, a database filesystem would allow us to unify the way we handle all filesystem objects. For example, if we extended the database to extract intrinsic attributes from Word documents or music files, then these attributes would automatically be available in our mail client.
So it seems that a database filesystem does not buy us very much. On the other hand, many of the ideas and issues outlined in the previous sections apply to a database filesystem just as much as to our database-backed mail client. With a database filesystem, we still need tools for administering attributes and specifying queries. Solving these issues in the simple email case should give us a good insight into more general solutions for a full database filesystem.
Conclusion
The above was a fairly undirected ramble through some ideas, and I must apologize for inflicting it upon you. I think the point I am trying to illustrate is that a database filesystem is not a solution in itself. Like all databases, it is only as useful as the applications that use it. The converse is not true – we can start implementing the applications straight away using a custom database.
I think this suggests that it is worthwhile starting to implement the applications now. We can start using the database filesystems when they become available.
If I get some time, I plan on experimenting with some of the above ideas in a prototype mail client. But considering it took me a month to write this essay, don't hold your breath.
Further Reading
DB Mail Clients:
- Opera's mail client is based on a database and includes many of the features described here.
- The new Mac mail client is also database-like. Its search UI is based on Spotlight, so it integrates with the Mac filesystem.
- Reiser4 is an efficient database for storing lots of small objects.
- Hans Reiser's vision for a database filesystem
- Real soon now, Microsoft will unleash WinFS onto the world and make all other database filesystems obsolete, though details are still a little vague.
If you would like to see your thoughts or experiences with technology published, please consider writing an article for OSNews.