posted by Rufus Hamade on Mon 21st Feb 2005 23:12 UTC
With the recent (or not so recent, I am a very slow writer) interest in database file systems, I've been thinking about what a typical user really wants from such a system. What would they use it for? What would we need to do to help them get the most from it? Are there any precedents that show how useful a database file system could be? If not, could we invent one? This led me to some "gedanken solutions" (like gedanken experiments, just with software) that I thought I'd distract you with.

IMPORTANT NOTE: Most of what is discussed here has already been implemented by BeOS since 1996; however, the author has never used BeOS and so was not familiar with its capabilities while writing this article.

As technical people, we can all think of a bunch of cunning uses for a database filesystem. My personal dream use would be a superlative code management system; when integrated with a good editor/IDE, it could provide revision control, tagging, searchable documentation, name completion, and probably any number of other things. Imagine being able to search the Doxygen comments for the function you can vaguely remember provides exactly the feature you want. Imagine being able to find every place a method is called so you can tweak its interface. Imagine being able to examine, shuffle and package changesets, like Bitkeeper.

But that is quite a lot to implement all in one go, even as an imaginary system, and it doesn't really show how a general user would be able to take advantage of the tools we want to provide.

Instead, I'd like to focus on the humble email client. Email clients have a number of features that make them interesting here:

  • Everyone uses a mail client
  • Email messages have a bunch of attributes that can be easily extracted
  • Mail clients use a custom database
The last item is particularly intriguing. Email clients have a custom database, but what do they do with it? They use it to implement what at first sight appears to be a straightforward filesystem with folders and files just like the OS-native version. There are some deviations from this norm like the virtual folders in Thunderbird and Evolution, and I believe Opera uses a more generic database in its mail client, but predominantly we are still using a hierarchical structure to organize our email.

This observation inspires the following questions:

  • If database filesystems are so good, are there any good reasons why no-one has implemented one for email?
  • Can we explore the usefulness of a database filesystem by implementing one within a mail client?
  • What "killer features" would such a mail client provide, and would they convince users to switch?

The rest of this article tries to find some answers to these questions by creating a specification for a database-backed mail client.

The Mail Database

First of all, we should explore what features a database-backed mail client would provide to the user. In a pure email system, we would only need to store two different types of objects: email and addressbook entries. To simplify things, I'm ignoring all the other things, like task items and diary dates, that some mail clients store.

We can divide the attributes for each object into three different categories:

  • Intrinsic attributes – These are defined by the objects themselves, e.g.,
    • The sender, date, recipients, subject etc. for an email.
    • The name, email address etc. for an addressbook entry.
  • Client attributes – These are invented by the mail client to manage the database objects, e.g.,
    • Object type
    • Unique identifier
    • Per-message flags: draft, sent, unread, deleted etc.
    • Received date
  • User attributes – These are attributes that the user maintains e.g.,
    • Per object flags e.g., message has been replied-to, message has been forwarded, message needs response
    • Object category attributes, e.g., message is a personal/work message, addressbook entry is a friend/business associate
    • Custom attributes e.g., Deal-with-by date
The above is obviously not an exhaustive list of attributes, but I think it gives a feel for the type of things we are talking about.
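To make the three categories concrete, here is a minimal sketch of how a single stored object might be modelled. The class name `MailObject` and all field names are my own assumptions for illustration; nothing here is taken from an existing mail client.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional
import uuid


@dataclass
class MailObject:
    """One stored object: an email or an addressbook entry (hypothetical model)."""

    # Intrinsic attributes: defined by the object itself (sender, subject, ...).
    intrinsic: dict[str, Any]
    # Client attributes: invented by the mail client to manage the object.
    object_type: str  # "email" or "contact"
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)
    flags: set[str] = field(default_factory=set)  # e.g. {"unread", "draft"}
    received: Optional[datetime] = None
    # User attributes: free-form, defined and maintained by the user.
    user: dict[str, Any] = field(default_factory=dict)


msg = MailObject(
    intrinsic={"sender": "alice@example.com", "subject": "Budget"},
    object_type="email",
    flags={"unread"},
    received=datetime(2005, 2, 21, 23, 12),
)
# The user can add new attributes at any time without changing the model.
msg.user["needs_reply"] = True
msg.user["deal_with_by"] = datetime(2005, 2, 28)
```

Keeping the user attributes in an open dictionary is one way to let the user extend the attribute set freely, while the client attributes stay fixed.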

We want to use the message attributes to help a user organize their email in ways that weren't possible with the old folder paradigm. For example, the user might want to

  • Set a "Needs reply" flag so that the user can see which messages need to be responded to.
  • Set a "Deal with by" date so that the user can specify any deadlines imposed by the message and a completed flag the user can set when the task is complete.
  • Set flags indicating that the message is work/personal/etc.
or any other attributes that the user might think of. The important thing is the user should be able to modify the set of attributes whenever he wants; it might be difficult to get a user to maintain a set of attributes that we impose on him, but he is bound to be keen to use attributes that he defines himself.

The user can use these new attributes to manage his email in lots of new and interesting ways, for example,

  • The user can find all messages that have been waiting for a reply for longer than a week
  • The user can find all messages with imminent deadlines
  • The user can find all work messages from a particular sender
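The three searches above map naturally onto ordinary database queries. The following is only a sketch using SQLite; the table layout, column names and sample rows are invented for illustration, and the "current time" is fixed so the example is repeatable.

```python
import sqlite3
from datetime import datetime, timedelta

NOW = datetime(2005, 2, 21, 12, 0)  # fixed "current time" for a repeatable example

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        sender TEXT,
        subject TEXT,
        received TEXT,        -- ISO 8601 timestamp
        category TEXT,        -- user attribute: 'work', 'personal', ...
        needs_reply INTEGER,  -- user flag: 0 or 1
        deal_with_by TEXT     -- user attribute: ISO 8601 deadline or NULL
    )""")
rows = [
    ("boss@example.com", "Quarterly report", "2005-02-10T09:00:00", "work", 1, "2005-02-22T17:00:00"),
    ("mum@example.com", "Sunday lunch", "2005-02-19T18:30:00", "personal", 1, None),
    ("boss@example.com", "Team outing", "2005-02-20T11:00:00", "work", 0, None),
]
db.executemany(
    "INSERT INTO messages (sender, subject, received, category, needs_reply, deal_with_by) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def subjects(sql, *params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in db.execute(sql, params)]


week_ago = (NOW - timedelta(days=7)).isoformat()
soon = (NOW + timedelta(days=2)).isoformat()

# Messages that have been waiting for a reply for longer than a week.
overdue_replies = subjects(
    "SELECT subject FROM messages WHERE needs_reply = 1 AND received < ? "
    "ORDER BY received",
    week_ago,
)
# Messages whose deadline falls within the next two days.
imminent = subjects(
    "SELECT subject FROM messages WHERE deal_with_by BETWEEN ? AND ? "
    "ORDER BY deal_with_by",
    NOW.isoformat(), soon,
)
# Work messages from one particular sender.
work_from_boss = subjects(
    "SELECT subject FROM messages WHERE category = 'work' AND sender = ? "
    "ORDER BY received",
    "boss@example.com",
)
```

Storing timestamps as ISO 8601 strings in a uniform format means plain string comparison sorts them chronologically, which keeps the queries simple. A real client would probably need a more flexible schema for user-defined attributes than fixed columns.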
Creating a Message Hierarchy

One attribute type that I haven't mentioned is an explicit message folder. Instead we can produce a folder-like hierarchy using any set of attributes. But will the user want to sort his email into a hierarchy? Considering the precedents – current mail clients, hierarchical databases and filesystems, DNS, taxonomy and any number of other examples – I think we can safely assume that the need to categorize objects into a hierarchy is hardwired into the human brain.

I can think of two approaches to producing a hierarchy from object attributes. First of all, we can categorize objects using a subset of the available attributes. At each level of the hierarchy, we choose an attribute, and assign messages into subcategories using that attribute.

This hierarchy is very simple to achieve but its usefulness is probably limited. Most attributes aren't suitable. Who would want to categorize their messages using the message ID? How would we use a multi-valued attribute such as recipients? Even the originator will only be useful under limited circumstances.
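As a rough sketch of this first approach, the function below groups messages into nested levels, one level per chosen attribute. The function name `build_hierarchy` and the sample data are hypothetical, and the sketch assumes single-valued attributes only; it deliberately does not try to answer the question of multi-valued attributes such as recipients.

```python
from collections import defaultdict


def build_hierarchy(messages, attributes):
    """Group messages into nested dicts, one level per attribute name.

    Assumes every chosen attribute is single-valued and hashable.
    """
    if not attributes:
        return list(messages)
    first, rest = attributes[0], attributes[1:]
    groups = defaultdict(list)
    for msg in messages:
        groups[msg.get(first, "(none)")].append(msg)
    return {value: build_hierarchy(group, rest) for value, group in groups.items()}


messages = [
    {"subject": "Budget", "sender": "alice@example.com", "year": 2005},
    {"subject": "Lunch", "sender": "bob@example.com", "year": 2005},
    {"subject": "Old news", "sender": "alice@example.com", "year": 2004},
]
# First level by year, second level by sender.
tree = build_hierarchy(messages, ["year", "sender"])
```

Here `tree[2005]["alice@example.com"]` holds only the "Budget" message. Changing the attribute list reshapes the whole hierarchy without touching the messages themselves.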

The second option is to use a specific user-defined category attribute. The user enumerates all possible values of this attribute and assigns messages to their appropriate categories as he sees fit. To produce a hierarchy, we divide the category attribute into fields, with each field used to categorize objects at a given level in the hierarchy.

The most useful solution would probably be a combination of these two. At the highest level, the user would want to see their messages categorized using the message flags to produce categories like unread and uncategorized messages, messages waiting to be sent, deleted messages etc. Afterwards, it is probably sufficient to arrange messages according to the single category attribute.

Note that with this scheme, we no longer guarantee that the message categorization is disjoint – a given message can exist in more than one category. In fact it might be useful to make the category attribute multivalued. After all, not every message is easy to pigeonhole.
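The combined scheme can be sketched as follows. This is one possible interpretation rather than a definitive design: I assume the category attribute is a list of slash-separated paths (so each path segment is one "field"), that flag-based categories sit alongside the category roots at the top level, and that messages at a node are kept under a reserved `_messages` key. All of these names and conventions are invented for the example.

```python
def place(tree, path, subject):
    """Insert a subject under a category path such as 'work/projects/release'."""
    node = tree
    for part in path.split("/"):  # each path segment is one level of the hierarchy
        node = node.setdefault(part, {})
    node.setdefault("_messages", []).append(subject)


def build_view(messages):
    """Build a folder-like view from message flags plus a multi-valued category."""
    tree = {}
    for msg in messages:
        # Flag-based categories such as "unread" or "deleted".
        for flag in sorted(msg.get("flags", ())):
            place(tree, flag, msg["subject"])
        # User-defined categories; a message may carry several of them.
        categories = msg.get("categories", [])
        if not categories:
            place(tree, "uncategorized", msg["subject"])
        for path in categories:
            place(tree, path, msg["subject"])
    return tree


messages = [
    {"subject": "Release plan", "flags": {"unread"},
     "categories": ["work/projects/release", "work/todo"]},
    {"subject": "Holiday photos", "flags": set(),
     "categories": ["personal/family"]},
    {"subject": "Newsletter", "flags": {"unread"}, "categories": []},
]
view = build_view(messages)
```

In this view "Release plan" appears under "unread", under "work/projects/release" and under "work/todo" at the same time, which is exactly the non-disjoint categorization described above.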
