posted by Slobodan Celenkovic on Tue 21st Dec 2004 17:58 UTC
IconThe topic of combining a database system (usually a conventional relational db system) with a file system to add meta-data, a richer set of attributes to files, has been a recurring discussion item on this and other sites. The article published last week, Rethinking the OS, under the heading "Where Is It Stored?" talks about the ability to locate a file without knowing the exact name or location.

The Basic Idea

While there are many variations, the essential idea is to enhance the number and type of file attributes to enable location and access to files without knowing the exact path. BeOS added B-tree indices for additional attributes (such as author, title, etc.) thus enabling fast searching based on those attributes. WinFS seems to use Microsft's SQL server (or a version thereof) to store and index file attributes.

Using RDBMS technology to index file attributes enables fast search of files based on a number of attributes besides file name and path. In effect, we enhance access capability by providing many more new access paths that were not available before. At its core, this idea of adding a database to a file system is about adding many more new access mechanisms to files. The increased flexibility enables users to locate files using different terms, especially domain specific terms that have nothing to do with file name and path.

The Arguments Against It

Many developers and other technically-inclined computer users are very much against this idea. However, their opinion is often not based on sufficient information and clarity to be sound. It is easy to jump to conclusions based on headlines alone or a short summary provided by news.

The common criticism is that addition of a database is simply not necessary. After all, the modern file systems do work very well, provide a very fast and efficient access to many files already. Why mess with them? This is the, "If it ain't broke don't fix it", philosophy. This is a perfectly valid argument, as it is well known that the modern file systems are results of a long evolution and therefore have many excellent characteristics. It is important to note, though, that this idea simply adds database to an existing file system and doesn't eliminate it! The standard file system continues to play the same role as it always has had. It is not being removed of replaced!

Another argument is that the overhead introduced by a database system on top of (or next to) a file system is simply not necessary. There are several problems with this argument. First, the overhead is not very well understood today. This argument may be valid only if the overhead is significant. Naturally all implementations strive to keep the overhead minimal. In the past when computing resources were scarce this argument would make sense. Today with massive processing, memory and hard disk capacities available the overhead should be handled without any problems. In fact, future multicore processors can easily execute database operations on one of the cores in background with little to no impact on other tasks.

Another argument is that database simply adds unnecessary complexity to the filesystem. There are many RDBMS of different sizes and complexities. In general, database systems being combined with file systems are of the smaller, simpler variety. We are not talking about a massive Oracle database using massive amounts of memory and disk space. In general, they would be some sort of smaller embedded database systems that are much lighter on machine resources.

Table of contents
  1. "Page 1"
  2. "Page 2"
e p (0)    53 Comment(s)

Technology White Papers

See More