Could Linux Abandon Directories in Favour of Tagging?

Submitted by rhyder 2010-11-04 Linux 81 Comments

“For a fairly scruffy looking guy, I have a surprisingly healthy approach to organising my files. However, I’m constantly pushing up against the limitations of a system that is based around directories. I’m convinced that Linux needs to make greater use of tagging, but I’m also beginning to wonder if desktop Linux could abandon the hierarchical directory structure entirely.”

About The Author

Thom Holwerda


81 Comments

2010-11-04 10:55 pm

Almafeta
Tagged filesystems seem to be the “holy grail” of filesystem technologies. Vista provided early attempts, which were (to my disappointment) almost abandoned in 7. They’re going to try again in 8. And probably just delay things again.

I think this is one area in which homebrew OSs could really beat the “big guys” to the punch. Adapting from a hierarchical structure to a tagged structure isn’t a drag-and-drop replacement.

Even in the case of the ‘middle ground’ suggested in the article – only replacing the user’s directory of personal files – I’m not sure most UIs and programs would be prepared for finding a directory that seems full of literally everything that people have ever created or saved.

Edited 2010-11-04 23:03 UTC

2010-11-04 10:56 pm

umccullough
I think this is one area in which homebrew OSs could really beat the “big guys” to the punch. Adapting from a hierarchical structure to a tagged structure isn’t a drag-and-drop replacement.

I bet this is something that could be implemented in Haiku with BFS attributes… I suspect someone could mock something up relatively quickly with the existing functionality – perhaps offering it as an addon enhancement to stock Haiku…

2010-11-05 12:54 am

Tuishimi
Exactly… but why abandon directories? Why not have both? I don’t see their functionality as mutually exclusive but instead complement one another. I don’t think the article is actually suggesting doing away with directories so much as advocating the use of tagging and that it could be better in certain circumstances.

Anyway, yeah, tagging/file attributing is a nice thing. I love using it in mail (postbox) and would really love the ability to relate groups of files by tagging them (yet also contain them in separate directories).

2010-11-05 10:49 am

dagw
I suppose if the tagging system was fast and robust enough you could simulate directories using tags. Add a layer on top of your shell/file manager so that when I type ls /foo/bar or click on the /foo/bar folder you list all files tagged with “dir:/foo/bar”. Moving a file would simply mean retagging it, symlinking would mean adding a new “dir:” tag, etc.

I’m not sure if this is actually a good idea, but I’d love to see someone implement it and find out.
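
For illustration, the “dir:” tag idea above can be sketched in a few lines of Python. This is a toy in-memory model, not an existing filesystem API; the class and method names (TagStore, ls, mv, link) are made up for the example.

```python
from collections import defaultdict


class TagStore:
    """Toy in-memory tag store; a real one would live in the filesystem."""

    def __init__(self):
        self.tags = defaultdict(set)  # file id -> set of tag strings

    def tag(self, file_id, tag):
        self.tags[file_id].add(tag)

    def untag(self, file_id, tag):
        self.tags[file_id].discard(tag)

    def ls(self, directory):
        """List every file carrying the tag 'dir:<directory>'."""
        wanted = "dir:" + directory
        return sorted(f for f, t in self.tags.items() if wanted in t)

    def mv(self, file_id, src, dst):
        """Moving is retagging: drop the old 'dir:' tag, add the new one."""
        self.untag(file_id, "dir:" + src)
        self.tag(file_id, "dir:" + dst)

    def link(self, file_id, directory):
        """A link is just one more 'dir:' tag on the same file."""
        self.tag(file_id, "dir:" + directory)
```

Note that ls here scans every file, which is exactly the “fast and robust enough” question: a real implementation would need an index from tag to files.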

2010-11-05 4:26 pm

phoenix
I suppose if the tagging system was fast and robust enough you could simulate directories using tags.

Precisely. Tags do not need to replace folders. They just need to supplement them. Adding this to the GUI is easy, as done in Windows 7 with Libraries, and in GMail with IMAP (tags == folders), and in Zimbra (dynamic/saved searches == folders). And, didn’t BeOS do this 10 years ago?

The fun part will be retrofitting it into the VFS/FS layer, so that it also works at the CLI level. There’s nothing worse than having a virtual folder in your GUI file manager, and being unable to find it via “ls” or “cd” at a bash prompt.
2010-11-06 10:30 am

bogomipz
symlinking would mean adding a new “dir:” tag etc.

Why not just use symlinks, or even hard links, as tags then? Add a function to the file manager that creates a link in the proper directory when you “tag” a file, and you’re all set!

2010-11-05 1:24 am

Ricard
A Google summer of code project idea for Haiku next year?

Edited 2010-11-05 01:24 UTC

2010-11-05 12:24 am

gfolkert
Microsoft has been promising that since “Chicago”… dropped again and again.

Could Linux do it?

I hope not. A DB as a filesystem just reeks.
2010-11-05 1:47 pm

FunkyELF
Adapting from a hierarchical structure to a tagged structure isn’t a drag-and-drop replacement.

Why not?

You just need a new file manager and new open / save dialogs for applications. Keep the files actually stored in a hierarchical manner.

2010-11-04 10:58 pm

unoengborg
I’m not sure abandoning the traditional file structure would be such a good idea. There is a lot of old software that we would probably like to keep using that wouldn’t understand the new tags, especially old command-line stuff. E.g. we would still like to be able to do things like “chmod -R g-rwx somedirectorystructure”

However, adding tagging as an additional way to find information would be a very good idea. The problem with tags is that they take time to enter manually. What we need is some smart logic that suggests tags for us. Some cases are simple: if we download something from a website, we could tag the file with its origin and perhaps with any Dublin Core tags on the page; things that arrive as e-mail attachments could be tagged with the sender’s name and e-mail address. Another way would be to use neural network technology to get suggestions.

BTW, many Linux filesystems can store extended attributes, which could be used for tagging. It’s just a matter of getting programs to make use of them. Support in commonly used backup programs would be especially important.
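
As a rough sketch of what tagging through extended attributes might look like, the Python below uses os.getxattr and os.setxattr, which exist on Linux only. The attribute name “user.tags” and the comma-separated encoding are assumptions made for this example, not an agreed convention, and tags containing commas would need a different encoding.

```python
import os

# Assumption: tags live in one attribute in the "user." namespace as a
# comma-separated UTF-8 string. The attribute name is made up for this sketch.
TAG_ATTR = "user.tags"


def encode_tags(tags):
    """Serialise a collection of tags into the bytes stored in the attribute."""
    return ",".join(sorted(set(tags))).encode("utf-8")


def decode_tags(raw):
    """Parse the stored bytes back into a set of tags."""
    text = raw.decode("utf-8")
    return set(text.split(",")) if text else set()


def read_tags(path):
    """Return the tags on path, or an empty set if none are stored."""
    try:
        return decode_tags(os.getxattr(path, TAG_ATTR))
    except OSError:  # attribute missing, or filesystem without xattr support
        return set()


def add_tag(path, tag):
    """Add one tag to path (Linux only; needs write permission on the file)."""
    tags = read_tags(path) | {tag}
    os.setxattr(path, TAG_ATTR, encode_tags(tags))
```

Because the tags travel with the file only if the copying or archiving tool preserves extended attributes, the point about backup programs applies directly.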

Edited 2010-11-04 23:01 UTC

2010-11-08 4:32 pm

richip
You can certainly abandon old hierarchical file systems and build a compatibility layer on top of the existing tag-based filesystem. In other words, you can have a hierarchical tag that grows like the file system. You’ll have tags like “/usr/bin/cat”, “/etc/redhat-release” and have the system level IO calls support it.

2010-11-04 11:25 pm

Delgarde
Bad idea. Support for tagging is good. But having a good search function is not a substitute for files having a unique human-readable name – sometimes, you need to be able to say *this* file.

Edited 2010-11-04 23:26 UTC
2010-11-04 11:34 pm

Zifre
Tagging is nice, but file systems need a complete overhaul.

First, we need transactional file systems. There is really no good reason not to have a transactional file system. It would make things like updates, installations, and removals much simpler. It would also make a lot of the common synchronization hacks unnecessary. The thing is, this really isn’t that hard. I created a very primitive transactional file system prototype for Linux some months ago, but I haven’t had time to finish it (I plan on basing it on Btrfs). Any user could do transactions, and they would never block. The basic algorithm was that if a transaction wanted to write to something that was being read, it would be cancelled, and if it wanted to read something that was being written, it would be cancelled.

Second, we need indexing of extended attributes. BFS got this right. My music should just be a folder with a bunch of files that have metadata. There should be no database. I should be able to search for songs with complex logical queries, not just simple text searches like you would find in a standard music player (e.g. iTunes, Rhythmbox).

Personally, I believe tagging is secondary to all of this. My mind naturally categorizes things hierarchically, but I have had times when I wished a file could be in two folders.

I am quite sure that the reason that none of these ideas have been implemented is not because they are hard, but because people stopped caring. File systems have hardly changed since the 1980s (the interface, not the implementations). I think the biggest problem with Linux is that most people are focused on creating a shiny interface, when the system below is inelegant and full of hacks. Of course, every major OS is like this, but I think it shows more in Linux. This is an area where Linux could really innovate and be better than Windows and Mac OS X.

2010-11-05 12:09 am

koorogi
My mind naturally categorizes things hierarchically, but I have had times when I wished a file could be in two folders.

This has been possible for ages. It’s supported by all POSIX-compliant OSes, plus Windows NT4+. It’s called a hard link.

2010-11-05 12:30 pm

Zifre
This has been possible for ages. It’s supported by all POSIX-compliant OSes, plus Windows NT4+. It’s called a hard link.

Yes, I know about hard links. However, I have to get out the command line to create them.

Also, I would generally like it so that if I delete the file from one directory, it would disappear from all the others too. Hard links don’t work like that. (You could do that with symbolic links, but you would be left with broken links.)

2010-11-05 3:05 pm

sorpigal
You don’t need to use the command line for hard links if your file manager is sufficiently good. Konqueror has always supported linking as an action from its drag+drop popup menu. Still, it’s true that hard links are not a viable solution and are a nightmare to manage.

2010-11-05 12:53 am

modmans2ndcoming
Unless you have come up with a magical method of concurrency, there will always be blocking, unless you take on multi-versioning, but that brings its own issues to think through.

2010-11-05 12:33 pm

Zifre
Unless you have come up with a magical method of concurrency, there will always be blocking, unless you take on multi-versioning, but that brings its own issues to think through.

Nope, there is no blocking. Whenever a transaction would normally block, it is aborted. If two transactions are competing, the one with the higher priority always wins. Regular file operations are treated as transactions with infinite priority, so they are never aborted or blocked for transactions.
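
The abort-instead-of-block rule described here can be sketched roughly as follows. This is not the prototype’s code, only a toy model of the stated rules; the names (TxManager, Aborted, REGULAR_IO) are invented, and letting the requester lose a priority tie is an assumption, since the comment does not say how ties are settled.

```python
import math

REGULAR_IO = math.inf  # plain file operations: infinite priority, never aborted


class Aborted(Exception):
    """Raised in place of blocking when a transaction loses a conflict."""


class TxManager:
    def __init__(self):
        self.priority = {}  # transaction id -> priority
        self.readers = {}   # path -> set of transaction ids reading it
        self.writer = {}    # path -> transaction id writing it

    def begin(self, tx, priority):
        self.priority[tx] = priority

    def abort(self, tx):
        """Drop the transaction and everything it holds."""
        self.priority.pop(tx, None)
        for holders in self.readers.values():
            holders.discard(tx)
        for path in [p for p, w in self.writer.items() if w == tx]:
            del self.writer[path]

    def _resolve(self, tx, rivals):
        """Never wait: either tx outranks every rival, or tx is aborted."""
        mine = self.priority[tx]
        if mine != REGULAR_IO and any(self.priority[r] >= mine for r in rivals):
            self.abort(tx)  # assumption: the requester loses a tie
            raise Aborted(tx)
        for r in rivals:
            if self.priority[r] != REGULAR_IO:  # regular I/O is left alone
                self.abort(r)

    def read(self, tx, path):
        w = self.writer.get(path)
        if w is not None and w != tx:
            self._resolve(tx, {w})
        self.readers.setdefault(path, set()).add(tx)

    def write(self, tx, path):
        rivals = set(self.readers.get(path, ())) - {tx}
        w = self.writer.get(path)
        if w is not None and w != tx:
            rivals.add(w)
        if rivals:
            self._resolve(tx, rivals)
        self.writer[path] = tx
```

A caller that catches Aborted would typically retry the whole transaction later, which is where the cost of never blocking shows up.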

2010-11-07 1:36 pm

modmans2ndcoming
Sounds pretty much like a basic concurrency control system already in use in every file system.

2010-11-07 7:30 pm

Zifre
sounds pretty much like a basic Concurrency control system already in use in every file system.

No not really, because normally everything just interferes. (I can read a file at the same time you are writing it.)
2010-11-09 2:43 am

modmans2ndcoming
Write locks are exclusive locks in databases as well and do not allow reading. The reason is that you would be reading data that is invalid or could be invalid. A file system that allowed reading of a file that is being written to would not be able to produce reliable data.

2010-11-05 12:53 pm

Zifre
One thing I forgot to mention: file names. We really need to stop relying on names to locate files. Something like a UUID would be much better. The name would be solely for display purposes, and would just be a regular indexed extended attribute. Links would reference the UUID, not the name. The entire file system would essentially be a giant database. You could query the file system based on any attributes, and the result would be a list of UUIDs. You could then open a file through the UUID. Directory structures could be implemented using a parent attribute that would refer to the “directory” (really a file) containing a file. To get a listing of the files in a directory, you would query for all files with a parent attribute equal to the directory’s UUID. Tagging would be implemented in a similar way.
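
A minimal sketch of that model, assuming nothing beyond what is described above: files are UUIDs with attributes, and a directory listing is a query on a “parent” attribute. The class and method names are hypothetical, and the linear scan in query stands in for the indexes a real system would need.

```python
import uuid


class UuidStore:
    """Toy model: every file is a UUID plus a dict of indexed attributes."""

    def __init__(self):
        self.files = {}  # UUID -> attribute dict

    def create(self, **attrs):
        file_id = uuid.uuid4()
        self.files[file_id] = dict(attrs)
        return file_id

    def query(self, **wanted):
        """Return the UUIDs of all files whose attributes match `wanted`."""
        return [
            file_id
            for file_id, attrs in self.files.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        ]

    def listdir(self, dir_id):
        """A directory listing is a query on the 'parent' attribute."""
        return self.query(parent=dir_id)

    def rename(self, file_id, new_name):
        """The name is only a display attribute; links by UUID are unaffected."""
        self.files[file_id]["name"] = new_name
```

Renaming a file changes nothing that a UUID-based link depends on, which is the main attraction; what happens to those links on deletion is left open here, as it is in the comment.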

Unfortunately, this is a bit harder to implement. The major problem is dealing with broken links. If you delete a file, do all the references to it go away, or stay broken? Would it be possible to create a file with a specific UUID in order to fix a broken link? These problems are a lot harder to solve, so I would not expect to see a system like this for a long time. It is somewhat similar to WinFS. Does anyone know how WinFS solves these problems?

2010-11-05 8:09 pm

sbalmos
Unfortunately, you’ve pretty much described an inode, and how filesystems generally work already. Especially a directory being a special file that contains other file ID (sorry, inode) references.

2010-11-06 8:12 pm

Zifre
Unfortunately, you’ve pretty much described an inode, and how filesystems generally work already. Especially a directory being a special file that contains other file ID (sorry, inode) references.

Except that you can’t open a file by inode number. File paths/names are the main interface, and inodes are mostly just an implementation detail (except when dealing with hard links and other special files).

2010-11-07 1:37 pm

modmans2ndcoming
so?

2010-11-05 4:32 pm

phoenix
First, we need transactional file systems.

Only if by “we” you mean Linux. Non-Linux systems have had transactional filesystems for years now (ZFS, HAMMERFS), and support for versioning in the filesystem (VMS).

There is really no good reason not to have a transactional file system. It would make things like updates, installations, and removals much simpler.

You’re right, it does. ZFS snapshot your filesystem(s), do your updates. If it fails, roll-back the snapshot and carry on. If it succeeds, you either keep the snapshot just-in-case, or you delete it. Works beautifully, even across full OS upgrades.

Second, we need indexing of extended attributes. BFS got this right. My music should just be a folder with a bunch of files that have metadata. There should be no database.

Uhm, what do you call your index, if not a database?

Personally, I believe tagging is secondary to all of this. My mind naturally categorizes things hierarchically, but I have had times when I wished a file could be in two folders.

Some kind of tagging or EA system would be nice, for just this reason. After using GMail and Zimbra for the past couple of years, it’s nice being able to physically store messages in a hierarchical manner, but also access them via multiple “folders”/tags where appropriate. And having saved searches (virtual folders) that refresh each time you go into them is absolutely wonderful; something I’ve missed from GUI file managers like Dolphin.

2010-11-05 10:26 pm

jonas.kirilla
When people think “database”, they might think of a userland process, some kind of metadata storage on -top- of a traditional filesystem and some periodic indexing process. BFS indices (in BeOS and in Haiku) are an integral part of the filesystem. Indexing happens in the filesystem (is done by the filesystem) at the exact time when attributes are created/altered. There is no periodic indexing process, and there is no separate metadata storage. (Which could potentially get out of sync with the target files.)
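
The difference from a periodic indexer can be shown with a toy model in which the index is updated inside the same call that writes the attribute. This is only an illustration of the principle, not how BFS is implemented; all names are invented.

```python
from collections import defaultdict


class IndexedAttrs:
    """Toy attribute store whose index is updated inside set_attr itself."""

    def __init__(self):
        self.attrs = defaultdict(dict)  # file -> {attribute: value}
        self.index = defaultdict(set)   # (attribute, value) -> set of files

    def set_attr(self, file, name, value):
        if name in self.attrs[file]:
            # Drop the stale index entry before storing the new value.
            self.index[(name, self.attrs[file][name])].discard(file)
        self.attrs[file][name] = value
        self.index[(name, value)].add(file)  # indexed in the same step

    def find(self, name, value):
        """Answer straight from the index; no scan and no background job."""
        return sorted(self.index.get((name, value), ()))
```

Since there is no moment at which the attribute has changed but the index has not, the two cannot drift out of sync.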

2010-11-05 11:08 pm

phoenix
Ah, gotcha. That makes sense.

2010-11-06 8:20 pm

Zifre
Only if by “we” you mean Linux. Non-Linux systems have had transactional filesystems for years now (ZFS, HAMMERFS), and support for versioning in the filesystem (VMS).

Nope, that’s an entirely different type of transaction. The only “real” transactional file systems (i.e. allow multiple user-level transactions that can be cancelled individually) that I am aware of are TxF for Windows Vista/7, and TxOS for Linux: http://www.cs.utexas.edu/~porterde/txos/

You’re right, it does. ZFS snapshot your filesystem(s), do your updates. If it fails, roll-back the snapshot and carry on. If it succeeds, you either keep the snapshot just-in-case, or you delete it. Works beautifully, even across full OS upgrades.

That works fine when you only need to do one transaction at a time. There is no reason why a file manager shouldn’t be able to do atomic copies or atomic unpacking of archives. Snapshotting the entire file system is not a very general or elegant way to solve the problem.

Uhm, what do you call your index, if not a database?

It is a database, but it’s part of the file system (i.e. not updated by applications). Look at BFS on Haiku or BeOS.

2010-11-07 3:45 pm

abraxas
Personally, I believe tagging is secondary to all of this. My mind naturally categorizes things hierarchically, but I have had times when I wished a file could be in two folders.

You can have that with a hard link although it does have its limitations.

2010-11-05 12:02 am

OSbunny
I don’t like tags. Ever since Firefox introduced bookmark tagging I can’t seem to properly organize my bookmarks. It’s too confusing when you have both tags and directories. IMO to implement it on the level of a filesystem would be chaos.

2010-11-05 5:00 pm

phoenix
Organise your bookmarks exactly the same way as before, using folders.

Then, just add tags to your bookmarks for “categories” and keywords.

When you want to find a bookmark, you can use the menu the same as always to browse through your folders.

Or, you can just start typing a keyword into the addressbar, and Firefox will scan your tags and list out your bookmarks that match.

Think of tags as a supplement to folders, not a replacement.

2010-11-05 12:23 am

cristoper
One of the main benefits of tagging, that is putting the same file in more than one directory at the same time, is already possible using hardlinks.

2010-11-06 11:03 am

bogomipz
Yes. What is lacking with hard links is the ability to find other names for the same file, without traversing the file system tree. So there is no easy way to delete something (it happens when you remove the last link to the inode), and you can’t list which tags/directories a file is currently associated with.

Well, you can do it with `find`, but it is too slow to be practical. I’d say fix this problem rather than establish a new way to tag files.

2010-11-06 3:15 pm

cristoper
Yes. A naive solution is not difficult. You just need wrappers around ln and rm that update an index whenever a link is created or removed. And then a tool to search the index and list all of the names of a given file (like ‘update’ but just for hard links).

Of course ln and rm are not the only tools that can create and remove links, so a better solution would be to handle it at the system call level… or at the filesystem level, which I guess is sort of what this article is suggesting.
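
As a rough illustration of such an index, the Python sketch below walks a tree once and records every name of every inode, so later lookups avoid a fresh traversal. It only snapshots the current state; keeping it up to date as links are created and removed is the hard part discussed above. The function names are made up for the example.

```python
import os
from collections import defaultdict


def build_link_index(root):
    """Walk `root` once and map each inode to every name that points at it."""
    index = defaultdict(list)  # (device, inode) -> list of paths
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            index[(st.st_dev, st.st_ino)].append(path)
    return index


def names_of(index, path):
    """List all known names of the file at `path`, using the prebuilt index."""
    st = os.lstat(path)
    return sorted(index.get((st.st_dev, st.st_ino), []))
```

With the index in hand, “which directories is this file in?” and “remove every link to this file” become lookups rather than tree walks.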

2010-11-05 12:34 am

squelart
Instead of overhauling the whole experience, I would like some kind of intelligent filesystem that would interpret directory names as tags, and transparently make files available at different locations under the same unordered set of directory-tags, e.g.:

…/docs/articles/2010/…

would contain the same files as

…/docs/2010/articles/…

So there’s no real change in GUIs, people can keep their working habits with directories, but the filesystem will help with organising files so that files are easier to access through different paths.

(Yes I know about hardlinks, but they’re hard and need extra work! I want my tag-links to happen transparently)
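
The lookup rule behind this idea is small enough to sketch: treat the components of a directory path as an unordered set, so any ordering of the same names resolves to the same place. This is a toy model with invented names, not an existing filesystem feature.

```python
def tags_of(path):
    """Treat each directory component as a tag; order is ignored."""
    return frozenset(part for part in path.split("/") if part)


class TagDirs:
    """Toy lookup table: set of directory-tags -> files stored under it."""

    def __init__(self):
        self.files = {}  # frozenset of tags -> set of file names

    def save(self, directory, name):
        self.files.setdefault(tags_of(directory), set()).add(name)

    def listdir(self, directory):
        return sorted(self.files.get(tags_of(directory), ()))
```

Here a file saved under docs/articles/2010 is listed under docs/2010/articles as well, but not under docs/articles; whether a shorter path should also show files that carry extra tags is a design choice this sketch does not make.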

Edited 2010-11-05 00:36 UTC

2010-11-05 1:13 pm

BlueofRainbow
How about the reverse – being able to organize the tags (or selected tags) in some form of hierarchy/structure?

For the operating system, a file is simply a unique identifier of an object and the location of this object on the storage medium. Other necessary stuff comes with it too like access rights, general attributes, and tags but these are add-ons to ease the management of these objects.

Isn’t a directory/folder simply a list of files grouped together by some rules?

2010-11-05 3:03 pm

sorpigal
The answer is drill-down tag clouds.

In a filter box type a simple string. Below you see all tags matching that substring with accompanying folder icons. Click on a folder icon to open a window showing all files tagged with that tag and a set of “folders” showing the union of all tags applied to all files tagged with the first tag. Clicking into one of these folders brings up a similar list showing all files tagged with the first tag and the second tag, plus folders representing the union of all other tags that those files have applied. Continue to drill down in this manner until the file set becomes small enough that you can search by file name or other meta data, or until there are no relevant tags to drill in to.

This gives you arbitrary “hierarchies” which are not rigid.
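
One drill-down step can be sketched as a single function: given the tags selected so far, return the matching files and the tags that would narrow the set further. The function name and the dictionary layout are assumptions for illustration.

```python
def drill_down(files, selected):
    """files: dict of file name -> set of tags; selected: tags chosen so far.

    Returns the files carrying every selected tag, plus the tags that can
    still be used to narrow the set further.
    """
    selected = set(selected)
    matches = {name: tags for name, tags in files.items() if selected <= tags}
    remaining = set().union(*matches.values()) - selected
    return sorted(matches), sorted(remaining)
```

Each click adds one tag to selected and calls the function again; the drill-down ends when remaining comes back empty or the file list is short enough to scan by eye.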

2010-11-05 12:38 am

jessesmith
Tags are helpful, but they shouldn’t replace the organization which naturally comes from the current FS model. It’s pretty easy to sort files now by setting up folders in the style of

Work/Project/Sub-Category

Switching to a tag-based or DB-based file system trades one set of problems for another.
2010-11-05 12:44 am

ChoK
This sounds like he wants beagle or nepomuk or tracker. (and gnome Zeitgeist iirc)

Implementing it at FS level will require more computing power than current FS I suppose, it wouldn’t work well with mobile devices. (I’d like to be proven wrong)

Edited 2010-11-05 00:44 UTC
2010-11-05 1:10 am

Soulbender
It’s not important how the file system itself is organised. What’s important is how it is presented to the user.

There are already existing technologies for tagging files in Linux anyway, such as Nepomuk. It’s more important to make better use of these than it is to “abandon the hierarchical file system”.

2010-11-05 3:04 pm

sorpigal
While abandoning the hierarchical filesystem is a laughable idea and will continue to be so for the foreseeable future, adding tagging at the FS level is a good idea whose time has come. Any add-on tagging database is pointless duplication of effort. The only add-on you really want is a tag index, mostly because indexing at the FS level is (a) hard, (b) contentious (politically), (c) bad for multiuser and (d) stupid long-term.

Let applications or DEs or other frameworks agree on indexing however they like and do it multiple ways if they wish. Tags should be stored directly in files and not in a separate database not everyone can agree on and not everything can access.

2010-11-05 7:43 pm

Soulbender
adding tagging at the FS level is a good idea whose time has come

I can see the benefits of this but it might be difficult in a system that is inherently multiuser since users probably want to use different tags and don’t want their tags changed by others.

Tags should be stored directly in files and not in a separate database not everyone can agree on and not everything can access.

Alternatively you could just agree on a user-space solution. Might actually be easier to agree on than fs-level tagging.

2010-11-06 8:03 pm

sorpigal
it might be difficult in a system that is inherently multiuser since users probably want to use different tags

This is solved by permissions, as usual. If a file is read only you cannot tag it. Controlling tag writing via a separate ACL might be possible in the future as well.

Alternatively you could just agree on a user-space solution. Might actually be easier to agree on than fs-level tagging.

Any system that does not store tags directly in the files is useless. The only way to get useful, universal tagging is to store it in a place we can all agree on and we can all agree on the filesystem.

2010-11-06 8:29 pm

phoenix

it might be difficult in a system that is inherently multiuser since users probably want to use different tags

This is solved by permissions, as usual. If a file is read only you cannot tag it. Controlling tag writing via a separate ACL might be possible in the future as well.

That’s completely useless, then. After all, if I have read-only access to a file, I can copy the file to anywhere I want. Thus, why shouldn’t I be able to tag it as well?

Alternatively you could just agree on a user-space solution. Might actually be easier to agree on than fs-level tagging.

Any system that does not store tags directly in the files is useless. The only way to get useful, universal tagging is to store it in a place we can all agree on and we can all agree on the filesystem.

Good luck with that. How long has the debate been going on about Extended Attributes, where to store them, and how to access them?

Edited 2010-11-06 20:29 UTC
2010-11-06 8:41 pm

sorpigal
if I have read-only access to a file, I can copy the file to anywhere I want. Thus, why shouldn’t I be able to tag it as well?

You should be able to copy it to anywhere and then tag the copy. This is just common sense: I don’t want your tags on my file. I can see the argument that says that tags should be an overlay describing a particular person only, but the fact that you lose those when transferring the file to someone else makes it almost worthless.

EDIT: I would just like to add: Consider tags on files as they exist today. JPEGs, MP3s, hell even MS Word supports a kind of tagging for .doc files. These tags are per-file and cannot be adjusted without write access, yet they are undeniably useful. The only problem with them is that for each new file type you have to learn a new tagging system, which means that any software dealing with tagging becomes enormously complex. I’m suggesting that tagging in files is established and accepted and works; all we need is a universal system for tagging in files.

Good luck with that. How long has the debate been going on about Extended Attributes, where to store them, and how to access them?

It’s been a long time but finally someone who isn’t a security researcher has a good use for them.

Edited 2010-11-06 20:44 UTC

2010-11-05 4:25 am

Liquidator
Look at tagging in email. It’s horrible, I have a great number of tags at the same level, and I’m never able to find my archived messages. I definitely don’t want that for my file system!

2010-11-08 4:36 pm

richip
The reason tagging in email fails is because it’s presented in the wrong way. If you take all the email applications that present mail folders in a pane, take out that pane and instead put in a pane containing hierarchical tags, then you can have users tagging their email by dragging the email into the tag (instead of the tag onto the email).

2010-11-08 9:41 pm

phoenix
Whoa! You’ve just described GMail. lol

2010-11-05 6:25 am

nt_jerkface
Too many programs are built around the current system.
2010-11-05 6:51 am

Neolander
-Tags make file organization a nightmare once you use too many. But it takes a lot of mastery to only use a few tags. So I’m not sure that for the average guy, this would really be an improvement.

-Tagging files at hand is a very lengthy process compared with giving a name and creating a directory hierarchy. Automatically tagging files is doable, but error-prone once it gets a bit fine-grained.

-Tag discoverability is poor once there are many tags around, so it’s more of a companion for search than a companion for hierarchy.

-Moreover, too many people and programs are used to hierarchical storage for such a breakthrough change in the way files are organized.

So in my opinion, a better idea would be to offer an automatically-generated hierarchy in order to make search functionalities more discoverable.

Say, if I look for music, I go in Search/Music. That folder full of symlinks is automatically updated by the indexing service, with the hierarchies I like (Artist/Album/Title, Genre/Artist/Title…). If I look for documents, I go in Search/Documents and can search by date of last modification, document type (PDF, Word processing, Slides…), and so on.

Current search functionalities only work properly when you know the name of the file by heart. In my opinion, a search functionality which allows finding a long-lost file through a more thematic search would be more interesting.
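
The Search/Music folder described above could be populated by something like the sketch below, which an indexing service might call whenever its metadata changes. The function name, the track dictionaries and the default layout are assumptions for illustration, and metadata values containing “/” would need escaping first.

```python
import os


def build_search_tree(root, tracks, layout=("artist", "album")):
    """Create root/<artist>/<album>/<title> symlinks pointing at each track.

    tracks: iterable of dicts with 'path', 'title' and the keys in `layout`.
    """
    for track in tracks:
        folder = os.path.join(root, *(track[key] for key in layout))
        os.makedirs(folder, exist_ok=True)
        link = os.path.join(folder, track["title"])
        if os.path.lexists(link):
            os.remove(link)  # refresh a stale link from an earlier run
        os.symlink(os.path.abspath(track["path"]), link)
```

Calling it again with layout=("genre", "artist") on a second root would give the Genre/Artist/Title view of the same files.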

Edited 2010-11-05 06:59 UTC

2010-11-05 11:27 am

phoudoin
Say, if I look for music, I go in Search/Music. That folder full of symlinks is automatically updated by the indexing service, with the hierarchies I like (Artist/Album/Title, Genre/Artist/Title…). If I look for documents, I go in Search/Documents and can search by date of last modification, document type (PDF, Word processing, Slides…), and so on.

It’s called live queries in BeOS and Haiku, and it works on automatic MIME type discovery and indexing.

So… see you soon under Haiku?

😉

2010-11-05 6:47 pm

Neolander
It’s called live queries in BeOS and Haiku, and it works on automatic MIME type discovery and indexing.

Interesting, should have a look at this…

So… see you soon under Haiku?

😉

Not soon, but maybe later. Tried it, and found it as boring to use as it is exciting technologically-speaking ^^

2010-11-05 5:14 pm

phoenix
So in my opinion, a better idea would be to offer an automatically-generated hierarchy in order to make search functionalities more discoverable.

Yet another way BeOS was released to the world before its time. This was implemented in the file manager and the filesystem. Supposedly, it worked well (never used BeOS myself).

This is something I really like about Zimbra. You can save a search anywhere in your folder tree, and it will update the results every time you “open” the “folder”. Works quite nicely, with a barely noticeable 1s lag.

Say, if I look for music, I go in Search/Music. That folder full of symlinks is automatically updated

Why symlinks? Why not just show the files themselves?

2010-11-05 6:49 pm

Neolander
Why symlinks? Why not just show the files themselves?

Because the file indexer only has to create a lot of symlinks in the folder once, instead of having to do some database query every time you open the folder.

2010-11-05 7:26 pm

phoenix
Yes, but then the “file indexer” has to continually check that the symlinks are valid, and has to continually recreate them. And, anyone accessing the GUI will see the same icon (link) for all files, regardless of the type of file. And, if you rename the “file” in the search area, it doesn’t rename the actual file. Plus, each symlink is a 0-byte file using up an inode, so each search you create can potentially run your system out of inodes, leading to “disk full” errors when you are using 10% of your disk.

Using symlinks is a band-aid that would be worse than the cut it covers.

2010-11-06 11:44 am

bogomipz
You could also use hard links, which would be a technically better solution.

Yes, but then the “file indexer” has to continually check that the symlinks are valid, and has to continually recreate them.

Yes, the indexer must run continuously to create the links. As Neolander said, this is not all bad because it means the work is then done only once, and the “search” is zero cost.

And, anyone accessing the GUI will see the same icon (link) for all files, regardless of the type of file.

Not true. Most X11 file managers will show the correct file icon, but with a symlink overlay (a small arrow). Some let you turn off the overlay if you wish. Use hard links rather than symlinks, and this issue is gone for sure.

And, if you rename the “file” in the search area, it doesn’t rename the actual file.

Aha, yes because you want the file to exist in multiple folders, not to have multiple names.

Like I mentioned in another post, one usability issue with hard links that needs to be fixed is that you must traverse the directory tree to find links to the same file. If finding the links to an inode was fast, the file manager could easily let you 1) remove all hard links to a file 2) rename all hard links when you rename one of them.

Plus, each symlink is a 0-byte file using up an inode, so each search you create can potentially run your system out of inodes, leading to “disk full” errors when you are using 10% of your disk.

Not so with hard links.

2010-11-06 12:06 pm

bogomipz

So in my opinion, a better idea would be to offer an automatically-generated hierarchy in order to make search functionalities more discoverable.

Yet another way BeOS was released to the world before its time. This was implemented in the file manager and the filesystem.

Just to make this clear; BFS automatically indexes certain attributes to speed up searching, it does not automatically populate those attributes. With Neolander’s suggestion, creating a link is like adding a tag, not indexing an existing tag. If/when Haiku will automatically fill attributes, this too will be by a continuously running userland process.

This is something I really like about Zimbra. You can save a search anywhere in your folder tree, and it will update the results every time you “open” the “folder”.

Saved queries in BeOS/Haiku work this way too. Tracker makes them behave more or less like folders. You can for instance make a search for program files and put it in your Leaf menu folder (Haiku-speak for start menu), or anywhere else for that matter, to have a dynamic application launcher.

Edited 2010-11-06 12:10 UTC

2010-11-05 7:16 am

pica
Look at IBM OS/400 aka i5/OS. The persistent storage of these systems is a relational database. You can even add your own tagging mechanisms simply by defining your own relations between system tables and your own tables.

For the sake of UNIX / POSIX conformance, i5/OS also features a hierarchical filesystem on top of the RDBMS.

Yes, it could be done. But if it is done, Linux based OSes would not be unixoid anymore.

pica
2010-11-05 9:25 am

ddc_
This could easily be done without abandoning directories, using a FUSE module that represents your tags as a recursive directory layout. Given a file “somefile” tagged “tag1”, “tag2” and “tag3”, you could access it as:

~/tags/tag1/tag2/tag3/somefile

~/tags/tag3/tag2/tag1/somefile

~/tags/tag2/tag1/tag3/somefile

etc.

Tagging a file means:

mv somefile ~/tags/tag1/tag2/tag3/

Adding tag:

mv ~/tags/tag1/tag2/tag3/somefile ~/tags/tag1/tag2/tag3/tag4/

Removing tag:

mv ~/tags/tag1/tag2/tag3/somefile ~/tags/tag1/tag2/

The file manager doesn’t even have to be aware of the FUSE module, it may just follow the directives from “~/.tags/tagN/.directory”.

That should be relatively easy. It could even be accomplished (with some overhead) just with a set of symlinks and a simple daemon.
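The path semantics can be modelled in a few lines of Python without touching FUSE at all. This is only a sketch: `TagStore`, `mv` and `exists` are hypothetical names, and it assumes a path resolves when its directory components are exactly the file's tags, in any order.

```python
from pathlib import PurePosixPath

MOUNT = PurePosixPath("~/tags")  # mount point taken from the examples above


def split_tag_path(path):
    """Split a path under the mount into (set of tags, filename)."""
    *tags, filename = PurePosixPath(path).relative_to(MOUNT).parts
    return frozenset(tags), filename


class TagStore:
    """Hypothetical in-memory stand-in for what the FUSE module would track."""

    def __init__(self):
        self.tags_of = {}  # filename -> frozenset of tags

    def mv(self, filename, dest_dir):
        """Model `mv file ~/tags/a/b/`: the destination's components become the tags."""
        parts = PurePosixPath(dest_dir).relative_to(MOUNT).parts
        self.tags_of[filename] = frozenset(parts)

    def exists(self, path):
        """A path resolves if its directory components are exactly the file's tags."""
        tags, filename = split_tag_path(path)
        return self.tags_of.get(filename) == tags
```

Because the tags are kept as a set, every ordering of the directory components resolves to the same file, and moving the file to a shorter or longer path removes or adds tags.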

Edited 2010-11-05 09:29 UTC

2010-11-05 4:33 pm

sorpigal
Tags as directory hierarchies is a good visualization and organization method, but hiding this from apps only kind of works. Ultimately apps need to know what tags are.
2010-11-06 12:22 pm

bogomipz
What happens when a program tries to save a file in the tags hierarchy? Either because “Save as…” defaults to the same place you opened a file from, or because it creates backup files?

2010-11-05 9:27 am

zimbatm
With fuse [1] you could easily create a virtual filesystem, where directories are tags. That would let you try out that idea without changing the rest of the filesystem.

Say you mount that filesystem in ~/Tags. A file with two tags “A” and “B” could be found in ~/Tags/A/B/ ~/Tags/B/A/ ~/Tags/A ~/Tags/B and in its original location. One difficulty would be that filenames can’t have the same name as any other tag. Another one would be that after some time, ~/Tags would contain a huge list of tags, which is not really handy.
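A rough Python sketch of what such a virtual directory would have to list, and where the name clash shows up. The function `list_virtual_dir` and its data layout are assumptions made for illustration, not part of FUSE.

```python
def list_virtual_dir(path_tags, files):
    """List the virtual directory ~/Tags/<path_tags...>.

    files maps filename -> set of tags. Returns (subdirs, filenames, collisions),
    each as a sorted list.
    """
    wanted = set(path_tags)
    # A file is visible if it carries every tag named in the path.
    visible = {name: set(tags) for name, tags in files.items() if wanted <= set(tags)}
    # Remaining tags of the visible files become subdirectories.
    subdirs = set()
    for tags in visible.values():
        subdirs |= tags - wanted
    # A filename that equals a subdirectory name cannot be shown next to it.
    collisions = subdirs & set(visible)
    return sorted(subdirs), sorted(visible), sorted(collisions)
```

With a file literally named “A” that carries the tag “B”, listing ~/Tags/B has to show both a directory “A” and a file “A”, which is the collision described above.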

[1] http://fuse.sourceforge.net/

2010-11-06 1:09 pm

bogomipz
One difficulty would be that filenames can’t have the same name as any other tag.

Ah of course, thanks for bringing this up. Presenting equally named files as nodes in the same virtual directory doesn’t work very well.

Fixing this involves changing the file manager to show a different name than what the file system presents. Like when I delete two files called test.txt in Thunar, and then go to the Trash folder – it looks like the folder contains two equally named files, but in reality one of them is called test.txt$1. The illusion works as long as you restore the file before opening it. If you directly open the second file from Trash in a program that doesn’t know this convention, you see the name as stored in the file system.
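As a sketch of that illusion in Python: the stored name gets a numeric suffix, and the file manager strips it for display. The `$N` convention is taken from my description above and has not been checked against Thunar's actual implementation; both function names are made up.

```python
def stored_name(original, taken):
    """Pick a name for the file system that does not clash with names in taken."""
    if original not in taken:
        return original
    n = 1
    while f"{original}${n}" in taken:
        n += 1
    return f"{original}${n}"


def display_name(stored):
    """What the file manager shows: the name with any trailing $N removed."""
    base, sep, suffix = stored.rpartition("$")
    if sep and suffix.isdigit():
        return base
    return stored
```

Note the weakness: a file that is genuinely called something like `a$1` would be displayed as `a`, and any program that bypasses `display_name` sees the stored name.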

2010-11-06 2:07 pm

zimbatm
Fixing this involves changing the file manager to show a different name than what the file system presents. Like when I delete two files called test.txt in Thunar, and then go to the Trash folder – it looks like the folder contains two equally named files, but in reality one of them is called test.txt$1. The illusion works as long as you restore the file before opening it. If you directly open the second file from Trash in a program that doesn’t know this convention, you see the name as stored in the file system.

I also think it’s reasonable to involve the file manager and the “File Open” methods of the various toolkits, since we’re talking about a surface-level feature. You can see similar problems with localized user folders, where they still are written in English when using the console or in old GUI programs. Yet another example is the netmounts in GNOME, which are located in ~/.gvfs in your home folder. Unfortunately I don’t know of any good solution to hide the implementation details without breaking legacy software.

2010-11-05 10:03 am

henderson101
Until AA (or maybe DR 9, I forget) BeOS had a filesystem completely based around database concepts. It universally sucked and was fairly unpopular towards the end of its life (that might have been implementation or whatever, but it regularly required rebuilding indexes and stuff, even to the point where the BootROM had a “REBUILD INDEXES” option).

The OFS (as BeOS calls it) was all about tagging and not having specific folders and what have you (though it obviously presented a hierarchical file system, I seem to recall that was more a convenience than anything else.)

The BFS we have today was a “compromise” between database like features (extended attributes, query based searches) and trad file systems.

The problem with querying the “database” is that at any one time you are unsure what state the “entries” are in. A trad file system gains speed because you only need to look in a specific location for the directory list (however that has been implemented, as it does vary across various FS). With a database-like approach (even with extended attributes like with BFS) you get a delay whilst the tagged files are located. Trad file systems work well because there is a very atomic relationship between the physical data and the actual structure. To graft something similar on to a FS and keep all of the metadata up to date (therefore, skip what BFS does, and keep the indexed data fresh so that the lookups are fast) causes dreadful synchronisation issues (as Microsoft found when they first decided to graft SQL Server on to a file system). Performance also becomes slower as the data structures containing the file data expand. Plus, with attribute-based indexing, an index can be added at any time, which might cause a complete re-index.
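To make the trade-off concrete, here is a toy Python model. It is not how BFS or any real filesystem works; `AttributeIndex` and its methods are invented names. It shows the two costs mentioned: every attribute write must also update the index, and an index added later has to scan every existing file.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy model of files with attributes plus optional per-attribute indexes."""

    def __init__(self):
        self.files = {}    # path -> {attribute: value}
        self.indexes = {}  # attribute -> {value: set of paths}

    def add_index(self, attribute):
        """Build an index after the fact; returns how many files were scanned."""
        index = defaultdict(set)
        scanned = 0
        for path, attrs in self.files.items():
            scanned += 1
            if attribute in attrs:
                index[attrs[attribute]].add(path)
        self.indexes[attribute] = index
        return scanned

    def set_attr(self, path, attribute, value):
        """Write an attribute and keep any existing index in sync."""
        attrs = self.files.setdefault(path, {})
        index = self.indexes.get(attribute)
        if index is not None:
            if attribute in attrs:
                index[attrs[attribute]].discard(path)  # drop the stale entry
            index[value].add(path)
        attrs[attribute] = value

    def query(self, attribute, value):
        """Use the index if there is one, otherwise fall back to a full scan."""
        index = self.indexes.get(attribute)
        if index is not None:
            return set(index.get(value, ()))
        return {p for p, a in self.files.items() if a.get(attribute) == value}
```

In this single-threaded toy the index can never go stale; the synchronisation pain comes from doing the same bookkeeping on disk, across crashes and concurrent writers.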

I dunno, I’m not a FS expert, but I’ve read Dominic Giampaolo’s BFS book (and in fact, just spelled his name correctly by reading it off the spine of the book) and I’d recommend anyone interested in this takes a look at it. http://www.nobius.org/~dbg/practical-file-system-design.pdf

2010-11-05 2:38 pm

paul_m
Your experience was quite different from mine.

Whenever a file was saved in a directory, it automatically added the function of inheritance to the file, keying the additional information set for that directory.

Any files added could be searched using those keys.

If you were searching and added a file during the search, the search recognized that file almost immediately.

I did not test the endurance of the system as you may have, but found it very easy to use.

And as a plus, I don’t think I ever heard the hard drive make a noise. It was the quietest file system I’ve ever used.

My only regret was that the file system was not universalized and aggregated over a network of multiple machines during a search.

2010-11-05 11:04 am

Karitku
Old beards used to call it metadata. But I agree, the directory tree system should die already. Content Management Systems have been based on metadata for years, yet this model seems to be too difficult for most users.
2010-11-05 12:23 pm

userw014
I’ve not used BeOS or IBM’s OS/400 – although I have used other non-*nix/non-Windows systems with intriguing file systems.

I don’t think that something as nebulously defined as “tags” have been in this discussion is useful for servers or program/application internals. (Programs generally need to find things in deterministic ways.)

It seems to me that the idea behind “tags” is to be able to organize the same information in multiple different ways. Directories only allow you to organize the information one way, although you could get creative with symlinks (or even hard links) and multiple directory structures to satisfy your organizational needs.

In any event, the big problem is developing a human interface for this. I don’t think that the filesystem should be the interface – and I’m not sure how to reconcile the different “spaces” where tagging might be useful.

I would think that tags for music are likely to need to work differently from tagging for e-mail, human resources, or sales & marketing – but there might also be commonalities to these different activities.

From a programmer’s perspective, I’d rather like to see a “virtual tagging API” (like the virtual file system API) that applications that do tagging could adopt and THEN see whether there needs to be support in filesystems.

2010-11-05 2:36 pm

pica
Why is a tag non-deterministic? A unique tag (or call it an ID) is as deterministic as a unique path.

BTW, the IBM /400 or iSeries is a server system.

pica

2010-11-05 1:17 pm

Drunkula
I don’t know about you but I would not eliminate the hierarchical structure. I certainly wouldn’t want to manually tag thousands of files. Tagging could be an additional layer. But not a replacement IMHO.
2010-11-05 4:46 pm

sorpigal
Throwing away the hierarchy is impractical and not provably a good idea. Instead what we need to do is add tags to files and extend base tools to understand tags, then allow users to use them and indexers to index them and not try to force users to change behavior or force all apps to be rewritten around some kind of theoretically-great but unproven paradigm. If in 10 or 15 years such a change seems natural it can be made more easily then.

First things first: modern filesystems have support for arbitrary metadata. To the extent that they don’t we need to add such support. This support takes the form of ‘extended attributes’ on many filesystems. Begin by establishing an agreement among filesystems and tools on how to look for tags in extended attributes, a kind of xattr tag spec. Once we all agree on what they look like the rest is easier. Now you can add tags, just as you can with tracker or nepomuk, but you let the FS worry about storing the data, something it’s good at, and only index it if and when that makes sense. In addition you can index it multiple ways and not all apps have to agree on or know about the index. Once we have a spec and basic file utils don’t destroy the metadata, which is mostly true for xattrs in general now, start adding a simple UI for editing file tags to file save dialogs and filemanagers. Tagging should rarely be automatic, just as choosing the save file name is rarely automatic. Suggested, perhaps, but always chosen by the user in the end.
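For the sake of discussion, a minimal Python sketch of what such an agreement might boil down to. The attribute name `user.tags` and the comma-separated encoding are placeholders I made up, not an existing spec; `os.setxattr` and `os.getxattr` are the real Linux-only calls in Python's standard library.

```python
import os

TAG_XATTR = "user.tags"  # hypothetical attribute name, not an agreed standard


def encode_tags(tags):
    """Serialise tags as a sorted, comma-separated UTF-8 string."""
    cleaned = {t.strip() for t in tags if t.strip()}
    return ",".join(sorted(cleaned)).encode("utf-8")


def decode_tags(raw):
    """Inverse of encode_tags; an empty value means no tags."""
    return {t for t in raw.decode("utf-8").split(",") if t}


def write_tags(path, tags):
    """Store the tags on the file itself (Linux only, needs xattr support)."""
    os.setxattr(path, TAG_XATTR, encode_tags(tags))


def read_tags(path):
    """Read tags back; files without the attribute simply have none."""
    try:
        return decode_tags(os.getxattr(path, TAG_XATTR))
    except OSError:
        return set()
```

This toy encoding cannot represent a tag that contains a comma, which is exactly the kind of detail a real spec would need to settle. Any indexer can then be layered on top without the tagging tools knowing about it.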
2010-11-05 10:24 pm

cycoj
Someone above already mentioned FUSE. There’s actually already a number of tagging filesystems based on fuse, I can’t believe nobody has mentioned any yet.

http://www.tagsistant.net/

http://code.google.com/p/tagfilesystem/

http://pages.stern.nyu.edu/~marriaga/software/oyepa/
2010-11-06 5:38 pm

vivainio
Tracker / Nepomuk was already mentioned, but just by one guy. That, if anything, is the way tagging will emerge in Linux desktop.

Applications need to start supporting those technologies sooner rather than later. This just needs to be championed by distro.
2010-11-06 5:53 pm

Verenkeitin
Just look at id3 tags in any collection of mp3 files you haven’t personally spent countless hours (anally) tagging.

What you find is: Missing tags, wrong tags, semantically identical tags that are too different to process without resorting to fuzzy logic (e.g., artist name; first-last, or last-first) etc..

For another example, take a look at the metadata embedded in any document you happen to have lying around. It will be full of garbage.

The unwashed masses are not able to give a single consistent and meaningful name to any of their files (not to mention hierarchical organisation). There is no way they would be any better in using anything as complex as tagging. Even if you are not one of these dolts, I guarantee you are too lazy to manage your tags and would end up with a colossal mess.

As an engineer with usability background I am glad there is no chance of tagging ever becoming more than a sideshow in file organisation.
2010-11-06 9:15 pm

KClowers
>Could Linux Abandon Directories in Favour of Tagging?

no
2010-11-07 11:34 am

Icaria
It’s called ln -s
2010-11-08 8:39 am

deathshadow
or not… since I can remember having said functionality some thirty years ago on my Trash-80 Model 1… and under windblows I still have said functionality.

Excuse me as I commit a blasphemy, but they are called file extensions.

Literally, all you’re talking about is adding a field to a file to say what it is. Well guess what… That’s what a file extension does. You know, those things that Posix based OS don’t natively support (even if the most useful software for *nix OS, Apache DOES to assign mime-types, at which point why the **** do we need mime-types again?) and for a decade OS makers have gone out of their way to try and hide. (inviting those oh so wonderful .jpg.vbs files in the door)

The only ‘improvement’ could be allowing more than one of them per file… and if you’re going to allow more than one of them per file then that’s no different than any other filesystem metadata, which the majority of people cannot be bothered to do anything with.

Take images for example, in most cases you’re lucky to get anything more meaningful in a filename than SCC0168275.jpg and if exif data is present it’s because the camera set it automatically (so the internal date is usually sometime in January 1980)…

Same for word documents where if you’re lucky the writer may have filled out “noydb” as the name and “eff yew” as the organization. I still remember about ten years ago a secretary sitting down to use a fresh Word install, and yelling out in the office “Lands sake what I need to fill out all this crap for? Can’t I just type a letter?!?”

Metadata – bah… If people can’t be bothered to organize their files into directories (a simple matter of wildcard copy or drag and drop in any OS), what would make anyone actually think they’d take the time to fill out all that extra stuff either?

But what do I know? I’m the nutjob who turns off all that database driven ‘search indexing’ crap as annoyingly slow and forces the OS to show me file extensions, and ALWAYS wants to see ‘detail’ view with an actual filesystem TREE on the right.

Spatial navigation, plasmoids (or whatever the hell KDE calls them), icon/thumbnail views, “search indexing” that slow the computer down worse than win 3.1 on a 286. Useless **** garbage.

But again there’s a reason I consider Win98 to be the pinnacle of UI design and everything since to be the slow slide down to oblivion; Though at least Windows still lets me set things back to being useful.

Of course this ‘amazing new idea’ is talked about in *nix terms, where of course 1980’s technology is considered innovative. As I’ve said many times, chalk it up to the back-room Unix server geeks being left out of the REAL computer revolution of the 80’s and early 90’s.
2010-11-08 4:29 pm

richip
I certainly hope so! I was thinking this was next to impossible as well, but given that there’s a forum to discuss the possibility, I’m starting to look hopeful again.

Just be sure that the tags themselves can be made to be hierarchical (e.g. tag: work -> process date -> 2010-10-10, etc.)
2010-11-08 4:50 pm

labeltop
Hi, until the kernel filesystem has support for that, you can use my software for free: LabelTop v2

http://labeltopv2.appspot.com/

Hasan
2010-11-08 6:06 pm

richip
After giving it some thought, I’m starting to wonder if attributes might not be better to associate with file objects rather than tags. Attributes give tags some context. Attributes are key-value pairs, where the value could be some hierarchical tag (e.g. recipe -> soup -> clear). Forget filenames, but do have a way to create a filename for when transferring to legacy systems.
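A quick Python sketch of what I have in mind, with hierarchical values stored as tuples so that a search can match on any prefix. The key name `category` and both helper functions are made up for illustration.

```python
def parse_value(text):
    """Turn 'recipe -> soup -> clear' into ('recipe', 'soup', 'clear')."""
    return tuple(part.strip() for part in text.split("->"))


def matches(attributes, key, prefix):
    """True if the attribute exists and its value starts with the given prefix."""
    value = attributes.get(key)
    return value is not None and value[:len(prefix)] == tuple(prefix)
```

So a file with `{"category": parse_value("recipe -> soup -> clear")}` would match a search for `("recipe", "soup")` but not one for `("recipe", "stew")`.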