Linked by Thom Holwerda on Mon 30th Jan 2012 20:39 UTC
General Unix Finally something really interesting to talk about. If you've used UNIX or any of its derivatives, you've probably wondered why there's /bin, /sbin, /usr/bin, /usr/sbin in the file system. You may even have a rationalisation for the existence of each and every one of these directories. The thing is, though - all these rationalisations were thought up after these directories were created. As it turns out, the real reasoning is pretty damn straightforward.
RE[2]: We are stuck in the past.
by axilmar on Wed 1st Feb 2012 19:21 UTC in reply to "RE: We are stuck in the past."
axilmar Member since:
2006-03-20

Databases are slow.


They are not.

And a filesystem is a database


It's not. Files are unstructured binary blobs. There is no way to query what's inside them.

optimized for data access in large units, many GBs in size.


Databases can handle TBs of data, not only GBs.

Databases are optimized for data access in tiny units, such as strings, or single numbers. They aren't good at huge units in hundreds of Mbytes.


They are.

Reply Parent Score: 2

jabjoe Member since:
2009-05-06

It's not. Files are unstructured binary blobs. There is no way to query what's inside them.


What info exactly do you expect to be able to query from a jpeg or mp3? Only metadata will have anything meaningful for a query, and yes, that sort of data makes sense in a database and often is put there. But it's also embedded in the file, as it's tiny and that ensures it stays with the file.

You could view the filesystem as a database where the path system is the primary index. Tools such as find and grep can be used to query it. OK, it's the command line, not SQL, but I won't be surprised if someone has written something to do it with SQL. Indexing scales better than a sequential scan, and there are tools that do exactly that for your files. But they are all in userland. The kernel need only provide the basics: the primary key and the data.
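For what it's worth, here is a minimal sketch of the "do it with SQL" idea, assuming Python's standard os and sqlite3 modules. The table layout and the function names index_tree and find_by_name are made up for illustration; this is not an existing tool, just a userland index with the path as primary key.

```python
import os
import sqlite3


def index_tree(root, conn):
    """Walk `root` and record one row per non-directory entry, keyed by path."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    with conn:  # one transaction for the whole walk
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    st = os.stat(full)
                except OSError:
                    continue  # entry vanished or is unreadable; skip it
                conn.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                    (full, name, st.st_size, st.st_mtime),
                )


def find_by_name(conn, name):
    """Return every indexed path whose file name matches exactly."""
    rows = conn.execute(
        "SELECT path FROM files WHERE name = ? ORDER BY path", (name,)
    )
    return [row[0] for row in rows]
```

Usage would be something like opening a connection with sqlite3.connect(":memory:"), calling index_tree on a directory, and then running find_by_name or any other SELECT against the files table. Only what stat() reports is indexed here; anything inside the file (author, title and so on) would still need format-specific parsing.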


They are.

It's the wrong tool for the job. You don't store that in a database, you store it in a file on a filestore. Many databases just aren't designed for storing GBs in a column entry for a row. It's just not what they are for.

Reply Parent Score: 2

axilmar Member since:
2006-03-20

What info exactly do you expect to be able to query from a jpeg or mp3?


Size. Author. Title. Date. Compression rate. Encoding rate. Decoding rate. Etc. There are many other attributes to query.

and yes, that sort of data makes sense in a database and often is put there.


But it is put there by specialized software. Querying for metadata is not a standard feature of most filesystems the way, let's say, the POSIX file interface is.

You could view the filesystem as a database where the path system is the primary index.


But it is not relational.

Tools such as find and grep can be used to query it.


These tools fail to return structured data, especially from non-text formats.

The kernel need only provide the basics, the primary key and the data.


I never said anything about kernels.

It's the wrong tool for the job.


Nope, it's the right tool for the job. The various development problems we are having today are, to a large degree, due to the lack of databases.

Many databases just aren't designed for storing GBs in a column entry for a row. It's just not what they are for.


These GBs that you speak of would be broken down into their individual parts if stored in a database. They would be indexable, queryable and discoverable by any program, they would support transactions, and they would allow programs to be notified of changes in the data store. All these capabilities are absent, more or less, from today's data storage systems.
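As a rough sketch of the "broken down into parts, with transactions" idea, assuming SQLite through Python's standard sqlite3 module: fixed-size chunks stand in here for whatever individual parts a real format would have, and the table layout and the names store and read_range are hypothetical, chosen only for this example.

```python
import sqlite3

CHUNK = 4  # tiny on purpose so the example is easy to follow; real use would be far larger


def store(conn, name, data, chunk=CHUNK):
    """Store `data` under `name` as numbered chunks, all in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "name TEXT, seq INTEGER, data BLOB, PRIMARY KEY (name, seq))"
    )
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM chunks WHERE name = ?", (name,))
        for seq, off in enumerate(range(0, len(data), chunk)):
            conn.execute(
                "INSERT INTO chunks VALUES (?, ?, ?)",
                (name, seq, data[off:off + chunk]),
            )


def read_range(conn, name, start, length, chunk=CHUNK):
    """Read `length` bytes from offset `start`, touching only the chunks needed."""
    if length <= 0:
        return b""
    first = start // chunk
    last = (start + length - 1) // chunk
    rows = conn.execute(
        "SELECT data FROM chunks WHERE name = ? AND seq BETWEEN ? AND ? "
        "ORDER BY seq",
        (name, first, last),
    )
    joined = b"".join(bytes(row[0]) for row in rows)
    skip = start - first * chunk  # offset of `start` inside the first chunk
    return joined[skip:skip + length]
```

The primary key on (name, seq) is what makes the range read an index lookup rather than a scan of the whole object. Change notification is not shown; that part would need something outside plain SQL.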

Reply Parent Score: 2

JPollard Member since:
2011-12-31

I've seen lots of databases, from Oracle, MySQL, MS, Sybase...

None of them are anywhere near as fast as a filesystem.

Try searching blobs for information... VERY slow.

Try locating a blob given just a short name... nope. not gonna find it.

Try searching for all files of that name... Fairly quick at that... depending on how many indexes it has to go through.

Try maintaining metadata (ACLs, permissions, ownerships...): possible, but try searching it... REALLY slow.

How long does it take to recover? Databases have to replay their journals, which can take hours for a database of a couple of GB, especially if it is updated continuously.

Databases have their place. They are very good at non-structured small units of data. Relational databases suck at structured data though - they have to constantly rebuild the structure. SLOW.

Filesystems have been tried in databases (look at sqlfs for one). They can work. But they are really slow.

Reply Parent Score: 1