This article explains how WinFS defines the types of objects that can be stored, and examines the WinFS API. By storing data in a cohesive data store rather than just a file system, WinFS lets systems and users keep rich metadata about a myriad of objects.
A Developer’s Perspective on WinFS
2004-03-30 Windows 29 Comments
Maybe I just ‘don’t get it’, but all I see is extra info stored about a file, on NTFS. So in other words, a new, slower layer on top of NTFS... How is this moving forward? Guess that will make us have to get the newest Intel gizmo.
Last I heard, that is exactly what WinFS is: NTFS with some type of SQL-style database for the journaling. It seems like a bad idea, as NTFS isn’t the greatest performer on its own and adding another layer would only make things worse, but I’m sure they have done a little work under the API to make it better; they certainly have the time to do it. On the plus side, perhaps the Linux kernel developers won’t have to start from scratch when they try to reverse engineer this one.
It doesn’t seem like adding another layer on top of an already fragmented file system is the smartest thing in the world to do. So either we’re missing something or MS is really dumb.
I think that you are just missing something.
At least they are going in sort of the right direction. Files as plain character streams are, to put it bluntly, crap. Fine in the early days of computing, a bit inadequate today. That’s why there are millions of incompatible file formats: the data is open to interpretation instead of providing some guidance as to what each byte means. XML is a non-solution to this problem simply because it is *SO* unwieldy and inefficient in speed *and* size. But it’s the same idea.
I guess building on top of NTFS saves them the time of rewriting block allocation algorithms and such. It also saves their business customers the hassle of converting filesystems.
I’m afraid he’s not missing anything. NTFS with WinFS over it is going to be fast to search through and NTFS-slow overall. Not to mention wasteful of resources, which, I admit, are more plentiful on modern hardware; but I will happily counter that hard drives are now the biggest bottleneck in computing I/O.
NTFS itself is now an old, relatively decent but by no means groundbreaking file system, nor is it particularly fast or fragmentation-proof. Meanwhile, a number of considerably better file systems are available to other operating systems. Reiser4 is moving toward a release any day now, with features, speed and reliability (atomicity!) that Microsoft can only dream of… and yet Microsoft is acting to make NTFS even slower, less reliable and more resource-intensive!
But I digress.
XML is a plain, simple format. Combined with a good implementation like libxml2, it can be really fast. However, that has nothing to do with WinFS.
WinFS is a layer above NTFS powered by an MS SQL database, which means that you will get SQL queries over filesystems. Sometimes this can be useful, but generally NTFS is not as efficient as a filesystem can be, and adding a layer has made it slow, as seen in the builds. Still, it’s too early to say how good it will be.
While I don’t have hard facts, I doubt the version of NTFS that WinFS is layered upon is the vanilla version used today in Windows XP and later. I’m pretty sure NTFS will be extensively modified and reengineered to provide WinFS’ functionality.
Remember that WinFS won’t index all the hard disks and files by default. When installed, it will index My Documents and its sub-folders, since that’s where the documents are (or should be). Of course, additional WinFS-stores can be added, if the files are in another partition/folder.
There wouldn’t be much point to index and add metadata to system files as it would only make the system slower and not much useful information is stored there anyway.
People still have to fit their data into predefined structures defined by the filesystem (instead of directories, they now use objects). What is so exciting about it?
WinFS seems to be a kludge. By the time it is released in real-life products, we can expect the Linux world to have built similar things on top of Reiser4.
> XML is a non-solution to this problem simply
> because it is *SO* unwieldy and inefficient
> in speed *and* size.
Too bad the computer industry tends to forget good approaches if they only happened a bit too early. IFF (Interchange File Format) had some nice approaches and is still at the core of several formats used today (AIFF, for example; PNG also borrowed from the concept). It would need some touching up to be fit for 64-bit data and streams, but it would solve many problems for which many parties are trying to reinvent the wheel.
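To make the IFF idea concrete, here is a minimal Python sketch of reading IFF-style data. It assumes the classic EA IFF 85 layout: a 4-byte ASCII chunk ID, a 4-byte big-endian length, the payload, and a pad byte after odd-length payloads, with a FORM group whose payload starts with a 4-byte form type. The function names are made up for illustration, and LIST, CAT and PROP groups are not handled. The point is that every chunk says what it is and how long it is, so a reader can skip IDs it doesn't understand instead of guessing what each byte means.

```python
import struct
from typing import Iterator, List, Optional, Tuple


def iter_chunks(data: bytes, offset: int = 0,
                end: Optional[int] = None) -> Iterator[Tuple[str, bytes]]:
    """Yield (chunk_id, payload) pairs from an IFF-style byte buffer.

    Each chunk is a 4-byte ASCII ID, a 4-byte big-endian length, the payload,
    and one pad byte when the length is odd (the pad is not counted in the length).
    """
    end = len(data) if end is None else end
    while offset + 8 <= end:
        raw_id, length = struct.unpack_from(">4sI", data, offset)
        start = offset + 8
        if start + length > end:
            raise ValueError(f"truncated chunk {raw_id!r}")
        yield raw_id.decode("ascii"), data[start:start + length]
        offset = start + length + (length & 1)  # skip the pad byte, if any


def make_chunk(chunk_id: str, payload: bytes) -> bytes:
    """Serialize one chunk, adding a pad byte for odd-length payloads."""
    pad = b"\x00" if len(payload) % 2 else b""
    return struct.pack(">4sI", chunk_id.encode("ascii"), len(payload)) + payload + pad


def read_form(data: bytes) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Parse a top-level FORM group into (form_type, [(chunk_id, payload), ...])."""
    chunks = list(iter_chunks(data))
    if not chunks or chunks[0][0] != "FORM":
        raise ValueError("not an IFF FORM")
    body = chunks[0][1]
    return body[:4].decode("ascii"), list(iter_chunks(body, 4))
```

For example, `read_form(make_chunk("FORM", b"TEST" + make_chunk("NAME", b"abc")))` returns `("TEST", [("NAME", b"abc")])`.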
But then again, IFF is an open standard, and we wouldn’t want that anyway, would we? :-)
Looks like WinFS is suffering from the notorious Second System Syndrome: when a (large) company needs to roll out a successor to a successful, relatively simple product, it often tends to get a bit carried away and overengineer the whole thing to hell.
The MS logic behind WinFS is the following: we have a good FS, we have a good RDBMS, why not just put them together to implement one of those “revolutions” again (== recycled ideas on steroids)? If the performance price isn’t too bad and there is no real alternative, the strategy will work well, but I doubt many users care that much about all this pseudo-math relations-items-nested-types kind of stuff. Many will continue to pile files in a limited number of places, and if it works for them, why not?
The advantages of this system can far outweigh the performance hit if you’re in need of such a system. For example, say your file server is set up with this system, and set up correctly: you can search for all doc files made between 12/03 and 12/04 that have to do with finances, and it will show every file you need in a matter of seconds, no matter how many files there are. Yes, you can set up a DB to do this for you, but it would require each file to be entered into the DB, whereas WinFS would store the data automagically (assuming you used an app that supported WinFS when making the file). Now, if you have no need for such a system, then yes, having WinFS running would just slow you down.
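The kind of search described above can be sketched with Python's built-in sqlite3 module. This is an analogy, not the WinFS API: the `items` table and its `kind`, `created` and `topic` columns are hypothetical stand-ins for file metadata, and "12/03 to 12/04" is read as December 2003 through December 2004. Dates are stored as ISO strings so that BETWEEN compares them correctly.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str, str]  # (path, kind, created ISO date, topic)


def build_index(rows: Iterable[Row]) -> sqlite3.Connection:
    """Create an in-memory metadata table (hypothetical schema) and load rows into it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (path TEXT PRIMARY KEY, kind TEXT, created TEXT, topic TEXT)")
    db.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    # An index lets the query avoid scanning every row as the number of files grows.
    db.execute("CREATE INDEX items_by_kind_created ON items (kind, created)")
    return db


def find_docs(db: sqlite3.Connection, topic: str, start: str, end: str) -> List[str]:
    """Return paths of 'doc' items about `topic` created between start and end (inclusive)."""
    cur = db.execute(
        "SELECT path FROM items "
        "WHERE kind = 'doc' AND topic = ? AND created BETWEEN ? AND ? "
        "ORDER BY created",
        (topic, start, end),
    )
    return [path for (path,) in cur]
```

The metadata rows still have to come from somewhere; the comment's point is that WinFS-aware applications would supply them automatically instead of someone entering each file by hand.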
I would love to implement database features like live queries for Linux. The biggest problem is that there is no efficient way to find out when a file has changed. I am currently trying to get something included in the kernel.
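Without help from the kernel, a user-space "live query" has to fall back to polling, roughly as in the sketch below (plain Python; the function names are illustrative). Every check re-stats every file under the root, so the cost grows with the size of the tree rather than with the number of changes, which is the inefficiency the comment describes. Comparing (mtime, size) pairs is also only a heuristic: a change that preserves both is missed.

```python
import os
from typing import Dict, Set, Tuple

Snapshot = Dict[str, Tuple[float, int]]  # path -> (mtime, size)


def snapshot(root: str) -> Snapshot:
    """Walk the whole tree and record (mtime, size) for every file."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            snap[path] = (st.st_mtime, st.st_size)
    return snap


def diff(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare two snapshots and return (added, removed, modified) paths."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return set(added), set(removed), modified
```

A polling live query would call `snapshot()` on a timer and re-run its query whenever `diff()` reports anything.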
Reiser4 would be a much better basis for database features than NTFS, since it is faster, more efficient for small files and allows transactions. You could do for the whole file what WinFS does for the metadata.
What some people don’t understand is that unless it’s a plain File item (for instance a big uncompressed image, or a PCM file), it will go straight as a BLOB into the database part of WinFS, which would be one growing data file with its associated log file per partition. Creating a myriad of contacts, emails and whatnot in WinFS does not create any separate file objects or fragmentation on the NTFS part. Data is supposed to be stored INTO WinFS. Its indexing functionality is just there for backwards compatibility, for applications that absolutely can’t work together with WinFS; it allows files to be accessed via UNC paths by legacy applications. And it’s not even certain that the indexing functionality will stay in the final release.
I was being unclear. The indexing functionality is currently there so you can keep your files outside WinFS but still perform WinFS-like searches on it.
I don’t think they are going to use BLOBs at all. All existing files are just stored as NTFS files but with their metadata indexed in the database; the only things that are going directly into the database are really small data items like contacts, calendar events, locations, groups etc. These are going to be part of the database schema, not stored as BLOBs.
At the moment, all the files copied into WinFS can be found in the hidden folder under System Volume Information\winfs b
As far as access speed goes, it shouldn’t be any different from how NTFS currently is: you query the database part first and are given a path to where the file is actually stored on the disc. It might take a fraction of a second to find the file in the database, but after that it loads the way it always has.
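The two-step access path described above can be sketched like this, assuming a hypothetical `items` table that maps an item ID to the file's on-disk path. sqlite3 merely stands in for the WinFS store; the real schema and API are not shown in this thread.

```python
import sqlite3
from typing import BinaryIO


def open_item(db: sqlite3.Connection, item_id: int) -> BinaryIO:
    """Step 1: ask the metadata store where the item lives. Step 2: open it as usual."""
    row = db.execute("SELECT path FROM items WHERE id = ?", (item_id,)).fetchone()
    if row is None:
        raise KeyError(item_id)
    # From here on it is ordinary file I/O; the database is no longer involved.
    return open(row[0], "rb")
```

Only the lookup adds latency; the read itself goes through the normal file system path.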
Again the plan is to only keep user created files and records in WinFS, all the program files, system files, paging file etc will work just the same as they have always done.
It probably will require lots of memory and more disc space, but the performance shouldn’t be so different.
At the moment copying files into WinFS is really slow, but I hope that improves a fair bit in later builds
Uhm… if it uses more disk space and more memory, it’s not going to perform as well. More head seeks and data transfer, more cache thrashing, more processor time, all will mean less performance and it could be considerable.
WinFS works like this:
For every installed schema, it will generate a table; it’s SQL Server, after all. There will be a column for every property specified in the schema, and each column uses the data type specified in the schema (the data types used in schema definitions are the same as in T-SQL. Strange, huh?). A contact schema that has varchar properties for Name, Forename, etc. will cause a table to be generated that contains varchar fields for each property in the schema. Storing a contact using that schema in WinFS will NOT generate an NTFS file. Never. Ever.
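The schema-to-table mapping can be illustrated with a small DDL generator. Everything here is an assumption for illustration: the schema format (a dict of property name to T-SQL type), the function names, and the extra ItemId key column are not the actual WinFS schema language or the DDL it produces.

```python
from typing import Dict


def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_table_sql(schema_name: str, properties: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement with one column per schema property.

    `properties` maps property names to T-SQL type names, e.g. "varchar(100)".
    The ItemId key column is a hypothetical addition for this sketch.
    """
    if not properties:
        raise ValueError("schema defines no properties")
    columns = ["[ItemId] uniqueidentifier PRIMARY KEY"]
    columns += [f"{quote_ident(name)} {sql_type}" for name, sql_type in properties.items()]
    return f"CREATE TABLE {quote_ident(schema_name)} (\n  " + ",\n  ".join(columns) + "\n)"
```

With `create_table_sql("Contact", {"Name": "varchar(100)", "Forename": "varchar(100)"})`, storing a contact becomes an INSERT into that one table rather than the creation of a new file.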
If your schema is supposed to accommodate binary data, you can specify a varbinary(max) field, which will cause the binary data to be stored in the database file (as a BLOB). Still, no NTFS file will be spawned for it. Ever. The drawback is that you don’t get random access to the data: if you query the field, you get the whole data back in one chunk. The same goes for storing it; you need to pass one big chunk. Not exactly efficient for big chunks of data, but fine for, e.g., up to 256 KB.
SQL Server Yukon introduces an optional attribute for varbinary(max) called FileStream. Defining that attribute on a varbinary(max) column WILL cause an NTFS file to be created. Only when the schema defines a varbinary(max) FileStream property will WinFS spawn NTFS file(s) to store the binary data of the RESPECTIVE properties; all other, non-FileStream properties will continue to be stored in the columns of the WinFS table attached to the schema.
The varbinary(max) FileStream columns allow WinFS to return you a .NET FileStream object, so you can perform random I/O on it. Also, as it stands currently, only standard schema definitions that require random I/O and/or large binary data to be stored will get these FileStream columns. WinFS items that store only a bit of info, or small-to-medium binary chunks that will be written once only (e.g. archiving a web page, contacts, emails, attachments, IM message history) will store their data inside the WinFS database file (including BLOBs if binary data is involved).
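The inline-versus-FileStream split described above can be modeled with a toy store: ordinary properties are kept in a SQLite row and come back as one whole value, while properties flagged as FileStream are written to a separate file and can be opened for seek-based random I/O. The class, its methods and the file-naming scheme are hypothetical; this only mirrors the behaviour described in the comments and is not how WinFS or SQL Server is implemented.

```python
import os
import sqlite3
from typing import BinaryIO, Optional, Tuple


class ItemStore:
    """Toy store: small properties live in SQLite rows, FileStream properties in files."""

    def __init__(self, blob_dir: str) -> None:
        self.blob_dir = blob_dir
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE props (item_id TEXT, name TEXT, value BLOB, file_path TEXT, "
            "PRIMARY KEY (item_id, name))"
        )

    def put(self, item_id: str, name: str, data: bytes, filestream: bool = False) -> None:
        if filestream:
            # Only FileStream-flagged properties get a file of their own.
            path = os.path.join(self.blob_dir, f"{item_id}.{name}")
            with open(path, "wb") as f:
                f.write(data)
            row = (item_id, name, None, path)
        else:
            row = (item_id, name, data, None)  # stored inside the database itself
        self.db.execute("INSERT OR REPLACE INTO props VALUES (?, ?, ?, ?)", row)

    def get(self, item_id: str, name: str) -> bytes:
        """Whole-value read: the entire value comes back in one chunk."""
        value, path = self._row(item_id, name)
        if path is None:
            return value
        with open(path, "rb") as f:
            return f.read()

    def open_stream(self, item_id: str, name: str) -> BinaryIO:
        """Random access is only offered for FileStream-backed properties."""
        _value, path = self._row(item_id, name)
        if path is None:
            raise TypeError(f"{name!r} is stored inline; no random access")
        return open(path, "r+b")

    def _row(self, item_id: str, name: str) -> Tuple[Optional[bytes], Optional[str]]:
        row = self.db.execute(
            "SELECT value, file_path FROM props WHERE item_id = ? AND name = ?",
            (item_id, name),
        ).fetchone()
        if row is None:
            raise KeyError((item_id, name))
        return row
```

A contact's Name would use `put(..., filestream=False)` and never touch the file system, while a large video property would use `filestream=True` and be read with `open_stream()` plus `seek()`.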
It can be guaranteed that this will not end document incompatibilities. MS loves them because it’s how they stay the dominant software company.
Is there a real need for this instant search of your hard drive on your own box? I know where my files are at. This seems to be made for someone external to snoop: i.e. a systems administrator or, dare I say it, hackers, worms and maybe even MS.
Or am I way off? It seems to me I would not want a super easy way for someone new on my box to find anything I had.
The Zope platform has a storage system that uses many of these concepts; it’s called ZODB (Zope Object Database).
I think Microsoft may be using third parties’ intellectual property one more time.
> Is there a real need for this instant search of your hard drive on your own box? I know where my files are at.
But I’d like to be able to find my stuff quickly. I have lots of data spread across lots of disks. But it’s also about dynamic folder structures and organization. Imagine Evolution’s virtual folders, except on your disk, using all possible metadata to organize itself.
I have this problem solved: I put my documents in a single backed-up store and label them well.
Over 26 comments, and BeFS still wasn’t mentioned …
I notice no one has commented on the API itself, or the functionality discussed in the article. (Not meaning to bait, but that’s getting sadly typical of OSNews.)
This actually looks like a very nice way for an app developer to handle file I/O (especially for tasks with heavy user-file interaction, like media manipulation); it seems much cleaner than the BeOS Storage Kit API, for example. (I suppose that is a design difference: WinFS seems to be a well-structured, top-down sort of affair, whereas the Storage Kit was built to expose the available functionality of a very fast, clever core.)
> Over 26 comments, and BeFS still wasn’t mentioned …
LiveQueries weren’t exactly that revolutionary, and the API was somewhat limited, as changes had to be monitored volume-wide. NT4 implemented two mechanisms for monitoring changes across a portion (or all) of a filesystem, namely ReadDirectoryChangesW() and FindFirstChangeNotification(), and Windows has since gained a much more powerful facility than LiveQueries: change journals, introduced with NTFS in Windows 2000 and available in XP and Windows Server 2003.
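What makes a change journal different from directory notifications is that changes are appended to a persistent, sequentially numbered log, and a reader simply resumes from the last update sequence number (USN) it processed. The toy model below illustrates only that idea; it is not the Win32 interface, which is accessed through DeviceIoControl with control codes such as FSCTL_READ_USN_JOURNAL. The class names, the consecutive numbering and the reason labels are made up for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChangeRecord:
    usn: int     # sequence number, strictly increasing
    path: str
    reason: str  # illustrative labels such as "create", "write", "delete"


class ChangeJournal:
    """Toy append-only change log; readers resume from the last USN they processed."""

    def __init__(self) -> None:
        self._records: List[ChangeRecord] = []

    def record(self, path: str, reason: str) -> int:
        usn = len(self._records) + 1  # USNs start at 1 and have no gaps in this toy
        self._records.append(ChangeRecord(usn, path, reason))
        return usn

    def read_since(self, last_usn: int) -> List[ChangeRecord]:
        """Return records newer than last_usn; cost depends on the number of new records."""
        # The record with USN k sits at index k - 1, so newer records start at index last_usn.
        return self._records[max(last_usn, 0):]
```

A consumer that was offline only needs to remember one number and can catch up without rescanning the volume.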
1) It is not intended for geeks who use find and grep every day. IF you mostly use ASCII files and IF you know how to use find and grep well, THEN WinFS is more or less useless. However, these people are a minority. Most users, especially in Winland, use binary files (Word docs, Excel sheets, images, …) and have never heard of find, grep or even UNIX. Most of them don’t even know how to properly organize files into directories. They just dump everything onto the desktop, and before you know it they have a huge mess and can’t find anything.
2) What is more expensive: another gig of RAM, another 100 gigs of disk space, or 10 hours of my time? In 99% of companies it is much cheaper to toss in another memory module or another disk than to pay a worker for 10 hours wasted looking for files and trying to reorganize them for the 1000th time. Hardware is cheap; people’s time is not.
Therefore, Microsoft has correctly concluded that at this point there is plenty of extra memory and disk space to handle the bit of overhead introduced by WinFS, while helping users find their files much faster, even if they dump all of their files into a single directory!!!!
In effect, WinFS introduces customizable metadata that is attached to files. This metadata can be used as an alternative access mechanism that is faster/easier/more intuitive/etc.
For example, I may have thousands of images on my disk and organizing them into a proper directory hierarchy can be very difficult because there are multiple criteria (time, work vs vacation, hobbies,…)
In WinFS you can attach a number of attributes (date/time, place, people, work or hobby, happy or sad, whatever else). Then you can seek files in many different ways, whatever fits a particular requirement. Today, if your query doesn’t fit the chosen directory hierarchy, you are out of luck.
In fact, people try to compensate by packing a ton of attributes into the file name, resulting in massive, ugly file names. WinFS allows you to have a simple file name and attach all the attributes separately.
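A minimal in-memory sketch of the idea: files keep short names, attributes are stored separately as key/value pairs, and any combination of attributes can be queried through set intersection instead of committing to one directory hierarchy. The class and attribute names are hypothetical, and this is not how WinFS stores its metadata.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class TagIndex:
    """Attach attribute=value pairs to files and look files up by any combination."""

    def __init__(self) -> None:
        self._by_attr: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def tag(self, path: str, **attrs: str) -> None:
        for key, value in attrs.items():
            self._by_attr[(key, value)].add(path)

    def find(self, **attrs: str) -> Set[str]:
        """Return files matching every given attribute (set intersection)."""
        # .get() avoids inserting empty entries for attributes nobody has used.
        sets = [self._by_attr.get((k, v), set()) for k, v in attrs.items()]
        if not sets:
            return set()
        return set.intersection(*sets)
```

The same photo can then be found by year, by trip or by person, e.g. `idx.find(year="2003", people="bob")`, without any of that being encoded in `img002.jpg`.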
In terms of performance overhead, we’ll have to wait and see. I can’t imagine that they’ll make it mandatory for every single file in every single app. They’ll probably have config options to enable/disable as appropriate. The extra work involved shouldn’t be huge. We are talking about one or two records containing a limited number of attributes per file. Big deal!
So the negative focus on performance overhead is far overblown. The fact is that for the majority of Windows users, the benefit will be far greater than the performance hit.