Linked by Thom Holwerda on Mon 2nd Nov 2009 23:20 UTC
Sun Solaris, OpenSolaris ZFS has received built-in deduplication. "Deduplication is the process of eliminating duplicate copies of data. Dedup is generally either file-level, block-level, or byte-level. Chunks of data - files, blocks, or byte ranges - are checksummed using some hash function that uniquely identifies data with very high probability. Chunks of data are remembered in a table of some sort that maps the data's checksum to its storage location and reference count. When you store another copy of existing data, instead of allocating new space on disk, the dedup code just increments the reference count on the existing data. When data is highly replicated, which is typical of backup servers, virtual machine images, and source code repositories, deduplication can reduce space consumption not just by percentages, but by multiples."
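For anyone who wants to see the mechanism from the quote in code form, here is a minimal sketch of a block-level dedup table. It is not how ZFS implements it. The names `DedupStore` and `Entry`, the in-memory list standing in for the disk, and the use of SHA-256 as the checksum are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    location: int  # index into the simulated disk (hypothetical stand-in for a block address)
    refcount: int  # how many logical writes point at this stored chunk


class DedupStore:
    """Toy dedup store: maps a chunk's checksum to its location and reference count."""

    def __init__(self):
        self.disk = []   # simulated storage, one slot per unique chunk
        self.table = {}  # checksum -> Entry

    def write(self, chunk: bytes) -> str:
        # Checksum the chunk; SHA-256 is an assumption for this sketch.
        key = hashlib.sha256(chunk).hexdigest()
        entry = self.table.get(key)
        if entry is not None:
            # Existing data: no new space allocated, just bump the reference count.
            entry.refcount += 1
        else:
            # New data: allocate a slot and remember where it lives.
            self.disk.append(chunk)
            self.table[key] = Entry(location=len(self.disk) - 1, refcount=1)
        return key

    def read(self, key: str) -> bytes:
        # Look up the storage location by checksum and return the stored chunk.
        return self.disk[self.table[key].location]


store = DedupStore()
k1 = store.write(b"intro credits")
k2 = store.write(b"intro credits")  # duplicate: only the refcount changes
k3 = store.write(b"episode body")
```

After the three writes the simulated disk holds two chunks, and the entry for the duplicated chunk has a reference count of 2.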
Permalink for comment 392473
RE: I skimmed the article...
by Laurence on Tue 3rd Nov 2009 10:53 UTC in reply to "I skimmed the article..."

...so I probably missed this but I assume this feature is meant for servers, hopefully with (is this irony?) data replication (mirror set, backups) - what happens if the disk sustains physical damage? That could affect (theoretically - unless I misunderstand the concept) a lot more data than it would have without the dedup feature?


This feature will be better suited to servers than home PCs, but that doesn't mean such a facility couldn't be useful for some home users:

HTPC / media servers: if you have lots of DVD rips of TV shows, then you could save several hundred MBs from the intro/outro credits being deduped alone.

Media professionals: granted, ZFS isn't coming to OS X now, but if you're a media professional (music, graphics, etc.) and want to keep backups of your projects, then you may well have several files with similar contents from as the art took shape (much like backed-up lines of code in a CVS repository).


That all said, I'd be a touch cautious about jumping in and dedup'ing your file system on consumer-grade hardware unless you were confident in that hardware, and I'd still recommend weekly scrubs to highlight data degradation before it rots your data completely.
