Linked by Thom Holwerda on Mon 2nd Nov 2009 23:20 UTC
Sun Solaris, OpenSolaris
ZFS has received built-in deduplication. "Deduplication is the process of eliminating duplicate copies of data. Dedup is generally either file-level, block-level, or byte-level. Chunks of data - files, blocks, or byte ranges - are checksummed using some hash function that uniquely identifies data with very high probability. Chunks of data are remembered in a table of some sort that maps the data's checksum to its storage location and reference count. When you store another copy of existing data, instead of allocating new space on disk, the dedup code just increments the reference count on the existing data. When data is highly replicated, which is typical of backup servers, virtual machine images, and source code repositories, deduplication can reduce space consumption not just by percentages, but by multiples."
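
The table described in the quote can be illustrated with a small sketch. This is a toy block-level model only, not the ZFS implementation: the `DedupStore` class, its `write` method, the in-memory list standing in for disk, and the choice of SHA-256 as the checksum are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy block-level dedup table (hypothetical, not ZFS code)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = []  # stands in for disk; the list index is the "storage location"
        self.table = {}   # checksum -> [storage location, reference count]

    def write(self, data):
        """Store data block by block and return the location of each block."""
        locations = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha256(block).digest()
            entry = self.table.get(key)
            if entry is None:
                # First copy of this block: allocate new space for it.
                self.blocks.append(block)
                entry = [len(self.blocks) - 1, 0]
                self.table[key] = entry
            # New or duplicate, the block gains one more reference.
            entry[1] += 1
            locations.append(entry[0])
        return locations


store = DedupStore(block_size=4)
first = store.write(b"AAAABBBBAAAA")
second = store.write(b"AAAABBBB")
```

Here five blocks are written in total, but only two are actually stored; the other three writes just bump a reference count.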
Permalink for comment 392470
Laurence
Member since:
2007-03-26

I'm aware that ZFS compression will be compatible with deduplication, but I wonder whether combining the two will generally save more space than simply using dedup.

The only way it would be worse is if compressing the larger blocks left fewer identical (smaller, compressed) blocks for dedup to find. This seems unlikely, although I haven't gone over the details of the compression algorithm used.
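
A rough way to check that intuition, as a sketch under stated assumptions: each block is compressed independently, the compressor is deterministic, and dedup compares the compressed blocks. Whether ZFS behaves exactly this way is not something this thread confirms. The `unique_blocks` helper and the sample data are made up for illustration, and Python's `zlib` stands in for the gzip option.

```python
import hashlib
import zlib


def unique_blocks(blocks):
    """Map checksum -> block, keeping one copy of each distinct block."""
    return {hashlib.sha256(b).digest(): b for b in blocks}


# Four blocks, three of them identical (made-up sample data).
block_a = b"int main(void) { return 0; }\n" * 100
block_b = b"#include <stdio.h>\n" * 100
blocks = [block_a, block_b, block_a, block_a]

# Dedup on the raw blocks versus dedup on independently compressed blocks.
raw_unique = unique_blocks(blocks)
compressed_unique = unique_blocks([zlib.compress(b) for b in blocks])

raw_bytes = sum(len(b) for b in raw_unique.values())
compressed_bytes = sum(len(b) for b in compressed_unique.values())
```

Under those assumptions identical blocks compress to identical output, so the number of unique blocks stays the same and each stored copy is smaller.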


You can choose which compression algorithm to use.
A quick look at Wikipedia lists LZJB and gzip as supported algorithms, but I'm pretty sure there's since been a third supported format.
