Linked by Thom Holwerda on Mon 2nd Nov 2009 23:20 UTC
Sun Solaris, OpenSolaris ZFS has received built-in deduplication. "Deduplication is the process of eliminating duplicate copies of data. Dedup is generally either file-level, block-level, or byte-level. Chunks of data - files, blocks, or byte ranges - are checksummed using some hash function that uniquely identifies data with very high probability. Chunks of data are remembered in a table of some sort that maps the data's checksum to its storage location and reference count. When you store another copy of existing data, instead of allocating new space on disk, the dedup code just increments the reference count on the existing data. When data is highly replicated, which is typical of backup servers, virtual machine images, and source code repositories, deduplication can reduce space consumption not just by percentages, but by multiples."
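To make the table described above concrete, here is a minimal sketch of block-level dedup in Python. It is a toy illustration only, not how ZFS implements it: the `DedupStore` class, the 4-byte `BLOCK_SIZE`, the in-memory "disk" list, and the choice of SHA-256 as the checksum are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size so the example is easy to follow


class DedupStore:
    """Toy dedup table mapping checksum -> [storage location, reference count]."""

    def __init__(self):
        self.table = {}   # checksum -> [index into self.blocks, refcount]
        self.blocks = []  # stands in for blocks allocated on disk

    def write(self, data: bytes):
        """Store data block by block; return the checksums that describe it."""
        checksums = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            if key in self.table:
                # Duplicate block: no new space, just bump the reference count.
                self.table[key][1] += 1
            else:
                # New block: allocate space and start counting references.
                self.blocks.append(block)
                self.table[key] = [len(self.blocks) - 1, 1]
            checksums.append(key)
        return checksums

    def read(self, checksums):
        """Reassemble data from its block checksums."""
        return b"".join(self.blocks[self.table[k][0]] for k in checksums)


store = DedupStore()
first = store.write(b"AAAABBBBAAAA")   # three logical blocks, two unique
second = store.write(b"AAAABBBB")      # two logical blocks, both already stored

logical_blocks = len(first) + len(second)
physical_blocks = len(store.blocks)
print(logical_blocks, physical_blocks)
```

Five logical blocks end up occupying two physical blocks, which is the "multiples, not percentages" effect the quote describes for highly replicated data.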
Thread beginning with comment 392782
RE[6]: I skimmed the article...
by Kebabbert on Wed 4th Nov 2009 12:02 UTC in reply to "RE[5]: I skimmed the article..."
Kebabbert Member since: 2007-07-27

ZFS dedupes at block level, not byte level.

For synchronous dedup, I think you can just copy your old ZFS data to a new ZFS filesystem that has dedup=on, and the data will be deduped as it is written. Or move your data off ZFS and then back onto a ZFS filesystem with dedup=on.

For asynchronous dedup, Jeff Bonwick (ZFS architect) says it is needed if you don't have enough CPU and RAM. Then you can dedupe at night, when no one is using the server. This functionality is needed for legacy hardware. But on today's modern hardware, CPU and RAM will be enough, and that will only get better in the future. Hence, asynchronous dedup is not important with modern hardware, and its role will diminish. Why focus on something legacy? ZFS is modern and state of the art. No need for async. ZFS can dedupe in real time; it does not require that many CPU cycles.

Edited 2009-11-04 12:03 UTC

Score: 2

Laurence Member since: 2007-03-26

"ZFS dedupes at block level, not byte level."

Sorry, you're right.

I thought the article stated that you could choose which of the three levels of dedup you wanted ZFS to perform, but it didn't. It was just explaining the theory of dedup and the levels that exist.

Score: 2