ZFS Gets Deduplication

Thom Holwerda 2009-11-02 Solaris 20 Comments

ZFS has received built-in deduplication. “Deduplication is the process of eliminating duplicate copies of data. Dedup is generally either file-level, block-level, or byte-level. Chunks of data – files, blocks, or byte ranges – are checksummed using some hash function that uniquely identifies data with very high probability. Chunks of data are remembered in a table of some sort that maps the data’s checksum to its storage location and reference count. When you store another copy of existing data, instead of allocating new space on disk, the dedup code just increments the reference count on the existing data. When data is highly replicated, which is typical of backup servers, virtual machine images, and source code repositories, deduplication can reduce space consumption not just by percentages, but by multiples.”
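The table described in the quote can be sketched in a few lines of Python. This is only a toy model of block-level dedup, not ZFS code: the `DedupStore` class, the `store_file` helper and the 4-byte block size are all hypothetical names and values chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, hypothetical block size so the example is easy to follow


class DedupStore:
    """Toy block store: checksum -> [storage location, reference count]."""

    def __init__(self):
        self.blocks = []  # simulated disk: one entry per unique block
        self.table = {}   # checksum -> [location, refcount]

    def write(self, block: bytes) -> int:
        digest = hashlib.sha256(block).digest()
        entry = self.table.get(digest)
        if entry is not None:
            entry[1] += 1  # duplicate: only bump the reference count
            return entry[0]
        self.blocks.append(block)  # new data: allocate space
        location = len(self.blocks) - 1
        self.table[digest] = [location, 1]
        return location


def store_file(store: DedupStore, data: bytes) -> list:
    """Split data into fixed-size blocks and return their storage locations."""
    return [store.write(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


store = DedupStore()
first = store_file(store, b"AAAABBBBAAAA")
second = store_file(store, b"AAAABBBBCCCC")
print(first, second, len(store.blocks))
```

Six blocks are written in total, but only three distinct ones are stored; the rest are reference-count increments on existing entries.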

20 Comments

2009-11-02 11:48 pm
diegocg
The link doesn’t work, it has extra characters.

2009-11-03 12:45 am
tobyv
Perhaps the link needs to be fscked?

2009-11-03 2:37 am
bbarker
I’m aware that ZFS compression will be compatible with deduplication, but I wonder if it will generally give improved storage capacity over simply using dedup.
The only way it would be worse is if compressing larger blocks gave fewer identical, compressed, smaller blocks. This seems unlikely, although I haven’t gone over the details of the compression algorithm used.

2009-11-03 10:36 am
Laurence
“I’m aware that ZFS compression will be compatible with deduplication, but I wonder if it will generally give improved storage capacity over simply using dedup.
The only way it would be worse is if compressing larger blocks gave fewer identical, compressed, smaller blocks. This seems unlikely, although I haven’t gone over the details of the compression algorithm used.”
You can choose the compression algorithm to use.
A quick look at Wikipedia lists LZJB and gzip as supported algorithms, but I’m pretty sure a third format has been added since.
2009-11-03 4:18 pm
MrVain
If you compress two identical files, you will have two compressed files. With dedup, you will only have one file. So compression and dedup are complementary and together allow for great storage savings. For safety, you use RAID functionality of some sort.
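A rough way to see why the two are complementary: compression is deterministic, so identical blocks stay identical after compression and can still be matched by checksum. The sketch below is a toy model under loud assumptions: `zlib` stands in for whatever algorithm the filesystem uses, `stored_size` is a hypothetical helper, and checksumming the compressed payload is a modelling choice rather than a statement about ZFS internals.

```python
import hashlib
import zlib


def stored_size(blocks, compress, dedup):
    """Bytes needed to store the blocks under the chosen options (toy model)."""
    seen = set()
    total = 0
    for block in blocks:
        payload = zlib.compress(block) if compress else block
        if dedup:
            digest = hashlib.sha256(payload).digest()
            if digest in seen:
                continue  # duplicate payload: nothing new to store
            seen.add(digest)
        total += len(payload)
    return total


block_a = b"the quick brown fox " * 200      # 4000 compressible bytes
block_b = b"jumps over the lazy dog " * 200  # 4800 compressible bytes
blocks = [block_a, block_b, block_a, block_a]

sizes = {(c, d): stored_size(blocks, c, d)
         for c in (False, True) for d in (False, True)}
for (c, d), size in sizes.items():
    print(f"compress={c!s:5} dedup={d!s:5} -> {size} bytes")
```

In this model, enabling both options never stores more than enabling either one alone, because each unique block is kept once and in compressed form.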

2009-11-03 6:13 am
Tuishimi
…so I probably missed this but I assume this feature is meant for servers, hopefully with (is this irony?) data replication (mirror set, backups) – what happens if the disk sustains physical damage? That could affect (theoretically – unless I misunderstand the concept) a lot more data than it would have without the dedup feature?

2009-11-03 6:35 am
bbarker
A good point, but I think most people using this in servers will have already taken appropriate (RAID, backup) precautions.
ZFS compression and dedup should be nice for home users of Solaris as well (of which I am one). Still have a ways to go before my workstation’s pool is full, even w/o dedup:
[brandon@barker]:[8]:[~]:$ zpool list
NAME      SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
rpool     149G   64.0G  85.0G  42%  ONLINE  -
storage   6.31T  863G   5.47T  13%  ONLINE  -
2009-11-03 8:07 am
c0t0d0s0
That’s easy: the redundancy you choose (RAID1, RAIDZ1, RAIDZ2 or RAIDZ3) takes care of ensuring the availability of your data. Deduplication takes care of storing a block only once, for example when you have a dozen similar VMware/VirtualBox images with Windows in them. By the way: dedup creates a second copy of a block when more than a hundred blocks (the default; it’s adjustable) refer to that single block because of dedup.
2009-11-03 10:53 am
Laurence
“…so I probably missed this but I assume this feature is meant for servers, hopefully with (is this irony?) data replication (mirror set, backups) – what happens if the disk sustains physical damage? That could affect (theoretically – unless I misunderstand the concept) a lot more data than it would have without the dedup feature?”
This feature will be better suited to servers than home PCs, but that doesn’t mean such a facility couldn’t be useful for some home users:
HTPC / media servers: if you have lots of DVD rips of TV shows, then you could save several hundred MBs with the intro/outro credits being deduped alone.
Media professionals: granted, ZFS isn’t coming to OS X now, but if you’re a media professional (music, graphics, etc.) and want to keep backups of your projects, then you may well have several files with similar contents as the art took shape (much like backed-up lines of code in a CVS repository).
That all said, I’d be a touch cautious about jumping in and dedup’ing your file system on consumer-grade hardware unless you were confident in your hardware, and I’d still recommend weekly scrubs to highlight data degradation before it rots your data completely.

2009-11-03 2:15 pm
Tuxie
“HTPC / media servers: if you have lots of DVD rips of TV shows, then you could save several hundred MBs with the intro/outro credits being deduped alone.”
Err, no. There is no way the intro/outro scenes are going to be byte-by-byte-identical in the encoded data for different episodes even if they look identical to the eye. Even if nothing else is, the timestamp metadata for each frame is going to differ.

2009-11-03 2:31 pm
Laurence
“Err, no. There is no way the intro/outro scenes are going to be byte-by-byte-identical in the encoded data for different episodes even if they look identical to the eye. Even if nothing else is, the timestamp metadata for each frame is going to differ.”
I guess that depends on the codec used.
I thought many MPEG codecs didn’t have a timestamp as such and used a form of encoding that allowed an MPEG file (be it a video container file or an MP3 audio file) to be chopped into parts at any random point, with each of the parts still playable individually (much like the myth about worms’ ability to be chopped up and each part staying alive).
Besides, your point is only valid for shows that have a pre-opening credits teaser rather than those (typically older) shows that always opened with music and credits.

2009-11-03 2:35 pm
Tuxie
Well, why don’t you just try it for yourself?
diff <(head -c 100000 file1.avi) <(head -c 100000 file2.avi)
This will compare the first 100000 bytes of file1.avi and file2.avi.
2009-11-04 8:39 am
Laurence
“Well, why don’t you just try it for yourself?
diff <(head -c 100000 file1.avi) <(head -c 100000 file2.avi)
This will compare the first 100000 bytes of file1.avi and file2.avi.”
Unfortunately I’m moving house in a couple of days so my DVDs are packed away – thus I can’t rip and diff. (Though if anyone else is able to perform this test then I’d love to see the results.)
…So I’m going to have to take your word on the differences being there.
However (and going back to deduping for a moment), if I understood the article properly, then the credits don’t have to be byte-for-byte exact, as the dedup looks at the bytes themselves rather than the whole MB block of bytes.
Thus there only have to be enough similar grouped bytes for a space saving to occur.
So unless MPEG compression uses some kind of random hash to encode its data, then surely the very fact that the A/V is the same (timestamp or not) must mean that there are SOME similar bytes that can be grouped and indexed?
2009-11-04 12:02 pm
MrVain
ZFS dedupes at block level, not byte level.
For synchronous, I think you can just move your old ZFS data to a new ZFS filesystem which has dedup=on, and then your data will be deduped. Or, move your data off ZFS and move back to a dedup=on ZFS filesystem.
For asynchronous, Jeff Bonwick (ZFS architect) says it is needed if you don’t have enough CPU and RAM. Then you can dedupe at night, when no one uses the server. This functionality is needed for legacy hardware. But on today’s modern hardware, CPU and RAM will be enough, and it will only get better in the future. Hence, asynchronous dedupe is not important with modern hardware. Its role will diminish. Why focus on something legacy? ZFS is modern and state of the art. No need for async. ZFS can dedupe in real time; it does not require that many CPU cycles.
2009-11-04 12:35 pm
Laurence
“ZFS dedupes at block level, not byte level.”
Sorry, you’re right.
I thought the article stated that you could choose which of the 3 levels of dedup you wanted ZFS to perform, but it didn’t. It was just detailing the theory of dedup and the levels that exist.
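The block-level point can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not a measurement of real video files: random bytes stand in for encoded credits, the 4096-byte block size is arbitrary, and `unique_blocks` and `shared_blocks` are hypothetical helpers. It counts how many fixed-size blocks two "files" have in common.

```python
import hashlib
import os


def unique_blocks(data, block_size):
    """Checksums of the fixed-size blocks that make up data."""
    return {hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)}


def shared_blocks(a, b, block_size=4096):
    """Number of distinct blocks that appear in both a and b."""
    return len(unique_blocks(a, block_size) & unique_blocks(b, block_size))


credits = os.urandom(64 * 4096)                   # stand-in for identical encoded credits
episode1 = credits + os.urandom(4096)             # credits, then unique content
episode2 = credits + os.urandom(4096)             # same credits at the same offset
episode3 = b"\x00" + credits + os.urandom(4096)   # same credits, shifted by one byte

aligned = shared_blocks(episode1, episode2)
shifted = shared_blocks(episode1, episode3)
print(aligned, shifted)
```

When the shared data sits at the same block offsets, every one of its blocks can be deduplicated. A single extra byte in front moves every block boundary, so fixed-size block matching no longer finds any of them, even though almost all the bytes are the same.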

2009-11-03 5:52 pm
Luminair
this is the biggest news all month so it should be page 1
truly a magnificent event has occurred which will forever change the course of human history
2009-11-03 6:42 pm
aaron
ZFS deduplication is synchronous….
What happens when you turn de-duplication on, for an existing ZFS pool?
I am unsure whether the existing data is de-duplicated or not.

2009-11-03 10:39 pm
dilidolo
You can back up and restore, or ZFS send and receive. I believe async is still in development.
We use NetApp and Data Domain; NetApp only has async and Data Domain only has sync. NetApp tried to buy Data Domain, but EMC ended up acquiring it, so we’ll see when NetApp gets both. Hopefully Sun will beat NetApp to having both first.
2009-11-04 12:18 pm
Beket_
I would *expect* it to be the same as with the compression= and checksum= options.
For example if you switch your checksumming algorithm from A to B, the old files are using A and new files B. Or, if you enable compression in a dataset that already has uncompressed files, they remain uncompressed. Only newly created files are affected.

2009-11-04 3:57 pm
MrVain
The standard checksum algorithm is SHA256. Incidentally, Niagara SPARC computes SHA256 in chip hardware, achieving 41GB/sec.
You can also choose to use fletcher4, which is very fast but not cryptographically strong, which means a collision, while still unlikely, is more probable than with SHA256.
With SHA256, the chance of two differing blocks colliding is 2^(-256), which is an extremely low probability: roughly 10^(-77).
But you can request that whenever two blocks have the same hash, ZFS compares them bit for bit. This makes dedupe totally safe against collisions.
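The arithmetic behind those figures can be checked directly. The sketch below assumes an idealised hash whose 256-bit outputs are uniformly distributed, and the pool size (2^35 blocks, about 4 PiB at 128 KiB per block) is a hypothetical figure picked only for illustration; `any_collision_bound` is a helper name introduced here.

```python
from fractions import Fraction
import math

HASH_BITS = 256

# Probability that two specific differing blocks share a checksum (ideal hash).
pair_probability = Fraction(1, 2 ** HASH_BITS)
log10_pair = -HASH_BITS * math.log10(2)


def any_collision_bound(num_blocks):
    """Union bound on the chance that any two of num_blocks blocks collide."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs * pair_probability


pool_blocks = 2 ** 35  # hypothetical pool: about 4 PiB of 128 KiB blocks
bound = any_collision_bound(pool_blocks)

print(f"one specific pair: about 10^{log10_pair:.1f}")
print(f"any pair among 2^35 blocks: below 10^{math.log10(bound):.1f}")
```

Even across every pair of blocks in such a pool, the bound stays dozens of orders of magnitude below anything observable, which is why the bit-for-bit comparison is optional rather than the default behaviour described above.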