diff <(head -c 100000 file1.avi) <(head -c 100000 file2.avi)
This will compare the first 100000 bytes of file1.avi and file2.avi.
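As far as I know, diff on binary input only reports that the two sides differ, not by how much. A rough sketch that counts the differing bytes instead, using cmp -l (which prints one line per differing byte). The function name count_diff_bytes is made up for illustration, and head -c is assumed to be available, as in the command above.

```shell
# Count how many of the first N bytes differ between two files.
# count_diff_bytes is a hypothetical helper name, not a standard tool.
# Usage: count_diff_bytes file1.avi file2.avi [bytes]
count_diff_bytes() {
    n=${3:-100000}                      # default: first 100000 bytes
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || return 1
    head -c "$n" "$1" > "$tmp_a"
    head -c "$n" "$2" > "$tmp_b"
    # cmp -l lists every differing byte, one per line; count the lines.
    # Bytes past the end of the shorter input are not counted.
    cmp -l "$tmp_a" "$tmp_b" 2>/dev/null | wc -l | tr -d ' '
    rm -f "$tmp_a" "$tmp_b"
}
```

A result of 0 means the compared prefixes are identical. Anything else is the number of byte positions that differ.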
Unfortunately I'm moving house in a couple of days, so my DVDs are packed away and I can't rip and diff. (If anyone else is able to perform this test, I'd love to see the results.)
So I'm going to have to take your word on the differences being there.
However (and going back to deduping for a moment), if I understood the article properly, then the credits don't have to be byte-for-byte exact, as the dedup looks at the bytes themselves rather than the whole MB block of bytes.
Thus there only has to be enough similar grouped bytes for a space saving to occur.
So unless MPEG compression uses some kind of random hash to encode its data, then surely the very fact that the A/V is the same (timestamp or not) must mean that there are SOME similar bytes that can be grouped and indexed?
ZFS dedupes at block level, not byte level.
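To get a feel for what block level means here, a rough sketch: cut the files into fixed-size blocks, hash each block, and compare the total block count with the number of distinct hashes. This is only an approximation and not how ZFS works internally. The function name block_dedup_stats is made up, sha256sum from GNU coreutils is assumed, and the block size is whatever you pass in (131072 would match the default 128K recordsize).

```shell
# Print "<total blocks> <unique blocks>" for the given files.
# block_dedup_stats is a hypothetical helper; it writes a temporary
# copy of the data, so it needs as much free space as the inputs.
# Usage: block_dedup_stats 131072 file1.avi file2.avi
block_dedup_stats() {
    bs=$1; shift
    dir=$(mktemp -d) || return 1
    i=0
    for f in "$@"; do
        i=$((i + 1))
        # Pieces are named f<i>_aaaaaa, f<i>_aaaaab, ...
        split -a 6 -b "$bs" "$f" "$dir/f${i}_"
    done
    total=$(ls "$dir" | wc -l | tr -d ' ')
    # Blocks with identical content produce identical hashes.
    unique=$(sha256sum "$dir"/* | cut -d' ' -f1 | sort -u | wc -l | tr -d ' ')
    rm -rf "$dir"
    echo "$total $unique"
}
```

If the two numbers are equal, no block is shared. Note that a single extra byte near the start of one file shifts every later block boundary, so otherwise identical content no longer lines up and nothing dedupes.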
For synchronous, I think you can just move your old ZFS data to a new ZFS filesystem which has dedup=on, and then your data will be deduped. Or, move your data off ZFS and move back to a dedup=on ZFS filesystem.
For asynchronous, Jeff Bonwick (ZFS architect) says it is only needed if you don't have enough CPU and RAM. Then you can dedupe at night, when no one is using the server. That functionality matters for legacy hardware, but today's CPUs and RAM are enough, and it will only get better in the future. Hence asynchronous dedupe is not important with modern hardware, and its role will diminish. Why focus on something legacy? ZFS is state of the art and can dedupe in real time without requiring that many CPU cycles, so there is no need for async.
Edited 2009-11-04 12:03 UTC





Well, why don't you just try it for yourself?
diff <(head -c 100000 file1.avi) <(head -c 100000 file2.avi)
This will compare the first 100000 bytes of file1.avi and file2.avi.