Of course it is bigger. It is pretty much always going to be bigger. But the point is that it is dramatically more compressible.
Executable binaries give something like zlib very little to work with. Zlib (and almost any other compression algorithm) does its magic by finding patterns in the data. Binaries have very little in the way of patterns. The assembly code of said binary, however, is chock-full of patterns.
I'm not saying that it would definitely make exe files smaller, in fact I doubt it would be worth the trouble - but it isn't outside the realm of possibility. The odds of it being a net gain increase with executable size - hence why I qualified that it would only be appropriate (possibly) for very large binaries.
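To make the "patterns" point concrete, here is a rough Python sketch using the standard zlib module. It only shows the two extremes: the repeated assembly-like lines are made up, and random bytes stand in for data with no patterns at all. A real executable sits somewhere in between, so this says nothing about how much a real exe would gain.

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

# Stand-in for pattern-rich text such as an assembly listing (made-up lines).
patterned = b"mov ax, bx\nadd bx, 0xF4D4\ncall some_function\n" * 500

# Stand-in for pattern-free data; random bytes are the worst case for zlib.
random_like = os.urandom(len(patterned))

print(f"patterned: {compressed_ratio(patterned):.3f}")
print(f"random:    {compressed_ratio(random_like):.3f}")
```

The patterned input shrinks to a tiny fraction of its size, while the random input does not shrink at all.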
Well, I just went and took a look at the source code:
http://src.chromium.org/viewvc/chrome/trunk/src/courgette/
Frankly, my idea that this might be good as a generic exe packer is probably not valid. I only briefly glanced over the code, and a lot of it is WAY over my head, but the secret sauce appears to be what Google calls the "adjustment" step, and that step only matters when producing a delta, which is of no real use to a packer.
The disassembler is designed to produce a representation of the data in a format that is optimal for their adjustment step. It isn't _really_ a disassembler - it's much more primitive than that, but it does in effect generate something akin to an instruction stream and a symbol table.
There are some pretty cool comments in there though that help explain things a bit. Neat stuff.
Close, but no cigar. A major problem is not patterns in the code, but rather function addresses. If you add even one byte to the beginning of a file, all addresses beyond that point are now shifted by one byte. Think of all the functions statically linked which are called by address. Every single call now has a new address. That's a lot of cruft to send over a wire. If you replace function addresses with function symbols, most of them are not going to change. Less info to send that way.
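A toy Python sketch of the address-shift argument above. This is not Courgette's code; the function names, sizes, and call list are all hypothetical, and the layout model (functions packed back to back, calls by absolute address) is a simplifying assumption.

```python
def call_targets(functions, base=0):
    """Lay functions out back to back and return {name: start address}."""
    addresses = {}
    offset = base
    for name, size in functions:
        addresses[name] = offset
        offset += size
    return addresses

# Hypothetical program: (function name, size in bytes), in link order.
old_layout = [("init", 40), ("parse", 120), ("render", 300), ("shutdown", 25)]
# New version: one byte added to the very first function.
new_layout = [("init", 41), ("parse", 120), ("render", 300), ("shutdown", 25)]

# Call sites in the program, identified by the function they call.
calls = ["parse", "render", "parse", "shutdown", "render", "init"]

old_addr = call_targets(old_layout)
new_addr = call_targets(new_layout)

# Raw form: each call stores the absolute address of its target.
raw_old = [old_addr[c] for c in calls]
raw_new = [new_addr[c] for c in calls]
raw_changed = sum(a != b for a, b in zip(raw_old, raw_new))

# Symbolic form: each call stores a symbol index, plus one table of addresses.
symbols = sorted(old_addr)
sym_old = [symbols.index(c) for c in calls]
sym_new = [symbols.index(c) for c in calls]
sym_changed = sum(a != b for a, b in zip(sym_old, sym_new))
table_changed = sum(old_addr[s] != new_addr[s] for s in symbols)

print(raw_changed, sym_changed, table_changed)
```

In this toy, five of the six raw call sites change, none of the symbolic call sites change, and the address table changes once per moved function. The number of changed raw calls grows with the number of call sites, while the table cost stays fixed per function.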






Not really, you can't work miracles.
mov ax,bx (72 bits)
add bx,0xF4D4 (104 bits)
(sorry, I have never actually written assembly before, so I know it is wrong) is bigger than the binary counterpart, let's say
F4 D3 9A 73 (32 bits)
F5 D4 9B 74 (32 bits)
so for the complete binary, the assembly code may be more compressible, but it is also slightly bigger. So in my opinion, you gain nothing; it is worse.
-I did not test-
Edited 2009-07-17 04:18 UTC