Courgette Significantly Reduces Chromium Update Sizes

Thom Holwerda 2009-07-16 Google 33 Comments

One of the more controversial features of Google’s Chrome/Chromium web browser is the way it handles updates. Contrary to other browsers, updates on Chrome are installed silently and automatically, without asking for the user’s consent. While convenient and effective, it was also a security risk and sometimes it slowed people’s machines down. Google now proposes a fix called Courgette.

Courgette is a new differential compression algorithm which significantly reduces the size of updates to the Chrome web browser. Instead of pushing out a full new browser binary during every update (10MB), Google was using the bsdiff algorithm to reduce the size of updates. Still, Google wanted smaller sizes to increase the bandwidth available for each user downloading a new update, as well as providing a better experience for people with lower connectivity.

So they came up with Courgette, which does a lot of magic stuff that I don’t understand very well. I do understand the end result: updates are now significantly smaller; 89% smaller than the bsdiff method used before. The sizes in this chart refer to the 190.1->190.4 update in Chrome’s developer channel (version 3).

“Courgette transforms the input into an alternate form where binary diffing is more effective, does the differential compression in the transformed space, and inverts the transform to get the patched output in the original format,” Google explains, “With careful choice of the alternate format we can get substantially smaller updates.”

About The Author

Thom Holwerda

Follow me on Mastodon @[email protected]

33 Comments

2009-07-16 10:05 pm

kragil
It basically looks at the machine code of the new and the old executable. Adjusts/aligns the matching parts and then does a diff.(Not 100% correct, but you get the idea)

Simple but really genius .. just like their page rank algorithm… I am still pissed that I didn’t come up with something that obvious .. damn .. I could really use all that money.(I would spent most of it on cars, women and drugs. The rest I would squander.)

Edited 2009-07-16 22:06 UTC

2009-07-16 10:17 pm

kaiwai
It basically looks at the machine code of the new and the old executable. Adjusts/aligns the matching parts and then does a diff.(Not 100% correct, but you get the idea)

Simple but really genius .. just like their page rank algorithm… I am still pissed that I didn’t come up with something that obvious .. damn .. I could really use all that money.(I would spent most of it on cars, women and drugs. The rest I would squander.)

From what I understand, this isn’t particularly new – Linux and Windows have delta updates which are binary diffs; I remember Microsoft before Microsoft developed a technology to reduce the size of updates; where a 1MB update might have a 500K file which is only around 30K difference between the old and new file.

2009-07-16 10:35 pm

galvanash
This is quite a bit beyond a simple binary diff. The approach currently taken (afaik) is just generating a simple diff and compressing it using zip/bzip/whatever.

This is doing what amounts to a full disassembly, complete with a symbol table, prior to compression.

In spirit, this is the same approach that the png format uses to get such high compression rates. Png uses a loss-less filtering pass that converts image data into a format that is more compressible by zlib. The filtering algorithms themselves are not compressing anything – they often make the data bigger, but they create a representation of the data that zlib can compress much more than the original data, resulting in a net gain. What Google has done is essentially the same thing, the disassembly step is used as a filtering pass to make the data more compressible.
2009-07-16 11:53 pm

StephenBeDoper
Exactly – isn’t that the way that that rsync has worked for years now?

2009-07-17 12:14 am

galvanash
Read the article again. Rsync does nothing even remotely like this.

2009-07-17 12:20 am

StephenBeDoper
Read the article again. Rsync does nothing even remotely like this.

Or, perhaps, you could provide some (any) reference to, or citation of, the portion of the article which makes that detail clear? Just a thought.
2009-07-17 12:32 am

galvanash
How do we use this to generate a better diff? With bsdiff we would compute the new file, ‘update’ from the ‘original’ like this:

server:

diff = bsdiff(original, update)

transmit diff

client:

receive diff

update = bspatch(original, diff)

(The server would pre-compute diff so that it could be transmitted immediately)

Courgette transforms the program into the primitive assembly language and does the diffing at the assembly level:

server:

asm_old = disassemble(original)

asm_new = disassemble(update)

asm_new_adjusted = adjust(asm_new, asm_old)

asm_diff = bsdiff(asm_old, asm_new_adjusted)

transmit asm_diff

client:

receive asm_diff

asm_old = disassemble(original)

asm_new_adjusted = bspatch(asm_old, asm_diff)

update = assemble(asm_new_adjusted)

rsync does essentially the same thing as bsdiff.

2009-07-17 12:25 am

flynn
rsync just compares chunks by their checksums and sends over the chunks that don’t match. When it comes to binary executables the number of differences is gigantic and spread all over the file, which will effectively make rsync transmit the whole file.

2009-07-17 3:39 am

n4cer
From what I understand, this isn’t particularly new – Linux and Windows have delta updates which are binary diffs; I remember Microsoft before Microsoft developed a technology to reduce the size of updates; where a 1MB update might have a 500K file which is only around 30K difference between the old and new file.

Here’s the relevant info from MS’ Binary Delta Compression reference:

The Delta Compression API offers special handling for certain types of executable files (PE files, such as EXE or DLL files.) In particular, executables designed to run on the 32-bit Intel i386 family get special treatment. When the basis and target files are similar executables, this feature can reduce the size of the delta significantlyâ€”typically 50-70% further.

To get the maximum benefit from this feature, symbol files from the LINK process can be supplied during delta creation. This will help the creation process recognize the nature of the changes between the files. Richer symbol info, such as private symbols, will help make smaller deltas. Figure 3 illustrates this process.

Figure 3. Delta compression using symbol files

None of the symbol data is passed in the delta, and the symbol files are not needed during the delta apply process.

See Optional Data for Delta Create for more information on how symbols can be provided.

http://msdn.microsoft.com/en-us/library/ms811406.aspx

2009-07-17 7:19 am

galvanash
Interesting. What they are doing sounds very similar, but they require the user to supply the symbol tables manually. Google’s approach doesn’t require anyone to supply the symbol tables, it creates them itself directly from the machine code (at least it creates something _like_ a symbol table and uses it for the same purpose).

I don’t know which approach is more effective, but if they are close then Google’s approach would be far more convenient.

2009-07-17 7:57 am

galvanash
Also, from my understanding of the MS documentation: the Delta Compression API when used without the user supplied symbol table is probably roughly equivalent to bsdiff. It may be slightly more effective because it understands PE format, but that is just a guess.I don’t see any other magic going on.

2009-07-16 10:37 pm

flynn
It basically looks at the machine code of the new and the old executable. Adjusts/aligns the matching parts and then does a diff.(Not 100% correct, but you get the idea)

Simple but really genius .. just like their page rank algorithm… I am still pissed that I didn’t come up with something that obvious .. damn .. I could really use all that money.(I would spent most of it on cars, women and drugs. The rest I would squander.)

No, what you described is what they were doing up until now. The problem is that diffs on compiled binaries end up being ineffective because the binary undergoes many more changes between versions then the corresponding source. Remember that programming languages are very often one-to-many mappings for machine instructions. Even if only one line of code changes in the source, many machine instructions will change in the corresponding binary. When you add optimization in to the mix it gets even more messy. As the code changes the available optimizations the compiler can do also change. Some optimizations are no longer possible, while others become available. This has a drastic effect on the resulting binary.

What Google has done is developed a reversible transformation from a regular binary to one that is more condusive to diff’ing. This was they diff the transformed form, send it over the network and then reverse the transformation.

Edited 2009-07-16 22:42 UTC

2009-07-16 10:59 pm

kragil
No, they didn’t do the disassemble/adjust part before. That is new.

2009-07-16 11:10 pm

flynn
This is new, but what you described in the initial post wasn’t. What you described is just a regular diff. Diff (binary or regular) already works by finding long common subsequences and ‘aligning’ them. But a regular diff doesn’t work on compiled binaries all that well because there are very few such long common subsequences.

2009-07-16 11:21 pm

kragil
I maybe didn’t do a good job, but a regular diff has absolutly no understanding what machine code is and how to align machine code. I just left out stuff laymen wouldn’t understand and I said it wasn’t 100% correct.

So stop telling me what I said, I know.
2009-07-16 11:35 pm

flynn
I maybe didn’t do a good job, but a regular diff has absolutly no understanding what machine code is and how to align machine code. I just left out stuff laymen wouldn’t understand and I said it wasn’t 100% correct.

So stop telling me what I said, I know.

Diff doesn’t need to understand machine code to do it’s job. It looks for long common subsequences and notes the differences of the rest. The fact that the sequence of 1’s and 0’s it is comparing corresponds to an add instruction or a mul instruction or a jmp or a call or whatever is absolutely inconsequential. It has no bearing on the algorithm. Just like a text file diff doesn’t care about the meaning of the words it’s compairing so does binary diff not care about which instructions it’s operating on. It just cares about finding the long common sequences of 1’s and 0’s, no matter what they represent.

The reason binary diff doesn’t work all that well isn’t because it doesn’t ‘understant’ machine code, it’s because changes between two compiled binaries are often so drastic the resulting diff ends ups being almost as large as the file itself, making it almost useless as a space saver.
2009-07-17 9:05 am

kragil
???????

That is what I am saying all along, that the new way analyses and adjusts (understands) machine code, but you obviously don’t understand a single word I am saying so I will stop this insanity now.

Bye

Edited 2009-07-17 09:12 UTC

2009-07-16 10:07 pm

obsidian
I’d love to see the detailed algorithm for this (which will hopefully be in the paper that they’re working on). That should be well worth waiting for!

Really good to see work like this being done – looks like a very clever and elegant solution to the “diffing problem”.

It’ll be interesting to see which license the Courgette code is releaed under (if it is released).

I’d love to see it as “public domain” as (say) the LZMA algorithm is (with the LZMA SDK).

Edited 2009-07-16 22:20 UTC
2009-07-16 10:24 pm

Ventajou
They would also save bandwidth if they only updated when the browser is actually running. Right now if you install Chrome and don’t use it for 6 months, you still have that Google service running in the background and downloading updates whenever it feels like it.

Also they might want to start installing Chrome in the right location on disk instead of in the user’s folder. It almost looks like an attempt to help enterprise users circumvent IT policies. Why can’t they just install in the Program Files folder like everyone else? Between that and the lack of standalone installer, I don’t see Chrome making much progress in the enterprise world.

2009-07-17 3:37 am

3rdalbum
If Chrome is inside Program Files, then it can only update when it is running as Administrator. That’s why it’s done that way.

2009-07-16 10:25 pm

galvanash
Wow. Whoever thought that up is pretty clever. I mean its not like this is something that will change the world or anything – just a very neat optimization.

I wonder how expensive this operation is at runtime. If it was fast enough, this might be a useful approach to building self-extracting compressed executables for languages that produce large, monolithic, statically linked executables (Delphi comes to mind as a prime example of this)… Probably not worth the trouble for Delphi since it isn’t very heavily used anymore, but it might be applicable for the output of some other languages.

2009-07-16 10:40 pm

flynn
If it was fast enough, this might be a useful approach to building self-extracting compressed executables for languages that produce large, monolithic, statically linked executables (Delphi comes to mind as a prime example of this)… Probably not worth the trouble for Delphi since it isn’t very heavily used anymore, but it might be applicable for the output of some other languages.

*cough*Haskell*cough*

2009-07-16 10:52 pm

galvanash
Didn’t think of Haskell… Thats a VERY good example – the runtime is rather large and is pretty much always statically linked from what I understand (I’m not a user, but I’m somewhat familiar with it).

Of course the really big gains Google’s approach achieves are only apparent on diffs, for the reason that a small code change often creates a rather large change in the resulting binary. But I wouldn’t be surprised if a net gain is achievable even for the entire binary image – the disassembly is likely much more compressible than the binary image.

2009-07-17 4:17 am

Elv13
Not really, you cant make miracles

mov ax,bx (72 bits)

add bx,0xF4D4 (104 bits)

(sorry, I never actually wrote assembly before so I know it is wrong) is bigger than the binary counterpart, let say

F4 D3 9A 73 (64 bits)

F5 D4 9B 74 (64 bits)

so for the complete binary, the assembly code may be more compressible, but it is also slightly bigger. So in my opinion, you make no gain, it is worst.

-I did not test-

Edited 2009-07-17 04:18 UTC

2009-07-17 6:33 am

galvanash
Of course it is bigger. It is pretty much always going to be bigger. But the point is that it is dramatically more compressible.

Executable binaries give something like zlib very little to work with. Zlib (and almost any other compression algorithm) does its magic by finding patterns in the data. Binaries have very little in the way of patterns. The assembly code of said binary, however, is chock block full of patterns.

I’m not saying that it would definitely make exe files smaller, in fact I doubt it would be worth the trouble – but it isn’t outside the realm of possibility. The odds of it being a net gain increase with executable size – hence why I qualified that it would only be appropriate (possibly) for very large binaries.
2009-07-17 7:51 am

galvanash
Well, I just went take a look at the source code:

http://src.chromium.org/viewvc/chrome/trunk/src/courgette/

Frankly, my idea that this might be good as a generic exe packer are probably not valid. I only briefly glanced over the code, and frankly alot of it is WAY over my head, but the secret sauce appears to be what Google calls the “adjustment” step, and that step is only of any consequence for production of a delta – which is of no real consequence for a packer.

The disassembler is designed to produce a representation of the data in a format that is optimal for their adjustment step. It isn’t _really_ a disassembler – its much more primitive than that, but it does in effect generate something akin to an instruction stream and a symbol table.

There is some pretty cool comments in there though that help explain things a bit. Neat stuff.
2009-07-17 7:04 am

kjmph
Close, but no cigar. A major problem is not patterns in the code, but rather function addresses. If you add even one byte to the beginning of a file, all addresses beyond that point are now shifted by one byte. Think of all the functions statically linked which are called by address. Every single call now has a new address. That’s a lot of cruft to send over a wire. If you replace function addresses with function symbols, most of them are not going to change. Less info to send that way.

2009-07-16 10:53 pm

vijayd81
It may not be exact same technology but firefox has incremental updates. They are called “mar” files. I believe their auto-updates use that. Also, you can download them from their ftp site (in update folder) and install them manually (filenames end with partial.mar)

Mozilla has not advertised it like Google. I am surprised that people are saying Google invented this cool feature.

2009-07-16 11:24 pm

sbergman27
It may not be exact same technology but firefox has incremental updates. They are called “mar” files.

No. Actually, MAR files use a variation on the same “bsdiff” algorithm that Google has been using, and has found to be unsatisfactory. Google’s new approach looks to be 10x more effective. So no, Firefox doesn’t have anything even remotely like it.

I’ve been saying for ages that we needed competition in the OSS browser area, and that the Firefox monopoly was bad, bad bad. (Konqueror? Oh, please…) And I’m beginning to feel vindicated, thanks to Google and the Webkit guys.

Edited 2009-07-16 23:25 UTC

2009-07-17 11:30 am

Blomma
Right, i think i can keel over and die now. Firefox being called a monopoly was the last thing on my list of things i thought i’d never hear.

2009-07-17 8:45 am

Nagilum
Back in the DOS days there was a compressor (I think just called ABC, advanced binary coder or something) that allowed you to save the dictionary that is generated when compressing a file. You could then re-use that to compress another file. So if you wanted to do a patch you would save the dictionary from compressing the original then use it to compress the updated version. The resulting file would be very small. Then you transfer the patch, use the compressor to temporary compress the original on the client side. That gives you the dictionary which you can then use to decompress the patch.

To get an idea how much you save by this compress a file with something like lzma (or something else with a big dictionary), then make a small change to a copy of the file, concatenate the two files and compress the result. Then compare filesize between the compressed original file and the compressed concatenated file. That difference is the approximate size of such a patch. (it will in fact be a bit larger as the dictionary cannot be modified by the patch without increasing the patch size)

Personally I would prefer this approach be implemented with something like lzma. Specialized data handling modules like courgette could then supplement that approach.

2009-07-17 9:03 am

Nagilum
Just did a small test, looks like this doesn’t work with lzma, However I did find the compressor:

ftp://ftp.elf.stuba.sk/pub/pc/pack/acb_200c.zip

2009-07-17 4:04 pm

rockwell
need more sleep .. or beer.