Linked by Michael on Tue 29th Mar 2011 23:53 UTC
Benchmarks "Version 4.6 of GCC was released over the weekend with a multitude of improvements and version 2.9 of the Low-Level Virtual Machine is due out in early April with its share of improvements. How though do these two leading open-source compilers compare? In this article we are providing benchmarks of GCC 4.5.2, GCC 4.6.0, DragonEgg with LLVM 2.9, and Clang with LLVM 2.9 across five distinct AMD/Intel systems to see how the compiler performance compares."
Thread beginning with comment 468413

Note that -O2 is used in GCC because it often (usually?) produces faster code than -O3. For some discussion on the topic, see Gentoo's documentation page on optimization flags. Basically, it sounds like the GCC optimization levels are separated by the amount of work the compiler has to do to optimize the code. The extra work done by the -O3 optimizations tends to increase code size (and therefore hurt caching), so it often slows programs down.

That said, testing compilers at multiple optimization levels would likely be more informative about how good their optimizations actually are.

Edited 2011-03-30 20:34 UTC

Score: 1

Valhalla

Note that -O2 is used in GCC because it often (usually?) produces faster code than -O3.

Well, the fact that -O2 does beat -O3 sometimes is why I wrote *should*, but in my experience -O3 usually beats -O2 on both GCC and LLVM. That is as it should be, since -O0 means no optimization, -O1 applies light optimization, -Os favours code size over speed, -O2 tries to strike a balance between code size and speed, and -O3 opts for maximum speed at the cost of code size.

The reason -O2 sometimes beats -O3 is most likely flawed heuristics in some of the more advanced optimizations enabled by -O3, which result in cache misses, failed branch predictions and so on. Cache optimization is sensitive to the CPU platform settings, so using '-march=native' would be a good choice if you want the code to perform as well as possible on your machine.

It's interesting though that while I've found -O2 to beat -O3 on certain tests using GCC and LLVM, when I've tried Open64, -O3 has always performed much better than -O2. So in a -O2 test between GCC, LLVM and Open64, Open64 would likely be at a disadvantage. That is why I think it's apt to either go for the option that is *meant* to generate the fastest code (-O3), or benchmark compilers across several optimization levels.

Also note that what once was faster with -O2 may not be faster with the next iteration of that compiler, given that heuristics improve (sadly they also sometimes regress). This is a very difficult part of compiler technology, which is why optimizations such as PGO (profile guided optimization) are so effective. It is also why programs like the Linux kernel make use of C extensions like __builtin_expect and __builtin_prefetch to guide the compiler when optimizing for branch prediction and cache prefetching.

Score: 2