Linked by Michael on Tue 29th Mar 2011 23:53 UTC
Benchmarks "Version 4.6 of GCC was released over the weekend with a multitude of improvements and version 2.9 of the Low-Level Virtual Machine is due out in early April with its share of improvements. How though do these two leading open-source compilers compare? In this article we are providing benchmarks of GCC 4.5.2, GCC 4.6.0, DragonEgg with LLVM 2.9, and Clang with LLVM 2.9 across five distinct AMD/Intel systems to see how the compiler performance compares."
Valhalla

"What were the flags, data sets used, etc, etc, etc?"
While the data sets shouldn't matter much either way, I can only agree that the lack of information about the compiler flags makes this something of a black-box benchmark.

From what I've gathered, these tests are done with whatever flags were set upstream (whether those come from the original source packages or were chosen by a repository package maintainer, I have no idea), but those flags may very well specify a low optimization level and/or include debugging flags that hurt performance.

Other packages, like x264, enable tons of hand-written assembly for x86/x86-64 by default, which pretty much renders the tests worthless as a comparison between compilers, and as far as I know Phoronix's tests do not disable this assembly code when benchmarking.
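To make that point concrete, here's roughly how I'd check how much of x264's speed comes from the hand-written assembly rather than from the compiler. This is just a sketch: --disable-asm is the configure switch I believe x264 uses for turning the assembly off, and the input clip and encode options are placeholders.

import subprocess, time

# Build the same package twice from its source checkout: once as upstream ships it
# (hand-written asm enabled) and once with the asm paths turned off, so only the
# compiler-generated code is measured. "--disable-asm" is the switch I remember
# x264's configure having; adjust for whatever package is actually being tested.
BUILDS = {
    "default": [],
    "c_only":  ["--disable-asm"],
}

for name, extra_flags in BUILDS.items():
    subprocess.run(["./configure"] + extra_flags, check=True)
    subprocess.run(["make", "clean"], check=True)
    subprocess.run(["make"], check=True)

    # Placeholder encode run; a real test would use a fixed input clip and settings.
    start = time.perf_counter()
    subprocess.run(["./x264", "--output", "/dev/null", "input.y4m"], check=True)
    print(f"{name}: {time.perf_counter() - start:.1f}s")

Unless the asm-enabled and C-only numbers are reported separately, a "compiler benchmark" on such a package mostly measures the assembly, not the compiler.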

A proper test (imo) would be to compile all packages at at least -O3 (and possibly also at -O2) and compare the corresponding results.

As it is now, an upstream package may ship compiler settings that are either intentionally tailored to a specific compiler, or that by chance suit one compiler better than another, and that may not reflect how each compiler performs when told to generate its fastest code (usually -O3).

Not comparing the compilers at one specific optimization level (or at several specific levels, preferably including the highest if only one is used) means the test results may be a poor reflection of each compiler's actual capabilities.

I can see that Phoronix may shy away from testing a large set of optimization levels, but then they should at least settle on -O3, the level which, from the compiler's standpoint, *should* generate the fastest code. As it is now, the packages they benchmark may be built with -Os or -O2 for all we know, and there's really no fair way to compare 'middle' settings: how do you decide whether -O2 on one compiler corresponds to -O2 on another? Maybe -O2 is closer to -O3 in compiler A, while -O2 is closer to -O1 in compiler B. The only fair approach is to compare either several optimization levels against each other or only the highest.
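Just to illustrate the kind of apples-to-apples run I mean, here's a minimal sketch. The benchmark source file, binary names and single timing run are placeholders; a real test would use the actual package sources and repeat the runs.

import subprocess, time

# Hypothetical single-file benchmark source; any CPU-bound C program would do.
SOURCE = "benchmark.c"
COMPILERS = ["gcc", "clang"]
OPT_LEVELS = ["-O2", "-O3"]

for cc in COMPILERS:
    for opt in OPT_LEVELS:
        binary = f"./bench_{cc}{opt}"
        # Build with an explicit, identical optimization level for every compiler,
        # instead of whatever CFLAGS the upstream package happens to ship with.
        subprocess.run([cc, opt, SOURCE, "-o", binary], check=True)

        # Time the resulting binary; a real run would repeat this and take a median.
        start = time.perf_counter()
        subprocess.run([binary], check=True)
        print(f"{cc} {opt}: {time.perf_counter() - start:.3f}s")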

As it is now, I find Phoronix's test results interesting, but I take them with a large grain of salt. I will continue to rely on my own tests and on tests where all relevant data is presented for easy verification.

I do applaud Phoronix for doing these benchmarks; I just wish they weren't done in such a (again, imo) sloppy manner.
