“The latest gcc holds its own against Intel C++, winning some benchmarks it lost previously. There are still applications where Intel shines, but the differences between the compilers have narrowed. So which compiler is better? Like Einstein, I have to say the answer is relative.” Read the article here.
If you read the last comparison and follow the gcc mailing list, you will probably have noticed that gcc has made no speed improvements, or at least no significant ones, between versions 3.1 and 3.2. Just look at the comparison chart yourself: the only thing that changed from the last comparison is the additional column for gcc-3.2, and it almost never differs from that of gcc-3.1.
For that reason the author of the comparison didn’t make any changes to his conclusion. So the “benchmarks that it lost previously” are actually still just those benchmarks lost by gcc-3.0 and won by gcc-3.1 (and gcc-3.2).
I don’t think this is worth any announcement on osnews.com!
>I don’t think this is worth any announcement on osnews.com!
I think it is. If your conclusion is that GCC had no speed improvements in the new major version (3.2.1 over 3.1), then that alone is news.
Another news item there is that ICC 7 is not 40% faster than ICC 6, no matter what Intel wants you to believe (http://www.extremetech.com/article2/0,3973,742682,00.asp).
This test seems to have been done on a dual P3, but I wonder how that changes for a P4, or an AthlonXP(has SSE)
> Another news item there is that ICC 7 is not 40% faster than ICC 6, no matter what Intel wants you to believe (http://www.extremetech.com/article2/0,3973,742682,00.asp).
Maybe the speed improvements of 40% are only related to enabling HT support for newer P4’s?
Just a thought..
AthlonXP would probably need GCC’s “athlonxp” optimization attribute rather than the “sse” one. I don’t believe that ICC has support for AMD CPUs, only for Intel’s own products. However, it is funny that AMD usually uses ICC rather than GCC when they publish SPEC benchmarks.
As for P4, the author clearly says that he doesn’t have one to test with.
The author of that extremetech article probably misquoted the person he was interviewing (surprise), who was probably referring to one of the quotes on ICC’s web page in which someone mentions a performance increase for a specialized scientific application.
One thing that strikes me about this article, as with the others, is that the author was advised by a GCC developer (Richard Henderson) to use “-funroll-all-loops”. I don’t think ICC does that; instead it selectively chooses which loops to unroll.
The GCC manual also has this to say about -funroll-all-loops: “This usually makes programs run more slowly.” So any performance improvements gained in these synthetic benchmarks probably aren’t going to reflect ones in typical applications.
> This test seems to have been done on a dual P3, but I wonder how that changes for a P4, or an AthlonXP(has SSE)
Going by the benchmarks that people have done with Pentium 4s, Athlons, etc, the single processor systems seem to fare better than dual processors in a few tests. Maybe the claims of 40% performance improvements apply only to a few of those tests?
There are a few main things to note here:
1) GCC is easily in league with ICC. GCC wins a few tests, and where it loses it is usually not far behind ICC. To quote Tanenbaum, “avoiding disaster is far more important than optimal performance.” A 20% difference in performance is not really that noticeable, even for (say) a scientific application (is an 80 hour runtime really that much better than a 100 hour runtime?). What’s most important is avoiding cases where GCC generates code two or three times slower than ICC. GCC 3.0.x used to do this in several places, and GCC 3.2.x does this in very few. That’s a good thing.
2) Both compilers seem to finally have a handle on C++. On paper, both are nearly fully compliant with standard C++, except for ‘export’. In practice, GCC 3.2.x is extremely compliant, while Intel C++ is a notch less so, but still very good.
3) Both compilers seem to handle reducing the abstraction penalty of C++ quite well. Modern C++ program design depends on the fact that most of C++’s complexity is compile-time, and all the fancy high-level features are structured so they can (at least in theory) be optimized out. It is critical that actual compilers conform as closely as possible to the theoretical expectations, to allow developers to write elegant code that also performs well. The STL is a great example of all this. While it is high level and quite elegant, the STL is still just as fast as equivalent hand-built C data structures and algorithms. Sometimes it’s even faster, because generic data structures in C require additional indirection via pointers, which is not necessary in the STL thanks to templates.
4) It’s kind of odd how GCC and ICC are closer in the latest round, but mainly because of performance regressions in ICC. Perhaps optimization for the P4 has negatively affected performance on the Athlon?
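To make point 3 a bit more concrete, here is a minimal sketch (not taken from the benchmark article) that sorts the same data once with C's `qsort`, which only sees untyped memory and a comparison function pointer, and once with `std::sort`, where the element type is a template parameter. The function names `compare_ints`, `sort_with_qsort`, `sort_with_stl` and `check_sorts_agree` are made up for illustration, and the code assumes a C++11 compiler. It only shows where the extra indirection lives; it does not measure anything.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C-style generic sort: qsort knows nothing about the element type, so every
// comparison is a call through a function pointer on two void pointers.
static int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // negative, zero or positive, without overflow
}

std::vector<int> sort_with_qsort(std::vector<int> v) {
    if (!v.empty())
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    return v;
}

// STL version: std::sort is a template, so the element type and its
// operator< are visible to the compiler, which is free to inline them.
std::vector<int> sort_with_stl(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Hypothetical self-check: both approaches must give the same ordering.
void check_sorts_agree() {
    const std::vector<int> input = {5, 3, 9, 1, 3, -2};
    assert(sort_with_qsort(input) == sort_with_stl(input));
}
```

Calling `check_sorts_agree()` from `main` confirms the two versions are functionally equivalent; whether the template version is actually faster depends on the compiler and flags, which is exactly what the article's benchmarks try to measure.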
I would bet that they are due to IA64 work more than P4, since IA64 requires a lot more re-thinking and work. Also, the author of the article was running a PIII system.
Personally the most significant difference I notice is the difference in compilation time between ICC and GCC with ICC being considerably faster.
One benchmark that I think would be of considerable interest to C++ folks would be a benchmark involving something like boost. And on the C side a comparison of the Linux kernel when they can build it.
As for IA64 performance, I don’t think many care much about that for now.
> Personally the most significant difference I notice is the difference in compilation time between ICC and GCC with ICC being considerably faster.
I’m guessing ICC requires less memory than does gcc.
Does anybody happen to have run tests on how quickly each compiler compiles code – simple “C”, C++ with no Templates, Templates, etc…
It took me a couple of hours to compile the latest QT sources with g++. Just wondering if it would have been faster with ICC (or if it would even work, for that matter).
“and all the fancy high-level features are structured so they can (at least in theory) be optimized out.”
What C++ features do you call “high-level”? Things like late binding, self reflection, etc. are all runtime things. You even have restrictions when it comes to type checking at compile time (this is important when it comes to template types). Substitute “all” with “some” and your statement is true. Because of their semantics you can’t “optimize out” runtime features at compile time. Otherwise they wouldn’t be added to the runtime at all.
—
Andreas
PPC, Alpha, Sparc, StrongARM, MIPS, AMD, Transmeta, etc.
Do a comparison on those platforms as well as Intel, then average out the results, and then tell us which is the better compiler.
‘All’ is correct. High level C++ features are designed so they can compile down to the same thing as C code which does something equivalent. In other words, the generated code contains only what is inherent in the task being done, not any language overhead. Classes, for example, have no runtime representation: they exist only at compile time. As a consequence, OO features like encapsulation and inheritance have no impact on the generated code. Templates, too, have no runtime representation. Polymorphism is a single indirect function call, something which must be done anytime a different function needs to be called at runtime, even if one is using ASM. The extensive type system is largely a compile time entity; the only trace of it that exists at runtime is via RTTI, which is directly analogous to C structures that contain ID fields.
A concrete example of all this is a mathematical vector class: In C++, you can write a vector class that is templated based on value type and behaves perfectly like a regular mathematical entity (overloaded +/-/* operators, etc). It’s a nice, high level, OO design. When you compile code using this class, all of the abstraction should be resolved at compile time (including elimination of temporaries and whatnot), and you should end up with something that is equivalent to hand-written C code that does the same thing. Other languages don’t necessarily have this property. Objective C, for example, extensively uses messaging for method dispatch. Much of the time, the overhead of messaging is not necessary, and if you wrote a C program that did the same thing, you’d use direct (or indirect via pointer) calls instead. Other languages include a pointer in every object that points to the object’s class, which makes all objects bigger. Others implement member access indirectly, and check access rights.
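A rough sketch of the kind of vector class described above might look like the following. The names `vec`, `scaled_sum`, `scaled_sum_c` and `check_same_result` are invented for illustration, only `+` and `*` are shown, and a C++11 compiler is assumed. `scaled_sum_c` is the hand-written C-style loop the templated version is supposed to boil down to; whether a given compiler really removes the temporaries is exactly the abstraction-penalty question the benchmarks are about.

```cpp
#include <cassert>
#include <cstddef>

// Small fixed-size mathematical vector, templated on value type and length.
template <typename T, std::size_t N>
struct vec {
    T v[N];

    vec& operator+=(const vec& o) {
        for (std::size_t i = 0; i < N; ++i) v[i] += o.v[i];
        return *this;
    }
    vec& operator*=(T s) {
        for (std::size_t i = 0; i < N; ++i) v[i] *= s;
        return *this;
    }
};

// Value-returning operators built on the compound ones; the by-value
// parameter 'a' is the temporary an optimizer is expected to eliminate.
template <typename T, std::size_t N>
vec<T, N> operator+(vec<T, N> a, const vec<T, N>& b) { a += b; return a; }

template <typename T, std::size_t N>
vec<T, N> operator*(vec<T, N> a, T s) { a *= s; return a; }

// High-level version: reads like the mathematics.
vec<double, 3> scaled_sum(const vec<double, 3>& a, const vec<double, 3>& b,
                          double s) {
    return (a + b) * s;
}

// Hand-written C-style equivalent, for comparison.
void scaled_sum_c(const double* a, const double* b, double s, double* out) {
    for (int i = 0; i < 3; ++i) out[i] = (a[i] + b[i]) * s;
}

// Hypothetical self-check: both versions compute the same values.
void check_same_result() {
    const vec<double, 3> a = {{1.0, 2.0, 3.0}};
    const vec<double, 3> b = {{4.0, 5.0, 6.0}};
    const vec<double, 3> r = scaled_sum(a, b, 2.0);
    double out[3];
    scaled_sum_c(a.v, b.v, 2.0, out);
    for (int i = 0; i < 3; ++i) assert(r.v[i] == out[i]);
}
```

Comparing the assembly for `scaled_sum` and `scaled_sum_c` at a given optimization level is a simple way to see how close a particular compiler gets to the ideal.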
PS> Well, I’m fudging a little: RTTI could be considered an exception, since C code has no reason to manipulate type information or do safe casts. Exceptions, though, are not an exception. In fact, they’re an exception that shows the rule in an even better light: current compilers include what are called zero-overhead exception mechanisms. The cost of an exception is deferred entirely to the point where the exception is raised (no saving and restoring state information like setjmp() and longjmp()), so they are even more efficient than any error handling mechanism that can be done in C.
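On the claim that polymorphism costs about the same as an indirect call you would have to write by hand anyway, here is a small sketch putting the two side by side. All names (`c_shape`, `shape`, `rect`, `tri`, `total_area_c`, `total_area_cpp`, `check_totals_match`) are hypothetical. One caveat, stated as an assumption: typical implementations reach a virtual function through a per-class table, so there is one extra pointer load compared with the per-object function pointer used in the C-style half.

```cpp
#include <cassert>

// C-style: each object carries a function pointer chosen at runtime.
struct c_shape {
    double (*area)(const c_shape*);
    double w, h;
};
double c_rect_area(const c_shape* s) { return s->w * s->h; }
double c_tri_area(const c_shape* s) { return 0.5 * s->w * s->h; }

double total_area_c(const c_shape* shapes, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += shapes[i].area(&shapes[i]);  // indirect call via pointer
    return total;
}

// C++ style: the compiler generates the equivalent dispatch for us.
struct shape {
    virtual ~shape() {}
    virtual double area() const = 0;
};
struct rect : shape {
    double w, h;
    rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const { return w * h; }
};
struct tri : shape {
    double w, h;
    tri(double w_, double h_) : w(w_), h(h_) {}
    double area() const { return 0.5 * w * h; }
};

double total_area_cpp(const shape* const* shapes, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += shapes[i]->area();  // virtual call, also indirect
    return total;
}

// Hypothetical self-check: both dispatch styles compute the same total.
void check_totals_match() {
    const c_shape cs[2] = {{c_rect_area, 2.0, 3.0}, {c_tri_area, 2.0, 3.0}};
    const rect r(2.0, 3.0);
    const tri t(2.0, 3.0);
    const shape* ps[2] = {&r, &t};
    assert(total_area_c(cs, 2) == total_area_cpp(ps, 2));
}
```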
Exceptions:
The exception mechanism can be very efficient in C++, but it doesn’t have the really useful feature of a built-in stack trace. You do get the exception, but there is no way of knowing where it was thrown from without writing that piece of code yourself. Often I rely on macros for this purpose.
Ex.
#define RAISE_ERROR() throw Error(__FILE__, __FUNCTION__, __LINE__)
which does the job of showing from where the exception was thrown but it doesn’t show the call stack. Which is often useful.
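For anyone who wants to try this, a minimal self-contained sketch of such a macro follows. The `Error` class here is my own guess at what the poster's class might look like, not their actual code, and `parse_config`, `where_it_failed` and `check_location_is_recorded` are made-up names. The sketch uses the standard `__func__` (C++11) where the macro above uses the compiler-specific `__FUNCTION__`.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type that records where it was raised.
class Error : public std::runtime_error {
public:
    Error(const char* file, const char* function, int line)
        : std::runtime_error(format(file, function, line)) {}

private:
    // Builds a message of the form "file:line in function()".
    static std::string format(const char* file, const char* function,
                              int line) {
        std::ostringstream out;
        out << file << ':' << line << " in " << function << "()";
        return out.str();
    }
};

// The macro must expand at the throw site so that file, function and line
// refer to the caller rather than to some helper function.
#define RAISE_ERROR() throw Error(__FILE__, __func__, __LINE__)

// Made-up function that fails, to show the macro in use.
void parse_config() { RAISE_ERROR(); }

// Catches the error and returns the recorded location as text.
std::string where_it_failed() {
    try {
        parse_config();
    } catch (const Error& e) {
        return e.what();
    }
    return std::string();
}

// Hypothetical self-check: the throwing function's name is in the message.
void check_location_is_recorded() {
    assert(where_it_failed().find("parse_config") != std::string::npos);
}
```

As noted, this records only the throw site. It says nothing about the chain of callers, which is the part a real stack trace would add.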
Template example:
What you are talking about is parameterized data structures. The whole point of parameterized programming techniques is that they SHOULD be resolved at compile time. The features of the runtime are often more important when you are using imperative techniques, and that’s when you realize that the C++ runtime is very much lacking, particularly when it comes to the field of GUI and system toolkits. Examples of this are:
– QT, which added a custom pre-processor to add features needed.
– XPCOM, the Microsoft COM look-a-like of Mozilla.
– And of course Microsoft COM itself.
These “extensions” add features unavailable in the C++ runtime. Compare this with the OS X Cocoa toolkit, where Objective-C not only has these features built in but provides them with far less “fuss” and greater ease of use than any of the variants grafting them on to C++.
The point is that C++ actively AVOIDS features that cannot be resolved into efficient runtime structures, regardless of how useful they are. This is a good thing in some cases but makes C++ very much a “better C” and not so much a complete high level object oriented language.
“Classes, for example, have no runtime representation: they exist only at compile time.”
Hmm … a memory block containing the class is no representation? A vtable containing pointers which are required for implementing classes is not part of their representation? Maybe we have incompatible semantics for “representation” 😉
> What C++ features do you call “high-level”?
High-level features are those that provide some sort of abstraction, regardless of whether the overhead is taken at run-time or compile-time.
> Things like late binding, self reflection, etc. are all runtime things.
Yes, but it has nothing to do with whether those features are at a high or low level of abstraction.
> You even have restrictions when it comes to type checking at compile
> time (this is important when it comes to template types).
Of course.
> Substitute “all” with “some” and your statement is true.
No.
> Because of their semantics you can’t “optimize out” runtime features at
> compile time. Otherwise they wouldn’t be added to the runtime at all.
He is speaking about the overhead relative to the functionally equivalent Standard C code, most likely because that is the camp from where most arguments against C++’s supposed bad performance come.
> The exception mechanism can be very efficient in C++ but it doesn’t have the
> really useful feature of a stack trace built in. You do get the exception
> but there is no way of knowing from where it was thrown without writing that
> piece of code yourself. Often I rely on macros for this purpose.
With VC++ one can set the debugger to break whenever an exception was thrown, and this gives you access to the call stack, local variables, etc. that caused your code to throw the exception. If what you say is true on Linux, then it is not the fault of the language.
> What you are talking about is parameterized data structures. The whole point
> of parameterized programming techniques is that they SHOULD be resolved at
> compile time. The features of the runtime are often more important when you
> are using imperative techniques and that’s when you realize that the C++
> runtime is very much lacking, particularly when it comes to the field of GUI
> and system toolkits.
>
> These “extensions” add features unavailable in the C++ runtime. Compare this
> with the OS X Cocoa toolkit, where Objective-C not only has these features
> built in but provides them with far less “fuss” and greater ease of use than
> any of the variants grafting them on to C++.
>
> The point is that C++ actively AVOIDS features that cannot be resolved into
> efficient runtime structures, regardless of how useful they are. This is a good
> thing in some cases but makes C++ very much a “better C” and not so much a
> complete high level object oriented language.
Even if that is true, it is irrelevant. He never claimed that C++ had all the possible high-level features, and anyway you should know that such a claim would be false for any language because we cannot know whether there are undiscovered high-level features.
> Hmm … a memory block containing the class is no representation? A vtable
> containing pointers which are required for implementing classes is not part of
> their representation? Maybe we have incompatible semantics for “representation” 😉
It was a bad choice of words on his part, but you do not have to compound the problem. A virtual function table, which is a common method of implementing overridden virtual functions, is only necessary for each class that actually contains virtual functions. A virtual function table is not “required for implementing classes,” but only “required for implementing classes that contain at least one virtual member function or are derived from at least one class containing at least one virtual member function.”
Even then the compiler has the option to inline calls to virtual functions if the exact type of the object is known at the time of the call. Why is this important? Because even though those member functions have been declared virtual, we may not want them to be overridden throughout the entire program. Perhaps we only want the polymorphic behavior in one specific part of the program; we should not have to bear the cost of going through the virtual function table in the parts that do not need the polymorphic behavior.
Example of a virtual function call that can be safely inlined:
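class object { public: virtual void do_something () {} };
int main ()
{
    object o;             // exact type of o is known here: it is an object, nothing else
    o.do_something ();    // so the call can be bound statically and inlined
}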
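To back up the point that only classes with virtual functions need any per-class dispatch machinery, here is a small sketch using the standard `std::is_polymorphic` trait (C++11, so not available when this thread was written). The type names and `extra_bytes_for_virtual` are made up. The size difference it reports is an assumption about typical implementations, which store a hidden table pointer in each polymorphic object; the standard does not mandate any particular layout.

```cpp
#include <cstddef>
#include <type_traits>

// No virtual functions: nothing beyond the two ints needs to be stored.
struct plain_point {
    int x;
    int y;
    int sum() const { return x + y; }
};

// At least one virtual function: the class becomes polymorphic.
struct virtual_point {
    int x;
    int y;
    virtual ~virtual_point() {}
    virtual int sum() const { return x + y; }
};

// Derived from a polymorphic class: polymorphic as well.
struct derived_point : virtual_point {
    int sum() const override { return 2 * (x + y); }
};

static_assert(!std::is_polymorphic<plain_point>::value,
              "a class without virtual functions is not polymorphic");
static_assert(std::is_polymorphic<virtual_point>::value,
              "a class with a virtual function is polymorphic");
static_assert(std::is_polymorphic<derived_point>::value,
              "a class derived from a polymorphic class is polymorphic");

// Implementation-dependent: on common compilers this is the size of the
// hidden table pointer plus any padding.
std::size_t extra_bytes_for_virtual() {
    return sizeof(virtual_point) - sizeof(plain_point);
}
```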
“He is speaking about the overhead relative to the functionally equivalent Standard C code, most likely because that is the camp from where most arguments against C++’s supposed bad performance come. ”
Yes, I have misunderstood this. Thanks.
Well, GDB does this too, via the ‘catch throw’ command. And an additional clarification for the guy who mentioned COM and Qt’s Moc: they’re two different things, and neither has any real relevance to C++ language features. COM and XPCOM are simply additional mechanisms on top of C++. All COM classes are just regular C++ classes and COM interfaces are regular abstract base classes. C++ provides the language support necessary to dynamically introduce code into a program (thanks to virtual functions), while COM provides a mechanism to find and load that code. Since the scope of COM is so large and OS-specific, it doesn’t really belong in the core language anyway. Now, if something like COM wasn’t possible within the standard C++ language, then that would truly indicate a weakness of the language. As it stands, it indicates a strength.
As for Qt’s Moc, it exists largely because support for standard C++ wasn’t widespread enough when Qt was written. Visual C++, one of Qt’s prime targets, only got decent support for templates in 7.0, and you really need to wait for 7.1 (which is in testing right now) to fully resolve template issues. If the full language had been available for Qt to use, they could have implemented most of the functionality provided by Moc entirely within the core language. LibSigC++ is an example of a library that does just this.
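To give a feel for what “signals and slots entirely within the core language” can look like, here is a toy sketch. It is not LibSigC++’s actual API; the `signal`, `button` and `counter` classes and `check_signal_delivery` are invented for illustration, and the code leans on C++11 features (variadic templates, `std::function`, lambdas) that were not available when this thread was written.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy signal: a type-checked list of callbacks, no preprocessor involved.
template <typename... Args>
class signal {
public:
    void connect(std::function<void(Args...)> slot) {
        slots_.push_back(std::move(slot));
    }
    // Calls every connected slot in connection order.
    void emit(Args... args) const {
        for (const auto& slot : slots_) slot(args...);
    }

private:
    std::vector<std::function<void(Args...)>> slots_;
};

// Hypothetical widget exposing a signal as a plain data member.
class button {
public:
    explicit button(std::string label) : label_(std::move(label)) {}
    signal<const std::string&> clicked;
    void press() { clicked.emit(label_); }

private:
    std::string label_;
};

// Hypothetical receiver; any callable with a matching signature will do.
struct counter {
    int count = 0;
    void on_click(const std::string&) { ++count; }
};

// Hypothetical self-check: two presses reach the connected slot twice.
void check_signal_delivery() {
    button b("ok");
    counter c;
    b.clicked.connect([&c](const std::string& s) { c.on_click(s); });
    b.press();
    b.press();
    assert(c.count == 2);
}
```

Connecting a slot with the wrong argument types fails at compile time, which is the kind of checking a template-based approach gets from the language itself.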
What would be the estimated overall gain for a BeOS compiled with GCC 3.2 over the current 2.9 one?
The overall gain would probably be fairly minimal, something like 10-15%. BeOS code doesn’t use a lot of complex C++ (it is basically C with classes and some virtual functions), and operating system level code in general is almost totally algorithm dependent rather than optimization dependent.