Interested in tuning your C/C++ applications for Linux on POWER? This article compares the optimization options for the two Linux on POWER C/C++ compilers: GCC and IBM XL C/C++. It also reviews techniques such as Interprocedural Analysis, Profile Directed Feedback, and High Order Transformations, which one or both compilers use to extract higher performance from the Power architecture.
Does IBM fight the AMD64 (aka EM64T) camp, which is making the IA32 architecture obsolete, or the RISC camp (broadly including EPIC)?
Or do they attack both?
Carsten
> Does IBM fight the AMD64 (aka EM64T) camp, which is making the IA32 architecture obsolete, or the RISC camp (broadly including EPIC)?
> Or do they attack both?
IBM manufactures BOTH POWER and AMD64 chips, last I heard…
Maybe they should start using some optimization flags
It’s been a while since they’ve upgraded their Mac OS X version of it; it’s still sitting at v6. Also, is it really necessary to make downloading their software next to impossible? I mean, is all the registration and crap absolutely necessary?
> IBM manufactures BOTH POWER and AMD64 chips, last I heard…
The PC department of IBM built Itanium servers, while the eSeries department fought against Itanium.
IBM Global Services sells Oracle Databases, while the DB2 group fights against Oracle products.
IBM is no monolith.
Carsten
If you look at the results in Table 6, XLC produces some ridiculously fast benchmark results compared to GCC: roughly a 16x improvement, according to the article.
If you looked at the source code, you’d see that it was just doing a pointless loop. What the article fails to mention is that this result was most probably obtained because XLC, with the -qhot option, guessed that the loop was unnecessary and removed it completely.
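For what it’s worth, -qhot is just one extra flag on the XL C/C++ command line. A sketch of an invocation (the file name and the -qarch/-qtune values are my assumptions, not from the article):

    # hypothetical build line; pick -qarch/-qtune for your actual chip
    xlc -O3 -qhot -qarch=pwr5 -qtune=pwr5 matrix.c -o matrix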
> Does IBM fight the AMD64 (aka EM64T) camp, which is making the IA32 architecture obsolete, or the RISC camp (broadly including EPIC)?
I’m wondering a bit what this has to do with the article. The main platform mentioned in the article is POWER5, which perhaps competes with IA-64, but certainly not with AMD64/EM64T.
Does POWER5 feature VMX? The POWER4, I know for sure, did not. The PPC970 does. Since the article mentions VMX optimisations, maybe it is not that POWER5-specific.
Nevertheless, I personally see the IBM OpenPOWER 7xx (especially the 710) also rivaling AMD64/EM64T-based servers.
Carsten
I just wanted to know how GCC compares to IBM C
I just wanted to know how GCC compares to the IBM compiler in terms of performance. I think the IBM compiler would kill GCC in terms of performance, but that’s just my opinion. Does anyone here have some sort of stats?
> If you looked at the source code, you’d see that it was just doing a pointless loop.
I had a look; it doesn’t look pointless to me. It looks like it runs 10 big matrix operations, which is a good way to test optimisation strategies. If you read the article, they say that -qhot invokes the auto-vectoriser, so a 16x speedup on a big matrix operation looks reasonable…
If the compilers were really smart, they’d look at the entire program, see that it has no outputs, and optimise the entire thing away. It would run in 0.00 seconds! But is that the point?
There are probably some comparisons around, but let me say this: GCC’s primary focus is not speed. It is portability and being free software. Hell, they say that early on in the article: “GCC is a robust compiler aimed at world class quality with emphasis on portability across platforms and open source development.” Notice nothing about speed.
However, they still have a big focus on speed – it is just not their main focus. For example, one of the big new features of GCC 4.0 is their new optimisation framework. Whilst they don’t use much of it yet, it will allow them to implement awesome new optimisations in the future.
Contrast this with IBM’s compiler. Now, they produce the PowerPC chip, so they know how best to optimise for it. Also, unlike GCC, they do not have to be portable to many different processor types, so they can put as many PowerPC tricks in there as they like. Finally, compiler optimisation is one of the huge cutting-edge areas of research these days, and IBM has a nice large research team for this.
Two interesting points from the article. The first was that I didn’t realize that gcc could do PDF. I had seen Microsoft demonstrating that at SuperComputing 2004 back in November. I was impressed, and disappointed that it would take a while for gcc to do the same thing. It turns out, according to the article, that gcc has had it since 3.4. That’s cool.
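For anyone who hasn’t tried it, gcc’s PDF is a two-pass build; something like this, with prog.c and training.dat standing in for your real source and training input:

    # pass 1: instrumented build, then run it on representative input
    gcc -O2 -fprofile-generate prog.c -o prog
    ./prog < training.dat
    # pass 2: rebuild using the profile data written by the first run
    gcc -O2 -fprofile-use prog.c -o prog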
I was fascinated with some of their suggestions. First of all, I’ve done some performance benchmarking in scientific computing and I have not found a substantial cost in doing a divide instead of a multiply. I’m also curious what kind of performance improvement can be expected from using register-sized unsigned variables for scalar values. The article doesn’t say. Do you guys have any idea?
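To make the question concrete, here is the kind of rewrite I mean; a minimal sketch of my own, not code from the article:

    /* Dividing by a loop-invariant value vs. multiplying by its
       precomputed reciprocal: one divide total instead of one per
       iteration. The rounding can differ in the last bit, which is why
       compilers usually only do this under relaxed FP-math options.
       The index is a register-sized unsigned long, which in 64-bit mode
       avoids the extension work a 32-bit index can require. */
    void scale_div(float *a, unsigned long n, float d)
    {
        for (unsigned long i = 0; i < n; i++)
            a[i] = a[i] / d;        /* float divide every iteration */
    }

    void scale_mul(float *a, unsigned long n, float d)
    {
        float r = 1.0f / d;         /* single divide, hoisted out */
        for (unsigned long i = 0; i < n; i++)
            a[i] = a[i] * r;        /* multiply is far cheaper */
    }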
Ironically, the example does not use longs as its scalar type, but good old reliable int, and compiled in 64-bit mode, sizeof(int) != sizeof(long) on POWER, or do they? The loop is certainly doing something, contrary to what a previous poster said: it is looping through the elements of two 4000×4000 arrays. The code has a note describing the optimization behind the 16x speedup: “XLC interchanges i and j loops and vectorizes the float divide.” The article explicitly says that one should not expect compiling with the IBM compiler to yield a general program speedup of 1600%, but that this is an indicator of how drastic the effect of optimization on individual code segments can be.
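For reference, the loop is roughly of this shape. This is my reconstruction from the description above, not the article’s exact source, and the divisor is made up:

    #define N 4000
    static float a[N][N], b[N][N];

    void bench(void)
    {
        /* Indexed [i][j] but with j as the outer loop, so the inner loop
           strides N floats at a time through memory. Interchanging the
           i and j loops makes the accesses unit-stride, after which the
           float divide can be vectorized, which is exactly what the
           article says -qhot does. */
        for (int j = 0; j < N; j++)      /* plain int, as noted above */
            for (int i = 0; i < N; i++)
                a[i][j] = b[i][j] / 3.0f;
    }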
nice article, thanks for pointing it out.
just a *very* disappointing example from IBM to claim a 16x speedup over gcc… (‘wrong’ indexes in a matrix multiplication, with predictable results)
nice that their compiler recognizes it, but a good compiler doesn’t help with code that bad anyway; learn to code instead
not matrix mul, just nothing: I was looking at the code way too fast and assuming too much ;D