Linked by Tony Bourke on Mon 23rd Feb 2004 21:54 UTC
Benchmarks
Continuing my series of articles exploring my SPARC-based Sun Ultra 5, I'm going to cover compiler optimizations on the SPARC platform. While many people are familiar with GCC compiler optimizations on x86, GCC on SPARC naturally behaves differently, and there are some platform-specific issues to keep in mind.
Two Things
by Anonymous on Mon 23rd Feb 2004 22:07 UTC


1. Why aren't you writing for one of the hardware sites? You do nice work.

2. Can you do a series like this for x86?

Thanks for an extremely interesting article, even though I have never touched a SPARC in my life. You prove that optimizations do matter, in spite of those who say they make no difference.

RE: Two Things
by Eugenia on Mon 23rd Feb 2004 22:10 UTC

This series of articles is as much about software as it is about hardware; in fact, software has the upper hand IMHO.

Gentoo Sparc
by Aaron Bennett on Mon 23rd Feb 2004 22:48 UTC

Hey, excellent article. I use Gentoo Linux on my ultrasparc and the best thing about it is everything is compiled with a common set of CFLAGS, so any optimization settings I make are carried through to all installed apps.

If you really want to get the most from your UltraSPARC you should really check out Gentoo Linux.

Thanks for the article
by Chris on Mon 23rd Feb 2004 22:54 UTC

Just wanted to say thanks for the well-written article. It's insightful, and while I'd love to see the effects of optimization on various pieces of software instead of just OpenSSL, it's something I can do in my own time.

re : Thanks for the article
by Wee-Jin Goh on Mon 23rd Feb 2004 23:00 UTC

Yep, I echo those sentiments. It's definitely a good article.

A couple of speculations I have about SPARC performance: my guess is that because it's largely a RISC chip, instruction scheduling is much more important, which is why you get a larger jump in performance when optimizing for the correct CPU architecture. That probably also explains why the performance delta on x86 isn't as great.

re : Thanks for the article
by Wee-Jin Goh on Mon 23rd Feb 2004 23:01 UTC

Heh... couple? I guess that's just one speculation :-)

Interesting article
by Bascule on Mon 23rd Feb 2004 23:14 UTC

Perhaps I've misjudged gcc on SPARC by not properly performance tuning the flags I've been passing when I've used gcc. It'd be quite interesting to see the benchmarks in comparison to the C/C++ compilers included in Sun's Forte Compiler Collection.

Aaron Bennett wrote:
If you really want to get the most from your UltraSPARC you should really check out Gentoo Linux.

If you really want to get the most from your UltraSPARC system you should probably be running Solaris...

Re: Two Things
by Anonymous on Tue 24th Feb 2004 00:33 UTC

" You prove that optimizations do matter, in spite of those who say they make no difference."

Who in their right mind goes around saying that compiler optimizations make no difference, especially when dealing with a RISC design?

Compiler optimization makes all the difference on RISC machines. Sure, hardware is usually a few steps ahead when it comes to dynamic scheduling, but most RISC chips depend heavily on good instruction scheduling by the compiler, and VLIW machines take this dependence to an extreme.

Other thoughts
by Leslie Donaldson on Tue 24th Feb 2004 00:37 UTC

Hello,
For those of you optimizing for SPARC and writing code, don't forget to limit the depth of subroutine calls. SPARCs use a sliding register structure and if you get too deep in subroutine calls it will kill performance. That is one reason why unrolling loops (and inlining) may help your code.

Donaldson
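A rough illustration of the loop-unrolling part of this advice, as a hypothetical C sketch (the function names are made up and this is not code from the article): the unrolled version does four additions per trip through the loop, so there are fewer branches and counter updates, and the four independent accumulators give the scheduler more freedom. Unrolling reduces loop overhead rather than call depth; inlining is what removes calls. GCC can also unroll loops itself with -funroll-loops, so whether hand-unrolling pays off on a particular SPARC is something to benchmark rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one addition plus a compare-and-branch per element. */
long sum_plain(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4, with separate accumulators so the additions are
   independent of each other and can be scheduled in parallel. The second
   loop handles the leftover elements when n is not a multiple of 4. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same sum for any input; only the shape of the loop differs.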

RE: Gentoo Sparc
by Brian on Tue 24th Feb 2004 01:16 UTC

I have to agree with what Aaron said about Gentoo. I've tried out many different OSs/distros with my Ultra 10. I've always been a huge Debian fan, so I used it happily on the machine for two years or so.

Recently, I had to install new hard disks and start again from scratch, so I decided to see how Gentoo worked, just out of curiosity. Although it took a LONG time to build from scratch, I can really see the speed difference, especially with gcc flags like those discussed in this article. I would highly recommend Gentoo for anyone running an Ultra.

my $.02
by andy richter on Tue 24th Feb 2004 03:44 UTC

Again... great article...

I can't wait for the next one...

Re: Gentoo Sparc
by Syntaxis on Tue 24th Feb 2004 04:10 UTC

"I use Gentoo Linux on my ultrasparc and the best thing about it is everything is compiled with a common set of CFLAGS"

I agree that this is a nice feature, but I question how advantageous it really is in practice. I suspect that one would get most of the speed benefits simply by carrying out targeted optimization of the key apps that really stand to benefit from it (such as OpenSSL, which was the example chosen in this review). It's perfectly possible to do this in binary distributions; Debian has apt-build (http://packages.debian.org/unstable/devel/apt-build), for example. Anyway, a benchmark putting this to the test would be quite interesting.

Additionally, there's the flip side of the coin to consider. For the apps that *don't* particularly stand to benefit, some so-called system-wide "optimizations" may actually have a negative effect. For instance, the majority of Gentoo users blindly set "-O3" as the default CFLAGS for their x86 systems (http://www.mail-archive.com/gentoo-dev@gentoo.org/msg02236.html) even though in many cases "-O2" would probably yield better performance.

Register windows, etc...
by MJ on Tue 24th Feb 2004 06:12 UTC

SPARCs use a sliding register structure and if you get too deep in subroutine calls it will kill performance.

These are called register windows. The performance degradation comes from taking a spill trap, which happens when all of your register windows are in use and one has to be saved off to memory. Judicious register use can sometimes avoid this. Then again, on x86 the standard procedure is to push everything onto the stack, since registers are scarce, and there aren't really any commonly used alternative constructs.
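To make the call-depth point concrete, here is a hypothetical C sketch contrasting a recursive function, where every level is a non-leaf call that gets its own register window, with an equivalent loop that makes no calls at all. The number of windows is CPU-specific (typically 8 on UltraSPARC, an assumption worth checking for your CPU), and an optimizing GCC may convert this simple recursion into a loop on its own, so compile at -O0 or read the assembly if you want to see the window traffic. The SUM_MAX_DEPTH guard is an illustrative addition, not a standard constant.

```c
#include <assert.h>

/* Hypothetical guard: deep recursion can also exhaust the stack, especially
   at -O0, so keep the example's depth bounded. */
#define SUM_MAX_DEPTH 100000UL

/* Each level executes SAVE to obtain a fresh register window. Once every
   window is in use, the next SAVE raises a spill trap that writes the oldest
   window to the stack; the RESTOREs on the way back out raise fill traps. */
unsigned long sum_to_recursive(unsigned long n)
{
    assert(n <= SUM_MAX_DEPTH);
    if (n == 0)
        return 0;
    return n + sum_to_recursive(n - 1);
}

/* Same result with no calls: the whole computation stays in one window. */
unsigned long sum_to_iterative(unsigned long n)
{
    unsigned long s = 0;
    for (unsigned long i = 1; i <= n; i++)
        s += i;
    return s;
}
```

A call depth of 100 would already cycle through the windows many times in the recursive version, while the iterative one never leaves its window.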

If you really want to get the most from your UltraSPARC you should really check out Gentoo Linux.

I don't know if this is necessarily true or not. I'm sure Gentoo works well on SPARC, but one of the advantages of having such hardware is that it's really easy to get Solaris running on it without much hassle. I'd try both, as they each have different features, strengths, etc. But realistically, this article is about optimizing application performance on SPARC with GCC. Tony has done a great job of presenting the topic cogently, and frankly it's not much use to have the conversation degenerate into a "my operating system is bigger than yours" contest. I'd be curious to know if he's planning a similar article for optimizations with Sun's compilers.

Where's the -Os flag?
by Priit Laes on Tue 24th Feb 2004 06:44 UTC

Too bad that you didn't test the -Os flag. When the binary is size-optimized, the chance of L1 and L2 cache hits is higher, which can make it faster too.

Great article
by Andrew on Tue 24th Feb 2004 06:56 UTC

I concur - this is a well thought out and executed article. I would love to see more compiler/optimization articles. Especially by this author. Thanks.

Me too
by Christopher X on Tue 24th Feb 2004 08:09 UTC

I too would be curious to see how -Os would do compared to the others, especially how a 64-bit binary compiled with -Os would do compared to a normal 32-bit one, since 64-bit binaries tend to be larger. Would a 64-bit -Os binary still be larger than, say, a 32-bit -O3 one? Beyond that, I recently purchased my first Ultra machine, specifically to run Solaris. It's an Ultra 60, and I plan on upgrading the hell out of it. :-) I also downloaded some Aurora ISOs just to give it a whirl and see if Linux on these older UltraSPARCs really is faster, but even if it is, I want to run Solaris.

Great article
by Reflekt on Tue 24th Feb 2004 14:50 UTC

I would especially love to see more. I can't wait for the next one...

Which gcc version did you use?
by BSDero on Tue 24th Feb 2004 16:24 UTC

...because the Free Software Companion CD for Sun Solaris 9 has two GCC versions: gcc-2.95.x and 3.2.

Are there differences between the two GCC versions (at least on Solaris 9/SPARC)?

I tried those optimizations on my Quad/SS20 and my Dual/SS10, and they were definitely worth a try.

Warning: if you have SMP systems (like mine), note that some software doesn't use the SMP capability of SPARC machines, such as some ray tracers and similar number-crunching apps (povray and so on).

BSDero

Which GCC version
by TonyB on Tue 24th Feb 2004 16:31 UTC

For these tests, I used GCC 3.3.2 (as outlined in the article). There's very little performance difference between 3.3.2 and 2.95 (from my tests in the GCC versus Sun compiler article, 2.95 was actually very slightly faster, by about 1%).

I prefer using 3.3.2, simply because it's the most recent.

due to the nature of work being done by SSL?
by Jeff M on Tue 2nd Mar 2004 21:52 UTC

After reading this article, I went and tried compiling Ethereal ( http://www.ethereal.com ) with "-mcpu=ultrasparc" to test how it performed; the tests I did were CPU-intensive, but they involved little to no math (multiplication or division).

The results were very different from what was shown in this article: in particular, it didn't change much. My post to the Ethereal-dev mailing list (with my results) is archived here:

http://www.ethereal.com/lists/ethereal-dev/200403/msg00021.html

My guess is that SSL is doing a lot of work that became hardware instructions in the newer chips.
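One concrete form of this guess, offered as an assumption to verify rather than a confirmed explanation: the original 32-bit SPARC V7 instruction set has no integer multiply or divide instructions (they arrived with V8), so code built for a generic V7 target has to call software helper routines for them, whereas -mcpu=ultrasparc lets GCC emit the hardware instructions. Inner loops dominated by multiplication and remainder are where that would show. The two standard checksum kernels below (FNV-1a and Adler-32) are chosen purely for illustration and are not taken from OpenSSL or Ethereal; compiling them with and without -mcpu=ultrasparc and comparing the `gcc -S` output is one way to check what actually changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit: one XOR and one 32-bit multiply per input byte. */
uint32_t fnv1a32(const unsigned char *data, size_t len)
{
    uint32_t hash = 2166136261u;          /* FNV offset basis */
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u;                /* FNV prime; wraps modulo 2^32 */
    }
    return hash;
}

/* Adler-32: two remainder operations per byte. The modulo is deliberately
   kept inside the loop (production versions batch it) so the loop is
   dominated by division-type work. */
uint32_t adler32(const unsigned char *data, size_t len)
{
    const uint32_t MOD_ADLER = 65521u;    /* largest prime below 2^16 */
    uint32_t a = 1, b = 0;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    return (b << 16) | a;
}
```

As a sanity check, adler32 over the 9 bytes of "Wikipedia" returns 0x11E60398, and fnv1a32 over the single byte "a" returns 0xE40C292C. Note that GCC may turn a remainder by a constant into a multiply by a reciprocal, which is still multiply-heavy work.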

Results
by TonyB on Thu 4th Mar 2004 14:21 UTC

It looks like for two of the tests, there was a significant difference, although not as dramatic as the OpenSSL tests. About 30% for one test, and 17% for the other. Whether that's worth it to you or to the project itself of course is a matter of opinion.

Also remember, there is I/O involved in these tests, and I/O operations do not benefit from compiler optimizations. When I ran tests with gzip for the various compilers in an earlier article, every compiler showed about the same results for a gunzip operation. The reason is likely that the bottleneck was the disk: it couldn't read the data fast enough for any difference to show.