To contrast the performance impact of intra-platform optimizations, I ran the same test on a 1 GHz Pentium III x86 system. I compiled OpenSSL with -march=i686 (the highest effective optimization for my Pentium III system). The x86 test system is running Linux 2.4, and OpenSSL 0.9.7c was again compiled with GCC 3.3.2 using -O3. Each run was done 3 times and the results averaged. Again, there was very little delta between the individual runs.
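Averaging repeated runs and checking how far they disagree is simple bookkeeping. Below is a minimal Python sketch of it. The function name and the "relative spread" measure (max minus min, divided by the mean) are assumptions for illustration, not the method used for these tests.

```python
from statistics import mean

def summarize_runs(scores):
    """Return (mean, relative spread) for repeated benchmark runs.

    Relative spread is (max - min) / mean: a rough indicator of how
    much the individual runs disagreed. Assumes positive scores.
    """
    if not scores:
        raise ValueError("need at least one run")
    avg = mean(scores)
    spread = (max(scores) - min(scores)) / avg
    return avg, spread

# Illustrative values only, not measurements from this article.
avg, spread = summarize_runs([100.0, 102.0, 98.0])
print(f"mean={avg:.1f} spread={spread:.1%}")  # prints: mean=100.0 spread=4.0%
```

A small spread, like the "very little delta" reported here, suggests the average is a fair summary of the runs.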
Since I'm running a Pentium III, I could have used -march=pentium3. I actually did, and found no difference in results between -march=pentium3 and -march=i686. Also, OpenSSL on Linux x86 is often distributed in both i386 and i686 builds.
Remember, we're not comparing the performance of a 1 GHz Pentium III processor with a 333 MHz UltraSPARC IIi processor; rather, we're comparing the difference between the lowest common denominator and the highest effective optimization on each platform, x86 and SPARC.
As you can see, -march=i686 does give the expected performance boost, but it's not nearly as dramatic as the difference between V7 and V9 (or even V8) on SPARC. This highlights the importance of optimizations for SPARC.
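Comparing the lowest-common-denominator build with the most optimized build comes down to a relative gain. Here is a minimal sketch. It assumes a higher-is-better metric such as throughput, since the exact metric isn't given in this section, and the function name is hypothetical.

```python
def percent_gain(baseline, optimized):
    """Percentage improvement of `optimized` over `baseline`.

    Assumes higher is better (e.g. throughput). A result of 25.0
    means the optimized build scored 25% higher than the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (optimized / baseline - 1.0) * 100.0

# Illustrative values only: the same test built two ways on one machine.
print(f"{percent_gain(200.0, 250.0):.1f}%")  # prints 25.0%
```

Expressing each platform's result as a percentage of its own baseline is what makes a 1 GHz Pentium III and a 333 MHz UltraSPARC IIi comparable here.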
Contrasting With x86
You may have noticed that I used -march for x86 yet -mcpu for SPARC. For x86 GCC users this may seem confusing, since on x86 -mcpu only tunes code for a specific CPU and doesn't take advantage of any additional instructions or functionality.
For SPARC, there is no -march flag; instead, -mcpu specifies platform-specific optimizations. The -mtune flag works the way -mcpu has typically been used on the x86 platform: it tunes code for a particular platform without taking advantage of additional instructions. (It should be noted that the -mcpu flag has actually been deprecated on x86 GCC in favor of -mtune.)
To summarize: -mtune is the same on both x86 and SPARC (it creates backward-compatible tuned binaries), -mcpu creates CPU-specific binaries (not backward compatible) for SPARC, and -march does the same for x86.
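That summary reduces to a small decision rule. The Python sketch below encodes it. The platform labels "x86" and "sparc" are hypothetical inputs for this sketch. The CPU names are ordinary GCC values, but you should check which ones your GCC version accepts against its documentation.

```python
def cpu_flags(platform, cpu, portable):
    """Pick a GCC CPU flag following the -mtune/-mcpu/-march summary.

    platform: "x86" or "sparc" (hypothetical labels for this sketch)
    cpu:      a GCC CPU name such as "i686" or "ultrasparc"
    portable: True  -> tune only; the binary still runs on older CPUs
              False -> allow CPU-specific instructions; not backward compatible
    """
    if portable:
        # -mtune behaves the same on both platforms.
        return [f"-mtune={cpu}"]
    if platform == "x86":
        return [f"-march={cpu}"]
    if platform == "sparc":
        # SPARC has no -march; -mcpu selects the instruction set.
        return [f"-mcpu={cpu}"]
    raise ValueError(f"unknown platform: {platform!r}")

print(cpu_flags("x86", "i686", portable=False))          # ['-march=i686']
print(cpu_flags("sparc", "ultrasparc", portable=False))  # ['-mcpu=ultrasparc']
print(cpu_flags("sparc", "ultrasparc", portable=True))   # ['-mtune=ultrasparc']
```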
Another optimization option for GCC (universal to all platforms) is the -On flag, which turns on a predefined group of more specific optimization flags at once.
Further reading on these optimizations can be found on the GCC documentation site.
To see what effect the -On flag has with GCC, I compiled OpenSSL 0.9.7c with -On, where n ranges from 0 through 3, the documented range for GCC. (There's also -Os, which enables the -O2 optimizations that don't typically increase code size, but I didn't test that.)
As before, the tests were run 3 times for each variant, and the results averaged. There was very little delta between the runs. OpenSSL 0.9.7c was used on Solaris 9 (12/03), compiled with GCC 3.3.2.
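The -On sweep is a small test matrix: one build per level, with each build benchmarked three times. Here is a minimal sketch that enumerates it. The function name is hypothetical, and passing "s" as a level would add the untested -Os variant.

```python
def on_sweep(levels=(0, 1, 2, 3), runs=3):
    """Return (optimization flag, run number) pairs for an -On sweep.

    Each level gets its own build, and each build is benchmarked
    `runs` times so the results can be averaged afterwards.
    Passing "s" as a level yields -Os.
    """
    return [(f"-O{n}", r) for n in levels for r in range(1, runs + 1)]

plan = on_sweep()
print(len(plan))  # 12
print(plan[:3])   # [('-O0', 1), ('-O0', 2), ('-O0', 3)]
```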
The results were quite surprising, as I had expected going in that there would be a greater difference between the optimization levels. As the results show, there wasn't much difference until dropping to -O0.
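One way to make "not much difference until -O0" precise is to list which levels land within some tolerance of the best averaged result. The sketch below does this. It assumes a higher-is-better score. The function name, the 5% tolerance, and the input values are illustrative, not measurements from these tests.

```python
def near_best(scores, tolerance=0.05):
    """Return the flags whose score is within `tolerance` of the best.

    scores: mapping of optimization flag -> averaged result, where
            higher is better (an assumption of this sketch).
    """
    best = max(scores.values())
    return sorted(flag for flag, s in scores.items()
                  if s >= best * (1.0 - tolerance))

# Illustrative values only, not measurements.
example = {"-O0": 60.0, "-O1": 97.0, "-O2": 99.0, "-O3": 100.0}
print(near_best(example))  # ['-O1', '-O2', '-O3']
```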
This was only a single application, and the effectiveness of these optimizations will of course vary with your application, so keep that in mind.