I'll start with OpenSSL and its openssl utility. I used OpenSSL 0.9.7c, the latest version available from http://www.openssl.org at the time of this writing.
The ./config utility, run from the openssl-0.9.7c root directory, detects that the Ultra 5 I'm using is a 64-bit-capable UltraSPARC system and prints instructions for requesting 64-bit compilation:
Operating system: sun4u-whatever-solaris2
NOTICE! If you *know* that your GNU C supports 64-bit/V9 ABI
and wish to build 64-bit library, then you have to
invoke './Configure solaris64-sparcv9-gcc' *manually*.
You have about 5 seconds to press Ctrl-C to abort.
My first build will be the 32-bit one, so I'll ignore this for now. The config utility then runs and prepares the build for solaris-sparcv9-gcc:
Configured for solaris-sparcv9-gcc.
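The notice above names the 64-bit target explicitly, so switching between the two builds comes down to which target is handed to Configure. Here is a minimal shell sketch of that choice, using only the two target names that appear in the config output. The helper name configure_target is hypothetical (it is not part of OpenSSL), and a POSIX-style shell is assumed.

```shell
# configure_target is a hypothetical helper, not part of OpenSSL.
# It maps a word size to the Configure target named in the output above.
configure_target() {
    case "$1" in
        32) echo "solaris-sparcv9-gcc" ;;
        64) echo "solaris64-sparcv9-gcc" ;;
        *)  echo "unsupported word size: $1" >&2; return 1 ;;
    esac
}

# Example, run from the openssl-0.9.7c root directory:
#   ./Configure "$(configure_target 64)" && make
```

Keeping the mapping in one place makes it harder to mistype a target name when rebuilding several variations.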
Here are the CFLAGS from the main Makefile:
CFLAG= -DOPENSSL_SYSNAME_ULTRASPARC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_NO_KRB5 -m32 -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN -DBN_DIV2W -DMD5_ASM
Two flags are important here. The first is -m32, which explicitly requests 32-bit binaries even though GCC already defaults to them. The second is -mcpu=ultrasparc, which tells the compiler to optimize for the UltraSPARC CPU (versus a SuperSPARC or older SPARC processor).
If you've compiled OpenSSL on the x86 platform, this optimization is akin to x86's -march=i686, which produces faster code for Pentium Pro processors and above (I couldn't measure any benefit from optimizing for newer processors, like the P3). Most of the time OpenSSL and a few other applications, as well as the kernel, are released with i686 optimizations. These CPU-specific optimizations make a big difference in OpenSSL performance on both the SPARC and x86 platforms.
The only thing left to do is run make, which worked flawlessly. The openssl binary sits in the apps/ directory, and we can check that it's a 32-bit binary:
# file openssl
openssl: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, UltraSPARC1 Extensions Required, dynamically linked, not stripped
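When several builds are lying around, this check is easy to script. The sketch below classifies a binary from the text that file prints, matching on the "ELF 32-bit" wording shown above. The "ELF 64-bit" wording for the other case is an assumption about file's output, and the helper name elf_bits is hypothetical.

```shell
# elf_bits is a hypothetical helper: it inspects the text printed by file(1)
# and reports the word size it finds there.
elf_bits() {
    case "$1" in
        *"ELF 32-bit"*) echo 32 ;;
        *"ELF 64-bit"*) echo 64 ;;   # assumed wording for 64-bit binaries
        *)              echo unknown ;;
    esac
}

# Example against a real binary:
#   elf_bits "$(file apps/openssl)"
```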
I went on to build OpenSSL in four variations: 32-bit and 64-bit versions with shared libraries (where libssl.so and libcrypto.so are separate), and 32-bit and 64-bit versions without external libcrypto and libssl libraries. I ran each iteration a few times, and took the first run. There was very little disparity between the runs.
In general, if you're using OpenSSL, you're probably using it with at least OpenSSH and possibly other SSL or crypto applications. Thus, building shared libraries is probably your best bet.
The test I ran was openssl speed rsa dsa, which runs through various RSA and DSA operations. I ran the tests 3 times, averaged the results, and rounded. There was little disparity between the three runs. Here are the results:
OpenSSL 0.9.7c: Verify operations per second (longer bars are better)
OpenSSL 0.9.7c: Sign operations per second (longer bars are better)
In this first test, we can see that 32-bit binaries were usually faster than 64-bit binaries, although in some cases the results were nearly identical. However, the speed difference wasn't all that great, topping out at about 12%.
GNU gzip 1.2.4a
GNU's gzip is also a useful benchmark, and it's one of the tools used in SPEC's CPU2000 ratings, so I grabbed gzip's source from the main GNU FTP site. I picked the latest version available on the site, 1.2.4a.
To test gzip, I needed something to zip and unzip. I ended up using a tar of my /usr/local/ directory, as it had a nice mix of text files, binaries, tarballs, and even already gzip'd files. It also produced a 624 MB file, which is large enough to negate the effects of disk or system caching.
I then created a 32-bit binary and a 64-bit binary using GCC 3.3.2. I used “-O3 -mcpu=ultrasparc” as the compiler CFLAGS for both (with “-m64” added for the 64-bit version). I used the time utility to measure how long it took to run gzip and gunzip on the 624 MB tar file. I ran each operation for each binary three times and averaged the results (rounding to the nearest whole number). The three runs were very consistent.
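A rough sketch of this kind of three-run measurement is below. It is not the exact procedure used here: it takes timestamps with date +%s instead of the time utility, and it discards the command's output instead of writing it to disk. Both helper names are hypothetical, and date +%s (seconds since the epoch) is an assumption about the local date command that does not hold everywhere.

```shell
# Both helpers are hypothetical. `date +%s` is assumed to print seconds
# since the epoch, which not every date(1) implementation supports.

# Run a command N times and print the elapsed whole seconds of each run.
time_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" > /dev/null          # output is discarded, not written to disk
        end=$(date +%s)
        echo $((end - start))
        i=$((i + 1))
    done
}

# Average the numbers on stdin (one per line), rounded to the nearest whole number.
avg_round() {
    awk '{ sum += $1 } END { if (NR > 0) printf "%d\n", sum / NR + 0.5 }'
}

# Example (binary and file names are placeholders):
#   time_runs 3 ./gzip-32 -c local.tar | avg_round
```

Whole-second resolution is coarse, but it matches the rounding applied to the results here.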
GNU gzip 1.2.4a: gzip and gunzip
For the gzip operation, the 32-bit binary was about 20% faster than the 64-bit binary. For the gunzip operation, the 32-bit binary was nearly identical to the 64-bit binary (91 seconds versus 92 seconds to complete).
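To put a number on "nearly identical", the gunzip times can be turned into a percentage. The sketch below measures how much longer the second time is, relative to the first. That definition is an assumption; the percentages quoted in this article may have been computed differently. The helper name pct_slower is hypothetical.

```shell
# pct_slower is a hypothetical helper: given two elapsed times in seconds,
# it prints how much longer the second took, as a percentage of the first.
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

pct_slower 91 92   # gunzip: 32-bit took 91 s, 64-bit took 92 s; prints 1.1
```

By this measure the 64-bit gunzip run took about 1.1% longer, which is well within what run-to-run variation could explain.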