Are 64-bit Binaries Really Slower than 32-bit Binaries?

When running tests, installing operating systems, and compiling software for my Ultra 5, I came to the stunning realization that hey, this system is 64-bit, and all of the operating systems I installed on this Ultra 5 (can) run in 64-bit mode.

I wondered if it would be best to compile my
applications in 32-bit mode or 64-bit mode. The modern dogma is that
32-bit applications are faster, and that 64-bit imposes a performance
penalty. Time and time again I found people making the assertion
that 64-bit binaries were slower, but I found no benchmarks to back
that up. It seemed it could be another case of rumor taken as fact.


So I decided to run a few of my own tests to see
if indeed 64-bit binaries ran slower than 32-bit binaries, and what
the actual performance disparity would ultimately be.


Why 64-bit?

In the process of this evaluation, I came to ask
myself: Why go through all the trouble to make it 64-bit anyway?
Other than sex appeal, what other reasons are there for 64-bit?


Checking out Sun’s Docs site, I ran across this
article: Solaris 64-bit Developer’s Guide
(http://docs.sun.com/db/doc/806-0477).
It gives some good detail on when 64-bit might make sense, and is
relevant for Solaris as well as other 64-bit capable operating
systems.


The benefits for 64-bit seem to be primarily
mathematical (being able to natively deal with much larger integers)
and for memory usage, as a 64-bit application can grow well beyond 2
GB.
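To make that concrete, here is a small illustrative test of my own (not part of the benchmarks below; the file name is arbitrary). Under the 32-bit ILP32 data model, longs and pointers are 4 bytes; under the 64-bit LP64 model they grow to 8 bytes, which is what allows 64-bit integer math in a single register and address spaces beyond 4 GB:

cat > sizes.c <<'EOF'
#include <stdio.h>

int main(void)
{
    /* ILP32 (32-bit mode): int, long, and pointers are 4 bytes.        */
    /* LP64  (64-bit mode): int stays 4 bytes; long and pointers are 8. */
    printf("long: %lu bytes, pointer: %lu bytes\n",
           (unsigned long)sizeof(long), (unsigned long)sizeof(void *));
    return 0;
}
EOF

gcc sizes.c -o sizes32 && ./sizes32         # should print: long: 4 bytes, pointer: 4 bytes
gcc sizes.c -o sizes64 -m64 && ./sizes64    # should print: long: 8 bytes, pointer: 8 bytes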


Operating systems seem to be benefiting from
64-bit first, allowing them to natively handle larger amounts of RAM
(some operating systems, such as Windows and Linux, have ways
around the 2/4 GB limit on 32-bit systems, but there is funkiness
involved). There aren't many applications yet that use more than 2
GB of memory, but there are more on the horizon.


Given that this Ultra 5 cannot hold more than 512
MB of RAM (it’s got 256 MB in it now), there’s not much benefit in
the memory area, for either the OS or applications. Still, I want to
see what issues and performance penalties there may be with 64-bit
binaries, so I’ll give it a shot anyway.


Here are some links for further reading on 64-bit
computing.



Applications


The first step was to select some applications to
run these tests against. They would have to be open source so I
could compile them in 32-bit and 64-bit versions, and there would
need to be some way to benchmark those applications. I would also
need to be able to get them to compile in 64-bit mode, which can be
tricky. I tried a few different applications, and ended up settling
on GNU gzip, OpenSSL, and MySQL.


Test System


My test system is my old Sun Ultra 5 workstation;
for the specs, please refer to the intro article. The operating
system install is Solaris 9, SPARC edition, 12/03. The 9_Recommended
public patch cluster was also installed.


For the compiler, I used GCC 3.3.2 from
http://www.sunfreeware.com,
which is built to produce both 32-bit and 64-bit binaries. To see
if it can successfully produce 64-bit as well as 32-bit binaries,
I'll run a very simple test of the compiler.


I create a very simple C file, which I call
hello.c:


#include <stdio.h>

int main(void)
{
    printf("Hello!\n");
    return 0;
}


I’ll just run a quick
test to see if it compiles in regular 32-bit mode:


gcc hello.c -o hello32


The compiler gives no
errors, and we see that a binary has been created:


-rwxr-xr-x 1 tony tony 6604 Jan
6 13:24 hello32*


Just to make sure,
we’ll use the file utility to see what type of binary it is:


# file hello32


hello32: ELF 32-bit MSB executable
SPARC Version 1, dynamically linked, not stripped


So the file is a 32-bit, SPARC Version 1 executable,
which means it should run on any SPARC
system. And of course, we'll run it to see that it works correctly:




# ./hello32


Hello!




All is well, so now
let's try compiling hello.c
as a 64-bit binary. For GCC, the flag that creates a 64-bit binary
is -m64.




gcc hello.c -o hello64 -m64




No errors were given,
and we see that a binary has been created:




-rwxr-xr-x 1 tony tony 9080 Jan
6 13:24 hello64*




But is it a 64-bit
binary? The file
utility will know.




hello64: ELF 64-bit MSB executable
SPARCV9 Version 1, dynamically linked, not stripped




The binary is 64-bit,
as well as SPARCV9, which means it will only run on SPARC Version 9
64-bit CPUs (UltraSPARCs). So now we’ve got a 64-bit binary, but
does it run?




# ./hello64


Hello!




OK, so now we're set.
We can test to see if 64-bit binaries are really slower, but
we'll have to use something a little more intensive than "Hello!".


OpenSSL 0.9.7c




I’ll start with OpenSSL
and its openssl
utility. I used OpenSSL 0.9.7c, the latest version at the time of
this writing from http://www.openssl.org.




Running the ./config
utility in the openssl-0.9.7c root directory detects that the Ultra 5
I'm running this on is an UltraSPARC system capable of 64-bit operation,
and gives instructions on how to specify 64-bit compilation:




# ./config

Operating system: sun4u-whatever-solaris2

NOTICE! If you *know* that your GNU C supports 64-bit/V9 ABI
and wish to build 64-bit library, then you have to
invoke './Configure solaris64-sparcv9-gcc' *manually*.

You have about 5 seconds to press Ctrl-C to abort.




The first build
I'm going to do is the 32-bit one, so I'll ignore this for now. The
config
utility runs and prepares the build for solaris-sparcv9-gcc.




Configured for solaris-sparcv9-gcc.




Here are the CFLAGS
from the main Makefile:




CFLAG= -DOPENSSL_SYSNAME_ULTRASPARC
-DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H
-DOPENSSL_NO_KRB5 -m32 -mcpu=ultrasparc -O3 -fomit-frame-pointer
-Wall -DB_ENDIAN -DBN_DIV2W -DMD5_ASM




Two important flags
here are -m32,
which explicitly requests 32-bit output (GCC defaults to 32-bit
binaries, but the build sets it explicitly). The other is -mcpu=ultrasparc,
which sets the compiler to use optimizations for the UltraSPARC CPU
(versus a SuperSPARC or older SPARC processor platform).




If you've done
OpenSSL compilation on the x86 platform, this optimization is akin to
x86's
-march=i686
, which produces faster code for Pentium Pro processors and above
(there's no benefit that I could measure from optimizing for newer
processors, like the P3). Most of the time, OpenSSL and a few other
applications, as well as the kernel, are released with i686
optimizations. These CPU-specific optimizations make a big
difference in OpenSSL performance for both the SPARC and x86
platforms.




The only thing left to
do is a make, which worked flawlessly. The openssl
binary sits in the apps/
directory, and we can check to ensure it's a 32-bit binary:




# file openssl


openssl: ELF 32-bit MSB executable
SPARC32PLUS Version 1, V8+ Required, UltraSPARC1 Extensions Required,
dynamically linked, not stripped




I went on and built
OpenSSL in 4 variations: 32-bit and 64-bit versions with shared
libraries (where libssl.so and libcrypto.so are separate), and 32-bit
and 64-bit versions without external libcrypto and libssl libraries.
I ran each iteration a few times, and took the first run. There was
very little disparity between the runs.
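For reference, here is roughly how the two shared-library variations can be configured (a sketch based on the NOTICE that ./config prints above; the exact options you need may vary with your environment):

# 32-bit build with shared libssl.so/libcrypto.so; ./config picks the target
./config shared
make

# 64-bit build with shared libraries, using the target named in the NOTICE
make clean
./Configure solaris64-sparcv9-gcc shared
make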







In general, if you’re
using OpenSSL, you’re probably using it with at least OpenSSH and
possibly other SSL or crypto applications. Thus, building shared
libraries is probably your best bet.




The test I ran was
openssl speed rsa dsa,
which runs through various RSA
and DSA operations. I ran the tests 3 times, averaged the results,
and rounded. There was little disparity between the three runs.
Here are the results:






OpenSSL 0.9.7c: Verify operations per second (longer bars are better)





OpenSSL 0.9.7c: Sign operations per second (longer bars are better)



In this first test, we
can see that 32-bit binaries were usually faster than 64-bit
binaries, although in some cases the results were nearly identical.
However, the speed difference wasn’t all that great, topping out at
about 12%.


GNU gzip 1.2.4a


GNU's gzip is also a
useful benchmark, and it's one of the tools used in SPEC's CPU2000
suite, so I grabbed gzip's source from the main GNU FTP site. I
picked the latest version available on the site, 1.2.4a.




To test gzip, I needed
something to zip and unzip. I ended up using a tar of my /usr/local/
directory, as it had a nice mix of text files, binaries, tarballs,
and even already-gzipped files. It also made for a 624 MB file, which
is big enough to negate disk or system caching.




I then created a 32-bit
binary and a 64-bit binary using GCC 3.3.2. I used "-O3
-mcpu=ultrasparc" as the compiler CFLAGS
for both (adding "-m64"
for the 64-bit version). I used the time
utility to measure how long it took to run gzip and gunzip on the 624
MB tar file. I ran each operation for each binary three times and
averaged the results (rounding to the nearest whole number). The
three runs were very consistent.
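Here is a sketch of the procedure (the binary names are illustrative, and it assumes gzip's configure script picks up CFLAGS from the environment; if it doesn't, the same flags can be passed on the make command line):

# test data: a tar of /usr/local (about 624 MB in my case)
tar cf usrlocal.tar /usr/local

# 32-bit gzip
CFLAGS="-O3 -mcpu=ultrasparc" ./configure
make
cp gzip ../gzip32

# 64-bit gzip
make clean
CFLAGS="-O3 -mcpu=ultrasparc -m64" ./configure
make
cp gzip ../gzip64

# timed runs, repeated three times for each binary
time ./gzip32 usrlocal.tar          # compress
time ./gzip32 -d usrlocal.tar.gz    # decompress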





GNU gzip 1.2.4a: gzip and gunzip



For the gzip operation,
the 32-bit binary was about 20% faster than the 64-bit binary. For the
gunzip operation, the 32-bit binary was nearly identical to the
64-bit binary (91 seconds versus 92 seconds for completion).


MySQL 4.0.17


MySQL was the most challenging, as compilation is
quite a bit more involved than for either gzip or OpenSSL. I ran into
several problems getting it compiled for 64-bit, but I was able to
sort them out and ended up with a 64-bit binary. I added
-mcpu=ultrasparc
to the compile flags (for both the C compiler and the C++ compiler;
MySQL uses both), and of course "-m64"
for the 64-bit version. The MySQL configure script added "-O3"
as a compiler option on its own.
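As a rough sketch (the install prefix is illustrative and not every configure option is shown), the 64-bit server build looks something like this:

# 64-bit mysqld: both the C and C++ flags need -m64 and -mcpu=ultrasparc;
# the configure script supplies -O3 on its own
CFLAGS="-mcpu=ultrasparc -m64" \
CXXFLAGS="-mcpu=ultrasparc -m64" \
./configure --prefix=/usr/local/mysql64
make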


To test MySQL, I used the included sql-bench,
which is a benchmarking toolkit for MySQL and a few other RDBMSs. It
consists of a Perl script that runs through a set of operations using
the DBI and DBD::mysql Perl modules. On this system, the full tests
take about 4 hours to run, so I only ran the tests twice and took
the better of the two. There was very little disparity between the
tests.


It should be noted that for both the 64-bit and
32-bit tests, I used the 32-bit build of sql-bench and the MySQL client
libraries, as the Perl I used (5.8.2) is 32-bit. This was done to
keep the client end consistent.
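For reference, a full sql-bench run looks something like the following (a sketch; the connection options shown are illustrative):

# run the full suite against a running server
# (requires the DBI and DBD::mysql Perl modules)
cd sql-bench
perl run-all-tests --server=mysql --host=localhost --user=root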





MySQL: 32-bit versus 64-bit (shorter is better)





MySQL: 32-bit versus 64-bit (shorter is better)



The MySQL results were
a bit surprising, as two of the operations, insert and select, showed
faster results for the 64-bit binary of the MySQL server than the
32-bit version.


The Size Factor


Another argument against 64-bit binaries that I
see frequently is their larger size. And indeed, all of the 64-bit
binaries and libraries created for this test were larger:

Binary              32-bit size (bytes)   64-bit size (bytes)   % Larger
mysqld              3,993,792             4,864,832             22%
gzip                75,472                115,976               54%
openssl (shared)    465,256               539,992               16%
openssl (static)    1,568,640             1,935,736             23%

Library             32-bit size (bytes)   64-bit size (bytes)   % Larger
libcrypto.so.0.9.7  1,404,964             1,733,864             24%
libssl.so.0.9.7     244,280               294,056               20%



However, the difference wasn’t all that huge, only
around 16% to 54% larger for 64-bit than 32-bit. Unless the system
was an embedded system with very limited storage space, I can’t see
this being all that much of a negative factor.


The Compile Factor


Getting applications to compile as 64-bit binaries
can be tricky. The build process for some applications, such as
OpenSSL, has 64-bit specifically in mind, and requires nothing fancy.
Others, like MySQL and especially PostgreSQL (I was originally going
to include PostgreSQL benchmarks), took quite a bit of tweaking.
There are compiler flags, linker flags, and you'll likely end up in a
position where you need to know your way around a Makefile.


Also, building a 64-bit capable compiler can be an
experience to behold. That is to say, it can suck royally. After
spending quite a bit of time getting a 64-bit compiler built for (one
of my many) Linux installs, I ended up just going with the
pre-compiled version from http://www.sunfreeware.com.


Both compiling a 64-bit capable compiler and
getting an application to compile in 64-bit mode can be time intensive, as a
compile will often start out fine and die somewhere down the line.
Then you fix a problem, start the compile again, and it dies
maybe 10 minutes later. Then you fix another issue, and repeat
until (hopefully) the compile finishes cleanly.


The Library Factor


One important factor in considering whether to
compile/use 64-bit binaries is the problem of shared libraries.
Initially, I hadn’t even thought of the library issue, but when
building 64-bit applications, I came to find it was significant.


The issue is that when a 64-bit application
requires external libraries, those libraries need to be
64-bit; 32-bit libraries won't work.


Take OpenSSH for example. It’s typically
dynamically linked, and requires libcrypto.so.
If your libcrypto.so
is compiled 32-bit and SSH is compiled for 64-bit, you’ll get an
error:


> /usr/local/bin/ssh


ld.so.1: /usr/local/bin/ssh: fatal:
/usr/local/ssl/lib/libcrypto.so.0.9.7: wrong ELF class: ELFCLASS32


Killed




This means you may very
well need to keep two copies of your shared libraries: one set for
32-bit binaries, and another for 64-bit binaries. Solaris keeps
64-bit libraries in directories like /usr/lib/sparcv9.
You'll also need to adjust your LD_LIBRARY_PATH
environment variable to point to where these 64-bit libraries are
located when running a 64-bit binary.
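For example, something like this (the library paths are illustrative) gets a 64-bit binary running against 64-bit libraries:

# point the 64-bit binary at 64-bit copies of its shared libraries
LD_LIBRARY_PATH=/usr/local/ssl/lib/sparcv9:/usr/lib/sparcv9 /usr/local/bin/ssh somehost

# Solaris also honors LD_LIBRARY_PATH_64, which applies only to 64-bit
# processes and so leaves 32-bit programs alone
LD_LIBRARY_PATH_64=/usr/local/ssl/lib/sparcv9 /usr/local/bin/ssh somehost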




I ran into this time
and time again throughout these tests and throughout the entire
evaluation as a whole, and it was a huge pain in the ass.


Conclusion


While these tests are limited in scope, and there
are far more sophisticated tests that could be performed (such as raw
integer and floating point), this is a start, as I haven’t seen any
32-bit versus 64-bit tests out there. The lack of other benchmarks
seems strange to me; perhaps I didn’t look in the right places. (If
you know of any published benchmarks comparing 32-bit binary
performance versus 64-bit, please let me know).


Keep in mind these tests were performed on the
UltraSPARC platform for Solaris 9, and while they probably would have
relevance to other operating systems and platforms (such as Linux on
x86-64, or FreeBSD on UltraSPARC), specific tests on those platforms
would be far more revealing.


So while the tests I ran covered only a few
applications and in limited ways, the results seem to show that
64-bit binaries do indeed generally run slower. However, there are
a few issues to consider.


One issue is that the difference in performance
varies not only from application to application, but also with the
specific operation a given application is performing. Also, the
largest disparity I was able to see was around 20%, not the
many-times-slower disparity that I've seen some claim.


Since this was a limited number of applications in
limited scenarios, the best way to know for yourself is to give the
applications you’re concerned about a try in both modes.


In the end it’s the library and compiling issues
that are the most compelling reasons to stay with 32-bit binaries,
and not so much performance or size. I think it’s safe to say that
you’re not missing out by going with the simpler-to-manage 32-bit
binaries, unless your application can specifically benefit from
64-bit.



Related reading: My Sun Ultra 5 And Me: A Geek Odyssey

Benchmark Player Haters


Benchmarks are always such a contentious and embattled topic. No matter what benchmark you run, invariably someone has a negative comment to make (often rude and obnoxious). If you think otherwise, then write an article
and include benchmarks of some sort and get it published. Then sit back and enjoy the belligerent emails and rude comments.


So why is this? Well, I think it’s a combination of a couple of factors.


There are two pervasive emotional factors in benchmark-bashing. One is when a beloved and heroic operating system or application ends up on the losing end to some vile, contemptible waste-of-time operating system or application. Choices in operating systems, applications, hardware platforms, databases, etc., are very personal, so it's easy for some people to take the results of a benchmark as an affront to their
manhood.


The other is that benchmarks, by their very nature, are narrow in scope and fail to encompass the complexity of an operating system, application, or hardware platform. As a result, someone with even a mediocre knowledge of the technology can easily poke holes, and make themselves seem smart in the process.


But that criticism misses what benchmarks are: varying depths of exploration into unknown territory. Sometimes they can be very comprehensive, and other times they can be very simple. They answer only the questions they are asked, and can provide a basis for asking other questions.


For instance, in my review of UnixWare 7.1.3, I ran some OpenSSL tests to see if there’d be any performance hit from the Linux emulation layer, known as the LKP. Why did I do this? I had no idea if there would be any performance penalty. No one had tested it before to my knowledge, so predicting the outcome was impossible.


Even someone intimately familiar with the inner workings of UnixWare, Linux, and the methodology that enables the LKP to work could not know the outcome without running the test. Any supposition as to the results would be just that: a supposition. Anyone who says they would have known doesn't know
what they're talking about; it's easy to say so after the fact. Now that those benchmarks have been run, we know. It was an easy test, and one that I could run.


But still, there were a few "he doesn't know what he's talking about" comments, including one particularly obnoxious guy who posted the same ignorant chest-beating comment on Slashdot and OSNews, like some sort of
cyber-geek-stalker-player-hater. He relies on the virtues of hindsight, looking back and saying "of course the results would be such!"


If I ever do a review and you want to make a point, drop me a line. If you're polite about it, I'm happy to discuss it, and I'll even take suggestions for other benchmarks. If you enjoy making obnoxious remarks
about benchmarks done by myself or others, then do your own benchmarks, write them up in an article, and get it published. No one's stopping you, and it's not all that difficult.


So keep that in mind as you read my reviews and benchmarks, and as you read benchmarks from others.
