Linked by Tony Bourke on Thu 22nd Jan 2004 21:29 UTC
Benchmarks: When running tests, installing operating systems, and compiling software for my Ultra 5, I came to the stunning realization that hey, this system is 64-bit, and all of the operating systems I installed on this Ultra 5 (can) run in 64-bit mode.
Misconceptions
by jizzles on Fri 23rd Jan 2004 19:50 UTC

@Gandalf

"On the other hand using 64bit registers/pointers take up more memory. IE if you are deadling with 32bit integer variables, then you are essentially waisting 32bits of memory for each variable, and loading/storing from/to memory takes also more time. Therefore your applications slow down if you use 64bit builds that only deal with 32bit (integer) data."

Generally not true. There are multiple levels of the memory hierarchy, and traffic between main memory and the caches generally happens in blocks (i.e. 32 or 64 bytes at a time), which can be read and written in burst mode. The on-die L1 cache is what the processor actually reads from and writes to, and generally that connection is as wide as a register (i.e. 64 bits). Whether you read 8 bits or 32 bits or 64 bits to/from the L1 cache doesn't make a damn bit of difference.

What does contribute to the slowdown of 64-bit applications is that data (both static and heap) takes up more space, reducing the overall effectiveness of the cache.
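
To put a number on that, here is a small C sketch (the struct and its fields are made up purely for illustration) showing how the same data structure grows when pointers go from 32 to 64 bits:

#include <stdio.h>

/* A hypothetical linked-list node: one 32-bit payload plus two pointers.
   On an ILP32 build the pointers are 4 bytes each; on an LP64 build they
   are 8, so the same node roughly doubles in size (padding included) and
   fewer nodes fit in each cache line and in the cache as a whole. */
struct node {
    int          value;  /* 32 bits on both ABIs        */
    struct node *prev;   /* 4 bytes on ILP32, 8 on LP64 */
    struct node *next;   /* 4 bytes on ILP32, 8 on LP64 */
};

int main(void)
{
    printf("sizeof(void *)      = %lu\n", (unsigned long) sizeof(void *));
    printf("sizeof(struct node) = %lu\n", (unsigned long) sizeof(struct node));
    return 0;
}

Compiled 32-bit that node is typically 12 bytes; compiled 64-bit it is usually 24, so roughly half as many of them fit in any level of the cache.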

"OpenSSL is using lots of floating point calculations, therefore performance is better with a 64bits build: the overhead of the 64bit pointers is not as bad as the benefit from using 64bit floating point registers directly."

Generally floating point registers on modern processors are 64 or 80 bits wide (sometimes 128), regardless of the size of an integer register. There is a benefit to having a larger bus between the register file and the L1 cache, however.
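
For what it's worth, a quick sizeof check makes the point that floating-point width is independent of the integer/pointer model (the sizes below are typical for GCC on SPARC or x86, not guaranteed by the C standard):

#include <stdio.h>

int main(void)
{
    /* double is 64 bits on both 32-bit and 64-bit builds; long double is
       typically 80-bit extended precision on x86 and 128 bits on SPARC.
       Only the long and pointer widths change between the two ABIs. */
    printf("sizeof(double)      = %lu\n", (unsigned long) sizeof(double));
    printf("sizeof(long double) = %lu\n", (unsigned long) sizeof(long double));
    printf("sizeof(long)        = %lu\n", (unsigned long) sizeof(long));
    printf("sizeof(void *)      = %lu\n", (unsigned long) sizeof(void *));
    return 0;
}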

@Megol

"Do you really belive that compilers model processors to that detail? They don't. There is no need to do that (and is in reality impossible) to get good performance out of the processor, and that applies for human coders also."

As Rayiner said, yes they do. GCC has a fairly general and configurable processor description language. Other compilers for specific architectures can go into even greater detail and perform optimizations that would be impossible to express in that language or would mean nothing on a different architecture. There is a need to do that in some (very limited) situations. Generally a person will profile the code to identify the hot parts and begin removing bottlenecks in those areas. Once two or three extremely hot paths have been identified and the codebase is fairly stable and mature, it may be worthwhile to optimize those few dozen or couple hundred lines to death for the best performance.
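
A minimal sketch of that profile-first workflow, using GCC's -pg instrumentation and gprof (any other profiler would do just as well; the file and function names here are invented):

/* hot.c -- toy program with one deliberate hot spot.
 *
 * Build and profile:
 *   gcc -O2 -pg hot.c -o hot
 *   ./hot                 (writes gmon.out)
 *   gprof hot gmon.out    (flat profile shows where the time went)
 */
#include <stdio.h>

/* Deliberately expensive; noinline keeps it visible to the profiler. */
__attribute__((noinline)) static double burn(long n)
{
    double acc = 0.0;
    long   i;
    for (i = 1; i <= n; i++)
        acc += 1.0 / (double) i;
    return acc;
}

int main(void)
{
    double total = 0.0;
    int    rep;
    for (rep = 0; rep < 200; rep++)
        total += burn(1000000L);
    printf("%f\n", total);
    return 0;
}

The flat profile should show essentially all of the time inside burn(); that is the kind of small, stable hot region that can be worth hand-tuning to death once the rest of the code has settled down.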