posted by Christopher W. Cowell-Shah on Thu 8th Jan 2004 19:33 UTC

"Benchmark design"

Designing good, helpful benchmarks is fiendishly difficult. This fact led me to keep the scope of this benchmark quite limited. I tested only math operations (32-bit integer arithmetic, 64-bit integer arithmetic, 64-bit floating point arithmetic, and 64-bit trigonometry), and file I/O with sequential access. The tests were not comprehensive by any stretch of the imagination; I didn't test string manipulation, graphics, object creation and management (for object-oriented languages), complex data structures, network access, database access, or any of the countless other things that go on in any non-trivial program. But I did test some basic building blocks that form the foundation of many programs, and these tests should give a rough idea of how efficiently various languages can perform some of their most fundamental operations.

Here's what happens in each part of the benchmark:

32-bit integer math: using a 32-bit integer loop counter and 32-bit integer operands, alternate among the four arithmetic functions while working through a loop from one to one billion. That is, calculate the following (while discarding any remainders):

1 - 1 + 2 * 3 / 4 - 5 + 6 * 7 / 8 - ... - 999,999,997 + 999,999,998 * 999,999,999 / 1,000,000,000

64-bit integer math: same algorithm as above, but use a 64-bit integer loop counter and operands. Start at ten billion and end at eleven billion so the compiler doesn't knock the data types down to 32-bit.
64-bit floating point math: same as for 64-bit integer math, but use a 64-bit floating point loop counter and operands. Don't discard remainders.
64-bit floating point trigonometry: using a 64-bit floating point loop counter, calculate sine, cosine, tangent, logarithm (base 10) and square root of all values from one to ten million. I chose 64-bit values for all languages because some languages required them, but if a compiler was able to convert the values to 32 bits, I let it go ahead and perform that optimization.
I/O: Write one million 80-character lines to a text file, then read the lines back into memory.
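
The components listed above can be sketched in Python, one of the benchmarked languages. This is a minimal illustration, not the published benchmark code. The function names and parameters are hypothetical, the running total is assumed to start at 1 with the four operators applied strictly left to right, and the content of each 80-character line is a placeholder. The loop bounds are parameters so the sketch can be tried on small ranges before attempting the full one-to-one-billion run.

```python
import math
import os
import tempfile


def int_arithmetic(start, last):
    """Cycle through -, +, *, / on a running total, discarding remainders.

    Assumption: the total starts at 1 and operators apply left to right.
    Python integers are arbitrary precision, so the same function serves
    for both the 32-bit and the 64-bit ranges.
    """
    result = 1
    i = start
    while i < last:
        result -= i
        i += 1
        result += i
        i += 1
        result *= i
        i += 1
        # Floor division; both operands are positive here when start >= 1,
        # so this matches discarding the remainder.
        result //= i
        i += 1
    return result


def float_arithmetic(start, last):
    """Same cycle with a floating point counter; remainders are kept."""
    result = 1.0
    i = float(start)
    while i < last:
        result -= i
        i += 1.0
        result += i
        i += 1.0
        result *= i
        i += 1.0
        result /= i
        i += 1.0
    return result


def trig(start, last):
    """Compute sin, cos, tan, log10 and sqrt for every counter value.

    Assumption: only the values from the final iteration are returned.
    """
    sine = cosine = tangent = logarithm = square_root = 0.0
    i = float(start)
    while i <= last:
        sine = math.sin(i)
        cosine = math.cos(i)
        tangent = math.tan(i)
        logarithm = math.log10(i)
        square_root = math.sqrt(i)
        i += 1.0
    return sine, cosine, tangent, logarithm, square_root


def file_io(path, line_count, line_length=80):
    """Write line_count lines sequentially, then read them back into memory."""
    line = "x" * line_length + "\n"  # placeholder line content
    with open(path, "w") as handle:
        for _ in range(line_count):
            handle.write(line)
    with open(path) as handle:
        lines = handle.readlines()
    return len(lines)


if __name__ == "__main__":
    # Small ranges so the sketch finishes quickly.
    print(int_arithmetic(1, 8))
    print(int_arithmetic(10_000_000_000, 10_000_000_008))
    print(float_arithmetic(1, 8))
    print(trig(1, 1000))
    demo_path = os.path.join(tempfile.mkdtemp(), "lines.txt")
    print(file_io(demo_path, 1000))
```

Each function returns a value computed inside its loop, so a caller can print it once the component finishes.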

At the end of each benchmark component I printed a value that was generated by the code. This was to ensure that compilers didn't completely optimize away portions of the benchmarks after seeing that the computed results were never used (a phenomenon I discovered when early versions of the benchmark returned bafflingly optimistic results in Java 1.4.2 and Visual C++). But I wanted to let the compilers optimize as much as possible while still ensuring that every line of code ran. The optimization settings I settled on were as follows:

Java 1.3.1: compiled with javac -g:none -O to exclude debugging information and turn on optimization, ran with java -hotspot to activate the just-in-time compiler within the JVM.
Java 1.4.2: compiled with javac -g:none to exclude debugging information, ran with java -server to use the slower-starting but faster-running server configuration of the JVM.
C: compiled with gcc -march=pentium4 -msse2 -mfpmath=sse -O3 -s -mno-cygwin to optimize for my CPU, enable SSE2 extensions for as many math operations as possible, and link to Windows libraries instead of Cygwin libraries.
Python with and without Psyco: no optimization used. The python -O interpreter flag optimizes Python for fast loading rather than fast performance, so I did not use it.
Visual Basic: used "release" configuration, turned on "optimized," turned off "integer overflow checks" within Visual Studio.
Visual C#: used "release" configuration, turned on "optimize code" within Visual Studio.
Visual C++: used "release" configuration, turned on "whole program optimization," set "optimization" to "maximize speed," turned on "global optimizations," turned on "enable intrinsic functions," set "favor size or speed" to "favor fast code," set "omit frame pointers" to "yes," set "optimize for processor" to "Pentium 4 and above," set "buffer security check" to "no," set "enable enhanced instruction set" to "SIMD2," and set "optimize for Windows98" to "no" within Visual Studio.
Visual J#: used "release" configuration, turned on "optimize code," turned off "generate debugging information" within Visual Studio.

All benchmark code can be found at my website. The Java benchmarks were created with the Eclipse IDE, but were compiled and run from the command line. I used identical source code for the Java 1.3.1, Java 1.4.2, and Visual J# benchmarks. The Visual C++ and gcc C benchmarks used nearly identical source code. The C program was written with TextPad, compiled using gcc within the Cygwin bash shell emulation layer for Windows, and run from the Windows command line after quitting Cygwin. I programmed the Python benchmark with TextPad and ran it from the command line. Adding Psyco's just-in-time compilation to Python was simple: I downloaded Psyco from Sourceforge and added import psyco and psyco.full() to the top of the Python source code. The four Microsoft benchmarks were programmed and compiled within Microsoft Visual Studio .NET 2003, though I ran each program's .exe file from the command line.
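
The Psyco step described above amounts to two lines at the top of the script. The sketch below wraps them in a guard. The guard is an addition for illustration, not part of the original benchmark: Psyco only supports Python 2 on 32-bit x86, so on any other interpreter the import fails and the script simply runs unaccelerated. The flag name psyco_enabled is hypothetical.

```python
try:
    import psyco  # third-party just-in-time compiler for Python 2

    psyco.full()  # ask Psyco to compile every function it can
    psyco_enabled = True
except ImportError:
    # Psyco is not installed or not available for this interpreter.
    psyco_enabled = False

print("Psyco enabled:", psyco_enabled)
```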

Note that the Java log() function computes natural logarithms (using e as a base), whereas the benchmarks in the other languages computed base 10 logarithms. I only discovered this after running the benchmarks, and I assume it had little or no effect on the results, but it does seem strange that Java has no built-in base 10 log function.
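
The two logarithms are related by a change of base, log10(x) = ln(x) / ln(10), so a base 10 result can be recovered from a natural-log function with one extra division. Here is a short Python check of that identity using only the standard math module. The helper name log10_from_ln is hypothetical.

```python
import math


def log10_from_ln(x):
    """Base 10 logarithm built from the natural logarithm."""
    return math.log(x) / math.log(10.0)


# Compare against the library's own base 10 function.
for x in (2.0, 1000.0, 1e7):
    assert math.isclose(log10_from_ln(x), math.log10(x))
```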

Before running each set of benchmarks I defragged the hard disk, rebooted, and shut down unnecessary background services. I ran each benchmark at least three times and used the best score from each component, assuming that slower scores were the result of unrelated background processes getting in the way of the CPU and/or hard disk. Start-up time for each benchmark was not included in the performance results. The benchmarks were run on the following hardware:

Type: Dell Latitude C640 Notebook
CPU: Pentium 4-M 2GHz
RAM: 768MB
Hard Disk: IBM Travelstar 20GB/4500RPM
Video: Radeon Mobility 7500/32MB
OS: Windows XP Pro SP 1
File System: NTFS
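
The measurement procedure described before the hardware list can be sketched as a small Python harness: run each component at least three times, keep the best time, and print a computed value so the work is actually used. The names best_of and workload are hypothetical, and time.perf_counter is a stand-in for whatever timer each original benchmark used. Because the timer starts inside the running process, interpreter start-up is excluded from the measurement.

```python
import time


def best_of(func, runs=3):
    """Run func `runs` times; return (fastest elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(runs):
        started = time.perf_counter()
        result = func()
        elapsed = time.perf_counter() - started
        best = min(best, elapsed)
    return best, result


def workload():
    """Stand-in for one benchmark component: sum the integers 1..100000."""
    total = 0
    for i in range(1, 100001):
        total += i
    return total


if __name__ == "__main__":
    seconds, value = best_of(workload)
    # Printing the computed value ensures the result is used.
    print("best time: %.6f s, result: %d" % (seconds, value))
```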
