posted by Christopher W. Cowell-Shah on Thu 8th Jan 2004 19:33 UTC

"Results, Analysis, Expanding the Benchmark"

Results

Here are the benchmark results presented in both table and graph form. The Python and Python/Psyco results are excluded from the graph since the large numbers throw off the graph's scale and render the other results illegible. All scores are given in seconds; lower is better.

Language        int math  long math  double math    trig     I/O    TOTAL
Visual C++           9.6       18.8          6.4     3.5    10.5     48.8
Visual C#            9.7       23.9         17.7     4.1     9.9     65.3
gcc C                9.8       28.8          9.5    14.9    10.0     73.0
Visual Basic         9.8       23.7         17.7     4.1    30.7     85.9
Visual J#            9.6       23.9         17.5     4.2    35.1     90.4
Java 1.3.1          14.5       29.6         19.0    22.1    12.3     97.6
Java 1.4.2           9.3       20.2          6.5    57.1    10.1    103.1
Python/Psyco        29.7      615.4        100.4    13.1    10.5    769.1
Python             322.4      891.9        405.7    47.1    11.9   1679.0

[Graph: benchmark results for all languages except Python and Python/Psyco; times in seconds, lower is better]


Analysis

Let's review the results by returning to the five questions that motivated these benchmarks. First, Java (at least in version 1.4.2) performed very well on most benchmark components when compared to the .NET 2003 languages. If we exclude the trigonometry component, Java performed virtually identically to Visual C++, the fastest of Microsoft's languages. Unfortunately, the trigonometry performance of Java 1.4.2 can only be described as dismal: at 57.1 seconds it was bafflingly bad, worse even than fully interpreted Python (47.1 seconds)! This was especially puzzling given the much faster trigonometry performance of Java 1.3.1 (22.1 seconds), and it suggests that there may be more efficient ways to code the benchmark in Java. Perhaps someone with more experience with 1.4.2 can suggest a higher-speed workaround.
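To put a number on "excluding the trigonometry component," here is a minimal Python sketch that re-ranks every language by its total with the trig column dropped. The scores are copied from the results table; the dictionary layout and the function names are my own, purely for illustration. It sums the four remaining components directly rather than subtracting trig from TOTAL, so the rounding already present in the TOTAL column is not carried over.

```python
# Scores in seconds, copied from the results table: (int, long, double, trig, I/O).
SCORES = {
    "Visual C++":   (9.6, 18.8, 6.4, 3.5, 10.5),
    "Visual C#":    (9.7, 23.9, 17.7, 4.1, 9.9),
    "gcc C":        (9.8, 28.8, 9.5, 14.9, 10.0),
    "Visual Basic": (9.8, 23.7, 17.7, 4.1, 30.7),
    "Visual J#":    (9.6, 23.9, 17.5, 4.2, 35.1),
    "Java 1.3.1":   (14.5, 29.6, 19.0, 22.1, 12.3),
    "Java 1.4.2":   (9.3, 20.2, 6.5, 57.1, 10.1),
    "Python/Psyco": (29.7, 615.4, 100.4, 13.1, 10.5),
    "Python":       (322.4, 891.9, 405.7, 47.1, 11.9),
}
TRIG = 3  # position of the trig component in each tuple


def total_without_trig(scores):
    """Sum every component except trig, rounded to one decimal place."""
    return round(sum(s for i, s in enumerate(scores) if i != TRIG), 1)


def ranking_without_trig():
    """Language names ordered fastest-first by their non-trig total."""
    return sorted(SCORES, key=lambda name: total_without_trig(SCORES[name]))


if __name__ == "__main__":
    for name in ranking_without_trig():
        print(f"{name:<13} {total_without_trig(SCORES[name]):7.1f}")
```

Run as a script, it lists Visual C++ first (45.3 seconds) with Java 1.4.2 a close second (46.1 seconds), both well ahead of gcc C (58.1 seconds).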

Java performed especially well against Microsoft's syntactically equivalent Visual J# (again discounting the strange trigonometry result), beating it on every other component. This discrepancy may be due to the additional overhead of the CLR engine (as compared to the overhead of the JVM), or may have something to do with Visual J# implementing only version 1.1.4 of the Java spec.

Second, Microsoft's claim that all four .NET 2003 languages compile into identical MSIL code seemed mostly true for the math routines. The integer math component produced virtually identical scores in all four languages. The long math, double math, and trig scores were virtually identical in Visual C#, Visual Basic, and Visual J# (within 0.2 seconds of each other), but the C++ compiler somehow produced impressively faster code for these components; double math, for example, took 6.4 seconds in Visual C++ versus 17.5 to 17.7 seconds in the other three. Perhaps C++ is able to make better use of the Pentium 4's SSE2 SIMD extensions for arithmetic and trigonometry, but this is pure speculation on my part. The I/O scores fell into two clusters, with Visual Basic (30.7 seconds) and Visual J# (35.1 seconds) apparently using much less efficient I/O routines than Visual C# (9.9 seconds) or Visual C++ (10.5 seconds). This is a clear case where functionally identical source code does not compile into identical MSIL code.
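The clustering is easy to quantify. The sketch below computes, for each component, the gap between the slowest and fastest .NET language, once with all four and once with Visual C++ left out. The scores come from the results table; the helper name `spread` and the data layout are my own illustrative choices.

```python
# Component names and scores (seconds) for the four .NET 2003 languages,
# copied from the results table.
COMPONENTS = ("int", "long", "double", "trig", "I/O")
DOTNET = {
    "Visual C++":   (9.6, 18.8, 6.4, 3.5, 10.5),
    "Visual C#":    (9.7, 23.9, 17.7, 4.1, 9.9),
    "Visual Basic": (9.8, 23.7, 17.7, 4.1, 30.7),
    "Visual J#":    (9.6, 23.9, 17.5, 4.2, 35.1),
}


def spread(component, languages):
    """Slowest minus fastest score for one component across the given languages."""
    i = COMPONENTS.index(component)
    values = [DOTNET[lang][i] for lang in languages]
    return round(max(values) - min(values), 1)


if __name__ == "__main__":
    without_cpp = ["Visual C#", "Visual Basic", "Visual J#"]
    for c in COMPONENTS:
        print(f"{c:<7} all four: {spread(c, DOTNET):5.1f}  "
              f"without C++: {spread(c, without_cpp):5.1f}")
```

With Visual C++ left out, the math and trig spreads are 0.2 seconds or less, while the I/O spread stays at 25.2 seconds: the two-cluster pattern is entirely an I/O effect.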

Third, Java 1.4.2 performed as well as or better than the fully compiled gcc C benchmark, after discounting the odd trigonometry performance. I found this to be the most surprising result of these tests, since it only seems logical that running bytecode within a JVM would introduce some sort of performance penalty relative to native machine code. But for reasons unclear to me, this seems not to be true for these tests.

Fourth, fully interpreted Python was, as expected, much slower than any of the fully compiled or semi-compiled languages, sometimes by a factor of over 60 (405.7 seconds versus 6.4 seconds for Visual C++ on double math). It should be noted that Python's I/O performance was in the same league as the fastest languages in this group, and was faster than Visual Basic and Visual J#. The Psyco compiler worked wonders with Python, reducing the time required for the math and trig components to between roughly 10% and 70% of that required for Python without Psyco. This was an astonishing increase, especially considering how easy it is to include Psyco in a Python project.
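Both figures in this paragraph are simple ratios over the table. As a minimal sketch (the constant and function names are mine, and the "fastest" row is hand-picked from the table as noted in the comment), the Python slowdown factors and the Psyco percentages can be computed like this:

```python
# Math and trig scores in seconds, copied from the results table.
COMPONENTS = ("int", "long", "double", "trig")
PYTHON = (322.4, 891.9, 405.7, 47.1)
PSYCO = (29.7, 615.4, 100.4, 13.1)
# Fastest score per component: Java 1.4.2 for int math, Visual C++ for the rest.
FASTEST = (9.3, 18.8, 6.4, 3.5)


def python_slowdown():
    """How many times longer plain Python took than the fastest language."""
    return {c: round(py / best, 1)
            for c, py, best in zip(COMPONENTS, PYTHON, FASTEST)}


def psyco_percent():
    """Python/Psyco time as a whole-number percentage of plain Python time."""
    return {c: round(100 * fast / py)
            for c, fast, py in zip(COMPONENTS, PSYCO, PYTHON)}


if __name__ == "__main__":
    print("slowdown vs fastest:", python_slowdown())
    print("Psyco % of Python:  ", psyco_percent())
```

This gives a worst-case slowdown of about 63 (double math), and Psyco times ranging from about 9% (int math) to 69% (long math) of plain Python's.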

Fifth, Java 1.4.2 was much faster than Java 1.3.1 in the arithmetic components, but as already mentioned, it lagged far behind the older version on the trigonometry component (57.1 versus 22.1 seconds). Again, I can't help but think that there may be a different, more efficient way to call trigonometric functions in 1.4.2. Another possibility is that 1.4.2 may be trading speed for accuracy relative to 1.3.1, with new routines that are slower but more correct.
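The version-to-version change can be expressed as a per-component speedup ratio (the Java 1.3.1 time divided by the Java 1.4.2 time). This is only a sketch over the published numbers, with names of my own choosing; it says nothing about why the trig routines got slower.

```python
# Scores in seconds for the two Java versions, copied from the results table.
COMPONENTS = ("int", "long", "double", "trig", "I/O")
JAVA_131 = (14.5, 29.6, 19.0, 22.1, 12.3)
JAVA_142 = (9.3, 20.2, 6.5, 57.1, 10.1)


def speedup():
    """Java 1.3.1 time divided by Java 1.4.2 time; above 1 means 1.4.2 was faster."""
    return {c: round(old / new, 2)
            for c, old, new in zip(COMPONENTS, JAVA_131, JAVA_142)}


if __name__ == "__main__":
    for component, ratio in speedup().items():
        verdict = "faster" if ratio > 1 else "slower"
        print(f"{component:<7} {ratio:5.2f}  (1.4.2 {verdict})")
```

The ratios come out at 1.56 for int math, 1.47 for long math, 2.92 for double math, and 1.22 for I/O, but only 0.39 for trig; in other words, 1.4.2 took about 2.6 times as long as 1.3.1 on the trigonometry component.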

What lessons can we take away from all of this? I was surprised to see the four .NET 2003 languages clustered so closely on many of the benchmark components, and I was astonished to see how well Java 1.4.2 did (discounting the trigonometry score). It would be foolish to offer blanket recommendations about which languages to use in which situations, but it seems clear that performance is no longer a compelling reason to choose C over Java (or perhaps even over Visual J#, Visual C#, or Visual Basic), especially given the extreme advantages in readability, maintainability, and speed of development that those languages have over C. Even if C did still enjoy its traditional performance advantage, there are very few cases (I'm hard pressed to come up with a single example from my work) where performance should be the sole criterion when picking a programming language. I would even argue that for very complex systems that are designed to be in use for many years, maintainability ought to trump all other considerations (but that's an issue to take up in another article).

Expanding the Benchmark

The most obvious way to make this benchmark more useful is to expand it beyond basic arithmetic, trigonometry, and file I/O. I could also extend the range of languages or variants tested. For example, testing Visual Basic 6 (the last of the pre-.NET versions of VB) would give us an idea of how much of a performance hit (if any) the CLR adds to VB. There are other JVMs available to be tested, including the open-source Kaffe and the JVM included with IBM's SDK (which seems to be stuck at version 1.3 of the Java spec). BEA has an interesting JVM called JRockit, which promises performance improvements in certain situations, but unfortunately only works on Windows. GNU's gcj front-end to gcc allows Java source code to be compiled all the way to executable machine code, but I don't know how compatible or complete the package is. There are a number of other C compilers available that could be tested (including the highly regarded Intel C compiler), as well as a host of other popular interpreted languages like Perl, PHP, or Ruby. So there's plenty of room for further investigation.

I am by no means an expert in benchmarking; I launched this project largely as a learning experience and welcome suggestions on how to improve these benchmarks. Just remember the limited ambitions of my tests: I am not trying to test all aspects of a system, just a small subset of the fundamental operations on which all programs are built.

About the author:
Christopher W. Cowell-Shah works in Palo Alto as a consultant for the Accenture Technology Labs (the research & development wing of Accenture). He has an A.B. in computer science from Harvard and a Ph.D. in philosophy from Berkeley. Chris is especially interested in issues in artificial intelligence, human-computer interaction, and security. His website is www.cowell-shah.com.

Table of contents
  1. "Intro, Why benchmark?"
  2. "Benchmark design"
  3. "Results, Analysis, Expanding the Benchmark"