This article discusses a small-scale benchmark test run on nine modern computer languages or variants: Java 1.3.1, Java 1.4.2, C compiled with gcc 3.3.1, Python 2.3.2, Python compiled with Psyco 1.1.1, and the four languages supported by Microsoft’s Visual Studio .NET 2003 development environment: Visual Basic, Visual C#, Visual C++, and Visual J#. The benchmark tests arithmetic and trigonometric functions using a variety of data types, and also tests simple file I/O. All tests took place on a Pentium 4-based computer running Windows XP. Update: Delphi version of the benchmark here.
Five questions motivated me to design and run these benchmarks. First, I was curious about how the performance of Java 1.4.2 (the latest official version from Sun) compares to that of Microsoft’s relatively new .NET 2003 suite of languages. Both Java and the .NET languages are “semi-compiled” (or, looking at the flip side of the coin, “semi-interpreted”). By this I mean that source code is compiled into intermediate-level code and then run by a combination interpreter/just-in-time compiler. With Java, the intermediate language is called bytecode and the interpreter/compiler is called a Java Virtual Machine (JVM). Source code in the .NET world is compiled into the Microsoft Intermediate Language (MSIL) and is run on the .NET Common Language Runtime (CLR) engine.
The .NET languages benefit from many of the same features that have made Java so popular, including automatic resource management/garbage collection and type safety. They also add interesting new features and conveniences such as cross-language debugging, easy GUI design, and virtually idiot-proof application deployment. But what is the performance penalty of these new features? By adding layers of complexity to its programming model, has Microsoft given up its speed advantage over Java?
Microsoft makes it especially easy to compare the overhead of the Java and .NET frameworks by including J# in the .NET suite. This language is syntactically identical to Java (although it implements only version 1.1.4 of the Java spec, which is by now quite out of date), so any differences in speed between Java and J# should be attributable purely to differences between the Sun and Microsoft runtime overhead.
Second, I wanted to assess Microsoft’s claim that the same routine coded in any of the .NET languages is compiled into identical MSIL code which will ultimately run at the same speed. This led me to keep the benchmark very simple, so that I could make sure the routines in each of the .NET languages really were functionally identical. Would all four languages really run at the same speed?
Third, I was curious to see how much slower Java or the .NET languages are than a fully compiled language like C, especially when the C program is unburdened by the runtime overhead of the CLR. I first tried to eliminate the CLR from the Visual C++ benchmark by turning off the language’s “managed” features with the #pragma unmanaged directive, but I was surprised to see that this didn’t lead to any performance gains. After that strategy failed, I recompiled the Visual C++ program with GNU’s gcc C compiler in order to give C every opportunity to shine in its native, unmanaged, CLR-free form.
Fourth, I wanted to find out how semi-compiled languages compare to fully interpreted languages like Python, Perl or PHP. It is often said that as hardware continues to get faster and cheaper we will reach a point where the extra speed of compiled languages will be largely unnecessary. But if there is still an order-of-magnitude difference between the performance of a routine coded in C and the same algorithm coded in Python, we would be wise to keep our C skills up to date. To test this, I wrote another version of the benchmark in Python. I then re-ran the Python benchmark with the Psyco just-in-time compiler to see if we could combine Python’s spectacular readability and rapid development with the speed of a compiled language. Greedy perhaps, but worth a try.
Finally, I thought it would be interesting to see how Sun’s latest Java release compares to earlier versions. Sun makes strong claims about performance improvements in the 1.4.2 version of its compiler and JVM relative to the earlier 1.3.1 release, and I wanted to see if the performance lived up to the hype. So I added Java 1.3.1 to the benchmark roster.
Designing good, helpful benchmarks is fiendishly difficult. This fact led me to keep the scope of this benchmark quite limited. I tested only math operations (32-bit integer arithmetic, 64-bit integer arithmetic, 64-bit floating point arithmetic, and 64-bit trigonometry), and file I/O with sequential access. The tests were not comprehensive by any stretch of the imagination; I didn’t test string manipulation, graphics, object creation and management (for object oriented languages), complex data structures, network access, database access, or any of the countless other things that go on in any non-trivial program. But I did test some basic building blocks that form the foundation of many programs, and these tests should give a rough idea of how efficiently various languages can perform some of their most fundamental operations.
Here’s what happens in each part of the benchmark:
32-bit integer math: using a 32-bit integer loop counter and 32-bit integer operands, alternate among the four arithmetic functions while working through a loop from one to one billion. That is, calculate the following (while discarding any remainders):
1 – 1 + 2 * 3 / 4 – 5 + 6 * 7 / 8 – … – 999,999,997 + 999,999,998 * 999,999,999 / 1,000,000,000
64-bit integer math: same algorithm as above, but use a 64-bit integer loop counter and operands. Start at ten billion and end at eleven billion so the compiler doesn’t knock the data types down to 32-bit.
64-bit floating point math: same as for 64-bit integer math, but use a 64-bit floating point loop counter and operands. Don’t discard remainders.
64-bit floating point trigonometry: using a 64-bit floating point loop counter, calculate sine, cosine, tangent, logarithm (base 10) and square root of all values from one to ten million. I chose 64-bit values for all languages because some languages required them, but if a compiler was able to convert the values to 32 bits, I let it go ahead and perform that optimization.
I/O: Write one million 80-character lines to a text file, then read the lines back into memory.
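To make the component descriptions above concrete, here is a minimal Python sketch of the four kinds of work. It is not the original benchmark source. The function names, the left-to-right reading of the arithmetic series, the checksum in the trig loop, and the use of a temporary file are all assumptions made for illustration. Python integers are arbitrary precision, so the sketch does not model the 32-bit versus 64-bit distinction; it only shows the shape of each loop. Each function returns a value so that the caller can print it, and `count` is assumed to be a multiple of four.

```python
import math
import os
import tempfile


def trunc_div(a, b):
    """Integer division that discards the remainder, truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def int_arithmetic(start, count):
    """Apply -, +, *, / in turn to `count` consecutive integers from `start`."""
    result = 1
    for i in range(start, start + count, 4):
        result -= i
        result += i + 1
        result *= i + 2
        result = trunc_div(result, i + 3)
    return result


def float_arithmetic(start, count):
    """Same cycle with a floating-point counter; remainders are kept."""
    result = 1.0
    i = float(start)
    end = float(start + count)
    while i < end:
        result -= i
        result += i + 1.0
        result *= i + 2.0
        result /= i + 3.0
        i += 4.0
    return result


def trig(limit):
    """Sum sin, cos, tan, log10 and sqrt for 1.0, 2.0, ..., limit."""
    checksum = 0.0
    x = 1.0
    while x <= limit:
        checksum += (math.sin(x) + math.cos(x) + math.tan(x)
                     + math.log10(x) + math.sqrt(x))
        x += 1.0
    return checksum


def file_io(line_count):
    """Write `line_count` 80-character lines, read them back, return the count."""
    line = "x" * 80 + "\n"
    fd, path = tempfile.mkstemp(suffix=".txt")  # hypothetical scratch file
    os.close(fd)
    try:
        with open(path, "w") as out:
            for _ in range(line_count):
                out.write(line)
        with open(path) as inp:
            lines = inp.readlines()
    finally:
        os.remove(path)
    return len(lines)


if __name__ == "__main__":
    # Scaled-down sizes so the sketch finishes quickly.
    print(int_arithmetic(1, 100_000))
    print(int_arithmetic(10_000_000_000, 100_000))
    print(float_arithmetic(10_000_000_000, 100_000))
    print(trig(100_000))
    print(file_io(10_000))
```

The sizes in the last block are deliberately far smaller than the billion-iteration loops described above; scaling them up changes only the running time, not the structure.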
At the end of each benchmark component I printed a value that was generated by the code. This was to ensure that compilers didn’t completely optimize away portions of the benchmarks after seeing that the code was not actually used for anything (a phenomenon I discovered when early versions of the benchmark returned bafflingly optimistic results in Java 1.4.2 and Visual C++). But I wanted to let the compilers optimize as much as possible while still ensuring that every line of code ran. The optimization settings I settled on were as follows:
Java 1.3.1: compiled with javac -g:none -O to exclude debugging information and turn on optimization, ran with java -hotspot to activate the just-in-time compiler within the JVM.
Java 1.4.2: compiled with javac -g:none to exclude debugging information, ran with java -server to use the slower-starting but faster-running server configuration of the JVM.
C: compiled with gcc -march=pentium4 -msse2 -mfpmath=sse -O3 -s -mno-cygwin to optimize for my CPU, enable SSE2 extensions for as many math operations as possible, and link to Windows libraries instead of Cygwin libraries.
Python with and without Psyco: no optimization used. The python -O interpreter flag optimizes Python for fast loading rather than fast performance, so was not used.
Visual Basic: used “release” configuration, turned on “optimized,” turned off “integer overflow checks” within Visual Studio.
Visual C#: used “release” configuration, turned on “optimize code” within Visual Studio.
Visual C++: used “release” configuration, turned on “whole program optimization,” set “optimization” to “maximize speed,” turned on “global optimizations,” turned on “enable intrinsic functions,” set “favor size or speed” to “favor fast code,” set “omit frame pointers” to “yes,” set “optimize for processor” to “Pentium 4 and above,” set “buffer security check” to “no,” set “enable enhanced instruction set” to “SIMD2,” and set “optimize for Windows98” to “no” within Visual Studio.
Visual J#: used “release” configuration, turned on “optimize code,” turned off “generate debugging information” within Visual Studio.
All benchmark code can be found at my website. The Java benchmarks were created with the Eclipse IDE, but were compiled and run from the command line. I used identical source code for the Java 1.3.1, Java 1.4.2, and Visual J# benchmarks. The Visual C++ and gcc C benchmarks used nearly identical source code. The C program was written with TextPad, compiled using gcc within the Cygwin bash shell emulation layer for Windows, and run from the Windows command line after quitting Cygwin. I programmed the Python benchmark with TextPad and ran it from the command line. Adding Psyco’s just-in-time compilation to Python was simple: I downloaded Psyco from Sourceforge and added import psyco and psyco.full() to the top of the Python source code. The four Microsoft benchmarks were programmed and compiled within Microsoft Visual Studio .NET 2003, though I ran each program’s .exe file from the command line.
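For readers who want to try the same two-line Psyco change, one cautious way to write it is shown below. The try/except guard and the PSYCO_ENABLED flag are additions for illustration, not part of the benchmark as described; they simply let the same script run unchanged on an interpreter where Psyco is not installed.

```python
# Enable Psyco if it is importable; otherwise fall back to plain Python.
try:
    import psyco
    psyco.full()           # ask Psyco to compile as much of the program as it can
    PSYCO_ENABLED = True   # hypothetical flag, handy for labeling results
except ImportError:
    PSYCO_ENABLED = False

print("Psyco enabled:", PSYCO_ENABLED)
```

Recording whether the accelerator was actually active makes it harder to mislabel a plain-Python run as a Psyco run when comparing scores.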
It should be noted that the Java log() function computes natural logarithms (using e as a base), whereas the other languages compute logarithms using base 10. I only discovered this after running the benchmarks, and I assume it had little or no effect on the results, but it does seem strange that Java has no built-in base 10 log function.
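The two kinds of logarithm differ only by a constant factor, since log10(x) = ln(x) / ln(10). The short Python sketch below illustrates the conversion; the helper name log10_from_ln is made up for this example. A runtime that offers only a natural logarithm would need one extra division per call to produce a base 10 result.

```python
import math


def log10_from_ln(x):
    """Base 10 logarithm built from the natural logarithm."""
    return math.log(x) / math.log(10.0)


# Compare against the library's own base 10 routine.
for value in (2.5, 1000.0, 10_000_000.0):
    print(value, log10_from_ln(value), math.log10(value))
```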
Before running each set of benchmarks I defragged the hard disk, rebooted, and shut down unnecessary background services. I ran each benchmark at least three times and used the best score from each component, assuming that slower scores were the result of unrelated background processes getting in the way of the CPU and/or hard disk. Start-up time for each benchmark was not included in the performance results. The benchmarks were run on the following hardware:
Type: Dell Latitude C640 Notebook
CPU: Pentium 4-M 2GHz
Hard Disk: IBM Travelstar 20GB/4500RPM
Video: Radeon Mobility 7500/32MB
OS: Windows XP Pro SP 1
File System: NTFS
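The best-of-several-runs policy described before the hardware list can be sketched in a few lines of Python. The helper name best_of and the use of time.perf_counter are assumptions for illustration, not the timing code used in the benchmarks. Timing only the call itself keeps interpreter start-up out of the score.

```python
import time


def best_of(func, runs=3):
    """Call func `runs` times; return (fastest elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the run least disturbed by other work
    return best, result


if __name__ == "__main__":
    seconds, value = best_of(lambda: sum(range(1_000_000)))
    # Printing the computed value shows that the timed work really ran.
    print(f"best of 3: {seconds:.4f} s, result {value}")
```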
Here are the benchmark results presented in both table and graph form. The Python and Python/Psyco results are excluded from the graph since the large numbers throw off the graph’s scale and render the other results illegible. All scores are given in seconds; lower is better.
[Graph of the benchmark results]
Let’s review the results by returning to the five questions that motivated these benchmarks. First, Java (at least, in the 1.4.2 version) performed very well on most benchmark components when compared to the .NET 2003 languages. If we exclude the trigonometry component, Java performed virtually identically to Visual C++, the fastest of Microsoft’s languages. Unfortunately, the trigonometry performance of Java 1.4.2 can only be described as dismal. It was bafflingly bad–worse even than fully interpreted Python! This was especially puzzling given the much faster trigonometry performance of Java 1.3.1, and suggests that there may be more efficient ways to code the benchmark in Java. Perhaps someone with more experience with 1.4.2 can suggest a higher-speed workaround.
Java performed especially well (when discounting the strange trigonometry performance) compared to Microsoft’s syntactically equivalent Visual J#. This discrepancy may be due to the additional overhead of the CLR engine (as compared to the overhead of the JVM), or may have something to do with Visual J# implementing only version 1.1.4 of the Java spec.
Second, Microsoft’s claim that all four .NET 2003 languages compile into identical MSIL code seemed mostly true for the math routines. The integer math component produced virtually identical scores in all four languages. The long math, double math, and trig scores were identical in Visual C#, Visual Basic, and Visual J#, but the C++ compiler somehow produced impressively faster code for these benchmark components. Perhaps C++ is able to make better use of the Pentium 4’s SSE2 SIMD extensions for arithmetic and trigonometry, but this is pure speculation on my part. The I/O scores fell into two clusters, with Visual Basic and Visual J# apparently using much less efficient I/O routines than Visual C# or Visual C++. This is a clear case where functionally identical source code does not compile into identical MSIL code.
Third, Java 1.4.2 performed as well as or better than the fully compiled gcc C benchmark, after discounting the odd trigonometry performance. I found this to be the most surprising result of these tests, since it only seems logical that running bytecode within a JVM would introduce some sort of performance penalty relative to native machine code. But for reasons unclear to me, this seems not to be true for these tests.
Fourth, fully interpreted Python was, as expected, much slower than any of the fully compiled or semi-compiled languages–sometimes by a factor of over 60. It should be noted that Python’s I/O performance was in the same league as the fastest languages in this group, and was faster than Visual Basic and Visual J#. The Psyco compiler worked wonders with Python, reducing the time required for the math and trig components to between 10% and 70% of that required for Python without Psyco. This was an astonishing increase, especially considering how easy it is to include Psyco in a Python project.
Fifth, Java 1.4.2 was much faster than Java 1.3.1 in the arithmetic components, but as already mentioned, it lagged way behind the older version on the trigonometry component. Again, I can’t help but think that there may be a different, more efficient way to call trigonometric functions in 1.4.2. Another possibility is that 1.4.2 may be trading accuracy for speed relative to 1.3.1, with new routines that are slower but more correct.
What lessons can we take away from all of this? I was surprised to see the four .NET 2003 languages clustered so closely on many of the benchmark components, and I was astonished to see how well Java 1.4.2 did (discounting the trigonometry score). It would be foolish to offer blanket recommendations about which languages to use in which situations, but it seems clear that performance is no longer a compelling reason to choose C over Java (or perhaps even over Visual J#, Visual C#, or Visual Basic), especially given the extreme advantages in readability, maintainability, and speed of development that those languages have over C. Even if C did still enjoy its traditional performance advantage, there are very few cases (I’m hard pressed to come up with a single example from my work) where performance should be the sole criterion when picking a programming language. I would even argue that for very complex systems that are designed to be in use for many years, maintainability ought to trump all other considerations (but that’s an issue to take up in another article).
Expanding the Benchmark
The most obvious way to make this benchmark more useful is to expand it beyond basic arithmetic, trigonometry, and file I/O. I could also extend the range of languages or variants tested. For example, testing Visual Basic 6 (the last of the pre-.NET versions of VB) would give us an idea how much (if any) of a performance hit the CLR adds to VB. There are other JVMs available to be tested, including the open-source Kaffe and the JVM included with IBM’s SDK (which seems to be stuck at version 1.3 of the Java spec). BEA has an interesting JVM called JRockit which promises performance improvements in certain situations, but unfortunately only works on Windows. GNU’s gcj front-end to gcc allows Java source code to be compiled all the way to executable machine code, but I don’t know how compatible or complete the package is. There are a number of other C compilers available that could be tested (including the highly regarded Intel C compiler), as well as a host of other popular interpreted languages like Perl, PHP, or Ruby. So there’s plenty of room for further investigation.
I am by no means an expert in benchmarking; I launched this project largely as a learning experience and welcome suggestions on how to improve these benchmarks. Just remember the limited ambitions of my tests: I am not trying to test all aspects of a system–just a small subset of the fundamental operations on which all programs are built.
About the author:
Christopher W. Cowell-Shah works in Palo Alto as a consultant for the Accenture Technology Labs (the research & development wing of Accenture). He has an A.B. in computer science from Harvard and a Ph.D. in philosophy from Berkeley. Chris is especially interested in issues in artificial intelligence, human/computer interaction and security. His website is www.cowell-shah.com.