Linked by Christopher W. Cowell-Shah on Thu 8th Jan 2004 19:33 UTC
General Development This article discusses a small-scale benchmark test run on nine modern computer languages or variants: Java 1.3.1, Java 1.4.2, C compiled with gcc 3.3.1, Python 2.3.2, Python compiled with Psyco 1.1.1, and the four languages supported by Microsoft's Visual Studio .NET 2003 development environment: Visual Basic, Visual C#, Visual C++, and Visual J#. The benchmark tests arithmetic and trigonometric functions using a variety of data types, and also tests simple file I/O. All tests took place on a Pentium 4-based computer running Windows XP. Update: Delphi version of the benchmark here.
misses some test
by k on Thu 8th Jan 2004 19:41 UTC

What about a Mono or Portable.NET test with the same benchmark? I would be *very* curious to know what level of performance they can achieve.

gcc and python in their native environments?
by ponds on Thu 8th Jan 2004 19:43 UTC

I'm curious as to how much better gcc and python would perform in a POSIX environment, especially gcc linked to glibc rather than to the windows C libraries.

Also, what happened to perl?

Interesting...
by Jack Hughes on Thu 8th Jan 2004 19:43 UTC

.. Any chance of trying the intel and the openwatcom compilers?

.. It would also be interesting to try the benchmarks on another operating system to see if the same level of differences are observed.

All .net languages will perform the same!
by Anonymous on Thu 8th Jan 2004 19:50 UTC

All .NET languages will perform exactly the same because they are all compiled down to the CLR (Common Language Runtime).
So your VB.NET app will perform the same as C#, and so will your Delphi.NET, COBOL.NET, etc.

You only really need to benchmark C#

RE: gcc and python in their native environments?
by Anonymous on Thu 8th Jan 2004 19:51 UTC

Perl is not a compiled language... perhaps that's why it was left out.

Mike

Python is interpreted
by snowflake on Thu 8th Jan 2004 19:53 UTC

>Perl is not a compiled language... perhaps that's why it was left out.

>Mike

Python is interpreted and it wasn't left out.

RE: All .net languages will perform the same
by Yoni on Thu 8th Jan 2004 19:55 UTC

All .NET languages are not the same! While they might produce the same MSIL code in simple cases, in more complex situations they will not, leading of course to different results.

RE: Python is interpreted
by Anonymous on Thu 8th Jan 2004 19:57 UTC

>Python is interpreted and it wasn't left out.

Actually... if you read the posting, he compiled the Python code with Psyco. Also, Python is compiled into byte code at runtime for fast execution too. Perl does not behave like this, nor can you compile it. So my comment remains.

Mike

>In more complex situations they will not, leading of course to different results.

Well said Yoni.

How to remove the CLR
by null_pointer_us on Thu 8th Jan 2004 19:59 UTC

>> Article: I first tried to eliminate the CLR from the Visual C++ benchmark by turning off the language's "managed" features with the #pragma unmanaged directive, but I was surprised to see that this didn't lead to any performance gains.

Just start a new, unmanaged project and add your Standard C++ code to it.

RE: Python is interpreted
by Jason Lotito on Thu 8th Jan 2004 20:01 UTC

Actually, if you read the posting, he benchmarked Python both with Psyco and without, reporting both results. And, if you read the posting, he also mentioned it would be interesting to see what Perl, PHP, and Ruby results would look like.

Re:
by hmmm on Thu 8th Jan 2004 20:01 UTC

Comparing the server VM with the client VM is invalid.
He should have run both VMs in both cases.

I tested 1.4.2 vs. 1.3.1 (both SUN VMs) and 1.4.2 is 3 times slower than 1.3.1 (they rewrote the VM itself and System.arraycopy() that is being used everywhere got 3x slower).

I filed a bug report on that. The reply was: "Too late", which disappointed me, as this is a major regression and should have been caught by their QA people, not me.

Well, let's hope that in 1.5 they improve performance to the 1.3.1 level.

Virtual machine does not mean slower
by Kasper on Thu 8th Jan 2004 20:02 UTC

The author states that he is surprised that Java performs better than compiled code... This really shouldn't be a surprise. The Java virtual machine compiles its code just like a C++ compiler. There is just one big difference: the C++ compiler compiles its code before it is run, and Java while it is being run. In other words, Java actually knows more about how the code is used, which in theory should let it reach better performance than C++. In real life, though, it's only recently (the last couple of years) that Java has actually approached (and in some cases passed) C++.

Lisp?
by Brian on Thu 8th Jan 2004 20:03 UTC

I suggest he should benchmark Common Lisp (perhaps using Corman Common Lisp, Allegro or LispWorks on Windows). Common Lisp is a native-compiled language which provides even more dynamism than the popular interpreted languages like Perl, Python, Ruby and PHP. It's not hard to learn for someone used to these languages and may provide a nice surprise for the benchmark results.

Java-1.4.2 trig is simple mistake
by Hong Zhang on Thu 8th Jan 2004 20:04 UTC

The default math library is compiled with -O0 to preserve strict IEEE semantics. In fact, with a minor change to the source code, -O2 will work as well. Java has two math libs, Math and StrictMath. They default to the same implementation, but the JVM is allowed to use a faster/less accurate version of Math. VC++ uses loose math (x86 trig instructions directly).

RE: Python is interpreted
by Anonymous on Thu 8th Jan 2004 20:04 UTC

Perl is compiled in a similar manner to Python - it just isn't written to disk. Perl 6 even more so - read about Parrot if you're interested. Why Perl isn't included is really a question for the author; if I were to guess, it would perform similarly to C++, since it generally wraps the standard libraries - unless I/O is included in the bench, in which case there's a penalty for parsing the file. Then it should be comparable to Python. How 'bout it Christopher - run a Perl bench?

Will

Please post your sources
by Ben Maurer on Thu 8th Jan 2004 20:05 UTC

Hello,

Given the large differences between VB.NET and C#, it is very likely you are doing something wrong. You may be mistakenly using a different construct. The `native' VB IO functions may be much slower than the standard CLR classes (System.IO). If you are using the Visual Basic library, you are not fairly testing the language. Again, you must post your source code to allow independent review.

As well, I would love to run the C# tests on Mono.

Another thing I should point out: most applications *DO NOT* involve intensive IO or math alone. This is not a measure of true application performance. You are merely measuring how well the JIT or compiler emits code for a specific case. I am sure any of the JIT developers could optimize for this specific test case. I think perhaps the more interesting view you could take is `what language provides high-speed building blocks -- such as collections classes, callback functionality, and object creation.' The answer to that question is *MUCH* less of a micro-benchmark.

Also, I would add, for JIT'd languages, you should call the function you are benchmarking once before the call that you time. Depending on how you structure your run, you may end up counting JIT time. Although JIT time can matter in a 60-second benchmark, when running a web server for days, weeks or even months at a time it really does not matter. In fact, many applications use a native-code precompiler to reduce startup time (under Mono, Miguel de Icaza often reports performance improvements of over 30% by using AOT compilation of our C# compiler mcs.exe [times are for the compilation of our mscorlib library, consisting of 1000 C# source files]). However, AOT does lose out in a large benchmark like this because it is forced to generate sub-optimal code (like a C++ compiler). So, it is much more fair to allow for a warm-up run to let the runtime JIT the code.
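The warm-up discipline described above can be sketched in a few lines of Python (an illustration, not the article's code; the harness and its names are my own):

```python
import time

def bench(fn, warmup=1, runs=3):
    """Time fn, discarding warm-up calls so that one-time costs
    (JIT compilation, cold caches) are not counted in the result."""
    for _ in range(warmup):
        fn()                                  # untimed warm-up run
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)                         # best of the timed runs
```

The same pattern applies in any JIT'd runtime: the first call pays the compilation cost, so it is excluded from the measurement.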

-- Ben

Thanks for taking the time to post the article.
by Blah on Thu 8th Jan 2004 20:05 UTC

It's funny that there are so many hardware sites that benchmark every aspect of CPU's, chipsets, and graphics cards, but few people bother to benchmark software, programming languages, and operating systems.

RE: misses some test
by Chris on Thu 8th Jan 2004 20:09 UTC

I would also like to see mono and portable .NET benchmarks. Also gcc in a POSIX environment.

By the way, I liked the article - much higher quality than what you normally see here on osnews.

Re: Java, and legal considerations
by Dawnrider on Thu 8th Jan 2004 20:11 UTC

Firstly, Java code should, in the general best cases, perform in the same manner as a well-compiled C++ program. If we are doing pure loops and integer/FP tasks, there should be virtually nothing in it. A C++ compiler doing this properly should produce the same output as Java as a base case. A good C++ compiler using architecture optimisations should be able to do even better, though. Java has the overhead of the VM and the JIT process, though a repetitive looping test should negate that anyway. Similarly, a well-compiled benchmark from C and C++ should always be faster than a managed .NET application. The distance between the two will vary, but it should still be faster.

These benchmarks are rather daft, anyway, since they manage to avoid using any sort of objects. Java is meaningless for most real tasks without creating and manipulating objects (otherwise you're basically writing C anyway), and objects are where Java really does slow down.

Last of all, I'd like to draw the author's attention to the .Net framework EULAs... It is in fact a violation of the EULA to produce benchmarks of this sort of .Net against other platforms. Which is why they haven't been done all over the place by now ;)

total vs. geometric average?
by JBQ on Thu 8th Jan 2004 20:13 UTC

Very interesting results. Sadly, the sorting criterion (using the total instead of a geometric average) is unusual, and favors the languages that optimize the slow operations. The results of double math and trig show some big variations between languages (3:1 for double, more than 15:1 for trig), but this is not properly reflected in the results (in my humble opinion).

Here are the numbers with the geometric average. Notice how Java 1.3.1 suddenly appears much slower than Visual J# or Java 1.4.2, and Python/Psyco is far ahead of Python (the arithmetic average doesn't show the improvement on the trig test).

Visual C++: 8.4
Visual C#: 11.1
gcc C: 13.2
Visual Basic: 13.9
Visual J#: 14.2
Java 1.4.2: 14.8
Java 1.3.1: 18.6
Python/Psyco: 47.9
Python: 145.5

Ignoring Python for the moment, it's interesting to see that Java 1.3.1 is the only one that is far off the lead on most tests. gcc only needs to improve on trig and long math; Visual C#/Basic/J# all have issues with long math and double math, with Visual Basic and J# suffering from slow I/O.

Java 1.4.2 has a very obvious and severe issue with trig. If that test were as fast as 1.3.1's, Java 1.4.2 would score 12.2, very close to the lead. If it could be made to score 4.2 like Visual J#, that score would fall to 8.7, barely slower than Visual C++.

The differences between the various MSIL/CLR languages are also very interesting. It's obvious that VC++ manages to issue better 64-bit code than the rest of the pack, and that I/O is the only differentiator between Visual C#, Basic and J#.
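The geometric average JBQ advocates is easy to reproduce; a small Python sketch (the scores below are made-up illustrations, not the article's data):

```python
import math

def geometric_mean(times):
    # nth root of the product, computed as exp of the mean of the logs
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Two hypothetical languages with the same total but different profiles:
steady = [100, 100, 100, 100]   # uniform across tests
spiky  = [10, 10, 10, 370]      # fast everywhere except one slow test

print(sum(steady), sum(spiky))            # totals are equal: 400 vs 400
print(geometric_mean(steady))             # 100.0
print(round(geometric_mean(spiky), 1))    # ~24.7: ratios, not sums, dominate
```

With the total, both languages tie; the geometric mean rewards the one that is fast on most tests, which is why the rankings shift.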

Final note...
by Dawnrider on Thu 8th Jan 2004 20:14 UTC

If we're freely violating EULAs and all the rest of it, can anyone test the C++ code on Linux and the Java code on IBM's VM? Both should be quite different.

Where for art thou Ruby?
by Mr. Banned on Thu 8th Jan 2004 20:15 UTC

I've been strongly considering picking Ruby up as my next language to learn, but it's hard to find a lot of information (recent info, at least) on it that's in English.

I was hoping to see it benchmarked as well... That could have been the push I need to get learnin' it.

Does anyone here know Java, Python, and Ruby? Any thoughts as to speed, or recommendations one way or the other?

Re: Please post your sources
by Jeff on Thu 8th Jan 2004 20:16 UTC

He did, on the second page of the article:

http://www.ocf.berkeley.edu/~cowell/research/benchmark/code/

VC++
by Carl Spackler on Thu 8th Jan 2004 20:16 UTC

I would have run VC++ 6 instead of VC++ .NET (or whatever it is called). Since the author didn't know how to create an unmanaged project in VC++, I don't believe his results when it comes to VC++.

In addition, I run STLport instead of MS's STL, which is much faster.

mingw
by Sagres on Thu 8th Jan 2004 20:18 UTC

My results for mingw (instead of cygwin) on my Athlon XP 2.4
(I just felt like I should test it ;)

Start C benchmark
Int arithmetic elapsed time: 6125 ms with intMax of 1000000000
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 5687 ms with doubleMin 10000000000.000000, doubleMax 11000000000.000000
i: 11000000000.000000
doubleResult: 10011632717.388229
Long arithmetic elapsed time: 20016 ms with longMin 10000000000, longMax 11000000000
i: 11000000000
longResult: 776627965
Trig elapsed time: 6750 ms with max of 10000000
i: 10000000.000000
sine: 0.990665
cosine: -0.136322
tangent: -7.267119
logarithm: 7.000000
squareRoot: 3162.277502
I/O elapsed time: 5484 ms with max of 1000000
last line: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total elapsed time: 44062 ms
Stop C benchmark


Great article btw.

Completely useless!
by Mike on Thu 8th Jan 2004 20:23 UTC

Doing such simple math and IO tests is completely useless.

Real differences lie within string/character manipulation, memory allocation, searching, sorting, garbage collection, virtual calls through classes, etc.

If you have some more time to spend on this benchmark, try those. The differences will be much bigger.

More tests, and don't use the best of....
by Steve on Thu 8th Jan 2004 20:23 UTC

Please don't post the best of only 3 runs; it's silly to do so because you are not getting a good sample of data.

You should have run more like 10 runs, especially because the micro-benchmarks you have produced can be run automatically in the background many times without user intervention.

Then you should provide the mean and median of the runs, along with all the data from each run.

On the topic of the Java benchmark:
For the Java tests you should use both the server VM and the client VM and compare results. The two VMs are actually very different. The Java benchmark isn't really doing much but testing the interpreter, as I doubt that much of that code is actually being compiled to native code. I think it's fairly well known that for short-running programs Java is slow (but for long-running programs it's fairly competitive). In the Java benchmark you shouldn't be using new Date().getTime() to get the time; you should instead use System.currentTimeMillis(), as it is faster and doesn't involve the creation of more objects.
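The many-runs reporting Steve asks for is simple to automate; a sketch in Python (illustrative only; the harness and its names are mine, not the article's):

```python
import statistics
import time

def run_many(fn, runs=10):
    """Run fn several times and report mean and median wall-clock time in ms,
    along with the raw samples, instead of just the best of 3."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000.0)
    return {"mean": statistics.mean(times),
            "median": statistics.median(times),
            "runs": times}
```

Reporting the median alongside the mean makes outliers (a GC pause, a background process) visible instead of silently discarded.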

Re: Where for art thou Ruby?
by bsdrocks on Thu 8th Jan 2004 20:27 UTC

I think it would be best to wait until after 2.0... because Ruby is kind of slow, not thread-safe, and has many other issues. It forced me to learn a different language, but I will come back to Ruby when 2.0 is released. I love Ruby; it's easy and clean to me.

Re: Re: Java, and legal considerations
by Jeff on Thu 8th Jan 2004 20:28 UTC

Last of all, I'd like to draw the author's attention to the .Net framework EULAs... It is in fact a violation of the EULA to produce benchmarks of this sort of .Net against other platforms. Which is why they haven't been done all over the place by now ;)

If that is true, it's rather astonishing! "BTW, if you use our product, you are forbidden to discuss its performance publicly." It sounds like they have no faith in their product ;)

Synthetic benchmarks
by jizzles on Thu 8th Jan 2004 20:39 UTC

Small synthetic benchmarks are generally not representative of real programs. Typically a benchmark suite of real applications that compute real things people are interested in is the best indicator, but unfortunately it is hard to find a large enough suite implemented well in a large enough number of languages to matter.

Even so, I will say this.

Java is the real star of this benchmarking effort. The conventional thinking of people who say "Java? Bytecode? VM? It will always be slow!" is clearly in error. A huge (and I do mean huge) amount of engineering effort by thousands of smart people from all kinds of institutions has gone into designing and building high-performance virtual machines, and Java, mainly through Sun's and IBM's efforts, has been the principal recipient of those benefits. JIT compilers are extremely advanced, far ahead of static compilers in many areas. It is no wonder that you see the performance gap rapidly closing--though it shouldn't be called a gap, because the potential to exceed static compilation is also huge.

The speed of the language has less and less to do with the speed of the resulting application these days. What matters most now (and it has always mattered) is smart design and efficient algorithms. For integer and float math, the design space is small, but for an application the size of a web server, a graphics program, or a web browser, the design space is huge. Even if it did break down to one language being X% slower than another (and that kind of thinking is complete rubbish anyway), what would it matter?

Virtual machines get better every generation. And every single program ever written for that VM--anytime, anywhere, no matter who wrote it, how it was compiled, what platform it was on--gets faster right along with it. Static compilation is static--it has long slowed its evolution and stabilized. But dynamic compilation is evolving at an amazing rate.

Don't be a naysayer, be excited about what the future brings for new languages!

great language shootout, anyone
by Miron Brezuleanu on Thu 8th Jan 2004 20:39 UTC


Hello everybody,

In case anyone is interested, there is a very interesting benchmarking site (many languages, many tests) at:

http://www.bagley.org/~doug/shootout/

It doesn't include Microsoft's new CLR languages/implementations, iirc, so the tests in the article are still interesting.

It's weird to see gcc performing so badly. Maybe the cygwin overhead is to blame?

Python
by Tyr on Thu 8th Jan 2004 20:41 UTC

I think Python was misrepresented a bit here, since most Python programmers will either write the 'number crunching' parts of their programs as a C library or use lower-level Python modules such as numpy or Scientific Python.
Serious mathematical operations in pure Python are a rarity.
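The point above can be sketched with a modern numpy example (an illustration, not the article's benchmark; numpy is the successor of the 2004-era Numeric module mentioned elsewhere in the thread):

```python
import numpy as np

n = 1_000_000
x = np.arange(1.0, n + 1.0)          # one million float64 values

# A single vectorized expression replaces a million-iteration Python loop;
# the arithmetic itself runs in compiled C inside the library.
result = np.sqrt(x) + np.log(x)
print(result[0], result[-1])
```

This is why pure-Python arithmetic loops are a worst case that working Python programmers rarely hit: the hot loop lives in C, not in the interpreter.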

Fortran?
by Nate on Thu 8th Jan 2004 20:52 UTC

I would be interested to see how FORTRAN does in a similar benchmark.

Fortran 95, Anyone?
by TheMatt on Thu 8th Jan 2004 20:52 UTC

Heck, FORTRAN 77, even. Unfortunately, I could only do it for Linux and Tru64. I don't have an F95 compiler for my WinXP box at home.

If anyone out there is using ifort on WinXP, please try out the program.

RE: jizzles (IP: ---.CS.UCLA.EDU)
by Anonymous on Thu 8th Jan 2004 20:53 UTC

Java is the real star of this benchmarking effort. The conventional thinking of people who say "Java? Bytecode? VM? It will always be slow!" is clearly in error.

There seems to be general agreement that Java is fast on the server side, but most of the complaints about Java's speed relate to its performance in desktop apps, something not tested in this benchmark.

RE: Fortran?
by TheMatt on Thu 8th Jan 2004 20:55 UTC

Grrr...*shakes fist at Nate*

Actually, it'd be cool to see how gfortran does. I'm sure gcc would finish a hojillion times faster, but still.

Hmm
by Rayiner Hashem on Thu 8th Jan 2004 20:57 UTC

Well, these benchmarks aren't really very indicative. The I/O benchmark is, well, I/O bound, which is why interpreted Python performed as fast as compiled C++. The numeric benchmarks are just that, numeric benchmarks. Numerics are really the best case for an optimizer, because they are so low level. All the JIT compilers should have compiled the loop once to native code, and gotten out of the way. This is fine if all you are doing is inner-loop numeric code (some scientific stuff, graphics) but not really a good indicator of general performance. Even for scientific code, this benchmark probably isn't representative, because you often need proper mathematical semantics for your calculations, which C/C++/Java/C# don't provide.

A more telling test would be to get higher-level language features involved. Test virtual C++ function calls vs Java method calls (which are automatically virtual). Test the speed of memory allocation. Test the speed of iterators in various languages. Do an abstraction benchmark (like Stepanov for C++) to test how well the compiler optimizes-away abstraction.

@Brian: I can tell you how a Common Lisp result of the same benchmark would turn out. Given proper type declarations, and a good compiler (SBCL, CMUCL), you will get arbitrarily close to C++ for this task. The compiler should generate more or less the same code. See this thread for some good numbers:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=87n0t236...

Note that cmucl is very competitive with gcc. Intel C++ blew both cmucl and gcc away, but that has nothing to do with the language. Intel C++ has an auto-vectorizer that will automatically generate SSE code if it finds algorithms (like the dot-prod and scale in this benchmark) that can be vectorized. GCC and CMUCL don't support this feature.

Interestingly, there is evidence that Lisp performs extremely well for large programs:

See these links:
http://www.flownet.com/gat/papers/lisp-java.pdf
http://www.norvig.com/java-lisp.html

In the study, the fastest programs were C++, but the average of the Lisp programs was faster than the average of the C++ programs. The Java stats on the study are a bit outdated, because it was done with JDK 1.2.

RE: Please post your sources
by Ben Maurer on Thu 8th Jan 2004 21:09 UTC

Ok, I am an idiot for not seeing the sources (OTOH, I would usually expect to find them at the *END* of the article)

For VB's file routines, it is no wonder they are so slow. You are, as I suspected, using the VB file routines. Just to give you an idea, here is what happens EVERY TIME you call PrintLine:

1) An object[] array is created (PrintLine takes a param array). This requires an allocation, and then requires copying to the array. Given the number of items you write, you will trigger quite a few GCs.

2) The VB runtime must walk the stack to find out what assembly you are calling from. This requires quite a bit of reflection, and involves acquiring a few locks. This is done to prevent conflicts between assemblies.

3) The VB runtime must find which FileStream is referred to by the specified handle.

4) Stream.WriteLine is called.

Well, it's no wonder it is so slow... Something similar may be happening for J#.NET.

I would suggest you consider rewriting the VB file IO routines, and resubmitting your data.

As well, you should be aware that you are putting C/C++ at a HUGE advantage in the IO tests. In the read portion of the test, you do the following in C#:

while (i++ < ioMax) {
    myLine = streamReader.ReadLine();
}

While in C you do:

char readLine[100];
stream = fopen("C:\\TestGcc.txt", "r");
i = 0;
while (i++ < ioMax)
{
    fgets(readLine, 100, stream);
}

This is very unfair to the C# code. You are forcing it to allocate a new string for every line that is read. C is not forced to allocate a new array, which saves it a lot of time. If you would like to make the test fair, please use StreamReader.Read to read into a char[] buffer. This will prevent the repeated allocations, which should make the test more fair. A similar technique should be used for the other languages.
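Ben's buffer-reuse point, sketched in Python rather than C# (an illustration of the idea, not the article's code): `readinto` fills a preallocated buffer in place, so no new object is created per read.

```python
import io

# Simulate the benchmark's file: 1000 lines of 80 characters plus a newline.
line = b"x" * 80 + b"\n"
stream = io.BytesIO(line * 1000)

buf = bytearray(100)               # allocated once, reused for every read
reads = 0
while True:
    n = stream.readinto(buf)       # fills buf in place, returns bytes read
    if n == 0:
        break
    reads += 1
print(reads)   # 810 chunks of up to 100 bytes for 81000 bytes total
```

The equivalent allocating loop would create a fresh object per iteration, which is exactly the extra GC pressure being described.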

Really, you should have posted these items for review before claiming that you had a fair benchmark. The article should have been split into two postings: the first two sections in one, and then, after comments, the third. I would also encourage OSNews not to post benchmarks that have not received peer review.

Lots of other bottlenecks
by LinuxBuddy on Thu 8th Jan 2004 21:15 UTC

So, Rayiner has it right. These benchmarks are mostly testing parts of the operating system, not necessarily the runtimes all that much. That's why the I/O scores are all so close. The only anomaly here is that Java is probably using strict IEEE arithmetic for the trig stuff, which is why it's so slow. I think another poster mentioned how to turn that off.

It's these kinds of benchmarks that I get nervous about when people start saying "Java is just as fast as C." Well, I'm a Java programmer, and I love the language, and it really has gotten a LOT faster over the last few years, but there are some things in Java that are just inherently difficult to optimize away. I'm talking about things like array bounds checking, every data structure being a reference to an object, GC, and all data types being signed (try working a lot with byte values in Java and see how much ANDing you end up doing to combat sign-extension issues). These structural choices in the language design cause extra overhead that C programs just don't suffer. Now, those are also some of Java's greatest strengths in terms of making programming safer, but they do have a price. Fancy VMs can reduce that price, sometimes to zero for certain sections of code (the latest HotSpot does an excellent job getting rid of array bounds checking in many instances), but it's asymptotic.

Now, here's what I think: for most types of programs, C or Java are just fine, particularly given today's fast CPUs and spacious memory supplies. Both of those favor Java and tend to make the difference small for anything real-world. Even the purely interpreted languages like Perl or Python are fine for all but the heaviest workloads.

Why is it that when languages are compared ...
by iDaZe on Thu 8th Jan 2004 21:38 UTC

... people never include Delphi in any benchmarks? It's always C, C++, Java etc ... but I never see any of these benchmarks with Delphi included. ;)

gcc optimizations...
by M. Tim Jones on Thu 8th Jan 2004 21:42 UTC

I believe that -fomit-frame-pointer is not enabled at any optimization level, so it's curious that it's specifically enabled for Visual C++ but not for gnu. Also, there's the potential that better performance would result from -O2 and then selectively optimizing from there, rather than going all the way to -O3.

RE: Please post your sources
by Rambus on Thu 8th Jan 2004 21:43 UTC

Someone said that the Python guys would've used C libraries or written their own C routines. If that is true, then maybe in Java I would have used JNI, in VB I would have called a C DLL, or in C I would even have used assembler methods (not sure if any of those methods are faster; just making my point).
This guy just showed results from the kind of code that many people who are not experts in a specific programming language would have written. And one must recognize that kind of code is awfully common, so I think this benchmark is valid in certain environments and cases. It is up to the reader to be clever enough to understand that.

Python interpreted?
by snowflake on Thu 8th Jan 2004 21:43 UTC


>>Python is interpreted and it wasn't left out.

>Actually... if you read the posting, he compiled the Python code with Psyco. Also Python is compiled into byte code at runtime for fast execution too. Perl does not behave like this nor can you compile it. So my comment remains.

>Mike

But he also tried straight uncompiled Python. I thought Perl was interpreted like Python?

Hello,
Some years ago I found a site with a collection of identical benchmarks for every programming language: one for Forth, one for C, one for C++...
The results were organised in a chart showing all the specs: the machine, the system, the compiler...

I wanted to send the URL to this guy but I'm not able to find it again...

(there were some for Perl and Prolog, for example)

If someone could send me the URL, it would be nice.
Cheers,

Djamé

Who cares?
by snowflake on Thu 8th Jan 2004 21:48 UTC

>Last of all, I'd like to draw the author's attention to >the .Net framework EULAs... It is in fact a violation of >the EULA to produce benchmarks of this sort of .Net >against other platforms. Which is why they haven't been >done all over the place by now ;)

Who cares? The worst they can do is send a letter telling the author to stop, by which time the benchmarks are out. After that someone else can take over, etc.

Re: Linux buddy
by Rayiner Hashem on Thu 8th Jan 2004 21:55 UTC

I'm talking about things like array bounds checking, every data structure being a reference to an object, GC
----------
These three are not necessarily that bad. Analysis shows that most bounds checks can be eliminated ahead of time. I'm sure the Java compiler does this optimization. On the other hand, every data structure being a reference is something that is slow in Java, but doesn't need to be.

You see, the primitive/class distinction in Java is largely unnecessary. It is entirely possible for a powerful compiler to determine what should be boxed and what should not. Powerful CL/Scheme/Dylan/ML/Smalltalk compilers do such analysis, so in these languages there are no primitive types. Everything seems to be a full object on the heap. The compiler will take care of doing things like stack-allocating variables when no references to them escape the function, or unboxing an object when it can be determined that it is safe to do so.

VB code not well written
by Anonymous on Thu 8th Jan 2004 21:55 UTC

It would seem that the VB code does not use a FileStream for the file IO like C#, but instead uses the old method of file manipulation, which is a lot slower. Try using System.IO.File and System.IO.StreamReader; the performance will be closer to that of C#.

Good Job!
by Mike on Thu 8th Jan 2004 21:55 UTC

Good Job, interesting results.
But, as someone else said, on these small programs I'm not sure the Java HotSpot compiler/profiler even kicked in. I think the Java JIT only profiles and compiles after it's sure the job isn't going to end soon.

Python & Numeric
by Anonymous on Thu 8th Jan 2004 21:56 UTC

Interesting article, Christopher! It was quite informative to see MS VC++ at the top of the speed marks, and also interesting to see that file IO in Python didn't show much significant difference from many of the other languages.

So, I'll be the one to mention it: would it be possible to provide a benchmark for Python using the Numeric/NumArray libraries? These were written specifically for numerical operations (the benchmarks you used were bread and butter to Numeric), and they do provide a speed boost. I should imagine that the results still wouldn't approach the fastest languages here, but it would probably improve the performance, possibly even faster than the Psyco compiled Python.

Or maybe you should include some Fortran/Forth too? (nag! nag!) ;)

@djame
by Rayiner Hashem on Thu 8th Jan 2004 21:58 UTC

You're looking for Doug's Great Computer Language Shootout

http://www.bagley.org/~doug/shootout/

It's a bit outdated, and some of the code isn't well-written (I know a lot of people on c.l.lisp complained about sub-optimal CL code), but it's overall pretty good.

Language comparison
by hah on Thu 8th Jan 2004 22:00 UTC

I would actually be more interested in how many lines/characters of code he had to write to achieve the same result in each language. Then let a third party read the code and say which one was more "readable".

RE: Please post your sources
by Ben Maurer on Thu 8th Jan 2004 22:01 UTC

Rambus,

Your assertion that "This guy just showed results from code that many people who are not experts in a specific programming language would've code. And one must recognize that kind of code is awfuly common, So I think this benchmark is valid, in certain environments and cases. It is up to reader to be clever enough to understand that" is incorrect.

When you compare one language against another, it is not fair to take effort to optimize one language (like C) and not take time to optimize another (VB).

However, the real issue is this: the author is attempting to compare how the IO in C# stacks up to the IO in VB (in the IO test). He ended up, instead, comparing how fast reflection is! A benchmark should generally be optimized as much as possible. Benchmarks are meant to simulate real-life, high-expense computations. In such a situation, people do not just write `common code'. They profile code, make it faster, and profile some more. This benchmark is not representative of such a situation, and thus is not valid for its intended purpose.

language comparison
by hah on Thu 8th Jan 2004 22:09 UTC

Here is the output of wc -l on his code. Although it is pretty useless IMO, it would be more interesting on a more complex program with 100+ classes.

Benchmark.py 160
Benchmark.c 180
Benchmark.cpp 181
Benchmark.vb 186
Benchmark.java 211
Benchmark.jsl 212
Benchmark.cs 215

Author's early response
by Chris Cowell-Shah on Thu 8th Jan 2004 22:26 UTC

I appreciate all the comments--keep them coming! I'll try to write up a response to the major (or recurring) points later tonight. -- Chris

cool
by Pascal de Bruijn on Thu 8th Jan 2004 22:30 UTC

I would really like to see Watcom C, Intel C and Perl added.
Also the thing about IBM's JDK and Kaffe would be very interesting. Maybe (GNU) Ada could also be added. I think GCJ would have support for basic math, could be very interesting also! I would really like to see more!

Re: Kasper
by Bascule on Thu 8th Jan 2004 22:35 UTC

The C++ compiler compiles its code before it is run, and Java while it is being run. In other words, Java actually knows more about how the code is used, which in theory should let it reach better performance than C++. In real life, though, it's just recently (the last couple of years) that Java has actually approached (and in some cases passed) C++.

And again, in performance-critical code (which you probably wouldn't be implementing in Java in the first place) you can always compile your C/C++ code using profile guided optimizations, which allow a process to be run and a report generated at run-time of how the code could be better optimized, at which point the code can be fed through the compiler again. Depending on how long it takes your codebase to compile (obviously it will be quite a pain if that's in the several-hour range), this really is a trivial process, especially before the final gold master release of a particular piece of software.

The only cases where Java consistently outperforms native code compiled with profile guided optimizations (which allow for runtime optimization of compiled languages) are cases where a large number of virtual methods are used in C++ code. Java can inline virtual methods at run-time, whereas in C++ a vptr table lookup is incurred at the invocation of any virtual method.

Of course, the poor performance of virtual methods is usually pounded into the heads of C++ programmers during introductory classes (at least it was for me). If you are using virtual methods inside loops with large numbers of iterations, you are doing something wrong. In such cases, reimplementing with templates will solve the performance issues.
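For concreteness, here is a rough sketch of that tradeoff (all names are made up for illustration): both loops compute the same sum, but the first dispatches through the vtable on every iteration, while the second resolves the call at compile time and can inline it.

```cpp
#include <cassert>

// Virtual dispatch: calls through a base reference go via the vtable.
struct Op {
    virtual ~Op() {}
    virtual int apply(int x) const = 0;
};

struct DoubleOp : Op {
    int apply(int x) const { return x * 2; }
};

// Each iteration pays for a vptr lookup and an indirect call.
int run_virtual(const Op& op, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += op.apply(i);
    return sum;
}

// Template version: the concrete type is known at compile time,
// so the (non-virtual) call can be inlined into the loop body.
struct DoubleFast {
    int apply(int x) const { return x * 2; }
};

template <typename OpT>
int run_template(const OpT& op, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += op.apply(i);
    return sum;
}
```

In tight numeric loops the template version gives the optimizer the same information a JIT discovers at run-time, which is the crux of the comparison above.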

AMD vs Intel
by Pascal de Bruijn on Thu 8th Jan 2004 22:35 UTC

I would also really like to see all the same programs run on an AMD CPU (not to see if AMD beats Intel, but to see whether each compiler generates code that is generally efficient or only tuned for one CPU).

RE: Raynier and LinuxBuddy
by RoyBatty on Thu 8th Jan 2004 22:39 UTC

@Rayiner
You see, the primitive/class distinction in Java is largely unnecessary. It is entirely possible for a powerful compiler to determine what should be boxed and what should not. Powerful CL/Scheme/Dylan/ML/Smalltalk compilers do such analysis. So in these languages, there are no primitive types. Everything seems to be a full object on the heap. The compiler will take care of doing things like stack-allocating variables when no references to it escape the function, or unbox an object when it can be determined that it is safe to do so.

You can add C# into that mix too.

@LinuxBuddy
One of my biggest pet peeves about Java has always been no unsigned types. It might not seem like a big deal to a lot of people, but for what I was doing I ended up doing a lot of bit-masking to get things done, as you stated. C# has unsigned types. C# also has the auto-boxing Rayiner mentioned.


I would like to see a comparison between MS C# and Java (I guess you could add Mono and PNet into the mix too) to see how well they optimize out bounds checking, and also what kind of a performance hit each takes from it.

Results on iBook G4 800MHz
by daen1543 on Thu 8th Jan 2004 22:53 UTC

I compiled the C and C++ benchmarks on an iBook G4 800MHz, using the best possible optimization. I edited out the IO test because it was giving me a "Bus error" after creating a 70 MB file. Weird.

C (gcc -fast -mcpu=7450 -o Benchmark Benchmark.c)
Integer: 8.8s
Double: 17.2s
Long: 56.2s
Trig: 12.0s

C++ (g++ -fast -mcpu=7450 -o BenchmarkCPP Benchmark.cpp)
Integer: 8.7s
Double: 16.9s
Long: N/A
Trig: 12.0s
(I was getting an "integer constant is too large for "long" type" warning, so I left it out)

I didn't have the patience to wait for the Python program to complete.

Re: Rayiner Hashem
by Bascule on Thu 8th Jan 2004 22:59 UTC

Numerics are really the best case for an optimizer, because they are so low level. All the JIT compilers should have compiled the loop once to native code, and gotten out of the way. This is fine if all you are doing is inner-loop numeric code (some scientific stuff, graphics) but not really a good indicator of general performance.

Yes, this benchmark could really give people the wrong idea about Java. Obviously HotSpot is doing its job, and it performs comparably to native code.

Even for scientific code, this benchmark probably isn't representative, because you often need proper mathematical semantics for your calculations, which C/C++/Java/C# don't provide.

Fortran is a wonderful language for the scientific community, not only for its language semantics but also for its optimization potential. While this potential is not fully realized on most platforms (the Alpha is the only ISA where the Fortran compiler has been optimized to the point that, for scientific computing, Fortran is the clear performance winner over C), Fortran does have a distinct advantage in that many mathematical operations which work as library functions in languages like C (e.g. exponents, imaginary numbers) are part of the language syntax in Fortran. Thus complex mathematical expressions involving things like exponents can be highly optimized, as opposed to C, where a function invocation is required to perform exponentiation. Algorithms for things like modular exponentiation can be recognized in the language syntax and applied to the code at compile time, whereas C requires programmers to implement these sorts of optimizations themselves. With more and more processors getting vector units, a language which allows such units to be used effectively really is in order.

Java really dropped the ball on mathematical code. C at least has a rationale behind why things like exp() and log() are functions rather than part of the language syntax. C is designed to be a language with a relatively simple mapping between language syntax and processor features. Java/.NET could have made exponentiation a language feature rather than a library function... after all, they certainly aren't bound by the limitations of processors. Instead we find these sorts of things in java.lang.Math and System.Math because they are clinging to C/C++'s legacy rather than thinking about the rationale behind the C language syntax and how the syntax could be better designed when a simple mapping between processor features and language syntax isn't required.

Lack of operator overloading is the biggest drawback to mathematical code in Java. Complex mathematical expressions, which are hard enough to read with conventional syntax, become completely indecipherable when method calls are used in their stead. I had the unfortunate experience of working on a Java application to process the output of our atmospheric model, an experience I would never like to repeat. Working with a number of former Fortran programmers, everyone grew quickly disgusted with the difficulty of analyzing matrix math as method calls, and they were quite amazed when I told them that with C++ operator overloading such code can be written with conventional mathematical syntax (although there are some issues differentiating dot products from cross products, it's still much less ugly than method invocations).
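As a sketch of what operator overloading buys (a toy Vec3, not real matrix code): the expression at the bottom reads like the math it implements, where the Java equivalent would be a chain of method calls.

```cpp
#include <cassert>

// A minimal 3-vector with overloaded operators, so the math reads as math.
struct Vec3 {
    double x, y, z;
    Vec3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
};

Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

Vec3 operator*(const Vec3& a, double s) {
    return Vec3(a.x * s, a.y * s, a.z * s);
}

// Dot product stays a named function, sidestepping the
// dot-vs-cross ambiguity mentioned above.
double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// With overloading: r = a + b * 2.0
// Java-style:       r = a.add(b.scale(2.0))
```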

A more telling test would be to get higher-level language features involved. Test virtual C++ function calls vs Java method calls (which are automatically virtual).

Java will almost certainly win on the speed of virtual methods because it can inline them at run-time. Again, the solution in C++ is not to use virtual methods within performance-critical portions of the code, especially within large loops; the simple solution is to replace such uses with templates where applicable.

C++ ???
by Arthur Dent on Thu 8th Jan 2004 23:00 UTC

His C++ is straight C. It even uses printf instead of cout... Where are the classes? Where is the standard C++ library usage??

Hmmm....

Re: RoyBatty
by Bascule on Thu 8th Jan 2004 23:03 UTC

One of my biggest pet peeves about java has always been no unsigned. It might not seem like a big deal to a lot of people but for what I was doing, I ended up doing a lot bit-masking to get things done, as you stated. C# has unsigneds.

Agreed. One of the first things I ever wrote in Java (about 9 years ago) was an implementation of IDEA, and I quickly learned why the lack of unsigned types was a bad thing. I ended up using signed 32-bit integers to emulate unsigned 16-bit integers, and of course this was done in conjunction with a great deal of masking. This revealed to me one of the many hacks that were thrown into the Java syntax, the unsigned right-shift operator >>>. Sun, wouldn't it have been simpler to support unsigned types?
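To illustrate the point (sketched here in C++ rather than Java, with hypothetical helper names): the first function is the masking dance Java forces on you, the second is what a real unsigned type gives you for free.

```cpp
#include <cassert>
#include <stdint.h>

// What Java code ends up doing: keep an unsigned 16-bit value in a
// signed 32-bit int and mask after anything that can overflow.
int32_t add_u16_emulated(int32_t a, int32_t b) {
    return (a + b) & 0xFFFF;   // wrap at 2^16 by hand
}

// With a real unsigned type the mask disappears.
uint16_t add_u16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);  // wraps naturally
}
```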

@Roy Batty
by Rayiner Hashem on Thu 8th Jan 2004 23:05 UTC

I don't know if auto-boxing is really the same thing. If it is, then why are there structs in C#? And why is there a distinction between allocating a struct on the heap vs the stack? It might be, but I'm not familiar enough with C# to make the comparison.

Of course, C#'s compiler might have such analysis. Microsoft has some smart compiler guys. They're not very innovative, but they've got their fingers in some nifty pies. But I hear C# 2.0 will get lambdas with proper closures! At that point, it would be cool to do a benchmark to see how good their closure-elimination optimizations are compared to CL/ML compilers. Is type-inference too much to ask for in C# 3.0 ;)

Trig tests
by Rich Gibbs on Thu 8th Jan 2004 23:06 UTC

I think there are a couple of issues surrounding the trig tests in the benchmark.

The first is obvious: all of the computational "heavy lifting" is being done by the run-time library. Performance differences in the code you actually wrote are likely to be unimportant.

The second is that results like this are almost meaningless unless they are accompanied by some measure of the accuracy of the result. Without going into the gory details, functions like sine and cosine are typically calculated from power series approximations. Taking fewer terms is faster, but less accurate. For example, I can write a very fast C routine to approximate the value of pi:

double pi() {
    return 3.0;
}

The result is correct to one significant figure, after all. ;-)

(Numerical accuracy is not just a theoretical concern. Early versions of Lotus 1-2-3 implemented calculation of standard deviation wrong, and consequently got the wrong answer for the set of numbers {999999, 1000000, 1000001}.)
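In the same spirit as the pi() joke, here is a two-term Taylor series for sine (just a sketch): it is cheap and fine near zero, but the error grows quickly with the argument, which is exactly the accuracy dimension a pure timing benchmark never measures.

```cpp
#include <cassert>
#include <cmath>

// Two-term Taylor series: sin(x) ~= x - x^3/6.
// Fast, and accurate near zero, but increasingly wrong elsewhere.
double sin_fast(double x) {
    return x - (x * x * x) / 6.0;
}
```

Near x = 0.1 the error is below 1e-6; by x = 1.5 it is already worse than 0.01, so a "fast trig" result is meaningless without an accuracy figure next to it.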

Forth?
by mario on Thu 8th Jan 2004 23:07 UTC

Forth is a powerful and, especially, a very fast compiled programming language, too often forgotten.

@Bascule
by Rayiner Hashem on Thu 8th Jan 2004 23:12 UTC

With more and more processors getting vector units, a language which allows such units to be used effectively really is in order.
--------
APL anyone?

In addition to the advantages of Fortran you mentioned, I was also thinking about a full numeric tower like some languages have. Standard machine integers and floats are nice for a lot of scientific computing (and accounting, as I'm told), but for some computations, you need things like infinite precision rationals, arbitrary precision integers, etc.

ocaml?
by hmmm on Thu 8th Jan 2004 23:19 UTC

I bet ocaml would kick all their asses. ;)

flawed
by omega on Thu 8th Jan 2004 23:24 UTC

Comparing only arithmetic/math operations is far from representative of the performance of any language. What about the other constructs found in languages, such as tests or assignments, real memory allocations, object manipulation, GUI, file system and network access, etc.?

Plus, we all know that GCC by default generates terrible code on Intel. It generates very clean code (making good use of the x86 instruction set) only in optimized mode, which was not used for the benchmark.

This benchmark could lead one to think that Java is just 2 times slower than C/C++, something that anyone who has used a large application written in Java will know can't be accurate.

This benchmark has a very limited scope and its results are not representative of the real world.

VB IO Code could be improved...
by rizzo on Thu 8th Jan 2004 23:32 UTC

The following code:

FileOpen(1, fileName, Microsoft.VisualBasic.OpenMode.Output)
Do While (i < ioMax)
    PrintLine(1, myString)
    i += 1
Loop
FileClose(1)

could be much improved by using the native .NET I/O classes; in fact, the code could then be essentially identical to that of C#. In addition, the C# IO code uses a try..catch construct that could slow the code down. It would be good to retest the code with these suggestions.

You can grab the binaries I made here:

http://fails.org/benchmark/

These are optimized for Athlon XP/MP and will require SSE

b-gcc is compiled with gcc 3.3 with -O3 -march=athlon -msse
b-icc is compiled with icc 8.0 with -tpp6 -xiMK -O3

b-icc-opt has been optimized with Profile Guided Optimization. First, Benchmark.c was compiled with -prof_gen to create an instrumented executable. Next, the instrumented executable was run, generating a run-time profile (in the form of a .dyn file). Finally, b-icc-opt itself was compiled with -prof_use -tpp6 -xiMK -O3.

Respective scores when executed on a dual Athlon MP 2.0GHz:

gcc 3.3:
Int arithmetic elapsed time: 6550 ms
Double arithmetic elapsed time: 6250 ms
Long arithmetic elapsed time: 16760 ms
Trig elapsed time: 3640 ms
I/O elapsed time: 1090 ms
Total elapsed time: 34290 ms

icc 8.0:
Int arithmetic elapsed time: 6740 ms
Double arithmetic elapsed time: 5560 ms
Long arithmetic elapsed time: 27140 ms
Trig elapsed time: 2510 ms
I/O elapsed time: 1230 ms
Total elapsed time: 43180 ms

icc 8.0 (with profile guided optimization):
Int arithmetic elapsed time: 6340 ms
Double arithmetic elapsed time: 5540 ms
Long arithmetic elapsed time: 27460 ms
Trig elapsed time: 2430 ms
I/O elapsed time: 1190 ms
Total elapsed time: 42960 ms

Ouch! Clearly icc has trouble with 64-bit math. But otherwise, icc outperforms gcc 3.3 in the other respects being tested, particularly when profile guided optimization is used.

Perl
by MobyTurbo on Thu 8th Jan 2004 23:43 UTC

If I recall correctly, the Camel book says that Perl is also byte-compiled internally before execution, like Python (and even Tcl nowadays). Benchmarks usually show Perl being a bit faster than Python, though I don't know if Perl has an equivalent of Psyco (the Python native compiler used in the benchmark).

Some results...
by cheezwog on Thu 8th Jan 2004 23:51 UTC

The C benchmark compiled -O2 on AthlonXP 1.4ghz. Fedora core1.
gcc version 3.3.2
gcc -O2 Benchmark.c

Int arithmetic elapsed time: 8330 ms
Double arithmetic elapsed time: 7850 ms
Long arithmetic elapsed time: 20810 ms
I/O elapsed time: 21750 ms

(I could not get the trig benchmark working, so left it out)

It's interesting how much faster the int, double and long benchmarks are than his results... the Athlon can really crunch those numbers compared to a Pentium 4M 2GHz. I/O is slower, though.

Compiled with -O3 it gets slightly slower!

Int arithmetic elapsed time: 8320 ms
Double arithmetic elapsed time: 7860 ms
Long arithmetic elapsed time: 20840 ms
I/O elapsed time: 21850 ms

Could these better results be due to running gcc on native Linux, or is it the different processor?

@Raynier
by RoyBatty on Fri 9th Jan 2004 00:07 UTC

Structs and primitives are value types and are allocated on the stack, but they can be treated as objects too. The compiler automatically creates an object if needed instead of you having to do it. The main benefit, obviously, is that you only pay for what you use. I thought you were referring to the way in Java that you have to use wrapper classes for the primitives. Java, obviously, doesn't have structs. Classes are always on the heap in C#.

Tested using J2SE v 1.4.2_03 on a dual Athlon MP 2.0GHz running Linux 2.6.0.

The code was compiled with javac -g:none and executed with java -server:

Int arithmetic elapsed time: 7271 ms
Double arithmetic elapsed time: 11501 ms
Long arithmetic elapsed time: 23017 ms
Trig elapsed time: 77649 ms
IO elapsed time: 3418 ms
Total Java benchmark time: 122856 ms

Well, Java trumps icc on 64-bit math, but thoroughly loses everywhere else, especially the floating point and trig benchmarks.

RE: Hong Zhang (IP: ---.SNVACAID.covad.net) - Posted on 2004-01-08 20:04:14
by ChocolateCheeseCake on Fri 9th Jan 2004 00:24 UTC

The default math library is compiled with -O0 to preserve strict IEEE semantics. In fact, with minor change to the source code, -O2 will work as well. Java has two math libs, Math and StrictMath. They are default to the same implementation. But JVM is allowed to use faster/less accurate version of Math. The VC++ uses loose math (x86 trig instructions directly).

I assume you're referring to floating point precision. Java by default follows the IEEE 754 international specification; however, Java has also allowed for EXTENDED PRECISION on platforms that support it.

When you say "faster/less accurate version of Math", I assume you are referring to the standard library that is used.

@Roy Batty
by Rayiner Hashem on Fri 9th Jan 2004 00:38 UTC

Ah, I see. In Lisp etc., there is no distinction between stack-allocated primitive types and heap-allocated classes. The compiler will automatically determine where to allocate the object to maximize performance. Also, the compiler doesn't box/unbox primitives at runtime, but decides at compile time which objects should be boxed and which should be unboxed.

Linux results GCC 3.3.2 Athlon XP 2400
by Vincent on Fri 9th Jan 2004 00:43 UTC

gcc -lm -O2 Benchmark.c
Int arithmetic elapsed time: 6240 ms
Double arithmetic elapsed time: 5920 ms
Long arithmetic elapsed time: 16370 ms
Trig elapsed time: 3370 ms
I/O elapsed time: 890 ms

Total elapsed time: 32790 ms

gcc -lm -O0 Benchmark.c
Int arithmetic elapsed time: 8780 ms
Double arithmetic elapsed time: 9470 ms
Long arithmetic elapsed time: 18920 ms
Trig elapsed time: 3650 ms

Total elapsed time: 41930 ms

Looks like Cygwin is a lot slower.

Profiling the Profiled code
by MikeDreamingofabetterDay: OS X on Fri 9th Jan 2004 00:45 UTC

I only wish. I've been on three large Visual C++ 6 projects (100 to 300 classes). The VC++ compiler generated broken release code. We shipped the "Debug build".

Could not convince the product manager to buy better tools or allocate any time to find the problem. Many memory leaks were from Microsoft's MFC classes.

My point is, Java code profiling on the fly is the better solution, mostly because 99% of the programmers out there will never get the chance to profile their code, if they even know how to do it. Managers won't spend the time or money.

Linux results GCJ 3.3.2 Athlon XP 2400 (Java)
by Vincent on Fri 9th Jan 2004 01:01 UTC

gcj-3.3 -O2 --main=Benchmark Benchmark.java
Int arithmetic elapsed time: 6220 ms
Double arithmetic elapsed time: 5914 ms
Long arithmetic elapsed time: 16485 ms
Trig elapsed time: 26012 ms
IO elapsed time: 10229 ms

Int, Double and Long run at the same speed as GCC; IO and trig are a lot slower than GCC.

For those who care...
by Bascule on Fri 9th Jan 2004 01:14 UTC

I've compiled my results into an easier-to-interpret format, and drawn some different conclusions than I posted here:

http://fails.org/benchmarks.html

In reply to MikeDreamingofabetterDay...

My point is, Java code profiling on the fly, is the better solution.

The primary drawback of Java's run-time profiling is that all optimizations are discarded when the application exits. Profiling really helps optimize code which spends most of its time executing in a small number of places within the executable. Consequently, large applications which do an elaborate amount of startup processing take an additional performance hit from run-time optimization, in that the startup code will only be touched once but the run-time's optimization code still attempts to determine how best to optimize it. Eclipse and NetBeans certainly come to mind... their start-up times are an order of magnitude worse than any other IDEs I've used.

Profile guided optimization, on the other hand, is a one-time process, and the optimizations are permanent to the binary, thus no performance loss is incurred.

Mostly because 99% of the programmer's out there will never get the chance to profile their code, if they even know how to do it.

Profiling should be (and often is) an additional automated function of the unit testing process. Intel's icc can take a number of profiles from a number of different test runs and compile the collective results (a separate .dyn file is generated for each run of the instrument executable) to determine the best way to optimize the given module when a release build is performed.

I've never used Microsoft Visual C++ on a large project, but your woes there are not really pertinent to the use of profile guided optimization.

Object Performance
by Per Arneng on Fri 9th Jan 2004 01:15 UTC

Object-oriented performance is really important with OO languages: creating and destroying objects, casting, and stuff like that.

@Bascule
by mario on Fri 9th Jan 2004 01:18 UTC

Your tests are indeed interesting, but what I think is the main point is, Java, generally speaking, doesn't lag behind significantly! We're not talking orders of magnitude here, it's the same ballpark!

On boxing/unboxing in Java
by LinuxBuddy on Fri 9th Jan 2004 01:19 UTC

@Rayiner
@RoyBatty

On boxing/unboxing in Java: yes, you are right that this can certainly be done. I believe the JDK 1.5 HotSpot is going to be doing this at some level. As I said, it isn't the case that Java can't go faster with better optimizations, just that such optimizations have to be done, thus adding to the complexity of the runtime.

These are language-level, structural issues that C just doesn't have to deal with. C's simple, "assembler with loops" sort of orientation is both a blessing and a curse. It's a blessing when it comes to optimization as you don't have these sorts of constraints to deal with, and frankly, the language leaves lots of implementation-dependent behavior to exploit. Java is more constrained, which eliminates broad classes of bugs that are very difficult to debug, but in return, the language exacts an overhead which the JVM compilers all seek to reduce to near-zero. Put another way, it's a lot easier to write a passable C compiler than a passable Java VM (though very difficult to write sophisticated versions of either).

Again, I love Java. It's my main programming language. I love its relatively small and simple language design and its resemblance to C (probably my next favorite language). With CPU speeds increasing and Java VMs just getting better and better, I find myself programming almost exclusively in Java now.

Linux results MCS 0.26 Athlon XP 2400 (Mono)
by Vincent on Fri 9th Jan 2004 01:21 UTC

mcs Benchmark.cs

Int arithmetic elapsed time: 9955 ms
Double arithmetic elapsed time: 21385 ms
Long arithmetic elapsed time: 55066 ms
Trig elapsed time: 3707 ms
IO elapsed time: 20949 ms

Total C# benchmark time: 115636 ms

@Bascule
by jizzles on Fri 9th Jan 2004 01:24 UTC

"Agreed. One of the first things I ever wrote in Java (about 9 years ago) was an implementation of IDEA, and I quickly learned why lack of unsigned types was a bad thing. I ended up using signed 32-bit integers to emulate unsigned 16-bit integers...."

Characters are unsigned 16-bit quantities; they are the only unsigned type in Java. Why didn't you use them?

@Bascule
by LinuxBuddy on Fri 9th Jan 2004 01:27 UTC

Java really dropped the ball on mathematical code. C at least has a rationale behind why things like exp() and log() are functions rather than part of the language syntax. C is designed to be a language with a relatively simple mapping between language syntax and processor features. Java/.NET could have made exponentiation a language feature rather than a library function... after all, they certainly aren't bound by the limitations of processors. Instead we find these sorts of things in java.lang.Math and System.Math because they are clinging to C/C++'s legacy rather than thinking about the rationale behind the C language syntax and how the syntax could be better designed when a simple mapping between processor features and language syntax isn't required.


You're right from a syntax perspective, but there is no reason that speed has to suffer. A given JVM may optimize various library calls into inlined, optimal instruction sequences. This is done in some JVMs for basic java.lang classes (like String handling, etc.). Your point about not having inline operators that make your code readable is very true, however.

visual c++ on windows is fast
by Anonymous on Fri 9th Jan 2004 01:29 UTC

Visual C++ is fast on Windows; is that surprising?

I just hope that people will not conclude that gcc is slow in general. gcc is a lot faster on Linux: for example, the benchmark for C (AMD 1800+, Linux, gcc 3.3.1mdk) took a total of 54ms (41ms with -O2 -march=athlon-xp).

@Rayiner
by LinuxBuddy on Fri 9th Jan 2004 01:33 UTC

Ah, I see. In Lisp/etc, there is no distinction stack-allocated primitive types and heap-allocated classes. The compiler will automatically determine where to allocate the object to maximize performance. Also, the compiler doesn't box/unbox primitives at runtime, but decides at compile-time what objects should be boxed and which should be unboxed.

Right. In almost every Lisp implementation, every value travels along with its type. Typically, a few of the low-order bits are used to encode the type. There is no real distinction between "primitive" types and other types when it comes to function calls, etc.

Platform matters too
by jizzles on Fri 9th Jan 2004 01:33 UTC

HotSpot is heavily optimized for Solaris/SPARC, being Sun's flagship platform and all. GCC is targeted towards x86 mostly (although I will stop short of saying it is heavily optimized, because honestly, it isn't).

Compare the same benchmarks on a Solaris-sparc system, especially a large-scale system, and you might find some very interesting results.

@LinuxBuddy
by Rayiner Hashem on Fri 9th Jan 2004 01:56 UTC

Right. In most every Lisp implementation, every value travels along with its type. Typically, a few of the low-order bits are used to encode the type.
--------------
This isn't necessarily correct. In the general case, every object has a header describing its type, just like Java/C# classes. However, there are a number of optimizations to this general case.

- Some implementations store certain special types (integers, cons cells, etc) right in the pointer word, with some bits reserved as a type tag.

- Some implementations don't bother with tag bits, and instead use an analysis that determines when an object doesn't need to be a full object. For example, when you use an integer as a loop counter, you can just use a regular (untagged) machine word.

- Some implementations support type specialization, and generate type-specialized versions of functions, like C++ templates do.

Thus, even though the programmer always deals with objects, the generated machine code will often deal directly with machine types. So it's not strictly true that every value travels with its type. For the numeric benchmarks in these articles, for example, the machine code would deal with regular floats.
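A crude sketch of the tag-bit trick (not any particular implementation's actual scheme, and the names are made up): the low bit marks an immediate fixnum, and heap pointers, being word-aligned, always have that bit clear.

```cpp
#include <cassert>
#include <stdint.h>

// Low-bit tagging: odd word == immediate fixnum, even word == heap pointer
// (pointers are word-aligned, so their low bit is always 0).
const intptr_t FIXNUM_TAG = 1;

intptr_t box_fixnum(intptr_t n)   { return (n << 1) | FIXNUM_TAG; }
bool is_fixnum(intptr_t word)     { return (word & 1) == FIXNUM_TAG; }
intptr_t unbox_fixnum(intptr_t w) { return w >> 1; }  // arithmetic shift restores the value
```

The price is one bit of integer range; the payoff is that small integers never touch the heap at all.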

for those interested (benchmarking)
by djamé on Fri 9th Jan 2004 02:59 UTC

I have put on line the famous unixbench from the magazine byte :
you have to tweak the makefile in order to get the best optimisation. For myself, I put this:

OPTON = -O3 -fomit-frame-pointer -fforce-addr -fforce-mem -ffast-math \
        -march=i686 -mcpu=i686 -pipe -malign-loops=2 -malign-jumps=2 -malign-functions=2

it can be found at http://www.loria.fr/~seddah/bench.tar.gz

I know this is a bit off-topic, but I would like to see results from various systems. I will put mine in this forum for a Celeron 450 running Mandrake 8.0 and for an old PowerBook running LinuxPPC 2000.
If someone could make this bench run under Mac OS X, that would be great.



Cheers

Djamé

v ...
by Anonymous on Fri 9th Jan 2004 03:05 UTC
v ...
by Anonymous on Fri 9th Jan 2004 03:12 UTC
v ...
by Anonymous on Fri 9th Jan 2004 03:35 UTC
Java 1.5
by Anonymous on Fri 9th Jan 2004 03:39 UTC

I'm curious to know whether anyone has checked out the Java 1.5 Alpha?

http://java.sun.com/developer/earlyAccess/j2sdk150_alpha/

IDE slow startup time in Java
by MikeDreamingofabetterDay: OS X on Fri 9th Jan 2004 03:45 UTC

Just one point: Java's long startup time is caused by Java doing something most languages don't do, namely class verification. In essence, a scan of the class files to be sure they haven't been tampered with.

Again, if you "get" Java you put up with the "time" issue for the benefits you get from the language: the Java security model and the productivity of its huge class library.

v ...
by Anonymous on Fri 9th Jan 2004 03:48 UTC
Linked to Windows Libraries?!
by Chris on Fri 9th Jan 2004 04:04 UTC

The trig test is pointless: the Windows libraries aren't compiled with gcc. So is the I/O test.

I am astonished that c# performs so well though.

How not to write a benchmark....
by Paris Hilton on Fri 9th Jan 2004 04:05 UTC


Your code does not test for successful completion and accurate results!

All that is needed to win your benchmark is a library like this:

double tan(double x) { return 1; }
double sin(double x) { return 1; }
double cos(double x) { return 1; }

and so forth. What is the value of fast but wrong answers?

Unfair test-environment
by vilarz on Fri 9th Jan 2004 04:10 UTC

Testing gcc under Cygwin and against Windows libraries isn't actually fair, is it? You test Visual C++, which is quite good natively; why not test gcc under a POSIX environment, for example Linux, too? Perhaps testing Visual C++ under Linux with Wine should be done as well?

RE:RE : All .net languages will perform the same
by snorkel on Fri 9th Jan 2004 04:17 UTC

Of course if you write crappy code in VB.net and good code in C# that does the same thing, yes you will get different results. The point is if you write similar code that takes advantage of a particular .net language you are going to get almost identical results, which is what this benchmark reported.
Understand....

He forgot three important languages
by Michael David on Fri 9th Jan 2004 05:28 UTC

He left out three of the best languages - DELPHI, EUPHORIA and Assembly. Believe me they are really fast as hell - Especially DELPHI.

Can anyone give some benchmarks with these three languages?

Re: Paris Hilton
by Bascule on Fri 9th Jan 2004 06:27 UTC

How not to write a benchmark.... Your code does not test for successful completion and accurate results!

(...much like the real Paris Hilton, a basic conceptual understanding is present but a knowledge of details is lacking...)

As long as you're using standard runtimes or linking against standard libm's, there really isn't going to be a problem.

Attempting to check the results may be especially problematic in certain areas, due to floating point round off error, unless you're doing all your testing on platforms with IEEE floats.
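In Python terms, the tolerance-based comparison being suggested could look like this (the 1e-12 perturbation is an invented stand-in for cross-platform round-off):

```python
import math

# Two results that agree to many decimal places but differ in the last
# bits, as can happen between libm implementations on different platforms.
a = math.sin(1.0)
b = a + 1e-12          # simulated round-off difference

assert a != b                              # exact comparison fails
assert math.isclose(a, b, rel_tol=1e-9)    # tolerance-based check passes
```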

dougs shootout
by pel on Fri 9th Jan 2004 06:38 UTC

There is an updated version, not maintained by Doug, of Doug's computer language shootout.
Relaxen und watchen die numbers

http://dada.perl.it/shootout/craps.html
http://dada.perl.it/shootout/craps2craps.html

more benchmarking
by bengt kleberg on Fri 9th Jan 2004 07:26 UTC

a rather well-known benchmarking paper is the Kernighan and Van Wyk micro-benchmarks ( http://www.ccs.neu.edu/home/will/Twobit/KVW/kvwbenchmarks.html ).
it helped create The Great Computer Language Shootout ( http://www.bagley.org/~doug/shootout ).


bengt

Unbaffling you on Java vs. compiled C++
by Anonymous on Fri 9th Jan 2004 08:41 UTC

I may be able to help you a touch on understanding why the JDK beats C++ in some cases. It comes down to one of two issues:

(1) Object allocation.
Object allocation is highly tuned in Java because the goal is to encourage solid object-oriented programming, which means a LOT of object creation. C memory allocation and C++ object allocation tend to be abysmal. (Although to be fair, MSFT's C on MSFT's OS is about the best at it I've seen, coming close to Java performance for some tests.)

(2) Optimization
Quite simply, there is more information available at run-time on exactly how code is going to be used than you have through static analysis at compile time. Hotspot gets its name from the fact that it sports a very sophisticated profiler built into the run-time that analyzes actual code use and comes up with best-case optimizations. Some of these optimizations would be impossible at compile time because, although they are based on likely code behavior, they could be dangerous if an optimizer guessed wrong. Being a run-time optimizer, however, Hotspot can pursue these optimizations and then back them out if it sees the critical condition occur.

A key example is what we call "aggressive inlining". Hotspot will inline any method call for which there is only one possible target from the current call site. This is done by tracking the currently loaded class hierarchy. If the hierarchy changes (through late binding), then those inlines are backed out. This means that Java can effectively get rid of all v-table calls except those where the call site actually is polymorphic.

Author's Reply, part 1
by Chris Cowell-Shah on Fri 9th Jan 2004 08:59 UTC

I expected plenty of comments and criticism, and you folks didn't disappoint! I do appreciate all of the suggestions on how to improve the benchmark. I'd like to respond to the questions or criticisms that either arose most often or that seem most significant. Some of the comments point to real methodological flaws, while others seem to come from a lack of understanding about what I was trying to achieve (which is probably my fault for not being clearer). Many of the complaints could have been avoided if I had included more detailed explanations of my testing procedures and their justifications, but I didn't want the article to get too long or too dry. So in no particular order, here we go...

Why didn't you include my favorite language? It's [faster|newer|cooler] than any of the languages you picked.
I had to limit the number of languages somehow, so I put together what I hoped was a representative selection of the major languages in use today. Also, I had limited time and limited skills. Sure I could have added Perl or Fortran or whatever, but then I would have had to learn enough Perl or Fortran to code the benchmark. Before starting this project, the only language I knew well enough to code even this simple benchmark in was Java/J#. Besides, if anyone is really interested in seeing how AppleScript, Standard ML of New Jersey, or Visual C# running on Mono compare, I invite you to adapt my code and run it yourself. Porting over the benchmark should be trivial if you already know the language, and I'd love to see more results (particularly if you use Lisp, Forth, Perl, or Ruby).

Why not use other C compilers? There are a ton of them out there.
See above.

Why didn't you test on AMD hardware, or on a Solaris box?
The only machine I had ready access to was my trusty P4 laptop.

The GCC C code is going to run faster in a POSIX environment, linked to glibc instead of Windows libraries. Why didn't you run it on Linux?
Lack of time, lack of space on my hard drive to dual-boot even a minimal Linux distro. I did run the gcc code within Cygwin, linked to the Cygwin libraries (I assume Cygwin uses glibc, but don't know for sure). I didn't post those results since they were nearly identical to the results of the gcc code linked to Windows libraries, but in retrospect I should have included them in my report.

You didn't really test a fully interpreted language. Python gets compiled down to bytecode by the Python interpreter, so it doesn't count. Why not include Perl or PHP?
Good point. I didn't realize that any compilation was going on at all with Python until I read about it here. So yes, it would be instructive to see Perl results (assuming it really is fully interpreted--there seems to be some debate here on that point). But I don't know Perl and am trying my best never to learn it.

All .NET languages should perform the same. Why did you benchmark all four of them?
Because I wanted to see if Microsoft is telling the truth when they say that functionally identical code gets compiled into identical MSIL code. It turns out that, for the most part, it does.

You can't be a serious .NET programmer if you don't even know how to start an unmanaged Visual C++ project!
You're right. I'm not. But now I know how to do it, thanks. I considered using Visual C++ 6 instead, but ultimately decided to just stick with whatever languages Microsoft's throwing their weight behind now, and that's the .NET languages.

It's unfair to test Java 1.4.2 with the -server flag, but Java 1.3.1 with the -client flag. Everyone knows that the -server version of the JVM runs bytecode faster than the -client version (at the expense of slightly longer startup time).
I was astonished to see that the JVM included with the 1.3.1 SDK doesn't have a -server version. The only flag available for setting the JVM version is -hotspot, which is the default JVM for 1.3.1 anyway. Install a 1.3.1 SDK, type "java -help" and see for yourself. Maybe they had the -server option in earlier versions of 1.3.1--I used 1.3.1_09.

Why is it surprising to see Java perform well? The bytecode is compiled by the JVM before (or as) it runs, after all.
It's surprising only because everyone thinks Java is slow. This is probably because early versions of Java really were slow, but I think we're now witnessing a case of perception lagging behind reality.

Java 1.4.2 is slow on the trig component because it's using the StrictMath package, which trades speed for accuracy.
Well, maybe. I called the Math package, which (as stated in the Javadoc) may or may not in turn call the StrictMath package. So I don't really know what's going on behind the scenes. I did randomly compare results out to eight decimal places or so and got the same trig results for all languages.

You're not being fair to VB--you're using its native I/O functions instead of using the standard CLR I/O classes.
You're probably right, but what I did was... hang on. This requires a detour for a second. I'll come back to this after the next comment.

You said the only language you knew before writing these benchmarks was Java. Then what right do you have to call these real benchmarks? There are probably all sorts of optimizations that you didn't know about and didn't use--real programmers understand their languages better and know how to squeeze performance out of them. No one codes production-quality code after spending a single afternoon learning a language!
I beg to differ. For better or worse, tons of people code exactly like that. In my industry (IT consulting), virtually everyone does! It's absolutely routine to be given a programming task, a language, and an unrealistic deadline. You're expected to learn what you can from on-line help, whatever tutorials you can scrounge up on the net, and O'Reilly books, and cobble together code that works. In an ideal world we'd have loads of time to optimize everything and profile the heck out of code before releasing it, but on an actual project that's very rare. At least, that's been my experience. So I treated these benchmarks the same way: pick up a book, learn how to do what you need to do, spend a little time making sure you're being smart about stuff (use buffered I/O streams in Java, for example), but don't expect it to be 100% optimized. Then move on to the next language. My results won't duplicate results derived from laboratory conditions, but they should be close to real world (or at least, IT consulting world) results. This is a point I should have made much, much clearer in the article, and I'm sorry for the confusion I caused by not making it more explicit.

You never answered the VB I/O question!
Right. I learned how to do VB I/O by using the built-in help in Visual Studio .NET 2003. I did what it told me to do. If it told me to use native VB I/O classes, that's what I did. If I had spent a lot more time poking around I might have been able to figure out how to use more efficient CLR classes, but that route was non-obvious and I had no way of knowing whether its code would have been faster without actually trying it. Again: I was trying to replicate real-world, time-constrained, scenarios with programmers who know the basics but are by no means experts. Having said all that, I appreciate the advice about speeding up VB I/O. Some day I may re-code with that change in mind.

continued in next post...

Author's Reply, part 2
by Chris Cowell-Shah on Fri 9th Jan 2004 09:01 UTC

...continued from previous post

These results are not indicative of anything. Real programs do more than just math and I/O. What about string manipulation? What about object creation? etc.
The short answer: of course you're right. But most programs do some math and some I/O, so these results will be at least somewhat relevant to virtually any program. Besides, I made liberal use of the phrase "limited scope" and even titled the article "Math and File I/O" so no one could claim false advertising! The longer answer is more interesting, but probably also more controversial. I think it's fair to say that there are two camps when it comes to benchmarking: the "big, full-scale application benchmark" camp and the "tiny building block benchmark" camp. The arguments used by members of each camp go like this. Big is more accurate in that it tests more of the language and tests complex interactions between the various parts of the language. That's why only large applications like the J2EE Pet Store (later copied by Microsoft to demonstrate .NET performance) are helpful. But wait, says the other camp. Small is more accurate because it tests common components that all programs share. Big is useless because it covers performance for your program, not mine. Mine may use very different parts of the language than yours, hence show very different results. Performance results gleaned from a database-heavy application like Amazon's on-line catalogue can tell us nothing about what language to use when coding a CPU-intensive Seti@Home client. No no, the big camp retorts, small is useless because it doesn't really do much, and what it does do reduces to near-identical calls to the OS or basic CPU operations. Small doesn't let differences between various languages show through, because the aspects that are unique to each language are not tested. My own take on the issue is this: all of these points are true, and they suggest that the only worthwhile benchmarking is lots of different benchmarks, written on different scales, testing different things. Throw together enough different sorts of benchmarks and you'll end up with something useful.
The benchmark I presented here falls within the "small benchmark" camp simply because small benchmarks are a whole lot quicker and easier to write than big benchmarks. But I've presented just one (or two, if you split up math and I/O) benchmark. These results are not useless by any means, but they become a whole lot more useful when they are combined with other benchmarks with different scopes, testing different aspects of languages (such as string manipulation, object creation, collections, graphics, and a gazillion others). And while my project can certainly be criticized for being "too small," keep in mind that different languages do produce different results under this benchmark, so it is showing some sort of difference between the languages. In other words, I don't think it's too small to be at least a little helpful.

The compile time required for JIT compilers (like a JVM) approaches zero when it's amortized over the time that a typical server app (for example) runs. Shouldn't you exclude it from your test?
Good point; I hadn't thought of that. Next time I will probably exclude it by calling each function once before starting the timer.
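The warm-up pattern described here is simple to sketch in Python (the helper name is made up; the same idea applies in any language with a JIT or caching layer):

```python
import time

def timed(func, *args):
    """Call func once untimed to absorb any one-time compilation or
    caching cost, then time a second call."""
    func(*args)                      # warm-up run, excluded from timing
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, elapsed = timed(sum, range(100000))
```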

Java should perform about the same as C++, and an unmanaged C program should perform better than a managed .NET program. Why run benchmarks when we all know how they'll turn out?
Because theory isn't always borne out in reality.

The sorting criterion (using the total instead of a geometric average) is unusual, and favors languages that optimize slow operations.
I did not know about the geometric mean technique, but am very interested in hearing more about it. I had no idea how best to weight the various components of the benchmark, so figured the easiest thing to do was to weight them equally and just add them all up. Some may complain that since the trig component is relatively small, it should be given less weight in the final tally. But I would respond that it's not small for all languages. The trig component for Java 1.4.2 takes longer than all of that language's other components combined. But the real answer to the problem of sorting and analyzing the results is simple: if people want to massage the raw data differently (maybe you never use trig in your programs, so want to exclude the trig results entirely), go for it! And be sure to tell us what you come up with.
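For readers curious about the geometric-mean technique, here is a small Python illustration with made-up component times, showing how a simple total and a geometric mean can rank two languages differently:

```python
import math

def geometric_mean(times):
    # nth root of the product, computed via logs to avoid overflow
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical per-component times in ms: language A has one slow outlier,
# language B is uniformly mediocre.
lang_a = [100, 100, 100, 1000]   # total 1300 -- loses on a simple sum
lang_b = [300, 300, 300, 300]    # total 1200 -- wins on a simple sum

print(geometric_mean(lang_a))    # ~177.8 -- but wins on the geometric mean
print(geometric_mean(lang_b))    # ~300.0
```

Because the geometric mean weights each component's ratio rather than its absolute size, one slow outlier cannot dominate the score the way it dominates a sum.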

You should use more than 3 runs, and you should provide the mean and median of all scores.
I actually did more like 15 to 20 runs of each benchmark, with at least 3 under tightly controlled conditions. I was a little surprised to find that there were virtually no differences in results regardless of how many other processes were running or how many resources were free. I guess all the other processes were running as very low priority threads and didn't interfere much. I deliberately included only the best scores rather than the median because I didn't want results skewed by a virus scanner firing off in the background, or some Windows file system cache getting dumped to disk just as the I/O component started. I figured the best case scenario was most useful and most fair.
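Taking the best of several runs is in fact the convention the Python standard library's timeit module follows; a minimal sketch:

```python
import timeit

# repeat() returns one timing per run; taking the minimum discards runs
# inflated by virus scanners, cache flushes, and other background noise.
runs = timeit.repeat("sum(range(1000))", repeat=5, number=1000)
best = min(runs)
```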

Why didn't you use a high-speed math package for Python, such as numpy or numeric?
I didn't know about numpy or numeric. I probably should have used a high-speed math package, assuming it would be something that a new Python programmer could find out about easily and learn quickly.

Shouldn't stuff like this be peer reviewed before being posted?
This ain't Nature or Communications of the ACM--I figure the 100+ comments I received do constitute a peer review! ;) Nevertheless, I like your idea of a two-part submission, with methodological critique after part 1 and results presented in part 2. I'll remember that for next time.

Your compile-time optimization is inconsistent. E.g., why omit frame pointers with Visual C++ but not gcc C?
Because Visual Studio had an easy checkbox to turn on that optimization, whereas the 3 minutes I spent scanning the gcc man page revealed -O3 but not -fomit-frame-pointer. Similarly, I compiled Java with -g:none to strip debugging code but didn't mess with memory settings for the JVM. Someone who programs professionally in C/C++ (or knows more about Java than I do) could have hand-tuned the optimizations more successfully, I'm sure.

Your C++ program is really just C! What gives?
I don't know C++. I taught myself just enough C (from an O'Reilly book) to code the benchmark. So yes, the C++ benchmark is running pure C code. From my rudimentary knowledge of C vs. C++, I assumed that there were no important extensions to C that would produce significantly different performance over straight C for low-level operations like this, so I stuck to straight C. I called it a "Visual C++" benchmark because it was compiled by Microsoft's C++ compiler. And if C++ really is a superset of C (please correct me if that's not the case--I could be very wrong), then a C program is also a C++ program.

Your trig results are meaningless because you don't check the accuracy of the results. You could be trading accuracy for speed.
Mea culpa--I did sample the trig test results to compare accuracy across languages; they're all equally accurate (at least, to 8 decimal places or so). I forgot to explain that in the article.

Again, thanks for all of the comments. I've learned a lot from your suggestions, and future benchmarks I may run will certainly benefit from the collective experience of all of the posters.

-- Chris Cowell-Shah

Python numbers
by error27 on Fri 9th Jan 2004 09:05 UTC

Python is slow in these tests because it has to look up each variable every time.

For example: i = i + 1. In Python, i is a name stored in a hash table that points to an object; in the C code it's just a memory location on the stack. Of course, the slowdown does affect real life, but maybe not as much as it affects this benchmark.

The long test is unfair to Python because Python has true big-number support (limited only by how much memory you have). In C and Java, a long is only around 64 bits; in those languages there are special libraries for really large numbers. An apples-to-oranges situation.
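The big-number point is easy to demonstrate in Python, where integers grow past 64 bits transparently:

```python
# Python integers grow past 64 bits transparently; a C or Java signed
# long long / long stops at 2**63 - 1 (wrapping or overflowing beyond).
big = 2 ** 63                  # one past the signed 64-bit maximum
assert big + 1 > big           # still exact arithmetic, no wraparound
assert (2 ** 100) % 7 == 2     # correct even at 100+ bits
```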

Python does OK in the trig test because all those functions are implemented in C. It still suffers from the variable name lookup problem, though.
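The name-lookup overhead described above can be reduced by binding the function to a local variable before the loop, a well-known Python idiom; a minimal sketch:

```python
import math

def slow(n):
    # math.sin is re-resolved via dictionary lookups on every iteration
    total = 0.0
    for i in range(n):
        total += math.sin(i)
    return total

def fast(n):
    sin = math.sin          # hoist the lookup into a cheap local variable
    total = 0.0
    for i in range(n):
        total += sin(i)
    return total

assert slow(1000) == fast(1000)   # same answer, fewer lookups
```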

RE: Your C++ program is really just C!
by Arthur Dent on Fri 9th Jan 2004 10:56 UTC

"I don't know C++. I taught myself just enough C (from an O'Reilly book) to code the benchmark. So yes, the C++ benchmark is running pure C code. From my rudimentary knowledge of C vs. C++, I assumed that there were no important extensions to C that would produce significantly different performance over straight C for low-level operations like this, so I stuck to straight C. I called it a "Visual C++" benchmark because it was compiled by Microsoft's C++ compiler. And if C++ really is a superset of C (please correct me if that's not the case--I could be very wrong), then a C program is also a C++ program."

Yes, but that is a poor excuse. The C# benchmark uses a class, and the C++ class would have been pretty much the same, bar the main function being external to the class.

You need to look at the "iostream" C++ standard library header (you may find it as "iostream.h" in some older environments). Look at the cout stream object, and note that it completely replaces printf, as iostreams replaces stdio.

for example:

printf("my value %d", d);

vs.

cout << "my value " << d;

There is obviously some different code going on here, and there *will* be a different result, if only fractional.

There will be other things you could have done too.

.NET benchmarks are illegal
by Per Arneng on Fri 9th Jan 2004 11:56 UTC

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dn...

snip --
You may not disclose the results of any benchmark test of the .NET Framework component of the OS Components to any third party without Microsoft's prior written approval. Microsoft retains all right, title and interest in and to the OS Components. All rights not expressly granted are reserved by Microsoft.
snip ---

Sick but true...

Should include Fortran95
by Clive Page on Fri 9th Jan 2004 12:07 UTC

When doing numerical work, any list of _modern_ computer languages is incomplete without Fortran 95. Available on a wide range of platforms, Fortran is still very widely used in scientific and technical applications, and Fortran 2003 will be fully object-oriented.

If I had time I'd be happy to try these benchmarks on a few modern F90 compilers, but I do think they are a bit naive. Just one example: memory utilisation is an important factor in many problems: you need a few tests using large arrays or matrices.

> Last of all, I'd like to draw the author's attention to the
> .Net framework EULAs... It is in fact a violation of the
> EULA to produce benchmarks of this sort of .Net against
> other platforms. Which is why they haven't been done all
> over the place by now ;)

Such a requirement would be illegal in Europe, and more so in the US; it's against the right to free speech.

Re: .NET benchmarks are illegal
by Fabio Alemagna on Fri 9th Jan 2004 12:25 UTC

EULAs are not the law, thus this benchmark is not illegal. On the other hand, it's very likely that that clause in the EULA is against the law, and as such invalid.

other benchmark
by Drazen Gemic on Fri 9th Jan 2004 12:30 UTC

Sorting would be interesting too.

DG

Intel C compiler
by Mig on Fri 9th Jan 2004 12:39 UTC

Why isn't the Intel C compiler benchmarked along with the others? I'm sure it would give the best result.

Reply to authors comments
by Ben Maurer on Fri 9th Jan 2004 12:57 UTC

Hello,

Although I see where you are coming from with your comments, I am not quite sure they are correct.

Your benchmark is designed to measure intensive math and I/O. Thus, it is designed to be useful for a programmer who (gasp) is doing intensive math and I/O.

In such a program, the programmer would normally take the initiative to make his program as fast as possible. If he saw that the I/O in his VB program was 3x as slow as the raw I/O speed achieved by C, he very likely would have profiled his program. Running the VB program under the CLR Profiler makes it pretty clear what is up.

Your argument that `Again: I was trying to replicate real-world, time-constrained, scenarios with programmers who know the basics but are by no means experts. Having said all that, I appreciate the advice about speeding up VB I/O. Some day I may re-code with that change in mind.' is pretty much invalid, then. If you want to simulate the performance for newbie programmers in each language, then that is what you should title your article.

Remember, the VS.net tutorials are not designed for writing high performance apps. They are designed to get you off the ground when designing something.

C/C++ Math Performance
by deSelby on Fri 9th Jan 2004 13:01 UTC

I suppose this might be the right place to mention that C/C++ has libraries available like Blitz++ that can make scientific and mathematical operations faster than in Fortran.

blah...
by Anonymous on Fri 9th Jan 2004 13:25 UTC

#include <ctime>
#include <iostream>

using namespace std;

template <class T> T arithmetic(const T min, const T max, double& rtime);

int main()
{
const int int_min = 1;
const int int_max = 1000000000; // 1B
const double double_min = 10000000000.0; // 10B
const double double_max = 11000000000.0; // 11B
const long long ll_min = 10000000000; // 10B
const long long ll_max = 11000000000; // 11B

cout << "start c++ benchmark" << endl;

double time;

arithmetic(int_min, int_max, time);
cout << "int arithmetic elapsed time: " << time << " ms. min=" << int_min
<< ",max=" << int_max << endl;

arithmetic(double_min, double_max, time);
cout << "double arithmetic elapsed time: " << time << " ms. min=" << double_min
<< ",max=" << double_max << endl;

arithmetic(ll_min, ll_max, time);
cout << "long long arithmetic elapsed time: " << time << " ms. min=" << ll_min
<< ",max=" << ll_max << endl;

cout << "stop c++ benchmark" << endl;

return 0;
}

class auto_clock
{
clock_t t0;
double& r;
public:
auto_clock(double& rtime)
: r(rtime)
{
t0 = clock();
if (t0 < 0) throw;
}

~auto_clock()
{
clock_t t1 = clock();
if (t1 < 0) throw;
r = (t1 - t0) * 1000.0 / CLOCKS_PER_SEC;
}
};

template <class T> T arithmetic(const T min, const T max, double& rtime)
{
auto_clock ac(rtime);

T r = 1;
T i = min;

while (i < max) {
r -= i++;
r += i++;
r *= i++;
r /= i++;
}

return r;
}

I ran these "benchmarks" on my Linux PIII/600MHz, 320MB RAM box: gcc 3.3.2 versus Java 1.4.2 (Sun). The results are nearly the same, gcc versus Java:
int: gcc 21.0s : java 21.9s
double: 23.6s : 33s
long: 71s : 69.9s
trig: 14s : 393s !!!!!!
I/O: 2.5s : 40.7s !!!!!

And what about memory allocation (creating/deleting objects)!!!!

someone mentioned this ....
Java actually knows more about how the code is used, which in theory should let it reach better performance than c/c++
bla bla bla ....

this is incredible ... :-))))



err
by KLAMATH on Fri 9th Jan 2004 13:55 UTC

"someone mentioned this ....
Java actually knows more about how the code is used, which in theory should let it reach better performance than c/c++
bla bla bla .... "

This is crap..... The JITer "knows" only about the current method; that method is the only thing that can be "enhanced". On the other hand, when it's compiled, the C/C++ compiler "has access" to the whole thing and can do a better job of "tweaking/enhancing" the speed...

"what about memory allocation ( creating/deleting) objects !!!! "
On this category the winner is CLR.

perl test and another test method
by Jimmy on Fri 9th Jan 2004 13:55 UTC

I've sent the author a Perl version that I hacked up during my lunch break. I hope he'll post the results along with the others. I "ported" it from the C version, keeping the author's original style of writing (all math in Perl/Python is FP anyway).

I once did a test of Perl vs. Python vs. C (as the reference language) by using a small is_prime function, and then setting it off on a given range of primes. I didn't keep the results, but if I remember correctly, C would finish quite fast, Perl would take its sweet time, and Python was just dog slow (all tests without optimisation, and no precompilation for the interpreted languages). I never did a Java port of it (the target was a Unix platform, and Java is not my first choice there), but it would be interesting to see any comparisons. The algorithm is (C syntax):

int is_prime(unsigned long x)
{
    if (x <= 3)
    {
        if (x == 1)
            return 0;  // 1 isn't a "real" prime
        else
            return 1;
    }
    else if (x % 2 == 0)
        return 0;
    else
    {
        long ctr = 3;
        while (1)
        {
            if ((x % ctr) == 0)
                return 0;
            else if ((ctr * ctr) > x)
                return 1;
            ctr += 2;
        }
    }
}
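For comparison purposes, a direct Python port of the routine above might look like this (just the algorithm, with no timing harness):

```python
def is_prime(x):
    """Direct port of the C routine above: trial division by odd factors."""
    if x <= 3:
        return x != 1            # 1 is not a "real" prime; 2 and 3 are
    if x % 2 == 0:
        return False
    ctr = 3
    while True:
        if x % ctr == 0:
            return False
        if ctr * ctr > x:
            return True
        ctr += 2

primes = [n for n in range(2, 30) if is_prime(n)]
```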

Why did VB do so badly in I/O
by Nick on Fri 9th Jan 2004 15:30 UTC

Why did VB do so badly in I/O when they are all .NET languages? I mean, they were pretty equal up until the I/O part. Any chance of getting the code published for each?

IF C++ really is a superset of C...
by Anonymous on Fri 9th Jan 2004 15:58 UTC

It's not. C++ just looks like C because it was felt that it would make it more familiar to programmers. Originally Stroustrup's work focused on adding syntactic sugar to support OO programming with C. At this point it was a strict superset, and Stroustrup used ordinary C compilers along with a pre-processor.

Fast forward a few years and the language is much more complicated and has acquired the name "C++" despite objections. It has also escaped AT&T and is now the "next big thing". Unfortunately, standardisation and compiler quality still leave a lot to be desired.

An unfortunate side effect, especially outside Unix, was that people took C++ to be "C, only better" and began to insist that good C programs be re-written as bad C++ programs. Since C was in fact standardised, portable and had a working ABI, while C++ had not (some would say still has not today) these things, the cost was uncountable.

There's a section in Stroustrup's book "The C++ Programming Language" which confirms that in fact C++ is not a superset and was not intended to be. Since ISO C9X is actually newer than the C++ split, it has features which aren't present in C++ at all. Some programs are valid C and valid C++ but mean different things in each language, a recipe for disaster.

method-calls, string-operations, regexp, hashtables
by Jan M on Fri 9th Jan 2004 16:44 UTC

I missed benchmarking of 'real-life' actions like instantiation, method-calls with and without params, string-operations, regexp, hashtables...
(didn't read the whole article though)

[ copied from my slashdot posting on this article. ]

We weren't quite ready to release it, but we've been working on a language performance comparison test of our own. It is available at:

http://scutigena.sourceforge.net/

It's designed as a framework that ought to run cross-platform, so you can run it yourself. We haven't added it yet, but I think we really want to divide the tests into two categories. "Get it done" - and each language implements it the best way for that language, and "Basic features comparison" - where each language has to show off features like lists, hash tables, how fast function calls are, and so forth.

It's an ongoing project, so new participants are welcome! I would appreciate it if comments went to the appropriate SF mailing lists instead of here, so that I can better keep track of them.


I have benchmarked fast string search. Results here : http://www.arstdesign.com/articles/fastsearch.html

VB IO results are skewed
by Cameron on Fri 9th Jan 2004 18:07 UTC

The VB I/O component of the benchmark uses the backwards-compatibility routines to do I/O -- routines which were never intended for performance. The correct way to do I/O in VB is to use the StreamReader and StreamWriter classes in the .NET Framework.

benchmarks are for comparison and learning...
by Anonymous on Fri 9th Jan 2004 18:29 UTC

Your comments about not having time to learn a language are fine if that is the quality of code that your customers at Accenture want. However, when you take it upon yourself to publish a BENCHMARK, you have an obligation to do a proper job, and this is totally suboptimal work. If I were your boss, had assigned you a benchmark, and told you to publish it with Accenture's name on it, I would have told you that you DID NOT MEET expectations.

Having spent a number of years in Silicon Valley after a few years in what was then Big 6 consulting, your document fails for BOTH communities.

Memory footprint
by Antoine on Fri 9th Jan 2004 18:44 UTC

Memory footprint of the benchmark on different languages should be interesting too!

This was already done... Overall Performance of Computer Languages...
by Programming Expert on Fri 9th Jan 2004 18:56 UTC

Google -> "The Great Programming Language Shootout" for more details pertaining to system functions and tests...

Some benchmark ideas to keep you busy
by Luke McCarthy on Fri 9th Jan 2004 18:56 UTC

Function calls
Method calls
Recursion
Recursion (no tail optimisation)
Recursion with several locals (no tail optimisation)
Recursion with large local array (no tail optimisation)
Deeply nested (different) function calls

Heap allocate
Heap re-allocate
Heap de-allocate
Heap fetch, store, move
Heap allocate with random de-allocations for small sizes
Heap allocate with random de-allocations for large sizes
Heap allocate with random de-allocations for mixed sizes
Heap allocate and copy

Local fetch, store, increment
Global fetch, store, increment
Member fetch, store, increment (with or without getters/setters)
Array fetch, store, increment
Array fetch, store, increment (2 dimensional)
Pointer fetch, store, increment
Pointer-to-pointer fetch, store, increment

Loop and compare (while)
Static numbered loop (for)

Graphic composition
Bitmap flip
Bitmap blit
Bitmap flip while window moving

Linked list generation
Linked list sort
Linked list search
Linked list random insert
Linked list random insert (with automatic sort)
Linked list random deconstruct
Linked list destroy

XML parse
XML tree sequential traverse
XML tree random search

Socket send packets throughput over controlled LAN
Socket receive throughput over controlled LAN
Socket two-way throughput over controlled LAN
Socket response/latency over controlled LAN
Socket response/latency under heavy load over controlled LAN

Signal/event response idle
Signal/event response in busy loop
Signal/event response under heavy memory access
Signal/event response under heavy I/O
Signal/event response under heavy calculation

Thread spawn latency
Fetch, store, move, copy, increment mutexed memory with many threads sharing

(All the random stuff should be pre-generated so that it's the same for every test.)

non-IEEE compliant
by grendel on Fri 9th Jan 2004 19:09 UTC

Important note: the Pentium's built-in trig instructions are not IEEE compliant; GCC's trig functions are. If you need accuracy, you cannot use the built-in Intel trig instructions. The MS compilers in this test used the built-in trig instructions. You can do this with GCC too, but you have to ask for it explicitly, and I seriously doubt this was done.

Bottom line: the trig functions in this test are not computing the same thing. The MS results will give 5-6 digits of precision; GCC's will be correct to 12+. *HUGE* difference.

It's not clear from the text whether you used the compiler options to remove integer overflow checks and enable optimizations, which are already the defaults in C#. This can account for (sometimes significant) differences in the compiled code.

Test with IBM 1.4.1 on Linux
by Bjørn-Ove Heimsund on Fri 9th Jan 2004 19:18 UTC

I ran the benchmark with both Sun 1.4.2 and IBM 1.4.1. The IBM VM used 38.2 secs, while Sun used 121.5 secs, a considerable difference (yes, I used the server VM).

In particular, IBM was 8 times faster on the trigonometric computations, and almost twice as fast with longs.

Re: Re: Java, and legal considerations (Fabio Alemagna)
by Megol on Fri 9th Jan 2004 19:26 UTC

"Such a requirement would be illegal in Europe, and more so in the US; it's against the right to free speech."

No, it isn't. I don't know how many times I have seen people make statements like this without really knowing what "free speech" is all about... But it is likely that those requirements would be unenforceable in most countries (for other reasons).

gcc results
by Ivan on Fri 9th Jan 2004 20:30 UTC

The gcc results look so bad because there is something wrong with the math libraries in MinGW and Cygwin.

Being extremely surprised by the fact that the trig run times with gcc are almost 4 times longer than with .NET, I redid the trig test with each operation tested individually. Here are the results on a dual-boot 2 GHz P4 laptop (WinXP and SuSE Linux 9) using -O3 -ffast-math -march=pentium4 -mfpmath=sse -msse2 as optimization options in both cases:

WinXP and the cygwin version of GCC 3.3.1:
sin: 1.03 seconds
cos: 1.02 seconds
tan: 10.33 seconds
log: 1.92 seconds
sqrt: 0.20 seconds
all 5 in the same loop: 14.36 seconds
WinXP and MinGW: results essentially identical

SuSE 9 and GCC 3.3.1:
sin: 1.02 seconds
cos: 0.99 seconds
tan: 1.16 seconds
log: 0.57 seconds
sqrt: 0.21 seconds
all 5 in the same loop: 3.59 seconds

Clearly, there is something wrong with the tan and log functions on Cygwin and MinGW.

So, the whole test on Linux (times in seconds):
integer arithmetic: 9.6
long integer: 24.5
double: 8.4
trig: 3.6
I/O: 1
total: 47.1

Someone was interested in the Intel compiler results; here they are:
integer: 9.0
long integer: 39.9
double: 7.0
trig: 4.4
I/O: 1.1
total: 61.4
=> if you have to use 64-bit integers in your program, don't use the Intel compiler.

VB.NET Slower?.... please!
by Corrado Cavalli on Fri 9th Jan 2004 21:04 UTC

You'd better learn VB.NET properly before publishing this kind of benchmark...
Thumbs down... :-(

A nit about the tests:

The Java benchmark is using the Reader/Writer classes rather than their InputStream/OutputStream counterparts. Reader and Writer perform Unicode conversions, which is apparently having a major impact. Switching to the InputStream/OutputStream classes with the 1.4.2 JVM (-server option) on Linux gave the IO portion of the benchmark a 24% speed boost.

I would also recommend that for this benchmark, rather than calling FileWriter.write(yourString), you do something more akin to:

byte[] b = yourString.getBytes();
(loop) {
    yourFileOutputStream.write(b);
    ...
}

I believe that such a change would make the Java IO portion of the benchmark more fair.
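The same point can be illustrated outside Java: text-mode IO pays for an encoding step on every write, while binary-mode IO with pre-encoded bytes does not. A minimal Python sketch of the idea (hypothetical file names, not the article's harness):

```python
import os
import tempfile

# Stand-in for the benchmark's 80-character line (hypothetical content).
line = "abcdefghijklmnopqrstuvwxyz1234567890" * 2

tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "text.txt")
bin_path = os.path.join(tmpdir, "bin.txt")

# Text-mode write: every call pays for an encode step, the analogue of
# Java's Writer classes doing Unicode conversion on each write().
with open(text_path, "w", encoding="ascii", newline="\n") as f:
    for _ in range(1000):
        f.write(line + "\n")

# Binary-mode write: the bytes are encoded once, outside the loop, the
# analogue of calling getBytes() once and reusing the byte array.
data = (line + "\n").encode("ascii")
with open(bin_path, "wb") as f:
    for _ in range(1000):
        f.write(data)

# Both produce byte-identical files; only the per-write work differs.
with open(text_path, "rb") as a, open(bin_path, "rb") as b:
    same = a.read() == b.read()
```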

- Marty

JDK 1.5 times
by Washu on Fri 9th Jan 2004 21:46 UTC

I ran the benchmark compiled with the JDK 1.5 alpha compiler.

Box is an Athlon XP 1700, Windows 2000, 512 MB memory.

Times:
int math 9796
double math 14406
long math 19735
trig 53890
i/o 6266
total benchmark 104094

Times are in milliseconds; I ran with the -server option, compiled with debug info off.

Java and the log method
by S Javeed on Fri 9th Jan 2004 21:58 UTC

Christopher raised the question of why Java only provides a method to calculate the natural log and not one to calculate the log base 10. The only reason I can think of is that log base 10 is easily calculated from the natural log using a routine of the form:

public double log10(double number) {
    return Math.log(number) / Math.log(10);
}

This routine is based on the standard change-of-base formula: log base a of x = log base b of x / log base b of a. It is applied here in the form log10(x) = ln(x) / ln(10).

While this doesn't justify not putting it in there, it's possible that the minimalistic but complete provision of the natural log method is what was desired.
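As a quick sanity check of the change-of-base formula, here is a Python sketch (illustrative only, not part of the benchmark) comparing the derived value against the library's direct log10:

```python
import math

def log10_from_ln(x):
    # log base 10 derived from the natural log via the change-of-base formula
    return math.log(x) / math.log(10)

# Agrees with the library's direct log10 to within floating-point rounding.
for x in (0.1, 1.0, 2.0, 1000.0, 1e7):
    assert abs(log10_from_ln(x) - math.log10(x)) < 1e-12
```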

Just my $0.02.

delphi benchmark
by Marc collin on Fri 9th Jan 2004 23:09 UTC

Hi,

I converted the code to Delphi:

Code:


program Benchmark;

{$APPTYPE CONSOLE}

uses
SysUtils,
MMSystem,
Math;

var
startTime :Longint;
stopTime :Longint;
elapsedTime :Longint;

intMax :integer;
doubleMin :Double;
doubleMax :Double;
longMin :Int64;
longMax :Int64;
trigMax :Double;
ioMax :integer;

intArithmeticTime :Double;
doubleArithmeticTime :Double;
longCountTime :Double;
trigTime :Double;
ioTime :Double;
totalTime :Double;


function intArithmetic(intMax:integer):Longint;
var
intResult :integer;
i :integer;
begin

startTime := timeGetTime;
intResult := 1;
i := 1;

while (i < intMax) do
begin
intResult := intResult - i;
inc(i);
intResult := intResult + i;
inc(i);
intResult := intResult * i;
inc(i);
intResult := intResult div i;
inc(i);
end;

stopTime := timeGetTime;
elapsedTime := stopTime - startTime;

WriteLn('Int arithmetic elapsed time: ' + inttostr(elapsedTime) + ' ms with max of ' + inttostr(intMax));
WriteLn(' i: ' + inttostr(i) + ' intResult: ' + inttostr(intResult));
result := elapsedTime;

end;

function doubleArithmetic(doubleMin, doubleMax:Double):Longint;
var
doubleResult :Double;
i :double;
begin

startTime := timeGetTime;

doubleResult := doubleMin;
i := doubleMin;

while (i < doubleMax) do
begin
doubleResult := doubleResult - i;
i:=i+1;
doubleResult := doubleResult + i;
i:=i+1;
doubleResult := doubleResult * i;
i:=i+1;
doubleResult := doubleResult / i;
i:=i+1;
end;

stopTime := timeGetTime;
elapsedTime := stopTime - startTime;

WriteLn('Double arithmetic elapsed time: ' + inttostr(elapsedTime) + ' ms with min of ' + floattostr(doubleMin) + ', max of ' + floattostr(doubleMax));
WriteLn(' i: ' + floattostr(i) + ' doubleResult: ' + floattostr(doubleResult));
result := elapsedTime;


end;

function longArithmetic(longMin, longMax:Int64):Longint;
var
longResult :Int64;
i :Int64;
begin

startTime := timeGetTime;

longResult := longMin;
i := longMin;

while (i < longMax) do
begin
longResult := longResult - i;
inc(i);
longResult := longResult + i;
inc(i);
longResult := longResult * i;
inc(i);
longResult := longResult div i;
inc(i);
end;

stopTime := timeGetTime;
elapsedTime := stopTime - startTime;

WriteLn('Long arithmetic elapsed time: ' + inttostr(elapsedTime) + ' ms with min of ' + inttostr(longMin) + ', max of ' + inttostr(longMax));
WriteLn(' i: ' + inttostr(i));
WriteLn(' longResult: ' + inttostr(longResult));
result := elapsedTime;


end;

function trig(trigMax:double):Longint;
var
sine :double;
cosine :double;
tangent :double;
logarithm :double;
squareRoot :double;
i :double;
begin


startTime := timeGetTime;

sine := 0.0;
cosine := 0.0;
tangent := 0.0;
logarithm := 0.0;
squareRoot := 0.0;
i := 0.1;
while (i < trigMax) do
begin
sine := Sin(i);
cosine := Cos(i);
tangent := Tan(i);
logarithm := Log10(i);
squareRoot := sqrt(i);
i := i+1;
end;

stopTime := timeGetTime;
elapsedTime := stopTime - startTime;

WriteLn('Trig elapsed time: ' + inttostr(elapsedTime) + ' ms with max of ' + floattostr(trigMax));
WriteLn(' i: ' + floattostr(i));
WriteLn(' sine: ' + floattostr(sine));
WriteLn(' cosine: ' + floattostr(cosine));
WriteLn(' tangent: ' + floattostr(tangent));
WriteLn(' logarithm: ' + floattostr(logarithm));
WriteLn(' squareRoot: ' + floattostr(squareRoot));
result := elapsedTime;


end;

function io(ioMax:integer):Longint;
var
textLine :string;
i:integer;
myLine:string;
F:TextFile;
begin

startTime := timeGetTime;

textLine := 'abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh';
i := 0;
myLine := '';

assignfile(F,'TestDelphi.txt');
rewrite(F);
while (i < ioMax) do
begin
Writeln(F,textLine);
inc(i);
end;

CloseFile(F);

stopTime := timeGetTime;
elapsedTime := stopTime - startTime;

WriteLn('IO elapsed time: ' + inttostr(elapsedTime) + ' ms with max of ' + inttostr(ioMax));
WriteLn(' i: ' + inttostr(i));
WriteLn(' myLine: ' + myLine);
result := elapsedTime;
end;

begin
intMax := 1000000000;
doubleMin := 10000000000;
doubleMax := 11000000000;
longMin := 10000000000;
longMax := 11000000000;
trigMax := 10000000;
ioMax := 1000000;


WriteLn('Start Delphi benchmark');

intArithmeticTime := intArithmetic(intMax);
doubleArithmeticTime := doubleArithmetic(doubleMin, doubleMax);
longCountTime := longArithmetic(longMin, longMax);
trigTime := trig(trigMax);
ioTime := io(ioMax);
totalTime := intArithmeticTime + doubleArithmeticTime + longCountTime + trigTime + ioTime;

WriteLn('Total Delphi benchmark time: ' + floattostr(totalTime) + ' ms');
WriteLn('End Delphi benchmark');

Readln;

end.

delphi suite...
by Marc collin on Fri 9th Jan 2004 23:12 UTC

In the trig function I changed

i := 0.0;

to

i := 0.1;

because Delphi's Log10 function doesn't accept 0...

Athlon XP 1800+, 512 MB
Test results:
Int arithmetic: 8121 ms
Double arithmetic: 11627 ms
Long arithmetic: 112101 ms
Trig: 3896 ms
IO: 3835 ms
Total: 139580 ms



Data Access
by Data Joe on Fri 9th Jan 2004 23:16 UTC

Fascinating stuff. The most interesting thing I've noticed is that the AUTHOR has been one of the few (possibly the only one; I didn't read EVERY post) who mentioned the lack of analysis regarding database access and its importance for "real world" applications.

I would LOVE to see the results of a few of these languages hitting some databases. I know that would open up the very ugly world of database comparisons, but what the heck. It would be great to run tests against Access, MS-SQL, DB2 and Oracle to see how each language and its preferred DB driver does.

It would be INCREDIBLY interesting to compare ADO.NET with JDBC, ODBC or whatever, but I think it would take more resources than a single P4 laptop, huh?

To Author: Python benchmark a bit misleading
by Critic on Sat 10th Jan 2004 00:10 UTC

The starting page claims you test 32-bit and 64-bit math. Python doesn't use native 64-bit types on 32-bit architectures, if I recall correctly, and promotes all integers that don't fit into a native type into long (arbitrary-precision) integers. Long integers don't use hardware multiplication or specialized algorithms, while all of the other languages do, which does Python some injustice. I think this should've been mentioned at the beginning of the article, because what Python is doing isn't really 32-bit or 64-bit math in the sense that most programmers are used to.

You could also do much better in some tests by using Numeric Python/numarray, which is designed for these kinds of problems.

And about Perl vs. Python: both are compiled into intermediate code; in Perl it just has a different form.

To those who do other benchmarks
by Anonymous on Sat 10th Jan 2004 01:30 UTC

Since the hardware you use is different, it's difficult to make comparisons, so I suggest using gcc with the options in the article as a baseline.

delphi code
by Marc collin on Sat 10th Jan 2004 01:47 UTC

To improve Delphi's performance on the long test...

change

longResult := longResult div i; (112101ms)

to

longResult := Trunc(longResult / i); (19958 ms)

new language
by Marc collin on Sat 10th Jan 2004 02:38 UTC

Can somebody translate the code to Caml, Eiffel, assembler...?

=======
http://pages.infinit.net/borland/

.Net framework Unsigned
by Anonymous on Sat 10th Jan 2004 04:10 UTC

In earlier messages, someone complained about a lack of unsigned data types in Java and C#. The C# comment appears to be incorrect.
Look at:
System.UInt16
System.UInt32
System.UInt64
System.UIntPtr

These all claim to be unsigned integers, for anyone who needs them.

Memory
by qw on Sat 10th Jan 2004 04:27 UTC

I'm interested in memory benchmarks. I know Java JITs the code, but what if the JITting causes > 60 MB of memory consumption (when it could be 5 MB, for example)? The machine would swap a lot (supposing the machine is heavily used; 60 MB may not be too much, but what if you have five 60 MB apps running?)

There is a serious problem with the long math benchmarks, due to Python being a dynamically (not statically) typed language.
In C you can say

long long int i;

and get a 64-bit signed integer (in C99).
If you do an operation that makes 'i' too big, it overflows; 'i' then contains the incorrect answer, but its type remains the same.
Python works differently from C (and the others). You can use a (plain) integer type, add to it, and instead of overflowing it is dynamically promoted to a long integer.

$ python
>>> i = 1
>>> type(i)
<type 'int'>
>>> i = i + pow(2, 32)
>>> i
4294967297L
>>> type(i)
<type 'long'>

Furthermore, the "long integer" does not have 64-bit precision; it has unlimited precision! For example, try:

$ python
>>> n = pow(2, 63) - 1
>>> n
9223372036854775807L
>>> n = n * 10
>>> n
92233720368547758070L
>>> pow(2, 128)
340282366920938463463374607431768211456L
>>> pow(2, 256)
115792089237316195423570985008687907853269984665640564039457584007913129639936L

This is why in the long math (64-bit integer) benchmark the longResult for C and Python differ:
C has 776627965
Python has 10000000000
In an integer benchmark the results must match; otherwise you are not benchmarking the same operations!

For the third iteration through the loop, i = 10000000002:
longResult = 10000000001 * 10000000002 = 100000000030000000002
This is more than a 64-bit signed int can handle, and it overflows. Python, however, calculates the correct results for integers bigger than 64 bits.

It looks to me like every multiply operation in the long integer test in C overflows 64 bits, so you are benchmarking a quarter-billion 64-bit integer overflows in C (and the others) against a quarter-billion 128-bit integer multiplications in Python; not a fair comparison.
Python is slower, but it gives you the correct result!

A fair benchmark would involve recoding C (and the others) so that they check for 64-bit integer overflow and then fall back to 128-bit arithmetic. Not so easy to do; Python gives you this for free, as it's built in.
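The divergence is easy to reproduce: masking to 64 bits in Python imitates C's wraparound, while Python's own arithmetic keeps the exact value. A sketch (illustrative only, not the benchmark code):

```python
M64 = (1 << 64) - 1  # mask for the low 64 bits

def wrap64(x):
    # Reduce x to a signed 64-bit value, the way C's long long overflows.
    x &= M64
    return x - (1 << 64) if x >= (1 << 63) else x

# The benchmark's multiply step: longResult = 10000000001 * 10000000002
product = 10000000001 * 10000000002   # Python keeps the exact value
wrapped = wrap64(product)             # what a 64-bit C long long would hold
```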

VB Net -Not
by -mhd on Sat 10th Jan 2004 06:46 UTC

VB6 would have been faster.

GCC 3.3.2 benchmark under Linux
by Pier Luigi Fiorini on Sat 10th Jan 2004 08:55 UTC

AthlonXP 1800+
512MB RAM
Gentoo 1.4 compiled with -mcpu=athlon-xp
Linux 2.4.23 + Con Kolivas patchset
# gcc -v
gcc version 3.3.2 20031201 (Gentoo Linux 3.3.2-r4, propolice)
# gcc -march=athlon-xp -mmmx -O3 Benchmark.c -s -o bench_c -lm
Start C benchmark
Int arithmetic elapsed time: 7950 ms with intMax of 1000000000
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 7480 ms with doubleMin 10000000000.000000, doubleMax 11000000000.000000
i: 11000000000.000000
doubleResult: 10011632717.388229
Long arithmetic elapsed time: 18750 ms with longMin 1410065408, longMax 2
i: -1884901888
longResult: 776627965
Trig elapsed time: 4270 ms with max of 10000000
i: 10000000.000000
sine: 0.990665
cosine: -0.136322
tangent: -7.267119
logarithm: 7.000000
squareRoot: 3162.277502
I/O elapsed time: 1220 ms with max of 1000000
last line: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total elapsed time: 39670 ms
Stop C benchmark

Very bad benchmark: comparing oranges to apples
by mastro on Sat 10th Jan 2004 10:11 UTC

This guy really needs to learn to write better benchmarks, because in this one he is comparing oranges to apples (e.g. in Python the I/O benchmark preallocates in memory a list of all the lines to be written to the file), and almost none of the benchmarks test *real* *life* performance, which very often involves complicated data structures and deep call stacks: things like "intResult -= i++;" test only your CPU speed, nothing more.

Memory
by Alacran on Sat 10th Jan 2004 16:16 UTC

What about a memory footprint comparison...?

gcj vs java-1.4.2
by Elendal on Sat 10th Jan 2004 17:32 UTC

Kernel 2.6.0-1mdk
glibc-2.3.3-1mdk

CPU: Athlon 1.2Ghz
Mem: 256MB

/usr/java/j2sdk1.4.2_01/jre/bin/java -version
java version "1.4.2_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_01-b06)
Java HotSpot(TM) Client VM (build 1.4.2_01-b06, mixed mode)

/usr/java/j2sdk1.4.2_01/bin/javac -O -target 1.4 -g:none Benchmark.java

------

gcj --version
gcj (GCC) 3.3.2 (Mandrake Linux 10.0 3.3.2-3mdk)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcj -o Benchmark -O3 -march=athlon -mcpu=athlon -fomit-frame-pointer -fforce-addr -fforce-mem -ffast-math -pipe -falign-loops -falign-functions -falign-jumps --main=Benchmark Benchmark.java


/usr/java/j2sdk1.4.2_01/jre/bin/java -server Benchmark
Start Java benchmark
Int arithmetic elapsed time: 10621 ms with max of 1000000000 [12:02:26]
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 17518 ms with min of 1.0E10, max of 1.1E10
i: 1.1E10 [12:27:35]
doubleResult: 1.00116327174955E10
Long arithmetic elapsed time: 34736 ms with min of 10000000000, max of 11000000000
i: 11000000000
longResult: 776627965
Trig elapsed time: 118742 ms with max of 1.0E7
i: 1.0E7
sine: 0.9906646477361263
cosine: -0.13632151600483616
tangent: -7.267118770165242
logarithm: 16.118095550958316
squareRoot: 3162.2775020544923
IO elapsed time: 6665 ms with max of 1000000
i: 1000001
myLine: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total Java benchmark time: 188282 ms
End Java benchmark



./Benchmark (gcj)
Start Java benchmark
Int arithmetic elapsed time: 10632 ms with max of 1000000000 [12:28:01]
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 6022 ms with min of 1.0E10, max of 1.1E10
i: 1.1E10
doubleResult: 1.0011632717389214E10
Long arithmetic elapsed time: 25322 ms with min of 10000000000, max of 11000000000
i: 11000000000
longResult: 776627965
Trig elapsed time: 18140 ms with max of 1.0E7
i: 1.0E7
sine: 0.9906646477361245
cosine: -0.1363215160048489
tangent: -7.267118770165242
logarithm: 16.118095550958316
squareRoot: 3162.2775020544923
IO elapsed time: 13913 ms with max of 1000000
i: 1000001
myLine: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total Java benchmark time: 74029 ms
End Java benchmark

Hazards of microbenchmarking
by Paris Hilton Returns on Sat 10th Jan 2004 18:50 UTC


If your real-world application resembles this benchmark, then the results >might< be useful.

For any floating point codes I have seen (orbit/attitude determination, weather simulation, and so forth), accurate results are more important than sheer speed. See the evaluation on this bug report for a discussion of wildly inaccurate results from simple-minded calculations:

http://developer.java.sun.com/developer/bugParade/bugs/4807358.html


As mentioned in the Evaluation: if you don't care about accurate, consistent results, why do you need the calculation at all?

Java Trig Performance
by clfischer on Sat 10th Jan 2004 21:36 UTC

By altering the values given to sin, cos, and tan from 0 - trigMax and normalizing them to 0 - 2*PI, the Java 1.4.2_03 benchmark improves from 57 secs to 10 secs. This change provides a more even distribution of radian values than does the original 0 - 10M. It would seem that the cost of certain radian values for the trig functions is not evenly distributed in Java.

I wonder if it is the same for all languages?
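The normalization described above is simple to sketch; here in Python (the idea is language-independent): reduce each argument into [0, 2*pi) before the call, so the trig routine never sees a large radian value. For arguments of moderate size the reduced call agrees with the direct one, since sine is periodic:

```python
import math

TWO_PI = 2.0 * math.pi

def sin_normalized(x):
    # Reduce the argument into [0, 2*pi) before the call; fmod keeps the
    # sign of x, so this lands in [0, 2*pi) for non-negative arguments.
    return math.sin(math.fmod(x, TWO_PI))

# For moderate arguments the normalized call matches the direct one to
# well within the precision this benchmark cares about.
for x in (0.1, 1.0, 10.0, 1000.0, 123456.0):
    assert abs(sin_normalized(x) - math.sin(x)) < 1e-9
```

(Whether this actually speeds up a given runtime depends on how its trig routines handle large arguments, which is exactly the question raised here.)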

Curt

MS C++ 6.0 vs MS C++ .net (2003)
by Rod S on Sat 10th Jan 2004 22:13 UTC

I was able to test both the 6.0 and .net2003 compilers on a PIII 733 Win2000 box.

Best times of three runs. Runs varied by no more than 2%.

6.0
Int 16183
Double 22742
Long 41220
Trig 5928
I/O 5148
Total 91221

.net (new unmanaged project)
Int 16183
Double 23423
Long 41360
Trig 5888
I/O 5258
Total 92112

RE: Lots of other bottlenecks
by Shea on Sat 10th Jan 2004 22:14 UTC

>> the latest Hotspot does an excellent job getting rid of
>> array bounds checking in many instances

How do you know that Hotspot does an excellent job of eliminating ABC's? Sun does not tell us the means by which it determines whether ABC's can be eliminated, nor do they provide a mechanism telling us where ABC's have successfully been eliminated (which would allow us to do some reverse engineering and figure out in what scenarios ABC's can be eliminated)... this leads to a question:

Can code be written in a way which would make the compiler better able to eliminate ABC's? I'm not quite sure what this would entail, but I'm curious as to whether or not it can be done.

c on cygwin
by Anonymous on Sat 10th Jan 2004 23:13 UTC

I don't know if this is just me, but I've tried several times compiling C code in Cygwin and the same C code outside Cygwin (Linux, for example), and it seems that gcc in Cygwin suffers some lag.
So perhaps the C benchmark was not quite right.

Worthless Benchmark
by Anonymous on Sun 11th Jan 2004 04:31 UTC

A language benchmark is not going to be very useful in this context. As numerous people have pointed out, you need a benchmark that simulates more of what your code has to do in a given project, to see if there's any performance benefit from switching to another language. This will generally be different for each project. Not to mention, this is more of a compiler benchmark than a language benchmark, and GCC's claim to fame is portability, not optimization, especially for non-x86 architectures (Sun's compilers and SGI's MIPSpro compilers will give you a binary that usually performs about 300%-500% better than a gcc-built binary). Since I don't use x86 hardware much, I am not an expert in which compiler you should use to benchmark on that platform, but I would imagine Intel's compiler is leaps and bounds ahead of gcc in optimization.

Thus the blanket assertion that C shouldn't be used for speed anymore is wrong, and the added claim that the code is less maintainable is absurd. People have been maintaining C source far longer than most other languages in such wide use (Fortran etc. are the exception, and they still have their place as well). The best language is often the one the programmer is most familiar with, but C can generally be optimized much better than other languages (C++ included), although some languages, like Fortran, are easier for the compiler to parallelize for multiprocessors (take into account SGI's -apo options) for multiple reasons beyond the scope of this comment.

Also
by Anonymous on Sun 11th Jan 2004 04:33 UTC

The author needs to give the machine specs, compilers and versions used, source code, and the flags passed to the compiler... for example, on IRIX I might compile like:

cc -Ofast=ip30 -TARG:platform=ip30:isa=mips4:processor=r14000 etc. etc. etc.

RE:RE: gcc and python in their native environments?
by Uno Engborg on Sun 11th Jan 2004 06:13 UTC

Actually, Perl is compiled; it's just that the result normally isn't stored as a compiled binary. Interpreted languages are usually interpreted on a line-by-line basis, while all of the Perl code is read and converted to something machine-executable before a Perl program starts doing any useful work.

At least in some Unix environments it is possible to dump the compiled Perl binary to create a true executable.

VB file i/o can be easily improved,
by Vinay on Mon 12th Jan 2004 06:31 UTC

Hi,

I was very surprised to find VB.NET's file I/O performance so bad, especially compared to C#, because they should indeed produce close results.
I just checked the VB benchmark code, and the reason was evident: the author used the FileOpen and LineInput functions, which were available as keywords (Open, Line Input, Close, etc.) in VB6. In the .NET world, MS has provided them as functions to make migration simpler, at the cost of performance.
Using functions from the System.IO namespace (as the C# code must be doing) should give the same performance for VB.NET.

-Vinay.

This is very funny "benchmark" :)
by Dejan Lekic on Mon 12th Jan 2004 08:04 UTC

The author(s) of this benchmark must have been drunk over the new-year holidays... Using GCC from Cygwin as a reference is the most ridiculous thing a "programmer" can do. ;) No comment.

These benchmarks are valid ONLY on Windowz
by SevenSky on Mon 12th Jan 2004 13:29 UTC

The results only show us that Windowz is an ill platform; it is clear that C is the fastest on Linux ;)

some links about java strict-fp
by AnonymousCoward on Mon 12th Jan 2004 19:44 UTC

As has been mentioned by others, the Java performance is probably due to strict floating point. Here are some links on that:

http://java.sun.com/docs/books/jls/second_edition/html/expressions....

http://java.sun.com/docs/books/vmspec/2nd-edition/html/Concepts.doc...

http://www.jcp.org/en/jsr/detail?id=84