Linked by ebasconp on Fri 10th Jun 2011 22:22 UTC
Benchmarks "Google has released a research paper that suggests C++ is the best-performing programming language in the market. The internet giant implemented a compact algorithm in four languages - C++, Java, Scala and its own programming language Go - and then benchmarked results to find 'factors of difference'."
Order by: Score:
Results are Implementation dependent
by Alfman on Fri 10th Jun 2011 22:38 UTC
Alfman
Member since:
2011-01-28

I didn't see anything about which compilers they chose to compare.

Performance characteristics are not the result of language syntax so much as the implementation of the compilers and supporting libraries. Google should have known this from the get-go, so why didn't they compare implementations?

The significant variation in performance between C/C++ compilers clearly proves (as if any proof were needed) that compiler implementations do matter.

Another point is that some language implementations (Eiffel) compile down to C and are then compiled/optimized by the C compiler.

The question of which language syntax performs better is meaningless.

Reply Score: 3

Valhalla Member since:
2006-01-24


Performance characteristics are not the result of language syntax so much as the implementation of the compilers and supporting libraries.

Certain language characteristics have obvious performance consequences; for instance, Java, Scala and Go are all garbage collected, which certainly comes with a performance penalty.


Another point is that some language implementations (Eiffel) compile down to C and are then compiled/optimized by the C compiler.

Eiffel wasn't measured here, was it? Nor was any other language that compiles to C (Vala is the one I know of).

Either way, I doubt it was news to anyone that C++ is faster than Java, Scala and Go. As for the compilers of these languages, Go certainly has a very immature compiler suite given its young age and heavy ongoing development; gccgo (which relies on GCC's backend for code optimization) is said by the Go devs to produce much faster code than the official compilers.

Reply Score: 5

dorin.lazar Member since:
2006-12-15

"
Performance characteristics are not the result of language syntax so much as the implementation of the compilers and supporting libraries.

Certain language characteristics have obvious performance consequences; for instance, Java, Scala and Go are all garbage collected, which certainly comes with a performance penalty.
"

I doubt that those exact features were put to the test - garbage collection has nothing to do with this paper - and I doubt that it had anything to do with memory allocation. As a side note, I have found that garbage collection does work better than manual memory management in memory-intensive programs.

Reply Score: 0

Elv13 Member since:
2006-06-12

Read the last part of the paper.

(My take:) Manual deletes are faster as long as the destructor is simple. You can get a pseudo-fast GC by deriving all objects from a GC class that increments/decrements an object reference count, and by using a templated, inlined accessor-like function instead of =, or overriding = manually for every class (to keep the count accurate). Then, when the ref count reaches 0, the object deletes itself.

But this is neither "safe" nor complete. Real-time ref analysis is slow, and static/dynamic analysis also requires keeping and managing a complete list of all memory allocations, which takes management time, interpreter time (for VM-based languages) and a lot of memory. At least, that's how I see it.
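A minimal C++ sketch of the intrusive reference-counting scheme described above (the class and member names are hypothetical, and it inherits the caveats just mentioned):

#include <cstddef>

// Objects derive from RefCounted; a templated handle bumps the count on
// copy (the "overridden =") and deletes the object once the count hits 0.
class RefCounted {
    std::size_t refs_;
public:
    RefCounted() : refs_(0) {}
    void retain() { ++refs_; }
    void release() { if (--refs_ == 0) delete this; }
protected:
    virtual ~RefCounted() {}   // destroyed only through release()
};

template <typename T>
class Handle {
    T* p_;
public:
    explicit Handle(T* p = 0) : p_(p) { if (p_) p_->retain(); }
    Handle(const Handle& o) : p_(o.p_) { if (p_) p_->retain(); }
    Handle& operator=(const Handle& o) {
        if (o.p_) o.p_->retain();   // retain first: safe on self-assignment
        if (p_) p_->release();
        p_ = o.p_;
        return *this;
    }
    ~Handle() { if (p_) p_->release(); }
    T* operator->() const { return p_; }
};

As the replies below note, this is neither cycle-safe nor thread-safe as written.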

Reply Score: 2

f0dder Member since:
2009-08-05

How do you handle cyclic references with that approach? (root-tracing GC helps a lot in handling those).

How do you suspect it performs in heavily threaded code? (you need mutexing both on the refcount and on your heap).

GC isn't the end-all-be-all, but for a lot of scenarios it works very well - and it's going to take a lot of effort to emulate it in C++, and it won't be a full solution as long as you need to interface with legacy code.

Don't get me wrong, I'm a sucker for C++ and RAII, but it's not the only-and-best solution for all scenarios.
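To make the cyclic-reference problem concrete, here is a minimal sketch using the reference-counted smart pointer available at the time (std::tr1::shared_ptr); the Node structure is hypothetical:

#include <tr1/memory>

struct Node {
    std::tr1::shared_ptr<Node> next;   // strong reference
    // making this a tr1::weak_ptr instead would break the cycle
};

int main() {
    std::tr1::shared_ptr<Node> a(new Node);
    std::tr1::shared_ptr<Node> b(new Node);
    a->next = b;
    b->next = a;   // cycle: each node keeps the other's count above zero,
                   // so neither is ever freed; a root-tracing GC would
                   // reclaim both once they become unreachable
    return 0;
}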

Reply Score: 2

sakeniwefu Member since:
2008-02-26


How do you suspect it performs in heavily threaded code? (you need mutexing both on the refcount and on your heap).

Faster than GC, if it were to free memory at the same rate.

GC performs better than manual memory management only in a small subset of scenarios. It is true that it is good enough for many more, and that it keeps dumb programmers from freeing used objects and the like.

However, as soon as their design assumptions aren't met, they fail badly. Generational GC aces object creation and destruction benchmarks; it is equally true that it bloats up until virtual memory dries up when faced with real-world apps that don't follow the general rule.

Reply Score: 2

pantheraleo Member since:
2007-03-07

GC performs better than manual memory management only in a small subset of scenarios.


GC typically performs better in situations where a lot of fairly small memory allocations and deallocations happen in a relatively short period of time. The typical (and usually recommended) practice in C or C++, of course, is to free memory as soon as you are done with it; in this type of situation, that can create a lot of overhead. Garbage collection will use more memory, of course, but can be significantly faster because it optimizes memory deallocation. The infilife algorithm is probably one of the simplest test cases where this pattern can be seen, and where GC runs circles around manual memory management.
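As a rough illustration of that point, the hypothetical bump arena below approximates what a GC's allocator gets for free: allocation is a pointer bump, and deallocation is one bulk reset rather than a per-object free (illustrative only - fixed capacity, no alignment handling, no destructors):

#include <cstddef>
#include <vector>

class Arena {
    std::vector<char> buf_;
    std::size_t top_;
public:
    explicit Arena(std::size_t bytes) : buf_(bytes), top_(0) {}
    void* alloc(std::size_t n) {
        if (top_ + n > buf_.size()) return 0;   // out of space
        void* p = &buf_[top_];
        top_ += n;                              // pointer-bump allocation
        return p;
    }
    void release_all() { top_ = 0; }            // one cheap bulk "free"
};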

It is true that it is good enough for many more, and that it keeps dumb programmers from freeing used objects and the like.


I don't think it's fair to use the word "dumb", really. Everyone makes mistakes or sometimes forgets to free memory when they are done with it. And anyone who tells you they have never dereferenced a pointer that they thought pointed at something, but was actually null, is flat out lying. Especially in today's high-pressure "release or die" environments, where software release cycles tend to be very fast paced.

And of course, I'm sure I don't have to repeat it here because it is common knowledge these days: with RAM and CPUs as cheap as they are, it's far more economical to throw another CPU or more RAM at the problem than to spend even a couple of hours trying to heavily optimize code to get just a little more speed or a little less memory consumption out of it.

And of course, that was one of the flaws with this benchmark, which Google admitted flat out: in the real world, no one spends that much time optimizing their C++ code. Not with development cycles as fast paced as they are today.

Edited 2011-06-13 13:43 UTC

Reply Score: 2

zlynx Member since:
2005-07-20

And of course, I'm sure I don't have to repeat it here because it is common knowledge these days: with RAM and CPUs as cheap as they are, it's far more economical to throw another CPU or more RAM at the problem than to spend even a couple of hours trying to heavily optimize code to get just a little more speed or a little less memory consumption out of it.


A little more speed? A little less memory consumption?

A popular Java program these days is Minecraft. I love it myself. But it has cases where, when memory consumption hits 700 MB or so, the game gets insanely choppy. This is the garbage collection happening. It sucks. It's unusable.

To fix it, the recommendation is to run the 64-bit server JVM and let it use a gigabyte of RAM.

A whole gigabyte, to hold a chunk of active blocks which, if stored in a well-thought-out C++ data structure, would only take a hundred megabytes or so.

Reply Score: 2

TemporalBeing Member since:
2007-08-22

"GC performs better than manual memory management only in a small subset of scenarios.


GC typically performs better in situations where there is a lot of fairly small amounts of memory allocation and deallocation happening in a relatively short period of time. The typical (and usually recommended) practice in C or C++, of course, is to free memory as soon as you are done with it. In this type of situation, that can create a lot of overhead. Garbage collection will use more memory of course, but will be significantly faster because it optimizes memory deallocation. The infilife algorithm is probably one of the simplest test cases where this pattern can be seen, and where GC runs circles around manual memory management.
"

The problem with GC is that your application takes a performance penalty at unpredictable times - whenever the GC decides to clean up. That can adversely affect the rest of the application, and you have little to no control over it. Yes, some GC algorithms allow you to specify a good time for GC clean-up, but AFAIK they are neither popular nor the 'top dog' algorithms.

Reply Score: 2

f0dder Member since:
2009-08-05

"
How do you suspect it performs in heavily threaded code? (you need mutexing both on the refcount and on your heap).

Faster than GC if it was to free memory at the same rate.
"...and the whole point is that you aren't doing it that way when you're dealing with GC, leading to scenarios where you get much better performance. Right tool for the job and all that, ever heard about it? ;)

Of course you can spend a lot of time developing a different approach in C++, or writing a custom memory allocator with different characteristics, but that's not always time-effective.

It is true that it is good enough for many more, and that it keeps dumb programmers from freeing used objects and the like.
You generally have to be more careful wrt. deallocation in GC languages, since they typically only deal with memory rather than resources, whereas in C++ we can utilize the wonderful RAII. If you think Java and C# programmers are "dumb" and/or lazy because of GC, I suspect you haven't done much development in either of those languages.
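For readers unfamiliar with the RAII point, a minimal sketch (the wrapper class is hypothetical):

#include <cstdio>

// The file handle is released deterministically when the object goes out
// of scope - even on early return or exception - which a memory-only GC
// with finalizers cannot promise.
class File {
    std::FILE* f_;
    File(const File&);              // non-copyable
    File& operator=(const File&);
public:
    explicit File(const char* path) : f_(std::fopen(path, "r")) {}
    ~File() { if (f_) std::fclose(f_); }
    std::FILE* get() const { return f_; }
};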

Reply Score: 1

Alfman Member since:
2011-01-28

Valhalla,

"Certain language characteristics have obvious performance consequences, for instance java, scala, go are all garbage collected which certainly comes with a performance penalty"

That's an assumption, which may or may not be true in practice.

Generational garbage collection algorithms can provide excellent data locality. C/C++ objects are generally forced to live out their lives at a fixed address, which is often not ideal once the address space becomes fragmented.

Malloc/free are more expensive than certain garbage collection algorithms, such as semi-space collectors, whose allocation path is absolutely trivial.
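A sketch of the semi-space fast path alluded to here (the layout is hypothetical and the copying collection step is omitted):

#include <cstddef>

// Allocation is a pointer bump plus a bounds check, no matter how
// fragmented the heap has become; a collection (not shown) would copy
// the live objects into the other half-space and swap the two halves.
struct SemiSpace {
    char* bump;    // next free byte in the active half
    char* limit;   // end of the active half

    void* alloc(std::size_t n) {
        if (bump + n > limit) return 0;   // would trigger a collection
        void* p = bump;
        bump += n;
        return p;
    }
};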

The tradeoffs are too complex to make absolute statements about which is better. I wouldn't be comfortable saying which is better without benchmarking them on a specific project.

The following link advocates garbage collection over malloc/free for performance reasons (although I think their conclusions are biased and overgeneralised):

http://www.ibm.com/developerworks/java/library/j-jtp09275/index.htm...


All this said, I personally prefer to control new/delete myself.


"Eiffel wasn't measured here was it? Nor was any other language (I know of Vala) which compiles to C?"

If we compare modern languages which are on more equal footing feature-wise, then which is faster: perl or ruby or python?

I'm saying that it doesn't make sense to compare the *language* (as google claims to have done), one has to compare the *implementation*.

Reply Score: 4

Carewolf Member since:
2005-09-08

Technically, garbage-collected languages should be faster but use more memory (faster as long as the extra memory is available). In practice this is almost never the case, but you can construct many examples where Java runs faster than a C++ program, because there is no "inline" overhead of memory management.

Reply Score: 2

Really?
by poundsmack on Fri 10th Jun 2011 22:52 UTC
poundsmack
Member since:
2005-07-13

Faster than programs coded in, say, assembly? I kind of find that hard to believe.

Reply Score: 2

RE: Really?
by Neolander on Fri 10th Jun 2011 22:54 UTC in reply to "Really?"
Neolander Member since:
2010-03-08

Depends on who codes the assembly. I've heard once that 90% of ASM developers can't beat the output of a modern optimizing C/C++ compiler.

Reply Score: 4

RE[2]: Really?
by poundsmack on Fri 10th Jun 2011 22:56 UTC in reply to "RE: Really?"
poundsmack Member since:
2005-07-13

That's very true; it all really depends on the quality of the code. And with Intel's C++ compiler, a well-coded C++ program can get pretty fast, depending on the operation.

Reply Score: 6

RE[3]: Really?
by viton on Sat 11th Jun 2011 10:55 UTC in reply to "RE[2]: Really?"
viton Member since:
2005-08-09

And with Intel's C++ compiler

You talk about ICC as if it had been sent by God from heaven.
In reality there is just a ±5% difference on average between ICC and GCC 4 on the performance side.

Reply Score: 3

RE[2]: Really?
by ShadesFox on Sat 11th Jun 2011 04:10 UTC in reply to "RE: Really?"
ShadesFox Member since:
2006-10-01

And even when you can beat the compiler, often it isn't worth the time sunk into the asm coding to beat the compiler.

Reply Score: 2

RE: Really?
by MacMan on Sat 11th Jun 2011 01:06 UTC in reply to "Really?"
MacMan Member since:
2006-11-19

Faster than programs coded in, say, assembly? I kind of find that hard to believe.

Very few people can write code as good as what a modern C++ compiler generates.

I work with a guy who kept bragging about how great the LU factorization he hand-coded directly in assembler was. Well, it turns out this clown wrote it using the ancient x87 FP stack; he had never heard of SSE (this was a few months ago).

So, I threw together a quick and dirty LU factorization in C++ (OK, copied it out of Numerical Recipes), compiled it with GCC 4.4 with SSE enabled, and it was about 20 freaking times faster than his "hand-tuned assembler".

And when the next great SSE 4, 5, 6... comes out, all I do is recompile :)

Unless they have a really, really good reason for doing it, when someone tells you they want to write something in assembler, you should probably backhand them.

Reply Score: 4

RE[2]: Really?
by ebasconp on Sat 11th Jun 2011 01:19 UTC in reply to "RE: Really?"
ebasconp Member since:
2006-05-09

Agree completely with you.

Assembly has THE POTENTIAL to produce the fastest algorithms, but only real geniuses can take full advantage of that potential.

If you have 1000 men, all of them will be able to cut wood into pieces; 20% of them will be able to build a chair with the wood, but just one or two will be able to create a beautiful, durable, reliable chair ;)

Reply Score: 4

Assembly
by jessesmith on Sat 11th Jun 2011 01:33 UTC in reply to "RE: Really?"
jessesmith Member since:
2010-03-11

I've done some work in assembly and I tend to agree with you. It is possible to write assembly code that's smaller and more efficient than C/C++, but it is rarely worth it. And most people aren't going to write asm that's better than what a modern compiler produces with optimizations turned on.

But it is nice to have the option of falling back on Assembly code when you really need to cut corners.

Reply Score: 2

RE[2]: Really?
by panzi on Sat 11th Jun 2011 02:36 UTC in reply to "RE: Really?"
panzi Member since:
2006-01-22

What did the guy say when you showed him your results?

Reply Score: 2

RE[3]: Really?
by ebasconp on Sat 11th Jun 2011 02:59 UTC in reply to "RE[2]: Really?"
ebasconp Member since:
2006-05-09

He actually said: "That occurred because you executed my program in client mode instead of server mode. That makes an enormous difference. Running it in server mode will almost always run circles around client mode." ;)

Reply Score: 4

RE[3]: Really?
by MacMan on Sat 11th Jun 2011 14:34 UTC in reply to "RE[2]: Really?"
MacMan Member since:
2006-11-19

What did the guy say when you showed him your results?


Not much. He just kind of stared at me, then started saying some nonsense about how in Russia everything is written in assembler because the compilers are crippled by the CIA, or something to that effect.

Reply Score: 2

RE[2]: Really?
by eldarion on Tue 14th Jun 2011 13:26 UTC in reply to "RE: Really?"
eldarion Member since:
2008-12-15

Just a small note: it's assembly (the language), not assembler (the software that creates "object files" from assembly code).

Reply Score: 1

RE[3]: Really?
by Neolander on Tue 14th Jun 2011 13:50 UTC in reply to "RE[2]: Really?"
Neolander Member since:
2010-03-08

Just a small note: it's assembly (the language), not assembler (the software that creates "object files" from assembly code).

If he's French, he has half an excuse though: here, we say "assembleur" (which sounds like "assembler") for both ;)

Edited 2011-06-14 13:50 UTC

Reply Score: 1

Meant as a Scala Demonstration
by Rooki on Fri 10th Jun 2011 23:20 UTC
Rooki
Member since:
2011-05-12

This paper has some pretty serious issues, as has been discussed extensively at https://groups.google.com/forum/#!topic/golang-nuts/G8L4af-Q9WE and http://www.reddit.com/r/programming/comments/hqkwk/google_paper_com...

For instance, nearly all the optimizations applied to the C++ code could have been applied to the Java code. Also, according to Ian Lance Taylor:

Despite the name, the "Go Pro" code was never intended to be an example of idiomatic or efficient Go. Robert asked me to take a look at his code and I hacked on it for an hour to make it a little bit nicer. If I had realized that he was going to publish it externally I would have put a lot more time into making it nicer.

I'm told that the program as compiled by 6g is significantly faster now than it was when Robert did his measurements.


So yeah, in general a fairly flawed benchmark; see the threads I linked for more details. I'm sure there are equivalent Java- and Scala-biased threads floating around as well!

Edited 2011-06-10 23:22 UTC

Reply Score: 3

pantheraleo Member since:
2007-03-07

I agree the paper has multiple problems.

Another issue with the Java portion is that they don't state whether they ran the JVM in client mode or server mode. That makes an enormous difference; running it in server mode will almost always run circles around client mode.

Also, it's a rather theoretical benchmark that says next to nothing about real-world performance. As they admit themselves, they didn't test threading performance or anything like it. The benchmark is not very indicative of the kinds of operations that most real-world applications actually perform.

Edited 2011-06-10 23:46 UTC

Reply Score: 5

Kasi Member since:
2008-07-12

You sir, are a comedic god.

Reply Score: 0

pantheraleo Member since:
2007-03-07

You sir, are a comedic god.


You, sir, added absolutely nothing of value to this conversation with this stupid comment.

Reply Score: 2

Slambert666 Member since:
2008-10-30

"You sir, are a comedic god.


You, sir. Added absolutely nothing of value to this conversation with this stupid comment.
"

Well, he did. By calling you a "comedic god" he implied that your comment was pure comedy. Hence he added value by alerting unsuspecting readers of your original comment to the fact that it is COMPLETELY STUPID and could perhaps be interpreted as comedy rather than informed discourse.

Reply Score: 1

pantheraleo Member since:
2007-03-07

Well, he did. By calling you a "comedic god" he implied that your comment was pure comedy


Well, considering it got voted up several times, apparently most readers didn't think it was stupid. So maybe that says more about your level of stupidity if you thought it was stupid? Or perhaps it says something about your lack of knowledge of Java, or your lack of knowledge of how real world applications typically behave?

Reply Score: 2

baderman Member since:
2006-07-17

Or it simply points out which programmers are in the majority here. It's an unfortunate flaw of democracy ;)

Guys - relax ;)

Reply Score: 1

pantheraleo Member since:
2007-03-07

Or it simply points out which programmers are in the majority here. It's an unfortunate flaw of democracy ;)


Well, again: the server vs. client JVM mode makes a very big difference in performance. And, using a different algorithm, I can produce results where Java is 5x faster than C++. So the points I made were completely valid. Also, given that the test was intended specifically to showcase Scala, the fact that they didn't test concurrency at all is kind of mind-boggling to me.

Reply Score: 2

Savior Member since:
2006-09-02

Another issue with the Java portion is that they don't state whether they ran the JVM in client mode or server mode.

They used server mode, as should be obvious from the first paragraph of section V:

"The benchmark consists of a driver ... and performs loop recognition on it for 15.000 times. The purpose is to force Java/Scala compilation, so that
the benchmark itself can be measured on compiled code."


The client-mode VM compiles the code right away and then leaves it at that. It's the server-mode VM that needs a few thousand executions to recompile and optimize the critical parts.

Also, it's a rather theoretical benchmark that says next to nothing about real world performance.

While the benchmark is rather theoretical, the issues discovered are real and DO affect real-world performance. Everybody uses collections, recursion, etc. Also, I would definitely consider mathematical computations part of the "real world".

Reply Score: 2

pantheraleo Member since:
2007-03-07

The client-mode VM compiles the code right away and then leaves it at that. It's the server-mode VM that needs a few thousand executions to recompile and optimize the critical parts.



Not true. Both the client- and server-mode JVMs do JIT compilation. Also, it typically doesn't take a few thousand executions before JIT can be done.

While the benchmark is rather theoretical, the issues discovered are real and DO affect real-world performance.


Some of the issues do, sure. But if I use a different test, I can make Java five times faster than C++. For example, try benchmarking infilife in Java vs. C++. The Java version is literally almost five times faster, because infilife is one of those tests where garbage collection provides a major advantage: it optimizes memory allocation and deallocation. Sure, if you really wanted to spend the time on it, you could probably optimize the C++ version to be as fast as the Java version, but the code would be many times larger and much more complex than the Java version.

The point is, using different code and different operations, I can very easily produce very different results. You can't take one sample like this and claim that C++ is the fastest language.

Google even admitted in the paper that optimizing the C++ version was very difficult, and that most of the time those types of optimizations would not be done.

Reply Score: 3

GCC isn't all that great
by Alfman on Sat 11th Jun 2011 03:29 UTC
Alfman
Member since:
2011-01-28

I tend to agree that it's not usually worth optimizing code in assembly; however, GCC's code is often less efficient than it should be.


We as programmers can do a better job than GCC at anticipating certain states and algorithmic conditions which give GCC trouble. Under these circumstances, it's not all that difficult to beat it.

I know ICC is generally better than GCC.

Reply Score: 4

RE: GCC isn't all that great
by Valhalla on Sat 11th Jun 2011 04:12 UTC in reply to "GCC isn't all that great"
Valhalla Member since:
2006-01-24

I tend to agree that it's not usually worth optimizing code in assembly; however, GCC's code is often less efficient than it should be.

Examples?


We as programmers can do a better job than GCC at anticipating certain states and algorithmic conditions which give GCC trouble. Under these circumstances, it's not all that difficult to beat it.

Certainly we programmers can do a better job, given that we understand the overall purpose of the code much better than any compiler. However, I very much disagree with it 'not being all that difficult to beat', as you certainly need to be a very good assembly programmer to do so.

Expert programmers who are well versed both in assembly and in the problems they are solving with it will certainly at least equal and most often beat a compiler. Case in point: encoders like x264 are full of hand-optimized assembly which easily beats the generated output of GCC, ICC, LLVM etc. when the hand-optimized assembly is disabled and the encoder relies entirely on compiler optimization. But those guys are very good assembly programmers and experts at what they are implementing, which is hardly the norm among programmers. Hence very few programmers out there can actually beat the optimized output of compilers.

Reply Score: 5

RE[2]: GCC isn't all that great
by Alfman on Sat 11th Jun 2011 07:54 UTC in reply to "RE: GCC isn't all that great"
Alfman Member since:
2011-01-28

Valhalla,

"Examples?"


That's a fair question, but please understand I don't have very much time to allot to it.

Anyway, here's a trivial example.

void vector_add(unsigned int* a, unsigned int a_len,
                unsigned int* b, unsigned int b_len) {
    unsigned int i;
    for (i = 0; i < a_len && i < b_len; i++) {
        a[i] += b[i];
    }
}


Here is the GCC output using -O3 and march=core2


vector_add:
        push    ebp
        mov     ebp, esp
        push    edi
        mov     ecx, DWORD PTR [ebp+8]
        push    esi
        mov     edi, DWORD PTR [ebp+20]
        push    ebx
        mov     esi, DWORD PTR [ebp+16]
        mov     ebx, DWORD PTR [ebp+12]
        test    ebx, ebx
        je      .L5
        test    edi, edi
        je      .L5
        xor     edx, edx
.L3:
        mov     eax, DWORD PTR [esi+edx*4]
        add     DWORD PTR [ecx+edx*4], eax
        inc     edx
        cmp     ebx, edx
        jbe     .L5
        cmp     edi, edx
        ja      .L3
.L5:
        pop     ebx
        pop     esi
        pop     edi
        leave
        ret


Now I'm not sure how many people read asm, but it's short and not difficult to follow.

GCC failed to vectorize the code with SSE2 instructions. This would be an obvious way to significantly optimize the example (both in terms of memory bandwidth and in terms of parallel computation).

Secondly, GCC failed to recognize that it isn't necessary to carry two compare operations into the loop. It should use a cmov to compute the minimum number of iterations up front, leaving only a single condition in the loop.

Thirdly, although there is no performance penalty in this case, GCC generally manipulates indexes instead of converting loops to use pointer arithmetic. Pointer arithmetic is far more efficient when a shortage of general-purpose registers forces variables onto the stack (see the sketch after this list). I've seen GCC load both the base pointer and the index from the stack and add them inside the loop at every iteration.

Fourthly, it didn't unroll any loops. Personally I will usually trade off in favour of smaller instruction size anyway, but it's interesting because I specified -O3. I'm sure we could get GCC to unroll loops by manually setting the heuristics, but an assembly programmer would have yet another easy edge over the default compiler output.
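To illustrate the pointer-arithmetic point (the sketch promised above), a hand-rewritten variant might look like the following; whether it actually wins depends on register pressure and the target:

// Pointer-walking rewrite: one moving pointer per array plus a
// precomputed end pointer - no per-iteration index arithmetic and only
// one loop condition.
void vector_add_ptr(unsigned int* a, unsigned int a_len,
                    unsigned int* b, unsigned int b_len) {
    unsigned int n = a_len < b_len ? a_len : b_len;
    unsigned int* end = a + n;
    while (a != end)
        *a++ += *b++;
}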


"however I very much disagree with it 'not being all that difficult to beat it', you certainly need to be a very good assembly programmer to do so."

You may be assuming that GCC does a good job in the first place. It's usually 'good enough', but it isn't great.


"Hence very few programmers out there can actually beat the optimized output of compilers."

Maybe the example above will change your mind?

I don't have ICC on hand to compare, but it is said to be excellent at SSE2 and is probably also better at optimization all around.

Reply Score: 5

RE[3]: GCC isn't all that great
by acobar on Sat 11th Jun 2011 09:12 UTC in reply to "RE[2]: GCC isn't all that great"
acobar Member since:
2005-11-15

Well, I will repeat to you what I say to everyone who asks my advice (not your case) on programming: first look for a good, tested implementation. Many people try to code fragments that have already been optimized for ages. They don't look at Donald Knuth's TAOCP, FFTW, LAPACK, ATLAS and other well-thought-out code before they start and, as such, write subpar code.

For example, if you change your code to:

void vector_add(unsigned int* a, unsigned int a_len,
                unsigned int* b, unsigned int b_len)
{
    unsigned int i;
    for (i = a_len < b_len ? a_len : b_len; i > 0; ) {
        i--;
        a[i] += b[i];
    }
}

You get pretty decently optimized code:

vector_add:
.LFB0:
        .cfi_startproc
        cmpl    %esi, %ecx
        cmovbe  %ecx, %esi
        testl   %esi, %esi
        je      .L1
        .p2align 4,,10
        .p2align 3
.L4:
        decl    %esi
        mov     %esi, %eax
        movl    (%rdx,%rax,4), %ecx
        addl    %ecx, (%rdi,%rax,4)
        testl   %esi, %esi
        jne     .L4
.L1:
        rep
        ret
        .cfi_endproc

As can clearly be seen, compilers are not black-magic wizards; regretfully, they depend on the quality of the code we give them, and we should never forget that.

Reply Score: 4

RE[4]: GCC isn't all that great
by Alfman on Sat 11th Jun 2011 09:57 UTC in reply to "RE[3]: GCC isn't all that great"
Alfman Member since:
2011-01-28

acobar,

"Well, I will repeat to you what I do to everyone I met and that would ask my advise (not your case) on programming: first look for a good, tested, implementation. Many people try to code fragments that are already optimized for ages. They don't look at Donald Knuth's ACP, fftw, lapack, atlas and many other well-thought code before they start and, as such, write subpar code."


I actually don't encourage people to optimize things at such small levels or in assembly. I'm just disturbed that so many people are under the false impression that their compilers are so difficult to beat.


"You get a pretty decent optimized code:"

Yes, I am aware that we can coerce the compiler to generate better instructions by rewriting the C code.

(Although it still isn't as good for the other reasons already discussed).

It concerns me that GCC converts our C code to assembly so literally instead of generating more optimal paths.

Should we really encourage devs to adjust their programming style to compensate for GCC optimizer deficiencies?

Does this mean that C devs will have to understand CPU internals in order to have GCC produce optimal code?

I really don't think we should be making excuses for GCC here. There is nothing unreasonable about expecting the original code to be optimized by the compiler.


Edit: By the way, I am guilty of rewriting C code at the micro level in order to help the GCC optimizer along. But we shouldn't have to do this in the first place.

Edited 2011-06-11 10:08 UTC

Reply Score: 5

RE[5]: GCC isn't all that great
by acobar on Sat 11th Jun 2011 11:22 UTC in reply to "RE[4]: GCC isn't all that great"
acobar Member since:
2005-11-15

Perhaps I should have said it in a more direct style: I do not encourage people to write optimized code; I tell them to look for good algorithms and, better yet, for good libraries whenever they are available.

To me, actually, all this talk about the better language to code in is a bit misleading; we should clearly concentrate on algorithms. We don't know what we will be coding in 20 years from now, but we know that good algorithms will probably survive, like they did in the past.

Sincerely, for most applications it does not matter a dime whether you spend 1 millisecond or 1 microsecond doing something, even though there is a factor of 1000 involved. And for the ones where it does matter, there is a chance that the problem is already well known and that a wise compromise has been devised.

I see people every day praising languages that make them write less code, or type less, or write more comfortable (understandable) code, or whose syntax they like more. And all those arguments are sensible.

I particularly like languages that have all the tools I feel I may need: GUI generators, automation by scripts, code templates (not necessarily like C++'s), profilers (for the very rare occasions where they are needed), many modern OO concepts and, of course, the ability to link against good, well-known, tested libraries.

To be even more direct: people should learn more before coding, unlike what we so repeatedly see. This would save time on rewrites and later tuning.

Edited 2011-06-11 11:38 UTC

Reply Score: 2

RE[6]: GCC isn't all that great
by ebasconp on Sat 11th Jun 2011 13:13 UTC in reply to "RE[5]: GCC isn't all that great"
ebasconp Member since:
2006-05-09

OT: I enjoyed this thread... Though I cannot contribute here, the only thing I can say is that I like OSNews because it encourages technical discussions like this one. Thank you guys!!!

Reply Score: 3

RE[6]: GCC isn't all that great
by Alfman on Sat 11th Jun 2011 17:30 UTC in reply to "RE[5]: GCC isn't all that great"
Alfman Member since:
2011-01-28

acobar,

"Perhaps, I should have said it on a more direct style: I do not encourage people to write optimized code, I tell them to look for good algorithms and rather to good libraries whenever they are available."

I'm not sure what this has to do with my original post, but we seem to be on the same track.

"To me, actually, all this talk about the better language to code in is a bit misleading, we should clearly concentrate on algorithms."

Ditto.

"Sincerely, for most of the applications it does not matter a dime if you spend 1 millisecond or 1 microsecond doing something, and there are a factor of 1000 involved."

Well I actually disagree here, but that's just me.

"And for the ones that matter, there are a chance that it is already well known and that a wise compromise was devised."

Yes, by and large it is very difficult to write truly original algorithms, because it's extremely likely that thousands of other developers have tried to solve the same problems before us. However, just because someone else has already done it doesn't strictly mean that we need to copy and paste every single line of code either.

"I see people everyday praising languages that make them write less code...And all those arguments are sensible."

Yes they are.

Reply Score: 2

RE[7]: GCC isn't all that great
by acobar on Sat 11th Jun 2011 18:48 UTC in reply to "RE[6]: GCC isn't all that great"
acobar Member since:
2005-11-15

I guess I read too much into your arguments in:

I actually don't encourage people to optimize things at such small levels or in assembly. I'm just disturbed that so many people are under the false impression that their compilers are so difficult to beat.


I understand that what you gave was just a very simple example of code optimization and of the common-sense idea people have that compilers outsmart our flesh brains, which we both see as disconnected from reality. But we should not forget that compilers are coded by biological primates too - possibly very smart ones, with hours of work put in orders of magnitude greater than the time we have available to complete our own work - yet they are still creations of human brains and, as such, will always leave room for a little helping hand.

Anyway, I was just trying to emphasize a very common fault in our CS education. People feel a push toward optimizing but, at the same time, lack proper knowledge about algorithms, computer architecture, code generation and how they are linked. Because of that, they frequently write subpar code that they should not. That is why I vehemently push them towards libraries.

Reply Score: 2

RE[5]: GCC isn't all that great
by vodoomoth on Tue 14th Jun 2011 08:38 UTC in reply to "RE[4]: GCC isn't all that great"
vodoomoth Member since:
2010-03-30

Are you saying that GCC should have turned the code you wrote into the code by @acobar? Or that it should magically guess the intentions in some code section?

I don't see how a compiler failing to do that qualifies as a "deficiency".

Reply Score: 2

RE[6]: GCC isn't all that great
by Alfman on Tue 14th Jun 2011 09:36 UTC in reply to "RE[5]: GCC isn't all that great"
Alfman Member since:
2011-01-28

vodoomoth,

"Are you saying that GCC should have turned the code you wrote into the code by @acobar? Or that it should
magically guess the intentions in some code section?"

Not exactly, no.

It should have recognized that it would achieve exactly the same semantics by precomputing the number of iterations up front instead of testing the array bounds on each iteration. This is a fairly common optimization for many use cases.
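In source terms, the transformation amounts to hoisting the pair of bounds tests into a single precomputed trip count - essentially what acobar's rewrite earlier in the thread achieves, shown here in the forward direction:

// Semantically equivalent to the original loop: the two per-iteration
// bounds tests collapse into one precomputed iteration count.
void vector_add(unsigned int* a, unsigned int a_len,
                unsigned int* b, unsigned int b_len) {
    unsigned int n = a_len < b_len ? a_len : b_len;
    for (unsigned int i = 0; i < n; i++)
        a[i] += b[i];
}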

Reply Score: 2

RE[4]: GCC isn't all that great
by Neolander on Sat 11th Jun 2011 10:15 UTC in reply to "RE[3]: GCC isn't all that great"
Neolander Member since:
2010-03-08

I'd argue that the C/C++ developer shouldn't have to do black magic in order to optimize their code either.

Alfman's use of for loops is perfectly normal and easy to understand, and you'll find it in every book on C. One shouldn't have to sacrifice code clarity like you did in order to get better performance, especially on such trivial code.

Reply Score: 2

RE[5]: GCC isn't all that great
by acobar on Sat 11th Jun 2011 10:47 UTC in reply to "RE[4]: GCC isn't all that great"
acobar Member since:
2005-11-15

Sorry, but I fail to see where the code I showed is "less understandable". It is actually a very well-known technique, i.e., reducing the number of unneeded comparisons.

Anyway, it was used only to point out one of the many gross assumptions people make about code efficiency, compiler optimizations and related stuff. We clearly should teach more about algorithms, and less about languages, than we do today.

Reply Score: 2

RE[6]: GCC isn't all that great
by Neolander on Sat 11th Jun 2011 10:57 UTC in reply to "RE[5]: GCC isn't all that great"
Neolander Member since:
2010-03-08

Sorry, but I fail to see where the code I showed is "less understandable". It is actually a very well-known technique, i.e., reducing the number of unneeded comparisons.

I was thinking about how you used the ternary operator and put the loop's iterator update inside the loop's body instead of within the for() itself.

Anyway, it was used only to point out one of the many gross assumptions people make about code efficiency, compiler optimizations and related stuff. We clearly should teach more about algorithms, and less about languages, than we do today.

But then wouldn't there be a feeling among both students and their future employers that they have learned lots of pretty theory, but not how to actually do things in practice?

Also, is there such a thing as a language-independent algorithm? Different languages have very different ways of expressing things, and algorithmics courses generally teach students an imperative algorithmic language pretty close in structure to Pascal and C.

Edited 2011-06-11 11:00 UTC

Reply Score: 1

RE[7]: GCC isn't all that great
by vodoomoth on Tue 14th Jun 2011 08:20 UTC in reply to "RE[6]: GCC isn't all that great"
vodoomoth Member since:
2010-03-30


But then wouldn't there be a feeling among both students and their future employers that they have learned lots of pretty theory, but not how to actually do things in practice?

I hear this often in the professional world, as if theory were the natural antonym of "practice". But do you think Oracle made the piles of money they've made through the years without using Codd's relational algebra? Instead of relying on indexes, hints and DBAs to make SQL queries more efficient, it is sometimes faster to rewrite a query - which means you know relational algebra, you understand that it is always better to reduce the number of tables in a join, and you know when nested queries are a valuable contribution to overall efficiency. In the specific real-life case I'm implicitly referring to, I rewrote the query, and the response time, which had been above the 10-minute timeout, fell to something like 30 seconds on average.

In practice, the goal is to query the database (or execute code on the CPU). Any elements that get me closer to that goal, be they elements of theory, are welcome.

My point is: knowing the theory behind things doesn't take anything away from people. If I were to hire someone to fill a position, I'd rather take the guy who knows the theory. And, as a side note, the "professionals" I've worked with who disparaged "nice theory" not only have terrible justifications for their bad practices, but are also resistant to thinking outside the box. Err, "their" box.

I'm still waiting to see an example of theory ruining or impeding some work instead of making it better IN PRACTICE.

Reply Score: 2

RE[5]: GCC isn't all that great
by Carewolf on Mon 13th Jun 2011 10:25 UTC in reply to "RE[4]: GCC isn't all that great"
Carewolf Member since:
2005-09-08

If you are not a wizard, you should not be programming C/C++. There are other languages that are much better suited to high-level programming. C/C++ are superior languages only in that they give more power to a powerful wizard; in unskilled hands they are at best teaching/testing tools.

Reply Score: 2

RE[6]: GCC isn't all that great
by Neolander on Mon 13th Jun 2011 11:06 UTC in reply to "RE[5]: GCC isn't all that great"
Neolander Member since:
2010-03-08

Yup, but you can do low-level programming (which makes the use of HLLs much harder, even though some projects like MS Singularity try to use them anyway) and still wish you didn't have to care about the internals of the compiler's optimizer and to do compiler-specific wizardry in order to optimize your code.

Edited 2011-06-13 11:08 UTC

Reply Score: 1

RE[4]: GCC isn't all that great
by Timmmm on Mon 13th Jun 2011 10:56 UTC in reply to "RE[3]: GCC isn't all that great"
Timmmm Member since:
2006-07-25

for (i = a_len < b_len ? a_len : b_len; i > 0; ) {
    i--;
    a[i] += b[i];
}


I'm pretty sure this is a terrible idea, because it will mess up all the cache lines and pre-fetching if you walk through memory backwards.

Reply Score: 2

RE[5]: GCC isn't all that great
by Megol on Tue 14th Jun 2011 14:26 UTC in reply to "RE[4]: GCC isn't all that great"
Megol Member since:
2011-04-11

"for (i = a_len < b_len ? a_len : b_len ; i > 0 ; ) {
i--;
a[i] += b[i];
}


I'm pretty sure this is a terrible idea because it will mess up all the cache lines and pre-fetching if you walk through memory backwards.
"

No. Prefetchers (in every processor I've touched) support negative strides, and working backwards has no performance problems.

Reply Score: 1

RE[3]: GCC isn't all that great
by Valhalla on Sat 11th Jun 2011 20:21 UTC in reply to "RE[2]: GCC isn't all that great"
Valhalla Member since:
2006-01-24

Valhalla,

"Examples?"

Maybe GCC should have been able to optimize one of the len comparisons away, but really, it's (deliberately?) stupid code IMO, and there will always be cases like these where the compiler fails to grasp 'the bigger picture'.

Secondly, it seems obvious that GCC doesn't choose to use vectorization since it has no idea how many integers are to be added, and depending on that it could very well be less efficient to use vectorization (I seriously doubt ICC would vectorize this snippet either). Case in point: I changed the 'len' variables to a constant (100) in your snippet, which resulted in the following (note: 64-bit):

        leaq    16(%rdx), %rax
        cmpq    %rax, %rdi
        jbe     .L11
.L7:
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L4:
        movdqu  (%rdx,%rax), %xmm1
        movdqu  (%rdi,%rax), %xmm0
        paddd   %xmm1, %xmm0
        movdqu  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    $400, %rax
        jne     .L4
        rep
        ret
.L11:
        leaq    16(%rdi), %rax
        cmpq    %rax, %rdx
        ja      .L7
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L2:
        movl    (%rdx,%rax), %ecx
        addl    %ecx, (%rdi,%rax)
        addq    $4, %rax
        cmpq    $400, %rax
        jne     .L2
        rep
        ret

Setting the constant to 5 or above made GCC use vectorization for the loop.

Granted, calls to this function would not be using constants in a 'real program', but within a whole program the compiler would likely have a much better chance of deciding whether it should vectorize the loop.
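For reference, the constant-length variant was presumably something like the following; with the trip count known at compile time, GCC emitted the vectorized (movdqu/paddd) loop shown above:

// Hypothetical reconstruction of the modified snippet with len = 100.
void vector_add_100(unsigned int* a, unsigned int* b) {
    for (unsigned int i = 0; i < 100; i++)
        a[i] += b[i];
}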

As for loop unrolling, it is not turned on by -O3 because it is difficult to estimate its efficiency without runtime data (it is turned on automatically when PGO is used). Does any compiler turn it on by default? (GCC and Clang/LLVM don't.)

Reply Score: 4

RE[4]: GCC isn't all that great
by Alfman on Sun 12th Jun 2011 02:11 UTC in reply to "RE[3]: GCC isn't all that great"
Alfman Member since:
2011-01-28

"Maybe GCC should have been able to optimize one of the len comparisons away, but really it's (deliberately?) stupid code imo and there will always be these cases where the compiler fails to grasp 'the bigger picture'."

Stupid code? Sure, it was a trivial example, but that was deliberate. You'll have to take my word that GCC has the same shortcomings on more complex code from real programs.

Even if you want to blame the programmer here, a seasoned programmer has no reasonable way of knowing whether GCC has optimized the loop correctly without looking at the assembly output. Do we really want to go down the route of saying programmers need to check up on GCC's assembly output?

Should students be taught to avoid legal C constructs which give the GCC optimizer a hard time?


"Secondly it seems obvious that gcc doesn't choose to use vectorization since it has no idea of how many integers are to be added, and depending on that it could very well be less efficient to use vectorization"

I have no idea why GCC chose not to use SSE, but the result is still that an assembly language programmer would be able to beat it.

"(I seriously doubt ICC would do so either with this snippet)"

I wish someone could test this for us; Intel boasts very aggressive SSE optimization.

Reply Score: 3

RE[5]: GCC isn't all that great
by moondevil on Sun 12th Jun 2011 21:00 UTC in reply to "RE[4]: GCC isn't all that great"
moondevil Member since:
2005-07-08

You are forgetting something in your examples.

It used to be so that most humans could beat compiler-generated code. In this day and age it is only true for small code snippets or simple processors.

Most up-to-date processors use out-of-order execution with superscalar processing units, and translate CISC instructions into RISC-like microcode. And this varies from processor model to processor model, even within the same family!

It is very hard for most humans to keep all the processor features in their head while coding assembly and still beat the code generated by high-performance compilers. Not GCC, but the ones you pay several thousand euros/dollars for, with years of research put into them.

Reply Score: 2

RE[6]: GCC isn't all that great
by Alfman on Mon 13th Jun 2011 01:35 UTC in reply to "RE[5]: GCC isn't all that great"
Alfman Member since:
2011-01-28

moondevil,

"You are forgetting something in your examples."

Fair enough, but what?

It used to be so that most humans could beat compiler-generated code. In this day and age it is only true for small code snippets or simple processors.

I was asked for an example, which I provided. Then I provided another example with division. In these examples, I'm not aware of any processor for which GCC generated optimal code, which is what I set out to demonstrate; nothing more, nothing less.

"Most up to date processors use out-of-order execution with superscalar processing units, and translate CISC instructions into microcode RISC like code. And this varies from processor model to processor model within the same family even!"

I have some observations:

1. In practice, x86 binary code is precompiled and shared among many models. If you have an Intel Core 2 Q6600, you cannot simply purchase/download software built specifically for that model.

2. I don't believe GCC even allows model-specific compilation (I doubt ICC does either, but I could be wrong). You can specify a processor family, but that's it.


"It is very hard for most humans to still be able to keep all processor features on their head while coding assembly and still be able to beat the code generated from high performance compilers. Not GCC, but the ones you pay several thousand euros/dollars for, with years of research put into them."

Well, there you go; in conclusion, it seems that you do agree with my argument that we can beat GCC? I'm not moving the goalposts here; this is what I said from the beginning - it's even in the title of the thread.

Edited 2011-06-13 01:48 UTC

Reply Score: 2

RE[5]: GCC isn't all that great
by Valhalla on Mon 13th Jun 2011 06:02 UTC in reply to "RE[4]: GCC isn't all that great"
Valhalla Member since:
2006-01-24


Stupid code? Sure, it was a trivial example, but that was deliberate.

Having two comparisons within the loop was obviously poor coding in this example, which you expected the compiler to fix for you.


Do we really want to go down the route of saying programmers need to check up on GCC's assembly output?

If you find that the performance is not what you'd expect from the given code, you will profile and look at the assembly output of the performance hotspots; it doesn't matter whether it's GCC, VC, ICC or Clang/LLVM.


Should students be taught to avoid legal C constructs which give the GCC optimizer a hard time?

Legal constructs do not equal efficient code. Compilers have never been able to turn shitty code into good code. If you know of one, please inform me; I'd buy it in a second.


I wish someone could test this for us; Intel boasts very aggressive SSE optimization.

I compiled your snippet with Clang 2.9; it didn't vectorize it either, until I exchanged the len vars for constants, just like with GCC. Again, I doubt ICC would do it either.

Again, this function within the context of a whole program would likely yield a different result (definitely if PGO was used). For instance, it would be interesting to dissect the output of some of the micro-benchmarks over at the language shootout.

On a slightly off-topic note, does anyone have any experience with the EKOPath 4 compiler suite? It appears it is to be released as open source (GPLv3), and judging by the performance benchmarks it appears to offer some GPGPU solution:

http://www.phoronix.com/scan.php?page=article&item=phoronix_dirndl_...

Reply Score: 2

RE[6]: GCC isn't all that great
by Alfman on Mon 13th Jun 2011 19:01 UTC in reply to "RE[5]: GCC isn't all that great"
Alfman Member since:
2011-01-28

"Having two comparisons within the loop was obviously poor coding in this example, which you expected the compiler to fix for you."

I think it surprised you that I was able to come up with an example, and now you're grasping at straws... I won't hold you to your original statements; don't feel the need to defend them.


If you find that the performance is not what you'd expect from the given code, you will profile and look at the assembly output of the performance hotspots; it doesn't matter whether it's GCC, VC, ICC or Clang/LLVM.

The difference is, there is no need to audit the compiler if it can be trusted to do a great job in the first place. The fact that we can reveal shortcomings by looking at GCC's asm dump implies that we are able to do better.

"Legal constructs does not equal efficient code. Compilers have never been able to turn shitty code into good code. If you know of one, please inform me, I'd buy it in a second."

Still more excuses. Why does GCC reorder and optimize some code paths but not others? The developer shouldn't have to mess with clean code just to make it perform better under GCC.


"I compiled your snippet with Clang 2.9, it didn't vectorize it either until I exchanged the len vars with constants just like in the case with GCC. Again I doubt ICC would do it either."

That's not really a sufficient answer.

Reply Score: 2

RE[7]: GCC isn't all that great
by Valhalla on Mon 13th Jun 2011 23:29 UTC in reply to "RE[6]: GCC isn't all that great"
Valhalla Member since:
2006-01-24


I think it surprised you that I was able to come up with an example,

Hardly. Depending on the number of integers to add and the memory alignment of the integer data (both of which are unknown to the compiler in this example), vectorizing this loop may very well turn out slower, AFAIK. There's a reason all the compilers support SSE intrinsics; SSE registers are anything but general-purpose. Both GCC and Clang/LLVM are considered strong compilers, and neither of them vectorized this snippet. You claim this proves them 'not all that great'; I say this 6-line example is anything but conclusive.

The difference is, there is no need to audit the compiler if it can be trusted to do a great job in the first place. The fact that we can reveal shortcomings by looking at GCC's asm dump implies that we are able to do better.

You are basically saying that you know that a vectorized loop will outperform a non-vectorized loop no matter what the data length and data alignment is?

Because that's what this example entails: the compiler knows nothing about the data length and data alignment at compile time. And not knowing this, the compilers (GCC and Clang) chose not to vectorize. When I gave them the data length, both compilers vectorized the loop (as I showed in an earlier post).

Still more excuses. Why does GCC reorder and optimize some code paths but not others? The developer shouldn't have to mess with clean code just to make it perform better under GCC.

Because no compiler is perfect, and again, clean code does not equal efficient code. There's a reason you don't start questioning the compiler's optimizations until you've questioned the actual algorithm.

That's not really a sufficient answer.

It was a statement: neither GCC nor Clang/LLVM vectorized your snippet, and like I said, I doubt ICC would either.

Despite our bantering (or perhaps because of it!), I have to say it's fun discussing technical stuff here on OSNews once in a while. So thanks, Alfman, acobar and others for participating in this (IMHO) interesting discussion ;)

BTW, I'm surprised f0dder hasn't weighed in; I seem to recall him from win32 assembly forums way back in the day (maybe my memory is playing tricks on me)!

Reply Score: 2

RE[5]: GCC isn't all that great
by f0dder on Mon 13th Jun 2011 16:42 UTC in reply to "RE[4]: GCC isn't all that great"
f0dder Member since:
2009-08-05

Do we really want to go down the route of saying programmers need to check up on GCC's assembly output?
When you need maximum performance for a piece of code, yes. When you don't, simply don't worry about it - not generating optimal code doesn't matter if the code isn't a hotspot.

Reply Score: 1

RE[3]: GCC isn't all that great
by burnttoys on Mon 13th Jun 2011 08:53 UTC in reply to "RE[2]: GCC isn't all that great"
burnttoys Member since:
2008-12-20

That code cannot be vectorized as written, since 'a' and 'b' may point to overlapping areas of memory.

The 'restrict' keyword is necessary.
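A restrict-qualified version of the snippet would look like the following ('restrict' in C99, '__restrict__' as the GNU extension in C++); the programmer promises the compiler that the two arrays never overlap, removing the aliasing barrier to vectorization:

void vector_add(unsigned int* __restrict__ a, unsigned int a_len,
                unsigned int* __restrict__ b, unsigned int b_len) {
    for (unsigned int i = 0; i < a_len && i < b_len; i++)
        a[i] += b[i];
}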

Reply Score: 1

RE[4]: GCC isn't all that great
by Alfman on Mon 13th Jun 2011 19:12 UTC in reply to "RE[3]: GCC isn't all that great"
Alfman Member since:
2011-01-28

burnttoys,

"That code can not be vectorised as it is written as 'a' and 'b' may point to overlapping areas of memory.

The 'restrict' keyword is necessary."

Wow, thanks burnttoys, you are correct - some constructive criticism!

It should be using restrict just like 'memcpy' does.

Unfortunately, I tried it and it didn't change the output at all. Additionally, I see a lot of devs claiming that the restrict keyword doesn't have any effect under GCC; can you confirm that?

Reply Score: 2

RE[5]: GCC isn't all that great
by burnttoys on Mon 13th Jun 2011 20:18 UTC in reply to "RE[4]: GCC isn't all that great"
burnttoys Member since:
2008-12-20

I can't tell you about GCC vectorisation in any great detail.

I have seen restrict et al. work (that is, generate vector code) on ARM for the NEON instruction set, but right now I can't remember whether that was RVCT or GCC. If ARM is your thing, it might be worth checking the linaro.org builds of GCC. For x86, I'm afraid I've no idea - I just hit "build" in Qt Creator, as none of my x86 code is really that performance-critical anymore!

Reply Score: 1

RE[6]: GCC isn't all that great
by Alfman on Tue 14th Jun 2011 07:26 UTC in reply to "RE[5]: GCC isn't all that great"
Alfman Member since:
2011-01-28

burnttoys,

There seems to have been a small army of devs reporting issues with 'restrict' since 2002, which have been marked as dups of each other. They ultimately point to this bug, which indicates a fix in GCC 4.5:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14187

Reply Score: 2

RE[3]: GCC isn't all that great
by Carewolf on Mon 13th Jun 2011 09:45 UTC in reply to "RE[2]: GCC isn't all that great"
Carewolf Member since:
2005-09-08

After identifying that GCC didn't perform these optimizations, have you considered trying GCC with the options that enable them? -funroll-loops -ftree-vectorize.

It is not fair to blame the compiler for not doing optimizations it hasn't been asked to do. I know the default optimizations suck, but that is a well-known problem with GCC.

The auto-vectorizer isn't very good with integers, but give it a try.

Edited 2011-06-13 09:46 UTC

Reply Score: 2

RE[4]: GCC isn't all that great
by Neolander on Mon 13th Jun 2011 09:56 UTC in reply to "RE[3]: GCC isn't all that great"
Neolander Member since:
2010-03-08

Aren't those supposed to be automatically enabled by -O2 or -O3?

Reply Score: 1

RE[5]: GCC isn't all that great
by Carewolf on Mon 13th Jun 2011 10:30 UTC in reply to "RE[4]: GCC isn't all that great"
Carewolf Member since:
2005-09-08

Depends on the gcc-version.

I just double-checked: according to info:gcc, the most recent version (4.6) enables -ftree-vectorize at -O3, but still not -funroll-loops.

Unrolling is only enabled by default if you compile with profiling data that helps the compiler unroll the correct loops.

Reply Score: 2

RE[6]: GCC isn't all that great
by Neolander on Mon 13th Jun 2011 11:12 UTC in reply to "RE[5]: GCC isn't all that great"
Neolander Member since:
2010-03-08

Thanks for the clarification.

I think I can understand the rationale behind that decision: loop unrolling sounds like a rather costly optimization if mindlessly applied to all loops without PGO coming into play. But in this case, where PGO is not used, it indeed reduces the performance of the GCC binary.

Reply Score: 1

RE[4]: GCC isn't all that great
by Alfman on Mon 13th Jun 2011 19:51 UTC in reply to "RE[3]: GCC isn't all that great"
Alfman Member since:
2011-01-28

Carewolf,

Thanks for the feedback.

GCC does use SSE for me, with and without '-ftree-vectorize' (when I tweak the C source code).

However, I couldn't get GCC to emit vector math without changing the source to precompute the iteration count for it. In a case as simple as this, the compiler should have been able to handle that itself.

I know Valhalla is complaining about this specific example (not sure why?), but I frequently come across issues like this in much more complex code, where GCC misses an equally trivial optimization.

Reply Score: 2

RE[5]: GCC isn't all that great
by Carewolf on Tue 14th Jun 2011 09:32 UTC in reply to "RE[4]: GCC isn't all that great"
Carewolf Member since:
2005-09-08

I also rarely see GCC vectorizing integer loops (it seems to choke on signed vs unsigned). GCC does slightly better with floating-point loops, and it helps if you allow it to break some strict math rules.

As I noted in another comment, the newest version of GCC now enables -ftree-vectorize at -O3, so I was not fully up to date in my first reply.

If you compile for AMD64, SSE is automatically used for all math (not vectorized, one value at a time). You can also use SSE for math on IA32 using -mfpmath=sse. SSE math is generally much faster, but removes the 80-bit temporaries quirk that x87 math has.
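
A small illustration of that quirk (take it as a sketch; what you actually get depends on compiler version and optimization level):

#include <stdio.h>

int main(void)
{
    /* With x87 math, the intermediate big*1e10 can live in an 80-bit
       register whose exponent range is wide enough to avoid overflow;
       with -mfpmath=sse every intermediate is rounded to a 64-bit
       double, so it overflows to infinity. */
    volatile double big = 1e300;
    double r = big * 1e10 / 1e10;
    printf("%g\n", r); /* x87 may print 1e+300, SSE prints inf */
    return 0;
}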

Reply Score: 2

RE[6]: GCC isn't all that great
by Alfman on Tue 14th Jun 2011 09:59 UTC in reply to "RE[5]: GCC isn't all that great"
Alfman Member since:
2011-01-28

Carewolf,

"I also rarely see GCC vectorizing integer loops (it seems to choke on signed vs unsigned), GCC does slightly better with floating-point loops, and it helps if you allow it to break some strict math rules."

Really? I'd be surprised if signed vs unsigned was the reason it chokes. In two's complement, the calculation is identical regardless of sign.

0x43 + 0xfe (this is -2) = 0x41 (+carry flag)
0x43 - 2 = 0x41

0xf0 (this is -0x10) + 0xfd (this is -0x3) = 0xed (this is -0x13) (+carry flag)

0xfe (this is -2) * 0xfb (this is -5) = 0x0a (0xf9 carry)

In other words, GCC doesn't even have to care whether a variable is signed or unsigned in order to do "+ - *".

unsigned int x = -2;  /* wraps to 0xfffffffe */
unsigned int y = -10; /* wraps to 0xfffffff6 */
unsigned int z = 49;
printf("%u\n", x*y*z); /* prints 980 */

GCC merely ignores the carry. (This is another one of my optimization gripes, in fact: assembly programmers can use the carry flag, while C programmers have to do computations using the next larger word size, sometimes wasting a register.)
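
To illustrate, the usual C workaround looks something like this (the function name is made up):

#include <stdint.h>

/* A 32-bit add-with-carry in portable C: widen to the next larger
   word size and split the result. In assembly this is a single ADD
   followed by reading the carry flag. */
uint32_t add32_carry(uint32_t a, uint32_t b, uint32_t *carry)
{
    uint64_t wide = (uint64_t)a + b;
    *carry = (uint32_t)(wide >> 32);
    return (uint32_t)wide;
}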


"If you compile to AMD64, SSE is automatically used for all math (not vectorized, one value at a time). You can also use SSE for math in IA32 using -mfpmath=sse. SSE math is generally much faster but removes the 80bit temporaries quirk 487 math has."

Yes, it's good to get away from quirky FP design.


Edit:

This C/assembly debate seems to come up all the time; the next time it does, I might just sit it out. It takes so much time to make the case, and in the end it's totally inconsequential.

Edited 2011-06-14 10:15 UTC

Reply Score: 2

RE[2]: GCC isn't all that great
by f0dder on Sat 11th Jun 2011 18:22 UTC in reply to "RE: GCC isn't all that great"
f0dder Member since:
2009-08-05

...not to mention that they're dealing with pretty darn specific algorithms that can take advantage of SSE and benefit from hand-rolled register allocation.

Not so much with your typical run-of-the-mill software ;)

Reply Score: 1

RE[2]: GCC isn't all that great
by _xmv on Sun 12th Jun 2011 11:57 UTC in reply to "RE: GCC isn't all that great"
_xmv Member since:
2008-12-09

"this is hardly the norm amongst programmers. Hence very few programmers out there can actually beat the optimized output of compilers."

I think you're confusing "programmers" with "I fired up a shell and a compiler and I think I can program stuff".

Seriously, in fact.

Sure, few will code in asm just for the hell of it (except the MenuetOS people ;-), but anyone who's actually a real programmer will insert asm wherever he needs it, properly.
Coding everything in asm is just too time-consuming and can be major hair-pulling due to the size of the projects, without a strong incentive to manage things (like Menuet does)

Reply Score: 2

RE[3]: GCC isn't all that great
by Alfman on Sun 12th Jun 2011 17:46 UTC in reply to "RE[2]: GCC isn't all that great"
Alfman Member since:
2011-01-28

_xmv,

"Sure few will code in asm for the hell of it (except MenuetOS people ;-), but any who's actually a real programmer will insert asm wherever he needs it properly.
Coding everything asm is just too time consuming and can be major hair pulling due to the size of the projects without a strong incentive to manage things (like menuet does)"

GCC could improve in the future, but until then, the common perceptions don't do justice to the skill of assembly level programmers.

I still think that for most programmers/projects, there are very strong reasons not to go down the ASM route, regardless of asm skill.

It's usually too much of a maintenance burden, and I don't have the time or hardware to test all the hardware combinations that GCC/C can support.

There are some times when I'm not left with much choice though.

In the last performance-critical project I worked on, I needed to divide a 64-bit number by a 32-bit number such that the result would always fit in a 32-bit register. This is exactly what the x86 DIV instruction has done since the 386.

Unfortunately, GCC (on x86) cannot optimize this case and instead emulates a full 64-bit division in a hidden call to "__divdi3", which has terrible performance.

GCC does what the C spec requires (a full 64-bit division, then demotion to 32 bits), but it doesn't bother trying to optimize this case.
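
A sketch of the two routes (the inline-asm helper is illustrative only, for a 32-bit x86 target; __udivdi3 is the unsigned sibling of __divdi3):

#include <stdint.h>

/* Pure C: on 32-bit x86, GCC lowers this to a library call that
   performs a full 64-by-64 division. */
uint32_t div64_32_c(uint64_t n, uint32_t d)
{
    return (uint32_t)(n / d);
}

/* Inline asm: one DIV of EDX:EAX by a 32-bit divisor. The caller
   must guarantee the quotient fits in 32 bits, or DIV faults. */
uint32_t div64_32_asm(uint64_t n, uint32_t d)
{
    uint32_t quot, rem;
    __asm__("divl %4"
            : "=a"(quot), "=d"(rem)
            : "0"((uint32_t)n), "1"((uint32_t)(n >> 32)), "rm"(d));
    return quot;
}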

Conversely, GCC does optimize the case where two 32-bit variables are multiplied into a 64-bit variable:

uint32_t a,b;
uint64_t ab;
...
ab = (uint64_t)a*(uint64_t)b;

GCC does not perform a full 64-bit multiplication above; it emits the x86 MUL instruction, which multiplies two 32-bit numbers and returns a 64-bit result.


Fun times!

Reply Score: 2

Suprising from Google
by voidlogic on Sat 11th Jun 2011 04:44 UTC
voidlogic
Member since:
2005-09-03

Perhaps I expect too much, but after reading the paper I have to wonder what kind of peer-review process Google has. It's not that this paper lacks potential, but considering the obvious flaws discussed here and elsewhere, and the general quality of the writing, I feel like I am reading a draft.

I can't imagine many advisors in an academic setting signing off on this paper as-is; I would imagine a faculty advisor would have a lot of constructive criticism and would help the author create a great paper. This paper is not horrible, it is just surprising considering the source: Google.

P.S. Those who are interested in this topic might also enjoy the following online database: http://shootout.alioth.debian.org/

Reply Score: 4

No C?
by Magma on Sat 11th Jun 2011 04:50 UTC
Magma
Member since:
2006-03-07

Can Google really be serious about a performance benchmark and not include C? I'm not implying that they need to use it to build Gmail, but it's a pretty common baseline.

C++ has a lot of issues in a big project that make any micro-analysis moot. Boost? STL? Did the framework you're using pay attention to performance... everywhere? How about the heap-vs-stack issue that overtakes bigger projects?

Who cares about a loop? Yup, it's speedy. That's great until you run into this guy:

"Umm, all my functions are like virtual and stuff, ya know, like, in case I want to like inherit somethin'."

Brilliant... Good feelings gone.

Scala, Java, and... Go... why not PHP or Python while you're at it? Odd.

Reply Score: 1

RE: No C?
by edgeofsanity on Sat 11th Jun 2011 09:54 UTC in reply to "No C?"
edgeofsanity Member since:
2011-06-10

I agree; C++ looks like an accident compared to newer languages. Reflection, for example, is a great way to implement runtime-evolvable systems. I have seen software systems where people tried to build reflection by hacking g++. What an overhead...

Besides that, its two-level compilation scheme and user-extensible syntax really complicate things like syntax-highlighting editors and aspect-oriented development. Aspects are proven to reduce development time for certain tasks. And we know what an editor like Eclipse can do for development time.

Besides this, as an academician I can say this paper is a direct reject. But certainly, it is good to see companies coming to conferences. We will learn something from them, and maybe they will take home our ideas and implement something usable.

However, if my PhD student brought this paper to me (an asst. prof of software engineering) or to my prof, our reply would be: rewrite and think more. In software engineering, you have other problems than simple performance: maintainability, evolvability, reliability... The language of choice depends on most of these requirements. Oh, and on some occasions multi-language (multi-paradigm) programming is really helpful. Some problems simply don't fit into OO.

Reply Score: 0

RE[2]: No C?
by bouhko on Sat 11th Jun 2011 10:10 UTC in reply to "RE: No C?"
bouhko Member since:
2010-06-24

Well, Google might be one of the few companies where a one-percent performance increase can lead to multi-million savings, due to the scale of their products.

I'm not really convinced about reflection in large-scale systems. It usually seems like a good idea at first, but when you actually come back some years later and try to understand what's happening, it just adds another layer of complexity.

Almost all the other qualities of software engineering you mention - maintainability, reliability - are fully achievable with C++ if you have competent programmers and some guidelines (like avoiding pointer arithmetic and all the other horrible things you can do). It might take a bit more work than with Java, but it's certainly doable, and once again, if you're at Google scale, taking 20% more development time to make something 2% faster might be worth it.

Reply Score: 3

RE[2]: No C?
by ebasconp on Mon 13th Jun 2011 20:25 UTC in reply to "RE: No C?"
ebasconp Member since:
2006-05-09

"Besides that, its two-level compilation scheme and user-extensible syntax really complicate things like syntax-highlighting editors and aspect-oriented development."


What two-level compilation scheme are you talking about?

"In software engineering, you have other problems than simple performance: maintainability, evolvability, reliability... The language of choice depends on most of these requirements. Oh, and on some occasions multi-language (multi-paradigm) programming is really helpful. Some problems simply don't fit into OO."


Maintainability, reliability and all that stuff are off-topic for the article's document: it just talks about performance, and though C++ is not as clean or simple as Java, it is (together with C) dominating industrial software development.

Reply Score: 2

RE[3]: No C?
by Alfman on Mon 13th Jun 2011 20:46 UTC in reply to "RE[2]: No C?"
Alfman Member since:
2011-01-28

ebasconp,

<offtopic>

"What two-level compilation scheme are you talking about?"

I think he's referring to include files and the C preprocessor. If so, I agree with him that it's a regrettable mess.


"though C++ is not as clean or simple as Java, it is (together with C) dominating the industrial software development."


C/C++ are often the de facto languages of choice for embedded and systems development.

I like the C logic syntax a lot (more than the Ada/Pascal family of languages). But both C and C++ have usability issues that would have been handled differently if they were designed today. I'm not sure any new language can get enough momentum to take over from C in systems development.

</offtopic>

Reply Score: 2

Not Surprised
by computrius on Sat 11th Jun 2011 16:22 UTC
computrius
Member since:
2006-03-26

Is anyone really surprised by this? You have a language whose executable is native machine code versus languages that are run by an interpreter of byte code. I'm not sure benchmarks were needed to say that C++ is going to produce a faster-executing program.

Reply Score: 2

RE: Not Surprised
by Magma on Sat 11th Jun 2011 20:32 UTC in reply to "Not Surprised"
Magma Member since:
2006-03-07

Whaaa? Are you trying to imply that Java is not compiled or as fast as C++?? OMG Heretic! Heretic! Heretic! Seize the infidel!

Reply Score: 1

Comment by StaubSaugerNZ
by StaubSaugerNZ on Sat 11th Jun 2011 20:19 UTC
StaubSaugerNZ
Member since:
2007-07-13

In *general* the fastest language is (yuck) FORTRAN. This is because F77 lacks the pointer aliasing that hampers optimization in C and C++ compilers.

But who wants to develop in FORTRAN? Or C++, for that matter?

James Gosling reports that the French scientific computing institute INRIA already evaluated Java for High-Performance Computing (HPC) in 2008 and found it faster than C (which is generally faster than C++):
http://blogs.oracle.com/jag/entry/current_state_of_java_for

They found the principal limitation for HPC was the Java networking library.

That study was in 2008, and the JVM has not gotten slower since.

Reply Score: 3

RE: Comment by StaubSaugerNZ
by Magma on Sat 11th Jun 2011 20:35 UTC in reply to "Comment by StaubSaugerNZ"
Magma Member since:
2006-03-07

Turbo Pascal was pretty fast. Never really caught on, except for Delphi and a few other niche languages.

I think that Free Pascal typically ranks very high in performance benchmarks.

Reply Score: 1

RE[2]: Comment by StaubSaugerNZ
by galvanash on Sun 12th Jun 2011 08:17 UTC in reply to "RE: Comment by StaubSaugerNZ"
galvanash Member since:
2006-01-25

Turbo Pascal was pretty fast. Never really caught on, except for Delphi and a few other niche languages.


I programmed in Delphi for a few years back when it was still fairly popular. Borland's Pascal compilers were very fast as far as compilation speed goes... Compared to pretty much any C/C++ compiler it was at least an order of magnitude faster (or even more, mostly due to its single-pass nature).

However, it did very little as far as optimizing the performance of the resulting executable (also at least partially due to it being a single-pass compiler). It was fairly dumb, to be honest. It did default to using registers for parameter passing and avoided building a stack frame when it could, but other than that it didn't do much of anything interesting.

I don't know much about Free Pascal, but I see it at least has SSE support, something Delphi never had back in the day (it was purely x87 unless you did SSE explicitly using 3rd-party libs). I would be quite surprised to see a single-pass Pascal compiler doing the kind of transformative optimization you get from GCC or other modern compilers. There are many types of optimization that are simply not possible without multiple passes.

Reply Score: 2

RE[2]: Comment by StaubSaugerNZ
by f0dder on Mon 13th Jun 2011 16:53 UTC in reply to "RE: Comment by StaubSaugerNZ"
f0dder Member since:
2009-08-05

Turbo Pascal, fast? Heh.

As noted below, the compiler was fast, especially in the TP6 days - compiling directly to RAM was a nice productivity boost on the PIO hard drives back then.

But the compiler was naïve, and with the slow CPUs of the day you often needed to resort to inline BASM for even simple stuff... anybody remember Sally's Peephole Optimizer? And anybody else around who rewrote (parts of) the CRT unit?

Things didn't improve much with Delphi either, except that CPUs had become fast enough that it wasn't so relevant anymore.

Reply Score: 1

I'd like to see them benchmark D
by obsidian on Mon 13th Jun 2011 02:36 UTC
obsidian
Member since:
2007-05-12

I'd love to see them benchmark the D programming language.
I've been following D for a while and am rapidly falling in love with its syntax. It's similar in many ways to C++, but generally has a *much* cleaner design.

C++ has a lot of "private this, friend that" noise.
The syntax in general is quite "noisy" and busy.

I've very rarely seen that kind of "syntax noise" in D. In D, you can throw templates around as much as you like, and it never seems to get as messy as C++.

Ok, D has its warts (the main couple being the separate Tango and Phobos libraries, and the "three char types, three string types" thing). However, the former is apparently being solved in D 2.0, and the latter is only a minor thing IMO.

Edited 2011-06-13 02:44 UTC

Reply Score: 2

So what do we have?
by Dasher42 on Mon 13th Jun 2011 09:02 UTC
Dasher42
Member since:
2007-04-05

What I like about C++ is that you can use it like a slightly improved C, or opt into RTTI, STL templates, garbage collection, and virtual function lookups; the tradeoffs aren't there unless you choose to use those features. On the other hand, the carpal-tunnel-inducing syntax has led me to use Python for most of my work.

Now, the closest projects to offering both that speed and that ease of programming, in my mind, are PyPy with its compiler, and Genie, the alternative preprocessor for the Vala project.

http://codespeak.net/pypy/dist/pypy/doc/faq.html
http://live.gnome.org/Genie

Reply Score: 1