Linked by ebasconp on Fri 10th Jun 2011 22:22 UTC
Benchmarks "Google has released a research paper that suggests C++ is the best-performing programming language in the market. The internet giant implemented a compact algorithm in four languages - C++, Java, Scala and its own programming language Go - and then benchmarked results to find 'factors of difference'."
Thread beginning with comment 477093
RE[6]: GCC isn't all that great
by Alfman on Mon 13th Jun 2011 19:01 UTC in reply to "RE[5]: GCC isn't all that great"
Alfman Member since:
2011-01-28

"Having two comparisons within the loop was obviously poor coding in this example, which you expected the compiler to fix for you."

I think it surprised you that I was able to come up with an example, and now you're grasping at straws... I won't hold you to your original statements, so don't feel the need to defend them.


"If you find that the performance is not what you'd expect out of the given code, you will profile and look at assembly output of the performance hotspots, doesn't matter if it's GCC, VC, ICC, Clang/LLVM."

The difference is, there is no need to audit the compiler if it can be trusted to do a great job in the first place. The fact that we can reveal shortcomings by looking at GCC's asm dump implies that we are able to do better.

"Legal constructs does not equal efficient code. Compilers have never been able to turn shitty code into good code. If you know of one, please inform me, I'd buy it in a second."

Still more excuses. Why does GCC reorder and optimize some code paths but not others? The developer shouldn't have to mess with clean code just to make it perform better under GCC.


"I compiled your snippet with Clang 2.9, it didn't vectorize it either until I exchanged the len vars with constants just like in the case with GCC. Again I doubt ICC would do it either."

That's not really a sufficient answer.

Score: 2

Valhalla Member since:
2006-01-24


"I think it surprised you that I was able to come up with an example..."

Hardly, depending on the number of integers to add and the memory alignment of the integer data (both of which are unknown to the compiler in this example), vectorizing this loop may very well turn out slower afaik. There's a reason all the compilers support SSE intrinsics: the SSE registers are anything but general purpose registers. Both GCC and Clang/LLVM are considered strong compilers, and neither of them vectorized this snippet. You claim this proves them 'not all that great'; I say this 6-line example is anything but conclusive.

"The difference is, there is no need to audit the compiler if it can be trusted to do a great job in the first place. The fact that we can reveal shortcomings by looking at GCC's asm dump implies that we are able to do better."

You are basically saying that you know that a vectorized loop will outperform a non-vectorized loop no matter what the data length and data alignment are?

Because that's what this example entails, the compiler knows nothing about the data length and the data alignment at compile-time. And not knowing this, the compilers (GCC and Clang) chose not to vectorize. When I gave it the data length both compilers vectorized the loop (as I showed in an earlier post).
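
For anyone following along, here is a rough sketch of the two variants being compared. The original snippet is not reproduced in this part of the thread, so this assumes a plain element-wise integer add; the function names and FIXED_LEN are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the snippet under discussion.
   The length is only known at run time, and the compiler also
   cannot see how dst and src are aligned. */
void add_runtime_len(int *dst, const int *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] += src[i];
}

/* Same loop, but the trip count is a compile-time constant.
   FIXED_LEN is an assumed value, not taken from the thread. */
#define FIXED_LEN 1024

void add_fixed_len(int *dst, const int *src)
{
    for (size_t i = 0; i < FIXED_LEN; i++)
        dst[i] += src[i];
}
```

Compiling with something like `gcc -O3 -S` and reading the assembly for each function is how you would check what a given compiler does with them. Whether either loop ends up vectorized depends on the compiler version and flags, so treat this as a way to reproduce the comparison, not as a statement of what any particular release emits.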

"Still more excuses. Why does GCC reorder and optimize some code paths but not others? The developer shouldn't have to mess with clean code just to make it perform better under GCC."

Because no compiler is perfect, and again clean code does not equal efficient code. There's a reason you don't start questioning the compiler optimizations until you've questioned the actual algorithm.

"That's not really a sufficient answer."

It was a statement: neither GCC nor Clang/LLVM vectorized your snippet, and like I said, I doubt ICC would either.

Despite our bantering (or perhaps because of it!), I have to say it's fun discussing technical stuff here on OSNews once in a while. So thanks Alfman, acobar and others for participating in this (imho) interesting discussion ;)

btw I'm surprised f0dder hasn't weighed in, I seem to recall him from win32 assembly forums way back in the day (maybe my memory is playing tricks on me)!

Score: 2

Alfman Member since:
2011-01-28

Valhalla,


"There's a reason all the compilers support sse intrinsics, they're anything but general purpose registers."

In theory, one could create intrinsics for every single assembly opcode available and then claim that it is the developer's fault that they don't get used. However C is used by devs who don't want to program at the opcode level.

If you can get away with using intrinsics instead of inline assembly, then sure, go ahead. But they are not very portable between compilers or architectures.

And not every opcode we want to optimize has intrinsics. You haven't addressed the division example; I'd be very grateful if you could find a way to optimize 64bit / 32bit -> 32bit without using assembly.
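
To make the division case concrete, this is roughly the plain C form of it. It is a sketch only; the function name is made up, and the preconditions in the comment are assumptions the caller would have to guarantee.

```c
#include <stdint.h>

/* 64-bit dividend, 32-bit divisor, 32-bit quotient.
   Assumes d != 0 and that the true quotient fits in 32 bits;
   plain C has no way to state the second condition. */
uint32_t div64_32(uint64_t n, uint32_t d)
{
    return (uint32_t)(n / d);
}
```

On x86 the 32-bit `div` instruction already divides a 64-bit value in EDX:EAX by a 32-bit operand, but it faults if the quotient does not fit in 32 bits. Since the compiler cannot prove that it fits, it has to generate a general 64-bit division, which on 32-bit targets typically means a call to a runtime helper. The developer may know the quotient is small; the code above cannot say so.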


"Both GCC and Clang/LLVM are considered strong compilers, neither of them vectorized this snippet."

Well, if you say so, but GCC usually doesn't score highly on benchmarks.


"Hardly, depending on the number of integers to add and the memory alignment of the integer data (both of which are unknown to the compiler in this example)"

Even so, GCC did a bad job.

Change the example so that the function only accepts one length, and GCC produces the SSE code. So it's clear that an unknown length was not the factor.


"Because that's what this example entails, the compiler knows nothing about the data length and the data alignment at compile-time. And not knowing this, the compilers (GCC and Clang) chose not to vectorize. When I gave it the data length both compilers vectorized the loop (as I showed in an earlier post)."

This is not strictly true; GCC can tell exactly how long the arrays are by looking at the rest of the program, it simply chooses not to optimize that way.

But this brings up another good point, which speaks to my first post (repeated here) "We as programmers can do a better job than GCC at anticipating certain states and algorithmic conditions which give GCC trouble."

There may be times when the developer knows things which we cannot reasonably expect the compiler to derive, nor does the C language provide the means for us to tell it. The result of this uncertainty is poorer optimization.

"It was a statement, neither GCC not Clang/LLVM vectorized your snippet, and like I said doubt ICC would either."

We'll have to leave it to the unknown.



Your view is too extreme for my liking. What if an optimizer does not eliminate loop invariants because the developer's code calculated them each time? We could blame the "shitty code" instead of the compiler there too. In fact any time the compiler failed to optimize but did a literal translation of logic, we could argue the developer is to blame, right?

If that is not your view, then what are the criteria for deciding which missed optimizations are the developer's fault and which are the compiler's fault?

(and I won't accept this answer: if GCC doesn't handle it, then it's the developer's fault)
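
To illustrate the loop invariant case above, here is a minimal sketch. The function names and the arithmetic are invented for the example, not taken from any real code.

```c
#include <stddef.h>

/* As written by the developer: scale * offset does not change
   inside the loop, yet it is spelled out in every iteration. */
void add_scaled_naive(int *v, size_t len, int scale, int offset)
{
    for (size_t i = 0; i < len; i++)
        v[i] += scale * offset;
}

/* The same logic with the invariant hoisted by hand. */
void add_scaled_hoisted(int *v, size_t len, int scale, int offset)
{
    const int k = scale * offset;  /* computed once, before the loop */
    for (size_t i = 0; i < len; i++)
        v[i] += k;
}
```

Both functions compute the same result. An optimizer is generally expected to turn the first into the second, and if it did not, few people would call the first version bad code.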


"Despite our bantering (or perhaps because of it!), I have to say it's fun discussing technical stuff here on OSNews once in a while."

I much prefer technical stuff to gadgetry hype, but maybe I'm just jealous that I cannot afford a lifestyle where many gadgets come into play.

Score: 2