Linked by ebasconp on Fri 10th Jun 2011 22:22 UTC
Benchmarks "Google has released a research paper that suggests C++ is the best-performing programming language in the market. The internet giant implemented a compact algorithm in four languages - C++, Java, Scala and its own programming language Go - and then benchmarked results to find 'factors of difference'."
Thread beginning with comment 476898
To read all comments associated with this story, please click here.
GCC isn't all that great
by Alfman on Sat 11th Jun 2011 03:29 UTC
Alfman
Member since:
2011-01-28

I tend to agree that it's not usually worth optimizing code in assembly, however GCC's code is often less efficient than it should be.


We as programmers can do a better job than GCC at anticipating certain states and algorithmic conditions which give GCC trouble. Under these circumstances, it's not all that difficult to beat it.

I know ICC is generally better than GCC.

Reply Score: 4

RE: GCC isn't all that great
by Valhalla on Sat 11th Jun 2011 04:12 in reply to "GCC isn't all that great"
Valhalla Member since:
2006-01-24

I tend to agree that it's not usually worth optimizing code in assembly, however GCC's code is often less efficient than it should be.

Examples?


We as programmers can do a better job than GCC at anticipating certain states and algorithmic conditions which give GCC trouble. Under these circumstances, it's not all that difficult to beat it.

Certainly we programmers can do a better job given that we understand the overall purpose of the code much better than any compiler, however I very much disagree with it 'not being all that difficult to beat it', you certainly need to be a very good assembly programmer to do so.

Expert programmers who are both well-versed in assembly and the things they are solving with it will certainly atleast equal and most often beat a compiler. Case in point would be encoders like x264 etc, they are full with hand-optimized assembly which easily beat the generated output of both GCC, ICC, LLVM etc when disabling the hand-optimized assembly and relying entirely on compiler optimization. But these guys are very good assembly programmers and experts at what they are trying to implement using it, this is hardly the norm amongst programmers. Hence very few programmers out there can actually beat the optimized output of compilers.

Reply Parent Score: 5

RE[2]: GCC isn't all that great
by Alfman on Sat 11th Jun 2011 07:54 in reply to "RE: GCC isn't all that great"
Alfman Member since:
2011-01-28

Valhalla,

"Examples?"


That's a fair question, but please understand I don't have very much time to allot to it.

Anyway, here's a trivial example.

void vector_add(unsigned int*a, unsigned int a_len, unsigned int*b, unsigned int b_len) {
unsigned int i;
for(i=0; i<a_len && i<b_len; i++) {
a[i] += b[i];
}
}


Here is the GCC output using -O3 and march=core2


vector_add:
push ebp
mov ebp, esp
push edi
mov ecx, DWORD PTR [ebp+8]
push esi
mov edi, DWORD PTR [ebp+20]
push ebx
mov esi, DWORD PTR [ebp+16]
mov ebx, DWORD PTR [ebp+12]
test ebx, ebx
je .L5
test edi, edi
je .L5
xor edx, edx
.L3:
mov eax, DWORD PTR [esi+edx*4]
add DWORD PTR [ecx+edx*4], eax
inc edx
cmp ebx, edx
jbe .L5
cmp edi, edx
ja .L3
.L5:
pop ebx
pop esi
pop edi
leave
ret


Now I'm not sure how many people read asm, but it's short and not difficult to follow.

GCC failed to vectorize the code with SSE2 instructions. This would be an obvious way to significantly optimize the example (both in terms of memory bandwidth and in terms of parallel computation).

Secondly, GCC failed to recognize that is isn't necessary to carry two compare operations into the loop. It should be using a cmov to calculate the minimum number of loops upfront, and then there would only be a single condition in the loop.

Thirdly, although there is no performance penalty in this case, GCC generally manipulates indexes instead of converting loops to use pointer arithmetic. Pointer arithmetic is far more efficient in the case when a shortage of general purpose registers forces variables onto the stack. I've seen GCC load both the base pointer and index from the stack and add them inside the loop at every iteration.

Fourthly, it didn't unroll any loops. Personally I will usually trade off in favour of instruction size anyway, but it's interesting because I specified -O3. I'm sure we could get GCC to unroll loops by manually setting the heuristics, but an assembly programmer would have yet another easy edge over the default compiler output.


"however I very much disagree with it 'not being all that difficult to beat it', you certainly need to be a very good assembly programmer to do so."

You may be assuming that GCC does a good job in the first place. It's usually 'good enough', but it isn't great.


"Hence very few programmers out there can actually beat the optimized output of compilers."

Maybe the example above will change your mind?

I don't have ICC on hand to compare, but it is said to be excellent at SSE2 and is probably also better at optimization all around.

Reply Parent Score: 5

RE[2]: GCC isn't all that great
by f0dder on Sat 11th Jun 2011 18:22 in reply to "RE: GCC isn't all that great"
f0dder Member since:
2009-08-05

...not to mention that they're dealing with pretty darn specific algorithms that can take advantage of SSE and benefit from hand-rolled register allocation.

Not so much with your typical run of the mill software ;)

Reply Parent Score: 1

RE[2]: GCC isn't all that great
by _xmv on Sun 12th Jun 2011 11:57 in reply to "RE: GCC isn't all that great"
_xmv Member since:
2008-12-09

this is hardly the norm amongst programmers. Hence very few programmers out there can actually beat the optimized output of compilers.

i think you're confusing "programers" and "i fired up a shell and a compiler and i think i can program stuff"

Seriously, in fact.

Sure few will code in asm for the hell of it (except MenuetOS people ;-), but any who's actually a real programmer will insert asm wherever he needs it properly.
Coding everything asm is just too time consuming and can be major hair pulling due to the size of the projects without a strong incentive to manage things (like menuet does)

Reply Parent Score: 2