Linked by ebasconp on Fri 10th Jun 2011 22:22 UTC
Benchmarks "Google has released a research paper that suggests C++ is the best-performing programming language in the market. The internet giant implemented a compact algorithm in four languages - C++, Java, Scala and its own programming language Go - and then benchmarked results to find 'factors of difference'."
RE[3]: GCC isn't all that great
by Valhalla on Sat 11th Jun 2011 20:21 UTC in reply to "RE[2]: GCC isn't all that great"

Valhalla,

"Examples?"

Maybe GCC should have been able to optimize one of the len comparisons away, but really it's (deliberately?) stupid code imo, and there will always be cases like this where the compiler fails to grasp 'the bigger picture'.

Secondly, it seems obvious that GCC doesn't vectorize here because it has no idea how many integers are to be added, and depending on that count vectorization could very well be less efficient (I seriously doubt ICC would vectorize this snippet either). Case in point: I changed the 'len' variables to a constant (100) in your snippet, which resulted in the following (note: 64-bit):

leaq 16(%rdx), %rax
cmpq %rax, %rdi
jbe .L11
.L7:
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L4:
movdqu (%rdx,%rax), %xmm1
movdqu (%rdi,%rax), %xmm0
paddd %xmm1, %xmm0
movdqu %xmm0, (%rdi,%rax)
addq $16, %rax
cmpq $400, %rax
jne .L4
rep
ret
.L11:
leaq 16(%rdi), %rax
cmpq %rax, %rdx
ja .L7
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L2:
movl (%rdx,%rax), %ecx
addl %ecx, (%rdi,%rax)
addq $4, %rax
cmpq $400, %rax
jne .L2
rep
ret

Setting the constant to 5 or above made GCC use vectorization for the loop.
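
For anyone following along without the parent's snippet, here is a rough sketch of the two variants being compared. It is not the original code: the names add_fixed, add_runtime and COUNT are made up for illustration, and the signatures are a guess. The first loop has a trip count the compiler can see (100 ints, i.e. the 400 bytes that cmpq $400 tests for above); the second only learns its bounds at run time.

```c
#include <stddef.h>

/* Hypothetical constant standing in for the 'len' variables replaced by 100. */
#define COUNT 100

/* Trip count is visible at compile time, so the compiler can weigh
 * a vectorized loop against a scalar one with real numbers. */
void add_fixed(int *dst, const int *src)
{
    for (size_t i = 0; i < COUNT; i++)
        dst[i] += src[i];
}

/* Both lengths arrive at run time; the loop stops at the shorter one.
 * The compiler has no idea whether this runs 2 times or 2 million. */
void add_runtime(int *dst, size_t dst_len, const int *src, size_t src_len)
{
    for (size_t i = 0; i < dst_len && i < src_len; i++)
        dst[i] += src[i];
}
```

Building something like this with gcc -O3 -S and diffing the output for the two functions is an easy way to see the difference on your own GCC version; the exact code generated will vary between releases.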

Granted, calls to this function would not be using constants in a 'real program', but there the compiler would likely have a much better chance of deciding whether it should vectorize the loop or not.

As for loop unrolling, it is not turned on by -O3 because its efficiency is difficult to estimate without runtime data (it's turned on automatically when PGO is used). Does any compiler turn this on by default? (GCC and Clang/LLVM don't.)
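
To make the trade-off concrete, this is roughly what unrolling by four looks like when written out by hand. It is only an illustration with a made-up name (add_unrolled4) and an arbitrary factor of 4, not what GCC would necessarily emit with -funroll-loops. The body gets bigger and needs a cleanup loop, which is exactly the code-size versus branch-overhead gamble that is hard to call without knowing the real trip counts.

```c
#include <stddef.h>

/* Hand-unrolled by 4 (factor chosen arbitrarily for illustration). */
void add_unrolled4(int *dst, const int *src, size_t n)
{
    size_t i = 0;

    /* Main loop: four additions per loop test instead of one. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     += src[i];
        dst[i + 1] += src[i + 1];
        dst[i + 2] += src[i + 2];
        dst[i + 3] += src[i + 3];
    }

    /* Cleanup loop: handles the 0 to 3 elements left over. */
    for (; i < n; i++)
        dst[i] += src[i];
}
```

For small n the extra code is pure overhead, which is why profile data helps the compiler decide.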

Score: 4