Linked by Thom Holwerda on Mon 5th Dec 2011 22:48 UTC
renox,
"Uh? MIPS has two version of integer operation: ADD/ADDU, SUB/SUBU (one which trap on overflow, one which doesn't and corresponds to modulo operation"
I wasn't really disagreeing with you.
"so the big thing here is that there is nearly no difference in performance between 'modulo' computations and 'trap on overflow' computations(*) which isn't the same with other ISA."
To be fair, we'd need to actually test the performance differences on real CPUs; we cannot draw performance conclusions just by counting instructions. On some x86 CPUs, jumps are effectively "free" as long as they are predictable.
For example:
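TEST1:                          ; add followed by "jump if overflow" (almost never taken)
    mov ecx, 0x00000000         ; ecx = 0 wraps on the first decrement, so loop runs 2^32 times
    mov eax, 0x00000000
.again:
    add eax, 0x00000001
    jo .overflow                ; target is the next instruction either way
.overflow:
    loop .again
TEST2:                          ; baseline: same loop with no overflow check
    mov ecx, 0x00000000
    mov eax, 0x00000000
.again:
    add eax, 0x00000001
    loop .again
TEST3:                          ; add followed by "jump if no overflow" (almost always taken)
    mov ecx, 0x00000000
    mov eax, 0x00000000
.again:
    add eax, 0x00000001
    jno .nooverflow             ; target is the next instruction either way
.nooverflow:
    loop .again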
On my 3 GHz machine, 2^32 iterations per pass and 3 passes per test give the following (three passes -> mean):
Test1 = 8.597570, 8.597259, 8.597007 -> 8.597278
Test2 = 8.596904, 8.599724, 8.598505 -> 8.598377
Test3 = 8.596864, 8.596893, 8.597207 -> 8.596988
Note that adding a jump instruction did not measurably hurt the performance of the loop; the differences between the tests are within a reasonable margin of error.
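To put a rough number on "margin of error", the short Python sketch below recomputes the means from the timings listed above and compares the gap between the three means with the run-to-run variation inside a single test. The variable names are just for illustration, and the timings are used in whatever unit they were reported in.

```python
# Timings copied from the three passes of each test listed above.
timings = {
    "Test1": [8.597570, 8.597259, 8.597007],
    "Test2": [8.596904, 8.599724, 8.598505],
    "Test3": [8.596864, 8.596893, 8.597207],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

means = {name: mean(passes) for name, passes in timings.items()}

# Gap between the slowest and fastest mean across the three tests.
gap_between_means = max(means.values()) - min(means.values())

# Largest spread between passes of the same test (run-to-run noise).
largest_run_to_run_range = max(max(p) - min(p) for p in timings.values())

for name, value in means.items():
    print(f"{name}: mean {value:.6f}")
print(f"gap between means:        {gap_between_means:.6f}")
print(f"largest run-to-run range: {largest_run_to_run_range:.6f}")
```

The gap between the means (about 0.0014) comes out smaller than the largest range between passes of a single test (about 0.0028, in Test2), which is why the three variants are indistinguishable here.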
Here are the same tests with unpredictable branching (adding 0x80000000 instead of 1 forces the branch to toggle each iteration):
Test1 = 10.694893, 10.742191, 10.759600 -> 10.732228
Test2 = 8.597308, 8.596457, 8.595678 -> 8.596481
Test3 = 10.862917, 10.777331, 10.680178 -> 10.773475
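For anyone wondering why adding 0x80000000 makes the branch toggle, here is a small Python model of a 32-bit add and the signed overflow flag. It is only a sketch of the flag arithmetic, not a timing test; `add32` and `overflow_pattern` are made-up helper names.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def add32(a, b):
    """Return (result, overflow) for a 32-bit add, modelling the signed overflow flag."""
    result = (a + b) & MASK32
    # Signed overflow: both operands have the same sign bit and the result's differs.
    overflow = ((a ^ result) & (b ^ result) & SIGN_BIT) != 0
    return result, overflow

def overflow_pattern(step, iterations):
    """Repeatedly add `step` to a register starting at 0, recording the overflow flag."""
    eax = 0
    flags = []
    for _ in range(iterations):
        eax, overflow = add32(eax, step)
        flags.append(overflow)
    return flags

print(overflow_pattern(0x80000000, 6))  # flag alternates every iteration
print(overflow_pattern(0x00000001, 6))  # flag stays clear
```

With a step of 0x80000000 the register alternates between 0x80000000 and 0, so the flag is set on every second add. With a step of 1 it is set only once per 2^32 iterations, when the register passes 0x7FFFFFFF.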
So, the unpredictable branches hurt the performance, but I have to question whether a MIPS trap would do any better. Can MIPS do overflow checking without a trap? So long as the overflow is exceptional behaviour, I think we can both agree from these tests that the extra jump won't make any significant difference.
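As a way of framing that question, the Python sketch below models the two styles being compared: a trapping add (like the ADD renox describes), and a modulo add (like ADDU) followed by an explicit sign test that a program could branch on. This is a model of the semantics only; the function names are hypothetical and it makes no claim about what any particular MIPS compiler emits.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def wrap32(x):
    """Reduce an integer to the signed 32-bit two's complement range."""
    return (x + 2**31) % 2**32 - 2**31

def addu(a, b):
    """Model of a non-trapping modulo add."""
    return wrap32(a + b)

def add_trapping(a, b):
    """Model of a trapping add: raise instead of wrapping."""
    total = a + b
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError("signed 32-bit overflow")
    return total

def addu_checked(a, b):
    """Modulo add plus an explicit sign test; returns (result, overflowed)."""
    result = addu(a, b)
    # Overflow iff both operands differ in sign from the result.
    overflowed = ((a ^ result) & (b ^ result)) < 0
    return result, overflowed

print(addu_checked(INT32_MAX, 1))
print(addu_checked(1, 2))
```

The checked variant costs a few extra logic operations and a branch per add, which is the same trade-off as the `jo` in the x86 tests.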
Now maybe it's true that an inordinate amount of silicon has gone to branch prediction in the x86, silicon which could theoretically have been put to better use in the MIPS, but you can't deny the x86 seems to do a decent job in this microbenchmark. Unfortunately I don't have a MIPS processor to test with.