Linked by Thom Holwerda on Fri 15th Feb 2013 10:40 UTC

Member since:
2009-08-13
- out of order execution
"It's like: hey, we (the processor manufacturer) have inserted a little data-flow engine in your processor; sadly it only covers values in registers, and memory operations are still mostly in order. And the programmer says: okay, that's nice, I guess I don't have to work as hard to make instructions start in parallel and can instead optimize data flows."
There's also dependency analysis: the CPU can only reorder instructions that don't depend on each other, so your expressions shouldn't be too intertwined.
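To make the dependency point concrete, here is a minimal sketch (the function names are mine, purely illustrative). The first sum is one long chain where every addition waits for the previous one; the second keeps four independent chains that an out-of-order core can, in principle, overlap. Whether it is actually faster depends on the CPU, the compiler and the flags, so measure before believing it.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: each addition depends on the result of the previous one.
double sum_chained(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Four accumulators: four independent dependency chains.
// Note: this changes the order of the floating-point additions, so the
// result can differ in the last bits from sum_chained for general input.
double sum_split(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```

The compiler is usually not allowed to do this rewrite for floating point on its own, because it reorders the additions, which is why it can be worth doing by hand.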
- parallel execution units
"See above."
Flatness: don't nest too deeply. In C++, I've seen a simple loop iterating over a std::vector compiled to really fine SSE code.
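The kind of flat loop I mean looks roughly like this (saxpy is just an illustrative name, not from any library). No nesting, no branches in the body, one index: the sort of thing optimizing compilers are generally able to vectorize, though whether you actually get SSE code depends on your compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i] for all i, written as one flat loop.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

If you want to know what your compiler made of it, look at the generated assembly or its vectorization report rather than guessing.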
- branch prediction
"Okay, care to explain how this could make any difference when coding? This is a mechanism that applies predictions from dynamic patterns when executing code, not something that has to be coded. Current x86 processors don't even support hints."
This goes hand in hand with the above and with speculative execution. Generally, your code should be as predictable as possible, i.e. the sooner the branch condition is known, the better.
There's also the default behaviour when the CPU sees a branch for the first time: if the CPU by default assumes "branch taken", you should structure your code so that it is less expensive to enter the body of an if statement.
Then, especially in tight loops with branches inside them, the branch condition shouldn't flip on every iteration, because the CPU predicts from recorded branch history (the branch target buffer).
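As a rough illustration of the tight-loop case (function names are made up for this sketch): the first version has a data-dependent branch in the loop body, which is cheap when the data is predictable (say, sorted) and can be expensive when the condition flips at random. The second turns the condition into arithmetic so there is nothing left to predict. Be aware that a compiler may already generate branch-free code for the first version, so this is something to check, not assume.

```cpp
#include <cstddef>
#include <vector>

// Data-dependent branch inside the loop: cost depends on how
// predictable the comparison outcome is from one iteration to the next.
std::size_t count_branchy(const std::vector<int>& v, int threshold) {
    std::size_t count = 0;
    for (int x : v) {
        if (x >= threshold) ++count;
    }
    return count;
}

// Same result, but the comparison is added as 0 or 1 instead of
// steering control flow.
std::size_t count_branchless(const std::vector<int>& v, int threshold) {
    std::size_t count = 0;
    for (int x : v) {
        count += static_cast<std::size_t>(x >= threshold);
    }
    return count;
}
```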
- multiple cache levels
"Unless you code Fortran I don't think you'll ever see your compiler optimize towards this. But yes, multi-dimensional cache blocking/tiling is a pain in the ass, and making it dynamically adapt to the platform cache hierarchy almost requires runtime code generation.
Which your standard compiler won't do."
Enter the world of cache oblivious algorithms.
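For anyone who hasn't met the idea: a cache-oblivious algorithm recursively splits the problem until the pieces fit into whatever cache levels exist, without ever being told their sizes. A minimal sketch for matrix transposition follows; the names and the row-major layout are my assumptions, and the small base-case cutoff of 8 is only there to limit recursion overhead, it is not tuned to any cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose the block [r0, r1) x [c0, c1) of src into dst.
// src is n_rows x n_cols, dst is n_cols x n_rows, both row-major.
void transpose_rec(const std::vector<double>& src, std::vector<double>& dst,
                   std::size_t n_rows, std::size_t n_cols,
                   std::size_t r0, std::size_t r1,
                   std::size_t c0, std::size_t c1) {
    const std::size_t dr = r1 - r0;
    const std::size_t dc = c1 - c0;
    if (dr <= 8 && dc <= 8) {
        // Small block: copy directly.
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                dst[c * n_rows + r] = src[r * n_cols + c];
        return;
    }
    if (dr >= dc) {
        // Split the longer dimension in half and recurse.
        const std::size_t mid = r0 + dr / 2;
        transpose_rec(src, dst, n_rows, n_cols, r0, mid, c0, c1);
        transpose_rec(src, dst, n_rows, n_cols, mid, r1, c0, c1);
    } else {
        const std::size_t mid = c0 + dc / 2;
        transpose_rec(src, dst, n_rows, n_cols, r0, r1, c0, mid);
        transpose_rec(src, dst, n_rows, n_cols, r0, r1, mid, c1);
    }
}

std::vector<double> transpose(const std::vector<double>& src,
                              std::size_t n_rows, std::size_t n_cols) {
    assert(src.size() == n_rows * n_cols);
    std::vector<double> dst(src.size());
    if (!src.empty())
        transpose_rec(src, dst, n_rows, n_cols, 0, n_rows, 0, n_cols);
    return dst;
}
```

The point is that no cache size appears anywhere in the code, so there is nothing to adapt per platform and no runtime code generation needed.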
"No, it simply requires knowledge of software-hardware interaction. Assembly language programmers are also better when optimizing e.g. C code, as they know that under the abstraction it's still a von Neumann machine."
It is not that simple, actually. I've seen assembly-level programmers who really know what they are doing, and they do it well; however, that was usually at the micro-optimization level, which doesn't scale to long-lived and/or large software.
I see more potential at the algorithmic level: the cache-oblivious algorithms already mentioned, knowing the right algorithms and your domain, and even being able to decide at runtime which algorithm to use.
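Deciding at runtime which algorithm to use can be as simple as the sketch below. The names and the cutoff of 32 are hypothetical; a real cutoff would have to come from measuring on your data and hardware.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simple insertion sort: low overhead, fine for short inputs.
void insertion_sort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        const int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}

// Pick the algorithm from the input size at runtime.
// small_cutoff is an illustrative default, not a measured value.
void adaptive_sort(std::vector<int>& v, std::size_t small_cutoff = 32) {
    if (v.size() <= small_cutoff) {
        insertion_sort(v);
    } else {
        std::sort(v.begin(), v.end());
    }
}
```

The same pattern extends to choosing by data shape (nearly sorted, many duplicates) or by what the domain tells you about typical inputs.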
All in all, I am not sure whether you are evangelizing assembly, C, or compilers in your post.