Linked by MOS6510 on Fri 17th May 2013 22:22 UTC
Hardware, Embedded Systems "It is good for programmers to understand what goes on inside a processor. The CPU is at the heart of our career. What goes on inside the CPU? How long does it take for one instruction to run? What does it mean when a new CPU has a 12-stage pipeline, or 18-stage pipeline, or even a 'deep' 31-stage pipeline? Programs generally treat the CPU as a black box. Instructions go into the box in order, instructions come out of the box in order, and some processing magic happens inside. As a programmer, it is useful to learn what happens inside the box. This is especially true if you will be working on tasks like program optimization. If you don't know what is going on inside the CPU, how can you optimize for it? This article is about what goes on inside the x86 processor's deep pipeline."
Thread beginning with comment 561996
RE[6]: Comment by Drumhellar
by Alfman on Sat 18th May 2013 19:31 UTC in reply to "RE[5]: Comment by Drumhellar"
Alfman
Member since:
2011-01-28

tylerdurden,

Haha, I know, right? Both Intel's Itanium and superscalar x86 architectures are different ways to achieve a bit more parallelism by doing more work per cycle. But both of those architectures are still sequential in nature, and neither of them is parallel in the sense that FPGAs are.

Reply Parent Score: 2

RE[7]: Comment by Drumhellar
by tylerdurden on Sat 18th May 2013 20:29 in reply to "RE[6]: Comment by Drumhellar"
tylerdurden Member since:
2009-03-17

Well, FPGAs are just seas of programmable logic cells with somewhat flexible interconnects, so their "parallelism" depends on the designs being implemented. E.g. plenty of FPGAs are used to synthesize algorithms which could be considered "sequential" and non-parallel in how they process data. However, modern large FPGAs also provide a sea of ALUs, which do lend themselves naturally to parallel programming models.

To be fair, modern CPUs do support most forms of parallelism, whether it be instruction-level parallelism (superscalar, SMT, out-of-order, multicore, etc.) or data-parallel structures like SIMD and vector units. However, general-purpose CPUs have to strike a certain "balance" in their designs: how much chip area/power should be dedicated to control structures, how much to execution, how much to memory, etc., in order to hit a wide range of performance targets with general programmability. GPUs and ASICs, by contrast, have more restricted application targets. In the case of GPUs, they're used to run algorithms with elevated degrees of data parallelism, so they can dedicate most of their area to execution structures rather than control (since they don't have to dynamically squeeze as much performance as possible out of a single instruction stream), as an oversimplified example.
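As a rough toy example of my own (nothing from the article), instruction-level parallelism is just the hardware overlapping operations that have no dependence on each other, within a single thread:

    /* Toy sketch: the two accumulators form independent dependence chains,
     * so a superscalar/out-of-order core can execute both multiplies and
     * adds in parallel without any help from the programmer.
     * (Tail element for odd n omitted for brevity.) */
    float dot_two_chains(const float *a, const float *b, int n)
    {
        float s0 = 0.0f, s1 = 0.0f;
        for (int i = 0; i + 2 <= n; i += 2) {
            s0 += a[i]     * b[i];       /* chain 1 */
            s1 += a[i + 1] * b[i + 1];   /* chain 2 */
        }
        return s0 + s1;
    }

SIMD and multicore then widen that same loop explicitly, at the cost of more execution area.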

AMD's newer Fusion microarchitectures may interest you, since they are starting to support elevated degrees of data parallelism on-die.

Edited 2013-05-18 20:47 UTC

Reply Parent Score: 3

RE[8]: Comment by Drumhellar
by Alfman on Sun 19th May 2013 02:17 in reply to "RE[7]: Comment by Drumhellar"
Alfman Member since:
2011-01-28

tylerdurden,

"Well, FPGAs are just seas of programmable logic cells with somewhat flexible interconnects, so their 'parallelism' depends on the designs being implemented."

Yes, it all depends on the design. It could be very powerful in the hands of innovative software developers, but I don't know if or when consumer CPUs will provide FPGA-like technology that lets software developers take advantage of it.


"To be fair, modern CPUs do support most forms of parallelism; whether it be some form of instruction level parallelism (superscalar, SMT, out-of-order, multicore, etc), as well as data parallel structures like SIMD and Vector units."

True, but it's watered down. Every time I look at SSE I ask myself why Intel didn't make SIMD extension instructions that could accommodate much greater parallelism; x86 SIMD extensions only offer low parallel scaling factors. I know you are right that Intel had to strike a balance somewhere, but nevertheless I feel their whole 'lite SIMD' approach is impeding significantly higher software scalability.
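To make the scaling factor concrete, here's a rough sketch of my own (not from the article): with plain 128-bit SSE each instruction only covers four single-precision floats, so the best case is roughly 4x over scalar code (AVX only doubles that to eight).

    /* Sketch: element-wise add with SSE intrinsics. The 4-float width is
     * fixed by the ISA, which is the low scaling factor I'm complaining
     * about. Function name is just for illustration. */
    #include <xmmintrin.h>

    void add_arrays_sse(const float *a, const float *b, float *out, int n)
    {
        int i;
        for (i = 0; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
        }
        for (; i < n; i++)        /* scalar tail */
            out[i] = a[i] + b[i];
    }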



"In the case of GPUs, they're used to run algorithms with elevated degrees of data parallelism, so they can dedicate most of their area to execution structures"

I like the way GPUs in particular are designed to scale to arbitrary numbers of execution units without otherwise changing the software. This is just awesome for "embarrassingly parallel algorithms" (https://en.wikipedia.org/wiki/Embarrassingly_parallel).
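As a toy sketch of what I mean (my own example, not from the article): every iteration below is independent, so the runtime can spread the loop over however many cores exist without changing a line of code, which is exactly the property GPUs exploit with far more execution units.

    /* Sketch: an embarrassingly parallel loop using OpenMP. Compile with
     * -fopenmp; the same source scales across however many cores are
     * available automatically. */
    #include <omp.h>

    void scale_all(float *data, int n, float factor)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            data[i] *= factor;    /* no dependence between iterations */
    }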


"AMDs newer fusion microarchitectures are something that may interest you, since they are starting to support elevated degrees of data parallelism on die."

Yes maybe, I'll have to read up on it.

Reply Parent Score: 2