Linked by MOS6510 on Fri 17th May 2013 22:22 UTC
Hardware, Embedded Systems "It is good for programmers to understand what goes on inside a processor. The CPU is at the heart of our career. What goes on inside the CPU? How long does it take for one instruction to run? What does it mean when a new CPU has a 12-stage pipeline, or 18-stage pipeline, or even a 'deep' 31-stage pipeline? Programs generally treat the CPU as a black box. Instructions go into the box in order, instructions come out of the box in order, and some processing magic happens inside. As a programmer, it is useful to learn what happens inside the box. This is especially true if you will be working on tasks like program optimization. If you don't know what is going on inside the CPU, how can you optimize for it? This article is about what goes on inside the x86 processor's deep pipeline."
RE[7]: Comment by Drumhellar
by tylerdurden on Sat 18th May 2013 20:29 UTC in reply to "RE[6]: Comment by Drumhellar"

Well, FPGAs are just seas of programmable logic cells with somewhat flexible interconnects, so how much "parallelism" they exhibit depends on the design being implemented. For example, plenty of FPGAs are used to synthesize algorithms that could be considered "sequential", that is, non-parallel in how they process data. However, modern large FPGAs also provide a sea of ALUs, which do lend themselves naturally to parallel programming models.

To be fair, modern CPUs do support most forms of parallelism: instruction-level parallelism (superscalar and out-of-order execution), thread-level parallelism (SMT, multicore), and data-parallel structures such as SIMD and vector units. However, general-purpose CPUs have to strike a certain balance in their designs: how much chip area and power to dedicate to control structures, how much to execution, how much to memory, and so on, in order to hit a wide range of performance targets while remaining generally programmable. GPUs and ASICs, by contrast, have narrower application targets. GPUs, for instance, are used to run algorithms with high degrees of data parallelism, so they can dedicate most of their area to execution structures rather than control, since they don't have to dynamically squeeze as much performance as possible out of a single instruction stream. That is an oversimplified example, but it captures the trade-off.
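To make the data-parallel vs. sequential distinction a bit more concrete, here is a minimal C sketch. The function names `add_arrays` and `prefix_sum` are hypothetical, chosen only for illustration. It assumes an x86 compiler such as GCC or Clang that defines `__SSE__`; without it, the code falls back to plain scalar loops. The element-wise add has no dependency between iterations, so one SSE instruction can process four floats at a time. The naive prefix sum carries a running total from each iteration to the next, which keeps it sequential in this form.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Data-parallel: out[i] depends only on a[i] and b[i], so the iterations
 * are independent and can be processed several at a time. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#if defined(__SSE__)
    /* Assumes SSE is available: one instruction adds four packed floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned load of a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i); /* unaligned load of b[i..i+3] */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when SSE is not available. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Sequential in this naive form: out[i] needs the running total from
 * iteration i-1, a loop-carried dependency that forces in-order steps. */
void prefix_sum(const float *a, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && out != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];
        out[i] = acc;
    }
}
```

Calling `add_arrays` on two 6-element arrays runs one four-wide SSE add plus a two-element scalar tail. An optimizing compiler (e.g. GCC or Clang at `-O3`) may auto-vectorize the plain scalar loop in the same way. The prefix sum is not inherently impossible to parallelize, since parallel scan algorithms exist, but those work by restructuring the computation. That kind of restructuring is what GPUs and wide SIMD units reward.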

AMD's newer Fusion microarchitectures may interest you, since they put GPU cores on the same die as the CPU and so are starting to support high degrees of data parallelism on-die.

Edited 2013-05-18 20:47 UTC
