Linked by MOS6510 on Fri 17th May 2013 22:22 UTC
Hardware, Embedded Systems "It is good for programmers to understand what goes on inside a processor. The CPU is at the heart of our career. What goes on inside the CPU? How long does it take for one instruction to run? What does it mean when a new CPU has a 12-stage pipeline, or 18-stage pipeline, or even a 'deep' 31-stage pipeline? Programs generally treat the CPU as a black box. Instructions go into the box in order, instructions come out of the box in order, and some processing magic happens inside. As a programmer, it is useful to learn what happens inside the box. This is especially true if you will be working on tasks like program optimization. If you don't know what is going on inside the CPU, how can you optimize for it? This article is about what goes on inside the x86 processor's deep pipeline."
Permalink for comment 561957
RE[2]: Comment by Drumhellar
by butters on Sat 18th May 2013 03:03 UTC in reply to "RE: Comment by Drumhellar"

Superscalar means multiple instructions may be issued in a single cycle. The P5 Pentium was the first superscalar x86 chip. It could dispatch and issue one or two instructions per cycle in program order. The P6 (Pentium Pro through Pentium III) could dispatch three instructions per cycle and issue five instructions per cycle out of program order.
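To make the in-order dual-issue idea concrete, here is a toy sketch in Python. It is not a model of any real chip: the function name `cycles_in_order`, the one-cycle latency, and the rule that an instruction cannot issue in the same cycle as an earlier instruction whose result it reads are all simplifying assumptions for illustration.

```python
def cycles_in_order(instrs, width):
    """Count cycles for a toy in-order machine issuing up to `width` per cycle.

    Each instruction is (dest, sources). Assumptions (hypothetical model):
    every instruction has 1-cycle latency, and an instruction must wait for
    the next cycle if it reads a register written earlier in this cycle.
    Write-after-write hazards are ignored.
    """
    cycles = 0
    i = 0
    while i < len(instrs):
        cycles += 1
        written = set()   # registers written by instructions issued this cycle
        issued = 0
        while i < len(instrs) and issued < width:
            dest, srcs = instrs[i]
            if written & set(srcs):
                break     # in-order: a stalled instruction blocks everything behind it
            written.add(dest)
            issued += 1
            i += 1
    return cycles


program = [
    ("a", ("x", "y")),
    ("b", ("x", "z")),   # independent of "a", so it can pair with it
    ("c", ("a", "b")),   # needs both earlier results
    ("d", ("c",)),       # needs "c"
]

print(cycles_in_order(program, 1))  # scalar issue
print(cycles_in_order(program, 2))  # dual issue, program order
```

In this model the dual-issue machine only gains where adjacent instructions are independent, which is why an out-of-order design like P6 can extract more from the same instruction stream.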

Pentium M and Core (1) are a direct evolution of P6 with the same three dispatch ports and five issue ports. Pentium M added micro-op fusion and two additional pipeline stages (12 to 14). Core allowed two cores to share a common L2 cache.

Core 2, besides adding 64-bit GPRs, is wider than P6, with four dispatch ports and six issue ports. Haswell is adding another two issue ports for a total of eight.

As for pipeline depth, the entire industry has converged on 12-15 cycles for CPUs designed to be clocked in the 2-4 GHz range. Apple A6 and Qualcomm Snapdragon have 12-cycle pipelines. Atom is moving from a 14-cycle pipeline to a brand-new 13-cycle pipeline. ARM Cortex-A15 has an eponymous 15-cycle pipeline.

But at clock frequencies below ~1.5 GHz, a shorter pipeline works better. The 7-cycle ARM Cortex-A7 is the best example of a modern core designed to perform well at low clock frequencies.
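A rough back-of-the-envelope sketch of why depth only pays off when it buys clock speed. Everything here is an assumption for illustration: the function name `ns_per_instruction`, a base CPI of 1, a 20% branch fraction, a 5% misprediction rate, and a flush that costs the full pipeline depth. Real cores differ on all of these.

```python
def ns_per_instruction(stages, clock_ghz,
                       branch_fraction=0.20,   # assumed share of branches
                       mispredict_rate=0.05):  # assumed misprediction rate
    """Average time per instruction in nanoseconds under a simple CPI model.

    Assumes a base CPI of 1 and that each mispredicted branch costs
    `stages` cycles to refill the pipeline (a simplification).
    """
    cpi = 1.0 + branch_fraction * mispredict_rate * stages
    return cpi / clock_ghz


# Same low clock, different depths: the extra stages are pure cost.
shallow = ns_per_instruction(stages=7, clock_ghz=1.2)
deep_slow = ns_per_instruction(stages=14, clock_ghz=1.2)

# The deeper pipeline only wins if it actually runs at a higher clock.
deep_fast = ns_per_instruction(stages=14, clock_ghz=3.0)

print(f"7 stages  @ 1.2 GHz: {shallow:.3f} ns/instr")
print(f"14 stages @ 1.2 GHz: {deep_slow:.3f} ns/instr")
print(f"14 stages @ 3.0 GHz: {deep_fast:.3f} ns/instr")
```

Under these assumptions the 7-stage design comes out ahead at 1.2 GHz, and the 14-stage design only pulls ahead once the clock is raised.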

Score: 5