Linked by Thom Holwerda on Fri 29th Dec 2006 21:35 UTC
Judging by details revealed in a chip conference agenda, the clock frequency race isn't over yet. IBM's Power6 processor will be able to exceed 5 gigahertz in a high-performance mode, and the second-generation Cell Broadband Engine processor from IBM, Sony and Toshiba will run at 6GHz, according to the program for the International Solid State Circuits Conference that begins February 11 in San Francisco.
RE[4]: Uh-Oh
by Nicholas Blachford on Sat 30th Dec 2006 18:07 UTC in reply to "RE[3]: Uh-Oh"
Nicholas Blachford Member since: 2005-07-06

At the very bottom of the ladder, you have microcontrollers, whose microarchitectures are almost completely driven by their instruction set. Interestingly, most of these are CISC chips, because of the code-density advantages of CISC versus RISC.

Low-end microcontrollers are liable to be 8- or even 4-bit CPUs, which pre-date the RISC / CISC debate and as such don't really count as either.

If you move up to 32-bit embedded controllers, CISC-based processors are pretty much nowhere to be seen; the market is dominated by ARM, which is a very RISC ISA.

but essentially, Core 2 is suitable for workstations because Intel designed the microarchitecture for that role, while Power6 is suitable for servers because IBM designed the microarchitecture for that role, while Cell is suitable for consoles because IBM designed the microarchitecture for that role. All of these chips could've been designed with a different ISA without dramatically changing their performance characteristics.

That is probably true for Core2, probably not in the case of POWER6, and definitely not in the case of Cell. POWER6 is rumoured to be quite a bit simpler than POWER5 for out-of-order execution; this is more likely to hurt x86 performance than PowerPC.

In the case of Cell, its power comes from the SPEs. These are very much RISC designs and are highly dependent on their ISA; making them decode and execute something like x86 code would just plain hurt.

I see your point, but it's only true for highly complex CPUs; they do do a very good job of hiding the "internal" ISA. Better examples would be POWER5, Core2 and Opteron.

It's a huge honking server chip designed for huge honking servers. Apple doesn't sell huge honking servers, what it sells are laptop and desktop machines that need high performance with lower power dissipation and with very cheap supporting infrastructure. Intel's Core provides that, in a way no Power6 derivative is going to.

They had to build a whole new processor when they made the 970. POWER6 is designed to be scalable, so building a cut-down, cooler version is pretty much a case of putting the same chip in a smaller box.

POWER7 is even more aggressive in that direction - a version of it will fit into an Opteron socket.

Score: 3

RE[5]: Uh-Oh
by rayiner on Sat 30th Dec 2006 18:55 in reply to "RE[4]: Uh-Oh"
rayiner Member since: 2005-07-06

Low-end microcontrollers are liable to be 8- or even 4-bit CPUs, which pre-date the RISC / CISC debate and as such don't really count as either.

By your logic, all the CISC architectures that precipitated the RISC design don't count as either because they pre-date the RISC / CISC debate! Classic 8-bit microcontrollers like the Zilog Z80 and the Motorola 6800 are most definitely CISC chips. The only one I can think of that doesn't really count as either is the PIC, and then only because it's in some ways RISC-y (single-cycle fixed-length instructions), and in some ways CISC-y (accumulator-based register model with memory operands).

If you move up to 32-bit embedded controllers, CISC-based processors are pretty much nowhere to be seen; the market is dominated by ARM, which is a very RISC ISA.

Those aren't really "micro" controllers as such --- I lumped them into my embedded category.

That is probably true for Core2, probably not in the case of POWER6, and definitely not in the case of Cell.

I wasn't implying that you could do an x86 Cell properly. You really wouldn't, because you really don't want to do an in-order x86. Of course, for the PPE's role in the chip, I don't think an in-order anything is a particularly good idea.

POWER6 is rumoured to be quite a bit simpler than POWER5 for out-of-order execution; this is more likely to hurt x86 performance than PowerPC.

It is doubtful that POWER6 has a simpler OOO core than, say, the Pentium Pro. IBM is claiming 2x the performance of POWER5, and at 4-5 GHz, POWER6 will have to retain comparable IPC to POWER5 to meet that goal. That level of OOO is likely enough to make up for any deficiencies of x86. Sure, you'll have to make the pipeline a couple of stages longer to decode x86 efficiently, but that's not going to change your performance drastically.
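
To put rough numbers on the IPC point: if performance is modelled simply as IPC times clock frequency, the IPC POWER6 needs relative to POWER5 follows from the claimed speedup and the clock ratio. A minimal Python sketch; the 2.2 GHz POWER5 clock is an assumed figure for illustration, not one given in this thread, and `required_ipc_ratio` is a made-up helper name.

```python
def required_ipc_ratio(speedup, new_clock_ghz, old_clock_ghz):
    """IPC the new chip needs, as a multiple of the old chip's IPC.

    Uses the simple model: performance = IPC * clock frequency.
    """
    clock_ratio = new_clock_ghz / old_clock_ghz
    return speedup / clock_ratio


POWER5_CLOCK_GHZ = 2.2  # assumption for illustration, not from the thread

for power6_clock_ghz in (4.0, 5.0):
    ratio = required_ipc_ratio(2.0, power6_clock_ghz, POWER5_CLOCK_GHZ)
    print(f"{power6_clock_ghz:.1f} GHz: needs {ratio:.2f}x POWER5's IPC")
```

Under that assumed baseline, the required IPC comes out at about 1.10x POWER5's at 4 GHz and about 0.88x at 5 GHz, which is what "comparable IPC" amounts to here.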

In the case of Cell, its power comes from the SPEs. These are very much RISC designs and are highly dependent on their ISA; making them decode and execute something like x86 code would just plain hurt.

Yep. Entertainingly, PowerPC is apparently not RISC enough for the SPEs (which is another reason why this RISC versus CISC thing is so silly to talk about).

They had to build a whole new processor when they made the 970. POWER6 is designed to be scalable, so building a cut-down, cooler version is pretty much a case of putting the same chip in a smaller box.

Every engineering design is a point in the design space. That point is decided via numerous trade-offs which are made to achieve a particular final result based on particular given specifications. You can't move a design to a radically different point and still expect it to perform as well as another design that's targeted for that specific point.

POWER6 has a specific design point: 100W+ TDP, 32MB+ external L3, 75GB/sec memory bus. It is designed to that specification. The circuits are designed for high clock speed, not low power consumption. The large L3 cache puts a lower burden on the OOO core to cover memory latency, allowing it to be simpler. The huge memory bandwidth influences the design of the prefetch algorithms. Core 2 is designed to a different point: 35W TDP, no external cache, 10GB/sec memory bus. Its circuits are designed for low power consumption over ultimate clock speed, it has a deeper OOO core to cover memory latency, and it has to be more judicious about its prefetching.

POWER6 is simply not going to scale down to Core 2's design point while performing competitively with Core 2. That's just not how things work. What Apple rightly realized was that the design point they needed was precisely the one Intel was targeting with their processors. They could get a chip that was actually designed for the tasks they needed, instead of having to use a chip that was drastically scaled up or down to fit their market, with sub-optimal results.

Edited 2006-12-30 18:58

Score: 3

RE[6]: Uh-Oh
by andrewg on Sat 30th Dec 2006 19:25 in reply to "RE[5]: Uh-Oh"
andrewg Member since: 2005-07-06

Hi Rayiner. Not going to try and contradict you but I read through the link (http://realworldtech.com/page.cfm?ArticleID=RWT101606194731) posted by Dubhthach which was very good. It went over a presentation on Power6 by IBM.

Some interesting points:

1. The Power6 does seem to have been designed to allow a lot of configurability.

"From the start, IBM has designed the POWER6 systems to be extremely configurable. The intra-node busses, which normally operate on 8 bytes/cycle can be chopped down to 2 bytes/cycle for low-end systems, and the inter-node busses can also operate at 4 bytes/cycle. Similarly, the two integrated memory controllers can both operate at half-width, and one of them can be removed entirely. The external L3 caches are optional, and are available either in the MCM, or in an external configuration. ..."

Now I realise that does not mean it would scale to a consumer level notebook but it is interesting.
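
As a rough illustration of how far those options could shrink the memory interface, the sketch below multiplies out the controller count and width settings from the quoted passage. It assumes bandwidth scales linearly with both, that the half-width and single-controller options can be combined, and that the 75GB/sec figure mentioned earlier in the thread applies to the full two-controller configuration. The function and constant names are made up for the example.

```python
FULL_BANDWIDTH_GB_S = 75.0  # figure quoted earlier in the thread, assumed to be the full config


def scaled_bandwidth(controllers, width_fraction, full=FULL_BANDWIDTH_GB_S):
    """Estimate memory bandwidth, assuming it scales linearly.

    The full configuration is 2 controllers at full width (fraction 1.0).
    """
    return full * (controllers / 2) * width_fraction


configs = {
    "2 controllers, full width": (2, 1.0),
    "2 controllers, half width": (2, 0.5),
    "1 controller, full width": (1, 1.0),
    "1 controller, half width": (1, 0.5),  # assumed combination of both options
}

for name, (controllers, width) in configs.items():
    print(f"{name}: {scaled_bandwidth(controllers, width):.2f} GB/sec")
```

Under these assumptions the smallest configuration works out to 18.75 GB/sec. That only describes bandwidth; it says nothing about whether power draw would scale down with it.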

2. It appears that Power6 will be better at out-of-order execution thanks to a change in the configuration of pipeline stages:

"The basic pipeline for the POWER6 is the same number of stages as the POWER5, but they have been rebalanced across the different phases. Most significantly, dependent ALU operations now can execute back to back, eliminating a vexing kludge in the original POWER4/5 architecture. This makes the out-of-order scheduling easier, and is probably the reason that the instruction issue/dispatch phase uses 2 cycles in the POWER6 (compared to 4 in the POWER5)."

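The effect of back-to-back execution on a chain of dependent operations can be sketched with a toy cycle count. The one-cycle bubble used for the older design here is an assumption chosen for illustration; the quoted article does not give the figure, and `chain_cycles` is a made-up helper.

```python
def chain_cycles(num_ops, issue_gap):
    """Cycles to finish a chain where each op depends on the previous one.

    issue_gap is the number of cycles between the start of one op and the
    earliest start of the op that consumes its result (1 = back to back).
    Each op is assumed to take one cycle to execute.
    """
    if num_ops == 0:
        return 0
    return (num_ops - 1) * issue_gap + 1


ops = 10
with_bubble = chain_cycles(ops, issue_gap=2)   # assumed one-cycle bubble between dependents
back_to_back = chain_cycles(ops, issue_gap=1)  # dependent ops start every cycle
print(with_bubble, back_to_back)
```

In this toy model a chain of 10 dependent operations takes 19 cycles with the bubble and 10 without, so on dependent code the scheduler has far less idle time to fill with other instructions.
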
Score: 1