The Last Macintosh PowerPC chip (G5) was based on the Power4 CPU.
Since then we have had:
Power4+
Power5 (+?)
and now the Power6
It makes you wonder what sort of desktop chip this could have been if the G5 equivalent were made from the Power6 base.
It does make you wonder.
edit: yep typos...
Darren
Edited 2007-05-22 00:49
Power is not PPC!
Actually they are the one and the same and have been for many years.
PowerPC started as a modified version of the first POWER chips, but within a couple of generations the two recombined. These days they are all branded as "Power Architecture" and all use the PowerPC ISA.
The G5 and the Power5 didn't have much in common.
They were both modified versions of POWER4, so they actually had rather a lot in common. They can, for instance, run each other's binaries, and so can Cell.
As for POWER6 and Apple, if Apple were still using PPC they'd most likely have announced a POWER6-based machine. Unlike the POWER4, which had to be modified to run Altivec for Apple, the POWER6 was designed from the beginning to run Altivec. It'll also scale down (less I/O etc.), so Apple wouldn't even need a modified version.
I doubt Apple would have any difficulty getting OS X to boot on one of these machines.
--
As for laptops: no, POWER6 won't go in a laptop, but it could probably be modified to do so using the same techniques Intel and AMD use.
However, that may not be necessary given the number of PPC chips now appearing on the market with laptop-grade power consumption. Indeed, it was lost in all the noise, but yesterday AMCC announced a 2 GHz part which runs at 2.5 watts.
Barcelona is 281 mm^2, but it's also a server processor. You're not going to see something that big in an iMac-class machine anytime soon.
I'm gonna break my own suggestion and mention Apple here. Notice how they're shipping 8-core (!) 3.0 GHz PowerMacs starting at $4000. They've gone from topping out at 2 cores with the 970MP to topping out at 8 cores with the Xeon, in the space of less than a year. Can you possibly imagine POWER competing with that kind of price/performance? How much is an 8-core POWER6 machine gonna cost, even stripped down to target Apple's decidedly low-end market?
Edited 2007-05-22 17:17
The Octo Mac comparison is potentially misleading.
The 4.7 GHz Power6 has a SPECint Peak of 21.5, and a SPECint Base of 17.8
The 3.0 GHz Xeon 5160 has a SPECint Peak of 18.1, and a SPECint Base of 17.5
Now, the first thing to keep in mind is that the Peak figures are taken with PGO. PGO is great for SPEC, but almost nobody uses it for anything else. The second thing to keep in mind is that the numbers are taken with XLC and Intel C++, neither of which is relevant to 99% of code out there. The standard OS X compiler is GCC, and GCC is much stronger on x86 than on PPC.
Given all that, I'd be very surprised if GCC-compiled code on the Power6 achieved even parity with GCC-compiled code on the Xeon.
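For anyone who wants to check the ratios behind those SPECint figures, here is a small sketch. It uses only the four numbers quoted above; the dictionary layout and function names are made up for illustration, and per-GHz scores are only a rough way to compare very different designs.

```python
# SPECint figures exactly as quoted in the comment above.
scores = {
    "POWER6 4.7 GHz": {"ghz": 4.7, "peak": 21.5, "base": 17.8},
    "Xeon 5160 3.0 GHz": {"ghz": 3.0, "peak": 18.1, "base": 17.5},
}

def peak_over_base(entry):
    """Fractional gain of the Peak score over the Base score."""
    return entry["peak"] / entry["base"] - 1.0

def base_per_ghz(entry):
    """Base score divided by clock speed (a crude per-clock measure)."""
    return entry["base"] / entry["ghz"]

for name, entry in scores.items():
    print(f"{name}: peak over base {peak_over_base(entry):.1%}, "
          f"base per GHz {base_per_ghz(entry):.2f}")
```

The point of the comparison is that the two Base scores are nearly identical, so most of the POWER6 lead in the Peak column comes from whatever separates Peak from Base.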
The manufacturing cost of a chip goes up exponentially with die-size. The sweet-spot for a high-volume mainstream chip is considered to be in the ~120 mm^2 range.
Of course, the Power6 isn't a high-volume mainstream chip. The cost equation for chips like the Power6 (or Itanium or PA-RISC) is different. For such chips, the low-volumes mean that each unit carries a relatively large portion of the fixed cost of developing the design. Thus, the sweet spot for the per-unit die size is at a much higher point.
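The die-size argument can be made concrete with a rough model. The sketch below combines a common dies-per-wafer approximation with a simple Poisson yield model; the 300 mm wafer, the defect density of 0.5 per cm^2 and the unit wafer cost are all assumptions picked for illustration, not figures from IBM, Intel or AMD.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate gross dies per wafer: wafer area over die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return gross - edge_loss

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)

def relative_cost_per_good_die(die_area_mm2, defects_per_cm2=0.5, wafer_cost=1.0):
    """Wafer cost divided by the expected number of good dies (assumed inputs)."""
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good_dies

baseline = relative_cost_per_good_die(120.0)
for area in (120.0, 281.0, 341.0):  # die sizes mentioned in this thread
    ratio = relative_cost_per_good_die(area) / baseline
    print(f"{area:.0f} mm^2: {ratio:.1f}x the cost of a 120 mm^2 die")
```

Under these assumptions the cost per good die grows much faster than the area itself, because a bigger die both fits fewer times on the wafer and is more likely to catch a defect. Fixed design costs, which dominate for low-volume chips, are deliberately left out.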
I know. Nicholas Blachford contended that the core could be used unchanged in the (presumably high-volume, low-margin) Mac market.
I wasn't talking about Mac Minis!
Apple has some of the highest margins in the entire industry, they have certainly never been considered as a low cost player.
Anyway, have you seen how much silicon there is in those quad core machines?
Anyway, quad core POWER5+ CPUs are already used unchanged in low end IBM machines which don't cost much more than Apple boxes.
Yes, Apple is high-margin, but "high-margin for personal computers". Power6 is for a market whose "low end" is way above the PowerMac level. And 341 mm^2 is pushing it, even for a PowerMac.
As for "don't cost much more than Apple boxes". The lowest-end quad-core Power5+ machine using the slowest-available Power5+ CPUs is $5500. The corresponding quad-core PowerMac using the slowest-available Xeons (same 4GB of RAM) costs $2900. The cheapest eight-core Power5+ machine using the slowest-available Power5+ CPUs is $18000. The cheapest eight-core PowerMac using the fastest available Xeon processors is $4000. Absolutely no contest, not in a market that doesn't need the POWER machine's extensive RAS features.
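The price gap is easier to see per core. This sketch only divides the list prices quoted above by the core counts; the tuple layout and labels are illustrative, and it ignores RAM, RAS features and everything else that differs between the machines.

```python
# (label, cores, price in USD) as quoted in the comment above.
machines = [
    ("Power5+ quad-core", 4, 5500),
    ("Xeon quad-core", 4, 2900),
    ("Power5+ eight-core", 8, 18000),
    ("Xeon eight-core", 8, 4000),
]

def price_per_core(cores, price):
    """List price divided by the number of cores."""
    return price / cores

for label, cores, price in machines:
    print(f"{label}: ${price_per_core(cores, price):.0f} per core")
```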
It should also be noted that Power5+ is substantially smaller than Power6, and is also a chip at the end of its lifecycle (so I'm not surprised to see good deals from IBM). If we see quad-core Power6 machines much under $8000, I'll be very surprised.
There is also the question of what the heck to do with iMacs and MacBooks, because as far as Apple is concerned, those machines are way more important than the PowerMac. On the Intel side, these machines have chips in the same league (just a lower clockspeed or fewer cores) as the ones in the PowerMac. On the PPC side, what do you use? PA6T? A chip that has performance comparable to a Core 2 at 1 GHz?
Edited 2007-05-23 02:42
Power is basically PowerPC these days. The G5 was based on the Power4 microarchitecture, as was the Power5. So the G5 and Power5 have a lot in common. The main difference between them is that the G5 has a VMX unit, while the Power5 had some optimizations to the basic Power4 microarchitecture (deeper buffers, tweaks to the grouping mechanism, etc).
The PPC970 that Apple uses/used was based on the Power4; that is, it was a stripped-down, one-core relative. (Later it was modified and fitted into the Xenon CPU (Xbox 360) and the Cell CPU (PS3, blades and friends), and used somewhat unmodified in the Wii/GameCube.)
The Power has been made exclusively for server-use, even though some RS/6000 desktops might be Power6-powered with time.
No, NO, NO!
The PowerPC cores in the 360's Xenon, the PS3's Cell, and the Wii's Broadway processor have *NOTHING* to do with Power4 or PPC970. The single resemblance the Xenon/Cell cores share with the 970 is that a subset of their complete instruction set is the 64-bit PPC instruction set. That is where the similarity ends. The Wii's Broadway processor isn't even 64-bit!
The PPC cores in the Xenon are functionally identical to the PPC core in the Cell. The only differences are in the cache-control mechanism and in the communication mechanism -- the Cell uses its "XO" communication fabric for off-chip communication and a ring topology to communicate with the SPE vector units, while the 360 uses Hypertransport (or something similar).
These cores are *not* derived from any commercially available PPC product line, either server or desktop. These chips come from an experimental architecture IBM developed to push the limits of the PPC architecture on a small, low-power die. While the PPC970 and similar Power designs have out-of-order execution, these embedded PPC cores do not. These cores also implement their dual-threaded execution in a novel way: in addition to a standard alternating scheduler, when one thread stalls (say, on a memory access) the other thread will execute.
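The threading scheme described above (alternate between two threads, but let the other thread run while one is stalled) can be illustrated with a toy simulation. This is only a sketch of the idea, not the actual Xenon/Cell issue logic; the function name, the one-instruction-per-cycle model and the way stalls are encoded are all assumptions.

```python
def simulate_issue(threads):
    """Simulate two hardware threads sharing one issue slot per cycle.

    Each thread is a list of instructions; each entry is the number of extra
    cycles the thread is blocked after issuing it (0 = no stall).
    Returns a string with one character per cycle: '0' or '1' for the thread
    that issued, '-' if neither could.
    """
    pc = [0, 0]          # next instruction index per thread
    ready_at = [0, 0]    # first cycle at which each thread may issue again
    cycle = 0
    turn = 0             # thread preferred by the alternating scheduler
    trace = []
    while pc[0] < len(threads[0]) or pc[1] < len(threads[1]):
        issued = None
        # Try the preferred thread first, then fall back to the other one.
        for t in (turn, 1 - turn):
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                issued = t
                break
        if issued is None:
            trace.append("-")
        else:
            stall = threads[issued][pc[issued]]
            pc[issued] += 1
            ready_at[issued] = cycle + 1 + stall
            trace.append(str(issued))
        cycle += 1
        turn = 1 - turn
    return "".join(trace)

# Thread 0 hits a 3-cycle stall on its second instruction; thread 1 fills the gap.
print(simulate_issue([[0, 3, 0], [0, 0, 0, 0]]))
```

With a strictly alternating scheduler the slots belonging to the stalled thread would simply go idle; here the other thread uses them.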
The Wii's Broadway processor is based on the GameCube's Gekko processor, which in turn is based on the G3 PowerPC processor that was found in the early iMacs. Nintendo had IBM add some SIMD instructions for the Cube which overlap the FPU execution unit -- basically they added instructions to process a pair of 32-bit floats using the silicon from the 64-bit FPU. The Broadway processor's re-spun silicon simply runs faster, adds aggressive power-saving features and more fine-grained cache control, uses a smaller process, and likely has some minor silicon tweaks. IBM offers the exact same PPC core for the embedded market as a cheap, powerful, extremely low-power embedded CPU.
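The "pair of 32-bit floats in a 64-bit FPU register" idea can be sketched in a few lines. This only illustrates the data layout and a lane-wise add; the function names are hypothetical and it makes no claim about the real Gekko instruction encodings or rounding behaviour.

```python
import struct

def pack_pair(a, b):
    """Pack two 32-bit floats into one 64-bit integer (a in the upper half)."""
    return int.from_bytes(struct.pack(">ff", a, b), "big")

def unpack_pair(reg):
    """Split a 64-bit register value back into its two 32-bit floats."""
    return struct.unpack(">ff", reg.to_bytes(8, "big"))

def paired_add(x, y):
    """Add two packed registers lane by lane, as one SIMD-style operation."""
    a0, a1 = unpack_pair(x)
    b0, b1 = unpack_pair(y)
    return pack_pair(a0 + b0, a1 + b1)

result = paired_add(pack_pair(1.5, 2.0), pack_pair(0.25, 4.0))
print(unpack_pair(result))
```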
That looks like one powerful processor.
4.7 GHz top frequency and dual core. And it was able to almost double the benchmarks from the previous Power5 chip. Cool.
Looks like it is geared for the server market right now.
What OS will work on it? I'm thinking they'll use Linux or BSD. Any other ones?
Edit: I'll also add that the chips are 64-bit, have 8 MB of cache per chip, and use the same amount of energy / electricity as the Power5.
http://www-03.ibm.com/press/us/en/pressrelease/21580.wss
And I believe max speed is 4.7 GHz per core.
Edited 2007-05-22 01:28
"""
It's a real shame that the Power line of chips are only suited for servers. Average desktop users will never benefit from all the innovations that IBM or Sun invent.
"""
Yeah. I'd love to have one of those in my laptop.
As with so many other things, perhaps bundling is the answer:
http://www.autozone.com/selectedZip,73112/initialAction,partProduct...
At 160 watts per die (as Rayiner mentions) my single die Presario would run for 7 hours on the Power6/27-DLG bundle. (Assuming about 1200 watt-hours for the battery)
I'm joking, of course, ;-)
Edited 2007-05-22 01:53
Oh, major game consoles are using "POWER line of chips", in case you don't know.
I am aware of the game consoles, but a Wii is not my laptop. Yes, average users do benefit from Intel and AMD as their server technology makes its way down to consumer chips. Specific hardware implementations from Sun/IBM can't transition to desktop/laptop chips because they don't make any. Intel/AMD can copy their ideas and pass them along to me, but this is an inefficient process.
Edited 2007-05-22 02:31
I still regret that Apple had to switch to Intel X86. I just like the idea of Apple computers running off of PowerPC chips more, if only to be different. Too bad IBM was not at least a little more accommodating with its PowerPC offerings.
Imagine Steve Jobs at WWDC07:
"Oh, and one more thing. Introducing the new 4.7 GHz, dual-core, PowerPC Power Mac. The new world's most powerful personal super-computer!"
Now, that one would be worth attending!
http://www.cminusgames.com
FWIW, consumers do benefit from IBM's chip innovations. IBM is partnered with AMD and is pretty much the only firm seriously competing with Intel on process technology.
<rant>
IBM is in the rare breed of good old tech companies like HP that put significant money into R&D and bring us shiny new toys. Contrast this to Dell, and it's no wonder they are in trouble.
</rant>
True, but HP made one stupid decision with Itanium - personally, they would have been better off adopting SPARC ISA and plonking it on a superior micro architecture.
Intel volume manufacturing and cash combined with HP innovation, using an open-standard microprocessor ISA like SPARC, would have put them in a good place to compete against POWER.
That being said, features are being pushed down into consumer processors; MMIO, for example, is going to be added in future x86 processors. IMHO the mainstream needs to look at the features that have long existed within the RISC world and pull them into the x86 world, which would improve the reliability and stability of consumer-level processors.
Edited 2007-05-22 04:03
HP bet big parts of the future of the company on Itanic and lost big time.
They inherited the Alpha with the Compaq Purchase and killed it off.
IMHO, they should have seen the writing on the Wall and ditched the Itanic in favour of the Alpha Architecture but I suspect the contract with Intel was a big hurdle in doing this.
HP is as bad as Microsoft with the FUD.
When Alpha was launched HP put out a spoiler ad campaign basically saying
"Who Needs 64Bit? Not you"
Back on Topic.
The Power 6 architecture is so far removed from the majority of CPUs that Intel is turning out as to make most comparisons very difficult.
I'll personally applaud Intel when they ditch their current x86 arch, and especially the way they do memory accesses. The AMD way is far superior.
Or amd64
I've been writing an assembler for amd64 for the past two weeks, and from what I've seen so far, I like it a whole lot more than PPC or SPARC.
Comparison of x86-64 instructions, and their equivalent in PPC.
add r10, r12 is 3 bytes and 1 issue slot on x86, 4 bytes and one issue slot on PPC.
add r10, [r12] is 3 bytes and 1 issue slot on x86, 8 bytes and two issue slots on PPC.
add r10, [r12 + r11*8 + 24] is 5 bytes and 1 issue slot on x86, 12 bytes and three issue slots on PPC.
push r10 is 2 bytes and 1 issue slot on x86, 8 bytes and two issue slots on PPC.
mov r10,0x123456789ABCDEF is 10 bytes and 1 issue slot on x86, 20 bytes and five issue slots on PPC.
All of these are 1 byte shorter for x86 if using the lower 8 GPRs and 32-bit operations.
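Adding up the list above gives a feel for the overall difference. The sketch below just totals the byte counts and issue slots as quoted in the comment, without re-deriving any encodings; the tuple layout and variable names are illustrative.

```python
# (instruction, x86-64 bytes, x86-64 issue slots, PPC bytes, PPC issue slots),
# figures taken as quoted in the comment above.
rows = [
    ("add r10, r12",                3, 1,  4, 1),
    ("add r10, [r12]",              3, 1,  8, 2),
    ("add r10, [r12 + r11*8 + 24]", 5, 1, 12, 3),
    ("push r10",                    2, 1,  8, 2),
    ("mov r10, 0x123456789ABCDEF", 10, 1, 20, 5),
]

x86_bytes = sum(row[1] for row in rows)
x86_slots = sum(row[2] for row in rows)
ppc_bytes = sum(row[3] for row in rows)
ppc_slots = sum(row[4] for row in rows)

print(f"x86-64: {x86_bytes} bytes, {x86_slots} issue slots")
print(f"PPC:    {ppc_bytes} bytes, {ppc_slots} issue slots")
print(f"PPC/x86-64 size ratio: {ppc_bytes / x86_bytes:.2f}")
```

These five instructions were picked to favour x86-64, so the ratio is an upper-end illustration rather than a measurement of real programs.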
Pretty neat for an architecture that supposedly sucks so bad...
First you're cheating with x86-64: x86 sucks bad, x86-64 is only not very good.
Comparing the byte length of instructions is only one metric, CISC's variable length encoding makes it more difficult to decode, which means that for a similar amount of money, a CPU maker would develop a RISC CPU with better performance than a CISC CPU.
Of course Intel have more money to spend developing x86 CPUs than most other CPU makers..
That said, it's also possible to make "RISC" CPUs with good average instruction length. ARM Thumb2, for example, provides 16- and 32-bit instructions, which is a good compromise: easier to decode than byte-length instructions, but with a 'good enough' instruction density, comparable to x86.
CISC's variable length encoding makes it more difficult to decode
Intel engineers can do it with closed eyes.
CISC instructions, usually do more work/time.
Fewer instructions to issue, and lower memory/cache bandwidth and execution resource requirements.
The advantage of RISC is simple hardware with nice performance, while a CISC design of comparable simplicity will have much worse performance. Theoretically, RISC vendors can provide more execution units for the same transistor count because of the simpler hardware.
But x86 camp can outdo any (performance) RISC vendor in all of price/performance/power ratings.
Let's compare K8 vs PPC970/G5.
3 IU 3 FPU vs 2 IU 2 FPU
AMD has fast, low-latency units while the G5 has shamefully high-latency simple units.
The G5's OOO is weak and suffers from a lot of stalls.
The G5 is slower and eats ~2 times more power.
Personally I don't like PPC as an ISA, but it has some nice big-iron functionality like proper virtualization, etc.
Now we have Power6, where the best performance lives alongside insane price and power consumption.
Edited 2007-05-23 11:35
“But x86 camp can outdo any (performance) RISC vendor in all of price/performance/power ratings.“
No you can’t.
The sole reason you can compare the G5 to x86 is the abundance of money and resources in the x86 world. The great performance levels of x86 chips are built on that pillar. Once you remove that pillar, that statement collapses.
“CISC instructions, usually do more work/time.“
In theory yes, I would agree to that statement. In practice no. I couldn’t agree less.
Chips like SuperH, ARM Thumb or MIPS-16 can and do have higher instruction density, exactly because of smart rethinking of their RISC roots.
But even if you take more common RISC designs, they will have an instruction rate similar to x86's, since present-day RISC chips are as rich as x86 in features and instructions, whatever metric you pick.
But truthfully speaking, it doesn’t matter any more. X86 is king of hill. One day everything else will be based around it.
SuperH, ARM Thumb or MIPS-16
These are not high-performance parts.
I like ARM, and I actually have some ARM7 asm coding experience.
But even if you take more common RISC designs
ARM is the most common RISC design =)
There are only 1.5 "big" RISCs left - Power and SPARC.
X86 is king of hill. One day everything else will be based around it.
Not everything, but to all appearances (unfortunately) high-performance ARMs will be kicked out of complex gadgets in a matter of several years.
In theory yes, I would agree to that statement. In practice no. I couldn’t agree less.
In theory and in practice. Both modern lines of x86 chips use the CISC-y nature of x86 code to increase dispatch and execute bandwidth. In theory, Core 2 is a 4-issue design, but in practice the decode/issue bandwidth can be quite a bit higher, because it can issue a load+op as a single instruction. K8 is a 3-issue design, but with the right instruction mix can behave as up to a 6-issue design, because the basic unit of issue is a load+op macro-op.
amd64's variable-length immediates and complex addressing modes also reduce code size and save issue/execute bandwidth. Loading a 64-bit constant can be done with a single micro-op on K8 and Core 2, but is a sequence of 5 instructions in PPC64. x86 can do a load with an index and displacement in a single AGU micro-op, while it takes 2 instructions on PPC. And RIP-relative addressing is just a plain good idea.
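To see where the five instructions come from, here is a sketch that emulates the usual PPC64 way of building a 64-bit constant 16 bits at a time (lis, ori, a 32-bit left shift, oris, ori). The Python helpers are illustrative stand-ins for those instructions, and the assumption is that the compiler takes no shortcuts for "easy" constants.

```python
MASK64 = (1 << 64) - 1

def lis(imm16):
    """Load a 16-bit immediate shifted left by 16, sign-extended to 64 bits."""
    value = (imm16 & 0xFFFF) << 16
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def load_const64(value):
    """Build a 64-bit constant in five steps; returns (register, step count)."""
    steps = 0
    r = lis(value >> 48)                          # lis   r, bits 63..48
    steps += 1
    r |= (value >> 32) & 0xFFFF                   # ori   r, r, bits 47..32
    steps += 1
    r = (r << 32) & MASK64                        # sldi  r, r, 32
    steps += 1
    r |= ((value >> 16) & 0xFFFF) << 16           # oris  r, r, bits 31..16
    steps += 1
    r |= value & 0xFFFF                           # ori   r, r, bits 15..0
    steps += 1
    return r, steps

reg, count = load_const64(0x0123456789ABCDEF)
print(hex(reg), count)
```

Each fixed-width instruction can only carry a 16-bit immediate, so a full 64-bit value needs four immediates plus a shift, whereas x86-64 can embed all eight bytes in one mov.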
First you're cheating with x86-64: x86 sucks bad, x86-64 is only not very good.
Cheating how? I didn't say anything about x86, just x86-64. And x86-64 is very much in the same spirit as x86.
Comparing the byte length of instructions is only one metric, CISC's variable length encoding makes it more difficult to decode
And who says ease of decode is the most important metric? Maybe it was once, when CPUs were much simpler beasts, but now?
which means that for a similar amount of money, a CPU maker would develop a RISC CPU with better performance than a CISC CPU.
I'm skeptical. x86-64 code can be half the size of PowerPC code. Saving a few pipeline stages in the decode step is unlikely to offset the cost of effectively halving the size of the instruction cache.
>Cheating how? I didn't say anything about x86, just x86-64. And x86-64 is very much in the same spirit as x86.
It was a tongue-in-cheek comment, but not far from the truth: going from 8 to 16 registers in x86-64 can bring up to a 20% performance improvement, which is *huge*.
>And who says ease of decode is the most important metric?
I didn't say that it was the most important metric..
If you're a CPU maker with little cash, it's quite important, if you're Intel not so much.
> halving the size of the instruction cache.
Depends on which cache and CPU..
If memory serves, on the P4 Intel stored the *decoded* instructions in a 'trace cache', so there is no size gain in that cache. Of course, there is still a size gain in the other caches and a gain in bandwidth usage.
> x86-64 code can be half the size of PowerPC code.
But the PowerPC is not the only RISC ISA! As I've said an ARM Thumb2 ISA is a 16/32bit RISC-like ISA nearly as easy to decode as a 32bit only ISA but with the instruction density more similar to x86-64.
>Saving a few pipeline stages in the decode step is unlikely to offset the cost of effectively halving the size of the instruction cache.
Frankly this is quite difficult to say, at this level it's all a matter of compromise..
When I logged on and saw 25 comments, I thought they were from people who had some hands-on experience with the chip, the early testers, ya know? Should have known better.
Anyways, we have a few of the test machines at one of our larger data centers, too bad not at the one I work at *sigh* ( I work at a legacy one)
The POWER6 should be the hands-down performance leader in the market. In fact, the POWER5 and POWER5+ are still kicking everyone's ass even as the POWER6 is being announced. The POWER5 chips beat out an HP Superdome in a TPC benchmark with half the number of processors.
What are Sun and HP doing to compete?