Last year IBM introduced a cut-down POWER4 CPU called the PowerPC 970 and Apple promptly put it into their PowerMac line. IBM are not standing still; the POWER5 is out and rumours have long hinted at a successor to the 970 being in development. What should we expect?
I predicted that PowerPC would take the lead over x86 CPUs in 2004. With the non-appearance of the 3GHz 970, however, this has not been the case, though that’s not to say it’s losing either. Anyone can select benchmarks which show one processor running ahead of the others; the real story is a lot more complex.
According to SPEC benchmarks the Itanium 2 is an absolute killer, outperforming the Opteron quite considerably, especially on floating point. There are some interesting scientific benchmarks from last year which show just how erratic benchmark results can be: in some cases the Itanium 2 shows a clear lead, in others it’s the Opteron clearly in front.
Unfortunately, the 970 isn’t listed on SPEC but it generally seems to do comparatively badly in unofficial SPEC tests. The 970 is listed in the scientific benchmarks but again the story is less than clear. That said, where gcc 3.3 is used the 970 does badly, while where the IBM compiler is used it does well. Compilers can have a big effect on performance so the figures for all the CPUs could have changed considerably by now.
If there is an overall performance leader in the desktop market right now I’d put my money on AMD’s Opteron, though the top 3 desktop processors (Pentium 4, AMD 64, G5) all have relative strengths and weaknesses in different areas, so none of them can be said to have a commanding lead.
The moral of the story is that no one test or set of tests tells the whole story; it’s best to run your own tests based on what you intend to run, or at least look at tests specific to your application field. Independent tests are best of all; vendor-supplied benchmarks should be taken with a pinch of salt (it’s not called “benchmarketing” for nothing!). That said, all the top CPUs are fast, just not all in the same areas.
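As a rough illustration of what “run your own tests” can look like in practice, here is a minimal timing sketch. The two workload functions are hypothetical stand-ins of my own, not taken from any of the benchmarks mentioned; the idea is simply to swap in code that resembles what you actually intend to run and to take the best of several repeats.

```python
import timeit


def sum_of_squares_loop(n):
    """Hypothetical stand-in workload: a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_builtin(n):
    """Same result, computed with a generator and the built-in sum()."""
    return sum(i * i for i in range(n))


def best_time(func, arg, repeats=5, number=10):
    """Return the best per-call time in seconds over several repeats.

    Taking the minimum reduces noise from other processes on the machine.
    """
    timings = timeit.repeat(lambda: func(arg), repeat=repeats, number=number)
    return min(timings) / number


if __name__ == "__main__":
    n = 100_000
    for workload in (sum_of_squares_loop, sum_of_squares_builtin):
        per_call_ms = best_time(workload, n) * 1e3
        print(f"{workload.__name__}: {per_call_ms:.3f} ms per call")
```

Run on two different machines (or with two different compilers or interpreters), the same script gives figures for your workload rather than somebody else’s.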
But could this be about to change? The 970 was based on IBM’s high-end POWER4 CPU. POWER5 has since been released and rumours have long hinted at a 9×0 CPU based on it.
With new enhanced cores and improved memory and cache systems, IBM’s recently released POWER5 has shown itself to be an absolutely stellar performer in a number of different areas, in some cases nearly doubling its performance over the previous POWER4+ model.
The POWER5 is better than the POWER4 in a number of ways. Improved cores now offer simultaneous multithreading (SMT) and, in order to keep the threads moving, the pool of rename registers has been increased.
The POWER5 also has a slightly larger on-die cache with a better design than before, and the same goes for the 36MB external L3 cache. In addition to the improvements in the design of the caches, the L3 cache has been moved so it is now connected behind the L2 cache, reducing its latency considerably. The memory controller has also now been integrated on-die and its bandwidth increased to nearly 20GBytes per second.
A lot of CPUs spend an awful lot of time sitting around waiting for data. A lot of design effort in the POWER5 seems to have been concentrated on getting data to the CPU cores faster and this has paid off as shown by the improved benchmarks.
In addition to these improvements, internal changes mean the cores run at a higher frequency than the previous generation.
All in all, the POWER5 is a very impressive processor, especially given that it is implemented in a previous-generation (130nm) silicon process and IBM use a deliberately conservative design process.
If rumours are to be believed, IBM are working on a desktop version of this CPU. The question is: can IBM translate these sorts of performance gains to the desktop? We can’t expect performance like POWER5, but what can we expect?
There is pretty much zero public data available on the project other than that it will almost certainly be dual core. However, there are general industry trends and other announcements which may shed light on what this processor might look like.
The following is thus speculation…
The POWER5 and Opteron both have on-die memory controllers, a feature which is, in my opinion, a good bet to be included, especially given Intel and Freescale are already planning to add memory controllers to their CPUs. An on-die memory controller alone will have a considerable performance impact as it will reduce memory latency (the delay between when data is asked for and when it arrives).
HyperTransport is another possibility: IBM are a member of the HyperTransport consortium and already use the technology on the existing 970’s northbridge. This would be an important cost-saving measure as it would save IBM from needing to spend millions on developing another northbridge. Using HyperTransport as the link technology means standard PC parts for the Athlon 64 could be used in 9×0 systems.
This would also save other PowerPC manufacturers the pain of sourcing PowerPC specific northbridges which can be a royal pain as the only available parts are generally for the embedded sector and these are often not quite what a desktop manufacturer wants.
The CPU will most likely be made using a 90nm process in a less conservative manner than the POWER5 so it will run faster and cooler. Quite how fast it will be able to run is open to question and I think it’s fairly likely IBM will go the same route as AMD in their dual core plans and not clock the processor as high as possible to keep power consumption within reasonable limits. This will not be easy as POWER5 consumes 160 Watts at 1.8GHz (power consumption isn’t much of an issue at the high end).
One possibility is to use the same technique the POWER5 already uses which is to constantly adjust the clock frequency to keep heat output down. This technique is becoming popular with Transmeta and Intel doing or planning to do the same.
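To make the idea concrete, here is a toy sketch of temperature-driven clock adjustment. It is not how the POWER5 (or any Transmeta or Intel part) actually implements it; the function name, thresholds and step size are all hypothetical values chosen for illustration.

```python
def next_frequency(current_mhz, temp_c, limit_c=85.0, margin_c=5.0,
                   step_mhz=100, min_mhz=1000, max_mhz=2500):
    """Pick the next clock frequency from the current temperature reading.

    All parameters are hypothetical. The rule is deliberately simple:
    - above the limit: step the clock down
    - comfortably below the limit (by more than margin_c): step it up
    - otherwise: hold, so the clock does not oscillate around the limit
    The result is clamped to the [min_mhz, max_mhz] range.
    """
    if temp_c > limit_c:
        target = current_mhz - step_mhz
    elif temp_c < limit_c - margin_c:
        target = current_mhz + step_mhz
    else:
        target = current_mhz
    return max(min_mhz, min(max_mhz, target))


if __name__ == "__main__":
    frequency = 2000
    # Made-up sequence of temperature readings in degrees Celsius.
    for reading in (70.0, 78.0, 83.0, 88.0, 91.0, 84.0, 76.0):
        frequency = next_frequency(frequency, reading)
        print(f"{reading:5.1f} C -> {frequency} MHz")
```

A real chip would do this in hardware or firmware and on a much finer timescale, but the trade-off is the same: give up some clock speed when hot in exchange for staying inside the thermal budget.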
Another possibility would be to use a technique Intel plan to use for the next Itanium, “Montecito”, which includes two peltiers in the heat sink. Peltiers actually consume quite a bit of power themselves, but reducing the CPU temperature reduces transistor leakage; this lowers the power consumed by the CPU itself, allowing boosts in clock frequency which might not otherwise be possible.
Montecito is expected to consume 100 Watts but its heat sink requires a further 75 Watts. The end effect is that overall power consumption does not change (it may even go up) as part of it is moved to the heat sink, but the CPU itself does not get so hot when working. AMD have filed a patent on an on-chip peltier so they’re evidently considering similar technology.
I don’t know if the 9×0 will be so hot as to require such aggressive cooling but things are heading that way. “Power density” is becoming a problem and will seemingly only get worse in the future. Power density is the heat generated in a specific area; as CPUs get ever smaller the heat is generated in a smaller area and thus the unit becomes progressively more difficult to cool. The 970FX used in Apple’s PowerMacs actually uses less power than the previous 970 but liquid cooling was added because of the higher power density.
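The arithmetic behind this is simple enough to sketch. The wattages and die areas below are made-up round numbers, not measured figures for the 970 or 970FX; they only show how a chip can draw less power overall and still be harder to cool.

```python
def power_density(power_watts, die_area_mm2):
    """Power density in Watts per square millimetre of die area."""
    return power_watts / die_area_mm2


if __name__ == "__main__":
    # Hypothetical older chip: more power, but spread over a larger die.
    old = power_density(power_watts=50.0, die_area_mm2=120.0)
    # Hypothetical die-shrunk chip: less power, much smaller die.
    new = power_density(power_watts=45.0, die_area_mm2=65.0)

    print(f"old: {old:.2f} W/mm^2")
    print(f"new: {new:.2f} W/mm^2")
    print(f"density ratio new/old: {new / old:.2f}")
```

With these illustrative numbers the shrunk chip uses 10% less power but packs roughly two thirds more heat into each square millimetre, which is why the cooler has to work harder.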
One long-rumoured feature of the 9×0 is the addition of new vector instructions (read Altivec 2). Altivec is the most powerful feature of the PowerPC line. G4s are pretty modestly clocked by x86 standards to keep power consumption down, but they more than make up for it when Altivec is activated. The original architecture was designed by Keith Diefendorff at Apple and word has it he has returned to the company, so it’s possible a new version is in the works. Whether it will make it into a G6 is open to question, but early information on the POWER6 seems to indicate that processor will include vector processing capabilities.
What an enhanced Altivec would do is another question. The architecture could be extended to support 64 bit floating point operations. Another possibility would be to double the vector width, doubling throughput; additional registers would also increase performance in some areas. However, these are just guesses; the reality could be very different.
It’s almost certain that a POWER5-derived CPU is in development. It’s just a question of when it appears and what its features will be. The important thing is how its performance turns out.
The POWER5 increased in performance over the previous POWER4+ because of a series of enhancements. A dual core PowerPC derivative will have several of the same enhancements even if they are not to quite the same degree.
An on-die memory controller will reduce latency and simultaneously increase bandwidth. Memory bandwidth is linked to latency as memory has to be gathered in chunks and the less you have to wait per chunk the more you can ask for. That’s a simplification, but the result can be seen on the Opteron which gets closer to its theoretical bandwidth than the G5.
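The simplification can be written down as a back-of-the-envelope model: if a CPU can keep a fixed number of memory requests in flight, sustained bandwidth is roughly (requests in flight × chunk size) / latency, capped by the peak the memory bus can deliver. The sketch below uses hypothetical chunk sizes, request counts and latencies; none of them are real Opteron or G5 figures.

```python
def effective_bandwidth_gb_s(chunk_bytes, outstanding_requests, latency_ns,
                             peak_gb_s=None):
    """Estimate sustained memory bandwidth in GB/s.

    One byte per nanosecond equals one GB/s, so the units work out directly.
    peak_gb_s, if given, caps the result at the bus's theoretical maximum.
    """
    bandwidth = outstanding_requests * chunk_bytes / latency_ns
    if peak_gb_s is not None:
        bandwidth = min(bandwidth, peak_gb_s)
    return bandwidth


if __name__ == "__main__":
    # Hypothetical: 128-byte chunks, 8 requests in flight, 12 GB/s peak.
    for latency_ns in (140.0, 100.0, 60.0):
        achieved = effective_bandwidth_gb_s(128, 8, latency_ns, peak_gb_s=12.0)
        print(f"latency {latency_ns:5.1f} ns -> {achieved:5.2f} GB/s of 12.00 peak")
```

In this toy model cutting latency from 140ns to 100ns lifts the achieved figure from about 7.3 to about 10.2 GB/s without touching the bus itself, which is the effect described above: lower latency gets you closer to the theoretical bandwidth.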
Given that CPUs spend most of their time waiting for data, any system which increases the availability of that data is going to increase performance. The on-die memory controller of the Opteron appears to be a major reason for its strong overall performance.
If an on-die memory controller is included on a 9×0 I think it’s safe to assume it too will get a significant performance boost. SMT, larger caches and other core enhancements will also be beneficial and of course a second core will double the potential computing power.
The POWER5 was designed so that any optimisations made for POWER4 will also apply to it. This is a good strategy as it seems to take years to optimise compilers to a specific CPU and means all that work is preserved. This will almost certainly also be the case for the 9×0 so compiler technology will continue to improve.
If the 9×0 is like my speculations, it looks on paper like it could give Intel and even AMD something to worry about. However, as always, we will not know what it is really like until it arrives and applications can be tested on the system.
When this happens is anyone’s guess but I think the next version of OS X could be accompanied by some interesting new hardware. I expect the chip to arrive by summer 2005.
Now if someone were to put a Cell (co)processor beside it we’d have a different ball game…
References and Further reading
My prediction was that PowerPC would take the lead on the desktop in 2004. They have not done this save for some specific areas. However, the fastest PowerPC is not the 970FX, it’s actually the POWER5 which can execute PowerPC binaries…
IEEE also ran a detailed article on the POWER5 (pdf).
POWER5 and next generation Itanium
After POWER5, Intel will come back with the Itanium Montecito; this article at Real World Tech compares the two.
© Nicholas Blachford, October 2004
About the author
Nicholas Blachford lives in Paris. He is currently helping out on the Yoper Linux distro, learning French and Python, and dreaming up a GUI for advanced consumer entertainment systems, but not necessarily all at the same time.