Last year IBM introduced a cut-down POWER4 CPU called the PowerPC 970 and Apple promptly put it into their PowerMac line. IBM are not standing still; the POWER5 is out and rumours have long hinted at a successor to the 970 being in development. What should we expect?
I predicted [1] that PowerPC would take the lead over x86 CPUs in 2004. With the non-appearance of the 3GHz 970, however, this has not happened, though that’s not to say it’s losing either [2]. Then again, anyone can select benchmarks that show any processor running ahead of the others; the real story is a lot more complex.
According to SPEC benchmarks, the Itanium 2 is an absolute killer, outperforming the Opteron quite considerably, especially on floating point [3]. There are some interesting scientific benchmarks [4] from last year which show just how erratic benchmark results can be: in some cases the Itanium 2 shows a clear lead, in others the Opteron is clearly in front.
Unfortunately, the 970 isn’t listed on SPEC, but it generally seems to do comparatively badly in unofficial SPEC tests. The 970 is listed in the scientific benchmarks, but again the story is less than clear. That said, where gcc 3.3 is used the 970 does badly, while where the IBM compiler is used it does well. Compilers can have a big effect on performance, so the figures for all the CPUs could have changed considerably by now.
If there is an overall performance leader in the desktop market right now, I’d put my money on AMD’s Opteron, though the top 3 desktop processors (Pentium 4, AMD 64, G5) all have relative strengths and weaknesses in different areas, so none of them can be said to have a commanding lead.
The moral of the story is that no one test or set of tests tells the whole story; it’s best to run your own tests based on what you intend to run, or at least look at tests specific to your application field. Independent tests are best of all; vendor-supplied benchmarks should be taken with a pinch of salt. It’s not called “benchmarketing” for nothing! That said, all the top CPUs are fast, just not in the same areas.
But could this be about to change? The 970 was based on IBM’s high-end POWER4 CPU. POWER5 has since been released and rumours have long hinted at a 9×0 CPU based on it.
POWER5
With new enhanced cores and improved memory and cache systems, IBM’s recently released POWER5 has shown itself to be an absolutely stellar performer in a number of different areas, in some cases nearly doubling the performance of the previous POWER4+ model [5].
The POWER5 is better than the POWER4 in a number of ways. Improved cores now offer simultaneous multithreading (SMT), and to keep the threads moving, the pool of rename registers has been increased.
The POWER5 also has a slightly larger on-die cache with a better design than before; the same goes for the 36MB external L3 cache. In addition to these design improvements, the L3 cache has been moved so that it is now connected behind the L2 cache, reducing its latency considerably. The memory controller has also been integrated on-die and its bandwidth increased to nearly 20GBytes per second.
Many CPUs spend an awful lot of time sitting around waiting for data. Much of the design effort in the POWER5 seems to have been concentrated on getting data to the CPU cores faster, and this has paid off, as the improved benchmarks show.
In addition to these improvements, internal changes mean the cores run at a higher frequency than the previous generation.
All in all, the POWER5 is a very impressive processor, especially given that it is implemented in a previous-generation (130nm) silicon process and IBM use a deliberately conservative design process.
If rumours are to be believed, IBM are working on a desktop version of this CPU. The question is: can IBM translate these sorts of performance gains to the desktop? We can’t expect POWER5-level performance, but what can we expect?
Speculative Exaltations
There is pretty much zero public data available on the project, other than that it will almost certainly be dual core. However, there are general industry trends and other announcements which may shed light on what this processor might look like.
The following is thus speculation…
The POWER5 and Opteron both have on-die memory controllers, a feature which is, in my opinion, a good bet to be included, especially given that Intel and Freescale are already planning to add memory controllers to their CPUs. An on-die memory controller alone will have a considerable performance impact, as it will reduce memory latency (the delay between when data is asked for and when it arrives).
HyperTransport is another possibility: IBM are a member of the HyperTransport consortium [6], and the technology is already used in the northbridge of existing 970 systems. This would be an important cost-saving measure, as it would save IBM from spending millions on developing another northbridge. Using HyperTransport as the link technology would also mean standard PC parts for the Athlon 64 could be used in 9×0 systems.
This would also spare other PowerPC manufacturers the trouble of sourcing PowerPC-specific northbridges, which can be a royal pain, as the only available parts are generally aimed at the embedded sector and are often not quite what a desktop manufacturer wants.
The CPU will most likely be made using a 90nm process in a less conservative manner than the POWER5, so it will run faster and cooler. Quite how fast it will be able to run is open to question, and I think it’s fairly likely IBM will go the same route as AMD in their dual core plans and not clock the processor as high as possible, to keep power consumption within reasonable limits. This will not be easy, as POWER5 consumes 160 Watts at 1.8GHz (power consumption isn’t much of an issue at the high end).
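To see why backing off the clock helps so much, a rough sketch using the textbook CMOS switching-power relation, P ≈ a·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency), may help. All the numbers below are hypothetical, not IBM figures, and the model ignores leakage; the point is only that a lower clock usually permits a lower supply voltage, and voltage enters the formula squared.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Textbook CMOS switching-power estimate in watts: P = a * C * V^2 * f.

    Leakage power is deliberately ignored in this sketch.
    """
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Hypothetical chip; these values are illustrative, not real 9x0 figures.
base = dynamic_power(0.2, 100e-9, 1.3, 2.5e9)

# Clock cut by 20%, which (by assumption) lets the supply voltage drop by 10%.
slower = dynamic_power(0.2, 100e-9, 1.3 * 0.9, 2.5e9 * 0.8)

print(f"baseline: {base:.1f} W")
print(f"slower:   {slower:.1f} W ({slower / base:.0%} of baseline)")
```

With these assumed numbers a 20% clock reduction cuts switching power by roughly a third, because the voltage saving is squared.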
One possibility is to use the same technique the POWER5 already uses which is to constantly adjust the clock frequency to keep heat output down. This technique is becoming popular with Transmeta and Intel doing or planning to do the same.
Another possibility would be to use a technique Intel plan to use for the next Itanium, “Montecito,” which includes two Peltiers in the heat sink. Peltiers actually consume quite a bit of power themselves, but reducing the CPU temperature reduces transistor leakage; this lowers the power consumed by the CPU itself, allowing boosts in clock frequency which might not otherwise be possible.
Montecito is expected to consume 100 Watts, but its heat sink requires a further 75 Watts. The end effect is that overall power consumption does not change (it may even go up), as part of it is moved to the heat sink, but the CPU itself does not get so hot when working. AMD have filed a patent on an on-chip Peltier, so they’re evidently considering similar technology.
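A simple energy balance shows why the total does not fall. This sketch takes the article's expected figures (100 W for the CPU, 75 W for the heat sink) and treats the Peltier in steady state: the heat it pumps off the CPU and its own electrical input both end up on its hot side, so the fins have to get rid of the sum.

```python
def peltier_budget(cpu_heat_w, peltier_input_w):
    """Steady-state energy balance for a CPU cooled by a Peltier element.

    The cold side absorbs the CPU's heat; the hot side must reject that
    heat plus the electrical power fed into the Peltier itself.
    """
    wall_power_w = cpu_heat_w + peltier_input_w     # what the power supply delivers
    hot_side_heat_w = cpu_heat_w + peltier_input_w  # what the heat sink fins must dump
    return wall_power_w, hot_side_heat_w

# The article's expected Montecito figures: 100 W CPU, 75 W heat sink.
wall, fins = peltier_budget(100, 75)
print(f"supply: {wall} W, heat rejected at the fins: {fins} W")
```

The CPU die runs cooler, but the system as a whole has more heat to move, not less.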
I don’t know if the 9×0 will be so hot as to require such aggressive cooling, but things are heading that way. “Power density” is becoming a problem and will seemingly only get worse in the future. Power density is the heat generated in a specific area; as CPUs get ever smaller, the heat is generated in a smaller area and the chip becomes progressively more difficult to cool. The 970FX used in Apple’s PowerMacs actually uses less power than the previous 970, but liquid cooling was added because of its higher power density.
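Power density is simply power divided by die area, so a shrink that saves some power but saves proportionally more area still leaves a hotter spot to cool. The wattages and die areas below are illustrative round numbers, not measured 970 or 970FX figures.

```python
def power_density(watts, die_area_mm2):
    """Heat flux through the die, in watts per square millimetre."""
    return watts / die_area_mm2

# Illustrative only: the shrink cuts power by 20% but halves the die area.
old = power_density(50.0, 120.0)
new = power_density(40.0, 60.0)

print(f"old: {old:.3f} W/mm^2, new: {new:.3f} W/mm^2 ({new / old:.1f}x)")
```

In this made-up case the newer chip draws less power yet is 1.6 times harder to cool per square millimetre.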
Vector Move?
One long-rumoured feature of the 9×0 is the addition of new vector instructions (read: Altivec 2). Altivec is the most powerful feature of the PowerPC line. The G4 is pretty modestly clocked by x86 standards, to keep power consumption down, but it more than makes up for that when Altivec is activated. The original architecture was designed by Keith Diefendorff at Apple, and word has it he has returned to the company, so it’s possible a new version is in the works. Whether it will make it into a G6 is open to question, but early information on the POWER6 seems to indicate that processor will include vector processing capabilities.
What an enhanced Altivec would do is another question. The architecture could be extended to support 64-bit floating point operations. Another possibility would be to double the vector width, doubling throughput; additional registers would also increase performance in some areas. However, these are just guesses; the reality could be very different.
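A back-of-the-envelope sketch of peak vector throughput shows what each option would mean. It assumes one vector floating-point instruction issued per cycle and counts a multiply-add (such as AltiVec's vmaddfp) as two operations per lane; the 2GHz clock and both extended formats are hypothetical. AltiVec's registers are 128 bits wide, so they hold four single-precision values, or would hold two doubles.

```python
def peak_gflops(freq_ghz, vector_bits, element_bits, flops_per_lane=2, ops_per_cycle=1):
    """Peak vector throughput in GFLOPS.

    flops_per_lane=2 models a multiply-add; ops_per_cycle=1 assumes one
    vector FP instruction issued per clock. Both are simplifying assumptions.
    """
    lanes = vector_bits // element_bits
    return freq_ghz * lanes * flops_per_lane * ops_per_cycle

clock = 2.0  # GHz, hypothetical

print("128-bit, single precision:", peak_gflops(clock, 128, 32))  # current AltiVec width
print("128-bit, double precision:", peak_gflops(clock, 128, 64))  # hypothetical 64-bit FP
print("256-bit, single precision:", peak_gflops(clock, 256, 32))  # hypothetical doubled width
```

Note the trade-off: adding 64-bit floats at the same width halves the lane count, whereas doubling the width doubles peak throughput for every element size.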
Performance
It’s almost certain that a POWER5-derived CPU is in development. It’s just a question of when it appears and what its features will be. The important thing is how its performance turns out.
The POWER5 increased in performance over the previous POWER4+ because of a series of enhancements. A dual core PowerPC derivative will have several of the same enhancements even if they are not to quite the same degree.
An on-die memory controller will reduce latency and simultaneously increase bandwidth. Memory bandwidth is linked to latency as memory has to be gathered in chunks and the less you have to wait per chunk the more you can ask for. That’s a simplification, but the result can be seen on the Opteron which gets closer to its theoretical bandwidth than the G5.
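One common way to make the latency/bandwidth link concrete is Little's law: sustained bandwidth is roughly the amount of data in flight divided by the latency of each request. The miss count, line size and latencies below are hypothetical values chosen only to show the shape of the effect, not POWER5, Opteron or G5 measurements.

```python
def achievable_bandwidth_gbs(outstanding_requests, line_bytes, latency_ns):
    """Little's law: throughput = data in flight / time each request takes.

    1 byte per nanosecond equals 1 GB/s (decimal), so no conversion is needed.
    """
    return outstanding_requests * line_bytes / latency_ns

# Hypothetical: 8 cache-line misses in flight, 128-byte lines.
for latency in (135.0, 60.0):  # ns, off-chip vs on-die controller (illustrative)
    print(f"{latency:.0f} ns -> {achievable_bandwidth_gbs(8, 128, latency):.1f} GB/s")
```

With the same number of requests in flight, cutting latency raises the bandwidth actually delivered, which is the effect described above.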
Given that CPUs spend most of their time waiting for data, any system which increases the availability of that data is going to increase performance. The on-die memory controller of the Opteron appears to be a major reason for its strong overall performance.
If an on-die memory controller is included on a 9×0, I think it’s safe to assume it too will get a significant performance boost. SMT, larger caches and other core enhancements will also be beneficial, and of course a second core will double the potential computing power.
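The word “potential” matters there. Amdahl's law, a standard rule of thumb, gives the speedup from n cores when only a fraction p of the work can run in parallel; this sketch evaluates it for a second core at a few illustrative values of p.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for p in (0.0, 0.5, 0.9, 1.0):
    print(f"p={p:.1f}: 2 cores -> {amdahl_speedup(p, 2):.2f}x")
```

Only fully parallel work sees the full doubling; half-parallel work gains about a third.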
The POWER5 was designed so that any optimisations made for POWER4 also apply to it. This is a good strategy, as it seems to take years to optimise compilers for a specific CPU, and it means all that work is preserved. This will almost certainly also be the case for the 9×0, so compiler technology will continue to improve.
Conclusion
If the 9×0 is like my speculations, it looks on paper like it could give Intel and even AMD something to worry about. However, as always, we will not know what it is really like until it arrives and applications can be tested on the system.
When this happens is anyone’s guess but I think the next version of OS X could be accompanied by some interesting new hardware. I expect the chip to arrive by summer 2005.
Now if someone were to put a Cell (co)processor beside it we’d have a different ball game…
—
References and Further reading
[1: Prediction]
My prediction was that PowerPC would take the lead on the desktop in 2004. It has not, save in some specific areas. However, the fastest PowerPC is not the 970FX; it’s actually the POWER5, which can execute PowerPC binaries…
[2: Barefeats benchmarks]
These tests put the G5 against Opteron and Xeon systems.
[3: SPEC benchmarks]
The Opteron and Itanium 2 systems mentioned above can both be seen here.
[4: Sci benchmarks]
Itanium, Opteron and G5 benchmarks (pdf).
[5: POWER5 benchmarks]
Aceshardware lists a number of POWER5 benchmarks in different areas.
[POWER5 Interview]
Arstechnica recently interviewed one of the POWER5’s designers.
[6: HT]
HyperTransport consortium members.
[IEEE]
IEEE also ran a detailed article on the POWER5 (pdf).
[POWER5 and next generation Itanium]
After POWER5, Intel will come back with the Itanium Montecito; this article at Real World Tech compares the two.
© Nicholas Blachford, October 2004
About the author
Nicholas Blachford lives in Paris. He is currently helping out on the Yoper Linux distro, learning French, Python and dreaming up a GUI for advanced consumer entertainment systems, but not necessarily all at the same time.
If you would like to see your thoughts or experiences with technology published, please consider writing an article for OSNews.
Of course there’s “more power” on the way, it is inevitable.
However, it seems we have reached a maximum point in our current processor technology. We can’t make them any faster without obscure cooling methods, or sucking more power. The G5 seems stuck at 2.5 GHz, and x86 is old and tired. Even video cards are experiencing this.
The concept of dual cores certainly is interesting. It seems to be a good temporary workaround for our current “block” in processor advancement.
Pretty soon, however, we will have Dual Dual PowerMacs with Dual Dual DVI. What’s this saying about computer technology? Will we go through a dead age of technology workarounds? I think people will soon begin to realize this…
If IBM still use “conservative” methods to create their processors, doesn’t that mean they still have some way to go before they hit the wall?
As I understand it, there is nothing really stopping IBM from shipping a 3 or 4 GHz G5 processor, other than that IBM cares more about heat and power consumption in the POWER lines than Intel does in the x86 P4 line. Hence most P4s draw ~30% more power and produce more heat.
It’s all about what your priorities are.
> The concept of dual cores certainly is interesting. It seems to be a good temporary workaround for our current “block” in processor advancement.
The concept is hardly novel or new… The trick will be seeing if the PC software industry will actually accept having to design for parallelism. For example, with the enormous demands of the PC gaming industry nowadays, it’s all studios can do to turn out a complete working game in 6 months. Adding the challenge of parallel design certainly doesn’t make their lives any easier. Perhaps this will become less of a burden as high-end arrangements like NUMA are brought to the desktop (IMHO, shared memory raises more design concerns than separate memory pools and message passing… Then again, I try to avoid programming as much as possible.)
-uberpenguin
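For readers who haven't met the distinction, here is a minimal sketch of the message-passing style using Python's standard queue and threading modules: each worker only touches data it receives through a queue, so no locks around shared variables are needed. It illustrates the structure, not performance; CPython's global interpreter lock keeps these threads from running Python code truly in parallel, and multiprocessing.Queue offers the same pattern across processes. The squaring task is just a placeholder.

```python
import queue
import threading

def worker(inbox, outbox):
    """Owns no shared state: reads tasks from inbox, posts results to outbox."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            break
        outbox.put((item, item * item))

def run(tasks, n_workers=2):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        inbox.put(task)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()  # all results are in outbox once every worker has exited
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results)  # workers may finish in any order

print(run(range(5)))
```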
> It’s all about what your priorities are.
Right, and the current POWER architecture does what it is designed to do very well. POWER is designed for bandwidth and parallelism, not necessarily the fastest integer speed in one of those awful “benchmarks.”
-uberpenguin
I’m sure I read an interview about a week ago saying that the desktop version wouldn’t be dual core, and would have no L3 cache…
There’s probably no doubt there will be no L3, but as to only single core… hmm… it depends how tightly coupled the cores in POWER5 are… but IBM designs in software, so they should be able to drop a core in/out and let the software calculate it…
I think this is the article you’re talking about
http://www.thinksecret.com/news/0411ppc.html
Yes, no L3 cache for the 970GX
Perhaps we’ll move towards the ‘cell processor’ made of littler units, each self-controlled and perhaps with their own timers, so that, for instance, complicated mathematics engines can run slower than ALUs, and there would be an on-chip bus between cells!
This is getting closer and closer, but seems a few years off from the mainstream, if it happens at all.
Each cell can scale to save power when doing different tasks. No longer a monolithic single-clocked chip.
That might happen, and will open new markets and a new look at things. Docking one machine to another to get better speed, etc. But it would require unlearning some practices, and learning new ones.
I myself, with my (mostly) C/C++-only knowledge, feel that the language is too tightly tied to the classical single-CPU architecture. The programming language itself should contain (not through libraries) ways to communicate with other execution units. How? I dunno. I’ve read about OCaml, Lisp, Ada, and other prominent languages that delve into that, but didn’t get any definite answer (maybe there is no such answer, except… 42)
As game developers we are facing a really hard situation, with the new consoles being multi-CPU based, and some of them not symmetrical (rumours for now). Some are saying OpenMP.org would be good for tidying up your C/C++ to support that, but I don’t know to what extent.
Schools should start teaching more of that, not the classical Turing machine where only one execution unit is present. It might actually not be a bad idea if programmers were taught as managers, as their job would start looking like a manager’s, although easier (as CPUs are not usually slackers); you still have to find a way to schedule them, to put in correct dependencies, to estimate, etc.
Just my ramblings about the matter. What do I know? I now prefer coding for a mobile platform under mophun or something like it. It’s just easier to create fun (I mean games) this way. When creating games becomes a pain in the butt, then games might be a pain in the ass to play (well, that’s a fallacy, I know).
That’s an interesting point you’ve brought up. A lot of research was done into concurrent computation in the 1980’s, when single-CPU machines started to hit a performance plateau, just like they are doing now.
See: APL http://www.vector.org.uk/?area=apl&fetch=v203/nap203.htm
Connection Machine Lisp http://fresh.homeunix.net/~luke/misc/ConnectionMachineLisp.pdf
Transputer Lisp (you’ll need an ACM account) http://portal.acm.org/citation.cfm?id=131225
MultiLisp http://portal.acm.org/citation.cfm?id=4478
Erlang http://ll2.ai.mit.edu/talks/armstrong.pdf
The key idea behind all these languages is that concurrency is exposed as a primitive concept (just like an integer or an addition operation), which makes it easier for the programmer to manage the inherent complexity of concurrent programs. The primitives vary by language, ranging from ultra light-weight threads in Erlang, to expressions that represent “values to be computed in the future” in Multilisp.
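Python's standard concurrent.futures module offers a rough analogue of Multilisp's futures: submit() starts a computation and returns a Future placeholder immediately, and result() blocks until the value exists. This is a sketch of the idea, not Multilisp's semantics (Multilisp touches futures implicitly, while Python needs the explicit result() call), and fib is just a stand-in workload.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    """Deliberately slow recursive Fibonacci, standing in for real work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Like (future expr): start the computation and get a placeholder now.
    a = pool.submit(fib, 20)
    b = pool.submit(fib, 21)
    # "Touching" each value blocks until it has been computed.
    total = a.result() + b.result()

print(total)  # fib(20) + fib(21) = 6765 + 10946 = 17711
```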
>Of course there’s “more power” on the way, it is inevitable.
>
>However, it seems we have reached a maximum point in our current processor technology. We can’t make them any faster without obscure cooling methods, or sucking more power. The G5 seems stuck at 2.5 GHz, and x86 is old and tired. Even video cards are experiencing this.
>
>The concept of dual cores certainly is interesting. It seems to be a good temporary workaround for our current “block” in processor advancement.
>
>Pretty soon, however, we will have Dual Dual PowerMacs with Dual Dual DVI. What’s this saying about computer technology? Will we go through a dead age of technology workarounds? I think people will soon begin to realize this…
I think maybe 2.5 GHz is enough for most 2004 desktop PC users. I have a 2.8 P4, and don’t need any more. I was fine with an 800MHz Celeron. I don’t think there is much software today that uses that much CPU power, and no one is in such a hurry to see the millisecond difference between one processor or the other in desktop tasks.
Maybe the new Java “Looking Glass” desktop or MS Longhorn will demand this kind of computer.
“I have a 2.8 P4, and don’t need any more. I was fine with an 800MHz Celeron. I don’t think there is much software today that uses that much CPU power, and no one is in such a hurry to see the millisecond difference between one processor or the other in desktop tasks.”
You’re right on that. I have an AMD 2800+ and it runs After Effects just fine, but the extra power is for those specific apps, like Apple Motion (real-time HD), and even After Effects benefits from a beefed-up G5. Oh, don’t I wish for a G5……… Santa!!
Those are rumours about a new version of the 970, not a POWER5-based CPU.
970GX sounds like an enhanced G5 (970FX) with a bit more cache.
The 970MP “Antares” appears to be a dual core G5. I am a bit puzzled about this, as I would think they would go directly to a dual core POWER5-based CPU (i.e. a G6, which the article is about); I don’t see the point of a dual core G5, as a G6 will almost certainly be a lot more powerful. That said, some rumours I’ve seen have said the Antares is POWER5-based… I guess we’ll see.
POWER5 doesn’t constantly adjust its clock frequency. It does have clock gating, which is a different thing.
The U3 northbridge is from Apple, not IBM.
The JS20 already uses an AMD southbridge thanks to HyperTransport.
The idea that there are PowerPC desktop vendors besides Apple is amusing. (Sorry, Genesi is not credible.)
If it was possible to clock the 970 up to 3 or 4 GHz, Apple would have done it already.
>> though the top 3 desktop processors (Pentium 4, AMD 64, G5) all have relative strengths and weaknesses in different areas, so none of them can be said to have a commanding lead.
Last I checked, Intel has 80%+ market share, and 80% is a very conservative lowball estimate.
Seems “commanding” to me.
>> Pretty soon, however, we will have Dual Dual PowerMacs with Dual Dual DVI. What’s this saying about computer technology? Will we go through a dead age of technology workarounds? I think people will soon begin to realize this…
No, there is a huge economy set up around our current method of building systems, and people want to squeeze every penny from it before being forced to truly innovate. It will happen one day, but not quite yet; the demand is not quite there across the board.
Look to the Cell processor or other videogame tech to point in new directions. You aren’t going to get much more than more GHz or more heat or more cores from any of the workhorse PC-style systems.
One would expect a 325 million transistor Itanium 2, with its relatively expensive entry price, to beat Opteron in floating point. Opteron won’t reach ~210 million transistors until sometime in 2005.
>If the 9×0 is like my speculations, it looks on paper like it could give Intel and even AMD something to worry about (SNIP)
For Intel, “Only the paranoid survive.” For AMD, see quad-core AMD64s.
How many of those 325 million transistors are cache? Itanium is all about the cache.
As IBM likes to merge its processor lines for mini/midrange systems with its mainframe processors, the POWER6 would be the product of merging them together.
The next point is that IBM’s semiconductor branch must make a profit. Today, it’s losing money.
The deadline is the end of 2006, when IBM will have one processor line for its midsize and mainframe boxes.
That’s the moment of truth. This branch must be making a profit by then. If not, well, do it like Motorola.
You think IBM would never do that? Well, they sold their mass storage/disk fabs to Hitachi. Would you have expected that?
By 2006 IBM must have gained noticeable market share with its POWER boxes in the Linux market to get unit costs down.
As IBM’s direction is clear to see (POWER for servers), there is no plan for a desktop PPC from IBM.
But a processor doesn’t know whether it runs on a desktop. Why not…
Cheers Frank
>Itanium is all about the cache.
Cache *and* FPU!
I think that the FPU is the only part of a CPU (besides the vector unit) where throwing transistors at it can really help now, if the code is FP-intensive of course.
G6??? Let’s just get the G5 running cool enough to be used in a notebook. If you can’t even get the G5 into a notebook, IBM and APPLE are going to be SoL when they are stuck using G4s forever.
Sorry, but I get real erked when people missuse plural/singular verbs on a regular basis in an article. IBM is a single entity therefore, IBM never ARE anything. Instead, IBM IS always something i.e. IBM’s name may seem plural (International Business Machines) but they are a single unique entity and as such the name is singular. So please no more of this “IBM *ARE* not standing still” crap. It’s “IBM *IS* not standing still”.
> Sorry, but I get real erked when people missuse […]
And I get irked when people misspell “irked” erked
-uberpenguin
“APPLE are going to be SoL when they are stuck using G4s forever”
Well, IBM has had its share of problems… Wonder why the G5 iMac didn’t come out in time for back-to-school and edu sales? Well, it was IBM, but that’s part of being in this industry. Things don’t always come out as easily as they did on paper or as they did in prototype.
The idea that a new dual core chip is coming from IBM is old news, since they are providing the processors for both the XBox 2 and the Playstation 3.
http://www.joystiq.com/entry/1884567466834323/
The processor is listed as a PowerPC 976.
http://news.teamxbox.com/xbox/5388/Xbox-2-Specs-Leaked-Update-
So IBM is playing all sides and they will certainly have a new series of chips. The 976 is supposed to be 65 nm process according to the rumors.
I don’t know if these are true, but someone thinks IBM wants to be in the consumer processor space, and I think it has become pretty widely accepted that the PowerPC 9XX series will power the new consoles. Frankly, MS and Sony will sell many more chips than Apple ever will, and it will make it hard to justify the new Mac G6 when you can get the Xbox 2 for $200.00 and it has the same chip inside.
This is a marketing problem for Apple. The new chips will be very cheap to buy if the consoles are behind them, so I expect that Apple will be forced to wait until IBM makes something really special for them, something different. Which, if IBM is busy making consoles for MS and Sony, could take quite some time.
This will be interesting to watch.
IBM is a single entity therefore, IBM never ARE anything
Under American-English, yes.
But under British English “IBM are” is perfectly acceptable as IBM can also be seen as a collection of individuals.
—
I think it has become pretty widely accepted that the PowerPC 9XX series will power the new consoles.
In rumours yes, in reality I can’t see them putting a high power CPU into any console. I’m expecting something more like the customised 440s in the BlueGene.
While not correct, using plurals when you refer to those who make up the company is not uncommon. As I understand it, it’s primarily a UK thing.
See here:
http://www.spec.org/cpu2000/results/res2004q4/
They’re under “BladeCenter JS20”.
970FX 2.2 GHz: int-986, FP-1178, intrate-20.2, FPrate-19.2
People have already corrected you on this, but I feel the desire to rub some more salt into the wounds. In British English, collectives are considered to be plural. Just as we’d say “the neighbors are”, they say the “government are” or “the company are”.
Good article about this: http://www.yaelf.com/aueFAQ/mifcompnyvscompnyr.shtml
That is really bad logic on the Brits’ part, I must say.
What if the company is made up of just one single individual?
The name of a company is no different than the name of a single person. It is one entity, and as such should not be followed by “are”.
“The land of the free and the home of the brave” (as the slogan goes) is correct UK English?
Sounds pretty bad.
>The idea that a new dual core chip is coming from IBM is
>old news, since they are providing the processors for
>both the XBox 2 and the Playstation 3.
See http://www.theinquirer.org/?article=19615
Note the recent news on the XBOX’s ability to run normal PC apps.
>How many of those 325 million transistors are cache? Itanium
>is all about the cache.
Such issues should be irrelevant since the cache is an integral part of the solution.