Apple and PC users alike criticised Motorola when the G4 CPUs failed to keep pace with the aggressive clock speed ramping of processors from AMD and Intel. The newly announced 8641D with dual cores, dual DDR-II memory controllers and more is set to change all that.
The latest 7447A G4 runs at up to 1.5GHz and can be found in Apple’s 15″ and 17″ PowerBooks. Freescale (previously Motorola Semiconductor) have concentrated on the embedded market, where low power consumption rules rather than absolute maximum performance. Thus the clock speed has not soared like the Pentium 4’s, but then neither has the pipeline length or the power consumption. The highest performing Pentium 4s already use in excess of 100 Watts; the next generation G4 7448 is expected to require less than 10 Watts at 1.4 GHz. Of course when performance is needed there’s always the Altivec unit, which has never been lacking. To encourage its use, an increasing number of routines are being made available by Freescale for use by their customers.
The G4 however has been at a disadvantage as regards its memory bandwidth. While its bus has increased to 166MHz (200MHz on the 7448), it does not use the DDR or QDR signalling technology used in PCs, thus limiting its memory bandwidth. This impacts the performance of some applications, but in general usage this doesn’t seem to be much of a problem as applications are more often latency bound and PCs generally have no advantage over the G4 in this respect (apart from AMD64 systems).
All the CPU manufacturers have hit a wall though: clock speeds have been rising rapidly but power consumption has been rising faster. The most visible sign of this was the cancellation of Intel’s Tejas processor and their new-found focus on dual core processors.
Intel however have also introduced the low power Pentium-M processors which give the majority (if not more) of the performance of a Pentium 4 at a fraction of the power consumption and a considerably lower clock rate. The Pentium-M CPU has been praised for this achievement, a rather odd situation considering the criticism the G4 got; with the Pentium-M Intel is following the exact same strategy the G4 designers have been implementing all along – with much the same results.
It’s been known for some time that Freescale were working on new PowerPC processors. Now they’ve been announced and they are substantially different from the previous G4 designs. They still follow the same philosophy of low power consumption, but this time do so while also increasing overall system performance.
Freescale could have increased performance by upping the clock speed but doing so would have sent the power consumption rocketing to PC CPU levels, something unacceptable in their primary market. Instead, they took another path and added a second core. The move to a 90nm SOI (Silicon on Insulator) process cuts power consumption considerably, so the second core doubles performance without doubling power consumption. A larger cache (1MB cache per core) and some architectural changes will boost performance further.
They not only added a second CPU core and increased the CPU performance but also integrated dual 667MHz DDR-II memory controllers, sending aggregate memory bandwidth up by a factor of 8 to 10 GBytes / Second in a single leap. Of course, placing the memory controllers on die also reduces memory access latency.
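As a rough check on the factor-of-8 figure, the peak bandwidth can be worked out from the bus clock and width. The sketch below assumes 64-bit (8-byte) data paths for both the existing G4 bus and each DDR-II controller, which the figures above do not state, and it treats 667MHz as the DDR-II transfer rate. The function name is purely illustrative.

```python
# Back-of-the-envelope peak bandwidth; bus widths are assumptions, not quoted figures.

BUS_WIDTH_BYTES = 8  # assumed 64-bit data path for both buses


def peak_bandwidth_gb_s(mega_transfers_per_s, channels=1, width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak bandwidth in GBytes/second (1 GByte = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * width_bytes * channels / 1e9


g4_bus = peak_bandwidth_gb_s(166)                 # current G4 bus, one transfer per clock
dual_ddr2 = peak_bandwidth_gb_s(667, channels=2)  # two DDR-II controllers

print(f"G4 bus:       {g4_bus:.2f} GBytes/s")
print(f"8641D DDR-II: {dual_ddr2:.2f} GBytes/s")
print(f"Ratio:        {dual_ddr2 / g4_bus:.1f}x")
```

Under those assumptions the old bus peaks at a little over 1.3 GBytes / Second and the dual controllers at a little over 10, which is consistent with the factor of 8 quoted above. These are theoretical peaks; sustained figures will be lower.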
As with the Pentium-M, the G4 delivers much of the power of a Pentium 4 despite its low clock frequency. Consequently I fully expect a pair of these cores will outgun any current single core PC CPU, though obviously not on single threaded operations. That alone would have brought Freescale back into the reckoning but they didn’t stop there: they have added a slew of on board peripherals, not only reducing costs but increasing overall system performance whilst simultaneously reducing power consumption.
AMD’s Opteron moved a Hypertransport connection and a memory controller on board the CPU die, Transmeta’s Efficeon went further adding an AGP X4 controller into the mix. The 8641 family goes much further than either of these by also adding dual PCI Express interfaces, a RapidIO bus (similar idea to HyperTransport) and 4 Gigabit Ethernet MACs with hardware TCP/IP acceleration and other interfaces all onto the same die as the CPU cores.
Despite all the interfaces, controllers and dual CPU cores power consumption is currently estimated at 15 – 25 Watts in typical usage.
The 8641D and 8641 (single core version) are not expected to sample until 2H 2005 so they won’t be turning up in machines in the immediate future but they should prove very interesting when they do. The CPU performance will increase significantly due to the inbuilt memory controllers and clock speed boost (no speeds have been quoted yet other than “at least 1.5GHz” but 1.7GHz seems to be expected) but more than this the overall system performance should increase considerably. This is in contrast to PC designs which go all out to boost CPU performance and leave the rest of the system lagging behind. The new PCI Express will give additional bandwidth but the current 11 year old PCI bus will just about handle a single Gigabit Ethernet port, never mind four!
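To put that last point in numbers, here is a small sketch comparing the classic PCI bus with Gigabit Ethernet line rate. It assumes the common 32-bit, 33MHz form of PCI (faster variants exist) and ignores protocol overhead on both sides, so the figures are theoretical peaks only.

```python
# Theoretical peaks only; assumes 32-bit/33MHz PCI and ignores all protocol overhead.

PCI_CLOCK_HZ = 33_000_000        # assumed standard 33MHz PCI
PCI_WIDTH_BYTES = 4              # assumed 32-bit bus
GIGE_BITS_PER_S = 1_000_000_000  # Gigabit Ethernet line rate, one direction


def mbytes_per_s(bytes_per_s):
    """Convert bytes/second to MBytes/second (1 MByte = 10**6 bytes)."""
    return bytes_per_s / 1e6


pci_peak = mbytes_per_s(PCI_CLOCK_HZ * PCI_WIDTH_BYTES)  # shared by every device on the bus
gige_one_way = mbytes_per_s(GIGE_BITS_PER_S / 8)         # one port, one direction

ports_at_line_rate = int(pci_peak // gige_one_way)

print(f"PCI peak:                 {pci_peak:.0f} MBytes/s")
print(f"One GigE port (one way):  {gige_one_way:.0f} MBytes/s")
print(f"Ports PCI could feed:     {ports_at_line_rate}")
print(f"Four ports would need:    {4 * gige_one_way:.0f} MBytes/s")
```

Even before overhead is counted, a single port in one direction takes nearly all of the bus, and four ports would need several times what it can supply.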
For years embedded versions of the PowerPC CPUs have been available which include all manner of connectors and functionality, but the highest performance PowerPC CPUs are typically stand alone components requiring north and south bridges for interfaces. It is an interesting development that Freescale have now chosen to introduce similar integrated functionality on their highest performance CPU. However, this device is not aimed primarily at PCs so it cannot be taken as a sign that PC CPU vendors are planning on going to this level of integration in the immediate future. As long as users measure systems by pure CPU benchmarks, they’ll get systems designed to perform well in pure CPU benchmarks, even though what they actually use is the complete system.
Servers may not seem an obvious target for a G4 based CPU but they are the perfect target for a CPU which optimises system performance over raw CPU performance. Servers, after all, are doing many things at once so an 8641D could perform surprisingly well. High network throughput requires a lot of computing power; the TCP/IP hardware assist will help with this, leaving the CPU to work on the actual content to be served. Low latency connections mean that data can be moved around and accessed for processing quickly.
The low power consumption is also a factor to be considered if a large number of servers are in use. High power consumption CPUs produce heat which has to be removed, usually by cooling systems which cost considerably more than the computer hardware. CPUs with low power consumption can be cooled more easily and cheaply, and having the north-bridge integrated on the CPU saves even more power.
It’s a pity something like this wasn’t around when BeOS was still about. It was designed for multiprocessor systems from the get go so would have run like a dream on such a CPU; hopefully Haiku and Zeta will be able to pick up the benefits though. Of course there are many multiprocessor capable operating systems available nowadays including, of course, Linux and the BSDs, so there’ll be no shortage of options for desktop or server users. There will also be, as you would expect, several embedded OSs to choose from.
One feature the 8641D has is the ability to not only run multiprocessor operating systems but also to run an OS per core, bringing new meaning to “dual boot” systems in the process! Having two operating systems running independently has some very interesting possibilities for OS or application development though modified drivers will be required for any operating systems which utilise this feature. This also gives developers the ability to create an uber stable OS environment. If all the I/O goes via a primary OS on core 1 a secondary OS could be set up to run completely independently on core 2 with only minimal communication between them, even if the first OS completely fails and needs to reboot, the second OS is never affected. You could conceivably leave a task running on one core and reboot multiple times into different systems without ever stopping the task. Useful if you want to boost your RC5 score.
Who’ll use it?
The 8641 / 8641D is expected to sample in the second half of next year so I don’t expect to see products turning up for at least another year.
Apple’s iBook would be an obvious target for this CPU as I think they’ll look at the G5 for the PowerBook, however Apple could then find themselves in a situation where the iBook outperforms the PowerBook. They could however use the single core version of the chip in the iBook leaving the option of either the dual core 8641D or a G5 in the PowerBook. Of course as usual Apple will probably use higher clocked “P” rated parts which aren’t generally advertised on the web site.
The 8641D and G5 will be strong in different areas so will not compete directly for the most part. The G5 is strong on floating point and, like PC CPUs, goes for high single threaded performance. The 8641D will give better overall system and multi-threaded performance with lower power consumption. The G5 is also 64 bit whereas the 8641 is 32 bit; Freescale do have a higher clocked 64 bit CPU in development but no details have been announced as yet.
Macs aside, this CPU is mainly targeted at embedded applications so its presence for the most part will not be visible. It will turn up in some desktop systems though. For general consumer computing the existing G4 provides more than enough computing power, so a pair of them at a higher speed with much faster memory should provide quite a potent system for that market.
There has been no official announcement but Genesi have said their future products will track Freescale CPU development and my information is they do indeed plan to use this processor at some stage in the future.
The 8641D may not get the single threaded performance of high end PC chips but it’s not designed for that, having said that this CPU will probably deliver the majority of that level of performance even if it doesn’t quite reach 100%. The clock rate is highly deceptive as CPUs spend much of their time sitting around waiting for data so raising the clock rate doesn’t necessarily translate to an equivalent level of higher performance. The Itanium II CPU (also at 1.5GHz) with 6.4GBytes / second memory bandwidth and a huge 6MB cache has been studied and even when running a highly optimised benchmark is doing absolutely nothing for 50% of the time – and that’s one of the world’s fastest CPUs (i.e. it’ll happily eat any x86).
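The effect can be illustrated with a simple Amdahl-style model. This is a sketch under a strong assumption: the time spent waiting on memory does not shrink at all when the core clock goes up. The 50% stall figure is taken from the Itanium example above; real processors fall somewhere between this model and perfect scaling.

```python
def speedup_from_clock(clock_ratio, stall_fraction):
    """Overall speedup when only the non-stalled share of run time scales with clock."""
    busy = 1.0 - stall_fraction
    return 1.0 / (stall_fraction + busy / clock_ratio)


# Assumed scenario: half of run time is spent waiting for data.
for ratio in (1.5, 2.0, 3.0):
    gain = speedup_from_clock(ratio, stall_fraction=0.5)
    print(f"{ratio:.1f}x clock -> {gain:.2f}x performance")
```

With half the time spent waiting, doubling the clock in this model yields only about a third more performance, which is why clock rate alone is a poor guide.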
On the desktop bragging rights matter, but they are completely irrelevant in most of the embedded world if you can’t perform in a low power budget. An extra 10-20% performance isn’t so important as to be worth the additional effort and cost. A dual core PC CPU may be faster than this on raw computing speed but don’t expect to see them in many embedded systems any time soon.
The already high power consumption of single core PC processors means manufacturers will have to lower the clock of the CPU cores, otherwise a pair will use far too much power and end up approaching 200 Watts. AMD are doing exactly this with the dual core Opteron, which will run 3 or 5 speed grades below the top single core Opteron. Freescale have a lot more headroom so don’t need to do this; they can run both cores at full speed and still come in at less than half the power consumption of PC CPUs. This problem will also affect the G5 unless IBM can lower its power consumption; currently a dual 1.8GHz G5 should be possible at under 100 Watts (Maximum).
It’s clear to me that in some respects the embedded and desktop markets are converging in this CPU. Freescale have been shipping highly integrated CPUs for years now but they’ve never appeared anywhere near desktop systems. PC CPUs won’t be able to race away from embedded CPUs as they have over the last few years; if anything I think the gap will tend to close as everyone starts using multiple cores.
The 8641D and other family members are targeted at a different market from PC CPUs so direct comparisons are not terribly meaningful unless you’re interested in using one as a desktop. PCs are upgraded at a rapid pace: the current top end is often a few notches down within a few months if not sooner, and within a couple of years the parts are not even being made any more. In the embedded world you expect these parts to be on sale and in use for many years. I bet there aren’t many people still using 10 year old CPUs in their main system; would many of those systems even work these days? The 8641D is expected to be around for a long time: it’s got a reliability rating of 10 years at 105°C.
Freescale didn’t need to drop the speed of their cores to put two on a single die but if they did they could build some very potent processors. Dropping the cores to 1.4GHz should allow 4 cores to be used and still remain under 50 Watts. They could even build an 8 core device if they wanted and still stay under 100 Watts, it would outgun every PC CPU on the market several times over and still not use as much power as a P4 Prescott!
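A rough sanity check on those budgets is sketched below. It borrows the "less than 10 Watts at 1.4 GHz" estimate quoted earlier for the 7448 and assumes, without any data from Freescale, that the same upper bound would apply to each core of a hypothetical multi-core part. Whatever is left over would have to cover caches, memory controllers and I/O.

```python
# Rough power budget; the per-core figure is borrowed from the 7448 estimate,
# not from any published 8641 data.

CORE_WATTS_AT_1_4GHZ = 10.0  # upper bound quoted for the 7448; assumed per core here


def headroom_watts(cores, budget_watts, core_watts=CORE_WATTS_AT_1_4GHZ):
    """Watts left for caches, memory controllers and I/O once the cores are accounted for."""
    return budget_watts - cores * core_watts


for cores, budget in ((4, 50.0), (8, 100.0)):
    used = cores * CORE_WATTS_AT_1_4GHZ
    left = headroom_watts(cores, budget)
    print(f"{cores} cores in {budget:.0f} W: {used:.0f} W for cores, {left:.0f} W left over")
```

On these assumed numbers both budgets leave some margin for the rest of the chip, though how much the on-die controllers would really need is not known.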
Perhaps the criticism levelled at the G4 should have been sent in another direction…
© Nicholas Blachford, October 2004
About the Author
Nicholas Blachford lives in Paris. He is currently helping out on the Yoper Linux distro, learning French, Python and dreaming up a GUI for advanced consumer entertainment systems, but not necessarily all at the same time.