This article started life when I was asked to write a comparison of x86 and PowerPC CPUs for work. We produce PowerPC based systems and are often asked why we use PowerPC CPUs instead of x86, so a comparison is rather useful. I have had an interest in CPUs for quite some time but had never explored this issue in any detail, so writing the document proved an interesting exercise. I thought my conclusions would be of interest to OSNews readers so I’ve done more research and written this new, rather more detailed article. This article is concerned with the technical differences between the families, not the market differences.
History and Architectural Differences
The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
It was a 16 bit CISC (Complex Instruction Set Computing) processor.
In the following year the 8088 was introduced, which was used in the original IBM PC. It is this computer which led to today’s PCs, which are still compatible with the 8086 instruction set from 1978.
The PowerPC family began life with the PowerPC 601 in 1993, the result of a collaboration started in 1991 between Apple, IBM and Motorola.
The family was designed to be a low cost RISC (Reduced Instruction Set Computing) CPU. It was based on the existing IBM POWER CPU used in the RS/6000 workstations so it would have an existing software base.
RISC Vs CISC
When microprocessors such as the x86 were first developed during the 1970s, memories were very low capacity and highly expensive. Consequently keeping the size of software down was important and the instruction sets in CPUs at the time reflected this.
The x86 instruction set is highly complex with many instructions and addressing modes. It also shows its age in the small number and complex nature of the registers (internal stores) available to the programmer. The x86 only has 8 registers and some of these are special purpose, whereas PowerPC has 32 general purpose registers.
RISC was originally developed at IBM by John Cocke in 1974. Commercial RISC microprocessors appeared in the mid 80s, first in workstations and later moving to the desktop in the Acorn Archimedes.
These use a simplified instruction set which allows the CPUs to be simpler and thus faster. They also included a number of architectural improvements such as pipelining, superscalar execution and out of order execution which enabled the CPUs to perform significantly better than any CISC CPUs.
CISC CPUs such as the 68040 and the Intel 80486 onwards picked up and used many of these architectural improvements.
In the mid 1990s a company called NextGen produced an x86 CPU which used a translator to convert x86 instructions to run within a RISC core. Pretty much all x86 CPUs have since used this technique. Even some RISC CPUs such as the POWER4 / PowerPC 970 use this technique for some instructions.
The high level internal architecture of the vast majority of modern desktop CPUs is now glaringly similar be they RISC or CISC.
Current State Of x86 And PowerPC CPUs
The current desktop PowerPC and x86 CPUs are the following:
AMD Athlon XP
Intel Pentium 4
IBM 750xx (G3)
Motorola 74xx (G4)
IBM 970 (G5)
The current G4 CPUs run at significantly lower speeds compared with the x86 CPUs which are now above 2GHz (P4 > 3GHz). The recently announced PowerPC 970 currently runs up to 2GHz and delivers performance in line with the x86 CPUs.
CPUs break down all operations into stages and these are performed in a pipeline. These stages can be big or small and the number of stages depends on what’s done in each stage: the more an individual stage does, the fewer stages you need to complete the operation. If the stages are simple you will need more of them but each stage can complete quicker. The clock speed of the CPU is limited by the time an individual stage needs to complete, so a CPU with simpler but more numerous stages can operate at a higher frequency.
Both the Athlon and Pentium 4 use longer pipelines (long and thin) with simple stages whereas the PowerPC G4s use shorter pipelines with more complex stages (short and fat). This is the essence of the so called “megahertz myth”. A CPU with a very high clock speed may not be any faster than a CPU with a lower clock speed. The Pentium 4 is now at 3.2 GHz yet a 1.25 GHz Alpha can easily outgun it on floating point operations.
The longer pipelines allow the x86 CPUs to attain these very high frequencies, whereas the PowerPC G4s are somewhat restricted because they use a smaller number of pipeline stages and this limits the clock frequency.
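The trade-off between "long and thin" and "short and fat" pipelines can be sketched with a very rough model. Everything below is an assumption made for illustration: the function names, the idea that the work divides evenly between stages, the fixed per-stage latch overhead and all of the numbers. It is not a description of any real CPU.

```python
def max_clock_ghz(total_logic_ns, stages, latch_overhead_ns):
    """Clock limit when the logic is split evenly across `stages` stages.

    The slowest stage sets the clock period; here every stage takes the
    same time: its share of the logic plus a fixed latch overhead.
    """
    stage_time_ns = total_logic_ns / stages + latch_overhead_ns
    return 1.0 / stage_time_ns  # a period in ns gives a frequency in GHz


def throughput(clock_ghz, instructions_per_clock):
    """Billions of instructions per second: clock rate times work per clock."""
    return clock_ghz * instructions_per_clock


# Hypothetical figures, chosen only to show the shape of the trade-off.
TOTAL_LOGIC_NS = 5.0
LATCH_OVERHEAD_NS = 0.1

short_fat_clock = max_clock_ghz(TOTAL_LOGIC_NS, 7, LATCH_OVERHEAD_NS)
long_thin_clock = max_clock_ghz(TOTAL_LOGIC_NS, 20, LATCH_OVERHEAD_NS)

# The long pipeline clocks much higher, but if it completes fewer
# instructions per clock (assumed here) the real gap is far smaller.
print(round(short_fat_clock, 2), round(throughput(short_fat_clock, 1.6), 2))
print(round(long_thin_clock, 2), round(throughput(long_thin_clock, 0.7), 2))
```

With these made-up numbers the 20 stage design runs at well over twice the clock of the 7 stage design, yet the throughput figures end up close together, which is the megahertz myth in miniature.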
The voltage the CPU can use restricts the power available and this affects the speed the clock can run at. x86 CPUs use relatively high voltages to allow higher clock rates and, to boost clock speeds further, power hungry high speed transistors are used. A long thin pipeline is very fast but also very inefficient power wise. All these things add up, so a 3GHz CPU may be fast but it is also very power hungry, with maximum power consumption now approaching or even exceeding 100 Watts. Intel in fact have taken to using a much lower frequency part for laptop computers than the top end Pentium 4. Yet, despite the fact it is only 1.6GHz, the Pentium M performs just as well as the 2.2GHz Pentium 4.
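The link between voltage, clock speed and power can be sketched with the usual first-order approximation for switching power in CMOS logic, P = a × C × V² × f. This formula is not taken from the figures in this article; the function name, the capacitance and the two example operating points are all hypothetical.

```python
def dynamic_power_watts(capacitance_farads, voltage_volts, frequency_hz, activity=1.0):
    """First-order CMOS switching power: activity * C * V^2 * f.

    Leakage and other effects are ignored; this is only an approximation.
    """
    return activity * capacitance_farads * voltage_volts ** 2 * frequency_hz


# Hypothetical chip with 10 nF of switched capacitance.
C = 10e-9

fast_and_hot = dynamic_power_watts(C, voltage_volts=1.5, frequency_hz=3e9)
slow_and_cool = dynamic_power_watts(C, voltage_volts=1.1, frequency_hz=1e9)

# Power rises with the square of the voltage and linearly with the clock,
# so raising both at once is expensive.
print(round(fast_and_hot, 1), round(slow_and_cool, 1))
```

In this sketch the clock is three times higher but the power is more than five times higher, because the voltage had to rise as well.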
The Law Of Diminishing Returns (Aka Amdahl’s Law)
The Law of diminishing returns is not exactly a new phenomenon. It was originally noticed in parallel computers by IBM engineer Gene Amdahl, one of the creators of the IBM System/360 architecture. The original describes the problem in parallel computing terms, however this simplified version pretty much describes the problem in terms of any modern computer system:
“Each component of a computer system contributes delay to the system
If you make a single component of the system infinitely fast…
…system throughput will still exhibit the combined delays of the other components.” 
As the clock speed goes upwards the actual performance of the CPU does not scale exactly with it. A 2GHz CPU is unlikely to be twice the speed of a 1GHz CPU, indeed on everyday tasks people seem to have some difficulty telling the difference between these speeds.
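Amdahl's Law is usually written as speedup = 1 / ((1 - p) + p / s), where p is the fraction of the run time spent in the component being improved and s is how much faster that component becomes. A minimal sketch follows; the function name and the 50% figure are assumptions for illustration, not measurements.

```python
def amdahl_speedup(improved_fraction, component_speedup):
    """Overall speedup when only part of the run time gets faster.

    improved_fraction: share of the original run time spent in the
        component being sped up (0.0 to 1.0).
    component_speedup: how many times faster that component becomes.
    """
    untouched = 1.0 - improved_fraction
    return 1.0 / (untouched + improved_fraction / component_speedup)


# Suppose (hypothetically) only half the run time depends on CPU clock speed.
print(amdahl_speedup(0.5, 2.0))           # doubling the clock: about 1.33x
print(amdahl_speedup(0.5, float("inf")))  # an infinitely fast CPU: 2.0x
```

Even an infinitely fast CPU cannot do better than 2x here, because the other half of the time is spent waiting on everything else.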
The reason for the lack of scaling is the fact that memory performance has not scaled with the CPU, so the CPU is sitting doing nothing for much of its time (HP estimate this at 70% for server CPUs). Additionally the latency of memory has barely improved at all, so any program which requires the CPU to access memory a lot will be badly affected by memory latency and the CPU will not reach anything near its true potential. The CPU memory cache can alleviate this sort of problem to a degree but its effectiveness depends very much on the type of cache and software algorithm used.
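A rough way to see the effect of memory latency is to split the time per instruction into a part that shrinks with clock speed and a memory stall part that does not. The model, the function name and every number below are assumptions chosen for illustration only.

```python
def time_per_instruction_ns(clock_ghz, cycles_per_instruction,
                            misses_per_instruction, memory_latency_ns):
    """Average time per instruction: compute time plus memory stall time."""
    compute_ns = cycles_per_instruction / clock_ghz      # scales with the clock
    stall_ns = misses_per_instruction * memory_latency_ns  # does not
    return compute_ns + stall_ns


# Hypothetical workload: 1 cycle per instruction, one cache miss every
# 100 instructions, 100 ns to reach main memory.
at_1ghz = time_per_instruction_ns(1.0, 1.0, 0.01, 100.0)
at_2ghz = time_per_instruction_ns(2.0, 1.0, 0.01, 100.0)

speedup = at_1ghz / at_2ghz
stalled_share_at_2ghz = (0.01 * 100.0) / at_2ghz

print(at_1ghz, at_2ghz)                 # 2.0 1.5
print(round(speedup, 2))                # 1.33
print(round(stalled_share_at_2ghz, 2))  # 0.67
```

In this made-up case doubling the clock gives only a third more performance, and the faster CPU spends about two thirds of its time waiting for memory.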
Many of the techniques used within x86 CPUs may only boost performance by a small amount but they are used because of the need for AMD and Intel to outdo one another. As the clock speed increases ever higher the scaling problem increases further, meaning that the additional effort has less and less effect on overall performance. Recent SPEC marks for two Dell workstations show that a greater than 50% increase in CPU speed and the addition of hyper-threading results in only a 26% increase in SPEC marks. Yet when the Itanium 2 CPU got an 11% clock speed boost and double the cache the SPEC mark increased by around 50%.
Of course there are other factors which affect the performance of CPUs such as the cache size and design, the memory interface, compiler & settings, the language it’s programmed in and the programmer who wrote it. Changing the language can in fact be shown to have a much greater effect than changing the CPU. Changing the programmer can also have a very large effect.
Performance Differences Between The PowerPC And x86
Since AMD began competing effectively with Intel in the late 1990s both Intel and AMD have been aggressively developing new faster x86 CPUs. This has led to them becoming competitive with and sometimes even exceeding the performance of RISC CPUs (if you believe the benchmarks, see below). However RISC vendors are now becoming aware of this threat and are responding by making faster CPUs. Ironically however if you were to make all CPUs at the same geometry the Alpha 21364 is the fastest CPU going – yet it uses a 7 year old core design.
PowerPCs although initially designed as desktop processors are primarily used in embedded applications where power usage concerns outweigh raw processing power. Additionally, current G4 CPUs use a relatively slow single data rate bus system which cannot match the faster double or quad data rate busses found on x86 CPUs.
The current (non G5) PowerPC CPUs do not match up to the level of the top x86 CPUs, however due to the effects of the law of diminishing returns they are not massively behind in terms of CPU power. The x86 CPUs are faster but not by as much as you might expect. (Again, see the section on benchmarks below.)
Vector Processing Differences
Vector processing is also known as SIMD (Single Instruction Multiple Data) and it is used in some types of processing. When used it speeds up operations many times over the normal processing core.
Both x86 and PowerPC have added extensions to support vector instructions. x86 started with MMX and MMX2, then SSE and SSE2. These have eight 128 bit registers but operations cannot generally be executed at the same time as floating point instructions. However the x86 floating point unit is notoriously weak and SSE is now used for floating point operations. Intel has also invested in compiler technology which automatically uses the SSE2 unit even if the programmer hasn’t specified it, boosting performance.
The PowerPC gained vector processing in one go when Apple, IBM and Motorola revised the PowerPC instruction set and added the Altivec unit, which has thirty-two 128 bit registers. This was added in the G4 CPUs but not the G3s, although these are now expected to get Altivec in a later revision. Altivec is also present in the 970.
Currently the bus interface of the G4 slows down Altivec as it is very demanding of memory. However Altivec has more registers than SSE so it can operate without going to memory too much, which boosts performance over SSE. The Altivec unit can also operate independently from and simultaneously with the floating point unit.
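The idea behind SIMD can be sketched in plain Python by treating a 128 bit register as four 32 bit lanes and counting how many "instructions" each approach needs. This only simulates the idea; it is not real SSE or Altivec code, and the function names are made up for the example.

```python
LANES = 4  # a 128 bit register holds four 32 bit values


def scalar_add(a, b):
    """Add two lists one element at a time: one 'instruction' per element."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def vector_add(a, b, lanes=LANES):
    """Add two lists one register-full at a time: one 'instruction' per group."""
    result, instructions = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


a = list(range(16))
b = [10] * 16

scalar_result, scalar_count = scalar_add(a, b)
vector_result, vector_count = vector_add(a, b)

print(scalar_result == vector_result)  # True
print(scalar_count, vector_count)      # 16 4
```

The results are identical, but the vector version issues a quarter of the instructions. Whether the hardware can keep those instructions fed with data is exactly where the bus and the number of registers come in.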
Power Consumption Differences
One very big difference between PowerPC and x86 is in the area of power consumption. Because PowerPCs are designed for and used in the embedded sector their power consumption is deliberately low. The x86 CPUs on the other hand have very high power consumption due to the old, inefficient architecture as well as all the techniques used to raise the performance and clock speed. The difference in power consumption is greater than 10X for a 1GHz G4 (7447) compared with the 3GHz Pentium 4. The maximum rating for a G4 is less than 10 Watts, whereas Intel do not appear to give out figures for maximum power consumption, referring instead to a “thermal design rating” which is around 30 Watts lower than the maximum figure. The figure given for the design rating of a P4 3GHz is 81.9 Watts so the maximum is closer to and may even exceed 100 Watts.
A single 3GHz Pentium 4 CPU alone consumes more than 4 times the power of a Pegasos PowerPC motherboard including a 1GHz G4.
Low Power x86s
There are a number of low power x86 designs from Intel, AMD, VIA and Transmeta.
It seems however that cutting power consumption in the x86 also means cutting performance – sometimes drastically. Intel still sell low power Pentium III CPUs right down at 650MHz. The Pentium 4 M can reduce its power consumption but only by scaling down its clock frequency.
Transmeta use a completely different architecture and “code morphing” software to translate the x86 instructions but their CPUs have never exactly broken speed records.
VIA have managed to get power usage down even at 1GHz levels but they too use a different architecture. The VIA C3 series is a very simple CPU based on an architecture which forgoes advanced features like instruction re-ordering and multiple execution units. The nearest equivalent is the 486, launched way back in 1989. This simplified approach produces something of a compromise however: at 800MHz it still requires a fan and even at 1GHz the performance is abysmal – a 1.3GHz Celeron completely destroys it in multiple benchmarks.
Why The Difference?
PowerPCs seem to have no difficulty reaching 1GHz without compromising their performance or generating much heat – how?
CISC and RISC CPUs may use the same techniques and look the same at a high level but at a lower level things are very different. RISC CPUs are a great deal more efficient.
No need to convert CISC -> RISC ISA
x86 CPUs are still compatible with the large, complex x86 instruction set which started with the 8086 and has been growing ever since. In a modern x86 CPU this has to be decoded into simpler instructions which can be executed faster. The POWER4 and PPC 970 also do this with some instructions but this is a relatively simple process compared with handling the multi-length instructions and complex addressing modes found in the x86 instruction set.
Decoding the x86 instruction set is not going to be a simple operation, especially if you want to do it fast.
How for instance does a CPU know where the next instruction is if the instructions are different lengths? It could be found by decoding the first instruction and getting its length but this takes time and imposes a performance bottleneck. It could of course be done in parallel: guess where the instructions might be and decode all the possibilities, then once the first is decoded pick the right one and drop the incorrect ones. This of course takes up silicon and consumes power. RISC CPUs on the other hand do not have multi-length instructions so instruction decoding is vastly simpler.
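The decoding problem can be sketched with a toy instruction encoding. To be clear, this is not the real x86 format: the rule that the low four bits of an instruction's first byte give its length in bytes is invented for the example, as are the function names.

```python
def fixed_length_boundaries(code, width=4):
    """Fixed-width ISA: every start offset is known without decoding anything."""
    return list(range(0, len(code), width))


def toy_length(first_byte):
    """Toy rule (hypothetical): low nibble of the first byte is the length."""
    length = first_byte & 0x0F
    if length == 0:
        raise ValueError("toy encoding does not allow zero-length instructions")
    return length


def sequential_boundaries(code):
    """Variable-width ISA: each start depends on the previous instruction."""
    offsets, pos = [], 0
    while pos < len(code):
        offsets.append(pos)
        pos += toy_length(code[pos])  # must finish this before finding the next
    return offsets


def speculative_boundaries(code):
    """Guess a length at every byte offset, then keep only the real starts."""
    # Step 1: done side by side in hardware, one guess per byte offset.
    guessed = [byte & 0x0F for byte in code]
    # Step 2: walk the chain from offset 0 and throw the other guesses away.
    offsets, pos = [], 0
    while pos < len(code):
        offsets.append(pos)
        pos += toy_length(code[pos])
    wasted_guesses = len(guessed) - len(offsets)
    return offsets, wasted_guesses


program = bytes([0x01, 0x03, 0xAA, 0xBB, 0x02, 0xCC, 0x01])

print(sequential_boundaries(program))      # [0, 1, 4, 6]
print(speculative_boundaries(program))     # ([0, 1, 4, 6], 3)
print(fixed_length_boundaries(bytes(8)))   # [0, 4]
```

The speculative version finds the same boundaries but does work for every byte position, and the wasted guesses stand in for the extra silicon and power. The fixed-width version needs no decoding at all to find where instructions start.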
Related to the above are addressing modes. An x86 has to figure out what addressing mode is used so it can figure out what the instruction is. A similar parallel process to the above could be used. RISC CPUs on the other hand again have a much simpler job as they usually only have one or two addressing modes at most.
To RISC Or Not To RISC
Once you have the instructions in simpler “RISC like” format they should run just as fast – or should they?
Remember that the x86 only has 8 registers; this makes life complicated for the execution core in an x86 CPU. x86 execution cores use the same techniques as RISC CPUs but the limited number of registers will prove problematic. Consider a loop which uses 10 variables in an iteration. An x86 will need hardware assist just to perform a single iteration.
Now consider a RISC CPU, which generally has in the order of 32 registers. It can work across multiple iterations simultaneously and the compiler can handle this without any hardware assist.
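To put a rough number on the loop example, here is a small sketch that counts how often a value has to be fetched from memory because it was not in a register when it was needed. It is only a toy: the least-recently-used eviction policy and the name `count_memory_loads` are assumptions for illustration, and real compilers allocate registers far more cleverly than this.

```python
from collections import OrderedDict


def count_memory_loads(accesses, num_registers):
    """Count loads from memory when variables compete for registers.

    A variable already held in a register costs nothing. Otherwise it is
    loaded, evicting the least recently used variable if all registers
    are full.
    """
    registers = OrderedDict()  # variable name -> None, oldest first
    loads = 0
    for variable in accesses:
        if variable in registers:
            registers.move_to_end(variable)     # mark as recently used
        else:
            loads += 1
            if len(registers) >= num_registers:
                registers.popitem(last=False)   # evict the oldest
            registers[variable] = None
    return loads


# A loop body touching 10 variables in turn, run for 100 iterations.
loop_variables = [f"v{i}" for i in range(10)]
accesses = loop_variables * 100

loads_with_8 = count_memory_loads(accesses, num_registers=8)
loads_with_32 = count_memory_loads(accesses, num_registers=32)

print(loads_with_8, loads_with_32)  # 1000 10
```

With 32 registers each variable is loaded once and then stays put. With 8 registers and this naive policy every single access goes back to memory, which is the kind of traffic the hardware assist is there to hide.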
The hardware assist in question is Out-Of-Order execution and the tools of this trade are called rename registers. Essentially the hardware fools the executing program into thinking there are more registers than there really are, and in the example this will for instance allow an iteration to be completed without the CPU needing to go to the cache for data: the data needed will be in a rename register.
OOO execution is mainly used to increase the performance of a CPU by executing multiple instructions simultaneously. When it succeeds the number of instructions per cycle increases and the CPU gets its work done faster.
However when the x86 includes this kind of hardware the 8 registers become a problem. In order to perform OOO execution, program flow has to be tracked ahead to find instructions which can be executed out of their normal order without messing up the logic of the program. In x86 this means the 8 registers may need to be renamed many times and this requires complex tracking logic.
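Register renaming can be sketched as a table that maps the registers the program names (here called r0 to r7) onto a larger pool of physical registers (p0, p1 and so on). This is a heavily simplified illustration: the three-operand instruction format, the naming and the unlimited supply of physical registers are all assumptions, and real hardware has a finite pool which it must recycle.

```python
def rename(instructions, num_architectural=8):
    """Rename (dest, src1, src2) instructions onto fresh physical registers."""
    # Initially architectural register rN lives in physical register pN.
    mapping = {f"r{i}": f"p{i}" for i in range(num_architectural)}
    next_physical = num_architectural
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        new_src1, new_src2 = mapping[src1], mapping[src2]
        # Every write gets a fresh physical register, so a later write to the
        # same architectural register need not wait for earlier readers.
        mapping[dest] = f"p{next_physical}"
        next_physical += 1
        renamed.append((mapping[dest], new_src1, new_src2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r4", "r1", "r5"),  # r4 = r1 op r5  (genuinely needs the first result)
    ("r1", "r6", "r7"),  # r1 = r6 op r7  (only reuses the name r1)
]

for instruction in rename(program):
    print(instruction)
# ('p8', 'p2', 'p3')
# ('p9', 'p8', 'p5')
# ('p10', 'p6', 'p7')
```

After renaming, the third instruction writes p10 rather than overwriting the value the second instruction still needs, so it can run early. The fewer names the program has to work with, the more often this kind of reuse happens and the more of this bookkeeping the hardware must do.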
RISC wins out here again because of its larger number of registers. Less renaming will be necessary, so less hardware is required to do register usage tracking. The Pentium 4 has 128 rename registers, the 970 has fewer than half at 48 and the G4 has just 16.
Because of the sheer complexity of the x86 ISA and its limited number of architectural registers a RISC processor requires less hardware to do the same work.
Despite not using the highly aggressive methodologies used in the x86 CPUs, IBM have managed to match and even exceed the computing power of x86 CPUs with the PowerPC 970 – at lower power consumption. They were able to do this because of the efficiency of RISC and the inefficiency of x86 CPUs. IBM have already managed to get this processor to run at 2.5GHz and this should perform better than any x86 (with the possible exception of the Opteron).
The idea that x86 have RISC-like cores is a myth. They use the same techniques but the cores of x86 CPUs require a great deal more hardware to deal with the complexities of the original instruction set and architecture.
PowerPC And x86 Get More Bits
Both families are in the process of transitioning to 64 bit.
Athlon 64 (due September)
The AMD Opteron adds 64 bit addressing and 64 bit registers to the x86 line. There is already some support for this CPU in Linux and the BSDs, and a 64 bit version of Windows is also due.
The Opteron is designed as a server CPU and as such both the CPU and motherboards cost more than for normal desktop x86 CPUs. The Athlon 64 can be expected to arrive at rather lower prices.
Despite performing better than the best existing 32 bit Athlon, the Opteron has a slower clock speed (1.8GHz Vs 2.2GHz).
AMD’s x86-64 instruction set extensions give the architecture additional registers and an additional addressing mode but at the same time remove some of the older modes and instructions. This should simplify things a bit and increase performance but the compatibility with the x86 instruction set will still hold back its potential performance.
The PowerPC 970, as predicted on OSNews, is a 64 bit PowerPC CPU based on the IBM POWER4 design but with a smaller cache and the addition of the Altivec unit as found in the G4. It supports 32 bit software with little or no changes, although some changes to the original 64 bit PowerPC architecture have been made in the form of a “64 bit bridge” to ease the porting of 32 bit Operating Systems. This bridge will be removed in subsequent processors.
The hardware architecture of the 970 is similar to that of any advanced CPU however it does not have the aggressive hardware design of the x86 chips. IBM use automated design tools to do layout whereas Intel does it by hand to boost performance.
The 970 has a long pipeline, however it is not run at a very high clock rate. Unusually, the CPU does more per clock than other long pipeline designs, so the 970 is expected to perform very well.
In addition to the new architecture the 970 includes dual floating point units and a very high bandwidth bus which matches or exceeds anything in the x86 world. This will boost performance and especially boost the Altivec unit’s capabilities.
The IBM PPC 970 closes the performance difference between the PowerPC and x86 CPUs without consuming x86 levels of power (estimated 20 Watts at 1.4GHz, 40W at 1.8GHz). It has been announced in Apple Power Macintosh computers for August 2003. With the pent up demand I think we can expect Mac sales to increase significantly.
There has been a great deal of controversy over the benchmarks that Apple published when it announced the new PPC 970 based G5.
The figures Apple gave for the Dell PC were a great deal lower than the figures presented on the SPEC website. Many have criticised Apple for this but all they did was use a different compiler (GCC), and this gave the lower x86 results. GCC may not be the best x86 compiler, and it contains a scheduler for neither the P4 nor the PPC 970, but it is considerably more mature on x86 than on PowerPC. In fact only very recently has the PowerPC code generation begun to approach the quality of x86 code generation. GCC 3.2 for instance produced incorrect code for some PowerPC applications.
However, this does lead to the question of why the SPEC scores produced by GCC are so different from those produced by ICC, the compiler Intel uses when submitting SPEC results. Is ICC really that much better than GCC? In a recent test of x86 compilers most results turned out glaringly similar but when SSE2 is activated ICC completely floors the competition. ICC is picking up the code and auto-vectorising it for the x86 SSE2 unit; the other compilers do not have this feature so don’t get its benefit. I think it’s fairly safe to assume this is at least in part the reason for the difference between the SPEC scores produced by Apple and Intel.
This was a set of artificial benchmarks, but does this translate into real life speed improvements? According to this comment by an ICC user the auto-vectorising for the most part doesn’t make any difference as most code cannot be auto-vectorised.
In the description of the SPEC CPU2000 benchmarks the following is stated:
“These benchmarks measure the performance of the processor, memory and compiler on the tested system.”
SPEC marks are generally used to compare the performance of CPUs, however the above states explicitly that this is not what they are designed for: SPEC marks also test the compiler. There are no doubt real life areas where the auto-vectorisation works but if these are only a small minority of applications, benchmarks that are affected by it become rather meaningless since they do not show reliably how most applications are likely to perform.
Auto-vectorisation also works the other way. The PowerPC’s Altivec unit is very powerful and benchmarks which are vectorised for it can show a G4 outperforming a P4 by a factor of up to 3 1/2.
By using GCC Apple removed the compiler from the factors affecting system speed and gave a more direct CPU to CPU comparison. This is a better comparison if you just want to compare CPUs and prevents the CPU vendor from getting inflated results due to the compiler.
x86 CPUs may use all the tricks in the book to improve performance but for the reasons I explained above they remain inefficient and are not as fast as you may think or as benchmarks appear to indicate. I’m not the only one to hold such an opinion:
“Intel’s chips perform disproportionately well on SPEC’s tests because Intel has optimised its compiler for such tests”* – Peter Glaskowsky, editor-in-chief of Microprocessor Report.
I note that the term “chips” is used; I wonder, does the same apply to the Itanium? This architecture is also highly sensitive to the compiler and this author has read (on more than one occasion) from Itanium users that its performance is not what the benchmarks suggest.
If SPEC marks are to be a useful measure of CPU performance they should use the same compiler. An open source compiler is ideal for this as any optimisations added for one CPU will be in the source code and can thus be added for the other CPUs too, keeping things rather more balanced.
People accuse Apple of fudging their benchmarks, but everybody in the industry does it – and SPEC marks are certainly not immune, it’s called marketing.
Personally I liked the following comment from Slashdot which pretty much sums the situation up:
“The only benchmarks that matter is my impression of the system while using the apps I use. Everything else is opinion.” – FooGoo
x86 has the advantage of a massive market place and the domination of Microsoft. There is plenty of low cost hardware and tons of software to run on it, the same cannot be said for any other CPU architecture.
RISC may be technically better but it is held in a niche by market forces which prefer the lower cost and plentiful software for x86. Market forces do not work on technical grounds and rarely choose the best solution.
Could that be about to change? There are changes afoot and these could have an unpredictable effect on the market:
1) Corporate adoption of Linux
Microsoft is now facing competition from Linux which, unlike Windows, is not locked into x86. Linux runs across many different architectures, so if you need more power or low heat / noise you can run Linux on systems which have those features. If you are adopting Linux you are no longer locked into x86.
2) Market saturation
The computer age as we know it is at an end. The massive growth of the computer market is ending as the market is reaching saturation. Companies wishing to sell more computers will need to find reasons for people to upgrade, unfortunately these reasons are beginning to run out.
3) No more need for speed
Computers are now so fast it’s getting difficult to tell the difference between CPUs even if their clock speeds are a GHz apart. What’s the point of upgrading your computer if you’re not going to notice any difference?
How many people really need a computer that’s even over 1GHz? If your computer feels slow at that speed it’s because the OS has not been optimised for responsiveness, it’s not the fault of the CPU – just ask anyone using BeOS or MorphOS.
There have of course always been people who can use as much power as they can get their hands on but their numbers are small and getting smaller. Notably Apple’s software division has invested in exactly these sorts of applications.
4) Heat problems
What is going to be a hurdle for x86 systems is heat. x86 CPUs already get hot and require considerable cooling but this is getting worse and eventually it will hit a wall. A report by the publishers of Microprocessor Report indicated that Intel is expected to start hitting the heat wall in 2004.
x86 CPUs generate a great deal of heat because they are pushed to give maximum performance but because of their inefficient instruction set this takes a lot of energy.
In order to compete with one another AMD and Intel will need to keep upping their clock rates and running their chips at the limit, so their chips are going to get hotter and hotter.
You may not think heat is important but once you put a number of computers together heat becomes a real problem as does the cost of electricity. The x86’s cost advantage becomes irrelevant when the cooling system costs many times the cost of the computers.
RISC CPUs like the 970 are at a distinct advantage here as they give competitive performance at significantly lower power consumption, they don’t need to be pushed to their limit to perform. Once they get a die shrink into the next process generation power consumption for the existing performance will go down. This strategy looks set to continue in the next generation POWER5.
The POWER5 (of which there will be a “consumer version”) will include Simultaneous Multi-Threading which effectively doubles the performance of the processor unlike Intel’s Hyper Threading which only boosted the performance by 20% (although this looks set to improve). IBM are also adding hardware acceleration of common functions such as communications and virtual memory acceleration onto the CPU. Despite these the number of transistors is not expected to grow by any significant measure so both manufacturing cost and heat dissipation will go down.
x86 is not what it’s sold as. x86 benchmarks very well but benchmarks can be, and are, twisted to the advantage of the manufacturer. RISC still has an advantage as the RISC cores present in x86 CPUs are only a marketing myth. An instruction converter cannot remove the inherent complexity present in the x86 instruction set and consequently x86 is large and inefficient and is going to remain so. x86 is still outgunned at the high end and perhaps surprisingly also at the low end – you can’t make an x86 fast and run cool. A lot of marketing goes into x86 and the market, technical people included, just laps it up.
x86 has the desktop market and there are many large companies who depend on it. Indeed it has been speculated that, inefficient or not, the market momentum of x86 is such that even Intel, its creator, may not be able to drag us away from it. The volume of x86 production makes the chips very low cost and the amount of software available goes without saying. Microsoft and Intel’s domination of the PC world has meant no RISC CPU has ever had success in this market aside from the PowerPCs in Apple systems, and their market share is hardly huge.
In the high end markets, RISC CPUs from HP, SGI, IBM and Sun still dominate. x86 has never been able to reach these performance levels even though they are sometimes a process generation or two ahead.
RISC vendors will always be able to make faster, smaller CPUs. Intel however can make many more CPUs for less.
x86 CPUs have been getting faster and faster for the last few years, threatening even the server vendors. HP and SGI may have given up but IBM has POWER5 and POWER6 on the way and Sun is set to launch CPUs which handle up to 32 threads. Looks like the server vendors are fighting back.
Things are changing. Linux and other Operating Systems are becoming increasingly popular and these are not locked into x86 or any other platform. x86 is running into problems and PowerPC looks like it is going to increasingly become a real, valid alternative to x86 CPUs, both matching and exceeding their performance without the increasingly important power consumption or heat issues.
Both Amdahl’s Law (of diminishing returns) and Moore’s Law date from around the same time but notably we hear a great deal more about Moore’s law. Moore’s Law describes how things are getting better, Amdahl’s Law says why it’s not. There is a difference however: Moore’s Law was an observation, Amdahl’s Law is a Law.
 John Cocke, inventor of RISC (obituary)
 SPEC benchmark results
 Amdahl’s Law Simplified – Richard Wiggins
 Speed differences in different languages
 Coding competition shows humans are better than compilers
 Combined CPU Benchmarks
 C3 vs Celeron benchmarks
 Speculation on the PowerPC G5
 Details of the 64bit bridge can be found in the Software Reference Manual.
 Apple’s G5 benchmarks
 ICC’s optimisations can greatly affect performance
 But  does not appear to continue into real life code
* Article on G5 benchmarks
*I do not know if this is an exact quote.
 Escape from planet x86 – Paul DeMone
Article covering the differences between RISC and CISC
Article on PowerPC 970
About the Author:
Nicholas Blachford has been interested in CPUs for many years and has written on the subject for OSNews before. He works for Genesi who produce the Pegasos G3 / G4 PowerPC based motherboard and the MorphOS Operating System.