posted by Nicholas Blachford on Wed 9th Jul 2003 16:43 UTC

"Low Power x86s, Why The Difference?, To RISC Or Not To RISC, PPC and x86 get more Bits"

Low Power x86s
There are a number of low power x86 designs from Intel, AMD, VIA and Transmeta. It seems, however, that cutting power consumption in the x86 also means cutting performance - sometimes drastically. Intel still sell low power Pentium III CPUs right down at 650MHz. The Pentium 4 M can reduce its power consumption but only by scaling down its clock frequency. Transmeta use a completely different architecture and "code morphing" software to translate the x86 instructions, but their CPUs have never exactly broken speed records.

VIA have managed to get power usage down even at 1GHz levels but they too use a different architecture. The VIA C3 series is a very simple CPU based on an architecture which forgoes advanced features like instruction re-ordering and multiple execution units. The nearest equivalent is the 486 launched way back in 1989. This simplified approach produces something of a compromise, however: at 800MHz it still requires a fan, and even at 1GHz the performance is abysmal - a 1.3GHz Celeron completely destroys it in multiple benchmarks [7].

Why The Difference?
PowerPCs seem to have no difficulty reaching 1GHz without compromising their performance or generating much heat - how?

CISC and RISC CPUs may use the same techniques and look the same at a high level but at a lower level things are very different. RISC CPUs are a great deal more efficient.

No need to convert CISC -> RISC ISA
x86 CPUs are still compatible with the large, complex x86 instruction set which started with the 8086 and has been growing ever since. In a modern x86 CPU this has to be decoded into simpler instructions which can be executed faster. The POWER4 and PPC 970 also do this with some instructions, but this is a relatively simple process compared with handling the multi-length instructions or the complex addressing modes found in the x86 instruction set.

Decoding the x86 instruction set is not going to be a simple operation, especially if you want to do it fast. How, for instance, does a CPU know where the next instruction is if the instructions are different lengths? It could be found by decoding the first instruction and getting its length, but this takes time and imposes a performance bottleneck. It could of course be done in parallel: guess where the instructions might be and decode all the possibilities, then once the first is decoded pick the right one and drop the incorrect ones. This of course takes up silicon and consumes power. RISC CPUs on the other hand do not have multi-length instructions, so instruction decoding is vastly simpler.
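The boundary problem can be sketched with a toy encoding. This is a minimal illustration, not real x86: the rule that the first byte of each instruction holds its total length is an assumption made for the example, and all function names are hypothetical.

```python
def fixed_boundaries(code: bytes, width: int = 4) -> list:
    """Fixed-width ISA: every start offset is known without reading a single byte."""
    return list(range(0, len(code), width))


def sequential_boundaries(code: bytes) -> list:
    """Variable-length ISA decoded serially: each start depends on the previous length."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        # Toy rule (assumption): first byte = instruction length; guard against 0.
        pc += max(code[pc], 1)
    return starts


def speculative_boundaries(code: bytes):
    """Guess a length at every byte offset, then keep only the chain starting at 0."""
    # Step 1: one length guess per offset. In hardware these could all be
    # computed side by side, at the cost of one length decoder per offset.
    guessed = [max(b, 1) for b in code]
    # Step 2: select the real instruction starts and discard the other guesses.
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += guessed[pc]
    wasted = len(code) - len(starts)  # guesses that were thrown away
    return starts, wasted


# Toy stream holding four instructions of length 2, 3, 1 and 4 bytes.
stream = bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])
print(sequential_boundaries(stream))
print(speculative_boundaries(stream))
print(fixed_boundaries(bytes(8)))
```

In this sketch the speculative version does ten length guesses to find four instructions, which is the extra silicon and power the text refers to. The fixed-width version needs no guesses at all.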

Related to the above are addressing modes: an x86 has to figure out which addressing mode is used before it can figure out what the instruction is. A parallel process similar to the one above could be used. RISC CPUs again have a much simpler job as they usually have only one or two addressing modes.
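As a rough illustration of why the addressing mode has to be known first, the sketch below works out how many bytes follow an x86 ModRM byte. It assumes 32-bit addressing and ignores prefixes, opcodes and immediates; the function name is made up for this example.

```python
def modrm_extra_bytes(modrm: int, sib: int = 0) -> int:
    """Bytes following a ModRM byte in 32-bit addressing mode (SIB + displacement)."""
    mod = (modrm >> 6) & 0b11   # top two bits select the addressing form
    rm = modrm & 0b111          # bottom three bits select register/memory operand
    if mod == 0b11:
        return 0                # register operand, no memory address at all
    extra = 0
    has_sib = rm == 0b100       # rm=100 means a scale-index-base byte follows
    if has_sib:
        extra += 1
    if mod == 0b01:
        extra += 1              # 8 bit displacement
    elif mod == 0b10:
        extra += 4              # 32 bit displacement
    elif rm == 0b101:
        extra += 4              # mod=00, rm=101: displacement only, no base
    elif has_sib and (sib & 0b111) == 0b101:
        extra += 4              # mod=00, SIB base=101: displacement, no base
    return extra


print(modrm_extra_bytes(0xC0))        # register to register
print(modrm_extra_bytes(0x40))        # [reg + disp8]
print(modrm_extra_bytes(0x84, 0x24))  # SIB plus disp32
```

The instruction length cannot be known until this byte (and sometimes the SIB byte after it) has been examined. A typical fixed-width RISC load or store carries its whole address form inside one instruction word.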

Once you have the instructions in simpler "RISC like" format they should run just as fast - or should they?

Remember that the x86 only has 8 registers; this makes life complicated for the execution core in an x86 CPU. x86 execution cores use the same techniques as RISC CPUs but the limited number of registers proves problematic. Consider a loop which uses 10 variables in an iteration. An x86 will need hardware assist just to perform a single iteration.

Now consider a RISC CPU, which generally has in the order of 32 registers. It can work across multiple iterations simultaneously, and the compiler can handle this without any hardware assist.
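The arithmetic behind the loop example can be written out as a small sketch. It assumes every register is free for loop variables, which is optimistic (real compilers reserve some, such as the stack pointer), and both function names are hypothetical.

```python
def spilled_variables(live_vars: int, registers: int) -> int:
    """Variables that do not fit in registers and must live in memory."""
    return max(0, live_vars - registers)


def iterations_in_registers(live_vars: int, registers: int) -> int:
    """Whole loop iterations whose variables fit in the register file at once."""
    return registers // live_vars


for name, regs in (("x86", 8), ("RISC", 32)):
    print(name,
          "spills:", spilled_variables(10, regs),
          "iterations in flight:", iterations_in_registers(10, regs))
```

With 10 variables, 8 registers leave two values in memory for even one iteration, while 32 registers leave room for three iterations at once.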

The hardware assist in question is Out-Of-Order execution and the tools of this trade are called rename registers. Essentially the hardware fools the executing program into thinking there are more registers than there really are. In the example this allows, for instance, an iteration to be completed without the CPU needing to go to the cache for data: the data needed will be in a rename register.
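Register renaming can be sketched in a few lines. This is a simplified model under stated assumptions: instructions are (dest, src1, src2) tuples of architectural register numbers, every write gets a fresh physical register, and physical registers are never freed (real hardware recycles them when instructions retire). The name `rename` and the register counts are illustrative only.

```python
def rename(instructions, num_arch_regs=8, num_physical=16):
    """Rewrite (dest, src1, src2) tuples from architectural to physical registers."""
    mapping = {r: r for r in range(num_arch_regs)}    # arch reg -> current physical reg
    free = list(range(num_arch_regs, num_physical))   # unused rename registers
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources are read through the mapping as it stands before this write.
        p1, p2 = mapping[src1], mapping[src2]
        if not free:
            raise RuntimeError("out of rename registers; a real CPU would stall here")
        pd = free.pop(0)          # every write gets a fresh physical register
        mapping[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


program = [
    (0, 1, 2),   # r0 = r1 op r2
    (3, 0, 1),   # r3 = r0 op r1   (needs the first r0)
    (0, 4, 5),   # r0 = r4 op r5   (reuses the name r0 for an unrelated value)
    (6, 0, 4),   # r6 = r0 op r4   (needs the second r0)
]
print(rename(program))
```

After renaming, the two unrelated uses of r0 land in different physical registers, so the third instruction no longer has to wait for the first two even though the program only ever named 8 registers.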

OOO execution is mainly used to increase the performance of a CPU by executing multiple instructions simultaneously. When it succeeds, the number of instructions per cycle increases and the CPU gets its work done faster.
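The effect on instructions per cycle can be shown with a toy issue model. The assumptions are deliberate simplifications: every instruction takes one cycle, only true data dependencies are tracked (as if registers had already been renamed), and the CPU can issue `width` instructions per cycle. None of this models a specific processor.

```python
def schedule(instructions, width=2):
    """Greedy issue of (dest, src1, src2) tuples; returns (cycles, ipc)."""
    if not instructions:
        return 0, 0.0
    ready_at = {}    # register -> first cycle in which its value is available
    issued_in = {}   # cycle -> number of instructions issued in that cycle
    last = 0
    for dest, src1, src2 in instructions:
        # An instruction can issue once both of its sources have been produced.
        cycle = max(ready_at.get(src1, 0), ready_at.get(src2, 0))
        # Respect the issue width: slide to the next cycle with a free slot.
        while issued_in.get(cycle, 0) >= width:
            cycle += 1
        issued_in[cycle] = issued_in.get(cycle, 0) + 1
        ready_at[dest] = cycle + 1
        last = max(last, cycle + 1)
    return last, len(instructions) / last


chain = [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3)]        # each needs the previous
independent = [(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]  # no dependencies
print(schedule(chain))
print(schedule(independent))
```

The dependent chain takes four cycles however wide the machine is, while the independent instructions finish in two cycles on a two-wide machine, doubling the instructions per cycle.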

However, when the x86 includes this kind of hardware the 8 registers become a problem. In order to perform OOO execution, program flow has to be tracked ahead to find instructions which can be executed out of their normal order without messing up the logic of the program. In x86 this means the 8 registers may need to be renamed many times, and this requires complex tracking logic.

RISC wins out here again because of its larger number of registers. Less renaming is necessary, so less hardware is required to track register usage. The Pentium 4 has 128 rename registers, the 970 has fewer than half that at 48, and the G4 has just 16.

Because of the sheer complexity of the x86 ISA and its limited number of architectural registers, a RISC processor requires less hardware to do the same work.

Despite not using the highly aggressive methodologies used in the x86 CPUs, IBM have managed to match and even exceed the computing power of x86 CPUs with the PowerPC 970 - at lower power consumption. They were able to do this because of the efficiency of RISC and the inefficiency of x86 CPUs. IBM have already managed to get this processor to run at 2.5GHz and this should perform better than any x86 (with the possible exception of the Opteron).

The idea that x86 CPUs have RISC-like cores is a myth. They use the same techniques, but the cores of x86 CPUs require a great deal more hardware to deal with the complexities of the original instruction set and architecture.

PowerPC And x86 Get More Bits
Both families are in the process of transitioning to 64 bit.

Athlon 64 (due September)

PowerPC 970

The AMD Opteron adds 64 bit addressing and 64 bit registers to the x86 line. There is already some support for this CPU in Linux and the BSDs, and a 64 bit version of Windows is also due. The Opteron is designed as a server CPU and as such both the CPU and motherboards cost more than for normal desktop x86 CPUs. The Athlon 64 can be expected to arrive at rather lower prices. Despite performing better than the best existing 32 bit Athlon, the Opteron has a slower clock speed (1.8GHz vs 2.2GHz).

AMD's x86-64 instruction set extensions give the architecture additional registers and an additional addressing mode but at the same time remove some of the older modes and instructions. This should simplify things a bit and increase performance, but the compatibility with the x86 instruction set will still hold back its potential performance.

The PowerPC 970 is, as predicted on OSNews [8], a 64 bit PowerPC CPU based on the IBM POWER4 design but with a smaller cache and the addition of the Altivec unit as found in the G4. It supports 32 bit software with little or no changes, although some changes to the original 64 bit PowerPC architecture have been made in the form of a "64 bit bridge" to ease the porting of 32 bit Operating Systems [9]. This bridge will be removed in subsequent processors.

The hardware architecture of the 970 is similar to that of any advanced CPU, however it does not have the aggressive hardware design of the x86 chips. IBM use automated design tools to do layout whereas Intel does it by hand to boost performance. The 970 has a long pipeline but it is not run at a very high clock rate. Unusually, the CPU does more per clock than other long pipeline designs, so the 970 is expected to perform very well. In addition to the new architecture the 970 includes dual floating point units and a very high bandwidth bus which matches or exceeds anything in the x86 world. This will boost performance and especially boost the Altivec unit's capabilities.

The IBM PPC 970 closes the performance difference between the PowerPC and x86 CPUs without consuming x86 levels of power (estimated 20 Watts at 1.4GHz, 40W at 1.8GHz). It has been announced in Apple Power Macintosh computers for August 2003. With the pent up demand, I think we can expect Mac sales to increase significantly.

Table of contents
  1. "History, Architectural differences, RISC Vs CISC, Current state of these CPUs"
  2. "Law of Diminishing Returns, Performance, Vector Processing and Power Consumption differences"
  3. "Low Power x86s, Why The Difference?, To RISC Or Not To RISC, PPC and x86 get more Bits"
  4. "Benchmarks, the Future"
  5. "Conclusion, References"