There is nothing particularly inferior about the x86 architecture. It does add to the die area of the design (a dual-core Opteron is 20% larger in die area than a 970MP, even though the only difference in cache sizes is 32KB of L1 data cache). It also adds some stages to the beginning of the pipeline, but not a tremendous number; the 970 spends quite a few stages on decode as well.
One of the advantages of x86 is that it is an extremely compact format. Not only are its instructions smaller, but they can do more. For example, they can have implicit memory loads/stores. The Opteron uses this not only to increase code density, but to increase dispatch rate (it can only dispatch 3 x86 ops per cycle, but each can have an associated load/store as well). A RISC processor not only has to encode these loads/stores separately, but has to dispatch them separately as well (the PowerPC 970 can dispatch 5 instructions per cycle, but two of them may be loads). In practice, PPC64 code is a *lot* bigger than amd64 code.
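To put rough numbers on the dispatch argument, here is a toy sketch. It is only a back-of-the-envelope model, not a description of either chip: the function `dispatch_cycles` and the six-operation workload are made up for illustration, and it ignores per-cycle load limits, dependencies, and everything after dispatch. The only figures taken from the comment are the dispatch widths (3 for the Opteron, 5 for the 970).

```python
import math

def dispatch_cycles(ops, width, fused_memory_operands):
    """Toy model: cycles needed just to dispatch `ops`.

    `ops` is a list of booleans, True when the operation has a memory operand.
    `width` is how many instructions can be dispatched per cycle.
    """
    slots = 0
    for needs_memory in ops:
        slots += 1  # the operation itself
        if needs_memory and not fused_memory_operands:
            slots += 1  # a load/store ISA spends a separate slot on the load
    return math.ceil(slots / width)

# Hypothetical workload: six operations, each with one memory operand.
ops = [True] * 6
x86_style = dispatch_cycles(ops, width=3, fused_memory_operands=True)
risc_style = dispatch_cycles(ops, width=5, fused_memory_operands=False)
print(x86_style, risc_style)
```

Under these assumptions the narrower machine gets through the same work in fewer dispatch cycles, because each of its slots carries the memory access along with the operation. With a workload that has no memory operands at all, the wider machine wins, as you would expect.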
Instruction set is fairly low down in the list of performance parameters. Opteron is fast not because of (or in spite of) x86, but because it's got a short pipeline, extremely low memory latency, a tight scheduling loop, and a good number of execution units. What's really amazing is the Pentium-M, which has an almost primitive execution core by today's standards and a primitive ISA (8 architectural registers and all), yet gets incredible integer performance. Chalk it up to a combination of an extremely clever frontend, great branch prediction, and a supremely tuned compiler.

In integer performance (server space), the G5s are clock-for-clock maybe 80% as fast as an Opteron, and maybe 70% as fast as a Pentium-M. The SPECint of the 970MP at 2.5GHz is similar to that of the Opteron at 2.0GHz, and that's with IBM's compiler.
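The 80% figure follows directly from those two clock speeds. A minimal sketch of the arithmetic, assuming the two SPECint scores are exactly equal (the comment only says "similar"); the helper name `per_clock_ratio` is just for illustration:

```python
def per_clock_ratio(clock_a_ghz, clock_b_ghz, score_a=1.0, score_b=1.0):
    """Per-clock performance of chip A relative to chip B.

    Scores default to equal, which is an assumption standing in for
    "similar SPECint"; only the clock speeds come from the comment.
    """
    return (score_a / clock_a_ghz) / (score_b / clock_b_ghz)

# 970MP at 2.5GHz versus Opteron at 2.0GHz, equal scores assumed.
ratio = per_clock_ratio(2.5, 2.0)
print(f"{ratio:.2f}")
```

Equal scores at 2.5GHz and 2.0GHz mean the 970MP does 2.0/2.5 of the work per clock, which is where the 80% comes from.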
Given the alleged inferiority of the x86 architecture, that's all rather disappointing. I wonder whether Intel or AMD could do a better PowerPC implementation than IBM themselves, or whether PowerPC isn't actually held back by its RISC architecture, in particular the cache-busting fixed 32-bit instruction encoding.