I enjoyed that interview too. I'm the sort of geek who can sit and read about microprocessor design for hours, even when the information has no practical use. I've got a great insomnia book that compares the 6502, 68000 and Z80 processor architectures.
I am seeking medical help :-)
It is interesting to see the demarcation between PCs and consoles right now - x86 for (most) PCs and PowerPC for the consoles. With the previous generation of consoles, we had x86 XBoxes and PowerPC Macs. With the current generation, there is no overlap (forgetting for now of course that Linux/BSD can run on PowerPCs).
Just think if the Mac had not abandoned the PowerPC chip. Since Linux and BSD run just fine on the PowerPC, that would leave poor little Windows stranded in the x86 world.
I guess 2 very big surprises - Mac switching to x86 and XBox switching to PowerPC - have created this polarized PC vs. Console world.
[EDIT: can't spell]
Edited 2007-03-27 16:41
But I think that Intel really came out with some good stuff with Core 2.
I agree! For me, AMD used to be the obvious choice for price/performance. The Core 2 Duo really reversed that. It wins not just in performance, but in power usage. It seems to me AMD is only a choice for the home user at the lower end.
EDIT: Forgot to put on my flame-retardant suit. ;}
Edited 2007-03-27 16:46
And for the server users on the high end. The Opteron is still a good chip and does a fine job of scaling. Also, AMD has some advantages in its cHT design that allows for co-processors to be linked efficiently to its chips. Thus, the Opteron becomes a great controlling processor for a bunch of Cells, or for the ClearSpeed chip that is basically a chip with a grid of SPEs.
AMD has good chip designers. It looks to me like the main reason that Intel can beat them right now is that Intel has more resources and expertise for better manufacturing processes. Intel simply has smaller/faster transistors. I think that if AMD could get their chips produced on Intel's Fabs, then you'd have an extremely close race.
The Core 2 is a nice chip, but it should be borne in mind that the K8 architecture is pretty long in the tooth. K8 was over three years old when Core 2 was introduced, and the underlying design dates to the K7 about eight years ago. Core 2 is a pretty fresh start for Intel. Not a complete redo like K7 was relative to K6, but a much bigger overhaul than K7 to K8.
Overall, K8 is probably one of my favorite designs ever. Core 2 is known for being a particularly aggressive hybrid of CISC and RISC, but K7/K8 was the design that started the whole "next-generation CISC" thing by not only accepting its x86 heritage, but actively embracing things like memory operands, variable-length instruction encodings, byte registers, etc., and using them to improve overall performance.
When it really comes down to it, an ISA is just an ISA. It's like the APIs of an OS. Many operating systems are (more or less) POSIX-compliant, but they differ wildly under the covers. CPUs are the same way. Whatever ISA gets used is broken down into micro-ops by sophisticated decoders so that the execution pipeline can work in a more convenient language. Assembly is the language for compilers, not for CPUs.
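To make the "broken down into micro-ops" point concrete, here is a minimal Python sketch of a toy decoder. Everything in it is an assumption made for illustration: the `MicroOp` type, the `decode` function, the `tmp0`/`tmp1` temporaries, and the two-operand text syntax are all made up, and it does not describe the actual decoders in any real CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    kind: str     # "load", "store", or the ALU opcode name (e.g. "add")
    dest: str     # register, temporary, or memory operand written
    srcs: tuple   # operands read


def is_mem(operand: str) -> bool:
    """Toy convention: anything written in [brackets] is a memory operand."""
    return operand.startswith("[") and operand.endswith("]")


def decode(instr: str) -> list:
    """Split a toy 'op dst, src' instruction into load / ALU / store micro-ops."""
    op, operands = instr.split(maxsplit=1)
    dst, src = [s.strip() for s in operands.split(",")]
    uops = []
    a, b = dst, src
    if is_mem(src):
        # A memory source becomes an explicit load into a temporary.
        uops.append(MicroOp("load", "tmp0", (src,)))
        b = "tmp0"
    if is_mem(dst):
        # A memory destination is read, modified, then written back.
        uops.append(MicroOp("load", "tmp1", (dst,)))
        a = "tmp1"
    uops.append(MicroOp(op, a, (a, b)))
    if is_mem(dst):
        uops.append(MicroOp("store", dst, (a,)))
    return uops


if __name__ == "__main__":
    for text in ("add eax, ebx", "add eax, [rbx+8]", "add [rbx], eax"):
        print(text, "->", [u.kind for u in decode(text)])
```

The register-only form stays a single operation, while the memory-operand forms expand into the load/op (or load/op/store) sequence that a load-store ISA would make the compiler spell out explicitly.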
x86 became successful because the first generations of CPUs that used this ISA were the world's first general-purpose microprocessors at a mainstream price point. Over the years, micro-op architectures, decoder logic, micro-op reordering, and speculative execution have evolved to the point where today's x86 processors have vastly more in common with today's PPC processors than they do with x86 processors from just 3 years ago. The biggest difference between x86 and PPC is that software compiled for one won't run on the other. The ISA is the language of the compiler.
There are other important differences, though. Most notably, x86 only has 8 general purpose registers (16 on 64-bit processors), while PPC has 32. x86 tends to involve more decode logic and more sophisticated compilers, but then again optimizing compiler technology for x86 has advanced beyond that of any other ISA. PPC has more humorous instruction names such as eieio.
More mass market software is compiled for x86 than for any other ISA. That's the primary reason why x86 continues to be so dominant. When a console comes out, the vendor is generally responsible for ensuring software (game) availability/compatibility. Backwards compatibility with third-party software isn't generally a top concern. If the more compelling processor speaks PPC, tell the game developers to make their games speak PPC. In the console market, this is perfectly acceptable, whereas this doesn't fly in the PC market.
Edited 2007-03-27 18:39
Good points, but one quibble:
"x86 tends to involve... more sophisticated compilers"
I think the opposite is probably true. The nature of the x86 ISA, and the sheer amount of binary-only code already existing for it, has forced x86 implementations to be particularly accommodating of simplistic code generation. You can get good performance out of a K8 or Core 2 without fancy scheduling, aggressive reordering, instruction bundling, predication, sophisticated control-flow transformations, etc.
You have to be careful about instruction selection and encoding (particularly on newer CPUs in 64-bit mode), and you probably want a decent register allocator, but at the end of the day I'd much rather have to deal with a code-generator for K8 than one for PPC 970!
I agree on some of your points, but this statement
"...today's x86 processors have vastly more in common with today's PPC processors than they do with x86 processors from just 3 years ago."
is totally taken out of thin air.
Three years ago we had basically three (very different) x86-implementations alive: Athlon (64), Pentium 4, and Pentium M.
First, in order to compare these to "today's" x86 processors, you have to specify which one you are referring to. They are much too different from each other to be meaningfully grouped into a "generation" or anything like that.
Second, "today's" processor implementations are very much the same designs (minus Pentium 4), except for enhancements in physical manufacturing and various small tweaks in the logic. And the newer Core/Core 2 is really not much more similar to PPC than any of these somewhat older x86 processors are.
In fact, Core/Core 2 has a lot in common with Pentium M, technically, while it shares very little with PPC (of course with the exception of the basic SIMD idea, the von Neumann principles, and many other fundamental principles and technologies).
It's a lot more complicated than that. While uops in modern x86 processors are indeed fixed length, they are usually tracked in variable-sized groups. See for example the Pentium-M's micro-op fusion, and the K8's double-dispatch instructions.
RISC doesn't just mean fixed-length instructions, though. x86 has conventional CISC features like fancy addressing modes, a hardware-managed stack, memory operands, string instructions, etc. Though there was originally a move towards getting rid of all those abstractions at the uop level in previous processors, modern x86's handle those things even within the "RISC" core. Core 2 and K8 track LOAD+OP instructions as a bundle throughout the pipeline, and offer full performance for all of x86's addressing modes. Core 2 and K8L have dedicated hardware for managing the stack. Even on a recent x86 processor, the very CISC-y x86 string instructions are still often the fastest way to move memory around.
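A rough way to picture the "tracked as a bundle" idea is to count how many tracking slots a short micro-op stream occupies with and without LOAD+OP pairing. The Python sketch below is only a toy under stated assumptions: the `(kind, dest, srcs)` tuple format, the `tracked_entries` function, and the pairing rule (a load immediately followed by an ALU op that consumes its result) are all made up for illustration, and the real fusion rules in Pentium M, Core 2 or K8 are considerably more involved.

```python
def tracked_entries(uops, fuse_load_op=True):
    """Count tracking slots for a list of (kind, dest, srcs) micro-op tuples.

    With fusion enabled, a "load" that is immediately followed by an "alu"
    op reading the load's destination shares one slot with that op.
    """
    slots = 0
    i = 0
    while i < len(uops):
        kind, dest, _srcs = uops[i]
        nxt = uops[i + 1] if i + 1 < len(uops) else None
        if (fuse_load_op and kind == "load" and nxt is not None
                and nxt[0] == "alu" and dest in nxt[2]):
            i += 2  # the load and its consumer travel as one bundle
        else:
            i += 1  # anything else takes a slot of its own
        slots += 1
    return slots


if __name__ == "__main__":
    # Toy stream for: add eax, [rbx] ; add ecx, edx
    stream = [
        ("load", "tmp0", ("[rbx]",)),
        ("alu", "eax", ("eax", "tmp0")),
        ("alu", "ecx", ("ecx", "edx")),
    ]
    print("fused:", tracked_entries(stream, fuse_load_op=True))
    print("unfused:", tracked_entries(stream, fuse_load_op=False))
```

In this toy model the memory-operand instruction costs one slot instead of two when pairing is on, which is the sense in which keeping the CISC-style LOAD+OP together saves bookkeeping.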
I agree (while encia's point is also basically right).
Some thoughts:
(1) RISC designs (or, to use the more sensible name, load-store architectures) are not inherently better (or worse) than non-RISC designs (as your "rep movsb" example illustrates). However, one or the other may be significantly faster, depending on purely physical factors such as the speed ratios between main memory, cache memory (if any), microcode ROM, decode PLAs, and random logic, as well as currently available component densities (1945 - 2007) and other low-level factors.
Of course, this is nothing new, but it is nonetheless ignored in 99% of all the flammable "RISC/CISC" discussions I've seen on the Internet.
(2) What you are describing (LOAD+OP etc) reflects the fundamental principle that it is normally a bad idea to discard information, or context, that is already (freely) available.
(3) I find it a little amusing that older non-RISC designs (such as the 8085) are often retroactively labeled complex ("CISC"), regardless of whether they fit the description or not. They may well have both fewer and less complex instructions (and addressing modes) than modern RISC processors.
Edited 2007-03-29 04:29




