Yesterday, we reported on an article about the demise of the Alpha. That article was the first part in a series about the future of processor design. Today, part II has been published: “In terms of the architecture itself, AMD’s Athlon 64 platform, at the stage it is at right now, does not offer that much of a performance advantage, and AMD should not be resting on its laurels. This is because on the desktop, interconnects as such play less of a role. It’s on servers and multi-processing systems that you can take advantage of scaling, and that’s where interconnects such as HyperTransport have a role. But when you talk about a single-chip desktop system, whether it’s one, two or four cores, the efficiency of the chipset still plays a very important role.”
When I was younger and a bit weirder I wrote Be Inc. asking if they would ever consider porting BeOS to Alpha, and explained that it followed their philosophy of a clean technical break. I received an email from JLG saying that if they received the necessary help, then yes, they probably would. I guess he was just humouring me, but man, that would be one technically superior system.
Who owns Alpha now? Compaq? They’ve basically shelved the whole thing, haven’t they?
What about the efficiency of the software?
64-bit is years behind/away.
Seems to be the message that these articles are sending. While the x86 architecture may not be all it’s cracked up to be, as Intel has shown us, it is not easy to break away from. AMD and Intel may have the technology and the skill to design an architecture superior to x86, but would it even have a chance in the current market? I think the only way that it would work is if the improvements are vastly superior to x86 and at a competitive price point.
I think the only path forward is just like AMD figured out: slowly adding the next step. If you look at software from pre-Windows 95 days, you’ll see the big step from 16-bit to 32-bit computing. Only recently has support for 16-bit computing started to be shelved in OSes. My guess is that once 64-bit software becomes dominant we’ll see 16-bit dropped altogether from x86. Either that, or some amazing emulation technology will have to come out that makes running old-architecture apps on a radically different new processor efficient. Discussing popular use of 128-bit computing is obviously a little far out there at the moment. We’ll probably be using optical computers before 128-bit digital ones become popular.
Regarding the Athlon and its amazing design, AMD picked up several senior Athlon designers, which is why there are striking similarities between its slot processor design and one of the Alphas. Also, the ex-Alpha people were heavily involved in AMD’s move to 64-bit. So it’s not entirely accurate to talk about AMD’s x86-64 and the Athlon as entirely unrelated to the Alpha.
should read “picked up several senior ALPHA designers” not Athlon
I think the only way that it would work is if the improvements are vastly superior to x86 and at a competitive price point.
You are forgetting a couple of other major roadblocks. First of all, there have to be MoBo manufacturers out there that are willing to produce a board with a new chip. If only Intel produced the boards, or even AMD for that matter, it would never fly in the PC industry. The other major roadblock is Microsoft. They still rule the roost in the PC OS business, and if they don’t want to support another architecture then the whole thing is dead in the water.
The worst thing about the x86 situation is that there is such a large volume of them being produced that cost has come down considerably over the years. If a new chip were released, its cost would far exceed the price of current x86 chips. The only way to drive down costs would be to produce more chips, and the only way to do that is to get the whole PC business on board. That seems like a fat chance to me.
You have circa 10 (a guesstimate) MoBo manufacturers. One dominant OS (*). These things can be changed rather fast. The problem, as I see it, is all the Windows/x86 software out there. People often use it, even when it is not supported anymore, etc.
(*) – Linux and the *BSDs are already multi-platform and would be ported to the new architecture quite fast.
You have circa 10 (a guesstimate) MoBo manufacturers. One dominant OS (*). These things can be changed rather fast.
It’s not a matter of who makes the motherboards; it’s a matter of any of them actually buying into such a risk given the current situation. Besides that, pushing Microsoft out of their position of dominance in operating systems is not a trivial task, mostly because of the reasons you state next:
The problem, as I see it, is all the Windows/x86 software out there. People often use it, even when it is not supported anymore, etc.
Intel and AMD continue to grapple with the ongoing deflation in the processor world. Very few home or business applications are CPU-bound at this point. People want cheaper. They want lower power consumption. The high-performance multicores continue to show tepid sales. I don’t think people really care at this point; their current CPUs are near idle most of the time as I/O and network connectivity fail to keep them fed.
You read me right. Computers are still slow as molasses in the real world. What I *really* want is a dramatically faster hard drive.
The high end will sell, especially to me. I’ll pay double for double the performance, no sweat. If you make your living using, programming and serving with computers, then you will be happy with the continued progress we’ve seen… 2001-2004 was a wash for progress in performance except in the video market.
The article failed to add that Windows XP 64 is a total failure, though. It isn’t appreciably faster (Linux can pick up as much as 30% overall just from the added registers) and there is a dearth of drivers for it. There are so few 64-bit drivers available, and so little performance increase in W64, that one is left mostly with the additional memory as a reason to go to W64, and plenty of reasons not to.
Maybe Vista will be decent, once it’s had a couple of years to mature, but it doesn’t matter much to me, I’ll be using KDE4 by then (typing this from 3.5 after all).
>I’ll pay double for double the performance, no sweat.
Unfortunately the market requires you to pay far more than double for double the performance of standard disks:
949.90 € IBM 146 GB 15000 RPM SCSI
197.61 € IBM 160 GB 7200 RPM Serial ATA
Uhh … Earth to author?
With AMD already having gone dual-core, and ready to strike out with quad-core sometime this/next year, interconnects are more important than ever!
The Pentium D’s interconnect design is absolutely terrible. All communication between the two cores on a Pentium D, including cache coherency updates, goes over the shared 800 MHz bus. On an Athlon 64 X2, the internal crossbar controller deals with such things, and the outside world is linked to the processor with a fast, low-latency HyperTransport tunnel. Right now, the Pentium D still manages to out-do its single-core cousin, but if Intel had gone to four cores with such a design, performance would screech to a halt. The Athlon 64 design would continue humming along.
Interconnects are *extremely* important.
While I couldn’t find any glaring mistakes in the article (though the “quadruple the integer capacity” did make me laugh…), his understanding of core design seems to be sorely lacking, and he’s not really up to speed when it comes to benchmarks and gaming.
How can one talk about core design without comparing Intel’s and AMD’s different core designs (not just HyperTransport vs. GTL) and their vastly different approaches to dual-core design?
Moreover, he failed to mention the pending death of the P4 NetBurst architecture, which, to me, seems pretty monumental.
In the author’s defense, this article is much better than his previous one, which among other things claimed:
“Nevertheless HyperTransport has now come of age, and it’s no longer simply an AMD bus…. a few other places as well, including Intel machines. Intel uses it in their chipsets.”
Intel using HT in their chipsets? Interesting! Now I know why they canceled the CSI… :)
In short, this is the shallowest review of the CPU world I’ve read in a long time. Here’s 5 minutes I won’t be getting back…
G.
The really nifty bit about the early Athlons was that they used the same motherboard bus as the Alpha 21264: the EV6 bus. There was even a project around that time to make the Alpha and Athlon slot-compatible using Slot A, but it fizzled out along with all the rest of the Alpha development.
While the Alpha was a great architecture, I think the article overstates the benefits of RISC. This comment in particular miffed me:
If we’ve seen the death of some of the best RISC architectures, and with it the triumph of mediocrity, in the form of the x86 architecture and the software that is usually run on it
x86 isn’t *that* bad. It’s a bit crufty, since it’s been amended and changed so many times, but in principle it’s a reasonable ISA. It’s very compact, saving on icache, and its handling of memory references and constants is very convenient. Has anybody ever compared the size of a PPC binary to an x86 one? GCC makes binaries on my G5 that are like 50% bigger than binaries on my Athlon64. That means that despite both processors having the same-sized icache, the G5’s can hold only 2/3 as much real, useful code as the Athlon64’s.
Things get worse for object-oriented programs (RISC was designed more for FORTRAN than LISP). In such programs you deal with a lot of pointers and a lot of indirect memory references, and while on x86 these can be folded into arithmetic or logical instructions, on RISC they have to be implemented as separate load/store instructions. This makes the binary still bigger, and eats into your dispatch bandwidth.
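As a rough illustration of that folding, here is a tiny C sketch (the function is hypothetical, and the assembly in the comments is approximate rather than the output of any particular compiler):

    /* Illustrative only: how a pointer dereference can fold into the
     * arithmetic instruction on x86, while a classic load/store RISC
     * needs an explicit load first. */
    typedef struct { int count; } node_t;

    int bump(const node_t *n, int x)
    {
        /* x86 (roughly):     add  eax, DWORD PTR [rdi]   ; load folded into the add
         * load/store RISC:   ldl  t0, 0(a0)              ; separate load
         *                    addl t0, a1, v0             ; then the add            */
        return x + n->count;
    }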
I disagree: x86 is bad, and “a little crufty” is understating it: compare SSE vs. AltiVec, for example.
From an ease-of-use point of view: its use of little-endian makes it hard to read a hex dump of memory, low number of registers, non-orthogonal ISA…
x86 instruction density can be matched by RISCs, as ARM’s Thumb2 has shown (Thumb had a performance impact; I don’t know if that’s still the case with Thumb2).
Now I must admit that I don’t remember whether the paper I saw comparing instruction density also used OO code for the comparison.
I disagree: x86 is bad, and “a little crufty” is understating it: compare SSE vs. AltiVec, for example.
Both use 128-bit vectors, both support byte, short, int and float vectors. SSE also supports double vectors. The instruction sets are quite similar.
SSE has only 8 registers in comparison to AltiVec’s 32, but makes up for much of that by being able to use powerful memory addressing in every instruction, not just load/store.
Where’s your point?
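To illustrate the addressing point, a small sketch (the function name is made up, and whether the load actually gets folded into the addps memory operand depends on the compiler and optimization level):

    #include <xmmintrin.h>

    /* With SSE, an arithmetic instruction can take one operand straight from
     * memory, so a compiler may emit something like
     *     addps xmm0, XMMWORD PTR [rdi]
     * here, instead of the separate vector load plus add that a pure
     * load/store unit such as AltiVec requires. */
    __m128 add_from_memory(__m128 a, const float *p)
    {
        return _mm_add_ps(a, _mm_load_ps(p)); /* p assumed 16-byte aligned */
    }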
x86 instruction density can be matched by RISCs, as ARM’s Thumb2 has shown
Yes, but should Thumb2 still be considered a RISC instruction set? It hasn’t got fixed-size instructions, and it only has 8 general-purpose registers.
I think they should have sacrificed three-operand instructions rather than halve the number of registers compared to the full ARM instruction set, but it’s still a good compromise between orthodox RISC and pragmatic CISC.
> should Thumb2 still be considered a RISC instruction set?
-Load/Store architecture? Check.
-Fixed instruction size? No, but there are only two sizes (16/32-bit), compared to 8/16/24/32… in traditional CISCs.
-Simple encoding? More or less; debatable, as of course it’s more complex than an ISA with only one instruction length.
-For the registers, even in Thumb mode the 8 registers are fully orthogonal, and there are 3 other registers for the instruction pointer, etc.
So it still looks mostly like a RISC, but RISC isn’t a religion. The real questions are: is it easy to use for humans and compilers, and does it use transistors efficiently?
For the first two points I think that this is the case; for the third point I think it still takes fewer transistors to handle Thumb2 than to decode x86, but that’s just a guess.
I disagree: x86 is bad, and “a little crufty” is understating it: compare SSE vs. AltiVec, for example.
SSE wasn’t that great, but SSE2 and SSE3 are pretty good. AltiVec seems to be slightly faster, but SSE3 is more flexible in both operations and datatypes.
From an ease-of-use point of view: its use of little-endian makes it hard to read a hex dump of memory, low number of registers, non-orthogonal ISA…
Big endian makes it easier to stare at a random memory dump and figure out what’s going on. Little endian makes code simpler, as it makes it simpler to do things like treating an int as a short or a char. It’s not something you’d directly notice unless you program in assembly, but it does tend to result in smaller code.
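A tiny illustration of what that buys you (just an example I’m adding, assuming it runs on a little-endian machine like x86):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t word = 0x12345678;

        /* On a little-endian machine the least significant byte sits at the
         * lowest address, so re-reading the int through a narrower pointer
         * yields its low byte (0x78) with no shifting. On a big-endian
         * machine the same cast would read the high byte (0x12) instead. */
        uint8_t low = *(uint8_t *)&word;

        printf("byte at the int's address: 0x%02x\n", low);
        return 0;
    }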
x86 instruction density can be matched by RISCs, as ARM’s Thumb2 has shown (Thumb had a performance impact; I don’t know if that’s still the case with Thumb2).
I don’t know anything about Thumb2, but Thumb certainly sucked. The only time it made any sense was if your hardware couldn’t provide a 32 bit memory interface, and even then you were still better off using ARM for certain things.
Does SSE3 have a multiply-accumulate operation?
I remember that it was missing in previous versions.
Little endian simplifies some assembly-language tricks, and big endian simplifies others. I’m not sure there is really a benefit to little endian for assembly-language programming here, but humans read in big endian, and that is a big difference.
Why do you say that Thumb sucked? Because of the performance impact?
Anyway, I didn’t say Thumb but Thumb2, where you can mix 32- and 16-bit instructions.
Little endian simplifies some assembly-language tricks, and big endian simplifies others. I’m not sure there is really a benefit to little endian for assembly-language programming here, but humans read in big endian, and that is a big difference.
OMG, I can’t believe people are still beating that particular dead horse.
The differences between the two approaches are so small that the choice is really only a matter of taste, and that’s of course why we ended up with the industry split down the middle on that one.
The supposed advantages of either approach are laughable compared to the huge disadvantage of having to deal with both of them in heterogeneous settings.
While I agree that the performance difference between the two approaches is negligible (each has some algorithms where it does better, but no particular advantage overall), the readability for humans is a big difference, though.
Agreed about your last point: I still don’t understand why Intel went that way, when most other companies used big endian: Motorola, IBM, etc.
Does SSE3 have a multiply-accumulate operation?
No, because x86 does not allow for three-operand instructions. There is a multiply-add instruction for two 16-bit integer vectors, but that’s not the same.
Therefore multiply-accumulate requires separate multiply and add instructions. Whether that actually has a performance impact depends on what execution units there are and how the operations are scheduled. Anyone got experience with that?
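For what it’s worth, this is roughly how that two-instruction sequence looks with SSE intrinsics (a minimal sketch; the helper name madd_ps is made up for illustration):

    #include <xmmintrin.h>

    /* SSE/SSE2/SSE3 have no fused multiply-accumulate, so it has to be
     * spelled as a multiply followed by a dependent add (mulps + addps). */
    static inline __m128 madd_ps(__m128 acc, __m128 a, __m128 b)
    {
        return _mm_add_ps(acc, _mm_mul_ps(a, b));
    }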
While the Alpha was a great architecture, I think the article overstates the benefits of RISC.
Very much so. The Alpha in particular went too far in simplifying the hardware at the cost of software.
E.g. its load/store instructions can only access whole 32-bit (or 64-bit) words at a time. So what if you need to access bytes or 16-bit words? Yes, you need to load the whole word and then extract the part that you want. That wouldn’t be too awful if at least it had dedicated instructions to do the extraction.
But no, standard shift&rotate is all you get, so that accessing a random byte in memory takes something like 5 instructions, each of them 4 bytes big. x86 can do that in a single 2- or 3-byte instruction.
And it’s not like this is some obscure operation; 8-bit and 16-bit accesses are required all the time in string, image, and audio processing.
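Written out in C, the load-whole-word-and-extract approach looks something like this (purely illustrative, not actual Alpha code, and strict-aliasing caveats aside):

    #include <stdint.h>

    /* Emulating a byte load the way a word-only, little-endian machine would
     * have to: fetch the containing 64-bit word, then shift and mask to
     * isolate the wanted byte. */
    static uint8_t load_byte(const uint8_t *p)
    {
        uintptr_t addr = (uintptr_t)p;
        const uint64_t *word = (const uint64_t *)(addr & ~(uintptr_t)7); /* aligned word */
        unsigned shift = (unsigned)(addr & 7) * 8;                       /* byte offset */
        return (uint8_t)(*word >> shift);
    }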
Well, there was an extension to the Alpha (BWX) that added 8- and 16-bit access.
So your complaint is a bit exaggerated.
That review was terrible. Absolutely terrible. In short, this “consultant” knows nothing about system architecture. He’s too busy comparing numbers to realize the numbers are not comparable.
DDR2 is his biggest gripe. Why is it permissible for AMD to be on DDR while Intel went to DDR2? Because AMD is NUMA: each processor has dedicated memory. The consultant waxes about how Intel’s bus is faster, but AMD has multiple, independent buses. Intel has bus contention; AMD runs serial links at full speed.
He talks about how a GPU could saturate a bus, but ignores how the GPU would starve an Intel processor of memory bandwidth (something Intel processors are more susceptible to; NetBurst’s failings). I suppose Intel’s architecture would favor CPU and GPU equally, but that’s not what you want: stalling the CPU creates latency, while stalling the GPU might (gasp!) drop a frame.
Single-core processors look great on benchmarks. But on any realistic system – yes, even games – there is enough I/O, enough background activity, and bus traffic that you want to look at multithreaded performance. When everything fits in memory, Intel does better; once you have real-world workloads that grow beyond Intel’s huge caches, AMD shines. Try running a kernel compile for a good system benchmark – or try running one of those games with a background virus scanner, three web browsers, and a bittorrent download going in the background.
Agreed, and the part about AMD64 is terrible: applications don’t need to be 64-bit clean before being able to use the new registers.
They just need to be recompiled in the correct mode: you can stay 32-bit by default and get access to the new registers.
It may even be faster to use 32-bit addresses instead of 64-bit addresses if your application doesn’t need to access more than 4 GB of memory.
Getting a ~20% improvement with just a recompile isn’t bad!
They just need to be recompiled in the correct mode: you can stay 32-bit by default and get access to the new registers.
No, you can’t. The extra registers are only available in 64-bit mode. Theoretically you could limit pointers to 32 bit when in 64-bit mode, but I don’t think any compilers or operating systems support that at the moment.
Oops, you’re right: you have to use 64-bit addresses to be able to use the additional registers.
I apologize for my (honest) mistake.
It’s strange that this isn’t available, though: it would have allowed an easy recompilation without having to modify your app.
Do people remember the customised chipset that made the Amiga 500 the premier gaming and multimedia machine of its day (more like a few years)? It did this with a rather lame CISC 68K CPU (the 68000) with a 16-bit bus (32-bit internal, though) and no FPU to speak of.
Chipsets used to be what made the difference; then people got lazy because CPUs got faster (making crappy chipsets less noticeable). Coding got sloppy. Hardware designers lost focus.
We seem to be going round in circles. Now we have CPUs that run slower but do more per cycle, and focus is being placed back on how the rest of the system holds up its end of the bargain.
Can’t wait to one day turn on my new “Octo-core” from 1ntel and be presented with a Basic prompt!!! (rendered in HD 48bit colour with AA of course)
For the CPUs
If we had some low-cost MIPS64 and SPARC desktop products, Intel would not survive. And yes, we need alternative desktops to see the difference. MIPS64 eases the way compilers are written. I think this is where Itanic stumbled and where PA-RISC was also successful. We need alternative desktops. The whole discussion has no meaning, thanks to Intel and AMD. Hopefully the new SPARC station is more affordable.
For what runs on the CPUs
Java and C# are good for rapid development but keep developers far from the internals of the OS and CPU. This is good for rapid prototyping but not for production purposes. This generation of CPUs was not made for that kind of careless coding, and people should realize it. C, C++ and ObjC have a role to play, and correct software development procedures must come back. And yes, I am a fan of assembly for the parts of the final product that must be boosted.
“If we had some low cost MIPS64 and SPARC desktop product, INTEL would not survive.”
That’s some crazy talk. Intel makes more than just processors, and nobody who makes SPARC or MIPS processors has anywhere near the market share that x86 has. The only reason that AMD has been successful in the last few years is that they sell an x86-compatible processor, which allows people to keep their current OS and software.
… marketing rules, engineering does not. Intel made it because of their marketing decisions, because from any other perspective their processors are crap compared to RISC processors. It’s true that some RISC processors may have some quirks, but (judging from the ingenuity that engineers show in solving impossible problems with x86 processors) my speculation is that RISC processors should have no quirk that is hard to fix over a one-year period. x86 processors are absorbing so much funding and research to be improved, and still their performance is tragic, to say the least. Thank god there is AMD around to produce decently crappy processors. And please don’t tell me about x86 compatibility. These are lies we’re being told to help us sleep at night with the notion that we are getting products that actually make sense. I wouldn’t mind compatibility if the performance jump from each RISC processor to the next was x2 while their price dived /2 (RISC processors, despite what we are being told, are actually far cheaper to produce than x86 ones). But this is not a `best man wins’ world: x86 processors versus RISC, USB versus FireWire, MS versus all other OSes… anywhere you look, `worst man wins’, and we are being told that this makes sense. I say we are _imposed_ to believe it does. But the harsh and fair truth is that it doesn’t, not if you look at it from the `best man should win’ perspective, that is. Call it `conspiracy theory’ if you want, but my take is that there is a reason for all of this, and it’s not called `our choices’… but again, that’s just my take.