Continuing Part I from last week, this is Part II of the article on the future of 64-bit CPUs, “Under the Microarchitectural Covers”: “Who will dominate 64-bit computing, AMD or Intel? AMD’s Hammer architecture is compelling and compatible, but IA-64 has great long-term potential. Will Intel also hedge its bets with a 64-bit x86 design?” “Instruction Dispatch and Execution”: See how Hammer’s nine execution units are nothing like Itanium’s. “Seamless and Powerful Multiprocessing”: Hammer’s onboard memory controller and HyperTransport links are a big advantage. “Intel’s Ace in the Hole”: What is this rumored ‘Yamhill’ 64-bit x86 design? On a related note, Intel’s McKinley 64-bit CPU will be showcased at next week’s Intel Developer Forum in San Francisco.
64-Bit CPUs: AMD Hammer vs. Intel IA-64
2002-02-14 Hardware 8 Comments
A good article to read to learn the main differences between IA-64 and Hammer…
“Instruction Dispatch and Execution”: See how Hammer’s nine execution units are nothing like Itanium’s.
They revert to a tie after discovering IA-64 has only RISC-ops too. And what does it mean that the address units don’t count? Does IA-64 not access memory? By the way, with predictable memory, VLIW is nice; TI’s signal chips show that. But with caches, how does the compiler know what will be in cache and what won’t? Hammer can reorder instructions to do something useful while memory waits; how does IA-64? Same number of units, lower frequency, no reordering: sounds like Hammer wins.
“For starters, Hammer’s got three address-generation units (Itanium has none), which don’t really contribute to forward progress. They’re more of a necessary evil. Itanium has no address-generation units because it supports only one simple addressing mode. Advantage: Intel.”
So it is an advantage to require 2–3 instructions to calculate an address? The compilers for IA-64 are good, but they cannot remove all address calculations. The x86 address-generation units can be used (and are, when one has a good compiler) for additions and some constant multiplications besides their address-generating duties.
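As a concrete sketch (the function name is mine, not from the article): an expression like x*5 + 7 happens to fit x86’s base + index*scale + displacement addressing form, so a good compiler can evaluate it with a single LEA, doing the arithmetic in address-generation hardware.

```c
#include <stdint.h>

/* x*5 + 7 matches x86's [base + index*scale + disp] addressing form,
 * so a compiler can fold the whole expression into one LEA, e.g.
 *   lea eax, [rdi + rdi*4 + 7]
 * with the arithmetic performed by an address-generation unit. */
static uint32_t scale5_plus7(uint32_t x) {
    return x * 5 + 7;
}
```

On IA-64, with only one simple addressing mode, the same expression needs explicit shift-and-add instructions that occupy a general execution slot.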
“It usually takes a handful of ROPs to equal one “real” instruction”
Not if you use a real compiler. Most instructions are expanded to one or two ROPs. To get “a handful” of ROPs per instruction you have to work really hard (or have very small hands).
“Hammer has to make do with the x86 instruction set, which has no concept of branch hinting or predication, and never will (…)”
First: a processor based on a heavily out-of-order architecture does not need predication as badly as an in-order VLIW processor does.
Second: Intel’s Pentium 4 already has branch-hinting instructions.
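As an aside, branch hints are even reachable from C; a minimal sketch assuming GCC or a compatible compiler, whose `__builtin_expect` lets the programmer state the likely direction so the compiler can lay out code accordingly (and, on CPUs with hint prefixes, emit them):

```c
/* __builtin_expect(expr, expected) tells the compiler which outcome
 * is likely; here the negative case is hinted as rare, so the
 * compiler keeps the common path as the fall-through. */
static int clamp_positive(int x) {
    if (__builtin_expect(x < 0, 0))   /* hinted: rarely taken */
        return 0;
    return x;
}
```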
<p> The article is very pro-VLIW but fails to mention the weak spot of VLIW: language support. Most languages don’t allow the expression of variable dependence, so the compiler must assume the worst. And the cases where independence is explicit, such as using two different local variables, are the ones that a processor like Hammer is also effective at detecting, because the variables will be at different addresses or in different registers. </p>
<p> Take for example memmove() in C: its operands might overlap, so the compiler is forced to assume they do. Hammer can detect at run time that they are independent and take advantage of that.</p>
<p>There is a new keyword in C99 (restrict) that allows expressing independence for pointers, but current code won’t use it, and it probably won’t be used much, since the programmer must prove that the pointers are independent in 100% of cases. If there is a possibility of aliasing, even in just 1% of cases, it cannot be used.</p>
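To make the aliasing point concrete, here is a minimal sketch (the function names are mine) of the same copy loop with and without the C99 qualifier:

```c
#include <stddef.h>

/* Without restrict: dst and src may overlap, so the compiler must
 * preserve the exact load/store order; little room to schedule. */
void copy_may_alias(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* With C99 restrict: the programmer guarantees no overlap, so the
 * compiler is free to hoist loads past stores, unroll, or bundle
 * the operations into independent VLIW slots. */
void copy_no_alias(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The guarantee is one-sided: if the pointers ever do overlap, the restrict version has undefined behavior, which is exactly why the keyword can only be applied when independence holds in 100% of cases.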
<p> And besides, compile-time instruction scheduling is already done, just not as extensively as what Itanium requires. And to improve it, you must convince compiler writers that it is worth the trouble.</p>
<p>And finally, with platforms like Java and .NET, where program optimization is done at run time and code dependencies are not expressed in the byte-codes, the chances of producing good VLIW instructions are slim.</p>
Just curious if anyone has done a comparison between Intel IA64, AMD Hammer and Motorola PowerPC?
“There is a new keyword in C99 (restrict) that allows expressing independence for pointers, but current code won’t use it, and it probably won’t be used much, since the programmer must prove that the pointers are independent in 100% of cases. If there is a possibility of aliasing, even in just 1% of cases, it cannot be used.”
Transmeta has a nice feature for aliasing, which allows blocking addresses. To keep a value in a register, you block its memory location; an access at that address throws an interrupt. The JIT has also compiled a fallback for the aliasing case, which works through memory only. Most of the time the code runs fast, and all of the time it runs correctly.
One small note on IA-64..
One thing I notice as a compiler writer is IA-64’s lack of an integer multiply instruction. I know that people love to optimize their multiplies using shifts and adds, but when you start to get into multi-dimensional arrays (e.g. in Fortran or Pascal), invariably a multiply instruction becomes useful.
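For what it’s worth, the shift-and-add trick only works for compile-time constants; a minimal sketch (the constant 10 and the function name are my own example):

```c
#include <stdint.h>

/* Strength reduction: x*10 rewritten as x*8 + x*2 using shifts,
 * the expansion a compiler must fall back on when the target has
 * no cheap integer multiply. */
static uint32_t mul10(uint32_t x) {
    return (x << 3) + (x << 1);
}
```

For a variable row stride, as in Fortran or Pascal multi-dimensional array indexing, no such rewrite exists, which is exactly where a hardware multiply earns its keep.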
Also, eventually code density becomes an issue. If code density degrades too much, your L1/L2/L3 caches stop doing their job and overall performance drops. Memory is cheap these days, but still inherently slow. And you will most likely get problems with bus contention once you start talking about multi-processor designs.
Now to the main topic..
I think that the crucial decider between IA-64 and x86-64 is going to be running legacy apps. Now I’m not talking about running a legacy OS on IA-64, but rather about an OS that can run both instruction sets simultaneously and integrate program execution between the two architectures, possibly even in the same process address space.
Now you all know that Win95/98/ME still has tons of 16-bit code in it. The GUI subsystem is still fundamentally 16-bit, and it thunks back and forth between the 32-bit code and the 16-bit code to make the system work. I could envisage that a hybrid 32/64-bit OS + windowing system might be more expedient to develop than porting the entire OS and GUI to 64 bits. For starters, you would have to rely on the myriad device-driver developers to build drivers for the 64-bit system, or face doing them all yourself. So in my opinion, being able to run legacy 32-bit code at maximum performance is going to be the deciding factor for any widely used OS on these platforms, and that decision rests purely on getting a useful OS happening.
And then there are the millions of applications out there that are built for IA-32 and aren’t likely to be rebuilt any time soon, either because compiler tools won’t be available, the porting costs are too high, or for silly reasons like the source code being lost or inaccessible for company reasons.
So in summary, while a new architecture is sorely needed, we must face the reality that we live in a software world where legacy abounds, probably more than ever in the history of computing. Any architecture that can’t integrate that legacy correctly is possibly doomed.
If IA-64 weren’t built by Intel, it certainly couldn’t succeed, no matter how good it may look on paper. Intel is going to have to use all its muscle to get people to use it, as it is just a new player among many in an already established market. They might have achieved it if they had gotten to market earlier, but timing is everything in this industry, and I strongly believe they have left their run too late. The only way they can gain market share is by unfair methods, like putting competitors out of business using dirty tricks. Care to guess how long the Alpha will last?
AMD, on the other hand, have recognized that the legacy issue is far bigger than Intel are prepared to admit, and are using that as leverage to break into the 64-bit market. In my opinion they will have far more success than Intel. They are smaller and less ambitious, but sometimes this business strategy succeeds in the long run.
I agree with “So in summary, while a new architecture is sorely needed, we must face the reality that we live in a software world where legacy abounds, probably more than ever in the history of computing. Any architecture that can’t integrate that legacy correctly is possibly doomed.”
AMD should allow compilers, developers, etc. to write directly to its native, internal RISC core. You’d get the best of both worlds: x86 legacy support that is fast and proven, and the world of RISC, with better future headroom.