Itanium Gets Scaled Down and Pushed Back

Thom Holwerda 2005-10-25 Intel 21 Comments

Intel has delayed by months the release of the next three major versions of the Itanium processor, a new blow for the processor family. But the chipmaker also plans a change it said will boost the performance of its more widely used Xeon line.

21 Comments

2005-10-25 9:53 am
zima
Itanium – Intel’s biggest success in being ahead of other companies…

2005-10-25 10:49 am
Anonymous
the problem is that those other companies you are referring to are their customers…

2005-10-25 10:01 am
Anonymous
revive the alpha ?
it’s 64 bit. it’s fscking fast. and it has proven its quality since the early 90’s
2005-10-25 10:39 am
Anonymous
Why don’t you buy AMD? It is already 64-bit, already fast, and already proven to be stable.

2005-10-25 11:00 am
zima
…and it has a lot of Alpha in it. Heck, the K7 even had the same FSB.
2005-10-25 11:56 am
nick8325
Because it’s ugly, crufty, has no tagged TLB nor segmentation to emulate it, still has one of the smallest numbers of general-purpose registers (same as 68K 20 years ago), still has an inscrutable and likely-to-change-often form of OOO, …?
Mostly IMO, of course.

2005-10-25 12:34 pm
nimble
Because it’s ugly, crufty,
No argument there. But Moore’s Law means the overhead from that gets less and less significant.
still has one of the smallest numbers of general-purpose registers (same as 68K 20 years ago)
While eight registers (actually six after you discount stack and frame pointer) really were too few, 16 are sufficient for most algorithms.
RISC architectures were designed with 32 registers for a number of reasons. It helps utilising longish in-order pipelines. Load/store architectures require more temporary storage. And their 32-bit instruction words had plenty of instruction bits to spend anyway.
But once the likes of PowerPC and Alpha gained out-of-order execution and register renaming those extra architectural registers became much less useful.
ARM, for example, looked at the statistics, decided to go with 16 registers and spent the saved bits on conditional execution instead.
still has an inscrutable and likely-to-change-often form of OOO, …?
That’s the whole point of OOO: complexity is moved into the hardware. Programmer and compiler don’t need to care as much about the details, and a program you compile today will still have good performance on new processors without having to recompile.
(Yes, you can still squeeze out some performance by tailoring for a particular processor, but the returns are much less than with in-order ones.)

2005-10-25 4:30 pm
nick8325
While eight registers (actually six after you discount stack and frame pointer) really were too few, 16 are sufficient for most algorithms.
Yes, I suppose so. After all, the more registers there are the more state there is to save on context switch. By the way, the frame pointer is usable as a GP reg in protected mode x86, since you can index things from the stack pointer instead.
(Yes, you can still squeeze out some performance by tailoring for a particular processor, but the returns are much less than with in-order ones.)
Yeah, but the complicated addressing modes and CISCy instruction set of x86 seem to make this more dependent on the particular implementation – but perhaps it’s just NetBurst being irritating that’s given me this impression ([reg * n] encodings, for example, are slow in NetBurst because it’s terrible at shifts – they happen outside of the ALU).
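For readers unfamiliar with the [reg * n] encodings mentioned above, here is a minimal C sketch of what a scaled-index operand computes. The function names are made up for illustration, and whether a compiler actually emits a scaled-index operand or a separate shift for the loop depends on the compiler and target, which this sketch does not control.

```c
#include <stddef.h>
#include <stdint.h>

/* Element address the way a scaled-index operand computes it:
   base + index * 4 for 4-byte elements. */
static uintptr_t addr_scaled(uintptr_t base, size_t index) {
    return base + index * sizeof(int32_t);
}

/* The same address with the scaling written as an explicit shift,
   which is the operation the hardware has to perform somewhere. */
static uintptr_t addr_shifted(uintptr_t base, size_t index) {
    return base + (index << 2);
}

/* A typical loop whose load a compiler may encode as [base + index*4]. */
static int32_t sum_array(const int32_t *a, size_t n) {
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Both helpers return the same address; the complaint in the comment is only about how cheaply a given implementation performs the scaling step.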

2005-10-26 1:47 am
rayiner
Yeah, but the complicated addressing modes and CISCy instruction set of x86 seem to make this more dependent on the particular implementation
Actually, the K8 architecture uses this to very good effect. There is an AGU attached to each integer pipeline on the K8. Integer instructions are batched into ALU/AGU pairs, and each integer pipeline can execute one pair each cycle. The net result is that memory operands become quite cheap to use, and a good register-allocator can take advantage of that fact to optimize access to spilled variables. Overall, this saves many instructions that would otherwise be used for loads/stores on a RISC architecture, increasing both performance and code density.
The K8 really is a wonderful architecture for integer performance. It’s available at high clock speeds (up to 2.8GHz), has a relatively short pipeline (shorter than the one on the PPC970, Cell, Pentium 4, Pentium M, and UltraSPARC III), high-bandwidth, low-latency memory, and decent cache sizes. It’s bar-none the fastest CPU at SpecInt, with the 2.8GHz Opteron 254 coming in at 26% faster than a 1.9GHz Power5+ (a CPU that has 128MB of L3 cache and tens of GB/sec of memory bandwidth!).
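The comparison above mixes clock speed and per-clock throughput. Here is a rough C sketch of how to separate the two, using only the figures quoted in the comment (a 26% SpecInt lead, 2.8GHz versus 1.9GHz). The function name is hypothetical and the result is back-of-envelope arithmetic, not a benchmark.

```c
/* Work done per clock cycle by CPU A relative to CPU B, given A's
   overall speedup over B and both clock rates in GHz. */
double per_clock_ratio(double speedup, double clock_a_ghz, double clock_b_ghz) {
    return speedup / (clock_a_ghz / clock_b_ghz);
}
```

With those inputs, per_clock_ratio(1.26, 2.8, 1.9) comes out at about 0.855, which suggests that in this particular comparison the Opteron’s lead comes from its higher clock rather than from doing more work per cycle.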

2005-10-25 12:48 pm
Anonymous
Because it’s ugly, crufty, has no tagged TLB nor segmentation to emulate it, still has one of the smallest numbers of general-purpose registers (same as 68K 20 years ago), still has an inscrutable and likely-to-change-often form of OOO, …?
It may not be pretty but it sells. Besides:
-AMD64 added additional registers
-Pacifica will add tagged TLB etc.
What Itanium offers over Opteron is support for high-end 64 or 128-way servers such as HP Integrity Superdome (they are replacing PA-RISC here). For all intents and purposes Itanium is only for the high end (you can buy a smaller Itanium server if you want). A more appropriate comparison would be with IBM POWER5, Fujitsu SPARC64 or Sun UltraSPARC IV+, each of which scales to at least 64-way (32 x dual core for POWER5, 72 x dual core for Sun Fire, 128 single core for SPARC64).

2005-10-25 4:21 pm
Anonymous
“It may not be pretty but it sells.”
That is the sad truth.
80x86 is like a lump of cheese. When it starts to look bad they add more cheese to cover up the mould. Consumers only care that it works and only see the nice fresh layer of cheese on the top, while developers have to put up with the increasing number of layers of decayed slime surrounding the original putrid lump.
The rest of 80x86 hardware isn’t much better. IMHO it’s one of the reasons Microsoft have managed to remain so dominant for so long – competitors spend too much effort buried in the architectural wasteland left over from 30 years of “backward compatibility”.
Intel had the intelligence to see that a shift to 64 bit was a good opportunity to start again with a clean design. AMD saw potential profit, and now we’ll probably be suffering from the “curse of the putrid cheese” for another 30 years.

2005-10-25 8:14 pm
Anonymous
Isn’t innovation grand? ;-p

2005-10-25 4:45 pm
nick8325
Pacifica will add tagged TLB etc.
Oh, that is good news! I just looked at Pacifica and it seems very nifty. That was one of my main annoyances with x86-64.

2005-10-25 11:34 am
mario
at their own foot.
The Itanium has been an unmitigated, complete, truly comprehensive embarrassment for Intel and HP. Great at executing benchmarks, not so good for changing workloads and datasets, and absolutely a joke for executing x86 code. Does not scale, no dualy, and even now, it’s one of the hottest CPUs in the world. I just don’t want to imagine how a dualy Itanium would dissipate power. Imagine a smoking and melting motherboard.
Which reminds me: they thought by pushing out a dualy Xeon they’d offer something to the starving Intel fans, but the news about the abysmal performance (and high power consumption) is just too bad for even them to swallow.
The only genuinely good CPU developed by Intel in the last couple of years is the Dothan-based Pentium M, and even that was developed in Haifa, Israel.
2005-10-25 1:22 pm
Anonymous
The amazing news from http://www.pasemi.com and the speed-up of the (very low power) EPIA DP from VIA make Itanium seem like a hot boat that sank long ago.
2005-10-25 2:43 pm
Anonymous
… was basically not to bother. I had attended training from Intel on the Itanium processors meant for implementors of high-performance computing clusters and developers of scientific software.
While it was very interesting, benchmarking existing code on the Itanium2s showed meager performance. The solution: recompile (which may not always be possible). But even then, to eke out good performance it wasn’t enough to recompile. The Itanium relied entirely on the compiler to do all the optimization, set up the pipelining, order the instructions, etc. The compiler was good, but it frequently needed added hints or tweaks to the code to know that it could apply certain optimizations in places.
It might have been practical were I writing everything from scratch, but we use a lot of code from other places and I just haven’t the time or desire to hand-optimize them.
Java applications (there are lots in the sciences), forget about it.
I did leave impressed that there are certain specific types of problems for which Itanium could work very well, but that the general case is that it would remain behind Intel’s or AMD’s other offerings for general purpose computing and that it’s not at all worth it for code not optimized for the Itanium.
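As an illustration of the kind of hint described above, here is a minimal C sketch. The comment does not say which hints Intel’s compiler needed, so the C99 restrict qualifier is used here as an assumed, generic example: it promises the compiler that two pointers never overlap, which is the sort of guarantee a compiler needs before it can freely reorder or software-pipeline the loads and stores of a loop. The function names are hypothetical.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume dst and src may overlap,
   so a store to dst[i] could change a later src[j]. That limits how
   far it can reorder loads and stores across iterations. */
void scale_may_alias(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = k * src[i];
    }
}

/* With restrict, the programmer promises the arrays do not overlap,
   so every iteration is independent and the compiler is free to
   schedule them together. Breaking the promise is undefined behaviour. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    float k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = k * src[i];
    }
}
```

Both functions compute the same result for non-overlapping arrays; the only difference is how much the compiler is allowed to assume, which matters most on an in-order design that depends on the compiler for scheduling.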
2005-10-25 4:43 pm
CaptainPinko
From the article on The Register it looks like the clock decrease was due to the removal of Foxton power management, which just seems to indicate that once again Intel is getting f–ked by heat issues.
It seems that the whole company bet the farm that heat wouldn’t be an issue and has been screwed for the past, oh, say, 4 years?
Probably taking ppl off the Itanium project to try to catch up with the Opteron’s success.
I really hope that Itanium doesn’t die since I think with more years to mature it could really be a strong contender. It took 15 years for the x86 to mature so why expect less for Itanium?
The most troubling news is the delay of same-socket Itaniums and Xeons since the thing Intel should prioritize is getting those damn things more affordable.

2005-10-25 4:45 pm
CaptainPinko
Heh, should have read this article too before posting… heh, at least I did read a (relevant) article before posting.

2005-10-25 8:11 pm
Anonymous
all four of them
2005-10-26 8:43 am
Anonymous
The truth is that the last fifteen years Intel and AMD have provided the best combination of:
a) Backwards compatibility
b) Performance
c) Price
If you leave out backwards compatibility, then there were always good alternatives, although very few have competed successfully on the ratio of performance/price.
Why? Because of the extreme mass production of x86 processors resulting from the demand made by people actually wanting a).
Intel and AMD have always delivered new generations of processors that were capable of running the previous generation of software without emulation. Itanium excepted.
Who cares about “awful architecture” as long as the performance is good? Yes, they are too power hungry, but significant advances have been made by both AMD and Intel over the last years on this issue.

2005-10-26 8:53 am
nimble
The truth is that the last fifteen years Intel and AMD have provided the best combination of:
a) Backwards compatibility
b) Performance
c) Price
True enough, but the ugly bastard still deserves a good bit of bashing every now and then.