While the first generation of Itanium left many people disappointed with its performance (mostly because people were trying to run and benchmark 32-bit code, while the CPU only shines when running native 64-bit code), it seems that Intel is back with a vengeance. The new Itanium2, ready to ship in a few months, posts some very good results in SPEC, the CPU- and compiler-intensive industry-standard benchmark. While its integer performance is average to good, its floating-point performance is the best out there today, scoring even better than the IBM Power4 (which includes a whopping 128 MB of cache). What’s very interesting in the results is that both the Power4 and the Itanium2 score poorly on the MesaGL, Perl and gzip tests but are adequate on the gcc test. This is possibly because they’re optimized for heavy streaming operations such as large matrix computations. For a short commentary on the results check here. AMD Opteron SPEC results are not available yet.
Were there SPEC benchmarks done on the early 800MHz Hammer chips? Does anyone have the numbers?
http://www.tecchannel.de/hardware/937
Here is a Hammer benchmark of Quake 3 frame rates.
The scary thing here, though, is how close the P4 2.5G comes to the Power4 (with its giant cache): a full 66% of the performance. Given that the P4 is ramping up clock speed like anything (3.0 GHz by the end of the year), it will soon overtake the Power4.
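How soon is "soon"? A back-of-the-envelope check, assuming (optimistically) that SPEC scores scale linearly with clock speed; in practice memory latency does not shrink with clock, so real scaling is worse. The 2.5 GHz and 66% figures come from the comment above; everything else is illustrative.

```python
# Naive clock-scaling estimate. ASSUMPTION: performance is proportional to
# clock speed, which overstates real gains (memory does not speed up).

P4_CLOCK_GHZ = 2.5   # from the comment above
P4_REL_PERF = 0.66   # P4 at 2.5 GHz reaches ~66% of the Power4 score


def scaled_rel_perf(rel_perf: float, old_ghz: float, new_ghz: float) -> float:
    """Relative performance after a clock bump, under linear scaling."""
    return rel_perf * new_ghz / old_ghz


def break_even_clock(rel_perf: float, old_ghz: float) -> float:
    """Clock at which relative performance would reach 1.0 (parity)."""
    return old_ghz / rel_perf


if __name__ == "__main__":
    at_3ghz = scaled_rel_perf(P4_REL_PERF, P4_CLOCK_GHZ, 3.0)
    print(f"P4 at 3.0 GHz: {at_3ghz:.0%} of Power4")
    print(f"Parity at about {break_even_clock(P4_REL_PERF, P4_CLOCK_GHZ):.2f} GHz")
```

Even under this generous model, 3.0 GHz only gets the P4 to about 79% of the Power4; parity needs roughly 3.8 GHz.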
On top of that, a Pentium 4 Xeon server costs a fraction of what a Power4 server costs.
Add another 512K of on-chip cache (for 1MB total) to the P4 and a slew of optimizations, including the possible addition of the extra execution units that the original Pentium 4 design included… and this chip will be a monster. Clock speed, bandwidth, and IPC are all improved over the P4.
Of course, Itanium2 is still a relatively small chip (the caches are most of the chip), so there is plenty of performance to be gained here as well.
Hammer is under a lot of pressure. AMD holds the ace card of 64-bit desktop computing, but it will take a while before there are any 64-bit apps out there. The rumors of the Hammer optimizations being back-ported to Barton are interesting… most likely indicating delays with Hammer.
I don’t know how many of the original Itanium designers are still sane, but congrats to them. Intel has built a chip that is world class.
AMD Hammer at PR3400 is rumored to score 1360 on SPECint, or twice the SPECint of Itanium 2.
System performance is the key to the POWER4. A Pentium 4 system is not even in the same league. Besides POWER4 is soon to ramp up in clockspeed too.
Any rumors about the FP performance?
Gotta agree with jarre here. Itanium will help Intel move into the market for serious workstations, maybe render farms, but not into big servers like the Power4. Itanium2 will push a bit higher than the Pentium 4, but only a bit. It’s not a giant leap forward for Intel, just one more bite out of the overall CPU market.
If you look at the SPEC results in more detail, you’ll see that the biggest Itanium2 configuration is a quad-CPU system. IBM ships 32-way Power4 systems. HP has 64-way PA-RISC machines. SGI benchmarks a 256-way MIPS system and seems to support 512-way on that same machine.
Intel is a producer of volume processors, not of niche processors.
The 1-way / 2-way / 4-way / 8-way market is the volume market. Itanium is aimed at this volume market.
Intel is going after the medium server market, a notch or two above what they currently own with all the Pentium/Xeon servers that they have already shipped.
The Itanium is a fun processor at the very beginning of its lifecycle. The VLIW design works well enough for Intel to challenge other processors on pure CPU/FPU performance. Once the Alpha team gets going on the chip, who knows where it will end up? The EV7 and EV8 certainly look interesting, and there is no reason why many of Alpha’s performance features couldn’t be adapted to VLIW instruction bundles. I believe that over time a VLIW engine will be an asset, not a liability. Dealing with the complexity in the short run has been the challenge.
I’ll have to see how I can get one of those zx6000 workstations.
Oh sure, the power in Power4 is in the system design and the vast multiprocessor goodness. I’m just pointing out that as a CPU, the moldy old x86 Pentium 4 can hold its own. Luckily for IBM, the bus and I/O technology in Intel machines is nowhere near as good.
but it will take a while before there are any 64 bit apps out there.
>>>>>>
Not for those running Linux. With 90% of its apps already ported to 64-bit architectures (and thus 64-bit clean), Debian should be just a recompile away from fully supporting Hammer.
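"64-bit clean" mostly means the code never assumes a pointer (or a long) fits in 32 bits. The classic failure, sketched below as a Python model of the C habit of stuffing a pointer into an int, works fine until an address has bits above bit 31. The addresses are made up for illustration, and `store_in_int32` / `is_64bit_clean` are hypothetical helper names.

```python
# Model of the classic non-64-bit-clean bug: a pointer stored in a 32-bit
# int. On a 32-bit machine nothing is lost; on a 64-bit machine the upper
# half of the address is silently dropped. Addresses are hypothetical.

MASK_32 = 0xFFFF_FFFF


def store_in_int32(address: int) -> int:
    """Model assigning a pointer to a 32-bit 'int' (keeps the low 32 bits)."""
    return address & MASK_32


def is_64bit_clean(address: int) -> bool:
    """True if a round trip through a 32-bit int preserves the address."""
    return store_in_int32(address) == address


low_address = 0x0804_A000             # looks like a 32-bit-era address
high_address = 0x6000_0000_0001_0000  # hypothetical 64-bit address

if __name__ == "__main__":
    for addr in (low_address, high_address):
        print(hex(addr), "->", hex(store_in_int32(addr)),
              "clean:", is_64bit_clean(addr))
```

Code that already runs on Alpha, SPARC64 or IA-64 has had this class of bug shaken out, which is why a Hammer port can be largely a recompile.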
It would be great to see new apps, or good 64 bit redesigns.
Like MySQL-64 or something like that. Or PHP-64 or Apache-64. Or a new object database or XML database. Or rendering software.
Not just stuff recompiled under a 64 bit compiler, but redesigned for a 64 bit environment.
That’s one of the reasons I’d love to have an Itanium workstation… as Debian runs on Itanium today :)
Linux64 Debian 3.0 with 2.4.18 kernel for Itanium2/zx1
http://www.specbench.org/osg/cpu2000/results/res2002q3/cpu2000-2002…
Itanium 2 looks good on SPEC marks, but Intel has the source to the SPEC benchmarks and can tune their compilers accordingly (they all do it…). However, given the VLIW architecture of the Itanium series, they need a really good compiler for everything, otherwise performance dives.
I’d be interested to see some other benchmarks for the Itanium 2 to see how it compares against other CPUs when Intel’s engineers can’t tweak the compiler for the benchmark.
System performance is the key to the POWER4. A Pentium 4 system is not even in the same league. Besides POWER4 is soon to ramp up in clockspeed too.
Power4 isn’t in the same league as P4, but what about I2? Exactly. You’d better be right about the clock speed ramp… (Besides, I wonder how good I2’s performance would be if it had the same amount of cache as the Power4?)
However given the VLIW architecture of the Itanium series they need a really good compiler for everything otherwise the performance dives.
I heard HP has a good internal compiler for VLIW which is faster than Intel’s offering… Besides, I’m sure the compilers will ramp up one day, and I’m sure the GCC guys won’t miss the chance to get into a brand new market…
With ANY modern CPU essentially being a highly speculative out-of-order multiple execution unit complex engine, you absolutely need a fancy compiler to get good performance from your chip. Sure, Intel pushed more of the complexity of scheduling their fantastic VLIW beast down to the compiler, but the compiler teams have finally caught up with the nature of the beast.
By the way, HP’s compiler would have improved SpecInt rates by 20%-25% (according to an article on Itanium that I read).
And remember Itanium 2 still has a small die (not considering the cache). Itanium 3 with more improvements plus 6MB cache will offer even greater FP performance. And then Itanium 4 hits the street with Alpha team magic.
Kiss Your Shadow … Itanium is going to be good.
I heard HP has a good internal compiler for VLIW which is faster than Intel’s offering….
But how good? – Writing a VLIW compiler is no easy task.
That’s why I’d like to see other benchmarks.
With ANY modern CPU essentially being a highly speculative out-of-order multiple execution unit complex engine, you absolutely need a fancy compiler to get good performance from your chip. Sure, Intel pushed more of the complexity of scheduling their fantastic VLIW beast down to the compiler
But that’s the thing, you see: out-of-order CPUs do the scheduling for you, while VLIW is totally dependent on the compiler. If some code is not optimal, an OOO machine still does its stuff in hardware, but this is not the case with VLIW. With VLIW you need a good compiler on every occasion, otherwise you won’t even touch the potential performance of the CPU.
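To make the difference concrete, here is a toy compile-time scheduler for a 3-wide in-order VLIW machine. It is a sketch under simplifying assumptions, not Itanium's real rules: every op takes one cycle, any op fits any slot, and unused slots become explicit nops. The point is that whatever parallelism the compiler fails to find is frozen into the binary as nops, whereas an out-of-order core would look for it again at run time.

```python
from typing import Dict, List

WIDTH = 3  # ops per bundle in this toy machine (assumption)


def schedule(ops: List[str], deps: Dict[str, List[str]]) -> List[List[str]]:
    """Greedy list scheduling done once, at 'compile time'.

    Ops are visited in program order; each goes into the earliest bundle
    that comes after all of its producers and still has a free slot.
    Empty slots are padded with explicit "nop"s.
    """
    bundles: List[List[str]] = []
    placed: Dict[str, int] = {}  # op -> index of the bundle it landed in
    for op in ops:
        # One-cycle latency: an op may issue the cycle after its producers.
        cycle = max((placed[p] + 1 for p in deps.get(op, [])), default=0)
        while cycle < len(bundles) and len(bundles[cycle]) == WIDTH:
            cycle += 1  # bundle full, try the next one
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(op)
        placed[op] = cycle
    return [b + ["nop"] * (WIDTH - len(b)) for b in bundles]


if __name__ == "__main__":
    # Four independent ops: the compiler can pack them densely.
    print(schedule(["a", "b", "c", "d"], {}))
    # A dependency chain: nothing to overlap, so the nops are baked in.
    print(schedule(["x", "y", "z"], {"y": ["x"], "z": ["y"]}))
```

The first call fills a whole bundle plus one slot; the chain produces three bundles that are two-thirds nops.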
but the compiler teams have finally caught up with the nature of the beast.
But have they? It usually takes years for compilers to catch up… and Itanium2 is brand new.
By the way, HP’s compiler would have improved SpecInt rates by 20%-25% (according to an article on Itanium that I read).
Why didn’t they use it then? Intel’s marketing department not using a trick???
And remember Itanium 2 still has a small die (not considering the cache). Itanium 3 with more improvements plus 6MB cache will offer even greater FP performance.
Itanium’s instructions are large, in the same way RISC instructions are bigger than CISC ones. For this reason the Itanium needs a huge cache just to keep up with chips with smaller caches. Expect to see the Alpha 364 give the Itanium 2 a thorough beating with only half the cache and essentially the same core as the 264.
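Some rough numbers behind the size argument: an IA-64 bundle is 128 bits and carries three instructions, so a fully used bundle costs about 42.7 bits per instruction versus 32 bits for a fixed-length RISC encoding such as Alpha's. Every slot the compiler has to fill with a nop makes that worse. This sketch counts only encoding overhead and ignores that the two ISAs may need different instruction counts for the same work; `bits_per_useful_insn` is a hypothetical helper.

```python
BUNDLE_BITS = 128    # one IA-64 bundle: 3 instruction slots + template
RISC_INSN_BITS = 32  # fixed-length encoding, e.g. Alpha


def bits_per_useful_insn(useful_per_bundle: int) -> float:
    """Average encoding cost when only some slots hold real work (rest nops)."""
    if not 1 <= useful_per_bundle <= 3:
        raise ValueError("a bundle holds 1 to 3 useful instructions")
    return BUNDLE_BITS / useful_per_bundle


if __name__ == "__main__":
    for useful in (3, 2, 1):
        bits = bits_per_useful_insn(useful)
        print(f"{useful} useful per bundle: {bits:.1f} bits each, "
              f"{bits / RISC_INSN_BITS:.2f}x a 32-bit RISC instruction")
```

At full occupancy the overhead is about 1.33x; at one useful instruction per bundle it is 4x, which is where the cache pressure really comes from.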
And then Itanium 4 hits the street with Alpha team magic.
Yes, that will be interesting :)
With ANY modern CPU essentially being a highly speculative out-of-order multiple execution unit complex engine, you absolutely need a fancy compiler to get good performance from your chip. Sure, Intel pushed more of the complexity of scheduling their fantastic VLIW beast down to the compiler
But that’s the thing, you see: out-of-order CPUs do the scheduling for you, while VLIW is totally dependent on the compiler. If some code is not optimal, an OOO machine still does its stuff in hardware, but this is not the case with VLIW. With VLIW you need a good compiler on every occasion, otherwise you won’t even touch the potential performance of the CPU.
With modern chips, your compiler is still doing a lot of work. Certainly the CPU can do some things via speculative execution, but you don’t want too many stalls. And when your compiler starts unrolling loops and using very specific SSE2 instructions and flows and data alignments for cache line loading… etc etc… there are a lot of dependencies.
And you’re right. VLIW is all of the above and then some. However, in a macro sense, 3-instruction bundles aren’t too large to work with. I think the compiler folks are doing fine with it.
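As an example of the loop unrolling mentioned above, here is a dot product unrolled by four with separate accumulators. Splitting the sum breaks the single loop-carried dependency, giving an out-of-order core (or a VLIW compiler filling bundles) independent multiply-adds to overlap. Python itself will not run this faster; it only shows the shape of the transformation, and the function names are illustrative.

```python
def dot(a, b):
    """Straightforward loop: every add depends on the previous one."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_unrolled4(a, b):
    """Same computation unrolled by 4 with four independent accumulators.

    The four partial sums have no dependency on each other inside the loop
    body, which is what gives a superscalar or VLIW core work to overlap.
    A cleanup loop handles lengths that are not a multiple of 4.
    """
    n = min(len(a), len(b))
    s0 = s1 = s2 = s3 = 0.0
    i = 0
    while i + 4 <= n:
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    total = (s0 + s1) + (s2 + s3)
    while i < n:  # remainder
        total += a[i] * b[i]
        i += 1
    return total
```

Because the partial sums are added in a different order, general floating-point inputs can round slightly differently, which is why GCC only reassociates floating-point sums like this when told it may (for example with `-ffast-math`).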
but the compiler teams have finally caught up with the nature of the beast.
But have they? It usually takes years for compilers to catch up… and Itanium2 is brand new.
I think so. Itanium has been in development a long time :) Itanium 2 is physically reworked, not logically reworked. Heck, I’ve even been to Intel Labs for the courses on how to write fast Itanium code.
By the way, HP’s compiler would have improved SpecInt rates by 20%-25% (according to an article on Itanium that I read).
Why didn’t they use it then? Intel’s marketing department not using a trick???
Some sort of SPEC thing where you are not supposed to use different compilers for INT and FP? I’m not sure.
And remember Itanium 2 still has a small die (not considering the cache). Itanium 3 with more improvements plus 6MB cache will offer even greater FP performance.
Itanium’s instructions are large, in the same way RISC instructions are bigger than CISC ones. For this reason the Itanium needs a huge cache just to keep up with chips with smaller caches. Expect to see the Alpha 364 give the Itanium 2 a thorough beating with only half the cache and essentially the same core as the 264.
Alpha 364 has quite a few more differences than just cache size. It has a very fancy memory interface on chip (for low latencies and extreme bandwidth) and also fancy interchip communications for multiprocessing. And I believe it has quite a bit of high-speed cache.
With the instruction bundles on the Itanium2, I don’t think the penalty is as much as you think it is. Intel didn’t waste too much space on the packaging.
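For reference on the packaging: IA-64 puts a 5-bit template (which tells the decoder what kind of execution unit each slot needs) and three 41-bit instruction slots into each 128-bit bundle, so no bits are wasted on padding. The sketch below only packs and unpacks that layout; the slot values in the example are arbitrary placeholders rather than real instruction encodings, and which template value means which unit mix is not modelled.

```python
from typing import List, Tuple

TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots: List[int]) -> int:
    """Template in bits 0-4; slot i starts at bit 5 + 41*i."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle has exactly 3 slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> Tuple[int, List[int]]:
    """Inverse of pack_bundle: split a 128-bit value back into its fields."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots


# 5 + 3 * 41 == 128: the layout uses every bit of the bundle.
assert TEMPLATE_BITS + 3 * SLOT_BITS == 128

if __name__ == "__main__":
    b = pack_bundle(0x10, [1, 2, 3])  # placeholder values, not real opcodes
    print(hex(b), unpack_bundle(b))
```

The overhead, then, is the 5-bit template per three instructions plus any nops the compiler is forced to emit, not padding in the format itself.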
The Itanium has been weak on memory bandwidth and memory latency with the Intel chipsets. With the HP chipset, at least the memory latencies aren’t too bad.
With a small CPU die, Intel can add more execution units to the Itanium, including some very interesting vector units that would match up well with VLIW.
And then Itanium 4 hits the street with Alpha team magic.
Yes, that will be interesting :)
I hope a lot of the good Alpha technology ends up in the Itanium line. It would make me feel a bit better about the wretched job DEC and Compaq did leveraging all the super technology that the Alpha team created.
I do not know why there is so much hype about upcoming processors, especially from a manufacturer known to cut specifications prior to marketing…
Anyway, I just wonder what market share Intel is foreseeing for this processor, because the current generation of processors is very nice indeed and more than enough for 99% of the world’s computer usage, with the other 1% occupied by clusters and/or supercomputers that use specially designed CPUs and/or normal processors.
All in all, I’m just puzzled why they continue to make silicon processors for a potentially nonexistent market… Or do they need to lift their market share, and thus are posting pseudo-FUD PR…
Cheers…
Check out these two DB performance tests with the new Itanium 2 and an Alpha chip, both at 1 GHz.
Alpha 1 GHz, 8 GB RAM = 50117 tpmC
http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=102012901
Itanium 2 1 GHz, 12 GB RAM = 78455 tpmC
http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=102070801
Win64 will compete with Unix?
Thanks for posting those benchmarks. Itanium has 50%+ better performance for half the price. Amazing.
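Checking the performance half of that claim against the two TPC-C figures quoted above (the price side can't be derived from these numbers alone):

```python
ALPHA_TPMC = 50117     # Alpha 1 GHz, 8 GB RAM (result quoted above)
ITANIUM2_TPMC = 78455  # Itanium 2 1 GHz, 12 GB RAM (result quoted above)


def relative_throughput(new: float, old: float) -> float:
    """How many times the old system's throughput the new one delivers."""
    return new / old


if __name__ == "__main__":
    r = relative_throughput(ITANIUM2_TPMC, ALPHA_TPMC)
    print(f"{r:.2f}x, i.e. {r - 1:.0%} more tpmC")  # prints "1.57x, i.e. 57% more tpmC"
```

So "50%+" holds, at about 57%. Keep in mind the two systems also differ in memory (12 GB vs 8 GB) and in OS, so this is a whole-system comparison rather than a pure CPU one.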
The new 64-bit Windows OS and apps kill Alpha. Considering that Alpha’s 64-bit UNIX has been out and optimized for ages, and that Oracle has long been running on Tru64, this is a big accomplishment for the new Itanium platform.
It will be interesting to see how Itanium performs on Linux vs. Win64.
The GCC 3.3 release plans mention a new DFA scheduler, an important part of optimizing code for VLIW/EPIC architectures. While I believe HP’s compiler will have a performance advantage, it will be interesting to compare GCC with whatever else is out there, if for nothing else than to gauge the quality of this particular and very important open source project. It could mean a lot to open source overall if GCC performed well, since OSS could then take advantage of any new CPU architecture much faster.
/jarek
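For the curious, the core idea of a DFA-based scheduler: precompute a finite automaton whose states record which execution units are already taken in the current cycle, so "can this instruction issue now?" becomes a table lookup instead of a search. GCC's real automata are generated from machine-description constructs such as `define_cpu_unit` and `define_insn_reservation` and handle multi-cycle reservations and cycle advance; the sketch below is a single-cycle toy for a hypothetical machine that issues at most 2 ALU ops and 1 memory op per cycle.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical machine: per cycle, at most 2 ALU ops and 1 memory op.
CAPACITY = {"alu": 2, "mem": 1}
UNITS = list(CAPACITY)   # ["alu", "mem"]
State = Tuple[int, ...]  # how many units of each kind are used this cycle


def build_dfa() -> Dict[State, Dict[str, State]]:
    """Enumerate every reachable state once, up front (breadth-first)."""
    start: State = tuple(0 for _ in UNITS)
    table: Dict[State, Dict[str, State]] = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in table:
            continue
        table[state] = {}
        for i, unit in enumerate(UNITS):
            if state[i] < CAPACITY[unit]:  # a free unit of this kind remains
                nxt = state[:i] + (state[i] + 1,) + state[i + 1:]
                table[state][unit] = nxt
                queue.append(nxt)
    return table


DFA = build_dfa()
START: State = tuple(0 for _ in UNITS)


def fits_in_one_cycle(insn_units: List[str]) -> bool:
    """Walk the automaton; a missing transition means a structural hazard."""
    state = START
    for unit in insn_units:
        if unit not in DFA[state]:
            return False
        state = DFA[state][unit]
    return True


if __name__ == "__main__":
    print(len(DFA), "states")
    print(fits_in_one_cycle(["alu", "mem", "alu"]))  # True
    print(fits_in_one_cycle(["mem", "mem"]))         # False: one memory unit
```

The scheduler pays the cost of building the table once per target; every later query during scheduling is a dictionary lookup.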