Linked by Thom Holwerda on Thu 18th Aug 2005 16:46 UTC, submitted by Nicholas Blachford
Intel "At next week's Intel developer forum, the firm is due to announce a next generation x86 processor core. The current speculation is this new core is going too be based on one of the existing Pentium M cores. I think it's going to be something completely different."
Article is pure fantasy
by rayiner on Thu 18th Aug 2005 20:51 UTC

The article is entirely speculative and completely without substance.

"If it was just a Pentium M variant I don't think there'd be such a fuss about it... No, this change is bigger."

The far more likely scenario is that Intel is hyping up the processor to cover for what is really just an incremental upgrade. That fits Intel's historical marketing profile.

"Steve Jobs showed a graph with PowerPC projected at 15 computation units per watt and Intel's projected at 70 units per watt. Intel must have figured out a way to reduce power consumption 4-fold."

That does not logically follow. Even taking Steve Jobs' 4x number at face value, Intel doesn't have to reduce power consumption 4-fold. The 4x decrease is relative to the G5, not relative to Intel's current Pentium M. The G5 is a relatively power-hungry chip with relatively poor integer performance. The current P-M probably has on the order of 2-3x better performance/watt than the G5. It would not take something radically different (a process shrink would suffice) to hit your 4x.
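To put rough numbers on that (the wattage and performance figures below are ballpark assumptions for illustration, not measured data):

```c
/* Back-of-the-envelope check of the performance/watt argument.
 * All figures are rough, illustrative assumptions: the point is only that a
 * 4x gap versus the G5 does not imply a 4x improvement over the Pentium M. */
#include <stdio.h>

int main(void) {
    /* Assumed ballpark numbers (circa 2005), purely for illustration. */
    double g5_perf = 1.0, g5_watts = 50.0;  /* normalize G5 integer perf to 1.0 */
    double pm_perf = 1.0, pm_watts = 21.0;  /* Pentium M: similar perf, far less power */

    double g5_ppw = g5_perf / g5_watts;
    double pm_ppw = pm_perf / pm_watts;

    printf("P-M perf/watt vs G5: %.1fx\n", pm_ppw / g5_ppw);

    /* If the target is 4x the G5's perf/watt, how much further must the P-M go? */
    double target = 4.0 * g5_ppw;
    printf("Remaining gap for the P-M to close: %.1fx\n", target / pm_ppw);
    return 0;
}
```

Under those assumed numbers the remaining gap is well under 2x, which a process shrink and normal tuning can plausibly cover.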

"The forthcoming Cell processor's SPEs at 3.2 GHz use just two to three Watts and yet are said to be just as fast as any desktop processor."

Except they are not, not for the kind of code people run on PCs. The SPEs are SIMD FP monsters, but ever since PC graphics cards started handling transform and lighting on-chip, single-precision SIMD FP on the CPU has been relatively unimportant. That's why nobody really cares when a new version of SSE comes out, and why Athlon 64s school P4s in gaming despite the latter's very significant advantage in certain FP benchmarks.

"but they could use some of the same techniques to bring the power consumption down."

There is nothing magic in the SPEs. The SPEs don't use a lot of power because they don't do much of anything besides SIMD FP. They have long pipelines, no cache (just a small local store), little hardware for extracting parallelism, no out-of-order execution, no branch prediction, etc. Intel using these "techniques" in a next-gen CPU would be suicide. The thing would basically be a Pentium 4 taken to its logical conclusion: massive theoretical FP performance, but quite useless for use as a central processing unit.

"Out of order execution seems to be pretty critical to x86 performance"

Out-of-order execution is critical to integer performance. The poor performance of the SPEs and the PPE on integer code is proof of that. Lack of architectural registers has jack to do with it. There is a reason RISC CPUs like the Alpha and POWER are massively out of order! Indeed, if you take a look at the two variants of SPARC, Sun's and Fujitsu's, you'll see that Sun's is in-order and has shitty integer performance, while Fujitsu's is out-of-order and has great integer performance.
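To make the distinction concrete, here is a sketch (names and details made up) of the two kinds of code in question:

```c
/* Two loops that stress very different parts of a core. The linked-list walk
 * is typical integer code: serial dependent loads and data-dependent
 * branches, exactly where out-of-order execution and branch prediction earn
 * their keep. The array loop is the regular, predictable FP streaming that an
 * in-order SIMD engine handles fine. Illustrative sketch only. */
#include <stddef.h>

struct node { int value; struct node *next; };

/* Latency-bound: each load depends on the previous one, and the branch on
 * n->value is data-dependent. An in-order core stalls on every miss; an
 * out-of-order core can at least overlap the surrounding work. */
int count_positive(const struct node *n) {
    int count = 0;
    while (n) {
        if (n->value > 0)
            count++;
        n = n->next;    /* serial dependence chain through memory */
    }
    return count;
}

/* Throughput-bound: independent iterations, no unpredictable branches.
 * This is the shape of code an SPE-style in-order SIMD core is built for. */
void scale(float *dst, const float *src, float k, size_t len) {
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] * k;
}
```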

"The Itanium line, also VLIW, includes processors with a whopping 9MB of cache."

Because VLIW kills your code size and the Itanium is a $3000 chip with an enormous die area!

"Intel has a lot of experience of VLIW processors from its Itanium project which has now been going on for more than a decade."

Most of the experience shows that the theoretical advantages of VLIW processors are mitigated by the fact that nobody can write a decent compiler for them! Intel is not stupid enough to bet its desktop processor business on Itanium technology. It'd be suicide.

"indeed it has already been developing similar technology to run X86 binaries on Itanium for quite some time now."

One which works very poorly, for the simple reason that the Itanium cares a hell of a lot more about code scheduling than the Alpha did, and a binary translator doesn't have enough high-level information to do proper optimization on the translated code!
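A small illustration of the information that gets lost on the way down to a binary (a hypothetical example, not taken from any actual translator):

```c
/* Why a source-level compiler can schedule aggressively for an in-order VLIW
 * while a binary translator struggles: hoisting loads above stores requires
 * proving they don't alias. The source carries that information (here via
 * 'restrict'); the compiled x86 carries only raw address arithmetic, so the
 * translator has to assume the worst and keep the conservative ordering.
 * Illustrative sketch only. */
void saxpy(float *restrict y, const float *restrict x, float a, int n) {
    /* 'restrict' promises x[] and y[] don't overlap, so the compiler is free
     * to hoist loads, unroll, and software-pipeline for a statically
     * scheduled core. A translator working from the generated x86 cannot
     * recover that promise from the machine code. */
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```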

"Switching to VLIW means they can immediately cut out the hefty X86 decoders."

Except the x86 decoders aren't that hefty, and the relative percentage of area spent on x86 decoders has been shrinking for years to the point where it's not a big deal anymore. Moreover, the decoder/cache tradeoff is a stupid one. Look at the Athlon64 die: http://www.chip-architect.com/news/Opteron_780x585.jpg

It would be liberal to say that the decoding portion takes up 5% of the overall die. A conservative estimate of the size of the L2 cache would be 50% of the die. Even making the cache 10% larger wipes out the benefit of eliminating the decoding section entirely. When you realize that we're talking about around a 15% cache size increase just to keep up with the increased size of (Itanium-like) VLIW code, we've just pessimized the design! Throw in an extra meg or two to hold translations, and your design just plain sucks...
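Spelling out that arithmetic (using the rough percentages above, which are estimates, not measurements):

```c
/* The die-area trade-off from the argument above, spelled out. The
 * percentages are the post's own rough estimates, not measured data. */
#include <stdio.h>

int main(void) {
    double die          = 100.0;        /* whole die, in arbitrary area units */
    double decoder_area = 0.05 * die;   /* liberal estimate: 5% on x86 decode */
    double cache_area   = 0.50 * die;   /* conservative estimate: 50% on L2   */

    /* Savings from dropping the decoders entirely: */
    printf("area saved by removing decoders: %.1f units\n", decoder_area);

    /* Cost of growing the cache ~15 percent to hold bulkier VLIW code: */
    double cache_growth = 0.15 * cache_area;
    printf("area cost of a 15%% larger L2:    %.1f units\n", cache_growth);

    printf("net change: %+.1f units (a loss)\n", decoder_area - cache_growth);
    return 0;
}
```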

"The branch predictors may also go on a diet or even get removed completely as the Elbrus compiler can handle even complex branches."

With what, magic? The same magic compiler technology that was supposed to save Itanium? You know why the Opteron is kicking Intel's ass? It doesn't require magic compiler technology! CPUs that don't choke on branches will become even more valuable in the future, as software moves to dynamic languages (even C# and Java are dynamic relative to C code).
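Here's the shape of the problem, sketched in C with a hand-rolled vtable standing in for dynamic dispatch (illustrative, made-up names):

```c
/* Why "dynamic" code leans on branch prediction: dispatch through a table of
 * function pointers (the C analogue of a virtual call in Java or C#) becomes
 * an indirect branch whose target depends on runtime data. A core that
 * predicts these well keeps running; a compiler-scheduled core cannot plan
 * around a target it doesn't know at compile time. Illustrative sketch only. */
#include <stddef.h>

typedef double (*shape_area_fn)(const void *shape);

/* Indirect call inside the loop: the branch target changes with the data. */
double total_area(const void **shapes, const shape_area_fn *vtable, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += vtable[i](shapes[i]);   /* unpredictable indirect branch */
    return total;
}
```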

"Intel could use a design with a single large register file covering integer, floating point and even SSE, 128 x 64 bit registers sounds reasonable"

Look at how much die space the Itanium spends to make its massive register file accessible in one clock cycle. There is a reason chip designers segregate integer and FP registers. Having a unified register file means having lots of ports on it, which is a bitch to route and can bottleneck your clockspeed.
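Rough port-count arithmetic shows why (the issue widths below are assumed for illustration; register-file area and power tend to grow roughly with the square of the port count):

```c
/* Port counts for a unified versus a split register file. The issue widths
 * and operand counts are illustrative assumptions, not any real chip. */
#include <stdio.h>

int main(void) {
    int int_issue = 3, fp_issue = 2;          /* assumed ops issued per cycle */
    int reads_per_op = 2, writes_per_op = 1;  /* typical 2-source, 1-dest ops */

    int unified_reads  = (int_issue + fp_issue) * reads_per_op;
    int unified_writes = (int_issue + fp_issue) * writes_per_op;

    int split_int_reads = int_issue * reads_per_op;
    int split_fp_reads  = fp_issue * reads_per_op;

    printf("unified file: %d read + %d write ports on one structure\n",
           unified_reads, unified_writes);
    printf("split files:  %d and %d read ports, each file smaller and easier to route\n",
           split_int_reads, split_fp_reads);
    return 0;
}
```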

The heavily multi-threaded code advantage is bullshit. It took the industry a decade just to recompile their fricking apps to run natively on 32-bit machines. It'll be another decade minimum before heavily multi-threaded code is common, and that is wishful thinking on my part.

More on why this concept sucks in part 2 of my post (damn length limit...)
