Linked by Thom Holwerda on Tue 23rd Aug 2005 18:34 UTC, submitted by kellym
During his keynote address at San Francisco's Moscone Center, Intel's Otellini unveiled the company's next-generation, power-optimized micro-architecture for future digital home, enterprise, mobile, and emerging market platforms, aimed at a new category of converged consumer devices.
Thread beginning with comment 22232
My speculation
by Nicholas Blachford on Tue 23rd Aug 2005 23:25 UTC

...turned out to be incorrect.

Why?
Well, it seems Intel has cut the "average" power, not the TDP. The TDP appears to remain in the same range as the Pentium M's.

If they want to do an 8-core design they'll need to cut TDP in half; if they want to go to 16 cores they'll need to cut it far more aggressively.
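
As a rough sketch of that arithmetic (the Pentium M-class package budget of roughly 25 W is a figure I'm assuming purely for illustration):

```python
# Rough per-core power budget if the package TDP stays fixed.
# The 25 W "Pentium M-class" budget is an assumed figure, for illustration only.

PACKAGE_TDP_W = 25.0

for cores in (2, 4, 8, 16):
    per_core = PACKAGE_TDP_W / cores
    print(f"{cores} cores -> roughly {per_core:.1f} W per core")

# e.g. 8 cores -> roughly 3.1 W per core, 16 cores -> roughly 1.6 W per core
```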

Whenever Intel or AMD do this, I fully expect the methods I speculated on to be used.

Reply Score: 1

RE: My speculation
by on Wed 24th Aug 2005 00:49 in reply to "My speculation"

Actually, I think even with a 16-core system your speculation is still a SWAG (Silly Wild-Assed Guess), based on what they've already demonstrated with their ultra-low-power version: after all, if they can make what (IIRC) is already a dual-core processor run on 0.5 watts, even at less than the maximum speed of the other processor variants currently announced, why would they want or need that complication? 8 * 0.5 = 4 watts, and, worst case, 16 cores * 0.5 watts = 8 watts for the chip.

The biggest immediate problem is the size of the die needed to go to 16 cores, along with the size of the cache: it takes an awful lot of data to keep 16 cores doing useful work, and memory speeds have not come close to keeping up with CPUs' ability to saturate bandwidth. Sure, you can increase the front-side bus width to 256 bits, and with 16 cores chances are each one will see a bit of latency going out of the cache, but if you actually have enough work to keep 16 cores busy doing *useful* work (not just an idle system thread), you'll *still* be bandwidth starved, because that's only 16 bits per *memory bus* cycle per core on average, with the multiplier likely to be an absolute minimum (with currently available RAM) of at least 3-4 CPU cycles per single FSB cycle, not counting the latency of the memory controller(s).

It would be an interesting trick to schedule threads/tasks so that they all run within the chip's L1/L2 caches, such that those that are almost purely computation-bound allow those that are almost purely memory-bound a useful bit of bus bandwidth for their data and instructions, and that is *before* doing the insane thing of requiring all the translated VLIW code to be stored somewhere and pumped in and out. In short, what Transmeta did, while it has some pluses, simply can't practically be scaled without several memory buses, including one purely for translated code, because the processor would otherwise be sipping data through a narrow straw.
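
To put rough numbers on that (the 256-bit bus and the 4:1 CPU-to-FSB clock ratio are just the illustrative figures from above, not announced specs):

```python
# Back-of-the-envelope per-core bandwidth using the figures above.
# The 256-bit bus and the 4:1 CPU/FSB clock ratio are illustrative assumptions.

FSB_WIDTH_BITS = 256
CORES = 16
CPU_CYCLES_PER_FSB_CYCLE = 4

bits_per_core_per_fsb_cycle = FSB_WIDTH_BITS / CORES  # 16 bits
bits_per_core_per_cpu_cycle = bits_per_core_per_fsb_cycle / CPU_CYCLES_PER_FSB_CYCLE

print(f"{bits_per_core_per_fsb_cycle:.0f} bits per core per FSB cycle")
print(f"{bits_per_core_per_cpu_cycle:.0f} bits (about {bits_per_core_per_cpu_cycle / 8:.2f} bytes) "
      f"per core per CPU cycle, before memory-controller latency")
```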

Then, when you consider that a 4K page of code is not likely to translate 1:1 into a 4K page of translated VLIW code, it becomes much too hairy to handle effectively in a combination of hardware and software, requiring all OSes to account for this weird architectural brainfart to achieve something that *might* be more power-efficient, perhaps more die-efficient, perhaps faster overall... but only with a huge infrastructure that defeats the whole purpose in the first place!
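
To illustrate the page mismatch (the 1.5x and 2x expansion factors are purely assumed for the sake of the example, not actual Transmeta figures):

```python
# Illustration of the page-size mismatch: if translated VLIW code is larger than
# the original x86, a 4 KiB guest page no longer maps 1:1 onto one translated page.
# The 1.5x and 2x expansion factors are assumptions, not Transmeta figures.
import math

PAGE_BYTES = 4096

for expansion in (1.5, 2.0):
    translated_bytes = PAGE_BYTES * expansion
    pages_needed = math.ceil(translated_bytes / PAGE_BYTES)
    print(f"x{expansion} expansion: 4 KiB of x86 -> {translated_bytes / 1024:.0f} KiB "
          f"of translation ({pages_needed} pages)")
```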

Reply Parent Score: 0