Linked by Thom Holwerda on Tue 23rd Aug 2005 18:34 UTC, submitted by kellym
During his keynote address at San Francisco's Moscone Center, Intel CEO Paul Otellini unveiled the company's next-generation, power-optimized microarchitecture for future digital home, enterprise, mobile, and emerging-market platforms, aimed at a new category of converged consumer devices.
cool computing
by 2501 on Tue 23rd Aug 2005 18:55 UTC
Ahh...
by Buck on Tue 23rd Aug 2005 18:59 UTC
Buck
Member since:
2005-06-29

So it IS Pentium M after all. That's sad.

Reply Score: 1

Rayiner Hashiem please
by durbhas on Tue 23rd Aug 2005 19:18 UTC in reply to "Ahh..."
durbhas Member since:
2005-07-07

hey rayiner, can you provide your comments on this?
i remember a long time back you led a good discussion on pipeline design and out-of-order execution..
will be good to see that back instead of some mindless ranting about osx and all that
cheers
ram

Reply Score: 1

RE: Ahh...
by Anonymous on Tue 23rd Aug 2005 19:52 UTC in reply to "Ahh..."
Anonymous Member since:
---

The new CPUs are designed from scratch, using knowledge gained from the Pentium M architecture experiment.
They feature bits and pieces from both the Pentium M and NetBurst, as well as new things. All in all it is not a Pentium M derivative, even if it sounds like that.

Reply Score: 0

Not quite...
by Harbinjer on Tue 23rd Aug 2005 19:18 UTC
Harbinjer
Member since:
2005-07-06

Not quite; they're adding a 4th execution unit, it seems. Overall, it's probably pretty similar though. They do add 64-bit and VT (Vanderpool -- is that Xen stuff?), and it's properly dual core.

Reply Score: 2

Mark Williamson Member since:
2005-07-06

Vanderpool/VT refers to the hardware extensions Intel are adding to aid virtualisation. Previously, x86 was horrible to fully virtualise; VT makes it easier and faster.

And yes, VT support was contributed to Xen by Intel. Today at IDF, Windows running under Xen was demonstrated (using VT for full virtualisation), alongside Linux guests running natively on Xen (paravirtualisation).

Reply Score: 1

Hmm...
by 1c3d0g on Tue 23rd Aug 2005 19:19 UTC
1c3d0g
Member since:
2005-07-06

...I'm fairly happy with almost everything except the dreaded FSB design. Why, Intel, why? Why do you keep hanging onto old technology? Why can't you integrate the memory controller onto the CPU? What is the problem, Intel? ;)

You've seen what it can do to latency and overall performance in general... I really can't comprehend their decision on this. They must have had a very good reason to stick with an FSB implementation. Either way it makes me sad, to say the least... :-(

Reply Score: 2

RE: Hmm...
by binarycrusader on Tue 23rd Aug 2005 20:47 UTC in reply to "Hmm..."
binarycrusader Member since:
2005-07-06

Why can't you integrate the memory controller onto the CPU? What is the problem, Intel? ;)

Integrating the memory controller into the CPU has its downsides. It is not a panacea. You can ask almost any game developer or CPU analyst who does low-level performance analysis about the penalties.

An on-die memory controller will give you reduced latency at the cost of bandwidth. Having an on-die memory controller also raises your costs and restricts a manufacturer who sometimes has to maintain compatibility between different processor socket designs.

Reply Score: 2

RE[2]: Hmm...
by 1c3d0g on Tue 23rd Aug 2005 21:02 UTC in reply to "RE: Hmm..."
1c3d0g Member since:
2005-07-06

Integrating the memory controller into the CPU has its downsides. It is not a panacea. You can ask almost any game developer or CPU analyst who does low-level performance analysis about the penalties.

An on-die memory controller will give you reduced latency at the cost of bandwidth. Having an on-die memory controller also raises your costs and restricts a manufacturer who sometimes has to maintain compatibility between different processor socket designs.


O.K., that is true... but now let's look at it this way: even AMD, a company at least 10x smaller than Intel, managed to pull it off (integrating the memory controller into their CPUs). So if AMD could do it, why can't Intel? I'm extremely confident that if Intel put their minds to it, they could elegantly implement an integrated memory controller. I just don't understand the reasoning behind the decision (was it really that expensive to implement, is compatibility so high on Intel's priority list, etc.).

Reply Score: 1

RE[3]: Hmm...
by binarycrusader on Tue 23rd Aug 2005 21:14 UTC in reply to "RE[2]: Hmm..."
binarycrusader Member since:
2005-07-06

O.K., that is true... but now let's look at it this way: even AMD, a company at least 10x smaller than Intel, managed to pull it off (integrating the memory controller into their CPUs). So if AMD could do it, why can't Intel? I'm extremely confident that if Intel put their minds to it, they could elegantly implement an integrated memory controller. I just don't understand the reasoning behind the decision (was it really that expensive to implement, is compatibility so high on Intel's priority list, etc.).

AMD managed to pull it off, but not without the disadvantages I mentioned. Integrating the memory controller raised their costs and has kept them behind in memory bandwidth compared to Intel, which is part of the reason why Intel systems are still favored for bandwidth-intensive activities such as video/audio encoding.

Additionally, because of the integrated memory controller AMD did have some advantages in dual-core performance, but they also have a disadvantage in that both processors have to share 6.4 GB/s of potential memory bandwidth, whereas Intel, since they do not have an on-die memory controller, can have each CPU use the full potential bandwidth.
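Back-of-the-envelope, using that same 6.4 GB/s figure and assuming the worst case of both cores hitting memory at once:

$$ \frac{6.4\ \text{GB/s}}{2\ \text{cores}} = 3.2\ \text{GB/s per core} $$

whereas a processor with its own path to the controller keeps the full figure to itself.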

Yes, I believe it's possible that they (Intel or AMD) will eventually overcome most of the disadvantages, but for now we aren't there yet. I would rather see cost-effective processors that perform well than ones that are idealistically better but still have all the disadvantages I listed.

(I say this as an Athlon64 2800+ owner)...

Reply Score: 1

RE[4]: Hmm...
by Anonymous on Wed 24th Aug 2005 00:24 UTC in reply to "RE[3]: Hmm..."
RE[5]: Hmm...
by japail on Wed 24th Aug 2005 01:02 UTC in reply to "RE[4]: Hmm..."
japail Member since:
2005-06-30

The cores do share the same memory controller, but the K8 isn't particularly bandwidth-constrained, and the sharing is a side-effect of having one memory controller per chip rather than of moving the controller on-die. The K8's multicore strategy is better than the P4's. Benchmarks run with varying amounts of memory bandwidth typically show fairly modest performance gains, and the discussion as far as games are concerned is a tad fishy, because PC game engines are not typically multithreaded with kernel threads, though there are a few that make use of coroutines and some that make limited use of multiple threads.

Reply Score: 1

RE[5]: Hmm...
by binarycrusader on Wed 24th Aug 2005 03:32 UTC in reply to "RE[4]: Hmm..."
binarycrusader Member since:
2005-07-06

I call Fudulent statement.

Now prove me wrong.


I wish people would stop calling things "FUD" which are not. I was not posting "fear, uncertainty, and doubt". Even if I was posting something incorrect, which I do not believe I was, it would be an "incorrect statement", not "FUD".

See this diagram as proof:

http://images.anandtech.com/reviews/cpu/amd/athlon64x2/preview/AMDa...

As you can see, a dual-core setup for the Athlon64-x2 shares one memory controller.

Now, this is not the case for SMP systems, just dual core.

Yes, I realise that Intel's dual-core chips will be sharing a single memory controller on the motherboard. However, each processor has its full bus bandwidth to the memory controller instead of sharing a single path. That makes some difference.

Reply Score: 1

RE[6]: Hmm...
by Anonymous on Wed 24th Aug 2005 05:21 UTC in reply to "RE[5]: Hmm..."
Anonymous Member since:
---

Yes, I realise that Intel's dual-core chips will be sharing a single memory controller on the motherboard. However, each processor has its full bus bandwidth to the memory controller instead of sharing a single path. That makes some difference.

How so? This is not a sarcastic or rhetorical question, mind you... an honest request for explanation from someone who is not into CPU design at all.
From a naive point of view, a bandwidth bottleneck is a bandwidth bottleneck, whether it happens on the die or on the motherboard. Two threads running on different cores will in both cases need to reach the RAM through the memory controller to do their work, in both cases sharing the bandwidth. Again, from my naive point of view, does the lower latency of an on-die memory controller let it react 'faster' to changes in bandwidth occupation, thus optimizing the flow? I don't really know ;) Please explain.

Reply Score: 0

RE[7]: Hmm...
by binarycrusader on Wed 24th Aug 2005 05:59 UTC in reply to "RE[6]: Hmm..."
binarycrusader Member since:
2005-07-06

How so? This is not a sarcastic or rhetorical question, mind you... an honest request for explanation from someone who is not into CPU design at all.

I really don't remember or know at the moment. What I do know is this: AMD has consistently scored lower in bandwidth-heavy benchmarks, and articles/reviews that I've read have stated this is the trade-off between an on-die memory controller's low latency and the bandwidth of a motherboard-based memory controller. It's not unique to dual-core systems. There is some advantage that Intel seems to have.
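One rough way to see why both numbers matter (a generic rule of thumb with illustrative figures, not measurements): the time to service a memory request is roughly

$$ t \approx L + \frac{S}{B} $$

where L is the latency, S the transfer size and B the bandwidth. For a single 64-byte cache-line fill at 6.4 GB/s the S/B term is only about 10 ns, so the controller's latency dominates; for streaming megabytes of data the S/B term dominates, which is where the bandwidth-heavy benchmarks live.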

Reply Score: 1

RE[2]: Hmm...
by butters on Wed 24th Aug 2005 07:13 UTC in reply to "RE: Hmm..."
butters Member since:
2005-07-08

"An on-die memory controller will give you reduced latency at the cost of bandwidth."

I admit I'm a slouch at systems architecture... for a computer engineering student. However, I don't understand how putting the memory controller on-die either limits or adds cost to memory bandwidth relative to bridged designs. Do chip interconnects cost more than pins? Does the FSB or bridge clock offer more bandwidth in some way?

Your comment leaves me puzzled and looking for an explanation. I'll mod you up for making my mind do that thing where it thinks.

Reply Score: 1

more details
by nimble on Tue 23rd Aug 2005 20:15 UTC
nimble
Member since:
2005-07-06

Techreport has posted some more details of the microarchitecture:

http://www.techreport.com/onearticle.x/8695

A 14-stage pipeline instead of the 12 in the PPro through the P-M. Four instructions per cycle can be issued instead of three. Multiple cores share the L2 cache. Hyperthreading is missing but could come later.
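To put those two changes side by side (a rough rule of thumb; the branch frequency, mispredict rate and penalties below are illustrative assumptions, not Intel's figures):

$$ \text{CPI}_{\text{eff}} \approx \text{CPI}_{\text{base}} + f_{\text{branch}} \cdot r_{\text{miss}} \cdot P $$

where P, the mispredict penalty, grows roughly with pipeline depth. With, say, f = 0.2 and r = 0.05, going from P = 12 to P = 14 adds only about 0.2 x 0.05 x 2 = 0.02 cycles per instruction, while widening issue from three to four instructions per cycle raises the theoretical peak throughput by a third. The slightly deeper pipeline costs little; the extra width is where the headroom is.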

Reply Score: 3

10 cores in 2010
by Anonymous on Tue 23rd Aug 2005 20:41 UTC
RE: 10 cores in 2010
by 1c3d0g on Tue 23rd Aug 2005 20:51 UTC in reply to "10 cores in 2010"
1c3d0g Member since:
2005-07-06

So they expect to hit 10 cores in 2010, right?
I guess the Cell chip will carry 40 cores by then.

Well done, Apple.


You've got it all wrong. It's not the quantity of the cores, but the quality that counts. If those 40 cores on a Cell CPU can only process 1/8th of what a 10-core Intel CPU can do, then they've still got a lot of catching up to do. Besides, the two aren't really comparable anyway, as (this is greatly simplified) the Cell is an in-order Power arch and Intel is an out-of-order x86 arch.

Reply Score: 1

RE[2]: 10 cores in 2010
by butters on Wed 24th Aug 2005 07:44 UTC in reply to "RE: 10 cores in 2010"
butters Member since:
2005-07-08

OK I'll bite (or should I say, "sting"):

Currently, it is all about the "quality" of the core(s) (in your parlance) rather than the quantity, because 99% of software is written like people think they think. That is, in a linear fashion, branch-heavy and monolithic. I call this the "queen bee model." I think our brains really work like a beehive, with one queen bee and thousands(?) of worker bees that do her bidding. We think we think like a queen bee because we can only express our "train of thought" in terms of the perspective of the queen bee. This is why today's software is incapable of providing the insight and "intelligence" of the human brain.

Hardware comes before software. It is not like the chicken and the egg. This is the way it works. Humanity is developing the hardware of the future. The first incarnation is the Cell (not hexagonal only because of fabrication technology), which conveniently fits into my analogy by representing the "beehive model." This model requires that we program the hardware like the hive works, not like the queen bee sees things happening. This can be implemented in code, in the compiler, in firmware, and/or even in the hardware itself.

Then it comes to pass that improvements to the quality of each worker bee pale in comparison to the quantity of worker bees in the hive in terms of improving the overall performance of the system. The advantage that we have over the bees--well, I think we have many, but here is just one--is that we can have multiple queen bees in our hive, each controlling an optimal quantity of worker bees, and they will cooperate with one another. This adaptation has not (...yet) taken hold in the social systems of modern bees. Thus we can manage scalability issues and avoid adding undue complexity to our queen bee processing elements (like OOO execution, for example).

If I'm right (and by extension hundreds of other brilliant people), the beehive model of computing will usher in an era where computers can truly think like humans can. And we will do it by putting quantity before quality.

I would like to apologize to any anaphylactics out there, for whom this post must have been terrifying...

Reply Score: 1

small step, a lot to see
by JrezIN on Tue 23rd Aug 2005 20:46 UTC
JrezIN
Member since:
2005-06-29

Lots of improvements... but it doesn't look too much like a "new architecture"... It's more like the evolution from Pentium III to Pentium M...

...But until we have the real thing for testing, I can change my mind (probably won't, but anyway...)...

The lack of HT is no surprise, as they're going for multiple hardware cores now, but they have to run... AMD already has a really good dual-core design and several plans for quad and octa cores with its next socket...

I just hope they both have competitive products (currently, AMD has better overall cost vs quality) and both multicore processors become cheaper than the current prices!

Reply Score: 2

RE: small step, a lot to see
by butters on Wed 24th Aug 2005 07:03 UTC in reply to "small step, a lot to see"
butters Member since:
2005-07-08

"currently, AMD has better overall cost vs quality"

I like AMD, and my next CPU purchase will be AMD, but I don't know about price/performance anymore. It seems like the magnetic field of the CPU price/performance planet is suddenly flipping.

Reply Score: 1

Netburst RIP
by nimble on Tue 23rd Aug 2005 20:55 UTC
nimble
Member since:
2005-07-06

So the engineers are back in charge at Intel. Great to see that silly GHz chase is finally coming to an end.

Only shame is it's still another year until all this stuff actually becomes available. Let's see how AMD can use this window of opportunity.

And as for Mr Nicholas Blachford of "the Inquirer" fame: eat your words. No sign whatsoever of VLIW or binary translation or anything else of your great predictions. Just plain old and proven out-of-order execution.

Reply Score: 1

next-gen batteries
by 2501 on Tue 23rd Aug 2005 21:12 UTC
2501
Member since:
2005-07-14

What about the next-gen batteries??? They are all talking about better consumption and blah, blah, blah... but nothing about the development of the next generation of batteries, which could also help. Battery technology hasn't really evolved, and this is also a great factor to consider.
-2501

Reply Score: 1

transmeta
by 2501 on Tue 23rd Aug 2005 21:19 UTC
2501
Member since:
2005-07-14

Transmeta has been preaching what Intel mentioned today. So why is this revolutionary??? I don't see anything special. All I see is Intel taking the same path, but now they are carrying the flag. Nothing special.

-2501

Reply Score: 2

pictures
by 2501 on Tue 23rd Aug 2005 21:24 UTC
2501
Member since:
2005-07-14
woo7
by Anonymous on Tue 23rd Aug 2005 21:56 UTC
Anonymous
Member since:
---

this is a great move.. processors are already powerful enough.. they need to address power consumption. jus 'cos some ppl's PCs are bogged down by spy/adware doesn't mean their hardware is to blame. Apple's OS X will fly on these chips ;)

Reply Score: 0

arstechnica
by nimble on Tue 23rd Aug 2005 21:58 UTC
nimble
Member since:
2005-07-06

Hannibal has posted his take on the new microarch:

http://arstechnica.com/news.ars/post/20050823-5232.html

He points out how the execution core is quite similar to the PPC970 (aka G5).

And the recent Inquirer article gets a fitting appraisal in the first sentence.

Reply Score: 3

Quad core processors
by Dark_Knight on Tue 23rd Aug 2005 22:52 UTC
Dark_Knight
Member since:
2005-07-10

I'd buy a mobile workstation with a quad-core processor to run my high-end 3D/2D apps on ;) As for Hyperthreading, it would be interesting to see if Intel offers this or something similar.

Reply Score: 1

My speculation
by Nicholas Blachford on Tue 23rd Aug 2005 23:25 UTC
Nicholas Blachford
Member since:
2005-07-06

...turned out to be incorrect.

Why?
Well, it seems Intel have cut the "average" power, not the TDP. The TDP seems to remain in the same range as the P-M's.

If they want to do an 8-core design they'll need to cut the TDP in half; if they want to go to 16 cores they'll need to cut it highly aggressively.

Whenever Intel or AMD do this, I fully expect the methods I speculated on to be used.

Reply Score: 1

RE: My speculation
by Anonymous on Wed 24th Aug 2005 00:49 UTC in reply to "My speculation"
Anonymous Member since:
---

Actually, I think even with a 16-core system, your speculation is still a SWAG (Silly Wild-Assed Guess) based on what they've already demonstrated with their ultra-low-power version: after all, if they can make it run on 0.5 watts for what (IIRC) is already a dual-core processor, even at less than the maximum speed of the other processor variants currently announced, why would they want/need that complication? 8 * 0.5 = 4 watts, and, worst case, 16 cores * 0.5 watts = 8 watts for the chip.

The biggest immediate problem is the size of the die for 16 cores, along with the size of the cache: it takes an awful lot of data to keep 16 cores doing useful work, and memory speeds have not come close to keeping up with CPUs' ability to saturate bandwidth. Sure, you can increase the front-side bus width to 256 bits, and with 16 cores, chances are you might have a bit of latency for each one going out of the cache, but if you actually have enough to keep 16 cores busy doing *useful* work (non-idle system thread), you'll *still* be bandwidth-starved, because that's only 16 bits *per **memory bus** cycle* per core on average, with the multiplier likely to be an absolute minimum (with currently available RAM) of at least 3-4 CPU cycles per single FSB cycle, not counting the latency of the memory controller(s).

It would be an interesting trick to schedule threads/tasks such that they all run within the L1/L2 caches of the chip, so that those that are almost purely computation-bound allow those that are almost purely memory-bound to have a useful bit of bus bandwidth for their data and instructions -- and this is *before* doing the insane thing of requiring all the translated VLIW code to be stored somewhere and pumped in and out. In short, what Transmeta did, while it has some pluses, simply can't practically be scaled without using several memory buses, including one purely for the use of translated code, because the processor would be sipping data through a narrow straw.
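Working that bandwidth figure out explicitly, using the same assumptions as above (a 256-bit FSB, 16 cores, and an FSB cycle that is 4 CPU cycles long):

$$ \frac{256\ \text{bits/FSB cycle}}{16\ \text{cores}} = 16\ \text{bits/FSB cycle/core}, \qquad \frac{16\ \text{bits}}{4\ \text{CPU cycles}} = 4\ \text{bits per core per CPU clock} $$

i.e. half a byte of new data per core per clock, best case, before any memory-controller latency.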

Then, when you consider that a 4K page of code is not likely to translate 1:1 into a 4K page of translated code in the VLIW format, it becomes much too hairy to do effectively in a combination of hardware and software, requiring all OSes to account for this weird architectural brainfart to achieve something that *might* be more power-efficient, perhaps more die-efficient, perhaps faster overall... but only with a huge infrastructure that defeats the whole purpose in the first place!

Reply Score: 0

RE: 10 cores in 2010
by Anonymous on Tue 23rd Aug 2005 23:59 UTC
Anonymous
Member since:
---

"So they expect to hit 10 cores in 2010, right?
I guess the Cell chip will carry 40 cores by then."

Cell has only ONE real core. The other 8 cores are only for multimedia.
It's as if Apple had a G4 with 1 PowerPC core and 8 AltiVec units.

Intel offers 10 REAL cores.

Reply Score: 0

RE: RE: 10 cores in 2010
by Anonymous on Wed 24th Aug 2005 07:23 UTC in reply to " RE: 10 cores in 2010"
Anonymous Member since:
---

2010 is 35 dog years into the future. (IANADL)

Multicore or not, Cells can be combined too, today, and that would mean, for 10 Cells, 10 PPEs and 70 or 80 SPEs. That would absolutely blow your socks off, if your OS of choice can keep them busy. I'm not saying it would be low-watt, but neither would an Intel 10-core design be, today.

Reply Score: 0

RE[2]: RE: 10 cores in 2010
by nimble on Wed 24th Aug 2005 08:40 UTC in reply to "RE: RE: 10 cores in 2010"
nimble Member since:
2005-07-06

Multicore or not, Cells can be combined too, today, and that would mean, for 10 Cells, 10 PPEs and 70 or 80 SPEs. That would absolutely blow your socks off, if your OS of choice can keep them busy.

The OS can't help you much there. With their separate instruction sets and local memories, the SPEs have to be treated as exclusive resources.

So an application can ask for a number of SPEs, the OS checks that there are enough left and says "ok, here you go", and then it's entirely up to the application to put them to good use -- at which point you can forget about things like pthreads.

At least that's how Linux-on-Cell works at the moment. If you can come up with something better, I'd guess IBM and Sony would be very interested.
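Very roughly, the model looks something like the sketch below. The spe_* names and stub bodies are invented for illustration only -- this is NOT the actual Linux-on-Cell API -- but the control flow is the point: the application acquires SPEs as exclusive resources, hands each one its own program, and manages the work itself rather than going through pthreads.

/* Hypothetical sketch of the "ask for N SPEs, then drive them yourself"
 * model described above. The spe_* names and stub bodies are made up for
 * illustration; only the shape of the programming model is the point. */
#include <stdio.h>

#define SPES_WANTED    4
#define SPES_IN_SYSTEM 8    /* assume one Cell with 8 SPEs */

typedef struct { int id; } spe_handle;    /* illustrative handle */

static spe_handle spe_pool[SPES_IN_SYSTEM];

/* Stub: the OS only checks availability and grants exclusive handles. */
static int spe_request(int count, spe_handle **out) {
    int granted = count <= SPES_IN_SYSTEM ? count : SPES_IN_SYSTEM;
    for (int i = 0; i < granted; i++) { spe_pool[i].id = i; out[i] = &spe_pool[i]; }
    return granted;
}

/* Stubs: in reality these would copy an SPE-ISA program into the SPE's
 * local store and start it running; here they just show where that happens. */
static void spe_load(spe_handle *spe, const char *program) {
    printf("SPE %d: load %s into local store\n", spe->id, program);
}
static void spe_run(spe_handle *spe, int work_item) {
    printf("SPE %d: run work item %d\n", spe->id, work_item);
}

int main(void) {
    spe_handle *spes[SPES_WANTED];
    int granted = spe_request(SPES_WANTED, spes);

    /* Entirely up to the application from here on: no OS scheduler,
     * no pthreads -- each SPE gets its own program and its own data. */
    for (int i = 0; i < granted; i++) {
        spe_load(spes[i], "filter_kernel.elf");
        spe_run(spes[i], i);
    }
    return 0;
}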

Reply Score: 1

64-bit
by Anonymous on Wed 24th Aug 2005 17:43 UTC
Anonymous
Member since:
---

At least now all the folks snarking about how the Intel-based Macs will be 32-bit machines are going to shut up.

Reply Score: 0