Would it have been any better had they named it Knight's Fury? No, of course not, but it really would have been a better name. It would have made more sense with Knight's Ferry being the developer version.
So far I'm not impressed with Intel: their designs are little more than *take small cores and network them together*, which isn't the best approach for supercomputing IMO. Cloud computing, perhaps, but not supercomputing.
Supercomputers don't actually require complicated CPU designs, given that the instructions are fixed; all the CPU is required to do is suck in the information, crunch it and spit it out the other side. When you use a supercomputer for number crunching, you're pushing in a sequence of equations and pumping out a result at the other end, so you can get away with stripping off branch prediction and so forth, because all you're really interested in is raw power.
Have a simple CPU design, chain together lots of those cores, parallelise the code to buggery, add a heck of a lot of bandwidth and a clock speed going gangbusters, and you'll be all set to party.
Edited 2010-06-02 00:45 UTC
Instructions are always fixed; that's what an ISA is. And what you are describing sounds like some kind of stream processor. Most supercomputers need to be good at several tasks, with algorithms that can be parallelized successfully to varying degrees. And even if you parallelize it to bits, there is always Amdahl's law.
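To put a number on the Amdahl's law point, here is a minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the core count. The function name and the 95% figure are just illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the runtime
    can be spread across `cores` processors (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Even a 95%-parallel program gains little beyond a few dozen cores:
    # the 5% serial part caps the speedup at 1 / 0.05 = 20x.
    for n in (1, 4, 8, 50, 1000):
        print(f"{n:5d} cores: {amdahl_speedup(0.95, n):6.2f}x")
```

With a 5% serial part, going from 50 to 1000 cores buys surprisingly little, which is the whole argument against just piling on more cores.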
You can only get away with stripping off branch prediction (and, I presume, other niceties such as out-of-order execution) if you have a well-behaved algorithm, which you almost never have in reality. Of course, some algorithms (or parts of them) run well on GPUs, which is what you seem to be describing here.
Again, this only works for some algorithms. Communication between processors does not scale that well for most workloads, so you'd rather have fewer high-performance cores than more low-performance cores. Scaling is not very important if your total performance still sucks.
If you don't believe me, check out the TOP500 supercomputer list. Almost all systems use Xeons or Opterons.
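The "fewer fast cores beat many slow ones" argument can be illustrated with a toy model. This is an assumption for illustration, not a measured law: the parallel part is split evenly across the cores, the compute part runs `core_speed` times faster, and coordination adds a fixed (hypothetical) cost for every extra core. The 8-vs-50 core configurations and all parameter values are made up.

```python
def runtime(parallel_fraction: float, cores: int, core_speed: float,
            comm_cost_per_core: float) -> float:
    """Toy runtime model (hypothetical, not a measured law).

    Work is normalised to 1.0 on a single core of speed 1. The parallel part
    is split evenly across `cores`, the compute part runs `core_speed` times
    faster, and communication adds a fixed cost for every extra core.
    """
    serial = 1.0 - parallel_fraction
    compute = (serial + parallel_fraction / cores) / core_speed
    communication = comm_cost_per_core * (cores - 1)
    return compute + communication


if __name__ == "__main__":
    p, comm = 0.95, 0.001  # hypothetical workload parameters
    few_fast = runtime(p, cores=8, core_speed=4.0, comm_cost_per_core=comm)
    many_slow = runtime(p, cores=50, core_speed=1.0, comm_cost_per_core=comm)
    print(f"8 fast cores:  {few_fast:.4f}")
    print(f"50 slow cores: {many_slow:.4f}")
```

Under these assumptions the 8 fast cores finish in well under half the time of the 50 slow ones, because the slow cores pay both the serial bottleneck at low speed and the growing communication cost.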
What Intel is building here is interesting. Larrabee was supposed to be a many-core x86 processor with massive vector units. The memory system was cache coherent, using a massive ring bus. There were serious doubts as to whether it would scale very well, even for embarrassingly parallel workloads. This MIC might look more like the other project Intel had, in which there was no cache coherency; instead, all cores were connected by a switched network and one had to use explicit message passing between threads in software, almost like a cluster on a chip.
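For anyone who hasn't written code in that style, here is a minimal sketch of the explicit message-passing model, using Python threads and `queue.Queue` as a stand-in for the on-chip network. It is not Intel's API; the function names are hypothetical. Each simulated core owns only its inbox and shares no other data. Note that Python threads won't give real CPU parallelism here (the GIL), so this shows the programming model, not the performance.

```python
import queue
import threading

STOP = None  # sentinel message telling a worker to shut down


def worker(core_id, inbox, outbox):
    """Simulated core: owns no shared data, only reads its own inbox
    and sends results back to the coordinator's outbox."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        chunk_id, numbers = msg
        outbox.put((core_id, chunk_id, sum(x * x for x in numbers)))


def sum_of_squares(data, n_cores=4, chunk_size=100):
    inboxes = [queue.Queue() for _ in range(n_cores)]
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, inboxes[i], results))
               for i in range(n_cores)]
    for t in threads:
        t.start()

    # Explicitly send each chunk to a core (round-robin), like sending a
    # message over the network instead of sharing a coherent cache line.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for chunk_id, chunk in enumerate(chunks):
        inboxes[chunk_id % n_cores].put((chunk_id, chunk))

    # Collect exactly one reply per chunk, in whatever order they arrive.
    total = 0
    for _ in chunks:
        _core_id, _chunk_id, partial = results.get()
        total += partial

    for box in inboxes:
        box.put(STOP)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(sum_of_squares(list(range(10_000))))
```

The point is that every bit of data a core needs has to be sent to it explicitly, which is exactly the extra work a cache-coherent design hides from the programmer.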
I saw an Intel video about these experimental many-core chip designs, and cloud computing was specifically intended to be the target. Their whole goal is to explore ways to further improve space and power efficiency in cloud computing datacenters (e.g., by having 50 cores that consume as much as a single high-end CPU and can be throttled back to a tenth of that at off-peak times).
It's doubtful it will be useful in supercomputing. The current 6-core chips tended not to show nearly the performance improvement over 4-core chips that was expected, mostly because the problem isn't CPU count, or CPU speed. The problem is memory. Memory bandwidth is the killer, and no one seems to be offering solutions.
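The memory-bandwidth argument is what the roofline model captures: attainable performance is the lower of the compute peak and memory bandwidth times arithmetic intensity (flops per byte moved). A minimal sketch follows; the per-core and bandwidth figures are hypothetical, chosen only to show why adding cores doesn't help a bandwidth-bound kernel.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: the lower of the compute ceiling and the memory
    ceiling (bandwidth x arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical figures, for illustration only.
PER_CORE_GFLOPS = 25.0
BANDWIDTH_GB_S = 30.0

# A STREAM-triad-style kernel a[i] = b[i] + s * c[i] on doubles does
# 2 flops per 24 bytes moved (two 8-byte loads and one 8-byte store).
TRIAD_INTENSITY = 2 / 24

if __name__ == "__main__":
    for cores in (4, 6):
        peak = cores * PER_CORE_GFLOPS
        bound = attainable_gflops(peak, BANDWIDTH_GB_S, TRIAD_INTENSITY)
        print(f"{cores} cores: peak {peak:.0f} GFLOP/s, attainable {bound:.2f}")
```

Both configurations hit the same 2.5 GFLOP/s memory ceiling, far below either compute peak, so the two extra cores just sit waiting on RAM.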
Well... Intel has eDRAM, which is more compact, and that is also why Intel chips have such huge caches these days...
I would be curious to see what would happen if cores were capped at 4 and whatever extra die space is left went to cache and a real integrated GPU design, one treated more like an FPU instead of just a device hanging off of PCI-E.
Aren't you confusing them with IBM? I'm pretty sure Intel is just really good at making small cheap SRAM.
On top of that, even eDRAM would leave them with the problem of needing a royal caravan of RAM slots; it would only make large caches cheaper. eDRAM is still no performance match for SRAM.
However, even with SRAM caches, workloads that crunch on moderately sized data sets that fit into a shared cache might run very fast without jacking up the RAM bandwidth. If Intel needed to, I'm sure they could put 32+ MB SRAM caches on a die and still make their high margins.
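A common way to make a big computation behave like one that "fits into a shared cache" is loop tiling (cache blocking). Here is a minimal sketch on a matrix multiply; the `tile` parameter is a hypothetical choice, e.g. 32 so that three 32x32 tiles of doubles take 3 * 32 * 32 * 8 bytes = 24 KiB. Pure Python won't show the cache effect itself; it only shows the loop structure a compiled implementation would use.

```python
def matmul_tiled(a, b, tile=32):
    """Multiply square matrices (lists of lists) using loop tiling.

    Each innermost block touches only a tile x tile piece of a, b and the
    result, so in a compiled language that working set can stay in cache
    and be reused instead of being re-fetched from RAM.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one block; min() handles the ragged edge tiles.
                for i in range(ii, min(ii + tile, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


if __name__ == "__main__":
    m = [[1.0, 2.0], [3.0, 4.0]]
    print(matmul_tiled(m, m, tile=1))
```

The arithmetic is identical to the naive triple loop; only the order of memory accesses changes, which is the whole trick.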
Edited 2010-06-02 04:09 UTC
This "multiple low-powered cores" technology is not gonna last on the desktop past the day people realize that only a few problems scale well across multiple cores.
For virtualization-oriented servers, on the other hand, combining it with NUMA could do wonders. But like other people here, I think bus bandwidth issues will kill this product.
Edited 2010-06-02 07:42 UTC
You mean only a few software programs scale well across multiple cores. There are many problems that can be decomposed into parallel tasks; you just need to build your software from the ground up to take advantage of a large number of parallel execution units.
There are many things people do on desktop machines that benefit from multicore processors: audio/video encoding, digital photography, data rendering, be it a complex 3D scene or office/web document. And many new problems can be created to fill the demand for such hardware.
No one says that all existing software or all existing types of software should scale to multiple cores. It's more an issue of existing software taking advantage of parallel processing for different kinds of tasks.
Software like Photoshop, 3ds Max, even web browsers (scaling JavaScript and the rendering processes) can be modified to take advantage. Audio software can benefit greatly too (running multiple virtual effects/synthesizers, each on a separate core), and of course video games (physics simulation, rendering, etc.).
So the target is to give more power to existing software, not to require that it be rewritten...
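As a rough illustration of decomposing a task into independent parallel pieces, here is a minimal sketch that splits an audio buffer into chunks and runs an effect on each one through `concurrent.futures`. The "effect" (a simple gain) and the function names are hypothetical stand-ins, and real effects with state across chunk boundaries (reverb, filters) would need more care.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def apply_gain(samples, gain=0.5):
    """Stand-in 'effect': scale each sample. Chunks are fully independent."""
    return [s * gain for s in samples]


def process_in_chunks(samples, effect, executor: Optional[Executor] = None,
                      chunk_size=4096):
    """Split a buffer into independent chunks, run `effect` on each, and
    stitch the results back together in the original order.

    With executor=None the chunks are processed sequentially in-process,
    which is handy for debugging.
    """
    chunks = [samples[i:i + chunk_size]
              for i in range(0, len(samples), chunk_size)]
    mapper = executor.map if executor is not None else map
    out = []
    for processed in mapper(effect, chunks):  # both maps preserve order
        out.extend(processed)
    return out


if __name__ == "__main__":
    audio = [float(i % 100) for i in range(100_000)]  # fake audio buffer
    # Processes rather than threads: CPU-bound pure Python does not run in
    # parallel across threads because of the GIL.
    with ProcessPoolExecutor() as pool:
        quieter = process_in_chunks(audio, apply_gain, pool)
    print(len(quieter))
```

The same chunk-map-stitch shape fits a lot of the desktop workloads listed above (encoding, photo filters, rendering tiles), as long as the pieces don't depend on each other.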
Why the odd number of 50 though?
Can't wait to get our hands on this chip to see how it performs! This is something we want to support in BareMetal OS (http://www.returninfinity.com) for HPC.
Edited 2010-06-02 13:11 UTC



