IBM has finally, finally, started shipping computers based on the Cell architecture it co-developed with Sony and Toshiba. In fact, the company claims, a number of high-profile clients such as the University of Manchester and the Fraunhofer Institute are already running the much-ballyhooed devices. Cnet has more.
Now we can see how much of the hype surrounding the Cell CPU was, well… hype!
I hope it holds its own; the industry needs a viable competitor to x86/x86_64. It keeps Intel/AMD on their toes 😉
It’s not really a competition: you have to create custom software, or at least adapt existing software, to be able to use the Cell CPU efficiently.
So sure, it’s more efficient, but it’s competing with x86 in a very small part of the market.
I’d say it’s competing more against PPC and Itanium than x86. I guess x86 is sort of competing against it as well, but not much right now. Maybe more in the next few years.
Huh? How is CELL competing against PPC and Itanium? First off, CELL is a PPC… a bit of a contradiction for a PPC to be competing against itself. Second, IA64 is geared towards the server/high-end markets; these blades have limited local memory and are used mostly as DSP appliances.
Maybe I need to clear it up for some of you: IA64 is like an apple, CELL is more like an orange. Get it?
First off, CELL is a PPC
Correct. What I meant by PPC was the G5. Sorry for the confusion.
a bit of a contradiction for a PPC to be competing against itself.
No contradiction – related products compete with each other all the time. That’s why the Celeron is so completely (and artificially) crippled: otherwise it would compete against Intel’s high end and take sales away from more expensive chips. A new Ford truck is going to be competing against the old one if it isn’t discontinued, etc.
Second, IA64 is geared towards the server/high-end markets; these blades have limited local memory and are used mostly as DSP appliances.
These blades, yes, but I see the chip in general competing in that high end market rather than the low end like these blades. I could be wrong about that though.
Huh? How is CELL competing against PPC and Itanium? First off, CELL is a PPC… a bit of a contradiction for a PPC to be competing against itself.
This argument would suggest, then, that Intel and AMD are not competitors – I mean, why would i386 compete with itself?
I may be wrong, but I believe the original poster was possibly suggesting that the PPC/IA64 segment of the market is one where the purchasers are in the habit of either A) not running “popular” applications but very specific ones, or B) having significant in-house development facilities.
These sorts of people should not be compared to the average Joe, who can’t get his favourite application working.
Though I may be wrong, and the poster meant something entirely different.
Second, IA64 is geared towards the server/high-end markets; these blades have limited local memory and are used mostly as DSP appliances.
I have to admit, at this point things are going way over my head, and I’m not in a position to really talk too much.
However, just because something is currently being developed for the blade market, does that mean it is exclusively capable of dealing with that form of application?
Considering that the Cell is, I think, a fairly new technology, one could think that maybe they are just getting started? As I said, I don’t really know, just wondering.
Maybe I need to clear it up for some of you: IA64 is like an apple, CELL is more like an orange. Get it?
Marvelously witty – did you think it up all on your own? ;P
Hi,
Now IBM has proved that Cell can be used as a standalone processor and does not need a host processor like the blades from Mercury do. So the logical consequence would be to offer workstations equipped with this chip as a replacement for the PPC970, aka the G5, currently used in the cheapest IBM workstation offerings. But the price has to come down – more than $18,000 for a single blade? Come on.
Anton
I’ve read some articles stating that the CELL wouldn’t be very suitable for normal PCs.
I believe I read that the CELL has limited capability when it comes to double-precision floating-point operations?
Double precision floating point is much slower than single precision on the Cell, but I believe it is still quite a bit faster than it would be on an x86. You just have to be aware there is a speed tradeoff that you wouldn’t really have on most other architectures.
Yes, Cell is not very suitable for normal PCs. It has nothing to do with the floating-point support, though you are right in that Cell’s DP floating-point is much slower than its SP floating-point. The primary problem with Cell for desktops/workstations is that its integer performance sucks. There are two types of processor cores in the Cell: the SPEs and the PPE. The SPEs are completely unsuitable for general-purpose code, since they can only directly address 256KB of memory. Any code utilizing the SPEs must be specially written to fit their memory model (sketch below).
With the SPEs out of the picture, much of Cell’s potential performance disappears. What’s left is a very simplistic 2-issue in-order PPE. To make things worse, the PPE also has extremely high cache latencies, and a very long pipeline. All these sacrifices were made so the PPE could be clocked as highly as the even simpler SPEs without using large amounts of power. Added together, these inefficiencies compound, resulting in a processor whose general-purpose integer performance is probably at the level of a sub-1GHz PIII (and that’s being optimistic).
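To give a sense of what “specially written to fit their memory model” means, here is a rough sketch of SPE-side code in the style of IBM’s Cell SDK (the mfc_* calls come from spu_mfcio.h; the chunk size and the assumption that argp carries the effective address of the input are made up for illustration):

/* Sketch of SPE-side code: main memory cannot be dereferenced directly,
 * so every input has to be DMA'd into the 256KB local store first.
 * CHUNK and the argument layout are assumptions for this example. */
#include <spu_mfcio.h>

#define CHUNK 1024                       /* must fit in the 256KB local store */

static float buf[CHUNK] __attribute__((aligned(128)));   /* DMA-friendly alignment */

int main(unsigned long long spe_id, unsigned long long argp, unsigned long long envp)
{
    int i;
    (void)spe_id; (void)envp;

    /* Pull a chunk of main memory into the local store; argp is assumed
     * to be the effective address of the data the PPE wants processed. */
    mfc_get(buf, argp, sizeof(buf), 0, 0, 0);
    mfc_write_tag_mask(1 << 0);
    mfc_read_tag_status_all();           /* block until the DMA completes */

    for (i = 0; i < CHUNK; i++)          /* now the data can be touched */
        buf[i] *= 2.0f;

    /* Push the results back out to main memory. */
    mfc_put(buf, argp, sizeof(buf), 0, 0, 0);
    mfc_write_tag_mask(1 << 0);
    mfc_read_tag_status_all();

    return 0;
}

Compare that to an ordinary processor, where the same loop would just read and write main memory through the cache with no explicit transfers at all.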
The Cell SPEs can stash chunks of their local memory out to interleaved memory like RAMBUS, and fetch them back, very quickly. Just think of the local store as a software-controlled cache, much like a disk cache. You can traverse the links in a linked list by double- or triple-buffering the nodes, which makes linked lists much faster (see the sketch at the end of this post).
BTW, the thing that slows the SPEs down is pipeline stalls, caused by them being so deeply pipelined for vector usage. Replacing search trees with search tries will speed things up, making up for the difference.
If you are old enough to remember programming a RAM-expanded Commodore 64 or a 286 with XMS memory, then you’ll get the hang of programming the SPEs in no time. It’s just the same, with some pipelining thrown in for speed.
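Here is roughly what that double-buffering pattern looks like on an SPE, as a sketch (process_chunk and CHUNK_BYTES are placeholders I made up; the mfc_* calls are the Cell SDK DMA primitives):

/* Double-buffered DMA on an SPE: while chunk N is being processed,
 * chunk N+1 is already in flight, hiding transfer latency behind the
 * computation.  process_chunk() and CHUNK_BYTES are placeholders. */
#include <spu_mfcio.h>

#define CHUNK_BYTES 4096

static char buf[2][CHUNK_BYTES] __attribute__((aligned(128)));

extern void process_chunk(char *data, unsigned size);    /* hypothetical */

void stream_from_main_memory(unsigned long long ea, unsigned nchunks)
{
    unsigned i, cur = 0;

    /* Prime the pipeline: start fetching the first chunk (DMA tag = buffer index). */
    mfc_get(buf[cur], ea, CHUNK_BYTES, cur, 0, 0);

    for (i = 0; i < nchunks; i++) {
        unsigned next = cur ^ 1;

        /* Kick off the DMA for the following chunk before touching this one. */
        if (i + 1 < nchunks)
            mfc_get(buf[next], ea + (unsigned long long)(i + 1) * CHUNK_BYTES,
                    CHUNK_BYTES, next, 0, 0);

        /* Wait only for the current buffer's tag, then work on it while the
         * next transfer proceeds in the background. */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();

        process_chunk(buf[cur], CHUNK_BYTES);

        cur = next;
    }
}

The same idea extends to triple buffering: compute on one buffer while one incoming and one outgoing DMA are in flight.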
Memory outside the local store cannot be directly accessed by the SPE. I.e., you can’t read data in main memory using a C pointer. You have to DMA the data into the LS to use it, which makes it unsuitable for general-purpose code.
In a Commodore 64 or a 286 with expanded memory you have to DMA to/from conventional memory as well. How is this any different from what you’ve said? Apparently you don’t consider a 286 to be a real PC or something, because this kind of workaround is commonplace outside of the 32-bit realm. It is unusual for a 128-bit processor like the SPE to use such a workaround, but I’m not concerned, because the DMA process is fast for large chunks of memory.
IMHO, the days of readable source code without workarounds/threading are drawing to a close anyway. The days of optimized code are coming back.
Apparently you don’t consider a 286 to be a real PC or something, because this kind of workaround is commonplace outside of the 32-bit realm.
I certainly don’t consider it to be a “normal PC” (from the original post), not in 2006.
IMHO, the days of readable source code without workarounds/threading are drawing to a close anyway. The days of optimized code are coming back.
The days of optimized code aren’t coming back – not for the general-purpose market, anyway. Skilled labor only continues to become more expensive, while manufactured products continue to drop in price. Not until we reach some social cataclysm, after which the labor of machines starts to cost more than the labor of men, will highly optimized code make a comeback. As it is, it’ll remain relegated to niches, and will become more niche as time goes by.
Now, threading is definitely going to be part of the future, even for general-purpose machines, but threaded code can still be highly readable. Code that has to hack around 256K of directly addressable memory, on the other hand – well, that’ll never be highly readable outside of very specialized or simple algorithms.
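For what it’s worth, here is the kind of threaded code I would still call readable – a plain pthreads sketch, where work_on_slice and the thread count are made up for the example:

/* Ordinary threaded C: each thread handles one slice of an array.
 * NTHREADS and the doubling "work" are invented for the example, but the
 * structure stays readable because nothing fights a 256KB address limit. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N        (1 << 20)

static double data[N];

struct slice { int begin, end; };

static void *work_on_slice(void *arg)
{
    struct slice *s = arg;
    int i;
    for (i = s->begin; i < s->end; i++)
        data[i] = data[i] * 2.0 + 1.0;   /* stand-in for real work */
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    struct slice slices[NTHREADS];
    int t;

    for (t = 0; t < NTHREADS; t++) {
        slices[t].begin = t * (N / NTHREADS);
        slices[t].end   = (t + 1) * (N / NTHREADS);
        pthread_create(&threads[t], NULL, work_on_slice, &slices[t]);
    }
    for (t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);

    printf("done\n");
    return 0;
}

Nothing exotic: the parallelism is visible at a glance, and the inner loop reads exactly like the serial version.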
Supposedly the IBM XL C compiler can do code partitioning and provide software-cache support (e.g. accesses to variables, arrays, etc. would ensure that the memory is present when it’s needed), where code is dynamically swapped in and out of the SPU as needed. There are more documents about it here:
http://www-128.ibm.com/developerworks/edu/pa-dw-pa-cbecompile5-i.ht…
http://cag.csail.mit.edu/crg/papers/eichenberger05cell.pdf#search=~…
There is a very long distance between initial results in a research paper and a usable, mature implementation. The delay is only a matter of years, if you’re lucky. There is nothing in the article that indicates anything like this is ready to ship in the foreseeable future.