Six months ago N. Blachford wrote an article based on the 2002 Cell processor patent application. Since that original document, the Cell has evolved considerably in both hardware and software. This new version has been rewritten to cover those changes. New sections have also been added covering Cell software development and the design of the architecture.
I told EVERYBODY that all the Cell hype was just that, hype!
I was thinking about what I'd do if I wanted to start a gaming console company.
I would stay away from all fancy new ‘expensive’ technologies that don’t provide much horsepower.
I would go with TWO graphics cards in the box and a regular processor for AI and physics.
I want tons of triangles and polygons!
Any thoughts on this? The Xbox and Cell seem way overrated.
I would go with TWO graphics cards in the box and a regular processor for AI and physics.
Why not go with a dual GPU, or a dual-core GPU (if dual-core is possible), instead of two graphics cards? Don't you think two graphics cards (SLI) is flawed logic?
I’m sure it seemed nice, but how would you use two graphics cards to process a single scene? That flaw kind of destroys your entire plan.
Alienware sells a computer with two graphics cards. IIRC, one renders the top half of the screen and the other renders the bottom.
The same way that 3dfx did, and now nVidia and ATi do.
I haven’t yet read the article, but if it’s anything like the first version it will be an interesting read.
Now to the graphics. As others already mentioned, you can use 2 NVIDIA cards (SLI), but the effect is quite minimal, because one card renders the upper half of the screen while the second renders the lower half. So if you play an FPS, card #1 has to render the grass, trees, water… while card #2 only renders the sky – yet card #2 must wait until card #1 is finished.
A better solution (especially for consoles) is to use 2 cards, each rendering the odd or even lines (consoles and TVs render in interlaced mode, so this isn't that much of a hassle). This would then really bring nearly double the speed of a single card. Unfortunately the parent's idea is still flawed, because games still rely heavily on the CPU (a lot of the graphics work is done on the CPU as well) – couple that with sound, physics, AI… and you're back to the expensive console we have today.
SFR does not work as you have outlined; the clip-region distribution is load balanced from frame to frame. The alternative you suggest (3dfx's Scan-Line Interleave) suffers from problems: it causes artifacts, it makes implementing anti-aliasing problematic, it makes dealing with shaders problematic, and with modern functionality it probably wouldn't be worth implementing. It should also be noted that televisions haven't been interlaced-only for years. That little p after the vertical resolution means progressive.
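To make the load-balancing point concrete, here is a minimal sketch in plain C. The per-line cost model is completely made up (the bottom half is assumed ~3x heavier than the sky), and this is not NVIDIA's or 3dfx's actual algorithm; it just shows how a split-frame renderer can migrate the clip boundary toward the busier half until both GPUs do roughly equal work:

```c
/* Hedged sketch of load-balanced split-frame rendering (SFR).
 * The per-line cost model is invented purely for illustration:
 * the bottom half (grass, trees, water) is ~3x heavier than the sky. */
#include <stdio.h>

#define FRAME_HEIGHT 720

static double line_cost(int y)            /* hypothetical render cost of one scanline */
{
    return (y < FRAME_HEIGHT / 2) ? 1.0 : 3.0;
}

static double region_cost(int y0, int y1) /* total cost of one GPU's clip region */
{
    double c = 0.0;
    for (int y = y0; y < y1; y++)
        c += line_cost(y);
    return c;
}

int main(void)
{
    int split = FRAME_HEIGHT / 2;          /* start with a naive 50/50 split */

    for (int frame = 0; frame < 200; frame++) {
        double top    = region_cost(0, split);             /* GPU 0's work */
        double bottom = region_cost(split, FRAME_HEIGHT);   /* GPU 1's work */

        /* Nudge the boundary toward whichever GPU finished earlier,
           so the next frame is split more evenly. */
        if (top > bottom && split > 1)
            split--;
        else if (bottom > top && split < FRAME_HEIGHT - 1)
            split++;
    }

    printf("split settles at line %d of %d\n", split, FRAME_HEIGHT);
    return 0;
}
```

With this toy cost model the split drifts from line 360 down to line 480, at which point both halves cost the same, which is the whole point of balancing clip regions rather than fixing them at 50/50.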
Using two actual “cards” is a waste of money for a console. It wouldn’t be cheaper than the RSX (the PS3 will cost about as much as a single high-end desktop nVidia-based card for consumers, and the RSX is more capable). Even with volume discounts or other agreements, you’re probably better off just purchasing chips from nVidia and integrating them into your console yourself; it’ll save a lot on space, RAM, third-party overhead, etc. Still, this is a costly design.
Yeah, plain wrong assumption about two graphics cards. A game is much more than tons of triangles and polygons.
The PS3's initial cost will be $399 (that's the complete PS3, not just the Cell, and it includes Blu-ray, 1 Gbit Ethernet, a controller, etc.). Now… which two graphics cards can you actually get for that money? And then how will you solve the I/O throughput problems? A GPU is faster than a CPU in some (SOME is the important word here) cases, but here's the catch (or two):
1. Being able to run a game involves a high I/O load; with cheap hardware you get low performance.
2. Cell is designed to take the same inroads as a GPU in the cases where a GPU excels.
As for me: I can't wait for March.
This time it really sounds convincing. I really hope he's right and the world will embrace a good new technology. I want to believe it, but I lack faith.
Honestly, the Cell seems all well and good, but do we really need all that power? Honestly, graphics alone do not make great games. Graphics power is a marketing outlet, no more, no less. Sure, it is nice, but would you pay more for the same game but prettier? I don't think so. The new consoles ARE BEING PUSHED TOO EARLY. I honestly couldn't care whether the Xbox has better graphics than the PS2, nor will I care whether the PS3 or the X360 outperforms the other. It's the games that matter. Now, I know the Cell is not just for the console, but graphics seems to be one of its strengths (if I'm not mistaken… if I am, please correct me). The Cell is not designed specifically for general computing, that is for sure, but I would suppose it would be sufficient for all other tasks too.
The Cell is a step forward in chip design, that is unquestionable, but it is the first generation of these types of chips (multi-core processing, more specifically more than one or two cores). Don't knock it as JUST hype. It is what it is: the first generation of next-generation processors.
http://en.wikipedia.org/wiki/Ps3#Graphics_processing_unit
The PS3 relies on a custom-designed GPU, not the Cell, for graphics.
The new consoles ARE BEING PUSHED TOO EARLY.
How? The PS2 was released about 6 years after the PS1 (2000), and the PS3 will be released 6 years after the PS2 (2006). Six years is quite a long lifetime for a console. The SNES, which by all accounts was a long-lived console, had a lifetime of about that long, and even then only because the N64 was delayed a year. Even the NES only lasted 7 years (5 in the US).
Dual cards are less expensive than all the PS3 custom stuff. I suppose 2 GPUs could work too.
http://reviews.cnet.com/4520-8902_7-5582790.html
http://www.slizone.com/page/home.html
Am I the only one who reacted to the totally braindead abuse of XHTML+CSS in this article?
Am I the only one who reacted to the totally braindead abuse of XHTML+CSS in this article?
It's auto-generated; unfortunately, the auto-generator (Apple's Pages)* leaves something to be desired.
It's far too much hassle to do something of this length by hand in HTML.
*Pages is otherwise a very nice app.
Just a thought, but with two graphics cards, wouldn't it be easier to have head-to-head games with two TVs? Two video-out ports in one box? Overkill?
From the article: All systems are limited by their slowest components [Amdahl's law], Cell was designed not to have any slow components!
what?
WHAT?!?!
I'm sorry, people who use arguments like that get sorted into the loony bin. Goodbye.
I think what he means was that Cell was designed not to have any obvious bottlenecks. Cell is a pretty specialized architecture, so IBM designed each piece to be as fast as it needed, but no faster. Hence the 25 gigabytes per second of low-latency RAM, and the hundreds of gigabytes per second of internal bandwidth.
For those who have a special interest in other such comical logic, I can suggest reading the article. It's quite full of 'it'.
I wonder if these SPEs could be used to do real-time ray tracing to provide 3D acceleration.
I guess the biggest problem with trying to do this is that the PPE might not have enough horsepower to coordinate such a task.
I guess we’ll have to see.
Writing this article was definitely a LOT of work, and it provides many insights.
Thanks for sharing your knowledge!
That article has one major flaw. The article discusses performance at 4 GHz but PS3 will be 2.4 GHz.
I thought the PS3 will be at 3.2 GHz?
My mistake. I checked it and it runs at 3.2 GHz indeed.
> Both IBM and Intel have discovered this rather publicly, try buying a 3 GHz G5 or a 4 GHz P4.
Recall, the P4's ALUs (dual issue) are already double-pumped relative to the marketed MHz (e.g. a Pentium 4 "@2.0 GHz" has effectively 4 GHz ALUs).
> The compiler can in effect do the job of OOO hardware, Intel's Itanium has clearly shown just how effective a compiler can be at doing this.
What you didn't remember is that the Itanium can issue 6 instructions per cycle (from issuing two "bundles"). The compiler maximises this potential. Dual instruction issue per cycle on the PPE doesn't have the same potential as the Itanium.
> This has allowed them to produce a processor with 9 cores while the rest of the industry is shipping just 2
A flawed comparison, since you shouldn't compare what ships in Q2 2005 to what ships in ~2006. Cell is not available to end users today (23 July 2005); the same is true of quad-core AMD64. Recall Intel's 16-core IXP2800 (XScale based).
Once again the same story: a clever compiler is expected to replace a healthy part of the silicon. In fact, this is an attempt to circumvent Moore's law; they hope to squeeze more computing power out of the present technology. This may work in embedded, single-purpose systems, but for general-purpose computing (the desktop) the adoption time would be too long. In the meantime Cell will become obsolete through the natural development of conventional hardware. Compare the Itanium and AMD64.
Well… it forces programmers to optimize their code, and this seems to work at some level.
Branch prediction has a cost too; don't forget that mispredicted branches slow things down as well.
Remember when the P4 was introduced: no compiler could optimize code for it, and it seemed slow.
People seem to forget some old facts.
it forces programmers to optimize their code
It forces programmers to micro-optimize their code, which is an enormous waste of time. What strikes you as a better use of time: spending weeks or months doing instruction-scheduling-level optimizations that the CPU could do for you anyway, or using that same amount of time to implement better algorithms? Of course, the non-programmers will say that every program should use the best algorithms and be written in hand-tuned ASM, but that's rarely achievable. The real world never works like that. Of all the mutually conflicting concerns that affect a program, namely performance, stability, maintainability, and time constraints, it's the last one that is most strict.
Now, this sort of thing is okay for Cell, because it's an embedded processor. Programmers in such environments really have no choice in the matter, because the console will last for five or six years unchanged. In more general programming fields, programmers probably won't stand for a processor that they have to babysit so much. Why do you think programmers love the Opteron so much? It's because it's forgiving! It'll perform well on most reasonably written code. You can't say the same for the Itanium or the Pentium 4.
Nicholas Blachford obviously doesn't know anything about processor architecture. The article is so full of factual errors and flawed logic it's not even funny.
Running general-purpose code on the SPEs? Is that a joke? They don't have any branch prediction at all, for god's sake.
Branch prediction not important? WTF? In general-purpose code, about 20% of instructions are branches. This means that on a moderately pipelined processor there will be a pipeline bubble of around 5 instructions every fifth instruction, meaning it will be stalled on branches 50% of the time. The SPE's pipeline is probably deeper than that; I'd guess it will be stalled 2/3 of the time on branches alone.
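For what it's worth, that 50% figure falls straight out of the arithmetic above; here is a tiny back-of-envelope check in C, using the same assumed numbers (20% branch density, a 5-cycle bubble, no prediction at all), not measured data:

```c
/* Back-of-envelope stall estimate using the assumptions above:
 * ~1 branch in 5 instructions, ~5-cycle bubble, no branch prediction. */
#include <stdio.h>

int main(void)
{
    double branch_fraction = 0.20;   /* assumed branch density        */
    double bubble_cycles   = 5.0;    /* assumed flush cost per branch */

    double cpi   = 1.0 + branch_fraction * bubble_cycles;  /* cycles per instruction */
    double stall = (cpi - 1.0) / cpi;                       /* fraction of cycles stalled */

    printf("stalled %.0f%% of the time\n", stall * 100.0);  /* prints 50% */
    return 0;
}
```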
Then there is the fact that it lacks result forwarding which means even more bubbles.
It will be much faster to run general purpose code on the PPE itself than trying to distribute it on 8 dog-slow vector processors.
Nicholas, please stop writing articles about subjects you don’t know anything about. You are just making a fool out of yourself.
For those of you obsessing over the "lack" of branch prediction, there are two considerations:
1) the differences in architecture are significant enough to make prediction real estate less necessary.
2) IBM have implemented branch hinting, to cater for those situations where a lot of branching is necessary.
Branch hinting will be handled by the compiler, or the programmer – assuming the programmer is willing to code at such low levels.
😉
> Branch hinting will be handled by the compiler, or the programmer
Modern x86 processors also include branch hinting instructions, btw…
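As a concrete (if hedged) illustration of the "compiler or programmer" route: GCC's `__builtin_expect` is the usual way a programmer feeds a static hint to the compiler. Whether the toolchain turns it into an SPE hint-for-branch instruction or an x86 hint prefix is entirely up to the target backend, so treat this as a sketch rather than a guarantee:

```c
/* Sketch of programmer-supplied static branch hints via GCC's
 * __builtin_expect; the compiler decides how (or whether) to turn
 * the hint into a target-specific hint instruction. */
#include <stdio.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static long sum_positive(const int *v, long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++) {
        /* Hint that positive values are the common case, so the compiler
           can keep the hot path on the fall-through side. */
        if (likely(v[i] > 0))
            sum += v[i];
    }
    return sum;
}

int main(void)
{
    int data[] = { 3, -1, 4, 1, -5, 9 };
    printf("%ld\n", sum_positive(data, 6));   /* prints 17 */
    return 0;
}
```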
For Jesus' sake, the SPEs have the shortest pipeline of any big CPU available today…
The thing probably has the branch cost of an old ARM. An SPE simply cannot produce the kind of disaster a P4 pipeline does when it mispredicts a branch!
The SPE pipeline is actually quite long. The penalty for a branch misprediction is 18 clock cycles.
Nicholas Blachford obviously doesn't know anything about processor architecture. The article is so full of factual errors and flawed logic it's not even funny.
If that were the case the reviewers would have told me; they didn't.
Firstly, thanks for the article, Nicholas.
Secondly, I really like the design of this thing. It seems like a lot of advantage has been taken of the lack of legacy. In particular, cell should be a lot more predictable than x86 machines, in that so much guesswork has been removed. x86 processors make guesses about memory caching, branch execution etc, all of which have the capacity to really screw things up.
Cell sounds like a much more rational design, by way of removing the need for the complications that other processors have gathered over the years.
Just google ‘branch hint prefix instruction X86 3Dnow’….
I think the best thing about the Cell is that people actually looked at nature and found something quite simple and practical (check the topic "The Future: Multi-Cell'd Animals" on page 3).
Maybe if people looked at nature more often they would design more beautiful stuff like Cell. Open your eyes.
Nature is perfect.
This guy claims to have read “the original Cell processor patent application” which he now cites, a claim that indicates that he missed four-fifths of the information provided by Sony. There were FIVE patent applications, not one, all filed on Mar. 22, 2001, all but one published on Sep. 26, 2002, and all five relating to a technology for “processing streaming data” of “uniform software cells.”
The patents block out a type of hardware for this processing, but essentially it is little different from other stream processors, whether academic (Stanford's Merrimac/Imagine, MIT Cheops, Berkeley VIRAM) or commercial (IBM TRIPS, Philips, TI), and even whole companies (Streaming Processors, Inc.). What Sony really covers in their patents is the idea that any processor on the "wire" (i.e., a worldwide broadband network) has the capability of performing the computation of any packet sent by any similar machine out onto the wire.
Thus a standalone, single "Cell" system (say a PS3) will not "rock, dudz!!!" or whatever the OSNews articles claim. If the machine is not connected to the network, then it is not "playing". A better word for the hardware system is "hive", where the processors, all connected via broadband, are part of a huge (literally worldwide) homogeneous hive, and any with idle CPU cycles can honor a computation request by any software cell that streams past.
It’s a big gamble for Sony (world domination usually is) and it won’t happen until a critical mass of hardware cells are all connected to the network and always on, that is, always available for all the other cells.
Read the patents (that word is plural): software cells. Software, not hardware. Look to the Aya OS; this is, after all, OSNews.
Clustering game consoles over the Internet doesn't work. Sony knows it. Full stop.
It forces programmers to micro-optimize their code, which is an enormous waste of time.
That is why I am not enthusiastic about the Cell in less specialized environments such as the desktop. I am sure it will do some things very well. Based on some of the commentary from game developers working on games for Cell-based systems, though, there is some give and take versus conventional CPUs.
Maybe eventually the Cell will become the be-all-end-all solution, but not any time soon.
http://www.osnews.com/story.php?news_id=8881
Same quality. Nothing happened…
Totally wrong predictions, same as now.
Cell is for consoles, HDTV and so on.
Nothing else.
The rest is using Power.
See developerWorks at IBM and the several publications there regarding the Cell.
Links? Well, you should know how to find them at IBM developerWorks… 😉
“IBM have announced a blade system made up of a series of dual Cell “workstations”. The system is rated at up to 16 TeraFlops, which will require 64 Cells.”…
Well, where?
They did not comment on any Cell-based system other than the PS3 or some announced systems from the Cell partners, like Sony and Toshiba.
There is no Cell workstation. There are two blade prototypes. Nothing more? Nothing more…
Another try?
http://www.osnews.com/story.php?news_id=8881&page=3#prediction
Well, the G6, aka the PPC 980, is dead with Apple's switch to Intel. Well, there would be a POWER6, and there is definitely the Cell (we will see how fast it really is). But nothing more anymore.
You see, it's a game of defending a beloved processor.
Same quality. Nothing happened…
Totally wrong predictions, same as now.
Have you even read the article?
“IBM have announced a blade system made up of a series of dual Cell “workstations”. The system is rated at up to 16 TeraFlops, which will require 64 Cells.”…
Well, where?
Try Google; it was all over the web months ago.
There is no Cell workstation. There are two blade prototypes. Nothing more? Nothing more…
The “workstation” and the “blade” are the same machine.
Well, the G6, aka the PPC 980, is dead with Apple's switch to Intel. Well, there would be a POWER6, and there is definitely the Cell (we will see how fast it really is). But nothing more anymore.
There are another 2 1/2 years to go; we don't know what's still in development.
You see, it's a game of defending a beloved processor.
I have an interest in a certain family of processors and write about them. There are a zillion x86 sites out there, so there's really not much I could add to their coverage. Hardly "beloved".