Linked by Thom Holwerda on Thu 26th Aug 2010 23:24 UTC
At the Hot Chips 2010 conference, IBM announced its upcoming z196 CPU, which is really, really fast. How fast? Fastest chip in the world fast. Intended for Z-series mainframe computers, the z196 has a clock speed of 5.2GHz. Measuring just 512 square millimeters, the z196 is fabricated on 45nm PD SOI technology and contains almost one and a half billion transistors. My... processor is bigger than yours.
Thread beginning with comment 438451

Member since:
2010-03-08

A first issue related to multicore is that if the input of task N in an algorithm depends on the output of task N-1, you're screwed. This prevents many nice optimizations from being applied.

A purely mathematical example: prime factorization of integers from 1 to 10000.
The first algorithm that comes to mind is...

For N from 1 to 10000
..For I from 1 to N
....If I divides N then
......Store I in divisors of N

This algorithm can be scaled across multiple cores quite easily (just split the first for loop). But in order to waste a lot less processing power when N grows large, we may be tempted to use this variation of the algorithm...
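A minimal Python sketch of "just split the first for loop", assuming the brute-force divisor search above. The names `divisors_in_range`, `split_range`, and `all_divisors` are made up for illustration. Threads are used only to keep the sketch self-contained; in CPython a CPU-bound job like this would probably need `ProcessPoolExecutor` passed as `executor_cls` to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def divisors_in_range(lo, hi):
    """Brute-force divisors of every N in [lo, hi]; no N depends on another."""
    return {n: [i for i in range(1, n + 1) if n % i == 0]
            for n in range(lo, hi + 1)}


def split_range(lo, hi, parts):
    """Cut [lo, hi] into at most `parts` contiguous chunks."""
    size = -(-(hi - lo + 1) // parts)  # ceiling division
    return [(start, min(start + size - 1, hi))
            for start in range(lo, hi + 1, size)]


def all_divisors(limit, workers=4, executor_cls=ThreadPoolExecutor):
    """Run the outer loop as independent chunks and merge the results.

    Assumes limit >= 1.
    """
    chunks = split_range(1, limit, workers)
    lows = [c[0] for c in chunks]
    highs = [c[1] for c in chunks]
    result = {}
    with executor_cls(max_workers=workers) as pool:
        # Each chunk is handled on its own; the order of completion is irrelevant.
        for partial in pool.map(divisors_in_range, lows, highs):
            result.update(partial)
    return result
```

Because no chunk reads another chunk's output, the merged result is the same as a single serial pass, whatever order the workers finish in.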

For N from 2 to 10000
..For I from N-1 to 1
....If I divides N then break
..Add N/I to prime factors of N
..Add prime factors of I to prime factors of N

...which can't be scaled across multiple cores that way, because it relies on the order in which the Ns are enumerated!
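A small Python sketch of this kind of dependency, written in terms of prime factors. It is an adaptation of the idea, not a transcription of the pseudocode, and `prime_factor_table` is a hypothetical name. Each entry is built from the entry for a smaller number, so the table has to be filled in ascending order.

```python
def prime_factor_table(limit):
    """Prime factors (with multiplicity) of every integer from 1 to limit.

    Entry n reuses the already-computed entry n // p, where p is the
    smallest prime factor of n, so the loop order matters.
    """
    factors = {1: []}
    for n in range(2, limit + 1):
        # The smallest divisor greater than 1 is always prime.
        p = next(d for d in range(2, n + 1) if n % d == 0)
        # Depends on an earlier iteration's output.
        factors[n] = [p] + factors[n // p]
    return factors
```

Handing the range 5001 to 10000 to a second core would fail here as written: `factors[n // p]` would be looked up before the first core had produced it.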

Score: 2

Member since:
2007-04-20

A first issue related to multicore is that if the input of task N in an algorithm depends on the output of task N-1, you're screwed. This prevents many nice optimizations from being applied.

You just narrowed down to a basic leaf algorithm. I was talking about larger problems, i.e. where each task can be broken down into smaller sub-tasks, and then perhaps those sub-tasks can be broken down into smaller units.

Sure, there are some basic algorithms that are difficult to parallelise; however, the world is full of problems that can be broken down into smaller units.

Score: 1

Member since:
2010-03-08

You're right, that should work for problems where several compute-hungry, independent tasks run simultaneously, like gaming (where you handle graphics, sound, AI, physics, and more at the same time), as the Cell's success illustrates.

Score: 2