At the Hot Chips 2010 conference, IBM announced its upcoming z196 CPU, which is really, really fast. How fast? Fastest-chip-in-the-world fast. Intended for IBM's System z mainframes, the z196 runs at 5.2 GHz. Measuring 512 square millimeters, it is fabricated on 45 nm PD-SOI (partially depleted silicon-on-insulator) technology and packs almost one and a half billion transistors. My... processor is bigger than yours.

2010-03-08

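A first problem with multicore is that if, in an algorithm, the input of task N depends on the output of task N-1, you're screwed: task N can't start until task N-1 has finished, so the two can't run in parallel on separate cores. On a multicore machine, this rules out many otherwise nice optimizations.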

A purely mathematical example: finding the divisors of every integer from 1 to 10000.

The first algorithm that comes to mind is:

For N from 1 to 10000
    For I from 1 to N
        If I divides N then
            Store I in divisors of N
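A minimal Python sketch of this first algorithm, written so that any range of N can be handled on its own. The function names and the chunking scheme are hypothetical, not from the original post. Because no chunk reads another chunk's results, the chunks could be handed to separate worker processes (for example with Python's `concurrent.futures.ProcessPoolExecutor`); here they are simply run in reverse order to show that the order doesn't matter. The limit is lowered to 1000 to keep the demo quick.

```python
def divisors_naive(n):
    """Trial-divide n by every I from 1 to n, as in the pseudocode."""
    return [i for i in range(1, n + 1) if n % i == 0]


def divisors_for_range(lo, hi):
    """Handle every N in [lo, hi) independently of all other N."""
    return {n: divisors_naive(n) for n in range(lo, hi)}


def split_range(limit, chunks):
    """Cut 1..limit into `chunks` contiguous [lo, hi) pieces, e.g. one per core."""
    step = -(-limit // chunks)  # ceiling division
    return [(lo, min(lo + step, limit + 1)) for lo in range(1, limit + 1, step)]


LIMIT = 1000  # the post uses 10000; smaller here to keep the demo quick
pieces = split_range(LIMIT, 4)

# Process the chunks in reverse order: the merged result is the same.
merged = {}
for lo, hi in reversed(pieces):
    merged.update(divisors_for_range(lo, hi))

print(len(merged), merged[12])  # 1000 [1, 2, 3, 4, 6, 12]
```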

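This algorithm can easily be spread across multiple cores: just split the outer loop, since each N is handled independently of the others. But to waste less processing power as N grows, we may be tempted to reuse the divisors already found for smaller numbers, building the divisors of N from those of its largest proper divisor, with this variation of the algorithm...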

Set divisors of 1 to {1}
For N from 2 to 10000
    For I from N-1 down to 1
        If I divides N then break
    Add divisors of I to divisors of N
    Add each divisor of I, multiplied by N/I, to divisors of N

...which can't be split across cores the same way, because it relies on the order in which the Ns are enumerated: the divisors of I must already be known by the time N is processed!
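A Python sketch of the reuse variation, assuming each N's divisors are kept in a set (the function and variable names are hypothetical). It finds the largest proper divisor I by counting down from N-1; N/I is then the smallest prime factor of N, and every divisor of N is either a divisor of I or a divisor of I multiplied by N/I. Feeding it the Ns in descending order exposes the dependency: N = 1000 needs the divisors of 500, which haven't been computed yet.

```python
def divisors_reusing(order):
    """Build the divisors of each N in `order` from those of a smaller number.

    Works only if each N's largest proper divisor was processed before N
    (ascending order guarantees that). `order` must not contain 1.
    """
    divisors = {1: {1}}  # base case: 1 divides only itself
    for n in order:
        # Largest proper divisor I: count down from n - 1 until one divides n.
        i = n - 1
        while n % i != 0:
            i -= 1
        p = n // i  # smallest prime factor of n
        # Every divisor of n divides i, or is p times a divisor of i.
        # divisors[i] raises KeyError if i hasn't been processed yet.
        divisors[n] = divisors[i] | {p * d for d in divisors[i]}
    return divisors


LIMIT = 1000  # the post uses 10000; smaller here to keep the demo quick

ascending = divisors_reusing(range(2, LIMIT + 1))
print(sorted(ascending[12]))  # [1, 2, 3, 4, 6, 12]

# Same rule, descending order: N = 1000 needs the divisors of 500 first.
try:
    divisors_reusing(range(LIMIT, 1, -1))
    out_of_order_ok = True
except KeyError as missing:
    out_of_order_ok = False
    print("divisors of", missing, "not computed yet")  # divisors of 500 not computed yet
```

The ascending run gives the same sets as plain trial division; the descending run fails on its very first N.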