Linked by Thom Holwerda on Thu 26th Aug 2010 23:24 UTC
At the Hot Chips 2010 conference, IBM announced their upcoming z196 CPU, which is really, really fast. How fast? Fastest chip in the world fast. Intended for Z-series mainframe computers, the z196 has a clock speed of 5.2GHz. Measuring just 512 square millimeters, the z196 is fabricated on 45nm PD SOI technology, and on its surface contains almost one and a half billion transistors. My... Processor is bigger than yours.

Member since:
2007-04-20

What doesn't scale well across multiple cores? Give me a few examples. The signals fired by the human brain are pretty slow compared to what computers can do, however the brain does massive amounts of processing, because everything is wired in parallel with billions of connections.

If you can solve the parallel problem first, getting the individual processing units running at a faster rate will be the trivial task.

Member since:
2010-03-08

A first issue related to multicore is that if the input of task N in an algorithm depends on the output of task N-1, you're screwed. This prevents many nice optimizations from being applied.

A purely mathematical example: prime factorization of integers from 1 to 10000.
The first algorithm that comes to mind is...

For N from 1 to 10000
..For I from 1 to N
....If I divides N then
......Store I in divisors of N
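
Since each N in that loop is handled independently of every other N, the outer loop can be handed out to workers in any order. A minimal Python sketch of that idea, assuming the standard-library process pool; the function names, the worker count and the chunk size are made up for illustration:

```python
from concurrent.futures import ProcessPoolExecutor


def divisors(n):
    """Brute-force inner loop: every I in 1..n that divides n."""
    return [i for i in range(1, n + 1) if n % i == 0]


def all_divisors(limit=10000, workers=4):
    """Split the outer loop over N across worker processes.

    `workers` and `chunksize` are arbitrary illustrative values.
    """
    numbers = range(1, limit + 1)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # No N needs the result of another N, so the pool is free to
        # compute them in whatever order it likes.
        results = pool.map(divisors, numbers, chunksize=500)
        return dict(zip(numbers, results))
```

For example, divisors(12) gives [1, 2, 3, 4, 6, 12]. When run as a script, all_divisors() should be called under an `if __name__ == "__main__":` guard so that worker processes can re-import the module safely.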

This algorithm can be scaled across multiple cores quite easily (just split the first for loop). But in order to waste a lot less processing power when N grows large, we may be tempted to use this variation of the algorithm...

For N from 1 to 10000
..For I from N-1 to 1
....If I divides N then break
..Add I to divisors of N
..Add divisors of I to divisors of N

...which can't be scaled across multiple cores just by splitting the outer loop, because it relies on the order in which the Ns are enumerated!
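
To make the dependency concrete, here is one possible reading of that variation as a Python sketch. It is an interpretation, not the exact pseudocode: it assumes the inner search starts at N-1 (so that N itself is never picked) and it records prime factors rather than all divisors. The function name is hypothetical.

```python
def prime_factors_upto(limit=10000):
    """Map each n in 1..limit to its prime factors, with multiplicity."""
    factors = {1: []}
    for n in range(2, limit + 1):
        # Walk down from n-1 to find the largest proper divisor of n.
        i = n - 1
        while n % i != 0:
            i -= 1
        # n // i is the smallest prime factor of n. The remaining factors
        # are read from the entry already stored for i, so i must have
        # been processed before n: the loop order matters.
        factors[n] = [n // i] + factors[i]
    return factors
```

With this version, prime_factors_upto(12)[12] is [2, 2, 3]. The lookup of factors[i] is the part that stops the outer loop from being chunked naively: a worker handling N may need a result owned by another worker.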

Member since:
2007-04-20

A first issue related to multicore is that if the input of task N in an algorithm depends on the output of task N-1, you're screwed. This prevents many nice optimizations from being applied.

You just narrowed down to a basic leaf algorithm. I was talking about larger problems, i.e. where each task can be broken down into smaller sub-tasks, and then perhaps those sub-tasks can be broken down into smaller units.

Sure, there are some basic algorithms that are difficult to parallelise, however the world is full of problems that can be broken down into smaller units.

Member since:
2005-08-09

The brain is not very good at precise calculation of physical processes (and some such calculations are not multicore-friendly).