Linked by Thom Holwerda on Thu 26th Aug 2010 23:24 UTC

At the Hot Chips 2010 conference, IBM announced their upcoming z196 CPU, which is really, really fast. How fast? Fastest chip in the world fast. Intended for Z-series mainframe computers, the z196 has a clock speed of 5.2GHz. Measuring 512 square millimeters, the z196 is fabricated on 45nm PD-SOI technology and contains almost one and a half billion transistors. My... Processor is bigger than yours.

Thread beginning with comment 438451


RE[4]: Are they still stuck in GHz race?

by rom508 on Fri 27th Aug 2010 20:43
in reply to "RE[3]: Are they still stuck in GHz race?"

A first issue related to multicore is that if the input of task N in an algorithm depends on the output of task N-1, you're screwed. This prevents many nice optimizations from being applied.

You've just narrowed the discussion down to a basic leaf algorithm. I was talking about larger problems, where each task can be broken down into smaller sub-tasks, and those sub-tasks can perhaps be broken down into smaller units still.

Sure, there are some basic algorithms that are difficult to parallelise; however, the world is full of problems that can be broken down into smaller units.
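
This decomposition idea can be illustrated with a small sketch (mine, not code from the thread): a sum over a large list splits recursively into sub-tasks that share no data dependency, so each half could in principle be handed to a different core.

```python
# Illustrative sketch (not from the thread): a reduction that splits
# recursively into independent sub-tasks. The two halves share no
# data dependency, so a runtime could execute them on separate cores.
def parallel_sum(data, threshold=1000):
    if len(data) <= threshold:
        return sum(data)  # small enough: a "leaf" task, computed directly
    mid = len(data) // 2
    # Neither recursive call depends on the other's output.
    return parallel_sum(data[:mid], threshold) + parallel_sum(data[mid:], threshold)

print(parallel_sum(list(range(10001))))  # 50005000
```

The calls still run serially here; the point is only that the task tree contains no cross-dependencies, which is exactly what makes fork/join-style parallel runtimes applicable.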

RE[5]: Are they still stuck in GHz race?

by Neolander on Fri 27th Aug 2010 21:48
in reply to "RE[4]: Are they still stuck in GHz race?"


A first issue related to multicore is that if the input of task N in an algorithm depends on the output of task N-1, you're screwed. This prevents many nice optimizations from being applied.

A purely mathematical example: prime factorization of the integers up to 10000.

The first algorithm that comes to mind is...

For N from 2 to 10000

..Set M to N

..For I from 2 to M

....While I divides M

......Store I in prime factors of N

......Set M to M / I

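In runnable form (my own Python illustration, not code from the comment), each N is factorized independently by trial division, so the range splits into chunks that workers can process in parallel. A thread pool is used here for brevity; for a real CPU-bound speedup in CPython you would reach for processes instead.

```python
from concurrent.futures import ThreadPoolExecutor

def prime_factors(n):
    """Trial-division factorization of one N; independent of every other N."""
    factors, i = [], 2
    while i * i <= n:
        while n % i == 0:
            factors.append(i)
            n //= i
        i += 1
    if n > 1:
        factors.append(n)  # whatever is left over is prime
    return factors

def factorize_range(bounds):
    """One chunk of the outer loop: factorize every N in [lo, hi)."""
    lo, hi = bounds
    return {n: prime_factors(n) for n in range(lo, hi)}

# Split the outer loop over 2..10000 into four independent chunks.
chunks = [(2, 2501), (2501, 5001), (5001, 7501), (7501, 10001)]
results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    for part in pool.map(factorize_range, chunks):
        results.update(part)

print(results[9996])  # [2, 2, 3, 7, 7, 17]
```
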
This algorithm can be scaled across multiple cores quite easily (just split the outer For loop across them). But to waste a lot less processing power as N grows large, we may be tempted to use this variation of the algorithm...

For N from 2 to 10000

..For I from 2 to N

....If I divides N then break

..Add I to prime factors of N

..Add prime factors of N / I to prime factors of N

...which can't be scaled across multiple cores so easily, because each N reuses results already computed for smaller numbers: it relies on the order in which the Ns are enumerated!
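
As a runnable sketch of this dependent variant (mine, not the commenter's code): each N's factor list is assembled from the stored result of a strictly smaller number, so the iterations must execute in increasing order of N.

```python
# Sketch of the order-dependent variant: factorize N via its smallest
# factor and reuse the result already computed for N // i. The loop
# iterations cannot simply be handed to independent cores, because
# factors[n // i] must already exist when iteration n runs.
factors = {1: []}  # 1 has an empty factorization
for n in range(2, 10001):
    i = 2
    while n % i != 0:  # find the smallest factor of n (n itself if prime)
        i += 1
    # Data dependency: relies on an earlier iteration's output.
    factors[n] = [i] + factors[n // i]

print(factors[9996])  # [2, 2, 3, 7, 7, 17]
```

A sieve of smallest prime factors would remove the inner search, but not the ordering constraint, which is the commenter's point.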