Linked by Thom Holwerda on Thu 26th Aug 2010 23:24 UTC
At the Hot Chips 2010 conference, IBM announced its upcoming z196 CPU, which is really, really fast. How fast? Fastest-chip-in-the-world fast. Intended for Z-series mainframe computers, the z196 has a clock speed of 5.2GHz. Measuring just 512 square millimeters, the z196 is fabricated on 45nm PD SOI technology and packs almost one and a half billion transistors. My... Processor is bigger than yours.

Member since:
2010-03-08

This has been discussed a billion times already. Some things just don't scale well across multiple cores, if they scale at all. For example, in physics simulations, interactions between particles become a major nuisance once you have parallel processing in mind, because each core needs data that the other cores are working on (they are a nuisance for all kinds of calculations, anyway).

So, alongside the current trend toward hundreds of low-performance processor cores, making individual cores faster is still a good thing for some problems, as long as the bus bottleneck and some relativistic issues related to the size of electronic circuitry (signals need a finite time to cross a chip) can be worked around ^^

Member since:
2007-04-20

What doesn't scale well across multiple cores? Give me a few examples. The signals fired by the human brain are pretty slow compared to what computers can do, but the brain does massive amounts of processing because everything is wired in parallel, with billions of connections.

If you can solve the parallel problem first, getting the individual processing units to run faster will be the trivial part.

Member since:
2010-03-08

A first issue with multicore is that if, in an algorithm, the input of task N depends on the output of task N-1, you're screwed: the tasks have to run one after the other. This prevents many nice optimizations from being applied.

A purely mathematical example: finding all divisors of every integer from 1 to 10000.
The first algorithm that comes to mind is...

For N from 1 to 10000
    For I from 1 to N
        If I divides N then
            Store I in divisors of N
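A minimal Python sketch of this first algorithm, assuming the outer loop is cut into contiguous chunks that run on a thread pool (the function names and the chunking scheme are illustrative, not from the original post). No chunk ever reads another chunk's results:

```python
from concurrent.futures import ThreadPoolExecutor


def divisors_naive(n: int) -> list[int]:
    """All divisors of n by trial division (the inner loop of the first algorithm)."""
    return [i for i in range(1, n + 1) if n % i == 0]


def divisors_chunk(start: int, stop: int) -> dict[int, list[int]]:
    """Divisors of every N in [start, stop); needs nothing from other chunks."""
    return {n: divisors_naive(n) for n in range(start, stop)}


def all_divisors_parallel(limit: int, workers: int = 4) -> dict[int, list[int]]:
    """Split the outer loop 1..limit into at most `workers` contiguous chunks."""
    step = -(-limit // workers)  # ceiling division
    starts = list(range(1, limit + 1, step))
    stops = [min(s + step, limit + 1) for s in starts]
    result: dict[int, list[int]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is computed independently; we only merge at the end.
        for part in pool.map(divisors_chunk, starts, stops):
            result.update(part)
    return result
```

Threads keep the sketch simple; in CPython the GIL prevents threads from running this CPU-bound work in parallel, so `ProcessPoolExecutor`, which offers the same `map` interface, is what would actually spread it across cores.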

This algorithm can easily be spread across multiple cores (just split the outer loop). But to waste a lot less processing power as N grows large, we may be tempted to use this variation, which reuses the divisors already found for a smaller number...

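Divisors of 1 = {1}
For N from 2 to 10000
    For I from 2 to N
        If I divides N then break
    (I is now the smallest prime factor of N, and N/I < N)
    Add every divisor D of N/I to divisors of N
    Add I*D, for every divisor D of N/I, to divisors of N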

...which can't simply be split across multiple cores, because computing the divisors of N requires the divisors of a smaller number to be ready first: it relies on the order in which the Ns are enumerated!
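A minimal Python sketch of one correct way to implement this variation, assuming it finds the smallest prime factor p of N and builds the divisors of N from those of N/p (every divisor of N either divides N/p or is p times a divisor of N/p). The function name is illustrative. The line reading `divisors[n // p]` is the loop-carried dependency:

```python
def all_divisors_sequential(limit: int) -> list[list[int]]:
    # divisors[n] holds the sorted divisors of n; index 0 is unused.
    divisors: list[list[int]] = [[], [1]]
    for n in range(2, limit + 1):
        # Smallest I >= 2 that divides n; it is necessarily prime.
        p = next(i for i in range(2, n + 1) if n % i == 0)
        # n // p < n, so its divisors were computed in an EARLIER iteration.
        # This dependency is what prevents naively splitting the loop.
        smaller = divisors[n // p]
        divisors.append(sorted(set(smaller) | {p * d for d in smaller}))
    return divisors
```

Since n // p is at most n / 2, a second core handed N = 5001..10000 would have to wait for the first core to finish 1..5000 before it could do any useful work.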

Member since:
2005-08-09

The brain is not very good at precise calculation of physical processes (and some such calculations are not friendly to multicore).