Linked by Thom Holwerda on Thu 26th Aug 2010 23:24 UTC
IBM At the Hot Chips 2010 conference, IBM announced its upcoming z196 CPU, which is really, really fast. How fast? Fastest chip in the world fast. Intended for z-series mainframe computers, the z196 has a clock speed of 5.2GHz. Measuring 512 square millimeters, the z196 is fabricated on a 45nm PD-SOI process and packs almost one and a half billion transistors. My... Processor is bigger than yours.
Thread beginning with comment 438429
Are they still stuck in GHz race?
by rom508 on Fri 27th Aug 2010 12:01 UTC

No matter how fast your processor is, you can only go as fast as the slowest path in your system, and in most cases that path is RAM. A faster processor makes a big difference when your data set fits entirely in the CPU's cache memory. However, since these processors are targeted at large database systems, there will be a lot of RAM traffic, and the super-fast 5.2GHz CPU will spend much of its time just waiting.

I don't know what tricks IBM's engineers used to hide RAM latency; I'm just pointing out the most obvious bottleneck such a system will have. The more you raise CPU clock speed, the closer you get to the point of diminishing returns. The future is parallel processing, and not just instruction-level parallelism, but parallelism all the way up the stack to operating systems and user applications.
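A quick back-of-the-envelope sketch of what "just waiting" costs (the ~100ns DRAM round-trip latency here is an assumed ballpark figure, not a published z196 number):

```python
# Rough cost of a main-memory stall at 5.2 GHz.
# The ~100 ns DRAM latency is an assumed ballpark, not an IBM spec.
clock_hz = 5.2e9
dram_latency_s = 100e-9

stall_cycles = clock_hz * dram_latency_s
print(f"{stall_cycles:.0f} cycles lost per cache miss")  # 520 cycles
```

At roughly 520 cycles per miss, even a small miss rate dominates execution time, which is the point being made above.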

Reply Score: 2

helf

It's IBM. I'm sure they planned for memory latency.

Reply Parent Score: 2

tylerdurden

Not really, the whole dataset does not have to fit in cache.

Technically, the slowest elements in most computing systems are the disk and the network interfaces.

Very efficient branch prediction subsystems, and nicely tuned caches can boost the utilization of the processor significantly, without having to "fit the whole dataset in cache." Things like simultaneous multithreading also help improve resource utilization.

Believe it or not, IBM has excellent architecture teams... so the system will be well tuned to keep the processor "utilized."
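To see why the whole dataset doesn't have to fit in cache, here's a sketch of the standard average-memory-access-time (AMAT) calculation; the latency figures are assumed ballpark values, not z196 measurements:

```python
# AMAT sketch: even a modest cache hit rate hides most of the
# DRAM latency. Latencies below are assumed ballpark figures.
def amat(hit_rate, cache_ns, mem_ns):
    return hit_rate * cache_ns + (1 - hit_rate) * mem_ns

print(amat(0.95, 2.0, 100.0))  # 6.9 ns -- far closer to cache than to DRAM
```

A 95% hit rate already pulls the average access down near cache speed, which is why good caches plus prediction beat "fit everything in cache."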

Reply Parent Score: 3

rom508

Well, that depends on the locality of your data and your cache size. If you have a large cache and good data locality, the cache will hide memory latency. This is what IBM did:

"A 4-node system is equipped with 19.5MB of SRAM for L1 private cache, 144 MB for L2 private cache, 576MB of eDRAM for L3 cache, and massive 768MB of eDRAM for a level-4 cache."

That is a lot of cache, spread over many levels. You can probably fit a large portion of a working data set into all that cache memory.
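Just adding up the quoted 4-node figures makes the point:

```python
# Summing the quoted 4-node cache hierarchy (all sizes in MB).
l1, l2, l3, l4 = 19.5, 144, 576, 768
total_mb = l1 + l2 + l3 + l4
print(total_mb)  # 1507.5 MB -- roughly 1.5 GB of cache across four levels
```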

Reply Parent Score: 2

Neolander

This has been discussed a billion times already ;) Some things just don't scale well across multiple cores, if they scale at all. For example, in physics simulations, interactions between bodies become a major nuisance when you have parallel processing in mind (they are a nuisance for all kinds of calculations anyway).

So along with the current trends towards hundreds of low-performance processor cores, making individual cores faster is still a good thing for some problems, as long as the bus bottleneck and some relativistic issues concerning the size of electronic circuitry can be worked around ^^
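Those "relativistic issues" are easy to put a number on: in one clock period, a signal travelling at the speed of light covers only a few centimetres, and real on-chip wires are slower still.

```python
# How far can a signal travel in one 5.2 GHz clock period?
c = 299_792_458          # speed of light in vacuum, m/s
f = 5.2e9                # clock frequency, Hz
distance_cm = c / f * 100
print(f"{distance_cm:.2f} cm per cycle")  # ~5.77 cm; real wires manage less
```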

Reply Parent Score: 2

rom508

What doesn't scale well across multiple cores? Give me a few examples. The signals fired by the human brain are pretty slow compared to what computers can do, yet the brain does massive amounts of processing because everything is wired in parallel, with billions of connections.

If you can solve the parallelism problem first, getting the individual processing units to run faster will be the trivial part.

Reply Parent Score: 2

bert64

Mainframes traditionally had relatively slow processors coupled to memory that was more than able to keep up with them...
It's low-end machines with faster processors that are severely bottlenecked by memory speed, because buyers look at processor speed and don't consider other aspects of the system...
I would be extremely surprised if IBM hadn't designed a suitably fast memory subsystem to go with this processor.

Reply Parent Score: 2

leech

Wow, while everyone else went into scientific reasoning about why RAM is or isn't fast...

The slowest path in computing at this point in time isn't RAM, which is probably the second slowest; the slowest is the hard drive!

RAM keeps getting faster and faster. Hard drive technology, being mechanical, really hasn't changed all that much since the beginning.

It's sad that it has been the bottleneck on real speed for so long. If only SSD technology were FAR cheaper than it is. Then maybe we could finally start pushing the limits of the PCI(e/x) and memory buses.

Reply Parent Score: 2

cb88

Mainframes tend to have terabytes of RAM for exactly that reason; even my school's compute servers that students use have 64GB of RAM and 8 cores. Also, wasn't it just the other week that a group demonstrated sorting and archiving 1TB of data in a minute?

I believe I once saw a graph showing a strong correlation between the speed of AI computations and the size of the CPU cache. Such computations don't benefit much from faster access to data on a hard drive, but they benefit greatly from data that can be accessed quickly from cache. What I'm trying to say is that 1ms vs 100ms is still slow compared to 5-10ns or less.
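Putting rough numbers on that gap (these are assumed ballpark latencies, not measurements from any particular system):

```python
# Latency ratios with assumed ballpark figures, in nanoseconds.
hdd_ns = 10e6     # ~10 ms mechanical seek
ssd_ns = 100e3    # ~100 us flash read
cache_ns = 5      # ~5 ns on-chip cache hit

print(hdd_ns / cache_ns)  # 2,000,000x -- disk is millions of times slower
print(ssd_ns / cache_ns)  # 20,000x -- SSDs close some of the gap, not all
```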

Reply Parent Score: 1

tylerdurden

No, you are all missing the point.

It's not that a computer is only as fast as its slowest component. The whole point of computer architecture is to make the common case fast. Veeeeery different.

So yeah, boot times may not have progressed much, since they are constrained by the speed of the disk subsystem, which indeed is quite slow. But booting up is not the common case, is it? Running code is. And for the most part, modern computers tend to utilize their processors rather well: try running a modern game on an old P3. Things like databases are obviously more sensitive to I/O, but the machines used to run them are not necessarily comparable to a modern single-user PC.
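"Make the common case fast" is just Amdahl's law. A sketch with made-up illustrative fractions (not measurements of any real workload):

```python
# Amdahl's law: overall speedup from accelerating part of a workload.
# The 5%/90% time fractions below are illustrative assumptions only.
def amdahl(fraction, speedup):
    return 1 / ((1 - fraction) + fraction / speedup)

# Speeding up 10x the 5% of time spent on (say) booting barely helps:
print(round(amdahl(0.05, 10), 3))  # 1.047
# Speeding up 2x the 90% spent running code nearly doubles throughput:
print(round(amdahl(0.90, 2), 3))   # 1.818
```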

Reply Parent Score: 2