Linked by Thom Holwerda on Thu 26th Aug 2010 23:24 UTC
At the Hot Chips 2010 conference, IBM announced their upcoming z196 CPU, which is really, really fast. How fast? Fastest chip in the world fast. Intended for Z-series mainframe computers, the z196 has a clock speed of 5.2GHz. Measuring just 512 square millimeters, the z196 is fabricated on 45nm PD SOI technology, and contains almost one and a half billion transistors. My... Processor is bigger than yours.
Thread beginning with comment 438442
tylerdurden Member since:
2009-03-17

Not really, the whole dataset does not have to fit in cache.

Technically, the slowest element in most computing systems is either the disk or the network interface.

Very efficient branch prediction subsystems and nicely tuned caches can boost the utilization of the processor significantly, without having to "fit the whole dataset in cache." Things like simultaneous multithreading also help improve resource utilization.
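To make the branch prediction point a bit more concrete, here is a minimal sketch of a single 2-bit saturating counter, a textbook predictor. It is not a description of the z196's actual predictor; the function name `predict_accuracy` and the two branch patterns are made up for illustration.

```python
def predict_accuracy(outcomes):
    """Run one 2-bit saturating counter over a list of branch outcomes.

    States 0 and 1 predict "not taken", states 2 and 3 predict "taken".
    Returns the fraction of outcomes that were predicted correctly.
    """
    state = 2  # assumption: start in the "weakly taken" state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return correct / len(outcomes)


# Hypothetical patterns: a loop branch taken 9 times and then not taken once,
# repeated 100 times, versus a branch that strictly alternates.
loop_branch = ([True] * 9 + [False]) * 100
alternating = [True, False] * 500

print("loop branch accuracy:", predict_accuracy(loop_branch))
print("alternating accuracy:", predict_accuracy(alternating))
```

Even this tiny predictor gets the regular loop branch right most of the time, while the alternating pattern defeats it. Real predictors use history to handle patterns like the second one.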

Believe it or not, IBM has excellent architectural teams... so the system will be fairly well tuned to keep the processor "utilized."

Score: 3

rom508 Member since:
2007-04-20

Well, that depends on the locality of data and your cache size. If you have a large cache and good data locality, then this will hide memory latency. This is what IBM did:

"A 4-node system is equipped with 19.5MB of SRAM for L1 private cache, 144 MB for L2 private cache, 576MB of eDRAM for L3 cache, and massive 768MB of eDRAM for a level-4 cache."

That is a lot of cache over many levels. You can probably fit a large portion of your working data set into all that cache memory.
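As a rough illustration of how working set size and locality affect hit rate, here is a toy direct-mapped cache simulator. The cache size, line size and access pattern are arbitrary assumptions picked to keep the numbers simple; none of it models the z196 cache hierarchy, and `hit_rate` and `looped` are hypothetical helper names.

```python
def hit_rate(addresses, cache_bytes=32 * 1024, line_bytes=64):
    """Return the hit rate of a toy direct-mapped cache for a list of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines  # which memory line currently occupies each slot
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_lines
        if tags[index] == line:
            hits += 1
        else:
            tags[index] = line  # miss: fetch the line, evicting the old one
    return hits / len(addresses)


def looped(working_set_bytes, passes=10, stride=64):
    """Touch one address per cache line across the working set, several times over."""
    return [a for _ in range(passes) for a in range(0, working_set_bytes, stride)]


small = looped(16 * 1024)    # working set fits in the 32KB toy cache
large = looped(1024 * 1024)  # working set is 32x larger than the toy cache

print("16KB working set hit rate:", hit_rate(small))
print("1MB working set hit rate: ", hit_rate(large))
```

With the small working set only the first pass misses; with the large one every line is evicted before it is touched again, so a bigger cache (or better locality) is the only way to recover those hits.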

Score: 2

tylerdurden Member since:
2009-03-17

Where did you get your numbers? 12MB of L1? Whaaaa?

From IBM's specs:

L1 Instruction Cache = 64KB
L1 Data Cache = 128KB
L2 Shared I+D Level = 1.5 MB

The numbers you quoted for the L1, for example, amply exceed the total transistor budget for the whole chip quoted by IBM.
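For a rough sense of scale, the L1 figures in this exchange can be checked with simple arithmetic. The sketch below assumes a classic 6-transistor SRAM cell and ignores tags, ECC and control logic; those are assumptions for a back-of-envelope estimate, not IBM figures. The per-core sizes are the ones listed above, and the 19.5MB total is the quoted 4-node number.

```python
KB = 1024
MB = 1024 * KB

# Per-core L1 sizes from the spec list in this thread.
L1_INSTRUCTION = 64 * KB
L1_DATA = 128 * KB
L1_PER_CORE = L1_INSTRUCTION + L1_DATA

# L1 total from the quoted 4-node system description.
SYSTEM_L1_TOTAL = 19.5 * MB

# Assumption: 6 transistors per SRAM bit, data array only.
TRANSISTORS_PER_BIT = 6


def sram_transistors(size_bytes, per_bit=TRANSISTORS_PER_BIT):
    """Estimate the transistor count of an SRAM data array of the given size."""
    return int(size_bytes * 8 * per_bit)


# How many per-core L1 caches would add up to the quoted system total?
l1_copies = SYSTEM_L1_TOTAL / L1_PER_CORE

print("L1 per core (KB):", L1_PER_CORE // KB)
print("per-core L1 copies in 19.5MB:", l1_copies)
print("transistors, one core's L1:", sram_transistors(L1_PER_CORE))
print("transistors, 19.5MB of L1:", sram_transistors(SYSTEM_L1_TOTAL))
```

Under these assumptions the 19.5MB total divides evenly into 104 copies of the 192KB per-core L1, which would be consistent with the quoted figure being a system-wide sum rather than a per-chip number.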

As I said, things like finely tuned branch predictors make as much of an impact as huge caches. In fact, returns start to diminish for most cache sizes beyond a few megabytes.

Score: 2