Linked by Thom Holwerda on Mon 17th Sep 2007 20:51 UTC
Oracle and SUN "Sun announced Niagara 2 the other day, an evolution of the older Niagara 1, now called the UltraSPARC T1. From the 10000-foot view, it all looks quite familiar, but once you delve into the details, it quickly becomes apparent that almost everything has changed."
Thread beginning with comment 272161
Very interesting...
by JonathanBThompson on Tue 18th Sep 2007 00:35 UTC
JonathanBThompson Member since:
2006-05-26

It sounds so good that it leaves readers wondering what, besides a clock speed increase (which is likely stuck being tied to the FSB/RAM speeds in practice) and adding more cores (which, again, may not matter much if the system is already I/O pegged), they'll be able to do to evolve this puppy.

Perhaps in theory, with enough transistor budget, they might be able to add some out of order execution in there, but it appears that in-order execution with that many cores is far more practical and efficient for server purposes, so I'm wagering they'll not bother, because there would go the power efficiency, among other things.

Reply Score: 3

RE: Very interesting...
by spotter on Tue 18th Sep 2007 01:23 in reply to "Very interesting..."
spotter Member since:
2005-07-06

Well, the main differences between the T1 and T2 are more threads per core (not more cores), an FPU per core, 10G Ethernet on die, and crypto on die.

Coming up next is multi-socket support (codenamed Victoria Falls), which will end up with 128 or 256 threads/system (2 or 4 sockets, 64 threads/socket).

OOX could potentially be added, as could a deeper pipeline, I suppose. They could bring more on die as well, such as a larger cache.

Reply Parent Score: 2

RE: Very interesting...
by crystall on Tue 18th Sep 2007 08:08 in reply to "Very interesting..."
crystall Member since:
2007-02-06

besides a clock speed increase (which is likely stuck being tied to the FSB/RAM speeds in practice)


Since the UltraSPARC Tx architecture, like GPUs, trades latency for bandwidth, increasing the clock speed doesn't make sense if you are already saturating the memory subsystem. As long as enough bandwidth is available, you enjoy almost linear scaling from clock speed, but the scaling completely flattens once you reach the saturation point.
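To put rough numbers on that scaling argument, here is a minimal sketch of a roofline-style model. Everything in it is an assumption made up for illustration: the function name, the one-instruction-per-cycle-per-core figure, the average memory traffic per instruction, and the bandwidth value are not taken from any Sun documentation.

```python
def throughput_ginstr(clock_ghz, ipc_per_core, cores, bytes_per_instr, mem_bw_gbs):
    """Estimate sustained throughput in billions of instructions per second.

    Hypothetical model: the chip runs at whichever limit is lower, the
    compute rate (which grows with clock speed) or the rate at which the
    memory subsystem can feed it (which does not).
    """
    compute_bound = clock_ghz * ipc_per_core * cores
    memory_bound = mem_bw_gbs / bytes_per_instr
    return min(compute_bound, memory_bound)


if __name__ == "__main__":
    # Illustrative values only: 8 cores, 1 instruction/cycle/core,
    # 4 bytes of memory traffic per instruction, 48 GB/s of bandwidth.
    for clock in (1.0, 1.5, 2.0):
        rate = throughput_ginstr(clock, 1.0, 8, 4.0, 48.0)
        print(f"{clock:.1f} GHz -> {rate:.1f} G instr/s")
```

With these made-up values the model scales linearly up to 1.5 GHz and then stays flat, which is the saturation behaviour described above. A real workload would have a much messier relationship between instruction mix and memory traffic.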

Perhaps in theory, with enough transistor budget, they might be able to add some out of order execution in there


It doesn't make sense to add OoO execution to such an architecture simply because it doesn't need it. The UltraSPARC Tx is inherently optimized for throughput; all the latencies (memory stalls, branches, non-single-cycle instructions, etc.) are covered by switching threads. OoO execution would make the core significantly more complex with little or no benefit for such an architecture. Look, for example, at how 2-way dispatch has been implemented in the T2. A core cannot execute two instructions from one thread in the same cycle, but it can execute two instructions from two different threads, each one picked from one of the two thread groups. This eliminates any need for intra-thread dependency checking in the pick stage, and while it doesn't improve single-thread performance, it increases throughput significantly.
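The pick scheme described above can be sketched in a few lines. This is only a toy model under stated assumptions: the function name, the data structures, and the least-recently-picked ordering within a group are my own illustration, not a description of the actual T2 pick logic.

```python
from collections import deque


def pick(groups, ready):
    """Pick at most one ready thread from each thread group for this cycle.

    groups: list of deques of thread ids, least recently picked first
            (the ordering policy is an assumption for this sketch).
    ready:  set of thread ids that are not stalled this cycle.
    Returns the picked thread ids, one per group at most.
    """
    issued = []
    for group in groups:
        for tid in list(group):
            if tid in ready:
                # Move the picked thread to the back of its group's queue.
                group.remove(tid)
                group.append(tid)
                issued.append(tid)
                break  # never take a second thread from the same group
    return issued


if __name__ == "__main__":
    # Eight hypothetical threads split into two groups of four.
    groups = [deque([0, 1, 2, 3]), deque([4, 5, 6, 7])]
    print(pick(groups, {1, 2, 4}))  # one thread from each group
    print(pick(groups, {0, 1}))     # both ready threads share a group
```

Because the two issued instructions always come from different threads, the picker never has to compare them for register dependencies, which is the simplification the comment is pointing at. The cost is that a lone ready thread can never use both issue slots.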

Reply Parent Score: 2

RE: Very interesting...
by zdzichu on Tue 18th Sep 2007 12:25 in reply to "Very interesting..."
zdzichu Member since:
2006-11-07

As for OOX, Sun is adding a so-called scout thread. It analyzes the program ahead of execution and prefetches data.

Reply Parent Score: 1

RE[2]: Very interesting...
by John Bayko on Tue 18th Sep 2007 16:04 in reply to "RE: Very interesting..."
John Bayko Member since:
2006-10-20

"As for OOX, Sun is adding a so-called scout thread. It analyzes the program ahead of execution and prefetches data."

No, that's for the "Rock" processor, Sun's compute-oriented CPU line (the successor to the SPARC64 CPU being co-developed with Fujitsu), though I'm sure some successful features from the two lines will cross over eventually.

Reply Parent Score: 1