Linked by David Adams on Thu 11th Dec 2008 00:18 UTC, submitted by caffeine deprived
Benchmarks Japan's Tsubame supercomputer was ranked 29th-fastest in the world in the latest Top 500 ranking with a speed of 77.48 TFlops (trillion floating point operations per second) on the industry-standard Linpack benchmark. Why is it so special? It uses NVIDIA GPUs. Tsubame includes hundreds of graphics processors of the same type used in consumer PCs, working alongside CPUs in a mixed environment that some say is a model for future supercomputers serving disciplines like material chemistry.
tyrione
Member since:
2005-11-21

The cooling of that setup must be a delight.

Reply Score: 1

javiercero1 Member since:
2005-11-10

Well, the GPUs only take like 4 racks. I assume they are using another 4 racks for the host computers. 8 to 10 racks is not that bad for a system that puts out 77 TFlops.

Reply Score: 1

GatoLoko Member since:
2005-11-13

4 racks? Are you kidding?

"Tsubame itself - once you move past the air-conditioners - is split across several rooms in two floors of the building and is largely made up of rack-mounted Sun x4600 systems. There are 655 of these in all, each of which has 16 AMD Opteron CPU cores inside it, and Clearspeed CSX600 accelerator boards.

The graphics chips are contained in 170 Nvidia Tesla S1070 rack-mount units that have been slotted in between the Sun systems. Each of the 1U Nvidia systems has four GPUs inside, each of which has 240 processing cores for a total of 960 cores per system."

More info on http://www.pcworld.com/article/155242/.html?tk=rss_news

Reply Score: 2

javiercero1 Member since:
2005-11-10

They have 170 Tesla units, each a 1U rack unit. A standard rack tends to be 42 units.

Simple math says that about 4 racks could hold those 170U (4 x 42 = 168U, so just over four).

I am not sure they packed them that much. But I was referring to the GPUs alone. Which is indeed a remarkable density of computing.
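
The rack arithmetic in this exchange can be checked with a short sketch. The unit count, GPUs per unit, and cores per GPU come from the quoted article, and the 42U rack height from the post above; the function name is made up for illustration, and the estimate ignores switches, power distribution, and airflow gaps.

```python
import math

# Figures taken from the thread: 170 Tesla S1070 units, each 1U, 4 GPUs per
# unit, 240 cores per GPU, and a typical 42U rack.
TESLA_UNITS = 170
UNIT_HEIGHT_U = 1
RACK_HEIGHT_U = 42
GPUS_PER_UNIT = 4
CORES_PER_GPU = 240


def racks_needed(units, unit_height_u=UNIT_HEIGHT_U, rack_height_u=RACK_HEIGHT_U):
    """Smallest number of racks that holds every unit (hypothetical helper).

    Ignores switches, PDUs and any spacing left for cooling.
    """
    return math.ceil(units * unit_height_u / rack_height_u)


total_u = TESLA_UNITS * UNIT_HEIGHT_U
full_racks, leftover_u = divmod(total_u, RACK_HEIGHT_U)
total_gpus = TESLA_UNITS * GPUS_PER_UNIT
total_gpu_cores = total_gpus * CORES_PER_GPU

print(f"{total_u}U fills {full_racks} racks with {leftover_u}U left over")
print(f"strict rack count: {racks_needed(TESLA_UNITS)}")
print(f"{total_gpus} GPUs, {total_gpu_cores} GPU cores in total")
```

So "4 racks" is a fair round figure for the GPUs alone, though a strict count needs a fifth rack for the last 2U.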

Reply Score: 3

Not the future today
by bnolsen on Thu 11th Dec 2008 04:50 UTC
bnolsen
Member since:
2006-01-06

NVIDIA's GPUs really fall flat on double precision. The NVIDIA Tesla system advertises 1 TFlop SP and 80 GFlop DP. That means 64-bit doubles run at only 8% of the speed of 32-bit floats. And 32-bit floats are just about totally worthless for general scientific processing. An 8-core Power Mac 2 years ago could churn out ~40 GFlop DP. That was 2 Intel generations ago....
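
For reference, the 8% figure follows directly from the two peak numbers quoted in this post. The sketch below only restates the poster's figures (including the approximate 40 GFlop value for the Power Mac); none of them are independently verified here, and the function name is illustrative.

```python
# Peak figures as quoted in the post above (not independently verified).
TESLA_SP_GFLOPS = 1000.0    # "1TFlop SP"
TESLA_DP_GFLOPS = 80.0      # "80GFlop DP"
POWERMAC_DP_GFLOPS = 40.0   # "~40GFlop DP", the poster's rough estimate


def dp_to_sp_ratio(sp_gflops, dp_gflops):
    """Fraction of single-precision peak that double precision reaches."""
    return dp_gflops / sp_gflops


ratio = dp_to_sp_ratio(TESLA_SP_GFLOPS, TESLA_DP_GFLOPS)
tesla_vs_powermac = TESLA_DP_GFLOPS / POWERMAC_DP_GFLOPS

print(f"DP peak is {ratio:.0%} of SP peak")
print(f"Tesla DP peak is {tesla_vs_powermac:.1f}x the quoted Power Mac figure")
```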

The new cell successor from IBM looks far more interesting but it seems that IBM isn't interested in selling it (same mistake DEC made).

Reply Score: 4

RE: Not the future today
by javiercero1 on Thu 11th Dec 2008 07:42 UTC in reply to "Not the future today"
javiercero1 Member since:
2005-11-10

Please refrain from making blanket statements like "single precision FP is worthless". It turns out that for some applications, it is enough, and I am sure that is what these machines are being targeted at.

Reply Score: 3

RE[2]: Not the future today
by nsrbrake on Thu 11th Dec 2008 10:31 UTC in reply to "RE: Not the future today"
nsrbrake Member since:
2006-08-17

It was my understanding that NVIDIA was bringing native doubles with their latest hardware. The boards used were announced in 2007. Meanwhile, the vendor ClearSpeed says that they support both single and double precision through their API.

Reply Score: 1

RE: Not the future today
by drahca on Thu 11th Dec 2008 15:42 UTC in reply to "Not the future today"
drahca Member since:
2006-02-23

Not all scientific applications require 64-bit floating point support. Some algorithms are resilient enough for the 32-bit FP support to be sufficient, even with the restricted rounding modes. When you absolutely need 64-bit support, you can mix it with 32-bit floats where 64-bit is not necessary. And when you absolutely need 64 bits, remember that all that memory bandwidth (141.7 GB/sec) is available. Your 8% figure is only about raw execution resources and says nothing about actual application performance, which is of course totally dependent on the application. Considering supercomputers are not exactly cheap, they must have taken this into account somewhere.
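
One way to see why raw peak numbers can mislead is a rough roofline-style estimate. The sketch below pairs the 141.7 GB/sec bandwidth from this post with the 1 TFlop SP and 80 GFlop DP peaks quoted earlier in the thread (treating them as belonging to the same device is an assumption), and picks a simple AXPY update (y = a*x + y) as a hypothetical bandwidth-bound kernel. It is an illustration, not a measurement.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_element, bytes_per_element):
    """Roofline bound: the lower of peak compute and bandwidth * arithmetic intensity."""
    intensity = flops_per_element / bytes_per_element  # flops per byte moved
    return min(peak_gflops, bandwidth_gb_s * intensity)


BANDWIDTH_GB_S = 141.7   # from the post above
PEAK_SP_GFLOPS = 1000.0  # from earlier in the thread
PEAK_DP_GFLOPS = 80.0    # from earlier in the thread

# Assumed kernel: AXPY does one multiply and one add per element,
# and touches memory three times (read x, read y, write y).
AXPY_FLOPS = 2
AXPY_ACCESSES = 3

sp = attainable_gflops(PEAK_SP_GFLOPS, BANDWIDTH_GB_S, AXPY_FLOPS, AXPY_ACCESSES * 4)
dp = attainable_gflops(PEAK_DP_GFLOPS, BANDWIDTH_GB_S, AXPY_FLOPS, AXPY_ACCESSES * 8)

print(f"AXPY bound, single precision: {sp:.1f} GFlops")
print(f"AXPY bound, double precision: {dp:.1f} GFlops")
print(f"DP/SP for this kernel: {dp / sp:.0%}")
```

Under these assumptions both precisions are limited by memory traffic, so double precision comes out at about half the single-precision rate rather than 8%. A compute-bound kernel such as dense matrix multiplication would sit much closer to the raw peak ratio.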

"The new cell successor from IBM looks far more interesting but it seems that IBM isn't interested in selling it (same mistake DEC made)."

True. Cell is still very expensive. IBM cannot compete with COTS NVIDIA solutions: since the Cell in the PS3 does not support DP, the volumes of the DP-enabled Cell must be quite small.

Reply Score: 1

RE[2]: Not the future today
by evangs on Thu 11th Dec 2008 19:35 UTC in reply to "RE: Not the future today"
evangs Member since:
2005-07-07

"Not all scientific applications require 64-bit floating point support."

True, but the majority (if you ask me for a number I'd say >95%) of scientific applications that require a supercomputer will need 64-bit floating point support.

Reply Score: 2

RE[3]: Not the future today
by javiercero1 on Thu 11th Dec 2008 21:28 UTC in reply to "RE[2]: Not the future today"
javiercero1 Member since:
2005-11-10

Be careful, pulling numbers like that out of one's derriere is a bit counterproductive :-)

Reply Score: 2

RE[3]: Not the future today
by sbergman27 on Thu 11th Dec 2008 21:52 UTC in reply to "RE[2]: Not the future today"
sbergman27 Member since:
2005-07-24

"True, but the majority (if you ask me for a number I'd say >95%) of scientific applications that require a super computer will need 64 bit floating point support."

Remember that this is a hybrid system. It also has 10,480 Opteron cores (655 systems with 16 cores each). Parts of the job which require higher precision can use the Opterons, while parts which can get by with lower precision can use the GPUs. Also, as I understand it, GPUs are really great for vector processing, but really suck at most everything else. So they can be viewed as one more resource that the supercomputer application programmer has at hand which can be applied at his or her discretion, just like any other resource.
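
A well-known way to split work by precision like this is mixed-precision iterative refinement: do the expensive solve in 32-bit, then compute the residual and apply corrections in 64-bit. The sketch below is only an illustration of that idea and is not claimed to be what Tsubame applications do. Everything runs on the CPU, 32-bit arithmetic is emulated by rounding through `struct`, and the matrix, right-hand side, and function names are made up for the example.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_low_precision(a, b):
    """Gaussian elimination with partial pivoting, rounding each result to 32-bit."""
    n = len(b)
    m = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = f32(m[r][col] / m[col][col])
            for c in range(col, n + 1):
                m[r][c] = f32(m[r][c] - f32(factor * m[col][c]))
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n]
        for c in range(r + 1, n):
            s = f32(s - f32(m[r][c] * x[c]))
        x[r] = f32(s / m[r][r])
    return x


def residual(a, b, x):
    """b - A*x computed in full 64-bit precision (the 'high precision' part)."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def max_residual(a, b, x):
    return max(abs(v) for v in residual(a, b, x))


def refine(a, b, iterations=5):
    """Low-precision solves, corrected with 64-bit residuals."""
    x = solve_low_precision(a, b)
    for _ in range(iterations):
        correction = solve_low_precision(a, residual(a, b, x))
        x = [xi + di for xi, di in zip(x, correction)]
    return x


# Made-up, well-conditioned test system.
A = [[4.1, 1.3, 2.7], [1.3, 5.9, 1.1], [2.7, 1.1, 6.3]]
B = [1.0, 2.0, 3.0]

x_low = solve_low_precision(A, B)
x_ref = refine(A, B)
print("max residual, 32-bit only:", max_residual(A, B, x_low))
print("max residual, refined:    ", max_residual(A, B, x_ref))
```

The bulk of the arithmetic stays in the low-precision solver, which is the part that would map to the GPUs in the scenario described above, while the cheap residual step is the only part that needs 64-bit.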

Reply Score: 2

RE[2]: Not the future today
by bnolsen on Fri 12th Dec 2008 04:58 UTC in reply to "RE: Not the future today"
bnolsen Member since:
2006-01-06

Actually Cell SPEs can do DP, but also with a big hit in performance. The newer IBM PowerXCell boosts the double precision. Apparently a single Cell with 8 SPEs could only do 14 GFlops DP. PowerXCell can do 102 GFlops with 8 SPEs...

In fact a supercomputer made up of these PowerXCell chips was (maybe still is?) the top supercomputer in the world.

http://www.ppcnux.com/?q=node/7144

What sucks is that Cell should absolutely not be so expensive. Proof: Sony, which claims no hardware cost losses at ~$399 per PS3 sale. That's with a helluva lot more hardware than just Cell.

IBM wants their 10,000% markup, that's all.

Reply Score: 2

RE: Not the future today
by Googol on Fri 12th Dec 2008 10:07 UTC in reply to "Not the future today"
Googol Member since:
2006-11-24

Interesting to see people agree with this statement.

OK, the thing is this: There are computations that require double precision. And then there are some which don't. Now guess what this is made for. Also, the machine doesn't consist of GPUs only. Btw, if you see the performance of NV GPUs within the Folding@home project you will find that they easily outdo even the Cell. Again, it depends what you wanna do with it. I hope you don't believe the issue escaped them before they had built the thing... ;)

Reply Score: 2

Suse Linux Enterprise Server 9
by sbergman27 on Thu 11th Dec 2008 19:56 UTC
sbergman27
Member since:
2005-07-24

Just in case anyone was wondering:

http://tinyurl.com/6998vg

Reply Score: 2