Linked by Amjith Ramanujam on Wed 19th Nov 2008 22:07 UTC, submitted by caffeine deprived
Hardware, Embedded Systems
Nvidia and partners are offering new "personal supercomputers" for under $10,000. Nvidia, working with several partners, has developed the Tesla Personal Supercomputer, powered by a graphics processing unit based on Nvidia's CUDA parallel computing architecture. Computers using the Tesla C1060 GPU will have 250 times the processing power of a typical PC workstation, enabling researchers to run complicated simulations, experiments and number crunching without sharing a supercomputing cluster.
Thread beginning with comment 337875
RE[3]: Specifics?
by CodeMonkey on Thu 20th Nov 2008 14:25 UTC in reply to "RE[2]: Specifics?"
CodeMonkey Member since:
2005-09-22

"I know, but I'd be interested in seeing benchmarks of high-level operations (i.e., how fast can it reduce an n*n matrix compared to a CPU?)"

Interestingly, the CUDA SDK comes with a BLAS library implemented on the GPU, and an FFT library as well.

"Ah, so you can't take your existing Fortran and recompile chunks of it for the C1060?"

It's not as simple as a recompile, no. And really, you wouldn't want it to be. When you use a GPU to accelerate processing, it's not just another processor: it has a very different memory model and a very different processing model. To really take advantage of the GPU architecture, the code needs to be structured with that in mind.

Say, for instance, you have 500 matrices of size 500x500 and you need to use them to solve some A*x=b equations. On the CPU, you would loop through all 500, solving one at a time.
While this will work on the GPU, it's not an efficient way to use it. On the GPU, you would copy all 500 to GPU memory, run a single solver on all 500 simultaneously, and then copy the results back.
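The batched layout described above can be sketched on the CPU side. This is a minimal C++ sketch, assuming row-major storage and plain Gaussian elimination with partial pivoting as a stand-in solver (the post does not name one); the names `solve_one` and `solve_batch` are hypothetical. All matrices live in one contiguous buffer, which is what you would copy to GPU memory in a single transfer, and `solve_one` handles exactly one system given its index. In a CUDA port, the loop in `solve_batch` would roughly become a single kernel launch with one thread block per system.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Solve system number `sys` in place (Gaussian elimination, partial pivoting).
// A holds the whole batch: row-major n*n matrices stored back to back.
// b holds the matching right-hand sides; on return, b[sys*n .. sys*n+n) is x.
void solve_one(std::vector<double>& A, std::vector<double>& b,
               std::size_t n, std::size_t sys) {
    double* a = A.data() + sys * n * n;
    double* v = b.data() + sys * n;
    for (std::size_t k = 0; k < n; ++k) {
        // Pick the row with the largest |a[i][k]| as the pivot.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(a[i * n + k]) > std::fabs(a[p * n + k])) p = i;
        if (p != k) {
            for (std::size_t j = 0; j < n; ++j) std::swap(a[k * n + j], a[p * n + j]);
            std::swap(v[k], v[p]);
        }
        assert(a[k * n + k] != 0.0 && "singular matrix");
        // Eliminate the entries below the pivot.
        for (std::size_t i = k + 1; i < n; ++i) {
            double f = a[i * n + k] / a[k * n + k];
            for (std::size_t j = k; j < n; ++j) a[i * n + j] -= f * a[k * n + j];
            v[i] -= f * v[k];
        }
    }
    // Back substitution, last row first.
    for (std::size_t k = n; k-- > 0;) {
        double s = v[k];
        for (std::size_t j = k + 1; j < n; ++j) s -= a[k * n + j] * v[j];
        v[k] = s / a[k * n + k];
    }
}

// CPU version: one call per system. On a GPU this loop would be replaced by
// one launch in which each block works on its own `sys` (e.g. blockIdx.x).
void solve_batch(std::vector<double>& A, std::vector<double>& b,
                 std::size_t n, std::size_t batch) {
    for (std::size_t sys = 0; sys < batch; ++sys) solve_one(A, b, n, sys);
}
```

For the 500-matrix example you would call `solve_batch(A, b, 500, 500)` with `A` holding 500*500*500 doubles (about 1 GB). That single bulk buffer is the kind of transfer the GPU approach relies on, rather than 500 separate round trips.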

Specialized hardware generally requires specialized programming to fully exploit it.

Score: 2

RE[4]: Specifics?
by Vanders on Thu 20th Nov 2008 16:57 in reply to "RE[3]: Specifics?"
Vanders Member since:
2005-07-06

"Ah, so you can't take your existing Fortran and recompile chunks of it for the C1060?

It's not as simple as a re-compile, no.
...
Specialized hardware generally requires specialized programming to fully exploit it.
"

Which, in a roundabout way, brings me to my point: while these cards look very nice and clearly have a role to play in specialised applications such as real-time medical imaging, they are not a "drop-in" replacement for a proper cluster. If you write your code to use one of these cards, you will find yourself tied to Nvidia, with perhaps no opportunity to move your code to a faster machine should the need arise.

If you write your code using, say, MPI with Fortran, you can pretty much expect it to run five or ten years from now, even on a totally different cluster.

Score: 2

RE[5]: Specifics?
by javiercero1 on Sat 22nd Nov 2008 20:45 in reply to "RE[4]: Specifics?"
javiercero1 Member since:
2005-11-10

CUDA is a programming model, mostly based on super-threading, data streaming and data parallelism.

It is being ported to the CPU, and later (via Apple's OpenCL, which is mostly CUDA-based) to ATI's GPUs (although the ATI parts have poorer programmability).

Basically, once you map your algorithm to CUDA, you should be able to run it on either the CPU or GPU in the near future.
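To make the "map it once, run it on CPU or GPU" idea concrete, here is a minimal C++ sketch that emulates CUDA's launch model on the CPU. The `Idx` struct and `launch` helper are hypothetical stand-ins for CUDA's `blockIdx.x`, `threadIdx.x`, `blockDim.x` and the `<<<grid, block>>>` launch syntax; they are not part of any real API. The kernel body is written the CUDA way, as a function of its block and thread index with no loop over the data, which is what lets the same structure be executed by a serial loop instead of GPU hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for CUDA's built-in 1-D index variables.
struct Idx {
    std::size_t block;      // role of blockIdx.x
    std::size_t thread;     // role of threadIdx.x
    std::size_t block_dim;  // role of blockDim.x
};

// Hypothetical CPU "launcher": runs the kernel once per (block, thread) pair,
// serially. A GPU would run these instances in parallel in hardware.
void launch(std::size_t grid_dim, std::size_t block_dim,
            const std::function<void(const Idx&)>& kernel) {
    for (std::size_t b = 0; b < grid_dim; ++b)
        for (std::size_t t = 0; t < block_dim; ++t)
            kernel(Idx{b, t, block_dim});
}

// SAXPY (y = a*x + y) in CUDA style: each "thread" updates one element.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = y.size();
    const std::size_t block_dim = 256;
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;  // round up
    launch(grid_dim, block_dim, [&](const Idx& id) {
        std::size_t i = id.block * id.block_dim + id.thread;  // global index
        if (i < n) y[i] = a * x[i] + y[i];                  // skip the padding threads
    });
}
```

In real CUDA the lambda body would be a `__global__` function reading the built-in index variables. This sketch deliberately ignores shared memory and `__syncthreads()`: kernels that synchronise within a block cannot be run by a simple serial thread loop like this one, which is part of why CPU back ends for CUDA-style code take more work than the sketch suggests.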

That said, if you have already developed your code with OpenMP and it works for you... as they say, if it ain't broke, don't fix it.

However, where the CUDA boards shine is in their price per flop and power per flop, which makes them very, very, very attractive.

Score: 2