Linked by Amjith Ramanujam on Wed 19th Nov 2008 22:07 UTC, submitted by caffeine deprived
Thread beginning with comment 337788
"For what operations?
GPUs really shine in huge SIMD problems where you have a very large dataset and need to perform the same operation on each element. Examples would be simulations, visualization, medical imaging, etc."
I know, but I'd be interested in seeing benchmarks of high-level operations (i.e. how fast can it reduce an n*n matrix compared to a CPU?)
"it won't help much if you're processing a large data set as the PCIe bus will become the (very small) bottleneck.
While the PCIe bus is usually the limiting factor, it can be dealt with, usually by transferring very large chunks of data over at once (hundreds of megabytes to several gigabytes), performing the computation on the GPU, and transferring the results back."
Yes, that's why I was interested in how much on-board memory it has.
"What programming models does the C1060 support?
Since at its heart it's just a GPU, the programming model is shader based. GLSL or HLSL could both be used (the OpenGL and DirectX shading languages). However, NVIDIA's CUDA toolkit is also available."
Ah, so you can't take your existing Fortran and recompile chunks of it for the C1060?
"I know, but I'd be interested in seeing benchmarks of high-level operations (i.e. how fast can it reduce an n*n matrix compared to a CPU?)"
Interestingly, the CUDA SDK comes with a BLAS library implemented on the GPU. It also includes an FFT library.
"Ah, so you can't take your existing Fortran and recompile chunks of it for the C1060?"
It's not as simple as a recompile, no. And really, you wouldn't want it to be. When you use a GPU to accelerate processing, you are not just adding another processor. It has a very different memory model and a very different processing model. To really take advantage of the GPU architecture, the code needs to be structured with that in mind.
Say, for instance, you have 500 matrices of size 500x500 and you need to use them to solve some A*x=b equations. On the CPU, you would loop through all 500, solving one at a time.
While this will work on the GPU, it's not an efficient way to use it. On the GPU, you would copy all 500 to GPU memory, run a single solver on all 500 simultaneously, and then copy the results back.
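To make the "single solver on all systems" idea a bit more concrete, here is a rough plain-Python sketch. It is not GPU code and not how any particular CUDA library does it. The function name `solve_batch` is made up for illustration, the matrices are assumed to be nonsingular, and the tiny 2x2 inputs stand in for the 500x500 case. The point is only the loop structure: each elimination step is applied to every system before moving on to the next step, and on a GPU the inner loop over systems is the part that would run in parallel.

```python
def solve_batch(As, bs):
    """Solve A x = b for every (A, b) pair, advancing all systems in lockstep.

    Illustrative only: assumes every A is square and nonsingular.
    """
    n = len(bs[0])
    # Build augmented matrices [A | b]; copies keep the inputs untouched.
    Ms = [[row[:] + [b[i]] for i, row in enumerate(A)] for A, b in zip(As, bs)]

    for col in range(n):          # one elimination step at a time ...
        for M in Ms:              # ... applied to every system (parallel on a GPU)
            # Partial pivoting: bring the largest entry in this column to the diagonal.
            pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
            M[col], M[pivot] = M[pivot], M[col]
            for r in range(col + 1, n):
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]

    xs = []
    for M in Ms:                  # back substitution, again once per system
        x = [0.0] * n
        for r in range(n - 1, -1, -1):
            s = sum(M[r][c] * x[c] for c in range(r + 1, n))
            x[r] = (M[r][n] - s) / M[r][r]
        xs.append(x)
    return xs


# Two small systems standing in for the batch of 500.
As = [[[2.0, 1.0], [1.0, 3.0]], [[1.0, 0.0], [0.0, 4.0]]]
bs = [[3.0, 5.0], [2.0, 8.0]]
xs = solve_batch(As, bs)
print([[round(v, 6) for v in x] for x in xs])  # [[0.8, 1.4], [2.0, 2.0]]
```

The CPU-style alternative would call a one-system solver inside a loop over the batch. The arithmetic is the same, but the batched layout is what lets the hardware work on all systems at once.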
Specialized hardware generally requires specialized programming to fully exploit it.

Member since:
2005-09-22
GPUs really shine in huge SIMD problems where you have a very large dataset and need to perform the same operation on each element. Examples would be simulations, visualization, medical imaging, etc.
While the PCIe bus is usually the limiting factor, it can be dealt with, usually by transferring very large chunks of data over at once (hundreds of megabytes to several gigabytes), performing the computation on the GPU, and transferring the results back. Rinse, repeat. Even with the bandwidth limitations, the computational gains are so great that the end result is usually orders of magnitude faster.
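A back-of-the-envelope model of why big transfers help, as a sketch only. Both constants are assumptions of mine, not measurements: 8 GB/s is roughly the one-direction peak of a PCIe 2.0 x16 link, and the 10 microsecond per-transfer overhead is a made-up illustrative figure. The function name `transfer_time` is hypothetical as well.

```python
PCIE_BANDWIDTH = 8e9       # bytes/s; assumed, roughly PCIe 2.0 x16 peak, one direction
TRANSFER_LATENCY = 10e-6   # seconds of fixed overhead per transfer; assumed, illustrative


def transfer_time(total_bytes, n_chunks):
    """Seconds to move total_bytes across the bus as n_chunks separate transfers."""
    return n_chunks * TRANSFER_LATENCY + total_bytes / PCIE_BANDWIDTH


total = 1 << 30                                    # 1 GiB of data
one_big = transfer_time(total, 1)                  # a single large copy
many_small = transfer_time(total, total // 4096)   # the same data in 4 KiB pieces

print(f"one transfer:   {one_big:.3f} s")
print(f"4 KiB transfers: {many_small:.3f} s")
```

Under these assumed numbers the single copy takes about 0.13 s and the chunked version about 2.76 s, so the fixed per-transfer cost dominates once the pieces get small. Real figures depend on the hardware and driver.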
4 GB of GDDR3 on a 512-bit bus at 800 MHz, for 102 GB/sec of memory bandwidth.
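For anyone wondering where the 102 GB/sec figure comes from, it can be reproduced from the bus width and memory clock. A small sketch follows; the function name is hypothetical, and the factor of two assumes the memory transfers data twice per clock, as GDDR3 being a double data rate memory implies.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, memory_clock_hz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8   # 512 bits -> 64 bytes
    return bytes_per_transfer * memory_clock_hz * transfers_per_clock / 1e9


print(peak_bandwidth_gb_per_s(512, 800e6))  # 102.4
```

This is a theoretical peak; sustained bandwidth in real code will be lower.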
Since at its heart it's just a GPU, the programming model is shader based. GLSL or HLSL could both be used (the OpenGL and DirectX shading languages). However, NVIDIA's CUDA toolkit is also available (and is the preferred method). It is essentially an extension to C designed with a kernel-style processing model in mind (GPU kernel, not OS kernel).
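If the kernel idea is unfamiliar, here is a plain-Python emulation of it. This is not CUDA and does not use any CUDA API; `saxpy_kernel` and `launch` are names invented for this sketch. The kernel is a small function describing what happens to one element, and the launch applies it to every index. On a GPU those invocations would run in parallel across many threads, whereas here they simply run one after another.

```python
def saxpy_kernel(i, a, x, y, out):
    # Kernel body: the work a single GPU thread would do for element i.
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    # Stand-in for a kernel launch over n elements.
    # A GPU would run these n invocations in parallel; this loop is sequential.
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0]
```

Because each invocation touches only its own index, no ordering between elements is required, which is what makes this style of code a good fit for the hardware.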
Edited 2008-11-19 23:25 UTC