Linked by Amjith Ramanujam on Wed 19th Nov 2008 22:07 UTC, submitted by caffeine deprived
Permalink for comment 337875
To read all comments associated with this story, please click here.





Member since:
2005-09-22
Interestingly, the CUDA SDK comes with a BLAS library implemented on the GPU. It also includes an FFT library.
It's not as simple as a re-compile, no. And really, you wouldn't want it to be. When you use a GPU to accelerate processing, it's not just another processor. It has a very different memory model and a very different processing model. To really take advantage of the GPU architecture, the code needs to be structured with that in mind.
Say, for instance, you have 500 matrices of size 500x500 and you need to use these matrices to solve some A*x=b equations. On the CPU, you would loop through all 500, solving one at a time.
While this will work on the GPU, it's not an efficient way to use it. On the GPU, you would copy all 500 to the GPU memory, run a single solver on all 500 simultaneously, and then copy the results back.
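To make the difference in code structure concrete, here is a rough sketch of the two shapes: a loop that solves one system per call, and a single call over the whole stack. It runs on the CPU with NumPy purely as a stand-in for a GPU library, so it shows how the calls are organized, not any GPU speedup. The function names and the small sizes (8 systems of 16x16 rather than 500 of 500x500) are made up for illustration.

```python
import numpy as np

def solve_one_at_a_time(A, b):
    # CPU-style structure: one solver call per system.
    # A has shape (batch, n, n), b has shape (batch, n, 1).
    return np.stack([np.linalg.solve(A[i], b[i]) for i in range(A.shape[0])])

def solve_batched(A, b):
    # GPU-style structure: hand the whole stack to one call.
    # np.linalg.solve treats the leading axis as a batch dimension.
    return np.linalg.solve(A, b)

# Hypothetical test data: small, well-conditioned systems.
rng = np.random.default_rng(0)
batch, n = 8, 16
A = rng.standard_normal((batch, n, n)) + n * np.eye(n)
b = rng.standard_normal((batch, n, 1))

x_loop = solve_one_at_a_time(A, b)
x_batched = solve_batched(A, b)
```

Both versions produce the same answers. On a real GPU the batched form is the one that maps onto "copy everything over, run one solver, copy everything back".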
Specialized hardware generally requires specialized programming to fully exploit it.