.... to be fair, NVIDIA has had its OpenCL SDK available to developers for a while, whereas ATI has only just released theirs (and it brings down our Linux devel system for Stream really hard, so it seems they are not ready for production yet... so this x86 OpenCL release looks like a stopgap so that developers aren't kept waiting).
...the purpose of the GPU was to offload some work from the CPU? I understand that this would be good for mini/micro systems, where the GPU is limited. Would this help them work together in some way, acting like a multi-processor GPU? That would be nice: maybe preprocess the commands before passing the results off to the GPU for final calculation and rendering.
OpenCL is a lot more than just a GPU accelerator for code; it's designed to be a system where the code you write can run on CPUs, GPUs and other accelerator cards without you having to do any other work. It also maximizes performance and threading usage for compute tasks running on the CPU.
I've "ported" the NVIDIA nbody sample from their OpenCL package to the AMD CPU-based one.
I can't comment on NVIDIA OpenCL benchmarks as I'm under NDA, but compared to the same CUDA sample it was 1.3 GFLOPS for AMD/OpenCL (CPU) vs. 28 GFLOPS for CUDA (can't comment on NVIDIA OpenCL). Though I need to test again.
MacBook Pro (MacBookPro3,1)
Intel Core 2 Duo 2.6 GHz, 1 processor, 2 cores, 4 MB L2 cache, 4 GB memory, bus speed 800 MHz
GeForce 8600M GT (PCI-Express x16 width) 256 MB
Boot Camp'd Windows XP 32-bit Service Pack 3 using the 190.38 NVIDIA drivers (slightly modified to install on the 8600M, thanks to laptop2go)
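For anyone wanting to reproduce figures like the ones above, here is a rough sketch of how an n-body GFLOPS number is usually derived. It assumes an all-pairs simulation (every body interacts with every other body each step) and a cost of 20 floating-point operations per interaction, which is the convention I believe the NVIDIA sample uses; both are assumptions, and the function name `nbody_gflops` is made up for this example.

```python
def nbody_gflops(num_bodies, iterations, seconds, flops_per_interaction=20):
    """Estimate GFLOPS for an all-pairs n-body run.

    flops_per_interaction=20 is an assumed convention, not a measured value.
    """
    # All-pairs: each of the n bodies interacts with all n bodies per step.
    interactions = num_bodies * num_bodies * iterations
    return interactions * flops_per_interaction / seconds / 1e9


# Ratio between the two figures quoted in the comment (28 vs. 1.3 GFLOPS).
cuda_vs_cpu = 28.0 / 1.3

if __name__ == "__main__":
    print(nbody_gflops(1000, 10, 1.0))
    print(round(cuda_vs_cpu, 1))
```

Because the flop count per interaction is a convention rather than a measurement, numbers from different samples are only comparable if they use the same constant.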
That said, CPU-based OpenCL is exactly what we need. There are lots of servers where graphics cards are not present. Also, if you have Remote-Desktop'd into such a machine, the graphics driver is replaced and you can't use CUDA (won't comment on OpenCL). VNC is too slow to be a solution (maybe only HP RGS would do). And OpenCL would be there for the PS3 SPEs...
The beauty of it is that it offers a more restricted "C"-based language in which you can still program normally (not in assembler), yet it would still run efficiently. From that perspective, you can forget all your worries about using C++ as the dominant performance language (through Boost and other template libraries), and use your favourite high-level language (JavaScript, C#, Java, Lisp, Ruby, Python, Perl, etc.), as long as it has some way of freezing foreign array data (e.g. the garbage collector should not move it) and accessing it later.
Put the management decisions about what work needs to be done in the high-level language (it would be easier to organize such tasks there), and then write the low-level workers directly in OpenCL. Most likely all OpenCL implementations would always have the compiler loaded, so you can even change the kernels on the fly instead of recompiling, much like GLSL in OpenGL.
It would take quite some time to catch on, but I think it might hit the sweet spot.
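A minimal sketch of that split, in Python: the kernel is kept as plain OpenCL C text on the host side, so it can be regenerated at run time, and the data lives in a flat buffer with a stable address. Everything named here (`scale`, `make_kernel_source`, `scale_reference`) is hypothetical. The actual build and enqueue calls are left out because they depend on the binding you use (in the C API they would be `clCreateProgramWithSource` and `clBuildProgram`); a pure-Python reference stands in for the kernel so the sketch runs without an OpenCL runtime.

```python
import array

# OpenCL C worker kept as text in the high-level language.
# The kernel name `scale` is invented for this sketch.
KERNEL_TEMPLATE = """
__kernel void scale(__global float *data) {
    size_t i = get_global_id(0);
    data[i] = data[i] * %(factor)sf;
}
"""


def make_kernel_source(factor):
    """Generate kernel source at run time; a real host would rebuild it."""
    return KERNEL_TEMPLATE % {"factor": repr(float(factor))}


def scale_reference(data, factor):
    """Pure-Python stand-in for what the kernel does to each element."""
    for i in range(len(data)):
        data[i] = data[i] * factor


# Flat float32 buffer; buffer_info() exposes its address and element count,
# which is the kind of handle a foreign runtime would need.
buf = array.array("f", [1.0, 2.0, 3.0])
address, length = buf.buffer_info()

source = make_kernel_source(2.0)
scale_reference(buf, 2.0)

if __name__ == "__main__":
    print(source)
    print(list(buf), length)
```

Changing the factor only means generating a new source string and rebuilding the program, with no recompilation of the host application.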
You might want to retest that code, because I'd imagine the GFLOPS figures for your MacBook Pro's CPU and the 8600M GT will differ wildly as well.
More to the point, you didn't actually mention the ATI card you tested against.



