Linked by Thom Holwerda on Thu 6th Aug 2009 22:38 UTC
AMD "AMD has announced the release of the first OpenCL SDK for x86 CPUs, and it will enable developers to target x86 processors with the kind of OpenCL code that's normally written for GPUs. In a way, this is a reverse of the normal 'GPGPU' trend, in which programs that run on a CPU are modified to run in whole or in part on a GPU."
Thread beginning with comment 377807
Comment by malkia
by malkia on Sun 9th Aug 2009 20:56 UTC
malkia Member since:
2005-07-17

I've "ported" the NVIDIA nbody sample from their OpenCL package to the AMD cpu based one.

I can't comment on NVIDIA OpenCL benchmarks as I'm under NDA, but compared to the same CUDA sample it was 1.3 GFLOPS for AMD/OpenCL (CPU) vs. 28 GFLOPS for CUDA on the GPU (can't comment on NVIDIA OpenCL). I need to test again, though.
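To put the two figures side by side, here is a rough sketch of the arithmetic. The `speedup` and `nbody_gflops` helpers are hypothetical names, and the 20 floating-point operations per body-body interaction is only an assumed counting convention for all-pairs n-body benchmarks, not something stated in this comment.

```python
def speedup(fast_gflops, slow_gflops):
    """Ratio between two throughput figures given in GFLOPS."""
    return fast_gflops / slow_gflops


def nbody_gflops(bodies, iterations, seconds, flops_per_interaction=20):
    """Estimate GFLOPS for an all-pairs n-body run.

    Assumption: every body interacts with every body once per iteration,
    and each interaction is counted as `flops_per_interaction` operations.
    """
    interactions = bodies * bodies * iterations
    return interactions * flops_per_interaction / seconds / 1e9


# Figures quoted in the comment: 28 GFLOPS (CUDA) vs. 1.3 GFLOPS (CPU OpenCL).
ratio = speedup(28.0, 1.3)
print(f"CUDA figure is roughly {ratio:.1f}x the CPU OpenCL figure")

# Hypothetical run, for illustration only: 1000 bodies, 10 steps, 0.2 s.
example = nbody_gflops(bodies=1000, iterations=10, seconds=0.2)
print(f"hypothetical run: {example:.2f} GFLOPS")
```

The ratio only compares the two reported numbers; it says nothing about how either sample was tuned.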

MacBook Pro (MacBookPro3,1)
Intel Core 2 Duo 2.6 GHz, 1 Processor, 2 cores, 4MB L2 Cache, 4GB Memory, Bus Speed 800 MHz
GeForce 8600M GT (PCI-Express x16 width) 256MB

Boot Camp'd Windows XP 32-bit Service Pack 3 using 190.38 NVIDIA drivers (slightly modified to install on the 8600M, thanks to laptop2go)

That said, CPU-based OpenCL is exactly what we need. There are lots of servers where graphics cards are not present. Also, if you have Remote-Desktop'd into such a machine, the graphics driver is replaced and you can't use CUDA (won't comment on OpenCL). VNC is too slow to be a solution (maybe only HP RGS would do). And OpenCL would be there for the PS3 SPEs...

The beauty of it is that it offers a more restricted "C"-based language where you can still program normally (not in assembler), but it would still run efficiently. From that perspective, you can forget all your worries about using C++ as the dominant performance language (through boost and other template libraries) and use your favourite high-level language (javascript, C#, java, lisp, ruby, python, perl, etc.), as long as it has some way of freezing foreign array data (e.g. the garbage collector should not move it) and accessing it later.
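As a minimal sketch of what "freezing foreign array data" can look like, here is one way to do it in Python with only the standard library. The `as_foreign_view` helper is a hypothetical name, and `ctypes.memset` merely stands in for a native worker writing through a raw pointer; no OpenCL runtime is involved.

```python
import array
import ctypes


def as_foreign_view(values):
    """Expose an array.array of floats to native code without copying.

    While the returned ctypes view is alive, the array cannot be resized,
    so the address handed to native code stays valid.
    """
    return (ctypes.c_float * len(values)).from_buffer(values)


positions = array.array("f", [1.0, 2.0, 3.0, 4.0])
view = as_foreign_view(positions)
address = ctypes.addressof(view)

# Stand-in for a native worker: zero the first two floats via the raw address.
ctypes.memset(address, 0, 2 * ctypes.sizeof(ctypes.c_float))

print(list(positions))

# Resizing is refused while the buffer is exported to the view.
try:
    positions.append(5.0)
    resized = True
except BufferError:
    resized = False
print("resize allowed while pinned:", resized)
```

Other runtimes expose the same idea differently (pinned handles, fixed blocks, direct buffers), so treat this as one illustration of the requirement rather than a general recipe.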

Put the management decisions about what work needs to be done in the high-level language (it would be easier to organize such tasks there), and then write the low-level workers directly in OpenCL. Most likely all OpenCL implementations will always have the compiler loaded, so you can even change the workers on the fly instead of recompiling, much like GLSL in OpenGL.
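The split described above can be sketched roughly as follows. This is only an analogy: Python's built-in `compile()` and `exec()` stand in for an OpenCL runtime compiler, the worker source is Python text rather than OpenCL C, and `build_worker`, `run`, `WORKER_V1` and `WORKER_V2` are hypothetical names. A real host program would hand an OpenCL C string to the OpenCL API instead.

```python
# Worker source kept as plain text, the way a kernel string would be.
WORKER_V1 = """
def worker(i, data):
    return data[i] * 2.0
"""

WORKER_V2 = """
def worker(i, data):
    return data[i] * data[i]
"""


def build_worker(source):
    """Compile worker source at run time and return the callable.

    Stand-in for building a program from source inside a running host.
    """
    namespace = {}
    exec(compile(source, "<worker>", "exec"), namespace)
    return namespace["worker"]


def run(worker, data):
    """Host-side management: launch one work-item per element."""
    return [worker(i, data) for i in range(len(data))]


data = [1.0, 2.0, 3.0]
doubled = run(build_worker(WORKER_V1), data)
# Swap in a different worker without restarting or rebuilding the host.
squared = run(build_worker(WORKER_V2), data)
print(doubled)
print(squared)
```

The host code never changes between the two runs; only the worker text does, which is the property being compared to GLSL.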

It will take quite some time to catch on, but I think it might hit the sweet spot.

Score: 1

RE: Comment by malkia
by tyrione on Mon 10th Aug 2009 07:02 in reply to "Comment by malkia"
tyrione Member since:
2005-11-21


You might want to retest that code, because I'd imagine the GFLOPS will differ wildly between your MacBook Pro and the 8600GT as well.

More to the point, you didn't actually mention the ATi card you tested against.

Score: 2

RE[2]: Comment by malkia
by MamiyaOtaru on Mon 10th Aug 2009 07:40 in reply to "RE: Comment by malkia"
MamiyaOtaru Member since:
2005-11-11

More to the point, you didn't actually mention the ATi card you tested against.

He didn't test against an ATI card. "to the AMD cpu based one." and "for AMD/OpenCL (CPU)" should be hints.

Score: 2

RE[2]: Comment by malkia
by malkia on Tue 11th Aug 2009 10:22 in reply to "RE: Comment by malkia"
malkia Member since:
2005-07-17

I said 8600M GT, not 8600GT

Score: 1