“Researchers from North Carolina State University have developed a new technique that allows graphics processing units and central processing units on a single chip to collaborate – boosting processor performance by an average of more than 20 percent.”
For a first step, 20% is a pretty significant jump. AMD has been planning on doing this for some time – I am glad to see AMD’s research investment paying off…
However, with the poor performance of Bulldozer, it will take more than a 20% boost to match Intel’s current performance… MAYBE some semblance of this tech will be in Piledriver… but I seriously doubt it.
That would be a pleasant surprise if it happened, but I agree – I wouldn’t bet on Piledriver having a shared L3 (or any L3 for that matter).
This makes more sense for Steamroller once they hit 28nm. Hopefully by 2013 they can go back and tweak the layout to make some room for an L3.
Sounds like AMD’s planned HSA (heterogeneous systems architecture) for future APUs, wherein the CPU and GPU both have access to the same caches, RAM, buses, etc. (a rough host-side sketch of that sharing model follows after the links below).
Anandtech.com has a bunch of slides up from the recent AMD financial analyst day that cover this.
A summary article that lists all the other articles:
http://www.anandtech.com/show/5503/understanding-amds-roadmap-new-d…
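For the curious, here is a rough host-side sketch of what that shared-memory model can look like from the programmer’s side, assuming an OpenCL 2.0 stack with shared virtual memory support. This is only an illustration of the idea, not AMD’s HSA API and not the technique from the NC State paper; the “scale” kernel and all sizes are made up, and error checking is omitted for brevity.

#define CL_TARGET_OPENCL_VERSION 200
#include <stdio.h>
#include <CL/cl.h>

/* Hypothetical kernel: doubles every element of a shared buffer. */
static const char *src =
    "__kernel void scale(__global float *data) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= 2.0f;\n"
    "}\n";

int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    /* One allocation visible to both sides -- no explicit device copy. */
    size_t n = 1024;
    float *buf = (float *)clSVMAlloc(ctx, CL_MEM_READ_WRITE, n * sizeof(float), 0);

    /* CPU fills the buffer in place. */
    clEnqueueSVMMap(q, CL_TRUE, CL_MAP_WRITE, buf, n * sizeof(float), 0, NULL, NULL);
    for (size_t i = 0; i < n; i++) buf[i] = (float)i;
    clEnqueueSVMUnmap(q, buf, 0, NULL, NULL);

    /* GPU works on the very same pointer. */
    clSetKernelArgSVMPointer(k, 0, buf);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);

    /* CPU reads the result back, again without a copy. */
    clEnqueueSVMMap(q, CL_TRUE, CL_MAP_READ, buf, n * sizeof(float), 0, NULL, NULL);
    printf("buf[10] = %f\n", buf[10]);
    clEnqueueSVMUnmap(q, buf, 0, NULL, NULL);

    clFinish(q);
    clSVMFree(ctx, buf);
    return 0;
}

The point is simply that the CPU writes and reads the very same allocation the GPU computes on, instead of shuttling copies across a bus – which is the programming model HSA-style designs are aiming at.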
I can’t help but notice it could just as well mean homogeneous systems architecture…
If the performance boost is really only 20%, then in most cases it is simply not worth the effort put into the optimization. Partitioning computations into two separate programs is not trivial; putting the same effort into other optimization techniques may produce better results.
That’s always been the problem with GPGPU programming. Is the extra trouble really worth it? Especially with the obsolescence window still being something like 6-12 months or so, and the huge difference in scaling between CPU and GPU technology making heterogeneous design a headache.
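To make the “is it worth it?” question concrete, here is a tiny back-of-envelope sketch in C. Every throughput and bandwidth figure below is invented purely for illustration, not a measurement of any real CPU, GPU or bus.

#include <stdio.h>

int main(void) {
    double n = 50e6;                 /* elements to process               */
    double bytes_per_elem = 4.0;     /* e.g. 32-bit floats                */
    double flops_per_elem = 10.0;    /* assumed arithmetic per element    */

    double cpu_flops = 20e9;         /* assumed sustained CPU throughput  */
    double gpu_flops = 200e9;        /* assumed sustained GPU throughput  */
    double bus_bytes_per_s = 6e9;    /* assumed copy bandwidth over PCIe  */

    double t_cpu  = n * flops_per_elem / cpu_flops;
    double t_copy = 2.0 * n * bytes_per_elem / bus_bytes_per_s;  /* to + from GPU */
    double t_gpu  = t_copy + n * flops_per_elem / gpu_flops;

    printf("CPU only: %.3f s\n", t_cpu);
    printf("GPU path: %.3f s (of which %.3f s is just copying)\n", t_gpu, t_copy);
    printf("%s\n", t_gpu < t_cpu ? "Offloading wins" : "Not worth the trouble");
    return 0;
}

With made-up figures like these the copy time dominates and the GPU path loses even though the GPU is ten times faster – which is presumably the kind of overhead a GPU sitting on the same die and sharing memory with the CPU is meant to eliminate, and why even a 20% average gain from closer cooperation is worth noting.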
I was taking the article to be hinting at a FREE (from the programmer’s perspective) 20% additional performance. I would expect this to be achieved by the CPU intelligently off-loading FPU tasks to the GPU.
If this is just more of the same, then I don’t see what the big deal would be… Intel already does it, and I do it already here with AMD APP with my video encoding tasks…
Maybe this research is hinting at a fully open-source GPU that also does OpenCL.
I wish ARM/VIA did the same as AMD.
In any case, AMD is brave enough, and it will be the target of my new desktops. I do not care only about raw performance. If the model is open and easy to program for Illumos/*BSD/Haiku without blobs, then I am ready to buy.
I wish people (the sentiment is promulgated all over the place) would get some perspective on this one… the performance of Bulldozer, or of AMD CPU offerings in general, isn’t exactly poor – it’s just worse.
The truth is that, lately, CPUs are almost universally far more powerful than the strong majority of people need – and AMD products can even easily be seen as preferable in some segments, for example the one covered by the Fusion series (its more decent GPU addresses some of the nowadays few areas where processing power is still not necessarily enough, and its “fuller” GPGPU support even makes up for some of the CPU power difference).
The “never enough power” areas have become quite rare… (and at least one of them, not really tested by benchmark sites and the like, might be curious here – AMD supposedly geared Bulldozer for HPC uses, almost as if at the cost of general desktop performance; who knows what will yet come out of it)
The news coverage and the actual research presented in their paper seem to have played a real bad game of “telephone.”
Wow! North Carolina State University is releasing a new phone! Awesome! I hope it will finally introduce a fully open source stack, maybe with a little Haiku inside. In any case, this is sure to restore the Wolfpack to their glory days. Duke & Carolina’s days are over!
Go Pack!
Doesn’t this suggest that a more generic solution would be to simply have a small ARM core on the GPU with some cache memory? It could handle the “complex” functions needed to feed and assist the GPU, leaving the CPU free for more important tasks… Maybe the transistor budget really won’t allow that, I don’t know.
-edit- I just realised that is exactly what Broadcom have done with the chip that the Raspberry Pi team are using*! Except they are using the ARM core to run an OS.
*If we are to believe the Raspberry Pi team (which I do), then the chip they are using is a gfx chip with an ARM core for support, rather than a CPU with an integrated GPU…
I have had the same thought for years. Just put a cpu on the gpu card.