Linked by MOS6510 on Fri 17th May 2013 22:22 UTC
Hardware, Embedded Systems "It is good for programmers to understand what goes on inside a processor. The CPU is at the heart of our career. What goes on inside the CPU? How long does it take for one instruction to run? What does it mean when a new CPU has a 12-stage pipeline, or 18-stage pipeline, or even a 'deep' 31-stage pipeline? Programs generally treat the CPU as a black box. Instructions go into the box in order, instructions come out of the box in order, and some processing magic happens inside. As a programmer, it is useful to learn what happens inside the box. This is especially true if you will be working on tasks like program optimization. If you don't know what is going on inside the CPU, how can you optimize for it? This article is about what goes on inside the x86 processor's deep pipeline."
RE[3]: Comment by Drumhellar
by Alfman on Sat 18th May 2013 05:22 UTC in reply to "RE[2]: Comment by Drumhellar"
Alfman Member since:
2011-01-28

theosib,

You sound very knowledgeable, certainly more than me. What do you think about CPU cores eventually being replaced or augmented by massive FPGAs?

The issue I have with current CPU architectures is how much hardware and R&D is being thrown at running sequential instruction sets in parallel rather than using natively parallel instruction sets in the first place. We have undeniably seen dramatic gains for sequential code, and yet all this focus on sequential-code optimization seems to be a major distraction from what could have been a much better overall strategy for maximizing parallel computation.

For illustrative purposes, take bitcoin mining as a good example of a parallel problem where performance is king and software compatibility isn't a factor. The following link contains a sizable dataset very roughly showing how the different computational technologies compare:

https://en.bitcoin.it/wiki/Mining_Hardware_Comparison

Intel's latest processors top out at ~65Mhash/s for 6 hyperthreaded cores (12 threads) at 200W. As we'll see, sequential programs running on these superscalar CPUs cannot compete with truly parallel implementations.

The ARM processors listed top out at ~1Mhash/s running on 0.5W. If we ran 400 of them to match Intel's power consumption, we'd get very roughly 400Mhash/s.

Now, when we look at FPGAs, they do 400+ Mhash/s on less than 20W. If we ran 10 of those to reach the same 200W, we'd get 4000Mhash/s, or about 62X the throughput of the x86 CPU.


There are ASICs that do 5000Mhash/s on 30W (I mention them for comparison only; obviously an ASIC isn't reprogrammable, so it wouldn't have a place in a software-programmable PC).
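
To sanity-check those ratios, here's a quick back-of-the-envelope script (Python; the figures are just the rough numbers quoted above, not measurements of mine) that normalizes every technology to the same 200W budget:

```python
# Back-of-the-envelope efficiency comparison using the rough 2013-era
# figures quoted above; all numbers are illustrative, not measured.
platforms = {
    # name: (Mhash/s per unit, watts per unit)
    "x86 CPU (6c/12t)": (65, 200),
    "ARM SoC":          (1, 0.5),
    "FPGA board":       (400, 20),
    "ASIC":             (5000, 30),
}

BUDGET_W = 200  # normalize everything to a 200W power budget

for name, (mhash, watts) in platforms.items():
    units = BUDGET_W / watts          # how many devices fit in the budget
    total = units * mhash             # aggregate throughput at that budget
    print(f"{name:18s} {mhash / watts:7.2f} Mhash/J  "
          f"-> ~{total:6.0f} Mhash/s at {BUDGET_W}W")
```

Running it reproduces the figures above: ~65 for the CPU, ~400 for the ARM farm, ~4000 for the FPGAs, and around 33000 for the ASIC if you scale it to the same budget.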

While I know CUDA is doing its part to bring parallel software to the PC via GPUs, it still fares poorly compared to the FPGAs. In fact, GPU bitcoin miners are throwing in the towel (like the CPU miners before them) because the electricity costs more than the value of the bitcoins earned.


So, in your expert opinion, do you think we're bumping against the wall of diminishing returns with today's superscalar CPUs? Do you see FPGAs as a good contender for even higher-performance PCs in the future (assuming we ever move away from sequential software programming practices)?


Edit: I realize the numbers are very imprecise and might not even be apples to apples. Nevertheless, bitcoin was the best example I could come up with for comparing parallel computation technologies.

Edited 2013-05-18 05:41 UTC

Reply Parent Score: 2

RE[4]: Comment by Drumhellar
by siride on Sat 18th May 2013 15:21 in reply to "RE[3]: Comment by Drumhellar"
siride Member since:
2006-01-02

> all this focus on sequential-code optimization seems to be a major distraction from what could have been a much better overall strategy for maximizing parallel computation.

Did you miss the Itanium?

Reply Parent Score: 4

RE[5]: Comment by Drumhellar
by tylerdurden on Sat 18th May 2013 15:59 in reply to "RE[4]: Comment by Drumhellar"
tylerdurden Member since:
2009-03-17

It depends on what the previous poster meant by parallelism.

Reply Parent Score: 2

RE[4]: Comment by Drumhellar
by theosib on Sun 19th May 2013 18:19 in reply to "RE[3]: Comment by Drumhellar"
theosib Member since:
2006-03-02

Sorry for taking so long to reply, and sorry for not giving you a more thorough answer.

The bitcoin problem is interesting, and some friends and I are working on getting better compute per area out of an FPGA, just for the fun of it. As a chip designer, I see FPGAs as an obvious choice for accelerating this kind of thing far beyond what a general-purpose CPU can do.
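
To make concrete what's being accelerated: the entire workload is double SHA-256 over an 80-byte block header while sweeping a 32-bit nonce, and every trial is independent of every other one. Here's a toy Python sketch of the inner loop (dummy header and an artificially easy target, purely for illustration; a real miner and an FPGA pipeline obviously don't work byte-for-byte like this):

```python
# Toy sketch of the bitcoin mining inner loop: double SHA-256 over an
# 80-byte block header while sweeping the nonce. Each nonce is hashed
# independently, which is why the search maps so well onto FPGA pipelines.
import hashlib
import struct

def mine(header76, target, max_nonce=2**20):
    """Return the first nonce whose double-SHA256 digest falls below target."""
    for nonce in range(max_nonce):
        header = header76 + struct.pack("<I", nonce)  # 76 + 4 = 80 bytes
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        # Bitcoin interprets the digest as a little-endian integer.
        if int.from_bytes(digest, "little") < target:
            return nonce
    return None

# Dummy 76-byte header prefix and a very easy target, just to see it run.
print(mine(b"\x00" * 76, 1 << 240))
```

On a CPU every one of those hashes runs through the same fetch/decode/execute machinery in sequence; on an FPGA you can unroll the SHA-256 rounds into a pipeline that retires a full hash per clock.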

One of the problems with using FPGAs is a general fear of hardware development. (Well, that and the cost of FPGA hardware, but that doesn't apply to supercomputing.) Another problem is avoidance for perfectly sound reasons: having worked as a chip designer, I like to put solutions together straight in Verilog, but we can't retrain all HPC programmers in chip design, and it's often not a good cost/benefit tradeoff anyway. The holy grail is being able to convert software source code into logic gates. There's plenty of work on that, but the results aren't necessarily all that great. There's a huge difference in performance between a custom-designed FPGA circuit (i.e. knowing what you're doing) versus something that came out of an automatic translator.

Reply Parent Score: 3

RE[5]: Comment by Drumhellar
by Alfman on Mon 20th May 2013 04:03 in reply to "RE[4]: Comment by Drumhellar"
Alfman Member since:
2011-01-28

theosib,


I'd like to say software engineers could figure it out given widely accessible hardware, but I might be overestimating our abilities ;) Most CS grads these days just end up becoming ridiculously overqualified web devs, since that's where most of the jobs are.


"The holy grail is being able to convert software source code into logic gates. There's plenty of work on that, but the results aren't necessarily all that great. There's a huge difference in performance between a custom-designed FPGA circuit (i.e. knowing what you're doing) versus something that came out of an automatic translator."


This surprises me a bit. Even though the human mind is an incredible analytical machine, it has its limits, whereas computers just keep getting better. In the Kasparov vs. Deep Blue chess matches, it was inevitable that the brute-force capabilities of the computer would ultimately overtake the best humans; the only question was when.

At university I wrote a realtime 3D Java program that placed components on a virtual circuit board using a genetic algorithm and a fitness function. It was just a fun project I presented for an undergrad GA course; to be honest, I don't know if its solutions were any good, since they were never compared against expert ones. But in any case, my gut instinct tells me that given enough computing power, even a naive algorithm should be able to brute-force the finite solution space and consistently beat the best humans. I do believe you when you say automatic solutions aren't as good as experts', but do you think that could change if more computing power were thrown at the FPGA problem?
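
To give a flavor of what I mean, here's a toy sketch of that project's idea, in Python rather than the original Java; the netlist, board size, and fitness function are all invented for illustration:

```python
# Hypothetical GA placement sketch: evolve (x, y) positions for components
# so that total wire length shrinks. A real fitness function would also
# penalize overlap and congestion; this toy one only measures wire length.
import random

NETS = [(0, 1), (1, 2), (2, 3), (0, 3)]  # pairs of connected components
N_COMPONENTS, BOARD = 4, 100.0

def fitness(layout):
    # Lower is better: sum of Manhattan wire lengths over all nets.
    return sum(abs(layout[a][0] - layout[b][0]) +
               abs(layout[a][1] - layout[b][1]) for a, b in NETS)

def random_layout():
    return [(random.uniform(0, BOARD), random.uniform(0, BOARD))
            for _ in range(N_COMPONENTS)]

def mutate(layout, sigma=5.0):
    # Nudge one randomly chosen component by a small Gaussian step.
    child = list(layout)
    i = random.randrange(N_COMPONENTS)
    x, y = child[i]
    child[i] = (x + random.gauss(0, sigma), y + random.gauss(0, sigma))
    return child

population = [random_layout() for _ in range(50)]
for _ in range(200):                       # generations
    population.sort(key=fitness)           # rank by wire length
    survivors = population[:10]            # simple truncation selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(40)]

best = min(population, key=fitness)
print("best total wire length:", round(fitness(best), 1))
```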

I'm interested in what you have to say about it because I don't have expertise with FPGAs and I don't personally know anyone else who does either.

Reply Parent Score: 2