Linked by Hadrien Grasland on Thu 19th May 2011 21:31 UTC
Thread beginning with comment 474299
Member since: 2010-03-08
The answer is that the CPU is way underutilized: the algorithm is actually I/O-bound, and the CPU spends most of its time waiting for data.
Now, which is better/more scalable?
Taking up one CPU for 0.37 s, or two CPUs for 0.31 s?
If this is the only process, then admittedly, yes, 0.31 s is better. However, if any other CPU-bound processes are waiting, the 0.06 s trade-off starts to look pretty expensive, since I could have been running a CPU-bound process simultaneously for 0.37 s instead of tacking it on after the 0.31 s.
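To make the arithmetic behind this argument concrete, here is a small Python sketch. It assumes a hypothetical two-core machine and a second, single-threaded CPU-bound job that needs 0.37 s of CPU time; the function names are illustrative only and not taken from the code under discussion.

```python
# Timings quoted above (seconds of wall-clock time).
SERIAL_WALL = 0.37    # algorithm on one core
PARALLEL_WALL = 0.31  # algorithm spread over two cores


def cpu_seconds(cores: int, wall: float) -> float:
    """CPU time consumed: number of cores kept busy times wall-clock time."""
    return cores * wall


def finish_time_serial(t_algo: float, t_other: float) -> float:
    """Hypothetical 2-core box: the algorithm takes one core while the other
    CPU-bound job runs on the second core at the same time."""
    return max(t_algo, t_other)


def finish_time_parallel(t_algo: float, t_other: float) -> float:
    """The algorithm occupies both cores, so the other job (assumed to be
    single-threaded) can only start once the algorithm is done."""
    return t_algo + t_other


if __name__ == "__main__":
    other_job = 0.37  # assumption: the competing job needs 0.37 s of CPU
    print(f"CPU time, serial:   {cpu_seconds(1, SERIAL_WALL):.2f} s")
    print(f"CPU time, parallel: {cpu_seconds(2, PARALLEL_WALL):.2f} s")
    print(f"Both jobs done, serial:   {finish_time_serial(SERIAL_WALL, other_job):.2f} s")
    print(f"Both jobs done, parallel: {finish_time_parallel(PARALLEL_WALL, other_job):.2f} s")
```

Under these assumptions, the parallel version saves 0.06 s of wall-clock time for the algorithm itself but consumes 0.62 − 0.37 = 0.25 extra CPU-seconds, and the two jobs together finish at 0.68 s instead of 0.37 s.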
I was going to argue that CPUs don't have to sit idle while a process is blocked on I/O (the scheduler just has to switch to the next process in its queue), but then I had a look at your code and noticed that it is essentially about accessing lots of RAM. In that case, it's the CPU core itself that has no choice but to block, since MOVs are blocking by design.
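The first half of that point, that the OS can keep working while a thread is blocked on I/O, can be illustrated with a short Python sketch. Here `time.sleep()` is an assumed stand-in for a blocking read, and the helper names are hypothetical. The sketch deliberately does not cover the memory case: a load that misses the cache stalls the core in hardware, below the level at which the OS scheduler operates, so no amount of thread switching hides it this way.

```python
import threading
import time


def fake_io(delay: float) -> None:
    """Stand-in for a blocking I/O call: this thread blocks, and the OS is
    free to run other threads in the meantime."""
    time.sleep(delay)


def run_overlapped(delays):
    """Start one thread per simulated I/O request, wait for all of them,
    and return the elapsed wall-clock time in seconds."""
    threads = [threading.Thread(target=fake_io, args=(d,)) for d in delays]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2, 0.2, 0.2]
    elapsed = run_overlapped(delays)
    # The waits overlap, so elapsed is close to max(delays), not sum(delays).
    print(f"sum of waits: {sum(delays):.2f} s, elapsed: {elapsed:.2f} s")
```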
Interesting problem, actually. I think that once one thread starts to hog the memory bus like this, we're doomed anyway, since the threads on all the other cores will fight for access to the memory bus and get suboptimal performance. This problem won't be fully solved until the memory bus is faster than all CPU cores combined, which probably won't happen until we reach the limits of CPU speed. But this is open to discussion.
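A back-of-the-envelope way to state this, loosely in the spirit of the roofline model, is that a purely memory-bound loop cannot speed up by more than the number of threads the bus can feed at full speed. The Python sketch below is a simplified model under that assumption, not a measurement, and the bandwidth figures in the usage note are made up for illustration.

```python
def max_speedup(threads: int, bus_bandwidth: float, per_thread_demand: float) -> float:
    """Upper bound on speedup for a purely memory-bound loop.

    Simplified model (an assumption): each thread alone would stream
    `per_thread_demand` bytes/s, and the memory bus delivers at most
    `bus_bandwidth` bytes/s shared by all threads. Speedup is capped both
    by the thread count and by how many threads the bus can keep fed.
    """
    if threads < 1 or bus_bandwidth <= 0 or per_thread_demand <= 0:
        raise ValueError("arguments must be positive")
    return min(threads, bus_bandwidth / per_thread_demand)


if __name__ == "__main__":
    # Hypothetical figures: one thread already uses most of a 10 GB/s bus.
    print(max_speedup(4, 10e9, 8e9))
    # Hypothetical figures: the bus is faster than all four threads combined.
    print(max_speedup(4, 40e9, 8e9))
```

With the first set of hypothetical numbers, four threads can be at most 1.25 times faster than one, which is the "one thread hogs the bus" situation; only when the bus outpaces all threads together (the second case) does the bound reach the thread count.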
As said in the blog post mentioned earlier, if you need so much synchronization that your algorithm performs no better in parallel than in a serial version, then it is indeed better to do it the async way and not bother with the coding complications of synchronization. What I have in mind are tasks where most of the work can be done in parallel, and synchronization/communication is only needed at a few specific points of the execution.
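The kind of task described here, independent work with synchronization only at specific points, is the classic fork/join pattern. The Python sketch below shows its shape with a single join point; `parallel_sum` and `chunk_sum` are hypothetical names. It uses `ThreadPoolExecutor` only to show the structure: in CPython the GIL keeps CPU-bound threads from running truly in parallel, so `ProcessPoolExecutor` or a different language would be needed for a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(chunk):
    """Independent work: no shared state, so no locking is needed here."""
    total = 0
    for x in chunk:
        total += x
    return total


def parallel_sum(values, workers=4):
    """Split the input into roughly one chunk per worker, let each worker run
    with no communication, and synchronize exactly once when the partial
    results are combined."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_sum, chunks))  # the single join point
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```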
And there comes a trade-off between fine-grained modularity and macro-level parallelism.
Again, if it's pure I/O with (almost) no computation, I admit that this is entirely possible, and even frequent.