“With all modern systems using multi-core and multi-processor hardware, tapping this new power is an interesting challenge for developers. It also fundamentally starts the shift on how your ‘average Joe’ interacts with a computer and things that he/she expects to be able to do. First, check out the ‘Manycore Shift’ paper from Microsoft. Second, check out the Parallel Extensions to .NET 3.5, a programming model for data and task parallelism. It also helps with coordination on parallel hardware (such as multi-core CPUs) via a common work scheduler. There is also a new Parallel Computing Dev Center on MSDN. Before you download the December 2007 CTP, make sure you have the RTM bits of the .NET 3.5 runtime. A number of bugs have also been fixed in this new CTP. If you want a quick introduction, check out the few videos available.”
The .NET framework continues to impress me. Microsoft is really playing their cards right with this one. I am rather interested to see how the .NET vs. Java game plays out…
Given the amount of cash Microsoft is throwing at the problem, what will decide how well Java fares is whether the Java ‘community’ actually starts addressing long-standing performance issues with Java.
It’s open source now; are we going to see vendors finally put their money where their mouth is and assign resources to develop it?
PS. According to CNBC right now, the rumour is that SAP might be a takeover target, possibly by Microsoft. If it is a Microsoft takeover, it’ll be the defection of a very large player in the Java market to the .NET world, assuming Microsoft decides to go that route.
Do you have a CNBC link to back that up? Sounds pretty far out?
I’m watching CNBC right now; it was on at the time I was posting. It was brought up in regards to consolidation in the software market.
There are several goodies in Java for concurrent programming. First,
Java 5 already came with a new set of concurrency APIs: thread pools, queues, concurrent collections, executors, locks, semaphores, etc.
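For reference, the Java 5 API described above boils down to something like this; a minimal sketch using java.util.concurrent (the sum-of-squares task is just a made-up workload for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Java5Concurrency {
    // Sum of i*i for i = 1..n, computed on a thread pool.
    static int sumOfSquares(int n) throws Exception {
        // A fixed thread pool from the Java 5 Executors factory.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
        for (int i = 1; i <= n; i++) {
            final int k = i;
            // Callable + Future replace hand-rolled Thread subclasses and join() plumbing.
            futures.add(pool.submit(new Callable<Integer>() {
                public Integer call() { return k * k; }
            }));
        }
        int sum = 0;
        for (Future<Integer> f : futures) sum += f.get(); // get() blocks until the task completes
        pool.shutdown();
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(4)); // 1 + 4 + 9 + 16 = 30
    }
}
```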
In Java 7, there will be a new fork/join library. Some info can be found here:
http://www.ibm.com/developerworks/java/library/j-jtp11137.html
Also, Sun is working on transactional memory for their Rock chip. Once it is out, I believe the Java virtual machine will support it.
They are also thinking of adding some small language features to make locks easier to use.
But ultimately, the programming part is still not as easy as it should be.
There is also a new language called Fortress; currently its alpha version runs on the JVM in interpreted mode, and it has impressive concurrency features.
I’m using the new Concurrency library in Java 5 extensively. It’s making coding this multi-threaded application a breeze.
Whenever MS does something, though, they pay their heralds to laud it with golden trumpets like it’s the coming of the Messiah. *yawn* Time to get back to work.
“Whenever MS does something, though, they pay their heralds to laud it with golden trumpets like it’s the coming of the Messiah”
Not to get nitpicky, but the features new to Java 5 that the parent poster mentioned have been in .Net for 5+ years now in the System.Threading namespace.
Partially, yes. Also, “for 5+ years” is not really correct.
For a comparison:
http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-…
Edited 2007-12-04 00:16
Anyone who has done any sort of multi-threading development in any language (not just .Net) knows firsthand just how much of a PITA it is to write a properly threaded application. I’ve long been wondering when the language writers were going to start baking stuff like this into the respective toolkits.
Threading should be something that you don’t have to worry about except in edge cases: an app I write should be able to automatically scale itself based on the number of processors available, thread scheduling/signaling should take place automatically, and manual synchronization needs to be a thing of the past. I was actually trying to explain these concepts to a friend of mine recently (making a case that quad cores just aren’t worth the extra money quite yet, since most apps won’t utilize all the cores), and how each new thread in an app increases complexity exponentially, which is why the majority of consumer apps simply aren’t truly multithreaded.
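The “scale itself based on the number of processors available” part is at least expressible by hand today; a quick Java sketch (the class name is made up, and the equivalent query in .NET is System.Environment.ProcessorCount):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ScaleToCores {
    public static void main(String[] args) {
        // Size the worker pool from the hardware instead of hard-coding a thread count.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        System.out.println("using " + cores + " worker threads");
        pool.shutdown();
    }
}
```

The scheduling and synchronization still remain manual, of course, which is the poster's point.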
I look forward to seeing what MS comes up with.
Edited 2007-12-03 02:26
I wonder if you will ever see this come to pass. Think about it: when you need to develop a multithreaded program, what do you worry about most? Race conditions. A language or platform will never be able to forecast race conditions based on your code, because the data model can be extremely complicated once business rules are involved.
I disagree: while it’s true that the problem won’t ever be fully solved by language/platform features, software constructs like STM or Erlang-style message passing/CSP (I’ve never understood the difference) look easier to use for parallel programming than using threads and locks manually.
This project looks a lot like the Jibu project:
http://www.axon7.com
Jibu is a commercial library, but it’s free for non-commercial use. It’s currently available for both Java and .NET, with C++ coming soon.
Jibu also has Erlang-style mailboxes and CSP channels.
Edited 2007-12-03 08:32
I think that Microsoft has frameworkitis. There are some huge performance problems in the CLR, and a lot of opportunity for further optimization.
But instead of fixing the core, they just add layer upon layer of framework code on top of it to make trivial things even more trivial.
The .NET framework is on version 3.5, while the CLR is still stuck on 2.0.
Anyway: Here is how to write multithreaded code in a nutshell:
-Do not use locks.
-Use immutable objects and referentially transparent methods whenever possible.
-Use message passing.
-Follow the rule: everything that is mutable stays on one thread, while everything that is shared between threads is immutable.
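A minimal Java sketch of those rules (the Tick message type and the queue wiring are made up for illustration): an immutable message crosses the thread boundary via a BlockingQueue, while the mutable running total never leaves the consumer thread, so no user-visible locks are needed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePassingSketch {
    // Immutable message: final field, no setters -> safe to share across threads.
    static final class Tick {
        final int value;
        Tick(int value) { this.value = value; }
    }

    static int run() throws InterruptedException {
        final BlockingQueue<Tick> inbox = new LinkedBlockingQueue<Tick>();
        final BlockingQueue<Integer> result = new LinkedBlockingQueue<Integer>();

        Thread consumer = new Thread(new Runnable() {
            public void run() {
                int total = 0; // mutable state, confined to this thread
                try {
                    for (int i = 0; i < 3; i++) total += inbox.take().value;
                    result.put(total); // Integer is immutable, safe to hand back
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        consumer.start();

        // Producer side: only immutable Tick objects ever cross threads.
        for (int i = 1; i <= 3; i++) inbox.put(new Tick(i));
        consumer.join();
        return result.take();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 1 + 2 + 3 = 6
    }
}
```

The locking still exists, but only inside the queue implementation, never in application code.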
Adding extra layers of engineering is something of a favourite of MS.
i.e. Reinvent Lisp (or some other functional language). 🙂
Note also that the immutable object approach is suggested by one of Microsoft’s main developers on PLINQ, one of the Parallel Extensions for .NET:
http://www.bluebytesoftware.com/blog/PermaLink,guid,58392086-9f4c-4…
http://www.bluebytesoftware.com/blog/PermaLink,guid,bae6ac13-2a95-4…
“The .NET framework is on version 3.5, while the CLR is still stuck on 2.0.”
The CLR version is the same as the .Net version: http://msdn2.microsoft.com/en-us/library/ms230176(VS.90).aspx This is because .Net is the umbrella term for CLR + BCL + CLI (actually, the CLR is MS’s implementation of the CLI). Regardless, we are now on v3.5 of the CLR.
“-Do not use locks.”
For non-functional languages, locks are an essential (if not THE essential) part of multi-threaded application design, unless you want race conditions and deadlocks all over your application.
I’d also love to know what these performance problems are that you speak of. It has been shown on numerous occasions that a properly designed .Net app has performance approaching that of unmanaged code.
Edited 2007-12-03 13:13
OSNews: Your URL parsing is munging the above URL because of the closing parenthesis around VS.90.
Not if you’re using message passing. I’m not going to claim that locks are not required at all, but good design and message passing can help eliminate them.
“Not if you’re using message passing.”
By message passing, does that mean serialization in the Java/.Net world? If so, that introduces a whole new set of problems and ramifications…
Message passing between threads does not involve any serialization. You just create a (preferably immutable) message object on the source thread and enqueue it into the message queue of the target thread.
If you want message passing between different processes or even different machines, then of course serialization gets involved.
“You just create a (preferably immutable) message object on the source thread and enqueue it into the message queue of the target thread.”
I follow you 100%; however, in the .Net/Java world this would entail creating a deep copy, which almost always involves serializing a copy of the object into memory. References are always mutable, and thus horrible candidates for what you’re talking about.
I just want to make sure I’m on the same page as you; I hear the term message and of course my mind immediately jumps to smalltalk, which I haven’t done any of since my college days.
That kind of construct (again, in the .Net/Java world) is very expensive, and probably outweighs any benefits of avoiding a locking strategy. Again, correct me if I’m wrong.
If your object is immutable, you do not have to create a deep copy via Clone() or serialization.
For example, the most often used immutable object in the CLR is probably System.String. It is completely safe to create a string on one thread and to consume it on another thread.
It does not matter if somebody can change e.g. a string reference to point to another string, as long as nobody is able to change the contents of the original string itself.
I would not say very expensive, but it is slightly more expensive than well-done locking when working with large data objects such as bitmaps.
But using locking everywhere just because there might be a performance problem in some cases strikes me as a case of premature optimization. I would use message queues whenever possible, and then optimize to lower level synchronization primitives only where a profiler indicates that there is a bottleneck.
Immutable objects are good for multi-threading. However, in applications that are highly iterative (image and signal processing, plotting, video, etc.) they can cause performance problems due to an excessive number of objects being created and destroyed merely to set a value.
Of course, no one deliberately allocates objects in inner loops, but when arguments are passed in as immutable objects it can be a pain, as they cannot be modified and returned (this procedural view is often needed in numerical work).
In fact, the Java SDK had numerous performance problems due to immutable objects for number types (particularly BigInteger, IIRC), where creating a new instance just to set the value was slowing things down considerably. Their solution was to create a mutable type and use it internally, while presenting an immutable type to users (who might rely on backwards-compatible immutability).
What’s my point? That making lots of classes immutable can be a PITA for users of those classes and brings significant performance problems of its own. In general it is better to use Java’s synchronisation (locks) only on the pieces of classes that are designed to communicate between threads than to impose immutability on classes that might also run solely within a single thread.
That is certainly a valid concern in some cases. However, the cost of allocating short-lived temporary objects on a modern runtime such as the CLR or the JVM is surprisingly low.
On my current machine, using .NET 2.0, I can allocate and collect 100,000,000 short-lived small objects per second. The Java allocator is of similar quality.
Escape analysis and stack allocation can eliminate most of the overhead of temporary object allocation completely (escape analysis is in Java 1.6; unfortunately stack allocation did not make it). So in the majority of cases, using immutable objects is not a significant performance problem.
Of course a naive implementation of immutable objects will cause problems in some cases. For example it would certainly not be a good idea to have an immutable bitmap class with a setpixel method that creates an entirely new bitmap.
However, you can often have destructive updates internally while still presenting an immutable public interface by using reference counting to determine when the old version of your object is no longer needed and can be destructively updated.
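Reference counting as such is awkward in a garbage-collected language, but a GC-friendly variant of the same idea is a builder that mutates a private array destructively and then relinquishes its reference when freezing, so the immutable object can take ownership without copying. The Bitmap/Builder classes below are hypothetical, not a real API:

```java
// Sketch: immutable facade over destructive internal updates (hypothetical classes).
public final class Bitmap {
    private final int[] pixels;
    private Bitmap(int[] pixels) { this.pixels = pixels; }
    public int getPixel(int i) { return pixels[i]; }

    public static final class Builder {
        private int[] pixels;
        public Builder(int size) { pixels = new int[size]; }
        // Destructive update: fine, because only the builder can see the array.
        public Builder setPixel(int i, int value) { pixels[i] = value; return this; }
        public Bitmap freeze() {
            Bitmap b = new Bitmap(pixels);
            pixels = null; // builder gives up its reference: the Bitmap is now the sole owner
            return b;
        }
    }
}
```

Once frozen, the Bitmap can be shared between threads freely, because no path to mutation remains.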
That sounds like a reasonable approach to me.
I think that immutablility is a very valuable property even in single-threaded programs. Either you have unnecessary defensive copying all over the place, or you constantly break the encapsulation of your object.
Example:
class Test
{
    private ArrayList<Integer> mData;

    public ArrayList<Integer> getData() {
        return mData; // breaks encapsulation
        // ...or defensively copy, which is probably unnecessary:
        // return new ArrayList<Integer>(mData);
    }
}
Another problem with mutable objects is that you can’t safely use them as, e.g., keys in hashtables.
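A quick Java demonstration of that pitfall, using a mutable ArrayList as a HashMap key (class and method names are made up):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyPitfall {
    static String lookupAfterMutation() {
        Map<List<Integer>, String> map = new HashMap<List<Integer>, String>();
        List<Integer> key = new ArrayList<Integer>();
        key.add(1);
        map.put(key, "found"); // stored in the bucket for hashCode of [1]
        key.add(2);            // mutating the key changes its hashCode...
        return map.get(key);   // ...so the lookup probes the wrong bucket
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // prints null: the entry is lost
    }
}
```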
Edited 2007-12-04 18:52
But adding an element (message object) to a collection (message queue) in another thread would require some kind of lock.
EDIT: But you noted this in http://www.osnews.com/permalink.php?news_id=18993&comment_id=288078
Edited 2007-12-04 07:47
“It also fundamentally starts the shift on how your ‘average Joe‘ interacts with a computer and things that he/she expects to be able to…”
Oh God not this politically correct HE/SHE crap again. Afraid to get attacked by a gang of hairy, man-hating, lesbian feminists if you only say “he”? First of all, the sentence refers to an average JOE, and “Joe” is a male name, so an average JOE can NOT be a “she”. Let us remove “he” from the sentence above and see how it sounds:
“…how your ‘average Joe‘ interacts with a computer and things that she expects to be able to…”
Joe average interacts with (HER?) computer because (SHE?) expects…
Yeah, last time I checked Joe is not female.
How’s this:
“A computer user should know how to operate a computer, because HE will find it useful.”
Yes, I only said HE f–kers. Deal with it.
“It also fundamentally starts the shift on how your ‘average Jo(e)‘ interacts with a computer and things that he/she expects to be able to…”
There. Fixed. It now covers both genders!
Thanks. That’s great man. I was receiving some very threatening letters from the League of Lesbian Feminists For Gender Equality(LLFFGE) but now I feel much better (and safer). I love political correctness. Ok, back to the discussion of .NET…
No. If you run a low level program such as http://www.codeproject.com/dotnet/DetectDotNet.asp to enumerate the available CLRs on a machine where .NET framework 3.5 is installed, you get the following:
Is .NET present : Yes
Root Path : C:\Windows\Microsoft.NET\Framework
Number of CLRs detected : 2
CLR versions available :-
1.1.4322
2.0.50727
Press any key…
If you use shared mutable state for communication between threads, you don’t have much choice but to use locks.
But there are safer alternatives such as message passing. Of course somewhere deep in the message queue code there will still be a lock or some other low level synchronization primitive. But an application programmer will never have to mess with that. See the BeOS API for pervasive use of message queues.
The biggest performance problem is this: http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.asp…. It can lead to a factor of 5 to 10 performance degradation compared to C++ in numerically intensive code. Still much better than Java, though, since Java does not have structs at all.
Fortunately, this will be addressed in the next (3.0) release of the runtime.
But there are various optimizations that Java VMs have been doing for ages that the CLR does not do, such as virtual method inlining. And the CLR also does not even attempt to vectorize code on CPUs with SSE3.
Edited 2007-12-03 14:03
Interesting, and thanks for correcting me. I’ve always been under the impression that CLR = MS’s implementation of the CLI, which in turn = a good part of the .Net Framework and therefore shares the same version. MS’s versioning scheme is cryptic to say the least; according to your info the following is true:
– We’re on v3.5 of the .Net Framework
– We’re on v2.0 of the CLR
– C# is at v3.0
Are you sure about Java inlining virtual methods? I just did a quick Google and wasn’t able to confirm that. I thought that by definition, a virtual method cannot be inlined which is why a lot of folks were concerned about all methods being virtual in Java by default. I’m by no means versed in compiler theory, but I don’t see how a virtual method can be inlined.
http://portal.acm.org/citation.cfm?id=353191
Method devirtualization has been around in the JVM for quite a while. That paper (sadly requires a subscription) does a moderately good job of covering the various techniques used.
That is correct as far as I understand it. But I would not bet my life on it, since as you mentioned the MS versioning scheme is a tad confusing.
(Java is not much better in that area though. The current version is called Java SE 6, but internally it is called 1.6)
See for example
http://java.sun.com/products/hotspot/docs/whitepaper/Java_HotSpot_W…
It is a neat trick. Often a virtual method gets called only on one or a small number of types. So you specialize the method for each possible type and then can inline it just like a non-virtual method.
This requires some knowledge which is often not available at compile time. Therefore the java hotspot compiler analyzes the code while it is running and then optimizes only the parts that can profit most from optimizations (so-called hotspots, hence the name).
Of course, since C# methods are not virtual by default, the pressure to do something about virtual method invocation overhead is not as high as in Java. Nevertheless it would be nice for Microsoft to tackle this problem.
As others have mentioned, the recent Java concurrency libraries (building on the work of Doug Lea) are quite usable, but like everything, you still have to know at least something about what you are doing.
There is another way of getting great parallel performance, without warping languages further. I guess the earliest posters were alluding to this. This is putting as much of the concurrency as possible into the library implementations of the runtime.
As an example of how this is done, consider the OpenGL model for GPU interaction with vertex and fragment shaders. You write the small piece of code that is to be run in parallel (in the OpenGL Shading Language) and the OpenGL driver is completely responsible for distributing it between the streams (vertex/fragment/or unified shader units) of the GPU. The programmer never has to worry about load-balancing between the shader units in GPUs with differing architectures. The driver does all that and the developer only needs to supply the custom code to be executed. That is quite a nice way of working for a limited class of problems. However, plenty of work can be done in libraries to utilise parallelism in a manner transparent to the user.
As another example, in Java 1.6u5 (in the testing stage) *all* of the drawing is done using DirectX on Windows. Many operations (such as convolutions) are done in parallel on the video hardware (using pixel shaders) and are around 30 times faster than the software equivalent (on common hardware). All of this is transparent to the end user (zero code changes required to benefit from this), can’t fault Java for that.
Quite often, putting the parallelism in the libraries can make enough of a difference to users in certain situations that developers don’t have to jump through hoops to achieve better performance through multithreading.
Yeah, sure, I can see a good case for this when you are doing CPU-intensive graphics operations that are easy to parallelize, but I find it hard to imagine another simple use case.
Parallelize the Collections framework? Meh. Maybe you could find some sort of parallelized sorting algorithm that would improve sorting of large datasets – at the cost of making smaller sorts slower. I just don’t see any quick wins here.
My point was that the libraries should be parallelised, where possible and sensible. This can make enough of a difference that the developer might not then have to parallelise their own code to improve performance.
Some things, such as the Collections framework, are trickier to parallelise, as you point out – but if this package is too difficult to parallelise effectively, then how many users would do the multithreading themselves? (And if users started to, you would have the inefficiency of each application doing its own multithreading rather than a single expert implementation by the library maintainer.)
I would be surprised (after running profiling) that a single collections sort would be the largest source of time spent in most applications. Multithreading is usually used to get around blocking I/O or the delays caused by human interaction.
In the comments here somebody linked to an article about similar parallel extensions for Java, targeted for Java 7. The article presented an example of a parallelized recursive algorithm that finds the maximum value of all the elements in a list.
Now, obviously there is a minimum list-size threshold below which the concurrency overhead swamps any possible performance increase, and you are better off single-threading. The example was tested with a variety of thresholds on a variety of platforms, from a P4 with SMT all the way up to a 64-thread Niagara. What struck me was that performance was all over the place depending on the platform. The “sweet spot” varied from architecture to architecture, quite significantly.
This is a real problem for anybody who wants to parallelize their framework. How do you come up with a general purpose solution that performs well everywhere?
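For concreteness, the threshold pattern from the linked article looks roughly like this, sketched with the fork/join API that eventually shipped in Java 7 (java.util.concurrent.RecursiveTask; in the 2007 preview it lived in the jsr166y package). THRESHOLD is exactly the platform-sensitive knob in question:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelMax extends RecursiveTask<Integer> {
    static final int THRESHOLD = 1000; // the platform-dependent "sweet spot"

    final int[] data;
    final int lo, hi;
    ParallelMax(int[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

    protected Integer compute() {
        if (hi - lo <= THRESHOLD) {       // small enough: plain sequential scan
            int max = Integer.MIN_VALUE;
            for (int i = lo; i < hi; i++) if (data[i] > max) max = data[i];
            return max;
        }
        int mid = (lo + hi) >>> 1;        // otherwise split and recurse in parallel
        ParallelMax left = new ParallelMax(data, lo, mid);
        ParallelMax right = new ParallelMax(data, mid, hi);
        left.fork();                       // left half runs on another worker
        return Math.max(right.compute(), left.join());
    }

    public static int max(int[] data) {
        return new ForkJoinPool().invoke(new ParallelMax(data, 0, data.length));
    }
}
```

Pick THRESHOLD too low and the task-creation overhead dominates; too high and cores sit idle, which is why the sweet spot moves between a dual-core P4 and a Niagara.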
Grr. I could not see the PDF or the WMV files (I’m running a Mac), but as far as I could tell they addressed the wrong problem. Writing multithreaded code is not hard; writing multithreaded code with no deadlocks or race conditions and with proper locking is hard. Did they make that any easier?
I saw a video on Channel 9 a long time ago about transactional memory; that looked like a real solution.