Linked by Hadrien Grasland on Fri 27th May 2011 11:34 UTC
General Development After having an interesting discussion with Brendan on the topic of deadlocks in threaded and asynchronous event handling systems (see the comments on this blog post), I just had something to ask the developers on OSnews: could you live without blocking API calls? Could you work with APIs where lengthy tasks like writing to a file, sending a signal, doing network I/O, etc. are done in a nonblocking fashion, with only callbacks as a mechanism to return results and notify your software when an operation is done?
Thread beginning with comment 474893
Mutex-less computing
by xiaokj on Sat 28th May 2011 14:39 UTC
xiaokj Member since:
2005-06-30

@ Original Article,

There are actually alternatives to plain old threaded / async / semi-async (you defined threaded + threshold == semi-async): non-blocking algorithms of a more radical type exist. I have read of people who have successfully implemented system-level code using unconventional non-blocking mechanisms, most notably compare-and-swap (http://en.wikipedia.org/wiki/Compare-and-swap).
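To make the idea concrete, here is a minimal sketch of the basic CAS retry loop in C11; the names are purely illustrative:

    #include <stdatomic.h>

    /* Lock-free increment: read the current value, then try to swap in
     * value+1; if another thread changed the counter in between, the
     * compare-exchange fails, refreshes 'observed', and we retry. */
    static _Atomic long counter;

    void counter_increment(void)
    {
        long observed = atomic_load(&counter);
        while (!atomic_compare_exchange_weak(&counter, &observed, observed + 1))
            ;   /* 'observed' now holds the latest value; loop and retry */
    }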

This is actually very useful, especially in conjunction with a client-server model. It ends up looking more like async-threaded than the threaded-async you are describing.

Let's use a client-server model analogous to the X.org model. The owner of the resource, the graphics card, is the server. Within this server there is only one point of I/O: the server's queue number. Any client wanting to perform an operation first formulates an atomic transaction. It then reads the queue number, attaches it to the transaction, and tries to compare-and-swap the queue number on the server: if the number has changed in the meantime, the server has just accepted a transaction from another client, and the client must retry with the new queue number. Only once it manages to update the queue number does it send the transaction details to the server, which processes them at its leisure.
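A rough sketch of that handshake, with hypothetical names (server_queue_number, send_to_server) standing in for whatever IPC the real system would use:

    #include <stdatomic.h>

    /* The server owns a single atomic counter; a client "claims" a slot by
     * advancing it with compare-and-swap, and only then ships the transaction. */
    extern _Atomic unsigned long server_queue_number;   /* owned by the server */

    struct transaction {
        unsigned long ticket;           /* queue number claimed by this client */
        /* ... operation details ... */
    };

    void send_to_server(struct transaction *tx);        /* assumed IPC call */

    void client_submit(struct transaction *tx)
    {
        unsigned long seen = atomic_load(&server_queue_number);
        do {
            tx->ticket = seen;
            /* If another client got in first, 'seen' no longer matches, the
             * exchange fails and 'seen' is refreshed; attach the new number
             * and try again. */
        } while (!atomic_compare_exchange_weak(&server_queue_number, &seen, seen + 1));
        send_to_server(tx);             /* server processes it at its leisure */
    }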

The original paper (it has been years since I last saw it, and I cannot find it anymore) says that, even without a compare-and-swap instruction, this way of setting things up tends to be very simple yet performs much better than it looks. Collisions are rare even in busy environments. Also, because the resource has its own monolithic server, that server can look ahead in the queue and apply sensible algorithms, especially against deadlock-style problems. Because of its atomicity, it is also very useful for databases.

One thing this really requires is that the malloc and memory-reclamation implementation be excellent, because the compare-and-swap update is most naturally implemented as a linked-list-style pointer swap.
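The classic example of that linked-list-style update is a lock-free stack push, sketched below. It deliberately omits the hard part -- safely freeing popped nodes (the ABA problem) -- which is exactly why the allocator and reclamation scheme have to be so good:

    #include <stdatomic.h>
    #include <stdlib.h>

    struct node {
        void        *payload;
        struct node *next;
    };

    static _Atomic(struct node *) head;

    void stack_push(void *payload)
    {
        struct node *n = malloc(sizeof *n);
        n->payload = payload;
        n->next = atomic_load(&head);
        /* Retry until no other thread has swapped the head underneath us;
         * on failure, n->next is refreshed with the newly observed head. */
        while (!atomic_compare_exchange_weak(&head, &n->next, n))
            ;
    }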

There is no reason whatsoever why, once the client has obtained the queue number properly and securely, the server cannot establish a callback to notify the client of progress -- and it should. The server should also keep its own internally consistent history of transactions: if it decides to reject a transaction, something must hold the unarguably correct state of the system.

The above implements an async model (sequential handling, even) that deals well with threaded requests. To the clients it appears threaded, but it actually implements centralised, database-style handling. For really busy systems, the server itself can be threaded later. It is rather difficult to avoid a choke-point at this server, but given multi-tasking, the choke-point is actually a plus, because we can apply load-balancing algorithms there.

Most CPUs can do either CAS or LL/SC, with LL/SC being the more powerful of the two, although when push comes to shove, CAS with a fixed retry algorithm can be good enough (just without the same guarantees). Damn legacy x86 with its idiosyncrasies.
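In C11 terms that difference surfaces as weak versus strong compare-exchange: on LL/SC hardware the weak form may fail spuriously (so it belongs in a retry loop), while the strong form only fails if the value really differed. A rough illustration, names made up:

    #include <stdatomic.h>
    #include <stdbool.h>

    static _Atomic int owner;

    bool try_claim(int my_id)
    {
        int expected = 0;
        /* One-shot attempt: succeeds only if 'owner' was still 0. */
        return atomic_compare_exchange_strong(&owner, &expected, my_id);
    }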

Reply Score: 3

RE: Mutex-less computing
by Alfman on Sun 29th May 2011 17:54 in reply to "Mutex-less computing"
Alfman Member since:
2011-01-28

xiaokj,

"There are actually alternatives to just plain old threaded / async / semi-async (you defined threaded + threshold == semi-async). Non-blocking algorithms exist of a more radical type. I have read that there are people who have successfully implemented system-level stuff using unconventional non-blocking mechanisms, most notably compare-and-swap"

I don't know why you call these unconventional; atomics are pretty common for implementing thread-safe algorithms that don't require OS-level synchronization. Many compilers (GCC included) have built-in support for them.

They're an extremely fine-grained form of serialization (at the system bus level). I use them often in MT code; however, they're still expensive, since they cannot run at full CPU speed.
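For the curious, this is the kind of builtin I mean (GCC's __sync family, available since GCC 4.1); variable names are made up. On x86 each one compiles down to a lock-prefixed instruction, which is where the bus-level serialization cost shows up:

    static long shared;

    void add_atomically(long delta)
    {
        __sync_fetch_and_add(&shared, delta);        /* atomic read-modify-write */
    }

    int set_if_equal(long expected, long desired)
    {
        /* Returns nonzero only if 'shared' still held 'expected'. */
        return __sync_bool_compare_and_swap(&shared, expected, desired);
    }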

I'm not disagreeing with your post; however, I'd like to add that atomic CPU operations are only necessary in MT code. An async or other single-threaded model doesn't need to synchronize, because the possibility of overlap with other threads doesn't exist in the first place.

Reply Parent Score: 2

RE[2]: Mutex-less computing
by xiaokj on Sun 29th May 2011 21:46 in reply to "RE: Mutex-less computing"
xiaokj Member since:
2005-06-30

Alfman,

I would first like to declare that I am speaking outside my area of expertise. It should be obvious that I am out of touch, so if this is not unconventional, the error is mine.

However, I find it funny that you say the async model does not need to care. If you have multiple cores, then even if you dedicate one core to the server, in the rare case where two other cores request something at the same time, there will be a need to care, no?

On the other hand, these kinds of atomics, compare-and-swap and LL/SC, are hardware accelerated to be (a) non-blocking, (b) interrupt-enabled and (c) run in one cycle. Why do you claim that they are slower than CPU speed?

Nonetheless, if you combine atomicity and MT, I cannot see why a good implementation would not outperform simple threaded and/or async as described. It would be capable of doing everything those can, and be well-balanced at the same time.

Reply Parent Score: 2