Linked by Hadrien Grasland on Thu 19th May 2011 21:31 UTC
Hardware, Embedded Systems Having read the feedback resulting from my previous post on interrupts (itself resulting from an earlier OSnews Asks item on the subject), I've had a look at the way interrupts work on PowerPC v2.02, SPARC v9, Alpha and IA-64 (Itanium), and contribute this back to anyone who's interested (or willing to report any blatant flaw found in my posts). I've also tried to rework a bit my interrupt handling model to make it significantly clearer and have it look more like a design doc and less like a code draft.
Thread beginning with comment 474208
To view parent comment, click here.
To read all comments associated with this story, please click here.
RE: Locking benchmarks
by jal_ on Sun 22nd May 2011 13:08 UTC in reply to "Locking benchmarks"
jal_
Member since:
2006-11-02

I still argue that a lock free asynchronous model without any threads will easily outperform the "popup thread" model, particularly for simple tasks which consume 1-2K cycles.


This is a very interesting discussion thread, but I'm wondering what kind of model you envision when you talk of a lock-free, asynchronous, thread-less model. Can you point me to a description or more elaborate discussion of such a model?

Reply Parent Score: 2

RE[2]: Locking benchmarks
by Alfman on Sun 22nd May 2011 21:11 in reply to "RE: Locking benchmarks"
Alfman Member since:
2011-01-28

jai_,

"This is a very interesting discussion thread, but I'm wondering what kind of model you envision when you talk of a lock-free, asynchronous, thread-less model. Can you point me to a description or more elaborate discussion of such a model? "

More than a vision... I've actually implemented it in userspace on linux, and working on a windows port now, although I haven't made the project public.

Technically, my vision is to convert all calls which either block or could take a long time into asynchronous callback mechanisms.


So basically these function calls below are from my API.
When any of the tasks complete, the registered callback functions are invoked - all within a single even oriented thread.

async_file_create(&file, &file_cb);
async_file_open(&file, "async.test", ASYNC_FILE_READ | ASYNC_FILE_WRITE | ASYNC_FILE_CREATE | ASYNC_FILE_TRUNCATE);

async_process_create(&process, &process_cb);
async_process_setup_stdout(&process, &process_out_cb);
async_process_setup_stdin(&process, &process_in_cb);
async_process_execute(&process, "ls -l /");

async_timer_create(&timer1, &timer_callback);
async_timer_schedule(&timer1, 500);

async_resolve_create(&resolve, &resolve_cb);
async_resolve_lookup_host(&resolve, "av.com", ASYNC_ADDRESS_IP4);

async_address_create(&remote_address);
async_address_set(&remote_address, "127.0.0.1", 5555, ASYNC_ADDRESS_AUTO);
async_tcplisten_create(&lsock, &lsock_cb);
async_tcplisten_listen(&lsock, &remote_address);
async_address_destroy(&remote_address);


So, above, all those actions are performed simultaneously, and they callback when complete.


One callback could look something like this (sorry about OS News formatting bugs):

void resolve_cb(ASYNC_RESOLVE*resolve) {
char buffer[128];
ASYNC_ADDRESS address;
async_address_create(&address);
while(async_resolve_get_address(resolve,&address)) {
printf("Address %s\n", async_address_get_ip(&address, buffer, sizeof(buffer)));
}
async_address_destroy(&address);
async_resolve_destroy(resolve);
}



The thing which I find so cool about this model is that two completely independent processes (or applications for that matter) can be run together inside this model without interfering with each other. Nothing (apart from the main handler) ever blocks.

Threads are supported for long running tasks, and when the threads complete their work, they invoke a callback into the async handler much like any other blocking call.


Unfortunately, under linux, there are many blocking calls lacking an async interface and are implemented using threads in the kernel. It forces async userspace apps to wrap threads around certain kernel calls for no good reason other than the kernel requiring it. I consider it shortsighted, but many people would bash my head in if I say that too loudly.

Instead of burning through threads quickly, my implementation caches minimal proxy threads dedicated for this purpose.

Still, in this model proxy threads are unnecessary overhead which serve absolutely no purpose other than getting around blocking calls.

I think this model would work very well in the kernel itself. Instead of blocking a thread, a request could just register a callback and be done until the callback occurs.


The async model has one extremely important property which is inherently difficult for threads: well defined cancellation behavior. I can easily cancel an async IO request, canceling a blocked thread turns out to be extremely problematic.

In fact, pthreads has special locks programmers should use to ensure the cancellation doesn't cause corruption. Unfortunately for my model, this means thread cancellation itself can block and there's nothing I can safely do about it.

Even to this day, this is the cause of many aggravating linux kernel CIFS lockups where one cannot kill a hung connection, but has to wait for kernel threads to timeout.

Ah well, I can't have everything.

Reply Parent Score: 1

RE[3]: Locking benchmarks
by Alfman on Sun 22nd May 2011 22:03 in reply to "RE[2]: Locking benchmarks"
Alfman Member since:
2011-01-28

One clairification...

My post may have given the impression that a thread is used for all syscalls, which is not the case, only for those which may block.

For example, linux sockets support non-blocking IO, which I make use of.

Linux file IO blocks, forcing me to use threads for those syscalls if I want to continue handling other requests asynchronously.


Here's an interesting link describing an attack against apache caused by the fact that it uses a thread per connection. IIS is not subject to the same attack because it's asynchronous.

It may not be a completely fair comparison, but then it does suggest something about the scalability of each approach.

http://lwn.net/Articles/338407/

Reply Parent Score: 1

RE[3]: Locking benchmarks
by jal_ on Mon 23rd May 2011 08:54 in reply to "RE[2]: Locking benchmarks"
jal_ Member since:
2006-11-02

Alfman, thanks for the elaborate answer. What I'm wondering about though, is how do you sync all these asynchronous calls that have a linear dependancy. E.g. when opening a file, then reading, then closing it asynchronusly, since the asynchronous nature does not say anything about order of execution, you could have an attempt to read a unopened file, or a file that has been closed before it has been read. So you'd need some synchronisation primitives, which typically means blocking (or spinning, which may be as bad).

Also, in your resolve_cb, there's a while-loop that tests for the result of an async call. Since it's async, it cannot actually wait for the result of the call, so all it can do is check the result of the invocation of the call, right? Or do you have some mechanism that blocks the while-loopthread until an answer arrives? I guess not, as that more or less defeates the purpose?

Reply Parent Score: 2