Linked by Hadrien Grasland on Thu 19th May 2011 21:31 UTC
Having read the feedback resulting from my previous post on interrupts (itself resulting from an earlier OSnews Asks item on the subject), I've had a look at the way interrupts work on PowerPC v2.02, SPARC v9, Alpha and IA-64 (Itanium), and contribute this back to anyone who's interested (or willing to report any blatant flaw found in my posts). I've also tried to rework my interrupt handling model a bit, to make it significantly clearer and have it look more like a design doc and less like a code draft.
Thread beginning with comment 474220
RE[2]: Locking benchmarks
by Alfman on Sun 22nd May 2011 21:11 UTC in reply to "RE: Locking benchmarks"

jai_,

"This is a very interesting discussion thread, but I'm wondering what kind of model you envision when you talk of a lock-free, asynchronous, thread-less model. Can you point me to a description or more elaborate discussion of such a model? "

More than a vision... I've actually implemented it in userspace on Linux, and I'm working on a Windows port now, although I haven't made the project public.

Technically, my vision is to convert all calls which either block or could take a long time into asynchronous callback mechanisms.


So basically these function calls below are from my API.
When any of the tasks complete, the registered callback functions are invoked - all within a single event-oriented thread.

async_file_create(&file, &file_cb);
async_file_open(&file, "async.test", ASYNC_FILE_READ | ASYNC_FILE_WRITE | ASYNC_FILE_CREATE | ASYNC_FILE_TRUNCATE);

async_process_create(&process, &process_cb);
async_process_setup_stdout(&process, &process_out_cb);
async_process_setup_stdin(&process, &process_in_cb);
async_process_execute(&process, "ls -l /");

async_timer_create(&timer1, &timer_callback);
async_timer_schedule(&timer1, 500);

async_resolve_create(&resolve, &resolve_cb);
async_resolve_lookup_host(&resolve, "av.com", ASYNC_ADDRESS_IP4);

async_address_create(&remote_address);
async_address_set(&remote_address, "127.0.0.1", 5555, ASYNC_ADDRESS_AUTO);
async_tcplisten_create(&lsock, &lsock_cb);
async_tcplisten_listen(&lsock, &remote_address);
async_address_destroy(&remote_address);


So, above, all those actions are performed simultaneously, and they call back when complete.


One callback could look something like this (sorry about OS News formatting bugs):

void resolve_cb(ASYNC_RESOLVE *resolve) {
    char buffer[128];
    ASYNC_ADDRESS address;
    async_address_create(&address);
    while (async_resolve_get_address(resolve, &address)) {
        printf("Address %s\n", async_address_get_ip(&address, buffer, sizeof(buffer)));
    }
    async_address_destroy(&address);
    async_resolve_destroy(resolve);
}



The thing which I find so cool about this model is that two completely independent processes (or applications for that matter) can be run together inside this model without interfering with each other. Nothing (apart from the main handler) ever blocks.

Threads are supported for long running tasks, and when the threads complete their work, they invoke a callback into the async handler much like any other blocking call.


Unfortunately, under Linux, many blocking calls lack an async interface and are implemented using threads in the kernel. This forces async userspace apps to wrap threads around certain kernel calls for no reason other than that the kernel requires it. I consider it shortsighted, but many people would bash my head in if I said that too loudly.

Instead of burning through threads quickly, my implementation caches minimal proxy threads dedicated to this purpose.

Still, in this model proxy threads are unnecessary overhead which serve absolutely no purpose other than getting around blocking calls.

I think this model would work very well in the kernel itself. Instead of blocking a thread, a request could just register a callback and be done until the callback occurs.


The async model has one extremely important property which is inherently difficult for threads: well-defined cancellation behavior. I can easily cancel an async IO request; canceling a blocked thread turns out to be extremely problematic.

In fact, pthreads has special locks programmers should use to ensure that cancellation doesn't cause corruption. Unfortunately for my model, this means thread cancellation itself can block, and there's nothing I can safely do about it.

Even to this day, this is the cause of many aggravating Linux kernel CIFS lockups where one cannot kill a hung connection, but has to wait for kernel threads to time out.

Ah well, I can't have everything.


RE[3]: Locking benchmarks
by Alfman on Sun 22nd May 2011 22:03 in reply to "RE[2]: Locking benchmarks"

One clarification...

My post may have given the impression that a thread is used for all syscalls, which is not the case; threads are only needed for those which may block.

For example, Linux sockets support non-blocking IO, which I make use of.

Linux file IO blocks, forcing me to use threads for those syscalls if I want to continue handling other requests asynchronously.
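For reference, putting a Linux socket into non-blocking mode is a small fcntl dance; a generic sketch (make_nonblocking is a made-up helper name, not part of the library being discussed):

```c
#include <fcntl.h>
#include <sys/socket.h>

/* Switch a descriptor to non-blocking mode: reads and writes that
 * would block now fail immediately with EAGAIN/EWOULDBLOCK, so the
 * event loop can move on instead of stalling. */
int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

Regular file descriptors accept the flag but ignore it for ordinary reads and writes, which is exactly the gap that forces the proxy threads above.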


Here's an interesting link describing an attack against Apache caused by the fact that it uses a thread per connection. IIS is not subject to the same attack because it's asynchronous.

It may not be a completely fair comparison, but then it does suggest something about the scalability of each approach.

http://lwn.net/Articles/338407/


RE[4]: Locking benchmarks
by Neolander on Tue 24th May 2011 11:30 in reply to "RE[3]: Locking benchmarks"

"Here's an interesting link describing an attack against Apache caused by the fact that it uses a thread per connection. IIS is not subject to the same attack because it's asynchronous.

It may not be a completely fair comparison, but then it does suggest something about the scalability of each approach.

http://lwn.net/Articles/338407/"

Actually, I still don't quite understand how this works, but it seems to me that the problem is that there is a limit to the number of connections which Apache can keep open simultaneously (or to the number of running threads, whichever is smaller). So for the comparison to be fair, IIS should have a similar limit on the number of pending tasks.

And even then, the Apache guys obviously don't have a good metric to detect whether a connection is active or not here. If someone just keeps a connection open and does nothing with it, I'd simply ban that someone as soon as new incoming requests arrive. To follow the shop analogy: a cashier will merely be irritated by someone paying with pennies if no one else is there, but if lots of people are waiting in the queue, he'll use that excuse either to have the person pay in a faster way or to kick them out without their articles.

Edited 2011-05-24 11:31 UTC


RE[3]: Locking benchmarks
by jal_ on Mon 23rd May 2011 08:54 in reply to "RE[2]: Locking benchmarks"

Alfman, thanks for the elaborate answer. What I'm wondering about, though, is how you sync all these asynchronous calls that have a linear dependency. E.g. when opening a file, then reading it, then closing it asynchronously: since the asynchronous nature does not say anything about the order of execution, you could have an attempt to read an unopened file, or a file that has been closed before it has been read. So you'd need some synchronisation primitives, which typically means blocking (or spinning, which may be as bad).

Also, in your resolve_cb, there's a while-loop that tests for the result of an async call. Since it's async, it cannot actually wait for the result of the call, so all it can do is check the result of the invocation of the call, right? Or do you have some mechanism that blocks the while-loop thread until an answer arrives? I guess not, as that more or less defeats the purpose?


RE[4]: Locking benchmarks
by Alfman on Mon 23rd May 2011 16:57 in reply to "RE[3]: Locking benchmarks"

jal_,

"What I'm wondering about, though, is how you sync all these asynchronous calls that have a linear dependency. E.g. when opening a file, then reading it, then closing it asynchronously: since the asynchronous nature does not say anything about the order of execution"

That's a good question, although somewhat complex to describe abstractly.

Remember that async functions can be called synchronously at any time, since they never block by design.

If there is a dependency between the start of an async request and the result of another, then the second async request should be executed within the callback of the first instead of simultaneously.

All async objects carry state indicating whether they are busy or ready, if that is needed. However, since the callbacks only trigger on the transition from busy to ready, an explicit check is not typically necessary.


"you could have an attempt to read an unopened file"

In this case, the file open callback tells you that the file is open (or that there was an error). It's not until this point that it is valid to read the file.

"or a file that has been closed before it has been read."

While I actually do have an event for the file being closed by the OS, this is not supported by *nix or Windows. What happens is a read/write failure instead. The file is always closed by the user code, therefore it should never happen at an unanticipated time.


"So you'd need some synchronization primitives, which typically means blocking (or spinning, which may be as bad)."

When an IO event occurs, we handle it immediately in our callback. If that means initiating another async IO request, then it is done without blocking. If it means initiating a CPU bound task, then we fire off a thread. Then our async callback function returns without ever having blocked. So we're always ready to handle more IO. Since I use eventfd and pollfd to retrieve IO status from the kernel, I can even retrieve several IO events with a single kernel syscall.

I don't know if that addresses your concern or not.

There are no thread synchronization primitives necessary, since every async callback can safely assume that no other async callbacks are running in parallel. The state of the objects it sees will never change behind its back, so there's no need for a mutex to protect anything.


It's often referred to as a giant state machine, which sounds complex, and it could be if it were designed like a kitchen-sink Win32 event loop. My implementation encapsulates all that complexity such that tasks don't need to know or care about other tasks, only handling the IO they are concerned with. So the code is the same whether it's handling one client or 10K.



"Also, in your resolve_cb, there's a while-loop that tests for the result of an async call. Since it's async, it cannot actually wait for the result of the call, so all it can do is check the result of the invocation of the call, right?"


Right and wrong.

The async_resolve calls are implemented on top of the blocking gethostbyname2_r in libc. So when the resolve call is initiated, the async library initializes a proxy thread and executes gethostbyname2_r in the background (this is completely under the hood and does not concern the async users; in theory a native asynchronous gethostbyname function could be written for libc... but realistically it may never happen).

In the meantime, the async program can continue handling events; once gethostbyname2_r is done, the user's callback is executed, in this case resolve_cb.

async_resolve_get_address is a helper function which retrieves all the IPs that were fetched by the libc call. Like other async functions, it doesn't block, since the IPs are already fetched.


"Or do you have some mechanism that blocks the while-loopthread until an answer arrives? I guess not, as that more or less defeates the purpose?"

No, it doesn't block; resolve_cb (the callback) isn't executed in the first place until data is available.
