Linked by Hadrien Grasland on Sun 29th May 2011 09:42 UTC
Thread beginning with comment 475077
RE[5]: RPC considered harmful
by Brendan on Tue 31st May 2011 02:44
in reply to "RE[4]: RPC considered harmful"
What I want to do is...
1/ Process A gives work to process B through a "fast" system call, which in turn calls a function of B in a new thread, using a stack of parameters given by A.
2/ Process A forgets about it and goes on to do something else.
3/ When process B is done, it sends a callback to process A through the same mechanism that A used to give B the work (running a function of A). Callbacks may have parameters, the "results" of the operation.
Does it remind you of something?
While I can see some similarities between this and asynchronous messaging, there are also significant differences, including the overhead of creating (and eventually destroying) threads, which (in my experience) is the third most expensive operation microkernels do (after creating and destroying processes).
On top of that, (because you can't rely on the queues to serialise access to data structures) programmers would have to rely on something else for reentrancy control; like traditional locking, which is error-prone (lots of programmers find it "hard" and/or screw it up) and adds extra overhead (e.g. mutexes with implied task switches when under lock contention).
I also wouldn't underestimate the effect that IPC overhead will have on the system as a whole (especially for "micro-kernel-like" kernels). For example, if IRQs are delivered to device drivers via IPC, then on a server under load (e.g. with high-speed Ethernet) you can expect thousands of IRQs per second (and expect to be creating and destroying thousands of threads per second). Once you add normal processes communicating with each other, this could easily go up to "millions per second" under load. If IPC costs twice as much as it does on other OSs, then the resulting system as a whole can be 50% slower than comparable systems (e.g. other micro-kernels) because of the IPC alone.
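To put rough numbers on that last sentence, here is a back-of-the-envelope model (an assumption for illustration, not a measurement): if the baseline system spends a fraction of its time in IPC and everything else stays the same, multiplying the IPC cost scales only that fraction. The names `relative_runtime`, `ipc_fraction` and `ipc_cost_ratio` are made up for this sketch.

```python
def relative_runtime(ipc_fraction, ipc_cost_ratio):
    """Runtime relative to the baseline system (1.0 means same speed).

    ipc_fraction:   share of baseline runtime spent in IPC (0.0 to 1.0).
    ipc_cost_ratio: how many times more expensive IPC is than on the baseline.
    Assumes all non-IPC work costs the same on both systems.
    """
    return (1.0 - ipc_fraction) + ipc_fraction * ipc_cost_ratio


# IPC twice as expensive, for different baseline IPC shares.
for share in (0.1, 0.25, 0.5):
    print(share, relative_runtime(share, 2.0))
```

Under this model, a 2x IPC cost makes the same workload take 50% longer only when the baseline already spends half of its time in IPC; the closer a system is to that regime (heavy driver and service traffic on a microkernel), the more the IPC cost dominates.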
If you have something like a pipe or message queue, you can implement higher-level IPC protocols on top of it, and use user-space libraries to implement a new IPC mechanism that uses these protocols. That's what I was talking about. But except when trying to make the kernel unusually tiny, I'm not sure it's a good idea either.
In general, any form of IPC can be implemented on top of any other form of IPC. In practice it's not quite that simple because you can't easily emulate the intended interaction with scheduling (blocking/unblocking, etc) in all cases; and even in cases where you can there's typically some extra overhead involved.
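As a rough user-space illustration of that point, the sketch below emulates a blocking call on top of asynchronous message queues. It uses Python threads and `queue.Queue` as stand-ins for processes and kernel message queues; `serve` and `blocking_call` are hypothetical names, and a real system would send a function identifier rather than a function object.

```python
import queue
import threading


def serve(requests):
    """Server loop: receive (reply_queue, fn, args) and send back fn(*args)."""
    while True:
        msg = requests.get()
        if msg is None:            # sentinel message: shut down
            break
        reply_to, fn, args = msg
        reply_to.put(fn(*args))    # second message: the reply


def blocking_call(requests, fn, *args):
    """Synchronous call emulated with one request and one reply message."""
    reply_to = queue.Queue(maxsize=1)   # private reply queue for this call
    requests.put((reply_to, fn, args))
    return reply_to.get()               # caller blocks until the reply arrives


def demo():
    requests = queue.Queue()
    server = threading.Thread(target=serve, args=(requests,))
    server.start()
    answer = blocking_call(requests, pow, 2, 10)
    requests.put(None)
    server.join()
    return answer
```

The emulation works, but each call now costs two messages plus a reply queue, and the scheduler only sees a caller waiting on a queue, not a caller waiting on that particular server. That is the kind of extra overhead and lost scheduling information described above.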
The alternative would be if the kernel has inbuilt support for multiple different forms of IPC; which can lead to a "Tower of Babel" situation where it's awkward for different processes (using different types of IPC) to communicate with each other.
Basically, you want the kernel's inbuilt/native IPC to be adequate for most purposes, with little or no scaffolding in user-space.
- Brendan
RE[6]: RPC considered harmful
by Neolander on Tue 31st May 2011 07:26
in reply to "RE[5]: RPC considered harmful"
While I can see some similarities between this and asynchronous messaging, there are also significant differences, including the overhead of creating (and eventually destroying) threads, which (in my experience) is the third most expensive operation microkernels do (after creating and destroying processes).
Ah, Brendan, Brendan, how do you always manage to be so kind and helpful with people who play with OS development? Do you teach it in real life or something?
Anyway, have you pushed your investigation far enough to know which step of the thread creation process is expensive? Maybe it's something whose impact can be reduced...
On top of that, (because you can't rely on the queues to serialise access to data structures) programmers would have to rely on something else for reentrancy control; like traditional locking, which is error-prone (lots of programmers find it "hard" and/or screw it up) and adds extra overhead (e.g. mutexes with implied task switches when under lock contention).
Alfman already pointed this out, and it is solved by introducing an asynchronous operating mode in which pending threads are queued and run one after the other. Sorry for not mentioning it in the post where I tried to describe my model; by the time I noticed the omission, it was too late to edit.
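A minimal user-space sketch of that queued mode, under the assumption that each incoming call is appended to a per-process queue and a single worker runs the calls in arrival order. `SerializedService`, `call_async` and `wait_idle` are hypothetical names for illustration, not part of any existing design.

```python
import queue
import threading


class SerializedService:
    """Runs incoming calls one at a time, in arrival order."""

    def __init__(self):
        self._pending = queue.Queue()
        # Single worker: handlers never run concurrently with each other.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            fn, args = self._pending.get()
            fn(*args)                  # handlers are assumed not to raise
            self._pending.task_done()

    def call_async(self, fn, *args):
        # Returns immediately; the call itself runs later on the worker.
        self._pending.put((fn, args))

    def wait_idle(self):
        # Block until every call queued so far has finished.
        self._pending.join()


def run_demo(senders=4, calls_per_sender=1000):
    state = {"count": 0}

    def increment():
        # Unlocked read-modify-write: only safe because calls are serialised.
        state["count"] = state["count"] + 1

    svc = SerializedService()

    def send():
        for _ in range(calls_per_sender):
            svc.call_async(increment)

    threads = [threading.Thread(target=send) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    svc.wait_idle()
    return state["count"]
```

Several senders submit calls concurrently, yet the handler needs no mutex because the queue is the only way in. The trade-off is that one slow handler delays everything queued behind it.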
I also wouldn't underestimate the effect that IPC overhead will have on the system as a whole (especially for "micro-kernel-like" kernels).
I know, I know, but then we reach one of those chicken-and-egg problems which are always torturing me: how do I know that my IPC design is "light enough" without doing measurements on a working system for real-world use cases? And how do I perform these measurements on something which I'm currently designing and which is not implemented yet?
For example, if IRQs are delivered to device drivers via IPC, then on a server under load (e.g. with high-speed Ethernet) you can expect thousands of IRQs per second (and expect to be creating and destroying thousands of threads per second). Once you add normal processes communicating with each other, this could easily go up to "millions per second" under load. If IPC costs twice as much as it does on other OSs, then the resulting system as a whole can be 50% slower than comparable systems (e.g. other micro-kernels) because of the IPC alone.
The first objection that comes to mind is that this OS is not designed to run on servers, but rather on desktops and smaller single-user computers.
Maybe desktop use cases also involve enduring thousands of IRQs per second, but I was under the impression that desktop computers are ridiculously powerful compared to what one asks of their OSs, and that their responsiveness issues come rather from things like poor task scheduling ("running the divx encoding process more often than the window manager") or excessive dependency on disk I/O.
In general, any form of IPC can be implemented on top of any other form of IPC. In practice it's not quite that simple because you can't easily emulate the intended interaction with scheduling (blocking/unblocking, etc) in all cases; and even in cases where you can there's typically some extra overhead involved.
Understood.
The alternative would be if the kernel has inbuilt support for multiple different forms of IPC; which can lead to a "Tower of Babel" situation where it's awkward for different processes (using different types of IPC) to communicate with each other.
Actually, I tend to lean towards this solution, even though I know of the Babel risk and have regularly thought about it, because each IPC mechanism has its strengths and weaknesses. As an example, piping and messaging systems are better when processing a stream of data, while remote calls are better suited to giving a process some tasks to do.
You're right that I need to keep the number of available IPC primitives very small regardless of the benefits of each, though, so there's a compromise there and I have to investigate the usefulness of each IPC primitive.
Edited 2011-05-31 07:28 UTC





I got the impression that your "non-blocking call" is a pair of normal/blocking calls, where (e.g.) the address of the second call is passed as an argument to the first call (a callback). I also got the impression you're intending to optimise the implementation, so that blocking calls that return no data don't actually block (but that's an implementation detail rather than something that affects the conceptual model).
What I want to do is...
1/ Process A gives work to process B through a "fast" system call, which in turn calls a function of B in a new thread, using a stack of parameters given by A.
2/ Process A forgets about it and goes on to do something else.
3/ When process B is done, it sends a callback to process A through the same mechanism that A used to give B the work (running a function of A). Callbacks may have parameters, the "results" of the operation.
Does it remind you of something?
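The three steps above can be sketched in user space as follows. This is only an illustration under loud assumptions: Python threads stand in for kernel-spawned threads, and `Process`, `remote_call`, `exports` and `join_all` are hypothetical names, not an existing API.

```python
import threading


class Process:
    """Stand-in for a process that exports functions other processes may call."""

    def __init__(self, name):
        self.name = name
        self.exports = {}   # function name -> callable
        self.threads = []   # threads spawned here to serve incoming calls

    def join_all(self):
        # Demo helper: wait for every handler thread spawned so far.
        for t in list(self.threads):
            t.join()


def remote_call(target, fn_name, *params):
    """Hypothetical 'fast system call': run target's function in a new
    thread with the caller's parameters, and return without waiting."""
    t = threading.Thread(target=target.exports[fn_name], args=params)
    target.threads.append(t)
    t.start()


def demo():
    a, b = Process("A"), Process("B")
    results = []

    # Function of A that B calls back with the "results" of the operation.
    a.exports["on_done"] = results.append

    def work(x, y):
        # Step 3: B reports back through the same mechanism.
        remote_call(a, "on_done", x + y)

    b.exports["work"] = work

    remote_call(b, "work", 2, 3)   # step 1: A hands work to B
    # Step 2: A is free to do something else here.
    b.join_all()                   # demo only: let B finish
    a.join_all()                   # demo only: let the callback run
    return results
```

Note that every call, including the callback, spawns a fresh thread, so the cost of the scheme is tied directly to the cost of thread creation.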
For me, send_message() and get_message() were like pipe operations (you send messages to or receive messages from the pipe). Sorry if I didn't get it.
Then what I do is definitely not RPC in the usual sense, as it is an asynchronous mechanism too. If the above description reminds you of some better name, please let me know.
If you have something like a pipe or message queue, you can implement higher-level IPC protocols on top of it, and use user-space libraries to implement a new IPC mechanism that uses these protocols. That's what I was talking about. But except when trying to make the kernel unusually tiny, I'm not sure it's a good idea either.
Totally agree.
Edited 2011-05-30 11:57 UTC