Linked by nfeske on Thu 26th May 2011 11:41 UTC
The concern for efficient and easy-to-use inter-process communication is prevalent among microkernel-based operating systems. Genode has always taken an unorthodox stance on this subject by disregarding the time-tested standard solution of using an IDL compiler in favour of sticking to raw C++ mechanisms. The new version 11.05 of the OS framework takes another leap by introducing a brand new API for implementing procedure calls across process boundaries, facilitating type safety and ease of use, yet still not relying on external tools. Furthermore, the platform support for the Fiasco.OC kernel has been extended to the complete feature set of the framework. The most significant new features are L4Linux (on Fiasco.OC), an experimental integration of GDB, ARM RealView PBX device drivers, and device I/O support for the MicroBlaze platform.
Thread beginning with comment 474799
Best approach is shared memory buffers.
by axilmar on Fri 27th May 2011 10:19 UTC
axilmar Member since:
2006-03-20

I don't see why messages have to go through the kernel. For me, the best approach for interprocess communication on the same machine is to have two processes share memory, and then when a process A wants to send a message to another process B, process A simply allocates a buffer from the shared memory, and then informs process B about the message via a semaphore. Then process B reads the message, copies it into private memory, and then checks it.

In this way, there is no need for context switching; the kernel need not be invoked at all.
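
A minimal sketch of that scheme, assuming POSIX shared memory and a process-shared semaphore (the segment name /ipc_demo, the fixed Channel layout, and the fork()-based setup are illustrative only; the kernel is still needed once to establish the mapping, and the semaphore enters the kernel whenever it actually has to block):

#include <fcntl.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

struct Channel {
    sem_t ready;        /* posted by the sender, waited on by the receiver */
    char  buffer[256];  /* message payload, "allocated" from the shared region */
};

int main()
{
    /* establish the shared segment - the one place the kernel must be involved */
    int fd = shm_open("/ipc_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(Channel));
    Channel *ch = static_cast<Channel *>(
        mmap(nullptr, sizeof(Channel), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    sem_init(&ch->ready, /* pshared */ 1, 0);

    if (fork() == 0) {                        /* process B: the receiver */
        sem_wait(&ch->ready);                 /* blocks until a message is posted */
        char priv[256];
        std::memcpy(priv, ch->buffer, sizeof(priv));  /* copy into private memory */
        std::printf("B received: %s\n", priv);
        return 0;
    }

    /* process A: the sender */
    std::strcpy(ch->buffer, "hello from A");  /* fill a buffer in shared memory */
    sem_post(&ch->ready);                     /* an uncontended post stays in user space */
    shm_unlink("/ipc_demo");
    return 0;
}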

Reply Score: 2

Morin Member since:
2005-12-31

"I don't see why messages have to go through the kernel. For me, the best approach for interprocess communication on the same machine is to have two processes share memory, and then when a process A wants to send a message to another process B, process A simply allocates a buffer from the shared memory, and then informs process B about the message via a semaphore. Then process B reads the message, copies it into private memory, and then checks it.

In this way, there is no need for context switching; the kernel need not be invoked at all."


(1) A single memory shared by everything is a bottleneck in multiprocessor systems. Caches don't solve this problem, they only hide it behind the cache coherency protocol.

(2) "Going through the kernel" is only slow if you make it slow.

Reply Parent Score: 2

Megol Member since:
2011-04-11

"I don't see why messages have go through the kernel. For me, the best approach for interprocess communication on the same machine is to have two processes share memory, and then when a process A wants to send a message to another process B, then process A simply allocates a buffer from the shared memory, and then informs process B about the message via a semaphore. Then process B reads the message, copies it into a private memory, and then checks it.

In this way, there is no need for context swapping; the kernel need not be invoked at all.


(1) A single memory shared by everything is a bottleneck in multiprocessor systems. Caches don't solve this problem, they only hide it behind the cache coherency protocol.
"

Sharing always has bottlenecks (fundamentally, due to the speed of light). Sharing memory with caching-aware semantics is the fastest communication a standard processor can offer; even pure message passing, like the basic QNX primitives, still uses the same shared-memory mechanism underneath.


(2) "Going through the kernel" is only slow if you make it slow.


Like on x86? Depending on the processor and the kernel/user design, a pure enter/exit of kernel mode can take thousands of clock cycles (including stalls due to cache/TLB evictions). Add to that the overhead of the operation itself. (I am aware that pure null operations are considerably faster; real code, however, has real overheads.)

This means that user-level communication over shared memory can in many cases use spinlocks with lower overhead than any kernel primitive. Spinning with a fallback to kernel synchronization is very effective.
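
For example, a hybrid lock along those lines might look like the following rough sketch (C++11 atomics; the slow path merely yields here, whereas a real implementation would block in the kernel, e.g. via a futex-style wait object):

#include <atomic>
#include <thread>

class HybridLock {
    std::atomic_flag _locked = ATOMIC_FLAG_INIT;

public:
    void lock()
    {
        /* fast path: spin a bounded number of times entirely in user space */
        for (int i = 0; i < 1000; ++i)
            if (!_locked.test_and_set(std::memory_order_acquire))
                return;

        /* slow path: give up the CPU - a real lock would block in the kernel here */
        while (_locked.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }

    void unlock() { _locked.clear(std::memory_order_release); }
};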

Reply Parent Score: 1

axilmar Member since:
2006-03-20

"A single memory shared by everything is a bottleneck in multiprocessor systems. Caches don't solve this problem, they only hide it behind the cache coherency protocol."


You can always have a separate shared-memory region per process pair (sender and receiver).

"Going through the kernel" is only slow if you make it slow.


In modern 80x86 CPUs, it's very slow. It is also always slower than not going through the kernel at all.

Reply Parent Score: 2

krishna Member since:
2008-08-11

"... then when a process A wants to send a message to another process B, process A simply allocates a buffer from the shared memory, and then informs process B about the message via a semaphore. Then process B reads the message, copies it into private memory, and then checks it. In this way, there is no need for context switching; the kernel need not be invoked at all."


Isn't the semaphore also a kernel-provided mechanism, which forces both processes to synchronize via kernel entries? Also, shared-memory communication does not come for free: one has to establish the shared memory with each communication partner, allocate buffers in the shared memory, maybe allocate control packets, acknowledge completed messages for buffer reuse, and maybe unblock the sender that wants to marshal the next message. From our experience with Genode, this pays off for bulk-data transfer but not for most RPCs with just a few register words of payload.

Modern microkernels like Fiasco.OC support fast inter-process communication via a user-level accessible part of the thread control block - the UTCB - with a size of about 64 register words. Processes marshal their message payload into the UTCB, the kernel copies it from the sender's UTCB to the receiver's, and, finally, the receiver demarshals the data out of the UTCB as needed. Performance-wise, this should match the approach you described: the time for the copy operation is bounded by the UTCB size, and there is no shared-memory establishment overhead, as only the kernel accesses both UTCBs.
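
As a rough illustration of the data flow (deliberately simplified, not the actual Fiasco.OC interface - the Utcb struct and the values below are made up for this example):

#include <cstdint>
#include <cstdio>
#include <cstring>

constexpr int UTCB_WORDS = 64;                    /* "about 64 register words" */

struct Utcb { std::uintptr_t mr[UTCB_WORDS]; };   /* per-thread message registers */

int main()
{
    Utcb sender = {}, receiver = {};

    /* sender marshals an RPC: an opcode plus two arguments */
    sender.mr[0] = 42;       /* opcode */
    sender.mr[1] = 7;        /* arg 0  */
    sender.mr[2] = 1000;     /* arg 1  */

    /* what the kernel does during the IPC: a copy bounded by the UTCB size,
       with no shared-memory setup between the two processes */
    std::memcpy(receiver.mr, sender.mr, 3 * sizeof(std::uintptr_t));

    /* receiver demarshals the payload as needed */
    std::printf("opcode=%lu args=%lu,%lu\n",
                (unsigned long)receiver.mr[0],
                (unsigned long)receiver.mr[1],
                (unsigned long)receiver.mr[2]);
    return 0;
}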

Reply Parent Score: 2

axilmar Member since:
2006-03-20

"Isn't the semaphore also a kernel-provided mechanism, which forces both processes to synchronize via kernel entries?"


It doesn't have to be a kernel object.

"Also, shared-memory communication does not come for free: one has to establish the shared memory with each communication partner"


Not a problem. The virtual memory subsystem can take care of that.

"allocate buffers in the shared memory"


Can be done via mutexes (not kernel objects) in shared memory.

"acknowledge completed messages for buffer reuse"


The buffers are simply freed from the shared memory.

"maybe unblock the sender that wants to marshal the next message"


Atomically increment the semaphore. If the other CPU spins on the semaphore (i.e., it is used as a spinlock), then the other process will be unblocked.
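
Sketched with C++11 atomics, such a user-level semaphore could look like this (assuming the counter sits in memory mapped into both processes and that std::atomic<int> is lock-free there; the yield stands in for the kernel fallback a real implementation would use once spinning becomes wasteful):

#include <atomic>
#include <thread>

struct SharedSemaphore {
    std::atomic<int> count{0};    /* lives in the shared-memory region */

    /* sender side: one atomic increment, no kernel entry */
    void post() { count.fetch_add(1, std::memory_order_release); }

    /* receiver side: spin until a unit becomes available */
    void wait()
    {
        for (;;) {
            int c = count.load(std::memory_order_acquire);
            if (c > 0 &&
                count.compare_exchange_weak(c, c - 1, std::memory_order_acquire))
                return;
            std::this_thread::yield();  /* real code: bounded spin, then block in the kernel */
        }
    }
};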

"Modern microkernels like Fiasco.OC support fast inter-process communication via a user-level accessible part of the thread control block - the UTCB - with a size of about 64 register words. Processes marshal their message payload into the UTCB, the kernel copies it from the sender's UTCB to the receiver's, and, finally, the receiver demarshals the data out of the UTCB as needed. Performance-wise, this should match the approach you described: the time for the copy operation is bounded by the UTCB size, and there is no shared-memory establishment overhead, as only the kernel accesses both UTCBs."


The 80x86 CPU doesn't have 64 registers available to user code. Furthermore, you still do two kernel switches. I think the shared-memory approach is faster, at least on 80x86.

Reply Parent Score: 2

pepper Member since:
2007-09-18

"I don't see why messages have to go through the kernel. For me, the best approach for interprocess communication on the same machine is to have two processes share memory"


This is of course done whenever two tasks have to communicate frequently.

But first you need to negotiate this shared memory and enforce access control between the tasks, so IPC via the kernel is always the first step. And if you don't expect to communicate often, establishing shared memory is pure overhead.

IPC performance is indeed a major criterion for microkernel systems, but modern kernels can do this quite fast.

Reply Parent Score: 1