Linked by Norman Feske on Thu 15th Aug 2013 22:47 UTC
The just-released version 13.08 of Genode marks the 5th anniversary of the framework for building component-based OSes with an emphasis on microkernel-based systems. The new version makes Qt version 5.1 available on all supported kernels, adds tracing capabilities, and vastly improves multi-processor support.
RE: Comment by jayrulez
by nfeske on Fri 16th Aug 2013 08:03 UTC in reply to "Comment by jayrulez"

Thank you for the feedback!

Your question can be answered in three ways: The feature set, the design, and the implementation.

When comparing both kernels feature-wise (I am solely referring to x86), NOVA is more advanced because it has IOMMU support, more complete support for virtualization, and a deterministic way of passing processing time between threads that communicate synchronously.

The design of NOVA is more modern because it was designed from scratch in 2006 (I think), whereas Fiasco.OC is the successor of the L4/Fiasco kernel, which predates NOVA by almost a decade. Of course, Fiasco.OC's kernel API has been largely modernized (i.e., use of capabilities), but it still sticks to certain aspects of L4/Fiasco's original design that were discarded by NOVA. For example, Fiasco.OC uses one kernel thread per user thread, whereas NOVA uses a single-stack model for the kernel. This relieves the kernel from holding large state (a stack per user thread) and makes the kernel easier to reason about. Another example is that Fiasco.OC still uses identity mappings of physical memory for roottask, whereas NOVA starts roottask with an empty address space that can be populated in arbitrary ways. A third example is scheduling. Whereas Fiasco.OC simply schedules threads, NOVA decouples threads from CPU scheduling contexts, which allows for deterministic flows of execution and the inheritance of priorities across IPC boundaries.
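
To illustrate the single-stack idea at user level, here is a simplified sketch of my own in C++ (not actual kernel code; the 'Event_loop' and its queue are invented for illustration). Instead of parking a full stack per blocked operation, each pending operation is stored as a small continuation object and resumed from one loop:

    // Simplified user-level illustration of the two models, not actual
    // kernel code. A "request" that must wait is either parked on its own
    // thread (stack-per-request) or stored as a small continuation and
    // resumed later from a single loop (single-stack model).

    #include <functional>
    #include <queue>
    #include <iostream>

    struct Event_loop
    {
        std::queue<std::function<void()>> pending;   // parked continuations

        // Instead of blocking a dedicated stack, record what to do
        // when the awaited event arrives.
        void block_on_event(std::function<void()> continuation) {
            pending.push(std::move(continuation));
        }

        // One stack services all parked operations in turn.
        void run() {
            while (!pending.empty()) {
                auto cont = std::move(pending.front());
                pending.pop();
                cont();   // resume the operation where it left off
            }
        }
    };

    int main()
    {
        Event_loop loop;

        // Three "requests" wait for an event; the blocked state of each
        // is one small object, not a full stack.
        for (int i = 0; i < 3; i++)
            loop.block_on_event([i] {
                std::cout << "request " << i << " completed\n"; });

        loop.run();
    }

The kernel-level reality is more involved, of course, but the state a blocked operation occupies shrinks from a whole stack to one small record.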

Comparing the implementations is a bit subjective, though. Personally, having worked with both kernels, I highly appreciate NOVA's concise code. I see it as a real masterpiece of software engineering. But you have to judge this aspect for yourself.


RE[2]: Comment by jayrulez
by jayrulez on Fri 16th Aug 2013 15:17 in reply to "RE: Comment by jayrulez"

I also appreciate the concise code of NOVA. Comparing the two codebases, Fiasco.OC+L4Re seems a bit convoluted (which I believe is a result of many people working on the code, lots of experimentation, and design changes over the years).

Do you think the NOVA API would fit well on the ARM architecture?

I see you already have the base-hw platform for ARM. I don't think doing it at this point would be feasible, but if the situation were different now, do you think it would make sense to port NOVA to ARM for your ARM support rather than creating base-hw?


RE[3]: Comment by jayrulez
by nfeske on Sat 17th Aug 2013 10:29 in reply to "RE[2]: Comment by jayrulez"

Conceptually, I see nothing that would speak against using NOVA's design on ARM. That said, the NOVA developers have little incentive to bring the kernel to ARM. NOVA is developed at Intel, after all.

As of now, our base-hw platform serves us primarily as an experimentation ground. For example, we use it to explore ARM TrustZone, or for enabling new ARM platforms quickly. At the current stage, it is not as complete as Fiasco.OC or NOVA: it does not support MP, nor does it provide the protection needed for capability-based security (yet).


RE[2]: Comment by jayrulez
by Alfman on Fri 16th Aug 2013 16:15 in reply to "RE: Comment by jayrulez"

nfeske,

"For example, Fiasco.OC uses one kernel thread per user thread wheras NOVA uses a single-stack model for the kernel. This relieves the kernel from holding large state (a stack per user thread) and makes the kernel more easy to reason about."

I'm a big fan of this asynchronous state-machine kind of approach versus having a thread per request! Does this asynchronous interface carry over into the userspace API?


This brings up memories for me because I wrote an asynchronous library for Linux; it's a subject I'm fairly passionate about! I'm disappointed with the state of async in Linux. It doesn't support async file IO at all: it ignores O_NONBLOCK and causes processes to block nevertheless. This is bad because it makes most network daemons become serialized around the file system, continually blocking on disk platter rotations and on network file systems for data that isn't cached.
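
The behaviour is easy to reproduce (a minimal sketch of my own; /etc/hostname stands in for any regular file): open(2) happily accepts O_NONBLOCK on a regular file, but read(2) may still sleep on disk I/O instead of returning EAGAIN.

    // Minimal demonstration that O_NONBLOCK has no effect on regular-file
    // reads under Linux: the flag is accepted, yet read() may still sleep
    // on disk I/O instead of returning -1 with errno set to EAGAIN.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        int fd = open("/etc/hostname", O_RDONLY | O_NONBLOCK);
        if (fd < 0) { perror("open"); return 1; }

        char buf[128];
        ssize_t n = read(fd, buf, sizeof(buf));  /* may block on the disk,
                                                    O_NONBLOCK notwithstanding */
        printf("read returned %zd -- never -1/EAGAIN for a regular file\n", n);
        close(fd);
        return 0;
    }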

True async libraries, such as mine and the POSIX one, are forced to spawn userspace threads serving no purpose other than to avoid kernel blocking. My library uses async Linux socket IO since it abstracts network connections differently from files, but the POSIX async library treats all descriptors generically and thus uses blocking threads for all IO, including network sockets, negating the original purpose of async code: eliminating the need for blocked threads. Another technical problem with threads is how notoriously difficult they are to cancel. But I'm quite off topic by now; getting back to Genode...
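
For reference, here is roughly what using the POSIX AIO interface looks like (a minimal sketch; the common glibc implementation services such requests with a pool of helper threads behind the scenes, which is exactly the behaviour I'm criticizing):

    // Minimal POSIX AIO sketch (compile with -lrt on older glibc). The
    // API is asynchronous, but the common glibc implementation fulfils
    // each request by blocking a helper thread -- the behaviour
    // criticized above.

    #include <aio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cerrno>
    #include <cstring>
    #include <cstdio>

    int main()
    {
        int fd = open("/etc/hostname", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char buf[128];
        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);

        if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

        /* poll for completion; real code would use aio_suspend() or a
           signal instead of spinning */
        while (aio_error(&cb) == EINPROGRESS)
            usleep(1000);

        printf("aio_read returned %zd bytes\n", aio_return(&cb));
        close(fd);
        return 0;
    }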

I really appreciate the analysis of SMP support in your release notes. It's a very interesting technical problem in its own right. I've found a lot of programmers are not really aware of how expensive SMP synchronization primitives can be. IO-bound code rarely benefits from SMP. Highly multithreaded designs often have hidden bottlenecks due to the implicit serialization required by CPUs maintaining cache coherency, which is the basis for synchronization primitives.
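
The coherency cost is easy to demonstrate (a minimal sketch of my own; compile with -pthread): two threads incrementing counters that happen to share a cache line slow each other down dramatically, even though no lock is involved.

    // Minimal false-sharing demonstration: two threads increment private
    // counters, but because both counters live in the same cache line,
    // the cores continuously steal the line from each other.

    #include <thread>
    #include <cstdio>

    struct Counters {
        volatile long a = 0;
        /* char pad[64]; */   // uncomment to give 'b' its own cache line
        volatile long b = 0;
    };

    int main()
    {
        Counters c;
        const long N = 100000000L;

        std::thread t1([&] { for (long i = 0; i < N; i++) c.a = c.a + 1; });
        std::thread t2([&] { for (long i = 0; i < N; i++) c.b = c.b + 1; });
        t1.join();
        t2.join();

        printf("a=%ld b=%ld\n", (long)c.a, (long)c.b);
        return 0;
    }

Giving each counter its own cache line (uncomment the padding) makes the threads truly independent again.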

SMP is great for running unrelated workloads on different cores because they really do run in parallel. The question is how a scheduler should distribute the threads. Those with high IPC should be on the same core. If a CPU is at 100%, one should probably try to identify the threads with the least IPC and move them to other cores. But then you have to have accounting for such things, which, as you've mentioned, is much more complicated than a static approach. A dynamic processor-affinity solution without IPC accounting (whether it's implemented in the kernel or in userspace) basically has to resort to guessing which threads should run on which core; it can never be optimal without accounting. But then how much complexity and overhead can be justified to optimize processor affinity?
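
For what it's worth, pinning a chatty pair of threads onto one core is straightforward on Linux (a minimal sketch using the GNU-specific pthread_setaffinity_np; whether it helps depends entirely on the workload, as argued above):

    // Minimal sketch: pin two threads that communicate heavily onto the
    // same CPU so their shared state stays warm in one core's cache.
    // Uses the GNU-specific pthread_setaffinity_np; compile with -pthread.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <cstdio>

    static void *worker(void *)
    {
        /* ... exchange messages with the peer thread ... */
        return nullptr;
    }

    static void pin_to_cpu(pthread_t t, int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (pthread_setaffinity_np(t, sizeof(set), &set) != 0)
            fprintf(stderr, "failed to pin thread to CPU %d\n", cpu);
    }

    int main()
    {
        pthread_t a, b;
        pthread_create(&a, nullptr, worker, nullptr);
        pthread_create(&b, nullptr, worker, nullptr);

        pin_to_cpu(a, 0);   /* high-IPC pair -> same core      */
        pin_to_cpu(b, 0);   /* unrelated work would go elsewhere */

        pthread_join(a, nullptr);
        pthread_join(b, nullptr);
        return 0;
    }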

Anyways, fascinating stuff! I really enjoy technical discussions.


RE[3]: Comment by jayrulez
by nfeske on Sat 17th Aug 2013 11:57 in reply to "RE[2]: Comment by jayrulez"

For the reasons you stated, Genode's interfaces for I/O (the session interfaces for networking, block access, and file-system access) are designed to work asynchronously. This way, it is possible to issue multiple file operations at once and receive a signal on the completion of requests.
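
In rough strokes, the pattern looks like this (a hypothetical sketch of the idea only, not the literal Genode session API): requests are queued without blocking, and a completion signal wakes the client.

    // Hypothetical sketch of the asynchronous-session pattern described
    // above -- not the literal Genode API. Several requests are queued
    // at once; a completion signal wakes the client, which then collects
    // whatever finished.

    #include <cstdio>

    struct Request { long block_number; bool done; };

    struct Block_session_sketch
    {
        Request queue[8];
        int     count = 0;

        /* non-blocking submission */
        void submit(long block_number) {
            queue[count++] = Request { block_number, false };
        }

        /* stand-in for a real signal-delivery mechanism: pretend one
           outstanding request completed */
        bool next_completion(long &block_number) {
            for (int i = 0; i < count; i++)
                if (!queue[i].done) {
                    queue[i].done = true;
                    block_number  = queue[i].block_number;
                    return true;
                }
            return false;
        }
    };

    int main()
    {
        Block_session_sketch session;

        /* issue multiple operations up front instead of one blocking
           call per request ... */
        session.submit(42);
        session.submit(43);
        session.submit(44);

        /* ... then react to completion signals as they arrive */
        long nr;
        while (session.next_completion(nr))
            printf("block %ld completed\n", nr);
        return 0;
    }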

Also, in general, Genode tries to move away completely from blocking RPC calls. For example, the original timer-session interface had a blocking 'sleep' call. On the server side (in the timer driver), this required either complicated out-of-order RPC request dispatching or the use of one thread per client. By turning it into an asynchronous interface some months ago, we could greatly simplify the timer and reduce resource usage (by avoiding threads). Another example is the NIC bridge component, which we reworked for the current release. Here, the change to modelling the component as a mere state machine improved the performance measurably.
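
The timer rework amounts to roughly this interface change (an illustrative sketch, not the actual Genode code): instead of a 'sleep' RPC that occupies a server-side thread for its whole duration, the client registers a timeout and continues; a signal fires on expiration, all handled from the driver's single main loop.

    // Illustrative sketch (not the actual Genode code) of replacing a
    // blocking RPC with an asynchronous one. The old style parks a
    // server thread per sleeping client; the new style merely records a
    // deadline and delivers a signal when it passes.

    #include <functional>
    #include <vector>
    #include <cstdio>

    struct Timer_driver_sketch
    {
        struct Timeout { unsigned deadline; std::function<void()> sigh; };
        std::vector<Timeout> timeouts;   // one small record per client,
                                         // no thread per client

        /* old, blocking style (one server thread stuck per caller):
             void sleep(unsigned ms);     -- returns after ms           */

        /* new, asynchronous style: returns immediately */
        void trigger_once(unsigned deadline, std::function<void()> sigh) {
            timeouts.push_back(Timeout { deadline, std::move(sigh) });
        }

        /* called from the driver's single main loop on each tick */
        void handle_tick(unsigned now) {
            for (auto &t : timeouts)
                if (t.sigh && now >= t.deadline) {
                    t.sigh();          // deliver the completion signal
                    t.sigh = nullptr;
                }
        }
    };

    int main()
    {
        Timer_driver_sketch timer;
        timer.trigger_once(10, [] { printf("client A woke up\n"); });
        timer.trigger_once(20, [] { printf("client B woke up\n"); });

        for (unsigned now = 0; now <= 20; now++)
            timer.handle_tick(now);
    }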

There still exist a few blocking interfaces from the early days, but those will eventually be changed to operate asynchronously too.

However, even though the interfaces are well prepared for a fully asynchronous software stack, not all server implementations operate this way yet. For example, the part_blk partition manager dispatches one measly block request at a time. This needs to be fixed.

Your comment about SMP and I/O-bound workloads is spot-on!

Managing the affinities dynamically at runtime is certainly an interesting project in its own right. In the current release, we have laid the groundwork to pursue such ideas. What's missing are good measurement instruments for the threads' behaviour. It would be useful to gather statistics per thread about how much CPU time was actually consumed, how many attempts had been made to perform (costly) IPC to a remote CPU, or how often lock contention took place, maybe even somehow capturing the access profile of local vs. remote memory. This information could then be fed into an optimization algorithm that tries to minimize a cost model. These are tempting topics for further research.
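
Such a cost model could start as simply as a weighted sum over the gathered statistics (an illustrative sketch; the weights and the greedy rule are made up, only the shape of the optimization matters):

    // Illustrative sketch of feeding per-thread statistics into a cost
    // model; the weights are made up, only the shape of the optimization
    // matters. When a CPU is saturated, the thread that is cheapest to
    // move away (least cross-CPU communication) is the migration candidate.

    #include <cstdio>

    struct Thread_stats {
        long cpu_time;        /* CPU time actually consumed       */
        long remote_ipc;      /* IPCs that crossed a CPU boundary */
        long lock_contention; /* times a contended lock was hit   */
    };

    /* cost of moving the thread to another CPU (arbitrary weights) */
    static long migration_cost(Thread_stats const &s)
    {
        return 50 * s.remote_ipc + 20 * s.lock_contention;
    }

    int main()
    {
        Thread_stats threads[3] = {
            { 1000, 40, 5 },   /* chatty: many remote IPCs      */
            { 8000,  2, 0 },   /* CPU-bound, mostly independent */
            {  500, 30, 9 },   /* chatty and contended          */
        };

        int candidate = 0;
        for (int i = 1; i < 3; i++)
            if (migration_cost(threads[i]) < migration_cost(threads[candidate]))
                candidate = i;

        printf("migrate thread %d (cost %ld)\n",
               candidate, migration_cost(threads[candidate]));
        return 0;
    }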
