I wrote the following article for university. It tries to explain the difference between three kernel types in such a way that less computer-savvy people can understand it. I had a 1500-word limit, so detailed elaborations were out of the question. “In this article, I will try to make the ‘microkernel vs. monolithic kernel’ debate more accessible and understandable for laymen. I will explain the purpose of a kernel, after which I will detail the differences between the two competing designs. Finally, I will introduce the hybrid design, which aims to combine the two. In the conclusion, I will argue that this hybrid design is the most common kernel type in the world of personal computers today.” Because of these limitations, this article contains little news for most of you. Still, I thought I’d share.
In your article you point out that in a microkernel the various pieces live in user mode; that is not necessarily true.
A microkernel architecture just means that various system components, like the networking stack and the file system stack, live in separate address spaces so that corruption in one doesn’t take down the whole system.
Yes, putting them in user space is one way, but it is not the only way.
Secondly, the performance overhead: say you want to send a file over the network. To do this, you have to load the file from disk into one address space, copy it to the other address space (or create some cross-address-space mapping), and only then can the networking stack access it and send it out on the network.
In a monolithic system, all kernel mode components have access to the same address space so as soon as the file system loads the file in memory, the network stack can send it on the wire.
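To make that concrete, here is a rough sketch in C; every function in it (fs_server_read, ipc_send, net_transmit) is a made-up placeholder, not any real kernel’s API:

#include <stddef.h>

/* Hypothetical placeholders, not any real kernel's API. */
#define NET_SERVER 2
size_t fs_server_read(const char *path, void *buf, size_t len);
void   ipc_send(int server, int sock, const void *buf, size_t len);
void   net_transmit(int sock, const void *data, size_t len);

/* Microkernel-style: the file system server and the network server live
 * in separate address spaces, so the data must be copied (or remapped)
 * between them before it can go out on the wire. */
void send_file_microkernel(const char *path, int sock)
{
    char buf[8 * 1024];
    size_t n = fs_server_read(path, buf, sizeof buf); /* copy 1: disk -> fs server  */
    ipc_send(NET_SERVER, sock, buf, n);               /* copy 2: fs server -> net server */
    /* the network server then transmits its own copy of the data */
}

/* Monolithic-style: file system and network stack share one kernel
 * address space, so the network code can transmit straight from the
 * buffer the file system just filled. */
void send_file_monolithic(void *cached, size_t n, int sock)
{
    net_transmit(sock, cached, n);                    /* no intermediate copy */
}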
Are you talking about a hybrid kernel?
No. I am not talking about hybrid. I am talking about microkernel.
Anyway, the idea of a hybrid kernel seems silly to me. Any well-developed OS should have well-separated modules. A hybrid kernel is basically a monolithic kernel with well-designed software components :)
NT is a monolithic kernel because all the drivers in NT share the same address space. So if the network stack corrupts something, you immediately see the blue screen.
Yes, NT has some user mode drivers, but they are a hack similar to FUSE, i.e. a kernel mode driver acting as a proxy for a user mode component.
The slowness is generally caused by context switches (from kernel mode to usermode). The only microkernel I know of which doesn’t suffer from these performance problems is QNX.
I believe it is designed like you say (both in kernel space yet a different address space). I could be wrong though.
In your article you point out that in a microkernel the various pieces live in user mode; that is not necessarily true.
Well, let’s be clear: code (and data) lives in an address space and runs in a CPU mode. Userspace refers to memory segments whose code and data are accessible when the CPU is running in usermode. Kernelspace refers to memory segments which are only accessible in kernelmode. The firmware reserves some memory for storing the hardware page tables, which the CPU (or more specifically the MMU) uses to enforce the kernel protection domain. The kernel is responsible for enforcing protection domains for each process.
On almost any system, kernelspace segments are permanently mapped into the address space of every process. There is no reason to make a distinction of a “user process.” All processes run in both usermode and kernelmode and contain both userspace and kernelspace in their address space. Interrupts only run in kernelmode, but they don’t have an associated process. Whenever the CPU is executing in kernelmode, it is either running the kernel on behalf of a process or running an interrupt handler. When running in kernelmode, a process has access to its entire address space, including kernelspace, but may only execute instructions in kernelspace (can’t branch to userspace other than through the specified entry points). The kernel usually restricts itself from accessing userspace unless certain precautions are taken, but fundamentally these are voluntary restrictions to ensure the consistency of the system.
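To give an idea of what those precautions look like in practice: a Linux-style kernel never dereferences a usermode pointer directly, it goes through helpers such as copy_from_user(), which validate the address range and survive a fault. A minimal sketch (the syscall itself is invented for illustration):

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>   /* copy_from_user() */

/* Hypothetical syscall body: data handed in from usermode is copied
 * into a kernelspace buffer before the kernel touches it. */
long do_example_call(const char __user *ubuf, size_t len)
{
    char kbuf[128];

    if (len > sizeof(kbuf))
        return -EINVAL;
    /* Validates the user address range and handles the page fault path
     * instead of letting a bad pointer take down the kernel. */
    if (copy_from_user(kbuf, ubuf, len))
        return -EFAULT;

    /* ... work on the kernelspace copy ... */
    return 0;
}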
So we have two main concepts here: the protection domain that keeps processes from accessing kernelspace while running in usermode and the kernel which isolates the userspace of each process (excluding shared segments). The important thing to note is that the only way to isolate components in separate address spaces is to have them run in usermode. When running in kernelmode, the idea of address space as a form of protection is invalid. There is no memory you can’t access in kernelmode (other than the space reserved by the firmware and assuming we are not in a virtualized environment).
In a microkernel design, all major system services except for IPC, VMM, and interrupt handlers run in usermode, each in their own address space, just like any other untrusted process of the system. It should be noted that VMM and interrupt handlers are two of the most error-prone parts of any operating system, and they run in kernelmode on even the purest microkernel OS. Most drivers install (or modify) an interrupt handler, so third party drivers are still a threat to the consistency of the system. Since interrupt handlers are often non-preemptible and run with interrupts disabled, all sorts of nastiness can happen, from hangs to data loss to irreversible hardware damage. VMM is unquestionably the most challenging kernel subsystem, where simple changes can break everything and latent bugs can sit for years before they ever manifest themselves in a real or synthetic environment.
So when people gush about how microkernel systems run so much less code in kernelmode and have great isolation features, remember that the code that remains in the microkernel is usually the most problematic code in any well-designed operating system.
As far as I’m concerned, there’s only one fundamental purpose for implementing system services in userspace: ease of development. But it only works if you keep it simple. Userspace servers should only communicate directly with the kernel. Once you start having multiple layers of servers or inter-server communication, you’ve destroyed the ease of development and made everything harder than writing good code in a monolithic kernel environment. So FUSE is good. Userspace display drivers are good (i.e. nobody does kernelmode display drivers anymore, not even Microsoft). XNU is… OK. But MINIX, L4, QNX, and other microkernel operating systems will never make it in general-purpose computing.
The bottom line is that if the powers that be wanted us to have operating systems that have strong isolation between system components, then one of the major hardware platform vendors would have implemented segment-granular hardware MMUs many years ago. But the switch to 64-bit addressing came too late, and such technologies are only now showing up on the horizons of high-end UNIX servers.
Oh, tell me more, oh great one!
OK, j/k, but what do you mean by “nobody does kernelmode display drivers anymore”? I mean, look at nvidia, ATI… heck, even X.org is talking about kernel drivers for modesetting. Or am I mixing stuff up?
Isn’t what you compile in with the proprietary ATI or Nvidia driver the open-source DRM kernel module?
I think the DRM module is the driver’s gateway into the kernel and allows the driver to access low-level system calls and what not.
If I’m wrong, explain how a binary blob is compiled into the kernel as a kernel-space component when no driver source code is available?
I thought they used an open-source wrapper for their binary-only driver.
You’re pretty much right. There are two components in the kernel for these proprietary drivers: the DRM module and a stub driver. The purpose of the stub driver is to be a sort of translation layer between the “unified” OpenGL driver and the Linux kernel. It’s also there to satisfy the licensing requirements. From a legal standpoint, the binary OpenGL driver is just an application that uses the GPL system calls in the stub driver. There’s no linking, just function calls, and that’s perfectly legal. Sure, the only purpose of the stub interface at the moment is to service the binary blob, but theoretically a GPL graphics driver could use it, too (like Nouveau).
The DRM (that’s direct rendering manager, not to be confused with defective recorded media) is a buffering stage to increase the performance of the userspace drivers. This is used for most graphics drivers in Xorg, although not for very old drivers and “lowest common denominator” drivers like vesa. When the kernel schedules the driver, it processes as many requests as it can during its timeslice (or until there are no more requests) and buffers the resulting hardware instructions in the DRM. Then the kernel can dish these requests out to the hardware whenever the device issues an interrupt (hey, my instruction cache is getting empty, fetch me more work!).
If there were no DRM, the hardware would issue an interrupt and the kernel would say “ok, I hear ya, and I’ll get on that as soon as possible.” Then it would schedule the driver, copy one chunk’s worth of instructions from userspace, and send it to the hardware. By this time, the instruction cache might already be empty if there’s heavy graphics load, and frames would be dropped. Therefore, this is a kind of DRM that definitely makes your multimedia experience more enjoyable, unlike that other kind…
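Here’s a toy sketch of that buffering idea. The names are hypothetical and this is not the real drm.ko interface, just the general shape of a command ring filled by the driver and drained from the interrupt handler:

#include <stdint.h>

#define RING_SIZE 4096

void hw_submit(uint32_t cmd);   /* hypothetical MMIO write to the GPU */

struct cmd_ring {
    uint32_t cmds[RING_SIZE];
    unsigned head;   /* next entry the interrupt handler will submit */
    unsigned tail;   /* next free slot for the userspace driver */
};

/* Called while the userspace driver is scheduled: queue as much work as fits. */
int ring_queue(struct cmd_ring *r, uint32_t cmd)
{
    unsigned next = (r->tail + 1) % RING_SIZE;
    if (next == r->head)
        return -1;                    /* ring full, the driver has to wait */
    r->cmds[r->tail] = cmd;
    r->tail = next;
    return 0;
}

/* Called from the GPU interrupt handler: feed buffered work to the
 * hardware right away, without waiting for the driver to be scheduled. */
void ring_drain(struct cmd_ring *r)
{
    while (r->head != r->tail) {
        hw_submit(r->cmds[r->head]);
        r->head = (r->head + 1) % RING_SIZE;
    }
}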
The bottom line is that if the powers that be wanted us to have operating systems that have strong isolation between system components, then one of the major hardware platform vendors would have implemented segment-granular hardware MMUs many years ago.
That’s what the 386 was designed for. The segmentation support offered much better memory protection than paging. No one chose to use the features, mainly because people tired of dealing with the weird segment:offset addressing scheme on the 8086 were too afraid to give the 386’s improved version of it a chance. As far as I know, Watcom is the only compiler that can do the necessary 48 bit pointers.
Also, the 386 design was limited to 8192 segments (or that many global segments, plus an equal number of process local ones, I forget). If you were able to use a few more bits of the 16 bit selectors, I think it would’ve been great, but with the 8k limit, it felt like you’d have to be rather aware of when you were allocating new segments in a large program.
I don’t think the global segment limitation would have been as much of a problem as the address space limitation. A standard segmentation layout for a 32-bit address space has 16 segments of 256MB each. This is virtual memory, of course, not physical memory. Typically the kernel would only get one segment mapped into each process’ address space when running in usermode and a few more (4 is common) when running in kernelmode. If the idea was to put different drivers or subsystems in different segments, then this wasn’t nearly enough to be useful.
With 64-bit addressing, we now have room for nearly 70 billion segments (up from 16, remember). Obviously this has a tremendous impact on the feasibility of dividing kernel components by giving them their own segments. As long as we look out for malicious or buggy code, we can hand out segments to whatever kernel modules get loaded without thinking twice about running out.
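(For the record, that figure just follows from keeping the same 256 MB segment size: 2^64 / 2^28 = 2^36, which is roughly 68.7 billion segments.)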
Segments can be of any size.
Furthermore, with the LDT, each process could have around 8192 segments just for itself, that nobody else saw.
With x86 segmentation, remember, it’s 16:32 for the pointers. There are a few unusable bits in the segment part, but the entire offset part is usable. You can make every segment 4 GB if you want, as long as you set up matching page tables to make it work. Segments can be whatever size you want, with byte granularity up to 1 MB. Past that point, you get control in 4 KB steps.
Every process would need at least 2 segments – one for code and one for data. You can’t mix both in one segment descriptor. You’d probably want one more segment for read only data.
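For reference, this is roughly how a 386 segment descriptor is laid out, written as a C struct (a sketch based on the published 386 layout; the comments point out the bits relevant to this discussion):

#include <stdint.h>

/* Sketch of a 386 segment descriptor (8 bytes). The 20-bit limit split
 * across two fields is why segment sizes work the way they do: with the
 * granularity flag clear the limit counts bytes (up to 1 MB), with it set
 * the limit counts 4 KB pages (up to 4 GB). The access byte encodes code
 * vs. data, which is why the two need separate descriptors. */
struct seg_descriptor {
    uint16_t limit_low;      /* limit bits 0..15  */
    uint16_t base_low;       /* base  bits 0..15  */
    uint8_t  base_mid;       /* base  bits 16..23 */
    uint8_t  access;         /* present, DPL (ring 0-3), code/data type */
    uint8_t  limit_hi_flags; /* limit bits 16..19 + G, D/B, AVL flags   */
    uint8_t  base_high;      /* base  bits 24..31 */
} __attribute__((packed));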
When I tried thinking about how to use segments, I envisioned it as loading each shared library into its own set of segments, managed to be identically numbered across apps. That way, you’d have a lot fewer relocations, as you’d only need to do the segment part – the offset would be known at link time. No additional relocations would be necessary when a second app decided to use the library. Of course, this scheme would fall apart if you tried running a gnome app and a kde app at the same time due to their insane number of dependencies.
In a microkernel design, all major system services except for IPC, VMM, and interrupt handlers run in usermode, each in their own address space, just like any other untrusted process of the system
This is not true. It is not technically possible. Any component that wants to program hardware, be it a network card driver or a disk driver, has to run some code in kernel mode to program the hardware for things like DMA, etc.
That’s what I said…
> Any component that wants to program hardware be it a network card
> driver or a disk driver has to run some code in kernel mode to program
> the hardware for things like DMA etc.
Yes, *some* code, but that code need not be device-specific. In the extreme case you’d have a syscall that reads/writes a word or byte from/to an IO address. That would be slow, but it works, and it would move the driver to userspace (and maybe not even *that* slow, since the actual transition between modes isn’t the performance hell; all the other stuff routinely done in the kernel is, and it can be omitted for an “outport” syscall: saving registers, changing the cache’s working set, TLB flushing. You could even run several IO commands as a batch to save on mode transition time).
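A minimal sketch of such a batched “outport” syscall; everything here is hypothetical, and a real implementation would also have to validate the user pointer and check that the calling driver actually owns those ports:

#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the usual x86 port I/O primitives. */
void     outb(uint8_t value, uint16_t port);
uint8_t  inb(uint16_t port);
void     outl(uint32_t value, uint16_t port);
uint32_t inl(uint16_t port);

enum io_op { IO_READ8, IO_WRITE8, IO_READ32, IO_WRITE32 };

struct io_cmd {
    enum io_op op;
    uint16_t   port;    /* x86 I/O port address */
    uint32_t   value;   /* data to write, or filled in on a read */
};

/* Kernel side: run a whole batch of port accesses on behalf of a
 * userspace driver, so one mode transition covers many commands. */
long sys_io_batch(struct io_cmd *cmds, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        switch (cmds[i].op) {
        case IO_WRITE8:  outb((uint8_t)cmds[i].value, cmds[i].port); break;
        case IO_READ8:   cmds[i].value = inb(cmds[i].port);          break;
        case IO_WRITE32: outl(cmds[i].value, cmds[i].port);          break;
        case IO_READ32:  cmds[i].value = inl(cmds[i].port);          break;
        }
    }
    return 0;
}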
I fully agree with you, except for this:
> It should be noted that VMM and interrupt handlers are two of the most
> error-prone parts of any operating system, and they run in kernelmode
> on even the purest microkernel OS. Most drivers install (or modify) an
> interrupt handler, so third party drivers are still a threat to the
> consistency of the system. Since interrupt handlers are often non-
> preemptible and run with interrupts disabled, all sorts of nastiness can
> happen, from hangs to data loss to irreversible hardware damage.
In a Microkernel OS, it is possible to separate the “low-level” and “high-level” parts of an interrupt handler and run only the low-level, driver-independent part in kernel mode. This works best if interrupts can be disabled independently (as is the case with the interrupt controller in standard PCs), otherwise you’d still have to rely on the driver to enable interrupts again (through a syscall that is only available to driver processes), even if the driver code runs in user mode.
For the implementation you’d have a syscall (again only available to drivers) that blocks until an interrupt occurs – for the sake of simplicity, this can even be mapped to an existing syscall like down() for semaphores.
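A sketch of how that split might look; all the names are hypothetical, with up()/down() standing in for the usual semaphore operations:

/* Hypothetical kernel objects and calls used by the sketch. */
struct semaphore;
extern struct semaphore irq_sem[16];
void mask_irq(int irq);
void up(struct semaphore *sem);      /* semaphore V */
void sys_wait_irq(int irq);          /* e.g. mapped onto down(&irq_sem[irq]) */
void sys_unmask_irq(int irq);        /* driver-only syscall */
void handle_device(void);

/* Kernelmode stub registered for every IRQ a userspace driver claims.
 * It is driver-independent: mask the line and wake whoever is waiting. */
void generic_irq_stub(int irq)
{
    mask_irq(irq);           /* keep the line quiet until the driver acks it */
    up(&irq_sem[irq]);       /* wake the blocked driver process */
}

/* The userspace driver's interrupt loop. */
void driver_irq_loop(int irq)
{
    for (;;) {
        sys_wait_irq(irq);   /* blocks until the stub signals this IRQ */
        handle_device();     /* device-specific handling, safely in usermode */
        sys_unmask_irq(irq); /* re-enable the line once the device is serviced */
    }
}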
The definition put forward in the article falls over on its arse when throwing Exec into the mix. 😉
It might help if you specified that you are talking about AmigaOS Exec (which I think is what you’re talking about?)
Monolithic and microkernel approaches follow very different principles on handling resources, communication and code separation.
Saying that some implementations like FUSE, the X drivers or the new userland audio drivers found in Vista bring the microkernel approach to monolithic operating systems is not correct; if you run L4Linux as a process on top of L4/Fiasco, you are not turning the L4 microkernel approach into a hybrid approach.
Saying that some implementations like FUSE, the X drivers or the new userland audio drivers found in Vista bring the microkernel approach to monolithic operating systems is not correct.
I never said they bring the microkernel approach to monolithic kernels. They are bringing microkernel-like functionality.
I never said they bring the microkernel approach to monolithic kernels. They are bringing microkernel-like functionality.
And what functionality is that?
You have good analogies for protected-mode operation and separate address spaces, but there is no way to describe the difference between mono and micro that will make sense to a lay person.
Nice and all, but the male ass anal*ogy put me off.
Hey, I didn’t call that animal that. Go blame your forefathers :p.
It is rather ironic that people think of a rear end when they hear ‘ass,’ and that ‘ass’ is considered a ‘naughty’ word. It’s especially funny when ‘arse’ is used as a euphemism. Why? The original word was in fact ‘arse’! The name of an animal that sounded similar (ass) was used as a euphemism, and over time took on the ‘naughty’ characteristics of the original word, which in turn became its euphemism.
Poor, poor donkeys
[/OT]
Ha! I did not know that
Learn something new every day… Thanks
Microkernels are the way to go in the long run. However, due to the performance impact of message passing and context switches, a microkernel-based OS is likely to have inferior performance compared to a monolithic kernel. The way to solve this problem is HARDWARE SUPPORT + GOOD DESIGN, e.g. a tagged TLB to eliminate the need to flush the whole TLB on every context switch.
This is a very interesting article, which is actually a critique of Hurd but it shows how it works and it’s really esoteric when compared to how monolithic kernels work: http://walfield.org/papers/20070111-walfield-critique-of-the-GNU-Hu…
I still think that the whole hybrid kernel thing is just a buzzword.
It’s like claiming whales were fish because they swim and have fins.
It has nothing to do with hybrids. It’s called “convergent evolution”.
Trying to bring the advantages of a microkernel (mainly stability) to a monolithic kernel is bound to produce a result that’s somehow similar to a microkernel.
On the other hand, making a microkernel as fast as possible will likely result in a lot of shared data structures and less message passing.
To the point where the only message passed is
“I passed you a message. Read it at memory location ‘foo’!”
with ‘foo’ being some shared data structure.
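A toy sketch of that end state, with entirely hypothetical helpers (shm_alloc, ipc_send): the payload sits in memory both sides have mapped, and the only thing that travels as a message is its location:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical primitives: an allocator for the shared region and a
 * kernel IPC call that copies only the tiny message struct. */
size_t shm_alloc(size_t len);
void   ipc_send(const void *msg, size_t len);

struct tiny_msg {
    uint32_t type;     /* what kind of request this is */
    size_t   offset;   /* where the payload sits in the shared region */
    size_t   length;
};

/* Sender: put the payload in memory both processes have mapped,
 * then pass only its location through the kernel. */
void send_request(void *shared_base, const void *payload, size_t len)
{
    size_t off = shm_alloc(len);
    memcpy((char *)shared_base + off, payload, len);

    struct tiny_msg m = { .type = 1, .offset = off, .length = len };
    ipc_send(&m, sizeof m);   /* only a couple dozen bytes cross the kernel */
}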
The same thing happened with the RISC vs. CISC debate:
Both architectures approached a happy and somewhat sane medium.
RISC gained some complex instructions which were frequently used and what we know as CISC is actually translated into a lot of micro-operations.
Yet we do not call them hybrid architectures – I wish the same could be said of software…
This whole debate seems like (U.S) politics…you have purists/hardliners on two extremes, but in real life the majority sits in the center drawing the best (or worst) from each side. The majority of modern kernels sit in the middle, drawing influence from both extremes.
I honestly don’t know if you could call it evolution instead of hybridizing, maybe it is a bit of both. Then again, isn’t evolution a process of hybridizing by combining the best traits? Maybe I am way off there, just a thought.
With CISC vs. RISC, I see your parallel. I do not see these terms used as often these days, and this is because they no longer have the clear-cut ability to describe an architecture.
“This whole debate seems like (U.S) politics…you have purists/hardliners on two extremes, but in real life the majority sits in the center drawing the best (or worst) from each side.”
That’s a really good comparison! Remember it when the next monolithic vs microkernel debate starts – which is bound to happen 😉
Regarding evolution:
Well, whales were not created by interbreeding mammals with fish.
That would make them hybrids.
Instead, they were simply mammals that lived in an environment better suited for fish and thus developed a somewhat similar body shape.
By this logic XNU would actually qualify as a hybrid kernel because they used chunks of code from a monolithic BSD kernel. So SEJeff is right.
Windows’ kernel on the other hand seems to be a speed optimized microkernel or a modular monolithic one. Unless, of course, the rumors are true and they did take code from BSD.
What about XNU, the basis of Darwin that OS X uses? It took Mach, a microkernel, and shoved in features from FreeBSD and a bit of oldschool 4.3BSD and some C++ driver crap.
XNU is quite literally a working example of a hybrid micro/monolithic kernel design as it incorporates aspects of both.
Well, actually XNU removed the memory protection from Mach in exchange for speed and the ability to implement BSD services inside, which in my eyes makes it mostly a monolithic kernel, even though it is based on Mach 3.0, which was a microkernel.
Thom, make sure you cite the analogy of comparing the computer to a kitchen in the last paragraph of the “What is the kernel?” section, as it is used in Tanenbaum’s book.
?
I thought of that one myself, actually. I haven’t read Tanenbaum’s book.
There is no such thing as a hybrid kernel. You have characteristics that make your kernel definitely a micro one, or characteristics that make it a monolithic one. You can’t claim that you have a hybrid just because your monolithic kernel has a few modules and DLLs you can bung into it, or you’ve managed to shove some stuff into userspace.
NT and Linux are still monolithic. End.
I agree, but try telling that to a purist. Maybe we need to invent a new term, so we don’t get into these debates.
… too bad. This would have been within the scope of the article, I think.
Thom, I see you’re still sticking to your guns on the whole “hybrid” label thing. Well I still don’t agree with you, but the article was nicely done for a layman audience. I know it is rather hard to write anything technically oriented for the non-geek without sounding like an idiot to the rest of us. You managed to pull it off for the most part. Good job.
I just wish you would give up on the whole hybrid thing… You are right in saying that hybrids (using your definition) are the most common type of kernel – this is because just about any and all monolithic kernels fall into your definition… Hell, linux can run ITSELF in usermode…
Is it a hybrid? What isn’t?
Mules don’t breed well (you don’t end up with more mules).
If the unnamed OS is a mule, and since it breeds with malware well, you don’t end up with more operating systems… what do you get?
Taking so long. Partly due to a lack of military-style leadership, but that’s a given in terms of how complex GNU/HURD with its kernel (I forget which Mach they are using nowadays) really is, plus who would want such a project to take off into the mainstream? We are talking about one of the most complex things in computer science (the microkernel). It’s hard to wrap your head around this stuff, even for a layman reading the article; actually getting into the technology, that’s something else.
I really respect programmers and information technology workers. Even at work (I’m just the typical “knowledge worker”) I let the whole IT staff know how important they are, since no one seems to care about them until the systems go down.
Hurd isn’t going anywhere.
Even if all the fundamental parts of HURD were Turing complete, there would still be one major problem:
No drivers.
Driver development is probably totally different due to the radically different kernel ABIs. So any effort to port Linux drivers to Hurd would likely require serious hacking, if not a total, full-blown re-write.
Who’s saying use Linux drivers? A total rewrite is the ONLY way to go. Man, back in my day we had to use assembly. We had better graphics performance on extended-mode DOS than we now have with Vista Aero/DirectX 10. That’s saying something.
No, we didn’t.
When it comes down to it, this article was written to be understood by the “layman” (or non-geek). I think it did a pretty good job of accomplishing this. Visitors of OSNews were not the target audience here…
Good job, hope you made the grade!
The 1500 words written were not very interesting to read, rehashing the same monolithic vs. microkernel debate that you can read anywhere. Why not pick a unique aspect of the topic, like ‘Microkernels offer a superior design over monolithic kernels, but are they too difficult for most developers to implement efficiently with current technology?’
Because your suggested topic is geared towards techies. With or without the basics and what was said in this article, many non-technically-inclined persons would not grasp or care about such a topic.
Like I posted above, this was not written for the OSNews audience. It is easy to pick it apart, but remember most people have no idea what a kernel is, much less why a particular type of kernel is easier to implement than another.
Indeed, my suggested topic is a technical one, but then who non-technical is going to be that interested in the two major designs for an operating system kernel!
It is easy to pick apart something like this. I’ve tried to avoid that by avoiding any mention of the content of the article itself. The content actually seemed fine, but it just wasn’t very interesting. Of course the guy only had 1500 words to play with, but what a marvelous opportunity to write an article that whets the appetite of the reader by posing an interesting question about the debate.
- Using the 6 million lines of the Linux kernel as an estimate of the number of bugs: most of these lines belong to drivers, and a bug in a driver not compiled into your kernel or not loaded will of course not be a problem.
- Saying that the loadable kernel modules of Linux make it a ‘hybrid kernel’ because they are a micro-kernel approach, sorry, but that’s just *wrong*.
Those modules run in kernel space and can do anything they want.
FUSE is a micro-kernel approach though, that I agree with.
Saying that the loadable kernel modules of Linux make it a ‘hybrid kernel’ because they are a micro-kernel approach, sorry, but that’s just *wrong*.
Indeed, that’s wrong. That’s why I said: “gives it microkernel-like functionality“. With the functionality being the ability to insert new functionality into a running monolithic kernel just like a muK can.
When you read an article, it is generally wise to actually read it well, before commenting.
Modern CPUs with many registers should be able to overcome most of the speed penalty by actually using those registers.
Microkernels on classic IA-32 are much slower than on – say – PPC.
Very basic comment for this crowd…