Linked by Thom Holwerda on Sun 16th Apr 2006 15:36 UTC
Right in between a car crash and Easter, I knew I had to write a Sunday Eve Column. So here I am, digesting vast quantities of chocolate eggs (and I don't even like chocolate), craving for coffee (for me about as special as breathing), with the goal of explaining to you my, well, obsession with microkernels. Why do I like them? Why do I think the microkernel paradigm is superior to the monolithic one? Read on.
Thread beginning with comment 115503
Brendan Member since:
2005-11-16

Consider a dodgy driver or service that occasionally writes to random addresses.

In a traditional monolithic system, the driver/service would be implemented as part of the kernel and could trash anything that's running on the computer - nothing would stop it from continuing to trash things, and nothing would help to detect which driver or service is faulty.

On a basic micro-kernel the driver/service can't affect anything else in the system, and sooner or later it'd generate a page fault and be terminated. This makes it much easier to find which driver or piece of software was faulty, and means that damage is limited.

In this case, you're still partially screwed because everything that was relying on that driver or service will have problems when that driver/service is terminated. This isn't always a problem though (it depends on what died) - for example, if the driver for the sound card dies then no-one will care much. If the video driver dies then the local user might get annoyed, but you could still log in via the network, and things like databases and web servers won't be affected.

The more advanced a micro-kernel is the more systems it will have in place to handle failures.

For example, if the video driver dies the OS might tell the GUI about it, try to download/install an updated driver, then restart the video driver and eventually tell the GUI that the video is back up and running. The user might lose video for 3 seconds or something but they can still keep working afterwards (and there'd hopefully be an explanation in the system logs for the system administrators to worry about).
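
Very roughly, that "restart it and carry on" policy is just a supervisor loop around the driver. Here's a minimal sketch using plain POSIX process calls as a stand-in for whatever service-management API the micro-kernel would actually provide (the /sbin/videodrv path is made up):

    /* Sketch of a "restart on failure" supervisor for a driver that runs
       as a normal process.  The driver binary path is hypothetical. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
        for (;;) {
            pid_t pid = fork();
            if (pid == 0) {
                execl("/sbin/videodrv", "videodrv", (char *)NULL);
                _exit(127);                  /* exec failed */
            }

            int status;
            waitpid(pid, &status, 0);        /* block until the driver dies */
            fprintf(stderr, "video driver died (status %d), restarting\n", status);

            /* a real system might fetch an updated driver here and tell the
               GUI that video will be back shortly */
            sleep(1);                        /* avoid a tight restart loop */
        }
    }

The GUI only has to cope with the driver being unavailable for the second or two between the crash and the restart.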

Another way would be to use "redundancy". For example, have one swap partition on "/dev/hda3" and another on "/dev/hdc3" with 2 separate disk drivers. Writes go to both disk drivers, but reads come from the least loaded disk driver. In this case the system would be able to handle the failure of one swap partition or disk driver (but not both). With fast enough networking, maybe keeping a redundant copy of swap space on another computer is an option...
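
As a sketch, the swap manager's policy is just "write to every replica, read from whichever one is alive and least busy". Something like the following, where the disk_driver structure and its hooks are invented purely for illustration:

    /* Sketch of mirrored swap I/O across two independent disk drivers.
       The disk_driver interface is made up - only the policy matters. */
    #include <stddef.h>

    struct disk_driver {
        int  alive;                           /* cleared when the driver dies */
        int  queue_depth;                     /* pending requests */
        int (*write)(unsigned long block, const void *buf, size_t len);
        int (*read)(unsigned long block, void *buf, size_t len);
    };

    static struct disk_driver *mirror[2];     /* e.g. /dev/hda3 and /dev/hdc3 */

    int swap_write(unsigned long block, const void *buf, size_t len)
    {
        int ok = 0;
        for (int i = 0; i < 2; i++)           /* writes go to both replicas */
            if (mirror[i]->alive && mirror[i]->write(block, buf, len) == 0)
                ok++;
        return ok > 0 ? 0 : -1;               /* succeed if at least one copy landed */
    }

    int swap_read(unsigned long block, void *buf, size_t len)
    {
        struct disk_driver *a = mirror[0], *b = mirror[1];
        /* prefer the least loaded replica that is still alive */
        struct disk_driver *first  = (!b->alive ||
            (a->alive && a->queue_depth <= b->queue_depth)) ? a : b;
        struct disk_driver *second = (first == a) ? b : a;

        if (first->alive && first->read(block, buf, len) == 0)
            return 0;
        if (second->alive && second->read(block, buf, len) == 0)
            return 0;                         /* fall back to the other copy */
        return -1;                            /* both replicas have failed */
    }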

The point is that for monolithic kernels you don't have these options - if anything in kernel space dies you have to assume that everything in kernel space has become unreliable, and rebooting is the only reliable option (if the code to do a kernel panic and reboot hasn't been trashed too).

Most developers of monolithic systems will say that it's easier to make their drivers and services bug-free than it is to implement systems to recover from failures. They may be right, but it might be "wishful thinking" too...


nick Member since:
2006-04-17

What if the soundcard driver gets corrupted and starts DMA to a random page of memory that was actually some filesystem's pagecache[*]?

What if a driver goes haywire and starts sending the wrong IPC messages down the pipe?

Another way would be to use "redundancy". For example, have one swap partition on "/dev/hda3" and another on "/dev/hdc3" with 2 separate disk drivers. Writes go to both disk drivers, but reads come from the least loaded disk driver. In this case the system would be able to handle the failure of one swap partition or disk driver (but not both). With fast enough networking, maybe keeping a redundant copy of swap space on another computer is an option...

I don't think so. You have to have at least 3 devices and 3 different drivers and perform checksumming across all data that comes out of them if you really want to be able to discard invalid results from a single driver. Or you could possibly store checksums on disk, but if you don't trust a single driver...

I think in general it would be far better to go with RAID, or a redundant cluster, wouldn't it?

The point is that for monolithic kernels you don't have these options - if anything in kernel space dies you have to assume that everything in kernel space has become unreliable, and rebooting is the only reliable option (if the code to do a kernel panic and reboot hasn't been trashed too).

A microkernel can fail too, end of story. If you need really high availability, you need failover clusters.

And within a single machine, I happen to think hypervisor/exokernel + many monolithic kernels is a much nicer solution than a microkernel.

[*] Perhaps you might have DMA services in the kernel and verify all DMA requests are going to/from driver-local pages, yet more overhead... does any microkernel do this?


Brendan Member since:
2005-11-16

Hi,

What if the soundcard driver gets corrupted and starts DMA to a random page of memory that was actually some filesystem's pagecache[*]?

Then you're screwed regardless of what you do. PCI bus mastering is the only thing a micro-kernel can't protect against (I've never found anything that can protect against it, at least, but AMD's virtualization hardware might help - I haven't looked into it so I'm not sure). For the ISA DMA controllers it's easy to force drivers to use a kernel API where decent checking can be done (if you haven't guessed, I'm more for slightly largish micro-kernels than for minimalistic nano-kernels).
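
For the ISA DMA case the check is conceptually simple: before the kernel programs the DMA controller, make sure every page of the transfer is owned by the driver that asked for it. A sketch, with the ownership lookup (page_owner) and the controller call invented for illustration:

    /* Sketch of a kernel-side check on ISA DMA requests: refuse to program
       the DMA controller unless the whole buffer belongs to the caller.
       page_owner() and program_isa_dma() are placeholders. */
    #include <stdint.h>

    #define PAGE_SIZE 4096

    extern int  page_owner(uintptr_t phys_addr);   /* returns owning process ID */
    extern void program_isa_dma(int channel, uintptr_t phys,
                                uint32_t len, int to_device);

    int kernel_dma_request(int caller_pid, int channel,
                           uintptr_t phys, uint32_t len, int to_device)
    {
        if (len == 0 || phys + len < phys)          /* reject empty/overflowing ranges */
            return -1;

        for (uintptr_t p = phys & ~(uintptr_t)(PAGE_SIZE - 1);
             p < phys + len; p += PAGE_SIZE) {
            if (page_owner(p) != caller_pid)
                return -1;                          /* not the driver's own memory */
        }

        program_isa_dma(channel, phys, len, to_device);
        return 0;
    }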

What if a driver goes haywire and starts sending the wrong IPC messages down the pipe?

It's standard programming practice (or at least it should be) to always check input parameters before doing anything, especially if those input parameters come from elsewhere (e.g. function parameters, command line arguments, message contents, environment variables, etc). All "message receivers" should also be able to check who sent the message. If the message still passes all of this, then the receiver might do something that isn't desired, but it's very unlikely this would lead to major problems.
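
In code, "check who sent it, then check what it says" is only a few lines at the top of the receive loop. A sketch with a made-up message format and kernel calls (get_message, sender_is_trusted):

    /* Sketch of defensive checking on received IPC messages.  The message
       layout and the two kernel calls are invented for illustration. */
    #include <stdint.h>

    struct message {
        int      sender_pid;     /* filled in by the kernel, not by the sender */
        uint32_t type;
        uint32_t length;         /* number of valid bytes in data[] */
        uint8_t  data[512];
    };

    extern int get_message(struct message *msg);           /* blocks for next message */
    extern int sender_is_trusted(int pid, uint32_t type);

    void service_loop(void)
    {
        struct message msg;

        for (;;) {
            if (get_message(&msg) != 0)
                continue;

            /* only accept this message type from senders allowed to use it */
            if (!sender_is_trusted(msg.sender_pid, msg.type))
                continue;

            /* validate the contents before acting on them */
            if (msg.length > sizeof(msg.data))
                continue;

            switch (msg.type) {
            /* ... handle each known message type, ignore unknown ones ... */
            default:
                break;
            }
        }
    }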

I don't think so. You have to have at least 3 devices and 3 different drivers and perform checksumming across all data that comes out of them if you really want to be able to discard invalid results from a single driver. Or you could possibly store checksums on disk, but if you don't trust a single driver...

You are right - 2 redundant drivers/services can recover from detectable failures, but 3 are required to detect some types of failure. For a failure that announces itself (a page fault, general protection fault, etc) 2 drivers/services are enough, but to catch a driver that silently returns bad data you need at least 3, so that the faulty result can be out-voted.
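
With 3 copies, a single bad result can simply be out-voted. A minimal sketch of that comparison (assuming the three drivers return equally sized blocks):

    /* Sketch of 2-out-of-3 voting across three redundant drivers: a single
       driver returning different data is detected and its result discarded. */
    #include <string.h>

    /* Returns the index (0..2) of a result that agrees with at least one
       other copy, or -1 if all three disagree. */
    int vote_3way(const void *r0, const void *r1, const void *r2, size_t len)
    {
        if (memcmp(r0, r1, len) == 0) return 0;   /* r2 may be the odd one out */
        if (memcmp(r0, r2, len) == 0) return 0;   /* r1 is the odd one out */
        if (memcmp(r1, r2, len) == 0) return 1;   /* r0 is the odd one out */
        return -1;                                /* no majority - can't recover */
    }

With only 2 copies a mismatch tells you something is wrong but not which driver to believe.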

I think in general it would be far better to go with RAID, or a redundant cluster, wouldn't it?

Regardless of how good it is, hardware RAID has at least 2 single points of failure (the device driver and the controller). Having entire redundant computers (or a redundant cluster) is an option for all types of kernels (but it's not so cheap).

A microkernel can fail too, end of story. If you need really high availability, you need failover clusters.

Of course - but it's easier to find/fix bugs in something small that isn't cluttered with every device driver imaginable.

And within a single machine, I happen to think hypervisor/exokernel + many monolithic kernels is a much nicer solution than a microkernel.

You mean like running 8 versions of Vista on the same machine so that you can edit text files without worrying about messing up your web server? Hardware manufacturers would love the idea (just think of the extra sales)!


Cloudy Member since:
2006-02-15

The point is that for monolithic kernels you don't have these options - if anything in kernel space dies you have to assume that everything in kernel space has become unreliable, and rebooting is the only reliable option (if the code to do a kernel panic and reboot hasn't been trashed too).

This is true in most implementations, but it is a feature of the implementation rather than a necessity of the system. It is, given reasonable VM design, possible to make the user/supervisor transition distinct from the addressability distinction.

You can have a 'monolithic' kernel in the user/supervisor sense -- meaning that the whole thing is compiled as a unit and all runs in supervisor mode -- without having one in the memory addressability sense -- meaning that each subsystem can only access what it's allowed access to.


Brendan Member since:
2005-11-16

This is true in most implementations, but it is a feature of the implementation rather than a necessity of the system. It is, given reasonable VM design, possible to make the user/supervisor transition distinct from the addressability distinction.

Unfortunately, all VM implementations are restricted by what the hardware provides. For (32 bit) 80x86 this means paging and segmentation. Therefore, to separate access from addressability you'd either need to modify the permission bits in the paging structures during each transition (very time consuming) or use segmentation.

While segmentation could help, it isn't a complete solution - it can provide a distinction between 4 privilege levels, but code at higher privilege levels can trash anything at lower privilege levels (e.g. drivers could trash user space and each other, but not the kernel itself). Of course for portability (even to 64 bit 80x86) you can't rely on segmentation anyway.

I guess it would be possible to design hardware to overcome this problem, but IMHO it'd make more sense to make context switching faster - for example, have "CR3 tagged" TLB entries so that address space switches aren't so expensive, which would benefit all kernels to varying degrees and could be added to 80x86 without requiring changes to any existing software.
