Why I Like Microkernels

Thom Holwerda 2006-04-16 OS News 158 Comments

Right in between a car crash and Easter, I knew I had to write a Sunday Eve Column. So here I am, digesting vast quantities of chocolate eggs (and I don’t even like chocolate), craving for coffee (for me about as special as breathing), with the goal of explaining to you my, well, obsession with microkernels. Why do I like them? Why do I think the microkernel paradigm is superior to the monolithic one? Read on.

First a little history lesson. The debate microkernel or monolithic kernel is anything but original. In 1992, Linus Torvalds (you know him?) got entangled in a famous ‘flamewar’ (I wish flamewars were still like that today) with Andy Tanenbaum, one of the men behind MINIX and a book on MINIX which played a major role in Torvalds education concerning writing kernels. Where Tanenbaum had already proven himself with MINIX, a microkernel-based operating system which served as student material for aspiring kernel engineers, Linus was a relative nobody who just started working on Linux, a monolithic kernel. The discussion went back and forth for quite a few posts in comp.os.minix, and it’s a very interesting read.

Anyway, now that I’ve made it clear I’m not treading into uncharted territory, let’s move on to the meat of the matter.

A microkernel (or ‘muK’, where ‘mu’ is the Greek letter which indicates ‘micro’) differs vastly from a monolithic kernel in that in a muK, all things that might potentially bring down the system run outside of kernelspace, in individual processes often known as servers. In MINIX3, for instance, every driver except the clock (I never help but wonder: why the damn clock? I’m going to email Tanenbaum about this one of these days) lives in userspace. If one of those drivers crashes because of a bug, though luck, but it won’t bring the system down. You can just restart the failing driver, and continue as you were. No system-wide crash, no downtime. And there was much rejoicing.

This is of course especially handy during system updates. Say you finally fixed the bug that crashed the above-mentioned driver. You can just stop the old version of the driver, and start the improved version without ever shutting the system down. In theory, this gives you unlimited uptime.

In theory, of course, because there still is a part living in kernelspace that can contain bugs. To stick with MINIX3, its kernel has roughly 3800 lines of code, and thus plenty of room for mistakes. However, Andy Tanenbaum and his team believe that these 3800 lines of code can be made close to bug free, which would bring eternal uptime a step closer. Compare that to the 2.6.x series of the monolithic Linux kernel– which has roughly 6 million lines of code to be made bug free (there goes spending time with the family at Easter).

Another advantage of a microkernel is that of simplicity. In a muK, each driver, filesystem, function, etc., is a separate process living in userspace. This means that on a very local level, muKs are relatively simple and clean, which supposedly makes them easier to maintain. And here we encounter the double-edged sword that is a microkernel; the easier a muK is to maintain on a local level, the harder it is to maintain on a global level.

The logic behind this is relatively easy to understand. I’d like to make my own analogy, were it not for the fact that CTO already made the best analogy possible to explain this local complexity vs. global complexity:

“Take a big heavy beef, chop it into small morsels, wrap those morsels within hygienic plastic bags, and link those bags with strings; whereas each morsel is much smaller than the original beef, the end-result will be heavier than the beef by the weight of the plastic and string, in a ratio inversely proportional to the small size of chops (i.e. the more someone boasts about the local simplicity achieved by his microkernel, the more global complexity he has actually added with regard to similar design without microkernel).” [source]

That explains it well, doesn’t it? Now, this global complexity brings me to the second major drawback of a muK: overhead. Because all of those processes are separated, and need to communicate with one another over greater distances, a muK will inevitably be slower, performance-wise, than a comparably featured monolithic kernel. This is why Linus chose the monolithic model for Linux: in the early ’90s, computers were not really all that powerful, and every possible way to limit overhead was welcomed with open arms.

However, I think the circumstances have changed a lot since those days. Monolithic made sense 16 years ago, but with today’s powerful computers with processors acting as $300 dust collectors most of the time, the advantages of a muK simply outweigh its inevitable, minute, and in user experience probably immeasurable, performance hit.

—

That’s the technical, more objective side of the discussion. However, there’s also a more subjective side to it all. I prefer it when an application or device does one thing, and does it well. It’s why I prefer a component HiFi set over all-in-one ones, it’s why I prefer the GameCube over the Xbox/PS2, it’s why I prefer an ordinary TV plus a seperate DVD recorder over a media centre computer, it’s why I fail to see the point in ‘crossover’ music (why combine hiphop with rock when you suck at both?), it’s why I prefer a manual gearbox over an automatic one (in the Netherlands, we say, ‘automatic is for women’; no offence to the USA where automatic gearboxes are much more popular, since you guys across the pond have longer distances to cover), it’s why I prefer a simple dinner over an expensive 4-stage one in a classy restaurant (the end result is the same: replenishing vital nutrients), it’s why I prefer a straight Martini Bianco over weird cocktails (again, the end result is the same, but I’ll leave that up to your imagination), that’s why I prefer black coffee over cream and sugar with coffee, etc., etc., etc.

—

That leaves me with one thing. Remember how I mentioned that car crash? If you read the accompanied blog post, you might remember how it was caused by a miscalculation on the other driver’s end.

Now, would that have happened if the human brain was like a muK?

–Thom Holwerda

About The Author

Thom Holwerda

Follow me on Mastodon @[email protected]

158 Comments

2006-04-16 3:55 pm

mOOzilla
Hot configurable, plugable and easier to develop for without giving away source (Gfx drivers perhaps)

2006-04-18 5:07 am

proforma
Well Microsoft Windows Vista may not be a microkernel, but it is doing a lot of simular things and I think that is great.

Most of all the drivers are now in the UserMode and 64-bit Vista has a different core even. I think Microsoft is moving in that direction starting with Vista and even in 64-bit mode they are moving even faster in that direction.

o Drivers can crash and not crash the OS (just restart and go). Vista may be able to restart automatically.

o Drivers can be upgraded without rebooting.

o Drivers can be easier to create(program) in usermode.

o Drivers can be easier to test in usermode.

2006-04-16 4:11 pm

sigzero
Well I like Vi and spaces over emacs and tabs! So there!

2006-04-17 6:11 pm

DoctorPepper
But…but what about those of us that like Vi and TABS????

2006-04-17 8:42 pm

egarland
Well.. you guys are just freaks… nobody likes you.

2006-04-16 4:16 pm

mOOzilla
because my MOM said so, so there.

2006-04-17 6:38 pm

espinafre
I’m not sure why the parent was downmodded so swiftly. Please notice the sarcasm in his comment; I see it as an attack to the OS holy wars. Let’s try not to be so obtuse.

2006-04-16 4:18 pm

kadymae
However, I think the circumstances have changed a lot since those days. Monolithic made sense 16 years ago, but with today’s powerful computers with processors acting as $300 dust collectors most of the time, the advantages of a muK simply outweigh its inevitable, minute, and in user experience probably immeasurable, performance hit.

IIRC, the reason OS X is measurably slower at some tasks is because of the microkernel factor. (I seem to remember some sort of kerfluffle about X serves and SQL queries.)

But, on the other hand, there is the legendary stability of QNX.

(in the Netherlands, we say, ‘automatic is for women’; no offence to the USA where automatic gearboxes are much more popular, since you guys across the pond have longer distances to cover)

Y’know, that statement really implies that women are inferior. Next time you see a guy push a small person out of a tight orifice, let me know.

And the reason we have so many automatics over here, is that the US generally has non-existant public transportation, which makes for long slow commutes, and 90 minutes of stop and go with a manual gearbox will have your left knee screaming in pain.

(BTW, I drive standard transmission.)

it’s why I prefer a straight Martini Bianco over weird cocktails (again, the end result is the same, but I’ll leave that up to your imagination)

“Where are my clothes?! Wait. Whose bathtub is this?”

2006-04-16 4:34 pm

galvanash
IIRC, the reason OS X is measurably slower at some tasks is because of the microkernel factor. (I seem to remember some sort of kerfluffle about X serves and SQL queries.)

OSX is not a true muK by academic standards. It is a “modified microkernel” (in much the same way as the Windows NT kernel is, which is also sometimes mislabeled as a muK). It does not at all work the way Thom’s article describes, as the BSD server makes direct function calls into the kernel and does no message passing at all. It is for all intents and purposes a monolithic kernel.

If your interested in how OSX is designed under the hood, read the paper at the following url:

http://www.usenix.org/events/bsdcon02/full_papers/gerbarg/gerbarg_h…

2006-04-16 5:27 pm

kadymae
Galvanash, thanks for the link. Fascinating read. I wasn’t aware of the extent to which the OS X kernel had evolved away from being a microkernel.

2006-04-16 4:46 pm

Slapo
‘the legendary stability of QNX’

Not only is it damn stable, but I have found it to be very fast. Runs off a CD almost as fast as from the HDD as well. Talking about QNX Neutrino 6.2.1.
2006-04-16 5:22 pm

Thom Holwerda
“Where are my clothes?! Wait. Whose bathtub is this?”

Nah, Martini’s different. More like: “Thom, did we really get kicked off of a graveyard last night?” “Yes…” while thinking, thank god girl you didn’t mention that other thing in front of this 30-man crowd that together composes my social life…

:/.
2006-04-17 1:00 am

rapont
it’s why I prefer a straight Martini Bianco over weird cocktails (again, the end result is the same, but I’ll leave that up to your imagination)

“Where are my clothes?! Wait. Whose bathtub is this?”

That almost smacks of British humour – well done hehe!

On a more serious note, I gave up anything reading any of Thom’s editorial content within the first few days of him taking over from Eugenia – it is always:

*Obnoxious

*Arrogant

*Pro-Microsoft

*Anti-Linux

*Pro-European Union

*Anti-American

And his style is just too Dictatorial – always “this is what I say, so it *must* be true – there is no other opinion except for Thom’s”

Apart from that OSNews is still a very good site for a central source of news pertaining to OS’s – otherwise I wouldn’t put up with it.

2006-04-16 4:19 pm

jjezabek
Someone/something has to do the hard stuff. Imagine a userspace driver, which doesn’t crash, but for some reason writes garbage to your screen. It can’t get automatically restarted, because from a microkernel POV everything is in best order. So you either have to restart the server using a remote machine (and hoping that the NIC driver is fine) or simply reboot. I’d say most users will go for the second option.

So – a buggy driver in userspace isn’t much better than a buggy driver in kernelspace.

But yes, I agree, writing drivers in userspace is probably way less difficult than in kernelspace; that’s probably the single most important difference.
2006-04-16 4:24 pm

_tef
The timer has to be in the kernel. How else would the scheduler function?

😀

2006-04-16 6:43 pm

CaptainPinko
There has been talk in nanokernels of making the scheduler user-space to so that different apps could use different schedulers. For example there could be problems with not being MP-safe so you’d want to have finer control, or some legacy code that makes certain exceptions, or a myriad of other reasons.

The timer has to be in the kernel. How else would the scheduler function?

Cooperatively? Don’t laugh. That’s an option and if you can plug-in various schedulers it could be useful, especially for something like Wine emulating Windows 3.11, or if you have some real time (RT) tasks and non-RT. Where you could put the on-RT scheduler as a process with the RT-scheduler.

I don’t recall where I read this, but I believe it was in some paper on a branch of the L4 project.

2006-04-16 4:28 pm

felipe_alfaro
The problems with microkernels is than, in theory, they really look good. However, in practice, running a file system in user-space adds no benefit: if it crashes, how can you restart it? If the memory manager bails out, how can you create a new process and allocate memory for it? The networking subsystem is a good candidate for a user-space subsystem, but what about a diskless-workstation that boots from an NFS export? What would happen if the networking subsystem crashed?

More and more pieces of Linux are being pushed out to userspace. One nice example is used, of FUSE. udev is a central place in Linux device management, but even if it crashes or malfunctions, you can revert to the well-known mknod tool.

2006-04-16 6:48 pm

CaptainPinko
file system in user-space adds no benefit: if it crashes, how can you restart it?

I don’t see the problem. Just restart and block and process that makes an I/O until it’s back a running.

If the memory manager bails out, how can you create a new process and allocate memory for it?

Kill any process allocated by that manager, but any app controlled by another can continue living. You could make an ultra-stable manager for essential functions, and a more powerful-but-complex-so-harder-to-debug one for user apps.

what about a diskless-workstation that boots from an NFS export? What would happen if the networking subsystem crashed

Well it could run off the data it has in memory and/or just block until the connection is re-established. You could have a process that stores the details of your account monitor the connection process, and when the connection dies have it refeed the data that is needed to regain passwords.

2006-04-16 7:08 pm

felipe_alfaro
I don’t see the problem. Just restart and block and process that makes an I/O until it’s back a running.

Usually, restarting a process involves reloading the binary from disk, so if the file system is just to get restarted, how can you load it back from disk to memory? You could keep its text in memory, but what about corrupted data structures?

I still don’t see any advantages of a microkernel system. Just look at Windows NT 4.0 and how they integrated back the graphic drivers back into kernel space due to performance problems. The fact is that, in Windows NT 3.5, a crash in the graphics subsystem left the system unusable. Well, it could keep serving files, but it was impossible for anyone to log in to the system.

2006-04-16 9:32 pm

Cloudy
There’s a misunderstanding here. The fault handling that supposedly comes from microkernels isn’t keeping the failure of a component from causing the system to fail. Rather, it’s keeping the failure of a component confined within that component.

All microkernels do, from an architectural point of view is enforce fault confinment through the use of seperate address spaces.

The problem is that the cost they introduce to provide that separation is higher (usually *much* higher) than the benefit gained.

2006-04-16 4:31 pm

Brendan
Some notes:

The “clock” is probably a source of periodic interrupt used by the scheduler for pre-emptive task switching and timing (things like a “sleep()” function). It probably has nothing to do with “time and date”. Because the scheduler relies on this clock (and everything else relies on the scheduler instead of this clock), building support for it into the microkernel is a very sane thing to do.

A micro-kernel is not necessarily “fail-safe”. For example, if the hard disk driver crashes then you’ll loose all of the data stored in swap space, and having a micro-kernel won’t help to retrieve it. Another example would be the virtual file system – if it crashes then any software that was relying on it will loose their open file handles, etc. The same applies to everything that retains “client state”. With a lot of work a micro-kernel can be more “fail-safe”, but you need systems in place to take care of each type of failure. Without these extra systems it’s not much better than a monolithic kernel (of course “protection” is a different subject).
2006-04-16 4:37 pm

mOOzilla
Come on get some humor implants! It was a joke!
2006-04-16 4:38 pm

mOOzilla
Recently MS blocked some VLKs from updating via automatic updates (mine included). I just bypassed it by some supprisingly easy method
2006-04-16 4:40 pm

stephanem
Is to have a monolithic kernel because as soon as you move to a muK, drivers, modules, filesystems all can become closed source – you are forced to export a “stable” kernel API that Linus and co JUST WON’T DO

2006-04-16 4:52 pm

Brendan
Not having a stable kernel API means constant source code maintenance – you can’t write a piece of software this week and know that it will still work next week because someone might change the kernel API on the weekend.
2006-04-16 5:13 pm

JMcCarthy
Good ganja?

You should consider sharing your views with GNU. Who would’ve thought they’d be wrong?

Drivers, modules, etc. can very will become proprietary with a monolithic GPL kernel. Infact, there only about a dozen examples for Linux.
2006-04-17 4:21 am

jessta
There is currently nothing preventing the making of drivers and modules closed source. As long as they aren’t distributed with the kernel then the GPL doesn’t apply.

– Jesse McNelis

2006-04-16 4:42 pm

mOOzilla
Please, that is what is holding back support for Linux you chump. Linux is dead in the water for consumer boxes if you want EVERYTHING to be open source.

2006-04-16 11:54 pm

smitty_one_each
Linux is dead in the water for consumer boxes if you want EVERYTHING to be open source.

What you say holds true if everyone plays along.

Additionally, in the same spirit of your sentiment, only men could vote in the US until the 19th Ammendment.

2006-04-16 4:46 pm

SEJeff
I have to agree with felipe_alfaro. On technical merits alone, microkernels stomp monolithic kernels into the ground. In practice (and code) it is the other way around.

The #1 problem with the microkernel design… can we say latency? With any microkernel, the latency issue is horrendous. XNU, the kernel in Mac OS X, doesn’t have as much of a latency issue because it is more hybrid than minix 3 or L4. I guess this makes XNU more of a hybrid microkernel, but not even minix 3 is 100% microkernel based. Scheduling and race conditions can also be more of an issue in microkernels as everything runs in userspace.

http://www.minix3.org/vmware.html minix 3 vmware player image

2006-04-16 5:56 pm

Pixie
I have to agree with felipe_alfaro. On technical merits alone, microkernels stomp monolithic kernels into the ground. In practice (and code) it is the other way around.

When you say in practice are you refering to those of AmigaOS, MorphOS, AROS, QNX and many others? I always liked best the idea and practice of micro kernels

2006-04-16 7:27 pm

SEJeff
I am politely saying that Linux which is a monolithic kernel, beats every supposed microkernel I can personally find. Andrew Tannenbaum (whom I deeply respect) says microkernel design is better and I don’t disagree with him. However, his code (minix 3) is still inferior from many angles versus the Linux kernel. Is there a microkernel that even matches the Linux kernel performance wise?

I somehow doubt I will ever see a microkernel that will come close to the speed of RTLinux (realtime linux).

I am really trying to not start a flamewar here.

2006-04-16 7:29 pm

Thom Holwerda
I somehow doubt I will ever see a microkernel that will come close to the speed of RTLinux (realtime linux).

QNX’s Neutrino kernel. QNX is probably the most advanced muK system out there.

2006-04-17 3:21 am

hobgoblin
hmm, a real-time os. while im not fully up to speed, whats the performance of a linux kernel rigged for real time running?
2006-04-17 6:24 am

smitty
I don’t know, but it’s important to remember: RTOS != High Performance OS. The important thing about a RTOS is predictable (and low as possible) latencies. It really doesn’t matter what the overall performance is, though, which means a microkernel might be much more appropriate for a RTOS than one meant for servers.

2006-04-16 4:46 pm

mOOzilla
2006-04-16 4:57 pm

mOOzilla
I cannot EDIT I cannot REPLY and have my reply linked to the parent. I am using FireFox is that why or is OSNews broken in general?

2006-04-17 5:56 am

Sauron
Works fine in NetPositive. Your Firefox is broken

2006-04-17 8:26 am

mOOzilla
After further investigation I see OSNews does not work if It does not get the Referrer header, Which I do not send.

2006-04-16 5:04 pm

ivans
Oh joy. Here it goes again – the ever lasting story of how never-succesful academic bullshit called “microkernel” never made it into the mainstream OSes. And how world would have been a safer place, with no drivers fooling around in ring0, if Cutler/Torvalds/whomever had listened to mr. Tanenabaum and alike.

The microkernel propaganda is mostly composed of three common FUDs:

1) microkernels are more “stable”, because having ALL kernel code besides scheduler/dispatcher, IPC and some basic interrupt routing in ring3 (aka user-mode)

WRONG. If any crucial OS component (such as Memory Manager or FileSystem driver) fails – the whole OS fails as well. Being in ring0 (like the monolithic kernels do it) makes it just crash faster.

2) microkernels are more modular/maintanable/simple/blahblah

Writing modularized code is the basic tenet of software engineering, and it’s called separation of policy from mechanism, and has NOTHING to do with the kernel being monolothic or not.

For example – Linux kernel is highly modular, either from the standpoint of compile time flags, or dynamically expandable functionality via LKMs. Just look at the design goals behind the VFS(used by any filesystem driver) and LSM (Linux Security Modules – utilized by SELinux, DTE..) to know what I mean..

NT kernel (W2K, WS2K3, XP, Vista..) is also very modular – the basic kernel schedulable primitives are encapsulated into so-called “executive components” (MM, I/O, essentialy anything besides scheduler) but are still all being compiled into one big fat binary called NTOSKRNL.EXE. The point is that it’s the code that separates different abstraction policies, not the postcompiled modules organization.

3) It’s easier to write drivers for microkernel.

Because they should, by definition, be living in ring3, and could be debugged in gdb or VS, and not in null-modem connected kernel debugger 🙂 And that’s the reason why printer drivers are in userland ever since W2K, why Vista will have so-called UMDF (User-Mode Driver Foundation) that will enable most, for OS nonessential, cheap hardware to have it’s drivers running in ring3 (if you have XP SP2, check out the WDFMGR.EXE’s process description

And the cons against microkernel? Spending most of your CPU cycles doing context switches and IPC for simple stuff such as page fault handling and ISR dispatching.. That’s the reason why there are NO general-purpose commercially-successful microkernel OSes – that’s right, all Win NT-based, Linux, *BSD, Solaris, HP-UX, AIX, MacOS (aka Darwin – it contains Mach but it is not used as a microkernel, there are the FBSD, IOKIT and drivers stuff in ring0 too!) are monolithic. And those who aren’t, (QNX, EKA2 – SymbianOS nanokernel) are so not because of the “increased stability and security”, but because it enables them to have predictable interrupt latency.

Sorry for bad english.

2006-04-16 5:16 pm

Ronald Vos
Please take note that ring0 and ring3 are Linux-only terms, and meaningly outside discussions of the Linux kernel only. Besides, the NT Kernel and Mac OS X kernel aren’t monolithic, but hybrid.

2006-04-16 5:30 pm

ivans
Please take note that ring0 and ring3 are Linux-only terms, and meaningly outside discussions of the Linux kernel only

LOL, read the Intel manuals dude – it’s OS-agnostic term meaning “privilege level”, and has roots in MULTICS (IIRC it had 8 od them

Besides, the NT Kernel and Mac OS X kernel aren’t monolithic, but hybrid.

Wrong again – NT is even more monolithic than Linux, since it has GUI (win32k.sys) in kernel-mode. Read the relevant literature (Solomon/Russinovich for examble).

As for the MacOS – it does contain the Mach microkernel – but, as I said, it’s not used as a microkernel. See what the author of upcoming “MacOS X Internals” book says on this topic:

http://www.kernelthread.com/mac/osx/arch_xnu.html

XNU’s Mach component is based on Mach 3.0, although it’s not used as a microkernel. The BSD subsystem is part of the kernel and so are various other subsystems that are typically implemented as user-space servers in microkernel systems.

The term “hybrid microkernel” is IMHO just a marketing propaganda dating from mid 1990s when NT 4.0 was released. NT is even more monolithic than traditional UNIXen.

2006-04-16 5:37 pm

Thom Holwerda
Wrong again – NT is even more monolithic than Linux, since it has GUI (win32k.sys) in kernel-mode. Read the relevant literature (Solomon/Russinovich for examble).

Wowwow there, NT is a hybrid kernel. It started out as a true muK (Tanenbaum even compared MINIX to NT), but slowly but surely more things were added into kernelspace. The fact that the GUI is placed in kernelspace does not change the fact that NT is a hybrid kernel. Wikipedia has this to say on it:

“The architecture of the Windows NT operating system line is highly modular, and consists of two main layers: a user mode and a kernel mode. Programs and subsystems in user mode are limited in terms of what system resources they have access to, while the kernel mode has unrestricted access to the system memory and external devices. The kernels of the operating systems in this line are all known as hybrid kernels as their microkernel is essentially the kernel, while higher-level services are implemented by the executive, which exists in kernel mode.”

http://en.wikipedia.org/wiki/Architecture_of_the_Windows_NT_operati…

And with Vista, a lot of parts are moved out of kernelspace again, back into userspace where it belongs. So when Vista comes, the NT kernel has grown more towards its muK origins.

2006-04-16 5:54 pm

ivans
Wowwow there, NT is a hybrid kernel. It started out as a true muK (Tanenbaum even compared MINIX to NT), but slowly but surely more things were added into kernelspace.

No, NT is NOT a microkernel, and was never designed to be one. It has

1) ring0 drivers

2) all non-essential kernel services, which are traditionally implemented as user-mode (ring3) servers in pure microkernels such as Mach, are in ring0

3) it has even super-nonessential stuff in ring0 such as GUI, traditionally in ring3 on *NIX

There is no way to communicate between, lets say, Memory Manager and the Scheduler via formal IPC (msg_send() and msg_receive() in Mach) – they are ALL inside one statically compiled nonseparable module.

Wikipedia article is somewhat clueless, and paradoxical IMHO:

The kernels of the operating systems in this line are all known as hybrid kernels as their microkernel is essentially the kernel, while higher-level services are implemented by the executive, which exists in kernel mode.

So it basically this sentence says: kernel = microkernel, but upper-level services are still ring0, like the microkernel itself 🙂

But if both higher-level services and the “microkernel” are in the same privilege level, they must be in the same address space, meaning no protection, no security, no mutual IPC – ie. no microkernel design at all

As i said – calling NT a hybrid microkernel is just a marketing propaganda from the 90s, when the term microkernel was hot stuff.
2006-04-16 6:10 pm

Thom Holwerda
No, NT is NOT a microkernel, and was never designed to be one.

I’m sorry, but if I have to choose between believing you in saying NT did not start out as a muK, and andy Tanenbaum saying it did, I prefer the latter. No offense.

it has even super-nonessential stuff in ring0 such as GUI, traditionally in ring3 on *NIX

That’s a subjective matter. A kernel is hybrid when it combines both worlds; some parts live in kernelspace, while others do not. Exactly what goes into kernelspace has absolutely nothing to do with it. You might argue that the desktop is not supposed to be in kernelspace– but for a desktop operating system… I’d say the desktop is important and therefore giving it a speed bump by placing it in kernelspace makes sense. The reason the GUI does not live in kernelspace in UNIX is because, well, UNIX is not a desktop operating system. Using UNIX as a benchmark for defining how a desktop operating system should handle its kernel is like asking a butcher to explain to the bakery how to make bread.

But if both higher-level services and the “microkernel” are in the same privilege level, they must be in the same address space, meaning no protection, no security, no mutual IPC – ie. no microkernel design at all

Err… Exactly, because the NT kernel is indeed not a muK, it’s a hybrid. And by hybrid one means that certain parts live in kernelspace, while others don’t. And since NT is very flexiable and modular, they can move stuff in and out of kernelspace relatively easily (hence the GUI in Vista will live in userspace again).
2006-04-16 7:05 pm

ivans
I’m sorry, but if I have to choose between believing you in saying NT did not start out as a muK, and andy Tanenbaum saying it did, I prefer the latter. No offense.

Look Thom, I’ve read most of the available windows internals literature, even that leaked W2K source code, spent countless hours disassembling NTOSKRNL.EXE and even reported some kernel bugs to MS lately, and found absolutely no proof that NT was ever to be or has been a microkernel. Function calls between various executive components are so interleaved that there is absolutely no chance of decoupling them. Ever.

Noone questions Mr. Tanenbaum’s authority on general OS design theory, distributed systems, compiler and automata theory etc. – but I think that in this case, he fell on MS’s marketing crap. Heck, even Cutler said they’re building something different from RSX-11M (which was a microkernel).

Read the relevant literature (Russinovich/Solomon is great for starters), or check out the discussions on this “microkernel issue” on comp.os.ms-windows.programmer.nt.kernel-mode.

That’s a subjective matter. A kernel is hybrid when it combines both worlds; some parts live in kernelspace, while others do not. Exactly what goes into kernelspace has absolutely nothing to do with it.

Oh, it certainly has. “Hybrid microkernel” is in this context so vaguely defined term it can be applied on any OS you like. Let me put it like this: NT has inside kernel-mode (ring0) almost anything that traditional monolithic *nix kernels (Linux, Solaris, *BSD etc.) have + GUI. So can anybody tell me what the hell is so microkernelish about it, that other monolithic OSes don’t have, so that it can be entitled as “hybrid microkernel”???

And since NT is very flexiable and modular, they can move stuff in and out of kernelspace relatively easily

No they can’t I would really like to see a hacker capable of putting, lets say, Object Manager, inside user-mode server process and routing all handle-checking logic via IPC on it 🙂

(hence the GUI in Vista will live in userspace again).

I thought we’ve demistifyed that one..

http://osnews.com/permalink.php?news_id=13007&comment_id=74945
2006-04-16 9:28 pm

Cloudy
Tannenbaum’s wrong on this one. Cutler et al started out designing NT along similar lines to VMS, which he had worked on at DEC. Early Microsoft documentation on the NT structure makes this non-microkernel architecture very apparent. About the time of the Seattle OSDI, Mach was gaining mind share, and Cutler came to OSDI with a presentation declaring NT to be “micro kernel based”. It was, by far, the funniest presentation I’ve ever seen.

Early on, a “microkernel” system was thought to be one in which message-passing was used to separate functional subsystems. This comes from Mach’s predecessors, especially Accent.

Mach was originally intended to be a “pure” message-passing mostly user space OS ala Accent, but DARPA demanded that Mach be Unix compatible if Rashid wanted DARPA funding. From this was born the BSD/Mach hybrid crap that ended up dying such a horrible death as OSF/1.

“microkernel” is a silly idea. It specifies an implementation technology (and one based on the hardware privilige model of a particular processor architecture at that) as a solution to an architectural problem.

That’s the real reason why “microkernel” never made it out of academia into any widespread commercial use.
2006-04-16 9:36 pm

Thom Holwerda
That’s the real reason why “microkernel” never made it out of academia into any widespread commercial use.

And yet there is a muK operating system we all use, we all encounter basically daily: QNX. It powers a lot of medical equipment, cars, home theater equipment, satelites, the robotic arm on the space shuttle, etc.

In fact, QNX is one of the leading RTOS’s as well as one of the leading embedded operating systems, so saying that the muK never made it out of academia is wrong.
2006-04-17 6:35 am

Soulbender
“And yet there is a muK operating system we all use, we all encounter basically daily: QNX”

And it’s also the *ONLY* one that has had any success.
2006-04-17 9:25 pm

galvanash
In fact, QNX is one of the leading RTOS’s as well as one of the leading embedded operating systems, so saying that the muK never made it out of academia is wrong.

Fair enough, whoever said that no muK made it out of academia should take it back. But lets be fair here, the REASON for using a microkernel in a RTOS is because microkernels are by their very nature uniform and predictable, NOT because they perform well. It is MUCH more important for a RTOS to be consistent than fast. What makes them attractive for RTOS usage does not translate to general purpose Operating Systems…
2006-04-16 9:55 pm

Cloudy
I’m sorry, I thought we were discussing general purpose OSes.

The embedded universe (real embedded, not what’s now coming to be called embedded) is a different beast with a different set of requirements.

QNX didn’t start out as a “microkernel” — it even predates Mach. It started out as a small well organized architecture for embedded applications. And it was well done. Also very much bare-boned.

It is the well organized architecture, and not the ‘microkernel’ that made QNX the market leader.

Do you, by the way, recall when QNX got around to adding ‘microkernel’ as a marketing bullet? 2001? when they were already, what, 20 years old.

QNX, by the way, is more like Brevix, in having a small set of base facilities, than it is like a traditional microkernel, because it does not rely on message passing.
2006-04-17 9:01 pm

galvanash
I’m sorry, but if I have to choose between believing you in saying NT did not start out as a muK, and andy Tanenbaum saying it did, I prefer the latter. No offense.

Mr. Tanenbaum never said that NT started as an muK. He would have had no way of knowing that. He was merely parroting what Microsoft was saying as they were promoting their new OS and he foolishly took them at their word. This was WAY before NT shipped btw… You might be interested to see what he has said since then:

http://www.cs.vu.nl/~ast/brown/followup/

Quote straight from the horse’s mouth (about 3 quarters of the way down):

“Microsoft claimed that Windows NT 3.51 was a microkernel. It wasn’t. It wasn’t even close. Even they dropped the claim with NT 4.0.”

You can call it a “Hybrid” if you like, but that is nothing more than marketing bullshit. Hybrids are simply what companies call monolithic kernels to sound cool…

Mr. Tanenbaum defines a muK the same way as most academics, a small kernel that does AT THE MOST memory management, scheduling, and IPC. EVERYTHING else, including all drivers, file systems, etc. should communicate with the kernel from userland. Period. By that definition neither NT nor OSX are even close to being microkernels.
2006-04-16 6:15 pm

siride
You clearly don’t know what you’re talking about. NT has sophisticated IPC mechanisms (such as APC and LPC) and most non-essential services live outside the kernel. The sole exception is win32k.sys and that’s inside the kernel only in the sense that it lives in ring0. It’s still it’s own separate module that used to be in csrss.exe (which still exists as a microkernel style server, mainly for console apps).

The desktop is NOT integrated into the kernel. It’s called explorer.exe and you can easily terminate it from task manager and use something else. A great deal of services run in usermode (ever seen the crapload of svchost.exe instances running?). Yes, the drivers run in kernel space, but this, like win32k.sys, is done for performance reasons and now that computers are fast enough, drivers and graphics will move back out into userspace in Vista.
2006-04-16 7:09 pm

ivans
NT has sophisticated IPC mechanisms (such as APC and LPC) and most non-essential services live outside the kernel.

But you’re missing the point (i didn’t say that NT doesn’t have IPC, it’s just NOT USED for communication between different kernel components, because they live in the same address space – the upper 2 gigs) – in microkernel “nonessential services” are anything but the scheduler/IPC.

YOU CANNOT SEPARATE ANY EXECUTIVE NT KERNEL COMPONENT INSIDE A USER-MODE PROCESS.

The desktop is NOT integrated into the kernel…

I didn’t claim it was. You seem to be constantly proving something I didn’t say, in order to apper correct in other things.

http://en.wikipedia.org/wiki/Ignoratio_elenchi

Anyway, I would like to hear in just which ways is NT more microkernelish than, say Linux? 🙂

2006-04-16 7:49 pm

hustomte
“…and has roots in MULTICS (IIRC it had 8 od them ”

Actually 16, but then again, they didn’t actually use all of them

2006-04-16 8:19 pm

ivans
Actually, it did had 8 hardware rings

http://www.multicians.org/mgr.html#ring

It’s really funny that most OSes today are still being built on 1960s technology
2006-04-17 3:18 am

hobgoblin
if its not broken, dont fix it…
2006-04-18 5:55 am

proforma
>if its not broken, dont fix it…

If you have crap drivers pulling down your OS, then it is broken already.
2006-04-18 2:25 pm

hobgoblin
and may i ask what drivers that is? or for that matter what os one is talking about?

2006-04-16 6:03 pm

bleurgh
ring0 – ring3 are intel terminology. x86 have 4 privilege rings, whereas most processors only have 2 (kernelmode and user mode). This is where the terminology comes from, not Linux.
2006-04-17 6:29 am

Soulbender
“Please take note that ring0 and ring3 are Linux-only terms”

Uh, no. They are operating system design terms where the actual number of rings depends on the processor.

“Besides, the NT Kernel and Mac OS X kernel aren’t monolithic, but hybrid.”

Agreed, and so is the Linux kernel.

2006-04-17 7:22 am

Cloudy
You say that like it’s a good thing, but hybrid is the worst of both worlds. All the overhead of message passing with all of the memory collision issues of single address space.

2006-04-16 5:58 pm

Pixie
And the cons against microkernel? Spending most of your CPU cycles doing context switches and IPC for simple stuff such as page fault handling and ISR dispatching.. That’s the reason why there are NO general-purpose commercially-successful microkernel OSes – that’s right, all Win NT-based, Linux, *BSD, Solaris, HP-UX, AIX, MacOS (aka Darwin – it contains Mach but it is not used as a microkernel, there are the FBSD, IOKIT and drivers stuff in ring0 too!) are monolithic. And those who aren’t, (QNX, EKA2 – SymbianOS nanokernel) are so not because of the “increased stability and security”, but because it enables them to have predictable interrupt latency.

I know, I know, that’s one of the reason all *miga OSs out there feels so slow… 🙄

2006-04-16 5:12 pm

Ronald Vos
For example Exec is mentioned on wikipedia as an ‘atypical microkernel’, which is part microkernel, part hybrid.

http://en.wikipedia.org/wiki/Kernel_%28computer_science%29#…

Also, from what I’ve read some nanokernels with limited amounts of features do fine.

Projects like Gnu HURD are, in my humble opinion, destined to fail if they persist in aiming for a ‘theoretically pure’ implementation anyway; I believe hybrids to feature the best of both worlds. If you use services and modules, you can get performance, customisability, hotreloadability and overseeable complexity at multiple levels all at the same time. No surprise the kernels most used in practice feature either modules or services.

If you want performance, you’d be better off using exokernels anyway. I’m surprised this hasn’t escaped from academia yet.

2006-04-16 6:02 pm

Pixie
If you want performance, you’d be better off using exokernels anyway. I’m surprised this hasn’t escaped from academia yet.

Don’t let EyeAm hear you!

2006-04-16 5:14 pm

gshrblfts Platinum Secundus
I definitely like how the article does not only present positive traits of the microkernel, but also describes the corresponding negative traits. In addition, I think that’s the first time I read that very interesting Linus vs. Tanenbaum thread. Great read overall.
2006-04-16 5:15 pm

JMcCarthy
It’s easy to fall in love with the premise of microkernels when you’ve never actually designed&implemented one.
2006-04-16 5:18 pm

mOOzilla
Just because it is hard does not mean it is bad. Wasnt going to the moon hard? They did that not because it was easy but because it was hard, I think somebody made a speech about that somewhere sometime I cant seem to remember who or where or when.
2006-04-16 5:29 pm

evert
I guess a microkernel makes better use of multiple CPU’s. So it should scale better.

2006-04-16 6:22 pm

renox
Maybe, but I’m not convinced: spreading the work too thin can be a pitfall of parallelisation: CPU contains cache and if a system request goes to several CPUs in a micro-kernel, there could easily be data shared between CPU increasing cache-contention, which reduce efficiency.

Schedulers tries to put related thread on the same CPU, while at the same time balancing the load, this is a tricky balance and I’m not sure micro-kernel helps here.

2006-04-16 6:56 pm

somebody
Yep, doing too much of something can be even worster than doing nothing.

Part for all naysayers about monolithic tree

On the other hand (at least in my simple mind)

Monolithic kernel could simply put out generic `muK` protocol handlers (drivers) that would act on and control outside drivers, like for example `muK` network adapter handle and provide complete protocol needed. This driver would need to act as protocol server and handle the pluged outsiders.

Hierarchy of Linux kernel is just a file system, nothing else. So why couldn’t some drivers provide external `muK` capabilities?

Something being monolithic doesn’t mean it couldn’t be extended with some sub branches that would provide `muK` protocols and handling. Now, one network driver provides and can control few network cards that are in his domain. This would just control more generic domain.

But in my daily reality.

I don’t miss microkernel a bit. So far NVidia driver was the only one I ever installed so far. I just buy most compatible machine that suits my needs and that’s it.

2006-04-17 6:15 am

nick
A microkernel does not inherently make better use of

multiple CPUs than a monolithic kernel. Often quite the

reverse, in fact.

A monolithic kernel isn’t a single “process” or

“thread” (although there can be threads that run solely

in kernel mode). The kernel is called by, and performs

work on behalf of, user processes.

Now say that we have a 16 CPU system with 1 user

process on each CPU and each one would like to allocate

a page of memory. The kernel is notified about this

allocation request somehow (via a page fault, usually),

then asks the memory manager to find a page, return it

to us.

In the Linux kernel, in the common case, all processes

can concurrently enter the memory manager, and none

will have to take a lock which blocks another, because

each can just take a page from per-CPU freelists.

In a microkernel, there would be 16 requests queued up

for the memory manager process, which runs on one CPU

and processes them one at a time. If you would like to

have 16 memory manager threads, then they can now run

concurrently, but they will still block one another

when taking IPC requests off the queue.

Actually, the memory manager *is* something that a high

performance microkernel is likely to multi-thread

and/or put a light-weight per-CPU allocator in front of

the IPC call (oops, no longer a microkernel!)

2006-04-17 7:20 am

Cloudy
When you start talking about MPs, you have to be careful to distinguish between the two strategies of MP resource scheduling design. While it is common, especially the first time an OS is ported to an SMP to have a ‘big kernel lock’ and funnel all requests to a certain processor, that’s an implementation detail and not an artifact of micro versus macro kernel.

2006-04-17 9:58 am

nick
No that’s got nothing to do with what I was talking

about.

I just provided an example to illustrate why a micro

kernel is not inherently more SMP scalable than a

monolithic kernel.

In other words, a monolithic kernel can be just as

threaded / scalable as any microkernel.

2006-04-16 5:30 pm

walnut tree
I like the idea of an application or device that does one thing and does it well. But in the technology field, “convergence” is the name of the game. And this is partly being driven by consumers who want functionality in neat all-in-one packages rather than as separate components.

It’s not always bad though. I prefer a simple mobile phone, but I can see the appeal of a phone with MP3 player and (basic) digital camera. Would you rather have one integrated device or carry three separate devices?

On paper, the microkernal design seems simple and elegant. The modular structure gives you the impression of a plug-and-play system where new functionality can be dropped in without having to re-engineer the whole OS – whether it’s actually like that in real life though is another thing.

2006-04-16 9:36 pm

Cloudy
The problem is that on paper everyone’s block diagram of their system looks modular and elegant. In practice, microkernels don’t give you any modularity. They simply throw more of the crap outside the ‘kernel’ than was in it before. It’s the same crap in a different bag. But now it has all the extra overhead of talking to the microkernel to get things done.

2006-04-16 5:36 pm

JrezIN
Great article.

In my experience, microkernel systems have better stability and are usually faster to general-propose operations (no really talking about automation systems, like car-building Robots’ operation systems, and similar situations…), like a desktop operational system. Maybe it’s the kind of standard organization that the system works around; but still in the end, better systems (talking about the practical results).

Another problem these days that microkernels help with, are licensing problems. Since binary compatibility is much easier to maintain as the kernel itself doesn’t change too much. So, as the system “talks” a single language, code don’t need to “touch” other (license’s) code and the user don’t need to worry about this kind of problems, it’s easier to have everything “just working”…
2006-04-16 5:55 pm

ratatask
I suggest the OP (and others too) read

http://fred.cambridge.ma.us/c.o.r.flame/msg00025.html
2006-04-16 6:10 pm

bleurgh
While the clock driver in minix is in the kernel, it doesn’t not run in kernel mode. Minix has some kernel processes, called the kernel tasks which run in ring1 (kernel is ring0) and the clock is one of them. The reason, i’d assume for the clock to be in the kernel, is so that it can call enqueue and dequeue directly, which is part of the scheduling system though this could be be done equally as easily from a separate process I suppose. Actually, from just looking at the sources, the clock task interrupt is different to the interrupt handlers for the other hardware drivers. Most handlers send a HARD_INT message on every interrupt. You dont want to do this with the clock because it would cause 4 context switches each time, so the solution is to put a special clock interrupt handler in he kernel, so that HARD_INT messages are only called when the need to be, i.e. a process has used up all its time and needed to be switched out.

2006-04-16 7:12 pm

smitty
Yes, for Minix 2 anyway, the reason the clock task interrupt is handled specially is for performance reasons. By putting it in the kernel (and moving away from a pure microkernel design) you can eliminate significant delays.

2006-04-16 7:41 pm

mOOzilla
Somebody should tell NASA/ESA/BAe/Airbus et all that they got it wrong they shouldn’t be using VxWorks and QNX for mission critical systems and jump to Linux. Heads should ROLL in upper management!
2006-04-16 7:43 pm

mOOzilla
Examples of microkernels and operating systems based on microkernels:

* AmigaOS

* Amoeba

* Brainix

* Chorus microkernel

* Coyotos

* EROS

* K42

* LSE/OS (a nanokernel)

* KeyKOS (a nanokernel)

* The L4 microkernel family, used in TUD:OS, GNU Hurd.

* Mach, used in original GNU Hurd, NEXTSTEP, OPENSTEP, and XNU (used in Mac OS X)

* MERT

* Minix

* MorphOS

* Phoenix-RTOS

* QNX

* RadiOS

* Spring operating system

* Symbian OS

* VSTa
2006-04-16 7:48 pm

mOOzilla
Hybrid kernel examples

* BeOS kernel

* DragonFly BSD; being the first non-Mach BSD OS to adopt a hybrid kernel architecture.

* Haiku kernel

* NetWare kernel

* ReactOS kernel

* Windows NT, Windows 2000, Windows XP and Windows Vista kernels

* XNU kernel (core of Darwin, used in Mac OS X)

Monolithic kernel examples

* Traditional UNIX kernels, such as the kernels of the BSDs

* Linux kernel

* The kernel of Solaris

* Some educational kernels, such as Agnix

* MS-DOS, Microsoft Windows 9x series (Windows 95, Windows 98 and Windows 98SE) and Windows Me.

Exokernels

http://66.249.93.104/search?q=cache:kuOFowAHxHIJ:pdos.csail.mit.edu…

2006-04-16 7:55 pm

Thom Holwerda
mOOzilla, it’s fine if you want to post stuff, but please refrain from cutting up your posts when it’s not nescesary. What’s even worse, your lists of kernels are copy/pasted directly from Wikipedia, while you do not give credit where credit’s due.

Got it? Thanks.

2006-04-16 8:07 pm

dylansmrjones
There is no legal need for credit for a simple list like this. But it would have been ethical correct, but it does require that the edit function actually functions

2006-04-16 8:19 pm

Thom Holwerda
There is no legal need for credit for a simple list like this.

I know (as plagiarism is nor a criminal, nor a civil offense), but it’s just polite to give credit where credit’s due. As now it seems as if the guy himself made this list.

But anyway, back on topic.

2006-04-16 7:51 pm

g2devi
Personally, I didn’t think it was that good an article. For people interested in microkernels, this article is a far better read:

http://en.wikipedia.org/wiki/L4_microkernel_family

Basically, from what I’ve read, most of Linus’ original criticisms of microkernels were correct and it wasn’t until the mid 1990s that academics realized this and started from scratch with the bare bones L3 microkernel and then enhanced L4. From what I’ve read, L4 takes care of most of the criticism, although there is still a bit of a performance hit (though nowhere near as bad as Mach). The key thing of L4 is that it tries to focus on things that require ring0 privilege and leaves everything else to the OS. In essense, it’s not all that different from Xen (although they may differ on some details). Given that VMWare and Xen are trying to come up with an API for paravirtualization ( http://osnews.com/comment.php?news_id=14334 ), I wouldn’t be at all surprised if the changes are compatible with the L4 design.

2006-04-16 9:44 pm

Cloudy
Welcome to the late 1960s. L4 is striving mightly to be VM/370 all over again.

2006-04-16 7:57 pm

mOOzilla
Sure I can append a 200 line copyright for every word but I never claimed it was mine its obvious it was pasted and It would have been edited but OSNews is BROKE.

Got it? Thanks.
2006-04-16 9:54 pm

John Nilsson
http://en.wikipedia.org/wiki/Exokernel links to this paper: http://pdos.csail.mit.edu/papers/hotos-jeremiad.ps (Dawson R. Engler & M. Frans Kaashoek, Exterminate All Operating System Abstractions) which I found really interesting.

Anyone care to argue against it? Why is muKernels better than exokernels f.ex.

2006-04-16 10:10 pm

Cloudy
I won’t argume for muK because it’s an implementation technique not an architectural solution, but it’s pretty easy to argue against their basic thesis by observing that his paper is entirely devoid of a discussion of resource sharing.

Engler and Kasshoek completely missed, from the early history of OS design, *why* we put abstractions into operating systems: to share resources.

Throwing away OS abstractions is throwing out the baby with the bathwater. Using microkernels is demanding that all babies be small enough to bathe in a saucepan.

The trick (and it is a *hard* trick) to OS design is putting the right abstractions in place and then implementing them. Abstraction comes first, implementation comes second, iterate until satisfied.

Dennis Ritchie was very good at that. Unix has survived for as long as it has as a usable OS not because of its implementation technology but because of its original set of abstractions.

He seems to have been the last one.

2006-04-16 10:06 pm

nicolai
The concept “exokernel” can be summed up as: The kernel only performs resource allocations, the abstraction from the hardware is pushed down into a userspace library as far as possible.

This concept is completely orthogonal to the muK vs. monolithic question. You could have exokernel micro kernels and exokernel monolithic kernel.

In fact, exokernel design is used in Linux – it’s how the open source 3D drivers work. The DRI people don’t call it an exokernel architecture, but that doesn’t change the fact that it largely is one.

Of course, it’s not “pure” in the sense that the userspace driver cannot write directly into the graphics card’s command buffer, but that can’t be avoided for security reasons. The exokernel approach is rather limited in that way as long as you can’t trust the application.

2006-04-17 12:32 am

Ronald Vos
The concept “exokernel” can be summed up as: The kernel only performs resource allocations, the abstraction from the hardware is pushed down into a userspace library as far as possible.

This concept is completely orthogonal to the muK vs. monolithic question. You could have exokernel micro kernels and exokernel monolithic kernel.

In fact, exokernel design is used in Linux – it’s how the open source 3D drivers work. The DRI people don’t call it an exokernel architecture, but that doesn’t change the fact that it largely is one.

Linux is all about hardware abstraction layers. In fact, it’s layered-ness is what gives it it’s portability. The open source 3D-drivers are an abstraction layer to other software.

Exokernels are all about removing the abstraction layers, by letting software directly interface with the hardware. Kinda like in the times of bad old DOS, only supported by libraries so that programmers don’t have to manually program the interfaces to the hardware.

http://pdos.csail.mit.edu/exo/

Of course, it’s not “pure” in the sense that the userspace driver cannot write directly into the graphics card’s command buffer, but that can’t be avoided for security reasons. The exokernel approach is rather limited in that way as long as you can’t trust the application.

I don’t believe it will work with the traditional UNIX security model. A capability-based security model like implemented in EROS however could work if done properly.

2006-04-17 1:25 am

Cloudy
I dunno what Linux is “all about”, but it ain’t hardware abstraction. And the portability comes from a lot of people trying to force the x86 hardware model onto a lot of other architectures, not from any decent abstraction layer.

whether you can make ‘user space’ drivers work or not has nothing to do with the security model of the OS and everything to do with the virtual memory and i/o model of the underlaying hardware. More than anything else it has to do with how those things interact in a context switch.

There *are* hardware models where you can’t do user space drivers for the simple reason that hardware i/o is priviliged so all i/o processing has to be done in ‘ring 0’.

Then there are hardware models like the x86 that screw up the relationship between memory addressability and memory accessability making it difficult to switch efficiently.

Occassionally an architecture comes along like the original machines that Multics was designed for or like the HP Prism (PA-RISC) architecture that get everything right wrt to i/o processing, privilige, and addressability. On those you can have a lot of fun with user space “kernel” features.

(Which reminds me, the other Linux (in the sense that Linus uses the name) is not, is an operating system. It’s a massive kernel and a set of facilities.

I guess there’s some abstraction buried in the ad-hoc crap somewhere, but if it didn’t come from Unix, I sure haven’t been able to find it.

2006-04-17 5:53 am

nick
I dunno what Linux is “all about”, but it ain’t hardware abstraction. And the portability comes from a lot of people trying to force the x86 hardware model onto a lot of other architectures, not from any decent abstraction layer.

Excuse me, Cloudy, I’d like some clarification on this

comment please, because it bugs me when I see it

without anything to back it up with. I just registered

now so I could ask you.

I don’t know when you last looked at the Linux kernel

source code, but Linux 2.6 doesn’t appear to do

anything like “force the x86 hardware model” onto other

architectures.

As you might know, Linux 2.6 is the most ported OS in

the world (in terms of CPU ISAs, not “platforms” that

NetBSD counts). It runs on architectures without MMUs,

without pagetables, systems with 1024 CPUs and 16TB+ of

RAM. Its i386 (nor the x64) architecture port can’t do

any of these things.

How successful are these non-i386 ports?

The SPARC Niagara is a very different architecture

from i386, and yet Linux on the Niagara is nearly 50%

faster than Solaris 10, and 10% faster than Solaris

Express (OSes which are explicitly designed to run on

SPARCs).

http://www.stdlib.net/~colmmacc/category/niagara/

On SPECjbb, Linux is within 90% of AIX on 32 core

POWER5 systems (although not exactly the same user

software and that was a couple of years ago).

Linux is much, much faster than OSX on PowerPC hardware

(G5 systems). No link, but you shouldn’t have trouble

finding results on google.

On Itanium systems it holds its own against MS Windows

and HP-UX, although it is hard to find benchmarks on

identical systems (tpcc and tpch give some ideas).

2006-04-17 7:17 am

Cloudy
Last time I looked at Linux kernel code: about half an hour ago. 2.6.17-git. ARM OMAP730, to be precise, as I’ve been working on it again.

For years, at various times, I was responsible for Linux kernels on various embedded platforms.

It’s not the most ported OS in the world in terms of ISAs. Unix (meaning the Bell Labs RE code base) has been ported, in one form or another to not only numerically more ISAs, but dramatically different ISAs than Linux has been ported to.

None of your comments address the point I raised, by the way. Linux has been ported to a wide range of architetures. Not as wide a range as we had ported Unix to back in the 80s, but then, the range of architectures around now isn’t as wide as it was then, by a large amount.

But the way it was ported was not through implementing a Linux device abstraction layer on top of the hardware of various systems as has been claimed, for the simple reason that there is *no* Linux device abstraction layer.

Linux, sort of, has a Unix device abstraction model, with various warts on the side, but that’s only in the interface that’s visible across the user/kernel boundary.

Internally, it doesn’t have device abstractions. (Do I have to dig up the URL of Linus’ famous quote about interfaces?)

Not only does it not have device abstractions, but the device layer changes. Frequently. Drivers for 2.6.8 won’t work on 2.6.10. Similarly 2.6.10 drivers will break on 2.6.12. Interfaces change all the time. Device models vary at the whim of developers.

Take, for instance, the famous, on-going battle over the SCSI abstration layer(s) and how they should be implemented.

Or, to pick another random example, the five different device naming schemes floating around the kernel, in various states of repair, with various levels of support.

It’s a fun system. I enjoy recapturing the 80s. But it ain’t one based on any abstraction layers.
2006-04-17 9:01 am

nick
Last time I looked at Linux kernel code: about half an hour ago. 2.6.17-git. ARM OMAP730, to be precise, as I’ve been working on it again.

OK that’s great, but it doesn’t explain why you didn’t

answer my fundamental question: how does Linux force

the “x86 model” on other architectures?!?

It’s not the most ported OS in the world in terms of ISAs. Unix (meaning the Bell Labs RE code base) has been ported, in one form or another to not only numerically more ISAs, but dramatically different ISAs than Linux has been ported to.

Portability — I’m talking about being compiled from

the same source base. Obviously pretty much any bit of

software can be “ported” to another system, but it

takes more finesse to support both systems at the same

time.

Maybe I shouldn’t say most ported, but it is the most

portable operating system. Whatever. Semantics. I’m

sure you understand what I mean.

But the way it was ported was not through implementing a Linux device abstraction layer on top of the hardware of various systems as has been claimed, for the simple reason that there is *no* Linux device abstraction layer.

Sorry, there are plenty. There is a pci layer, which is

basically evolving into a generic bus layer. There is a

network device layer, a block device layer, there is a

MMU / virtual memory abstraction, a CPU context

abstraction, etc etc.

Not only does it not have device abstractions, but the device layer changes. Frequently. Drivers for 2.6.8 won’t work on 2.6.10. Similarly 2.6.10 drivers will break on 2.6.12. Interfaces change all the time. Device models vary at the whim of developers.

Of course, this is one of the reasons why it is a good

system IMO.

Take, for instance, the famous, on-going battle over the SCSI abstration layer(s) and how they should be implemented.

Err, how should I “take” that? The fact that there is a

battle over the abstraction layer should suggest to you

that there is actually an abstraction layer in place.

Or, to pick another random example, the five different device naming schemes floating around the kernel, in various states of repair, with various levels of support.

Huh? There is and always has been the (major,minor)

system and the network interface system for char,

block, and network drivers. And that’s fully supported.

There is the now obsolete devfs which was mostly

replaced by sysfs, to represent device topology. Sysfs

doesn’t impose a device naming scheme though.

So at most there are 2 device naming schemes in the

kernel, major,minor and devfs. 3 if you count sysfs,

2 of which are fully supported.

It’s a fun system. I enjoy recapturing the 80s. But

it ain’t one based on any abstraction layers.

Well it has many abstraction layers so you’re just

plain wrong. I don’t know what you think an abstraction

layer is though… maybe you should put down the crack

pipe before posting again.
2006-04-17 8:16 pm

Cloudy
With respect to the question about the x86 model, you answer your own question. There is, as you say, a “PCI layer”. Not, you will note, an abstraction of control busses, but a layer specific to a particular kind of bus.

This, I think, is the point you’re missing. Having layers is not the same thing as having abstraction layers. The PCI layer is very much a PCI layer, and not a bus layer. There is no bus abstraction layer that the PCI layer is derived from. So for SCSI, there is the SCSI layer, which is not at all similar to the PCI layer, even when SCSI is similar to PCI.

Second, The RE code base was a single code base, and ported to a far wider range of ISAs than Linux has ever been ported to. How many bit-addressable machines does Linux run on? How many 48 bit-machines? How many machines on which char is not 8 bits?

Computer architectures are much more similar now than they were in the 80s, and even by the 80s they were starting to homogonize.

Discovering the remaining device name spaces is left as an exercise to the reader: (hint, you missed proc) Oh, by the way, devfs may be obsolete, but it’s still in the tree, and still widely used.

Speaking of which, you are incorrect in your claim that Linux is ported from a single code base. The last time I counted, there were something like a dozen interrelated code bases that I was interested in. You can’t, for example, boot the TS7200 I’m working with right now, from any of the kernel.org trees, not even the tip of git.

Linux, which is just a kernel and not an OS, has many layers. They are not, however, *abstraction* layers.
2006-04-18 3:11 am

nick
With respect to the question about the x86 model, you answer your own question. There is, as you say, a “PCI layer”. Not, you will note, an abstraction of control busses, but a layer specific to a particular kind of bus.

Well, when I look around I see PCI busses in your

beloved PARISC to IA64 Altixes, Alphas, POWER servers,

even mainframes. So not only have I not answered my

own question, you still haven’t answered it either.

I’m waiting, please.

This, I think, is the point you’re missing. Having layers is not the same thing as having abstraction layers. The PCI layer is very much a PCI layer, and not a bus layer. There is no bus abstraction layer that the PCI layer is derived from. So for SCSI, there is the SCSI layer, which is not at all similar to the PCI layer, even when SCSI is similar to PCI.

Umm, no. The pci layer does a lot of work at

abstracting PCI bus implementation details. It

definitely abstracts PCI programming. The pci API

is actually mostly good enough to abstract other

busses as well.

As for the SCSI comment… no comment.

Second, The RE code base was a single code base, and ported to a far wider range of ISAs than Linux has ever been ported to. How many bit-addressable machines does Linux run on? How many 48 bit-machines? How many machines on which char is not 8 bits?

I really don’t care. I didn’t say anything about that.

I didn’t say a wider range of ISAs (whatever that might

mean).

Linux runs on the most architectures.

Discovering the remaining device name spaces is left as an exercise to the reader: (hint, you missed proc) Oh, by the way, devfs may be obsolete, but it’s still in the tree, and still widely used.

I didn’t miss proc (hint, it isn’t a device naming

scheme).

Speaking of which, you are incorrect in your claim that Linux is ported from a single code base. The last time I counted, there were something like a dozen interrelated code bases that I was interested in. You can’t, for example, boot the TS7200 I’m working with right now, from any of the kernel.org trees, not even the tip of git.

I’m not talking about any of those branches. I’m

talking about Linus’ tree. If you count branches and

forks, then you end up having even more.

Linux, which is just a kernel and not an OS, has many layers. They are not, however, *abstraction* layers.

Well you are wrong.
2006-04-18 4:01 am

Cloudy
A lot of machines have PCI busses. None of the embedded systems I’m developing for do. Where is the layer that lives *above* PCI bus and provides the abstractions that it would use in common with other busses? And no, the PCI bus isn’t a good model for those machines.

By the way, trying to model non-PCI-like busses as if they were PCI-like is a very good example of what I meant by saying that people port Linux by trying to force other architectures to look like X86. Thanks for coming up with it.

And yes, you were the one who brought up ISAs. Your exact quote was As you might know, Linux 2.6 is the most ported OS in

the world (in terms of CPU ISAs, not “platforms” that

NetBSD counts)

And no, Linux doesn’t run on “the most architectures”. which is, no doubt why you didn’t bother to quote my questions about various architectures that Unix ran on that Linux doesn’t.

You did miss proc, and it does contain a device naming scheme. Used, among other things, for identifying USB devices.

If you’re talking about Linus’ tree, than the number of platforms and architectures actually supported is far smaller. Many platforms aren’t supported out of the git tree.

Linux has been ported to a surprisingly small number of distintict architectures; especially given how similar “modern” architectures are to each other.
2006-04-18 4:26 am

nick
A lot of machines have PCI busses. None of the embedded systems I’m developing for do. Where is the layer that lives *above* PCI bus and provides the abstractions that it would use in common with other busses? And no, the PCI bus isn’t a good model for those machines.

Yet another strawman. I never said the PCI bus is a

good model. Linux’s PCI API is though (or is close).

By the way, trying to model non-PCI-like busses as if they were PCI-like is a very good example of what I meant by saying that people port Linux by trying to force other architectures to look like X86. Thanks for coming up with it.

What part of “PCI has nothing to do with x86” is too

much for you to comprehend?

I’m still waiting for my answer, by the way.

And no, Linux doesn’t run on “the most architectures”. which is, no doubt why you didn’t bother to quote my questions about various architectures that Unix ran on that Linux doesn’t.

They’re many different source bases and variants of

UNIX. I’m talking about a single source tree, as I

have pretty explicitly said, multiple times.

You did miss proc, and it does contain a device naming scheme. Used, among other things, for identifying USB devices.

Well that isn’t proc, it is actually another filesystem

completely (usbfs).

If you’re talking about Linus’ tree, than the number of platforms and architectures actually supported is far smaller. Many platforms aren’t supported out of the git tree.

I am. About 15-25 CPU architectures (depending on how

you count them).
2006-04-18 5:21 am

Cloudy
No, the Linux PCI model is not a good model for non-PCI-like busses. That’s why we don’t use it elsewhere in Linux for non-PCI-like busses.

It’s also not an abstraction. It’s a mechanism for dealing with exactly one kind of bus.

You have had your answer. You’ve given it yourself.

“you missed proc” was a pointer to the place where you needed to look for additional namespaces you hadn’t enumerated, including the usbfs namespace.

Yes, there are many source bases and variants of Unix. But I mentioned one specific source base: The Bell Labs RE source base, which has been ported to far more than 15-25 CPU architectures.

if you’re kind, count 32 bit version of an architecture seperately from 64 bit versons of the same architecture, and include variants of the same architecture ported at different times, perhaps Linux has been ported to 25 architectures.

If, on the other hand, you look in arch and lump the CPU architectures together reasonably, it’s closer to 10, and most of those are just like each other in being bus based, harvard architecture, load/store, memory-mapped i/o, machines.

RE Unix was ported to more different classes of architectures than that, from a single source tree that included an entire operating system.

Not that this matters, as the point isn’t how many ports of Linux there are or aren’t, but rather that the way in which it has been ported has nothing to do with it possessing any abstraction layers that make it portable.

If you think otherwise, kindly explain what the abstract bus layer is that the PCI bus is derived from so that when I next port Linux to a device with no PCI bus I’ll be able to take advantage of this abstraction.
2006-04-18 5:52 am

nick
blah blah blah

OK that’s great. I really don’t care about trying to

teach you what an abstraction layer is, or any of

those other arguments.

Let’s get back to my real question: how does Linux

force the x86 model on other architectures? (and no

blathering about PCI, please).

You’d better get that straight with yourself first

before trying to port Linux to anything.
2006-04-18 7:02 am

Cloudy
Given that I’ve already ported Linux several times, I’m going to guess that I’ve gotten that straight myself.

The PCI is something you brought up. But as it is a good demonstration of how Linux is ported by force fitting the x86 model on other architectures, I can see why you would suddenly want to stop talking about it.

How the x86 model is made the basis for other ports is easy to see. You’ve shown it twice, I’ve shown it a couple times.

The x86 implementation came first. When an opportunity to expand to another system has occured, the kernel was not refactored to take into account the simularities and differences, but rather, the new system’s drivers, vm support, and so forth, were written to use the interfaces that were drawn up first for the x86 architecture.

One of the clearest ways of seeing this is to look in the source tree and see how much of what should be common code is duplicated between various ports, rather than shared. An easy example is drivers/eisa and drivers/parisc — which contains yet another eisa driver.

Here’s the question you keep refusing to answer: If there’s a buss abstraction layer, where is it? It’s not PCI, because eisa is certainly not derived from PCI. So what is the common base that PCI and eisa both share?
2006-04-18 7:36 am

nick
Given that I’ve already ported Linux several times, I’m going to guess that I’ve gotten that straight myself.

I don’t think so. Adding support for some embedded

platform is not really “porting” Linux. Especially

under the ARM port, which makes these things really

easy.

But hey I’m a kernel developer for my day job too

(and as a hobby), big deal.

The PCI is something you brought up. But as it is a good demonstration of how Linux is ported by force fitting the x86 model on other architectures, I can see why you would suddenly want to stop talking about it.

Garrh! PCI has nothing to do with x86 for the upteenth

time.

How the x86 model is made the basis for other ports is easy to see. You’ve shown it twice, I’ve shown it a couple times.

The x86 implementation came first. When an opportunity to expand to another system has occured, the kernel was not refactored to take into account the simularities and differences, but rather, the new system’s drivers, vm support, and so forth, were written to use the interfaces that were drawn up first for the x86 architecture.

x86 was first, and in the past Linux has had some x86

centric designs. Linux 2.6 does not force the x86 model

on other architectures.

You keep saying that I have demonstrated that it DOES.

Rubbish. I have not. If it is apparently so easy, why

don’t you demonstrate how it does. Enough handwaving,

point out a specific “x86 model” that Linux forces on

other architectures. You can point to source files and

line numbers too if you like. I dare you.

One of the clearest ways of seeing this is to look in the source tree and see how much of what should be common code is duplicated between various ports, rather than shared. An easy example is drivers/eisa and drivers/parisc — which contains yet another eisa driver.

I don’t know the code, but presumably it is the parisc

implementation for the EISA layer. I don’t claim the

code is sparkling clean or perfect, but this little

example doesn’t prove anything about forcing the x86

model on other architectures.

If you want to see how much code is duplicated — there

is some, but architecture ports on Linux for roughly

equivalent support and features, are smaller than those

of NetBSD (in terms of LOC).

Finally, duplicated code in no way implies that any

x86 feature is forced upon other architectures.

Here’s the question you keep refusing to answer: If there’s a buss abstraction layer, where is it? It’s not PCI, because eisa is certainly not derived from PCI. So what is the common base that PCI and eisa both share?

Another strawman, I never said there is a bus

abstraction layer. I said there is a PCI abstraction

layer, and I also said that it is evolving into a

generic bus layer (which it is and maybe eventually

it will be).

What’s more, I never “kept refusing to answer” that

question because that’s the first time you’ve asked it.

Here is one you keep refusing to answer: why do

you claim that Linux “forces the x86 model on other

architectures”?
2006-04-18 8:05 am

Cloudy
How do you conclude that by “porting Linux” I meant “adding support for some embedded platform”?

There isn’t, by the way, “the arm port”. ARM support is fragmented along processor family lines and different processor families are in different shape and supported out of different trees.

Well, we’re making some progress. You’ve admitted that “in the past” Linux had x86-centric designs. Now all we have to do is get you to admit that in, say, power management, where the APCI approach is being taken, the design is stillx86-centric.

Since, at last count, the total size of the netbsd kernel, in lines of code, was smaller than the Linux kernel, I hope you don’t mind if I don’t believe your assertion about LOC.

We are discussing more than one thing. The comments about code duplication were clearly in a portion of the discussion related to the lack of abstraction in the linux kernel, not to the comment about the x86 model.

Actually, you did say there was a generic bus layer, or, more precisely, quoting you “There is a pci layer, which is basically evolving into a generic bus layer.” This is the third time you’ve claimed not to have said something that you’ve said. The gentlemanly way to retract an incorrect statement is not to deny having made it, especially when it’s in print, but rather to simply retract it.

You are correct that I did not specifically ask you to demonstrate the buss abstraction layer previously. I apologize for claiming you’d avoided the question.

I have answered your question. I have even given you yet another example. Now I shall dare you: admit that Linux is not the most ported operating system; that it lacks hardware abstraction layers; and that the PCI “generic bus” and the ACPI-derived power management model are examples of how Linux continues to be ported by imposing a PC like model onto other architectures.
2006-04-18 8:23 am

nick
How do you conclude that by “porting Linux” I meant “adding support for some embedded platform”?

I guessed.

There isn’t, by the way, “the arm port”. ARM support is fragmented along processor family lines and different processor families are in different shape and supported out of different trees.

Jesus, I’m talking about the main git kernel tree as

I keep telling you for the billionth time.

And there is an ARM port for that. It includes good

abstracted support for platforms under the several

ARM CPU ISAs that it supports. This might have confused

you seeing as you don’t know what an abstraction is.

Well, we’re making some progress. You’ve admitted that “in the past” Linux had x86-centric designs. Now all we have to do is get you to admit that in, say, power management, where the APCI approach is being taken, the design is stillx86-centric.

It is ACPI, by the way, but no, it is used quite

happily on ia64 systems as well. I’m not aware of

any x86 model that is forced upon those ia64 systems.

Since, at last count, the total size of the netbsd kernel, in lines of code, was smaller than the Linux kernel, I hope you don’t mind if I don’t believe your assertion about LOC.

I’m talking about the machine specific code for the

ports, in each case. Count them yourself if you don’t

believe me (Alpha is a good example of a static

platform with rough feature equivalence, although Linux

has far better SMP support).

Of course Linux is much bigger in total, it contains

far more architectures, drivers and features.

We are discussing more than one thing. The comments about code duplication were clearly in a portion of the discussion related to the lack of abstraction in the linux kernel, not to the comment about the x86 model.

But it doesn’t necessarily signal that, either. Again,

you make the claim you should provide the proof.

But just to oblige you, I’ll provide you with a

counterexample: include/asm-*/atomic.h

This is a nice abstraction of all the machine specific

details to perform various atomic operations. Much

code is duplicated, but it is a great abstraction.

Actually, you did say there was a generic bus layer, or, more precisely, quoting you “There is a pci layer, which is basically evolving into a generic bus layer.” This is the third time you’ve claimed not to have said something that you’ve said. The gentlemanly way to retract an incorrect statement is not to deny having made it, especially when it’s in print, but rather to simply retract it.

Let me help you: “evolvING”

I have answered your question. I have even given you yet another example. Now I shall dare you: admit that Linux is not the most ported operating system; that it

If initially said it was the most _ported_, that wasn’t

what I meant. It supports the most architectures from

the same source tree. The Linux source tree is the

most portable operating system source tree.

lacks hardware abstraction layers; and that the PCI “generic bus” and the ACPI-derived power management model are examples of how Linux continues to be ported by imposing a PC like model onto other architectures.

No I don’t “admit” any of that because it is all wrong.
2006-04-18 8:45 am

Cloudy
Don’t guess, you’re not very good at it.

You are talking about the main kernel git tree when you talk about how many linux ports there are. “the arm port” doesn’t exist in that tree, as I was trying to explain.

Duplication of code is by definition evidence of a lack of abstraction. Are you sure you understand abstraction?

atomic.h turns out not to be a terribly good abstraction. It give headaches on various platforms for which mimicing the interfaces it provides are expensive.

If initially said it was the most _ported_, that wasn’t what I meant. It supports the most architectures from the same source tree. The Linux source tree is the

most portable operating system source tree.

It supports no more architectures from the git tree than NetBSD does. Neither support as many architectures as Research Edition Unix did.

You become upset and claim that I assert your arguments are wrong without refuting them, and yet, you end your post with No I don’t “admit” any of that because it is all wrong. without making a single attempt to refute the acpi example.

Do you deny that the power management work in linux right now is aimed almost entirely at supporting laptops? Do you deny that even at last week’s power management summit that was true?

This is an exact, current, example of how the x86 model is pushed out to other architectures, rather the needs of various architectures being factored into a common abstraction that is implemented appopriately.
2006-04-18 9:00 am

nick
Don’t guess, you’re not very good at it.

You are talking about the main kernel git tree when you talk about how many linux ports there are. “the arm port” doesn’t exist in that tree, as I was trying to explain.

Yes it does. arch/arm/*, include/asm-arm/*. QED.

Duplication of code is by definition evidence of a lack of abstraction. Are you sure you understand abstraction?

No it doesn’t. On an abstract level, having code

duuplicated in 20 places in asm-*/atomic.h is no

different from having common code consolidated in

linux/atomic.h. Implementation in one case may be

cleaner, but abstraction is exactly the same.

atomic.h turns out not to be a terribly good abstraction. It give headaches on various platforms for which mimicing the interfaces it provides are expensive.

Umm no. There are a couple (parisc and sparc IIRC)

that need to use spinlocks. But if they wanted to

perform a similar operation in generic code without

atomic ops they’d need to use spinlocks anyway, thus

slowing down every architecture that can implement

them nicely.

Care to explain how you’d do it better?

It supports no more architectures from the git tree than NetBSD does.

It does support more.

Neither support as many architectures as Research Edition Unix did.

Which Research Edition Unix? Which tree? As far as I

know, the software to come out of bell labs supported

several PDPs, VAX, and a couple of other architectures.

Do you deny that the power management work in linux right now is aimed almost entirely at supporting laptops? Do you deny that even at last week’s power management summit that was true?

This has got nothing to do with forcing x86 features

onto other platforms. And plenty of powermanagement

work has been done in the ARM port and other embedded

platform ports, and on the powerpc port.

That laptops may be mostly x86 based does not mean

work on laptop support means forcing x86 model on

other architectures. Your logic is completely wrong.

This is an exact, current, example of how the x86 model is pushed out to other architectures, rather the needs of various architectures being factored into a common abstraction that is implemented appopriately.

No, it is not. As I said, powerpc, ARM, and embedded

folk are all interested in powersaving. They are being

pretty careful to ensure solutions fit their platforms

as well.
2006-04-18 9:30 am

Cloudy
The existence of an arm subdirectory in the git tree is not the same as being the arm port. None of the arm families are current in the mainline tree, not even ARM’s reference code.

On an abstract level, having code

duuplicated in 20 places in asm-*/atomic.h is no

different from having common code consolidated in

linux/atomic.h.

Except when you want to change the abstraction, or maintain the code, or verify that it really is all duplicated, or . . .

You don’t do spinlocks on parisc. it doesn’t have atomic operations.

Which Research Edition Unix? Which tree? As far as I

know, the software to come out of bell labs supported

several PDPs, VAX, and a couple of other architectures.

You do not know very far.

For example, you do not seem to know that Dennis Ritchie ported RE8 to the Cray X/MP.
2006-04-18 10:37 am

nick
The existence of an arm subdirectory in the git tree is not the same as being the arm port. None of the arm families are current in the mainline tree, not even ARM’s reference code.

Doesn’t matter whether they are current, whether there

are others or newer ones outside the mainline tree.

There is an ARM port in mainline, full stop. I’m

flabbergasted that you try to deny that.

Except when you want to change the abstraction, or maintain the code, or verify that it really is all duplicated, or . . .

That is an implementation detail plain and simple. It

has nothing to do with the abstraction. But I’m not

surprised you’re confused.

You don’t do spinlocks on parisc. it doesn’t have atomic operations.

Rubbish. It has the LDC instructions, which is an

atomic load and clear. It might surprise you to know

that this is how spinlocks are implemented in the

parisc port on Linux.

You do not know very far.

For example, you do not seem to know that Dennis Ritchie ported RE8 to the Cray X/MP.

I knew derivatives had been ported to many

architectures, I did not know it supported more than

Linux from a single tree. I’m yet to see your proof.
2006-04-18 8:31 am

nick
I have answered your question

No, after asking repeatedly and not getting straight

answers, you finally drag ACPI out of the mud and just

point to it and say “ACPI is the answer”.

Sorry no, that’s not an answer. I didn’t ask for you to

pick out some random likely-looking subsystem and point

to it.

I asked, specifically, what x86 hardware model is

forced upon other architectures by Linux. And I ask it

again.

Here is the structure of a typical correct answer:

“Architectures <???> are forced to follow the x86 model <???> in this manner <???>. A far better implementation

for these architectures would be <???>, as seen in

their native OS <???>”.
2006-04-18 8:55 am

Cloudy
Now you’re rewriting this thread as it happens.

I have answered your question.

You have even, in part, agreed with my answer, when you admitted that prior to 2.6, x86 features were forced onto other architectures.

I did not ‘drag ACPI out of the mud’. It is on my mind because the power management summit was last week, and it is a very fresh example of the comment I had made and you had agreed to: The x86 implementation came first. When an opportunity to expand to another system has occured, the kernel was not refactored to take into account the simularities and differences, but rather, the new system’s drivers, vm support, and so forth, were written to use the interfaces that were drawn up first for the x86 architecture.

This is how it happens. The x86 implementation of power management is coming first. The model is intended for laptops and is not suitable for handheld devices. Once the x86 model is widely deployed, handheld devices will continue to be relegated to non mainline trees, or they will have to use the x86 model.

If you do embedded development, than you know why the laptop power model is not suitable for an embedded device. If you do not, it’s not something I’m going to teach you here. In outline, though, laptop power management model consists of a simple di-graph with devices going on/off in limited groupings. You can’t imbed the power tree of a handheld device onto a di-graph.
2006-04-18 9:04 am

nick
Sorry, you’re rewriting the thread.

“pushing laptop power model on embedded devices”

is nothing close to

“pushing x86 hardware model on other architectures”

Secondly, ACPI to start with is not an x86 thing.

It is a monstrostity that Intel invented for x86

and ia64 (and other architectures if they wanted

it).

It may be found on most x86 systems out there now,

but nothing is being forced on other platforms.

My G5 is doing just fine with OF, it can control

fans, monitor temperature, throttle the CPU, standby

the system, etc etc.

So you’ll have to do far better than that.
2006-04-18 9:25 am

Cloudy
“ACPI to start with is not an x86 thing.

It is a monstrostity that Intel invented for x86[…]”

exactly how is something invented for x86 not an x86 thing?

I’m not the one ‘rewriting’ things, Nick. I haven’t said something and then denied I’ve said it, as you have. I haven’t implied someone was a crackhead and then pretended I wasn’t attacking them, as you have. I haven’t offered examples, only to demand they not be used.

I described what was being pushed onto other platforms: the di-graph model of power management. This is, by the way, over the objection of several embedded people. There was a lively discussion of it at the PM BoF last year.

You neglected to address that. In fact, you got rather abusive about that point. You’re not one of those people who gets abusive rather than concede a point, are you Nick?

This isn’t going to degenerate into one of those threads where I patiently answer your claims and you start calling me moron and troll?
2006-04-18 11:04 am

nick
“ACPI to start with is not an x86 thing.

It is a monstrostity that Intel invented for x86[…]”

exactly how is something invented for x86 not an x86 thing?

Nice selective quoting. I said it was invented for x86

and ia64. It is not an x86 specific thing.

And even if it was, it wouldn’t change the fact that my

G5 here with OF and no traces of ACPI works just fine,

and has decent PM support.

So your attempt to show that ACPI in Linux forces the

x86 model onto other architectures utterly falls on its

face.

This isn’t going to degenerate into one of those threads where I patiently answer your claims and you start calling me moron and troll?

No, because you haven’t been answering me. It has,

however, degenerated into one of those threads where

you are just blatantly making stuff up that is

completely wrong.

“there is no ARM port in Linux”, “parisc does not have

atomic instructions”, “you don’t do spinlocks on

parisc”, “ACPI forces x86 style power management on

other architectures”…
2006-04-18 6:01 am

nick
Yes, there are many source bases and variants of Unix. But I mentioned one specific source base: The Bell Labs RE source base, which has been ported to far more than 15-25 CPU architectures.

And since we are talking about building all from a single

tree, you are wrong. I explicitly said I was talking

about building from a single tree many times.

So, which branch of Bell Labs RE supports more

architectures than Linux? What is the source tree called?

Where is it / who owns it?

RE Unix was ported to more different classes of architectures than that, from a single source tree that included an entire operating system.

Why do you bring these straw men into it. I never said

it wasn’t, nor anything about “classes” of architectures.

Who are you arguing against?

You are trying to distract from my real question with

all this bluster: how does Linux “force the x86 model

on other architectures”?
2006-04-18 7:24 am

Cloudy
So, which branch of Bell Labs RE supports more

architectures than Linux? What is the source tree called?

Where is it / who owns it?

I don’t know where to begin with these questions. They are as if you had asked me “in what language is C written”?

How can you not know those answers and yet assert that Linux is more portable than other operating systems?

It is as if you had asserted that e is larger than any other irrational number and than asked me what this number ‘pi’ is.

Research Edition Unix from Bell Labs is Unix. The one, the original, the Dennis Ritchie et al written Unix. By the end of its life it had been ported to everything from small 16 bit machines through very large 64 bit multiprocessors. It has run on any imaginable endianness of machine. It has run on machines that did integer arithmetic in ones-complement rather than twos; on machines with ever imaginable endianness at the bit, byte, half word, word and dobule word level. On machines addressable in units of 1, 6, 8, 12, 16, 24, 32, and 64 bits, with address sizes ranging from 16 to 128 bits. On both harvard and von neumann memory architectures; on machines without caches; machines without priority levels; machines with vector functional units; machines without busses or interrupts. SMPS and NUMA machines.

Over the course of this exchange, you’ve demonstrated that you don’t know much about operating systems other than linux, that you’re not at all familiar with what the term hardware abstraction layer means to an OS designer, that you can’t even keep track of the points you’ve introduced, and now you’re starting to post rudeness.

I have patiently answered your questions about abstractions, portability, and how “the portability comes from a lot of people trying to force the x86 hardware model onto a lot of other architectures, not from any decent abstraction layer.” (At least when you quote me can you get what I say right?)

I have cited three examples of how this is done, in one case, an example that you originated. You did not even attempt to refute the examples, but only demand that I not use them.

So now it’s time to turn the table:

Why isn’t the PCI implementation in the linux kernel a good example of my observation above?

If there is a hardware abstraction layer in the linux kernel, where is it?

And why do you have to be rude to people who are trying to teach you?
2006-04-18 8:12 am

nick
BZZT! I’m asking about single source base. Do you know

what this means?

I quite well know what UNIX has been widely ported in

its many incarnations. I don’t know of a single source

tree that will support more architectures than Linux.

By all your blathering, it appears you don’t know of

one either.

I have patiently answered your questions about abstractions, portability, and how “the portability comes from a lot of people trying to force the x86 hardware model onto a lot of other architectures, not from any decent abstraction layer.” (At least when you quote me can you get what I say right?)

Umm yeah, heard of paraphrasing?

But no, you haven’t answered anything. You put words

into my mouth so you can conveniently “answer” or

“dispute” them. That is all.

Why isn’t the PCI implementation in the linux kernel a good example of my observation above?

Which observation? Can you please state your questions

in a coherent manner so I can answer them?

If there is a hardware abstraction layer in the linux kernel, where is it?

I gave you plenty of examples already but you close

your eyes and and say they’re not abstraction layers.

But OK, the CPU context abstraction I just illustrated

in my reply to Gundewo.

And why do you have to be rude to people who are trying to teach you?

Because they think they are teaching me.
2006-04-18 8:30 am

Cloudy
BZZT! I’m asking about single source base. Do you know what this means?

As you know I do, or you would have quoted the lines in which I named the single source base.

I quite well know what UNIX has been widely ported in its many incarnations.

But apparently, you don’t know what Research Edition Unix is, or how widely that single source base was ported.

I don’t know of a single source tree that will support more architectures than Linux.[i]

argumentum ad ignorantiam

[i]By all your blathering, it appears you don’t know of

one either.

Which part of “Research Edition 8” are you having trouble understanding?

CPU context abstraction is an example of an abstraction. It’s not an abstraction layer.

I am teaching you. Whether you learn or not depends on whether you open your mind or not. You are showing some small signs of learning, though. You have admitted that prior to 2.6 Linux did indeed consist of porting done by enforcing an x86 model on other systems.

Once you understand the power management example, you will see that the mechanism still plays out the same way.
2006-04-18 8:37 am

nick
But apparently, you don’t know what Research Edition Unix is, or how widely that single source base was ported.

Apparently you think that the single source base can

now run on every architecture which everyone ported it

to.

To use your own example “these are all external trees,

with their own platform support, outside the main

source tree”.

CPU context abstraction is an example of an abstraction. It’s not an abstraction layer.

Whatever, moron.

Once you understand the power management example, you will see that the mechanism still plays out the same way.

Power management example. Nice one. That’s the

worst example I’ve ever seen.

I overestimated you, I at least thought you would have

brought out the (invalid, but slightly more convincing)

“but it uses page tables” argument.
2006-04-18 9:02 am

Cloudy
Ok, one last try, The single source tree of research edition Unix, as it existed at Bell Labs, and was maintained by Dennis Ritchie’s research group, would build Unix that would run on all of the architectures I mentioned.

I’m not talking about a bunch of different companies getting copies of System V and porting it to a bunch of different machines.

I’m talking about one source tree, maintained by one small group of people at AT&T. From which one could compile for everything from a pdp-11 through a Cray supercomputer.

That was the most portable operating system ever and the most widely ported, by any measure.

At the start of this discussion it would have surprised me that you didn’t know that, but by the time you’ve gotten to “whatever, moron”, it no longer even surprises me that you can’t remember what you have or haven’t claimed in the discussion.
2006-04-18 9:07 am

nick
Ok, one last try, The single source tree of research edition Unix, as it existed at Bell Labs, and was maintained by Dennis Ritchie’s research group, would build Unix that would run on all of the architectures I mentioned.

OK, indeed I didn’t know that. Can you point me to some

proof? Say a list of architectures that it ran on?

At the start of this discussion it would have surprised me that you didn’t know that, but by the time you’ve gotten to “whatever, moron”, it no longer even surprises me that you can’t remember what you have or haven’t claimed in the discussion.

Don’t worry, I reserve that phrase for people who

are perfectly happy to redefine words and play

semantics games whenever they would otherwise lose

an argument
2006-04-18 6:27 am

Gunderwo
“Well it has many abstraction layers so you’re just

plain wrong. I don’t know what you think an abstraction

layer is though… maybe you should put down the crack

pipe before posting again.”

Just because he didn’t explain his statement as clearly as you may have liked is no reason for a personal attack of this nature. I’ve read all of Cloudy’s posts and it seems he knows about what he’s speaking. It’s comments like this that cause people to stop posting and I commend Cloudy for ignoring your attack and thoughtfully answering your questions and comments. By the way I found the majority of your post to be well thought out and helped to further the discussion nicely until your final comment.

So in the future it would be appreciated if you could keep your comments on topic and not resort to attacking one of the few people here who seems to know what he’s talking about. I have no problem with you disagreeing with his assertions, but please try and adress your comments in a more respectful manner.

Greg
2006-04-18 7:48 am

nick
So in the future it would be appreciated if you could keep your comments on topic and not resort to attacking one of the few people here who seems to know what he’s talking about. I have no problem with you disagreeing with his assertions, but please try and adress your comments in a more respectful manner.

I happen to be not entirely clueless either. So I

started by politely and clearly asking him to back

up his claim that:

“portability comes from a lot of people trying to force

the x86 hardware model onto a lot of other architectures”

And as you can see, I’ve asked him about 10 times now

and he cannot answer. I wouldn’t mind this so much if

he’d just admit he was wrong and withdraw that offensive

and baseless statement, but no he keeps trying to

attack me and make up straw men to distract from the

main point.

Asking someone to put down the crack pipe isn’t a

personal attack. It is just a phrase used to say that

you think the poster is completely wrong.

And he is, I can provide any number counter-examples of

abstraction layers in Linux as you’d like (problem with

Cloudy is that he just denies they are abstraction

layers and thinks he’s won the argument!!).

Here’s one: the architecture specific CPU context is

abstracted away from generic code (like the scheduler)

with architecture-private structures attached to

threads, and with the “switch_to()” function to switch

between contexts. See around line 1652 in kernel/sched.c

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.gi…

to see how the generic scheduler performs a full CPU

context switch with about a dozen lines of code. And

nothing x86 specific about that at all.

If you don’t think that’s an abstraction… you need to

put down the crack pipe
2006-04-18 8:20 am

Gunderwo
I never meant to imply that you didn’t know what you are speaking about. Both of you certainly know more than the majority of posters here, myself included. I most certainly appreciate the time you have both taken to try and argue your respective points.

I think the main reason you cannot find some common ground is that you are speaking of different things when you speak of x86 specific. Cloudy seems to have taken the view of x86 of more of a platform on which desktop computers have grown from. While your definition is somewhat more correct in addressing processor specific details. Of course I may be way off too.

My original comment was in regards to your second (maybe 3rd post). At that time you had asked the question only twice and as I said, I think the disatisfaction you feel towards cloudy’s answers have more to do with an ambiguos definition of “x86 specific” Cloudy’s part.

Nonetheless no matter whether saying “Put down the crack Pipe” is a commonly used saying or what, it implies that he’s a crack head and I find it offensive. There’s no reason that we can’t agree to disagree and try and civily address the argument at hand.

While I certainly respect your opinions and am of the opinion that you are a very intelligent individual. I personally don’t have the necessary knowledge to even comment on the correctness of either of your assertions, I’m here to learn. But I do think you should attempt to be less rude and certainly refrain from anything that may personally offend someone who also has some excellent opinions, be they valid or not.

Greg
2006-04-18 8:33 am

Cloudy
I think the main reason you cannot find some common ground is that you are speaking of different things when you speak of x86 specific. Cloudy seems to have taken the view of x86 of more of a platform on which desktop computers have grown from.

That is correct. I’ve been speaking of abstraction levels and models, whereas “Nick” seems to be caught up in implementation details.

I should have recognized this sooner.
2006-04-18 8:40 am

nick
I haven’t mentioned any implementation details anywhere.

And you have not mentioned any x86 “abstraction level

or model” that Linux forces upon other architectures.

Except for your singularly idiotic ACPI example.

I should have recognized sooner that you are a troll

and have no idea what you’re really blathering about.
2006-04-20 4:02 pm

popper
it seems to have gone off the boil as it were, but i

would have loved to see what carl (AmigaOS) and Dan (qnxRTP) make of all the original thread….
2006-04-18 8:22 am

Cloudy
I happen to be not entirely clueless either.

Not entirely, no.

And as you can see, I’ve asked him about 10 times now and he cannot answer.[i]

I have answered in several different ways. You have dismissed answers without refuting them. You have asserted that you did not make comments you had in fact made. You have even come, in part, to agree with my answer, at least with respect to Linux before 2.6

[i]I wouldn’t mind this so much if he’d just admit he was wrong and withdraw that offensive and baseless statement,

Why do you find a description of a process to be “offensive”?

The statement is not baseless. It is a description of how Linux has been ported to new systems over its history.

but no he keeps trying to attack me and make up straw men to distract from the main point.

I have not attacked you although I have vigourously questioned your claims. To disagree with someone and state way is not to attack them.

Asking someone to put down the crack pipe isn’t a

personal attack. It is just a phrase used to say that

you think the poster is completely wrong.

It is a personal attack. It says more than “you are wrong”. It makes a comment about the poster’s state of mind, and how that state was achieved. It personalizes the discussion and takes it out of the realm of ideas and into the realm of individuals.

And he is, I can provide any number counter-examples of abstraction layers in Linux as you’d like

I’m still waiting. Nor have I simply denied that the examples you have given are abstraction layers. I have, in fact explained why they are not. The PCI code does not abstract the idea of a bus, it implements an interface to a PCI bus.

You are correct that the architecture specific CPU context is an example of an abstraction. It is the first you’ve provided. It is not, however, an example of an abstraction layer, which, partially, explains why sched.c is full of code to deal with things like MP processor affinity.

2006-04-17 1:27 am

jonsmirl
The DRM drivers are not an abstraction layer. The interface from each driver is different depending on the chip family. So I believe DRM/mesa does qualify as an exokernel design.

The layers look like this. libGL, common API for all to use. DRI, user space code that is specific to the chip family being supported. DRI and libGL are usually linked into the same shared object since most people only have one video card. This does not have to be done and it wasn’t done a few years ago.

Finally there is DRM which is the kernel space driver. There are about 30 common DRM ioctls which control standard things like file handles and enabling interrupts. But you can’t draw anything with the common ioctls. Next there are from 10 to 50 chip specific ioctls which control the meat of the hardware.

libGL (mesa) implements all drawing functions in software. The DRI libraries replace libGL functions with hardware accelerated ones if the hardware is capable. If a function can’t be accelerated it falls back to the software implementation since it hasn’t been replaced.

The ATI radeon family is a special case. There is one radeon DRM module since the core of all the radeon chips is very similar. But there are three DRI implementations R100, R200 and R300. The DRM driver sets up the DMA queue, but the commands that user space writes to the queues is different in each DRI driver.

Security is a problem. The DRM drivers need to scan the DMA commands to make sure no one is using the DMA hardware to peek at memory not owned by the calling process. Remember that DMA works on physical memory pages and can scan any of them. If you can get to arbitrary memory you can always gain root priv. To address this problem AMD is doing IOMMUs. Hypervisors hit this problem too.

2006-04-16 10:56 pm

mOOzilla
Thats a good one, everythings a file. Thats VERY abstract.

2006-04-16 11:12 pm

Cloudy
“Everything’s a file” is more Pike than Ritchie. Ritchie knew when to stop

Ritchie’s big contribution to modularity was pipelines. There’s a need to get back to that sort of decoupling.

Here’s a minimum set of abstractions:

byte-sequence (memory abstraction)

byte-source/sink (i/o abstraction)

thread (control flow abstraction)

time (because otherwise nothing happens)

uid (user abstraction)

permission (security abstraction, a mapping of booleans on the 4-plex {uid, time, object, operations})

process (a collection consisting of a uid, one or more threads, and one or more permissions)

All OS-es must provide at least this set of abstractions and one or more name spaces for manipulating them.

Ritchie’s brilliance included designing a compact set of orthogonal abstractions and a very small number of name spaces.

You’ll find that most of the mistakes in Unix consist of poor choices of or inappropriate exposure of name spaces. The most obivous of which is the exposure of PID.

The art of OS design is filling in the collection of operations allowed on the abstractions.

2006-04-17 1:05 am

Unbeliever
You never told us why *you* like them. You told us what they are (in very… general terms, to say the least), and you told us what they do, but none of the things you said support your reasons for liking microkernels.

Either that, or there’s something to this poorly-researched article that I’m not quite grasping.

Perhaps.
2006-04-17 1:17 am

TADavis
I’m shopping around for a name besides “kernel” for my operating system’s lower memory module. I don’t use priviledge features of the CPU, so “Kerenl” leads to confusion. The boot loader which does a BIOS call can only access the bottom meg of RAM initially, so my “Kernel” has a size limit. It must accomplish one thing–enable full memory and load the rest. It needs support for disk access and knowledge of file systems. I don’t want redundency so this low memory “kernel” has the real file access routines. Soon, it’ll need USB support for flash booting.

This is an operating system for home hobbiests who prefer total access to their machine over security. It’s programming heaven when you’re free to turn-off interrupts or whatever as you see fit. Nothing wrong with it if you know what you’re doing.

Anyway, there are only two binary modules–the “kernel” and the compiler. Everything else is compiled each time you boot–still boots in like 3 seconds. The compiler is blazing fast. I thought some people might be confused because the “kernel” binary is so small, yet there is so much source code. “Microkernel” might be how someone described it but it’s not very accurate.

http://www.justrighteous.org

I recently changed the name of a special task which is the parent of all other tasks from “ROOT” to “ADAM” because there was confusion. “ROOT” scared people, I heard, which is pretty funny since everything is root!!

2006-04-17 1:27 am

siride
Somebody already made an OS like that…it’s called DOS.

2006-04-17 1:33 am

TADavis
My operating system is for programmers who know what they’re doing and don’t want to be annoyed by road blocks to accessing their own machine. When will people see that a home operating system doesn’t have to be the same as a server or multiuser operating system. It’s criminal how many man hours have been wasted fiddling with file permissions and stuff when there’s no need.

2006-04-17 1:50 am

Cloudy
I had this very debate with RMS about 20 years ago. It’s interesting how it keeps resurfacing.

The freedom to do whatever you want to your system is very appealing.

But there are compelling reasons to have programs broken up into separate bits and use the virtual memory hardware to protect the bits from each other: we *all* make mistakes.

Believe me, the time wasted tracking down stray pointer references and other similar bugs that cause subsystem A to corrupt subsystem B is far more criminal than the time wasted fiddling protection states.
2006-04-17 4:08 am

siride
I’m sure the people who suffer from malware and spyware would love to hear how having even less security and protection would solve their woes.

2006-04-17 1:31 am

Cloudy
After a *very* quick look at your web site, I’d say you’ve got a boot loader on your hands and that you’re implementing the Nth in an infinite series of single-addres-space single-protection-domain multi-tasking DOS replacements.

But all that matters is that you’re having fun with it. I encourage you to keep at it.

2006-04-17 2:36 am

ma_d
Thom makes a great point about Microkernels, the part that is absolutely important to the security of your system (a processor and memory) is very small and thus can be bug free (or amazingly close to it).

A kernel like Linux has millions of lines of code which can all do anything they like.

So here-in lies the crux: Are drivers which can’t harm your system better than ones that can?

Well, the question that comes to mind is: Does your application need to driver to be as stable as the OS?

I’m willing to bet that the majority of your devices you consider their working to be crucial. Of course, you need your network card working, and disk drives, and your video card (well, maybe it can glitch but all out failures would be a nightmare to productivity on a desktop).

So, the system can stay up and not damage things when your video driver does something stupid. Great. Can you afford that occurance? No. So what’s the point? You still have to write the same quality of video driver?

The largest benefactor I see in this situation is actually developers: They have a nice testing platform that slaps them quick when they make a mistake.

My design hat loves the idea of a microkernel. It’s such a nice thought to look at. But my design hat also tells me it just might be a giant waste of time, computer and human alike.

It’s probable that a mix is the best solution. I’m no kernel expert, but it seems to be that things like drivers just have to be as close to perfect as possible, but things like packet routing might benefit from protection.

2006-04-17 5:59 am

nick
How can microkernel drivers “not harm you system”?!?

A buggy disk driver can hose your disks. A buggy

filesystem driver can corrupt your filesystem. A

buggy memory manager can corrupt memory. A buggy

network driver can corrupt network traffic.

Aside from corruption, their failure obviously will

result in the denial of usually critical system

resources like memory, filesystem, or IO device access.

So I don’t see how microkernel proponents can be

lulled into this false sense of security.

2006-04-17 8:59 am

AndyZ
Corrupting a filesystem is one point of failure, but a failed “server” (as in driver) can be restarted. If its not possible for the microkernel to restart ie the network or filesystem driver again, then how could they be started in the first place(at boottime)?

AndyZ

2006-04-17 10:11 am

nick
I didn’t say it wasn’t possible. Neither would it be

impossible to do a similar reinitialisation of some

subsystem or driver in a monolithic kernel.

I don’t try to claim there are no advantages of a

microkernel — obviouly there are some otherwise even

their most stubborn supporters would give up on the

idea in the face of all their disadvantages.

But this (immunity of the system / data from bugs) is

not one of them.

2006-04-17 12:28 pm

Brendan
Consider a dodgy driver or service that occasionally writes to random addresses.

In a traditional monolithic system, the driver/service would be implemented as part of the kernel and can trash anything that’s running on the computer – nothing will stop if from continuing to trash things, and nothing will help to detect which driver or service is faulty.

On a basic micro-kernel the driver/service can’t effect anything else in the system, and sooner or later it’d generate a page fault and be terminated. This makes it much easier to find which driver or piece of software was faulty, and means that damage is limited.

In this case, you’re still partially screwed because everything that was relying on that driver or service will have problems when that driver/service is terminated. This isn’t always a problem though (it depends on what died) – for example, if the driver for the sound card dies then no-one will care much. If the video driver dies then the local user might get annoyed, but you could still login via. network and things like databases and web servers won’t be effected.

The more advanced a micro-kernel is the more systems it will have in place to handle failures.

For example, if the video driver dies the OS might tell the GUI about it, try to download/install an updated driver, then restart the video driver and eventually tell the GUI that the video is back up and running. The user might lose video for 3 seconds or something but they can still keep working afterwards (and there’d hopefully be an explanation in the system logs for the system administrators to worry about).

Another way would be to use “redundancy”. For example, have one swap partition on “/dev/hda3” and another on “/dev/hdc3” with 2 seperate disk drivers. Writes go to both disk drivers, but reads come from the least loaded disk driver. In this case the system would be able to handle the failure of one swap partition or disk driver (but not both). With fast enough networking, maybe keeping a redundant copy of swap space on another computer is an option..

The point is that for monolithic kernels you don’t have these options – if anything in kernel space dies you have to assume that everything in kernel space has become unreliable, and rebooting is the only reliable option (if the code to do a kernel panic and reboot hasn’t been trashed too).

Most developers of monolithc systems will say that it’s easier to make their drivers and services bug free than it is to implement systems to recover from failures. They may be right, but it might be “wishful thinking” too…

2006-04-17 12:57 pm

nick
What if the soundcard driver gets corrupted and starts

DMA to a random page of memory that was actually some

filesystem’s pagecache[*]?

What if a driver goes haywire and starts sending the

wrong IPC messages down the pipe?

Another way would be to use “redundancy”. For example, have one swap partition on “/dev/hda3” and another on “/dev/hdc3” with 2 seperate disk drivers. Writes go to both disk drivers, but reads come from the least loaded disk driver. In this case the system would be able to handle the failure of one swap partition or disk driver (but not both). With fast enough networking, maybe keeping a redundant copy of swap space on another computer is an option..

I don’t think so. You have to have at least 3 devices

and 3 different drivers and perform checksumming across

all data that comes out of them if you really want to

be able to discard invalid results from a single

driver. Or you could possibly store checksums on disk,

but if you don’t trust a single driver…

I think in general it would be far better to go with

RAID, or a redundant cluster wouldn’t it?

The point is that for monolithic kernels you don’t have these options – if anything in kernel space dies you have to assume that everything in kernel space has become unreliable, and rebooting is the only reliable option (if the code to do a kernel panic and reboot hasn’t been trashed too).

A microkernel can fail too, end of story. If you need

really high availability, you need failover clusters.

And within a single machine, I happen to think

hypervisor/exokernel + many monolithic kernels is a

much nicer solution than a microkernel.

[*] Perhaps you might have DMA services in the kernel

and verify all DMA requests are going to/from

driver-local pages, yet more overhead… does any

microkernel do this?

2006-04-17 1:39 pm

Brendan
Hi,

What if the soundcard driver gets corrupted and starts

DMA to a random page of memory that was actually some

filesystem’s pagecache[*]?

Then you’re screwed regardless of what you do. PCI bus mastering is the only thing a micro-kernel can’t protect against (I’ve never found anything that can protected against it at least, but AMD’s virtualization hardware might help – I haven’t looked into it so I’m not sure). For the ISA DMA controllers it’s easy to force drivers to use a kernel API where decent checking can be done (if you haven’t guessed, I’m more for slightly largish micro-kernels than for minimalistic nano-kernels).

What if a driver goes haywire and starts sending the wrong IPC messages down the pipe?

It’s standard programming practive (or at least it should be) to always check input parameters before doing anything, especially if these input parameters come from elsewhere (e.g. function parameters, command line arguments, message contents, environment variables, etc). All “message receivers” should also be able to check who sent the message. If the message still passes all of this, then the receiver might do something that isn’t desired, but it’s very unlikely this would lead to major problems.

I don’t think so. You have to have at least 3 devices

and 3 different drivers and perform checksumming across

all data that comes out of them if you really want to

be able to discard invalid results from a single

driver. Or you could possibly store checksums on disk,

but if you don’t trust a single driver…

You are right – 2 redundant drivers/services can recover from detectable failures, but 3 are required to detect some types of failure. For a failure like completely crashing (page fault, general protection fault, etc) 2 drivers/services are enough, but for checksumming you need at least 3.

I think in general it would be far better to go with

RAID, or a redundant cluster wouldn’t it?

Regardless of how good it is, hardware RAID has at least 2 single points of failure (the device driver and the controller). Having entire redundant computers (or a redundant cluster) is an option for all types of kernels (but it’s not so cheap).

A microkernel can fail too, end of story. If you need

really high availability, you need failover clusters.

Of course – but it’s easier to find/fix bugs in something small, that isn’t cluttered full of every device driver imaginable.

And within a single machine, I happen to think

hypervisor/exokernel + many monolithic kernels is a

much nicer solution than a microkernel.

You mean like running 8 versions of Vista on the same machine so that you can edit text files without worrying about messing up your web server? Hardware manufacturers would love the idea (just think of the extra sales)!

2006-04-17 8:25 pm

Cloudy
The point is that for monolithic kernels you don’t have these options – if anything in kernel space dies you have to assume that everything in kernel space has become unreliable, and rebooting is the only reliable option (if the code to do a kernel panic and reboot hasn’t been trashed too).

This is true in most implementations, but it is a feature of the implementation rather than a necessity of the system. It is, given reasonable VM design, possible to make the user/supervisor transition distinct from the addressability distinction.

You can have a ‘monolithic’ kernel in the user/supervisor sense — meaning that the whole thing is compiled as a unit and all runs in supervisor mode — without having to have one in the memory addressability sense — meaning that various subsystems can only access what they’re allowed access to.

2006-04-18 8:43 am

Brendan
This is true in most implementations, but it is a feature of the implementation rather than a necessity of the system. It is, given reasonable VM design, possible to make the user/supervisor transition distinct from the addressability distinction.

Unfortunately, all VM implementations are restricted by what the hardware provides. For (32 bit) 80×86 this means paging and segmentation. Therefore, to seperate access from addressability you’d either need to modify the permission bits in the paging structures during each transition (very time consuming) or use segmentation.

While segmentation could help, it isn’t a complete solution – it can provide a distinction between 4 privilege levels, but code at higher privilege levels can trash anything at lower privilege levels (e.g. drivers could trash user space and each other, but not the kernel itself). Of course for portability (even to 64 bit 80×86) you can’t rely on segmentation anyway.

I guess it would be possible to design hardware to overcome this problem, but IMHO it’d make more sense to make context switching faster. For e.g. have “CR3 tagged” TLB entries so that address space switches aren’t so expensive, which would benefit all kernels to varying degrees and could be added to 80×86 without requiring changes to any existing software.

2006-04-18 4:32 am

ma_d
I did define system as memory+cpu didn’t I?

2006-04-17 1:56 pm

axilmar
I think that the concept of ‘kernel’ is a mistake and it exists only because CPUs are not sophisticated enough to provide proper isolation between software components. The discussion between monolithic kernels/modular kernels/microkernels would not take place if CPUs allowed us to provide only the necessary connections between components.

The reason microkernels exist is because it makes it easier to construct an O/S from components that are physically isolated and can not affect each other. But the physical barriers between processes make microkernels not as fast as ‘monolithic’ kernels.

I think the solution lies in between: CPUs should provide ways of linking components together without allowing undesirable interactions. For example, a graphics driver component should only be able to access the video card and the application’s memory and not all the hardware and memory; a hard disk I/O component should only be able to access the hardware I/O subsystem and app’s memory buffers and not anything else etc.

I think the above could be achieved with a solution called ‘memory map’. Just like a process has a page map, a software component should also have a memory map, i.e. a range of addresses that it is allowed to access. Inter-component communication shall be possible by mapping one part of a component to another component’s memory map. By calling a routine of another component, the current memory map would be replaced with the memory map of the current component; upon return, the previous memory map would be restored. The advantage of this solution would be tremendous flexibility in componentization of an O/S:

1) no performance costs like in a microkernel.

2) no safety/security problems like in a non-microkernel.

Memory maps would make traditional kernels reduntant: all that would be needed is a small program that would coordinate memory maps. The rest of the functionality would be placed in components, the memory maps of which are built by the small program mentioned above.

2006-04-17 2:16 pm

Brendan
I think the above could be achieved with a solution called ‘memory map’. Just like a process has a page map, a software component should also have a memory map, i.e. a range of addresses that it is allowed to access.

Accessing RAM is slow, but accessing several levels of the paging structures and severel levels of “memory map” structures and the final physical RAM location (every time the CPU accesses RAM) would be unbearable. That’s why CPUs cache the paging structures (TLB), and that’s why they’d need to cache the “memory map structures”. If you’re going to change the “memory map” structures every time the CPU changes between software components then you’ll lose the benefits of this caching and end up with “memory map structure cache” misses.

You might aswell just change the paging structures instead (which is what micro-kernels do)…

2006-04-17 8:29 pm

Cloudy
The trick is to use systems on which the transition between protection domains is assisted by the VM hardware rather than hindered. The PA-RISC VM model, for example, makes transitions between memory protection domains very fast, nearly as fast as procedure calls that don’t transition.

This is what hardware designers and OS designers keep missing. The VM system should make accessability and addressability of memory distinct so as implementation of protection domains in this way is cheap.

We at HP tried to convince Intel to do this in Itanium, but I didn’t stick around long enough to see if it made it.

2006-04-17 3:07 pm

abraxas
Why was there no mention of the biggest problems microkernels face, namely message passing synchronicity? The inherent complexity of a microkernel still outweighs the benefits of one, for most systems. This is mostly due to issues concerning message passing and the manner in which messages are given priority.

2006-04-17 8:20 pm

Cloudy
You can easily solve the synchronicity problem by using queues and using call backs.

The major problem that message passing has isn’t synchronicity, it’s overhead. Extra context switches, and the cost of marshalling are significant, even in highly optimized message passing systems.

2006-04-19 3:36 pm

abraxas
I think you’re missing my point. The overhead of a microkernel is caused mostly by mechanisms that control synchronicity. It’s not necessarily that overhead is the specific problem, it is that it seems the only way to solve the synchronicity problem is to introduce a lot of context switching which means a lot of overhead. Overhead is never a problem in itself, it is just a symptom of a problem.

2006-04-18 5:49 am

proforma
Because Linux does not have a microkernel a lot of people are attacking the idea. Because of course if Linux doesn’t have it, it must not be good to begin with.

Why not be better than that?

This is the reason I really don’t care about this website/forum so much. It’s like the loser slashdot crowd bored to death so they have to come here to put everything else in the universe down but Linux.

Honestly, I don’t care much for linux, but I care even less for their hippy, everything should be free fanboys.
2006-04-18 5:44 pm

j-s-h
Thom Thom Thom, how many microkernels did you program today?