After ten weeks of development, Linus Torvalds has announced the release of Linux kernel 3.9. The latest version of the kernel now has a device mapper target which allows a user to set up an SSD as a cache for hard disks to boost disk performance under load. There’s also kernel support for multiple processes waiting for requests on the same port, a feature which will allow server work to be distributed better across multiple CPU cores. KVM virtualisation is now available on ARM processors, and RAID 5 and 6 support has been added to Btrfs’s existing RAID 0 and 1 handling. Linux 3.9 also has a number of new and improved drivers, which means the kernel now supports the graphics cores in AMD’s next generation of APUs and also works with the high-speed 802.11ac Wi-Fi chips which will likely appear in Intel’s next mobile platform. Read more about new features in What’s new in Linux 3.9.
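For readers wondering what "multiple processes waiting for requests on the same port" looks like in practice, here is a minimal sketch. It assumes the feature is the SO_REUSEPORT socket option (the summary above does not name it) and a Linux 3.9 or later kernel. The helper names `make_listener` and `demo` are made up for illustration, and both listeners live in one process only to keep the example short; in real use each worker process would open its own socket.

```python
import socket


def make_listener(port: int) -> socket.socket:
    """Return a TCP listener on 127.0.0.1 that shares its port via SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: SO_REUSEPORT is the option behind the 3.9 feature.
    # It must be set on every socket *before* bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(16)
    return sock


def demo() -> int:
    """Bind two listeners to one port and return that port number."""
    first = make_listener(0)        # port 0: let the kernel pick a free port
    port = first.getsockname()[1]
    second = make_listener(port)    # would fail with EADDRINUSE without SO_REUSEPORT
    same = second.getsockname()[1] == port
    first.close()
    second.close()
    if not same:
        raise RuntimeError("listeners ended up on different ports")
    return port
```

With both sockets listening, the kernel decides which one receives each incoming connection, which is how the work ends up spread across processes (and so across cores).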

Hmmm, wondering if everything, cut down into services micro-kernel style, would run better and/or more securely.
Andrew, any input on the subject ?
Kochise
What do you mean by “better”?
Micro-kernel is mostly a run-time distinction; you can write a kernel with well defined modules that have defined interfaces for communication, but if they run in the same address space they gain efficiency and will be less secure, while if the modules have their own address spaces and use message passing they gain security but lose efficiency. Linux decided a long time ago to share a kernel address space.
I know that :
http://en.wikipedia.org/wiki/Tanenbaum–Torvalds_debate
The result is here, compare with the EU funded Minix3 OS :
http://www.minix3.org/
Now which one is the most usable ? Which is the most portable ? FPU support was added to Minix3 through a Google Summer of Code; it has no extension support (MMX, SSE, …) and does not run on ARM.
Linux, on the other hand…
Kochise
Edited 2013-04-29 18:27 UTC
Hi,
You’re comparing something designed for students to learn from to a commercial OS backed by large companies (RedHat, Novell, Intel, IBM, Oracle, Google, etc)? Are you the sort of person that does a victory lap after they’ve stepped on a small bug?
– Brendan
The “student project” got 2 million euros from the EU for what ? To run what kind of stuff with “more security” ? Students’ pet projects ? Nope, I guess ATM and more commercial things. I bet the EU expects a return on investment some day. Just saying…
Kochise
Right on… that investment in basic research would have been better spent, by the EU, on funding a crack team of top scientists who can help find the point of your post.
Having 2 million euros or more invested, some people working on it full time, supposed benefits from the ukernel architecture, but still so far behind (FPU support only just added, no ports despite being less tied to x86), I don’t really get it. Minix started before Linux, was commercially supported (through the book purchase) and was not just a pet project from a student, but from a professor with more insight on the subject, considering how many books Andrew wrote.
Comparing Minix 3.2.1 to Linux 3.9… nothing to feel too concerned about though
Kochise
Linux was lucky to have a few companies putting money into it as a way to get a free UNIX-compatible OS, without having to pay lots of money for a commercial implementation.
Minix at the time was just a research OS, and suffered from the micro-kernel stigma thanks to the Mach failure. Many in the industry thought it was not worthwhile as an OS architecture.
Fast forward to 2013 and you have Symbian and QNX as successful micro-kernel OSes, Darwin and Windows using hybrid kernels, OSes running on top of micro-hypervisors, and the public in general having more awareness of high availability and security issues.
That is the main reason why Minix got the EU funding and now some companies are looking into it.
Nope, Minix was closed-licensed when it was the right time to unleash the beast. Linux was free and “copyleft”. I am, deep inside me, for the ukernel trend, but Andrew was unaware of the future where x86 rules them all. Linux was on the right track from the beginning :
1- Open source
2- copyleft
3- x86
4- its original coder spending time maintaining and evolving it
Minix was just the opposite, Andrew admitting it was just a professor project, yet with higher goals (portability, stability, security through the use of the ukernel architecture).
Sadly, the ACK toolchain is a mess to use, Minix is hard to maintain and doesn’t run well on most VMs, so very few people get invested in it, and that saddens me. Inherently, Minix IS superior, but it stalled for so long.
Nowadays Linux is everywhere Minix was supposed to be. Was Minix ever ported to the “64M SPARCstation-5” after all ?
Kochise
Might be the case then. I don’t remember.
The ACK toolchain is long dead. Minix now uses the NetBSD userland.
The damage is done; many people moved away from Minix because ACK was so unfriendly and not standards-compliant. Sure, Minix is now more BSDish, but would that make NetBSD people come to Minix, when NetBSD is almost Linux’s equal (portability, hardware support, etc…) ?
Kochise
Hard to tell.
I play with it every now and then, since the 3.0 release.
But on the other hand I am OS agnostic. I have been coding since 1986, have already used lots of disparate systems, and each system has its pluses and minuses.
My religious OS days are long gone.
This!
No, the EU invested in Minix because it is a basic research project. Most of that money goes to finance infrastructure and grad student/researcher salaries in the institutions involved.
Also if by “mach stigma” you mean it being one of the most influential software research projects in the last 3 decades, then yeah.
Hi,
Let’s take a more accurate view of it. Linux was first released in 1991, when people were looking for a free alternative to commercial Unix. Due to very fortunate timing (and no technical reason whatsoever), a large number of people (including large companies) volunteered a massive amount of time and were able to convert the original dodgy/crappy kernel into something that was actually usable/stable.
Minix 1, 1.5 and 2 were intended as a tool for teaching and were never meant to be used as a serious OS. Minix 3 is the first version that was intended as something more (but still leans towards teaching and research rather than actual use). It was released in 2005 (about 14 years after the first release of Linux) at a time when at least 3 good free Unix clones already existed, and therefore didn’t attract a large number of volunteers to make it good.
The only thing we can really say from this comparison is that very fortunate timing is far more important than anything else. It doesn’t say anything about monolithic vs. micro-kernel. If Minix 3 was released in 1991 and Linux was released in 2005, then I doubt anyone would know what Linux was.
– Brendan
Edited 2013-04-30 11:02 UTC
Brendan,
“The only thing we can really say from this comparison is that very fortunate timing is far more important than anything else. It doesn’t say anything about monolithic vs. micro-kernel. If Minix 3 was released in 1991 and Linux was released in 2005, then I doubt anyone would know what Linux was.”
Very true, timing was everything. The same is even true of the commercial players as well. In early computing history, there were many competitors. Over time they consolidate and fall away to the point where we only have a few options. For better or worse, it would take insane loads of money to budge the current market leaders and get consumers to discard their collective investments in incumbent technologies.
Userspace/Kernel context switches used to be much more expensive, so that may have been a historical factor in monolithic kernels pulling ahead. As CPUs have evolved, this should have eliminated the original monolithic kernel motivation, but it’s stuck around because alternatives have been marginalized in the market.
http://kerneltrap.org/node/531
(Anyone having more recent benchmarks?)
It’s funny that whenever I’ve talked about being able to write operating systems to less-technical people, many automatically equate that to being filthy rich and they don’t realize how many of us there are who struggle to find any work on OS tech. We would be just as good, but we’re too late.
Just what I’ve said in another post :
http://www.osnews.com/thread?560129
So true, my case.
Kochise
Hi,
It’s not the context switches between user space and kernel that hurt micro-kernels; it’s context switches between processes (e.g. drivers, etc).
But it’s not really the context switches between processes that hurt micro-kernels; it’s the way that synchronous IPC requires so many of these context switches. E.g. sender blocks (causing task switch to receiver) then receiver replies (causing task switch back).
But it’s not really the IPC that hurts micro-kernels; it’s APIs that are designed to require “synchronous behaviour”. If the APIs were different you could use asynchronous messaging (e.g. where a message gets put onto the receiver’s queue without requiring any task switching, and task switches don’t occur as frequently).
But it’s not really the APIs that are the problem (it’s easy to implement completely different APIs); it’s existing software (applications, etc) that are designed to expect the “synchronous behaviour” from things like the standard C library functions.
To fix that problem; you’d have to design libraries, APIs, etc to suit; and redesign/rewrite all applications to use those new libraries, APIs, etc.
Of course this is a lot of work – it’s no surprise that a lot of micro-kernels (Minix, L4, Hurd) failed to try. The end result is benchmarks that say applications that use APIs/libraries designed for monolithic kernels perform better when run on the monolithic kernels (and perform worse on “micro-kernel trying to pretend to be monolithic”).
– Brendan
Edited 2013-04-30 17:42 UTC
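The task-switch arithmetic in the comment above can be put into a toy model. This is only a back-of-the-envelope sketch, not a benchmark: the function names are made up, and it assumes the simplest possible accounting, where a synchronous request costs one switch to the receiver and one switch back, while asynchronous requests are queued for free and the receiver only runs once per batch.

```python
def sync_switches(num_requests: int) -> int:
    """Synchronous IPC: each request blocks the sender (switch to the
    receiver) and each reply switches back, so two switches per request."""
    return 2 * num_requests


def async_switches(num_requests: int, batch_size: int) -> int:
    """Asynchronous messaging: requests are queued without switching; the
    receiver is scheduled once per batch and the sender once to collect
    the replies, so two switches per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = -(-num_requests // batch_size)  # ceiling division
    return 2 * batches
```

Under these assumptions, 1000 requests cost `sync_switches(1000) == 2000` task switches, while `async_switches(1000, 50) == 40`. With a batch size of 1 the two models give the same number, which matches the point that the cost comes from the synchronous pattern rather than from messaging as such.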
Brendan,
“It’s not the context switches between user space and kernel that hurt micro-kernels; it’s context switches between processes (e.g. drivers, etc).”
In today’s operating systems, don’t userspace context switches need to go through kernel space context switches?
“But it’s not really the context switches between processes that hurt micro-kernels; it’s the way that synchronous IPC requires so many of these context switches. E.g. sender blocks (causing task switch to receiver) then receiver replies (causing task switch back).”
Still, if the context switch were “free”, I think it’d help take microkernels out of the shadows. IPC doesn’t have to be expensive, but we’d have to use it differently than the synchronous call & block pattern (like you said). I was always a fan of asynchronous batch messaging like that used by mainframes. We think of them like dinosaurs, but they did an inspirational job of breaking problems down into elements that could scale up very easily. Modern software design doesn’t do justice to the software efficiency that earlier computers demanded.
“Of course this is a lot of work – it’s no surprise that a lot of micro-kernels (Minix, L4, Hurd) failed to try.”
I have been working on my own async library, and although it works, the nagging problem is that without an OS written for truly async system calls, it ends up being emulated on top of a synchronous kernel like Linux where the benefits cannot be witnessed. It’s difficult to sell a new paradigm (even with merit) when it runs poorly on existing operating systems which were optimized for the old paradigm.
Code in Erlang : asynchronous message passing between lightweight processes.
Kochise
Kochise,
“Code in Erlang : asynchronous message passing between lightweight processes.”
What do you mean by lightweight process?
There are various message passing APIs, MPI being a well established one, but to my knowledge none have been incorporated as a standard IPC & system call mechanism at the operating system level such that applications can make direct use of OS services without building more middle tier abstraction layers on top of other IPC/syscall mechanisms.
There are a lot of cool things we can do, but an operating system that is rebuilt from the ground up to do them would face a market share problem regardless of its merit.
Can bundling the OS with OEM’s hardware through a very restrictive licensing scheme be of any help regarding this issue ?
Kochise
Hi,
Yes; but the user space to kernel space switching is only about 50 cycles on old CPUs (less for a newer CPU using the SYSCALL instruction); and often you’re in the kernel (e.g. due to IRQ, exception or unrelated syscall) when you find out that you need to switch processes and there is no user space to kernel space switch before a task switch.
Note that this applies to both micro-kernels and monolithic kernels – they both have the same user space to kernel space context switch costs.
Agreed. The other thing I’d mention is that asynchronous messaging can work extremely well on multi-core; as the sender and receiver can be running on different CPUs at the same time and communicate without any task switches at all.
Ironically; for modern kernels (e.g. both Linux and Windows) everything that matters (IO) is asynchronous inside the kernel.
– Brendan
Brendan,
“Yes; but the user space to kernel space switching is only about 50 cycles on old CPUs (less for a newer CPU using the SYSCALL instruction)”
On older processors it used to be a couple hundred cycles like in the link I supplied. I’m not sure how much they’ve brought it down since then. Do you have a source for the 50 cycles stat?
“and often you’re in the kernel (e.g. due to IRQ, exception or unrelated syscall) when you find out that you need to switch processes and there is no user space to kernel space switch before a task switch.”
That’s only for pre-emption though. Blocking system calls always incur explicit context switches to/from userspace.
“Note that this applies to both micro-kernels and monolithic kernels – they both have the same user space to kernel space context switch costs.”
While technically true, the monolithic kernel doesn’t need to context switch between modules like a microkernel does. That’s the reason microkernels are said to be slower. The microkernel context switches can be reduced by using non-blocking messaging APIs, this is what I thought you were already suggesting earlier, no?
“Agreed. The other thing I’d mention is that asynchronous messaging can work extremely well on multi-core; as the sender and receiver can be running on different CPUs at the same time and communicate without any task switches at all.”
On the other hand, whenever I benchmark things like this I find that the cache-coherency overhead is a significant bottleneck for SMP systems such that a single processor can often do better with IO-bound processes. SMP is best suited for CPU bound processing where the ratio of CPU processing to inter-core IO is relatively high. Nothing is ever simple huh?
“Ironically; for modern kernels (e.g. both Linux and Windows) everything that matters (IO) is asynchronous inside the kernel.”
With Linux, file IO uses blocking threads in the kernel; all the FS drivers use threads. These are less scalable than async designs since every request needs a kernel stack until it returns. The bigger problem with threads is that they’re extremely difficult to cancel asynchronously. One cannot simply “kill” a thread just anywhere; there could be side effects like locked mutexes, incomplete transactions and corrupt data structures… Consequently, most FS IO requests are not cancelable on Linux. In most cases this isn’t observable because most file IO operations return quickly enough, but there are very annoying cases from time to time (most commonly with network shares) where we cannot cancel the blocked IO or even kill the process. We are helpless; all we can do is wait for FS timeouts to elapse.
It’s difficult to justify the amount of work that’d be needed to fix these abnormal cases. I’d rather push for a real async model, but that’s not likely to happen given the immense scope such a patch would entail.
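To illustrate the cancellation point, here is a user-space analogy using Python's asyncio. It is not kernel code and says nothing about how Linux filesystems are implemented; `slow_read` and `read_with_cancel` are hypothetical names, and the "request" is just a sleep standing in for a stalled network share. The idea it shows is that an async request has well defined suspension points, so abandoning it is an ordinary operation rather than something that risks leaving locks held.

```python
import asyncio


async def slow_read(delay: float) -> str:
    """Stand-in for a slow I/O request (e.g. a stalled network share)."""
    await asyncio.sleep(delay)
    return "data"


async def read_with_cancel(delay: float, give_up_after: float) -> str:
    """Start the request, then cancel it if it has not finished in time."""
    task = asyncio.ensure_future(slow_read(delay))
    done, _pending = await asyncio.wait({task}, timeout=give_up_after)
    if task in done:
        return task.result()
    task.cancel()  # the request is abandoned at its current await point
    try:
        await task
    except asyncio.CancelledError:
        pass
    return "cancelled"
```

For example, `asyncio.run(read_with_cancel(5.0, 0.05))` gives up after roughly 50 ms instead of waiting the full five seconds, whereas a thread blocked inside a synchronous call offers no comparable safe way to be interrupted.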
Hi,
I think we’re talking about different things. The cost of a “bare” software interrupt or call gate is around 50 cycles; but the benchmarks from your link are probably measuring the bare syscall plus an assembly language “stub” plus a call to a C function (and prologue/epilogue) plus another call (possibly via a table of function pointers) to a minimal “do nothing” function.
For the overhead of privilege level switches, and for the overhead of switching between tasks/processes, there’s no real difference between micro-kernel and monolithic.
Micro-kernels are said to be slower because privilege level switches and switching between tasks/processes tend to happen more often; not because the overhead is higher.
That’d be true regardless of how threads communicate. The only way to reduce the cache-coherency overhead is to build a more intelligent scheduler (e.g. make threads that communicate a lot run on CPUs that share the same L2 or L3 cache).
Sadly, it’s easier to keep adding extensions on top of extensions (and end up with an ugly mess that works) than it is to start again with a new/clean design; even when a new/clean design would reduce the total amount of work and improve the quality of the end result in the long run. Most people are too short-sighted for that – they only look at the next few years rather than the next few decades.
– Brendan
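The scheduler idea above (keep threads that talk to each other on CPUs that share a cache) can be roughly approximated from user space by pinning. The sketch below only shows the pinning mechanism through Python's `os.sched_setaffinity`, which is available on Linux; the helper names are made up, and picking the lowest-numbered CPU is a placeholder, since a real policy would have to look at which cores actually share an L2 or L3 cache.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to `cpus` and return the resulting set.

    Returns None on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)   # pid 0 means "the caller"
    return os.sched_getaffinity(0)


def demo():
    """Pin to one currently allowed CPU, then restore the original mask."""
    if not hasattr(os, "sched_getaffinity"):
        return None
    original = os.sched_getaffinity(0)
    target = {min(original)}        # placeholder choice, not a cache-aware policy
    try:
        return pin_to_cpus(target)
    finally:
        os.sched_setaffinity(0, original)
```

Two communicating processes pinned to cores under the same cache would be the manual version of what the comment asks the scheduler to do automatically.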
Brendan,
“I think we’re talking about different things. The cost of a ‘bare’ software interrupt or call gate is around 50 cycles; but the benchmarks from your link are probably measuring the bare syscall plus…”
I still have some of my old books, so I looked it up again:
From Programming the 386 Sybex (c) 1987:
There’s a very large table showing all the various timings for all the instructions in all their permutations. This spans a few pages, but some excerpts follow…
“int to same privilege level” -> 59 cycles
“int to different privilege level” -> 99 cycles
“int from 386 task to 386 TSS via task gate” -> 307 cycles
“ret from 386 task to 386 TSS via task gate” -> 294 cycles
I also found this slightly more recent source:
http://zsmith.co/intel_i.html#int
http://zsmith.co/intel.html#ts
For 486…
“prot mode same priv” -> 44 cycles
“prot mode more priv” -> 71 cycles
“prot mode via task gate” -> 37+ts
486 protected mode TSS = 199 (for total of 236)
So if you are using a CPU task state segment, it definitely used to take hundreds of cycles between the call and return. If you aren’t, it’d require more instruction cycles to manually save and restore CPU state. Of course this material is very dated, which is why I’m asking for a newer source. But right now I’m led to believe that maybe your number is for interrupt calls without a TSS, is that possible? I couldn’t find the timing information for sysenter and sysexit used with amd64.
“Micro-kernels are said to be slower because privilege level switches and switching between tasks/processes tend to happen more often; not because the overhead is higher.”
We’re saying the same thing.
“Sadly, it’s easier to keep adding extensions on top of extensions (and end up with an ugly mess that works) than it is to start again with a new/clean design; even when a new/clean design would reduce the total amount of work and improve the quality of the end result in the long run. Most people are too short-sighted for that – they only look at the next few years rather than the next few decades.”
Agree. It’s the consequence of the philosophy “if it works, don’t fix it”, ignoring the fact that some things can work better than others. There’s inevitably a large upfront cost with only marginal benefit. Over several years this would pay off, but you are right that short term thinking will mean it will never happen.
I really hope they don’t start pointless version jumping just for the hell of it with the kernel… we already have that crap with Firefox and Chrome. And it’s not fun.
Yay for Linux 2013.04 !
Kochise
Torvalds made it pretty clear that when the second number becomes too big (something like 43 is too large), he’ll start with 4.0. So I don’t expect that anytime soon.
Why do you care about the version number of your browser ?
The whole point of these browser makers is to make it clear the latest version is the only browser you should care about. The number has also been removed from the browser ‘chrome’ (like the title bar of the browser window).
The version number only exists for people reporting bugs and developers of the browser and websites.