Linked by JRepin on Mon 29th Apr 2013 09:24 UTC
After ten weeks of development, Linus Torvalds has announced the release of Linux kernel 3.9. The latest version of the kernel adds a device mapper target which allows a user to set up an SSD as a cache for hard disks to boost disk performance under load. There's also kernel support for multiple processes waiting for requests on the same port, a feature which allows server work to be distributed better across multiple CPU cores. KVM virtualisation is now available on ARM processors, and RAID 5 and 6 support has been added to Btrfs's existing RAID 0 and 1 handling. Linux 3.9 also has a number of new and improved drivers, which means the kernel now supports the graphics cores in AMD's next generation of APUs and also works with the high-speed 802.11ac Wi-Fi chips that will likely appear in Intel's next mobile platform. Read more about new features in What's new in Linux 3.9.
RE[8]: Load of works there
by Alfman on Tue 30th Apr 2013 18:59 UTC in reply to "RE[7]: Load of works there"
Alfman (member since 2011-01-28)

Brendan,

"It's not the context switches between user space and kernel that hurt micro-kernels; it's context switches between processes (e.g. drivers, etc)."

In today's operating systems, doesn't a context switch between user space processes have to go through the kernel anyway?

"But it's not really the context switches between processes that hurt micro-kernels; it's the way that synchronous IPC requires so many of these context switches. E.g. sender blocks (causing task switch to receiver) then receiver replies (causing task switch back)."


Still, if the context switch were "free", I think it'd help take microkernels out of the shadows. IPC doesn't have to be expensive, but we'd have to use it differently than the synchronous call-and-block pattern (like you said). I was always a fan of asynchronous batch messaging like that used by mainframes. We think of them as dinosaurs, but they did an inspirational job of breaking problems down into elements that could scale up very easily. Modern software design rarely does justice to the efficiency that those earlier computers demanded.


"Of course this is a lot of work - it's no surprise that a lot of micro-kernels (Minix, L4, Hurd) failed to try."


I have been working on my own async library, and although it works, the nagging problem is that without an OS designed for truly async system calls it ends up being emulated on top of a synchronous kernel like Linux, where the benefits can't really show. It's difficult to sell a new paradigm (even one with merit) when it runs poorly on existing operating systems that were optimized for the old paradigm.
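A minimal sketch of the kind of emulation being described: the "async" read is really a blocking read() pushed onto a worker thread, so a kernel thread is still tied up for the duration. All names here are hypothetical, not from the library mentioned above.

/* Emulating an async read on top of a blocking kernel: the syscall
 * runs on a worker thread and the result is delivered via callback.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

typedef void (*completion_fn)(ssize_t result, void *user_data);

struct async_read_req {
    int fd;
    void *buf;
    size_t count;
    completion_fn done;
    void *user_data;
};

static void *worker(void *arg)
{
    struct async_read_req *req = arg;
    /* The "async" call is really a thread blocked in the kernel,
     * which is exactly the emulation overhead being complained about. */
    ssize_t n = read(req->fd, req->buf, req->count);
    req->done(n, req->user_data);
    free(req);
    return NULL;
}

int async_read(int fd, void *buf, size_t count,
               completion_fn done, void *user_data)
{
    struct async_read_req *req = malloc(sizeof *req);
    if (!req)
        return -1;
    req->fd = fd; req->buf = buf; req->count = count;
    req->done = done; req->user_data = user_data;

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, req) != 0) {
        free(req);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}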


RE[9]: Load of works there
by Kochise on Wed 1st May 2013 06:56 in reply to "RE[8]: Load of works there"
Kochise (member since 2006-03-03)

Code in Erlang: asynchronous message passing between lightweight processes.

Kochise


RE[10]: Load of works there
by Alfman on Wed 1st May 2013 14:08 in reply to "RE[9]: Load of works there"
Alfman (member since 2011-01-28)

Kochise,

"Code in Erlang : asynchronous message passing between lightweight processes."

What do you mean by lightweight process?

There are various message-passing APIs, MPI being a well-established one, but to my knowledge none has been adopted as the standard IPC and system call mechanism at the operating system level. That would let applications use OS services directly, without building middle-tier abstraction layers on top of other IPC/syscall mechanisms.
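For illustration: POSIX message queues are one kernel-backed message-passing IPC facility that does exist on Linux, though, as noted above, they sit alongside the classic syscall interface rather than replacing it. A minimal sketch (the queue name is arbitrary; compile with -lrt):

/* Kernel-backed message passing via POSIX message queues: one process
 * (or thread) enqueues, another dequeues, with the kernel buffering
 * messages in between. */
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 8, .mq_msgsize = 64 };
    mqd_t q = mq_open("/demo_q", O_CREAT | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    const char msg[] = "hello";
    mq_send(q, msg, sizeof msg, 0);            /* enqueue a message */

    char buf[64];                              /* >= mq_msgsize     */
    ssize_t n = mq_receive(q, buf, sizeof buf, NULL);
    if (n >= 0)
        printf("got %zd bytes: %s\n", n, buf);

    mq_close(q);
    mq_unlink("/demo_q");
    return 0;
}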

There are a lot of cool things we can do, but an operating system rebuilt from the ground up to do them would face a market share problem regardless of its merits.


RE[9]: Load of works there
by Brendan on Wed 1st May 2013 14:09 in reply to "RE[8]: Load of works there"
Brendan (member since 2005-11-16)

Hi,

"It's not the context switches between user space and kernel that hurt micro-kernels; it's context switches between processes (e.g. drivers, etc)."

In today's operating systems, doesn't a context switch between user space processes have to go through the kernel anyway?


Yes, but the user space to kernel space switch is only about 50 cycles on old CPUs (less on newer CPUs using the SYSCALL instruction). Also, you're often already in the kernel (e.g. due to an IRQ, an exception or an unrelated syscall) when you find out that you need to switch processes, so there is no user space to kernel space switch before the task switch.

Note that this applies to both micro-kernels and monolithic kernels - they both have the same user space to kernel space context switch costs.
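A rough way to check figures like these is to time a trivial syscall round-trip with rdtsc; the result includes the syscall's own (nearly free) work, so it's an upper bound on the entry/exit cost. A minimal sketch, assuming gcc on x86-64 Linux:

/* Time a trivial syscall round-trip with rdtsc. This is deliberately
 * crude: no serialization, averaged over many iterations. */
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    enum { N = 100000 };
    uint64_t start = rdtsc();
    for (int i = 0; i < N; i++)
        syscall(SYS_getpid);   /* bypasses glibc's cached getpid() */
    uint64_t end = rdtsc();
    printf("~%llu cycles per syscall round-trip\n",
           (unsigned long long)((end - start) / N));
    return 0;
}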

"But it's not really the context switches between processes that hurt micro-kernels; it's the way that synchronous IPC requires so many of these context switches. E.g. sender blocks (causing task switch to receiver) then receiver replies (causing task switch back)."


Still, if the context switch were "free", I think it'd help take microkernels out of the shadows. IPC doesn't have to be expensive, but we'd have to use it differently than the synchronous call-and-block pattern (like you said). I was always a fan of asynchronous batch messaging like that used by mainframes. We think of them as dinosaurs, but they did an inspirational job of breaking problems down into elements that could scale up very easily. Modern software design rarely does justice to the efficiency that those earlier computers demanded.


Agreed. The other thing I'd mention is that asynchronous messaging can work extremely well on multi-core, as the sender and receiver can be running on different CPUs at the same time and communicate without any task switches at all.
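A minimal sketch of that pattern: a single-producer/single-consumer ring buffer lets a sender and receiver pinned to different cores exchange messages with no syscalls or task switches on the fast path. C11 atomics; the size and names are illustrative.

#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 256   /* must be a power of two */

struct ring {
    _Atomic unsigned head;     /* advanced by the consumer */
    _Atomic unsigned tail;     /* advanced by the producer */
    void *slots[RING_SIZE];
};

/* Producer side: returns false if the ring is full. */
bool ring_push(struct ring *r, void *msg)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return false;                        /* full */
    r->slots[tail % RING_SIZE] = msg;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns NULL if the ring is empty. */
void *ring_pop(struct ring *r)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return NULL;                         /* empty */
    void *msg = r->slots[head % RING_SIZE];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return msg;
}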

"Of course this is a lot of work - it's no surprise that a lot of micro-kernels (Minix, L4, Hurd) failed to try."

I have been working on my own async library, and although it works, the nagging problem is that without an OS designed for truly async system calls it ends up being emulated on top of a synchronous kernel like Linux, where the benefits can't really show. It's difficult to sell a new paradigm (even one with merit) when it runs poorly on existing operating systems that were optimized for the old paradigm.


Ironically, for modern kernels (e.g. both Linux and Windows) everything that matters (IO) is asynchronous inside the kernel.

- Brendan


RE[10]: Load of works there
by Alfman on Wed 1st May 2013 16:17 in reply to "RE[9]: Load of works there"
Alfman (member since 2011-01-28)

Brendan,

"Yes; but the user space to kernel space switching is only about 50 cycles on old CPUs (less for a newer CPU using the SYSCALL instruction)"


On older processors it used to be a couple of hundred cycles, as in the link I supplied. I'm not sure how much it has been brought down since then. Do you have a source for the 50-cycle stat?


"and often you're in the kernel (e.g. due to IRQ, exception or unrelated syscall) when you find out that you need to switch processes and there is no user space to kernel space switch before a task switch."

That's only for pre-emption, though. Blocking system calls always incur explicit context switches to and from userspace.


"Note that this applies to both micro-kernels and monolithic kernels - they both have the same user space to kernel space context switch costs."

While technically true, a monolithic kernel doesn't need to context switch between modules the way a microkernel does; that's the reason microkernels are said to be slower. The microkernel's context switches can be reduced by using non-blocking messaging APIs; isn't that what you were already suggesting earlier?


"Agreed. The other thing I'd mention is that asynchronous messaging can work extremely well on multi-core; as the sender and receiver can be running on different CPUs at the same time and communicate without any task switches at all."

On the other hand, whenever I benchmark things like this I find that cache-coherency overhead is a significant bottleneck on SMP systems, to the point that a single processor can often do better with IO-bound processes. SMP is best suited to CPU-bound processing, where the ratio of CPU work to inter-core IO is relatively high. Nothing is ever simple, huh?
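That coherency cost is easy to provoke: two threads incrementing counters that share one cache line force the line to ping-pong between cores, while the same workload with the counters padded onto separate lines typically runs several times faster. A minimal sketch (compile with -pthread; the 64-byte line size is an assumption):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 50000000UL
#define CACHE_LINE 64

/* Two counters in one cache line vs. padded onto separate lines. */
static struct { volatile unsigned long a, b; } same_line;
static struct {
    volatile unsigned long v;
    char pad[CACHE_LINE - sizeof(unsigned long)];
} separate[2];

static void *bump(void *arg)
{
    volatile unsigned long *v = arg;
    for (unsigned long i = 0; i < ITERS; i++)
        (*v)++;
    return NULL;
}

/* Run two counter threads in parallel and return elapsed seconds. */
static double run_pair(volatile unsigned long *x, volatile unsigned long *y)
{
    struct timespec t0, t1;
    pthread_t a, b;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&a, NULL, bump, (void *)x);
    pthread_create(&b, NULL, bump, (void *)y);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same cache line:      %.2fs\n",
           run_pair(&same_line.a, &same_line.b));
    printf("separate cache lines: %.2fs\n",
           run_pair(&separate[0].v, &separate[1].v));
    return 0;
}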


"Ironically; for modern kernels (e.g. both Linux and Windows) everything that matters (IO) is asynchronous inside the kernel."

With Linux, file IO uses blocking threads in the kernel; all the FS drivers use threads. These are less scalable than async designs, since every request needs a kernel stack until it returns. The bigger problem with threads is that they're extremely difficult to cancel asynchronously. One cannot simply "kill" a thread just anywhere: there could be side effects like locked mutexes, incomplete transactions and corrupt data structures. Consequently, most FS IO requests are not cancelable on Linux.

In most cases this isn't observable, because most file IO operations return quickly enough, but there are very annoying cases from time to time (most commonly with network shares) where we cannot cancel the blocked IO or even kill the process. We are helpless; all we can do is wait for the FS timeouts to elapse.
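A minimal sketch of why killing a blocked worker is unsafe: pthread_cancel() only takes effect at cancellation points, any held resources must be released by cleanup handlers, and an uninterruptible in-kernel FS wait (the network-share case above) may never deliver the cancellation at all. The pipe here stands in for a stuck file read; compile with -pthread.

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void unlock_on_cancel(void *arg)
{
    /* Without this handler, cancelling mid-critical-section would
     * leave `lock` held forever: the "side effects" problem. */
    pthread_mutex_unlock(arg);
}

static void *worker(void *arg)
{
    int fd = *(int *)arg;
    char buf[4096];

    pthread_mutex_lock(&lock);
    pthread_cleanup_push(unlock_on_cancel, &lock);

    /* read() is a cancellation point, but only if the kernel-side
     * wait is interruptible; an uninterruptible FS wait blocks here
     * no matter how many times pthread_cancel() is called. */
    (void)read(fd, buf, sizeof buf);

    pthread_cleanup_pop(1);    /* pop and run the unlock handler */
    return NULL;
}

int main(void)
{
    int fds[2];
    pipe(fds);                 /* nothing is written: read() blocks */

    pthread_t t;
    pthread_create(&t, NULL, worker, &fds[0]);
    sleep(1);
    pthread_cancel(t);         /* works for a pipe; a stuck network-FS
                                  read may never see it */
    pthread_join(t, NULL);
    return 0;
}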

It's difficult to justify the amount of work that'd be needed to fix these abnormal cases. I'd rather push for a real async model, but that's not likely to happen given the immense scope such a patch would entail.
