IHK/McKernel is a lightweight multi-kernel operating system designed specifically for high-performance computing. It runs Linux and McKernel, a lightweight kernel (LWK), side by side on compute nodes, primarily aiming at the following:
- Provide scalable and consistent execution of large-scale parallel applications and at the same time rapidly adapt to exotic hardware and new programming models
- Provide efficient memory and device management so that resource contention and data movement are minimized at the system level
- Eliminate OS noise by isolating OS services in Linux and provide jitter free execution on the LWK
- Support the full POSIX/Linux APIs by selectively offloading system calls to Linux
So it is possible…
kwan_e,
Haha, funny
This is interesting, thanks for posting it Thom.
There are some aspects of this project that I don’t quite follow from the article. What is it exactly that makes the McKernel kernel more efficient than the linux kernel? And which calls exactly are handled by IHK/McKernel stack versus passed along to linux? I feel a concrete example would have been illustrative here.
I’m a bit confused about the motivation for this because I’m inclined to think that whatever optimizations that McKernel incorporated to improve HPC efficiency could probably have been added to the linux kernel directly without the need for a separate McKernel stack.
Still, I find this design to be an intriguing answer to the driver & software compatibility catch-22 hurdles that independent operating systems will typically face early on in their existence. Building a new kernel that runs adjacent to linux and can make use of its widespread driver & software support enables indie kernel developers to focus more of their attention on the features that make their kernel unique. Putting my questions above aside, I think this is quite clever!
One point might be that the scheduler is a lot simpler. No completely fair scheduling which changes priorities over time and such. I guess it is more deterministic and a lot quicker.
Another approach was this one:
https://github.com/ReturnInfinity
but w/o the Linux-API.
This Linux API might help porting existing HPC software to McKernel.
Deep Thought,
But I’m still confused about their motivation of using a whole other kernel rather than using a linux kernel with a simpler scheduler.
It looks like the design would add more IO latency because (I assume) normal IO has to go through two kernels instead of just one. Presumably their target workload is CPU intensive rather than IO intensive, but was there a technical reason for using two kernels instead of optimizing linux in place, or was it just a case of “we wanted to try something different and this is the result”? If it’s the latter, then personally I get it, it’s a lot more creative and interesting to work on your own designs. But if it’s supposed to be the former, I wish they had explained their reasoning and perhaps added benchmarks to back their hypothesis that such a design would perform better for HPC than linux alone.
My understanding is, they leave on the Linux side all the “nasty” stuff like filesystem, graphics etc.
If the HPC side needs to access a file, for example, it hands off this job to Linux.
My impression of a node in an HPC cluster is that it is mainly computing and exchanging data with other nodes. This can be done easily in a limited kernel.
The overall idea is not new though. Hypervisors are used to run Linux in parallel with an RTOS.
What is new is that McKernel looks like a vanilla Linux at the API level.
But yes, some real-world figures would be nice, to see if this is just an academic idea or gives a real benefit.
DeepThought,
Well, I’m never one to say linux can’t be improved upon because I know that it can be. However it’s hard to see how a dual kernel approach would significantly improve the performance over a single optimized linux kernel. Benchmarks would really help.
Anyways, this is certainly an interesting alt-os. It’d be nice to discuss it with a member of the project
Edited 2018-11-28 12:53 UTC
I think it’s more a case of maintainability than performance here. If they’re completely separate from the Linux kernel (and it sounds like they mostly are), they don’t have to worry about adapting to purely internal changes to the Linux kernel that tend to make carrying any kind of complicated patches for any extended period of time difficult (for example the ongoing change to a 64-bit time_t representation internally), or any of the linking related licensing issues.
There’s also the very distinct possibility that this all started long before the whole ‘isolcpus’ thing that Linux now supports started. Prior to that, you couldn’t completely keep the kernel from interfering with userspace execution on a given CPU core, because there were some types of kernel threads that you couldn’t force off of a core with CPU affinities.
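To make the affinity point concrete, here is a minimal sketch of the userspace half of that picture: a process restricting itself to one core. It assumes a Linux host and Python's `os.sched_setaffinity`; the helper name `pin_to_one_cpu` is made up for illustration. It does nothing about kernel threads on that core, which is the part that needed kernel-side support such as the `isolcpus` boot parameter.

```python
import os


def pin_to_one_cpu():
    """Pin the calling process to a single CPU, then restore the old mask.

    Returns (allowed_before, allowed_while_pinned), or None on platforms
    that lack the Linux-only affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not Linux: nothing to demonstrate

    before = os.sched_getaffinity(0)        # 0 means "the calling process"
    target = min(before)                    # pick any CPU we are already allowed on
    os.sched_setaffinity(0, {target})       # restrict ourselves to that one CPU
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, before)         # undo the change so callers are unaffected
    return before, pinned


if __name__ == "__main__":
    print(pin_to_one_cpu())
```

Affinity only says where this process may run. It does not stop the kernel from running its own work on the same core, which is why affinity alone was not enough for noise-free compute cores.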
The whole thing reminds me of IBM’s Watson systems though. Those use physically distinct I/O and compute nodes, with the I/O nodes running a modified version of Linux (with an RHEL derived userspace I believe), while the compute nodes run a minuscule kernel consisting of a few hundred lines of C++ that just provides IPC primitives for talking to the I/O nodes.
“while the compute nodes run a minuscule kernel consisting of a few hundred lines of C++ that just provides IPC primitives for talking to the I/O nodes.”
That’s the key right there… most of the compute nodes never end up calling into the Linux kernel itself and the code just runs on the LWK. It makes the improvement in performance automatic at runtime.
Linux and other full unix kernels tend to be way too bloated for compute nodes; even if you chop out a ton of stuff, you still end up with a multi-megabyte kernel… the LWK is probably only kilobytes.
But, they can share the same programming environment instead of having a special kernel ABI…. as is done on Watson and others.
cb88,
You’re the second person to mention it, I should learn more about watson
I think the analogy here, in hardware terms, is that Linux is the CPU, and the McKernel is an FPGA or ASIC.
kwan_e,
I don’t get this analogy. They’re both co-kernels running on ordinary SMP CPUs.
Yes, but one kernel is doing general purpose things, while the other is specific to achieving the massive parallel scaling.
You can optimize Linux as much as you like, but things like preemptive scheduling will always make things a bit unpredictable. The McKernel uses cooperative scheduling to reduce jitter, which I take to mean processes aren’t interrupted at random intervals.
So the analogy to CPU and FPGA is that the CPU is highly interruptible and doing lots of context switching stuff while the FPGA is single minded about its tasks.
The parallelism is easier to scale if there aren’t as many interruptions to synchronize everything.
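As a toy illustration of what cooperative scheduling means in general (not McKernel's actual scheduler, which I haven't read), here is a sketch using Python generators. Each task runs until it chooses to `yield`, so the interleaving is fully determined by the tasks themselves. The names `task` and `run_cooperative` are made up for this example.

```python
from collections import deque


def task(name, steps, log):
    """A task that does `steps` units of work, yielding after each one."""
    for i in range(steps):
        log.append((name, i))   # the "work"
        yield                   # voluntarily hand control back to the scheduler


def run_cooperative(tasks):
    """Round-robin over tasks; a task is only switched out when it yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)       # run until the task's next yield
        except StopIteration:
            continue            # task finished, drop it
        ready.append(current)   # otherwise put it at the back of the queue


if __name__ == "__main__":
    log = []
    run_cooperative([task("a", 2, log), task("b", 3, log)])
    print(log)
```

There is no timer tick anywhere in this loop: nothing can interrupt a task between two yields, and that property is what removes the scheduling jitter.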
Edited 2018-11-29 01:08 UTC
kwan_e,
Ok, I’m going to continue thinking in terms of CPUs though, haha
It might make more sense (and maybe this is what is intended) if the full Linux Kernel only runs on a master node and the LWK runs on the zillion diverse architecture slaves. However this has already been done. Supercomputers already use lightweight CNKs (Compute Node Kernels).
The kernel-to-userspace ABI has been stable for a long, long time. It’s just that people confuse “an OS and a set of userspace libraries” for “an OS” and then whine about issues in dynamic libraries.
https://en.m.wikipedia.org/wiki/K_computer
“K computer comprises 88,128 2.0 GHz eight-core SPARC64 VIIIfx processors contained in 864 cabinets, for a total of 705,024 cores”
This is essentially a hypervisor/microkernel with additional support to use Linux services by (1) system call forwarding to the Linux instance and (2) mirrored address spaces with a corresponding “proxy” process running on the Linux instance.
Probably more details in the papers, but that mostly summarizes what is introduced on the homepage.
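A rough sketch of the forwarding idea in (1) and (2), as a toy model only. The class names, the `LOCAL_CALLS` set and the returned strings are all assumptions for illustration; which calls McKernel really handles itself is not stated on the homepage.

```python
# Assumption: a made-up set of calls the lightweight kernel serves itself.
LOCAL_CALLS = {"getpid", "mmap"}


class LinuxProxy:
    """Stands in for the proxy process on the Linux side."""

    def __init__(self):
        self.forwarded = []

    def handle(self, name, *args):
        self.forwarded.append(name)     # record what was offloaded
        return f"linux:{name}"          # pretend Linux executed the call


class LightweightKernel:
    """Stands in for the LWK: serve a few calls locally, forward the rest."""

    def __init__(self, proxy):
        self.proxy = proxy

    def syscall(self, name, *args):
        if name in LOCAL_CALLS:
            return f"lwk:{name}"        # fast path, never leaves the LWK
        return self.proxy.handle(name, *args)  # slow path via the proxy


if __name__ == "__main__":
    lwk = LightweightKernel(LinuxProxy())
    print(lwk.syscall("getpid"), lwk.syscall("open", "/tmp/x"))
```

In the real design the mirrored address space is what would let the proxy process act on the application's pointers directly; this toy model skips that part entirely.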
Edited 2018-11-29 09:44 UTC