Date: 2025-02-04
Geico, an American insurance company, is building a live-patching solution for the Linux kernel, called TuxTape.
TuxTape is an in-development kernel livepatching ecosystem that aims to aid in the production and distribution of kpatch patches to vendor-independent kernels. This is done by scraping the Linux CNA mailing list, prioritizing CVEs by severity, and determining applicability of the patches to the configured kernel(s). Applicability of patches is determined by profiling kernel builds to record which files are included in the build process and ignoring CVEs that do not affect files included in kernel builds deployed on the managed fleet.
↫ Presentation by Grayson Guarino and Chris Townsend
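The applicability check described in the quote can be sketched roughly as follows. This is only an illustration of the idea, not TuxTape's actual code: the `Cve` record, its field names, and the `applicable_cves` function are all hypothetical, and the sketch assumes the build profile is available as a plain list of source file paths.

```python
from dataclasses import dataclass


# Hypothetical record for one CVE entry; the field names are assumptions.
@dataclass(frozen=True)
class Cve:
    cve_id: str
    severity: float            # assumed numeric score, higher is worse
    affected_files: frozenset  # source files touched by the fix


def applicable_cves(cves, built_files):
    """Keep CVEs whose fix touches a file in the build, most severe first."""
    built = set(built_files)
    # A CVE is ignored when none of its files were part of the kernel build.
    hits = [c for c in cves if c.affected_files & built]
    return sorted(hits, key=lambda c: c.severity, reverse=True)
```

Given a build profile that lists only `fs/ext4/inode.c` and `net/ipv4/tcp.c`, a CVE whose fix touches only an unbuilt driver file would be dropped, and the remaining ones would come back ordered by severity.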
It seems to me something like live-patching the Linux kernel should be a standardised framework that’s part of the Linux kernel, and not several random implementations by third parties, one of which is an insurance company. There’s a base core of functionality for live-patching in the Linux kernel since 4.0, released in 2015, but it’s extremely limited and requires most of the functionality to be implemented separately, through things like Red Hat’s kpatch and Oracle’s Ksplice.
Geico is going to release TuxTape as open source, and is encouraging others to adopt and use it. There are various other solutions out there offering similar functionality, so you’re spoiled for choice, and I’m sure there are advantages and disadvantages to each. I would still prefer it if functionality like this were a standard feature of the kernel, not something tied to a specific vendor or implementation.
What’s the point? Why would an insurance company create its own kernel live patching solution?
In fact, what Linux distribution are they using? Are they not using RHEL, SLES or Ubuntu, like 99% of the large companies in the world?
Because they don’t want to pay for those vendors’ implementations, or need features those vendors don’t provide.
When you are large enough, it can often be cheaper to maintain your own in-house implementation than to use an external vendor, especially where all of the source code is open, so you only have to maintain customizations rather than the whole stack.
Big players like Google, AWS, even MS do this already.
Live kernel patching involves some hacks that make it less appealing for standardization. A live-patched kernel can’t be changed arbitrarily, and the result isn’t the same kernel you would get from a fresh build; it’s more like a custom version of the original kernel that is still running. Patches have to be carefully crafted and tested. Structures can only be expanded if they had enough empty space appended to them originally. Kernel threads need more interlocks to enforce an orderly live migration. Third parties usually support only a subset of kernel versions, whereas mainline kernel devs put out a lot more kernels, and the work to support livepatching between them could represent a higher workload without a correspondingly higher revenue stream for the Linux kernel. Even if it were available, most users would be better off with a standard kernel.
I propose a different solution: rather than a mainline live patching solution, how about a feature to standardize live process migration? As an admin it’s something I’ve wanted to be able to do with userspace processes. And as a bonus (assuming the ABI for process migration is stable) it would effectively let you upgrade the kernel by migrating all the processes to a new server running a new kernel. If you’re running a physical server, you could perform the migration twice: once to a temporary server while the physical server gets updated, and then back to the permanent server. If it’s a virtual server, you can just skip the temporary server and de-provision the original server after migration. This feature would have benefits over live patching a kernel. Consider that the server would be available for all kinds of maintenance, including hardware fixes, cleaning, upgrades and even physical relocation, without any software downtime.
QEMU has similar functionality, but it would be nice to have support for this at the kernel level. This works best when you have virtualized storage and networking: things that Linux can do, but it would be nice to have standardized processes and commands to put all of this together at a high level. For example, if you have a SAN for network storage, this migration is quite trivial. If you don’t have network storage, then you have to migrate the storage as well, which is still doable but becomes cumbersome for large local disks.
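To make the cases above concrete, here is a rough sketch of the steps such a migration would involve. It only plans the steps and performs no migration. The function name `plan_migration` and the step descriptions are made up for illustration, and it assumes just two inputs: whether the server is physical, and whether storage is shared over the network.

```python
def plan_migration(physical: bool, shared_storage: bool) -> list[str]:
    """Return the ordered steps for a kernel upgrade via process migration."""
    steps = []
    # Without network storage, the disks have to move along with the processes.
    if not shared_storage:
        steps.append("copy local storage to target server")
    steps.append("migrate processes to target server")
    if physical:
        # Physical case: the target is temporary, so migrate twice.
        steps.append("update original server")
        if not shared_storage:
            steps.append("copy local storage back to original server")
        steps.append("migrate processes back to original server")
    else:
        # Virtual case: the target becomes the permanent server.
        steps.append("de-provision original server")
    return steps
```

With a virtual server on a SAN this comes down to two steps, which is the "quite trivial" case; a physical server with large local disks needs five, two of which are the storage copies.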