One of the problems with operating system updates is that you often need to reboot the system. While this is little more than a minor nuisance for us desktop users, it’s a bigger problem when it comes to servers. Ksplice is a technology that allows Linux kernel patches to be applied without actually restarting the kernel.
Ksplice is actually quite an intricate piece of technology, and since I’m simply not qualified enough to understand and explain it all, I’ll leave it to Ars to explain:
To generate a live update, it compares compiled object code from before and after a source patch is applied, a technique that the developers refer to as “pre-post differencing.” They take advantage of the -ffunction-sections and -fdata-sections options of the C compiler to eliminate some variance between the pre and post object code. To determine where the symbols reside in memory, they use a method that they describe as run-pre matching, which compares the “pre” object code to the code that is running in memory. This is done with a special Ksplice kernel module. The live updates generated by Ksplice inject new functions into memory while the kernel is running and modify the old functions so that their path of execution will be redirected to the new versions.
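That last step – redirecting the old function’s execution path to the new version – can be illustrated with a userspace analogy. The sketch below is mine, not Ksplice code: in Python you can swap a function object’s `__code__` so that even callers holding a stale reference execute the patched version, which is roughly what Ksplice achieves by overwriting the start of the old function’s machine code with a jump to the new one.

```python
# Userspace analogy of Ksplice-style redirection (illustrative only):
# make an existing function execute new code, even for callers that
# captured a reference to the old function before the "patch".

def old_handler(x):
    # buggy original: off-by-one
    return x + 2

saved_ref = old_handler  # a caller that grabbed a reference early

def new_handler(x):
    # corrected replacement
    return x + 1

# "Splice": point the old function object at the new code.
old_handler.__code__ = new_handler.__code__

print(saved_ref(1))  # the stale reference now runs the patched code -> 2
```

The point of the analogy is that callers are never asked to re-resolve anything: the old entry point itself is rewritten, so every existing call path lands in the new code.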
The developers behind Ksplice claim that the process interrupts system operation for only 0.7 milliseconds, which is pretty impressive. Most kernel patches do not need to be adapted to work with Ksplice (88% of those they examined, limited to x86), with the remainder (12%) needing 1-12 new lines of code in order to work. The latter category consists of patches that make semantic changes to kernel data structures.
Ksplice is not some vague proposal – it actually works right now, and you can test it out on Ubuntu via Ksplice Uptrack.
For example, when in-memory data structures change significantly. But that happens only in a very few cases.
My biggest concern is whether it can detect those cases and abort/perform a normal reboot rather than fail silently.
Trust me… the failure won’t be silent.
Does one hear the shriek of “I’m dying, dying! Oh, what a world! What a world!” from the server?
I don’t think it will fail “silently” in such a case (more like kernel panic).
Actually, no. It just means you need functionality that can serialize and deserialize the structures in a manner both the old kernel and the new kernel understand.
The ONLY time you need to fully reboot (as you suggest) is when certain hardware needs to be replaced and is not hot-swappable, or when hardware needs to re-initialize and it cannot be done on the fly. But that is a hardware limitation.
There is NO software limitation that makes a reboot necessary. Any software limitation can be removed by properly designing the functionality.
OK, in theory you can serialize and deserialize everything, but in the real world it can in some cases be so complex that doing it in a bug-free manner is not worth the effort.
That is why I wrote “significantly”.
In this case a programmer needs to write code to convert the old structures into the new structures (which isn’t the same as converting the structures into some common format that both kernels understand). Of course, in some cases this conversion would be impossible, because you can’t create data out of nothing.
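To make the conversion idea concrete, here is a hedged sketch – the structure and field names are invented for illustration. Where the new layout adds a field the old kernel never tracked, the converter can only fill in a default: the “can’t create data out of nothing” problem.

```python
# Hypothetical example: converting a v1 per-task stats record to a v2
# layout during a live update. All names here are invented.

OLD_VERSION = 1
NEW_VERSION = 2

def convert_task_stats(old):
    """Convert a v1 record to the v2 layout."""
    assert old["version"] == OLD_VERSION
    return {
        "version": NEW_VERSION,
        "pid": old["pid"],
        "runtime_ns": old["runtime_ms"] * 1_000_000,  # unit change: ms -> ns
        "migrations": 0,  # new in v2: no historical data exists, so default it
    }

v1 = {"version": 1, "pid": 42, "runtime_ms": 7}
v2 = convert_task_stats(v1)
print(v2["runtime_ns"])  # 7000000
```

The `migrations` field shows the impossible case in miniature: the old kernel never counted it, so the best the converter can do is start it from zero.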
Wrong. Their own paper (called “Ksplice: Automatic Rebootless Kernel Updates”) says:
“5.2 Capturing the CPUs to update safely
A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory.
Ksplice uses Linux’s stop machine facility to achieve an appropriate opportunity to check the above safety condition for every function being replaced. When invoked, stop machine simultaneously captures all of the CPUs on the system and runs a desired function on a single CPU.
If the above safety condition is not initially satisfied, then Ksplice tries again after a short delay. If multiple such attempts are unsuccessful, then Ksplice abandons the upgrade attempt and reports the failure.
Ksplice’s current implementation therefore cannot be used to automatically upgrade non-quiescent kernel functions. A function is considered non-quiescent if that function is always on the call stack of some thread within the kernel. For example, the primary Linux scheduler function, schedule, is non-quiescent since sleeping threads block in the scheduler. This limitation does not prevent Ksplice from handling any of the significant Linux security vulnerabilities from May 2005 to May 2008.”
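The quoted safety check can be mimicked in userspace. The sketch below is my analogy, not Ksplice code: it walks every thread’s Python stack looking for frames of the function about to be replaced (where Ksplice walks kernel stacks for instruction pointers and return addresses), retries a few times, and then abandons the update, just as the paper describes.

```python
# Userspace analogy of Ksplice's quiescence check (illustrative only).
import sys
import time

def function_is_quiescent(func):
    """True if no thread's call stack currently contains a frame of `func`."""
    target = func.__code__
    for frame in sys._current_frames().values():
        while frame is not None:
            if frame.f_code is target:
                return False
            frame = frame.f_back
    return True

def try_update(func, attempts=5, delay=0.01):
    """Retry the safety check, then abandon the 'upgrade' on failure."""
    for _ in range(attempts):
        if function_is_quiescent(func):
            return True   # safe point found: splice in the new version here
        time.sleep(delay)
    return False          # abandon the attempt and report failure

def idle():
    pass

print(try_update(idle))  # True: no thread is executing inside idle()
```

A function like `schedule`, with sleeping threads permanently blocked inside it, would never pass this check – which is exactly the non-quiescence limitation the paper notes.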
I’d also point out that they only talk about security patches; and nowhere do they mention patches for any other purpose (e.g. new device drivers, switching to a different scheduler, changing the USB stack, etc). With this in mind I expect it won’t be useful for any major kernel changes – e.g. it might handle a change from “2.6.28-r5” to “2.6.28-r6”, but might not handle a change from “2.6.28-r6” to “2.6.29-r1”, and also might not handle changes to most compile time options. My theory here is that if it did handle this properly they would have bragged about it, instead of only mentioning security patches.
– Brendan
“In this case a programmer needs to write code to convert the old structures into the new structures (which isn’t the same as converting the structures into some common format that both kernels understand). Of course in some cases this conversion would be impossible, because you can’t create data out of nothing.”
That ‘common format’ is what would enable the programmer to convert the old structure to the new structure. So yes, that is exactly what happens. Of course, some things will need to be initialized to defaults when the common format doesn’t carry them – e.g. after a major change – but that common format should have versions of its own so the two kernels can still communicate.
Essentially, you need a way to communicate between the two kernels what the state of the different relevant parts is. Whether you tear down the whole kernel (like kexec does) and build up from a base of nothing, or tear down just enough to move from one running kernel to another without having to redo full hardware initialization under the new kernel (like Ksplice), you still need a way to communicate some of that state information – enough to initialize drivers (especially) and other parts to a known state from which they can move forward, bring the rest of the kernel back up, and continue on.
Actually, that is only a limitation of their implementation. There really is no software limitation unless you design one in.
This is true of any implementation; but you can work around it.
Ksplice seems to try to keep as much running as possible while making the transition – hence their documented limitation.
On the other hand, you can temporarily reduce the system to a minimal state, serialize the minimal state of the drivers and other parts (e.g. memory allocation tables) – each part being responsible for serializing itself – unload the drivers, and transfer to the new kernel, informing it where to find the serialized data. The new kernel can then load the drivers, deserialize the data, and resume operation.
I’m not saying it’s an easy task – but it is achievable, and it removes any software limitations.
kexec, on the other hand, shuts down the kernel entirely and does nothing to serialize the individual kernel parts – it just shuts down the entire system and restarts the system on the new kernel instead of initiating a hardware reset.
Not to be paranoid, but would this not allow a new means for rootkits to embed themselves in the running kernel?
If they gain root access, yes. But if you have malware running as root then you’re already screwed.
No, this adds no _new_ means beyond any other new code added to the kernel (because it might have bugs). If you are root you can install a rootkit in most cases anyway.
…and what about kexec?
That still involves a reboot of the kernel, including a restart of init and all of the services. The only advantage of kexec over a normal reboot is that you can skip the BIOS, which on servers with many add-on cards and utilities can save a lot of time.
NT has had this capability on all supported architectures since Win2K3 SP1 for both user-mode and kernel-mode components.
I’m not sure it’s that useful because any installation which has high enough reliability requirements to use hotpatching probably organizes its services in a cluster which can be patched through more normal mechanisms without downtime.
The code that makes this work is kinda cool though, so I can see why it was developed .
Except on NT you probably need to reboot to install the new components in the first place because files are in use and are therefore locked (which is a major design flaw in NT compared to *nix systems).
That is the one thing I hate about Windows – the stupid idea of locking files; whoever designed such a stupid principle needs to be fired from Microsoft because it lacks all common sense. It not only affects the kernel – try uninstalling applications where the application fails to unload its shared libraries, resulting in locked files and a whole heap of crap left over after uninstalling.
http://www.osnews.com/permalink?291395
There must be a fundamental difference in how files are handled in Windows and *nix. On a *nix system, when a process reads or writes a file, and this file is deleted, renamed or replaced by another process, the first process will still see the old file until it closes its file handle.
I assume that loading libraries works the same way, although I don’t know the details. I guess the trick is that files are considered the same if they are the same inode, regardless of where or when you found them in the file system tree.
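This behaviour is easy to demonstrate from userspace. The sketch below (POSIX-only) deletes a file’s name while a descriptor is still open: the old contents remain readable until the last descriptor is closed, exactly as described above.

```python
# Demonstrates *nix unlink semantics: the file's data survives deletion
# of its name for as long as some process holds it open.
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"old contents")
os.lseek(fd, 0, os.SEEK_SET)

os.unlink(path)          # the name is gone from the tree...
data = os.read(fd, 100)  # ...but our open descriptor still sees the data
print(data)  # b'old contents'
os.close(fd)
```

This inode-based lifetime rule is also what lets a package manager replace a running program’s binary or library on disk: existing processes keep the old inode until they exit.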
This is true in NT as well. But in UNIX file locking is advisory by default and sharing is the default (i.e. if I open a file, anyone else can also open/modify it, and if I say I want to lock it, others have to check the lock in order to respect my wishes). As a design choice (and to increase compatibility with older Windows), NT locking is mandatory, and the default (if you specify no flags) is to disallow sharing. The library loader also seems to disallow sharing for delete as well.
As cetp mentions, this may be a conscious design choice because structures shared across process boundaries by DLLs (this can happen via things like Window Messages) may not tolerate having two different versions of the DLL loaded and interacting with each other. There’s nothing in the NT Kernel architecture that prevents you from replacing a DLL when the program is still running (you can even do this yourself and get around the file locking problem by just renaming/moving the thing you want to replace and no one will stop you).
Why is it then that Windows still needs to reboot more often than Linux? (Just perceived reboots – I don’t know the real number.)
You have to differentiate between needing to reboot and being prompted to reboot.
Most of the apps/drivers that finish the install process with a reboot message just work after clicking NO.
Right on. Either a service is so important that it is already in a failover or cluster configuration and rebooting is not a problem or it’s just not important enough and then rebooting isn’t a problem either.
It’s very good news in the server world, or for users who hate rebooting, like me.
Let’s hope we’ll get there on Windows too, eventually. (Vista has come a long way, but some things are still lacking.) Now the only reason I reboot my Vista box is on patch day, or when I plug in my projector/second screen again (the NVIDIA drivers don’t recognize what’s connected to the second DVI slot without rebooting).
Seriously, there are 2 kinds of services: those that you can afford to bring down (in a planned service window), and those that you cannot.
For the former, just plan the reboot. A reboot should not be ‘risky’ – if it is, the machine is not properly maintained. In fact, it’s a good test of whether all services come back up correctly, so you’re ready for a forced reboot (such as after a hardware failure).
For the latter, you need failover anyway, so you can just reboot the servers in the cluster one at a time.
Or never reboot the machines and spend time playing tetris.
That is correct. What kind of service is so important that it needs to be always up, yet not important enough to justify a failover scheme?
Projects like this are nice, of course. But they aren’t practical at all.
Because some things are really hard (or just expensive) to set up complete failover for – like Linux on specific hardware running telephony systems.
Even 0.02 cents per machine could be too expensive if there are a lot of machines. For example, it would be really nice if an update of your cable modem didn’t disconnect you from the internet for a couple of minutes.
For some things it’s just easier on the administrative side – like a cluster running calculations, where you don’t want to be moving jobs around, rebooting the nodes that haven’t been rebooted yet, moving the jobs back, rebooting the other nodes, and keeping track of which ones you have and haven’t done.
There could be many reasons. 🙂
But how is this technology bad? You doubt your service should be able to be rebooted, but at the same time, if you can avoid a reboot safely, then why not? Don’t use it if it scares you, but its nice to have the option.
No one said it was bad, we’re just not seeing what’s so great about it. Server uptime is mostly a meaningless measurement unless you’re one of those guys who like to measure your peni…uptime by posting it on your blog or whatever.
Microsoft promised the same feature in Vista… ages ago…
I’d be too scared to try it on any servers yet, but it’s in squeeze right now with no open bugs, so I think I’ll try it out on a few desktops on the next kernel update.
A massive boon right there for an always-on setup.