Linked by Thom Holwerda on Wed 15th Jul 2009 16:09 UTC
Linux
One of the problems with operating system updates is that you often need to reboot the system. While this is nothing but a minor nuisance for us desktop users, it's a bigger problem when it comes to servers. Ksplice is a technology that allows Linux kernel patches to be applied without actually restarting the kernel.
Sometimes you still need to reboot...
by kragil on Wed 15th Jul 2009 16:23 UTC
kragil
Member since:
2006-01-04

for example when in-memory data structures change significantly. But that only happens in very few cases.

Reply Score: 2

anarxia Member since:
2006-06-02

My biggest concern is whether it can detect those cases and abort/perform a normal reboot rather than fail silently.

Reply Score: 1

PlatformAgnostic Member since:
2006-01-02

Trust me, the failure won't be silent ;) .

Reply Score: 2

kaiwai Member since:
2005-07-06

Does one hear the shriek of "I'm dying, dying! oh what a world! what a world!" from the server? ;)

Reply Score: 2

panzi Member since:
2006-01-22

I don't think it will fail "silently" in such a case (more like kernel panic).

Reply Score: 2

TemporalBeing Member since:
2007-08-22

for example when in-memory data structures change significantly. But that only happens in very few cases.


Actually no. It just means you need to have functionality that can serialize and deserialize the structures in a manner both the old kernel and new kernel understand.

The ONLY time you need to fully reboot (as you suggest) is when certain hardware needs to be replaced and is not hot-swappable, or when hardware needs to re-initialize and that cannot be done on the fly. But that is a hardware limitation.

There is NO software limitation that requires a reboot. Any software limitation can be eliminated by proper design of the functionality.

Reply Score: 3

kragil Member since:
2006-01-04

OK, in theory you can serialize and deserialize everything, but in the real world it can in some cases be so complex that doing it in a bug-free manner is not worth the effort.

That is why I wrote significantly.

Reply Score: 3

Brendan Member since:
2005-11-16

for example when in-memory data structures change significantly. But that only happens in very few cases.


Actually no. It just means you need to have functionality that can serialize and deserialize the structures in a manner both the old kernel and new kernel understand.


In this case a programmer needs to write code to convert the old structures into the new structures (which isn't the same as converting the structure/s into some common format that both kernels understand). Of course in some cases this conversion would be impossible, because you can't create data out of nothing.

The ONLY time you need to fully reboot (as you suggest) is when certain hardware needs to be replaced and is not hot-swappable, or when hardware needs to re-initialize and that cannot be done on the fly. But that is a hardware limitation.

There is NO software limitation that requires a reboot. Any software limitation can be eliminated by proper design of the functionality.


Wrong. Their own paper (called "Ksplice: Automatic Rebootless Kernel Updates") says:

"5.2 Capturing the CPUs to update safely

A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory.
Ksplice uses Linux’s stop_machine facility to achieve an appropriate opportunity to check the above safety condition for every function being replaced. When invoked, stop_machine simultaneously captures all of the CPUs on the system and runs a desired function on a single CPU.
If the above safety condition is not initially satisfied, then Ksplice tries again after a short delay. If multiple such attempts are unsuccessful, then Ksplice abandons the upgrade attempt and reports the failure.
Ksplice’s current implementation therefore cannot be used to automatically upgrade non-quiescent kernel functions. A function is considered non-quiescent if that function is always on the call stack of some thread within the kernel. For example, the primary Linux scheduler function, schedule, is non-quiescent since sleeping threads block in the scheduler. This limitation does not prevent Ksplice from handling any of the significant Linux security vulnerabilities from May 2005 to May 2008."
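The safety condition and retry behaviour quoted above can be modelled in a few lines. This is only a Python simulation under stated assumptions, not Ksplice's code: Thread, safe_to_patch, try_patch, the address ranges and the retry parameters are all hypothetical, and the real check runs inside the kernel while stop_machine has captured every CPU.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Thread:
    # Hypothetical snapshot of one kernel thread: where it is executing now,
    # and the return addresses saved on its kernel stack.
    ip: int
    return_addrs: list = field(default_factory=list)


def safe_to_patch(threads, texts):
    """The paper's condition: no thread's instruction pointer, and no return
    address on any thread's stack, lies inside the text of a function being replaced."""
    for t in threads:
        for text in texts:
            if t.ip in text or any(addr in text for addr in t.return_addrs):
                return False
    return True


def try_patch(take_snapshot, texts, attempts=5, delay=0.01):
    """Check the condition, retry after a short delay, and give up loudly
    (instead of silently) if it never holds."""
    for attempt in range(1, attempts + 1):
        # In Ksplice the check runs while stop_machine holds every CPU; here
        # take_snapshot() simply returns a list of Thread objects.
        if safe_to_patch(take_snapshot(), texts):
            return f"patched on attempt {attempt}"
        time.sleep(delay)
    raise RuntimeError("update abandoned: replaced code never became quiescent")
```

A thread sleeping in the scheduler always has a return address inside schedule's text (a made-up range in this model), so for that function the check can never pass and try_patch raises: the non-quiescent case the paper describes.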


I'd also point out that they only talk about security patches; nowhere do they mention patches for any other purpose (e.g. new device drivers, switching to a different scheduler, changing the USB stack, etc.). With this in mind, I expect it won't be useful for any major kernel changes: e.g. it might handle a change from "2.6.28-r5" to "2.6.28-r6", but might not handle a change from "2.6.28-r6" to "2.6.29-r1", and also might not handle changes to most compile-time options. My theory here is that if it did handle this properly they would have bragged about it, instead of only mentioning security patches.

- Brendan

Reply Score: 2

TemporalBeing Member since:
2007-08-22

for example when in-memory data structures change significantly. But that only happens in very few cases.


Actually no. It just means you need to have functionality that can serialize and deserialize the structures in a manner both the old kernel and new kernel understand.

In this case a programmer needs to write code to convert the old structures into the new structures (which isn't the same as converting the structure/s into some common format that both kernels understand). Of course in some cases this conversion would be impossible, because you can't create data out of nothing.


That 'common format' is what would enable the programmer to convert the old structure to the new structure. So yes, that is exactly what happens. Of course, some things need to be initialized to default values when the common format doesn't support them (e.g. after a major change), but that common format should have versions of its own so the two sides can still communicate.
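A minimal Python sketch of a "common format with versions of its own", assuming a hypothetical connection table and a made-up timeout field (none of these names come from Ksplice or the kernel): the old side writes a version-tagged record, the new side upgrades it one version at a time, and anything the old format never stored can only be filled with a default, which is the "can't create data out of nothing" point above.

```python
import json

CURRENT_VERSION = 2


def serialize_v1(conn_table):
    """Old-kernel side: write a hypothetical connection table in format version 1."""
    records = [{"id": cid, "state": st} for cid, st in conn_table.items()]
    return json.dumps({"version": 1, "conns": records})


def upgrade_v1_to_v2(doc):
    """Version 2 adds a 'timeout' field that version 1 never stored, so it can
    only be filled with a default: the real value cannot be recovered."""
    for rec in doc["conns"]:
        rec.setdefault("timeout", 30)  # hypothetical default, in seconds
    doc["version"] = 2
    return doc


# One upgrade step per old version; the new side chains them as needed.
UPGRADES = {1: upgrade_v1_to_v2}


def deserialize(blob):
    """New-kernel side: upgrade the record step by step, then rebuild the table."""
    doc = json.loads(blob)
    if doc["version"] > CURRENT_VERSION:
        raise ValueError(f"state version {doc['version']} is newer than this side understands")
    while doc["version"] < CURRENT_VERSION:
        doc = UPGRADES[doc["version"]](doc)
    return {rec["id"]: (rec["state"], rec["timeout"]) for rec in doc["conns"]}
```

For example, deserialize(serialize_v1({7: "ESTABLISHED"})) gives {7: ("ESTABLISHED", 30)}: the state survives the hand-over, while the timeout is a guess.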

Essentially, you need a way to communicate between the two kernels what state the different relevant parts are in. Whether you tear down the whole kernel (like kexec does), build from a base of nothing, or tear down just enough to move from one running kernel to another without having to do full hardware initialization under the new kernel (like ksplice), you still need a way to communicate some of that state information: enough to initialize (especially) drivers and other parts to a known state from which they can bring the rest of the kernel back and carry on.

The ONLY time you need to fully reboot (as you suggest) is when certain hardware needs to be replaced and is not hot-swappable, or when hardware needs to re-initialize and that cannot be done on the fly. But that is a hardware limitation.

There is NO software limitation that requires a reboot. Any software limitation can be eliminated by proper design of the functionality.


Wrong. Their own paper (called "Ksplice: Automatic Rebootless Kernel Updates") says:


Actually, that is only a limitation to their implementation. There really is no software limitation unless you design one in.




"5.2 Capturing the CPUs to update safely

A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory.


This is true of any implementation, but you can work around it.

ksplice seems to try to keep as much running as possible while making the transition. Hence their documented limitation.

On the other hand, you can temporarily reduce the system to a minimal state, serialize the minimal state of the drivers and other parts (e.g. memory allocation tables), with each part responsible for serializing its own state, unload the drivers, etc., then transfer to the new kernel and tell it where to find the serialized data, so that it can load the drivers, etc., deserialize the data, and resume operation.
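The hand-over just described (quiesce, let each part serialize itself, pass the data along, deserialize, resume) can be sketched in Python. Everything here is hypothetical: Component, quiesce_and_save, restore and hand_off are illustrative names, and the JSON string stands in for a memory region whose address the next kernel would be given.

```python
import json


class Component:
    """Hypothetical kernel part (driver, allocator, ...) that saves and restores its own state."""

    def __init__(self, name, state=None, running=True):
        self.name = name
        self.state = state if state is not None else {}
        self.running = running

    def quiesce_and_save(self):
        self.running = False   # stop taking new work before the snapshot
        return self.state      # the minimal state needed to resume

    def restore(self, saved):
        self.state = saved
        self.running = True


def hand_off(old_parts, new_parts):
    """Quiesce the old parts, pack their state into one blob, and let each
    new part with the same name restore from it."""
    blob = json.dumps({p.name: p.quiesce_and_save() for p in old_parts})
    # A real hand-off would leave the blob in memory reserved for the next
    # kernel and pass it the address; here the string is simply passed along.
    saved = json.loads(blob)
    for p in new_parts:
        p.restore(saved.get(p.name, {}))  # parts with no saved state start empty
    return new_parts
```

The design point is that each part owns its own serialization, so the hand-off code only has to know names, not the layout of every structure.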

I'm not saying it's an easy task - but it is accomplishable and it removes any software limitations.

kexec, on the other hand, shuts down the kernel entirely and does nothing to serialize the individual kernel parts - it just shuts down the entire system and restarts the system on the new kernel instead of initiating a hardware reset.

Reply Score: 2

Rootkits
by bbell on Wed 15th Jul 2009 16:23 UTC
bbell
Member since:
2006-05-04

Not to be paranoid, but would this not allow a new means for rootkits to embed themselves in the running kernel?

Reply Score: 2

RE: Rootkits
by WereCatf on Wed 15th Jul 2009 16:28 UTC in reply to "Rootkits"
WereCatf Member since:
2006-02-15

If they gain root access, yes. But if you've got malware running as root, then you're already screwed.

Reply Score: 3

RE: Rootkits
by kragil on Wed 15th Jul 2009 16:28 UTC in reply to "Rootkits"
kragil Member since:
2006-01-04

No, this adds no _new_ means beyond those of any other new code added to the kernel (because it might have bugs). If you are root you can install a rootkit in most cases anyway.

Reply Score: 2

Kexec
by safekali on Wed 15th Jul 2009 17:24 UTC
safekali
Member since:
2005-08-19

...and what about Kexec?

Reply Score: 2

RE: Kexec
by hufman on Wed 15th Jul 2009 17:47 UTC in reply to "Kexec"
hufman Member since:
2008-10-11

That still involves a reboot of the kernel, including a restart of init and all of the services. The only advantage of Kexec over a normal reboot is that you can skip the BIOS, which on servers with many add-on cards and utilities can save a lot of time.

Reply Score: 3

Already in NT.
by PlatformAgnostic on Wed 15th Jul 2009 17:58 UTC
PlatformAgnostic
Member since:
2006-01-02

NT has had this capability on all supported architectures since Win2K3 SP1 for both user-mode and kernel-mode components.

I'm not sure it's that useful because any installation which has high enough reliability requirements to use hotpatching probably organizes its services in a cluster which can be patched through more normal mechanisms without downtime.

The code that makes this work is kinda cool though, so I can see why it was developed ;) .

Reply Score: 3

RE: Already in NT.
by bogomipz on Wed 15th Jul 2009 19:50 UTC in reply to "Already in NT."
bogomipz Member since:
2005-07-11

Except on NT you probably need to reboot to install the new components in the first place because files are in use and are therefore locked (which is a major design flaw in NT compared to *nix systems).

Reply Score: 3

RE[2]: Already in NT.
by kaiwai on Thu 16th Jul 2009 01:12 UTC in reply to "RE: Already in NT."
kaiwai Member since:
2005-07-06

Except on NT you probably need to reboot to install the new components in the first place because files are in use and are therefore locked (which is a major design flaw in NT compared to *nix systems).


That is the one thing I hate about Windows: the stupid idea of locking files. Whoever designed such a stupid principle needs to be fired from Microsoft, because it lacks all common sense. It not only affects the kernel; try uninstalling applications where the application fails to unload its shared libraries, leaving locked files and a whole heap of crap behind after the uninstall.

Reply Score: 2

RE[2]: Already in NT.
by cetp on Thu 16th Jul 2009 09:51 UTC in reply to "RE: Already in NT."
cetp Member since:
2007-12-16
RE[3]: Already in NT.
by bogomipz on Thu 16th Jul 2009 12:33 UTC in reply to "RE[2]: Already in NT."
bogomipz Member since:
2005-07-11

There must be a fundamental difference in how files are handled in Windows and *nix. On a *nix system, when a process reads or writes a file, and this file is deleted, renamed or replaced by another process, the first process will still see the old file until it closes its file handle.

I trust that loading libraries works the same way, although I don't know the details. I guess the trick is that files are considered the same if they are the same inode, regardless of where or when you found them in the file system tree.
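The behaviour described here is easy to check on a POSIX system. In this Python sketch (the file name and the helper are made up for illustration), one handle stays open while a new file is renamed over the same path; the open handle keeps reading the old inode, while a fresh open gets the new one, which supports the inode guess above.

```python
import os
import shutil
import tempfile


def replace_while_open():
    """Replace a file that another handle still has open (POSIX semantics)."""
    d = tempfile.mkdtemp()
    try:
        path = os.path.join(d, "libdemo.so")   # made-up file name
        with open(path, "w") as f:
            f.write("old version")

        reader = open(path)                     # the "first process" keeps this open
        try:
            # Write the update elsewhere, then atomically rename it over the old
            # name: the directory entry now refers to a different inode.
            tmp = path + ".new"
            with open(tmp, "w") as f:
                f.write("new version")
            os.replace(tmp, path)

            seen_by_reader = reader.read()      # still the old inode's data
            with open(path) as fresh:
                seen_by_new_open = fresh.read()
            same_inode = os.fstat(reader.fileno()).st_ino == os.stat(path).st_ino
        finally:
            reader.close()
        return seen_by_reader, seen_by_new_open, same_inode
    finally:
        shutil.rmtree(d)
```

On Linux this returns ('old version', 'new version', False); the sketch assumes POSIX rename semantics and is not meant for Windows.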

Reply Score: 2

RE[4]: Already in NT.
by PlatformAgnostic on Fri 17th Jul 2009 01:20 UTC in reply to "RE[3]: Already in NT."
PlatformAgnostic Member since:
2006-01-02

This is true in NT as well. But in UNIX the locking of files is advisory by default and shared by default (i.e. if I open a file anyone else can also open/modify the file and if I say I want to lock it other guys have to check the lock bit in order to respect my wishes). As a design choice (and to increase compatibility with older Windows), NT locking is mandatory and the default (if you specify no flags) is to disallow sharing. The library loader also seems to disallow sharing for delete as well.
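A small Python sketch of the UNIX side of this, assuming Linux or another system with flock(2) (the fcntl module is Unix-only, and advisory_lock_demo is a made-up helper): one open file takes an exclusive lock, a second open of the same file can still write because the lock is advisory, and only when the second opener itself asks for the lock does it find out the file is taken.

```python
import fcntl
import os
import tempfile


def advisory_lock_demo():
    """Show that flock() locks on UNIX are advisory (Unix-only: needs fcntl)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as owner, open(path, "a") as other:
            fcntl.flock(owner, fcntl.LOCK_EX)   # "I say I want to lock it"

            # A second open of the same file is not stopped by the lock...
            other.write("written anyway\n")
            other.flush()

            # ...only an opener that checks the lock finds out it is held.
            try:
                fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
                lock_seen = False
            except BlockingIOError:
                lock_seen = True
        with open(path) as f:
            return f.read(), lock_seen
    finally:
        os.remove(path)
```

On Linux this returns ('written anyway\n', True): the write went through despite the lock, and the lock only mattered to the caller that checked it.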

As cetp mentions, this may be a conscious design choice because structures shared across process boundaries by DLLs (this can happen via things like Window Messages) may not tolerate having two different versions of the DLL loaded and interacting with each other. There's nothing in the NT Kernel architecture that prevents you from replacing a DLL when the program is still running (you can even do this yourself and get around the file locking problem by just renaming/moving the thing you want to replace and no one will stop you).

Reply Score: 2

RE: Already in NT.
by panzi on Wed 15th Jul 2009 22:22 UTC in reply to "Already in NT."
panzi Member since:
2006-01-22

Why is it, then, that Windows still needs to reboot more often than Linux? (At least it feels that way. I don't know if that's the real number of reboots.)

Reply Score: 2

RE[2]: Already in NT.
by smashIt on Wed 15th Jul 2009 22:33 UTC in reply to "RE: Already in NT."
smashIt Member since:
2005-07-06

You have to differentiate between needing to reboot and being prompted to reboot.

Most of the apps/drivers that finish the install process with a reboot message work just fine after clicking NO.

Reply Score: 3

RE: Already in NT.
by Soulbender on Thu 16th Jul 2009 07:06 UTC in reply to "Already in NT."
Soulbender Member since:
2005-08-18

I'm not sure it's that useful because any installation which has high enough reliability requirements to use hotpatching probably organizes its services in a cluster which can be patched through more normal mechanisms without downtime.


Right on. Either a service is so important that it is already in a failover or cluster configuration and rebooting is not a problem or it's just not important enough and then rebooting isn't a problem either.

Reply Score: 2

WE'LL GET THERE
by po134 on Wed 15th Jul 2009 18:15 UTC
po134
Member since:
2009-05-15

It's very good news for the server world, and for users like me who hate rebooting ;)

Let's hope we'll get there on Windows too, eventually (Vista has come a long way, but some things are still lacking). Now the only times I reboot my Vista box are on patch day or when I plug in my projector/second screen again (the NVIDIA drivers don't recognize what's connected to the second DVI port without a reboot).

Reply Score: 1

What's wrong with a reboot?
by raboof on Wed 15th Jul 2009 18:19 UTC
raboof
Member since:
2005-07-24

Seriously, there are 2 kinds of services: those that you can afford to bring down (in a planned service window), and those that you cannot.

For the former, just plan the reboot. A reboot should not be 'risky' - if it is, the machine is not properly maintained. In fact, it's a good test if all services come back up correctly, to be ready for a forced reboot (such as a hardware failure).

For the latter, you need failover anyway, so you can just reboot the servers in the cluster one at a time.
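Rebooting a failover cluster one machine at a time boils down to a loop with a capacity check. A minimal Python sketch, assuming a hypothetical Node record and rolling_reboot helper in which draining, rebooting and the health check are reduced to flag changes; min_in_service stands for however many nodes the service needs to stay up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical cluster member; a real one would be driven through the
    # load balancer and a remote reboot command rather than plain attributes.
    name: str
    in_service: bool = True
    rebooted: bool = False


def rolling_reboot(nodes, min_in_service):
    """Reboot nodes one at a time, never dropping below min_in_service."""
    log = []
    for node in nodes:
        others_up = sum(n.in_service for n in nodes if n is not node)
        if others_up < min_in_service:
            raise RuntimeError(f"not enough capacity to take {node.name} down")
        node.in_service = False        # drain: stop routing traffic to it
        log.append(f"drain {node.name}")
        node.rebooted = True           # stand-in for the actual reboot
        log.append(f"reboot {node.name}")
        node.in_service = True         # health check passed, back in rotation
        log.append(f"restore {node.name}")
    return log
```

With three nodes and min_in_service=2 each node is drained, rebooted and restored in turn; with only two nodes and the same requirement the loop refuses to start, which is the case where the cluster is too small to patch without downtime.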

Reply Score: 2

RE: What's wrong with a reboot?
by fretinator on Wed 15th Jul 2009 19:05 UTC in reply to "What's wrong with a reboot?"
fretinator Member since:
2005-07-06

Or never reboot the machines and spend time playing tetris.

Reply Score: 3

RE: What's wrong with a reboot?
by fernandotcl on Wed 15th Jul 2009 21:39 UTC in reply to "What's wrong with a reboot?"
fernandotcl Member since:
2007-08-12

That is correct. What kind of service is so important that it needs to be always up, yet not important enough to justify a failover scheme?

Projects like this are nice, of course. But they aren't practical at all.

Reply Score: 1

RE[2]: What's wrong with a reboot?
by Lennie on Thu 16th Jul 2009 00:31 UTC in reply to "RE: What's wrong with a reboot?"
Lennie Member since:
2007-09-22

Because some things are really hard (or just expensive) to provide complete failover for, like Linux running telephony systems on specific hardware.

Even 0.02 cents per machine could be too expensive if there are a lot of machines. For example, it would be really nice if an update of your cable modem didn't disconnect you from the internet for a couple of minutes.

For some things it's just easier on the administrative side, like a cluster running calculations, where you don't want to move jobs around, reboot the nodes that haven't been rebooted yet, move the jobs back, reboot the other nodes, and keep track of which ones you have and haven't done.

There could be many reasons. :-)

Reply Score: 1

RE: What's wrong with a reboot?
by stoth on Wed 15th Jul 2009 22:36 UTC in reply to "What's wrong with a reboot?"
stoth Member since:
2009-07-15

But how is this technology bad? No doubt your service should be able to survive a reboot, but at the same time, if you can avoid a reboot safely, then why not? Don't use it if it scares you, but it's nice to have the option.

Reply Score: 2

Soulbender Member since:
2005-08-18

But how is this technology bad?


No one said it was bad, we're just not seeing what's so great about it. Server uptime is mostly a meaningless measurement unless you're one of those guys who like to measure your peni...uptime by posting it on your blog or whatever.

Reply Score: 3

reboot-free updates
by BlackTiger on Thu 16th Jul 2009 11:46 UTC
BlackTiger
Member since:
2005-07-22

Microsoft promised the same feature for Vista... ages ago...

Reply Score: 1

awesome
by stabbyjones on Fri 17th Jul 2009 02:35 UTC
stabbyjones
Member since:
2008-04-15

I'd be too scared to try it on any servers yet, but it's in Squeeze right now with no open bugs, so I think I'll try it out on a few desktops at the next kernel update.

Massive boon right there for an always-on setup.

Reply Score: 2