Actually no. It just means you need to have functionality that can serialize and deserialize the structures in a manner both the old kernel and new kernel understand.
The ONLY time you need to fully reboot (as you suggest) is when certain hardware needs to be replaced and is not hot-swappable, or when hardware needs to re-initialize and that cannot be done on the fly. But that is a hardware limitation.
There is NO software limitation to needing to reboot. Any software limitation can be removed from being a limitation by proper design of the functionality.
"Actually no. It just means you need to have functionality that can serialize and deserialize the structures in a manner both the old kernel and new kernel understand."
In this case a programmer needs to write code to convert the old structures into the new structures (which isn't the same as converting the structure/s into some common format that both kernels understand). Of course in some cases this conversion would be impossible, because you can't create data out of nothing.
"There is NO software limitation to needing to reboot. Any software limitation can be removed from being a limitation by proper design of the functionality."
Wrong. Their own paper (called "Ksplice: Automatic Rebootless Kernel Updates") says:
"5.2 Capturing the CPUs to update safely
A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory.
Ksplice uses Linux’s stop machine facility to achieve an appropriate opportunity to check the above safety condition for every function being replaced. When invoked, stop machine simultaneously captures all of the CPUs on the system and runs a desired function on a single CPU.
If the above safety condition is not initially satisfied, then Ksplice tries again after a short delay. If multiple such attempts are unsuccessful, then Ksplice abandons the upgrade attempt and reports the failure.
Ksplice’s current implementation therefore cannot be used to automatically upgrade non-quiescent kernel functions. A function is considered non-quiescent if that function is always on the call stack of some thread within the kernel. For example, the primary Linux scheduler function, schedule, is non-quiescent since sleeping threads block in the scheduler. This limitation does not prevent Ksplice from handling any of the significant Linux security vulnerabilities from May 2005 to May 2008."
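To make the quoted safety condition concrete, here is a minimal toy model of it. This is only a sketch of the rule as the paper states it, not Ksplice's actual code: the `Thread` record, the address ranges, and the `snapshot` callback (standing in for capturing all CPUs) are assumptions made up for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

TextRange = Tuple[int, int]  # [start, end) addresses of one function's text


@dataclass
class Thread:
    """Hypothetical view of a kernel thread at the moment the CPUs are captured."""
    instruction_pointer: int
    return_addresses: List[int]  # return addresses found on its kernel stack


def in_range(addr: int, text: TextRange) -> bool:
    start, end = text
    return start <= addr < end


def safe_to_update(threads: Iterable[Thread], text: TextRange) -> bool:
    """True if no thread is executing in, or will return into, the function."""
    for t in threads:
        if in_range(t.instruction_pointer, text):
            return False
        if any(in_range(r, text) for r in t.return_addresses):
            return False
    return True


def try_update(snapshot: Callable[[], List[Thread]],
               targets: List[TextRange],
               max_attempts: int = 5,
               delay_s: float = 0.01) -> bool:
    """Retry the check a few times, then give up, as the quote describes."""
    for _ in range(max_attempts):
        threads = snapshot()  # assumption: stands in for capturing all CPUs
        if all(safe_to_update(threads, text) for text in targets):
            return True  # a real system would replace the functions here
        time.sleep(delay_s)
    return False
```

A thread sleeping with a return address inside `schedule` makes the check fail on every attempt, which is the non-quiescent case the paper mentions.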
I'd also point out that they only talk about security patches; and nowhere do they mention patches for any other purpose (e.g. new device drivers, switching to a different scheduler, changing the USB stack, etc). With this in mind I expect it won't be useful for any major kernel changes - e.g. it might handle a change from "2.6.28-r5" to "2.6.28-r6", but might not handle a change from "2.6.28-r6" to "2.6.29-r1", and also might not handle changes to most compile time options. My theory here is that if it did handle this properly they would have bragged about it, instead of only mentioning security patches.
- Brendan
"Actually no. It just means you need to have functionality that can serialize and deserialize the structures in a manner both the old kernel and new kernel understand."
"In this case a programmer needs to write code to convert the old structures into the new structures (which isn't the same as converting the structure/s into some common format that both kernels understand). Of course in some cases this conversion would be impossible, because you can't create data out of nothing."
That 'common format' is what would enable the programmer to convert the old structure to the new structure. So yes, that is exactly what happens. Of course, some things have to be initialized from scratch when the common format doesn't cover them (e.g. after a major change), but the common format should carry versions of its own so the two kernels can still communicate.
Essentially - you need a way to communicate between the two kernels what the state of the different relevant parts is. Whether you tear down the whole kernel and build up from nothing (like kexec does), or tear down just enough to move from one running kernel to another without doing full hardware initialization under the new kernel (like ksplice), you still need a way to communicate some of that state information - enough to initialize drivers (especially) and other parts to a known state, from which they can move forward to bring the rest of the kernel back and continue on.
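As a rough illustration of a versioned common format, here is a small sketch. Everything in it is hypothetical: the field names (`addr`, `host`, `port`, `mtu`), the version numbers, and the use of JSON as the wire format are assumptions for the example, not anything kexec or Ksplice does. It also shows the "can't create data out of nothing" case: a field the old side never tracked can only be given a default.

```python
import json
from typing import Callable, Dict


def v1_to_v2(state: dict) -> dict:
    # v2 splits "addr" into host and port; both can be derived from old data.
    host, port = state["addr"].rsplit(":", 1)
    return {"host": host, "port": int(port)}


def v2_to_v3(state: dict) -> dict:
    # v3 adds a field the old side never tracked, so it is initialized to a default.
    return {**state, "mtu": 1500}


# Each step upgrades a record from version N to version N + 1.
UPGRADES: Dict[int, Callable[[dict], dict]] = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def serialize(version: int, state: dict) -> bytes:
    """Old side: tag the state with the format version it was written in."""
    return json.dumps({"version": version, "state": state}).encode()


def deserialize(blob: bytes) -> dict:
    """New side: read the tagged record and upgrade it step by step."""
    record = json.loads(blob.decode())
    version, state = record["version"], record["state"]
    if version > CURRENT_VERSION:
        raise ValueError(f"state format {version} is newer than {CURRENT_VERSION}")
    while version < CURRENT_VERSION:
        state = UPGRADES[version](state)
        version += 1
    return state
```

Chaining single-version steps means the new side only needs one converter per format change, rather than one per pair of old and new versions.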
"There is NO software limitation to needing to reboot. Any software limitation can be removed from being a limitation by proper design of the functionality."
"Wrong. Their own paper (called 'Ksplice: Automatic Rebootless Kernel Updates') says:"
Actually, that is only a limitation of their implementation. There really is no software limitation unless you design one in.
"A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory."
This is true of any implementation; but you can work around it.
ksplice seems to try to keep as much running as possible while making the transition. Thus their documented limitation.
On the other hand, you can temporarily reduce the system to a minimal state; serialize the minimal state of the drivers and other parts (e.g. memory allocation tables), with each part responsible for serializing itself; unload the drivers, etc.; transfer to the new kernel, telling it where to find the serialized data; and let the new kernel load the drivers, deserialize the data, and resume operation.
I'm not saying it's an easy task - but it is accomplishable and it removes any software limitations.
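The sequence described above could be sketched roughly like this. It is a toy model only: the `Subsystem` class, its `save_state`/`restore_state` methods, and the two hand-off functions are hypothetical names, and a JSON blob stands in for whatever memory region the new kernel would be pointed at.

```python
import json
from typing import Dict, List


class Subsystem:
    """Toy stand-in for a driver or other kernel part (hypothetical interface)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict = {}
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def save_state(self) -> dict:
        # Each part decides for itself what minimal state must be carried over.
        return dict(self.state)

    def restore_state(self, saved: dict) -> None:
        self.state = dict(saved)


def old_kernel_handoff(parts: List[Subsystem]) -> bytes:
    """Old side: serialize each part's state, then 'unload' it."""
    blob: Dict[str, dict] = {}
    for part in parts:
        blob[part.name] = part.save_state()
        part.stop()
    return json.dumps(blob).encode()


def new_kernel_resume(parts: List[Subsystem], blob: bytes) -> None:
    """New side: load each part, hand it its saved state, resume."""
    saved = json.loads(blob.decode())
    for part in parts:
        # A part with no saved state (e.g. a newly added driver) starts empty.
        part.restore_state(saved.get(part.name, {}))
        part.start()
```

The hard part in practice is deciding what each part's minimal state is; the orchestration itself is the easy bit.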
kexec, on the other hand, shuts down the kernel entirely and does nothing to serialize the individual kernel parts - it just shuts down the entire system and restarts the system on the new kernel instead of initiating a hardware reset.
NT has had this capability on all supported architectures since Win2K3 SP1 for both user-mode and kernel-mode components.
I'm not sure it's that useful because any installation which has high enough reliability requirements to use hotpatching probably organizes its services in a cluster which can be patched through more normal mechanisms without downtime.
The code that makes this work is kinda cool though, so I can see why it was developed.
That is the one thing I hate about Windows - the stupid idea of locking files; whoever designed such a stupid principle needs to be fired from Microsoft because it lacks all degree of common sense. It not only affects the kernel: try uninstalling an application that fails to unload its shared libraries, and the locked files result in a whole heap of crap left over after the uninstall.
There must be a fundamental difference in how files are handled in Windows and *nix. On a *nix system, when a process reads or writes a file, and this file is deleted, renamed or replaced by another process, the first process will still see the old file until it closes its file handle.
I trust that loading libraries works the same way, although I don't know the details. I guess the trick is that files are considered the same if they are the same inode, regardless of where or when you found them in the file system tree.
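The *nix behaviour described above is easy to try out. This is a small sketch that assumes a POSIX system; the file name `libexample.so` is made up, and the "two processes" are simulated inside one script for brevity.

```python
import os
import shutil
import tempfile


def read_after_replace() -> tuple:
    """Replace a file while a reader holds it open; report what each side sees."""
    workdir = tempfile.mkdtemp()
    try:
        path = os.path.join(workdir, "libexample.so")  # hypothetical file name
        with open(path, "w") as f:
            f.write("old version")

        reader = open(path)  # "process A" keeps this handle open
        old_inode = os.fstat(reader.fileno()).st_ino

        # "Process B" installs a new file under the same name.
        new_path = path + ".new"
        with open(new_path, "w") as f:
            f.write("new version")
        os.replace(new_path, path)  # rename over the old name

        seen_by_reader = reader.read()  # still the old inode's contents
        reader.close()

        with open(path) as f:  # a fresh open finds the new inode
            seen_by_new_open = f.read()
        inode_changed = os.stat(path).st_ino != old_inode
        return seen_by_reader, seen_by_new_open, inode_changed
    finally:
        shutil.rmtree(workdir)
```

The open handle refers to the inode rather than the name, so the old contents stay readable until that handle is closed, while anyone opening the path afterwards gets the new file.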
This is true in NT as well. But in UNIX the locking of files is advisory by default and shared by default (i.e. if I open a file anyone else can also open/modify the file and if I say I want to lock it other guys have to check the lock bit in order to respect my wishes). As a design choice (and to increase compatibility with older Windows), NT locking is mandatory and the default (if you specify no flags) is to disallow sharing. The library loader also seems to disallow sharing for delete as well.
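The advisory part can be demonstrated on the UNIX side with `flock`. This is a minimal sketch assuming a Linux or other POSIX system with a local temp directory; the "polite" and "rude" openers are just illustrative names for a second opener that does or does not check the lock.

```python
import fcntl
import os
import tempfile


def write_despite_lock() -> tuple:
    """Hold an exclusive advisory lock, then see what other openers can do."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        fcntl.flock(fd, fcntl.LOCK_EX)  # exclusive, but only advisory

        # A "polite" opener checks the lock and finds it taken.
        with open(path, "r+") as polite:
            try:
                fcntl.flock(polite, fcntl.LOCK_EX | fcntl.LOCK_NB)
                polite_was_blocked = False
            except BlockingIOError:
                polite_was_blocked = True

        # A "rude" opener never checks, and nothing stops it from writing.
        with open(path, "w") as rude:
            rude.write("overwritten")

        with open(path) as f:
            return polite_was_blocked, f.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

The lock only matters to code that asks about it, which is the opposite of the NT default described above, where the open itself is refused unless sharing was requested.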
As cetp mentions, this may be a conscious design choice because structures shared across process boundaries by DLLs (this can happen via things like Window Messages) may not tolerate having two different versions of the DLL loaded and interacting with each other. There's nothing in the NT Kernel architecture that prevents you from replacing a DLL when the program is still running (you can even do this yourself and get around the file locking problem by just renaming/moving the thing you want to replace and no one will stop you).
Right on. Either a service is so important that it is already in a failover or cluster configuration and rebooting is not a problem or it's just not important enough and then rebooting isn't a problem either.
It's very good news for the server world, and for users like me who hate rebooting.
Let's hope we'll get there on Windows too, eventually. (Vista has gone a long way, but some things are still lacking.) Now the only reasons I reboot my Vista box are patch day or when I plug my projector/2nd screen in again (the NVIDIA drivers don't recognize what's connected to the 2nd DVI slot without rebooting).
Seriously, there are 2 kinds of services: those that you can afford to bring down (in a planned service window), and those that you cannot.
For the former, just plan the reboot. A reboot should not be 'risky' - if it is, the machine is not properly maintained. In fact, it's a good test if all services come back up correctly, to be ready for a forced reboot (such as a hardware failure).
For the latter, you need failover anyway, so you can just reboot the servers in the cluster one at a time.
Because for some things complete failover is really hard (or just expensive) to set up, like Linux with specific hardware for running telephony systems.
Even 0.02 cents per machine could be too expensive if there are a lot of machines. For example, it would be really nice if the update of your cable modem didn't disconnect you from the internet for a couple of minutes.
For some things it's just easier on the administrative side, like a cluster running calculations, where you don't want to move jobs around, reboot the nodes that haven't been rebooted yet, move the jobs back, reboot the other nodes, and keep track of which ones you have and haven't done.
There could be many reasons. :-)




