Out of curiosity, if the capture kernel were preloaded, wouldn't it be just as vulnerable to getting trashed as anything else in mem? Conversely, if it's not preloaded, how do you ensure that whatever block of code is responsible for loading the capture kernel isn't itself trashed? Does it require hypervisor support to be reliable?
You're correct.
Any code running on bare metal in kernel mode has the potential to barf all over memory until it barfs all over itself. Only hardware virtualization can contain the damage. But this is an unlikely scenario. The most common cause of data corruption is that the code that's supposed to be playing with the data does something wrong. The odds of the capture kernel getting trashed are slim because nothing in the production kernel is supposed to be playing with its data.
So, there is no design that guarantees that we will always be able to dump a crashed host kernel. But we can dramatically increase our chances by using a separate capture kernel, and nothing in a virtual machine can negatively impact the hypervisor's ability to dump it. The hypervisor could crash on its own, of course, but not because the virtual machine crashed.

If anyone is interested in the kexec-based hibernation proposal, you should also read the OLS paper on kdump:
https://ols2006.108.redhat.com/2007/Reprints/goyal-Reprint.pdf
This is shaping up to be the most reliable kernel crash dump facility out there, including the commercial UNIX implementations. Those implementations simply hope that the crashed kernel is sane enough that its memory management and I/O subsystems can still be relied on, so certain kinds of crashes will always result in dump failures.
The Linux kdump implementation uses a fresh kernel with a working userspace to dump the crashed kernel. It can use the crashed kernel's page tables to filter unwanted pages from the dump (userspace, empty, free, and cache pages), but it can produce a full-memory dump even if the crashed kernel got completely trashed by, let's say, an errant DMA. A working userspace means that dumping to NFS volumes, USB keys, DVD-RW, or a remote file over SSH is simple to implement.
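To make the filtering idea concrete, here is a toy sketch in Python. It is not the real dump tooling: the Page record, the PageKind categories, and the metadata_trusted flag are all hypothetical names invented for illustration. It only shows the decision described above: drop the unwanted page types when the crashed kernel's bookkeeping can be trusted, and fall back to a full dump when it can't.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PageKind(Enum):
    """Hypothetical page categories, mirroring the ones named in the comment."""
    KERNEL = auto()
    USER = auto()
    ZERO = auto()
    FREE = auto()
    CACHE = auto()


@dataclass(frozen=True)
class Page:
    pfn: int          # page frame number
    kind: PageKind    # category read from the crashed kernel's metadata


# Page kinds that are normally not worth saving in a crash dump.
EXCLUDABLE = frozenset(
    {PageKind.USER, PageKind.ZERO, PageKind.FREE, PageKind.CACHE}
)


def select_pages(pages, metadata_trusted, exclude=EXCLUDABLE):
    """Return the pages to write to the dump.

    If the crashed kernel's metadata cannot be trusted (say, it was
    overwritten by an errant DMA), keep every page: a full-memory dump.
    """
    if not metadata_trusted:
        return list(pages)
    return [page for page in pages if page.kind not in exclude]
```

Calling select_pages(pages, metadata_trusted=True) keeps only the kernel pages in this model, while metadata_trusted=False returns everything.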
The key to this is the relocatable kernel, which allows the kernel to be loaded (almost) anywhere in memory. The only question is: when and where do we load the capture kernel? As mentioned in the LKML posting about hibernation, it is possible to load the capture kernel as soon as we need it, making room presumably by writing the memory region's contents to persistent media.
This might work for hibernation, where we are reasonably sure that the kernel is healthy, but it won't be reliable enough for crash dumping. In fact, to support crash dumping during the boot process, we would prefer to pre-load the capture kernel in reserved memory before we load and boot the production kernel. This approach involves convincing administrators to reserve a small but significant amount of memory (currently 2-10MB) that cannot be used for production in exchange for reliable crash dumps.
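On Linux, the reservation is requested with the crashkernel= boot parameter, in its basic form crashkernel=size[@offset]. As a rough sketch, a script could check what a given kernel command line reserves like this. The function name parse_crashkernel and the sample command lines are my own, and only the simple size[@offset] form is handled; anything else (such as the range-based syntax) just returns None.

```python
import re

# Multipliers for the optional K/M/G suffixes.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Matches only the simple "crashkernel=size[@offset]" form.
_PATTERN = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMG]?)(?:@(\d+)([KMG]?))?(?=\s|$)",
    re.IGNORECASE,
)


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes_or_None), or None if not found."""
    match = _PATTERN.search(cmdline)
    if match is None:
        return None
    size = int(match.group(1)) * _UNITS[match.group(2).upper()]
    if match.group(3) is None:
        offset = None  # no explicit load address was requested
    else:
        offset = int(match.group(3)) * _UNITS[match.group(4).upper()]
    return size, offset


if __name__ == "__main__":
    # Sample command line made up for illustration.
    print(parse_crashkernel("ro root=/dev/sda1 crashkernel=64M@16M quiet"))
```

On a running system the same function could be fed the contents of /proc/cmdline.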
Very few users will want both hibernation and crash dump support on the same system. Production servers don't typically hibernate, and mobile devices don't typically require first-failure data capture. So users can either choose to have reliable crash dumps or not. They can still hibernate without any reserved memory, and they can still get successful dumps most of the time if they really want.
There is another option that will come into play as virtualization becomes more prevalent. The hypervisor can be used as the capture kernel if any of its guest kernels crash. Virtualization can also be used for hibernation. The hypervisor can dump its guests along with their states and then simply shut itself down. On reboot, the hypervisor can resume its guests from where they left off.
The inverse situation, where a guest dumps the hypervisor if it crashes, is actually pretty similar in concept to the current kexec-based kdump design. That raises the question of whether kexec could eventually become a degenerate case of the kvm code. That would be a big win for maintainability and quality. Linus would approve.