This, once again, confuses two separate implementation issues as if they were one. I routinely restart crashed device drivers on monolithic systems, most notably USB devices on Linux.
It's not an implementation issue, it's a design issue. If the driver and the kernel are co-located, and the driver crashes, you cannot safely restart it. A restart might be possible, but you have no guarantee that the kernel was not compromised during the crash.
You are assuming that when a disk driver fails it'll do so in an obvious way with almost no side effects. This is called "fail fast, fail silent" in the literature. But it's not typically the way drivers fail. Disk drivers fail, as often as not, because the hardware did something the designer did not expect, and I, for one, want to know what that was. (Real disk devices fail in intermittent ways, and the damage can be long done before the driver falls over.)
That's fine, the device can log the fault or whatever, and ask the user whether they want to remount the device. The recovery mechanism you want isn't really pertinent here. What is pertinent is whether recovery is possible. In a monolithic design, recovery is not possible, not in any trustworthy or robust fashion.
It's not an implementation issue, it's a design issue. If the driver and the kernel are co-located, and the driver crashes, you cannot safely restart it. A restart might be possible, but you have no guarantee that the kernel was not compromised during the crash.
It's both a design and an implementation issue, and for some kinds of crashes you can make such guarantees, which is why monolithic kernels do have restartable drivers.
The recovery mechanism you want isn't really pertinent here. What is pertinent is whether recovery is possible. In a monolithic design, recovery is not possible, not in any trustworthy or robust fashion.
A microkernel increases neither the trustworthiness nor the robustness of driver recovery. All it does is change the way in which faults fail to be contained.
A driver with an error in bad block handling may well pass a lot of bad data on to the file system layer before it falls over, for instance, and whether the driver is message passing and in a separate address space or not has zero impact on the system's ability to contain that kind of error. (Such an error took eBay down for three days a couple of years ago, getting them front page coverage in the local press...)
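To make that point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `buggy_driver_read` stands in for a driver with a bad block bug, and `deliver_as_message` stands in for IPC between separate address spaces. The message boundary copies the bytes faithfully and validates nothing, so the bad data arrives intact. Only an end-to-end check (a CRC32 here, chosen purely for illustration) notices it.

```python
import zlib


def write_block(data: bytes) -> tuple[bytes, int]:
    """File system side: keep a CRC32 alongside the stored payload."""
    return data, zlib.crc32(data)


def buggy_driver_read(stored: bytes) -> bytes:
    """Hypothetical driver bug: flips the first byte but still reports success."""
    corrupted = bytearray(stored)
    corrupted[0] ^= 0xFF
    return bytes(corrupted)


def deliver_as_message(payload: bytes) -> bytes:
    """Stand-in for IPC across address spaces: a faithful copy, no validation."""
    return bytes(payload)


def read_block(stored: bytes, expected_crc: int, driver) -> bytes:
    """Read through the driver and verify the payload end to end."""
    data = deliver_as_message(driver(stored))
    if zlib.crc32(data) != expected_crc:
        raise OSError("checksum mismatch: driver returned corrupted data")
    return data


if __name__ == "__main__":
    stored, crc = write_block(b"important file contents")
    # The isolation boundary alone lets the corrupted bytes straight through.
    print(deliver_as_message(buggy_driver_read(stored)) == stored)
    try:
        read_block(stored, crc, buggy_driver_read)
    except OSError as err:
        print(err)
```

Nothing in the sketch depends on where the driver lives. The checksum would catch the same corruption from an in-kernel driver, and without the checksum the bad data would spread in either design.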
Address space separation contains exactly one kind of fault, which, while particularly difficult to debug, isn't even in the top ten list of ways in which broken drivers cause propagation of corrupted data.
And yes, there are ways to reduce your exposure to that particular kind of fault without going to separate address spaces. See, for instance, sparse data address space models or self-repairing data structures.
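As one possible reading of "self-repairing data structures", here is a small Python sketch of a doubly linked list that uses its redundant back pointers to rebuild damaged forward pointers. The class and method names are made up for illustration, and the sketch assumes a single-fault model in which the `prev` chain and the `tail` reference are still intact. It is not a description of any particular kernel's technique.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None


class RobustList:
    """Doubly linked list whose prev pointers act as a redundant copy of next."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def repair_forward_links(self):
        """Walk backward from the tail and rewrite any wrong next pointer.

        Assumes the prev chain is intact. Returns the number of links fixed.
        """
        fixes = 0
        node, successor = self.tail, None
        while node is not None:
            if node.next is not successor:
                node.next = successor
                fixes += 1
            successor, node = node, node.prev
        self.head = successor  # first node in the chain, or None if empty
        return fixes

    def values(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


if __name__ == "__main__":
    lst = RobustList()
    nodes = [lst.append(v) for v in (1, 2, 3, 4)]
    nodes[1].next = None  # simulate a stray write clobbering one pointer
    print(lst.values())
    print(lst.repair_forward_links())
    print(lst.values())
```

The repair only works because the list carries more information than it strictly needs. That redundancy is the price of surviving a stray write without a hardware-enforced boundary.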

"Hell, even on a desktop: I'd rather the disk driver crash and silently restart itself than wait around for a full reboot."
Exactly, and this lies at the base of why I like the muK design; contrary to many, I think the muK is much better suited to desktops than a monolithic design.
This, once again, confuses two separate implementation issues as if they were one. I routinely restart crashed device drivers on monolithic systems, most notably USB devices on Linux.
You may want a disk driver to crash and restart silently, but I do not, because I understand both the fault model of disk devices and the fault containment issues with disk drivers.
You are assuming that when a disk driver fails it'll do so in an obvious way with almost no side effects. This is called "fail fast, fail silent" in the literature. But it's not typically the way drivers fail. Disk drivers fail, as often as not, because the hardware did something the designer did not expect, and I, for one, want to know what that was. (Real disk devices fail in intermittent ways, and the damage can be long done before the driver falls over.)