Andrew Tanenbaum has introduced his latest metric, LFs (Lifetime Failures), to describe the number of times software, particularly the operating system, has crashed in a user's lifetime.
FTA: Tanenbaum was critical of software design today, saying there were far too many features, many of which were unnecessary in applications. He said as software gets more bloated, it becomes less reliable, more buggy, and slow.
Continually refactoring code and eliminating “extra stuff” either by abstracting it or re-evaluating its purpose and need is a great way to get there.
Unfortunately, this should also go along with abandoning backward compatibility. Backward compatibility is the cause of a great deal of hacks and bad code, and Linux is definitely falling into this trap.
Linux must also use the Unix/X Window paradigm, which is very old. Now that Mac OS X is being officially Unix certified, it must be stuck in the past too.
“Maybe the direction Linux could go would be [as] the system that is ultra reliable, that works all the time and has not got all the problems that you get in Windows,” he suggested.
I think Linux is definitely much more reliable than Windows already, but the real problem I see is in the differences between Linux as a server and Linux as a desktop. I know of some Linux servers that have been up and running for years with only minimal scheduled downtime. My Ubuntu desktop on the other hand has to be rebooted every 7 to 21 days because it starts slowing down too much. I suppose that's still better than my coworkers, who have to reboot their XP machines every couple of days.
My Ubuntu desktop on the other hand has to be rebooted every 7 to 21 days because it starts slowing down too much.
Then you’re doing something wrong. I’ve had a heavily used Gentoo desktop run for up to 72 days without a problem, kept up to date and during a still ongoing ~x86 to x86 transition.
Outside of changing kernels or power loss, there's nothing that forces a reboot.
The problem that I had was that dbus stopped responding and I couldn't get it restarted. This happened after the system was updated; a bad dependency, I'm assuming. The system has since been updated several more times and seems to be running beautifully. As well as being my primary desktop, it also serves as our SFTP server and the syslog server for our firewalls. Currently uptime is reporting:
14:46:44 up 9 days, 2:56
Let's see if it can beat its previous record of 23 days before running like complete crap.
I've had to restart X many times in Kubuntu. The GUI isn't such a strong link, but all the attention goes to the kernel. For a server that's OK, but if you are doing a multimedia task that can't be interrupted, it's a big deal.
A small bug free kernel is great but we really have to look at the whole system.
You’ve got a strange experience.
My system, although not Ubuntu but Arch, has been "up" for half a year (I use software suspend instead of shutting down) and feels as snappy as ever.
Sometimes Firefox eats all the memory, or other strange things happen, but OK, I shut it down and restart it. The system instantly gets all the RAM and resources back and everything is fine again…
I've always asked myself how a modern OS environment could "slow down" silently. I gave the Windows platform credit for doing the "impossible", although infamously.
P.S.: It is possible to store resources in the X server and not free them, so it could be that Xorg gets bigger and bigger. Even so, those unused resources should be swapped out anyway, and my swap has barely gone anywhere near "full" yet.
Software is a complex thing. Any commercial software will face a feature-to-feature comparison with its competitors. Marketing people love new features. Features matter, even though most of them are useless for most users, and a feature that is useless to you may be extremely useful to someone else. This is a major reason new features keep being added to existing products without stop. And so software gets more bloated, and it becomes less reliable, more buggy, and slow.
The basic flaw I see in the analogy is that a TV is a single-purpose device where you don't install new things, etc.
Don't you think we could make a better computer if we knew it would only ever launch a browser, and that browsing the internet was its only purpose?
Because computers are multipurpose and users can install hundreds of different pieces of software on hundreds of different pieces of hardware, this results in interactions between components that are sometimes unknown during software development and testing, and that is why you see unknown or new bugs.
However, I see where Tanenbaum is going, and I agree with him that his ideas, if implemented properly, can provide a more reliable system… of course at the cost of performance.
The basic flaw I see in the analogy is that a TV is a single-purpose device where you don't install new things
Step back from the TV and examine the entire home entertainment system. You typically don't have to worry about a cassette deck reducing the reliability of your CD player or your DVD player or your television or whatever else you choose to add to it. The reason is simple: the entire system is built up from modular components that interact via well-defined interfaces, which is what our protagonist suggests should be done with OSes.
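To put the same idea in software terms: components that only ever talk to each other through a fixed interface can be added, swapped or removed without reaching into each other's internals. A rough C sketch of the pattern (the struct and function names are made up for illustration, not taken from any real kernel):

```c
#include <stdio.h>

/* A hypothetical "component" interface: callers see only these
 * function pointers, never the component's internals. */
struct media_component {
    const char *name;
    int  (*power_on)(void);
    int  (*play)(const char *source);
    void (*power_off)(void);
};

/* One concrete component hidden behind the interface. */
static int  cd_power_on(void)        { printf("cd: ready\n");           return 0; }
static int  cd_play(const char *src) { printf("cd: playing %s\n", src); return 0; }
static void cd_power_off(void)       { printf("cd: off\n"); }

static struct media_component cd_player = {
    .name      = "cd",
    .power_on  = cd_power_on,
    .play      = cd_play,
    .power_off = cd_power_off,
};

int main(void)
{
    /* The "system" only ever goes through the interface, so replacing
     * this component can't disturb any other component's state. */
    struct media_component *c = &cd_player;
    if (c->power_on() == 0)
        c->play("track01");
    c->power_off();
    return 0;
}
```

Microkernels push roughly this discipline all the way down: drivers and services sit behind message-passing interfaces instead of sharing one big address space.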
You do have to worry about your entire home entertainment system if you're using HDMI. This is where the future is going. Digital connections of different revisions connecting your components will cause handshaking issues. I have this problem right now: my DVDs sometimes go green or pink because the TV thinks it's getting RGB instead of YCbCr.
Step back from the computer and see what a mess that is too. I joke here at work when something doesn’t work right. I say “See the problem here is this…you have a dell computer with a dell keyboard and mouse plugged into a dell monitor via a dell docking station….no wonder you can’t print to your dell printer.”
But these modular components do not have to accommodate new features every now and then.
I don't think that things like RAID or ECC memory can be compared to the concept of self-healing or self-correcting software. RAID and ECC are basically kludgy approaches that rely on redundancy to correct for failures; they don't address the failure directly.
Not that there’s anything wrong with that, of course. RAID and ECC work for what they’re supposed to do, and applying redundancy to critical components is likely far more cost-effective than investing in extra engineering resources to try and prevent the components from failing in the first place.
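For what it's worth, the redundancy trick itself is almost embarrassingly simple. Here's a toy C sketch of the XOR parity idea that RAID 5 leans on; nothing like a real RAID implementation, just the arithmetic:

```c
#include <stdio.h>

#define STRIPE 4  /* bytes per "disk" in this toy example */

int main(void)
{
    unsigned char d0[STRIPE] = { 0xDE, 0xAD, 0xBE, 0xEF };  /* data disk 0 */
    unsigned char d1[STRIPE] = { 0x01, 0x02, 0x03, 0x04 };  /* data disk 1 */
    unsigned char parity[STRIPE], rebuilt[STRIPE];

    /* The parity block is just the XOR of the data blocks. */
    for (int i = 0; i < STRIPE; i++)
        parity[i] = d0[i] ^ d1[i];

    /* Pretend disk 1 died: rebuild it from the survivor plus parity. */
    for (int i = 0; i < STRIPE; i++)
        rebuilt[i] = d0[i] ^ parity[i];

    for (int i = 0; i < STRIPE; i++)
        printf("byte %d: lost %02X, rebuilt %02X\n", i, d1[i], rebuilt[i]);
    return 0;
}
```

Nothing in there fixes the failed disk; it just makes the failure survivable, which is exactly the point being made above.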
We actually have a fair amount of this type of redundancy built into software. If a driver or module fails, the kernel will often try and reload it transparently to the user; hardware components can fail or be removed from the system without necessarily causing system crashes; system mechanisms exist to monitor processes and prevent runaway applications or memory leaks from destabilizing systems, or to prevent errant processes from interfering with others.
Things like this don't address the issue of bad code in the first place, but they accommodate it to an extent without sacrificing the stability of the entire system. It seems no different from having hardware redundancy to account for the possibility of component failure.
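The same accommodation at the process level is basically just a supervisor loop: spawn the worker, wait on it, and respawn it if it dies badly. A rough Unix-style sketch in C (the "/usr/bin/some-daemon" path is a placeholder, and a real init or service supervisor does far more than this):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* Child: run the service we are supervising.
             * "/usr/bin/some-daemon" is a placeholder, not a real program. */
            execl("/usr/bin/some-daemon", "some-daemon", (char *)NULL);
            perror("execl");
            _exit(127);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            return 1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;  /* clean exit: nothing left to supervise */

        fprintf(stderr, "worker died (status %d), restarting in 1s\n", status);
        sleep(1);  /* back off a little so a crash loop doesn't spin the CPU */
    }
    return 0;
}
```

The worker is still buggy; the system as a whole just degrades more gracefully when it falls over.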
Optimized, error-free code is a whole separate issue. And as with most products, the design and engineering come down to a basic cost-benefit analysis: do the time and resources involved in creating flawless code yield a benefit that outweighs cheaper, rapid development with a greater chance of flaws or errors?
Of course, as end users, we want bulletproof applications and we complain endlessly about sloppy, rushed-to-market products, so the answer seems obvious. But at the same time, we are also quick to complain about delayed product releases, prolonged development cycles and that one missing feature, which seems to be different for everyone. We certainly don't want to have to start paying more for software (or at all, in some cases).
Am I proposing that we forgive and accept sloppy code? Not at all. But I think we have to start accepting some of the inherent tradeoffs in present day IT computing paradigms.
The TV analogy, about not needing a reset button, is way off. TVs are closed environments performing simple, unwavering functions using engineered components. Perhaps more tellingly, many higher-end modern flat-panel displays are actually flash-upgradable to accommodate firmware errors or flaws in the original design, so even TVs are becoming less bulletproof as their capability progresses.
If software developers could target a single, mission-specific application running on a particular platform with rigid hardware specifications, we'd probably find those applications much more stable and reliable, but we would lose the flexibility of having a myriad of applications interacting on general-purpose platforms running on an endless variety of hardware configs. There are so many factors and intersection points in present hardware/software design that, frankly, it's remarkable systems are as stable as they are today.
If people want secure, stable computing systems, we’ll start to see closed appliances with fixed applications for general purpose things like email and web browsing; these will have limited functionality for the benefit of increased stability and reliability. But if you want flexibility and choice, that comes with tradeoffs. Can’t have your cake and eat it too.
Not that I'm advocating crappy software; I'm as frustrated as anyone else when an application crashes or my system misbehaves. I just try to keep things in perspective. I want the developers of the applications I use writing smart, intelligent code and addressing errors and points of failure, but I'm willing to accept that not every contingency can be accounted for. There's a tradeoff I'm willing to make between flexibility and usability.
Just my 2c…
Uptime has nothing to do with grandma-proofness. Resetting a device is a relatively minor drawback compared to simply not having a clue what the thing is asking, what so many elements on the screen are for, and having too many options at hand.
It depends on what distro, version, etc. you're talking about.
I don't know if grandma wants Compiz or not. But sure, CentOS would have fewer "Lifetime Failures" than Fedora, Debian Stable fewer than Ubuntu Edgy, and so on.
But the trade-off principle (get bleeding edge, lose stability and leanness) is maybe not the way to go. Maybe a (new release of an) OS should concentrate on clean code more than is the case now.
If you had time traveled twenty years back and told someone that PCs in the future would have 2.5 GHz dual-core CPUs and 1 GB of RAM (amazing numbers), this person would have said, "Cool, it will boot in a split nanosecond and every app will start up in even less!"
“No, sorry, boot time may be a minute, and many apps still need quite a few seconds to pop up.”
“OK.. but no bugs, no crashes then?”
“Wrong again.”
“So.. what on earth went wrong?”
“The programmers got new machines too..”
It's pretty obvious the author of the article didn't have a very strong grasp of the relationship between Minix and Linux.
Saying that Linux is "based on" Minix is perhaps a bit too strong, if not incorrect. "Inspired by" might be a better wording. While the original Linux implemented many of the same features and techniques as Minix (Torvalds was a student of Tanenbaum at the time of its creation), the overall design philosophy is different. Linux is a highly monolithic kernel, while Minix is a modular microkernel design.
Neither do I think that Tanenbaum's suggestion holds much more weight with Torvalds than anyone else's. Surely Torvalds would agree with the ends, but it's been pretty clear over the years that he disagrees with Tanenbaum on the means.
(Torvalds was a student of Tanenbaum at the time of its creation,)
Linus was a student, but not a student of Tanenbaum.
//Saying that Linux is "based on" Minix is perhaps a bit too strong, if not incorrect.//
Definitely too strong, and definitely incorrect. Replace "based on Minix" with "written using a Minix system", and you will get it.
// “Inspired by” might be a better wording.//
It might be, but it isn’t. {Correcting myself, maybe it is: from Wikipedia: “Operating Systems: Design and Implementation and Minix [1] were Linus Torvalds’ inspiration for the Linux kernel.” Also: “Linus was inspired by Minix (an operating system developed by Andrew S. Tanenbaum) to develop a capable Unix-like operating system that could be run on a PC.”}
“Inspired by” it is, then.
// While the original Linux implemented many of the same features and techniques as Minix//
Not really.
// (Torvalds was a student of Tanenbaum at the time of its creation,) //
Are you sure about that? I don’t think this is correct. Torvalds and Tanenbaum come from two entirely different European nations.
http://en.wikipedia.org/wiki/Linus_Torvalds
“In 1990 he purchased an Intel 80386-based IBM PC and spent a few weeks playing the game Prince of Persia before receiving his Minix copy which in turn enabled him to begin his work on Linux.”
Torvalds is from Finland.
http://en.wikipedia.org/wiki/Andrew_S._Tanenbaum
Tanenbaum is from the Netherlands. Dutch.
The relationship between these individuals seems to be limited to a lively internet debate:
http://en.wikipedia.org/wiki/Tanenbaum-Torvalds_debate
//the overall design philosophy is different. Linux is a highly monolithic kernel, while Minix is a modular microkernel design. //
This bit you also got right. The only thing I would change in this text is to say the overall design philosophy is fundamentally different.
//
http://en.wikipedia.org/wiki/Andrew_S._Tanenbaum
Tanenbaum is from the Netherlands. Dutch. //
Correcting myself once again: Tanenbaum is actually from America.
“Tanenbaum was born in New York City and grew up in suburban White Plains, New York. ”
He only works in the Netherlands:
“Dr. Andrew Stuart “Andy” Tanenbaum (sometimes called ast)[1] (born 1944) is a professor of Computer Science at the Vrije Universiteit, Amsterdam in the Netherlands.”
hal2k1 – you should probably print the whole of Wikipedia out, just to make sure that when you're on the can you can still memorise it word for word to impress and correct us all with particulars when the need arises.
//// While the original Linux implemented many of the same features and techniques as Minix//
Not really.//
My understanding was that the original Linux was essentially Minix compatible, supporting the Minix filesystem and binary format – unless that's mistaken, I guess the accuracy of my statement depends on your definition of "many" and how much difference you allow for two similar techniques or features to be considered equivalent.
//// (Torvalds was a student of Tanenbaum at the time of its creation,) //
Are you sure about that? I don’t think this is correct. Torvalds and Tanenbaum come from two entirely different European nations.//
Fair enough, I guess I’m mistaken. I recall reading an article that Linus was a student at the time of Tanenbaum’s work on Minix, but perhaps it was the wording of the article or the way I read it that led me to believe that Linus was a student of Tanenbaum’s.
BTW, our digital cable TV does crash from time to time and requires a reboot. In fact, that just happened this week while we were watching Pirates of the Caribbean 2 on demand.
I also think it's easier to turn on a PC than a TV setup with a remote for the TV, the cable box, the speakers and the DVD/VHS player.
Regarding the TV analogy: the fundamentals of a PC should be rock solid, and in this day and age I think that means everything from the desktop GUI down to the kernel. A compact kernel that has the power to detect a failure and restart a module would rock. Things like bulletproof X and networking/internet connectivity would be very important for many.
Convenience features should be built atop the desktop; I can tolerate some bugginess from that point on, but then again, maybe grandma cannot.
..it fails by failing to fail safe.
This is theorem 27 of Systemantics, an entertaining book.
http://en.wikipedia.org/wiki/Systemantics
Introduce "lemon laws" requiring commercial software to be of high quality. Sure, it will slow development, but has software really improved much in 20 years? Systems are more bloated and slower than ever despite hardware having at least 1000x the RAM, disk space and processor speed.
I suggest:
A very small, ultra-optimised, ROM-based OS written in a low-level language – a maximum size of 16 MB.
Full hardware accelerated graphics and sound using a limited range of hardware codecs. There can’t be more than a dozen *decent* media codecs.
Image formats – JPEG, PNG, TIFF and GIF.
A small RAM-based 2-4 GB HD. A backup "hard drive" facility using cheap digital video tape.
Applications should be ultra small and light. An office suite should be no larger than 1MB and require 1-2MB of RAM.
The WWW ran much faster without junk like Flash/Shockwave, popups and JavaScript.
Personally I think the last great computer experience was WP5.1 on a 486 – staggering performance and efficiency once you knew the function keys.
The problem of software's unreliability lies in the programming languages used to build the software. Programming languages that allow partial functions are where the problem is (for a definition of partial function, see Wikipedia). Not all the assumptions a programmer makes can be expressed in the software, resulting in programs with many paths of unspecified or erroneous behavior.
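To make the point concrete: in C, even plain integer division is a partial function, and nothing in its signature stops a caller from handing it an input it has no answer for. A total version has to push the undefined case into the return type so the caller is forced to deal with it. (The div_checked name below is just for illustration.)

```c
#include <stdbool.h>
#include <stdio.h>

/* Partial: behaviour for b == 0 simply isn't defined, and nothing in
 * the signature warns the caller. */
int div_partial(int a, int b)
{
    return a / b;
}

/* Total: every input produces a defined result; the awkward case is
 * pushed into the type (a success flag plus an output parameter). */
bool div_checked(int a, int b, int *out)
{
    if (b == 0)
        return false;
    *out = a / b;
    return true;
}

int main(void)
{
    int q;
    if (div_checked(10, 0, &q))
        printf("10/0 = %d\n", q);
    else
        printf("10/0 is undefined, and the caller is forced to notice\n");
    return 0;
}
```

Languages with richer type systems can make the second style the default rather than something the programmer has to remember to do.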