Linked by Hadrien Grasland on Fri 28th Jan 2011 20:37 UTC
OSNews, Generic OSes It's recently been a year since I started working on my pet OS project, and I often end up looking backwards at what I have done, wondering what made things difficult in the beginning. One of my conclusions is that while there's a lot of documentation on OS development from a technical point of view, more should be written about the project management aspect of it. Namely, how to go from a blurry "I want to code an OS" vision to either a precise vision of what you want to achieve, or the decision to stop following this path before you hit a wall. This article series aims at putting those interested in hobby OS development on the right track, while keeping this aspect of things in mind.
Thread beginning with comment 460214
RE[12]: Machine language or C
by Neolander on Sun 30th Jan 2011 10:15 UTC in reply to "RE[11]: Machine language or C"
Neolander Member since:
2010-03-08

That's not what I said, but you must have encountered one of the earlier, less clear edits of my post.

I do not advocate the complete lack of software optimization that we see nowadays, though as an aside I think feature bloat and poor design are more to blame there.

What I advocate is simply not optimizing too much, too early.

There are several reasons for this.

First, it's a great way to lose time you could have spent on optimizing something more important. Tanenbaum tells an interesting story on that subject in Modern Operating Systems: one of his students, who worked on MINIX, spent 6 months optimizing the "mkfs" program, which writes a filesystem on a freshly formatted disk, and more months debugging the optimized version in order to make it work.

This program is generally called exactly once in the life of the operating system, so was it really worth the effort? Shouldn't he have cut his optimizer's teeth on something more important, like, say, boot times?

The second reason why early optimization is bad is that, as I mentioned earlier, there's a degree of optimization past which code becomes dirtier and harder to debug. Caching is a good example. Clean code is very, very important, because past a certain degree of dirtiness one can do nothing with code, not even optimize it. So this is a tough decision, one that is not trivial and which should only be made after profiling has shown that the code is actually too slow as is.
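To make the "profile first, then decide" step concrete, here is a minimal sketch in Python. The names slow_lookup, workload, and profile are made up for illustration and the workload is artificial; the point is only that the profiler report, not a guess, tells you whether a cache is worth the extra complexity.

```python
import cProfile
import io
import pstats
from functools import lru_cache


def slow_lookup(n):
    """Hypothetical hot spot: recomputes the same value on every call."""
    return sum(i * i for i in range(n))


def workload():
    # Artificial workload that asks for the same value again and again.
    return [slow_lookup(2000) for _ in range(200)]


def profile(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


# Only once the report shows slow_lookup dominating the run time
# does the cached variant earn its place in the code base.
cached_lookup = lru_cache(maxsize=None)(slow_lookup)
```

Calling profile(workload) returns the workload's result together with a report sorted by cumulative time. If slow_lookup is not near the top of that report, the cache would only add state to debug.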

Score: 1

RE[13]: Machine language or C
by Alfman on Sun 30th Jan 2011 22:22 in reply to "RE[12]: Machine language or C"
Alfman Member since:
2011-01-28

"Second reason why early optimization is bad is that, as I mentioned earlier, there's a degree of optimization past which code becomes dirtier and harder to debug."

Rewriting code in assembly (for example) is usually a bad idea even after everything is working, and it's surely even worse to do it before. But this isn't the sort of optimization I'm referring to at all.

Blanket statements like "premature optimization is the root of all evil" put people in the mindset that it's ok to defer consideration of efficiency in the initial design. The important factors proposed are ease of use, manageability, etc. Optimization and efficiency should only be tackled at the end.

However, some designs are inherently more optimal than others, and switching designs midstream in order to address efficiency issues can involve a great deal more difficulty than addressing those issues up front would have.

For a realistic example, see how many Unix client/server apps start by forking a process for each client. This design, while easy to implement up front, tends to perform rather poorly. So now we have to add incremental optimizations such as preforking and adding IPC, then we have to support multiple clients per process, etc.

After all this work, the simple app + optimizations end up being more convoluted than a more "complicated" solution would have been in the first place.

The Apache project is a great example of where this has happened.
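For reference, the fork-per-client starting point I have in mind looks roughly like the sketch below. It is an illustration in Python for a POSIX system, not code from Apache or any real server; handle_client and serve_fork_per_client are made-up names, and the "protocol" is a one-shot echo.

```python
import os
import socket


def handle_client(conn):
    """Echo one message back, then close (stand-in for real request handling)."""
    with conn:
        data = conn.recv(1024)
        if data:
            conn.sendall(data)


def serve_fork_per_client(listener, max_clients):
    """Accept max_clients connections, forking one child process per client."""
    children = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: the listening socket belongs to the parent.
            listener.close()
            try:
                handle_client(conn)
            finally:
                os._exit(0)  # never fall back into the parent's accept loop
        # Parent: the child owns this connection now.
        conn.close()
        children.append(pid)
    # Sketch only: reap children at the end instead of handling SIGCHLD.
    for pid in children:
        os.waitpid(pid, 0)
    return len(children)
```

It is easy to see why people start here: the whole server is one loop and one fork() call. Every later optimization (preforking, passing work over IPC, several clients per process) has to be bolted onto that loop.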


The Linux kernel has also made some choices up front which have made optimization extremely difficult. One such choice has been the dependence on kernel threads in the filesystem IO layer. The cement has long dried on this one. Every single file IO request requires a kernel thread to block for the duration of IO. Not only has this design been responsible for numerous lockups with network file systems, because it is very difficult to cancel threads safely, but it has also impeded the development of efficient asynchronous IO in user space.

Had I been involved in the development of the Linux IO subsystem in the beginning, the kernel would have used async IO internally from the get-go. We cannot get there from here today without rewriting all the filesystem drivers.

The point being, sometimes it is better to go with a slightly more complicated model up front in order to head off complicated optimizations at the end.
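As an illustration of a "slightly more complicated model up front", here is a sketch of a server that handles several clients in one process using non-blocking sockets and readiness notification (Python's selectors module). To be clear, this is user-space IO multiplexing, not kernel async IO, and it is not how any particular project does it; serve_event_loop is a made-up name and the protocol is again a one-shot echo.

```python
import selectors
import socket


def serve_event_loop(listener, max_clients):
    """Serve max_clients echo clients from one process, never blocking on one."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    served = 0
    while served < max_clients:
        for key, _events in sel.select():
            sock = key.fileobj
            if sock is listener:
                # New client: watch it for incoming data.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(1024)
                if data:
                    # Assumes a small payload that fits the send buffer;
                    # a real server would also wait for EVENT_WRITE.
                    sock.sendall(data)
                sel.unregister(sock)
                sock.close()
                served += 1
    sel.unregister(listener)
    sel.close()
    return served
```

This takes more bookkeeping than a fork() per client, but supporting many clients per process is the starting point here rather than a retrofit.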

Edited 2011-01-30 22:36 UTC

Score: 2

RE[14]: Machine language or C
by Neolander on Mon 31st Jan 2011 07:35 in reply to "RE[13]: Machine language or C"
Neolander Member since:
2010-03-08

"Blanket statements like "premature optimization is the root of all evil" put people in the mindset that it's ok to defer consideration of efficiency in the initial design. The important factors proposed are ease of use, manageability, etc. Optimization and efficiency should only be tackled at the end.

However, some designs are inherently more optimal than others, and switching designs midstream in order to address efficiency issues can involve a great deal more difficulty than addressing those issues up front would have.

For a realistic example, see how many Unix client/server apps start by forking a process for each client. This design, while easy to implement up front, tends to perform rather poorly. So now we have to add incremental optimizations such as preforking and adding IPC, then we have to support multiple clients per process, etc."

Actually, I think this whole fork() thing started as a memory usage optimization. On systems with a few kilobytes of memory like the ones which UNIX was designed to support, being able to have two processes using basically the same binary image was a very valuable asset, no matter the cost in other areas.

Nowadays, however, even cellphones have enough RAM that fork() is a waste of CPU time and coding effort (hence the Symbian team's decision not to include it a while ago). But for legacy reasons, and because of its mathematical elegance, it still remains.

"After all this work, the simple app + optimizations end up being more convoluted than a more "complicated" solution would have been in the first place.

The Apache project is a great example of where this has happened."

All hail the tragedy of legacy code which sees its original design decisions becoming irrelevant as time passes (the way I see it).

"The Linux kernel has also made some choices up front which have made optimization extremely difficult. One such choice has been the dependence on kernel threads in the filesystem IO layer. The cement has long dried on this one. Every single file IO request requires a kernel thread to block for the duration of IO. Not only has this design been responsible for numerous lockups with network file systems, because it is very difficult to cancel threads safely, but it has also impeded the development of efficient asynchronous IO in user space."

I think that if we looked deeper, we'd find legacy reasons there too. After all, as Linux was designed as a clone of ye olde UNIX, maybe it had to behave the same way on the inside for optimal application compatibility? Asynchronous IO is, like microkernels, a relatively recent trend, which has only been made possible by computers becoming powerful enough to largely afford the extra IPC cost.

Edited 2011-01-31 07:42 UTC

Score: 1