Climbing the Kernel Mountain

Guest post by Emmanuel Marty 2002-08-13 General Development 16 Comments

So, you want to write an operating system. We discussed earlier a generic set of considerations that are important, from my experience, for this type of adventure. We proceed to look at solutions to the problem of actually getting started with writing your system: how to do it when you know you don’t know what you’re doing, making it work before making it work fast, and what to do when things go wrong.

Introduction

You can spend no more than an hour researching online, and you will find literally hundreds of open-source operating system projects. Deep down, the developers of these projects hope to become the next Linus. Their dream is to complete a working system, and be swamped in e-mail from all over the world with stories of how reliable and friendly their nanokernel-based system is.

All of us are happy if others find the creation of our own design useful. To paraphrase Frederick P. Brooks Jr., author of “The Mythical Man-Month”, desire for praise starts early, with children making pencil holders “for daddy’s office”.

Going back to the online projects, most are unfortunately stuck in the dream stage or boot loader stage. Grandiose plans are expressed with a boot sector and sometimes a “hello world” in C, last modified two years before.

There are many reasons for operating system projects to end that early in the cycle. In a previous article, I shared my experience of developing a closed source project to completion and listed generic points. They do not guarantee success, but in my experience they are important to help focus on the end goal. Weed the distractions out. Get the job done.

You have decided on your camp. You know your audience. You have decided to get real about what you can do. You have agreed to be the benevolent dictator of your project. Well, you still have to write your operating system, now don’t you?

Cover your bases

Systems programming is the hardest form of programming. To paraphrase the good Mr. Brooks Jr. again, compilers are three times more complex than regular applications, and operating systems are three times more complex than compilers. That was his rule of thumb in 1975, after managing the IBM OS/360 project to completion. If we consider how much is expected from operating systems today, we can venture that the chasm is even wider. Ironically, phenomenal progress has been made in easing application development, while you still need to write kernel code with your bare hands.

Let it sink in. You want to climb Mount Everest. It is fun and rewarding to climb Mount Everest, but you don’t want to do that with a small plastic hammer, wearing shorts and a t-shirt, at minus fifty degrees.

As your first operating system project is by definition the first one, you want to give yourself some credit and just dive in, after you have all the basics covered. If operating system programming is at least nine times harder than regular applications programming, logic would dictate that you had better be a good applications programmer. You had better know how to manipulate all basic data structures. Lists, arrays, hash tables, arrays of function vectors, and bit arrays will all become useful as you develop your kernel. Any error in manipulating them will make you a new member of the triple-fault club. Worse, subtle non-systematic bugs will annoy you for days and take the fun out of it.

Good software engineering dictates modularity. This is not just for teaching in computer science classes. If you’re a good programmer, you’re lazy. You don’t want to spend hours isolating a bug to finally locate it in your umpteenth list insertion inline code. Make one set of dependable primitives, and use them. The net effect in terms of bugs and frustration saved is tremendous.
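As a minimal sketch of what one set of dependable primitives could look like, here is an intrusive, circular doubly linked list in C. The names (list_node, list_init, list_insert_tail, list_remove, list_length) are made up for illustration and are not taken from any particular kernel. The only point is that insertion and removal live in one small, tested place instead of being retyped inline all over the code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
struct list_node {
    struct list_node *prev;
    struct list_node *next;
};

/* An empty list is a head node that points at itself. */
void list_init(struct list_node *head)
{
    head->prev = head;
    head->next = head;
}

int list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Insert node just before the head, that is, at the tail of the list. */
void list_insert_tail(struct list_node *head, struct list_node *node)
{
    assert(node != head);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink node and leave it self-linked, so removing it twice is harmless. */
void list_remove(struct list_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node;
    node->next = node;
}

/* Count the nodes on the list, not including the head itself. */
size_t list_length(const struct list_node *head)
{
    size_t n = 0;
    const struct list_node *it;

    for (it = head->next; it != head; it = it->next)
        n++;
    return n;
}
```

A thread or timer structure would embed a struct list_node as a member, and every queue in the kernel would go through these same few functions. If one of them is wrong, you fix it once.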

A fellow systems programmer, who responded to a previous piece, mentioned he saw the same attitude while a mentor for aspiring game writers. People who had never written a working Tetris game contacted him and asked “How do I write Quake?”. Get real and make sure you stand on solid ground before progressing higher on the mountain.

Any good practice of software engineering applies to operating system development, and is at least nine times as important. I suggest you draft at least some sort of specifications and design document, keep things modular, write test suites, use version control, a bug database, and all sorts of good practice you may have encountered while writing other kinds of software.

As a systems programmer, you know you don’t know what you are doing. So do it carefully.

Sharpen your tools

It is crucial you prepare a reliable set of tools before you get started. You will find this kind of advice in any book on software development so let me be more specific. You will encounter really weird problems, like hard reboots, or registers being trashed in your thread for no logical reason. Your scheduler may pick the wrong task, or execute code in the wrong location. Even with the best programming abilities, miscommunication between programmers, or your brain spacing out for two minutes will do that, and when it happens, you need to be certain that your code did it, and not your tools.

I tried to be smart once, and started using gcc 3.0 for development, instead of the 2.95 series. I was happy to find the new compiler reported more useful warnings and was a lot cleverer about subtle potential problems in the code, so I started adopting it for daily work. All was good, until I tried using it on StrongARM. The scheduler would insist on running the idle task, which strongly reduced the usefulness of our system. The bug occurred in portable code, but the StrongARM code base was new at the time, and there is no end to the side effects it could introduce. In my mind, a compiler couldn’t generate subtly wrong code, so I never questioned it. Out of things to try, I reverted to gcc 2.95. Sure enough, it made the code work, and disassembly showed that gcc 3 was optimizing so much that it removed a variable related to the priority level, which forced it to be the idle one. Now gcc 3 made it fast! And useless.

Continuing in the open source world, the same goes for binutils. A lot of times, new versions have really interesting bugs: symbols that are offset by 4 bytes in the internal representation of COFF, compensated by a hack that says /* I don’t know why I need to do this */ somewhere else in the code. If you hack binutils and bfd to produce your own format, be aware that this is major quicksand. Use ELF and a reliable, official release of binutils unless you have a truly relevant reason for doing otherwise.

The moral of the story is to select reliable, proven tools. Save yourself frustration and time by being unable to accuse the toolset when something goes wrong.

Use a high-level language

This is not as controversial as it sounds. If you insist on using assembler even for kernel development, I assume you don’t need advice and you are making good progress. Some projects can pull it off. I have written real-time kernels and even applications fully in assembler before. I was probably more stubborn than you can be about the virtues of assembler, but I would never go back to it now.

I won’t even discuss the portability advantages of a high-level language. If you think C is not much more than a high-level assembler, you need to use it more. This said, you might not even have portability as one of your goals and don’t necessarily care.

Using C, C++ or a high-level language of your choice shields you from a whole class of problems: using the wrong register, swapping operands by mistake, using the wrong opcode, or miscalculating how many bytes you need for something. GNU tools don’t even make it remotely easy. Going to the dentist is a nicer experience than writing assembler with gas or inline with gcc. You can always use nasm on x86, but it doesn’t cover any other CPU. If you want to be portable, you are stuck with gas syntax.

If you’re comfortable with keeping track of what is in every register and certain you will not pass an integer to a function expecting a pointer to your thread hash table, fine. If not, high-level languages provide you with very powerful tools to diagnose a problem when your brain has spaced out. Types and prototypes will generate warnings or errors when you mismatch things by mistake.
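To make the point about types and prototypes concrete, here is a small sketch in C. The thread and thread_table structures and the thread_lookup function are hypothetical, invented only for this example. With the prototype in scope, swapping the two arguments by mistake is reported by the compiler instead of showing up as a crash at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel types, for illustration only. */
struct thread {
    int id;
    int priority;
};

#define THREAD_TABLE_SIZE 8

struct thread_table {
    struct thread *slots[THREAD_TABLE_SIZE];
};

/* The prototype states exactly what the function expects. */
struct thread *thread_lookup(const struct thread_table *table, int id);

/* Linear search, kept deliberately simple. Returns NULL when not found. */
struct thread *thread_lookup(const struct thread_table *table, int id)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < THREAD_TABLE_SIZE; i++) {
        if (table->slots[i] != NULL && table->slots[i]->id == id)
            return table->slots[i];
    }
    return NULL;
}

/*
 * With the prototype visible, a mistaken call such as
 *
 *     thread_lookup(42, &table);
 *
 * is diagnosed at compile time: an integer is passed where a pointer
 * is expected, and a pointer where an integer is expected. In
 * assembler, both are just values in registers.
 */
```

The exact wording of the diagnostic, and whether it is a warning or an error, depends on the compiler and its options, so it is worth building with warnings turned up and treating them seriously.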

No early optimizing

This can be seen as controversial advice again, but if you decide to use a high-level language, you should try to write as much as possible in that language, even if it is tempting to have an optimized version of some part of memory management or semaphores in assembler. It is highly probable your internal design will have changed a lot by the time you can run applications. By definition, if you optimize a critical piece, it is critical. It will be called a lot. If you introduce a bug in it, you will wreck all kinds of things.

At first you want to make it work. Then you will have a good view of how to make it work fast.

Don’t try to code smart

In the same way, any attempt at smart coding should be avoided like the plague or going to the doctor. Don’t save sixteen bytes by writing cryptic, bug-prone code. You will thank yourself a year later. Or you won’t, a few days later, when you finally locate this really strange bug.

Use an instrumented environment

If possible, and if you’re going for a bootable, standalone environment, the icing on the cake is to use an instrumented target machine.

Without one, you will have absolutely no help if your system doesn’t boot as expected. Being able to execute your code step-by-step, set breakpoints, and watch memory or registers as your code executes, is invaluable. Later, when your operating system boots, the instrumentation can be used to monitor memory access patterns and execution time of your crucial code.

A lot of embedded boards come with JTAG ports, In-Circuit Emulators, and software to do exactly that. If you develop for fun at home, you probably do not want to pay for them, but a reasonable starting point is then an emulator such as Bochs. It obviously executes much slower than a real machine, but this is not a problem when you start writing the system. You can step through your code, in a virtual machine, without the need for another PC. A hack I contributed to Bochs lets you dump traces to an I/O port and read them on your console.

If nothing else, pepper your code with optionally compiled traces, coupled with a reliable method for displaying them. If your code does not work as expected, you then have the option to enable the traces and see what got printed last.
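Optionally compiled traces can be as simple as one macro and one output routine. The sketch below is an assumption about how you might do it, not a description of any existing kernel: KTRACE, KTRACE_ENABLED, ktrace_emit and ktrace_last are invented names, and the sink is an in-memory buffer so the example can run on a host with the standard C library. In a real kernel the sink would more likely be a serial port or an emulator debug port, and you would need your own formatting routine.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build with -DKTRACE_ENABLED=0 to compile every trace out. */
#ifndef KTRACE_ENABLED
#define KTRACE_ENABLED 1
#endif

/* Hypothetical sink: remembers the last trace line that was emitted. */
char ktrace_last[256];

/* Format the message first, then the source location it came from. */
void ktrace_emit(const char *file, int line, const char *fmt, ...)
{
    char msg[96];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);
    snprintf(ktrace_last, sizeof ktrace_last, "%s (%s:%d)", msg, file, line);
}

/* Helper for tests and debug consoles: did the last trace mention this? */
int ktrace_last_contains(const char *needle)
{
    return strstr(ktrace_last, needle) != NULL;
}

#if KTRACE_ENABLED
#define KTRACE(...) ktrace_emit(__FILE__, __LINE__, __VA_ARGS__)
#else
#define KTRACE(...) ((void)0)
#endif

/* Example use: a toy scheduler decision that explains itself. */
int pick_highest_priority(uint32_t ready_mask)
{
    int prio;

    for (prio = 31; prio >= 0; prio--) {
        if (ready_mask & ((uint32_t)1 << prio)) {
            KTRACE("picked priority %d", prio);
            return prio;
        }
    }
    KTRACE("nothing ready, running idle");
    return -1;
}
```

When the scheduler insists on running the idle task, the last trace printed tells you which branch it took. When everything works, the traces cost nothing because they are not compiled in.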

Small is beautiful

Once you have bootstrapped your effort, you probably want to have the smallest possible interface to your kernel. Logic dictates that the fewer entry points you have from the outside world, the less you have to code and document. It is important to keep a small and consistent design in the external view you will give of your kernel, as much if not more than the internal view.

Don’t get scared or distracted

Paraphrasing our friend Mr. Brooks Jr. a third time, the process of creation goes in three steps. First the idea, when you have an ideal representation in your mind of your project, and it sounds all perfect. Then comes the implementation, where you face the limitations of the physical medium, and try to get it to work. This is where the flaws in your idea appear, and have to be fixed before you have something that works. Lastly is the interaction, where you get feedback from users.

The idea is always the fun stage. When you start implementing it, you get frustrated by slow progress and limitations of the hardware interfaces that don’t exactly map to your concepts. The temptation to jump to another project in “idea” stage is very high. I’ve been there.

You have to get confident that your project is great and that it will work out. A new project being more fun is just an illusion. Once you move to implementation stage, it will be just the same as your OS. Stick with it. Don’t get distracted by something else unless you are certain you want to give up on the OS project because you don’t think it is useful anymore, and not because working on something else is more interesting.

Don’t get scared by other projects. You see somebody coming up with a similar idea. So what. They, too, have to implement it. Only match your work against other existing work, not some idea-stage project. Stay confident.

Don’t get sidetracked. Out of the hundreds of projects online, a preposterous number are also working on a GUI. A text editor. A SETI client. Whatever it is. Stick to the kernel. It is a lot more realistic that you can come up with a better, or at least working, kernel than it is to come up with a kernel, a GUI, a TCP/IP stack and accounting software, all starting from scratch. Unless you have an overriding reason, such as rewriting a whole system in Forth, or using an entirely new way of designing software that allows for no recycling, refrain from it.

The most compelling reason not to develop a GUI on top of your kernel is to show how useful or useless your kernel is. If you write all the software that runs on your system, you set all rules of the game, and never get a chance to see if your code actually fulfills the requirements. Porting and running existing applications or GUIs is a great test of your kernel interface. You will have to match it with real-world requirements. If you can’t, you’re in trouble.

Form is liberating

Free software projects have far fewer constraints than commercial software. The code ships “when it is ready”, not before. This gives time to perfect the code until it is well structured and reliable.

Constraints are not entirely bad. Limitations in hardware, budget, memory, time, and other software APIs are great to resolve endless discussions and spot inconsistent design. Constraints force you to make decisions and cut an endless debate short. Not having any direction encourages sloth, argumentation and not making decisions at all.

This is why I would advocate that if you have no particular plans other than writing your own operating system, you should clone an existing design and not try to improve it until you have a version that works. Linux is obviously a good example of that.

Another real-world example is AROS, the Amiga Research Operating System. The project history teaches us that they originally tried to design a better AmigaOS than AmigaOS, adding virtual memory and all sorts of features people had been longing for in the original environment. After endless debates about how to implement them, somebody took over and decided to clone the original, call by call, and then see about the improvements. Suddenly the project started and has been making great progress since then.

Bottom-up and top-down approaches

From experience, coming up with a working kernel, excluding any considerations of new architecture or design, can be broken up into a few stages:

1. Boot an image written in a high-level language.
2. Boot it with paging, if desired.
3. Write basic drivers for debug and communication with host.
4. Add memory management. Write test-suite. Debug.
5. Write basic drivers for timers. Write test-suite. Debug.
6. Add multitasking. Write test-suite. Debug.
7. Add and extend drivers and a file system. Write test-suite. Debug.
8. Run a simple user program. Debug all side effects that this brings up.
9. Run a simple command interpreter.
10. Run simple applications from the interpreter. Debug side effects.
11. Port and run existing applications of increasing complexity. Major debugging.
12. Port the whole thing to other architectures. Repeat steps 1-3 and 5 and more major debugging.

If your system is carefully designed, steps 4 to 11 can partly or entirely be completed on top of another operating system. To some extent, in reverse order. This is the top-down approach. I would highly recommend it if you don’t feel comfortable with climbing Mount Everest in shorts with a single bottle of water.
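As an illustration of the top-down approach, here is a sketch of a memory management piece written so that it has no dependency on the hardware and can be built and tested as an ordinary program on another operating system. It is a bitmap allocator for physical page frames. The names and the frame count are hypothetical, and a real kernel would size the bitmap from the memory map it is given at boot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_COUNT 64u   /* hypothetical number of page frames managed */

struct frame_allocator {
    uint8_t used[FRAME_COUNT / 8];   /* one bit per frame, 1 = allocated */
    unsigned free_count;
};

void frames_init(struct frame_allocator *fa)
{
    memset(fa->used, 0, sizeof fa->used);
    fa->free_count = FRAME_COUNT;
}

/* Returns the index of the lowest free frame, or -1 when none is left. */
int frames_alloc(struct frame_allocator *fa)
{
    unsigned i;

    for (i = 0; i < FRAME_COUNT; i++) {
        if (!(fa->used[i / 8] & (1u << (i % 8)))) {
            fa->used[i / 8] |= (uint8_t)(1u << (i % 8));
            fa->free_count--;
            return (int)i;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 for an out-of-range frame or a double free. */
int frames_free(struct frame_allocator *fa, int frame)
{
    unsigned i = (unsigned)frame;

    if (frame < 0 || i >= FRAME_COUNT)
        return -1;
    if (!(fa->used[i / 8] & (1u << (i % 8))))
        return -1;
    fa->used[i / 8] &= (uint8_t)~(1u << (i % 8));
    fa->free_count++;
    return 0;
}
```

A test suite for this runs in a fraction of a second on the host, under a debugger if needed: allocate until exhaustion, free a frame twice, free a frame that does not exist. By the time the code runs on the bare machine, a whole class of bugs has already been ruled out.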

The bottom-up approach is to start with the boot loader and complete steps in the order described above. This obviously has been done before, but again, at the risk of hammering you over the head with it, is a lot more complex.

The latter approach can be made easier. First, if you are writing your system for x86, and unless you absolutely want to write your own umpteenth incompatible boot loader and protected mode jump code, a standard, dependable boot loader such as Grub will make it a lot less painful to complete a very important milestone: booting code in a high-level language on the target, in a packaged and controlled bootable image.
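For a flavor of what a standard boot loader asks of your image, here is a sketch assuming the Multiboot (version 1) header that Grub looks for near the start of a kernel image: three 32-bit words, a magic value, a flags word, and a checksum chosen so that the three add up to zero modulo 2^32. The function names are invented for this example. How the header gets placed at the right spot in the image (linker script, section attribute, or a small assembler stub) is left out, and the full specification has more optional fields than shown here.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC     0x1BADB002u
#define MULTIBOOT_FLAG_PAGE_ALIGN  0x00000001u  /* align boot modules on pages */
#define MULTIBOOT_FLAG_MEMORY_INFO 0x00000002u  /* ask for memory information */

/* The three mandatory fields of the header. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Hypothetical helper: build a header with a matching checksum. */
struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h;

    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    /* Unsigned arithmetic wraps, so the three fields sum to zero. */
    h.checksum = (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC + flags));
    return h;
}

/* Hypothetical helper: the same check a boot loader would apply. */
int multiboot_header_is_valid(const struct multiboot_header *h)
{
    assert(h != 0);
    return h->magic == MULTIBOOT_HEADER_MAGIC &&
           (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

In a real image the header is a constant emitted at build time rather than computed at run time. Writing it as a function here simply makes the checksum rule easy to test on the host.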

You can also use OSKit to avoid having to write most of your kernel before it can say “hello world” from a user space application. Unlike completing the steps on top of another operating system, you can gradually replace its code with your own.

A fellow programmer suggested another smart approach. Use a simple, interactive and mono-tasking OS, which will give you access to the machine while providing an interactive environment with a file system and whatnot. Namely DOS: you can start the kernel on top of a running system, take over interrupts and memory management, use the underlying system for all sorts of things, and even exit to it if needed.

Conclusion

This covers mentally bootstrapping your kernel coding process in more detail. It does not even begin to discuss how to steer your project and never lose sight of your end goal once you have it started. That will be the subject of another article.

About the author

Emmanuel Marty is the Chief Technical Officer of NexWave Solutions, the supplier of a new software architecture, OS services, and telecom stacks, for consumer electronics and telecom. He has been working with computers since the age of 10. Currently aged 26, he lives in Montpellier, France, with his fiancée.

16 Comments

2002-08-14 12:27 am
Anonymous
I find it funny how he links v2os with “pulling off being 100% assembly”, and yet labels Uuu with not being able to stick to one language in his previous OS article. V2_OS is the one that died because of a kernel rewrite (still in assembly). Uuu wasn’t canceled because of reasons like that, we simply didn’t see the point of continuing it any further than it was.
2002-08-14 3:34 am
Anonymous
Nice tips for anybody desiring to develop an operating system. And like Emmanuel said, unless it’s for some kind of weird religious reason or just for passion, use HLL. That way if you ever replace that x86 computer you’ve got right now by one of those shiny new 64-bit beasts, all you’ll really need for most of your code will be to recompile, not rewrite.
Same if you feel nostalgic or geek enough to go and try some server machines, a nice SGI, Alpha or Sun is sure to make your friends drool when they see it booting your OS after only a weekend of work.
I’m personally attached to low-level development. Not necessarily to x86, just to about any processor. If I could get a better feel of the hardware by grinding my teeth on the address bus I would
Assembly forever
2002-08-14 10:24 am
Anonymous
The moral of the story is to select reliable, proven tools. Save yourself frustration and time by being unable to accuse the toolset when something goes wrong.
Even proven, reliable tools can have their problems. You highlight a GCC 3.0 bug in the article, and I’ve also run into a very subtle problem in the last couple of days.
I had Glibc compiling on AtheOS 0.3.7, and I thought that all was good. However, when I tried to compile the exact same code on Syllable, one of the Makefiles fails, and the build process stops.
After a cry for help on the mailing list (Thank you Andrew!) it turns out that the problem is apparently due to a subtle difference in the command interpreter, probably due to Syllable using a slightly newer version of GNU Bash & Make.
You’re damned if you do, damned if you don’t! If you didn’t get it from Emmanuel’s article: Operating Systems are tricky, slippery beasts. You have to be the lion tamer.
2002-08-14 11:13 am
Anonymous
you probably want to have the smallest interface to your kernel as possible. Logic dictates that the less entry points you have from the outside world, the less you have to code and document.
thats a wonderfully lazy argument for a microkernel if ever i heard one!
2002-08-14 3:14 pm
Anonymous
He points out that writing an operating system is much more difficult than writing an applications program. I wonder how many hobbyist OS people start out writing an OS and get stuck because they *don’t know enough* about programming to do it?
2002-08-14 3:55 pm
Anonymous
I wonder how many hobbyist OS people start out writing an OS and get stuck because they *don’t know enough* about programming to do it?
That is a very good point indeed. The answer to this question is — too many. There are too many people who are just getting into programming and who decide they want to write an operating system, without even knowing how to program well. I’ve seen too many people like that, and I think Dave Poirier at least understands what I mean.
2002-08-14 5:14 pm
Anonymous
I wonder how many hobbyist OS people start out writing an OS and get stuck because they *don’t know enough* about programming to do it?
That is a very good point indeed. The answer to this question is — too many. There are too many people who are just getting into programming and who decide they want to write an operating system, without even knowing how to program well. I’ve seen too many people like that, and I think Dave Poirier at least understands what I mean.
So?
You make this sound like a bad thing. You can replace “writing an OS” with “writing a game” or “writing a 3D engine”.
If programmers never go in over their head, how else would they learn anything? The real skill is in realizing that they’re in too deep and either learn more to solve the problem, or hopefully go back to and fix the answer that they hacked out in ignorance when they learn more.
I remember trying to write an “Ogre” game (famous mini board game) a zillion years ago, and it was working okay until I actually got to pathfinding. Boy, was THAT a complete mess and a disaster. Ruined the whole project. Had NO IDEA what I was doing. You should have seen the tank zig zag and BACKING UP during attack. Amusing in hindsight, frustrating at the time.
But, the point is that when you get to that point where you’re stonewalled, you find out whether the project is still interesting enough to be worth jumping the hurdle placed in front of you. In my case, it wasn’t, I had other games I wanted to play with (and I eventually figured out the simple movement algorithm…).
2002-08-14 6:33 pm
Anonymous
Will,
you make a fair point I think. However, there is a difference between taking on something you think’s going to be hard and something that’s in a completely different ballpark. I guess you just have to be pragmatic about your abilities and time available; I don’t think Emmanuel is saying don’t start it unless you know you can do it. Sometimes it will work out and sometimes it won’t.
Codefire.
2002-08-15 12:04 am
Anonymous
You make this sound like a bad thing. You can replace “writing an OS” with “writing a game” or “writing a 3D engine”.
OK, fair enough. It certainly is possible to write an OS with very little programming experience, but it would be quite a bad OS. It’s possible to write a good game without being able to program well (if you count concept and design over robustness of code). Unless you know the machine and your intended design inside out, however, any OS you write will be either very limited* or buggy.
—
* Observe the relative numbers of boot loaders and OS kernels available on the Internet. It’s easy to just give up after writing a boot sector program, or writing a boot sector program and a short kernel stub.
2002-08-15 12:55 am
Anonymous
OK, fair enough. It certainly is possible to write an OS with very little programming experience, but it would be quite a bad OS. It’s possible to write a good game without being able to program well (if you count concept and design over robustness of code). Unless you know the machine and your intended design inside out, however, any OS you write will be either very limited* or buggy.
Umm…again…So?
We’re talking about hobbyists. Folks who like to play with computers for fun.
Maybe they’re happy that they’ve got the bootloader working? They didn’t get the scheduler done? Bummer, but Big Deal. But one thing they did get out of it, they learned that they don’t like writing OS’s, for whatever reason.
Some will rise to the challenge and persevere, but most won’t. There are a lot of dead, incomplete, buggy, badly coded 3D engines out there as well. But the fact that there are all of these crappy bits of OS code floating around in the ether shouldn’t discourage anyone who has the interest to try it out for themselves.
Maybe they have the knack for the arcane art of OS writing. Maybe as they reach each complexity they soar right on past surprising themselves at how easily they beat something that “was so hard”. Then, they get to the linker/loader and give up, or whatever.
It just sounds like y’all are saying that these folks shouldn’t even try. The saying is “on the internet, no one knows you’re a dog”. On the other hand “once your code is compiled and working, nobody knows its badly designed and written without indenting.”
Replace writing an OS with “installing kitchen cabinets” or “laying your own tile floor”. Lots of disasters out there.
These unqualified hobbyists, while perhaps with mild delusions of grandeur, are HOBBYISTs! They’re doing it for fun. If it gains any traction or any communal respect, then swell. If not, then, hey, swell again. Beats watching 90210 re-runs.
I just think that if you’re in a position to encourage and help the unqualified become more qualified, and they’re respectful of your time, then help these people out.
2002-08-15 9:43 am
Anonymous
Will,
The purpose of the article is to try and help people size up the complexity of what they are getting into, and dive in better prepared. I like seeing a lot of hobbyist projects online; I hope this series of articles can help people progress and have more fun. If you look online, you will find a large amount of articles about setting up your GDT or managing virtual memory, but none about getting prepared.
Thanks for your comments
2002-08-15 9:50 am
Anonymous
Matt,
thats a wonderfully lazy argument for a microkernel if ever i heard one!
This is not what I mean. Maybe I didn’t express it right. The general idea is to export the minimum number of ways of doing the same thing to the ‘outside world’. It’s a common trait of first designs, to provide five different mutexes and three types of semaphores and, on top of that, explicit primitives for suspending a thread, for instance.
This adds a lot of complexity and code with potential bugs. If you’re struggling with developing a kernel, it’s better to think about one good primitive and reduce the scope of the problem. If you’re having no problem whatsoever developing your system, I am in no position to tell you what architecture to adopt or what primitives to provide.
2002-08-15 9:57 am
Anonymous
Richard,
This is my opinion about the originally released version of V2_OS, which intended to be a fast, monotasking system for digital media playback. It got the job done.
The UUU project cites the dependency on assembler in the project termination document. “Chapter 2. Failure points :: 2.1. Skilled Assembly Programmers”.
Quoting Dave Poirier, the UUU project founder and VOiD architecture luminary:
“In conclusion: While in theory achievable, the development of a complete operating system entirely in assembly in the current economic context cannot be justified over the development under other languages for which nowadays very good compilers are available.”
This document oriented my analysis of UUU.
2002-08-15 4:46 pm
Anonymous
If it’s really just a hobby OS, then fine, there’s no problem. Some OS creators readily admit it, too, like Travis G. and his NewOS.
But some people *do* get visions of grandeur. I remember watching the FreeDows site forever, waiting for updates. It wasn’t until much later that I found out just how little had been accomplished, or that the original group had even split over policy conflicts.
UUU certainly sounded interesting, though.
2002-08-15 10:07 pm
Anonymous
I guess the line that irked me was this one from Tim:
There are too many people who are just getting into programming and who decide they want to write an operating system, without even knowing how to program well.
To me, that’s no big deal. So what. These folks who do not have the skills and what not to write an OS (or a 3D Engine, or install kitchen cabinets) are not hurting anyone by trying to do it.
It sounds as if “too many” of these inexperienced and/or ignorant people trying to write OSs (etc) are “a problem”, and I just don’t see it that way.
Let them flail away.
If it’s really just a hobby OS, then fine, there’s no problem.
And if the folks writing this stuff are truly incompetent, then it will probably never even rise to the level of “Hobby OS”. I guess the “problem” is that these folks come roaring in with a bunch of glossy lit, high hopes, and “I’ve got the boot loader, who wants to help with the rest?”.
I’m not deep enough in the community to actually see any of these things, sticking more closely with the mainstream stuff. So, I guess I’m not in the loop of what the real problem is here.
Maybe someone can enlighten me.
2002-08-15 10:53 pm
Anonymous
I guess the line that irked me was this one from Tim:
There are too many people who are just getting into programming and who decide they want to write an operating system, without even knowing how to program well.
OK, I apologise. There can never be too many people getting involved in OS development. Maybe I should re-word my sentence:
“It disappoints me to see people who are unable to program well who think they can jump in and write a good operating system.”
It’s my belief that kernel coding is something that you have to work up to; that you need a solid grounding in user-mode development first. I have learned a lot from my own experiences in OS development (and I’m better off for several kernel rewrites), but I really would be flailing if I didn’t have the prior experience I have.