Climbing the Kernel Mountain

So, you want to write an operating system. Earlier, we discussed a generic set of considerations that, in my experience, are important for this type of adventure. We now look at the problem of actually getting started with writing your system: how to do it when you know you don’t know what you’re doing, how to make it work before making it work fast, and what to do when things go wrong.

Introduction

Spend no more than an hour researching online and you will find literally hundreds of open-source operating system projects. Deep down, the developers of these projects hope to become the next Linus. Their dream is to complete a working system and be swamped in e-mail from all over the world with stories of how reliable and friendly their nanokernel-based system is.

All of us are happy if others find the creation of our own design useful. To paraphrase Frederick P. Brooks Jr., author of “The Mythical Man-Month”, desire for praise starts early, with children making pencil holders “for daddy’s office”.

Going back to the online projects, most are unfortunately stuck in the dream stage or boot loader stage. Grandiose plans are expressed with a boot sector and sometimes a “hello world” in C, last modified two years before.

There are many reasons for operating system projects to end that early in the cycle. In a previous article, I shared my experience of developing a closed source project to completion and listed generic points. They do not guarantee success, but in my experience they are important to help focus on the end goal. Weed the distractions out. Get the job done.

You have decided on your camp. You know your audience. You have decided to get real about what you can do. You have agreed to be the benevolent dictator of your project. Well, you still have to write your operating system, now don’t you?

Cover your bases

Systems programming is the hardest form of programming. To paraphrase the good Mr. Brooks Jr. again, compilers are three times more complex than regular applications, and operating systems are three times more complex than compilers. That was his rule of thumb in 1975, after managing the IBM OS/360 project to completion. If we consider how much is expected from operating systems today, we can guess that the chasm is even wider. Ironically, phenomenal progress has been made in easing application development, while you still need to write kernel code with your bare hands.

Let it sink in. You want to climb Mount Everest. It is fun and rewarding to climb Mount Everest, but you don’t want to do that with a small plastic hammer, wearing shorts and a t-shirt, at minus fifty degrees.

As your first operating system project is by definition the first one, you want to give yourself some credit and just dive in, after you have all the basics covered. If operating system programming is at least nine times harder than regular applications programming, logic dictates that you had better be a good applications programmer. You had better know how to manipulate all the basic data structures. Lists, arrays, hash tables, arrays of function vectors, and bit arrays will all become useful as you develop your kernel. Any error in manipulating them will make you a new member of the triple-fault club. Worse, subtle, intermittent bugs will annoy you for days and take the fun out of it.

Good software engineering dictates modularity. This is not just for teaching in computer science classes. If you’re a good programmer, you’re lazy. You don’t want to spend hours isolating a bug to finally locate it in your umpteenth list insertion inline code. Make one set of dependable primitives, and use them. The net effect in terms of bugs and frustration saved is tremendous.
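As an illustration of what one such set of dependable primitives might look like, here is a minimal sketch of an intrusive, circular doubly linked list in C. All names are hypothetical and not taken from any particular kernel, and the hosted assert() stands in for whatever assertion mechanism your kernel provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a sketch, not a reference design. */
struct list_node {
    struct list_node *prev;
    struct list_node *next;
};

/* An empty list is a head node that points to itself. */
void list_init(struct list_node *head)
{
    head->prev = head;
    head->next = head;
}

int list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Insert node right before pos. Inserting before the head appends. */
void list_insert_before(struct list_node *pos, struct list_node *node)
{
    assert(pos != NULL && node != NULL);
    node->prev = pos->prev;
    node->next = pos;
    pos->prev->next = node;
    pos->prev = node;
}

void list_append(struct list_node *head, struct list_node *node)
{
    list_insert_before(head, node);
}

/* Unlink node and leave it self-linked, so removing it twice is harmless. */
void list_remove(struct list_node *node)
{
    assert(node != NULL);
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node;
    node->next = node;
}

/* Count the nodes on the list, excluding the head itself. */
size_t list_length(const struct list_node *head)
{
    size_t n = 0;
    for (const struct list_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Embedding a struct list_node inside a thread or timer structure lets the run queue, the wait queues and the timer list all share these few functions, so a list bug has exactly one place to hide.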

A fellow systems programmer, who responded to a previous piece, mentioned he saw the same attitude while mentoring aspiring game writers. People who had never written a working Tetris game contacted him and asked “How do I write Quake?”. Get real and make sure you stand on solid ground before progressing higher on the mountain.

Any good practice of software engineering applies to operating system development, and is at least nine times as important. I suggest you draft at least some sort of specifications and design document, keep things modular, write test suites, use version control, a bug database, and all sorts of good practice you may have encountered while writing other kinds of software.

As a systems programmer, you know you don’t know what you are doing. So do it carefully.

Sharpen your tools

It is crucial you prepare a reliable set of tools before you get started. You will find this kind of advice in any book on software development, so let me be more specific. You will encounter really weird problems, like hard reboots, or registers being trashed in your thread for no logical reason. Your scheduler may pick the wrong task, or execute code in the wrong location. Even with the best programming abilities, miscommunication between programmers or your brain spacing out for two minutes will do that, and when it happens, you need to be certain that your code did it, and not your tools.

I tried to be smart once, and started using gcc 3.0 for development, instead of the 2.95 series. I was happy to find the new compiler reported more useful warnings and was a lot cleverer about subtle potential problems in the code, so I adopted it for daily work. All was good, until I tried using it on StrongARM. The scheduler would insist on running the idle task, which strongly reduced the usefulness of our system. The bug occurred in portable code, but the StrongARM code base was new at the time, and there was no end to the side effects it could introduce. In my mind, a compiler couldn’t generate subtly wrong code, so I never questioned it. Out of things to try, I reverted to gcc 2.95. Sure enough, it made the code work, and disassembly showed that gcc 3 was optimizing so aggressively that it removed a variable related to the priority level, which forced the selected task to be the idle one. Now gcc 3 made it fast! And useless.

Continuing in the open source world, the same goes for binutils. A lot of times, new versions have really interesting bugs: for example, symbols that are offset by 4 bytes in the internal representation of COFF, compensated by a hack that says /* I don’t know why I need to do this */ somewhere else in the code. If you hack binutils and bfd to produce your own format, be aware that this is major quicksand. Use ELF and a reliable, official release of binutils unless you have a truly relevant reason for doing otherwise.

The moral of the story is to select reliable, proven tools. Save yourself frustration and time by leaving yourself no reason to blame the toolset when something goes wrong.

Use a high-level language

This is not as controversial as it sounds. If you insist on using assembler even for kernel development, I assume you don’t need advice and you are making good progress. Some projects can pull it off. I have written real-time kernels and even applications fully in assembler before. I was probably more stubborn than you can be about the virtues of assembler, but I would never go back to it now.

I won’t even discuss the portability advantages of a high-level language. If you think C is not much more than a high-level assembler, you need to use it more. This said, you might not even have portability as one of your goals and don’t necessarily care.

Using C, C++ or a high-level language of your choice shields you from a whole class of problems: using the wrong register, swapping operands by mistake, using the wrong opcode, or miscalculating how many bytes you need for something. GNU tools don’t even make it remotely easy. Going to the dentist is a nicer experience than writing assembler with gas or inline with gcc. You can always use nasm on x86, but it doesn’t cover any other CPU. If you want to be portable, you are stuck with gas syntax.

If you’re comfortable with keeping track of what is in every register, and certain you will never pass an integer to a function expecting a pointer to your thread hash table, fine. If not, high-level languages provide you with very powerful tools to diagnose a problem when your brain has spaced out. Types and prototypes will generate warnings or errors when you mismatch things by mistake.
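To make this concrete, here is a small hosted sketch in C of a thread table with prototypes. The structure names, the table size and the probing scheme are all hypothetical, chosen only to show the compiler doing the checking for you.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct thread {
    int id;
    int priority;
};

#define THREAD_TABLE_SLOTS 16

struct thread_table {
    struct thread *slots[THREAD_TABLE_SLOTS];
};

/* The prototypes state exactly what each function expects. */
int thread_table_put(struct thread_table *table, struct thread *t);
struct thread *thread_table_get(const struct thread_table *table, int id);

/* Store t using linear probing. Returns 0 on success, -1 if the table is full. */
int thread_table_put(struct thread_table *table, struct thread *t)
{
    assert(table != NULL && t != NULL);
    unsigned start = (unsigned)t->id % THREAD_TABLE_SLOTS;
    for (unsigned i = 0; i < THREAD_TABLE_SLOTS; i++) {
        unsigned slot = (start + i) % THREAD_TABLE_SLOTS;
        if (table->slots[slot] == NULL) {
            table->slots[slot] = t;
            return 0;
        }
    }
    return -1;
}

/* Look a thread up by id. Returns NULL if it is not in the table. */
struct thread *thread_table_get(const struct thread_table *table, int id)
{
    assert(table != NULL);
    unsigned start = (unsigned)id % THREAD_TABLE_SLOTS;
    for (unsigned i = 0; i < THREAD_TABLE_SLOTS; i++) {
        struct thread *t = table->slots[(start + i) % THREAD_TABLE_SLOTS];
        if (t == NULL)
            return NULL; /* an empty slot ends the probe sequence */
        if (t->id == id)
            return t;
    }
    return NULL;
}

/*
 * A mismatched call such as
 *     thread_table_put(42, &some_thread);
 * passes an integer where a pointer is expected. A C compiler has to
 * diagnose this; whether it is reported as a warning or as an error
 * depends on the compiler and its flags.
 */
```

In assembler, the equivalent mistake is loading the wrong value into a register, and nothing complains until the machine reboots.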

No early optimizing

This can be seen as controversial advice again, but if you decide to use a high-level language, you should try to write as much as possible in that language, even if it is tempting to have an optimized version of some part of memory management or semaphores in assembler. It is highly probable your internal design will have changed a lot by the time you can run applications. By definition, if you optimize a critical piece, it is critical. It will be called a lot. If you introduce a bug in it, you will wreck all kinds of things.

At first you want to make it work. Then you will have a good view of how to make it work fast.

Don’t try to code smart

In the same way, any attempt at smart coding should be avoided like the plague or going to the doctor. Don’t save sixteen bytes by writing cryptic, bug-prone code. You will thank yourself a year later. Or you won’t, a few days later, when you finally locate this really strange bug.

Use an instrumented environment

If possible, and if you’re going for a bootable, standalone environment, the icing on the cake is to use an instrumented target machine.

Without one, you will have absolutely no help if your system doesn’t boot as expected. Being able to execute your code step-by-step, set breakpoints, and watch memory or registers as your code executes, is invaluable. Later, when your operating system boots, the instrumentation can be used to monitor memory access patterns and execution time of your crucial code.

A lot of embedded boards come with JTAG ports, In-Circuit Emulators, and software to do exactly that. If you develop for fun at home, you probably do not want to afford them, but a reasonable starting point is then an emulator such as Bochs. It obviously executes much slower than a real machine, but this is not a problem when you start writing the system. You can step your code, in a virtual machine, without the need for another PC. A hack I contributed to Bochs lets you dump traces to an I/O port and read them on your console.

If nothing else, pepper your code with optionally compiled traces, coupled with a reliable method for displaying them. If your code does not work as expected, you then have the option to enable the traces and see what got printed last.
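Here is a minimal sketch of optionally compiled traces in C. It is a hosted illustration: it borrows snprintf from the C library, which a freestanding kernel would have to replace with its own formatter, and every name in it (KTRACE, ktrace_set_sink and so on) is hypothetical. The "sink" is whatever reliable output method you have, such as a serial port or an emulator's debug port.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Build with -DKTRACE_ENABLED=0 to compile every trace out. */
#ifndef KTRACE_ENABLED
#define KTRACE_ENABLED 1
#endif

/* Hypothetical sink: the function that actually displays a finished line. */
typedef void (*ktrace_sink_fn)(const char *line);

static ktrace_sink_fn ktrace_sink;

void ktrace_set_sink(ktrace_sink_fn sink)
{
    ktrace_sink = sink;
}

/* Format "file:line: message" into a fixed buffer and hand it to the sink. */
void ktrace_printf(const char *file, int line, const char *fmt, ...)
{
    char buf[128];
    int n;
    va_list ap;

    assert(file != NULL && fmt != NULL);
    if (ktrace_sink == NULL)
        return; /* no output method registered yet */
    n = snprintf(buf, sizeof buf, "%s:%d: ", file, line);
    if (n < 0 || (size_t)n >= sizeof buf)
        return; /* the prefix alone does not fit */
    va_start(ap, fmt);
    vsnprintf(buf + n, sizeof buf - (size_t)n, fmt, ap);
    va_end(ap);
    ktrace_sink(buf);
}

#if KTRACE_ENABLED
#define KTRACE(...) ktrace_printf(__FILE__, __LINE__, __VA_ARGS__)
#else
#define KTRACE(...) ((void)0)
#endif

/* Example sink for hosted testing: it remembers the last line emitted. */
char ktrace_last_line[128];

void ktrace_capture_sink(const char *line)
{
    strncpy(ktrace_last_line, line, sizeof ktrace_last_line - 1);
    ktrace_last_line[sizeof ktrace_last_line - 1] = '\0';
}
```

A call such as KTRACE("picked task %d", id) in the scheduler then costs nothing in a build with traces disabled, and tells you where the code got to in a build with them enabled.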

Small is beautiful

Once you have bootstrapped your effort, you probably want to have the smallest possible interface to your kernel. Logic dictates that the fewer entry points you have from the outside world, the less you have to code and document. It is important to keep a small and consistent design in the external view you give of your kernel, as much as, if not more than, in the internal view.
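One way to picture a small external interface is a single entry point that dispatches through a short table of calls. The sketch below is in C and is only an assumption about how such an interface could look: the call numbers, the error code and the stub handlers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical system call numbers; a real kernel picks its own set. */
enum {
    SYS_YIELD,
    SYS_SEND,
    SYS_RECEIVE,
    SYS_COUNT /* number of calls, not a call itself */
};

#define KERR_NOSYS (-1L) /* hypothetical "no such call" error code */

typedef long (*syscall_fn)(long arg0, long arg1);

/* Stub handlers: they only stand in for real kernel services. */
long sys_yield(long arg0, long arg1)
{
    (void)arg0;
    (void)arg1;
    return 0;
}

/* Pretends every byte was accepted and returns the requested length. */
long sys_send(long port, long length)
{
    (void)port;
    return length;
}

long sys_receive(long arg0, long arg1)
{
    (void)arg0;
    (void)arg1;
    return 0;
}

static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_YIELD]   = sys_yield,
    [SYS_SEND]    = sys_send,
    [SYS_RECEIVE] = sys_receive,
};

/* The single entry point: validate the number, then dispatch. */
long kernel_entry(long number, long arg0, long arg1)
{
    if (number < 0 || number >= SYS_COUNT)
        return KERR_NOSYS;
    assert(syscall_table[number] != NULL);
    return syscall_table[number](arg0, arg1);
}
```

Everything the outside world can ask of the kernel is then listed in one enum, which is also the list of things you have to document and test.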

Don’t get scared or distracted

Paraphrasing our friend Mr. Brooks Jr. a third time, the process of creation goes in three steps. First the idea, when you have an ideal representation in your mind of your project, and it sounds all perfect. Then comes the implementation, where you face the limitations of the physical medium, and try to get it to work. This is where the flaws in your idea appear, and have to be fixed before you have something that works. Lastly is the interaction, where you get feedback from users.

The idea is always the fun stage. When you start implementing it, you get frustrated by slow progress and limitations of the hardware interfaces that don’t exactly map to your concepts. The temptation to jump to another project in “idea” stage is very high. I’ve been there.

You have to get confident that your project is great and that it will work out. A new project being more fun is just an illusion. Once you move to implementation stage, it will be just the same as your OS. Stick with it. Don’t get distracted by something else unless you are certain you want to give up on the OS project because you don’t think it is useful anymore, and not because working on something else is more interesting.

Don’t get scared by other projects. You see somebody coming up with a similar idea. So what. They, too, have to implement it. Only match your work against other existing work, not some idea-stage project. Stay confident.

Don’t get sidetracked. Out of the hundreds of projects online, a preposterous number are also working on a GUI. A text editor. A SETI client. Whatever it is. Stick to the kernel. It is a lot more realistic that you can come up with a better, or at least working, kernel than it is to come up with a kernel, a GUI, a TCP/IP stack and accounting software, all starting from scratch. Unless you have an overriding reason, such as rewriting a whole system in Forth, or using an entirely new way of designing software that allows for no recycling, refrain from it.

The most compelling reason not to develop your own GUI on top of your kernel is that running existing software shows how useful or useless your kernel is. If you write all the software that runs on your system, you set all the rules of the game, and never get a chance to see if your code actually fulfills the requirements. Porting and running existing applications or GUIs is a great test of your kernel interface. You will have to match it with real-world requirements. If you can’t, you’re in trouble.

Form is liberating

Free software projects have far fewer constraints than commercial software. The code ships “when it is ready”, not before. This gives time to perfect the code until it is well structured and reliable.

Constraints are not entirely bad. Limitations in hardware, budget, memory, time, and other software APIs are great for resolving endless discussions and spotting inconsistent design. Constraints force you to make decisions and cut an endless debate short. Not having any direction encourages sloth, argumentation and not making decisions at all.

This is why I would advocate that if you have no particular plans other than writing your own operating system, you should clone an existing design and not try to improve it until you have a version that works. Linux is obviously a good example of that.

Another real-world example is AROS, the Amiga Research Operating System. The project history teaches us that they originally tried to design a better AmigaOS than AmigaOS, adding virtual memory and all sorts of features people had been longing for in the original environment. After endless debates about how to implement them, somebody took over and decided to clone the original, call by call, and then see about the improvements. Suddenly the project started and has been making great progress since then.

Bottom-up and top-down approaches

From experience, coming up with a working kernel, excluding any considerations of new architecture or design, can be broken up into a few stages:

  1. Boot an image written in a high-level language.
  2. Boot it with paging, if desired.
  3. Write basic drivers for debug and communication with host.
  4. Add memory management. Write test-suite. Debug.
  5. Write basic drivers for timers. Write test-suite. Debug.
  6. Add multitasking. Write test-suite. Debug.
  7. Add and extend drivers and a file system. Write test-suite. Debug.
  8. Run a simple user program. Debug all side effects that this brings up.
  9. Run a simple command interpreter.
  10. Run simple applications from the interpreter. Debug side effects.
  11. Port and run existing applications of increasing complexity. Major debugging.
  12. Port the whole thing to other architectures. Repeat steps 1 to 3 and 5, followed by more major debugging.

If your system is carefully designed, steps 4 to 11 can partly or entirely be completed on top of another operating system and, to some extent, in reverse order. This is the top-down approach. I would highly recommend it if you don’t feel comfortable with climbing Mount Everest in shorts with a single bottle of water.
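To illustrate the top-down approach for the memory management step, here is a minimal sketch in C of an allocator that only depends on a region of memory handed to it. On the host operating system that region can be a plain array, and the test suite runs as an ordinary program. In the kernel, the same code would be given memory found at boot time. The bump allocator and its names are hypothetical, picked because they are short, not because they are what a finished kernel needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator: hands out memory from a caller-supplied region. */
struct arena {
    uint8_t *base; /* start of the region */
    size_t size;   /* total bytes in the region */
    size_t used;   /* bytes handed out so far, padding included */
};

void arena_init(struct arena *a, void *memory, size_t size)
{
    assert(a != NULL && memory != NULL);
    a->base = memory;
    a->size = size;
    a->used = 0;
}

/*
 * Return size bytes aligned to align, which must be a power of two.
 * Returns NULL when the region cannot satisfy the request.
 */
void *arena_alloc(struct arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t padding = (size_t)(aligned - start);

    if (padding > a->size - a->used)
        return NULL;
    if (size > a->size - a->used - padding)
        return NULL;
    a->used += padding + size;
    return (void *)aligned;
}
```

Because nothing in it touches hardware, this code can be written, tested and debugged with the host's debugger long before the kernel can boot.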

The bottom-up approach is to start with the boot loader and complete the steps in the order described above. This obviously has been done before but, at the risk of hammering you over the head with it again, it is a lot more complex.

The latter approach can be made easier. First, if you are writing your system for x86, use a standard, dependable boot loader such as Grub, unless you absolutely want to write your own umpteenth incompatible boot loader and protected mode jump code. It will make it a lot less painful to complete a very important milestone: booting code in a high-level language on the target, in a packaged and controlled bootable image.
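As a sketch of what this involves, and assuming the Multiboot (version 1) convention that Grub uses to recognize a kernel image, the image carries a small header near its start whose first three 32-bit fields must add up to zero. The C below only shows those three mandatory fields and the checksum arithmetic. The helper names are hypothetical, the optional fields are left out, and placing the header early in the image is the job of your linker script, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the Multiboot (version 1) specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN   (1u << 0) /* align loaded modules on pages */
#define MULTIBOOT_MEMORY_INFO  (1u << 1) /* ask for memory information */

/* The three mandatory fields; optional fields are omitted in this sketch. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* The checksum makes magic + flags + checksum wrap around to zero. */
uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC + flags));
}

/* Hypothetical helper: the same check a boot loader would apply. */
int multiboot_header_is_valid(const struct multiboot_header *h)
{
    assert(h != 0);
    return h->magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}

/* An example header, computed entirely at compile time. */
const struct multiboot_header example_header = {
    MULTIBOOT_HEADER_MAGIC,
    MULTIBOOT_PAGE_ALIGN | MULTIBOOT_MEMORY_INFO,
    (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC
                     + (MULTIBOOT_PAGE_ALIGN | MULTIBOOT_MEMORY_INFO)))
};
```

With a valid header in place, the boot loader takes care of loading the image and switching to protected mode, and your first line of C can run without you writing a boot sector.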

You can also use OSKit to avoid having to write most of your kernel before it can say “hello world” from a user space application. Unlike completing the steps on top of another operating system, this lets you gradually replace the borrowed code with your own.

A fellow programmer suggested another smart approach. Use a simple, interactive and mono-tasking OS, which will give you access to the machine while providing an interactive environment with a file system and whatnot. Namely DOS: you can start the kernel on top of a running system, take over interrupts and memory management, use the underlying system for all sorts of things, and even exit to it if needed.

Conclusion

This covers mentally bootstrapping your kernel coding process in more detail. It does not even begin to discuss how to steer your project and never lose sight of your end goal once you have it started. This will be the subject of another article.

About the author

Emmanuel Marty is the Chief Technical Officer of NexWave Solutions, the supplier of a new software architecture, OS services, and telecom stacks, for consumer electronics and telecom. He has been working with computers since the age of 10. Currently aged 26, he lives in Montpellier, France, with his fiancee. He can be reached at [email protected].
