posted by Hadrien Grasland on Sun 20th Feb 2011 13:20 UTC
IconNow that you have an idea of where your OS project is heading as a whole, it's time to go into specifics. The first component of your OS which you'll have to design, if you're building it from the ground up, is its kernel, so this article aims at being a quick guide to kernel design, describing the major areas which you'll have to think about and guiding you to places where you can find more information on the subject.

What is a kernel and what does it do?

I think it's best to put a definition there. The kernel is the central part of an operating system, whose role is to distribute hardware resources across other software in a controlled way. There are several reasons why such centralized hardware resource management is interesting, including reliability (the less power an application has, the less damage it can cause if it runs amok), security (for the very same reasons, but this time the app goes berserk intentionally), and several low-level system services which require a system-wide hardware management policy (pre-emptive multitasking, power management, memory allocation...).

Beyond these generic considerations, the central goal of most modern kernels is in practice to manage the process and thread abstractions. A process is an isolated software entity which can hold access to a limited amount of hardware resources, generally in an exclusive fashion, in order to avoid concurrency disasters. A thread is a task which may be run concurrently from other tasks. Both concepts are independent from each other, although it is common for each process to have at least one dedicated thread in modern multitasking OSs.

Hardware resources

So far, I've not given much depth to the "hardware resource" concept. When you read this expression, the first thing which you're thinking about is probably some pieces of real hardware which are actually fully independent from each other: mice, keyboards, storage devices, etc...

However, as you know, these peripherals are not directly connected to the CPU. They all are accessed via the same bus, through one single CPU port. So if you want to make sure that each process only has access to some peripherals, the kernel must be the one in control of the bus. Or you may also decide that the bus is the hardware resource which processes must request access to. Depending on how fine-grained your hardware resource model is, your position in the process isolation vs kernel complexity scale will vary.

To make things even more complex, modern OSs also manage some very useful hardware resources which do not actually exist from a pure hardware point of view. Consider, as an example, memory allocation. From a hardware point of view, there is only one RAM. You may have several RAM modules in your computer, but your CPU still sees them as one single, contiguous chunk of RAM. Yet you regularly want to allocate some part of it to one process, and another part of it to another process.

For this to work, the kernel has to take its finest cleaver and virtually slice the contiguous RAM in smaller chunks which can safely be allocated separately to various processes, based on each one's needs. There also has to be some mechanism for preventing different processes from peeking into each other's memory, which can be implemented in various ways but most frequently implies use of special hardware bundled in the CPU, the Memory Management Unit (MMU). This hardware allows the kernel to only give each process access to his limited region of memory, and to quickly switch between memory access permissions of various processes while the kernel is switching from one process to another.

Another typical example of abstract hardware resource is CPU time. I assume you have all noticed that desktop operating systems did not wait for multicore chips to appear before letting us run several applications at once. They all made sure that processes would share CPU time in some way, having the CPU frequently switching from one process to another, so that as a whole it looks like simultaneous execution under normal usage conditions.

Of course, this doesn't work by calling the CPU and telling it "Hi, fella, can you please run process A with priority 15 and process B with priority 8?". CPUs are fairly stupid, they just fetch an instruction of a binary, execute it, and then fetch the next one, unless some interrupt distracts them from their task. So in modern interactive operating systems, it's the kernel which will have to make sure that an interrupt occurs regularly, and that each time this interrupt occurs, a switch to another process occurs. This whole process is called pre-emptive multitasking.

Finally, it is common not to let processes access storage devices directly, but rather to give them access to some places in the file system. Of course, like allocated RAM, the file system is a purely virtual construct, which has no physical basis in HDDs/SSDs, and must be fully managed by the OS at some point.

In short, you'll have to define which hardware resources your kernel manages and gives processes access to. It is generally not a matter of just giving processes access to hardware x or not, there is often a certain amount of management work to be done in the kernel, compromises to be considered, and sometimes hardware resources must just be created by the kernel out of nowhere, as an example in the case of memory allocation, pre-emptive multitasking, and filesystem operation.

Inter-process communication and thread synchronization

Generally-speaking, the more isolated from each other processes are the better. As said earlier, malware can't do much in a tightly sandboxed environment, and reliability is greatly improved too. On the other hand, there are several occasions where it is convenient for processes to exchange information with each other.

Typical use case for this is a client-server architecture: somewhere in the depths of the system, there's a "server" process sleeping, waiting for orders. "Client" processes can wake it up and give it some work to do in a controlled way. At some point, "server" process is done and returns the result to "client" process. This way of doing things is especially common in the UNIX world. Another use case for inter-process communications is apps which are themselves made of several interacting processes.

There are several ways through which processes may communicate with each other. Here are a few:

  • Signals: The dumbest form of inter-process communication, akin to an interrupt. Process A "rings a bell" in process B. Said bell, called a signal, has a number associated to it, but nothing more. Process B may be waiting for this signal to arrive at the time, or have defined a function attached to it which is called in a new thread by the kernel when the process receives it.

  • Pipes and other data streams: Processes also frequently want to exchange data of various types. Most modern OSs provide a facility for doing this, although several ones only allow processes to exchange data on a byte-per-byte basis, for legacy reasons.

  • Remote/Distant procedure calls: Once we are able to both send data from one process to another and to send signals to other processes so that one of their methods gets called, it's highly tempting to combine both and allow one process to call methods from another process (in a controlled way, obviously). This approach allows one to use processes like shared libraries, with the added advantage that contrary to shared libraries, processes may hold access to resources which the caller doesn't have access to, giving the caller access to these resources in a controlled way.

  • Shared memory: Although in most cases processes are better isolated from each other, it may sometimes be practical for two processes to share a chunk of RAM and just do whatever they want in it without having the kernel going in the way. This approach is commonly used under the hood by kernels to speed up data transfers and to avoid loading shared libraries several times when several processes want to use them, but some kernels also make this functionality publicly available to processes which may have other uses for it.

Another issue, related to inter-process communication, is synchronization, that is situations where threads must act in a coordinated manner.

To get started to this problem, notice that in a multitasking environment, there are several occasions where you want to make sure that only a limited amount of threads (generally one) may access a given resource at a time. Imagine, as an example, the following scenario: two word processors are opened simultaneously, with different files inside of them. The user then brutally decides to print everything, and quickly clicks the "print" button of both windows.

Without a mechanism in place to avoid this, here's what would happen: both word processors start to feed data to the printer, which gets confused and prints garbled output, basically a mixture of both documents. Not a pretty sight. To avoid this, we must put somewhere in the printer driver a mechanism which ensures that only one thread may be printing a document at the same time. Or, if we have two printers available and if which one is used does not matter, we can have a mechanism which ensures that only two threads may be printing a document at the same time.

The usual mechanism for this is called a semaphore, and typically works as follows: on the inside, the semaphore has a counter which represents how much time a resource can still be accessed. Each time a thread tries to access a resource protected by a semaphore, this counter is checked. If its value is nonzero, it is decreased by one and the thread is permitted to access the resource. If its value is zero, the thread may not access the resource. Notice that to be perfectly bullet-proof, the mechanism in use must ensure that the value of the semaphore is checked and changed in a single processor instruction that can't be run on several CPU cores at once, which requires a bit of help from the hardware. It's not as simple as just checking and changing an integer value. But how exactly this is done is not our business at this design stage.

Apart from semaphores, another, less frequently used but still well-known synchronization mechanism, is the barrier. It allows N threads to wait until each one has finished its respective work before moving on. This is particularly useful in situations where a task is parallelized in several chunks that may not necessarily take the same time to be completed (think as an example about rendering a 3D picture by slicing it in a number of parts and having each part be computed by a separate thread).

So in short, having defined your process model, you'll have to define how they will communicate with each other, and how threads' actions will be synchronized.

Major concerns and compromises

You may have noticed that I've done my best to stick to generic concepts and put some care in showing that there are several ways to do a given thing. That's not just for fun. There are several compromises to play with when designing a kernel, and depending on what your OS project's goals are, you'll probably have to consider them in different ways.

Here's an overview of some major concerns which you should have in mind when designing a kernel (though the importance of each varies depending on what your OS project is):

  • Performance: Both in terms of hardware resource use and in terms of performance perceived by the user. These are not totally the same. As an example, in a desktop operating system, prioritizing software which the user is directly interacting with over background services improves perceived performance without requiring much optimization work. In a real-time operating system, hardware resource usage does not matter as much as meeting deadlines. And so on...

  • Maintainability: Kernels are pretty hard to code, therefore you generally want them to last a fairly long time. For this to happen, it must be easy to grab their code and tinker with it as soon as a flaw is found or a feature is to be added. The codebase must thus be kept as short as possible, well-commented, well-documented, well-organized, and leave room for tweaking things without breaking the API.

  • Portability: In our world of quickly-evolving hardware, operating system kernels should be easily portable to a new architecture. This is especially true in the realm of embedded devices, where the hardware is much more complex and ever-changing than it is on the desktop.

  • Scalability: The other side of the portability coin is that your kernel should adapt itself to future hardware evolutions in a given architecture. This noticeably implies optimizing for multi-core chips: you should strive to restrict the parts of the kernel which can only be accessed by a limited number of threads to a minimal amount of code, and aggressively optimize your kernel for multi-threaded use.

  • Reliability: Processes should not crash. But when they do crash, the impact of their crash should be minimized, and recoverability should be maximized. This is where maximizing process isolation, reducing the amount of code in loosely-isolated processes, and investigating means of backing up process data and restarting even the most critical services without rebooting the OS really shines.

  • Security: On platforms which allow untrusted third-party software to run, there should be some protection against malware. You must understand right away that things like open-source, antiviruses, firewalls, and having software validated by a bunch of testers, are simply neither enough nor very efficient. These should only be tools for paranoids and fallback methods when system security has failed, and system security should fail as infrequently as possible. Maximal isolation of processes is a way to reach that result, but you must also minimize the probability that system component can be exploited by low-privilege code.

  • Modularity: You'll generally want to make kernel components as independent from each other as possible. Aside from improving maintainability, and even reliability if you reach a level of modularity where you can restart failing kernel components on the fly without having the live system take a big hit, it also permits you to make some kernel features optional, a very nice feature, especially when applied to hardware drivers in kernels which include them.

  • Developer goodies: In the dark ages of DOS, it was considered okay to ask from developers that they literally code hardware drivers in their software, as the operating system would do nearly nothing for them. This is not the case anymore. For everything which you claim to support, you must provide nice and simple abstractions which hide the underlying complexity of hardware behind a friendly, universal interface.

  • Cool factor: Who'll use a new kernel if it's just the same as others, but in a much superior way? Let's introduce power-efficient scheduling, rickrolling panic screens, colored and formatted log output, and other fun and unique stuff!

Now, let's see how they end up in conflict with each other...

  • The quest for performance conflicts with everything but scalability when taken too far (writing everything in assembly, not using function calls, putting everything in the kernel for better speed, keeping the level of abstraction minimal...)

  • Maintainability conflicts with scalability, along with anything else that makes the codebase more complex, especially if the extra complexity can't be put in separate modules.

  • Portability is in conflict with everything that requires using or giving access to architecture-specific features, particularly when arch-specific code ends up being spread all over the place and not tightly packed in a known corner (as in some forms of performance optimizations).

  • Scalability is in conflict with any feature or construct which can't be used on 65536 CPU cores at the same time. Aside from the obvious compromise with maintainability and reliability which are better without hard-to-code and hard-to-debug threads spread all over the place, there's also a balance with some developer goodies (an obvious example being the libc and its hundreds of blocking system calls).

  • Reliability is the fiend of anything which adds complexity, as more code statistically means more bugs, especially when said code is hard to debug. The conflict with performance is particularly big, as many performance optimizations require to provide code with more access to hardware than it actually requires. It is also the sole design criteria in this list to have the awesome property of conflicting with itself, as some system features can improve reliability.

  • Security is a cousin of reliability as far as code complexity is concerned, since bugs can be exploitable. It also doesn't like low-level code where every single action is not checked (pointers arithmetic, C-style arrays...), which is more prone to exploitable failures than the rest.

  • Modularity doesn't like chunks of code which must be put at the same place of RAM. This means a serious conflict with performance, since code locality allows optimal use of CPU caches. The relationship between modularity and maintainability is ambiguous: separating system components from each other initially helps maintainability a lot, but extreme forms of modularity like putting the scheduler (part of the kernel which manages multitasking) in a process can make the code quite confusing.

  • We've previously seen that developer goodies and other cool stuff conflict with a large part of the rest for a number of reasons. Notice also an extra side of the feature vs maintainability conflict: it's easy to add features, but hard to remove them, and you don't know in advance how useful they will be. If you're not careful, this results in the phenomenon of feature bloat where the number of useless features grows exponentially with time. A way to avoid this is to keep the feature set minimal in the first release, then examine user feedback to see what is actually lacking. But beware of the "second-system effect", where you just put everything you're asked for in the second release, resulting in even worse feature bloat than if you had put a more extensive feature set to start with.

Some examples of kernel design

There are many operating system kernels in existence, though not all meet the same level of success. Here are a few stereotypical designs which tend to be quite frequently encountered (this list is by no mean exhaustive):

Monolithic kernels

The way all OS kernels were written long ago, for performance reasons, and still the dominant kernel design as of today. The monolithic kernel model remains quite attractive due to the extreme simplicity of its design. Basically, the kernel is a single big process running with maximum privileges and tending to include everything but the kitchen sink. As an example, it is common for desktop monolithic kernels to include facilities for rendering GPU-accelerated graphics and managing every single filesystem in existence.

Monolithic kernels shine especially in areas where high performance is needed, as everything is part of the same process. They are also easier to design, since the hardware resource model can be made simpler (only the kernel has direct access to hardware, user-space processes only have access to kernel-crafted abstractions), and since user-space is not a major concern until late in the development process. On the other hand, this way of doing things highers the temptation of using bad coding practices, resulting in unmaintainable, non-portable, non modular code. Due to the large codebase and the full access to hardware, bugs in a monolithic kernel are also more frequent and have a larger impact than in more isolated kernel designs.

Examples of monolithic kernel include Linux and its Android fork, most BSDs' kernels, Windows NT and XNU (Yes, I know, the two latter call themselves hybrid kernels, but that's mostly marketing. If you put most services in the same address space, with full access to the hardware, and without any form of isolation between each other, the result is still a monolithic kernel, with the advantages and drawbacks of monolithic kernels).

Micro-kernels

This is the exact opposite of a monolithic kernel in terms of isolation. The part of the kernel which has full access to the hardware is kept minimal (a few thousands of lines of executable code in the case of MINIX 3, to be compared with the millions of lines of code of monolithic kernels like Linux or Windows NT), and most kernel services are moved in separate services whose access to hardware is fine-tuned for their specific purpose.

Microkernels are highly modular by their very nature, and the isolated design favors good coding practices. Process isolation and fine-tuned access to hardware resources also ensure optimal reliability and security. On the other hand, microkernels are harder to write as much as they are easier to maintain, and the need to constantly switch from one process to another makes the most straightforward implementation perform quite poorly: it takes more optimization work to have a microkernel reach high performance, especially on the IPC side (as IPC becomes a critical mechanism).

Examples of commercial-grade microkernels include QNX, ยต-velOSity and PikeOS. On the research side, one can mention MINIX 3, GNU Hurd, the L4 family, and the EROS family (KeyKOS, EROS, Coyotos, CapROS).

VM-based kernels

A fairly recent approach, which at the time has not fully gotten out of research labs and proof-of-concept demos. Maybe you'll be the one implementing it successfully. The idea here is that since most bugs and exploits in software come from textbook mistakes with native code (buffer overflows, dangling pointers, memory leaks...), native code is evil and should be phased out. The challenge is thus to code a whole operating system, including its kernel, in a managed language like C# or Java.

Benefits of this approach include obviously a very high cool factor and increased reliability and security. It could also potentially reach better performance than microkernels while providing similar isolation in a distant future, by isolating processes through a purely software mechanism (since all pointers and accesses to the hardware are checked by the virtual machine, no process may access resources which it's not allowed to access). On the other hand, nothing is free in the world of kernel development, and VM-based kernels have several major drawbacks to compensate for these high promises.

  • The kernel must include a full featured interpreter for the relevant language, which means that the codebase will be huge, hard to design, and that very nasty VM bugs are to be expected during implementation.

  • Making a VM fast enough that it is suitable for running kernel-level code full of hardware access and pointer manipulation is another big programming challenge.

  • A VM running on top of bare hardware will be harder to write, and thus more buggy and exploitable, than a VM running in the user space of an operating system. At the same time, exploits will have even worse consequences. Currently, the Java Virtual Machine is one of the biggest sources of vulnerabilities in existence on desktop PCs, so clearly something must change in the way we write VMs before they are suitable for inclusion in operating system kernels.

Examples of active VM-based kernel projects include Singularity, JNode, PhantomOS and Cosmos. There are also some interesting projects that are not active anymore, like JX and SharpOS (whose developers are now working in the MOSA project).

Bibliography, links, and friends

Having defined what the general concepts which you'll have to care about are, I bet you want to get into more details. In fact you should. So here is some material for going deeper than this introductory article on the subject of kernel design:

  • Modern Operating Systems (Andrew S. Tanenbaum): Should you only read one book on the subject, I strongly recommend this one. It is an enlightening, extensive description of the subject, covering a lot of aspects of kernel design and which you may also use in much more parts of your OS development work.

  • You may also find a list of other books, along with some reviews, on the OSdev wiki.

  • While you're on said wiki, you might also want to have a look at its main page, more precisely at the "Design Considerations" links in the left column (scroll down a bit). Globally, you should bookmark this website, because you'll have a vital need for it once you start working on implementation. It's, simply put, the best resource I know of on the subject.

  • And when you have question, also consider asking them in their forum. They are being asked hundreds of "how do I?" and "I'm stuck, what should I do?" implementation questions per month, so a bit of theoretical discussions would really please them. But beware of stupid questions which are answered in the wiki, otherwise prepare to face Combuster's sharp tongue.

  • Questions for an OS designer is also an interesting read, although it doesn't go too deeply into specifics. I should have linked to it in my previous article.

And that's all for now. Next time, we're going to go a bit more platform-specific, as I'm going to describe basic characteristics of the x86 architecture, which will be used for the rest of this tutorial (I'll noticeably explain why).

e p (10)    34 Comment(s)

Technology White Papers

See More