Book Review: Understanding the Linux Virtual Memory Manager

Virtual memory is one of the most important subsystems of any modern operating system. It is deeply intertwined with user processes, protection between processes, protection of the kernel from user processes, efficient shared memory, communication with I/O devices (DMA, etc.), paging, swapping, and countless other systems. Understanding the VM subsystem greatly helps in understanding how all other parts of the kernel work and interact. Because of this, “Understanding the Linux Virtual Memory Manager” is a great guide to better understanding and working with the entire kernel.

The book is written in a very precise technical style, and Gorman explains things clearly, if somewhat dryly. Readers are expected to know common hardware and OS terms, and prior knowledge of the kernel helps. Be aware that this is not a book for somebody who has no prior understanding of operating systems or who just wants to understand the basics of what is going on. If, however, you want to really understand how modern operating systems handle memory, you should buy this book immediately. There are almost no other books on memory in Linux, and none of comparable quality.

Rather than giving a rough overview of the kernel and then focusing on individual subsystems, Gorman immediately dives into how physical memory is managed and works his way up from there. This approach works quite well and is consistent with the no-nonsense, no-fluff style of the book, but it can make for a difficult start for beginners. Several terms, such as buddy allocator and slab allocator, are mentioned early on without being explained at the time. Of course, these concepts are explained in great detail later on, but somebody who has never heard of them would initially be confused. If this is true for you, you might want to skim some of the introductory kernel websites and books first. Otherwise, this approach allows you to start understanding the VM immediately, rather than rehashing simple concepts.

Virtual memory is one of the kernel subsystems that interfaces most heavily with the hardware, so many things depend on how each instruction set architecture implements a feature. In order to concentrate on the VM and not get bogged down in hardware issues, the book focuses on the x86 and occasionally points out how things work differently on other architectures. It would, however, have been nice if some specific details and oddities of the x86 had been explained early on. High Memory, for example, remained confusing throughout the book until it was covered in great detail towards the end.

Even experts should be satisfied with the amount of detail Gorman goes into. For instance, many aspects of the implementation of Nodes, which are mostly important on NUMA (Non-Uniform Memory Access) machines, are described extensively. The author wrote his master's thesis on the Linux virtual memory architecture, and it shows. I found the chapter on Process Address Space to be particularly important, even for readers not immediately concerned with the entire VM, because it describes the implementation of the user address space, which is key to understanding how the kernel implements user processes. Kernel programmers (even those only interested in straightforward tasks such as writing drivers) will be very interested in Physical Page and Noncontiguous Memory Allocation, as well as the Slab Allocator, which allows automatic reuse of “objects” (which are structures in memory, unrelated to objects in the object-oriented sense). The explanation of the Slab Allocator also illustrates why using it can be preferable to simply calling kmalloc.
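
To give a feel for the reuse idea behind the Slab Allocator, here is a minimal user-space sketch. It is not kernel code and not taken from the book: `struct conn`, `struct obj_cache`, and the helper functions are hypothetical names of my own, and `malloc()` merely stands in for a general-purpose allocator such as kmalloc. The sketch only shows that released objects of one fixed type can be handed out again without going back to the general allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type; stands in for any fixed-size structure. */
struct conn {
    int id;
    char buf[56];
};

#define CACHE_SLOTS 16

/* Toy cache: released objects are kept on a free list for reuse. */
struct obj_cache {
    struct conn *free_list[CACHE_SLOTS]; /* stack of released objects */
    size_t nfree;                        /* entries currently on free_list */
    size_t fresh_allocs;                 /* times we fell back to malloc() */
};

/* Hand out a cached object if one exists, otherwise allocate a new one. */
static struct conn *cache_alloc(struct obj_cache *c)
{
    if (c->nfree > 0)
        return c->free_list[--c->nfree];
    c->fresh_allocs++;
    return malloc(sizeof(struct conn));
}

/* Return an object to the cache; free it only if the cache is full. */
static void cache_free(struct obj_cache *c, struct conn *obj)
{
    if (c->nfree < CACHE_SLOTS)
        c->free_list[c->nfree++] = obj;
    else
        free(obj);
}

/* Release everything still held by the cache. */
static void cache_destroy(struct obj_cache *c)
{
    while (c->nfree > 0)
        free(c->free_list[--c->nfree]);
}

/* Allocate and release one object `rounds` times; return the malloc() count. */
static size_t churn(size_t rounds)
{
    struct obj_cache cache = {0};
    for (size_t i = 0; i < rounds; i++) {
        struct conn *p = cache_alloc(&cache);
        assert(p != NULL);
        p->id = (int)i;
        cache_free(&cache, p);
    }
    size_t n = cache.fresh_allocs;
    cache_destroy(&cache);
    return n;
}
```

Calling `churn(1000)` in this toy model reaches `malloc()` only once, because every later request is served from the free list. The real Slab Allocator does considerably more than this, which is exactly what the chapter covers.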

Another chapter that is particularly important, given the recent discussion on the Linux Kernel Mailing List about whether swapping is still a good idea, is the one on Swap Management. Linux does not swap in the traditional sense (writing out an entire process at once and then reading it back in one go) unless severe memory pressure demands it. Instead, swapping refers to writing out dirty memory pages to disk, which allows the kernel to evict unused pages in order to free up memory for more important tasks. The implementation touches on many issues, such as how the file system and the VM interact, and the chapter explains this very well.
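
The page-at-a-time idea can be modelled in a few lines of user-space C. This is a rough sketch under my own assumptions, not the kernel's algorithm: `struct page_model`, `struct reclaim_stats`, and `reclaim()` are hypothetical names, and the simple "second chance" policy is chosen only for illustration. What it shows is that unused pages are evicted individually, and that a dirty page has to be written out before its memory can be reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; real kernel pages carry far more. */
struct page_model {
    bool referenced; /* touched since the last scan */
    bool dirty;      /* contents differ from what is on disk */
    bool resident;   /* currently occupying physical memory */
};

struct reclaim_stats {
    size_t freed;   /* pages evicted from memory */
    size_t written; /* dirty pages that had to be written out first */
};

/* Scan pages in order and evict up to `want` unused ones. */
static struct reclaim_stats reclaim(struct page_model *pages, size_t n,
                                    size_t want)
{
    struct reclaim_stats st = {0, 0};

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n && st.freed < want; i++) {
        struct page_model *p = &pages[i];

        if (!p->resident)
            continue;              /* already out of memory */
        if (p->referenced) {
            p->referenced = false; /* recently used: give a second chance */
            continue;
        }
        if (p->dirty) {
            p->dirty = false;      /* stands in for writing the page to disk */
            st.written++;
        }
        p->resident = false;       /* the page frame is free again */
        st.freed++;
    }
    return st;
}
```

Note that a clean page costs nothing to evict in this model, while a dirty one costs a write. No process is ever written out as a whole.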

Only slightly more than 200 of the book’s 730 pages describe the VM in prose. The majority of the book is so-called code commentary: small excerpts of source code, each followed by a short description of its purpose and a line-by-line explanation. This code commentary is especially useful because, even with a good understanding of the general workings of the VM, understanding the actual code without prior exposure to the Linux kernel source is extremely difficult.

The commentary is divided into chapters, one for each corresponding chapter of the description. This makes it easy for interested readers to flip to the back of the book and see how things are implemented. The most important functions in each chapter are described, and longer functions are split into parts for clarity. Less important functions are omitted so that the reader does not get bogged down in details that are easily understood when reading the actual source. This is very different from the famous Stevens approach, where every small macro or typedef is explained. The decision fits the overall style of the book and works very well: by focusing on the core functions, the reader can keep the big picture in mind. The explanations themselves are quite brief but very clear, and they are most effective when you browse the actual source code with the code commentary as a guide.

The book also includes a CD-ROM that has some very interesting features, as well as some more standard features with a novel presentation. Instead of simply opening a file on the CD in a web browser and following links from there, the book suggests installing the copy of Apache provided on the CD. This approach does, of course, require a working Linux installation, which is very likely given the subject of the book. Users of other operating systems can simply browse the CD directly, though tools like the call graph generator (which was also used to generate the graphs in the book) will not work. Using Apache makes the integration of these programs very elegant; it is, for example, quite easy to generate call graphs for any function in the VM subsystem.
The CD also includes tools for VM regression testing, which test the correctness and performance of your modified code. They are also quite helpful for examining the behavior of the original VM and comparing its performance to that of the 2.6 VM. In addition, the CD contains the entire book as HTML, browsable and searchable code commentary, a cross-referenced HTML version of the 2.4.22 kernel source, and a program that makes creating patches easy. Some of these programs, such as the call graph generator, were written by the author himself, and after playing around with them for a bit, I was quite impressed. Overall, the CD is a very useful addition to the book.

Finally, it should be noted that the majority of the book (including the code commentary) deals with the 2.4.22 kernel, which is quite a recent iteration of the 2.4 series. While it is true that the 2.6 kernel has recently been released and is now being used in some distributions, pretty much everything in this book is still relevant to the new version. Though it would of course be nice to have a more detailed treatment of the 2.6 kernel, the fast pace of kernel development means that no book will be entirely up to date after a few months. To ease the transition, every chapter contains a “What’s New in 2.6” section that covers all the changes from the 2.4 kernel described in the book. Once you are familiar with the 2.4 implementation, transitioning to the 2.6 kernel should not be too difficult, at least as far as the VM is concerned.


About the Author:
Can Sar is a sophomore in Computer Science at Stanford University, where he is focusing on operating systems and networking. He is spending the summer doing independent research on distributed virtual memory and will be busy hacking on the Linux kernel next semester.







