Virtual Machines and the OS

As a recent ACM Queue article observes, the evolution of computer languages is toward later and later binding and evaluation. So while one might quibble about the virtues of Java or the CLI (also known as Microsoft .NET), it seems inevitable that more and more software will be written for, or at least compiled to, virtual machines. While this trend has many virtues, not the least of which is compatibility, current implementations have several drawbacks. However, by cleverly incorporating these features into the OS, or at least including support for them, we can overcome these limitations and in some cases even turn them into strengths.

So as to head off any confusion or a premature dismissal of my ideas, let me make it clear that by operating system I DO NOT mean the kernel. So when I talk about moving support for virtual machines or JIT compiling into the OS, I don’t necessarily mean putting it into the kernel. Some kernel support may be necessary, but how much, and what goes into the kernel, will depend on the implementation and the costs associated with switching between user and kernel space. I would expect most of the ideas I suggest below to occupy a space similar to that of the dynamic linker: a fundamental part of the OS, but not necessarily part of the kernel.

Current Virtual Machine Technology

The first, and most obvious, drawback to the use of VMs (virtual machines) is performance. I don’t want to get involved in the religious debate about the speed of Java compared to C++, but the current technology for executing VM code has an inherent performance disadvantage. While simple to program, interpreters are far too slow. Just-in-time (JIT) compiling is much faster than interpretation but still adds the compilation time to the execution time. If one could execute the same code without the overhead of compilation, one would see a significant performance gain. Even if impressive development effort and the use of run-time profiling have temporarily made JIT compilers as fast as ahead-of-time (AOT) compilation, it is only a matter of time until these advances make their way into AOT compilers and they regain the performance advantage.


There is also another performance issue facing VM code: start-up time. While this may be of limited concern for large applications or for servers which load code only once, it presents a serious difficulty for writing small utility applications in a language that compiles to a virtual machine. This large load time is also a big contributor to the end user’s impression of VM code as slow. While faster processors and other tricks, like leaving parts of the VM resident in memory, might make this manageable for most applications, they still don’t let us use VM code in commonly used libraries and other system code, denying us much of the tantalizing portability promised by VMs.


This load-time overhead brings us to the second drawback of current VM technology. The issue is well explained in this article in Java Developers Journal. I won’t repeat the article, but the short explanation is that, to avoid paying the large load-time overhead repeatedly, what should be separate tasks end up inside the same OS process. As a result, programs written for a VM do not gain the benefits of memory protection and the other process/thread isolation features of modern operating systems. This also poses great difficulties for any attempt to write VM code which acquires permissions or privileges from the operating system.


Towards a Solution

One of the first things that should occur to someone when they learn how modern JIT compilers work is how remarkably inefficient the process is. The most obvious inefficiency is that every time the program is executed we spend time recompiling the exact same code. Despite what present software offerings suggest, we don’t need to make a black-and-white choice between recompiling the program on every run and binary incompatibility. It is entirely possible to cache snippets of compiled code for use in later executions without sacrificing the benefits of using a VM.
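To make the idea concrete, here is a minimal sketch in C of such a snippet cache. Everything in it is illustrative: the cache directory, the choice of FNV-1a as a content hash for the cache key, and the jit_compile stub standing in for a real code generator.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a: a simple, stable content hash used as the cache key. */
static uint64_t fnv1a(const uint8_t *data, size_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Stand-in for a real JIT back end: here it just copies the bytes. */
static size_t jit_compile(const uint8_t *vm, size_t n,
                          uint8_t *out, size_t cap) {
    if (n > cap) return 0;
    memcpy(out, vm, n);
    return n;
}

/* Return native code for a snippet, consulting the on-disk cache first. */
size_t get_native(const uint8_t *vm, size_t n, uint8_t *out, size_t cap) {
    char path[64];
    snprintf(path, sizeof path, "/tmp/vmcache/%016llx.bin",
             (unsigned long long)fnv1a(vm, n));

    FILE *f = fopen(path, "rb");
    if (f) {                          /* hit: reuse a previous run's work */
        size_t got = fread(out, 1, cap, f);
        fclose(f);
        return got;
    }
    size_t made = jit_compile(vm, n, out, cap);  /* miss: compile once... */
    f = fopen(path, "wb");
    if (f) { fwrite(out, 1, made, f); fclose(f); } /* ...and persist it */
    return made;
}
```

The key property is that the second and every later execution of the same snippet pays only a hash and a file read, never a recompilation.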


The FX!32 binary translator/emulator from DEC is an amazing example of the power of this method. In this case the ‘virtual machine’ the code was written for was actually the x86 machine language, which was being run on an Alpha chip. Instead of simply emulating the code inefficiently, or forcing the user to wait while code was recompiled, FX!32 would begin by emulating the code and then compile frequently used code snippets into native Alpha code, which was stored on disk for later executions. This technique of mixed emulation and stored translated code was so powerful that, after enough program executions, the emulated code would sometimes run faster than the original. A detailed description of the FX!32 system can be found here.


While this might be the most obvious inefficiency in JIT compilation, it is not the only one. Various sorts of code analysis and intermediate representations are often recreated every time the JIT compiler is run. In some cases these are structures already computed, much more easily, in the earlier compilation from a high-level language to the VM code. For instance, when compiling from C# to CIL, a CFG (control-flow graph) is computed and then discarded; the JIT compiler must then compute the CFG all over again to convert the CIL to machine code. Furthermore, having to analyze a lower-level representation, and having less time to do it in, may well produce a less detailed representation, e.g. assuming the entire result of an operation is needed even when part of it is unnecessary.
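If the front-end compiler simply serialized its CFG next to the VM code, the JIT could skip this rework entirely. A sketch of what such an annotation might look like, with invented field names rather than any real format:

```c
#include <stdint.h>

/* One basic block of VM code plus its control-flow successors. */
struct cfg_block {
    uint32_t vm_offset;   /* offset of the block's first VM instruction */
    uint32_t vm_length;   /* bytes of VM code the block covers */
    uint32_t succ[2];     /* successor block indices (fall-through and
                             branch target); UINT32_MAX when unused */
};

/* The whole annotation, written alongside the VM code section. */
struct cfg_annotation {
    uint32_t block_count;
    struct cfg_block blocks[];   /* block_count entries follow */
};
```

A JIT that finds this annotation in the binary can go straight to code generation instead of rediscovering the block structure from raw bytecode.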


Multi-level Binaries

There is an elegant solution to both of these issues. The basic idea is to avoid discarding information in the course of the compilation process. Instead of a binary containing only machine code, only VM code, or a text file containing only source code, we combine them. When compiling from a high-level language, we annotate the source with a CFG (or perhaps an SSA tree) and a record of how the high-level code corresponds to the VM code. The same idea applies to our JIT compilation from VM code to machine code: we annotate the VM code with the snippet of machine code the JIT compiler generates, so the next time the binary is run we need not recompile that snippet. Of course, corporate software developers may not wish to include their full source code, but they can still benefit from the additional information contained in the multi-level binary and from the performance gains of eliminating redundant work. Meanwhile, developers and users gain the convenience of treating source files, at any level, as if they were executables.
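As a rough illustration, a multi-level binary could be a simple container of sections, one per level plus annotations. The layout, names, and magic number below are all invented for the sketch:

```c
#include <stdint.h>

/* What each section of the container holds. */
enum mlb_kind {
    MLB_SOURCE = 1,   /* optional high-level source text */
    MLB_VM_CODE,      /* portable VM bytecode */
    MLB_NATIVE,       /* cached machine code for one target CPU */
    MLB_CFG,          /* control-flow graph saved by the front end */
    MLB_PROFILE       /* accumulated profiling data */
};

struct mlb_section {
    uint32_t kind;     /* one of mlb_kind */
    uint64_t offset;   /* byte offset of the section in the file */
    uint64_t length;
};

struct mlb_header {
    uint32_t magic;            /* e.g. 0x4D4C4221, "MLB!" */
    uint32_t section_count;
    struct mlb_section sections[];
};
```

A vendor could simply omit the MLB_SOURCE section while still shipping the VM code, annotations, and cached native snippets.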


The performance benefits of this approach are manifold. By invoking the JIT compiler only when a snippet is not already cached, we remove the fundamental advantage natively compiled code has over VM code while retaining binary compatibility. When a file is transferred to a different machine, the JIT compiler voids the machine-specific part of the multi-level binary. Because the higher-level code is retained, future compiler improvements and optimizations can speed up existing programs. Since the code is not irrevocably compiled down, sensitive operations can still be passed to external emulation functions, preserving the sandbox-style features VM code often possesses. Furthermore, we can run programs with dynamic function creation (like nearly every Lisp program) without the overhead of compiling an interpreter into the program. Finally, since the multi-level binary retains more information about program structure, it is easier to make use of profiling information, which we could also save in the multi-level binary.
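The voiding step can be as simple as stamping the native section with the environment it was compiled for and discarding it on any mismatch. A minimal sketch, with hypothetical fingerprint fields:

```c
#include <stdint.h>
#include <string.h>

/* Recorded next to the MLB_NATIVE section when it is generated. */
struct native_stamp {
    char     cpu_arch[16];   /* e.g. "x86-64" */
    uint32_t vm_version;     /* JIT/VM version that emitted the code */
};

/* Use the cached native code only if it matches the current host;
 * otherwise fall back to the VM code and regenerate it. */
int native_is_valid(const struct native_stamp *s,
                    const char *host_arch, uint32_t host_vm) {
    return strcmp(s->cpu_arch, host_arch) == 0 && s->vm_version == host_vm;
}
```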


OS Integration

Of course, the benefits of multi-level binaries might be had at a purely application level. We might simply create a cache file for each source-code file and have it loaded by the JIT compiler. Admittedly, this approach runs into all sorts of difficulties with copying files and the like, but perhaps we just need a special file format. However, it makes things quite difficult if we want to write programs in more than one language without the overhead of RPC or CORBA. This difficulty is no doubt one of the reasons integrating source and binary information isn’t more common. Still, we might address the issue with some system of standards and shared libraries used by the various compilers and JIT systems.


If we only want to address the issue of performance, this might be a workable solution. However, if we want all the benefits of process/thread isolation without the overhead of loading a new VM/JIT compiler for each execution context, we have no choice but to add operating system support or to re-implement all the features of task management in each JIT compiler. By implementing the various compiler functions as specialized OS modules, every execution context written for a VM can share the same JIT/VM code while gaining all the benefits of process isolation. Essentially, the JIT code would be a shared, read-only code segment which the OS may be executing in several different processes at the same time. OS support could also aid profiling with little overhead: whenever the OS interrupts a process inside a saved snippet of machine code, it could record the execution location.
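On a POSIX system, the mechanism for such a shared, read-only segment already exists: each process can map the same cached snippet file with read and execute permission only. A sketch, with a hypothetical cache path supplied by the caller:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a cached native snippet so many processes share one copy. */
void *map_shared_snippet(const char *cache_path, size_t *len_out) {
    int fd = open(cache_path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }

    /* MAP_SHARED + read-only/executable: the kernel keeps one
     * physical copy, and no process can modify another's view. */
    void *code = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
                      MAP_SHARED, fd, 0);
    close(fd);
    if (code == MAP_FAILED) return NULL;

    *len_out = (size_t)st.st_size;
    return code;
}
```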


Background Optimization

While FX!32-style saved code snippets provide great performance once a program has been run several times, it seems silly for the program to underperform the first several times it is run. Moreover, one of the performance problems with JIT compiling is that only computationally cheap optimizations can be performed. Why not solve both of these problems with a background compiler which scans the multi-level binaries on your system during free CPU cycles, applying optimizations, interpreting profiling data, and generating snippets of machine code to minimize the time spent in the JIT compiler? Of course, a JIT compiler or interpreter is still needed to handle sensitive operations or to run code the background optimizer hasn’t gotten to yet, but it can be much smaller and faster, counting on the background pass to perform many optimizations or at least record which optimizations should be carried out.
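The outer loop of such a background optimizer could be as mundane as a low-priority daemon walking the snippet cache. A sketch, where optimize_binary is a stand-in for the real re-optimization work and the cache directory is the illustrative one used earlier:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Stand-in: a real optimizer would re-run slow passes and update
 * the multi-level binary in place. */
static void optimize_binary(const char *path) {
    printf("would re-optimize %s\n", path);
}

int main(void) {
    (void)nice(19);   /* only consume cycles nothing else wants */

    DIR *dir = opendir("/tmp/vmcache");    /* illustrative cache dir */
    if (!dir) return 1;

    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        if (strstr(e->d_name, ".bin") == NULL)
            continue;                       /* skip non-snippet files */
        char path[512];
        snprintf(path, sizeof path, "/tmp/vmcache/%s", e->d_name);
        optimize_binary(path);
    }
    closedir(dir);
    return 0;
}
```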


This raises the tantalizing possibility that OS updates would actually speed up the programs you already have on disk. The background optimizer could download clever new ways to compile source code and update your programs while you sleep. Since the multi-level binary keeps so much program structure, this could mean real efficiency enhancements. For instance, if someone figures out a much more efficient way to implement the printf function, the background optimizer could void the VM code corresponding to printf in all your binaries and replace it with the more efficient version, without recompiling the rest of the code.


Of course, the prospect of strangers changing the code on your system is a scary one from the standpoint of security, or even stability. Thus it is only reasonable for our background optimizer to require a proof of equivalence. Each line of code, or code snippet, would be characterized by some manner of contract, or even just by a piece of reference code at the next level down. In either case, the effect of the code would be captured in some formal system which allows for proofs of equivalence, and only provably equivalent optimizations would be accepted.
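As a toy instance of what such a proof might look like, here is a reference function, a proposed replacement, and a machine-checked proof of their equivalence in Lean 4. The functions are invented for illustration; a real system would reason about VM-level snippets rather than arithmetic toys.

```lean
def double (n : Nat) : Nat := n + n        -- the reference code
def doubleFast (n : Nat) : Nat := 2 * n    -- the proposed replacement

-- The optimizer would accept doubleFast only because this proof
-- shows it agrees with double on every input.
theorem double_eq (n : Nat) : double n = doubleFast n := by
  unfold double doubleFast
  omega
```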


A Vision of the Future

So in my vision of the future, the distinction between the compiler and the OS blurs or disappears completely. In so doing we approach the holy grail of code reuse. If someone anywhere discovers a clever optimization or algorithm, and they can prove it equivalent to the standard one, everyone benefits. If some clever assembly-language hacker discovers a really fast way to implement some VM instruction, he can use a specialized tool to prove his implementation equivalent, and it will spread across the internet, speeding up every program that uses that VM instruction. The use of multi-level binaries even hints at the possibility of replacing entire algorithms by proving them equivalent to some faster version, though our formal proof systems need some time to improve before this can happen.


While some of the ideas about downloading optimizations may be far off, I think our constant evolution towards later-binding languages and VMs makes the integration of the compiler and the OS inevitable. It may not happen the way I suggest, but hopefully, by getting these ideas out there, people can begin to think about them, and perhaps we can avoid the grim possibility of a mega-corporation patenting them and strangling a technology which could truly enable write-once-run-anywhere, opening the OS and hardware markets up to real competition. At the very least, I want to know why the technology of caching code snippets that FX!32 had long ago is absent from every JIT compiler I have ever seen.


The author, Peter Gerdes, or logicnazi as he is known on OSNews, is a graduate student in mathematics/philosophy, not computer science. So it is entirely possible that these ideas suffer from some simple pragmatic problem he doesn’t know about. If you want to contact him to tell him about one of these problems, you can email him at [email protected], where he is bravely testing Gmail’s spam filters.
