Linked by Peter Gerdes on Mon 10th Jan 2005 17:35 UTC
Editorial As a recent ACM Queue article observes, the evolution of computer languages is toward later and later binding and evaluation. So while one might quibble about the virtues of Java or the CLI (also known as microsoft.net), it seems inevitable that more and more software will be written for, or at least compiled to, virtual machines. While this trend has many virtues, not the least of which is compatibility, current implementations have several drawbacks. However, by cleverly incorporating these features into the OS, or at least including support for them, we can overcome these limitations and in some cases even turn them into strengths.
Thanks and more explanation
by logicnazi on Tue 11th Jan 2005 04:28 UTC

First of all I wanted to thank everyone for the thoughtful consideration and interesting responses. In particular I found the LLVM stuff fascinating. While it doesn't quite address everything I had in mind, it does go a long way there.

Now a few comments in response to what people have said.

<h2>Why Use VMs</h2>

First of all, there are several good reasons to use virtual machines rather than static compilation. As several people here have accurately pointed out, there are some performance benefits to doing things at runtime. Additionally, there are many garbage collection benefits to working in a virtual environment, since the extra information available lets one avoid the drawbacks of conservative GC. In particular, I think it would be very difficult to provide guaranteed finalization in a statically compiled environment.
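
To make the finalization point concrete, here is a rough Java sketch (the CleanupRegistry class and its methods are invented names for illustration, not an existing API): because the managed runtime knows exactly when an object becomes unreachable, a registered cleanup action can be guaranteed to run exactly once, which a conservative collector scanning raw native memory cannot promise.

<pre>
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: guaranteed cleanup built on the runtime's exact
// knowledge of reachability, via phantom references and a reference queue.
final class CleanupRegistry {
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keep the phantom references strongly reachable until their cleanup runs.
    private static final Set<CleanupRef> PENDING = ConcurrentHashMap.newKeySet();

    private static final class CleanupRef extends PhantomReference<Object> {
        final Runnable action;
        CleanupRef(Object owner, Runnable action) {
            super(owner, QUEUE);
            this.action = action;
        }
    }

    static void register(Object owner, Runnable action) {
        PENDING.add(new CleanupRef(owner, action));
    }

    // A background daemon thread would normally drain the queue; one blocking
    // step is shown here for brevity.
    static void drainOnce() throws InterruptedException {
        CleanupRef ref = (CleanupRef) QUEUE.remove(); // blocks until the GC enqueues one
        ref.action.run();                             // guaranteed, exactly-once cleanup
        PENDING.remove(ref);
    }
}
</pre>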

Moreover, static compilation doesn't provide binary compatibility. While theoretically one could simply provide guaranteed source compatibility, the pragmatics of software development make it quite unlikely that this would really be effective. Even pure ANSI C programs usually aren't write-once-run-anywhere. Quite simply, as long as the development environment is focused around the execution of native binaries, the temptation for developers to take advantage of binary features incompatible across platforms is simply too great. Furthermore, without fat binaries or a solution like the one I am suggesting, it seems difficult to provide transparent binary copying between platforms and architectures.

Still, these issues may not in themselves provide a compelling justification for such a major change, and some hack like fat binaries or automatic recompilation might offer the user the appearance of perfect binary compatibility. However, VMs provide several features that simply can't be provided in statically compiled code.

Foremost are fine-grained permissions and sandboxing. By only allowing the JIT compiler to cache/create 'safe' code we can force all sensitive operations to be performed virtually. While we might introduce coarse-grained permission features using ACLs or binary scanning, these simply can't provide the level of protection and the fine-grained distinctions a virtual environment can provide.
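
As a rough illustration of forcing every sensitive operation through the virtual layer, here is a minimal Java sketch using the SecurityManager hook (it works on older Java runtimes; the hook has since been deprecated, and the sandbox directory and policy below are just placeholders). The point is that every file write inside the VM funnels through this one check no matter which library issues it:

<pre>
import java.io.FilePermission;
import java.security.Permission;

// Sketch only: allow file writes solely under /tmp/sandbox, refuse the rest.
public class SandboxManager extends SecurityManager {
    private static final String ALLOWED = "/tmp/sandbox/";

    @Override
    public void checkPermission(Permission perm) {
        if (perm instanceof FilePermission && perm.getActions().contains("write")) {
            if (!perm.getName().startsWith(ALLOWED)) {
                throw new SecurityException("write outside sandbox: " + perm.getName());
            }
        }
        // Everything else is allowed in this sketch; a real policy would be stricter.
    }

    public static void main(String[] args) throws Exception {
        System.setSecurityManager(new SandboxManager());
        // Any code loaded from now on hits the same check, because all I/O in
        // the VM passes through these permission hooks.
        java.io.FileWriter ok = new java.io.FileWriter("/tmp/sandbox/log.txt"); // permitted (directory assumed to exist)
        ok.close();
        new java.io.FileWriter("/etc/passwd"); // throws SecurityException before touching the file
    }
}
</pre>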

For instance, suppose you download a program from the internet which edits/updates your bootloader. This program needs direct access to your disk, but you don't want an error to overwrite all your data or a trojan to maliciously modify other executables on your system. In a virtual machine, all calls to the direct-disk system call would be sensitive and pass through the emulator portion, which can enforce restrictions like requiring all reads and writes to fall within a certain range. Since the sector being accessed may be determined by a complicated algorithm, one simply can't guarantee these restrictions at compile time.
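
To sketch what that run-time check might look like (all names here are hypothetical, not a real kernel or VM interface), the sandboxed program never touches the device directly; its "write sector" calls go through the monitor, which enforces the granted range however the sector number was computed:

<pre>
// Hypothetical interface the bootloader-editing program is handed by the VM.
interface RawDisk {
    void writeSector(long sector, byte[] data);
}

// The monitor's wrapper: enforces a per-program sector range at run time,
// a check that cannot be done at compile time.
final class RangeLimitedDisk implements RawDisk {
    private final RawDisk real;
    private final long first, last; // inclusive sector range granted to this program

    RangeLimitedDisk(RawDisk real, long first, long last) {
        this.real = real;
        this.first = first;
        this.last = last;
    }

    @Override
    public void writeSector(long sector, byte[] data) {
        if (sector < first || sector > last) {
            throw new SecurityException("sector " + sector + " outside granted range");
        }
        real.writeSector(sector, data); // only now does the request reach the device
    }
}
</pre>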

So while grsecurity does demonstrate that we can add access controls to certain system calls one at a time, it requires dealing by hand with each function one wants to restrict. A virtual environment provides a general solution where *any* system call can be made subject to nearly arbitrary restrictions. One might specify that a given program may only send UDP packets to a particular IP address, or may not start IPC with a particular process, or any restriction imaginable, not only those the security people thought about. You can also guarantee the program never sees information that it still must make use of: for instance, the program might need to pass along the information from uname without being allowed to read it, or more generally it might need the result of one syscall to feed into another while the program itself is never allowed to see that information. Finally, you can implement positive security, giving a particular list of all and only the calls the program is allowed to make, rather than the negative security which is mostly what binary security can offer.
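
A crude sketch of such a positive policy object (the call names and fields are invented for illustration) might look like the following; the point is only that the monitor consults an explicit allow-list rather than a list of prohibitions:

<pre>
import java.net.InetAddress;
import java.util.Set;

// Hypothetical "positive security" policy: the sandbox is told exactly which
// operations the program may perform; anything not listed is refused.
final class PositivePolicy {
    private final Set<String> allowedCalls;   // e.g. "udp.send", "proc.list"
    private final InetAddress onlyUdpTarget;  // the sole permitted UDP destination

    PositivePolicy(Set<String> allowedCalls, InetAddress onlyUdpTarget) {
        this.allowedCalls = allowedCalls;
        this.onlyUdpTarget = onlyUdpTarget;
    }

    void checkUdpSend(InetAddress dest) {
        if (!allowedCalls.contains("udp.send") || !dest.equals(onlyUdpTarget)) {
            throw new SecurityException("UDP send to " + dest + " not in positive policy");
        }
    }

    void check(String call) {
        if (!allowedCalls.contains(call)) {
            throw new SecurityException(call + " not in positive policy");
        }
    }
}
</pre>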

While some of these features might be possible to implement for native binaries with clever hacks, the performance hit would be unacceptable. If we want to block IPC to a process with a particular name in one program, a binary solution would require an authorization check for *every* program doing IPC, while what we really want is for system-level programs to have fast, unrestricted access and for sandboxed programs to go through the security checks. So if we want these completely general security restrictions for binaries, we must either accept the overhead of every syscall checking for authorization or write a wrapper function for every system call. If we want the ability to replace arbitrary syscalls with our own code, say so that all programs in a particular sandbox are given a modified list of running processes, the difficulty becomes even greater. That is not to mention the inherent superiority of virtual security over binary security: since a pre-compiled binary runs directly on the system hardware, it is much easier for the slightest error in your security model to allow an arbitrary exploit.
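
In the virtual setting that split between trusted and sandboxed code can be made once, at load/JIT time, rather than on every call. A toy sketch (reusing the hypothetical RawDisk and RangeLimitedDisk names from the earlier sketch) of handing each program the appropriate call table:

<pre>
// Sketch: the monitor decides at load time which call table a program gets.
// Trusted, system-level code is handed the real implementation and pays no
// per-call overhead; sandboxed code gets the checked wrapper.
final class CallTableFactory {
    static RawDisk diskFor(boolean trusted, RawDisk realDisk, long first, long last) {
        return trusted
                ? realDisk                                      // direct, unchecked path
                : new RangeLimitedDisk(realDisk, first, last);  // checked path
    }
}
</pre>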

Finally, using a virtual machine allows trusted computing and contract-style programming that are difficult to implement in pure binary. At heart this is similar to the security issue but different in intent. For instance, a particular program/plugin may need both to access the internet, say to check for updates or gather data, and to handle personal information; a VM-based system can track references to allow both while guaranteeing that the personal information can't leave the local machine (yes, this is hard and would have to be conservative). While I don't necessarily like the idea, this could also work to protect copyrighted content while allowing the user to load their own tools to search or format the information. It also has the potential to improve grid computing by providing better guarantees that it really was the distributed code which was executed. Finally, it offers the possibility of function libraries of unknown origin with enforced contracts.
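
Here is a deliberately simplified sketch of the reference-tracking idea (the Tainted and NetworkSink types are invented for illustration; real taint propagation through derived values is far harder and would have to be conservative): personal data is only handed to the program wrapped in a tainted container, and the virtual network sink refuses anything still carrying the taint, so the data can be used locally but can't leave the machine.

<pre>
// Hypothetical taint wrapper: personal data arrives only inside this container.
final class Tainted<T> {
    private final T value;
    Tainted(T value) { this.value = value; }
    T useLocally() { return value; } // permitted: local computation only
}

// The VM's network boundary: refuses to transmit tainted values.
final class NetworkSink {
    void send(Object payload) {
        if (payload instanceof Tainted) {
            throw new SecurityException("tainted personal data may not leave the machine");
        }
        // ...hand the payload to the real socket layer...
    }
}
</pre>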

The rest of this message will be in the next post.