posted by Steven Haryanto on Mon 13th Jan 2003 18:40 UTC
A couple of months ago, at the Lightweight Languages Workshop 2002, Matthew Flatt put forward a premise in his talk: an operating system and a programming language are the same thing (at least "mathematically speaking"). I find this interesting, and there is a lot of truth in it. Both OS and PL are platforms on which other programs run. Both virtualize the machine. Both make it easier for people to write applications (by providing APIs, abstractions, frameworks, etc.).

The difference between the two, Matthew continued, is that an OS focuses more on non-interference, i.e. isolation between OS processes. The main task of a multiuser OS is to let several users use the computer simultaneously. Thus, it is important that no user can take over the machine or use up its resources permanently. Also, no process should be able to terminate other processes, peek into their resources, or do anything else that violates privacy, unless the OS security policy permits it.

A PL, on the other hand, focuses on expressiveness and cooperation. It provides high-level constructs and facilities so that one can write programs in less time and with less effort. Ten lines of code in a higher-level language might be equivalent to 100 to 1000 lines of machine or lower-level code. Additionally, a PL provides ways for people to share reusable code through modules, shared libraries, components, etc.

Over time, operating systems have become more like programming languages, and vice versa. OSes now provide more and more ways to cooperate and share: IPC, threads, COM, etc. PLs now provide ways to isolate: sandboxing, processes, etc.

However, none of the programming languages I currently use (Perl, Python, Ruby) was designed from the ground up for isolation. As a result, none of their isolation mechanisms works really well.

This article focuses on those three languages. It would certainly be interesting to also discuss Scheme, Smalltalk, Java, and Erlang, but since I'm not familiar enough with any of them, I'll leave it to readers to comment on these.

Why Isolation In PL?

As people build more and more complex systems, the need for isolation becomes apparent. Complex systems usually run untrusted user-level code that needs to be restricted. Several examples follow.

  • Database systems usually provide some sort of stored procedure. A remote client can connect to the database and trigger a stored procedure. If the stored procedure crashes or loops, other clients must still be able to use the database.
  • Business applications usually allow users to specify business rules or constraints. Both are essentially simplified high-level code. Users might specify these rules incorrectly, and the application must ensure that such errors have no unwanted impact.
  • Web application servers usually allow pages/templates to contain code. Since that code is generally executed directly by the interpreter itself (e.g. Perl or PHP), the application must somehow ensure that no template can crash it.
  • Other applications might allow users to specify regular expressions. Regular expressions are actually a language, albeit a small one. Overly complex regexes, whether written accidentally or on purpose, can make the regex engine backtrack for an exponentially long time, which in practice looks like an endless loop.
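The last point is easy to reproduce. The sketch below is in Python (one of the three languages covered here) and only assumes a Python 3 interpreter; the helper name `regex_matches` is hypothetical. It runs the classic pathological pattern `(a+)+$` in a separate interpreter process, so that the OS rather than the language does the isolating, and gives up after a wall-clock timeout.

```python
import subprocess
import sys

# Program run in the child interpreter: match argv[1] against argv[2]
# and print whether it matched.
_CHILD = "import re, sys; print(re.match(sys.argv[1], sys.argv[2]) is not None)"


def regex_matches(pattern, text, timeout=1.0):
    """Return True/False for a match at the start of `text`, or None on timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the runaway child;
        # the host process carries on unaffected.
        return None
    if result.returncode != 0:
        raise ValueError(result.stderr.strip())  # e.g. an invalid pattern
    return result.stdout.strip() == "True"


print(regex_matches(r"a+$", "aaaa", timeout=5.0))  # → True
print(regex_matches(r"(a+)+$", "a" * 40 + "b"))    # → None (gave up)
```

Against `"a" * 40 + "b"` the engine has to try an exponential number of ways to split the run of a's before it can fail, so the call returns `None` instead of hanging the host. Spawning a process per match is expensive; it is shown only because the standard `re` module has no timeout parameter of its own.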

So, in essence, complex applications are often platforms in their own right, running subprograms inside a single OS process. This requires the PL to provide isolation mechanisms beyond those offered by the OS: restricting a piece of code from accessing certain parts of the filesystem, from using more than a specified amount of memory or CPU time, or from accessing certain functions, modules, or variables. Unfortunately, most PLs offer few such mechanisms.
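As a rough illustration of how much of this currently falls back on the OS, here is a minimal Python sketch that assumes a Unix system (the `resource` module and `preexec_fn` are Unix-only). The helper `run_limited` is hypothetical: it runs untrusted source in a child interpreter and asks the kernel, via `setrlimit(RLIMIT_CPU)`, to stop it after a fixed amount of CPU time. A memory cap could be added the same way with `RLIMIT_AS`.

```python
import resource
import subprocess
import sys


def run_limited(source, cpu_seconds=1, wall_seconds=10.0):
    """Run Python `source` in a child interpreter under a CPU-time cap.

    Returns the CompletedProcess; a negative returncode means the child
    was terminated by a signal (here: for exceeding its CPU limit).
    """

    def apply_limits():
        # Executed in the child after fork() and before exec().
        # The kernel terminates the child with a signal once it has
        # used `cpu_seconds` of CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-c", source],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=wall_seconds,  # backstop for code that sleeps instead of spinning
    )


# A well-behaved snippet finishes normally.
ok = run_limited("print(6 * 7)")
print(ok.returncode, ok.stdout.strip())  # → 0 42

# A runaway loop is stopped by the kernel, not by the host program.
spin = run_limited("while True: pass")
print(spin.returncode)  # negative: killed by a signal
```

Note what is missing: nothing here restricts which functions or modules the child may use, and the limit applies to a whole OS process rather than to one piece of code inside the application. That finer-grained control is what the language itself would have to provide.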

Perl

The two main security models in Perl are tainting and safe compartments. Tainting is mainly for tracking untrusted data, so I will not discuss it here.

In Perl 5.6/5.8 there are about 400 bytecode-level instructions, called opcodes. All Perl code is eventually compiled to these opcodes. print is actually a single opcode. So are open, sysopen, mkdir, rmdir, fork, gethostbyname, etc. For the complete list of Perl opcodes, see the Opcode module documentation.

Two things are apparent. First, Perl opcodes are higher level than machine instructions or even Java bytecode instructions. Second, Perl is a monolithic beast: many facilities (like directory manipulation and even DNS lookups) are built into the language. Perl 5 is monolithic for historical reasons. Perl 6, I've heard, will also be monolithic, for speed reasons.
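For contrast, here is what a language with lower-level bytecode does with a file open. This Python sketch uses the standard `dis` module (exact instruction names differ between CPython versions, and the helper `opcode_names` is hypothetical). It shows that `open(...)` compiles to a generic name lookup followed by a generic call, so an opcode mask in the style of Perl's would have no file-specific instruction to switch off.

```python
import dis


def opcode_names(source):
    """Return the names of the bytecode instructions CPython emits for `source`."""
    code = compile(source, "<example>", "exec")  # compiled only, never executed
    return [instr.opname for instr in dis.get_instructions(code)]


ops = opcode_names("open('data.txt')")
# On CPython 3 this typically contains LOAD_NAME (for `open`), LOAD_CONST
# (for the file name) and a CALL* instruction; none is specific to file I/O.
print(ops)
```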

Every single opcode can be enabled or disabled. This is done at compile time: if the compiler encounters a forbidden opcode, it refuses the code and compilation fails. The advantage is speed: code that passes the check runs with no run-time overhead at all. The disadvantage is that any code compiled at run time must be handled with care; otherwise untrusted code can end up compiled with dangerous opcodes in it.
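Python has no opcode mask, but the same compile-time idea can be sketched at the level of its syntax tree. The helper `compile_masked` and the exception `ForbiddenConstruct` below are hypothetical: the source is parsed once, rejected if it mentions any masked name, and otherwise compiled normally, so the check costs nothing at run time. Like any compile-time filter, it only sees what is spelled out in the source; a call such as `getattr(__builtins__, 'op' + 'en')` slips straight past it, so treat this as an illustration of the mechanism, not as a sandbox.

```python
import ast
from types import CodeType
from typing import AbstractSet


class ForbiddenConstruct(Exception):
    """Raised when untrusted source uses a masked-out name."""


def compile_masked(source: str, forbidden: AbstractSet[str]) -> CodeType:
    """Compile `source`, refusing it if it refers to any name in `forbidden`.

    The check happens once, before anything runs, so the resulting code
    object executes with no run-time checks at all.
    """
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        # Bare names such as `open` or `eval`.
        if isinstance(node, ast.Name) and node.id in forbidden:
            raise ForbiddenConstruct(f"use of {node.id!r} is not allowed")
        # Attribute access such as `something.system`.
        if isinstance(node, ast.Attribute) and node.attr in forbidden:
            raise ForbiddenConstruct(f"use of .{node.attr} is not allowed")
        # Imports such as `import os` or `from os import path`'s module part.
        if isinstance(node, ast.alias) and node.name.split(".")[0] in forbidden:
            raise ForbiddenConstruct(f"import of {node.name!r} is not allowed")
    return compile(tree, "<untrusted>", "exec")


MASK = {"open", "exec", "eval", "os", "system"}

namespace = {}
exec(compile_masked("total = sum(range(10))", MASK), namespace)
print(namespace["total"])  # → 45

try:
    compile_masked("import os\nos.system('ls')", MASK)
except ForbiddenConstruct as err:
    print("rejected:", err)
```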

Safe.pm is a standard Perl module that compiles a piece of code with a specified opcode mask (a list of opcodes to be forbidden). In addition, Safe.pm performs a "namespace chroot": it makes Safe::Root0 (or Safe::Root1 for the second compartment, and so on) the code's main:: namespace. This means that code in the compartment cannot access variables in the original main:: namespace, so global variables like $/ are not shared with code outside the compartment (some, like $_ and the _ filehandle, are shared, though).
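A rough Python counterpart of the namespace chroot is to give untrusted code its own globals dictionary. In the hypothetical `Compartment` class below, each instance gets a fresh namespace (named `Root0`, `Root1`, ... only to echo Safe.pm's naming) and a whitelist of builtins, so the code cannot see the host's globals and whatever it defines stays inside the compartment. This is a sketch under loose assumptions, not a security boundary: in CPython, code that can reach almost any object can usually find its way back to the full builtins through introspection.

```python
import builtins
import itertools


class Compartment:
    """Hypothetical compartment: untrusted code runs in a private namespace."""

    _counter = itertools.count()

    def __init__(self, allowed_builtins=("len", "range", "sum", "print")):
        # Root0, Root1, ... merely mirror Safe.pm's naming scheme.
        self.name = f"Root{next(Compartment._counter)}"
        # Only whitelisted builtins are visible, and the host's module
        # globals are not reachable by name at all.
        self.namespace = {
            "__builtins__": {n: getattr(builtins, n) for n in allowed_builtins},
        }

    def run(self, source):
        """Execute `source` with this compartment's namespace as its globals."""
        exec(compile(source, f"<{self.name}>", "exec"), self.namespace)
        return self.namespace


secret = "host-only value"

box = Compartment()
box.run("n = len('abc')")
print(box.name, box.namespace["n"])  # → Root0 3

for attempt in ("x = secret", "f = open('/etc/passwd')"):
    try:
        box.run(attempt)
    except NameError as err:
        print("blocked:", err)
```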

That's basically what Perl offers for security. In practice, however, Safe.pm is not very practical. Choosing a reasonable set of "safe" opcodes is not always straightforward, because a single opcode like open can range from "rather safe" to "extremely dangerous". Perl's open is very powerful and does many things: it can open a file for reading or writing, execute programs, open a pipe, duplicate a filehandle, etc. You can't, for instance, tell Perl to allow open for reading only. Overriding open() doesn't make it safe either, because code in the compartment can always reach the builtin version through CORE::open(). Moreover, Perl can be made to read and write files without any file-opening opcode at all (for example, through the in-place editing variable $^I). Thus it is not possible to prevent untrusted Perl code from accessing the filesystem. To do that, one must resort to OS facilities (like Unix's chroot or BSD's jail).
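Python's built-in `open` only opens files (it cannot start a command the way Perl's piped open can), so a read-only, directory-confined wrapper is easy to sketch. The factory `make_read_only_open` is hypothetical, and the sketch ignores symlink races between the check and the open. The catch is the same one described above for CORE::open: the wrapper is only a policy as long as untrusted code has no other route to the real functions (for instance `io.open` or `os.open`), so a hard guarantee still has to come from the OS.

```python
import os


def make_read_only_open(root):
    """Return an open()-like function confined to reading files under `root`."""
    root = os.path.realpath(root)

    def read_only_open(path, mode="r", *args, **kwargs):
        # Refuse every mode that could modify files: w(rite), a(ppend),
        # x (exclusive create) and + (update).
        if any(flag in mode for flag in "wax+"):
            raise PermissionError(f"mode {mode!r} is not allowed")
        # Resolve symlinks and '..' before checking that we stay under root.
        real = os.path.realpath(os.path.join(root, path))
        if os.path.commonpath([root, real]) != root:
            raise PermissionError(f"{path!r} is outside the sandbox")
        return open(real, mode, *args, **kwargs)

    return read_only_open
```

Untrusted code would then be handed only the returned function, under the name `open`, while the real `open` stays out of its reach, which is exactly the part a language without built-in isolation cannot guarantee.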

The real show-stopper for Safe.pm is that most modules, DBI for example, don't work under it. Embperl 1.x uses Safe.pm but dropped it in 2.x. Virtually no other web application server uses Safe.pm these days, and even Perl experts consider it too broken to rely on.

Conclusion: Perl has a sandbox of sorts, but it works at the compilation step only. It's not very flexible and not very useful. Perl is also monolithic, with many functions built into the interpreter, which makes it harder to isolate individual pieces of functionality.

Table of contents
  1. "Intro, Isolation, Perl"
  2. "Python, Ruby, Conclusion"