Plan 9 is an operating system designed by Bell Labs. It’s the OS they wrote after Unix, with the benefit of hindsight. It is the most interesting operating system that you’ve never heard of, and, in my opinion, the best operating system design to date. Even if you haven’t heard of Plan 9, the designers of whatever OS you do use have heard of it, and have incorporated some of its ideas into your OS.
Plan 9 is a research operating system, and exists to answer questions about ideas in OS design. As such, the Plan 9 experience is in essence an exploration of the interesting ideas it puts forth. Most of the ideas are small. Many of them found a foothold in the broader ecosystem — UTF-8, goroutines, /proc, containers, union filesystems, these all have their roots in Plan 9 — but many of its ideas, even the good ones, remain unexplored outside of Plan 9. As a consequence, Plan 9 exists at the center of a fervor of research achievements which forms a unique and profoundly interesting operating system.
I’ve never used Plan 9, but whenever I read about it, I feel like it makes sense, like that’s how things are supposed to be. I’m sure its approaches present their own unique challenges, problems, and idiosyncrasies, but the idealised reality in articles like these makes me want to jump in.
It is very interesting to study plan 9 as a way we could evolve unix past its early roots. I really like that plan9 builds on unix’s strengths in the network space. But it also highlights how our industry gets held back by legacy dominance. It’s not enough to build something worthy of migrating to; we need some way to overcome momentum, which is where most projects fail.
Sometimes I think plan9 would have been a more progressive platform to embrace than linux. Part of me wishes Torvalds had been more innovative with linux instead of just making it a unix clone. At the same time, though, I’m aware that the main reason linux became popular was that it was a free unix clone. If Linus had tried to improve linux in the same ways that plan9 did, then linux would likely have lost the *nix wars.
Drumhellar,
Yeah, the BSDs might have won if it weren’t for linux, and who’s to say that wouldn’t have been better. I have less experience with BSD, but I think most would agree that linux engineering is chaotic to a fault. Planning and discipline have not been its strong points. However, as it was, linux was the better-supported option, and that ranked highly among the concerns people (including me) had when migrating from windows.
Pretty certain the license matters as well.
The companies who got involved in Linux in the early days probably were happy that some other company couldn’t just take their work and build something closed based on it directly.
Lennie,
That’s an interesting point to bring up, although I think many companies actually prefer the BSD licenses to the “viral” GPL. We don’t have to get into this debate here, but I do wish linux were “GPL 2+” (emphasis on the plus) since linux not allowing GPL 3 is prohibitive for those who believe in GPL 3’s merit.
Just a reminder:
Linux’s success was also the result of timing.
BSD was tied up in the AT&T(USL) lawsuit for 2 years …
https://en.wikipedia.org/wiki/Berkeley_Software_Distribution
https://en.wikipedia.org/wiki/USL_v._BSDi
“The lawsuit slowed development of the free-software descendants of BSD for nearly two years while their legal status was in question, and as a result systems based on the Linux kernel, which did not have such legal ambiguity, gained greater support.”
I’d be glad to see the Minix model flourish, though. Nowadays the “performance hit” for process isolation ain’t that much of a problem. Minix is even used in the IME, which is proof in itself that it’s a completely viable option.
Kochise,
Makes me wish we could peek into alternate timelines to see how they’ve come along compared to us. I’d enjoy sitting over a beer and discussing these alternatives with our counterparts who’ve seen where it all leads next 🙂
BTW osnews was offline for a period yesterday and since then I’ve lost the ability to edit my posts. Is this change intentional or will edit be brought back? Personally I do make mistakes and find edit useful.
Plan 9 has some good ideas, but IMO it has a few major issues. One of the biggest is that its excessive minimalism holds it back as a practical OS. Another is that it breaks compatibility with Unix in ways that provide relatively little benefit (e.g. making errors strings instead of integers). To be fair, I don’t think they were really trying to make a practical OS for general use though.
Also, the integration of the security model with clustering seems to be intended for a world that never happened due to multi-core processors (the main use case for interactive clustering in the modern world would be for managing servers rather than running desktop applications, so clustering should be entirely optional on desktops). In addition, I don’t think it goes quite far enough when it comes to file-oriented architecture. It really follows an “all I/O-like things are files” model, not a true “everything is a file” model. It would be perfectly possible to make an OS in which memory and process/thread creation are file-based as well.
I do consider it to be one of the main influences on the OS I’m writing, the others being QNX and Linux. My goal is to write a practical OS for many use cases, rather than a research OS like Plan 9. Unlike Plan 9, it will have a high degree of Linux compatibility (both for applications and drivers). I’ve still got a bit of a ways to go before it actually runs user programs though.
andreww591,
I’ve always found numeric error codes to be restrictive and uninformative, going all the way back to DOS, to be honest. Consider this (hypothetical) example…
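Something along these lines (a purely hypothetical C++ sketch; the file name and error code are invented for illustration):

    #include <cstdio>

    int main() {
        // Hypothetical scenario: pretend a "save report" call just failed with code 5.
        int err = 5;
        // A bare DOS-style numeric error: is 5 "access denied"? "drive not ready"? You need a manual to know.
        std::printf("Error %d\n", err);
        // A string tells the user (and whoever reads the log later) what actually went wrong:
        std::printf("Cannot write Q3.rpt: access denied (is the file open in another program?)\n");
        return err;
    }

The number alone sends every user off to a lookup table; the string stands on its own.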
Multi-core is not really a (good) replacement for OS-integrated clustering though. Integrated clustering is so powerful, and keep in mind that the technology would have had decades to continue to evolve under the plan9 tree. Even today I think it would be really awesome to have an OS with better network transparency to make use of several computers as one, out of the box.
Doesn’t plan9 go further than unix in terms of everything is a file? I think this is how plan9 manages resources across a network.
https://matt-rickard.com/plan9-everything-is-a-file
I’m not sure about memory and threads specifically, but I don’t know how far it makes sense to take the analogy as there may be tradeoffs.
I think we’ve spoken about it before; I’m curious how it’s coming along! IMHO compatibility is a double-edged sword. You need it for obvious reasons, but it has a tendency to harm creativity and innovation while dragging along evolutionary baggage. Unfortunately the vast majority of software will never use APIs & features that are unique to a niche OS. This discouraged me from continuing ongoing work on a kernel I started at university. As much as I enjoy working on low level kernels, the market is already so saturated that I couldn’t justify the work, especially when getting paid wasn’t on the table.
Numeric error codes are S.H.I.T.E… they make sense on 8-bit up through around 16-bit systems, but on systems with address spaces larger than 16 bits they are historical dead weight.
Plan9 just needs a win95 work-alike UI… ACME/Rio etc. are a mountain of a learning curve… it would be much easier if the same concepts were mapped on top of a familiar UI.
Plan9 is practical for some uses… Lucent proved that and used it, and still uses it as far as I know.
Actually the plan9 clustering paradigm would be SICK if its scope were limited to MY personal PCs (desktop, laptop, and phone)… imagine buying a new PC, adding it to your scope/domain/realm or whatever you want to call it, and all your running programs just appearing there when you log in to your session.
cb88,
This is what I think as well. The operating system’s resource pool should extend beyond one node. Making clustering a native capability of the kernel/OS would be awesome as heck! There is so much innovative potential there. The jobs one has (be they application servers like PHP, databases, games like minecraft, rendering engines, etc.) ought to be automatically distributed across the resources one has using high speed interconnects (with appropriately tunable parameters and telemetry as needed). Developers & users should not have to reimplement & reconfigure a cluster solution for every application. These should be part of the base OS such that applications can use all one’s resources out of the box. And applications should have access to rich cluster APIs by default to discover and optimize themselves. Even today this would be an awesome upgrade for unix/linux.
“(e.g. making errors strings instead of integers). To be fair, I don’t think they were really trying to make a practical OS for general use though.”
Strings for errors would be not just better but SIGNIFICANTLY better.
1) Are you purposely being vague because you don’t actually have a clue as to why your program is having a problem? Or
2) Are you purposely being vague for security reasons, so that bad people/groups don’t take advantage of your programming mistakes? Errors only happen with programming mistakes. Usually the problem is that programmers assume the people using the programs will only do X, Y or Z, when they quite often do something totally unexpected like 1, 2 or 3.
My main goal in programming was to ALWAYS have error messages that were clear and concise, and I literally paid money to people at the bank I worked for who found bugs or unclear error messages. Don’t get me wrong, I like money as much as the next person. But at the same time I REALLY want my 800,000 lines of code for programs such as creating mortgage loans (you wouldn’t believe how many different variations there are in mortgage loans, based on many, MANY different reasons for them) to be maintainable.
I found that the cleaner and clearer my error messages were, the easier it was to maintain our group’s code.
1) We wanted code with as few errors as possible. Zero if possible.
2) We wanted our programs to be as short as possible, BUT we checked every entry for all possible errors, including someone trying to do buffer overflows. The more you test what is coming into the program, the fewer errors you potentially have. We also reused our code over and over again for data validation, in both live data entry and batch files. Catching errors in data (whether accidental or from someone deliberately trying to break our program) as it enters our programs stops a lot of problems before they can start.
3) We wanted our programs to be as clearly readable as possible.
4) We had a standard for the way our programs look and work. We wanted any one of us to be able to look at the code and see a very familiar structure in the way the program is written, so that it takes very little time to get up to speed if an error has happened and it needs to be debugged.
That’s obviously simplifying things quite a bit, but you get the idea. Garbage in means garbage out. But there are a lot of hostile people/groups/countries out there trying to break code for any company they can make money off of, without going in through the front door with a gun and robbing the bank. Cyber crime pays a lot more and is a lot harder to detect and stop from happening. The better the code, and the more checks on the data entering the system, the safer the program is. There is a lot more to it than that, and I won’t go into what we used, other than to say that every time the program is run it doesn’t put the same information in the same place in memory. How the data AND the program are stored in memory changes each time it is run. That way nobody can expect the program or the data to be in any specific memory location; even if they figure out where part of the program or data is, the rest is always in a different place the next time the program is run.
String errors are better for 2 specific purposes:
a) Ruining performance and making software suck – e.g. “switch(error_code)” gets replaced with a bloated mess of “if(strcmp(…))” chains.
b) In the context of an API that all programmers across the world would use (an app written by a small team in a specific locale is very different), helping to promote racism with “everyone who doesn’t read English is unimportant” ignorance.
For all other purposes, just use a “#define E_STRING_TOO_LONG 0x1234” (or an enumeration, or…) like every sane programmer has always done to hide all magic numbers behind descriptive text.
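A rough sketch of that contrast (the error names, values, and strings below are invented for illustration, not from any real API):

    #include <cstring>

    // Hypothetical error codes, in the spirit of the E_STRING_TOO_LONG example above.
    enum Error { E_OK = 0, E_STRING_TOO_LONG = 0x1234, E_NO_SPACE_LEFT = 0x1235 };

    // With numeric codes, the compiler can turn this switch into a jump table:
    const char *describe(Error e) {
        switch (e) {
        case E_OK:              return "no error";
        case E_STRING_TOO_LONG: return "string too long";
        case E_NO_SPACE_LEFT:   return "no space left on device";
        default:                return "unknown error";
        }
    }

    // With string errors, every caller ends up with a chain of strcmp() calls instead:
    int classify(const char *err) {
        if (std::strcmp(err, "string too long") == 0)         return E_STRING_TOO_LONG;
        if (std::strcmp(err, "no space left on device") == 0) return E_NO_SPACE_LEFT;
        return -1;
    }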
Brendan,
This is kind of a C-specific gripe. Many modern languages can handle switch(string) cleanly.
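For instance, C++ itself can’t do it natively, but a hash-map dispatch is a reasonable stand-in for what such languages do under the hood (a sketch; the error strings and handlers here are made up):

    #include <functional>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // One hash plus (usually) one comparison per lookup, rather than a long chain of strcmp() calls.
    void handle_error(const std::string &err) {
        static const std::unordered_map<std::string, std::function<void()>> handlers = {
            {"no space left on device", [] { std::cerr << "free up some disk space\n"; }},
            {"permission denied",       [] { std::cerr << "check the file permissions\n"; }},
        };
        auto it = handlers.find(err);
        if (it != handlers.end()) it->second();               // invoke the matching handler
        else std::cerr << "unhandled error: " << err << "\n"; // fall through for unknown strings
    }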
I do get your point; however, I feel it’s a pretty bad justification for using numbers. Rather than meaningfully helping foreigners, it only ends up obfuscating the errors for 100% of users.
It’s only descriptive within the source, and you still need a dictionary to translate the error codes. Not only is this laborious and more work, it’s also less flexible than a string.
To be clear, I don’t object to developers wanting to support other languages with their software, but given that almost every line of output will need to be translated, I’d rather see a generic solution that also includes errors. There’s not much point in translating the entire application only to keep errors as numbers that users have to look up.
It’s a computer specific gripe. A CPU simply can’t do it efficiently (e.g. converting “switch(string)” into a jump table). Poorly designed languages that make inefficient code convenient are an unrelated problem.
For kernel APIs it’s probably worse (you can’t just return a pointer to a string in kernel space, since user-space can’t use it).
To create useful error messages for normal users, you need to add context and should suggest a way for the user to work around the problem. Something like “There wasn’t enough memory” (which is all you can get from a generic API that has no idea what the memory is for) is inadequate, and should become something like “There wasn’t enough memory to create the list of customers. Please reduce the number of customers, exit other applications, or install more RAM”. For some cases you might also need other information (e.g. an error message sent to a log should have a time stamp).
You might be able to work around the inadequacies of error strings by building a better error message around the original error string, like “Something went wrong while creating the list of customers. The OS said: ‘There wasn’t enough memory'”, but that’s ugly and doesn’t help much with internationalization.
Of course if you actually do want an awful generic error string, an API that provides error codes can also provide a way to convert an error code into an awful generic string (e.g. “strerror()”).
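The workaround looks something like this (a minimal C++ sketch; the wording and the “customers” scenario are illustrative, following the example above):

    #include <cerrno>
    #include <cstring>
    #include <iostream>
    #include <string>

    // The generic text from strerror() is only one ingredient; the application
    // supplies the context that makes the message useful.
    std::string describe_failure(const std::string &what_we_were_doing, int err) {
        return "Something went wrong while " + what_we_were_doing +
               ". The OS said: '" + std::strerror(err) + "'";
    }

    int main() {
        std::cerr << describe_failure("creating the list of customers", ENOMEM) << "\n";
        // Prints something like:
        // Something went wrong while creating the list of customers. The OS said: 'Cannot allocate memory'
    }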
Nobody has suggested that error codes from APIs would ever be shown to any user. (although it might make sense in some cases – e.g. things like compilers and linkers, where you can expect that the user is also a programmer, and where including the original error code within a useful/descriptive error message actually can be helpful).
Of course often software deals with the error itself (retries, falls back to an alternative, was doing something non-essential, …); or the error doesn’t actually indicate a problem (e.g. a time-out from “select()” being used to do something else periodically); and sometimes success is an error condition (e.g. checking that something doesn’t exist yet or was deleted properly, where a “not found error” is expected and “successfully found” indicates something is wrong).
Brendan,
Poorly designed languages? Haha, you are being so dramatic! Anyway, standard operations these days can compare 8 bytes at a time, which is pretty quick, but more to the point, here we’re talking about errors that deviate from normal code paths. You could criticize C++’s throw/catch mechanics for being inefficient too, but considering the context, it’s virtually a complete non-issue for typical projects.
You’re stretching the bounds of the original topic here, but what most software does in practice today is print errors to stderr, which is how users figure out what error happened. In fact it’s often the only feedback we get, which is why systemd captures it into the application status. Unfortunately, though, its usage is not well standardized. Software can and does output everything from debugging output to notices to errors on stderr. Although I concede it’s too late to change anything now, there would be merit in having a rigorous format/standard to build consistent tooling around.
This is an interesting point of discussion, and I agree with you that context is extremely important. Consider…
I think it runs counter to the assertion that we need to stick with error numbers though. It’s a lot easier to provide more context using strings, which are so much more versatile.
I think this is an excellent use case for using a language’s try/catch mechanism. In most of my projects I use an error class that includes both location & error messages to provide maximal context.
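A rough sketch of the kind of error class I mean (a simplified illustration with invented names, not code from an actual project):

    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <utility>

    // The exception carries both where the failure happened and a human-readable description.
    class AppError : public std::runtime_error {
    public:
        AppError(std::string where, const std::string &what_happened)
            : std::runtime_error(where + ": " + what_happened), location(std::move(where)) {}
        std::string location;
    };

    void load_customers(const std::string &path) {
        // ... on failure:
        throw AppError("load_customers(" + path + ")", "file is missing or unreadable");
    }

    int main() {
        try {
            load_customers("customers.db");
        } catch (const AppError &e) {
            // Prints: load_customers(customers.db): file is missing or unreadable
            std::cerr << e.what() << "\n";
        }
    }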
Sabon,
I agree with what you are saying overall. But as for address space randomization, I strongly feel it should be treated as a stopgap measure and not the solution. After all, it’s security through obscurity that treats the symptoms rather than the underlying cause of bugs. IMHO we should just be using safer languages that don’t suffer from memory corruption vulnerabilities in the first place.
Strings for errors suck. You need to do something with the error, which to be fair you can do with both. But then you get some jerk who decides ERROR_NO_DISK_SPACE_LEFT should really be ERROR_NO_SPACE_LEFT because, duh, there are storage mediums other than disks. So now all the programs expecting ERROR_NO_DISK_SPACE_LEFT don’t correctly handle ERROR_NO_SPACE_LEFT. It’s trivial, but there are so many dumb people writing APIs that, as a designer, I wouldn’t give future people that option. Use a define, it’s fine. Then you can rename it in your own code to your heart’s desire, call it NO_FUBFLAB_IN_SLOT, I don’t care. Just don’t change the interface, and don’t make it tempting for less future-oriented devs to screw it up.
Bill Shooter of Bul,
I’m not opposed to displaying an error number in addition to a message, but just using an error number will leave a user in confusion every single time. Also I want to point out that most programs print out the error text provided by strerror (or FormatMessage for win32) and don’t use their own arbitrary text…
This is ultimately sourced from GNU’s errlist.h:
https://codebrowser.dev/glibc/glibc/sysdeps/gnu/errlist.h.html
I find this to be very clear, and moreover, since most applications are using the same dictionary, the messages are consistent.
The reason we cannot use error numbers alone, IMHO, is that many applications have to report their own errors and each would have to have its own dictionary. The error situation would be intolerable without human-readable error messages.
It’s not always enough though; in particular I’d refer to the problem of context that Sabon brought up: a global error number often lacks important context such as where the error happened, what resources it was acting on, etc. A global error table is simply insufficient unless you go crazy with unique error numbers for every combination the error comes up in, and that’s unmanageable.
In a windows project I was working on recently, we needed to extend a code base to use new libraries, each with their own error numbers using defines as you suggest, and to make matters worse one library was ported from linux. The code base for the main project was previously hard-coded around win32 errors, which was a reasonable assumption when it was written, but this no longer worked: there were clashes between the error dictionaries, and the error reporting broke, with the linux code displaying the wrong errors. I ended up retrofitting a considerable amount of code to support a new class containing both the error number and the source of the error (in addition to other context to help debugging). This way the errors could be understood properly regardless of the source.
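Roughly the shape of it (a simplified sketch with invented names, not the actual code):

    #include <iostream>
    #include <string>

    // The same numeric value means different things depending on which API produced it,
    // so the source of the error travels along with the number (plus some context).
    enum class ErrorSource { Win32, Errno };

    struct TaggedError {
        ErrorSource source;
        long code;
        std::string context;   // e.g. which file or operation was involved
    };

    std::ostream &operator<<(std::ostream &os, const TaggedError &e) {
        return os << (e.source == ErrorSource::Win32 ? "win32" : "errno")
                  << " error " << e.code << " while " << e.context;
    }

    int main() {
        // Prints: errno error 2 while opening settings.conf
        std::cout << TaggedError{ErrorSource::Errno, 2, "opening settings.conf"} << "\n";
    }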
So although I understand your reasoning, I just want to point out that simple error numbers/defines are not always without problems.
1) UNIX is spelled with ALL CAPS and not as Unix. Geez. And this person knows about OSs?
2) “Plan 9 failed, in a sense, because Unix was simply too big and too entrenched by the time Plan 9 came around. It was doomed by its predecessor.” NO IT WAS NOT DOOMED BY UNIX. Plan 9 and OS/2 and other OSs didn’t “make it” because the people behind the OSs “gave up and quit”. If they had pursued Plan 9, continued to update it, and given it equal or better licensing options than UNIX or any other OS, then in the end it COULD have dominated the server space by now.
Instead the people behind Plan 9 gave up. Victory is not to the group that has the best weapons. Victory is to the group that has the most will to win, that doesn’t give up but continues to fight. Obviously if they have bows and arrows vs tanks they aren’t going to win. But if they are anywhere close, a team will almost always beat another team that plays like five selfish individuals.
I’m sick and tired of hearing that Plan 9 or OS/2 or other OSs “failed”. They DID NOT fail. The people behind them failed. It’s as simple as that.
My wife says way too often, “It’s too late”. I’m sorry, I respond. Are we dead? Exactly what timer SAYS that it is too late? Most often it is that my wife (who is not normally lazy) will not want to spend the energy to change mid-course, even though she is sitting in the passenger seat and has nothing to do with which direction we are going. She isn’t driving. In those cases I PURPOSELY turn around and go a different way, just to make the point that IT IS NOT TOO LATE UNLESS YOU ARE DEAD to change what you are doing, and to stick with things for however long it takes. If you really do have something better, like electric cars vs gas/diesel cars, people will realize this and change.
Every time my wife says it is too late I change our route. She is getting to where she is afraid to say it anymore because we’ve been late to a few things which makes her mad and I tell her all she has to do is stop saying, “It’s too late” and I’ll stop changing our route. Simple as that.
The point is, they could have stayed on their route and not given up on Plan 9. But that isn’t what happened. The OS didn’t fail. The PEOPLE that were in charge of Plan 9 failed because they gave up. That’s totally on them and their will to want Plan 9 to succeed. People fail. Operating systems may be good or bad, but they don’t fail. It’s the people, and not even how well or badly the OS is written (take Windows for instance), that determines which OS will stand in the end. So stop the BS that Plan 9 or OS/2 or any of the other OSs that aren’t maintained anymore failed. THE PEOPLE failed the OS. Not the other way around!
Unix is capitalized at the beginning of a sentence and lowercase when referred to anywhere else in a sentence.
Perhaps more importantly, you should adjust your attitude… it seems a rather larger problem than the spelling of unix.
Hey, speaking of capitalization, I noticed you have several incorrectly capitalized words and phrases throughout your rant.
FYI, you should be using emphasis tags (* * or _ _) instead of capitalization to assign emphasis in plain text replies online; otherwise you come off as a raving lunatic screaming into the void about random disagreements with your wife. Speaking of, I pity the poor woman who has to put up with your attitude and anger issues, especially since you acknowledge that she is afraid of you.
Sabon,
Meh. I’m not really bothered by it. Certainly not worth a rant.
You can’t really separate the two though. Success & failure isn’t just one factor or another; it comes through a combination of many factors. Sometimes good products lose and bad products win. It sucks, but sometimes the underdog doesn’t have a clear path to success absent powerful industry connections and cash flow.
You lose every game you don’t play. This is a fact that lotteries love to point out to convince the public to play, but in reality the odds of winning over others may be extremely slim. Asymmetric markets are very real and I don’t think we can or should ignore this.
Heh… I bet if you checked the bank accounts of the people behind plan9 very few of them are failures by the average person’s metric.
cb88,
Maybe, but it still might not make business sense to keep paying employees without a business model to eventually create revenue. While Bell Labs was no stranger to long-term investment, once AT&T decided to pass on plan9 in favor of unix for itself, I think it became a harder case to make. Even now, with the benefit of hindsight, it’s hard to say what they should have done to get traction. As it was, plan9 did not have the needed hardware & driver support, and it would have required a larger capital investment to finish and promote it.
Even if the new platform is better, it doesn’t automatically mean enterprise customers will be willing to switch gears and redesign/retrofit their existing working & debugged software to support a new, incompatible platform/API with an unproven track record and an uncertain future. It’s easier and less risky to use mainstream platforms that have more mindshare and better support.
So, even though I really like the developments plan9 was working on, I still think it faced significant market headwinds.