Alexander Popov, Linux kernel developer and security researcher, takes a very detailed look at Fuchsia and its kernel.
Fuchsia is a general-purpose open-source operating system created by Google. It is based on the Zircon microkernel written in C++ and is currently under active development. The developers say that Fuchsia is designed with a focus on security, updatability, and performance. As a Linux kernel hacker, I decided to take a look at Fuchsia OS and assess it from the attacker’s point of view. This article describes my experiments.
This is a long, detailed account of his findings, much of which goes over my head – but probably not over the heads of many of you.
The main takeaway is that Fuchsia is still a work in progress, and some important security measures (like kernel address space layout randomization) were not properly implemented.
The article details how the developer was able to hack the kernel and install a rootkit from user space. That involved reading the kernel’s address space and exploiting a use-after-free bug, neither of which should be possible in a modern production system.
Anyway, Fuchsia is “working” in production, but apparently there is still more to be done.
sukru,
My first thought on seeing a use after free bug is “why are we still building on top of insecure languages?” These kinds of bugs were forgivable in our cowboy days, but for it still to be such a serious problem half a century later is disappointing.
IMHO address space randomization should not be considered a long term solution. It’s pointless on 32bit architectures, and on 64bit it still falls short of the entropy needed to make it impervious to brute forcing. In a kernel that’s running 24×7 ASLR would slow down an attack but that’s it.
https://web.stanford.edu/~blp/papers/asrandom.pdf
We all keep falling into the trap of C/C++ as the de facto systems programming languages. If this doesn’t end, it means future generations and operating systems will just continue to experience the vulnerabilities associated with insecure languages. While I am glad there are OS projects that use Rust, they’re not likely to break into mainstream appeal without major backing.
Indeed. I’m waiting for it to become more mature.
Alfman,
While significant portions of Fuchsia are written in Rust, this bug was in the microkernel (the “kernel” kernel). If I read the code correctly, it is in one of the resource managers.
Hence, even if parts of the kernel were written in a higher-level language, some potentially buggy code would still be responsible for actually making sure those high-level constructs are secure; i.e., it would be written in assembly, “unsafe” Rust, or in this case C++.
In the long run having a smaller micro kernel will limit the attack surface.
Looking back… they could have moved resource management to an even more central location.
Instead of having each syscall responsible for its own resources, they could make sure there is only a single place (or as few places as possible with a sensible design).
Why shouldn’t the core kernel use safe programming though? From my point of view even though the kernel plays a privileged role for the OS, it isn’t so special that it must use an “unsafe” language. Once it’s bootstrapped it should be running safe code. Unsafe code makes sense for developing the primitives themselves obviously, but IMHO the goal should be to use those primitives from safe code exclusively to catch errors.
By “microkernel”, we usually mean separating units into their own address spaces such that the hardware achieves isolation. However, a safe language technically offers an alternative, where isolation is achieved using logical proofs. Many years ago some of us had a discussion about this on OSnews, but it actually turns out that Microsoft implemented the ideas in their Singularity research OS.
https://en.wikipedia.org/wiki/Singularity_%28operating_system%29
To me this seems like another potential benefit of safe languages: we can achieve much better isolation than a monolithic kernel without incurring the overhead of a traditional hardware-based microkernel. The drivers can run in the same address space while the compiler guarantees that they are logically isolated.
sukru,
That did seem odd to me. My instinct would have been to go with a standardized IPC mechanism that everything shares, but it would be interesting to hear their rationale.
Alfman,
As for reasons for using unsafe languages, @Brendan has laid out many valid cases below.
As for the security of the Fuchsia system, they use a very different model than we are used to. Instead of desktop users having access to a set of data and APIs, it uses separate modules, each with limited capabilities.
This is more like the OAuth2 “Scopes”, where your token only allows you to do certain actions from that application, but nothing else:
https://oauth.net/2/scope/
It is quite interesting. It automatically closes many security attack vectors for desktop applications.
(You cannot, for example, “hack” notepad and open a socket, or even directly access the file system. Only the services whitelisted through their IPCs are accessible.)
sukru,
I’m confused which part of my comment you are replying to. If it’s the microkernel isolation part, then I still think safe-language-based isolation could work for separate “microkernel” modules inside a single address space.
Part of the problem with syscalls in a microkernel is that a call from one application or module to another kernel module actually has to go through additional CPU context switches compared to a monolithic kernel; it cannot go directly from the caller into the callee’s address space. As far as I know this is a fundamental limitation of microkernels using hardware isolation. Software-enforced isolation doesn’t have this limitation though.
If you are responding to syscalls versus a standardized IPC mechanism, then yes, clearly Fuchsia is different. POSIX-based operating systems have scaled to offer new functionality by overloading primitive syscalls instead of adding new syscalls for every function. I’m curious about their reason for doing it this way and I wonder if there was a deliberate rationale or if it just evolved this way.
Alfman,
Maybe we are talking about different things. It happens.
According to the diagram in that article, the Fuchsia kernel takes on quite a few responsibilities itself, which might be a bit “fat” as far as microkernels go. But it still leaves all the other parts, like device drivers and filesystems, to the sandboxed services.
As for why they went with so many syscalls (cited to be over 170)? I am not sure. Maybe they wanted to include everything in the first phase, and then lock it down for the future. Not sure…
I tried building it this morning, but failed. If I can get it running, it will probably answer some more questions.
Alexander modified the kernel source code to create his own use-after-free bug in TimerDispatcher; then used KASAN to prove that a use-after-free bug like the one he created would never have existed in an official/released kernel; then he exploited the bug he created (that didn’t exist and wouldn’t have existed in an official/released kernel) after also disabling security features built into the hardware (Supervisor Mode Execution Prevention (SMEP) and Supervisor Mode Access Prevention (SMAP)).
The reasons we’re still building (some) stuff on top of insecure languages are:
a) We have tools to make “insecure” languages a lot more secure (Valgrind, ASAN, etc).
b) Some security problems are in hardware (rowhammer, meltdown, spectre) and “insecure” languages are more able to work around them (allowing the system to be secure) while “secure” languages are less able to work around hardware flaws (leading to an insecure system).
c) Some security problems are from bugs in the compiler; a “secure” language needs a more complex compiler, which increases the chance of security problems caused by the compiler.
d) Resource management isn’t just about memory. Things like file handles, capabilities, IRQs, threads, … are also resources (e.g. you could just as easily have a “use file handle after closing file” bug); and for modern kernels “memory” isn’t just heap alone (it’s virtual memory and physical memory and memory mapped devices and IOMMU config and…) and software isn’t the only thing using resources (e.g. you could have a “USB controller continues transferring data to physical memory after it’s freed” bug). Those “secure” languages only solve part of the larger problem while giving people the false illusion that the problem is solved, which encourages developers to pay less attention (and can make security worse because developers are paying less attention to all the resource management problems the language doesn’t solve).
e) Secure languages simply don’t work at all for some types of work. E.g. it’s simply not possible to write a micro-kernel in Rust without using the “unsafe” keyword; and as soon as you start spraying “unsafe” all over the place (which is necessary for something as low level as a micro-kernel) you find that a “secure” language just creates more hassle for developers without solving anything.
f) Performance. Things like basic SIMD/vectorization (where compilers just plain suck), run-time generated code, self-modifying code (e.g. modifiable jumps so you can inject instrumentation at run-time), etc.
g) Availability of developers. Switching from “common language used in the field” to anything else reduces the availability of skilled/experienced developers. E.g. if you want to hire a kernel developer, you can probably find a few thousand kernel developers with 5+ years of experience with C; and you’ll probably only find 2 kernel developers with 5+ years of experience with Rust.
h) Ability to port code from other open source projects.
i) A fear that switching to a newer language today will just be a waste of effort because we’ll be switching to an even newer language tomorrow (or a newer revision of an old language).
Thank you for the lengthy but insightful list of examples.
Brendan,
For a number of reasons those tools aren’t as effective as safe languages that automatically flag unsafe logic as a compile-time error.
This seems wrong, since for code that doesn’t contain errors, safe languages generate similar code to unsafe languages. But nevertheless, provide a source and I’ll take a look.
Only at first, but in the long run, once many developers are dogfooding safe languages, the compiler will have some of the best-tested code. This offers more confidence than having all those developers continuing to use unsafe languages where we know the risks of errors are high.
Exactly. Safe languages offer primitives to enforce correct logic beyond just memory allocations, which most conventional languages can’t help with.
I disagree, safe languages are perfect for kernel development. Of course safe primitives don’t happen magically, they have to be written. For conventional software the primitives will come in the form of a runtime library. For an OS some custom primitives are likely needed, but that’s ok. These primitives SHOULD NOT “spray ‘unsafe’ all over the place”, they really should be abstracted from the code. The benefit is that it’s easier to validate the abstractions once and reuse them throughout the project while being assured that the compiler is verifying their usage across the (micro)kernel made up of safe code.
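As a rough illustration (a minimal sketch with made-up names, assuming x86 and assuming port 0xE9 is a harmless debug-output port), the unsafe primitive can be a few audited lines while everything that calls it stays in safe code:

use core::arch::asm;

// The one unsafe primitive: write a byte to an x86 I/O port.
// Audited once; callers never touch `unsafe` themselves.
// Safety: the caller must pass a port that is safe to write to.
unsafe fn outb(port: u16, value: u8) {
    asm!("out dx, al", in("dx") port, in("al") value, options(nomem, nostack));
}

// Safe wrapper exposed to the rest of the (hypothetical) kernel. The unsafe
// block is contained here, behind an API that upholds the primitive's
// contract, so the call sites remain 100% safe code.
pub fn debug_putc(byte: u8) {
    unsafe { outb(0xE9, byte) }
}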
SIMD intrinsics are not exclusive to unsafe languages. Runtime modifiable jumps are probably best accomplished by defining a safe primitive for it that can be reused instead of relying on “unsafe” code.
On this point we both agree. However if each generation continues to justify putting off safe languages then we’d be subjecting the next half century to a continuation of the same coding problems that we’ve been hammered with over the last half century. Yes switching has a cost, but every year that we don’t switch has an accumulating cost as well. Over time this accumulating cost will be greater than if we had put in the effort to switch sooner. Alas I concede that short term thinking often prevails.
It depends on how much of the project is new versus reusing legacy code. Code that is ported could benefit from new safety checks. IMHO it was a missed opportunity for Fuchsia not to start with a safe-language kernel from the beginning. Now that the kernel is written in C++, unfortunately it would take effort to switch. Same for Linux.
Nevertheless the developers of Rust are keenly aware of the importance of linking to preexisting code and it is supported (obviously any externally linked code is “unsafe”).
This seems to go with “g”. If we choose not to evolve for too long, we will actually increase the costs while missing out on the benefits of evolution.
In theory; sure, it’s always better to detect problems at compile time. In practice many extremely simple things (e.g. “x = myArray[shared_library_function(123)];”) can’t be checked at compile time. This leaves you with 3 possibilities – permanent run-time tests (that suck – extra overhead and extra hassle for the programmer to deal with, like run-time exceptions, with a huge risk that software fails because exceptions aren’t handled well or at all); temporary/“pre-release” run-time tests (with a small risk of bugs slipping into released software but no overhead for released software and no programmer hassle); or no checks at all (which is potentially silly/dangerous).
For things like (e.g.) injecting a “speculation barrier”; and flushing or polluting any caches, TLBs, branch target buffers, branch direction buffers, or return buffers; no higher level language supports it and you have to rely on (unsafe) assembly language.
There’s similar problems in other areas – e.g. guaranteeing that the speed of encryption/decryption is not influenced by value/s in the encryption key.
In theory it might be possible to add some support for some of it to higher level languages (and “safe” languages). One example would be allowing data (function input parameters, “data at rest” in kernel structures, etc) to be marked as “tainted” (to indicate that it’s possibly controllable by a malicious caller) then have the compiler use this information to ensure that any value derived from a tainted value is also considered tainted, and ensure that any tainted value is not used by (e.g.) an unguarded indirect branch. Another example would be some way to mark functions as “constant speed” (and let compiler guarantee that all paths through the code take the same amount of time, and there’s no optimizations like “if( temp == 0) { result = 0 } else { result = do_big_number_multiply(temp, val1); }”).
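None of that exists off the shelf today, but the first half of the idea can be crudely approximated with a newtype (illustrative only; this gives none of the compiler-enforced propagation or speculation guarantees described above, which would need real language support):

// Values that arrived from an untrusted caller are wrapped, and the only way
// to get the raw value back out is through an explicit validation step.
// In real code Tainted would live in its own module so the field stays private.
pub struct Tainted<T>(T);

impl Tainted<usize> {
    pub fn new(raw: usize) -> Self {
        Tainted(raw)
    }

    // Untainting requires naming a bound; out-of-range values never escape.
    pub fn validated(self, limit: usize) -> Option<usize> {
        if self.0 < limit { Some(self.0) } else { None }
    }
}

fn handle_call(arg: Tainted<usize>, table: &[u64]) -> Option<u64> {
    let index = arg.validated(table.len())?; // forced to validate before use
    Some(table[index])
}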
No. You’re confusing “a mixture of safe and unsafe” with “100% safe”. For a monolithic kernel you might be able to use “safe” for as much as 80% of the code. For a micro-kernel that drops significantly (maybe as little as 20% “safe”). As you improve performance (e.g. start going nuts with lock-free and block-free algorithms) the amount of “safe” code you can use drops further.
Auto-vectorization sucks a lot, and SIMD intrinsics suck less but still suck. For complex algorithms hand-tuned assembly is typically twice as fast as intrinsics.
Of course compilers are also bad at optimizing the boundary between compiler generated code and inline assembly (e.g. won’t move instructions from inline assembly into earlier or later compiler generated code where possible/beneficial for instruction throughput, won’t do “peephole optimization” at the boundaries, etc); so for smaller pieces of code there’s a slight advantage to intrinsics.
That assumes we’re evolving in a single direction. We’re not – it’s more like erratically bouncing in random directions (from garbage collection to functional languages to pointer ownership to…), each time ending up a similar distance from a central starting point.
Brendan,
If the function can be inlined, checks that evaluate to constant conditions can often be optimized away, and if a check is needed, CPUs are very good at predicting those sorts of branches. Anyways if you really want to you can still explicitly use primitives that don’t perform range checking…
https://doc.rust-lang.org/std/slice/trait.SliceIndex.html#tymethod.get_unchecked
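Something like this (a toy sketch; the point is just that the unchecked path is an explicit opt-in and has to be marked unsafe):

fn lookup(table: &[u32], i: usize) -> Option<u32> {
    // Default: checked access, returns None instead of reading out of bounds.
    table.get(i).copied()
}

fn lookup_unchecked(table: &[u32], i: usize) -> u32 {
    // Explicit opt-out of the bounds check, e.g. in a hot loop where the index
    // was already validated. The `unsafe` block documents that the caller, not
    // the compiler, is now responsible for `i < table.len()`.
    unsafe { *table.get_unchecked(i) }
}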
Of course, for mission critical systems this would be ill-advised. The lack of safety in languages like C continues to be detrimental to security and robustness both in theory and in practice.
Even in linux assembly language is the exception and not the norm. Regardless though assembly is still available and that’s not a reason not to use safe languages.
I’m not really convinced that safe code would break this, but ultimately the programmer is not forced to use safe code so it isn’t really restricting what a programmer can do compared to C.
Think about it this way: a project might have many thousands or even millions of lines of code that can still benefit from compiler verification even if a small fraction of the code still needs to be unsafe.
I don’t mean to be difficult here, but without a source I don’t think your numbers are meaningful. I believe most code even in a kernel context can benefit. A single unchecked loop or memory reference can produce serious memory corrupting bugs and exploits. And while the expectation is that human developers will eventually find and fix the bugs, it’s disappointing that these keep happening in the first place. We need to do better and safe languages are a part of it.
I’ve written these myself, why do you think it precludes the use of safe primitives?
Multithreaded code verification in Rust is actually extremely helpful for solving one of the most notoriously complicated problems that human software developers face.
https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.html
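A trivial example of what that buys you (standard library only, nothing kernel-specific):

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The shared counter lives *inside* the lock, so the only way to reach it
    // is by acquiring the lock first.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Had `counter` been a plain `u64` shared across threads, or an `Rc`
    // instead of an `Arc`, this would not have compiled at all.
    println!("{}", *counter.lock().unwrap());
}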
Obviously that depends on the compiler. To the extent that assembly is warranted, we’re not prevented from using it, but it has to be manually verified.
I disagree, at least in the context of low level code. Few people have ever seriously advocated for using GC in low level code. From memory consumption to latency and jitter GC has significant disadvantages compared to traditional C-like languages. Compile time verification doesn’t have those runtime tradeoffs though and is a near-perfect fit for systems programming. Not only is it a natural evolutionary step, I argue it is also a necessary step if we want to overcome human induced software corruption errors. That said, I am very aware of the resistance to change in our industry and I know that you are far from alone in wanting to continue to do things as we’ve been doing them.
I know we’re not going to come to a consensus, but I do appreciate the discussion!
Linux mostly uses inline assembly macros to construct an inner language (a superset of C); and the kernel is written in that inner language.
Sure; but I’m obviously not comparing against C. It’s “unsafe languages (including assembly language)” vs. “safe languages (excluding assembly language)”.
Let’s break a (hypothetical) micro-kernel into its pieces:
a) Physical memory management; where “safe” is completely worthless because it only cares about virtual memory safety.
b) Virtual memory management, where “safe” is mostly useless because it doesn’t understand things like “copy on write”, and doesn’t provide any safety for any of the things you really care about, like getting TLB invalidation right.
c) The upper levels of the scheduler (choosing which thread gets CPU time, etc); where “safe” is good in theory but for performance you end up using lock-free/block-free code and atomics and “safe” is merely a useless annoyance. This includes code to spawn new threads; where you want to acquire a lock and create the first half of the new thread’s state and not release the lock (and let the new thread, possibly on a different CPU, release the lock when it finishes its initialization); which breaks any hope of “language based concurrency safety”.
d) The lower levels of the scheduler (e.g. the actual code to switch from one task to another, and the “CPU is idle, use MONITOR/MWAIT to save power until work arrives” part); where you must use assembly and “safe” is unusable.
e) One or more forms of inter-process communication; often using lock-free queues and full of race conditions, where “safe” is likely a waste of time (unless your code sucks).
f) Time services (things like “nano_sleep_until(when)” and getting the current time); where you’re using hardware specific things (e.g. CPU’s time stamp counter) and more lock-free shenanigans (to manage buckets).
g) IRQ handling; which (for a micro-kernel) is assembly language at the entry point, some code to communicate with/notify corresponding user-space device drivers; and logic to handle the interrupt controller’s “end of interrupt”; where the first and last parts require assembly and the middle part is so small/trivial that there’s no sane reason to care which language it is.
h) Exception handlers. For these “safe” is often meaningless because you have to deal with the possibility that kernel code caused the exception, which directly translates to “assume something (not necessarily software) broke the safety that a safe language depended on”. For all exceptions it’s worse (you need assembly language for entry/exit) and for specific exceptions there are more complications (e.g. access to special registers like CR2 and debug registers that can’t be accessed in high level languages; a need to access the interrupted code’s saved state causing you to depend on a specific stack layout, etc).
i) Kernel API; which is mostly an assembly stub, an explicit “is the function number in the supported range” check, an “atomically patchable” (to inject instrumentation, tracing, etc) call table of function pointers, then more assembly to return to user-space. For all of that, “safe” is either unusable or worthless.
j) Code to start CPUs and shut them down (for power management and/or hot-plug CPU support and/or just for boot/shutdown); where safe languages are completely unusable.
Mostly; for a micro-kernel, the only thing a “safe” language does is make you waste your time inserting useless “junk keywords” to work-around the compiler’s whining, for no actual benefit whatsoever.
Let’s define “concurrency” as “synchronizing 2 or more agent’s access to data”, where “agents” can be kernel modules written in a completely different language, and a mixture of user-space code and kernel-space code, and code running on radically different types of CPUs (e.g. mixture of ARM, 80×86 and GPUs), and devices (e.g. queue of network packets shared by CPU and a network card that may not even do cache coherency).
Let’s also assume that we don’t want the OS to lock up due to a deadlock; and that what you care about is the order that locks are acquired, and ensuring that the right type of lock is used in the right places (e.g. that locks disable IRQs if an interrupt handler might try to acquire them) and that you honestly couldn’t care less about ensuring “a lock protects data” because that kind of “safety” is only useful for people that shouldn’t be writing kernels in the first place.
Let’s also assume that sometimes you’re mixing multiple strategies – e.g. maybe you acquire a lock, then send a message, and the message receiver releases the lock.
Let’s assume that maybe you want “all readers can always read”; where only writers bother with any locks, and writers gain access in a fair “first come first served” order; and the whole thing uses an atomic “pointer + version number” (to solve “ABA” problems); but the “pointer + version number” won’t fit in an atomic data type so the pointer has to be compressed (like “pointer = x * 8 + base” so that “x” fits in 20 bits leaving 44 bits for the version number and doesn’t cost a whole 64 bits for the pointer alone) and that this compression breaks all normal rules for pointers.
Let’s assume that sometimes code crashes while holding locks; and just to try panic gracefully you need to “force release/reset” those locks (likely after halting other CPUs).
Let’s assume that for almost all of this the programmer has to ensure correctness themselves (regardless of whether it’s an unsafe language or unsafe code in a “safe” language); and that the only thing a “safe” language does is create the additional burden of having to insert worthless decorations to convince the compiler to do what it’s told.
Of course I’m not saying it’s not useful for simple things in user-space, where you’re not doing anything interesting and can accept Rust’s “shrink-wrapped at the factory” concept of concurrency.
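For what it’s worth, the “compressed pointer + version number” packing described above is simple enough to sketch (assuming a 20-bit index and a 44-bit version; turning the index back into a real pointer via “x * 8 + base” is left out):

use std::sync::atomic::{AtomicU64, Ordering};

// Only the 20-bit compressed index is stored, leaving 44 bits for an
// ABA-avoiding version counter, so the pair fits in one atomic word.
const INDEX_BITS: u32 = 20;
const INDEX_MASK: u64 = (1 << INDEX_BITS) - 1;

static HEAD: AtomicU64 = AtomicU64::new(0);

fn pack(index: u64, version: u64) -> u64 {
    (version << INDEX_BITS) | (index & INDEX_MASK)
}

fn unpack(word: u64) -> (u64, u64) {
    (word & INDEX_MASK, word >> INDEX_BITS)
}

// Swing HEAD to a new index, bumping the version so a concurrent reader that
// saw the old word can't be fooled by a recycled index (the ABA case).
fn try_swing(old_word: u64, new_index: u64) -> bool {
    let (_, version) = unpack(old_word);
    let new_word = pack(new_index, version.wrapping_add(1));
    HEAD.compare_exchange(old_word, new_word, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

Whether the language checks any of the interesting properties here is, of course, the point under dispute.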
Rust originally had full garbage collection. Around 2013 they changed the way “collection” is done – specifically; they switched to “each piece of garbage is collected when its reference count reaches zero” (replacing the overhead of finding garbage with the overhead of updating reference counts). It’s still (a form of) garbage collection.
Note: A lot of my code (and especially my micro-kernels which use nothing else) use what I call sparse arrays; where everything is arrays of “statically pre-allocated at compile time” virtual memory and the underlying physical memory is allocated on demand and freed (by scanning arrays looking for pages that don’t contain any “currently in use” entries) whenever available physical memory becomes more important than whatever else CPU/s could be doing.

There’s no “heap memory allocator” in the normal sense (each type of array has its own allocator to allocate “array slots”, where the array indexes are things like “thread IDs” or “process ID”, which improves the performance of lookups – no expensive “search list/tree/whatever to find the data from an ID”) and the same statically allocated virtual memory is simply recycled forever (and as a bonus you get quotas on everything so a fork-bomb can’t gobble all the RAM for process data structures, etc; and as an extra bonus you also get good locality – all the objects of a certain type are in the same area of virtual memory with a higher chance of cache and TLB hits).

You can think of this as “garbage collected physical memory” if you like; which is to say that I’m not against garbage collection (even in micro-kernels), I’m just against “garbage collected heap” (and “reference counted garbage collected heap”) as these are focusing on the wrong problem – literally; focusing on freeing the massive abundance of virtual memory/heap, while failing to care about the limited supply of physical memory (while letting everything else suck – allocation/deallocation overhead, lookup costs, etc).
Brendan,
I’ll grant you that’s an interesting point you bring up, but typically the only time we care about physical memory addresses is inside hardware drivers, and in the case of a microkernel these drivers would be external to the kernel. Presumably the drivers would use some API to map in the hardware’s address space, but I really don’t see any reason this precludes a safe language from protecting the driver from corrupting itself even with memory mapped hardware. In other words safe languages still provide a benefit.
We agree there are types of bugs that are outside the scope of safe languages. A safe language will protect us from things like array out of bounds, double free, use after free, etc, but not from things like invalid output. So for example a safe language won’t catch this bug because it doesn’t perform an invalid access…
If the kernel loads erroneous data into tables, the result could be catastrophic. But the types of human programming errors that safe languages do protect us from apply very much to kernel code.
Safe threading primitives can handle this and many more complex scenarios.
You keep reiterating the fact that high level languages (including C) don’t support specific opcodes. Yes that’s a good reason to use assembly, but it’s NOT a good reason to avoid safe languages.
But you’re not limited to what the standard runtime offers. The primitives themselves are basically no-ops except when there’s a bug, and that’s the whole point.
So then write your own safe primitives and use them! In your examples you still benefit from safe languages.
I think this is all about resistance to change and not that safe languages cannot be used effectively for low level code.
We care about getting physical memory management right. E.g. it would be bad if something tells the physical memory manager to free a page but then still uses it a little before unmapping it from a virtual address space (because that creates the possibility of the same physical page being allocated by something else, and someone’s data getting corrupted).
There’s also various cases where the same physical memory is mapped in multiple places (e.g. a page containing part of a file that is mapped in 2 different virtual address spaces as “copy on write” plus also used for the operating system’s/virtual file system’s file data cache); which mostly ends up needing reference counts to determine if/when physical pages can be freed. There’s also potentially some cases (file data caches, message queues for larger messages) where physical memory may be used but not mapped into any virtual address space.
Then there’s some fancy stuff – “compressed data in RAM as swap space”, NUMA optimization and optimizations for other differences between pages of RAM (e.g. volatile vs. non-volatile, performance differences between “on chip high bandwidth RAM” and external RAM), support for multiple page sizes, and support for fault tolerance (e.g. offlining physical pages that have seen too many corrected errors to reduce the risk of future uncorrectable errors). All of these things cause the kernel to want to continuously/periodically scan through virtual address spaces and optimize (e.g. so that if a process says it prefers one kind of RAM but there wasn’t any free at the time and it was given another type of RAM instead it can be auto-improved later when the preferred type of RAM becomes available; or so that if you have 510 free small pages that belong to the same large page you can find and replace the missing 2 small pages to be able to merge all 512 small pages back into a single large free page; or…).
Then there’s more fancy stuff – e.g. maybe an ability to dynamically changing the “do/don’t encrypt” status of a physical page to suit demand (which must be done carefully as it breaks cache coherency); and maybe splitting “RAM” into separate physical banks as a rowhammer defense (e.g. untrusted stuff only uses one bank of RAM and trusted stuff only uses another bank of RAM; so trusted stuff is immune to rowhammer attacks done by untrusted stuff).
All of this complicates the daylights out of physical memory management (and has little to do with devices); and a bug in any of that complicated mess can have severe consequences (e.g. infrequent data corruption in anything with no way to track it back to a cause).
You mean; safe languages suck and don’t help with anything that actually matters; but force you to waste 6+ months learning how to work-around the compiler’s pointless nagging so that you can write your “safe primitives” that aren’t safe and can’t be checked; so that if you ignore all the disadvantages it seems like there’s a tiny advantage.
To see what I mean, look at Linux. They’ve spent about 6 months so far just trying to make Rust’s worthless nonsense work and have 1 experimental “example driver”. It’s probably going to take at least 2 more years before anything actually useful (production drivers written in Rust, etc) gets into mainline and they finally reach the “Oh no, all the Rust stuff is full of CVEs” phase.
My fundamental argument is that (for micro-kernels) the benefits of “safe” languages don’t justify the costs. The need (and/or desire) to use assembly language, and the need (and/or desire) to write your own primitives, reduces the benefits and increases the costs. These things add up to about 95% of the reason that the benefits of “safe” languages don’t justify the costs.
Brendan,
There are plenty of things you have to get right. But don’t conflate “safe language” with “no bugs at all”. A safe language provides safety primitives that a compiler can enforce against extremely common universal software flaws.
BTW your issue is not at all unique to kernels; just about every program we can write has the potential to contain domain-specific bugs that safe languages know nothing about. A heart monitoring device can have bugs that the safe language cannot detect. A self-driving car can have bugs too. Nevertheless it is useful to know that the memory-corrupting faults are protected against.
So you are right that there are certain classes of errors that are too domain-specific to be caught by safe languages out of the box, but that fact in and of itself is in no way the same as concluding that safe languages aren’t useful in protecting against the types of software flaws that keep creeping into software of significant complexity.
You are wrong. It solves notorious bugs that one generation after another faces with unsafe languages. It would pay off enormously in the long run. But like I said from the get-go, we are too stubborn to change and would prefer to continue to use the old languages even though there is a high risk of human error resulting in a fault that automated safety checks would have caught. The pride of software developers is our undoing, haha.
It’s very hard to justify switching an existing code base. Switching linux to rust would be a monumental task and I’ve already said so above. I don’t believe there’s a chance in hell this will happen.
If safe languages were the norm to begin with, I am quite certain everyone would be happily using them and virtually no one would be calling on the industry to move to unsafe languages like C. The main problem is that it is so difficult to change an industry that has so many roots cemented in unsafe languages.
So unsafe languages aren’t going away any time soon, but my position is that it’s a lost opportunity to not promote safe languages for new projects at least. The bottleneck here is one you brought up already and I agreed with about “availability”. Until we have a critical mass of developers skilled in safe languages taking up project management positions throughout the industry, legacy unsafe languages will continue to rule the roost.
You’re being distracted by things I’m not saying.
I’m not saying there’s no benefits (even for micro-kernels); I’m saying that for micro-kernels the benefits of a safe language (which exist) are too tiny to justify the huge costs. It’s “let’s spend an extra 20 weeks to write code that works around the safe language’s pointless requirements and save ourselves 1 week of debugging later; and then ignore the fact that a safe language cost us a total of 19 weeks for nothing and just tell everyone that it saved us 1 week of debugging”.
That’s not what they’re trying to do. They’re only trying to make it possible for some small pieces of Linux (e.g. newly written device drivers) to use Rust; primarily because monolithic kernels are incredibly stupid (thousands of device drivers full of bugs running with the highest privilege) and they think Rust might help with that stupidity.
It doesn’t really change my point – they’ve wasted ~6 months already in the hope that eventually maybe one day it’ll save a few device driver developers from waiting until ASAN finds the same bugs.
Safe languages were the norm – Algol, Ada, Fortran, COBOL, Pascal, BASIC, … Most of them are older than C, they’re all safe, they all sucked because they’re safe, and they all died because they’re safe.
Brendan,
If you target a safe language from the start, the costs are minimal. And you shouldn’t ignore the costs that unsafe languages have piled onto society over the several decades we’ve been using them. Unfortunately this is a long term problem that won’t get solved so long as we refuse to evolve, and so long as that is the case we will continue to see these problems over and over again.
I know what they did. For the record, I think trying to mix and match languages where you are not going to be able to build upon safe primitives is not worth it. You end up having to interact with kernel structures that have no safety. Most of the benefits of the safe language are lost this way.
I’ve been pretty consistent in promoting safe languages for new projects where the entire design can be built with safety from the ground up. This is where they can be most beneficial, with no compromises needed to be compatible with unsafe code. Like I said though, it’s tough to get there when everything is rooted in unsafe legacy code. Change is always hard.
I’m not familiar with all of them, but you are wrong about Pascal. It is definitely unsafe when you programmatically allocate and free objects. Those actions are NOT checked at compile time or runtime; segfaults and other corruption would happen just like in C. I’m guessing you are referring to range-checked arrays, which frankly C should have had from the start.
While there have been many safe languages, they’ve traditionally been in the garbage collection camp, which is not well suited for low level system tasks as we’ve already discussed. Compile-time safety verification is a kind of holy grail and I hope we’ll see more languages moving in that direction.
For micro-kernels; if you start with a safe language from the start the costs are huge (instead of being extremely huge).
For a start, realize that for kernels in general (and especially micro-kernels, and especially micro-kernels that use a “one kernel stack per CPU” model); almost all of your data is reached via. (mutable) global variables in some way; and Rust’s “object ownership” forces you to navigate a minefield of utter stupidity before you write a single line of code because it can’t handle global data.
The next problem is that maybe you have several different memory allocators (a NUMA domain specific allocator, a CPU specific allocator, a thread specific allocator, …) and when (e.g.) an owned pointer falls out of scope Rust expects to be able to free the object, but Rust is so stupid that you can’t even tell it which allocator is the correct allocator to use when freeing an object so the whole thing doesn’t work.
These 2 things alone are enough to cause you to be constantly wasting a massive amount of effort fighting against the language.
It depends on which Pascal. Most in the early years (1970s) were compiled to byte code (“p code”) and run in a virtual machine where everything (memory safety) could be checked at run time.
Brendan,
I have to disagree. The safe languages are just doing what is necessary to enforce correctness. Just because an unsafe language leaves it to the human developer doesn’t mean that the human developer doesn’t have to go through the same mental logic verification processes in their head. It’s better to automate them.
I’ve already told you that you aren’t limited to the primitives in the standard library. You can use custom primitives if need be so that’s not a convincing reason to avoid using safe languages.
That is the essence of the luddite mentality. You are convincing yourself that you need to fight automation rather than embracing and advancing it. Yes clearly you have to learn new skills, but letting the compiler handle it not only saves us work in repetitive and mundane verification, but it does so faster, more thoroughly, and more consistently than any humans can. It opens up opportunities to refocus our limited time on higher level problem solving.
It seems plausible that Pascal and other languages ran inside a VM, but that in itself doesn’t make them safe languages. We could run an unsafe C program inside a VM too, but such a program would still be at risk of uncaught faults corrupting its own memory and being exploited for it.
Oh.
Have you considered seeing a psychologist about your problem? It might only take a few sessions before you’re able to read and understand what I wrote.
No. Like I said (repeatedly) the “safe” language makes you do a lot of extra work just so that the compiler can help you avoid an insignificant amount of work; and is therefore not better.
It’s like spending 2 hours per day looking after a monkey that brushes your teeth for you; and being so stupid that you think you’re saving 5 minutes (“Yay, teeth brushing is automated!”) when it’s actually costing you 115 minutes.
“Yay; you can waste a massive amount of time writing custom primitives to save almost no time at all because now you have to check the primitives yourself! Let’s buy 2 more monkeys to wash the dishes and spend 6 hours looking after 3 monkeys so that we can still end up washing the dishes ourselves! It’ll save lots of time!”.
That is the essence of your failure to grasp reality. By ignoring all the disadvantages you’ve deluded yourself into thinking there’s only advantages.
Brendan,
Specifying programmer intentions as a contract in the safe language isn’t an unreasonable ask. I think you are exaggerating the amount of extra work involved because even in unsafe languages the programmer should already be thinking about correct usage, regardless of the fact that a language like C doesn’t make them explicitly type it in. The difference is that in unsafe languages we have to keep track of these contracts mentally, whereas in safe languages they are explicit (which is the reason a safe language is able to enforce them). Even if there is a bit more to type in (that a programmer should already have in their head), it saves a lot of time in tracking down race conditions and faults later.
Where safe languages can really shine is with long-term maintenance, where thousands of patches may come in over time from a variety of sources. In a safe language those safety contracts will continue to be enforced over the whole life of the project with no additional human effort. But with the C model, those contracts remain in the heads of the original devs and have to be enforced manually. The developer responsible for the original code may no longer be around, gets overloaded and doesn’t have the capacity to audit every change, suffers from human error, and will forget some of their mental contracts with time.
Higher quality is not a waste of time and once you are proficient at it the automated checks can actually make you more efficient in the long run with fewer debugging sessions. The compiler can tell you instantly about those faults.
…said the guy driving next to the car in his horse buggy 🙂
Prove it.
Write code in Rust that:
a) Has a global bitfield, like “uint64_t myBitfield[MAX_SLOTS/64];” and a global array of structures like “struct mystruct myArray[MAX_SLOTS];” where the structures are just 2 integers (like “struct mystruct { int threadID; int value; }”).
b) Has an allocator like “int allocateMyStruct(int currentThreadID)” that does a simple linear search of the bitfield to find a clear bit; then uses an atomic “lock bts [myBitfield], rcx” to set that bit with some “continue search if bit became set” handling; then (when a bit was successfully atomically set) does an (atomic) “myArray[slot].threadID = currentThreadID;” and returns the slot number or an error condition (e.g. returns -1 if there’s no free slots).
c) Has a de-allocator like “int deallocateMyStruct(int slot, int currentThreadID)” that does some permission checks (is the slot number sane? Does “myArray[slot].threadID == currentThreadID”?) then “myArray[slot].threadID = -1”, then atomically clears the corresponding bit in “myBitfield”.
d) Has a “int setMyStructValue(int slot, int currentThreadID, int newValue)” function to set/change the other field in the structure; that does some permission checks (is the slot number sane? Does “myArray[slot].threadID == currentThreadID”?) then does a “myArray[slot].value = newValue;”.
e) Has a “int getMyStructValue(int slot, int currentThreadID)” function to return the other field in the structure; that does some permission checks (is the slot number sane? Does “myArray[slot].threadID == currentThreadID”?) then does a “return myArray[slot].value;”.
Assume that all these functions may be run by many CPUs at the same time (and that all accesses to “myArray[slot].threadID” must be atomic); but because only the correct thread can de-allocate or ask for “myArray[slot].value”, and the correct thread can’t do both at the same time, there’s no reason to care about a “myArray[slot].threadID” or “myArray[slot].value” changing while you’re using it (and no reason to check the bitmap when determining if a slot is valid because the “myArray[slot].threadID == currentThreadID” is fine on its own – you only care about “if( slot < MAX_SLOTS)" for determining if a slot is valid).
Also assume that it's user-space that calls all these functions (via. kernel API) and nothing else in the kernel uses them.
AFTER you've done this in Rust, have an honest appraisal of how much extra work it was, how much benefit the "safety" is, and whether the extra work justifies the benefits.
Where “safe” languages shine is when you can amortize the cost of the extra work. E.g. if one person can write (and very carefully check) primitives, and then millions of lines of code written by thousands of other programmers can depend on those existing primitives; then it becomes “very small amount of work for huge advantages”. THIS DOES NOT HAPPEN FOR MICRO-KERNELS.
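For reference, a rough sketch of what a) through e) from the challenge above might look like in Rust using atomics – untested and purely illustrative, though notably this particular piece needs no “unsafe” at all. Whether the effort and the result satisfy the performance expectations is, of course, exactly what is being debated:

use std::sync::atomic::{AtomicI32, AtomicU64, Ordering};

const MAX_SLOTS: usize = 1024;
const WORDS: usize = MAX_SLOTS / 64;

struct MyStruct {
    thread_id: AtomicI32,
    value: AtomicI32,
}

// Const items may be used to initialize each element of a static array.
const ZERO: AtomicU64 = AtomicU64::new(0);
const FREE: MyStruct = MyStruct { thread_id: AtomicI32::new(-1), value: AtomicI32::new(0) };

static MY_BITFIELD: [AtomicU64; WORDS] = [ZERO; WORDS];
static MY_ARRAY: [MyStruct; MAX_SLOTS] = [FREE; MAX_SLOTS];

// b) Linear search for a clear bit, atomically set it, then claim the slot.
fn allocate_my_struct(current_thread_id: i32) -> i32 {
    for word in 0..WORDS {
        loop {
            let bits = MY_BITFIELD[word].load(Ordering::Acquire);
            if bits == u64::MAX {
                break; // word is full, try the next one
            }
            let bit = (!bits).trailing_zeros() as usize; // first clear bit
            let mask = 1u64 << bit;
            // If someone else set that bit first, keep searching this word.
            if MY_BITFIELD[word].fetch_or(mask, Ordering::AcqRel) & mask == 0 {
                let slot = word * 64 + bit;
                MY_ARRAY[slot].thread_id.store(current_thread_id, Ordering::Release);
                return slot as i32;
            }
        }
    }
    -1 // no free slots
}

// Shared permission check for c), d) and e).
fn owns(slot: usize, current_thread_id: i32) -> bool {
    slot < MAX_SLOTS && MY_ARRAY[slot].thread_id.load(Ordering::Acquire) == current_thread_id
}

// c) Permission checks, mark the slot free, then clear the bit.
fn deallocate_my_struct(slot: i32, current_thread_id: i32) -> i32 {
    let slot = slot as usize; // a negative slot wraps huge and fails the sanity check
    if !owns(slot, current_thread_id) {
        return -1;
    }
    MY_ARRAY[slot].thread_id.store(-1, Ordering::Release);
    MY_BITFIELD[slot / 64].fetch_and(!(1u64 << (slot % 64)), Ordering::AcqRel);
    0
}

// d) and e) follow the same pattern: check the slot, check the owner, touch .value.
fn set_my_struct_value(slot: i32, current_thread_id: i32, new_value: i32) -> i32 {
    let slot = slot as usize;
    if !owns(slot, current_thread_id) {
        return -1;
    }
    MY_ARRAY[slot].value.store(new_value, Ordering::Relaxed);
    0
}

fn get_my_struct_value(slot: i32, current_thread_id: i32) -> i32 {
    let slot = slot as usize;
    if !owns(slot, current_thread_id) {
        return -1;
    }
    MY_ARRAY[slot].value.load(Ordering::Relaxed)
}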
Brendan,
Well, C often does well in academic exercises that are simplified for human clarity and lack the emergent complexity of real projects. But the truth is that safe languages are most beneficial in complex projects where humans start suffering from mental overload. For this reason, I don’t think your example would really “prove” anything one way or the other and we’re still going to disagree.
But if you still want to try anyways then how about you start first with a C implementation.
I do understand your point. While there may be an initial cost to build primitives, you’d need to build them in other languages too. The benefit of those safe primitives adds up every time you make a change and the compiler automatically enforces the contracts. Think of it like an investment.
The example will prove that Rust is fundamentally worthless. It can’t make sure that the memory is de-allocated at the right time (because that’s controlled by user-space), it can’t ensure that any memory access is safe (because Rust is too stupid to prevent things like “array index out of bounds” at compile time), and it can’t ensure there’s no race conditions (because it’s a trivial piece of “lock-free”).
The only thing Rust does is add a massive headache (trying to convince it to let you use global variables, sprinkling “unsafe” everywhere, etc) while not giving you any benefit (or any safety) whatsoever.
Sure; you can argue that this is “too simple” to be representative of a more realistic/more complex scenario. I don’t consider that a valid argument – in a typical good micro-kernel almost all of the code is like this or worse.
Far more likely is that someone writing a micro-kernel in Rust would design code to suit Rust (e.g. instead of implementing something like my “arrays with lock-free algorithms” example they’d do something that uses a single generic allocator and a single generic lock, and pass pointers and not “array slots”); so that they can pretend that Rust was useful for code that is a pure performance disaster. Of course we don’t have to guess here – there’s already a micro-kernel written in Rust (Redox) that proves developers avoid the pointless stupidity of Rust by designing code that sucks badly (global locks, single allocator, algorithms that thrash caches due to linear iteration of linked lists, …), where the kernel developers don’t even try to optimize anything and won’t even do any benchmarks (and still don’t have any meaningful safety).
Brendan,
A lot of your claims are wrong and I disagree with you on the merits. You haven’t made a compelling argument against using safe languages in microkernels. Safe languages are the logical answer to the faults that continue to plague our industry after half a century and I think it’s worth evolving. Yet as I’ve conceded from the start, there are many people who don’t want to change. But I think this has less to do with safe languages lacking merit and more to do with stubborn habits and fear of change.
IMHO there just isn’t a good reason for safe languages to be this divisive. We’re computer scientists, not politicians. If the industry had started out with safe languages from the beginning, today everyone would be 100% comfortable with them and it’s the unsafe languages that would be frowned upon. Not for nothing, but safe languages can evolve too, and they would be even more advanced today if we had transitioned to safe languages decades ago.
Anyways, it’s obvious we’re going to have to agree to disagree.
And yet, you have nothing to back up your unfounded opinion and when given the chance to prove I’m wrong you do nothing more than continue bleating like a mindless sheep.
No. Micro-kernels are the logical answer – making the kernel small enough to be checked manually by humans, possibly with the assistance of tools (like ASAN, and possibly even formal verification as is done for SeL4); WITHOUT sacrificing everything (performance, features, etc) due to suckers that have allowed themselves to be deceived by “safe language” marketing hype.
You’re wrong. It’s due to safe languages being the wrong tool for the job (of writing a micro-kernel) and forcing an unacceptable compromise between “safe” and performance.
For the last several years I’ve been working on my own tools and language (partly to replace the whole “plain text + make + compilers + linkers + revision control system” with a client-server approach involving collaborative real-time semantic editors, where server sanitizes and pre-compiles in the background while you type).
For the language itself; one of my oldest goals is to allow all variables to specify range/s (e.g. like “month is an integer from 1 to 12”), and for the compiler/checker (running while you type) to guarantee that the RHS of an assignment fits correctly in the LHS (e.g. so you can’t do “month = 13;”, and can’t do “month = a*b+c-d” unless the compiler can prove the result fits) making it impossible for various bugs (e.g. overflows, out-of-bounds array index, etc) to exist.
As part of this project I researched Rust’s “object ownership” and stole most of their ideas. What I came up with was to allow user-defined types to include a required “disposal function”; so you can write something like “typedef myUnsafeObject mySafeObject, disposal mySafeObjectDestructor”; where the compiler allows a variable using a type with a disposal function to be assigned to a variable with the original type without a disposal function if/when appropriate; but also ensures that all variables obtained in this way have shorter life-spans than the original, and ensures that the disposal function is called when the original variable’s life time ends (even when the variable is returned, passed to other functions, etc). Basically; it’s the same as Rust’s “owned (with disposal function)” and “borrowed (without disposal function)”; except that it can be used for anything that a destructor in C++ could be used for and isn’t limited to memory management alone, and except that it’s explicit (the programmer must call the disposal function themselves) without any “overhead hidden behind your back” (as making overhead obvious is one of my primary goals).
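For comparison, the Rust mechanism being contrasted with here looks roughly like this (a toy sketch; “Mapping” and its cleanup are hypothetical): a “disposal function” attached to a type via Drop, run implicitly when the owning value’s lifetime ends, with borrows forced to be shorter-lived than the owner:

// "owned (with disposal function)": the compiler guarantees drop() runs
// exactly once, when the owning value's lifetime ends.
struct Mapping {
    base: usize,
    len: usize,
}

impl Drop for Mapping {
    fn drop(&mut self) {
        // Hypothetical cleanup; a real kernel would unmap the range here.
        println!("unmapping {:#x}..{:#x}", self.base, self.base + self.len);
    }
}

fn main() {
    let owned = Mapping { base: 0x4000_0000, len: 4096 };
    let borrowed: &Mapping = &owned; // "borrowed (without disposal function)"
    println!("mapped {} bytes at {:#x}", borrowed.len, borrowed.base);
    // `borrowed` cannot outlive `owned`; drop runs here, implicitly.
}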
Basically; even though I stole the same ideas for my own language (and even though I think my language will be safer than Rust) I still say that (even for my own “safer than Rust” language) it’s all mostly worthless for micro-kernels.
As I already pointed out, reality disagrees with you. Take Ada – it’s nearly as old as C, safer than modern Rust, and virtually dead.
What you’re actually seeing is closer to a fad. People jumping on the “fresh new thing with all the market hype and propaganda” bandwagon for a while until the next shiny new thing comes along.
Maybe in 5 years time people like you will be saying “If the industry had started out with safe languages like SPARK from the beginning, today everyone would be 100% comfortable with them and it’s the unsafe languages like Rust that would be frowned upon”; which would be hilarious (given that SPARK is a descendant of Ada).
Brendan,
You haven’t really refuted what I’ve said except through exaggerated claims. The problem I have is that when we double down on C’s flaws rather than seeking solutions to them, we end up holding back the entire software industry.
Sure, microkernels are an answer to the different problem of component isolation, and I agree it theoretically helps mitigate faults, but it doesn’t obviate the need to catch them in the first place. Also, if we’re being honest, microkernels are widely criticized for sacrificing performance. Granted, you and I can argue in favor of microkernels: find ways to mitigate the costs and justify the benefits. But it is hypocritical of you to be so abrasive about the marginal (if any) performance costs of a safe language’s compile-time fault detection, only to then uncritically handwave away those costs for microkernels.
Why do you say Ada is safer than rust? Ada was required by government contracts because they wanted code to be less faulty than C, but I think the price was so exorbitant that Ada remained confined to government contractors.
Calling safe languages a fad dismisses the real systemic problems that we face under the status quo. You can tell someone that “the green party is a fad”, but that doesn’t necessarily mean the green party is wrong; it just means that overcoming the incumbents is extremely hard. Momentum and the existing power dynamics can be huge obstacles to change.
I think if we could somehow make the transition away from unsafe incumbent languages to safe ones, it would probably be permanent as there’s no compelling motivation to go back to unsafe languages.
From my perspective; I’ve provided clear logical reasons for my claims, with examples, backed up by evidence from a real world Rust micro-kernel; and you just ignore all of it and repeat an “I believe what I want to believe and therefore you are wrong” argument.
The problem I have is people assuming everything is a nail, then trying to tell people that use a screwdriver for screws that they should be using a hammer on screws. Rust (and Haskell and Java and Python and…) are fine tools for the right job; but they’re the wrong tool for many jobs (micro-kernels, JIT compilers, anything that needs to squeeze the most performance out of SIMD, ..).
Micro-kernels need to be fast (because of the extra cost of IPC), more so than any other kind of kernel. Failure to do this causes the entire OS to be considered a toy or worse (micro-kernel advocates are still fighting against the backlash from Mach’s performance in the 1980s).
And no; I’m not talking about “marginal if any performance costs”, I’m talking about (e.g.) Redox being 100 times slower than it could/should be; specifically because the developers sacrificed performance in the hope of avoiding “Rust doesn’t help because everything in the micro-kernel is unsafe”.
Ada protects against “out-of-range” and overflows (while Rust doesn’t), protects against memory related bugs at least as well as Rust does (via. run-time checking) including things Rust doesn’t support (e.g. “storage pools” providing protection from memory leaks), and offers some concurrency protection (probably not as much as Rust).
Ada fell out of favor for multiple reasons – partly because it takes more effort to convince the compiler to accept your code (and the world shifted to “developer time is more important than quality” for almost everything); partly because universities don’t bother teaching it, etc. All of that is true of Rust too.
“A fad” is the right word for specific languages (Haskell, Nim, Go, Rust, …).
It’s not the right word for the general trend; and I don’t know what the right words are. If you did a “safety vs. performance” graph over the previous 70 years (and especially if you focus on software that needs high performance and ignore things like web pages using JavaScript) you’d find that it “wobbles” – “safe” has a surge in popularity then there’s a “performance” backlash, then a new surge, then a new backlash, …; where the surge tends to be larger than the backlash causing a slow trend towards “more safe”. Currently we’re in the middle of a surge. There will be a backlash (smaller, mostly due to “tracing garbage collector” being superseded by “reference counting garbage collector”).
Of course part of this is that “unsafe” languages are getting safer (e.g. for C, dangerous older C library functions getting shunned/replaced, stricter type specifiers, introduction of bounds-checking interfaces, static assertions, ..).
The 3 compelling motivations to go back to unsafe languages are performance, flexibility and laziness (people who couldn’t be bothered dealing with extra boilerplate). These motivations will never go away. The most you can possibly achieve is splitting “programmers” into those that create unsafe code (in libraries, etc) and those that consume it.
Brendan,
I’m not doing that though. I’d love to see the industry embrace many safe languages and primitives. Not just one tool but many. Now that you bring it up though, I think this is very much the way C programmers think: all system code is a nail and C is the hammer. The problem with this is that they’re holding the industry back several decades; we’re not only losing the immediate evolutionary benefits, but we’re also missing out on opportunities for future evolutionary benefits, and the longer we hold out the further behind we end up compared to where we ought to be.
And that is exactly why we need to be investing in more compile time safety!!! This is the correct direction for systems programming and all languages should be moving in this direction.
As far as I know rust does the same range checking as Ada (or at least Pascal, since I’m not very familiar with Ada). Please provide a specific example if you still think this is wrong. It sounds like you may be conflating static and dynamic range checking. In the static case rust does range checking at compile time because the values are known at compile time. But in the dynamic case we don’t know the value until runtime, so a runtime check is needed, which both rust and Pascal (and I suspect Ada) can do, which is reasonable. Both Pascal and rust let you explicitly disable the runtime range checking if you want to, but the safety checks are just there by default, which is the way it should be. Both Pascal and rust are objectively better than C, which just corrupts its own memory 🙁 C really should support range checking and it’s really kind of stupid that the compiler doesn’t do it automatically.
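To make that concrete, here’s a minimal rust sketch (the array and index are just invented for illustration) showing the default bounds check, the fallible alternative, and the explicit opt-out you only get inside unsafe:

```rust
fn main() {
    let data = [10, 20, 30, 40, 50];
    // Pretend the index comes from user input, so it is only known at run time.
    let i: usize = std::env::args().count() + 5;

    // Indexing is bounds-checked by default: `data[i]` would panic here.
    // An explicit fallible access avoids the panic:
    match data.get(i) {
        Some(x) => println!("got {x}"),
        None => println!("index {i} is out of range"),
    }

    // The check can be opted out of, but only inside an `unsafe` block:
    // let x = unsafe { *data.get_unchecked(i) }; // undefined behaviour if i is out of range
}
```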
Ada compilers didn’t become readily available to the industry until the late 80s and by then it was just too late and C was well established as the dominant language underpinning all commercial operating systems. There is an insightful study covering the C and Ada programming languages in depth. It has a lot of parallels with what we’re talking about now…
http://sunnyday.mit.edu/16.355/cada_art.html
My objection was not to using “fad” in relation to languages, but rather to using it in relation to safety. If my motivation wasn’t already clear, it’s not for any specific language to replace C, but for the industry as a whole to evolve past the unsafe practices and languages that tarnish our ability to deliver robust products. It makes a lot of sense to delegate tedious and error-prone checks to computers so we can focus on higher level tasks.
Again, you’re making false assertions that safe languages → worse performance, or that safety is the actual cause of boilerplate code. That’s not true; strong typing is. Rust is strongly typed like C++ is, and consequently both have more boilerplate code than C. So unless you are willing to criticize C++’s strong typing, you really don’t make a fair argument against rust. For the record I wouldn’t mind seeing loosely typed languages becoming more safe as well. There are many loosely typed languages that are safe thanks to having garbage collection; perhaps they can also benefit from more intelligent compilers performing compile-time safety checks as well.
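A tiny illustration of that point (the values are arbitrary): the ceremony below comes from rust’s strict typing, not from its memory-safety machinery. C would widen the small integer implicitly; rust, like carefully written C++, makes the conversion explicit.

```rust
fn main() {
    let small: u16 = 1_000;
    let big: u64 = 5_000_000;

    // let total = big + small;          // error: mismatched types (u64 + u16)
    let total = big + u64::from(small);  // the "boilerplate": an explicit, lossless conversion
    println!("{total}");
}
```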
My hope is that safe languages will continue to get much better over time in relation to unsafe languages with an ever increasing set of capabilities to verify code correctness. But we need to be willing to invest in them and be open to change. C is the anchor that’s been holding everything back. I have to accept that you may never be open to change, that’s your prerogative. But while people like you will slow things down, I do believe that over time, as more students grow up with safe languages in school, they will become more influential especially when older generations start retiring out of the system. It’s unfortunate that we have to wait so long, but I think the natural evolution of languages will ultimately prefer safe languages even if it takes a very long time.
In general, in order of best to worst:
a) Bugs would be prevented when the language is designed. We’ve seen some of this (e.g. “for()” and “do/while” and “switch()” replacing “if() goto”) but not much. Sadly, a large part of the problem is complexity and modern languages are making things significantly more complex (and sometimes making things less safe/more complex in the name of “safety”).
b) Bugs would be prevented when you edit. For an example, if an IDE allowed programmers to modify a Nassi–Shneiderman diagram it could become impossible to have flow control bugs. I need to do a lot more research on this at some point (I’m hoping to ignore it now, then add “helpers” to the user interface later).
c) Bugs would be detected while you type. This mostly only works for simple stuff (syntax errors, etc) and I haven’t been able to find ways to do much more than modern IDEs.
d) Bugs would be detected soon after you type. This is a major part of my project – a program is many small pieces (functions, etc), editing a small piece starts/resets a 2 second timer, and if the timer expires any affected small pieces are sanity checked (with error/s reported back to the programmer/editor). I’m also playing with the idea of being able to run “unit tests” (in an emulated environment) in the background soon after you type.
e) Bugs are detected at compile time. My goal is to ensure this doesn’t happen at all (because all bugs that can be detected at compile time are detected sooner instead).
f) Bugs are detected during testing.
g) Bugs are detected at run-time.
h) Bugs aren’t detected.
Ada will let you say “my variable is an integer with a range from 123 to 234” and if you do anything that could cause an out-of-range value to be stored in that variable you get an error. For Rust you can’t say “my variable is an integer with a range from 123 to 234” at all.
Rust will do overflow checking in debug builds only (and not in default release builds); but that relies on the range of primitive types (e.g. maybe a range from 0 to 0xFFFFFFFF and not from 123 to 234) and even then there’s a risk that overflows won’t be detected during testing and will cause overflow bugs in release builds. To work around this you can manually add assertions everywhere, but almost nobody is willing to do that.
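For what it’s worth, here’s roughly what that looks like and what the manual workaround amounts to (values picked purely to overflow a u8; the helper function is invented for illustration):

```rust
// A hand-written guard of the kind described above; the caller has to
// remember to use it everywhere, which is the practical problem.
fn add_u8_checked(a: u8, b: u8) -> u8 {
    a.checked_add(b).expect("u8 addition overflowed")
}

fn main() {
    let a: u8 = 200;
    let b: u8 = 100;

    // `a + b` panics in a debug build ("attempt to add with overflow")
    // but silently wraps to 44 in a default release build.

    // Checked arithmetic makes the overflow visible in every build:
    match a.checked_add(b) {
        Some(sum) => println!("sum = {sum}"),
        None => println!("overflow detected"), // this branch runs for 200 + 100
    }

    // The explicit guard panics in both debug and release builds:
    println!("{}", add_u8_checked(a, b));
}
```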
Because Ada has proper range checking it avoids a different class of bugs (e.g. you can have “Celsius that isn’t allowed to be less than −273.15”, or “month from 1 to 12” or “person’s age from 18 to 120”); but this also works for array indexing (e.g. it knows that “array[variable-123]” is fine because it knows “variable” can’t contain a value smaller than 123).
Because Rust doesn’t do any range checking, Rust can’t ensure array indexes are in a valid range either. Instead Rust resorts to run-time checks for (optionally) bounds-checked arrays; which seems similar but is somewhat sucky (“Bounds error on line 123” causing you to try to figure out why instead of “Bad value in assignment on line 100” that tells you where the real problem is).
Note that Ada (range checking) and Rust (overflow and bounds checking) use “run-time checks in theory”. In practice there are 3 cases – either the compiler can prove it’s safe (and that no run-time check is needed), or the compiler can prove it’s not safe (and you get a compile time error), or the compiler can’t prove it’s safe or unsafe (and you get a run-time check). For my language, it’s essentially the same as Ada’s range checking; except that there’s never any run-time checks and you get a compile time error (a “detected soon after you type” error) instead; which means that my project can/will refuse to compile “technically legal” source code (and force you to fix the problem somehow, possibly with a modulo, possibly with explicit run-time checks, possibly by changing the range of the variable, etc).
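The closest Rust approximation of the Ada-style range, as far as I can tell, is a wrapper type whose constructor does the check, so a bad value is reported where it’s assigned rather than where it’s used (the Month type here is invented for illustration):

```rust
// Hypothetical ranged type: a month from 1 to 12, enforced at construction.
#[derive(Copy, Clone, Debug)]
struct Month(u8);

impl Month {
    fn new(value: u8) -> Result<Month, String> {
        if (1..=12).contains(&value) {
            Ok(Month(value))
        } else {
            Err(format!("bad value in assignment: {value} is not a valid month"))
        }
    }

    fn index(self) -> usize {
        // Always in 0..12, because construction already checked the range
        // (although Rust may still emit a run-time bounds check when indexing).
        usize::from(self.0) - 1
    }
}

fn main() {
    let names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

    match Month::new(13) {
        Ok(m) => println!("{}", names[m.index()]),
        Err(e) => println!("{e}"), // error points at the assignment, not the array access
    }
}
```

The compiler still can’t see the 1-to-12 guarantee the way Ada’s type system does, which is exactly the limitation I’m describing.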
For all programming there’s a “development time vs. quality of end product” compromise; and over the years/decades this has been sliding towards “minimum development time, worst quality end product you can imagine”. This is primarily caused by economics (companies increasing profit by reducing development costs) where programmers are sometimes forced to do it by management even though they’d prefer to make software that doesn’t suck. Safe languages underpin the whole “quickly slap together poor quality crapware” mentality – all the “safety” just means they can cut the budget for testing and pump out equally buggy software with worse performance than ever before.
My motivation is to stop this “equally buggy software with worse performance” trend by making it harder to write inefficient software and easier to write efficient software (while still providing some safety in a “very unsafe” language, and still reducing development time by improving the tools and not the language).
Brendan,
I think that’s a decent way to look at things. Although it may not have been explicitly said, anything with compile time checks can in principle be moved into an IDE to alert the programmer before compile time. It comes down to smart tooling like Visual Studio and IntelliJ. The gap between runtime and compile time checks is a more critical barrier. This is what I’d really like to discuss in more depth, but unfortunately we’re stuck debating whether compile time safety even has merit in system code; without this groundwork set we can’t really build higher safety primitives on top of it. If you could accept the principles of compile time safe languages I think it opens up a lot of very interesting avenues for tackling even higher level software flaws.
It’s unstable in nightly for the time being, but rust is working on a more capable superset of that feature called “generic_const_exprs”.
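If I understand the feature right, stable const generics already let you carry the bounds in the type and check them at construction; it’s the fancier arithmetic over those bounds (e.g. computing the result range of an addition in the type system) that would need generic_const_exprs. A rough sketch, with invented names:

```rust
// Hypothetical generic ranged integer; MIN and MAX live in the type itself.
#[derive(Copy, Clone, Debug)]
struct Ranged<const MIN: i64, const MAX: i64>(i64);

impl<const MIN: i64, const MAX: i64> Ranged<MIN, MAX> {
    fn new(value: i64) -> Option<Self> {
        if value >= MIN && value <= MAX {
            Some(Ranged(value))
        } else {
            None
        }
    }

    fn get(self) -> i64 {
        self.0
    }
}

type Percent = Ranged<0, 100>;
type Age = Ranged<18, 120>;

fn main() {
    assert!(Percent::new(150).is_none()); // rejected at the point of assignment
    println!("{}", Age::new(42).unwrap().get());
}
```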
Rust checks that indexes are valid at the time of dereferencing. And while this isn’t a property of the language, I think rust’s error system deserves a mention for being absolutely fantastic compared to the MSVC and GNU C compilers.
This is debatable, the Ada study I linked to explicitly concludes that Ada was cheaper than C for them. We know that unsafe languages can cost our industry billions not only in constant debugging cycles but also liability. The industry has largely embraced safe languages, albeit garbage collecting ones in the client/server application domain. I’ve conceded that these garbage collecting languages are not well suited for system programming, however languages with compile time safety primitives can help bring those benefits while maintaining good runtime properties. The merit of safe languages in system programming is there, the real barrier is that change is costly after so much money has already been sunk into unsafe ones like C, which everything was based upon. Same thing goes for the mainframe shops. Nearly all of them would benefit from modern databases and replacing green screens. Given the choice, very few if any of them would rebuild from scratch on the mainframe today, but since it’s already built and starting over is very expensive, it can be easier to stick with what they already have. C is a beneficiary of this same logic. It would be a much less compelling choice if all our code weren’t already based on it.
It’s not a fair fight for new languages even when they may have merit over C, but that’s part of the challenge.
In theory, yes. In practice you need to design the language to allow “detected soon after you type” to work in a reasonable amount of time (e.g. you don’t want to have to compile and link a whole program before you can detect an error). That’s where range checking becomes necessary for “detected soon after you type”. For example, if you have “int foo( integer from 1 to 5 X) { return 10/X; }” you only have to check the function itself in isolation; and if you have “int foo( int X) { return 10/X; }” you’d have to search the entire program and examine every caller.
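A rough Rust analogue of that idea, using a standard non-zero type rather than a full range (so it’s only an approximation): the first function below can be checked in isolation because the parameter type rules out the bad input, while the plain-integer version can only be judged by examining every caller.

```rust
use std::num::NonZeroU32;

// Checkable in isolation: the type guarantees x != 0, so the division is fine.
fn foo_constrained(x: NonZeroU32) -> u32 {
    10 / x.get()
}

// Not checkable in isolation: whether this divides by zero depends on every caller.
fn foo_unconstrained(x: u32) -> u32 {
    10 / x
}

fn main() {
    let x = NonZeroU32::new(5).expect("5 is non-zero");
    println!("{}", foo_constrained(x));
    println!("{}", foo_unconstrained(5));
}
```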
I think we agree that “100% safe” is an impossible pipe dream. I think we both understand that there are some things (e.g. accessing a CPU’s control registers) that can never be safe; that there are some things (e.g. a tool like “grep” that converts a regular expression into native machine code then executes the run-time generated native machine code) where the performance cost of “safe” is too high; and that every modern attempt at “safe” has depended on huge libraries full of “unsafe”.
The question then becomes whether the cost is worth the benefits. At “0.1% unsafe, 99.9% safe” there’s huge benefits; and at “99.9% unsafe, 0.1% safe” a safe language becomes pointless annoyance.
Where we really disagree is:
a) where we think the cross-over point is – for “safe is good if more than X% of the code is safe” we disagree on X.
b) how much code should be safe. E.g. I think a micro-kernel should be > 90% unsafe for multiple reasons (mostly performance) and that it’s not enough for my “safe is good if more than X% of code is safe”; and you think a micro-kernel should be < 50% unsafe and that it is enough for your “safe is good if more than X% of code is safe”.
c) what “safe” means. E.g. I’d say that “100% safe” protects against things like hardware weaknesses (spectre vulnerabilities, timing side-channels, etc) and compiler bugs, and that Rust is only about 10% safe. You seem to think that “safe” means “providing some protection against bugs that are trivially avoided and/or detected via other means”.
d) what the benefits of “safe” are. For me it’s merely about reducing the cost of other methods (e.g. a thorough security audit, a longer “testing” phase before release, etc) without increasing the safety of the final product. You seem to think that “safe” means that the final product is safer.
e) the importance of safe languages. I’m a “containerization” fan – not just for micro-kernels, but extending into user-space (e.g. “huge monolithic word processor as one process” vs. “widget service + internationalization service + UI front end + back-end + spell checker service, all as separate isolated processes”). This external/environmental safety (“if one isolated piece is flawed other isolated pieces are still safe”) reduces the importance of safe languages.
Sure. Something that’s bad (for safety) is better than something that’s even worse (for safety); and in a similar way, something that’s bad (for performance) is better than something that’s even worse (for performance).
There’s some benefit to safe languages in system programming (especially if your code is under-optimized).
The real problem is that C is bad at performance – developers shipping “generic 64-bit 80×86” binaries that aren’t optimized for any specific CPU, combined with horribly poor/no support for run-time dispatch in most compilers; combined with poor optimization for locality (e.g. performance-guided reshuffling of stuff in “.text” and “.data” can provide a 10% performance improvement, the compiler can’t optimize the order of fields in a structure, …); combined with shared libraries/ABIs causing optimization barriers even when you’re using link-time optimization; combined with poor/no support for things like run-time code modification (including simpler things like “fusible branches” and “write once” variables); combined with not even being able to return more than one piece of data from a function without the extra “pointers to outputs as input parameters” overhead; combined with not having arrays/vectors as first-class data types (and not being able to do things like “destArray = srcArray1 + srcArray2”), forcing people to either use ugly clunky junk (e.g. SIMD intrinsics) or not bother at all; combined with not being able to have multiple versions of the same function (e.g. one in high level code plus 3 more different versions in different assembly languages with different feature sets, where the compiler auto-selects the right version to use); combined with…
In other words; the performance cost of safe languages seems “not too bad compared to C” because the performance of C is bad.
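For contrast, this is roughly what run-time dispatch between multiple versions of a function looks like in a language that supports it reasonably well; the function names and the AVX2 choice are just for illustration, not a benchmark:

```rust
// One public function, two implementations, and a CPU-feature check
// deciding at run time which one actually executes.
fn sum(values: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call because we just verified AVX2 is available.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_scalar(values)
}

fn sum_scalar(values: &[f32]) -> f32 {
    values.iter().sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[f32]) -> f32 {
    // The body is ordinary code; the attribute lets the compiler use AVX2
    // instructions (and auto-vectorize) when generating it.
    values.iter().sum()
}

fn main() {
    let v: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    println!("{}", sum(&v));
}
```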
Brendan,
It depends. I would point out that C is one of the worst languages for this thanks to the preprocessor. Even a small change in one header file can force us to reinterpret the rest of the project. You can’t even validate a header file on its own because its validity depends on what other files were included in .c files and the order in which those files are included, etc. This is an awful property.
So while I don’t deny your point has some validity, your application of it to criticize rust or other safe languages instead of C is extremely biased. Also, the fact that it hasn’t stopped us from developing IntelliSense for C and other hard languages with poor local correctness properties means that the challenges can be overcome.
I’ve never said safe languages were 100% safe from errors they don’t check for, but IMHO that’s not a good reason to avoid them in favor of unsafe languages.
I don’t entirely agree with you here. Dynamically generated code can absolutely be safe and languages with JIT compilers demonstrate this all the time. I don’t think it’s necessary to think “safe languages need to be 100% of the code or else they cannot be used”. Even if we assume there are pieces that cannot be converted to safe, I don’t see it as an obstacle for safe languages any more than using assembly is an obstacle for C.
I’ve repeatedly acknowledged that there’s a cost in transitioning from one language to another, which can be extremely high. But when starting from scratch that goes away and I think the savings of safe languages are almost immediate in terms of protecting us from human error and those savings can really add up over time.
That’s an interesting assertion, but ultimately it doesn’t have a bearing on C versus safe language argument.
I know that high level languages used to produce very bad code, but compilers are evolving all the time and have improved to the point where the vast majority of the industry has decided that low level optimization is not worth it. I think it’s fair to say the main use case for low level assembly is to access CPU features that the language doesn’t expose and/or hasn’t optimized for yet. Obviously not all optimizers are equal either; last I checked Intel had one of the most advanced auto-vectorization optimizers and I think you’d be hard pressed to beat it by hand. If you have data showing otherwise I’m certainly interested in that 🙂