On C extensions, portability, and alternative compilers

Thom Holwerda 2026-05-25 General Development 14 Comments

Anyone who’s written C knows that full ISO C standard-adhering code is an impractical rarity. Most real world C code out there relies on non-standard behaviors and language extensions to varying extents, and a lot of this isn’t for extra features, but just to work around bugs and gaps in different compilers and libraries. A lot of codebases will try somewhat to support various environments, mostly through the use of preprocessor checks and guards, but these attempts are finicky at best and straight up broken at worst.
I have run into many of these situations while working on my C compiler, so here’s a small list of some of them.
↫ lemon/Sofia

Sometimes I wonder how computers even get anything done at all.

14 Comments

2026-05-25 12:57 pm
sukru
Looking at the blog post, it is not the compilers that are at fault, but the way library writers detect required features. And it is an issue as old as time.
They use checks for specific compilers and versions as a proxy for testing whether features exist.
Just from the tip: __attribute__((packed)) is required for a Linux syscall. But the only way to get it in glibc is to be on this very specific list: defined __GNUC__ || defined __clang__ || defined __TINYC__. Which means other compilers have 3 options:
1 – “Fake” one of these tags, but then have to match the idiosyncrasies of that particular compiler
2 – “Monkey patch” glibc headers while parsing them, adding brittle changes for these specific regions
3 – Accept they won’t be able to support printf("Hello world\n") on glibc
The actual solution would be for library owners to use autoconf (configure) or similar mechanisms to test for features, which would not only overcome these limitations, but also make the code simpler. (Have you seen the auto-detection headers in those libs? The article mentions them too.)
This is expected for any compiler + low level framework code. They are the ones after all that hide the cross-platform complexity. But it is good to see a compiler writer (albeit a smaller one) calling this out.

2026-05-25 1:02 pm
sukru
Btw, according to Google, the following now exists in the standard:
__has_attribute(),
__has_builtin(),
__has_feature()
So, they did not even need those. This is not entirely on glibc.
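A minimal sketch of what testing for the feature, rather than for the compiler's name, could look like. It assumes the compiler offers __has_attribute, which as far as I know is a GCC/Clang extension rather than part of ISO C (C23 standardizes __has_c_attribute for the [[...]] syntax instead). The DEMO_ names and the struct are hypothetical and not taken from glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* __has_attribute is an extension: define a fallback so that compilers
   without it simply report "attribute not available". */
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

/* Ask for the feature itself instead of checking __GNUC__, __clang__, ... */
#if __has_attribute(packed)
#  define DEMO_PACKED __attribute__((packed))
#  define DEMO_HAVE_PACKED 1
#else
#  define DEMO_PACKED
#  define DEMO_HAVE_PACKED 0
#endif

/* Hypothetical record, not a real kernel structure. */
struct demo_record {
    uint8_t  tag;
    uint32_t value;
} DEMO_PACKED;

size_t demo_record_size(void)
{
    return sizeof(struct demo_record);
}

/* With packing there is no padding: 1 + 4 bytes. Without it the size is
   implementation-defined, so only the packed case is checked. */
void check_demo_record(void)
{
    if (DEMO_HAVE_PACKED)
        assert(demo_record_size() == 5);
}
```

A compiler that supports the attribute but is not on any hard-coded list would take the first branch here without having to impersonate another compiler.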

2026-05-25 1:40 pm
Kochise
It just shows the fragmentation of the C “standard”: it lacks many obvious features, yet keeps integrating other languages’ tweaks and functionality to “stay relevant”. It took C23 to officially define what true and false should be; until then it was a “define” left to the coder’s interpretation.
I guess the best way is for the compiler to provide its own headers and libraries to ensure perfect integration.
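For context, a small sketch of how code can obtain bool, true and false across standard versions. It assumes a compiler that sets __STDC_VERSION__; before C23 the names come from the <stdbool.h> macros added in C99, while C23 makes them keywords. is_even is a hypothetical helper.

```c
#include <assert.h>

/* Before C23, bool/true/false are macros from <stdbool.h> (since C99).
   From C23 on they are keywords and the header is no longer needed. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 202311L
#  include <stdbool.h>
#endif

bool is_even(int n)
{
    return n % 2 == 0;
}

void check_bool(void)
{
    assert(is_even(4));
    assert(!is_even(7));
    assert(true == 1 && false == 0);
}
```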

2026-05-25 2:38 pm
Alfman
Kochise,
It just shows the fragmentation of the C “standard”: it lacks many obvious features, yet keeps integrating other languages’ tweaks and functionality to “stay relevant”.
Yes, this. C was underspecified (understandable given that it was a product of the wild west of the early days). Over time C’s inadequacies led to language fragmentation and evolutionary baggage; we’re suffering the consequences of this today. Even the include files and macro preprocessor are downright awful compared to features in other modern languages. It’s practically unfixable without going back to foundations.
New languages have taken the opportunity to improve on C and C++. C# is such a better language, as is D, etc. However, their main negative has been a reliance on managed memory & garbage collection, which is seen by system devs as a crutch for low level and real-time work. These are fair criticisms. Now though we can have safe languages that offer compile time safety and don’t require run time memory management: Rust and SPARK (an offshoot of Ada). This is a major evolutionary leap, however most of us are still stuck on antiquated C.
I see two paths forward:
1 – The world eventually switches to languages with cleaner implementations and robust safety leaving C/C++ behind. It’s a really hard habit to break though and will likely take generations to accomplish.
2 – C doesn’t get replaced, but instead just keeps getting new layers of evolutionary baggage. This is less optimal than having a clean slate, but C may get away with it because it already wins the popularity race by default.

2026-05-25 3:31 pm
sukru
Alfman,
Over time C’s inadequacies led to language fragmentation and evolutionary baggage; we’re suffering the consequences of this today.
This is true
New languages have taken the opportunity to improve on C and C++. C# is such a better language… reliance on managed memory & garbage collection
This is an unfortunate marketing mistake. C# is perfectly capable of writing low level code, even perfectly good microkernels:
https://www.microsoft.com/en-us/research/project/singularity/
https://github.com/dz333n/Singularity-OS
However due to timing they sold this as a “better Java” (which was also true), even though C# has much better integration with low level code, and can actually be compiled to native since day one.
It specifically has solutions for all these even in C# 1.0 (struct, explicit layouts, calling conventions, and so on)
C doesn’t get replaced, but instead just keeps getting new layers of evolutionary baggage.
I would say we are about to experience the opposite.
C has accumulated enough “baggage” that it is now in “cleanup” mode, compacting the core language, and making it more fit for modern programming.
It took them a while, but they are learning lessons from its progeny like C++, C#, or even Rust (which actually comes from a Modula/OCaml/Scala heritage).
The latest C23 is an example of that.
Kochise,
I guess the best way is for the compiler to provide its own headers and libraries to ensure perfect integration.
That is true. At least for a specific subset (like fundamental types) they should maintain a “core” libc.
It might still be based on glibc, and be maintained as a branch, or set of patches. But without specific optimizations they will continue to linger.
On C23, though…. I would disagree.
They removed things like K&R syntax support, no auto ints, no function calls without prototype…
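To illustrate the removal of K&R-style definitions mentioned above, here is a hedged sketch. The old form appears only inside a comment so the block still compiles as C23; add is a hypothetical function.

```c
#include <assert.h>

/* K&R-style definition, accepted by older standards but not by C23.
 * Shown as a comment only:
 *
 *     int add(a, b)
 *     int a, b;
 *     {
 *         return a + b;
 *     }
 */

/* Prototype-style declaration and definition, valid from C89 through C23.
   The parameter types are part of the declaration, so the compiler can
   check every call against them. */
int add(int a, int b);

int add(int a, int b)
{
    return a + b;
}

void check_add(void)
{
    assert(add(2, 3) == 5);
}
```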

2026-05-25 3:57 pm
Alfman
sukru,
This is an unfortunate marketing mistake. C# is perfectly capable of writing low level code, even perfectly good microkernels:
I’m a fan of Singularity and was a proponent of its building blocks even before I had learned about Singularity. However, it’s not a mistake: garbage collection, despite making robust software far easier to deliver, is associated with notable cons that make it less than ideal for low level and real time applications. It also tends to require a lot more memory. The benefits are true, but so are the cons.
even though C# has much better integration with low level code, and can actually be compiled to native since day one.
It’s not enough to “integrate with low level code”. We need robust foundational languages that can actually build low level code, not just integrate with it. It’s a subtle but meaningful distinction. Visual Basic could integrate with low level code, but nobody was seriously proposing low level code be developed in Visual Basic.
Don’t get me wrong, I love C# (other than its connection to MS). It’s so much better than C, where all too often it comes down to fighting the same old bugs and problems that were solved long ago in more modern languages.
C has accumulated enough “baggage” that it is now in “cleanup” mode, compacting the core language, and making it more fit for modern programming.
It took them a while, but they are learning lessons from its progeny like C++, C#, or even Rust (which actually comes from a Modula/OCaml/Scala heritage).
The latest C23 is an example of that.
We’ll see how the future turns out, but I’m not convinced that C’s baggage will ever entirely go away because it runs so deep.
They removed things like K&R syntax support, no auto ints, no function calls without prototype…
Thank god for that…but these were never used in modern contexts so getting rid of them was easy. The far bigger problem is fixing the foundational parts of the language that are in widespread use like macros and include files. Hypothetically we can create new modes to fix them, but it would further fragment the language in order to remain compatible with preexisting code. The resulting language would be worse and more complex than a new language that didn’t have to be compromised by antiquated precedent.

2026-05-25 4:43 pm
sukru
Alfman,
Garbage Collection, despite making robust software far easier to deliver, is associated with notable cons that make it less than ideal for low level and real time applications
That is true.
And that is why I mentioned C#. It can actually work without garbage collection, entirely with “malloc” analogues.
It also has first-class support for full stack allocation, which few languages do. C/C++ has it (alloca).
Java did not have it.
Rust does not have it to this day (not allocation of local variables, but variable-length structures and dynamic allocation on the stack).
So, C# was years ahead for systems programming, literally from day one.
You can write a fully functional kernel (again Singularity and others) with basic C# and a small amount of extra.
Thank god for that…but these were never used in modern contexts so getting rid of them was easy. The far bigger problem is fixing the foundational parts of the language that are in widespread use like macros and include files
Yes, it will be a challenge.
Macros are fine though, but they are overused. And like every other thing “too much of a good thing” is counter-productive.
They are already fragmenting the language: a lot of C99 code will not compile as C23. But they will be strategic about it.
Who knows? Maybe they will fix variadic arguments, but they should focus on cutting more first.
The infamous bugs caused by octals?
012 => 10!
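The octal pitfall in runnable form, as a small sketch with hypothetical function names. A leading zero changes the base of an integer constant, which is easy to trigger by zero-padding numbers for alignment.

```c
#include <assert.h>

/* A leading 0 makes an integer constant octal, so 012 is eight plus two. */
int padded_constant(void)
{
    return 012;
}

int plain_constant(void)
{
    return 12;
}

void check_octal(void)
{
    assert(padded_constant() == 10);
    assert(plain_constant() == 12);
    assert(0755 == 493);  /* 7*64 + 5*8 + 5, the familiar file-mode constant */
}
```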
Entirely drop strcpy and other broken base functions? (No more copying without buffer sizes)?
I don’t know, but there are many low hanging fruits most C developers can agree on
(And upgrading is a challenge, and LLMs can truly be helpful here)
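As a sketch of what copying with an explicit buffer size looks like using only standard C, the hypothetical copy_bounded below wraps snprintf. It is one possible shape for such a helper, not a proposal from the standard committee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst, which has room for dst_size bytes including the
   terminating NUL. The result is always NUL-terminated when dst_size > 0.
   Returns 0 on success, -1 if the string had to be truncated or if
   dst_size is 0. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;

    /* snprintf never writes more than dst_size bytes and reports the
       length the full string would have needed. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}

void check_copy_bounded(void)
{
    char buf[8];

    assert(copy_bounded(buf, sizeof buf, "hello") == 0);
    assert(strcmp(buf, "hello") == 0);

    /* Too long: the call reports failure and leaves a truncated string. */
    assert(copy_bounded(buf, sizeof buf, "far too long") == -1);
    assert(strlen(buf) == sizeof buf - 1);
}
```

Unlike strcpy, the caller cannot forget the destination size, and truncation is reported instead of silently overrunning the buffer.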

2026-05-25 6:31 pm
Alfman
sukru,
And that is why I mentioned C#. It can actually work without garbage collection, can work entirely with “malloc” analogues,
Anyone can implement alloc/free in C# but as far as I know, C# is only safe with Garbage Collection. Otherwise the consequences of disabling it will nullify the safety benefits of C#.
https://github.com/gosub-com/DlMalloc
https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.marshal.allochglobal
C# and Rustlang achieve safety very differently with different tradeoffs. Both give us a way to execute unsafe code and it’s doable; a legacy C/C++ programmer might even be tempted to do this and treat C# as though it were C/C++, but that’s bad practice and isn’t encouraged.
And has first class support for full stack allocation (few languages do, C/C++ has it (alloca),
Even in unmanaged languages, stack objects have a natural beginning and end and don’t require GC as heap objects do. alloca lets you dynamically append to the stack frame, but its lifespan is still bounded by the stack and it still doesn’t require GC. This is useful, but not a replacement for GC. Most software will still need heap allocations.
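A short sketch of the lifetime rule described above. It uses a C99 variable-length array instead of alloca, since alloca is itself a non-standard extension; both place run-time sized storage in the current stack frame. Note that VLAs are optional from C11 onward, and sum_of_squares is a hypothetical example function.

```c
#include <assert.h>
#include <stddef.h>

/* Sum 0*0 + 1*1 + ... + (n-1)*(n-1) using a scratch buffer whose size is
   only known at run time. The buffer lives in this function's stack frame. */
long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;  /* a zero-length VLA is undefined behavior */

    long scratch[n];
    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratch[i];

    /* No free() and no collector: scratch disappears when the frame is
       popped, which also means it must not be returned to the caller. */
    return total;
}

void check_sum_of_squares(void)
{
    assert(sum_of_squares(0) == 0);
    assert(sum_of_squares(4) == 14);  /* 0 + 1 + 4 + 9 */
}
```

Anything that has to outlive the call still needs the heap, which is the point being made about stack allocation not replacing GC.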
So, C# was years ahead, literally in day-1 for systems programming.
Well, you don’t get to have your cake and eat it too. Sure you can treat C# like an unmanaged language to defeat the cons of managed code, but then you also lose all the benefits of managed code. This would introduce all manner of legacy memory bugs into C# software that doesn’t normally have them. I concede we don’t live in an ideal world, but as a best practice unsafe code should only be used very sparingly and be easy to audit. This breaks down very quickly when we resort back to unsafe C/C++ coding practices.
Macros are fine though, but they are overused. And like every other thing “too much of a good thing” is counter-productive.
They’re not fine though. Modern languages have more capable, syntactically consistent, and type-safe macros.
Who knows? Maybe they will fix variadic arguments, but they should focus on cutting more first.
The infamous bugs caused by octals?
012 => 10!
Entirely drop strcpy and other broken base functions? (No more copying without buffer sizes)?
I don’t know, but there are many low hanging fruits most C developers can agree on
There’s a whole lot to fix, but often times the new replacements in C still end up being more complicated and cumbersome than other modern languages. I write a lot of C code to process strings because it’s such a basic task in programming, but C deserves to be close to the top of a list for programming languages with the worst string handling. Everything about them is manual, tedious, and error prone. As always, we can make inferior tools work, but to what end?
We keep subjecting future generations to antiquated tools just because we don’t want to change. Ultimately I believe older generations are going to keep clutching C, while newer generations have much less love for it. Change will happen when C’s most ardent defenders eventually retire.

2026-05-25 10:34 pm
sukru
Alfman,
I’ll have to “cheat” and use .NET 6:
NativeMemory
Allocate and wrap the memory in a specific unsafe block, and use it natively in other parts of C# code with full benefits of the sandbox
For example, someone implemented a NativeArray which can also be used for mmap access to the file system:
https://github.com/Cysharp/NativeMemoryArray/blob/master/tests/NativeMemoryArray.Tests/NativeArrayTest.cs
I think this is very similar to what “safe” languages like Rust or Swift do.
but C deserves to be close to the top of a list for programming languages with the worst string handling
I agree, but there are worse ones out there. Original PASCAL comes to mind (not modern Pascal), or BASH scripts (especially earlier ones)

2026-05-25 11:45 pm
Alfman
sukru,
Allocate and wrap the memory in a specific unsafe block, and use it natively in other parts of C# code with full benefits of the sandbox
For example, someone implemented a NativeArray which can also be used for mmap access to the file system:
Wrappers can shift the problem elsewhere but don’t truly eliminate it. You can wrap native memory inside of safe languages, but then if you fail to use the proper semantics for unmanaged memory you could be exposing your managed code to the memory faults of unmanaged code.
I wasn’t actually familiar with this class, but this summary confirms that you do give up C#’s normal memory protection to use it.
https://dev.to/vercidium/using-nativememory-in-net-1o2j
.NET 7 introduced a new static class called NativeMemory, which can be used to allocate unmanaged memory.
Normally when working with arrays in C#, the .NET runtime manages the allocation of memory and frees it when it’s no longer used, hence the name managed memory.
When using NativeMemory, you’re on your own. The .NET garbage collector won’t free it when you no longer need it, and the .NET runtime won’t perform bounds checks when reading or writing to it. The advantage is it’s faster to work with as the bounds checks performed by .NET aren’t present.
I think this is very similar to what “safe” languages like Rust or Swift do.
You might say that, except that Rust can access the unmanaged heap safely. Unlike C#, Rust has a static code analyzer that enforces proper memory semantics at compile time, so using the standard heap is safe. The compiler verifies that the code follows safety rules, eliminating the need for runtime checks in the vast majority of cases.
It would be neat to see a variant of C# that works this way too, but considering that virtually all existing C# code depends heavily on garbage collection, I’d expect all software would need to be rewritten.

2026-05-26 12:07 am
sukru
Alfman,
Wrappers can shift the problem elsewhere but don’t truly eliminate it. You can wrap native memory inside of safe languages, but then if you fail to use the proper semantics for unmanaged memory you could be exposing your managed code to the memory faults of unmanaged code.
I wasn’t actually familiar with this class, but this summary confirms that you do give up C#’s normal memory protection to use it.
https://dev.to/vercidium/using-nativememory-in-net-1o2j
That is why you wrap this in helpers. Again this is true for other languages like Rust, C++, or Swift as well.
You take the NativeMemory, which has no safety,
Wrap it inside a safe structure (like that NativeArray github example), which is heavily scrutinized
And rest of the code gets the benefits of both worlds. Static and runtime checks + low level access (gated)
In that example, it is paired with the equivalent of Span<T>, which always knows the bounds of that memory region and ensures they are adhered to. It will also keep track of users and automatically de-allocate it (or you can use a move-only type).
Thinking back, this is more or less how std::unique_ptr / std::shared_ptr works in C++ (but those still have escape hatches in the safe wrapper, like get(). The C# one does not even offer that, after closing the hatch, there is no opening it back)
I’m not advocating using raw memory pointers, but using safe wrappers around them.
Is there still GC?
Well… this is closer to RAII, especially with the using pattern
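The wrapper idea can be sketched in C as well, with the caveat that C cannot stop callers from reaching past the wrapper, so this shows only the run-time half of the argument. int_span and its functions are hypothetical names, loosely modeled on the bounds-carrying span described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Raw memory plus the length that every access is checked against. */
typedef struct {
    int   *data;
    size_t len;
} int_span;

/* The only place that touches the allocator. Returns false on failure. */
bool span_alloc(int_span *s, size_t len)
{
    s->data = len ? calloc(len, sizeof *s->data) : NULL;
    s->len  = s->data ? len : 0;
    return s->data != NULL;
}

/* Bounds-checked accessors: out-of-range indexes are rejected, not UB. */
bool span_set(int_span *s, size_t i, int value)
{
    if (i >= s->len)
        return false;
    s->data[i] = value;
    return true;
}

bool span_get(const int_span *s, size_t i, int *out)
{
    if (i >= s->len)
        return false;
    *out = s->data[i];
    return true;
}

/* Releasing resets the span, so later accesses fail the bounds check. */
void span_free(int_span *s)
{
    free(s->data);
    s->data = NULL;
    s->len  = 0;
}

void check_span(void)
{
    int_span s;
    int v = 0;

    assert(span_alloc(&s, 4));
    assert(span_set(&s, 3, 42));
    assert(span_get(&s, 3, &v) && v == 42);
    assert(!span_set(&s, 4, 1));   /* one past the end is refused */
    span_free(&s);
    assert(!span_get(&s, 0, &v));  /* use after free is refused too */
}
```

Nothing prevents a caller from writing to s.data directly, which is where languages with enforced encapsulation or compile-time checks differ from this sketch.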

2026-05-26 12:31 am
Alfman
sukru,
That is why you wrap this in helpers. Again this is true for other languages like Rust, C++, or Swift as well.
…
I’m not advocating using raw memory pointers, but using safe wrappers around them.
You can use safe languages to do that, but that’s the wrong idea of how safe languages should be used. Granted, we’re forced to use a wrapper to be compatible with pre-existing unsafe code, but you are speaking as though you want to make such wrappers the go-to tool for building software in “safe” languages, which is antithetical to safety principles. Wrapping unsafe code is a means to cross the bridge, but wrappers are not what a safe code base should end up being.
This is what I was trying to convey before when I said “It’s not enough to ‘integrate with low level code’. We need robust foundational languages that can actually build low level code, not just integrate with it”. Software engineers should not be thinking of safe language code as a wrapper for unsafe code, that’s NOT its driving purpose. The goal is building safe software from the ground up. There will be exceptions of course, but these should be few and far between – like assembly is to C.

2026-05-26 3:01 am
sukru
Alfman,
This is what I was trying to convey before when I said “It’s not enough to ‘integrate with low level code’. We need robust foundational languages that can actually build low level code, not just integrate with it”.
Do you mean no unsafe or assembly code all the way to the kernel and BIOS?
If so, it should be easy to see why that is impossible.
It is a chain of two-legged chairs. We can extend this chain as much as we want, but someone with 4 legs has to stand at the back.

2026-05-26 4:20 am
Alfman
sukru,
Do you mean no unsafe or assembly code all the way to the kernel and BIOS?
I didn’t say none, I said few and far between. And yes, that would apply even in the BIOS. You may require some unsafe primitives, but the bulk of it should be using safe primitives.
If so, it should be easy to see why that is impossible.
Even when you are writing software with low level access to hardware, the amount of code that needs direct access to raw memory is relatively minute, and with good abstractions it’s often reusable, so I don’t agree with the implication that there needs to be tons of unsafe code. Even in the BIOS most of the code is going to be comprised of plain old functions and structures that can use safe primitives. Granted, BIOS makers may not have much interest in rewriting their BIOS around Rust, but at least in principle there’s no reason a compile time safe language shouldn’t be used there.