Following our recent work with Ubuntu 24.04 LTS where we enabled frame pointers by default to improve debugging and profiling, we’re continuing our performance engineering efforts by evaluating the impact of O3 optimization in Ubuntu.
O3 is a GCC optimization level that applies more aggressive code transformations compared to the default O2 level. These include advanced function inlining and the use of sophisticated algorithms aimed at enhancing execution speed. While O3 can increase binary size and compilation time, it has the potential to improve runtime performance.
↫ Ubuntu Discourse
If these optimisations deliver performance improvements, and the only downsides are larger binaries and longer compilation times, it seems like a bit of a no-brainer to enable them, assuming those downsides are within reason. Are there any downsides they’re not mentioning? Browsing around and doing some minor research, it seems that -O3 optimisations may break some packages, and can even lead to performance degradation, defeating the purpose altogether.
Looking at a set of benchmarks from Phoronix from a few years ago, in which the Linux kernel was compiled with either O2 or O3 and their performance compared, the results were effectively tied, making it seem not worth it at all. However, during those benchmarks only the kernel was tested; everything else was compiled normally in both cases. Perhaps compiling the entire system with O3 will yield improvements in other parts of the system that do add up.
For now, you can download unsupported Ubuntu ISOs compiled with O3 optimisations enabled to test them out.
Thom Holwerda,
Testing is very important because -O3 can break some binaries, though you already mentioned this. It’s also common for -O3 to interfere with debugging: often the debugger cannot inspect variables correctly, and other times it reports execution flow that doesn’t follow the original code path, because the underlying assembly no longer has a 1:1 relationship to the high-level code. Ultimately this creates sticky debugging situations. In one project I struggled with a bug that only happened in production builds with -O3 and didn’t exist otherwise. When bugs only present themselves under compiler flags that are unfriendly to debuggers, they can be extremely difficult to iron out.
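To illustrate (a hypothetical example, but representative of what I keep running into):

    /* optout.c -- a variable that exists in the source but not the binary.
       Build: gcc -O3 -g optout.c -o optout */
    #include <stdio.h>

    int sum(const int *a, int n)
    {
        int total = 0;                /* at -O3 this likely lives only in a
                                         register, or the loop is vectorized
                                         away entirely */
        for (int i = 0; i < n; i++)
            total += a[i];
        return total;
    }

    int main(void)
    {
        int a[] = {1, 2, 3, 4};
        printf("%d\n", sum(a, 4));
        return 0;
    }

Set a breakpoint in sum() and try “print total”; more often than not the debugger just answers <optimized out>.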
The purpose of -O3 is to improve runtime speed, and sometimes it helps, but not always. A side effect of larger binaries is potentially more cache/TLB misses and other associated performance penalties, which could be more noticeable under heavy multitasking due to context switching. Another compiler option I like to turn on is -ffast-math, which allows the compiler to perform a number of optimizations while ignoring edge conditions that many real programs don’t care about.
https://kristerw.github.io/2021/10/19/fast-math/
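The classic footgun there is NaN handling: -ffast-math implies -ffinite-math-only, so the compiler is allowed to assume NaN never occurs. A minimal sketch of how that bites:

    /* fastmath.c -- compare:  gcc -O2 fastmath.c
                          vs:  gcc -O2 -ffast-math fastmath.c */
    #include <stdio.h>

    int main(void)
    {
        volatile double zero = 0.0;   /* volatile so it can't be constant-folded */
        double x = zero / zero;       /* NaN */
        if (x != x)                   /* true for NaN under IEEE 754... */
            printf("x is NaN\n");
        else
            printf("x looks fine\n"); /* ...but -ffast-math may fold x != x
                                         to false and take this path */
        return 0;
    }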
All these settings have tradeoffs; otherwise we wouldn’t need a setting for them. -fomit-frame-pointer is another that can affect debugging.
https://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Optimize-Options.html
For these reasons it might be nice to disable optimizations just when debugging, but then you run the risk of debugging a binary that doesn’t behave the same way as it does with optimizations turned on. Such a dilemma, haha.
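One middle ground, GCC-specific (and the GCC docs themselves caution that the attribute is meant for debugging, not production code): keep -O3 globally but drop the function you’re actively investigating back to -O0. There’s also -Og, which is intended as the debugger-friendly optimization level.

    /* the function name is hypothetical; the attribute is real GCC syntax */
    __attribute__((optimize("O0")))
    static int under_investigation(int x)
    {
        int y = x * 2;   /* locals here stay inspectable in the debugger */
        return y + 1;
    }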
Recent and relevant: https://blog.cr.yp.to/20240803-clang.html
The TL;DR is that performance improvements need to be weighed against loss of correctness, which is very subtle and hard to detect. There’s also a link to an earlier talk suggesting that the performance win is minimal, because the vast majority of code is not part of an inner, hot loop.
AFAICT larger binaries take longer to load, increase memory pressure, etc. So the win from optimizing hot code is offset by making cold code slower, and there’s a lot more cold code than hot code.
malxau,
From your link…
It makes me chuckle that this is exactly what’s happening in the comments.
Given that so much of C’s behavior is left undefined for corner cases, compilers are technically free to interpret things however they please – breaking things is allowed, and it’s our fault for making assumptions. This isn’t a great answer though, and my own opinion is that programming languages should do as much as possible to stomp out undefined behaviors in favor of more obvious interpretations. Optimizations shouldn’t conceal anything surprising. I’d be OK with compilers offering those optimizations, but it should probably be done more explicitly, kind of like Rust’s unsafe declaration: we’re allowed to do it, but it’s much clearer that we need to be extra careful around such code. Of course, C/C++ notoriously did not evolve under this philosophy.
The part that resonated with me most strongly was in the linked 2015 talk, saying that in the modern world the role of the programmer is to write an architecture-neutral algorithm, and the role of the compiler is to turn it into an optimized, architecture-specific one. This is a subtle but significant shift: C started as a high-level language used to write a UNIX kernel, one that simultaneously strove to be portable but also supported generating the architecture-specific semantics that kernels need.
After noticing the mindset shift, a lot of the outcomes start to make sense. (Integer overflow is architecture-specific, therefore UB, therefore we don’t need to preserve bounds checks, because those aren’t architecture-neutral.) This is jarring for my generation, who grew up seeing #if _PPC_ followed by architecture-specific assumptions; that is no longer expressible at all. We still need those bounds checks, it’s just that they can’t be expressed in C. Presumably we need to go back to linking C against architecture-specific asm stubs, sigh.
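The integer overflow case makes a nice concrete example (a minimal sketch):

    #include <limits.h>

    /* Post-condition overflow test: relies on signed wraparound, which is
       UB, so GCC/clang at -O2/-O3 may fold the comparison to 0 and delete
       the "bounds check" entirely. */
    int will_overflow_broken(int x)
    {
        return x + 1 < x;     /* may be compiled as: return 0; */
    }

    /* The architecture-neutral form the standard does sanction: test
       before the operation, not after. */
    int will_overflow_ok(int x)
    {
        return x == INT_MAX;
    }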
malxau,
That makes sense.
Sometimes I think about ways to build a language that isn’t meant to be directly compiled, but instead acts as a specification for the compiler to write the software. The compiler would choose things like memory layout, data types, calling conventions, data structures, and algorithms based on functional requirements in the specification. The compiler would have a database of known algorithms at its disposal, as well as some heuristics to decide when to use one versus another.
The idea was that programmers would only concern themselves with writing a correct specification, while the compiler would try to automatically find the best solution to achieve it on a given target. On paper the programming interface began to take the shape of a graph database: instead of writing code you produce output or modify state with queries and constraints, and the compiler figures out the details, kind of like how SQL works. I never implemented it. It seems like a very different way of writing software and I’m not sure how well it would work, but it’d be an interesting project.
When I was learning C, I do remember the language’s shortcomings striking me as odd. Pascal, which I learned first, had overflow checking. The x86 assembly I also learned has overflow flags, but C has no standardized way of exposing this information. Even the sizes of the int types are only loosely defined in C. At least there’s stdint.h.
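For what it’s worth, GCC and clang do expose the overflow flag nowadays through builtins, and C23 finally standardized this as <stdckdint.h> (ckd_add and friends). A quick sketch:

    #include <stdio.h>

    int main(void)
    {
        int a = 2000000000, b = 2000000000, sum;

        /* returns true on overflow; the wrapped result lands in sum */
        if (__builtin_add_overflow(a, b, &sum))
            printf("overflow detected\n");
        else
            printf("sum = %d\n", sum);
        return 0;
    }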
From my user-experience perspective, Canonical would be orders of magnitude more effective if they got rid of their snaps.
Like Debian?
As a professional C++ trainer and long-time reader of OSnews, I felt compelled to comment here.
If -O3 “breaks your code”, it is virtually guaranteed that your C or C++ program has undefined behavior in it (undefined behavior is defined in the standard and includes things like uninitialized reads, reads after free, etc.).
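A representative (hypothetical) sketch of the pattern; modern GCC even warns about this one and may deliberately return NULL:

    #include <stdio.h>

    const char *make_name(void)
    {
        char buf[32];
        snprintf(buf, sizeof buf, "temp-%d", 42);
        return buf;   /* UB: buf's lifetime ends at the return; -O0 often
                         leaves the bytes intact on the stack, while -O3
                         freely reuses or elides the storage */
    }

Code like this frequently “works” for years at -O0 or -O2 and then “breaks” the day someone turns on -O3, and the compiler was within its rights the whole time.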
Our compiler optimizations are extremely well tested, and many of them are actually proven correct with theorem provers. In over 20 years of C++ development myself, I’ve seen, I think, zero cases where the optimizer broke correct code (it certainly does happen, I won’t claim otherwise).
I have seen niche C compilers break correct code with optimizations, but that is not the same thing as a mainstream compiler like GCC or clang.
Counterintuitively, -O3 can actually result in smaller binaries as well, depending on the nature of the code. In heavily templated, header-only libraries I often see this.
But to the point of the article – yes, -O3 can actually result in slower code too. Generally the optimizer’s inlining calculations are good at estimating the actual cost, but each package should be evaluated separately if ultimate performance is the goal.
lefticus,
I can partially agree. It’s the same as saying correct C/C++ code should work when compiled for another architecture, yet sometimes subtle problems creep in anyway, like memory fence behavior and so on.
https://stackoverflow.com/questions/286629/what-is-a-memory-fence
https://www.codewithc.com/c-concurrency-detailed-study-of-memory-fences/
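In C11 terms, a minimal sketch of what’s at stake:

    #include <stdatomic.h>

    int data;
    atomic_bool ready;

    void producer(void)
    {
        data = 42;
        /* without the release fence, the compiler and CPU are free to
           reorder the plain store to data past the store to ready */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&ready, true, memory_order_relaxed);
    }

    int consumer(void)
    {
        while (!atomic_load_explicit(&ready, memory_order_relaxed))
            ;   /* spin */
        atomic_thread_fence(memory_order_acquire);
        return data;   /* the fence pair guarantees this reads 42 */
    }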
Most developers assume (rightly or wrongly) that the compiler will execute their instructions as given, but optimizing compilers can break that assumption, which becomes absolutely clear when you attempt to debug an -O3 binary.
It’s fair to blame humans for making errors, but on the other hand, given sufficient complexity no human programmer generates perfect code at scale. Enabling -O3 can expose bugs. For the sake of argument I’d turn the tables and say an “optimization” that doesn’t execute the same way is not actually a faithful optimization of the original code when the results are demonstrably different; bug or no bug, the compiler itself should know this. But ultimately C/C++ allow programmers to shoot themselves in the foot, and correctness gets left as an exercise for error-prone humans. Personally though, I am a proponent of safe languages, which make an effort to flag “undefined behaviors” as compiler errors. IMHO that is as it should be, because C/C++ undefined behaviors have been a major source of software exploits.
When you say “our compiler optimizations”, are you referring to GCC in particular? I can guarantee you Borland C++ 4.5 did not meet that level of quality, nor did CodeWarrior for that matter. I don’t doubt that *most* of my code breakage under -O3 was my fault, but sometimes compilers do screw up. Probably more so back when I was exploring optimizations in the ’90s/early 2000s than today.
I tried this with Gentoo back in the day; it did not go well, but it was a “fun” way to spend a week after work.