Big news from the Debian release team: Debian is going for reproducible package builds.
Aided by the efforts of the Reproducible Builds project, we’ve decided it’s time to say that Debian must ship reproducible packages. Since yesterday, we have enabled our migration software to block migration of new packages that can’t be reproduced or existing packages (in testing) that regress in reproducibility.
↫ Paul Gevers
Reproducible means, in short, that you can verify that a binary package really was built from the source code it claims to be built from: anyone who builds that source in the same environment gets a bit-for-bit identical package. This provides a layer of defense against people tampering with code or otherwise trying to fiddle with the process between source code and final package on your system. This effort constitutes a tremendous amount of work, but it’s massively important.

If you go verifying the packages/builds, at that point you might as well run Gentoo 🙂
Serafean,
This is a very different thing, and I don’t think Gentoo does it either.
They have 100% perfect bit for bit reproducibility of Debian packages. Or at least that is the aim.
It is not that “we have this source to coreutils, and it will compile into a binary that contains ls and such”
It is “we have these sources, exact compiler version, environment settings, host packages, …. and the output hash would be 0x123131231231231231231231231231312312”
Many packages will unfortunately not do that. Temporary paths, random numbers, hashing “current” directory… there are 100 different reasons the final output might differ bitwise.
Debian is now enforcing that stability, which is a very high bar.
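To make the “output hash” idea concrete, here is a minimal Python sketch of the comparison a rebuilder would run: hash two independently built artifacts and check that they match. The function names are made up for illustration, and this is not Debian's actual tooling.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(first, second):
    """True only if both artifacts hash to the same value."""
    return sha256_of(first) == sha256_of(second)
```

A single differing byte anywhere in the package is enough to make the check fail, which is why the bar is so high.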
sukru,
Obviously distros distribute binaries as an optimization and not doing so is costly, but when you have a distro that distributes source code and builds from source, it seems technically even better – the question of how binaries are generated and what source code is used answers itself.
https://www.linuxlinks.com/gentoo-flexible-source-based-linux-distribution/
I’m not able to read the article as the link in the article returns 403 – forbidden.
I don’t know if they had good examples to offer, but isn’t this the exception rather than the norm?
Most builds are already deterministic. I sampled some of my own linux projects and they were reproducible out of the box…and why wouldn’t they be? Yes you’re right random numbers would break reproducibility, but how often do you use anything random to build software? That seems to be an unusual requirement to me.
Conceptually I do see how a build process could use randomly generated directories, but even these randomized paths don’t normally impact the binaries that you distribute. The binary doesn’t normally contain paths from the developer’s machine. It might happen accidentally/carelessly but I’m having a hard time thinking of cases where 1) you need to do this on purpose or 2) it’s hard to rectify.
I think it’s a good goal for Debian to have reproducible builds, but in my mind the main difficulty stems from how much software they have to build. So even a change that only affects a very tiny minority of them can still cause many hours of work. But in principle reproducible software isn’t normally hard to achieve IMHO.
Alfman,
Technically Gentoo distributes pre-compiled binaries now,
https://wiki.gentoo.org/wiki/Gentoo_Binary_Host_Quickstart
And Debian provides source packages:
https://wiki.debian.org/BuildingTutorial
But, yes, their default modes of operation are different.
I assume those are your own projects, not public open source ones… but it does not matter.
Building the same code, in the same place twice might produce deterministic results.
However, can you also verify that it would still be 100% deterministic:
1 – If you build on another host machine?
2 – If you use a different source directory?
3 – If you use another user to build?
4 – tomorrow?
…. among other things?
(Many binaries would have, for example, the source file name via __FILE__, sometimes build timestamps via __DATE__, might have constants set in “./configure” (which change after a clean build), might have git repo tags, might have $USER recorded, might change with -march=native, users might be using different locales such as LC_ALL=C ….)
Just running “make” twice usually does not expose these differences. But even that might not be perfectly stable of course.
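A toy illustration of the point, in Python rather than C: the “build” below is only a stand-in that optionally appends the build directory and date to its output, the way __FILE__ or __DATE__ would end up inside a real binary. The paths and dates are invented for the example.

```python
import hashlib


def toy_build(source, build_dir, build_date, embed_metadata):
    """Pretend to compile `source` and return the hash of the output.

    With embed_metadata=True the output records the build directory and
    date, mimicking what __FILE__ and __DATE__ do in a C program.
    """
    output = source.encode()
    if embed_metadata:
        output += f"\nbuilt in {build_dir} on {build_date}".encode()
    return hashlib.sha256(output).hexdigest()


source = "int main(void) { return 0; }"

# Same source, different directory and day: the hashes differ.
leaky_a = toy_build(source, "/home/alice/src", "2026-05-11", True)
leaky_b = toy_build(source, "/tmp/build-1234", "2026-05-12", True)

# Same source with nothing from the environment embedded: identical hashes.
clean_a = toy_build(source, "/home/alice/src", "2026-05-11", False)
clean_b = toy_build(source, "/tmp/build-1234", "2026-05-12", False)

print(leaky_a == leaky_b)  # False
print(clean_a == clean_b)  # True
```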
sukru,
#1 if you have the same build tools, then yes it should still be 100% deterministic. If you don’t use the same build tools, then there’s no expectation of reproducible binaries. In the context of a project like this it is ok to specify the environment and say “you’ll get reproducible binaries using the same XYZ environment we used”.
#2 Someone could intentionally inject paths from the developer’s machine. If software is found which does this, it would need to be corrected obviously, but this is not something software normally does.
#3 Same as #2, someone could intentionally inject user information from the developer’s machine, but it’s really not normal practice.
#4 Of your cases, this one seems like the most likely you’d find in the wild. The developer might include something like “Software built on 5/11/26” or have an incrementing build number. Such messages are obviously incompatible with reproducible binaries and would have to be removed.
#5 “Software built under PID 18234”. This is just a catch-all to concede that many non-deterministic sources are possible, but while it’s technically true that they exist, build tools are generally not designed to change the output based on them and it doesn’t seem normal for developers to add sources of non-determinism to their builds. Is there evidence this is all that common?
#6 You didn’t mention this, but CPU feature detection can change the build output. This can be a source of non-determinism, but with the right environment variables most compilers can emit consistent code.
Naturally configure is very system specific, but as with #1, it’s ok (and even necessary) to specify the environment you are using to create reproducible binaries. configure scripts are normally designed to be deterministic. In the event you find one that isn’t, obviously it would need to be fixed, but I think it would be rare to see non-determinism with normal software.
Luckily it shouldn’t be too hard to test and Debian can automate tooling to catch non-reproducible builds.
Alfman,
I’d suggest actually looking up why these are issues that need resolving.
Should is the operative word here. It should, but it doesn’t.
In real life, you’d have a package put “%hostname%” somewhere: binary, code, documentation. Something that would ultimately affect the hash of that Debian package.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962021
Here it was even more innocent. They used a fixed timestamp… but forgot to include the time zone. Not even the hostname, but UTC vs local broke reproducibility.
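A small Python sketch of that class of bug (this is not the code from the bug report, just an illustration with an arbitrary epoch value): the same fixed timestamp formats to a different date depending on the builder's time zone.

```python
from datetime import datetime, timedelta, timezone

FIXED_EPOCH = 1_700_000_000  # 2023-11-14 22:13:20 UTC


def stamp(epoch, tz):
    """Format a fixed epoch as a date string in the given time zone."""
    return datetime.fromtimestamp(epoch, tz).strftime("%Y-%m-%d")


utc = timezone.utc
utc_plus_2 = timezone(timedelta(hours=2))

print(stamp(FIXED_EPOCH, utc))         # 2023-11-14
print(stamp(FIXED_EPOCH, utc_plus_2))  # 2023-11-15
```

The timestamp itself never changed, yet two builders a couple of time zones apart would ship different bytes.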
sukru,
I feel we’re talking too abstractly and would like some specific real life examples. I’m not denying that it’s possible for developers to add sources of non-deterministic behavior to the build process, but normally one has no reason to do so and without more proof I’m not sure it’s all that common.
Yes, there are countless environment variables they could add $SHELL $PWD $USER $DESKTOP_SESSION… However it goes back to point #5 earlier. Their existence in and of themselves does not prove there is a widespread problem. More data is needed to show that it’s a significant problem.
Also I want to point out how easy it is to mitigate leaky environment variables in practice by sanitizing the environment for the build.
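As a minimal sketch of the idea in Python, the helper below runs a command with only a small, fixed set of variables visible, much like `env -i` does on the command line. The contents of CLEAN_ENV are an assumption for illustration; a real build would likely need more entries.

```python
import subprocess
import sys

# Assumed minimal environment; nothing from the caller's shell leaks through.
CLEAN_ENV = {
    "PATH": "/usr/bin:/bin",
    "LC_ALL": "C.UTF-8",
    "TZ": "UTC",
}


def run_sanitized(command):
    """Run `command` with only CLEAN_ENV visible and return its stdout."""
    result = subprocess.run(
        command, env=CLEAN_ENV, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # The child sees TZ from CLEAN_ENV but not the caller's USER.
    child = "import os; print(os.environ.get('USER', 'unset'), os.environ['TZ'])"
    print(run_sanitized([sys.executable, "-c", child]), end="")
```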
Now you can build software in a sanitized environment. Obviously you can set environment variables as needed. I don’t know whether Debian builds binaries in a clean environment like this, but it’s easy to do and addresses the problem you are talking about, where leaked environment variables might affect the build.
You can go further and build in a chroot, which incidentally is exactly what I do when building my linux distro. The build environment for all of my distro’s software is built from a Debootstrap environment.
https://wiki.debian.org/Debootstrap
I did not do this with the goal of creating reproducible binaries, I did it this way because it minimizes dependencies from the host OS. The chroot environment means I can easily launch the build process with very little prep or modification to the host. I don’t know if other distros do this, but it works and I think it’s a good idea. Reproducible binaries aren’t hard when you produce them from a standardized environment.
Alfman,
Did you skip the second half? I specifically kept it short, and only added one example.
If you want to dig deeper
This is their mail archive
https://alioth-lists.debian.net/pipermail/reproducible-builds/
And their status page
https://reproduce.debian.net/all/forky.html
It is not a one-off random package, really (they still have 414 of them after many years of work)
sukru,
Sorry I was multitasking.
In terms of this specific patch, it looks like they actually do want to keep timestamps, but they didn’t want it to be affected by system specific locale settings.
That’s fair enough but to this end I still think the sanitized environment I proposed in the last post would create the same date format no matter where it was run.
As for the clock itself changing, it should go without saying that real timestamps are fundamentally incompatible with reproducibility. But if we’re willing to give up real timestamps, then it’s not technically that difficult to solve with fake timestamps.
That status page shows that 230 of those packages are documentation packages, so those documentation packages probably include timestamps. To fix these, someone would either have to go in and remove the timestamp to make it reproducible or if we want a technical solution that solves everything in one go they could build everything under a fake but deterministic timestamp mechanism that ensures that everyone building the software would always output the same timestamps.
For example here’s a FakeTime.c library:
makefile:
And an example of what it can do…
This isn’t intended as a final solution, but there are ways to fake timestamps to make packages more reproducible – the documentation and build dates would show the wrong date, but at least everyone building the package would show the same date, which is what reproducibility is concerned with.
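Separately from the library above, here is a minimal Python sketch of the same principle applied at packaging time. It assumes the build honours the SOURCE_DATE_EPOCH convention from the Reproducible Builds project and falls back to a fixed default otherwise; the function names and the fixed owner and mode values are my own choices for illustration.

```python
import hashlib
import io
import os
import tarfile


def build_archive(files, default_epoch=0):
    """Pack {name: bytes} into a tar whose metadata never varies.

    The timestamp comes from SOURCE_DATE_EPOCH if set, else default_epoch,
    never from the real clock.
    """
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", default_epoch))
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(files):  # fixed member ordering
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = epoch  # fake but deterministic timestamp
            info.uid = info.gid = 0
            info.uname = info.gname = "root"
            info.mode = 0o644
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def digest(data):
    """Hex SHA-256 of the archive bytes."""
    return hashlib.sha256(data).hexdigest()
```

Two people calling build_archive on the same files, on different days and machines, should get the same digest as long as they agree on the epoch value.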
Alfman,
I’m not sure what to say.
Please do not focus on one example. Nor on the fact that “only” a few hundred of them were left after a decade of work…
This is very prominent in almost all large codebases. One document there, another script here. One more diagnostic inside the binary…
It is a very big task. That is why only a few distros could actually achieve this. Debian is crossing a very major milestone.
sukru,
Most of those packages are documentation. It’s not really conclusive that these weren’t fixed because of difficulty. It seems quite plausible that the reason they weren’t fixed is because they just weren’t a big priority.
The data on the status page you linked to shows it to be relatively uncommon. You might argue that’s because the work is already done and maybe that’s true, but IMHO there needs to be more data before saying it’s widespread.
I feel like you keep ignoring my points to say this though. Do you disagree that sanitized environments can mitigate differences caused by local configurations? Why? Do you disagree that fake timestamps could make them deterministic? Why?
I do want to point out that due to the large number of packages involved, we can both be simultaneously right: non-deterministic behavior can affect many packages and also be a relatively small portion of software overall. My gripe is claims that aren’t backed by the data and I feel that comments like “This is very prominent in almost all large codebases.” cross this line.
Alfman,
Again… you are looking at the final stretch. Not the original scope… which is a Google away…
Here is a snapshot
https://youtu.be/YN1XiflsK2w?t=422
I’d recommend rest of the movie, as you can hear first hand why this was a challenge.
(And as to why the LD_ hack does not work? It sounds like it should… but you need to remember that for a build system to work, it requires consistent monotonic timestamps. Without actual functional timestamps, you don’t have a build system. You can also find at least 5 other reasons why it would fall short.
Maybe look up “libfaketime”, debian’s own attempt at this, and why they gave up)
sukru,
I watched several minutes into the video and I still don’t see data showing what you are claiming.
From the video…
https://youtu.be/YN1XiflsK2w?t=633
He’s right. Packages don’t get fixed because they don’t have to be fixed. When they are required to be fixed or else risk being kicked out, that’s when it becomes a priority and not really before then.
The data from when this project began in 2017 shows the vast majority of packages were reproducible almost a decade ago. Non-reproducible packages seem to be the exception rather than the norm. Can we just agree here?
(incomplete quote to reduce data entry & added dates)
Of course it’s still a lot of packages being non-reproducible, but going by the data provided in the video, we’re talking about a relatively small proportion of them.
I understand there are challenges, but can we agree with the presenter that the challenges are more political than technical? Being that most of the stragglers are document packages and fixing them was officially optional, it seems to me that most package maintainers really haven’t been bothered to fix it. Making it required for everyone might finally light a fire under stragglers to deal with non-deterministic builds or drop them if nobody is bothered.
Fake timestamps can be used for builds. The percentage of software that cares about real time would be relatively small and could be dealt with on a case by case basis.
sukru,
This?
https://github.com/wolfcw/libfaketime
https://manpages.debian.org/bookworm/faketime/faketime.1.en.html
I’m not quite sure what attempt you are referring to, do you have a link? A challenge like this sounds like fun to me 🙂
Alfman,
I hope you don’t get me wrong, but for this last response, I had to consult Gemini. It is getting very late here.
Frankly, I don’t like it, so I’ll add a few more sentences, but my head is already groggy from sleeplessness
Why would they spend an entire decade+
Attend many conferences, hold workshops,
Have popular articles on Hacker News,
Have to convince 4,000 + projects to change their build setups,
And have their current achievement celebrated all over the tech media…
If this was trivial?
sukru,
I confess to being repetitive… it’s generally unnecessary for compilers to do that though, and while developers can do this to themselves, I’m not seeing evidence that it’s widespread.
Hmm, local build tools don’t need accurate file timestamps, only relative consistency matters. I don’t think I need to explain that though. I find it weird that Gemini said this, it should know better. I suppose it might not have enough sources to draw from.
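To spell out what I mean by relative consistency, here is a deliberately simplified Python model of a make-style staleness rule. The file names and timestamps are invented, and real build systems have more moving parts than this.

```python
def needs_rebuild(source_mtime, target_mtime):
    """Make-style rule: rebuild when the source is newer than the target."""
    return source_mtime > target_mtime


def shifted(mtimes, offset):
    """Move every timestamp by the same amount, like a consistently wrong clock."""
    return {name: mtime + offset for name, mtime in mtimes.items()}


real = {"main.c": 1_700_000_100, "main.o": 1_700_000_050}
fake = shifted(real, -1_000_000)

print(needs_rebuild(real["main.c"], real["main.o"]))  # True
print(needs_rebuild(fake["main.c"], fake["main.o"]))  # True
```

The decision depends only on which file is newer, so a clock that is wrong by the same amount for every file gives the same answer.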
I agree, getting upstream projects involved is time-consuming work. Although I don’t feel Gemini adequately made the case that local environment mitigations aren’t a viable way to limit non-deterministic sources leaking into the build process.
I think it’s one thing to suggest that a change could be trivial to implement, but another to suggest that it’s trivial to get 3rd party devs on board to actually do it in a timely manner. This is what I got out of the presenter’s slide that said “100% reproducible is a political decision and nothing technical. We need to change debian-policy!”
Alfman,
(Ah… I can’t sleep, and tomorrow would be a difficult day)
Long story short… working with people is much harder than working with code. And those bug reports would require significant engineering time.
It is not just “please do not use $TIMESTAMP” here. It will probably be days or weeks of pushback, including threats to drop them from Debian.
[…] there are 100 different reasons the final output might differ bitwise.
The big one is timestamps.
Also, Debian started on reproducibility back in 2013 and was like 90% of the way there ten years ago. It just took Debian until now to be certain enough that everything could be built reproducibly to make non-reproducibility a blocker.
Gentoo, AFAIK, is still debating whether reproducibility is something they want (Portage uses timestamps to e.g. determine what needs to be updated, so reproducibility will involve a lot of work).
Brainworm
Wow. I know this is hard work. But I did not realize it took them a decade. But, yes, the 80/20 rule means that the last few would be much harder to do.