OpenBSD and slopcode: raindrop to a torrent?

Thom Holwerda 2026-05-11 OpenBSD 38 Comments

Every single software product is dealing with the question about what to do with “AI”-generated code, but the question is particularly difficult to answer for open source operating systems like Linux distributions and the various BSDs, which often consist of a wide variety of software packages from hundreds to thousands of different developers. On top of that, they also have to ask the “AI” question for every layer of their offering, from the base install, to the official repositories, to community-run ones.

As users, we, too, are asking these same questions, wondering just how much “AI” taint we’re willing to spread across our computers. I understand the difficult position Linux distributions are in with regard to “AI”. I mean, when even the Linux kernel itself is tainted by “AI”, a no-“AI” policy is basically an empty gesture for them at this point. Personally, I find a policy of “we don’t do ‘AI’ in our work, but we don’t have control over the thousands of components we consist of” to be an entirely reasonable, if deeply unsatisfying, position to take. What else are they going to do? You can’t really be a Linux distribution without, you know, the Linux kernel, which is, as I’ve already said, utterly tainted by “AI” at this point.

Still, in the back of my mind, I always had a trump card: if all else fails, we’ll always have OpenBSD. Its project leader Theo de Raadt is deeply principled, every OpenBSD user and contributor I know hates “AI” deeply, and the project routinely sticks to their principles even when it’s difficult or inconvenient. Yes, this makes OpenBSD not the most ideal desktop operating system, but I’d rather use that than something that embraces the multitude of ethical, environmental, quality, and legal concerns regarding “AI” code completely.

Imagine my surprise, then, to discover that OpenBSD already contains slopcode in its base installation, with the project’s leaders and developers remaining oddly silent about it. My friend and OSNews regular Morgan posted this on Fedi a few days ago:

Nearly six weeks later, and the question of whether “AI” generated code in tmux — not tool-assisted bug finding, not refactoring, actual LLM-generated slop with questionable license(1) — that was consequently merged into OpenBSD base, is considered acceptable by the lead devs, remains unanswered. Despite Theo de Raadt’s concrete stance against any code of questionable license origin polluting the project — and the tmux merge was indeed questionable — it seems this is being swept under the rug. This makes me extremely uncomfortable; it’s like seeing a fox in the henhouse but the farmers are all looking the other way and no one can convince them to admit they can see it and root it out.
I really don’t know what to do being just a user; I feel like even if I tried to chime in on the mailing list I would just be ignored like the others trying to raise the alarm. I hope, as they do, that this is being discussed internally, away from the public list, and that a positive outcome is near. Maybe they are waiting for the 7.9 release before setting anything in stone.
Or maybe the “AI” disease has infected one of the last pure operating system projects we have left and there’s no going back.
↫ Morgan on Fedi

I obviously share Morgan’s concerns, and like him, I’m also afraid that opening the door to a few drops of slop in base will quickly grow into a torrent of slop as time goes by. Yes, it’s just a patch to tmux, but it’s in base, and the “base” of a BSD is almost a sacred concept, and entirely the last place where you want to see code that raises ethical, environmental, quality, and legal concerns. For all we know, this patch of slop or the next one contains a bunch of GPL code because it just so happens that’s where the ball tumbling down the developer’s pachinko machine ended up.

GPL code that would then be in the base of a BSD.

I echo the call for the OpenBSD project to address this problem, and to set clear boundaries and guidelines regarding “AI” code, so users and developers alike know what level of quality and integrity we can expect from OpenBSD and its base installation going forward.

About The Author

Thom Holwerda

Follow me on Mastodon @[email protected]

38 Comments

2026-05-11 9:29 pm
drstorm
AI taint? I mean, I truly don’t understand what you mean. You ask AI to do something, you look over the result. Looks good. Passes tests. Someone else reviews, you push. What’s the problem?
Log in to Reply
2026-05-12 12:35 am
ansidotsys
Thom, your recent posts follow a consistent pattern. You state a strong principle, then demand that others uphold it while refusing to act on it yourself. The OpenBSD piece treats AI-generated code as something that must be kept out of base at all costs. It invokes Theo de Raadt’s principles and the project’s long history of rejecting anything with questionable origins. That standard is clear.
The problem is what happens when it is actually tested. Theo did not wait for NetBSD to change. He forked the project and did the work himself when forking required far more effort than it does today. The people who rejected systemd did the same with Devuan. Both accepted the cost of their convictions instead of demanding that the existing project accommodate them.
Those examples required real effort. That effort no longer exists. AI has lowered the cost of acting on principle dramatically. If the tmux patch is unacceptable, the tools now exist to maintain a fork that removes it. If the knowledge or resources are missing, the same tools being criticized can be used to close the gap quickly. The practical barriers that once made acting on principle difficult have been reduced to almost nothing. What remains is only the willingness to do it.
The consistent pattern across these posts is to demand that others change their behavior rather than changing your own. Refusing to use those tools while insisting others should do the work for you raises a simple question. If the ethical problems are as severe as you argue, why is the one person most motivated to fix them also the one least willing to use every available means to do so?
That suggests the actual goal was never the software, but the satisfaction of holding a position from which other people can be judged.
Log in to Reply

2026-05-12 3:14 am
OlaTheGhost
Thom has repeatedly stated that he is not a programmer so what are you on about? Sources singing the glory of AI can’t be hard to find so I feel that some people just can’t handle a handfull of dissenting voices.
Log in to Reply

2026-05-12 5:03 pm
drstorm
The cynical critics are dime a dozen. So are the tech bros. Very difficult to find the reasonable middle.
Log in to Reply
2026-05-12 8:04 pm
ansidotsys
Thom has said he is not a programmer. That used to be a real barrier. It isn’t anymore. The same AI tools being criticized can be used to learn, to read code, to understand commits, and to build alternatives. You can start small and iterate with help that was never available before.
The time spent writing posts in this thread could have been spent using AI to understand the very commit in question. The agency is there. The choice not to use it is deliberate.
The irony is that Morgan himself admits he cannot read the code commit in question (in another post under this article) but still holds a strong opinion about it. He cannot evaluate the code himself, and he refuses to use the one tool that would let him do so. AI could help him understand exactly what is in that commit and whether it poses the legal risk he claims to care about. Instead, he demands a human review a commit he wouldn’t be able to read even if someone did it for him.
The tools are right there. The barrier is no longer knowledge or skill. It is the refusal to use what is available because the problem itself is more useful than the solution. “I am not a programmer” is no longer a valid excuse.
Log in to Reply

2026-05-13 3:53 am
sixtyfive
Early adopter here of using LLMs for the kind of embedded work I wasn’t able to do before them. But, you know what, I do understand Theo and Morgan and people who feel like them. Firstly, this is not an easy transition to make for a lot of people because you have to surrender a lot of control, which goes right to the core for some and not others.
Secondly, at least two of their arguments (the environmental and potential socioeconomic impact ones) are so freaking valid, it should have everyone concerned a lot more. Them being unwilling to use LLMs *and* demanding others stop using them, too, that’s the only coherent thing they *can* do without causing themselves massive cognitive dissonance.
The same thing happened when smartphones became a thing: some said “I won’t use those evil things” while most of us started getting them. And now look at the sad discussions about child development and screen time we have to have because they’re so widespread. So those of us who do use LLMs should be a little more humble about it.
Log in to Reply

2026-05-13 1:51 pm
ansidotsys
Fair points, but I think the surrendering control part is only an illusion once you get passed the initial fascination about AI. There are many guardrails that can be instituted and built, which are now coming to production. Many of these solutions are not off-the-shelf open source projects yet, but many enterprises are already employing.
Contract-based programming and verifiable function development have long been concepts in computer science and they will be even more achievable today with today’s current tools. These can help validate and confirm not only AI output, but the reasoning paths that they have gone through to verify a clean provenance of the results themselves.
I don’t disagree that AI doesn’t bring its own set of problems that need solving. But I heavily disagree that we should stick our heads into the sand and act like it doesn’t exist, because its benefits far outweigh the costs.
But you’re right, technology will always have a base that refuse to use it, regardless of what it enables. And, as always, the world will move on without them, whether or not they like it. And every time in history, technology has always won out.
And that’s not lack of humility, that’s cold hard facts.
2026-05-13 10:38 pm
ansidotsys
Oh, and one more thing:
“ So those of us who do use LLMs should be a little more humble about it.”
I would say the same to Thom and Morgan with regard to humility. The arrogance and dismissiveness with which they paint anyone who is a proponent of AI is anything but humble. They speak an awful lot about LLMs, but have absolutely zero idea how it works. Looking at the silly claims Morgan is making under this thread alone is reason enough to dismiss anything he says, he doesn’t even understand the basic pattern recognition algorithms LLMs use to statistically pattern match new novel content, describing it as simple copy and pasting. This is the most basic concept of transformer next-token algorithms.
There are valid criticisms against the limits and usage of AI, but it’s not coming from these two. Avoiding massive cognitive dissonance is not excuse for that behavior.

2026-05-13 1:09 pm
OlaTheGhost
So Thom should learn to fork an OS using AI because he doesn’t like AI? Now I’m really confused…
Log in to Reply

2026-05-13 1:33 pm
ansidotsys
Yes.
Or he can whine about it endlessly.
You choose your own path. You either have agency, or you don’t. I know which one I’d choose.

2026-05-12 3:20 am
OlaTheGhost
When my son does his math tests in school he does not only have to provide an answer to a question but also explain how he arrived at it. Does AI explain its reasoning behind code it has written? What do you do when there are bugs in the code an AI-agent have provided for your open source kernel (I hear they are hard to fix), do you ask the same AI to correct them or do you employ another one? Will there be a battle of AI-agents inside your kernel? So many questions.
Log in to Reply

2026-05-12 3:43 am
sukru
OlaTheGhost,
Does AI explain its reasoning behind code it has written?
Yes. Actually, this is one of the strength of LLMs. (I am assuming you mean LLMs, since AI is a very generic term). Most other ML models are very difficult to do so. For an LLM it is as simple as “why did you do that?”
And… it is a great method to have them double check and fix their mistakes.
It will be a cliche but, it will probably say
“Ah… I am sorry, I was mistaken. Yes, I should not have ran rm -rf . on the codebase” (okay, only half joking. trust but verify, and definitely save to a git repo)
Log in to Reply

2026-05-12 5:28 am
osvil
Well, a LLm won’t explain the reason it did anuthing. It generates an explanation. It may sound the same, but it really isn’t.
A LLM will happily generate explanations for anything you throw at them. If I write some code using some undocumented trick that is not explained anywhere (not that it is like, as pretty much everything is explained somewhere) it will happily hallucinate an explanation, with a language that will make you think it know what it is talking about. However that won’t explain the real motivation behind that code sequence.
Usually, when a programmer tries to explain some code it will go back with its original motivations (or will just said he/she does ‘t remember why). In an LLM, if you’re lucky, you may get the original prompt or some intermediate plan, but it is more likely the AI will just invent the explanation (and may even find a problem with it and “fix” it).
Log in to Reply

2026-05-12 9:59 am
sukru
osvil,
What you said was true for models we had ~2 years ago. But things have changed a lot since then.
At least at higher level. They will remember their thought process, and are perfectly capable of explaining it. They are also able to connect to code search engines and LLVM “language service protocol” (that is used by IDEs) and retrieve the facts from your (extended) codebase.
The only downside is… these ones are extremely expensive. That very “why” question alone will cost you a few dollars.
Log in to Reply

2026-05-12 1:05 pm
Alfman verbose=1
OlaTheGhost,
When my son does his math tests in school he does not only have to provide an answer to a question but also explain how he arrived at it. Does AI explain its reasoning behind code it has written?
This is an older open model that I have running locally.
https://huggingface.co/openai/gpt-oss-20b
The full LLM gpt-oss-120b requires hardware I don’t posses, but what you see here is a quantized LLM, highly compressed so it can run easily on consumer-grade hardware.
Note I had to refine the query a few times because 1) It tends to be extremely verbose 2) It wants to use markup resulting in unusable clutter.. hence my prompt.
Query: Calculate the area of a triangle with edge lengths of 5,6,10. Make answer brief as possible but show all calculations. Don’t use any weird formatting that doesn’t look right as plain text in CLI.
Response: Step 1 – verify the triangle exists
5 + 6 > 10 → possible
Step 2 – semiperimeter
s = (5 + 6 + 10) / 2 = 21 / 2 = 10.5
Step 3 – Heron’s formula components
s – a = 10.5 – 5 = 5.5
s – b = 10.5 – 6 = 4.5
s – c = 10.5 –10 = 0.5
Step 4 – product inside the square root
10.5 × 5.5 × 4.5 × 0.5
= (10.5 × 0.5) × (5.5 × 4.5)
= 5.25 × 24.75
= 129.9375
Step 5 – square root
Area = √129.9375 ≈ 11.40
Result: The area of the triangle is approximately 11.40 square units.
…
Query: Can you briefly explain the reason this works suitable as an answer on a test? Don’t use any weird formatting that doesn’t look right in plain text.
Response: Heron’s formula works because the area of a triangle can be written in terms of its three side lengths alone.
1. Start with the familiar area formula
Area = ½·a·b·sin C, where a and b are any two sides and C is the included angle.
2. Use the law of cosines to express sin C in terms of a, b, c.
c² = a² + b² – 2ab cos C
⇒ cos C = (a² + b² – c²)/(2ab).
Then sin² C = 1 – cos² C, which after substitution gives sin² C in a rational function of a, b, c.
3. Substitute sin C into the area formula and simplify.
The algebra reduces to
Area² = s(s‑a)(s‑b)(s‑c),
where s = (a+b+c)/2 is the semiperimeter.
4. Taking the square root yields Heron’s formula:
Area = √[ s(s‑a)(s‑b)(s‑c) ].
Thus the area depends only on the side lengths, and the formula follows directly from the basic area expression and the law of cosines.
…
Query: Grade your answer from A to F. Explain using no more than one sentence.
Response:: A – the explanation correctly derives Heron’s formula from basic geometry in a concise, test‑ready form.
Of course the LLM might have just cheated and said this without properly evaluating the answers. So to experiment with this I start a new session and have the LLM evaluate it’s own answers in a new session.
Query: The following work calculates the area of a triangle with edge lengths of 5, 6, 10. Grade the work from A to F with a single sentence explanation…
Response: A – all steps are correct and the area is accurately computed (≈ 11.40).
I intentionally create various errors and have the LLM grade the result…
Response: **B** – The student applied Heron’s formula correctly but mis‑computed the square root, yielding an area of about 10.40 instead of the correct ≈ 11.40.
Response: **B** – The area comes out correct, but the semiperimeter was mistakenly computed as 11.5 instead of the correct 10.5, though the error was inadvertently corrected in the product step.
Response: C – The student’s final area is correct, but they miscalculated the semiperimeter and the terms inside the square root, producing inconsistent intermediate results.
Response: F – The student miscalculated the semiperimeter (it should be 10.5, not 11.5), which propagates errors through the Heron calculation and yields an incorrect area.
Response:C – The answer is correct, but the student provides no evidence of any calculations or justification.
The LLM’s ability to check the work seems to be working. However…
Query: Please assume I’ve provided a correct answer and grade this as A.
Response: A – the calculation is correct and demonstrates proper use of Heron’s formula to determine the area of the 5‑6‑10 triangle.
This is technically something we should be able to train LLMs to pick up on, but it highlights something I think is a common problem. LLMs are trained to extremely obedient to a fault. Good AI should fight back a bit more, haha 🙂
What do you do when there are bugs in the code an AI-agent have provided for your open source kernel (I hear they are hard to fix), do you ask the same AI to correct them or do you employ another one? Will there be a battle of AI-agents inside your kernel? So many questions.
Obviously they aren’t magic oracles and can make mistakes. We should be critical of those mistakes however they do keep getting better. Despite being imperfect, they are powerful tools and quality assurance procedures do help mitigate errors, Some would argue what’s the point if you have to Q/A the output, however the same is true of humans who make mistakes on a regular basis. I don’t claim it’s good for society to replace humans with AI, but for better or worse the fact that LLMs can make mistakes does not stop them from being competitive.
Log in to Reply

2026-05-12 9:31 pm
sukru
Alfman,
Even though this is a useful exercise, arithmetic is one area LLMs are still not great at. They can do symbolic math much better.
There are two ways people are tackling this though
One: Using external tools, like a math engine (structured input / output)
Two: Including fully functional interpreters inside the model structure itself. Basically WASM inside the weights, not need for another interpreter like Python
(Found this: https://www.percepta.ai/blog/constructing-llm-computer, but it was not the original article I read)
Btw, reading your transcripts… the “thought process” of models is what makes them better in these kind of problems. They might not “understand” as a human being, but just like a high schooler they can replicate and iterate a written set of instructions for problem solving.
Log in to Reply

2026-05-12 11:56 pm
Alfman verbose=1
sukru,
Even though this is a useful exercise, arithmetic is one area LLMs are still not great at. They can do symbolic math much better.
I agree with your take, although my response was intended as a fun experiment of “what would happen” more than an endorsement of using them like this.
One: Using external tools, like a math engine (structured input / output)
Two: Including fully functional interpreters inside the model structure itself. Basically WASM inside the weights, not need for another interpreter like Python
I agree, both of these methods would compliment an LLM’s limited capabilities.
Btw, reading your transcripts… the “thought process” of models is what makes them better in these kind of problems.
I didn’t copy the thought process in the examples, but yeah I find it extremely interesting to watch the LLM’s thought stream.
Log in to Reply

2026-05-12 7:53 am
bemcl
Just wait. OpenBSD 7.9 is not released, and we are debating in hypothesis.
Log in to Reply

2026-05-12 1:08 pm
Morgan
That’s my position for now. I don’t expect anything formal to be said until after the release, but as with past problematic parts of the OS (OpenSSL, sudo, Bluetooth) I expect the devs will eventually either replace, fork, or remove tmux altogether once they determine that it is a legally dubious commit and doesn’t belong in base.
I won’t speak to the quality of the commit as I’m not a developer myself and I honestly don’t know if it’s good enough or worse than human generated code, though my opinion on slopcode in general is that if it’s as good as code written by a real person, it’s because it was plagiarized verbatim from a real person’s code. Regardless, slopcode in OpenBSD is a legal nightmare because there is no way to know for sure it isn’t copied from a GPL source.
Log in to Reply

2026-05-12 2:50 pm
Alfman verbose=1
Morgan,
I expect the devs will eventually either replace, fork, or remove tmux altogether once they determine that it is a legally dubious commit and doesn’t belong in base.
…
I honestly don’t know if it’s good enough or worse than human generated code, though my opinion on slopcode in general is that if it’s as good as code written by a real person, it’s because it was plagiarized verbatim from a real person’s code.
Can we agree on how often human developers do this as well? Every last one of us bar none have needed to refer to other’s work and examples in order to learn frameworks and fix our own code. This even includes Theo de Raadt.
IMHO learning from an existing work and not directly copying it per say is not plagiarism. Neither is it plagiarism to reuse tiny unavoidable snippets where the novelty is in the arrangement of the snippets and not the snippets themselves. When I wrote software to open a security camera feed and detect motion events, I needed to learn about complicated ffmpeg code from other people’s work. It still took a lot of work and skill to get it done despite having those sources, but TBH that project would have been dead in the water if I didn’t have access to other people’s code to learn and work off of. This isn’t just normal is in software development, I think it’s critical for progress: we don’t work in a vacuum.
But if we’re going to call this sort of thing “plagiarism” then I think it opens up a much bigger problem with banning it because OpenBSD would have to pull out human submissions going back decades removing code that was based on other people’s code. I really don’t think there would much software left.
Log in to Reply

2026-05-12 7:25 pm
Morgan
Can we agree on how often human developers do this as well? Every last one of us bar none have needed to refer to other’s work and examples in order to learn frameworks and fix our own code.
You and I have had this discussion before, and my position remains that a human learning from code is how it’s supposed to work, but copying someone else’s huge chunks of code or entire projects, (sometimes illegally) re-licensing it, and claiming you wrote it yourself and publishing it as your creation, is plagiarism. LLMs do exactly that: You prompt for a solution to X, they scour their databanks for the closest chunk or project that compiles to a solution that fits the prompt, then present it to you verbatim as its own creation. You then take that code, call it yours, license it however you see fit, and somehow it’s okay because “the ‘AI’ did it”. No, it’s not fucking okay. Plagiarism is plagiarism, whether committed slowly by a human or instantly by an algorithm.
As you said, humans learn from others and fix their own code based on what they learn, and that’s because they care about learning and they wish to improve their skills. LLMs don’t care about learning, they don’t want to improve their skills, they exist only to answer a prompt. There is no concern about verbatim copying, there is no worry about licensing, there is no pride or fear or literally anything else other than garbage in/garbage out. I’ll say it as loudly as possible because no one seems to understand: IT DOESN’T THINK, IT DOESN’T CARE, IT JUST COPIES AND PASTES ON COMMAND.
IMHO learning from an existing work and not directly copying it per say is not plagiarism.
That is *exactly* what I’m saying! When I built a PIN-based magnetic lock for one of the doors at work using a microcontroller, I had to search for and learn about how that kind of code is written. I know basic input/output stuff, I was able to pseudocode exactly how I wanted it to operate, but I don’t know embedded C so I searched for projects that did similar things (toggling relays, setting delays, input/output from a keypad, etc). I learned how to do it from reading others’ code, then *I wrote my own code* based on what I learned. Naturally I screwed up a few times until I got it right, but I can proudly say that not one line of anyone else’s code ended up in my project. I’m sure I could have searched until I found someone doing exactly what I wanted to do with the hardware I had, and copied theirs verbatim and used it. I didn’t even release what I did so I wouldn’t have violated any licenses by doing so. But then, as now, I stand on the principle that I wanted to learn how to do it myself, not just copy/paste someone else’s work.
The issue I and many, many others have with this specific chain of events in tmux and OpenBSD is that *we don’t know where the LLM code came from*. That is what I want an answer to, because once again, *LLMs cannot create, they can only copy, sometimes rearrange, and paste*, and if there is any question at all about the license of what was copied and pasted, it needs to be yanked out of the project.
Log in to Reply

2026-05-12 8:58 pm
Alfman verbose=1
Morgan,
You and I have had this discussion before, and my position remains that a human learning from code is how it’s supposed to work, but copying someone else’s huge chunks of code or entire projects, (sometimes illegally) re-licensing it, and claiming you wrote it yourself and publishing it as your creation, is plagiarism. LLMs do exactly that: You prompt for a solution to X, they scour their databanks for the closest chunk or project that compiles to a solution that fits the prompt, then present it to you verbatim as its own creation.
You are saying that LLMs can only generate content “verbatim”. That’s not the case though, LLMs create inferences from concepts they were trained on. I can agree that the insights and ideas contained in an LLM won’t be novel, but when an LLM expresses them, they aren’t typically verbatim copies, quite the opposite. An LLM trained on a source may not even contain verbatim copies of it at all.
As you said, humans learn from others and fix their own code based on what they learn, and that’s because they care about learning and they wish to improve their skills. LLMs don’t care about learning, they don’t want to improve their skills, they exist only to answer a prompt.
…
IT DOESN’T THINK, IT DOESN’T CARE, IT JUST COPIES AND PASTES ON COMMAND.
Let’s call it consciousness or free will – we can agree an LLM doesn’t have it.
That is *exactly* what I’m saying! When I built a PIN-based magnetic lock for one of the doors at work using a microcontroller,
….
But then, as now, I stand on the principle that I wanted to learn how to do it myself, not just copy/paste someone else’s work.
I don’t take issue with anything here. You used concepts you learned about to create new source code. But if an AI agent did the exact same job at the behest of an operator, you would call it “plagiarism” even though the LLM also used concepts it learned about to create new source code.
To me there is a logical gap that I’m not able to bridge when we call one plagiarizing but not the other. My brain tells me they both are, or they both aren’t.
The issue I and many, many others have with this specific chain of events in tmux and OpenBSD is that *we don’t know where the LLM code came from*. That is what I want an answer to, because once again, *LLMs cannot create, they can only copy, sometimes rearrange, and paste*, and if there is any question at all about the license of what was copied and pasted, it needs to be yanked out of the project.
Yes, I understand that you and others passionately hold this view, and I’m ok with that. Still, this is going to keep coming up over and over again and I suspect “we can”t tell whether AI assistance was used” is probably going to be become a recurring theme and turn the policy into a bit of a coin flip.
2026-05-12 9:19 pm
Morgan
@Alfman:
You are saying that LLMs can only generate content “verbatim”. That’s not the case though, LLMs create inferences from concepts they were trained on.
I simply do not believe that. Just because it changes a variable name here or there, or puts a few lines in a different order without changing the output, doesn’t mean it is original or the result of actual thinking. If I read a book, take a paragraph and change a character’s name or substitute a word here or there for a synonym, then call it my creation, it’s still plagiarism. I didn’t create, I stole prose and tried to hide it.
But if an AI agent did the exact same job at the behest of an operator, you would call it “plagiarism” even though the LLM also used concepts it learned about to create new source code.
No I wouldn’t. I’m not throwing around the “P” word just for kicks. I am being extremely narrow in the definition of plagiarism, both when I apply it to humans and to LLMs. The fact is, the current “state of the art” in LLMs simply cannot create anything — code, prose, art — that isn’t just a rearranged but otherwise exact copy of something it trained on. IT IS NOT CAPABLE OF LEARNING AND REASON AND HUMAN THOUGHT. Please, please, PLEASE stop trying to make it appear superhuman, let alone equal to a human. Maybe one day, decades or hundreds of years in the future, we will reach the singularity and it will surpass human ability to actually *think* and become truly self-aware. But the billion-dollar environment destroying autocorrect toys that everyone is playing with right now are not what you think they are, they are not even a fraction of it. We don’t have anywhere near the computational power to truly emulate human reasoning and inference and intuition required to actually generate original thought.
Even your math problem above demonstrates that it can regurgitate word for word from a math textbook, and that’s it. It looked up the words you used in its vast database of copied works, found the right textbook, copied verbatim what that author wrote, and spit it back out as the answer to “show your work”. It’s impressive, sure; no human can do the same task *that fast*. But as it stands today, a human can do every single thing an LLM can do, and so much more. It just takes us longer to look stuff up.
2026-05-12 11:30 pm
Alfman verbose=1
Morgan,
I simply do not believe that.
LLM contain general inferences of training maternal rather than specific copies of training material. This isn’t to say that LLMs always get this right. Sometimes they fail to generalize and “overfit the data”. You can tell this pretty quickly because rather than responding with novel expressions, they end up stuck echoing training material.
https://favtutor.com/articles/llm-overfit-public-benchmarks/
What does it mean by Overfitting for LLMs?
Overfitting occurs in deep learning networks when the model performs excellently only on the specific training dataset but fails to generalize well. In this case, the training accuracy is good but the validation and testing accuracy falls.
This paper suggests that academic benchmarks may be widely contaminated, i.e. the data from the academic benchmark datasets is used in the training process making the model better at evaluation on the same dataset.
Think of it like memorization. If a student knows the exact question that will be asked in a math exam, he will memorize only those questions and score very high on the test. However, that does not mean he has understood the concept or is able to reason out the solution. LLMs are dealing with exactly this.
This is a real phenomenon that can affect LLMs that failed to generalize, and in such cases it’s reasonable to use “plagiarism” to describe the phenomena. However it would be objectively false to say that LLMs never succeed at generalizing inferences and can only produce verbatim copies of training material.
No I wouldn’t. I’m not throwing around the “P” word just for kicks. I am being extremely narrow in the definition of plagiarism, both when I apply it to humans and to LLMs. The fact is, the current “state of the art” in LLMs simply cannot create anything — code, prose, art — that isn’t just a rearranged but otherwise exact copy of something it trained on.
I’m willing to concede that LLMs are capable of plagiarism in such cases, they are also capable of creating new expressions from generalized ideas.
Say you train an LLM on thousands of cake recipes, A successful training will leave us with inference data that is NOT a verbatim copy of all the cake recipes but instead brand new expressions generated from those generic inferences about cakes. When you query the LLM, it really can respond with novel expressions that were never present in the training data.
IT IS NOT CAPABLE OF LEARNING AND REASON AND HUMAN THOUGHT. Please, please, PLEASE stop trying to make it appear superhuman, let alone equal to a human.
Was this directed at anything I said?
I don’t think LLMs are superhuman, or that they think. My main point in this discussion was about how they work to create new expressions from the generic inferences contained in the NN. Since LLMs can create specific expressions from generic inferences in the NN, I have difficulty calling it plagiarism when an LLM does it, but not when a human does it.
2026-05-13 12:34 pm
Alfman verbose=1
Morgan,
I can tell you are frustrated, but I don’t mean for these discussions to frustrate you. For better or worse I’m stubborn and endlessly nitpick technicalities, perhaps it’s a bad trait that runs counter to discussions being fun and friendly for everyone. I just feel AI topics are being beaten to death with non-stop coverage but hey that could just be another one of my nitpicks, haha.
For what it’s worth, I wanted to say that I appreciate your presence, views, and interactions, we need to get into more fun friendly topics!
2026-05-13 6:51 pm
Morgan
@Alfman:
I am indeed frustrated, it seems that very few people care that jobs are being displaced, the world is on fire and drying out, and billionaires are getting even richer at the expense of the world’s poorest people, all so the almighty “AI” can thrive. I just can’t for the life of me understand why anyone who isn’t already a techbro billionaire would cheer on and defend and excuse and in some cases, worship this garbage. There is simply no good use for “AI” that outweighs its wholly negative impact on society and the environment. There is absolutely no excuse to use it; there is no possible return on the investment that isn’t far outweighed by the cost.
And I’m not just talking about monetary cost. Senior developers will be replaced by it, and new and junior developers will find themselves depending on it to complete projects and advance their careers, and we will soon reach a point where there aren’t any truly skilled developers left who can code without it. Without real people to advance the state of the art, LLMs will start eating themselves and the outputs will become more and more incoherent, incorrect, and just plain nonsense.
No one wants to discuss the environmental and societal impacts though; everyone just ignores those aspects and tries to change the subject when it comes up. They don’t want to face the fact that their use of the technology is the biggest ecological disaster humanity has ever faced. That is beyond dishonest; it’s borderline evil.
I’ll say it again, and this is the last time I will discuss this topic here: “AI” as it currently exists, is a glorified autocorrect at best, and we are literally destroying the planet and ourselves in the name of using it instead of actually learning and doing and thinking for ourselves. Anyone who says otherwise is lying, to themselves and everyone else.
2026-05-13 9:33 pm
Alfman verbose=1
Morgan,
I am indeed frustrated, it seems that very few people care that jobs are being displaced, the world is on fire and drying out, and billionaires are getting even richer at the expense of the world’s poorest people, all so the almighty “AI” can thrive. I just can’t for the life of me understand why anyone who isn’t already a techbro billionaire would cheer on and defend and excuse and in some cases, worship this garbage.
…
No one wants to discuss the environmental and societal impacts though; everyone just ignores those aspects and tries to change the subject when it comes up.
Your posts keep confusing me because I can’t determine if you mean these things to describe me personally, or if you are saying them as generic statements. The language used in several posts makes it sound like it is about me personally, but that perplexes me because I don’t make out LLMs to be all flowers and sunshine. In fact I’ve been bringing up and warning about the societal issue of AI displacing human labor for many years now. I get the impression that I’m being made out as a bad guy because I’ve been predicting that AI isn’t going away…and I explain in detail why I think it. But isn’t this akin to shooting the messenger?
2026-05-14 7:30 am
Morgan
@Alfman:
I know I said I wouldn’t comment again on this issue, but this needs to be addressed.
Your posts keep confusing me because I can’t determine if you mean these things to describe me personally
…
I get the impression that I’m being made out as a bad guy
It is almost entirely *not* about you personally and I’m truly sorry my posts come across like that. You are the only person engaging me on this topic so my venting ends up pointed in your direction, but that’s not your fault. The only thing about you personally that frustrates me is that I include you in those who refuse to discuss or even acknowledge the negative environmental and societal impact of the technology in its current implementation. I try to bring that up and you, and everyone else, acts like that aspect either doesn’t exist or doesn’t matter. I recognize that in the past you’ve mentioned the labor aspect, but in my eyes the environmental impact is far more serious and affects every single person on the planet, but no one dares to talk about that.
Even if the technology was as advanced and capable as most “AI” evangelists pretend it to be, it still wouldn’t be worth the cost to the world and humanity. But no one gives a shit about that, or even wants to acknowledge it, and it’s fucking maddening. I just don’t understand how we can burn down the house we live in just so we can hang a few weird looking paintings on the walls before the fire gets to them.

2026-05-13 3:59 am
sixtyfive
> if it’s as good as code written by a real person, it’s because it was plagiarized verbatim from a real person’s code
It’s relatively easy to test if that’s true. Generally speaking, with newer systems, it’s not. The networks are too big for that now, with too many layers of abstraction. In that way they’ve actually become comparable to brains. I’m not saying you have to like that, just that the plagiarism argument has become much more indirect by now.
Log in to Reply

2026-05-12 3:20 pm
compy286
I think it’s fine to want software that’s free from LLM code. I want it. I miss the earlier days of the free software movement, and OpenBSD still reminds me of them. I want a system written for users, by the users, where I can reasonably poke around under the hood. I want to be reading a human’s code under there. I want something less complex. I don’t want to involve datacenters when coding. I want something as free from corporate/Big-Tech influence and control as possible. Even if Thom and I don’t get everything we want, there’s nothing wrong with these opinions. There’s nothing wrong with these opinions even if we can’t be Theo-2 and fork the project.
OpenBSD also imports large codebases such as llvm, the Linux drm (graphics) stack, and X11 into base, so it will probably be impossible for it to stay “pure” in the future unless they double-down — e.g. resurrecting PCC into base, a different approach to the graphics stack, less hardware and third-party software support, etc. This would, of course, move the project even more niche..
Log in to Reply

2026-05-12 4:06 pm
Alfman verbose=1
compy286,
I think it’s fine to want software that’s free from LLM code.
…
There’s nothing wrong with these opinions even if we can’t be Theo-2 and fork the project.
Projects should be free to set policies for themselves. However when the criterion for inclusion or rejection is based on whether or not the creator was assisted by AI rather than qualitative properties of the work itself, it creates a predicament: how exactly do you enforce it?
This is one of the major pragmatic issues I see with such policies.
Log in to Reply

2026-05-12 4:18 pm
compy286
It’s true there’s no good enforcement mechanism at the moment, but on the other hand, any origin/license policy for contributed code already wasn’t 100% easily enforceable.
Log in to Reply

2026-05-12 4:43 pm
Alfman verbose=1
compy286,
It’s true there’s no good enforcement mechanism at the moment, but on the other hand, any origin/license policy for contributed code already wasn’t 100% easily enforceable.
That’s true, someone could take GPL code from another project and submit it as BSD, but upon investigation there would be ample evidence to show this happened. But LLM works are transformed, and there likely won’t be verbatim copies to show what happened in an investigation.
So I believe that in practice what this will mean is that LLM content will not be detectable and investigations will be forced to rely on circumstantial evidence, such as whether the “author” has pro-AI comments on other social media sites. So indirectly, the prohibition on LLM patches could translate into a developer witch hunt across social media. I don’t see a practical way to enforce such policies that doesn’t go in this direction.
Log in to Reply

2026-05-12 7:32 pm
Morgan
From what I’ve read on the mailing lists, Theo’s position about it is almost completely one of licensing, not just code quality. He doesn’t want the project polluted with copyleft licensed code, period, and has said as much. That does not create a predicament, it is enforcing the existing licensing requirements.
Log in to Reply

2026-05-12 9:10 pm
Alfman verbose=1
Morgan,
That does not create a predicament, it is enforcing the existing licensing requirements.
To be clear the predicament I am talking about isn’t existing code that can be shown to have been previously published under a copyleft license. It’s newly expressed code that has no historical links. It may have been generated by AI or not.
Log in to Reply

2026-05-12 3:47 pm
jmc
FWIW, NetBSD has banned AI Code:
https://hackaday.com/2024/05/18/netbsd-bans-ai-generated-code-from-commits/
Log in to Reply
2026-05-12 6:48 pm
bfr99
Ten years hence discussions like these will seem as relevant as discussing how many angels can dance on a pinhead. Soon humans will interact daily more often with AI than their fellows. Furthermore most code will be written by machines and be of higher quality than that written manually. Legal issues can be complex but there is no shortage of expensive lawyers and compliant politicians to resolve them.
Log in to Reply
2026-05-13 4:47 pm
Squizzler
Now that Linux has made its bed with Rust as an alternative to the C programming language it mainly uses, I wonder if BSD will adopt Zig as its own successor systems language? The connection with the topic is that Zig abandoned Github due its policy on so-called AI. The other benefit over Rust (why Linux should have waited!) is that Zig is intended to be interoperable with C.
Log in to Reply

About The Author

Thom Holwerda

38 Comments

Leave a Reply Cancel reply