Date: 2026-03-05
In the world of open source, relicensing is notoriously difficult. It usually requires the unanimous consent of every person who has ever contributed a line of code, a feat nearly impossible for legacy projects. chardet, a Python character encoding detector used by requests and many others, has sat in that tension for years: as a port of Mozilla’s C++ code it was bound to the LGPL, making it a gray area for corporate users and a headache for its most famous consumer.
Recently the maintainers used Claude Code to rewrite the whole codebase and release v7.0.0, relicensing from LGPL to MIT in the process. The original author, a2mark, saw this as a potential GPL violation.
↫ Tuan-Anh Tran
Everything about this feels like a license violation, and in general a really shit thing to do. At the same time, though, the actual legal situation, what lawyers and judges care about, is entirely unsettled and incredibly unclear. I’ve been reading a ton of takes on what happened here, and it seems nobody has any conclusive answers, with seemingly valid arguments on both sides.
Intuitively, this feels deeply and wholly wrong. This is the license-washing “AI” seems to be designed for, so that proprietary vendors can take code under copyleft licenses, feed it into their “AI” model, and tell it to regurgitate something that looks just different enough so a new, different license can be applied. Tim takes Jim’s homework. How many individual words does Tim need to change – without adding anything to Jim’s work – before it’s no longer plagiarism?
I would argue that no matter how many synonyms and slight sentence structure changes Tim employs, it’s still a plagiarised work.
However, what it feels like to me is entirely irrelevant when laws are involved, and even those laws are effectively irrelevant when so much money is riding on the answers to questions like these. The companies who desperately want this to be possible and legal are so wealthy, so powerful, and have sucked up to the US government so hard, that whatever they say might very well just become law.
“AI” is the single greatest coordinated attack on open source in history, and the open source world would do well to realise that.
If something becomes trivial to write with a tool, there’s no significant effort left to protect from being stolen. Books of logarithm tables were very valuable human work in the past, but a trivial computer program can write one in milliseconds today.
And surely there’s nothing creative in the code produced by LLMs. If any creativity gets in by way of verbatim copying, that’s easy to check.
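To make the “easy to check” claim concrete, here is a minimal sketch in Python, assuming both the original and the rewritten source are available as plain text. The function name `longest_shared_run` and the two sample snippets are hypothetical, made up for illustration. It uses the standard library’s `difflib` to find the longest run of identical consecutive lines. A hit shows textual overlap only; it says nothing about whether a rewrite without literal overlap is a derived work.

```python
from difflib import SequenceMatcher


def longest_shared_run(original: str, rewrite: str) -> str:
    """Return the longest run of consecutive non-blank lines found in both texts.

    Lines are compared after stripping leading and trailing whitespace, so
    re-indented copies still count as identical.
    """
    a = [line.strip() for line in original.splitlines() if line.strip()]
    b = [line.strip() for line in rewrite.splitlines() if line.strip()]
    # autojunk=False disables difflib's heuristic that ignores very common
    # lines, which would otherwise hide matches in long files.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return "\n".join(a[match.a : match.a + match.size])


# Hypothetical snippets, not taken from any real project.
original_code = """
def detect(data):
    total = 0
    for byte in data:
        total += byte
    return total
"""

rewritten_code = """
def guess_encoding(buf):
    total = 0
    for byte in data:
        total += byte
    return total % 256
"""

shared = longest_shared_run(original_code, rewritten_code)
print(shared)
```

For the sample above, the shared run is the three lines of the loop body that survived unchanged. Renamed variables or reordered statements would defeat this check, which is exactly why literal copying is the easy case.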
I’m a fan of knowledge being unshackled, though I fear how it will be misused. It’s a loss for proprietary code and restrictive licenses, but not for code, and means of coding, being democratized. Maybe open source wasn’t the final destination, just an imperfect step along the way. Maybe we won’t keep needing to fight over contracts.
For better or worse ideas can still be patented, and patents will still apply.
Thom Holwerda,
I do not agree with this. AI is merely a tool. Of course tools can be used abusively but AI tools themselves are neither pro nor anti-FOSS.
When you read open source licenses, including the GPL, there is nothing in them that prohibits their use for AI. The GPL is not just implicitly compatible with training AI, it goes so far as to explicitly reject all prohibitions on how the code can be used downstream. The only requirement is that derivative works also be GPL licensed. So the argument should not be that we cannot train AI on open source, because that position violates both the text and spirit of FOSS. IMHO a more solid argument would be that derivative works should themselves be FOSS, including AI works. If one truly believes in the virtues of GPL & FOSS, then this is what supporters should be clamoring for.
The fly in the ointment is that AI works are not eligible for copyright. No copyright => No AI generated open source. All FOSS licenses are predicated on there being a copyright to enforce.
SlothNinja,
This isn’t obvious to me, is there some legally settled case law that I’m not aware of? I’d like to learn more about it, let me know!
Alfman,
There was a recent one about AI generated art.
But… that is bollocks without setting a threshold or a guideline.
Today *all* digital art is basically AI generated.
The brushes you use in Illustrator,
the photographs you “take”,
the filters, smudges, and effects that you use
all use one form or another of AI.
Even the “auto correct” today uses modern versions of BERT (first proper LLM)
I may be wrong, but I believe when Thom says “AI” in the context of this discussion, he is speaking of the entire “AI” industry, particularly the big companies pirating and otherwise consuming all content that exists on the Internet.
When you say “AI is merely a tool”, you conveniently leave out that this tool is built almost entirely from stolen content. I know you and I don’t agree on much when it comes to “AI”, but you have to admit that the tool would be nearly useless if it were to be built solely from properly licensed and willingly offered content. The companies behind the “AI” software know this, or they wouldn’t be illegally slurping up every byte of copyrighted content they can.
An “AI” that is built only on public domain fiction, for example, will only regurgitate classical feeling and sounding results. The fact that they are actually trained on almost 100% pirated content means they can produce modern sounding and feeling narrative works for that would-be author who just can’t seem to write well enough to publish their own books without plagiarizing established, living, modern authors. Now they can plagiarize all they want and when someone calls them on it they can say they were “vibe writing” and get away with it.
When it comes to source code licensing, the issues are obvious and no amount of hand-waving can counter the fact that “AI” (i.e. the evil corporations behind the algorithms, and the rampant IP theft used to feed them) is destroying open source licensing right before our eyes.
Morgan,
My point is that when we’re talking about open source code licenses, it already IS properly licensed and willingly offered. There is no copyright issue in training AI because the licenses expressly permit derivative works. I accept that some dislike AI, but for better or worse FOSS licenses like GPL are compatible with AI. On copyright grounds the criticism is not the training of AI, but that derivative works might not be properly licensed. That’s the only copyright issue I see.
I completely understand that some are vehemently against AI even if it respects copyright. However the fact remains that a lot of FOSS software is being licensed under permissive terms that don’t rule out AI in any legal way. There might need to be new licenses to add AI restrictions.
Fundamentally, the problem is that we’re in the middle of a fight over whether the copyright concept of “derived work” has any meaning in the era of LLMs.
“Field of endeavour” restrictions are forbidden by all three major definitions of free and open source software (FSF Four Freedoms, Debian Free Software Guidelines, Open Source Definition) but they shouldn’t be necessary.
Either the LLM is producing derived works (in which case restrictions on AI are unnecessary and we’re seeing massive worldwide copyright infringement) or it’s not, and everything that’s not protected by trademarks or patents is fair game as long as you launder it through an LLM first, because you weren’t delegated the ability to restrict Fair Use in your licenses. (Which would be a very bad situation, since companies like Disney can still sue you for infringement of trademarked characters, and companies like Amazon can still sue you for infringing their software patents, but pretty much anything you do can be copied willy-nilly.)