Programming AI Open Source

Does Generative AI Threaten the Open Source Ecosystem? (zdnet.com) 47

"Snippets of proprietary or copyleft reciprocal code can enter AI-generated outputs, contaminating codebases with material that developers can't realistically audit or license properly."

That's the warning from Sean O'Brien, who founded the Yale Privacy Lab at Yale Law School. ZDNet reports: Open software has always counted on its code being regularly replenished. As part of the process of using it, users modify it to improve it. They add features and help to guarantee usability across generations of technology. At the same time, users improve security and patch holes that might put everyone at risk. But O'Brien says, "When generative AI systems ingest thousands of FOSS projects and regurgitate fragments without any provenance, the cycle of reciprocity collapses. The generated snippet appears originless, stripped of its license, author, and context." This means the developer downstream can't meaningfully comply with reciprocal licensing terms because the output cuts the human link between coder and code. Even if an engineer suspects that a block of AI-generated code originated under an open source license, there's no feasible way to identify the source project. The training data has been abstracted into billions of statistical weights, the legal equivalent of a black hole.

The result is what O'Brien calls "license amnesia." He says, "Code floats free of its social contract and developers can't give back because they don't know where to send their contributions...."

"Once AI training sets subsume the collective work of decades of open collaboration, the global commons idea, substantiated into repos and code all over the world, risks becoming a nonrenewable resource, mined and never replenished," says O'Brien. "The damage isn't limited to legal uncertainty. If FOSS projects can't rely upon the energy and labor of contributors to help them fix and improve their code, let alone patch security issues, fundamentally important components of the software the world relies upon are at risk."

O'Brien says, "The commons was never just about free code. It was about freedom to build together." That freedom, and the critical infrastructure that underlies almost all of modern society, is at risk because attribution, ownership, and reciprocity are blurred when AIs siphon up everything on the Internet and launder it (the analogy of money laundering is apt), so that all that code's provenance is obscured.


Comments Filter:
  • by alvinrod ( 889928 ) on Sunday October 26, 2025 @04:40PM (#65751954)
    I'm not sure it will really matter. With these kinds of tools, the only closed source will be code on servers secured enough not to leak it. It's already possible to reverse engineer a binary, and LLMs will do a better job of converting binaries back into high-level code snippets that other programmers will then use. No one will be able to prove anything, and courts will never be able to keep up with it all even if someone were inclined to threaten legal action.

    The future of software is effectively open source whether anyone wants it to be or not.
    • AI is going to replace coding. Maybe not in the next 5 years, but certainly within the next 20. Users will either write UML or a flowchart/decision tree, and the AI will generate the code, test it, refine it, and publish it in under an hour. This is no different from people learning to code from a book of sample routines or libraries.

      • by allo ( 1728082 ) on Sunday October 26, 2025 @06:48PM (#65752170)

        That's the same promise every "no code" framework made. What is the problem with it? If you write a specification detailed enough to explain what you really want, it gets very, very long. If you create a concise version of it with an efficient notation ... you end up inventing a programming language syntax.

        What AI can do:
        - Create me a Tetris clone
        - Create me a random game

        What AI cannot do by itself:
        - Create the game I have in mind

        The random game approach has the difficulty that randomness and creativity are still a problem in LLMs, but I think that's something that will be solved in the next few years. If not, you can replace it by adding some half-baked human idea to the input to get the LLM to produce something that doesn't depend only on a random seed.

        • Re: (Score:2, Interesting)

          by allo ( 1728082 )

          Another word on creativity: I do not believe in divine inspiration. I think humans have the same components as AI models: a trained brain, some kind of randomness, and their input. But humans have a lot more input. If you suddenly feel inspired, it is a combination of the inputs of the last days being processed by your brain and triggered by a recent input. But we're talking about terabytes of input, while for example an LLM gets less than a megabyte of input in each evaluation (and usually does not

        • by unrtst ( 777550 )

          What AI cannot do by itself:
          - Create the game I have in mind

          100% this. We've already made it as easy as possible to create things with varying levels of specification and work. Being a really good autocomplete, LLMs can help fill in the gaps in the specification input (e.g. you don't have to tell it that a password field should use the password input type), but they also do math waaaaay worse than Excel.

        • by tlhIngan ( 30335 ) <slashdot AT worf DOT net> on Monday October 27, 2025 @04:34AM (#65752774)

          The problem is, do we want to replace coders?

          RAD tools evolved in the 90s where anyone and everyone could create programs. One became especially popular (Visual Basic), and has a reputation not because it was a particularly bad tool, but because it was used and abused to the point where a manager created a tool that evolved into a giant mess that's now a critical business application.

          No one wants to touch it, you must click things in a certain order, and looking at it funny will cause it to crash and corrupt itself and all the backups. And the byzantine logic and spaghetti code means it's impossible to figure out why it's doing it or how to fix it.

          Vibe coded apps are going to be the next big thing in critical business applications, and it's going to be yet more fun with poorly coded applications. The only good thing is that vibe coding has just as much chance of working as it has of destroying the program it was trying to create, when somewhere among the many revisions the AI decides to wipe out the core logic of the program.

          • by allo ( 1728082 )

            I think you ask two questions.
            1) Do we want to replace coders?
            2) Do we want everyone to be a coder?

            The first one is basically a no. People fear being replaced and some companies try to sell solutions claiming they can replace humans, but they can't and won't.

            The second one is just a question of how much it is our business. If everyone can be a coder, everyone can have their try.
            In LLM spaces you see these vibe coded projects, with a nice start, a horrible LLM-generated README, and a structure that looks over

        • What AI cannot do by itself:
          - Create the game I have in mind

          It's the same with the picture or animation you have in mind. As a creative person you see it as a limitation, or even a showstopper for the technology, that it cannot do the work exactly as you imagine. But for the manager/CEO/customer of the creative service it's not that: they either don't have the exact image/game/etc. in mind, or they won't get it exactly as they imagined anyway. They come with their description and get what the artists or programmers give them, then they maybe

          • by allo ( 1728082 )

            I mostly use AI tools iteratively. I don't believe much in people creating the "perfect" prompt. You start with a few prompts to get a rough base like you imagined, then work iteratively on small parts of the image to adapt them to your vision. The same goes for code. For text ... I must say I have yet to see an LLM that doesn't suck for prose, and I am not enough of an author to really fix it up. For technical writing it can help with some cleanup afterward, but for general texts it has way too many repetitive

      • > Open software has always counted on its code being regularly replenished. As part of the process of using it, users
        > modify it to improve it. They add features and help to guarantee usability across generations of technology.

        This still remains to be seen for the wider open source ecosystem once the first generation of open source developers has stopped writing code for 20 years (say, around 2055).

        An interesting study would be to track Python packages and their downloads per year for the thousands of

      • I have to agree. I used to use Code Project to find the things I needed, then adapt them to my needs. Programming, as much as the business people hate it, has always been a collaborative effort. And also a cutthroat one. Bill Gates said it best: don't cover your eyes, plagiarize. The only problem I see is that it means the end of the (cowboy, super, 10x, whatever) elite software engineer.

  • Folk Music (Score:5, Interesting)

    by RossCWilliams ( 5513152 ) on Sunday October 26, 2025 @05:01PM (#65752004)

    How is this any different from what has happened to folk music? It has been mined and used over and over again to create new music with no real acknowledgement of its provenance. There have been no significant additions or modifications to folk music in the modern era.

    The owners of AI are going to try to claim ownership of its output even though that output relies on mining the commons. They will have effectively claimed a monopoly on humanity's common ownership. It's either the end of intellectual property or the end of new human creativity.

    • by 0123456 ( 636235 )

      One big difference is that since this is just algorithmic modification of a database of code you could track every piece of generated code back to the code it's based on. A single line of generated code may be based on dozens of chunks of original human-written code all with different licenses.

      However, if the AI developers talked to a lawyer first they probably got permission from all of those developers to use their code and it doesn't matter. Of course many of those developers may not care.
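
      For illustration of the parent's point about tracing generated code back to its sources, here is a minimal Python sketch of token n-gram matching (the corpus format, the 6-token window, and all names are assumptions for illustration; real attribution would need far more than this):

          from collections import defaultdict

          def ngrams(tokens, n=6):
              # Sliding window of n consecutive tokens.
              return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

          def build_index(corpus):
              # corpus: {filename: source text}; map each n-gram to the files containing it.
              index = defaultdict(set)
              for name, text in corpus.items():
                  for gram in ngrams(text.split()):
                      index[gram].add(name)
              return index

          def candidate_sources(snippet, index):
              # Rank corpus files by how many n-grams they share with the snippet.
              hits = defaultdict(int)
              for gram in ngrams(snippet.split()):
                  for name in index.get(gram, ()):
                      hits[name] += 1
              return sorted(hits.items(), key=lambda kv: -kv[1])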

      • Which makes me wonder: why can't we have AI models trained solely on code under various open-source licenses, with the option to query the model that matches the particular licenses compatible with your open-source project? Seems like that would be an *advantage* to using generative AI, which open source could leverage over commercial code. A minimal sketch of the license-filtering step such a setup would need is below.
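
        The sketch (the SPDX names and the compatibility table are simplified assumptions, not legal advice):

            # Which project licenses may incorporate code under a given source license.
            COMPATIBLE_WITH = {
                "MIT": {"MIT", "BSD-2-Clause", "Apache-2.0", "GPL-3.0-only"},
                "Apache-2.0": {"Apache-2.0", "GPL-3.0-only"},
                "GPL-3.0-only": {"GPL-3.0-only"},
            }

            def select_training_files(files, target_license):
                # files: iterable of (path, spdx_license) pairs.
                # Keep only files whose license permits use under the target license.
                return [path for path, spdx in files
                        if target_license in COMPATIBLE_WITH.get(spdx, set())]
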
  • I don't think so (Score:5, Interesting)

    by vadim_t ( 324782 ) on Sunday October 26, 2025 @05:04PM (#65752006) Homepage

    I don't think it does, at least not currently.

    AI currently doesn't generate whole big projects, just smaller snippets of code. You can't just go "Make me a non-GPL VLC" in VSCode. You can have AI write smaller things, like "Create a skeleton for a Wayland program", but in such usages it's not all that different from copying stuff from Stack Overflow and random snippets from Google.

    I'd say in general anything where one would worry about licensing is too large for AI yet.

    If we do get to the point where we can just have a LLM spit out a full video decoding library that actually works, then it's fair to say that we're living in the future and any concern about licensing is probably obsolete. If AI gets to that point it's probably now able to do projects of almost unlimited size and the world is being turned upside down.

    • by allo ( 1728082 )

      Did any of you *ever* see a program acknowledging the CC license of code from Stack Overflow? I have never seen it. And I think it doesn't matter, because most snippets are too short to be protected. Yeah, you didn't have the idea (otherwise you would not have needed SO), but the final snippet is usually either trivial or you have to heavily adapt it for your purpose.

      And you can be sure you don't want to see how often commercial code uses MIT/BSDL code (in principle legally) but forgets to acknowledge the

  • by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Sunday October 26, 2025 @05:09PM (#65752016) Homepage Journal

    I know I do, but I mean more specifically, do enough people seek out OSS to keep it around? I go looking for OSS solutions both to save money of course but also to be able to have the code so that I can update it if it breaks later. I have successfully done this several times despite not being much of a programmer, getting hints by googling compiler errors.

    • Same here. I had an older Ten-Tec HF receiver that was computer controlled through a serial port. The software was made with older CRT monitors in mind, and on newer LED screens with higher resolutions you could not read the fonts because they were so small and blurry. So I got the source code, found the entries for the fonts, changed them to TTF, and after a couple of adjustments the software was usable again.
    • I go looking for OSS solutions both to save money of course but also to be able to have the code so that I can update it if it breaks later.

      Reason 3: I know it will be around in perpetuity. There's never going to be a rug-pull leaving me with a bunch of stuff I can't access.

    • Maybe it's not logical, but I look for OSS when I have a choice because I assume that someone else has already looked at the code. The fact that the code is public seems to imply that it is less likely to do secretive and nefarious things.
  • This is idiotic. (Score:2, Interesting)

    by Lendrick ( 314723 )

    AI doesn't "siphon up and launder" code any more than your brain does. Both *can* memorize (which is why sometimes comedians can accidentally steal a joke), but both are also learning from patterns.

    AI isn't going to make people stop contributing to open source.

    • by unrtst ( 777550 )

      I like your rant, so to continue... People aren't going to stop contributing and fixing bugs because of this. If anything, LLMs will greatly assist open source bugfixing, because they can be used to help identify bugs and can then help apply an open source solution to each codebase without the user necessarily knowing the codebase well (i.e. leaning on the LLM for that).

      The same premise the author is leaning on will also lead to more open source use and contributions.

      That said, I do question

  • by SubmergedInTech ( 7710960 ) on Sunday October 26, 2025 @05:14PM (#65752022)

    I've implemented linked list traversal f*ck-all-knows how many times over the last 40 years, in a dozen languages. I'm sure similar or identical code exists in hundreds of open source repositories. And millions of CS homework assignments over the decades.

    If you compare my code with enough projects, I'm sure you'll find matches. Not because I copied them or Stack Overflow (I was coding long before that was a thing). But because there are really only a few sane ways to implement most algorithms. Which is also why most software patents are stupid, but that's a different can of worms...
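
    To make the point concrete, here is the obvious singly linked list traversal (a toy Python sketch; the names are arbitrary). There are only so many sane ways to write it, so near-identical code is bound to exist in countless repositories and homework submissions:

        class Node:
            def __init__(self, value, next=None):
                self.value = value
                self.next = next

        def traverse(head):
            # Walk the chain of next pointers, yielding each value in order.
            node = head
            while node is not None:
                yield node.value
                node = node.next

        # list(traverse(Node(1, Node(2, Node(3))))) == [1, 2, 3]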

  • Whoever trained the LLM is the one that knowingly stole the IP. If you end up in court over something then be sure to bill OpenAI for the experience.

  • Complete fallacy (Score:4, Insightful)

    by topham ( 32406 ) on Sunday October 26, 2025 @05:27PM (#65752036) Homepage

    First, the assumption that a snippet of code is actually copyrightable is generally untrue.

    Secondly, open source exists to help people learn to code; wholesale copying is frowned upon, but snippets or concepts aren't.

    Third, unless it negatively impacts the project itself, anything short of wholesale copying is likely to be ignored, as it is when developers do that today.

    AI doesn't change this equation very much.

    • Re:Complete fallacy (Score:4, Interesting)

      by Todd Knarr ( 15451 ) on Sunday October 26, 2025 @11:44PM (#65752570) Homepage

      Thanks to Google LLC v. Oracle America, Inc., 593 U.S. 1 (2021), even relatively small pieces of code (such as function declarations in header files) must be considered copyrightable. It's possible they aren't, but the appeals in that case resulted in rulings that they were copyrightable, and the SC decision in favor of Google turned on fair use, not whether the code in question was copyrighted or not, so it can't really be used to stand for the proposition that the appeals courts got it wrong.

      With AI-generated snippets, it's going to turn on whether the snippet is close enough to identical to the original code to be considered a copy and whether that copying could constitute fair use. I think any lawyer would tell you that's not the kind of thing you want to bet on in court. If the code's simple enough that it clearly wouldn't be a copyright violation even if it were nigh-identical, it's simple enough you're better off not using AI and having your engineers write the code themselves, and if it's significant enough that that's not feasible then it's almost certainly copyrightable and the fair-use argument is going to be an uphill battle for something that significant. Either way, you're better off avoiding anything where you don't know the provenance of every line of your code.

  • ... by hiding the source code, hides both its dependency on open source and on other less reputable sources.

  • by MindPrison ( 864299 ) on Sunday October 26, 2025 @05:57PM (#65752082) Journal

    This is basically why everyone is panicking about AI "stealing" everything.

    Well, yes, it "steals" your words, but not word for word, not image for image, not code snippet for code snippet; it just doesn't work that way.
    The way I understand it, it's more stochastic in nature: it will pick a meaning out of an image (a curve, a circle, a ball), and the same goes for code, as with language translation and interpretation.

    You could in fact compare it to an analytical translator that is capable of taking a set of words or code (like the meaning of commands and how they are used) and putting them together in a way that is predictable.

    So no, AI doesn't "steal" in the traditional sense of just cutting and pasting code, words, or images. It can, however, interpret meanings according to a rule set and thus predict the desired outcome.

    The short story is: it's kind of how we learned throughout history too. We learn to speak, we learn to talk, we learn to sing, we learn to code, we learn to create. We don't steal like we mean it; we come up with inspired works based on what exists and what we have learned. You see a couple on the beach and you paint it, but someone else can paint it too, and it will be based on what they interpret, not the fact that you both painted the same "idea". The idea of a house on the beach is not copyrighted; the design, however, is. The idea of making a piece of software that drives you to the store isn't copyrighted; the exact algorithm for how it does that, however, is.

    AI doesn't steal these, but it can learn what you wish for and stitch something together into a new framework that resembles what you wish for, not what exists.
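
    As a toy illustration of that stochastic prediction (the vocabulary and weights here are made up): the model holds a probability distribution over possible next tokens and samples from it, rather than pasting stored text.

        import random

        # Assumed toy distribution over next tokens after some prompt.
        next_token_probs = {"ball": 0.5, "curve": 0.3, "circle": 0.2}

        def sample_next(probs):
            # Draw one token at random, weighted by its probability.
            tokens, weights = zip(*probs.items())
            return random.choices(tokens, weights=weights, k=1)[0]

        print(sample_next(next_token_probs))  # e.g. "ball"; varies per run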

    • by gweihir ( 88907 )

      So no, AI doesn't "steal" in the traditional sense of just cutting and pasting code, words, or images.

      Actually, it can do that too, just not reliably.

  • by gweihir ( 88907 )

    But idiots who think they can barge in with no skill, just because AI "helps" them, may well do so.

  • It's because a lot of those projects aren't from obvious sources; they are from people doing school projects or something to build a portfolio for applying for a job.

    Generative AI is going to rip through the programming market; it is already devouring junior programming jobs.

    So all those people who wrote lots of useful code because they were in college for computer science or doing portfolio work for future jobs are going to go away.

    That means that generative AI eventually won't have anything to train on.
  • Not because of stealing or use, IMHO. It will be mostly because of the sheer amount of low-quality open source or freeware products out there with no maintenance, and people's increasing ability to generate bespoke, fit-for-"purpose" code. I think large projects have the potential to benefit with the right gatekeeping and processes.
  • What O'Brien says is bullshit. He seems to fundamentally misunderstand how free software works. If I use a program and I find a bug in it, I'm gonna send in a patch to fix it. I don't care how many times LLMs have scraped the code. Just because an LLM scanned the code, that doesn't mean I can't or won't read it.

    "The developers downstream", who incorporate other people's code into their own, never had any intention of contributing. This was true years before the emergence of LLMs (think StackOverflow).

    • > What O'Brien says is bullshit.
      Maybe I'm cynical, but every time I see AI in the topic, all I see is an attempt to catch crumbs of attention attracted by AI marketing budgets but dropped by the AI companies.

      In that context, it doesn't matter if you're making barking sounds as long as you can get enough attention. Just check YouTube for kids to see what I mean, or pretty much anywhere involving marketing.

  • > the legal equivalent of a black hole
    How can we get more legal black holes? Surely a good thing for society?

  • I don't think genAI is a threat to the open-source ecosystem as far as its copying of FOSS code goes. The people looking for that kind of code wouldn't be looking at the source code of FOSS projects anyway. The threat, if any, will be from genAI code being contributed back to FOSS projects. Aside from provenance issues, it tends to be low-quality and buggy, and will just increase the workload for FOSS maintainers without offering anything useful. Witness genAI offering a suggestion to a bugfix submission:

  • The GPL requires software systems that 1) make use of GPL licensed code and 2) distribute such code as part of a larger commingled software project, to 3) publish their own full source code whenever a distributee asks for it.

    AI training on GPL code == commingling through weights and model evaluation architecture

    AI outputting code to a user == distributing code which is part of the AI project's internal architecture (weights + model architecture)

    AI outputting actual GPL snippets == AI is making essent

  • At the time when IBM started to embrace open source software, they had a major issue with the copyleft principle. The fear was that any GPL code that got into their proprietary code would make their private code base a derived work and thus subject to the GPL license.

    IBM's answer at the time was to separate the developers who work with open source software from those who work on proprietary code, to prevent open source from getting into the proprietary code base.

    Now, with AI having widespread acc

  • Code scanning tools like Black Duck exist to avoid licence compliance risk. AI-generated code that is the same as, or too close to, publicly available code will be rejected and cannot be committed.
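
    As a generic sketch of what such a gate does (this is not Black Duck's actual API; the normalization and threshold are assumptions): fingerprint the staged code and reject the commit if too many lines match an index of known public code.

        import hashlib

        def fingerprints(source):
            # One hash per non-empty line, with all whitespace stripped out.
            for line in source.splitlines():
                line = "".join(line.split())
                if line:
                    yield hashlib.sha256(line.encode()).hexdigest()

        def too_close(staged_code, known_index, threshold=0.8):
            # known_index: set of fingerprints of publicly available code.
            prints = list(fingerprints(staged_code))
            if not prints:
                return False
            matches = sum(1 for p in prints if p in known_index)
            return matches / len(prints) >= threshold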
