Programming AI Microsoft Open Source The Courts

Microsoft's GitHub Copilot Sued Over 'Software Piracy on an Unprecedented Scale' (itpro.co.uk) 97

"Microsoft's GitHub Copilot is being sued in a class action lawsuit that claims the AI product is committing software piracy on an unprecedented scale," reports IT Pro.

Programmer/designer Matthew Butterick filed the case Thursday in San Francisco, saying it was on behalf of millions of GitHub users potentially affected by the $10-a-month Copilot service: The lawsuit seeks to challenge the legality of GitHub Copilot, as well as OpenAI Codex which powers the AI tool, and has been filed against GitHub, its owner Microsoft, and OpenAI.... "By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more), we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licences on GitHub," said Butterick.

These licences include a set of 11 popular open source licences that all require attribution of the author's name and copyright. This includes the MIT licence, the GNU General Public Licence, and the Apache licence. The case claimed that Copilot violates and removes these licences offered by thousands, possibly millions, of software developers, and is therefore committing software piracy on an unprecedented scale.

Copilot, which is entirely run on Microsoft Azure, often simply reproduces code that can be traced back to open-source repositories or licensees, according to the lawsuit. The code never contains attributions to the underlying authors, which is in violation of the licences. "It is not fair, permitted, or justified. On the contrary, Copilot's goal is to replace a huge swath of open source by taking it and keeping it inside a GitHub-controlled paywall...." Moreover, the case stated that the defendants have also violated GitHub's own terms of service and privacy policies, the DMCA code 1202 which forbids the removal of copyright-management information, and the California Consumer Privacy Act.

The lawsuit also accuses GitHub of monetizing code from open source programmers, "despite GitHub's pledge never to do so."

And Butterick argued to IT Pro that "AI systems are not exempt from the law... If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still." Butterick believes AI can only elevate humanity if it's "fair and ethical for everyone. If it's not... it will just become another way for the privileged few to profit from the work of the many."

Reached for comment, GitHub pointed IT Pro to their announcement Monday that next year, suggested code fragments will come with the ability to identify when it matches other publicly-available code — or code that it's similar to.

The article adds that this lawsuit "comes at a time when Microsoft is looking at developing Copilot technology for use in similar programmes for other job categories, like office work, cyber security, or video game design, according to a Bloomberg report."

  • by ffkom ( 3519199 ) on Saturday November 05, 2022 @09:42AM (#63026469)
    ... then let it generate YetAnotherOffice software, then let's see whether Microsoft is still fine with allowing the transition of code pieces through an artificial neural network to remove all copyrights on it.
    • by znrt ( 2424692 )

      libreoffice already exists and it isn't any competition for them. i don't think ms would give a damn unless there were a clearly juicy lawsuit to be won. the money they make from office doesn't come from their actual code as much as from their marketing, bundling and platforming.

      as a developer myself i wouldn't touch copilot with a meter long pole, but claiming code assist snippets constitute a reasonable case of copyright infringement is just plain disingenuous, trying to exploit one of those gray areas wh

      • libreoffice already exists and it isn't any competition for them.

        Microsoft would behave differently if it weren't lurking there. And MS office is gradually becoming less relevant as more and more processes are moved to some kind of database store (often web-interfaced to make it cross-platform) instead of being implemented by moving around doc and xls files. Everyone has figured out that is unsustainable.

    • by pegr ( 46683 )

      Let me remind you that copyright only applies to creative expressions. Purely functional expressions are not copyrightable.

  • haha no (Score:5, Insightful)

    by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Saturday November 05, 2022 @09:45AM (#63026475) Homepage Journal

    Reached for comment, GitHub pointed IT Pro to their announcement Monday that next year, suggested code fragments will come with the ability to identify when it matches other publicly-available code [github.blog] - or code that it's similar to.

    Oh yeah? Let's see what they actually said about that:

    By helping developers understand the community context of their code in a manner that also preserves developer flow, we believe Copilot will continue to deliver responsible innovation and true happiness at the keyboard.

    That's not how anything works. Github is delivering code to the user without licensing compliance. Every single time they hand you some code without considering the license the code came from, they are violating the terms of the license. The source can only legally be distributed with the license attached. It doesn't matter if that is in a tarball, or in your IDE.

    • But how many times can someone use the argument "my email appears more times in the mailing lists...so their code must be an infringement..." before someone says enough?

    • Stick with me for a sec, because it'll sound like I'm defending Copilot, but I'm really not.

      Github is delivering code to the user without licensing compliance. Every single time they hand you some code without considering the license the code came from, they are violating the terms of the license.

      You're assuming Apache, GNU, etc. are the only licenses in play. What GitHub has argued elsewhere is that their Terms of Service explicitly grant them a separate license to every single line of code uploaded to their service. And the general consensus among legal circles seems to be that they're in the clear in that regard. You can find a few references to such things in their TOS: https://docs.github.com/en/sit... [github.com]

      He

  • by zendarva ( 8340223 ) on Saturday November 05, 2022 @09:50AM (#63026479)
    Is anyone who learns to program by reading other people's code in violation of copyright? (No...) If they later type a line that's identical to a line they first read in a licensed piece of software, is that a license violation? (No, probably not unless it's a really unique line of code) Is a tool that does that same thing somehow more in violation of copyright than a person?
    • There are fundamental differences between how a human being assimilates and comprehends a new concept and some software that makes a highly connected data set.

      • Can you please prove this? I suspect there's a Nobel Prize in it for you, if you can show how humans learn.
        • You're the one asserting that an autoregressive language model [wikipedia.org] is equivalent to how humans learn. I'm suggesting that without proof that they are the same that they must be different. I can prove this by counterexample. GPT-3 can't learn to tie a knot. It might be able to learn to solve a crossword puzzle, it probably can't describe the weather from a photograph.

          • I'm actually not asserting that, thanks. :) I'm saying that the process is externally the same, the result is externally the same, and no one has any clue how similar the internal processes are. We think human ones are different and special, because we can't check the source code.
            • I'm saying that the process is externally the same,

              It's not. A student doesn't read a billion lines of code from thousands of different projects. At least part of the time a student spends on theory and abstract concepts. For example when you learn to program you are presented with a simplified model of the grammar and some rules.

              When you just feed one of these specialized systems a bunch of data, it is intentionally given only minimal rules up front so it can determine from the data what model might fit best.

              the result is externally the same,

              I suspect the results aren't on the system. Do

              • s/on the system/the same/

                (I reworded my response a few times and didn't properly finish before submitting, sorry)

              • Do... do you know what "externally" means? We think we have a clue. We can't prove we do.
                • Feel free to conduct a scientific experiment where you teach students the same way you train an AI. Good luck. We have centuries of philosophy on effective ways of teaching that don't include showing terabytes of example data to students.

                  We can provide testable hypotheses about learning without any understanding of the internal mechanisms. And people have done so since long before you and I were born. We don't have to blaze any new trails to assert that two processes are different.

                  I do understand the desire to not to

                  • People can learn to code from reading code. Are you saying otherwise? I can tell you understand the desire not to concede an argument. You're desperate not to.
    • by ffkom ( 3519199 )
      The better analogy here would be an automated translator, first transforming a piece of C Code into Java, then translating it back from Java into C. Do you consider all copyrights gone via this process?
      • Did I say i considered all copyrights gone during the previous process? I explicitly said the opposite. Did you not read the parentheticals?
      • by pegr ( 46683 )

        Man, I’m going to get tired typing this.

        Purely functional expressions are not copyrightable.

    • I haven't read the lawsuit but I'd guess there's a lawsuit because someone made a system for scraping code from github, then changing a few things to obscure its origin, then passing it off as their own. If someone did that in college, they'd probably be accused by the professor of plagiarism.
      I gave a buddy of mine a paper back in college to pass off as his own. He was on academic probation, slacked off on a class he needed to pass. I told him just that once as I wasn't going to supply him with all my old p

      • I mean, this is what all learning is. Take input, mutate it via previous input, and generate output. When i was first learning to program, my code was often very, very, very derivative of previous code i'd seen. Now it tends to be more stylistically mine. This is why i compare it to a regular programmer learning.
    • by Misagon ( 1135 )

      If you learn to copy a piece of code verbatim and type that out, then that is still a copy of the code. It does not matter if it passed your brain first.

      There's a big difference between humans learning and "AI" (neural networks) doing "deep learning" (=buzzword) here. When humans read code, they interpret the code and its comments and create an understanding of its workings and the intent behind the code. So when a human typically types some piece of code they already knew, then they type it from that under

      • The last bit is what humans do too. Input mutated over input produces output. If you've read copyrighted code, it's in your input, to be mutated into your output. The only difference is that you can't look at the code for human thought, so you assume it's special somehow.
      • Neural networks don't have understanding.

        We have no idea if that's true. We don't really know what "understanding" means. We have very little idea what's going on in your brain when you appear to "understand" something. We know how neurons work, but how do billions of them work together to represent and process information? Very little idea.

        And we also have very little idea how these huge language models work. We tune a hundred billion parameters to minimize a loss function, and somehow behavior emerges that seems to reflect knowledge and und

        • by jvkjvk ( 102057 )

          >>Neural networks don't have understanding.
          >We have no idea if that's true.

          Yes, we do. Because ordinary feed-forward neural networks are NOT Turing complete. They can be replaced with a *lookup table*. Get that, a LOOKUP TABLE.

          If that counts as "understanding" to you then I don't know what to say, you are beyond my help.

          • You're making two false/unjustified assumptions. First, that "understanding" requires Turing completeness. Second, that language models are feed forward networks. In fact they're applied iteratively, exactly like a Turing machine performing one read/write operation at a time.
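The lookup-table claim in this exchange can be made concrete with a small sketch. It assumes a fixed feed-forward network over a finite input domain (three binary inputs here); the weights and function names are hypothetical and chosen only for illustration. It shows that such a network can be tabulated exhaustively, and it says nothing about whether that rules out "understanding", or about models that are applied iteratively as the reply points out.

```python
import itertools
import math

# Hypothetical fixed weights for a tiny network: 3 binary inputs,
# 2 hidden units, 1 output unit. Nothing here comes from a real model.
W_HIDDEN = [[0.8, -1.2, 0.5], [-0.4, 0.9, 1.1]]
B_HIDDEN = [0.1, -0.3]
W_OUT = [1.5, -2.0]
B_OUT = 0.2


def sigmoid(x):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


def feed_forward(bits):
    """Run one forward pass and threshold the output to 0 or 1."""
    hidden = [
        sigmoid(sum(w * b for w, b in zip(row, bits)) + bias)
        for row, bias in zip(W_HIDDEN, B_HIDDEN)
    ]
    out = sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)
    return 1 if out >= 0.5 else 0


def build_lookup_table(n_inputs=3):
    """Enumerate every possible input and record the network's answer."""
    return {
        bits: feed_forward(bits)
        for bits in itertools.product((0, 1), repeat=n_inputs)
    }


TABLE = build_lookup_table()
```

The table has 2^n entries for n binary inputs, so the equivalence holds in principle but stops being practical very quickly as the input grows.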

        • By that logic all rocks have 'understanding' plus an IQ of 160. Until you can conclusively prove they don't.

          No AI has any 'understanding' at present or in the near future. We will reach there when we have 1st understood "completely" how animal & human brains & minds work and then replicate in software.
          Every beginner thinks they have figured out some skill within the first few months before realizing it needs 10 years.
          Understanding is if your autonomous driving software can learn to play the violin o

          • By that logic all rocks have 'understanding' plus an IQ of 160. Until you can conclusively prove they don't.

            Actually, that's the logic you just used. I said the opposite.

            Me: "We don't know what understanding means. We don't understand how the brain works or how neural networks work. Until we do, we have no idea whether neural networks understand or not."

            You: "I'm completely certain neural networks don't have understanding. I know exactly what understanding means, and I *know* the brain's mechanism is the only possible way to produce it. Until computers work *exactly* the same way the brain does, they can't h

    • Re: (Score:3, Insightful)

      by HiThere ( 15173 )

      Your comments are correct, but I think the lawsuit is probably also correct. This illustrates the insanity of current copyright law. Well, one aspect of it.

    • by godrik ( 1287354 )

      The problem is that it is not a defense that MS can easily make. Saying "so what, that's what everybody does" is not a legal defense. It is essentially an admission of guilt.

      • Well, you don't frame it that way. You say "That's how learning works", or something similar. I'm not a lawyer, and i'm not gonna pretend to be, heh.
    • GitHub Copilot does more than that, given similar prompts, it will generate full-on copied code. I work with rather obscure file types and obviously you end up doing similar things for different purposes, and I needed it to iterate over the metadata of a file, so I commented I needed to loop over the thing and it generated my own loop written a long time ago for another project, pretty much verbatim but with the variable names I was using in this project. I was simply shocked that it was capable of so quick

    • by pegr ( 46683 )

      Purely functional expressions are not copyrightable.

  • by BrendaEM ( 871664 ) on Saturday November 05, 2022 @10:10AM (#63026499) Homepage
    Did not MS steal disk compression from Stac, video compression from Apple complete with banner, networking from Novell and Lantastic, and Office from Ami Pro?
  • I am glad to see this, anything to rein in MS and I hope they succeed. Curious if FSF and/or GNU will join that suit since they first sounded the alarms about co-pilot.

    .

    I was toying with moving away from github, but the dumb things I have there are of no commercial value to anyone. So I decided to wait to see if the Lawyers got involved. So heading to get popcorn and see how this plays out.

    • by dskoll ( 99328 ) on Saturday November 05, 2022 @03:15PM (#63027071) Homepage

      Somewhat OT, but I recently did move my projects off GitHub because of CoPilot. If you're looking for GitHub alternatives, I recommend:

      • 1. salsa.debian.org [debian.org], a hosted GitLab instance. They'll likely give you an account if your software is open-source.
      • 2. codeberg.org [codeberg.org] a hosted Gitea instance for open-source software.
      • 3. Self-hosted Gitea [gitea.io]

      I use all three of the above options, so every push pushes my code to Salsa, Codeberg and my self-hosted gitea instance. I did look at setting up a self-hosted GitLab instance, but Gitea is much easier to set up and way lighter.

      Note that if your code is open-source, you can't stop someone else from mirroring a git repo onto GitHub, but if more developers switch away from GitHub, its network effect becomes less attractive. Free yourself from GitHub today!
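The "every push goes to three hosts" setup described in the comment above can be sketched as follows. This is one possible way to do it, not necessarily how the commenter does it: git lets a single remote carry several push URLs via `git remote set-url --add --push`. The helper name and the repository URLs are hypothetical placeholders; the function only builds the commands and does not run them.

```python
def multi_push_commands(remote, urls):
    """Build git commands that register several push URLs on one remote.

    Once any push URL is configured, git stops using the fetch URL for
    pushes, so every target (including the original host) must be listed.
    """
    if not urls:
        raise ValueError("at least one push URL is required")
    return [
        ["git", "remote", "set-url", "--add", "--push", remote, url]
        for url in urls
    ]


# Hypothetical repository locations, for illustration only.
EXAMPLE_URLS = [
    "git@salsa.debian.org:example/project.git",
    "https://codeberg.org/example/project.git",
    "git@git.example.org:example/project.git",
]

if __name__ == "__main__":
    for argv in multi_push_commands("origin", EXAMPLE_URLS):
        print(" ".join(argv))
```

Running the printed commands inside a repository should make a plain `git push origin` send the same commits to each listed URL in turn.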

  • The perfect reaction Microsoft can do to even the threat of this lawsuit is to release the model as open source. That would be letting the cat out of the bag. Is it worth holding a $10/user/mo service when the lawsuit can be immense? Or just throw it out there, then sell professional services against it? (Maybe they could sell fine-tuning the model after everyone forgets about this lawsuit.)

    • by godrik ( 1287354 )

      Actually that probably does not clear anything.
      Many open source licenses are not compatible with one another. So unless they only generated a model out of code from a single license, or from a set of compatible licenses, they are probably still in violation of the licenses, and of copyright laws.

      It also does not solve the issue that the code produced by copilot would need to be licensed with a compatible license. Which would make the tool unusable for many of the users they are trying to sell it to.

      • by pegr ( 46683 )

        Purely functional expressions are not copyrightable. Only creative expressions are copyrightable.

    • by nasch ( 598556 )

      The perfect reaction Microsoft can do to even the threat of this lawsuit is to release the model as open source.

      That would have no effect on past infringement. It is possible the plaintiff would accept it and drop the suit, but it would not be grounds for dismissal. You can't get out of a suit for damages for past actions by not doing it any more in the future.

  • Amusing that the only form of copyright this community cares about is software. Every other form is creator greed. But here is something to help amateur programmers and all the neck beards emerge from hibernation with their creative concerns.
    • I think most /.ers don't object so much to the idea, as they do to the fact that MS is *selling* this. I have public repos on GitHub containing code that I am happy to share with my students. I would be pissed if one of my students tried to monetize this by selling it to others. It has a CC license for a reason.

      Also, I don't believe Copilot is actually useful to most programmers. I know some students use it. Have they learned what they were supposed to? No, because they did not solve the exercises themselv

      • Copilot might not be useful, but this theft will be hardened and sold as drag and drop blocks for major corps. The stolen code will be used to replace developers and make Microsoft money.
        • And those major corps will not succeed, not if their business depends on the quality of the software they write anyway.

          Companies that didn't fire developers and replace them with "AI" will take their place.

        • Or... it might just save somebody a ton of time allowing them to focus on higher order bits and build something great. Automation comes for every profession on some level. You did not see cooks protesting when the electric mixer was invented
      • by ET3D ( 1169851 )

        I think that it's quite a reasonable service for experienced developers, who can judge the quality of the code. It could save coding time of simple functions.

        I agree that in general going over several Stack Overflow answers and choosing what code to copy is a better way to go if you're not totally familiar with the subject. Copilot would probably be a better service if it offered several options and the explanations that came with them. Though that could plunge it even deeper into the copyright violation h

      • So if your student invented the next great service but did so using your 4 lines of optimized string compare code in the process you would be pissed? U kinda proved my point. Automation is great when we design it and impacts others but as soon as it touches your life it is evil incarnate
    • by nasch ( 598556 )

      I think the concern is over open source licenses, which have copyright as the enforcement mechanism. If this were about a company selling an AI service that scanned creative commons licensed books and output them into new stories, I think you'd see the same concern.

    • I agree: lots of people complaining here who probably had no problem with AI-powered reuse when they were playing with Stable Diffusion.

      Furthermore, it's not clear to me that any copyright violation has occurred. U.S. copyright law allows for transformative use, and the transformation from raw code to an ML model seems quite valid. The potential for the model to reproduce existing code (to the standards that constitute infringement) must be proven for the lawsuit to have any merit.

      • by godrik ( 1287354 )

        That argument is somewhat weakened by the fact that copilot can spit out entire files with code and comments as an output.

    • by godrik ( 1287354 )

      I don't think we can summarize slashdot as a single archetype. But we may be able to have a few types that summarize the typical slashdotter.

      In general, I think we would be better off without copyright. Both in movies and music, no copyright, and also in codes. But since we are going to have copyright, then it should be enforced across the board.

      And it would be great if software had to be provided in source form.

  • by quonset ( 4839537 ) on Saturday November 05, 2022 @11:25AM (#63026603)

    It's not as if Microsoft would have paid for the software.

    "If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still."

    As opposed to you regarding the law and paying for all the music and movies you have, right?

    It's always amusing to watch the mental leaps people will go through to justify it's okay for them to take someone else's work without compensation, but it's a travesty when it's done to them.

    • by twms2h ( 473383 )

      You know what the word Whataboutism means, do you?

      • Sure do, and in this case it's a perfect example of hypocrisy.

        Nothing is being lost in this case. The original code is still there for anyone to use. Sound familiar?

        • The big, big, big difference is that somebody is charging money for access to other people's code that is protected by open source licenses.

          Does that clear it up for you?

          If I had money to waste on wagers, I'd bet on this lawsuit succeeding.

          • by nasch ( 598556 )

            The big, big, big difference is that somebody is charging money for access to other people's code that is protected by open source licenses.

            AFAIK there's no issue with that. If you can convince someone to pay you for a copy of open source software, you can sell it, as long as you comply with the terms of the license. The allegation is that Microsoft is not complying with the license.

            • by tlhIngan ( 30335 )

              AFAIK there's no issue with that. If you can convince someone to pay you for a copy of open source software, you can sell it, as long as you comply with the terms of the license. The allegation is that Microsoft is not complying with the license.

              That's the problem, Microsoft isn't passing the license information onwards. It is such a potential minefield.

              A simple one is license incompatibility - after all, BSD is incompatible with GPL, GPLv2 is incompatible with GPLv3 (note, GPLv2+ is compatible with GPLv3 b

          • But the question in front of the court will be, does this person bringing the lawsuit have standing, if so, it needs to prove the class exists and is somehow damaged by the inclusion of its source in other projects. The majority of open source projects does not care about licensing violations, the majority of code is not 'profitable' in any way, shape or form.

        • Sure do, and in this case it's a perfect example of hypocrisy.

          If you assume one exists without proving it, lump groups of people together ignoring nuance in saying similar things, and act like a group of people act like some solitary unit.
          Which IMO only shows you don't know how a contradiction works.

    • And what is the source where you've seen Matthew Butterick advocate disregarding copyright law with regard to music and movies? I actually took a minute to google him and, on the first page of hits at least, found nothing to support your assertion.

    • Hey - a fun-house mirror moment! I had the same thought for the exact opposite reason.

      There are some folks who have made headway with crowd-funding or band-camp like services, and it looks like that model can sometimes work out.

      But the corporate lobbyists who hijacked democracy while weeping crocodile tears "on behalf of the poor creators" are just large scale predators fighting to keep the small scale competition from taking a nibble. .. or are you predicting that Microsoft is about to declare mea culpa a

    • The article did point out that according to Bloomberg's report, Copilot technologies were planned to be used in video game design starting in 2023, and this is a concern. After all, the decision of the court can have a bad effect on the industry of creating games. Last month I wrote a video games essay https://paperap.com/free-papers/video-game/ [paperap.com] as part of my student project in which I wanted to review the technologies that will soon be used in game creation and how this will improve the industry. Litigatio
  • My AI device is a xerox machine.

  • by S_Stout ( 2725099 ) on Saturday November 05, 2022 @12:27PM (#63026707)
    Microsoft will pay a fine and that will be it. This is not good enough. The highest level person who authorized this project needs to go to jail. If it is the CEO, believe it or not, jail.
    • by nasch ( 598556 )

      Does this meet the standard for criminal copyright infringement?

    • That's an interesting call considering precisely no one at this point is even sure if a crime has been committed here. Frankly I find the entire claim to be on very shaky ground. It's not like co-pilot delivers fully functioning and feature complete code. It'll be interesting if this all gets thrown out under fair use.

      At best the most solid claim being made is that Github's ToS is being violated. But it's kind of hard to violate the ToS of a service you yourself own...

      Maybe put more thought into what is goi

  • ...is an advanced web-based AI tool that can generate books, films and music just by hearing their title! It's trained on terabytes of user-uploaded data sets. Unlike Microsoft's AI it doesn't require a monthly fee, and it's even able to generate Microsoft software!
  • Sounds like SCO all over again...

  • Since the code is created by an AI, is it copyrightable?
    The case of the monkey that took a selfie (https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute), has some interesting takes on non-human creations; basic answer is no.

    • But the code isn't generated by an AI, the "AI" copy/pastes code from other projects, without attribution.

      The question is then whether copyright protections apply to 'public' code. Based on legal history, Linus isn't willing to enforce his copyright nor does the majority of developers, thinking it is too costly or onerous. By not defending their copyright, most Linux code is now public domain. I think the same can be said about most code on StackOverflow and GitHu

  • It's been proven in court that you can't copyright AI generated art. Shouldn't be able to copyright AI generated code. Therefore shouldn't be able to AI generate copyright infringing code. My 2 cents.
  • Another case of copyright law hindering progress. If an AI who learns from public source code repositories needs to trace the origins of all the algorithms that it then suggests new programmers to properly attribute or compensate all the authors of said code, such a tool can not exist, or if it does, nobody could use it for anything serious out of fear of legal repercussions. Another case for severely limiting or abrogating copyright law entirely.

