Microsoft's GitHub Copilot Sued Over 'Software Piracy on an Unprecedented Scale' (itpro.co.uk)
"Microsoft's GitHub Copilot is being sued in a class action lawsuit that claims the AI product is committing software piracy on an unprecedented scale," reports IT Pro.
Programmer/designer Matthew Butterick filed the case Thursday in San Francisco, saying it was on behalf of millions of GitHub users potentially affected by the $10-a-month Copilot service: The lawsuit seeks to challenge the legality of GitHub Copilot, as well as OpenAI Codex which powers the AI tool, and has been filed against GitHub, its owner Microsoft, and OpenAI.... "By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more), we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licences on GitHub," said Butterick.
These licences include a set of 11 popular open source licences that all require attribution of the author's name and copyright. This includes the MIT licence, the GNU General Public Licence, and the Apache licence. The case claimed that Copilot violates and removes these licences offered by thousands, possibly millions, of software developers, and is therefore committing software piracy on an unprecedented scale.
Copilot, which is entirely run on Microsoft Azure, often simply reproduces code that can be traced back to open-source repositories or licensees, according to the lawsuit. The code never contains attributions to the underlying authors, which is in violation of the licences. "It is not fair, permitted, or justified. On the contrary, Copilot's goal is to replace a huge swath of open source by taking it and keeping it inside a GitHub-controlled paywall...." Moreover, the case stated that the defendants have also violated GitHub's own terms of service and privacy policies, the DMCA code 1202 which forbids the removal of copyright-management information, and the California Consumer Privacy Act.
The lawsuit also accuses GitHub of monetizing code from open source programmers, "despite GitHub's pledge never to do so."
And Butterick argued to IT Pro that "AI systems are not exempt from the law... If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still." Butterick believes AI can only elevate humanity if it's "fair and ethical for everyone. If it's not... it will just become another way for the privileged few to profit from the work of the many."
Reached for comment, GitHub pointed IT Pro to its announcement Monday that next year, suggested code fragments will come with the ability to identify when they match, or are similar to, other publicly-available code.
The article adds that this lawsuit "comes at a time when Microsoft is looking at developing Copilot technology for use in similar programmes for other job categories, like office work, cyber security, or video game design, according to a Bloomberg report."
Train ANN on MicroSoft Office+LibreOffice sources (Score:5, Insightful)
Re: (Score:2)
libreoffice already exists and it isn't any competition for them. i don't think ms would give a damn unless there were a clearly juicy lawsuit to be won. the money they make from office doesn't come from their actual code as much as from their marketing, bundling and platforming.
as a developer myself i wouldn't touch copilot with a meter long pole, but claiming code assist snippets constitute a reasonable case of copyright infringement is just plain disingenuous, trying to exploit one of those gray areas wh
Re: (Score:1)
libreoffice already exists and it isn't any competition for them.
Microsoft would behave differently if it weren't lurking there. And MS office is gradually becoming less relevant as more and more processes are moved to some kind of database store (often web-interfaced to make it cross-platform) instead of being implemented by moving around doc and xls files. Everyone has figured out that is unsustainable.
Re: (Score:2)
it's ok, matthew, we will see ... :'D
Re: (Score:2)
a bit too vehement and on the gross side but ... very well put, sir.
now, coming back to this thread and looking at the moderation ... i would say /. audience so far falls in one of 2 categories: 1) would nail m$ for no matter what or 2) copyright snowflake.
this isn't good news indeed, but lets just watch the circus spin!
Re: (Score:1)
If the code is neither special nor unique let Microsoft and Copilot write it themselves.
It's true that there are only so many ways to solve certain programming problems and if Microsoft chooses to churn out common code snippets to this end on their own they can probably do whatever with them even if their implementations are neither new nor unique. If, however, they choose to utilise licensed code in their process they are obligated to follow the license even if it is the most vanilla code in the world.
If
Re: (Score:3)
The real question is what fragments of code that currently exist, code written from experience and insight would fail the purity test put forth by the co-pilot adversaries? If one deeply exam
Re: (Score:2)
Please. No one gives a shit about your code.
Apparently enough people do to pay $10 per month for co-pilot. And apparently a large company like Microsoft cares enough to scan all the code uploaded to GitHub for their co-pilot.
Re: (Score:2)
Let me remind you that copyright only applies to creative expressions. Purely functional expressions are not copyrightable.
haha no (Score:5, Insightful)
Oh yeah? Let's see what they actually said about that:
That's not how anything works. Github is delivering code to the user without licensing compliance. Every single time they hand you some code without considering the license the code came from, they are violating the terms of the license. The source can only legally be distributed with the license attached. It doesn't matter if that is in a tarball, or in your IDE.
Re: haha no (Score:1)
But how many times can someone use the argument "my email appears more times in the mailing lists...so their code must be an infringement..." before someone says enough?
Re: haha no (Score:1)
GitHub Copilot does nothing more than search for code snippets and fill them in. If you let it, it will generate the associated comments, bugs and other errors or poor programming practices from some random software repository. That is a huge problem, since most code available online is buggy and follows very bad practice.
I've typed in a comment prompt like import pymongo and then a comment # Connect to database, and it starts generating the text to a Python tutorial from a blog about MySQL. It will use p
Re: (Score:2)
I can see where if you let it generate enough code, it probably is a copyright violation, the problem is then whether code should be copyrightable at all or as we used to argue here on Slashdot, most code is just writing math equations and since a machine can generate it, doesn't qualify as copyrightable.
That is not and never has been how copyright works. It doesn't matter if a machine generates it or not. If you were operating the machine, you can file for copyright on the generated works. The U.S. Copyright Office has only rejected copyright registration for works created by software without prompting, where registration was sought in the name of the software. By the same token, software operated by a person can violate copyright. And ultimately, the courts will not look kindly on software which runs around violating copyrights wit
Re: (Score:2)
Stick with me for a sec, because it'll sound like I'm defending Copilot, but I'm really not.
Github is delivering code to the user without licensing compliance. Every single time they hand you some code without considering the license the code came from, they are violating the terms of the license.
You're assuming Apache, GNU, etc. are the only licenses in play. What GitHub has argued elsewhere is that their Terms of Service explicitly grant them a separate license to every single line of code uploaded to their service. And the general consensus among legal circles seems to be that they're in the clear in that regard. You can find a few references to such things in their TOS: https://docs.github.com/en/sit... [github.com]
He
Isn't this how new programmers learn too? (Score:3, Interesting)
Re: (Score:2)
There are fundamental differences between how a human being assimilates and comprehends a new concept and some software that makes a highly connected data set.
Re: (Score:1)
Re: (Score:2)
You're the one asserting that an autoregressive language model [wikipedia.org] is equivalent to how humans learn. I'm suggesting that without proof that they are the same that they must be different. I can prove this by counterexample. GPT-3 can't learn to tie a knot. It might be able to learn to solve a crossword puzzle, it probably can't describe the weather from a photograph.
Re: (Score:1)
Re: (Score:3)
I'm saying that the process is externally the same,
It's not. A student doesn't read a billion lines of code from thousands of different projects. At least part of the time a student spends on theory and abstract concepts. For example when you learn to program you are presented with a simplified model of the grammar and some rules.
When you just feed one of these specialized systems a bunch of data, it is intentionally given only minimal rules up front so it can determine from the data what model might fit best.
the result is externally the same,
I suspect the results aren't on the system. Do
Re: (Score:2)
s/on the system/the same/
(I reworded my response a few times and didn't properly finish before submitting, sorry)
Re: (Score:1)
Re: (Score:2)
Feel free to conduct a scientific experiment where you teach students the same way you train an AI. Good luck. We have centuries of philosophy on effective ways of teaching that doesn't include showing terabytes of example data to students.
We can provide testable hypotheses about learning without any understanding of the internal mechanisms. And people have done so since long before you and I were born. We don't have to blaze any new trails to assert that two processes are different.
I do understand the desire to not to
Re: (Score:1)
Re: (Score:3)
Re: (Score:1)
Re: (Score:3)
Man, I’m going to get tired typing this.
Purely functional expressions are not copyrightable.
Re: (Score:3)
I haven't read the lawsuit but I'd guess there's a lawsuit because someone made a system for scraping code from github, then changing a few things to obscure its origin, then passing it off as their own. If someone did that in college, they'd probably be accused by the professor of plagiarism.
I gave a buddy of mine a paper back in college to pass off as his own. He was on academic probation, slacked off on a class he needed to pass. I told him just that once as I wasn't going to supply him with all my old p
Re: (Score:1)
Re: (Score:2)
If you learn to copy a piece of code verbatim and type that out, then that is still a copy of the code. It does not matter if it passed your brain first.
There's a big difference between humans learning and "AI" (neural networks) doing "deep learning" (=buzzword) here. When humans read code, they interpret the code and its comments and create an understanding of its workings and the intent behind the code. So when a human typically types some piece of code they already knew, then they type it from that under
Re: (Score:2)
Re: (Score:2)
Neural networks don't have understanding.
We have no idea if that's true. We don't really know what "understanding" means. We have very little idea what's going on in your brain when you appear to "understand" something. We know how neurons work, but how do billions of them work together to represent and process information? Very little idea.
And we also have very little idea how these huge language models work. We tune a hundred billion parameters to minimize a loss function, and somehow behavior emerges that seems to reflect knowledge and und
Re: (Score:2)
>>Neural networks don't have understanding.
>We have no idea if that's true.
Yes, we do. Because ordinary feed-forward neural networks are NOT Turing complete. They can be replaced with a *lookup table*. Get that, a LOOKUP TABLE.
If that counts as "understanding" to you then I don't know what to say, you are beyond my help.
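To make the lookup-table claim above concrete, here is a minimal sketch, assuming a toy feed-forward network with hand-picked weights and a finite binary input domain. The names tiny_network and table are hypothetical and have nothing to do with Copilot or Codex. It only shows that a fixed feed-forward function over a finite set of inputs can be tabulated; it says nothing about whether tabulation is practical for large models.

```python
from itertools import product


def step(z: float) -> int:
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def tiny_network(x1: int, x2: int) -> int:
    """Hypothetical 2-2-1 feed-forward net with hand-picked weights (computes XOR)."""
    h1 = step(x1 + x2 - 0.5)   # hidden unit acting as OR
    h2 = step(x1 + x2 - 1.5)   # hidden unit acting as AND
    return step(h1 - h2 - 0.5)  # output: OR and not AND


# Enumerate every possible input once and store the network's answer.
table = {(a, b): tiny_network(a, b) for a, b in product((0, 1), repeat=2)}

# From here on, the table gives the same answers as the network.
for inputs, output in table.items():
    print(inputs, "->", output)
```

The table needs one entry per possible input, so its size grows exponentially with the number of input bits.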
Re: (Score:2)
You're making two false/unjustified assumptions. First, that "understanding" requires Turing completeness. Second, that language models are feed forward networks. In fact they're applied iteratively, exactly like a Turing machine performing one read/write operation at a time.
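The "applied iteratively" point can be sketched as follows. This is a toy under stated assumptions: next_token is a hypothetical stand-in for a single forward pass (here just a fixed transition table, not a real language model), and generate is a hypothetical loop that feeds each output back in as context for the next step.

```python
def next_token(context: tuple) -> str:
    """Stand-in for one stateless forward pass: maps the context to one token.

    Assumption: a fixed transition table keyed on the last token only.
    """
    transitions = {
        "<s>": "for",
        "for": "i",
        "i": "in",
        "in": "range",
        "range": "</s>",
    }
    return transitions[context[-1]]


def generate(max_steps: int = 10) -> list:
    """Apply the stateless step repeatedly, appending each output to the context."""
    tokens = ["<s>"]
    for _ in range(max_steps):
        tok = next_token(tuple(tokens))
        tokens.append(tok)  # the output becomes part of the next input
        if tok == "</s>":
            break
    return tokens


print(generate())
```

Each call to next_token is a pure function of its input; the sequence as a whole comes from the loop around it.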
Re: Isn't this how new programmers learn too? (Score:1)
By that logic all rocks have 'understanding' plus an IQ of 160. Until you can conclusively prove they don't.
No AI has any 'understanding' at present or in the near future. We will reach there when we have 1st understood "completely" how animal & human brains & minds work and then replicate in software.
Every beginner thinks they have figured out some skill within the first few months before realizing it needs 10 years.
Understanding is if your autonomous driving software can learn to play the violin o
Re: (Score:2)
By that logic all rocks have 'understanding' plus an IQ of 160. Until you can conclusively prove they don't.
Actually, that's the logic you just used. I said the opposite.
Me: "We don't know what understanding means. We don't understand how the brain works or how neural networks work. Until we do, we have no idea whether neural networks understand or not."
You: "I'm completely certain neural networks don't have understanding. I know exactly what understanding means, and I *know* the brain's mechanism is the only possible way to produce it. Until computers work *exactly* the same way the brain does, they can't h
Re: (Score:3, Insightful)
Your comments are correct, but I think the lawsuit is probably also correct. This illustrates the insanity of current copyright law. Well, one aspect of it.
Re: (Score:1)
Re: Isn't this how new programmers learn too? (Score:2)
And staying in context here... that is just your understanding of it.
Re: (Score:2)
The problem is that it is not a defense that MS can easily make. Saying "so what, that's what everybody does" is not a legal defense. It is essentially an admission of guilt.
Re: (Score:1)
Re: Isn't this how new programmers learn too? (Score:1)
GitHub Copilot does more than that, given similar prompts, it will generate full-on copied code. I work with rather obscure file types and obviously you end up doing similar things for different purposes, and I needed it to iterate over the metadata of a file, so I commented I needed to loop over the thing and it generated my own loop written a long time ago for another project, pretty much verbatim but with the variable names I was using in this project. I was simply shocked that it was capable of so quick
Re: (Score:2)
Purely functional expressions are not copyrightable.
Who Actually Thought MS Would Be Honest? (Score:3, Interesting)
Re: Who Actually Thought MS Would Be Honest? (Score:2)
Re: Who Actually Thought MS Would Be Honest? (Score:2)
Re: (Score:2)
Then everybody else stole the concept of a central operating system (the OS) from MS.
Microsoft has never originated anything. Not MS-DOS, not Windows, not Microsoft Office, not X-Box, not Cloud Computing, not anything. It was run by an immoral marketing guy willing to do anything and everything, legal or illegal, to dominate the desktop.
Central operating systems predate Microsoft by many years.
Re: (Score:2)
Then everybody else stole the concept of a central operating system (the OS) from MS.
And in actual reality, MS stole that too. Seriously.
Re: (Score:2)
Then everybody else stole the concept of a central operating system (the OS) from MS.
What do you think you mean here? What do you allege is copied from what?
Glad to see this (Score:2)
I am glad to see this, anything to rein in MS and I hope they succeed. Curious if FSF and/or GNU will join that suit since they first sounded the alarms about co-pilot.
.
I was toying with moving away from github, but the dumb things I have there are of no commercial value to anyone. So I decided to wait to see if the Lawyers got involved. So heading to get popcorn and see how this plays out.
Moving away from GitHub (Score:4, Interesting)
Somewhat OT, but I recently did move my projects off GitHub because of CoPilot. If you're looking for GitHub alternatives, I recommend:
I use all three of the above options, so every push pushes my code to Salsa, Codeberg and my self-hosted gitea instance. I did look at setting up a self-hosted GitLab instance, but Gitea is much easier to set up and way lighter.
Note that if your code is open-source, you can't stop someone else from mirroring a git repo onto GitHub, but if more developers switch away from GitHub, its network effect becomes less attractive. Free yourself from GitHub today!
Perfect Reaction (Score:2)
The perfect reaction Microsoft can do to even the threat of this lawsuit is to release the model as open source. That would be letting the cat out of the bag. Is it worth holding a $10/user/mo service when the lawsuit can be immense? Or just throw it out there, then sell professional services against it? (Maybe they could sell fine-tuning the model after everyone forgets about this lawsuit.)
Re: (Score:2)
Actually that probably does not clear anything.
Many open source licenses are not compatible with one another. So unless they only generated a model out of code from a single license, or from a set of compatible licenses, they are probably still in violation of the licenses, and of copyright laws.
It also does not solve the issue that the code produced by copilot would need to be licensed with a compatible license. Which would make the tool unusable for many of the users they are trying to sell it to.
Re: (Score:2)
Purely functional expressions are not copyrightable. Only creative expressions are copyrightable.
Re: (Score:2)
The perfect reaction Microsoft can do to even the threat of this lawsuit is to release the model as open source.
That would have no effect on past infringement. It is possible the plaintiff would accept it and drop the suit, but it would not be grounds for dismissal. You can't get out of a suit for damages for past actions by not doing it any more in the future.
Slashdotters and copyright (Score:2, Insightful)
Re: Slashdotters and copyright (Score:3)
I think most /.ers don't object so much to the idea, as they do to the fact that MS is *selling* this. I have public repos on GitHub containing code that I am happy to share with my students. I would be pissed if one of my students tried to monetize this by selling it to others. It has a CC license for a reason.
Also, I don't believe Copilot is actually useful to most programmers. I know some students use it. Have they learned what they were supposed to? No, because they did not solve the exercises themselv
Re: Slashdotters and copyright (Score:2)
Re: (Score:2)
And those major corps will not succeed, not if their business depends on the quality of the software they write anyway.
Companies that didn't fire developers and replace them with "AI" will take their place.
Re: Slashdotters and copyright (Score:1)
Re: (Score:2)
I think that it's quite a reasonable service for experienced developers, who can judge the quality of the code. It could save coding time of simple functions.
I agree that in general going over several Stack Overflow answers and choosing what code to copy is a better way to go if you're not totally familiar with the subject. Copilot would probably be a better service if it offered several options and the explanations that came with them. Though that could plunge it even deeper into the copyright violation h
Re: Slashdotters and copyright (Score:1)
Re: (Score:2)
I think the concern is over open source licenses, which have copyright as the enforcement mechanism. If this were about a company selling an AI service that scanned creative commons licensed books and output them into new stories, I think you'd see the same concern.
Re: (Score:1)
I agree: lots of people complaining here who probably had no problem with AI-powered reuse when they were playing with Stable Diffusion.
Furthermore, it's not clear to me that any copyright violation has occurred. U.S. copyright law allows for transformative use, and the transformation from raw code to an ML model seems quite valid. The potential for the model to reproduce existing code (to the standards that constitute infringement) must be proven for the lawsuit to have any merit.
Re: (Score:2)
That argument is somewhat weakened by the fact that copilot can spit out entire files with code and comments as an output.
Re: (Score:2)
I don't think we can summarize slashdot as a single archetype. But we may be able to have a few types that summarize the typical slashdotter.
In general, I think we would be better off without copyright. Both in movies and music, no copyright, and also in code. But since we are going to have copyright, then it should be enforced across the board.
And it would be great if software had to be provided in source form.
What's the big deal? (Score:3, Insightful)
It's not as if Microsoft would have paid for the software.
"If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still."
As opposed to you regarding the law and paying for all the music and movies you have, right?
It's always amusing to watch the mental leaps people will go through to justify it's okay for them to take someone else's work without compensation, but it's a travesty when it's done to them.
Re: (Score:2)
You know what the word Whataboutism means, do you?
Re: (Score:2)
Sure do, and in this case it's a perfect example of hypocrisy.
Nothing is being lost in this case. The original code is still there for anyone to use. Sound familiar?
Re: (Score:2)
The big, big, big difference is that somebody is charging money for access to other people's code that is protected by open source licenses.
Does that clear it up for you?
If I had money to waste on wagers, I'd bet on this lawsuit succeeding.
Re: (Score:2)
The big, big, big difference is that somebody is charging money for access to other people's code that is protected by open source licenses.
AFAIK there's no issue with that. If you can convince someone to pay you for a copy of open source software, you can sell it, as long as you comply with the terms of the license. The allegation is that Microsoft is not complying with the license.
Re: (Score:2)
That's the problem, Microsoft isn't passing the license information onwards. It is such a potential minefield.
A simple one is license incompatibility - after all, BSD is incompatible with GPL, GPLv2 is incompatible with GPLv3 (note, GPLv2+ is compatible with GPLv3 b
Re: What's the big deal? (Score:2)
I don't know for sure but I doubt Copilot has any awareness of license terms at all, so if this lawsuit goes badly for MS the whole project could get scrapped.
Re: What's the big deal? (Score:1)
But the question in front of the court will be, does this person bringing the lawsuit have standing, if so, it needs to prove the class exists and is somehow damaged by the inclusion of its source in other projects. The majority of open source projects does not care about licensing violations, the majority of code is not 'profitable' in any way, shape or form.
Re: (Score:2)
Sure do, and in this case it's a perfect example of hypocrisy.
If you assume one exists without proving it, lump groups of people together ignoring nuance in saying similar things, and act like a group of people acts like some solitary unit.
Which IMO only shows you don't know how a contradiction works.
Re: (Score:2)
And what is the source where you've seen Matthew Butterick advocate disregarding copyright law with regard to music and movies? I actually took a minute to google him and, on the first page of hits at least, found nothing to support your assertion.
Re: (Score:1)
Hey - a fun-house mirror moment! I had the same thought for the exact opposite reason.
There are some folks who have made headway with crowd-funding or band-camp like services, and it looks like that model can sometimes work out.
But the corporate lobbyists who hijacked democracy while weeping crocodile tears "on behalf of the poor creators" are just large scale predators fighting to keep the small scale competition from taking a nibble. .. or are you predicting that Microsoft is about to declare mea culpa a
Re: (Score:1)
My AI device (Score:2)
My AI device is a xerox machine.
Needs to be jail time (Score:3)
Re: (Score:2)
Does this meet the standard for criminal copyright infringement?
Re: (Score:2)
That's an interesting call considering precisely no one at this point is even sure if a crime has been committed here. Frankly I find the entire claim to be on very shaky ground. It's not like co-pilot delivers fully functioning and feature complete code. It'll be interesting if this all gets thrown out under fair use.
At best the most solid claim being made is that Github's ToS is being violated. But it's kind of hard to violate the ToS of a service you yourself own...
Maybe put more thought into what is goi
The Pirate Bay... (Score:2)
Where's Groklaw. (Score:1)
Sounds like SCO all over again...
Bigger question (Score:2)
Since the code is created by an AI, is it copyrightable?
The case of the monkey that took a selfie (https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute), has some interesting takes on non-human creations; basic answer is no.
Re: Bigger question (Score:1)
But the code isn't generated by an AI, the "AI" copy/pastes code from other projects, without attribution.
The question is then whether copyright protections apply to 'public' code. Based on legal history, Linus isn't willing to enforce his copyright nor does the majority of developers, thinking it is too costly or onerous. By not defending their copyright, most Linux code is now public domain. I think the same can be said about most code on StackOverflow and GitHu
Can't copyright AI generated art (Score:1)
Bad copyright (Score:2)