Mixed Reactions to GitHub's AI-Powered Pair Programmer 'Copilot' (github.blog) 39
Reactions are starting to come in for GitHub's new Copilot coding tool, which one site calls "a product of the partnership between Microsoft and AI research and deployment company OpenAI — which Microsoft invested $1 billion into two years ago."
According to the tech preview page: GitHub Copilot is currently only available as a Visual Studio Code extension. It works wherever Visual Studio Code works — on your machine or in the cloud on GitHub Codespaces. And it's fast enough to use as you type. "Copilot looks like a potentially fantastic learning tool — for developers of all abilities," said James Governor, an analyst at RedMonk. "It can remove barriers to entry. It can help with learning new languages, and for folks working on polyglot codebases. It arguably continues GitHub's rich heritage as a world-class learning tool. It's early days but AI-assisted programming is going to be a thing, and where better to start experiencing it than GitHub...?"
The issue of scale is a concern for GitHub, according to the tech preview FAQ: "If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future. We want to use the preview to learn how people use GitHub Copilot and what it takes to operate it at scale." GitHub spent the last year working closely with OpenAI to build Copilot. GitHub developers, along with some users inside Microsoft, have been using it every day internally for months.
Guillermo Rauch, founder and CEO of developer software provider Vercel and creator of Next.js, cited in a tweet a statement from the Copilot tech preview FAQ page: "GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before."
To that, Rauch simply typed: "The future."
Rauch's post is relevant because one of the knocks against Copilot is the concern that it will generate code identical to code released under open source licenses that don't allow derivative works, code that a developer will then use unknowingly...
GitHub CEO Nat Friedman has responded to those concerns, according to another article, arguing that training an AI system constitutes fair use: Friedman is not alone — a couple of actual lawyers and experts in intellectual property law took up the issue and, at least in their preliminary analysis, tended to agree with Friedman... [U.K. solicitor] Neil Brown examines the idea from an English law perspective and, while he's not so sure about the idea of "fair use" if the idea is taken outside of the U.S., he points simply to GitHub's terms of service as evidence enough that the company can likely do what it's doing. Brown points to passage D4, which grants GitHub "the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time." "The license is broadly worded, and I'm confident that there is scope for argument, but if it turns out that Github does not require a license for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory license grant in its terms covers this as against the uploader," writes Brown. Overall, though, Brown says that he has "more questions than answers."
Armin Ronacher, the creator of the Flask web framework for Python, shared an interesting example on Twitter (which apparently came from the game Quake III Arena) in which Copilot apparently reproduces a chunk of code including not only its original comment ("what the fuck?") but also its original copyright notice.
Problematic. (Score:5, Interesting)
copyright does not only cover copying and pasting; it covers derivative works. github copilot was trained on open source code and the sum total of everything it knows was drawn from that code. there is no possible interpretation of "derivative" that does not include this
This is exactly right, which means that Microsoft is in violation of more individual licenses than any other entity on this planet.
Re:Problematic. (Score:5, Interesting)
In addition, as TFS notes (below), the "service" is likely hosted remotely meaning your code and what you type will be transmitted over the Internet and processed by GitHub. I can't imagine any companies I've worked for that would want their product (code) shared like this. Even if they would ultimately share the final product, or parts of it, this exposes the development process and (partial) product in an uncontrolled manner. I'm going to flag this "Do Not Want".
"The issue of scale is a concern for GitHub" ...
Re: (Score:1)
Think that through... (Score:1)
github copilot was trained on open source code and the sum total of everything it knows was drawn from that code....This is exactly right
Think carefully about what you are saying here: any software developer who has looked at ANY open source code is tainted, and all works they produce must be considered "derivative" because the programmer learned from others. That would mean every developer on Earth.
Beyond that, of course, all developers who have ever read a design patterns book are engaged in copyright infringement
Re: (Score:2)
Cute but you presume that humans learn how to do things in similar fashion to neural networks, which they do not. Neural networks simulate how synapses function but the similarities end there.
Re: (Score:1)
Cute but you presume that humans learn how to do things in similar fashion to neural networks
No, I do not.
Humans learn in different ways too; which human learning methods are legal and which are not?
Re: (Score:2)
In the Copilot demos, I see two scenarios.
One where it effectively translates small chunks of comments to little bits of code. For example:
"Skip lines that begin with #" becomes "if line.startswith('#'): continue" It's a lifted template snippet, but so trivially short that no one would ever be able to claim it. This is fine, but also not that useful.
Then another sort of demo, where they say something like 'int get_hash_value(const char* key) {' and then it just spews out a boilerplate function. That
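The first scenario can be made concrete. Below is a hypothetical sketch of the kind of comment-to-code completion described above; the function and file format are invented for illustration, and this is not actual Copilot output.

```python
# Hypothetical sketch of the "comment becomes code" demo described above.
# The config parser and its file format are made up for illustration.

def read_config(path):
    """Parse key=value lines from a config file."""
    settings = {}
    with open(path) as f:
        for line in f:
            # Skip lines that begin with #   <- the developer's comment prompt
            if line.startswith('#'):      # <- the trivial completion on offer
                continue
            key, _, value = line.partition('=')
            if key.strip():
                settings[key.strip()] = value.strip()
    return settings
```

As the commenter notes, a one-line completion like this is too trivially short to carry any copyright claim, but it is also only a modest time-saver.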
Re: (Score:2)
MS built its whole fortune on stealing and copying (badly). No surprise they continue that.
Licensing (Score:4, Insightful)
Copilot was trained on a data set with a mix of GPL and GPL-incompatible licenses, and its output could very easily be described as a "derivative work".
They've also admitted that in several cases, direct copy-paste results from training data show up as the "suggested" results.
The entire training set is a licensing NIGHTMARE. Microsoft may have just committed the single largest intellectual property licensing violation in history.
Until this stands the test of the courts, it is nowhere near worth the risk of taint to anyone's codebase to even look at Copilot for now. The risk is simply far too high.
Re: Licensing (Score:2)
Re: Licensing (Score:3)
Yeah, but who cares about Microsoft's end. The part where the user of this service could end up with GPL code (or others) copied verbatim into their codebase is absolutely fatal.
Re: (Score:2)
Re: (Score:2)
It would be a GPL violation. The 'service' gives you code, and if that code is clearly lifted from GPL, and given to a user without any attribution or license terms, then that would violate the GPL.
Same for BSD-style license, offering up the 'learned' functions without attribution.
Public domain code and 2-3 line snippets are about all that would be credibly safe.
useful in a limited way (Score:3)
Re: (Score:1)
ROFL
You mean like all those devs who rely on sites like Stackoverflow for all their code.
Yep, that is the future of coding... Not
I'm so glad that I don't use any MS products when writing code even though the end result is for Windows.
SatNad can keep his sticky fingers off my code.
An Idiot Savant's Idiot (Score:5, Interesting)
I am curious whether the example referred to, where the system apparently reproduced an entire chunk of code complete with its comment and copyright notice, shows the system actually cutting and pasting, or whether it simply 'learned' that those text items were 'supposed' to be there from processing other code.
In either case, if it is not actually applying any understanding of the code, then this is a glorified, automated, cut-and-paste coding system -- which means if the source material is poisoned with errors, security holes, or backdoors, then the system is just going to cut-and-paste the problems into what is generated.
Re: (Score:2)
It also writes bad code (Score:5, Insightful)
In an example I saw, it saved a developer time by filling in some boilerplate code for logging in to Twitter, then finally creating a new Tweet with an image.
But this boilerplate code should be abstracted out into a function of its own by any competent developer. This is the problem: if you spend a lot of your time writing boilerplate code, you are doing programming wrong.
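A rough sketch of the abstraction the commenter is asking for. The `ApiClient` class and its methods below are hypothetical stand-ins, not a real Twitter SDK:

```python
# Hypothetical illustration: factor repeated login boilerplate into one helper
# instead of letting a suggestion engine paste it at every call site.

class ApiClient:
    """Stand-in for any third-party API client (names are invented)."""
    def __init__(self, key, secret):
        self.key = key
        self.secret = secret
        self.logged_in = False

    def login(self):
        # A real client would perform an OAuth handshake here.
        self.logged_in = True

def get_client(key, secret):
    """Single owner of the login boilerplate; call sites stay one line."""
    client = ApiClient(key, secret)
    client.login()
    return client

# A call site: one line instead of N pasted copies of the login dance.
client = get_client("my-key", "my-secret")
```

If the provider changes its login flow, the fix lives in one function rather than in every file where the boilerplate was pasted.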
Re: (Score:2)
To that point, if it learns and mimics a function developed in a library but doesn't make your code a call to that maintained library, it would suck when Twitter changes the API: the project that *would* have updated to provide the same function is out of the loop, and your code breaks.
Re: (Score:2)
Oh good point, this thing could be a maintainability nightmare.
Automated copypaste programming (Score:3)
So this tool basically automates the process of looking stuff up on Stack Overflow and copying it into your project without understanding what it does. I hope that I never have to work on code written using it.
Re: (Score:2)
So this tool basically automates the process of looking stuff up on Stack Overflow and copying it into your project without understanding what it does.
And even more than with StackOverflow, you still need to look up the relevant documentation to make sure the code is actually doing what you want, because there is no guarantee it will even produce syntactically correct code (at least, GPT-3 doesn't, maybe they've improved it in some way).
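One mechanical piece of that verification can be automated: checking that a suggestion at least parses. A small sketch using Python's standard `ast` module; the broken snippet is an invented example:

```python
import ast

def is_valid_python(snippet):
    """Cheap mechanical check: does a suggested snippet even parse?
    It says nothing about whether the code does what you want."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

# A plausible-looking suggestion with a missing colon fails the check:
broken = "if line.startswith('#') continue"
ok = "for line in f:\n    if line.startswith('#'): continue"
print(is_valid_python(broken))  # False
print(is_valid_python(ok))      # True
```

Of course, a syntax check is the weakest possible gate; the harder work of checking the code against the documentation still falls to the developer.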
Re: (Score:2)
Indeed. Basically automated incompetence and stupidity. Of course, the "coders" that operate on this level like it. The rest recoils in horror.
Who is this aimed at? (Score:2)
If you need this tool, you're probably not experienced enough to use it effectively.
If you're experienced enough to get good results out of the tool, you probably don't need it.
Re: Who is this aimed at? (Score:1)
Maybe it's for people who like code-reviewing bots.
Programmers programming themselves out of a job (Score:2)
It won't be too long before you can sit down, just describe a general app, and have the machine build it for you, turning most "programming" jobs into minor stylesheet tweaking.
Re: (Score:3)
Compare punch cards or even assembly to Python: you can just about give an overview of what you want done, and it happens.
If we get to a stage where programming jobs really are minor stylesheet tweaking, we should be able to write programs to tweak those stylesheets.
Perhaps one day soon, we can say "Program, improve your own code - make yourself smarter", and it will.
Re: Programmers programming themselves out of a jo (Score:3)
Re: (Score:2)
Hardly. Software created at this "cretin" skill level becomes very, very expensive if used for anything real. Far more expensive than software created by people with a clue and experience, even if they are much more expensive per hour. It is as if _still_ nobody managing software development has read "The Mythical Man-Month".
Re: (Score:2)
Now we see the truth (Score:2)
Just for learning, then? or... (Score:2)
So, this is pretty much advanced auto-complete.
It does make me shudder to think of anyone leaning entirely on this tool rather than fully learning how to code.
If it is treated as "for learning" a
Of AI, coding and decisions (Score:1)
Who owns the IP if Copilot is used for original work? (Score:2)
Yes, it may infringe copyright, but if it does, who gets sued?
Which also raises the question: if original code is generated using Copilot, does the latter have any claim to any of the IP, as each individual in a pair-programming partnership would?
AI copy and paste (Score:1)
Code quality (Score:2)
Its just copying from Stackoverflow (Score:1)