Programming AI Microsoft Open Source

Mixed Reactions to GitHub's AI-Powered Pair Programmer 'Copilot' (github.blog) 39

Reactions are starting to come in for GitHub's new Copilot coding tool, which one site calls "a product of the partnership between Microsoft and AI research and deployment company OpenAI — which Microsoft invested $1 billion into two years ago." According to the tech preview page: GitHub Copilot is currently only available as a Visual Studio Code extension. It works wherever Visual Studio Code works — on your machine or in the cloud on GitHub Codespaces. And it's fast enough to use as you type. "Copilot looks like a potentially fantastic learning tool — for developers of all abilities," said James Governor, an analyst at RedMonk. "It can remove barriers to entry. It can help with learning new languages, and for folks working on polyglot codebases. It arguably continues GitHub's rich heritage as a world-class learning tool. It's early days but AI-assisted programming is going to be a thing, and where better to start experiencing it than GitHub...?"

The issue of scale is a concern for GitHub, according to the tech preview FAQ: "If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future. We want to use the preview to learn how people use GitHub Copilot and what it takes to operate it at scale." GitHub spent the last year working closely with OpenAI to build Copilot. GitHub developers, along with some users inside Microsoft, have been using it every day internally for months.

[Guillermo Rauch, founder and CEO of developer software provider Vercel and creator of Next.js] cited in a tweet a statement from the Copilot tech preview FAQ page: "GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before."

To that, Rauch simply typed: "The future."

Rauch's post is relevant because one of the knocks against Copilot is the concern that it will generate code identical to code released under open source licenses that don't allow derivative works, and that a developer will then use that code unknowingly...

GitHub CEO Nat Friedman has responded to those concerns, according to another article, arguing that training an AI system constitutes fair use: Friedman is not alone — a couple of actual lawyers and experts in intellectual property law took up the issue and, at least in their preliminary analysis, tended to agree with Friedman... [U.K. solicitor] Neil Brown examines the idea from an English law perspective and, while he's not so sure about the idea of "fair use" if the idea is taken outside of the U.S., he points simply to GitHub's terms of service as evidence enough that the company can likely do what it's doing. Brown points to passage D4, which grants GitHub "the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time." "The license is broadly worded, and I'm confident that there is scope for argument, but if it turns out that Github does not require a license for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory license grant in its terms covers this as against the uploader," writes Brown. Overall, though, Brown says that he has "more questions than answers."
Armin Ronacher, the creator of the Flask web framework for Python, shared an interesting example on Twitter (which apparently came from the game Quake III Arena) in which Copilot apparently reproduces a chunk of code including not only its original comment ("what the fuck?") but also its original copyright notice.
This discussion has been archived. No new comments can be posted.

  • Problematic. (Score:5, Interesting)

    by Gravis Zero ( 934156 ) on Saturday July 03, 2021 @12:39PM (#61547448)

    copyright does not only cover copying and pasting; it covers derivative works. github copilot was trained on open source code and the sum total of everything it knows was drawn from that code. there is no possible interpretation of "derivative" that does not include this

    This is exactly right which means that Microsoft is in violation of the most individual licenses of any entity on this planet.

    • Re:Problematic. (Score:5, Interesting)

      by fahrbot-bot ( 874524 ) on Saturday July 03, 2021 @12:48PM (#61547474)

      In addition, as TFS notes (below), the "service" is likely hosted remotely meaning your code and what you type will be transmitted over the Internet and processed by GitHub. I can't imagine any companies I've worked for that would want their product (code) shared like this. Even if they would ultimately share the final product, or parts of it, this exposes the development process and (partial) product in an uncontrolled manner. I'm going to flag this "Do Not Want".

      "The issue of scale is a concern for GitHub" ...

      ... GitHub's terms of service as evidence enough that the company can likely do what it's doing. Brown points to passage D4, which grants GitHub "the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time."

    • If an individual can draw from the well, so can a corporation, and so can any entity we mandate as such in the future. The difference is what can be done with that resource once taken. A license, a copyright, a trademark... they are designed to cover the instance of a work, not what that work will go into as a resource, particularly if the instance creator has drawn from the same well....
    • github copilot was trained on open source code and the sum total of everything it knows was drawn from that code....This is exactly right

      Think carefully about what you are saying here: any software developer who has looked at ANY open source code is tainted, and all works they produce must be considered "derivative" because the programmer learned from others. That would mean every developer on Earth.

      Beyond that, of course all developers who have ever read a design patterns book are engaged in copyright infri

      • Cute but you presume that humans learn how to do things in similar fashion to neural networks, which they do not. Neural networks simulate how synapses function but the similarities end there.

        • Cute but you presume that humans learn how to do things in similar fashion to neural networks

          No, I do not.

          Humans also learn in different ways. Which human-preferred learning methods are legal and which are not?

      • by Junta ( 36770 )

        In the copilot demos, I see one of two scenarios.

        One where it effectively translates small chunks of comments to little bits of code. For example:
        "Skip lines that begin with #" becomes "if line.startswith('#'): continue" It's a lifted template snippet, but so trivially short that no one would ever be able to claim it. This is fine, but also not that useful.

        Then another sort of demo, where they say something like 'int get_hash_value(const char* key) {' and then it just spews out a boilerplate function. That

    • by gweihir ( 88907 )

      MS built its whole fortune on stealing and copying (badly). No surprise they continue that.

  • Licensing (Score:4, Insightful)

    by darkain ( 749283 ) on Saturday July 03, 2021 @12:42PM (#61547462) Homepage

    Copilot data set was trained with a mix of GPL and GPL-incompatible licenses, and could very easily be described as a "derivative work".

    They've also admitted that in several cases, direct copy-paste results from training data show up as the "suggested" results.

    The entire training set is a licensing NIGHTMARE. Microsoft may have just committed the single largest intellectual property licensing violation.

    Until this stands the test of the courts, it is nowhere near worth the risk of taint to anyone's codebase to even look at Copilot for now. The risk is simply far too high.

    • So they would have to distribute it under the GPL, but they don't distribute it: it is a service. No GPL violation here.
      • Yeah, but who cares about Microsoft's end. The part where the user of this service could end up with GPL code (or others) copied verbatim into their codebase is absolutely fatal.

        • Why is nobody considering the possibility of using a bloom filter to skip those snippets that are similar to GPL? It's a straightforward application.
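
The Bloom filter idea in the comment above could look roughly like the following. This is a minimal sketch, not anything GitHub is known to do: the filter size, the whitespace normalization, and the names `BloomFilter`, `normalize`, and `should_skip` are all assumptions made for illustration. Note that a Bloom filter only answers exact-membership questions, so it would catch verbatim (or trivially reformatted) copies, not code that is merely "similar".

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def normalize(snippet: str) -> str:
    """Collapse whitespace so trivial reformatting does not hide a match (assumption)."""
    return " ".join(snippet.split())


# Hypothetical set of snippets known to be under a restrictive license.
known_licensed = BloomFilter()
known_licensed.add(normalize("int answer(void) { return 42; }"))


def should_skip(suggestion: str) -> bool:
    """True if the suggestion probably matches a known licensed snippet."""
    return normalize(suggestion) in known_licensed
```

A suggestion engine could call `should_skip` on each candidate before showing it. Catching near-duplicates rather than exact copies would need a different technique, such as comparing overlapping token windows.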
      • by Junta ( 36770 )

        It would be a GPL violation. The 'service' gives you code, and if that code is clearly lifted from GPL, and given to a user without any attribution or license terms, then that would violate the GPL.

        Same for BSD-style license, offering up the 'learned' functions without attribution.

        Public domain code and 2-3 line snippets are about all that would be credibly safe.

  • by cjonslashdot ( 904508 ) on Saturday July 03, 2021 @01:09PM (#61547530)
    Real programming doesn't happen at the keyboard. It happens in the mind. The AI system does not know what you are trying to accomplish and so at best it can suggest common constructs like it has seen before perhaps guided by the structure of a method. But again it does not know your intentions, and the real challenge in programming is creating complex patterns that you visualize in your mind. So unless the AI system is the entity that comes up with those patterns in the first place, it will not really be able to help in a significant way.
    • by Anonymous Coward

      ROFL
      You mean like all those devs who rely on sites like Stackoverflow for all their code.

      Yep, that is the future of coding... Not
      I'm so glad that I don't use any MS products when writing code even though the end result is for Windows.
      SatNad can keep his sticky fingers off my code.

  • by Bookwyrm ( 3535 ) on Saturday July 03, 2021 @01:15PM (#61547546)

    I am curious whether, in the example referred to, where the system apparently reproduced an entire chunk of code with comment and copyright notice, the system was actually cutting and pasting, or whether it has simply 'learned' that those text items were 'supposed' to be there from processing other code.

    In either case, if it is not actually applying any understanding of the code, then this is a glorified, automated, cut-and-paste coding system -- which means if the source material is poisoned with errors, security holes, or backdoors, then the system is just going to cut-and-paste the problems into what is generated.

    • You got that right, there are no guarantees on the code Copilot generates. It does not validate its output, and can't do that.
  • by phantomfive ( 622387 ) on Saturday July 03, 2021 @01:23PM (#61547574) Journal

    In an example I saw, it saved a developer time by filling in some boiler plate code for logging in to Twitter, then finally it creates a new Tweet with an image.

    But this boiler-plate code should be abstracted out into a function of its own by any competent developer. This is the problem: if you spend a lot of your time writing boiler-plate code, you are doing programming wrong.

    • by Junta ( 36770 )

      To that point, if it learns and mimics the function developed in a library, but does not make your code a call to said maintained library, it would suck when Twitter changes the API: the project that *would* update to provide the same function is out of the loop, and your code becomes broken.
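
The point made in this subthread, that repeated boilerplate belongs in one maintained function rather than being regenerated inline, can be sketched as follows. Everything here is hypothetical: `SocialClient`, `post_with_image`, and `FakeClient` are illustrative names and do not correspond to any real Twitter library's API.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SocialClient(Protocol):
    """Hypothetical client interface; not any real Twitter library's API."""

    def login(self, token: str) -> None: ...
    def upload_image(self, path: str) -> str: ...
    def post(self, text: str, media_id: str) -> str: ...


def post_with_image(client: SocialClient, token: str, text: str, image_path: str) -> str:
    """The login/upload/post boilerplate, written once and reused by every caller."""
    client.login(token)
    media_id = client.upload_image(image_path)
    return client.post(text, media_id)


@dataclass
class FakeClient:
    """In-memory stand-in so the sketch runs without a network connection."""

    calls: list = field(default_factory=list)

    def login(self, token: str) -> None:
        self.calls.append("login")

    def upload_image(self, path: str) -> str:
        self.calls.append("upload")
        return f"media:{path}"

    def post(self, text: str, media_id: str) -> str:
        self.calls.append("post")
        return f"posted {text!r} with {media_id}"
```

If the remote API changes, only the client implementation behind `post_with_image` needs updating, whereas inlined copies of the same steps would each have to be found and fixed by hand.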

  • by Hentes ( 2461350 ) on Saturday July 03, 2021 @03:09PM (#61547886)

    So this tool basically automates the process of looking up stuff on Stackoverflow and copying them into your project without understanding what they do. I hope that I never have to work on code written using it.

    • So this tool basically automates the process of looking up stuff on Stackoverflow and copying them into your project without understanding what they do.

      And even more than with StackOverflow, you still need to look up the relevant documentation to make sure the code is actually doing what you want, because there is no guarantee it will even produce syntactically correct code (at least, GPT-3 doesn't, maybe they've improved it in some way).
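
As a rough illustration of the syntax concern above, a generated Python snippet can at least be checked for parseability with the standard library's `ast.parse` before it is accepted. This is only a sketch with an assumed helper name (`is_valid_python`); passing the check says nothing about whether the code does what you want, which still requires reading the documentation.

```python
import ast


def is_valid_python(snippet: str) -> bool:
    """Return True if the snippet parses as Python source.

    This checks syntax only; it does not run the code or verify its behavior.
    """
    try:
        ast.parse(snippet)
    except SyntaxError:
        return False
    return True
```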

    • by gweihir ( 88907 )

      Indeed. Basically automated incompetence and stupidity. Of course, the "coders" that operate on this level like it. The rest recoils in horror.

  • If you need this tool, you're probably not experienced enough to use it effectively.

    If you're experienced enough to get good results out of the tool, you probably don't need it.

  • it won't be too long before you can sit down and just describe a general app and have the machine build it for you, turning most "programming" jobs into minor stylesheet tweaking.

    • by Chozabu ( 974192 )
      Hopefully. Don't higher-level programming languages bring us closer to this?

      Compare punch cards or even assembly to python, you can just about give an overview of what you want done - and it happens.

      If we get to a stage where programming jobs are really minor stylesheet tweaking, we should be able to write programs to tweak those stylesheets.

      Perhaps one day soon, we can say "Program, improve your own code - make yourself smarter", and it will.
    • This reminds me of a task I did last week. The webapp I'm working on mandates a company-wide design system which includes CSS and some general custom $JS_FRAMEWORK UI components. It also uses a pretty widely used JS UI component, which works OK until, for some unknown reason, some aspect of the layout or presentation is screwed up. To fix that, instead of simply changing some parameter on the high-level public API, one has to reverse-engineer the undocumented arcane internals and hack together a workaround in a w
    • by gweihir ( 88907 )

      Hardly. Software created on this "cretin" skill level becomes very, very expensive if used for anything real. Far more expensive than software created by people with a clue and experience, even if they are much more expensive per hour. It is as if _still_ nobody managing software development has read "The Mythical Man-Month".

    • That's exactly what programming is. You sit down, describe an app, and the machine builds it for you. People have been looking for ways to write programming languages or provide development environments that save the programmer's time since day 1. And yet people still program in C with VIM and Emacs. There is a reason for this. Programming is very hard for even very talented human beings, and right now I would say automating it is still unimaginable. At least it is for me.
  • For so many years I've witnessed people here complaining about copyrights, and now that the equivalent of bittorrent for code is here nobody is defending it. Funny thing
  • “People have to remember, however, that it is entirely unable to write creative code. Creativity — for now — is still in the hands of humans. So, is using Copilot pair programming? Only if you don’t mind that one of the two programmers isn’t creative.”
    Jason Bloomberg, an analyst at Intellyx

    So, this is pretty much advanced auto-complete.

    It does make me shudder if anyone entirely leans on this tool, rather than fully learning how to code.
    If it is treated as "for learning" a

  • Why is there not a site where everyone could post a snippet (or algorithm) in any given programming language, filed under a subject with an explanation of the code, so that peers could vote for the best solution to each underlying problem? One could then have an IDE plugin to search those snippets and insert them. The Copilot AI is most likely using GitHub like that; I just don't understand why you need an AI to do it.
  • Yes it may infringe copyright BUT if it does who gets sued?

    Which also begs the question: if original code is generated using Copilot, does the latter have any claim to any of the IP as each individual in a pair-programming partnership would?

  • Sounds like AI assisted copyright infringement that crowd sources code to auto complete algorithms you aren't really familiar with
  • To be honest, my initial concern is the quality of code the algorithm is going to suggest. Public github repos represent the full gamut, from the most elegant solutions created by the best teams to the most naive, cumbersome code created by students in their first coding class. Trained systems are only as good as the data they're trained on, and so far I haven't seen any documentation on how they selected the code to use for training. If they used everything at their disposal, you may get code that uses
  • This is cool, and I'd like to believe that it's just querying StackOverflow and providing an answer. Slowly, with tools like GPT-3, even jobs that traditionally could not be replaced can now be replaced or aided by machines.
