Researchers Secretly Deployed A Bot That Submitted Bug-Fixing Pull Requests (medium.com)
An anonymous reader quotes Martin Monperrus, a professor of software at Stockholm's KTH Royal Institute of Technology:
Repairnator is a bot. It constantly monitors software bugs discovered during continuous integration of open-source software and tries to fix them automatically. If it succeeds in synthesizing a valid patch, Repairnator proposes the patch to the human developers, disguised under a fake human identity. To date, Repairnator has been able to produce 5 patches that were accepted by the human developers and permanently merged in the code base...
It analyzes bugs and produces patches, in the same way as human developers involved in software maintenance activities. This idea of a program repair bot is disruptive, because today humans are responsible for fixing bugs. In other words, we are talking about a bot meant to (partially) replace human developers for tedious tasks.... [F]or a patch to be human-competitive 1) the bot has to synthesize the patch faster than the human developer 2) the patch has to be judged good enough by the human developer and permanently merged in the code base.... We believe that Repairnator prefigures a certain future of software development, where bots and humans will smoothly collaborate and even cooperate on software artifacts.
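In rough outline, such a bot's outer loop might look like the minimal Python sketch below (the real Repairnator is a Java system wired into Travis CI; every function here is an illustrative stub, not its actual API):

    import subprocess
    import tempfile

    def failing_builds():
        # A real bot would poll a CI service's API for fresh test
        # failures; this stub yields nothing so the sketch runs harmlessly.
        return []

    def reproduce(repo_url, commit, workdir):
        # Clone the project at the failing commit and re-run its tests.
        # Returns True if the CI failure reproduces locally.
        subprocess.run(["git", "clone", repo_url, workdir], check=True)
        subprocess.run(["git", "-C", workdir, "checkout", commit], check=True)
        return subprocess.run(["mvn", "test"], cwd=workdir).returncode != 0

    def synthesize_patches(workdir):
        # Stub for the actual repair step; Repairnator delegates this to
        # dedicated program-repair tools. Yields candidate patches, if any.
        return []

    def open_pull_request(repo_url, patch):
        # Propose the candidate patch upstream for human review.
        print(f"Would open a PR against {repo_url}:\n{patch}")

    for repo_url, commit in failing_builds():
        with tempfile.TemporaryDirectory() as workdir:
            if not reproduce(repo_url, commit, workdir):
                continue  # most failures never reproduce locally
            for patch in synthesize_patches(workdir):
                open_pull_request(repo_url, patch)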
Their fake identity was a software engineer named Luc Esape, with a profile picture that "looks like a junior developer, eager to make open-source contributions... humans tend to have a priori biases against machines, and are more tolerant of errors if the contribution comes from a human peer. In the context of program repair, this means that developers may put the bar higher on the quality of the patch, if they know that the patch comes from a bot."
The researchers proudly published the approving comments on their merged patches -- although a conundrum arose when Repairnator submitted a patch for Eclipse Ditto, only to be told that "We can only accept pull-requests which come from users who signed the Eclipse Foundation Contributor License Agreement."
"We were puzzled because a bot cannot physically or morally sign a license agreement and is probably not entitled to do so. Who owns the intellectual property and responsibility of a bot contribution: the robot operator, the bot implementer or the repair algorithm designer?"
Who owns a bot's intellectual property? (Score:3, Insightful)
easy one: nobody
Copyright applies to creative works. A machine-produced work is not creative, since any similar machine could and would produce it.
Re:Who owns a bot's intellectual property? (Score:4, Insightful)
This is great... in theory... until it is actually tried in court, and the court possibly rules a different way.
Re: (Score:3)
One could argue that the designer and/or programmer of the algorithm is providing the creative input. If the algorithm is based on machine learning it becomes a bit more difficult: can the selection of training data be considered creative?
Re: (Score:2)
One could argue that the designer and/or programmer of the algorithm is providing the creative input.
One could argue that, but there is very little legal precedent to support it. Once an algorithm is making something it is no longer "creative" in a copyrightable sense. A copyright to a software tool does not entitle the owner to copyright for the product of that tool, unless human created components such as libraries are included.
Re: (Score:1)
As a thought though, I wouldn't try to argue that it'd belong to the creator of the program since anyone can run
Re: (Score:1)
Mod up.
What do you think compilers do when you set options like 'Bounds Checking' and 'Type checking'?
During Y2K - remember that? - dozens of skillful edit macros written in REXX modified millions of lines of COBOL code. We did not call it AI. We did not call it a bot. The OpenBSD people wrote macros to fix sloppy coding and string length issues - something Microsoft does not claim to do yet.
There were cross-translators that converted COBOL to C and other languages.
Borland Turbo C and Foxpro set some standards not s
There's a lesson in this article. (Score:4, Insightful)
During Expedition #1, whose results are presented in detail in [7], Repairnator has analyzed 11,523 builds with test failures. For 3,551 of them (30.82%), Repairnator was able to locally reproduce the test failure. Out of 3,551 repair attempts, Repairnator found 15 patches that could make the CI build pass.
Translation: Repairnator was able to fix 0.4% of the bugs it saw.
A program repair bot is an artificial agent that tries to synthesize source code patches. It analyzes bugs and produces patches, in the same way as human developers involved in software maintenance activities. This idea of a program repair bot is disruptive, because today humans are responsible for fixing bugs. In other words, we are talking about a bot meant to (partially) replace human developers for tedious tasks.
Instead of stating "Our goal is to enhance the performance of programmers" -- because that is what tools do, and there are tons of businesses with sub-optimal solutions to their business processes -- we get intentionally menacing speech.
Their fake identity was a software engineer named Luc Esape, with a profile picture that "looks like a junior developer, eager to make open-source contributions... humans tend to have a priori biases against machines, and are more tolerant of errors if the contribution comes from a human peer. In the context of program repair, this means that developers may put the bar higher on the quality of the patch, if they know that the patch comes from a bot."
Translation: We spent a fair amount of time lying to people, justifying a means to an end, not realizing that lying to people might cause them to not believe anything we say.
Sounds like this guy is soon to be unemployed.
Re: (Score:2)
Hahahahaha, nice! This basically shows that automation is actually incapable of tackling this problem. It probably wasted more human time with the 10 bad patches than it saved by producing the 5 that got accepted (but are not necessarily good).
Re:There's a lesson in this article. (Score:5, Insightful)
This basically shows that automation is actually incapable of tackling this problem.
Not really. If a company spends $10M per year fixing bugs, and this tool fixes 0.4% of them, then it just saved $40,000. Not bad for a free automated tool.
Also, the hard part of fixing bugs is not writing the patch, but replicating, locating and isolating the problem. If you can say "Here is a bug, here is how to replicate it, and it is occurring in THIS function", then that is a big help, and this tool was able to do that for over 30% of the bugs. That is huge.
It probably wasted more human time with the 10 bad patches than it saved by producing the 5
Not necessarily. If the code looks like it has a bug, then it likely needs to be refactored to make it more readable, even if there is no actual bug. It is not enough for code to be correct, it also should be clear so it can be read and maintained.
Re: (Score:1)
You must be a "manager", as you have completely omitted all the costs: installing the tool, reviewing what it produces, and cleaning up when it screws up.
Re: (Score:2)
Translation: Repairnator was able to fix 0.4% of the bugs it saw.
"fix"
Their bot commented out parts of the code that generated null pointer exceptions; it's like commenting out half of the Win95 kernel and calling it a fix.
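A contrived Python analogue of that pattern (Repairnator actually patches Java; all names below are made up for illustration):

    def lookup(user_id):
        return None  # simulate a record that is missing

    def greet(user_id):
        user = lookup(user_id)
        # The original line raised AttributeError -- Python's closest
        # analogue of a null pointer exception -- whenever lookup()
        # returned None:
        #     return "Hello, " + user.name
        # The bot-style "repair" below silences the crash by deleting
        # the behavior instead of fixing it.
        return "Hello, guest"

    print(greet(42))  # no crash -- and no feature, either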
Re: (Score:3)
How often is that claim really made? And if ever, how often is it actually made by developers? The only time I've ever heard that claim put forward is like you did: right before shooting it down.
Automation is part of what makes development fun (there's a certain thrill to replacing something tedious with a "machine" that does it for you).
Re: Programmers are obsolete (Score:2)
Re: (Score:2)
Eh, he's claiming that some broad segment of "developers" claim something and that this is evidence they are wrong.
Re: (Score:2)
How often is that claim really made? And if ever, how often is it actually made by developers? The only time I've ever heard that claim put forward is like you did: right before shooting it down.
Automation is part of what makes development fun (there's a certain thrill to replacing something tedious with a "machine" that does it for you).
Frequently here on slashdot whenever people speak of automation costing jobs. Inevitably some developer will claim that they won't be the ones being ousted.
Are you sure? For kicks I checked a few stories (https://hardware.slashdot.org/story/18/04/24/2333259/a-study-finds-half-of-jobs-are-vulnerable-to-automation, https://hardware.slashdot.org/... [slashdot.org], and https://slashdot.org/story/18/... [slashdot.org]) and didn't really find anything at all to suggest that "software developers claim they are immune to automation".
In fact, if comments on those articles are any indicator, on slashdot it's typically the software developers who are in the "anything can be automated" camp. And reall
Re: (Score:2)
Yeah, I remember that. It could handle choosing columns from a table if you told it which column ahead of time.
That doesn't prove this is the same kind of thing, but PR flacks aren't any more moderate now than then, so it could well be.
Re: (Score:1)
I don't recall when hammers could make hammers by themselves. Did they only appear in a Pink Floyd video?
Re: (Score:2)
Most developers are at risk of losing their jobs if anybody realizes how bad they actually are. Automation would just be the thing that shows that, not the thing that replaces them. Good developers will not get replaced until we have working strong AI, which is not happening any time soon. (A senior member of the IBM Watson team put it as "certainly not in the next 50 years" to me.)
Researchers are obsolete (Score:3)
Researchers Secretly Deployed A Bot That Submitted Bug-Fixing Pull Requests
AI Bots Secretly Deployed A Researcher That Submitted This Research Paper
Disruptive? (Score:5, Interesting)
If the author was so confident in his bot, he would have attached his own name to it instead of making up a fake name for it.
Also, I don't see why he thinks his idea is so novel; static analysis, for instance, can suggest solutions if you want. And if you're too lazy to double-check the work yourself and let someone else do it for you, that's not a great discovery, that's just laziness.
Re: (Score:2)
Exactly; for example, static analysis is great at catching copy/paste errors and will often show you the suggested fix. If I write a script that takes these results and submits a pull request, it's no different than if I manually take those actions. I, the script author, would take credit, not Coverity or PVS-Studio. This guy is out to lunch, thinking his tool is sentient and has legal rights.
Re: (Score:2)
I'm not aware of any static analysis that can factor in unit tests (as this work does) to decide what does and what does not make a suitable patch. Note that these patches are about dynamic semantics, not just about name or type analysis failures (which might indeed be easy to fix purely by using static analysis).
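A minimal sketch of that test-driven acceptance loop, assuming a Maven project (suite_passes, validate, apply_patch, and revert_patch are hypothetical names, not any real tool's API):

    import subprocess

    def suite_passes(workdir):
        # Run the project's test suite; True iff everything passes.
        return subprocess.run(["mvn", "test"], cwd=workdir).returncode == 0

    def validate(candidates, apply_patch, revert_patch, workdir):
        # Keep only the candidate patches that make the failing suite
        # pass. "Plausible" is not "correct": a patch can merely overfit
        # the tests, which is why a human still reviews the pull request.
        plausible = []
        for patch in candidates:
            apply_patch(workdir, patch)
            if suite_passes(workdir):
                plausible.append(patch)
            revert_patch(workdir, patch)
        return plausible

This is exactly the part a purely static analyzer cannot give you: the acceptance criterion is dynamic, running the project's own unit tests against each candidate.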
The reason for not attaching his name (or the name of the students working on the project) is to minimise bias. A patch coming in from a widely-published and highly experienced formerly-INRIA-now
Faster? (Score:3)
> 1) the bot has to synthesize the patch faster than the human developer
I guess this is ok if it's faster in the sense of "the bot has to fix the bug faster than the human developer gets around to it", but in general I don't think this speed requirement needs to be that strict. If it's some latent bug waiting to hit us in production, for example, I don't care if the bot is slowly poring over code for days on end and brings a bug to my attention. Likewise, in any large project there are always tons of bugs that might not be that hard to fix, but it really is just a matter of getting around to them, so having some bot take a crack at them could be a good thing.
How many were rejected? (Score:5, Insightful)
What I'm reading is that yes, it made 5 patches that were accepted but the more important question is how many patches did it make total? If it made 800 patches and only 5 were accepted, that's kind of a problem.
Also, there is good reason to distrust robotic submissions: there is no cognitive reasoning in generating patches. This means that it could very well make things worse rather than making them better. Sure, it could make your project build, but it could also create an inconspicuous bug that breaks the code's functionality in the process, which is likely to take even more time to correct because in addition to fixing the problem you also have to find it. Build failures already tell you where the problem exists.
Re: (Score:2, Informative)
What I'm reading is that yes, it made 5 patches that were accepted but the more important question is how many patches did it make total? If it made 800 patches and only 5 were accepted, that's kind of a problem.
It made 15 patches, of which 5 were accepted and 10 were rejected.
However, it attempted 3,551 repairs and only came up with 15 patches.
Re:How many were rejected? (Score:4, Informative)
However, it identified 3,551 bugs, was able to automatically fix 15, and notified a human of the rest.
FTFY.
In that case, this sounds totally worth it. That's 3,551 bugs that QA or the client don't have to run into.
Re: (Score:3)
In that case, this sounds totally worth it. That's 3,551 bugs that QA or the client don't have to run into.
No, those were failures via automated tests. Meaning they were already known about. All this program did was provide a handful of fixes after trawling through thousands of known failures. Big whoop.
Re: (Score:1)
In fact, of those 15 patches, none got accepted; they were found to be of low quality (i.e. not valid patches at all). They call that "expedition #1".
Then they ran it again a bit later, expedition #2, and got 5 patches accepted. They don't report the same comparative figures for that run as for the first.
Also they claim their system can fix 30 bugs a day. I wonder how they arrived at that number.
All in all it's hogwash. Interesting, yes, yet still hogwash.
Re: (Score:2)
it could also create an inconspicuous bug
So could a human, and it happens all the time. That's what testing (automated and manual) is for.
Build failures already tell you where the problem exists.
This bot was fixing build failures?
Re: (Score:2)
So could a human, and it happens all the time. That's what testing (automated and manual) is for.
True enough, but there is no cognitive design or debugging occurring, just code mutation, rebuilding and testing. In six months (and thousands of attempts later) it managed to make 15 patches, five of which were accepted. Some (if not all) of the five that were accepted needed to be modified as well.
This bot was fixing build failures?
I got that part wrong. From what I read, it's fixing unit testing failures.
Re: (Score:2)
From what I read, it's fixing unit testing failures.
Well there you go. If the unit tests all pass after the fix, and test coverage is acceptable, and there's a QA program to test the kind of thing unit tests don't cover, I don't see the issue. It's true that the entity that created the bug fix didn't understand what it was doing, but a human (the person who accepted it) presumably did understand it, and it passes all the tests that would be required of a bug fix from a human. Either the testing is adequate, or it isn't. If it is, then we can be assured t
Re: (Score:2)
Either the testing is adequate, or it isn't. If it is, then we can be assured the bug fix was adequately tested. If it isn't, you're rolling the dice whether the fix was from a human or a bot.
You are correct. It's worth noting that it attempted to fix unit-test failures on 3,000+ projects over six months, so it may be that poorly designed unit tests are what enabled it to "succeed" in making a patch for 15 projects.
The robot operator owns property & responsibility (Score:1)
How is this even a question?
If you go and apply image filters to pictures on Flickr using SoftwareA... that doesn't mean SoftwareA is responsible for what you did or now owns what you produced, nor do SoftwareA's developers.
Unnecessarily complex (Score:2)
Very bad idea (Score:2)
In ordinary patch submission, there are two parties with actual intelligence and understanding: the patch creator and the maintainer. Here, there is only one: the maintainer. This violates the four-eyes principle. If the maintainer makes a mistake, the most stupid (in a non-obvious way) code makes it into the software.
Automated tools should never be used to decide anything. They should only ever provide input to a human expert who knows exactly how the input was created and that there is no intelligence in
Impersonating a bot (Score:2)
stephanruby wrote:
It would be unethical for the human to impersonate a bot.
What's more, the bot has no means to give the human legal authority to impersonate itself. Conundrum! :P
[Oh dear. By the time I got to the end of this post, I began to realize that I was no longer quite so sure that I was joking.]
Look at the patches themselves (Score:3)
I went to the trouble of looking at the patches themselves, and they appeared to be lacking any documentation of which failing test case was fixed. If the bot did a better job of documenting the rationale for the patch, perhaps they'd get a better acceptance rate. (And by the way, the acceptance rate wasn't reported in the article, only that 5 patches were accepted over a 6-month period -- was that 5 out of 6, or 5 out of 100?)
Otherwise, I'd think that a program-generated patch that indicated it fixed a failing test case, didn't cause any additional failures in the test suite, and clearly indicated that it wasn't claiming proprietary rights would be generally welcome. I don't see the need for any secrecy.
hackers worldwide just got a great idea (Score:2)
Self patch (Score:2)
Pointing such a bot at its own codebase is how Skynet happens.
What Ethics Committee let this pass? (Score:1)
Following up on the links from the article, it's clear that the professor in charge only informed those who accepted the pull requests of the situation more than 10 months after the fact, under the guise of "full disclosure" (and that is of course only for the pull requests he linked directly; we have no information about the failed ones). While one of the areas they wished to focus on was inherent discrimination against bots, he appears to have missed the point that automation, while able t
Improvement over regular code analysis tools? (Score:1)
How can this perform better than regular code analysis tools, which already inform you of nullptr accesses, unused variables, etc.?