Does GitHub Copilot Improve Code Quality? (github.blog) 31
Microsoft-owned GitHub published a blog post asking "Does GitHub Copilot improve code quality? Here's what the data says."
Its first paragraph includes statistics from past studies — that GitHub Copilot has helped developers code up to 55% faster, leaving 88% of developers feeling more "in the flow" and 85% feeling more confident in their code.
But does it improve code quality? [W]e recruited 202 [Python] developers with at least five years of experience. Half were randomly assigned GitHub Copilot access and the other half were instructed not to use any AI tools... We then evaluated the code with unit tests and with an expert review conducted by developers.
Our findings overall show that code authored with GitHub Copilot has increased functionality and improved readability, is of better quality, and receives higher approval rates... Developers with GitHub Copilot access had a 56% greater likelihood of passing all 10 unit tests in the study, indicating that GitHub Copilot helps developers write more functional code by a wide margin. In blind reviews, code written with GitHub Copilot had significantly fewer code readability errors, allowing developers to write 13.6% more lines of code, on average, without encountering readability problems. Readability improved by 3.62%, reliability by 2.94%, maintainability by 2.47%, and conciseness by 4.16%. All numbers were statistically significant... Developers were 5% more likely to approve code written with GitHub Copilot, meaning that such code is ready to be merged sooner, speeding up the time to fix bugs or deploy new features.
"While GitHub's reports have been positive, a few others haven't," reports Visual Studio magazine: For example, a recent study from Uplevel Data Labs said, "Developers with Copilot access saw a significantly higher bug rate while their issue throughput remained consistent."
And earlier this year a "Coding on Copilot" whitepaper from GitClear said, "We find disconcerting trends for maintainability. Code churn — the percentage of lines that are reverted or updated less than two weeks after being authored — is projected to double in 2024 compared to its 2021, pre-AI baseline. We further find that the percentage of 'added code' and 'copy/pasted code' is increasing in proportion to 'updated,' 'deleted,' and 'moved 'code. In this regard, AI-generated code resembles an itinerant contributor, prone to violate the DRY-ness [don't repeat yourself] of the repos visited."
Its first paragraph includes statistics from past studies — that GitHub Copilot has helped developers code up to 55% faster, leaving 88% of developers feeling more "in the flow" and 85% feeling more confident in their code.
But does it improve code quality? [W]e recruited 202 [Python] developers with at least five years of experience. Half were randomly assigned GitHub Copilot access and the other half were instructed not to use any AI tools... We then evaluated the code with unit tests and with an expert review conducted by developers.
Our findings overall show that code authored with GitHub Copilot has increased functionality and improved readability, is of better quality, and receives higher approval rates... Developers with GitHub Copilot access had a 56% greater likelihood of passing all 10 unit tests in the study, indicating that GitHub Copilot helps developers write more functional code by a wide margin. In blind reviews, code written with GitHub Copilot had significantly fewer code readability errors, allowing developers to write 13.6% more lines of code, on average, without encountering readability problems. Readability improved by 3.62%, reliability by 2.94%, maintainability by 2.47%, and conciseness by 4.16%. All numbers were statistically significant... Developers were 5% more likely to approve code written with GitHub Copilot, meaning that such code is ready to be merged sooner, speeding up the time to fix bugs or deploy new features.
"While GitHub's reports have been positive, a few others haven't," reports Visual Studio magazine: For example, a recent study from Uplevel Data Labs said, "Developers with Copilot access saw a significantly higher bug rate while their issue throughput remained consistent."
And earlier this year a "Coding on Copilot" whitepaper from GitClear said, "We find disconcerting trends for maintainability. Code churn — the percentage of lines that are reverted or updated less than two weeks after being authored — is projected to double in 2024 compared to its 2021, pre-AI baseline. We further find that the percentage of 'added code' and 'copy/pasted code' is increasing in proportion to 'updated,' 'deleted,' and 'moved 'code. In this regard, AI-generated code resembles an itinerant contributor, prone to violate the DRY-ness [don't repeat yourself] of the repos visited."
No. (Score:3)
/betteridge
It's also... (Score:2)
Anecdotal evidence (Score:2)
Every account of using it I've read online has been negative about code quality.
Re: Anecdotal evidence (Score:2)
Is that different to "security through obscurity"?
Re: (Score:2)
Of course, the Java solution is to just make every error condition an exception and shut down the whole thing.
Re: (Score:2)
My rule of thumb is: if it's readable and clear to a random stranger, then it's not security hardened.
Wouldn't this defeat the entire premise of security when it comes to open-source code? If the code is available for anyone to read and no one can understand it it's not possible for others to scrutinize it, and if they can read it then by your rule it must not be very secure.
Re: (Score:2)
Phillip Morris says cigarettes don't cause cancer (Score:3)
No conflict of interest at Github/Micro$oft either. ;-)
Re: (Score:2)
I'm sure Microsoft is being 1000% ethical and if there was any evidence that AI actually makes code worse they would definitely let Github publish stuff about that despite Microsoft investing $100 billion or more in AI and AI related stuff.
AI is trained on peoples mistakes (Score:2)
Re: AI is trained on peoples mistakes (Score:2)
& some languages have more gotchas..& some languages are more common for beginners, who make more mistakes.
Re: (Score:1)
One thing which gives me the greatest cause for concern is that internet tech is changing continuously yet the AIs seem to consider the version of each piece of software to be immaterial or it is version aware yet its knowledge cut-off isn't aware that for the version being used, the recommendations are no-longer appropriate.
Yes and no (Score:4, Insightful)
Based on absolutely no studies or anything but my own opinion... I suspect AI will make good coders better and bad coders worse. Good coders will consider the suggestions, take the good ones and reject the bad ones. Bad coders will take everything.
Re: (Score:2)
I would agree with that. It's anecdotal but I've noticed when using Copilot at my job that it usually gets me a "mostly" proper solution. But even getting you mostly to a solution can save you an hour or more of digging through documentation. "Hey Copilot, I have an Excel workbook in a memory stream. Load it up with the Open XML library, open up the Summary spreadsheet, and copy out the contents of cell D:3." AI bots are pretty good at crawling through lots of information and summarizing it; I've been
I investigated: three answers so far (Score:2)
The best idea is Advait Sarkar's. He noticed how bad LLMs are at anything creative, and instead suggested we use them for things they're good at, predicting what humans would say. Especially if they were asked what a critic would say. See https://leaflessca.wordpress.c... [wordpress.com]
Trying Pull Requests with CodeRabbit. One of the things I think LLMs can do well is compare my text with a whole body of other people's work. In that vein, CodeRabbit now offers to review git pull requests. https://leaflessca.wordpress.c [wordpress.com]
Re: (Score:2)
Confidence (Score:3)
"85% feeling more confident in their code."
Imagine having such low confidence in the quality of your own code that you feel an LLM is doing you better.
Re: (Score:2)
Oooo. Burn.
CoPilot in Python is excellent (Score:1)
The success you'll have with copilot will depend on the language used. Python is the language that CoPilot generates more useful code between java, Angular et C#.
Re: CoPilot in Python is excellent (Score:2)
Interesting. Some languages have more "gotcha"s, so there are surely more examples our there of code that falls into those traps, and so the AI surely uses those in its answers too. Languages with fewer "gotchas" result in better AI code...
No?
CoPilot can't even declare a Java String correctly (Score:2)
The success you'll have with copilot will depend on the language used. Python is the language that CoPilot generates more useful code between java, Angular et C#.
Hmm, I tried it 2 weeks ago with a question of "for /aaa/bbb/.../xxx[12334]yyy write me a RegEx in Java that replaces 1234 with abcd" (roughly...can't share the details).
1. The RegEx was declared on a Java String with newlines, so it didn't even compile.
2. The RegEx was wrong. It didn't work if I had fixed it for them...it just completely fucked up the RegEx.
3. The RegEx they tried to do was about 10x more complicated than it needed to be
4. The Java API they used was really outdated
5. Their general
\o/ (Score:1)
This whole thing seems super-self-serving and suspect so in that setting:
I'm no expert but isn't the idea that you tweak your code until 100% pass is reached and don't stop until then - regardless whether microsoft are watching everything you type? ProTip: Yes.
Re: (Score:2)
Re: (Score:1)
Or a time-limit; both of which invalidate the whole exercise.
Re: (Score:2)
Every junior is leaning on AI... hard (Score:3)
It's literally impossible to tell the level developers are at with AI. Juniors are abusing it so hard (and hiding it) that they do the most inscrutable, wrong, zero-context solutions, and when I called them on it-- my manager received a report I was being mean. I feel like it's time to get my hose and spray the kids to keep them off my lawn with how this is coming off, but what in the hell is going on. Total circus.
Re: (Score:2)
and when I called them on it-- my manager received a report I was being mean.
Use more emojis and memes. You can't say someone is mean when they send you a positive cat GIF.
Like, Stop using AI, you fucker! [giphy.com]
Re: (Score:1)