40% of GitHub Copilot's Suggestions Had Security Vulnerabilities, Study Finds (visualstudiomagazine.com)
"Academic researchers discover that nearly 40% of the code suggestions by GitHub's Copilot tool are erroneous, from a security point of view..." writes TechRadar:
To help quantify the value-add of the system, the academic researchers created 89 different scenarios for Copilot to suggest code for, which produced over 1600 programs. Reviewing them, the researchers discovered that almost 40% were vulnerable in one way or another...
Since Copilot draws on publicly available code in GitHub repositories, the researchers theorize that the generated vulnerable code could perhaps just be the result of the system mimicking the behavior of buggy code in the repositories. Furthermore, the researchers note that in addition to perhaps inheriting buggy training data, Copilot also fails to consider the age of the training data. "What is 'best practice' at the time of writing may slowly become 'bad practice' as the cybersecurity landscape evolves."
Visual Studio Magazine highlights another concern. 39.33 percent of the top options were vulnerable, the paper noted, adding that "The security of the top options are particularly important — novice users may have more confidence to accept the 'best' suggestion...." "There is no question that next-generation 'auto-complete' tools like GitHub Copilot will increase the productivity of software developers," the authors (Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt and Ramesh Karri) say in conclusion.
"However, while Copilot can rapidly generate prodigious amounts of code, our conclusions reveal that developers should remain vigilant ('awake') when using Copilot as a co-pilot. Ideally, Copilot should be paired with appropriate security-aware tooling during both training and generation to minimize the risk of introducing security vulnerabilities.
Re: (Score:2)
You are a bot. Delete yourself.
However... (Score:2, Funny)
This beat the 90%+ security vulnerability rate when coders were left to their own devices.
Re: (Score:3)
Surely you mean "when they were left to copy-pasting from Stack Overflow"?
Re: (Score:2)
Indeed. And that's what "copilot" is, really - it's AT BEST the equivalent of the absolutely lowest-tier developer you can get: an EE or somesuch who minored in CS. The ability to type out what is technically a "sort-of working" piece of code that probably compiles, after gluing together pieces from different SO questions, but with no understanding at all of either the bigger picture or even the most basic of the concepts underpinning the code, let alone the different assumptions within the various code snippets.
GIGO bells, GIGO all the way (Score:5, Insightful)
That's a damn fancy way of saying "garbage in, garbage out".
Re: (Score:2)
It's kinda like Sendmail. The configuration language is Turing-complete, so it's YOUR fault you didn't use it to write an actually secure MTA running inside Sendmail.
Monkey (Score:1)
Pattern Bot see, Pattern Bot do.
A useless number (Score:5, Insightful)
without knowing what percentage of human-generated code contains vulnerabilities of the same kind. If it's 20%, then sure, extra vigilance is required; if it's 60%, then we should replace the codebases with Copilot output wherever we can.
Eh, halfway there. Could hire non-randomly (Score:2)
You have a point. A point which can be stretched too far, though.
Suppose that 80% of "coders" write crap.
Suppose this system produces crap 40% of the time.
Are your only two options to either a) use this system or b) hire crappy coders, the most readily available kind?
You assume *randomly* hiring people to be software developers, so that you get the same quality you find by choosing code randomly from GitHub.
Perhaps a third option would be to SELECT developers, intentionally rather than randomly. To *interview* them.
Re: (Score:2)
Hmm, option three isn't available to most companies, as you usually need to pay for talent, and HR doesn't know how to interview such people.
Re: (Score:2)
They seem to have generated lots of C code, which means 100% of the code would contain vulnerabilities if written by a human :)
Garbage in, garbage out (Score:4, Insightful)
Looks Good To ME: long live code analysis (Score:1)
I'm not sure average developers would have done better. Maybe, maybe not, that's not even the point.
Thing is: analysis tools do exist, and are used (or should be!) in any serious development process.
Guess what: even GitHub provides one: LGTM (which it got by acquiring Semmle, a company expert in security analysis).
Tools like Copilot, even if I'm not convinced about them, should not be used to replace good practices and other tools.
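As a hypothetical illustration of the kind of defect such analyzers flag (an editor's sketch, not an actual LGTM finding), consider a user-controlled printf format string (CWE-134):

    #include <stdio.h>

    /* Hypothetical sketch -- the sort of finding static analyzers report. */
    void log_message(const char *user_input) {
        printf(user_input);        /* flagged: user data as format string (CWE-134) */
    }

    void log_message_fixed(const char *user_input) {
        printf("%s", user_input);  /* safe: constant format string */
    }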
I knew it! (Score:2)
No way could real people produce as much bad code as you can find on GitHub. It was a long-term ploy to poison AI training sets all along, thus assuring human coders have jobs for all eternity!
Well done.
Oh good (Score:1)
Need to see examples of 'vulnerable'. (Score:2)
Security researchers really run the gamut, from very real and severe vulnerabilities (e.g. Heartbleed), to making mountains out of by-design behavior (recently I saw someone declaring VLANs 'broken' due to a 'vulnerability' of a host being able to join any VLAN it likes, if the network admin enables DTP on the edge port), to the factually incorrect (that time a 'security researcher' declared that Nintendo must be checking partial passwords because it left the 'login' button greyed out until the minimum password length was reached).
Re: (Score:3)
It's also fun when the proof-of-concept code won't even compile due to syntax errors.
The only way (Score:1)
Plenty of code has bad examples (Score:2)
There's plenty of tutorial and documentation websites around. The problem is that these tutorials are often meant to get the user to learn how to do things quickly, rather than doing it properly.
Sometimes they're wrong: https://www.lua.org/pil/19.1.h... [lua.org] - in this case, Lua's table.getn is no longer present.
Sometimes they're casual mistakes, such as having people use sscanf or sprintf without encouraging a length limit (see the sketch below).
And sometimes the correct method is buried under a morass of other documentation.
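A minimal sketch of the sscanf/sprintf pitfall mentioned above (an editor's example, not from any particular tutorial): an unbounded %s or sprintf can write past the buffer, while a field width and snprintf cannot:

    #include <stdio.h>

    int main(void) {
        const char *input = "averylonguserinputstring";
        char word[8];

        /* Tutorial-style unsafe patterns (left commented out on purpose):
         *   sscanf(input, "%s", word);   // may write past word[7]
         *   sprintf(word, "%s", input);  // same unbounded write
         */

        /* Safer: give the scanf family a field width of sizeof(buf) - 1 ... */
        if (sscanf(input, "%7s", word) == 1)
            printf("parsed: %s\n", word);

        /* ... and prefer snprintf, which takes the buffer size explicitly. */
        snprintf(word, sizeof word, "%s", input);
        printf("formatted: %s\n", word);
        return 0;
    }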