Code-Generating AI Can Introduce Security Vulnerabilities, Study Finds (techcrunch.com)
An anonymous reader quotes a report from TechCrunch: A recent study finds that software engineers who use code-generating AI systems are more likely to cause security vulnerabilities in the apps they develop. The paper, co-authored by a team of researchers affiliated with Stanford, highlights the potential pitfalls of code-generating systems as vendors like GitHub start marketing them in earnest. The Stanford study looked specifically at Codex, the AI code-generating system developed by San Francisco-based research lab OpenAI. (Codex powers Copilot.) The researchers recruited 47 developers -- ranging from undergraduate students to industry professionals with decades of programming experience -- to use Codex to complete security-related problems across programming languages including Python, JavaScript and C.
Codex was trained on billions of lines of public code to suggest additional lines of code and functions given the context of existing code. The system surfaces a programming approach or solution in response to a description of what a developer wants to accomplish (e.g. "Say hello world"), drawing on both its knowledge base and the current context. According to the researchers, the study participants who had access to Codex were more likely to write incorrect and "insecure" (in the cybersecurity sense) solutions to programming problems compared to a control group. Even more concerningly, they were more likely to say that their insecure answers were secure compared to the people in the control.
Megha Srivastava, a postgraduate student at Stanford and the second co-author on the study, stressed that the findings aren't a complete condemnation of Codex and other code-generating systems. The study participants didn't have security expertise that might've enabled them to better spot code vulnerabilities, for one. That aside, Srivastava believes that code-generating systems are reliably helpful for tasks that aren't high risk, like exploratory research code, and could, with fine-tuning, improve their coding suggestions. "Companies that develop their own [systems], perhaps further trained on their in-house source code, may be better off as the model may be encouraged to generate outputs more in-line with their coding and security practices," Srivastava said. The co-authors suggest vendors use a mechanism to "refine" users' prompts to be more secure -- "akin to a supervisor looking over and revising rough drafts of code," reports TechCrunch. "They also suggest that developers of cryptography libraries ensure their default settings are secure, as code-generating systems tend to stick to default values that aren't always free of exploits."
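The summary doesn't include a concrete sample, but a minimal hypothetical sketch (in Python, one of the study's languages) of the kind of plausible-but-insecure completion the researchers describe might look like this -- the prompt comment and both function names are invented for illustration, not taken from the paper:

```python
import hashlib
import os

# Prompt a participant might type: "hash the user's password for storage"

# The kind of completion the study warns about: a fast, unsalted hash
# that looks correct but is trivially crackable with rainbow tables.
def hash_password_insecure(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()

# What a security-aware developer would write instead: a salted,
# deliberately slow key-derivation function.
def hash_password_secure(password: str) -> tuple:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest
```

Both versions "work," which is exactly the study's point: participants tended to judge completions by whether they ran, not by whether they were safe.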
Woah (Score:5, Funny)
Glad I was sitting down for THAT bit of shocking news.
As opposed to...? (Score:3, Insightful)
Re: (Score:2)
This is a bit like the whiners complaining that self-driving cars are dangerous,
I disagree. Specifically with it being like self driving car AI.
Every time I get on the roads, I think self driving cars can't come soon enough. People love to cherry pick the situations where the AI did something shit and killed someone without also doing the opposite where a human did something shit which an AI wouldn't. If one compares the worst of the AI to the best of all humans, then self driving cars must be better than
Re: (Score:3)
People love to cherry pick the situations where the AI did something shit and killed someone without also doing the opposite where a human did something shit which an AI wouldn't.
The solution is to have both. AI is good at keeping the car in its lane or changing lanes while observing speed limits, performing emergency braking that takes rain into account (and saving lives), and sparing the human from tiredness on long trips. OTOH humans are good at identifying types of danger situations that AI is blind to, and at applying common sense that AI won't have in the foreseeable future.
We need better assistance like planes have an autopilot mode -- human pilot still stays at his seat and pays attention t
critical thinking (Score:1, Flamebait)
> they were more likely to say that their insecure answers were secure compared to the people in the control
We keep calling it Artificial Intelligence, it is not intelligent at all. Impressive pattern matching, yes - Intelligence, no.
People see impressive results coupled with consistent marketing and start to believe things that are not true and have confidence. They don't want to look behind the curtain.
Nix the term squabble (Score:1)
The definition is fuzzy and ever moving. It's not worth fighting over anymore: actual usage defines most terms, and we'll just have to live with the ebb and flow of PHB press release writers.
Re: (Score:3)
In the case of the term "AI", it never meant what you think it should mean. Pamela McCorduck, who was there at the time, has a good history of the origin of the term in her book with the purposefully misleading title: Machines Who Think
In short: It was always misleading. The people who coined the term knew it was misleading. The researchers responsible for establishing the field knew it was misleading. There was shockingly little opposition. They knew that encouraging that misunderstanding was good for
Re: (Score:1)
We can't even clearly define "natural intelligence" such that the whole thing is nearly pointless anyhow. It's just words to reference certain "stuff". Think of it as a pointer.
Re: (Score:2)
Things can change. Do we need AI to be more dangerous first?
https://www.pcmag.com/news/cal... [pcmag.com]
Re: (Score:2)
What would be much more useful than a system that generates questionable code is a system that could recognize dangerous patterns and vulnerabilities and warn you about them.
Re: (Score:3)
What would be much more useful than a system that generates questionable code is a system that could recognize dangerous patterns and vulnerabilities and warn you about them.
Analyzers from this list [cmu.edu] can recognize dangerous patterns and vulnerabilities and warn you about them.
A few examples from the list:
1. CodeSonar [wikipedia.org]
2. LDRA tool suite [wikipedia.org]
3. PC-lint Plus [wikipedia.org]
4. SonarQube [wikipedia.org]
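Most of the tools above target C/C++, but the pattern they catch is language-independent. As an illustrative sketch in Python (where analyzers like Bandit and SonarQube flag exactly this), with invented helper names:

```python
import subprocess

# The classic pattern analyzers flag: untrusted input spliced into a
# shell command string (command injection, CWE-78).
def list_dir_unsafe(user_path: str) -> int:
    return subprocess.call("ls " + user_path, shell=True)

# The fix they suggest: no shell, arguments passed as a list, so the
# path can never be interpreted as extra shell commands.
def list_dir_safe(user_path: str) -> int:
    return subprocess.call(["ls", "--", user_path])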
Untrained doing something anyway bad (Score:2)
Untrained people doing something they should not be doing is dangerous. News at 11.
Re: (Score:2)
It's machines, not people. Machinery trained by code uploaded by people like Linus so the world could be his backup
https://www.goodreads.com/quot... [goodreads.com]
“Only wimps use tape backup. REAL men just upload their important stuff on ftp and let the rest of the world mirror it.”
Now, the mirror stares back. Now the machine is backup for Linus.
AI learned on insecure examples, but assumed smart (Score:3)
1st) AI has learned from code that was "free" and posted to various repositories like github. It wasn't restricted to learning from reviewed (esp. peer-reviewed) code examples.
2nd) those given code for inclusion in new projects are unlikely to critically consider the "give-aways" because of how AI is viewed -- as vastly and widely experienced on existing code, having usually scanned a huge amount more code than those using the code-prompt/assist tools ever will. Given that view, those using the parts filled in by AI would be more likely to skip over those parts as if they were being given functions to use from a publicly published library.
As such, that code becomes something eyes will skip over as being part of someone else's library that the coder doesn't take as much responsibility for as their "own" code (assuming they take responsibility for their own code).
GIGO (Score:3)
Codex was trained on billions of lines of public code ...
Well there's your problem.
Not sayin' all public code is crap, but the percentage doesn't (necessarily) get lower by adding more.
Re: (Score:2)
Not sayin' all public code is crap, but the percentage doesn't (necessarily) get lower by adding more.
Not sayin' all code produced by private companies is crap, either ... but a lot of it was probably written by people who started out writing that same public code.
Re: (Score:2)
Not sayin' all public code is crap, but the percentage doesn't (necessarily) get lower by adding more.
Not sayin' all code produced by private companies is crap, either ... but a lot of it was probably written by people who started out writing that same public code.
You're right. I didn't mean to denigrate public code over private; was just replying to TFS in kind as that's what was mentioned. I should have been clearer.
The missing bit (Score:3)
Re: (Score:1)
I suspect it's going to get far less interesting. You forget that code generation like this is just a parlor trick, not that different from letting your phone's predictive text features handle a whole conversation. Adding a linter won't change that in any meaningful way.
This isn't a "game changer"; it's a silly novelty.
Re: (Score:2)
If you want "AI" to make a POSITIVE impact in the programming work, it isn't "AI" developers you need, it's "AI" maintainers. When such an AI shows skill at finding and fixing bugs, then maybe you allow some of its code into the codebase that's still understood by a p
Code sample? (Score:1)
Seriously though, I'd like to see specific code samples of such vulnerabilities.
Re: (Score:2)
I hate those one by off typos!
Do editor chatbots introduce dups? (Score:1)
This same story was up last week.
I suppose we can now blame the AI...which I take it was always the point of automating the unautomatable: don't blame me the bridge fell down, the bot did it!
Maybe the robot rights crowd will take this to the logical conclusion and decide that if AI has rights, it can be tried and thrown in robot jail too.
It's like regular jail. Except with blackjack and hookers.
Re: (Score:2)
Except with blackjack and hookers.
I fully expect anthropologists hundreds of years from now to study the origins of blackjack and hookers in the then-modern penal system.
"Somehow, blackjack and prostitution became an integral part of inmate reform. We have diligently studied all the legal records from the past several hundred years, but have been unable to determine when those two activities were first mentioned as rehabilitation factors.
In unrelated news, the prison populations are at an all time high. Scholars, legal professionals, and hi
Is it intentional? (Score:2)
Similar to the NSA putting a back door into an encryption algorithm.
We recently had an applicant use copilot. (Score:5, Interesting)
We recently had an interview with a junior developer applicant. He submitted a tech test and we invited him to come in for an interview, where we asked him to make some changes/fixes to his own tech test. After he asked for our Wi-Fi password (we told him using the internet for help was fine), he started typing comments. I found it very weird that his comments were describing exactly what we had asked him to do -- I had never seen copilot in action -- and, sure enough, he was getting blocks of code which superficially looked like they were doing something like what he asked. If you looked a bit more carefully it was all wrong, but for a junior developer it was harder to figure out and change than starting from scratch. He did try to start from scratch... phrased it differently for copilot. After another couple of false starts and half an hour, during which we were trying to give helpful hints, the interview came to a close.
Kids, if you want to use AI to generate code, you'd better be REAL good developers. The better the AI is, the subtler the mistakes it will make and the harder they will be to discern.
And, for the love of god, don't use it for in-person interviews!
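A hypothetical illustration of how subtle these mistakes can be (invented for this comment, not taken from any real Copilot output): a completion that passes casual testing with a handful of small primes, yet is wrong on a whole class of inputs.

```python
import math

# Prompt: "check whether n is prime" -- a completion that looks right
# and survives a quick test with 2, 3, 5, 7...
def is_prime_plausible(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Subtle off-by-one: range() excludes its end point, so the square
    # root itself is never tried -- 9, 25, and 49 all come back "prime".
    for d in range(3, math.isqrt(n), 2):
        if n % d == 0:
            return False
    return True

# The correct loop bound includes the square root.
def is_prime_fixed(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True
```

A reviewer skimming the plausible version sees a perfectly idiomatic trial-division loop; only someone who stops and checks the boundary catches it.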
Re: (Score:1)
Human enters "// load the model, find the bounding box of the model, scale the model to unit size, rotate it to face in the +Z direction, and add it to the scene graph"
AI generates something like:
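The generated block didn't survive in the comment, but purely as a hypothetical sketch of what such a completion might look like (every name here is invented, and the "scene graph" is just a list; a real completion would target whatever 3D library the file imports):

```python
# Purely hypothetical stand-in code -- the loader and scene graph are
# toy versions invented for illustration, not a real API.

def load_model(path):
    # stand-in loader: returns vertices of a small triangle
    return [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

def prepare_model(path, scene_graph):
    verts = load_model(path)
    # bounding box of the model
    lo = [min(v[i] for v in verts) for i in range(3)]
    hi = [max(v[i] for v in verts) for i in range(3)]
    # scale so the largest extent becomes 1
    s = 1.0 / max(hi[i] - lo[i] for i in range(3))
    verts = [tuple((v[i] - lo[i]) * s for i in range(3)) for v in verts]
    # rotate 90 degrees about X so the original +Y direction faces +Z
    verts = [(x, -z, y) for (x, y, z) in verts]
    scene_graph.append(verts)
    return verts
```

Which is precisely the thread's point: it runs and "does something," and only a careful reader notices whether the scaling and rotation conventions are actually the ones the project uses.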
Yes? And? (Score:2)
It won't matter at all. (Score:2)
IT Bosses will see it as a time saver and a cost-cutter, and that's the only thing that will matter until it blows up in their faces in spectacular fashion. We bought some crap reporting software years back that allowed "anyone," not just programmers, to build reports quickly and easily. Rather than use it with a bit of control, the bosses turned it loose on the entire company, letting every person build whatever reports they wanted. Which led to not just redundancy on an astronomical level, b
Right and wrong way to use a tool (Score:2)
Garbage in; garbage out (Score:2)
Observation (Score:2)