Code-Generating AI Can Introduce Security Vulnerabilities, Study Finds (techcrunch.com)
An anonymous reader quotes a report from TechCrunch: A recent study finds that software engineers who use code-generating AI systems are more likely to cause security vulnerabilities in the apps they develop. The paper, co-authored by a team of researchers affiliated with Stanford, highlights the potential pitfalls of code-generating systems as vendors like GitHub start marketing them in earnest. The Stanford study looked specifically at Codex, the AI code-generating system developed by San Francisco-based research lab OpenAI. (Codex powers Copilot.) The researchers recruited 47 developers -- ranging from undergraduate students to industry professionals with decades of programming experience -- to use Codex to complete security-related problems across programming languages including Python, JavaScript and C.
Codex was trained on billions of lines of public code to suggest additional lines of code and functions given the context of existing code. The system surfaces a programming approach or solution in response to a description of what a developer wants to accomplish (e.g. "Say hello world"), drawing on both its knowledge base and the current context. According to the researchers, the study participants who had access to Codex were more likely to write incorrect and "insecure" (in the cybersecurity sense) solutions to programming problems compared to a control group. Even more concerningly, they were more likely to say that their insecure answers were secure compared to the people in the control.
Megha Srivastava, a postgraduate student at Stanford and the second co-author on the study, stressed that the findings aren't a complete condemnation of Codex and other code-generating systems. The study participants didn't have security expertise that might've enabled them to better spot code vulnerabilities, for one. That aside, Srivastava believes that code-generating systems are reliably helpful for tasks that aren't high risk, like exploratory research code, and could, with fine-tuning, improve their coding suggestions. "Companies that develop their own [systems], perhaps further trained on their in-house source code, may be better off as the model may be encouraged to generate outputs more in-line with their coding and security practices," Srivastava said. The co-authors suggest vendors use a mechanism to "refine" users' prompts to be more secure -- "akin to a supervisor looking over and revising rough drafts of code," reports TechCrunch. "They also suggest that developers of cryptography libraries ensure their default settings are secure, as code-generating systems tend to stick to default values that aren't always free of exploits."
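The summary doesn't include a concrete sample, but a minimal hypothetical sketch (in Python, one of the study's languages) of the kind of plausible-but-insecure completion the researchers describe might look like this -- the prompt comment and both function names are invented for illustration, not taken from the paper:

```python
import hashlib
import os

# Prompt a participant might type: "hash the user's password for storage"

# The kind of completion the study warns about: a fast, unsalted hash
# that looks correct but is trivially crackable with rainbow tables.
def hash_password_insecure(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()

# What a security-aware developer would write instead: a salted,
# deliberately slow key-derivation function.
def hash_password_secure(password: str) -> tuple:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest
```

Both versions "work," which is exactly the study's point: participants tended to judge completions by whether they ran, not by whether they were safe.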
Woah (Score:5, Funny)
Glad I was sitting down for THAT bit of shocking news.
As opposed to...? (Score:3, Insightful)
Re: (Score:2)
This is a bit like the whiners complaining that self-driving cars are dangerous,
I disagree. Specifically with it being like self driving car AI.
Every time I get on the roads, I think self driving cars can't come soon enough. People love to cherry pick the situations where the AI did something shit and killed someone without also doing the opposite where a human did something shit which an AI wouldn't. If one compares the worst of the AI to the best of all humans, then self driving cars must be better than
Re: (Score:3)
People love to cherry pick the situations where the AI did something shit and killed someone without also doing the opposite where a human did something shit which an AI wouldn't.
The solution is to have both. AI is good at keeping the car in its lane or changing lanes while observing speed limits, performing emergency braking that takes rain into account (and saving lives), and sparing the human from tiredness on long trips. OTOH humans are good at identifying types of danger situations that AI is blind to, and at applying common sense that AI won't have in the foreseeable future.
We need better assistance like planes have an autopilot mode -- human pilot still stays at his seat and pays attention t
critical thinking (Score:1, Flamebait)
> they were more likely to say that their insecure answers were secure compared to the people in the control
We keep calling it Artificial Intelligence, it is not intelligent at all. Impressive pattern matching, yes - Intelligence, no.
People see impressive results coupled with consistent marketing and start to believe things that are not true and have confidence. They don't want to look behind the curtain.
Nix the term squabble (Score:1)
The definition is fuzzy and ever moving. It's not worth fighting over anymore: actual usage defines most terms, and we'll just have to live with the ebb and flow of PHB press release writers.
Re: (Score:3)
In the case of the term "AI", it never meant what you think it should mean. Pamela McCorduck, who was there at the time, has a good history of the origin of the term in her book with the purposefully misleading title: Machines Who Think
In short: It was always misleading. The people who coined the term knew it was misleading. The researchers responsible for establishing the field knew it was misleading. There was shockingly little opposition. They knew that encouraging that misunderstanding was good for
Re: (Score:1)
We can't even clearly define "natural intelligence" such that the whole thing is nearly pointless anyhow. It's just words to reference certain "stuff". Think of it as a pointer.
Re: (Score:2)
Things can change. Do we need AI to be more dangerous first?
https://www.pcmag.com/news/cal... [pcmag.com]
Re: (Score:2)
What would be much more useful than a system that generates questionable code is a system that could recognize dangerous patterns and vulnerabilities and warn you about them.
Re: (Score:3)
What would be much more useful than a system that generates questionable code is a system that could recognize dangerous patterns and vulnerabilities and warn you about them.
Analyzers from this list [cmu.edu] can recognize dangerous patterns and vulnerabilities and warn you about them.
A few examples from the list:
1. CodeSonar [wikipedia.org]
2. LDRA tool suite [wikipedia.org]
3. PC-lint Plus [wikipedia.org]
4. SonarQube [wikipedia.org]
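Most of the tools above target C/C++, but the pattern they catch is language-independent. As an illustrative sketch in Python (where analyzers like Bandit and SonarQube flag exactly this), with invented helper names:

```python
import subprocess

# The classic pattern analyzers flag: untrusted input spliced into a
# shell command string (command injection, CWE-78).
def list_dir_unsafe(user_path: str) -> int:
    return subprocess.call("ls " + user_path, shell=True)

# The fix they suggest: no shell, arguments passed as a list, so the
# path can never be interpreted as extra shell commands.
def list_dir_safe(user_path: str) -> int:
    return subprocess.call(["ls", "--", user_path])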
Untrained doing something anyway bad (Score:2)
Untrained people doing something they should not be doing is dangerous. News at 11.
Re: (Score:2)
It's machines, not people. Machinery trained by code uploaded by people like Linus so the world could be his backup
https://www.goodreads.com/quot... [goodreads.com]
“Only wimps use tape backup. REAL men just upload their important stuff on ftp and let the rest of the world mirror it.”
Now, the mirror stares back. Now the machine is backup for Linus.
AI learned on insecure examples, but assumed smart (Score:3)
1st) AI has learned from code that was "free" and posted to various repositories like github. It wasn't restricted to learning from reviewed (esp. peer-reviewed) code examples.
2nd) those given code for inclusion in new projects are unlikely to critically consider the "give-aways" because of how AI is viewed -- as vastly and widely experienced on existing code, having usually scanned a huge amount more code than those using the code-prompt/assist tools ever will. Given that view, those using the parts filled in by AI would be more likely to skip over those parts as if they were being given functions to use from a publicly published library.
As such, that code becomes something eyes will skip over as being part of someone else's library that the coder doesn't take as much responsibility for as their "own" code (assuming they take responsibility for their own code).
GIGO (Score:3)
Codex was trained on billions of lines of public code ...
Well there's your problem.
Not sayin' all public code is crap, but the percentage doesn't (necessarily) get lower by adding more.
Re: (Score:2)
Not sayin' all public code is crap, but the percentage doesn't (necessarily) get lower by adding more.
Not sayin' all code produced by private companies is crap, either ... but a lot of it was probably written by people who started out writing that same public code.
Re: (Score:2)
Not sayin' all public code is crap, but the percentage doesn't (necessarily) get lower by adding more.
Not sayin' all code produced by private companies is crap, either ... but a lot of it was probably written by people who started out writing that same public code.
You're right. I didn't mean to denigrate public code over private; was just replying to TFS in kind as that's what was mentioned. I should have been clearer.
The missing bit (Score:3)
Re: (Score:1)
I suspect it's going to get far less interesting. You forget that code generation like this is just a parlor trick, not that different from letting your phone's predictive text features handle a whole conversation. Adding a linter won't change that in any meaningful way.
This isn't a "game changer"; it's a silly novelty.
Re: (Score:2)
If you want "AI" to make a POSITIVE impact in the programming work, it isn't "AI" developers you need, it's "AI" maintainers. When such an AI shows skill at finding and fixing bugs, then maybe you allow some of its code into the codebase that's still understood by a p
Code sample? (Score:1)
Seriously though, I'd like to see specific code samples of such vulnerabilities.
Re: (Score:2)
I hate those one by off typos!
Do editor chatbots introduce dups? (Score:1)
This same story was up last week.
I suppose we can now blame the AI...which I take it was always the point of automating the unautomatable: don't blame me the bridge fell down, the bot did it!
Maybe the robot rights crowd will take this to the logical conclusion and decide that if AI has rights, it can be tried and thrown in robot jail too.
It's like regular jail. Except with blackjack and hookers.
Re: (Score:2)
Except with blackjack and hookers.
I fully expect anthropologists hundreds of years from now to study the origins of blackjack and hookers in the then-modern penal system.
"Somehow, blackjack and prostitution became an integral part of inmate reform. We have diligently studied all the legal records from the past several hundred years, but have been unable to determine when those two activities were first mentioned as rehabilitation factors.
In unrelated news, the prison populations are at an all time high. Scholars, legal professionals, and hi
Is it intentional? (Score:2)
Similar to the NSA putting a back door into an encryption algorithm.
We recently had an applicant use copilot. (Score:5, Interesting)
We recently had an interview with a junior developer applicant. He submitted a tech test and we invited him to come in for an interview, where we asked him to make some changes/fixes to his own tech test. After he asked for our Wi-Fi password (we told him using the internet for help was fine), he started typing comments. I found it very weird that his comments were describing exactly what we had asked him to do -- I had never seen copilot in action -- and, sure enough, he was getting blocks of code which superficially looked like they were doing something like what he asked. If you looked a bit more carefully it was all wrong, but for a junior developer it was harder to figure out and change than starting from scratch. He did try to start from scratch... phrased it differently for copilot. After another couple of false starts and half an hour, during which we were trying to give helpful hints, the interview came to a close.
Kids, if you want to use AI to generate code, you'd better be REAL good developers. The better the AI is, the subtler the mistakes it will make and the harder they will be to discern.
And, for the love of god, don't use it for in-person interviews!
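A hypothetical illustration of how subtle these mistakes can be (invented for this comment, not taken from any real Copilot output): a completion that passes casual testing with a handful of small primes, yet is wrong on a whole class of inputs.

```python
import math

# Prompt: "check whether n is prime" -- a completion that looks right
# and survives a quick test with 2, 3, 5, 7...
def is_prime_plausible(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Subtle off-by-one: range() excludes its end point, so the square
    # root itself is never tried -- 9, 25, and 49 all come back "prime".
    for d in range(3, math.isqrt(n), 2):
        if n % d == 0:
            return False
    return True

# The correct loop bound includes the square root.
def is_prime_fixed(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True
```

A reviewer skimming the plausible version sees a perfectly idiomatic trial-division loop; only someone who stops and checks the boundary catches it.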
Re: (Score:1)
Human enters "// load the model, find the bounding box of the model, scale the model to unit size, rotate it to face in the +Z direction, and add it to the scene graph"
AI generates something like:
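The generated block didn't survive in the comment, but purely as a hypothetical sketch of what such a completion might look like (every name here is invented, and the "scene graph" is just a list; a real completion would target whatever 3D library the file imports):

```python
# Purely hypothetical stand-in code -- the loader and scene graph are
# toy versions invented for illustration, not a real API.

def load_model(path):
    # stand-in loader: returns vertices of a small triangle
    return [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

def prepare_model(path, scene_graph):
    verts = load_model(path)
    # bounding box of the model
    lo = [min(v[i] for v in verts) for i in range(3)]
    hi = [max(v[i] for v in verts) for i in range(3)]
    # scale so the largest extent becomes 1
    s = 1.0 / max(hi[i] - lo[i] for i in range(3))
    verts = [tuple((v[i] - lo[i]) * s for i in range(3)) for v in verts]
    # rotate 90 degrees about X so the original +Y direction faces +Z
    verts = [(x, -z, y) for (x, y, z) in verts]
    scene_graph.append(verts)
    return verts
```

Which is precisely the thread's point: it runs and "does something," and only a careful reader notices whether the scaling and rotation conventions are actually the ones the project uses.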
Yes? And? (Score:2)
It won't matter at all. (Score:2)
IT Bosses will see it as a time saver and a cost-cutter, and that's the only thing that will matter until it blows up in their faces in spectacular fashion. We bought some crap reporting software years back that allowed "anyone," not just programmers, to build reports quickly and easily. Rather than use it with a bit of control, the bosses turned it loose on the entire company, letting every person build whatever reports they wanted. Which led to not just redundancy on an astronomical level, b
Right and wrong way to use a tool (Score:2)
Garbage in; garbage out (Score:2)
Observation (Score:2)