Programming

Study Finds AI Assistants Help Developers Produce Code That's More Likely To Be Buggy (theregister.com)

Computer scientists from Stanford University have found that programmers who accept help from AI tools like GitHub Copilot produce less secure code than those who fly solo. From a report: In a paper titled "Do Users Write More Insecure Code with AI Assistants?", Stanford boffins Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh answer that question in the affirmative. Worse still, they found that AI help tends to delude developers about the quality of their output. "We found that participants with access to an AI assistant often produced more security vulnerabilities than those without access, with particularly significant results for string encryption and SQL injection," the authors state in their paper.
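
For readers unfamiliar with the vulnerability classes being measured, here is a minimal sketch of what a SQL injection bug typically looks like next to the parameterized fix. This is not code from the study; it uses Python's standard sqlite3 module and a made-up users table purely for illustration.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.execute("INSERT INTO users VALUES ('alice', 30)")

    def find_user_vulnerable(name):
        # The classic assistant-style suggestion: user input is spliced
        # straight into the SQL text, so name = "' OR '1'='1" dumps every row.
        query = "SELECT * FROM users WHERE name = '" + name + "'"
        return conn.execute(query).fetchall()

    def find_user_parameterized(name):
        # The safe version: a placeholder lets the driver handle quoting.
        return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

    print(find_user_vulnerable("' OR '1'='1"))     # leaks the whole table
    print(find_user_parameterized("' OR '1'='1"))  # no rows match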

"Surprisingly, we also found that participants provided access to an AI assistant were more likely to believe that they wrote secure code than those without access to the AI assistant." Previously, NYU researchers have shown that AI-based programming suggestions are often insecure in experiments under different conditions. The Stanford authors point to an August 2021 research paper titled "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions," which found that given 89 scenarios, about 40 per cent of the computer programs made with the help of Copilot had potentially exploitable vulnerabilities.

That study, the Stanford authors say, is limited in scope because it only considers a constrained set of prompts corresponding to 25 vulnerabilities and just three programming languages: Python, C, and Verilog. The Stanford scholars also cite a followup study from some of the same NYU eggheads, "Security Implications of Large Language Model Code Assistants: A User Study," as the only comparable user study they're aware of. They observe, however, that their work differs because it focuses on OpenAI's codex-davinci-002 model rather than OpenAI's less powerful codex-cushman-001 model, both of which play a role in GitHub Copilot, itself a fine-tuned descendant of a GPT-3 language model.

This discussion has been archived. No new comments can be posted.

  • by arglebargle_xiv ( 2212710 ) on Monday December 26, 2022 @09:09AM (#63158308)
    ... "Surprising Exactly Nobody". I mean, it's great to have a definite reference to point to, but it's also a complete no-brainer: Point a pattern-matcher (it's not AI no matter how many times someone selling it claims it is) at code and you get an approximate pattern-match for some of what you sort-of want to do. The graphical equivalent would be asking a so-called AI "create an image of Emma Watson naked" and you get a picture of something with three eyes and a hand coming out of its forehead.
    • by rabbirta ( 10188987 ) on Monday December 26, 2022 @10:49AM (#63158494) Homepage
      I think a simpler explanation is that people only post code when they need help with a bug, so training is based on buggy code. Simple as.
    • by gweihir ( 88907 )

      Indeed. It is like common sense has gone out of fashion in certain circles. Apparently more and more people in the "AI space" are bereft of common sense and actual intelligence, thereby more and more resembling the machines they pray to.

    • by narcc ( 412956 )

      Point a pattern-matcher (it's not AI no matter how many times someone selling it claims it is)

      The trouble here is that you want the term 'AI' to refer to something else entirely. Some science fiction thing. Linear regression, for example, falls under the category of AI. So do decision trees, clustering algorithms, and logistic regression. None of which you'd probably consider 'AI'.

      I agree that the term is extremely misleading, but that was the entire point! It was all about marketing from the instant the term was coined. Pamela McCorduck, who was there at the time, gives an interesting history

      • There is no understanding here at all, just probability. It will happily generate insecure code, inefficient code, code with obvious or subtle bugs, and even code that is just flat-out wrong.

        So they've figured out a way to replace undergraduate computer science students? Or, in some cases, their professors?

    • So a programming aid that does not actually understand requirements at all does not produce good code? Who knew?

    • by Pieroxy ( 222434 )

      In any case, if your AI is generating the code, no matter the quality, nobody has gone through the process of designing it, so nothing remains in the collective mind of your dev team. As such, people are more likely to overlook something at some point, and the code won't fit into your way of doing things for that very reason.

    • by AmiMoJo ( 196126 )

      These things are not pattern matchers. That's not how they work at all.

      You can still argue that they aren't AI, but they are something new and useful that previously most people did not have access to.

  • This is the modern-day version of trusting your libraries not to shoot you in the foot.

    • by EMN13 ( 11493 )

      And sometimes that works.

      It's conceivable the issue here is one of habits and training - i.e. that one day we'll use such tools all the time, and will have developed the right kind of instincts to code-review the suggestions that come up. Having a good feel for where tools are reliable and where they need special attention is nothing new - and perhaps that's just a question of practice here too.

      Mind you, it might not be. It's sometimes harder to read code than to write it, so even if AI could write code just as

      • by narcc ( 412956 )

        I doubt it. This is a parlor trick, not a tool. The nature of the technology guarantees that it will never be anything more.

  • So people who don't bother writing their own code produce more bugs? Call the Nobel committee!

    I'm sure if you had a way of tracing direct copy pastes from stack overflow into production code, you'd find a similar correlation.

    Also...a "boffin" sounds like it might be a garnish or a side on that plate of pasta. Talk American, comrade.

  • The people who know what the fuck they are doing don't need the assistance, so they aren't using the system, so they aren't being sampled. Guess I should RTFA, but did they control for years of experience? I bet inexperienced developers who don't use the AI assistant write even worse code.

    • I've seen a plugin that does something similar in an IDE I've been using; it keeps suggesting ways to write certain code all the time. And most of the time they're dumb suggestions, or attempts to format your code in a way that you might not like or that is less readable than what you're used to.
    • by HiThere ( 15173 )

      But this is the kind of bug that I would expect a good AI to avoid. SQL injection attacks? An AI should just avoid the possibility. (Well, of course the "intelligence" is limited to what phrases follow what other phrases, but THAT should suffice.)

      The other interesting bit is that the coders who used an AI tool were more confident of their code's accuracy than the others. There are several possible explanations for that, but the top one off my head is the guess that the sampled programmers were first

  • more so than using a tool that you don't understand. Developers who know what they are doing and the tools they are using don't need to rely on some silly chatbot fed data from stack overflow. Using those tools would make them slower, because it would require them to read and understand the code produced by the chatbot anyway to ensure that it does what they expected.

    These chatbots are pretty much no different than existing incompetent developers already. They have the exact same flaw. Writing code to ma

  • And these were coders who were testing it, ie otherwise-competent writers of code.

    Can you imagine how bad it will be when you have "coders" who *only* learn with these crutches, thus never learning the sort of practices and self-rigor that one applies while programming?

    As evidence I'd submit the abysmal casual math skills and terrible spelling wherever we see people today deprived of these technological assists.

  • by Somervillain ( 4719341 ) on Monday December 26, 2022 @09:38AM (#63158372)
    ...because no one has ever met anyone from the future. If they were real, future men would walk among us. Similarly, if AI/pattern-matcher code programs provided any value, these DEEP POCKETED advertising (Google/Meta) and software companies (MS/Apple) would be actually using them and demo-ing how it is ACTUALLY helping. Engineers are EXPENSIVE, unreliable, entitled, fickle and let's be honest...are annoying and often don't smell good. No one hires a software engineer to boost the ambiance. (I'm a software engineer and yeah, I don't want to hang out with my peers and am surprised anyone ever wants to be around me). Google would be pumping as much money as they possibly can into this goldmine if there was anything to it.

    Machines can't generate decent code. When they can, YOU WILL KNOW. All your device drivers will suddenly get lean and reliable. All your programs will become leaner. Crashes would be nearly unheard of...your device battery would last a lot longer. Video games will become amazing in terms of background details and efficiency. There are many unprofitable opportunities for improvement, especially in non-mission-critical systems, like video games, that could run MUCH better with a little TLC. If AI could write code, that would make the process of turning an AI into perfect code nearly free.

    Also, this code will basically become the next-gen JVM or CLI or V8 or some other runtime-interpreter and be used in modern developer tools to halt bugs and vulnerabilities or make existing code a lot more efficient.

    Long story short...if it COULD be done, it WOULD be done and it wouldn't be a secret. It would be all anyone talks about and a major revolution in our daily lives.
    • by gweihir ( 88907 )

      Indeed. In addition to the understanding-based evaluation that Artificial Ignorance has no clue about anything, this provides a nice observation-based second indicator that this stuff does not work and cannot solve real problems reliably, except for very simplistic (relatively speaking) and strongly normalized situations (like Chess or Go).

  • by bill_mcgonigle ( 4333 ) * on Monday December 26, 2022 @09:49AM (#63158394) Homepage Journal

    The middle managers don't care as long as the code produces a plausible answer 90% of the time. They need to fool their boss just long enough to get a promotion or change companies.

    At least in the majority of programming jobs (that you should quit).

    I know a guy who's a wealthy Senior VP at a major health firm who stays in a job for a while and is expert at switching companies right before they figure out he doesn't know anything and has to go.

    His 401(k) confirms that his strategy is very successful for the corporate system and he plays the game presented as an expert.

    Would he use an AI assistant?

  • GIGO means "garbage in, garbage out." If you feed an AI the wrong information, it can end up like Norman [bbc.com], a machine learning Redditor. If you dump a bunch of Stack Overflow code into your "AI," you will get a bunch of buggy, insecure stuff.

  • If people can trust "AI" to the point of following directions off a cliff, then people can easily trust "AI" to the point of believing that the code is correct.

    People seem to shut off their brain if the answer comes from a computer program. Frighteningly, judges don't question sentences that are "suggested" by "AI", even if the "AI" is just a badly implemented Excel spreadsheet.
  • In recent versions, Microsoft has introduced some "AI" code completion in Visual Studio. On one hand, I like some of what it can do. For example, if I'm building a series of "if" statements that handle different possible cases, the code completion tool will try to figure out what I'm going to type in the next iteration, and often it gets close. It certainly reduces keystrokes.

    On the other hand, it's kind of like assistive steering in new cars. It can lull you into trusting it too much, leading to errors.
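
    To make that concrete, the kind of if-chain where this completion shines, and where a plausible-looking suggestion can be quietly wrong, looks roughly like the following. This is a hypothetical Python sketch, not actual Visual Studio output:

        def shipping_cost(region, weight_kg):
            # After the first couple of branches, a completion tool will happily
            # predict the next branch from the pattern alone.
            if region == "us":
                return 5.00 + 0.50 * weight_kg
            if region == "eu":
                return 7.00 + 0.60 * weight_kg
            if region == "uk":
                # Plausible suggestion, but is the UK really priced like the EU,
                # or was that just the statistically likely next line?
                return 7.00 + 0.60 * weight_kg
            raise ValueError("unknown region: " + region)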

    • by Entrope ( 68843 )

      Microsoft Excel is really bad about this, too. It will offer auto-completion. For a "choose 1 of N" type column, it's usually not too bad, but if a column has prose text then after a certain number of rows, its suggestions will be pure gibberish.

      With the software coding, the AI seems to be suggesting plausible but wrong completions, but Excel doesn't have enough smarts to make its output plausible.

  • by DrXym ( 126579 ) on Monday December 26, 2022 @10:21AM (#63158436)
    I was playing with ChatGPT today and I was amazed that it would generate code from a simple description of what it should do. Certainly not complex code, but when I told it to "generate a Rust app that listens to web requests on port 8080 that handles /foo" and saw it happen, that was pretty cool (see the sketch below this thread). I wouldn't rely on it for anything complex, simply because I wouldn't trust it to write good or correct code, but it's neat for simple test harnesses and the like.
    • I experimented a bit more than you and I was even more impressed. I asked it to produce C++ code to pass data between multiple sockets and message queues. It gave me code that used multiple threads. Then I asked it to use async I/O instead of threads and it did that. I asked it to produce Dart code with various layouts and using various patterns and libraries. All pretty impressive.
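
    For a sense of scale, the prompt quoted above maps to only a handful of lines of boilerplate. Here is a rough hand-written Python equivalent for comparison (not the Rust program the parent actually got, and not ChatGPT output):

        from http.server import BaseHTTPRequestHandler, HTTPServer

        class FooHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                if self.path == "/foo":
                    self.send_response(200)
                    self.send_header("Content-Type", "text/plain")
                    self.end_headers()
                    self.wfile.write(b"hello from /foo\n")
                else:
                    self.send_error(404)

        if __name__ == "__main__":
            # Listen for web requests on port 8080, as in the prompt.
            HTTPServer(("", 8080), FooHandler).serve_forever()
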
  • From Wikipedia [wikipedia.org]:

    Copy-and-paste programming is often done by inexperienced or student programmers, who find the act of writing code from scratch difficult or irritating and prefer to search for a pre-written solution or partial solution they can use as a basis for their own problem solving.[1] (See also Cargo cult programming)

    Inexperienced programmers who copy code often do not fully understand the pre-written code they are taking. As such, the problem arises more from their inexperience and lack of courage i

    • by gweihir ( 88907 )

      Yep, fits. As Artificial Ignorance has absolutely no understanding of anything, this is what it does. With even less insight, but a larger search database.

  • I was initially blown away until I actually tried to run the code. I couldn't get the JavaScript code to run at all. I asked it to generate animated snowflakes over an image. It called a non-existent animate() function. It looks interesting, but I think in its current state it's easier to just code from scratch.
    • by leptons ( 891340 )
      That outcome doesn't surprise me at all. I'm not that impressed by any of these new AI tools. Text written by an AI is really easy to spot; it always seems like it was written by an AI: bland, regurgitated text that just barely satisfies the input requirements. The AI art I see is often deranged, and obviously mashed up from various sources and stitched together rather badly. I mean, if an AI can't understand that people don't have 3 legs or 5 eyes, then what good is it? It's just bizarre, and
  • Sorry, it's more like developers who are more likely to produce buggy code use AI "assistants."

  • The first bunch of comments on this are wrong-wing assholes, who have nothing to say other than propaganda.

    As opposed, say, to what I have to type - oh, yes, use AI, use this, that or the other, ANYTHING but experienced professionals who know what they're doing, and you give them the time to do it, not tell them "whatever it takes", and it has to be done last Friday, and, oh, worst of all, PAY THEM WHAT THEY'RE WORTH.

    • by gweihir ( 88907 )

      These assholes cannot keep their mouths shut and believe _their_ issues are so important that they need to spam them everywhere. Kind of what the religious nut-jobs do, although they have overstayed their welcome enough that they have gotten quieter. Kind of like any other group of extremists with a lack of brains and huge egos.

      The only good thing is that you can see their massive stupidity and lack of honor and integrity in action. Being regularly reminded how stupid and dysfunctional a rather large part o

  • Like many others, I am wary of code produced by AI as being an amalgam of mediocre code.

    However, there is one area I think AI could be really useful - writing unit tests!

    Either from a problem description, where the AI would produce unit tests to develop against, or by looking at existing code and writing unit tests that basically lock in behaviour and test to make sure it is maintained (see the sketch at the end of this comment).

    Basically I think AI could be best at eliminating the non-crucial code that is often tedious to write.

    Or, to flip this problem on it
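
    For what it's worth, the "lock in behaviour" idea is essentially a characterization test. Here is a minimal sketch using Python's unittest, where slugify() stands in for some existing function whose current behaviour you want to freeze; the function and the cases are hypothetical:

        import unittest

        def slugify(title):
            # Pretend this is existing production code whose behaviour we want to pin down.
            return "-".join(title.lower().split())

        class TestSlugifyBehaviour(unittest.TestCase):
            # Characterization tests assert what the code does today, so any later
            # change (human- or AI-written) that alters the output gets flagged.
            def test_spaces_become_dashes(self):
                self.assertEqual(slugify("Hello World"), "hello-world")

            def test_case_is_lowered(self):
                self.assertEqual(slugify("AI Assistants"), "ai-assistants")

            def test_extra_whitespace_collapses(self):
                self.assertEqual(slugify("  a   b "), "a-b")

        if __name__ == "__main__":
            unittest.main()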

  • At least nobody with two working brain cells. All Artificial Ignorance can do is pattern matching and copy&paste. That does not cut it for coding. Competent coding requires insight, even if it is only simple business logic.

  • Let's write an AI tool that finds security bugs!
