Study Finds AI Assistants Help Developers Produce Code That's More Likely To Be Buggy (theregister.com) 50
Computer scientists from Stanford University have found that programmers who accept help from AI tools like GitHub Copilot produce less secure code than those who fly solo. From a report: In a paper titled "Do Users Write More Insecure Code with AI Assistants?", Stanford boffins Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh answer that question in the affirmative. Worse still, they found that AI help tends to delude developers about the quality of their output. "We found that participants with access to an AI assistant often produced more security vulnerabilities than those without access, with particularly significant results for string encryption and SQL injection," the authors state in their paper.
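Neither vulnerability class named there is exotic. As a purely illustrative sketch of the string-encryption failure mode (hypothetical Python, not code from the study; it assumes the third-party cryptography package is installed):

    # Hypothetical illustration, not code from the study.
    from cryptography.fernet import Fernet

    # Insecure pattern of the kind assistants tend to suggest: a hand-rolled
    # XOR "cipher" that only obfuscates and offers no real confidentiality.
    def xor_encrypt(plaintext: bytes, key: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

    # Safer pattern: authenticated symmetric encryption from a vetted library.
    key = Fernet.generate_key()
    token = Fernet(key).encrypt(b"secret message")
    assert Fernet(key).decrypt(token) == b"secret message"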
"Surprisingly, we also found that participants provided access to an AI assistant were more likely to believe that they wrote secure code than those without access to the AI assistant." Previously, NYU researchers have shown that AI-based programming suggestions are often insecure in experiments under different conditions. The Stanford authors point to an August 2021 research paper titled "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions," which found that given 89 scenarios, about 40 per cent of the computer programs made with the help of Copilot had potentially exploitable vulnerabilities.
That study, the Stanford authors say, is limited in scope because it only considers a constrained set of prompts corresponding to 25 vulnerabilities and just three programming languages: Python, C, and Verilog. The Stanford scholars also cite a follow-up study from some of the same NYU eggheads, "Security Implications of Large Language Model Code Assistants: A User Study," as the only comparable user study they're aware of. They observe, however, that their work differs because it focuses on OpenAI's codex-davinci-002 model rather than OpenAI's less powerful codex-cushman-001 model, both of which play a role in GitHub Copilot, itself a fine-tuned descendant of a GPT-3 language model.
"Surprisingly, we also found that participants provided access to an AI assistant were more likely to believe that they wrote secure code than those without access to the AI assistant." Previously, NYU researchers have shown that AI-based programming suggestions are often insecure in experiments under different conditions. The Stanford authors point to an August 2021 research paper titled "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions," which found that given 89 scenarios, about 40 per cent of the computer programs made with the help of Copilot had potentially exploitable vulnerabilities.
That study, the Stanford authors say, is limited in scope because it only considers a constrained set of prompts corresponding to 25 vulnerabilities and just three programming languages: Python, C, and Verilog. The Stanford scholars also cite a followup study from some of the same NYU eggheads, "Security Implications of Large Language Model Code Assistants: A User Study," as the only comparable user study they're aware of. They observe, however, that their work differs because it focuses on OpenAI's codex-davinci-002 model rather than OpenAI's less powerful codex-cushman-001 model, both of which play a role in GitHub Copilot, itself a fine-tuned descendant of a GPT-3 language model.
Re: (Score:2, Informative)
Actually, there is a correlation between years of education and holding liberal views -- and thus being more likely to be a Democrat. Less education is associated with more conservative views. This is another reason why Democrats are more likely to support increased educational opportunities, especially college, while Republicans generally only support enough education for people to be able to work in support and service jobs. Creativity is con
Re: (Score:3, Insightful)
Re: (Score:1)
Sure, as long as you define indoctrination camps as places where they encourage you to think for yourself, or in other words, the opposite of indoctrination camps.
Re: (Score:2)
And yet another story that needs to begin with... (Score:3, Insightful)
Re:And yet another story that needs to begin with. (Score:4, Interesting)
Re: (Score:2)
Indeed. It is like common sense has gone out of fashion in certain circles. Apparently more and more people in the "AI space" are bereft of common sense and actual intelligence, thereby more and more resembling the machines they pray to.
Re: (Score:3)
Point a pattern-matcher (it's not AI no matter how many times someone selling it claims it is)
The trouble here is that you want the term 'AI' to refer to something else entirely. Some science fiction thing. Linear regression, for example, falls under the category of AI. So do decision trees, clustering algorithms, and logistic regression. None of which you'd probably consider 'AI'.
I agree that the term is extremely misleading, but that was the entire point! It was all about marketing from the instant the term was coined. Pamela McCorduck, who was there at the time, gives an interesting history
Re: (Score:2)
There is no understanding here at all, just probability. It will happily generate insecure code, inefficient code, code with obvious or subtle bugs, and even code that is just flat-out wrong.
So they've figured out a way to replace undergraduate computer science students? Or, in some cases, their professors?
Re: (Score:3)
So a programming aid that does not actually understand requirements at all does not produce good code? Who knew?
Re: (Score:2)
In any case, if your AI is generating the code, no matter the quality, nobody has been through the process of designing it, so nothing remains in the collective mind of your dev teams. As such, people are more likely to overlook something at some point, and the code won't fit into your way of doing things for that very reason.
Re: (Score:3)
These things are not pattern matchers. That's not how they work at all.
You can still argue that they aren't AI, but they are something new and useful that previously most people did not have access to.
More a reflection on the state of the tools (Score:3)
This is the modern-day version of trusting your libraries not to shoot you in the foot.
Re: (Score:2)
And sometimes that works.
It's conceivable the issue here is one of habits and training - i.e. that one day we'll use such tools all the time, and will have developed the right kind of instincts to code-review the suggestions that come up. Having a good feel for where tools are reliable and where they need special attention is nothing new - and perhaps that's just a question of practice here too.
Mind you, it might not be. It's sometimes harder to read code than to write it, so even if AI could write code just as
Re: (Score:2)
I doubt it. This is a parlor trick, not a tool. The nature of the technology guarantees that it will never be anything more.
Copyapasta with red sauce or white? (Score:1)
So people who don't bother writing their own code produce more bugs? Call the nobel committee!
I'm sure if you had a way of tracing direct copy pastes from stack overflow into production code, you'd find a similar correlation.
Also...a "boffin" sounds like it might be a garnish or a side on that plate of pasta. Talk American, comrade.
Re: (Score:3)
> Also...a "boffin" sounds like it might be a garnish or a side on that plate of pasta. Talk American, comrade.
The Register is a UK site, and they are inclined to speak British rather than American English.
Re:Copyapasta with red sauce or white? (Score:4, Funny)
Also...a "boffin" sounds like it might be a garnish or a side on that plate of pasta.
Actually a boffin is a type of hat, a bit like a trilby but not quite a homburg, worn to cricket matches, up until about the sixth innings.
Re:Copyapasta with red sauce or white? (Score:4, Funny)
"Eggheads?" Was this article written in 1950, daddy-o?
... because they don't know how (Score:2)
The people that know what the fuck they are doing don't need the assistance, so they aren't using the system, so they aren't being sampled. Guess I should RTFA, but did they control for years of experience? I bet inexperienced developers that don't use the AI assistant write even worse code.
Re: (Score:2)
Re: (Score:2)
But this is the kind of bug that I would expect a good AI to avoid. SQL injection attacks? An AI should just avoid the possibility. (Well, of course the "intelligence" is limited to what phrases follow what other phrases, but THAT should suffice.)
The other interesting bit is that the coders who used an AI tool were more confident of their code's accuracy than the others. There are several possible explanations for that, but the top one off my head is the guess that the sampled programmers were first
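For what it's worth, "avoiding the possibility" is a one-line difference in practice. A minimal sketch using Python's built-in sqlite3 module (the table and the input are made up for illustration):

    # Hypothetical illustration of the SQL injection point above.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    name = "alice' OR '1'='1"  # attacker-controlled input

    # Vulnerable: string interpolation lets the input rewrite the query.
    rows = conn.execute(f"SELECT role FROM users WHERE name = '{name}'").fetchall()
    print(rows)  # leaks every row

    # Safe: a parameterized query treats the input strictly as data.
    rows = conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()
    print(rows)  # []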
Knowing what you're doing helps (Score:1, Insightful)
more so than using a tool that you don't understand. Developers who know what they are doing and the tools they are using don't need to rely on some silly chatbot fed data from Stack Overflow. Using those tools would make them slower, because it would require them to read and understand the code produced by the chatbot anyway to ensure that it does what they expected.
These chatbots are pretty much no different than existing incompetent developers already. They have the exact same flaw. Writing code to ma
not to mention... (Score:2)
And these were coders who were testing it, ie otherwise-competent writers of code.
Can you imagine how bad it will be when you have "coders" who *only* learn with these crutches, thus never learning the sort of practices and self-rigor that one applies while programming?
As evidence I'd submit the abysmal casual math skills and terrible spelling we see today wherever people are deprived of these technological assists.
I know time machines aren't real... (Score:4, Insightful)
Machines can't generate decent code. When they can, YOU WILL KNOW. All your device drivers will suddenly get lean and reliable. All your programs will become leaner. Crashes will be nearly unheard of... your device battery will last a lot longer. Video games will become amazing in terms of background details and efficiency. There are many unprofitable opportunities for improvement, especially in non-mission-critical systems like video games, that could run MUCH better with a little TLC. If AI could write code, that would make the process of turning an AI into perfect code nearly free.
Also, this code will basically become the next-gen JVM or CLI or V8 or some other runtime-interpreter and be used in modern developer tools to halt bugs and vulnerabilities or make existing code a lot more efficient.
Long story short...if it COULD be done, it WOULD be done and it wouldn't be a secret. It would be all anyone talks about and a major revolution in our daily lives.
Re: (Score:2)
Indeed. In addition to the understanding-based evaluation that Artificial Ignorance has no clue about anything, this provides a nice observation-based second indicator that this stuff does not work and cannot solve real problems reliably, except in very simplistic (relatively speaking) and strongly normalized situations (like Chess or Go).
PHB (Score:3)
The middle managers don't care as long as the code produces a plausible answer 90% of the time. They need to fool their boss just long enough to get a promotion or change companies.
At least in the majority of programming jobs (that you should quit).
I know a guy who's a wealthy Senior VP at a major health firm who stays in a job for a while and is expert at switching companies right before they figure out he doesn't know anything and has to go.
His 401(k) confirms that his strategy is very successful: he plays the corporate system's game, presenting himself as an expert.
Would he use an AI assistant?
GIGO applies to AI also (Score:1)
GIGO means "garbage in, garbage out." If you feed an AI the wrong information, it can end up like Norman [bbc.com], a machine learning Redditor. If you dump a bunch of Stack Overflow code into your "AI," you will get a bunch of buggy, insecure stuff.
Driving off a cliff. (Score:2)
People seem to shut off their brain if the answer comes from a computer program. Frighteningly, judges don't question sentences that are "suggested" by "AI", even if the "AI" is just a badly implemented Excel spreadsheet.
First hand experience (Score:2)
In recent versions, Microsoft has introduced some "AI" code completion in Visual Studio. On one hand, I like some of what it can do. For example, if I'm building a series of "if" statements that handle different possible cases, the code completion tool will try to figure out what I'm going to type in the next iteration, and often it gets close. It certainly reduces keystrokes.
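The kind of repetitive series meant here looks something like the following (a hypothetical example, sketched in Python for brevity; the pattern is the same in any language). Once the first branch or two exist, the tool can usually guess the shape of the next one:

    # Hypothetical example of a repetitive case-handling chain. After the
    # first couple of branches, each new "elif" follows the same shape, so
    # the completion tool's suggestion for the next iteration is often close.
    def describe_status(code: int) -> str:
        if code == 200:
            return "OK"
        elif code == 301:
            return "Moved Permanently"
        elif code == 404:
            return "Not Found"
        elif code == 500:
            return "Internal Server Error"
        else:
            return "Unknown"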
On the other hand, it's kind of like assistive steering in new cars. It can lull you into trusting it too much, leading to errors.
Re: (Score:2)
Microsoft Excel is really bad about this, too. It will offer auto-completion. For a "choose 1 of N" type column, it's usually not too bad, but if a column has prose text then after a certain number of rows, its suggestions will be pure gibberish.
With the software coding, the AI seems to be suggesting plausible but wrong completions, but Excel doesn't have enough smarts to make its output plausible.
Funnily enough (Score:3)
Re: Funnily enough (Score:2)
Re: (Score:1)
OK, lighten up. Let's make classic jokes then:
"Your mom is so poor she can't even pay attention."
Copy-Paste Engineering (Score:2)
From Wikipedia [wikipedia.org]:
Re: (Score:2)
Yep, fits. As Artificial Ignorance has absolutely no understanding of anything, this is what it does. With even less insight, but a larger search database.
Hardly works for me (Score:2)
Re: (Score:2)
Haha backwards (Score:2)
Sorry, it's more like developers who are more likely to produce buggy code use AI "assistants."
What happened to the techies here? (Score:2)
The first bunch of comments on this are wrong-wing assholes, who have nothing to say other than propaganda.
As opposed, say, to what I have to type - oh, yes, use AI, use this, that or the other, ANYTHING but experienced professionals who know what they're doing, and you give them the time to do it, not tell them "whatever it takes", and it has to be done last Friday, and, oh, worst of all, PAY THEM WHAT THEY'RE WORTH.
Re: (Score:2)
These assholes cannot keep their mouths shut and believe _their_ issues are so important that they need to spam them everywhere. Kind of what the religious nut-jobs do, although they have overstayed their welcome enough that they have gotten quieter. Kind of like any other group of extremists with a lack of brains and huge egos.
The only good thing is that you can see their massive stupidity and lack of honor and integrity in action. Being regularly reminded how stupid and dysfunctional a rather large part o
A way AI could be really helpful: Unit Tests (Score:2)
Like many others I am wary of code produced by AI as being an amalgam of mediocre code.
However, there is one area I think AI could be really useful - writing unit tests!
Either from a problem description, where the AI would produce unit tests to develop against, or from looking at existing code and writing unit tests that basically lock in behaviour and test to make sure it is maintained.
Basically I think AI could be best at eliminating the non-crucial code that is often tedious to write.
Or, to flip this problem on it
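A minimal sketch of the behaviour-locking kind of test meant here (assuming pytest; the module, function, and expected values are hypothetical):

    # Characterization test: it locks in whatever the code currently does, so
    # any change in behaviour fails the test. It does not assert that the
    # behaviour is correct. Everything below is hypothetical.
    from legacy_pricing import quote  # hypothetical module under test

    def test_quote_behaviour_is_locked_in():
        # Expected values captured from the existing implementation.
        assert quote(units=1, tier="standard") == 9.99
        assert quote(units=100, tier="bulk") == 849.00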
And nobody is surprised (Score:2, Insightful)
At least nobody with two working brain cells. All Artificial Ignorance can do is pattern matching and copy&paste. That does not cut it for coding. Competent coding requires insight, even if it is only simple business logic.
I know! (Score:2)