

AI Code Generators Are Writing Vulnerable Software Nearly Half the Time, Analysis Finds (nerds.xyz)
BrianFagioli writes: AI might be the future of software development, but a new report suggests we're not quite ready to take our hands off the wheel. Veracode has released its 2025 GenAI Code Security Report, and the findings are pretty alarming. Out of 80 carefully designed coding tasks completed by over 100 large language models, nearly 45 percent of the AI-generated code contained security flaws.
That's not a small number. These are not minor bugs, either. We're talking about real vulnerabilities, with many falling under the OWASP Top 10, which highlights the most dangerous issues in modern web applications. The report found that when AI was given the option to write secure or insecure code, it picked the wrong path nearly half the time.
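For a rough sense of what an OWASP Top 10 flaw looks like in generated code, here is a hypothetical Python sketch of the classic injection pattern such evaluations flag, next to the parameterized form (the table and column names are made up for illustration):

    import sqlite3

    def find_user_insecure(conn, username):
        # Flawed: user input is spliced directly into the SQL string, so a crafted
        # username can rewrite the query (SQL injection).
        cur = conn.execute(f"SELECT id, email FROM users WHERE name = '{username}'")
        return cur.fetchone()

    def find_user_secure(conn, username):
        # Safer: the driver binds the value as data, never as SQL text.
        cur = conn.execute("SELECT id, email FROM users WHERE name = ?", (username,))
        return cur.fetchone()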
Just like humans: How we train them. (Score:4, Insightful)
If you train the coder to write secure software, it will. If you don't, it won't. Same rule applies to people as AI.
The problem is we have so much insecure code out there, that the AI is being trained on it.
Re: (Score:1)
But it's easier to fire questionable humans; there are more of them than there are coding-bot vendors.
Re: (Score:3)
While there are some scenarios where the secure way is relatively straightforward (e.g., if you know you don't need anything peculiar, a reasonably strict CSP header is just common sense to opt into a more locked-down browser environment), much of security is actually thinking about what the code does and how it could be bent. LLM-style training isn't likely up to that task, as it just generates code that is consistent with something that sounds like it fits the pattern; it's not actually conceptualizing things.
LLM can be
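To make the straightforward case concrete: for a site that doesn't need anything peculiar, opting into a locked-down browser environment can be a single response header along these lines (the exact directives below are only an illustrative starting point, not a universal recipe):

    Content-Security-Policy: default-src 'self'; object-src 'none'; frame-ancestors 'none'; base-uri 'self'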
Re:Just like humans: How we train them. (Score:5, Insightful)
AIs don't work like that. They have already been trained on more code than any group of humans will ever be able to digest. They have been trained on a huge quantity of code, but the quality of the code they have been "trained" on is not the quality you need to produce secure code. These AIs do not understand what secure code is, and you cannot tell them the way you can tell a human. All these AIs can do is reproduce what they have commonly seen within a provided context, and the most common pattern is insecure.
Re: (Score:2)
AIs don't work like that. They have already been trained on more code than any group of humans will ever be able to digest.
The word I feel inclined to question is "digest". In a human, it means to understand (to some degree) and assimilate. In an LLM, what exactly can it mean?
Re:Just like humans: How we train them. (Score:5, Insightful)
If you train the coder to write secure software, it will. If you don't, it won't. Same rule applies to people as AI.
No, it does not.
The AI doesn't know what makes secure code or not. It just makes code that looks like it's secure. Humans can do the same thing, of course, but the difference is that they have a chance to know. The AI doesn't know anything at all. It only contains analyses of tokens, which cause them to be stuck together in statistically familiar ways.
You could train your AI only on the best code in the world, and it would still hallucinate insecure code for you.
Re:Just like humans: How we train them. (Score:5, Interesting)
Indeed, and I've dealt with this in human terms a fair amount too.
Found a vulnerability in a technology that applied plausible security practices, but missed a key implication, so strike one.
Then the stewards of that technology reviewed the findings and, after a few days, published that they were mandating a formerly optional feature meant to imitate a key security practice from other technologies in order to resolve the issue. This again sounded just right, since that imitated mechanism was indeed directly intended to address that sort of weakness. However, the answer was made without actually thinking through the vulnerability: their version implemented the feature *after* the vulnerability would already have landed.
In an analogous AI experience, I asked a code generator to implement a pinned certificate to an HTTPS service in a language I wasn't immediately familiar with. So it:
-Dutifully disabled traditional certificate validation
-Submitted HTTPS request, including username and password
-*Then* implemented explicit certificate check, after the data had already been transmitted...
It's somewhat mitigated because it hallucinated arguments that were invalid for controlling the TLS behavior, but structurally it was code that was trying to be outwardly careful with explicit validation but entirely missing the point.
All the time I deal with people trying to do security who manage to align with the lingo but fall short of an actually thoughtful implementation. LLMs could likely compete with those folks, but in some ways I prefer the totally naive results; it's easier to spot the problems at a glance.
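For comparison, here's roughly what the right shape looks like in Python (a minimal sketch, not production code; the function name, host, path, and pinned hash are placeholders): the pin is checked on the TLS socket before any application data, credentials included, goes over the wire.

    import hashlib
    import http.client
    import socket
    import ssl

    # Placeholder: SHA-256 of the server's DER-encoded leaf certificate.
    PINNED_SHA256 = "0" * 64

    def post_with_pin(host, path, body):
        # Keep normal chain and hostname validation, and add the pin on top of it.
        ctx = ssl.create_default_context()
        tls = ctx.wrap_socket(socket.create_connection((host, 443)),
                              server_hostname=host)
        leaf = tls.getpeercert(binary_form=True)   # DER bytes of the presented cert
        if hashlib.sha256(leaf).hexdigest() != PINNED_SHA256:
            tls.close()
            raise ssl.SSLError("pinned certificate mismatch; nothing was transmitted")
        conn = http.client.HTTPSConnection(host)
        conn.sock = tls                            # reuse the already-verified socket
        conn.request("POST", path, body)
        return conn.getresponse()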
Re: (Score:2)
> The AI doesn't know what makes secure code or not. It just makes code that looks like it's secure.
It'll only do that when making predictions that lean on training samples that were secure. In general it just makes code that looks like what it was trained on, warts and all.
It's like having an LLM continue a partial chess game - it's not trying to win, but rather trying to predict how the player whose turn it is would have played. If it's continuing the moves of a bad player then it will predict
Re: (Score:2)
It'll only do that when making predictions that lean on training samples that were secure. In general it just makes code that looks like what it was trained on, warts and all.
It's not clear that LLMs could write secure code, even if the training samples were all 100% secure.
Re: (Score:2)
Sure, I think we can be pretty certain that it wouldn't 100% of the time, although with an LLM this is all you can do - give it the best training data possible and cross your fingers. Training could also include things like task-level RL with evaluation for security, but at the end of the day all bets are off since an LLM is just a prediction machine.
One day we'll have human level AGI, based on a more complete brain-like architecture, capable of continual learning, with short term memory, and traits like cu
Re: (Score:3)
If you train the coder to write secure software, it will. If you don't, it won't. Same rule applies to people as AI.
The problem is we have so much insecure code out there, that the AI is being trained on it.
... and obviously nobody could have foreseen this.
Re: (Score:2)
The problem is that AI has the ego of three Vegetas.
It just KNOWS it's right!
You can make a smart one, but we didn't; we just believe in the AI so much that we have people who can't read code generating it.
Re: Just like humans: How we train them. (Score:2)
It's also the fact that AI pieces together your ask from multiple sources without looking at the big picture. They'll never see business logic exploits and the like.
Re: (Score:3)
A human starting out programming will write code that has all manner of bugs in it, including security bugs. S/he will have to fix such bugs as they are reported -- this is how they learn what does/does-not work and, over time, they get better. The problem with AI is that it does not get this feedback and so does not learn & get better.
Re: (Score:2)
The problem with AI is that it does not get this feedback and so does not learn & get better.
LLMs simply do not work that way. We do update models using human feedback, just not the way people imagine. These things are not science fiction robots. They do not work like biological brains. They do not learn and change as they're used, nor can they be taught through pretend conversations or by having them fix bugs or make other corrections. Remember that they operate strictly on relationships between tokens, not on facts and concepts. All they do is next-token prediction with no internal stat
Re: (Score:2)
Oldest rule in the book:
GIGO
I have noticed more bugs... (Score:1)
Welcome the the future, it's stupid. (Score:2)
Pay no attention to the fact that every new service exists solely to fix the problems created by the last service you subscribed to that solves a problem you didn't know you had until they made it a problem by telling your CEO that everyone is using this new hot thing and if they don't everyone will laugh at them at the country club.
and the other half the time... (Score:1)
Re:and other half doesn't compile (Score:2)
Just use Perl, that problem then goes away ;-)
Re: (Score:1)
I didn't criticize it; I gave it no value judgement.
Always-running-without-syntax-errors can be seen as a benefit because you don't have to stop to figure out the syntax error. (Possible down-sides of this shall not be addressed here.)
Re: (Score:2)
Or AI winter is going to happen hard and fast, and all the companies that pivoted to AI are screwed. Could go either way, but we all know how boom/bust cycles work.
AI coding (Score:4)
Re:AI coding (Score:5, Insightful)
Re: (Score:2)
That's not really a reasonable explanation of how modern LLMs work, that's more the previous generation.
With modern ones it's taking into account all the previous text in the session and comparing that to the training data to generate an output, similar to how you might compose a reply to someone in Chinese based on your knowledge of Chinese.
Re: (Score:2)
With modern ones it's taking into account all the previous text in the session and comparing that to the training data to generate an output, similar to how you might compose a reply to someone in Chinese
That's not at all similar to how people talk in Chinese. When humans speak Chinese, they understand what they are saying. If they say apple, they know what an apple tastes like.
This is about qualia, but qualia are just the beginning of the problems. LLMs are not like the human brain, and the researchers building them do not claim them to be.
Marketing? (Score:4, Interesting)
The link in the article leads to a blog rather than the primary source. Clicking the report link takes you to a "Veracode" website, which offers solutions for securing (AI) code. The report link there takes you to another page with reports by them, but there's still no download option. It seems you must fill out a form with personal information to get the PDF.
In other words: Don't trust anything until you've seen a serious paper about it, not just marketing sites.
If anyone has the PDF, please upload it so we have something we can actually discuss.
GIGO (Score:2)
AI code generators are doing their best to copy (predict) the code they were trained on, so the best case is that the code they generate is as secure as the StackOverflow/github/etc slop they were trained on.
Common (vs best) case is that the AI is generating something somewhat novel (not an exact regurgitation of something it was trained on), in which case you are getting a statistical prediction which is a mashup of multiple training sources and all bets are off.
Don't anthropomorphize and think the AI has some go
So what? (Score:2)
If you look at human-written code from online advice sites, it invariably comes with provisos like "error handling has not been included." If AI is trained on that, and knows no better, then whaddaya expect?
Re: (Score:2)
Right - code samples from places like Stack Overflow are likely to be stripped down examples of "how to do the thing", rather than hardened code copied from production systems (that would get the poster in legal trouble).
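As a throwaway illustration of the gap (the function names and config URL are made up): the first version below is the shape a tutorial snippet usually takes, the second is closer to what production code has to do.

    import json
    import urllib.request

    def fetch_config_tutorial(url):
        # Typical sample-code shape: happy path only, "error handling omitted for brevity".
        return json.load(urllib.request.urlopen(url))

    def fetch_config_hardened(url, timeout=5):
        # Closer to production: bounded timeout, explicit failure modes, context in the error.
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status != 200:
                    raise RuntimeError(f"unexpected HTTP status {resp.status}")
                return json.loads(resp.read().decode("utf-8"))
        except (OSError, ValueError) as exc:   # network failures, bad JSON
            raise RuntimeError(f"failed to fetch config from {url}") from exc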
Nearly half the time? (Score:4, Insightful)
So...better than I thought.
Security people vastly overestimate their import (Score:2)
The fact is, the huge security apparatus is there to be the fall guy for the corp when accidents do happen.
The reason I say this is because most "breaches" are just human errors that involve social interactions that can't be prevented regardless of how draconian your security apparatus is.
It's a classic problem where a small effort on security yields a big improvement, but each subsequent improvement yields smaller and smaller gains.
Shhhh... You're going to ruin it... (Score:1)
Think how productive you are with AI! Now you're a 10x programmer! Keep that code flowing. Don't worry about security, that's a job for someone else; you're just supposed to crank out code.
Question (Score:2)
'Vulnerable' or best practice? (Score:2)