

AI Code Generators Are Writing Vulnerable Software Nearly Half the Time, Analysis Finds (nerds.xyz)
BrianFagioli writes: AI might be the future of software development, but a new report suggests we're not quite ready to take our hands off the wheel. Veracode has released its 2025 GenAI Code Security Report, and the findings are pretty alarming. Out of 80 carefully designed coding tasks completed by over 100 large language models, nearly 45 percent of the AI-generated code contained security flaws.
That's not a small number. These are not minor bugs, either. We're talking about real vulnerabilities, with many falling under the OWASP Top 10, which highlights the most dangerous issues in modern web applications. The report found that when AI was given the option to write secure or insecure code, it picked the wrong path nearly half the time.
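For a rough sense of what an OWASP Top 10 flaw looks like in generated code, here is a hypothetical Python sketch of the classic injection pattern such evaluations flag, next to the parameterized form (the table and column names are made up for illustration):

    import sqlite3

    def find_user_insecure(conn, username):
        # Flawed: user input is spliced directly into the SQL string, so a crafted
        # username can rewrite the query (SQL injection).
        cur = conn.execute(f"SELECT id, email FROM users WHERE name = '{username}'")
        return cur.fetchone()

    def find_user_secure(conn, username):
        # Safer: the driver binds the value as data, never as SQL text.
        cur = conn.execute("SELECT id, email FROM users WHERE name = ?", (username,))
        return cur.fetchone()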
Just like humans: How we train them. (Score:4, Insightful)
If you train the coder to write secure software, it will. If you don't, it won't. Same rule applies to people as AI.
The problem is we have so much insecure code out there, that the AI is being trained on it.
Re: (Score:1)
But it's easier to fire questionable humans; there are more of them than there are coding-bot vendors.
Re: (Score:3)
While there are some scenarios where the secure way is relatively straightforward (e.g., if you know you don't need anything peculiar, a reasonably strict CSP header is just common sense to opt into a more locked-down browser environment), much of security is actually thinking about what the code does and how it could be bent. LLM-style training isn't likely up to that task, as it just generates code that is consistent with something that sounds like it fits the pattern; it's not actually conceptualizing things.
LLM can be
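To make the straightforward case concrete: for a site that doesn't need anything peculiar, opting into a locked-down browser environment can be a single response header along these lines (the exact directives below are only an illustrative starting point, not a universal recipe):

    Content-Security-Policy: default-src 'self'; object-src 'none'; frame-ancestors 'none'; base-uri 'self'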
Re:Just like humans: How we train them. (Score:5, Insightful)
AIs don't work like that. They have already been trained on more code than any group of humans will ever be able to digest. They have been trained on a huge quantity of code, but the quality of the code they have been "trained" on is not the quality you need to produce secure code. These AIs do not understand what secure code is, and you cannot tell them the way you can tell a human. All these AIs can do is reproduce what they have commonly seen within a provided context, and the most common pattern is insecure.
Re: (Score:2)
AIs don't work like that. They have already been trained on more code than any group of humans will ever be able to digest.
The word I feel inclined to question is "digest". In a human, it means to understand (to some degree) and assimilate. In an LLM, what exactly can it mean?
Re:Just like humans: How we train them. (Score:5, Insightful)
If you train the coder to write secure software, it will. If you don't, it won't. Same rule applies to people as AI.
No, it does not.
The AI doesn't know what makes secure code or not. It just makes code that looks like it's secure. Humans can do the same thing, of course, but the difference is that they have a chance to know. The AI doesn't know anything at all. It only contains analyses of tokens, which cause them to be stuck together in statistically familiar ways.
You could train your AI only on the best code in the world, and it would still hallucinate insecure code for you.
Re:Just like humans: How we train them. (Score:5, Interesting)
Indeed, and I've dealt with this in human terms a fair amount too.
Found a vulnerability in a technology that applied plausible security practices, but missed a key implication, so strike one.
Then the stewards of that technology reviewed the findings and, after a few days, published that they were mandating a formerly optional feature meant to imitate a key security practice from other technologies in order to resolve the issue. This again sounded just right, since that imitated mechanism was indeed directly intended to address that sort of weakness. However, the answer was made without actually thinking through the vulnerability: their version implemented the feature *after* the vulnerability would already have landed.
In an analogous AI experience, I asked a code generator to implement a pinned certificate to an HTTPS service in a language I wasn't immediately familiar with. So it:
-Dutifully disabled traditional certificate validation
-Submitted HTTPS request, including username and password
-*Then* implemented explicit certificate check, after the data had already been transmitted...
It's somewhat mitigated because it hallucinated arguments that were invalid for controlling the TLS behavior, but structurally it was code that was trying to be outwardly careful with explicit validation but entirely missing the point.
All the time I deal with people trying to do security who manage to align with the lingo but fall short of an actually thoughtful implementation. LLMs could likely compete with those folks, but in some ways I prefer the totally naive results; it's easier to spot the problems at a glance.
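For comparison, here's roughly what the right shape looks like in Python (a minimal sketch, not production code; the function name, host, path, and pinned hash are placeholders): the pin is checked on the TLS socket before any application data, credentials included, goes over the wire.

    import hashlib
    import http.client
    import socket
    import ssl

    # Placeholder: SHA-256 of the server's DER-encoded leaf certificate.
    PINNED_SHA256 = "0" * 64

    def post_with_pin(host, path, body):
        # Keep normal chain and hostname validation, and add the pin on top of it.
        ctx = ssl.create_default_context()
        tls = ctx.wrap_socket(socket.create_connection((host, 443)),
                              server_hostname=host)
        leaf = tls.getpeercert(binary_form=True)   # DER bytes of the presented cert
        if hashlib.sha256(leaf).hexdigest() != PINNED_SHA256:
            tls.close()
            raise ssl.SSLError("pinned certificate mismatch; nothing was transmitted")
        conn = http.client.HTTPSConnection(host)
        conn.sock = tls                            # reuse the already-verified socket
        conn.request("POST", path, body)
        return conn.getresponse()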
Re: (Score:2)
> The AI doesn't know what makes secure code or not. It just makes code that looks like it's secure.
It'll only do that when making predictions that lean on training samples that were secure. In general it just makes code that looks like what it was trained on, warts and all.
It's like having an LLM continue a partial chess game - it's not trying to win, but rather trying to predict how the player whose turn it is would have played. If it's continuing the moves of a bad player then it will predict
Re: (Score:2)
It'll only do that when making predictions that lean on training samples that were secure. In general it just makes code that looks like what it was trained on, warts and all.
It's not clear that LLMs could write secure code, even if the training samples were all 100% secure.
Re: (Score:2)
Sure, I think we can be pretty certain that it wouldn't 100% of the time, although with an LLM this is all you can do - give it the best training data possible and cross your fingers. Training could also include things like task-level RL with evaluation for security, but at the end of the day all bets are off since an LLM is just a prediction machine.
One day we'll have human level AGI, based on a more complete brain-like architecture, capable of continual learning, with short term memory, and traits like cu
Re: (Score:3)
If you train the coder to write secure software, it will. If you don't, it won't. Same rule applies to people as AI.
The problem is we have so much insecure code out there, that the AI is being trained on it.
... and obviously nobody could have foreseen this.
Re: (Score:2)
The problem is that AI has the ego of three Vegetas.
It just KNOWS it's right!
You can make a smart one, but we didn't; we just believe in the AI so much that we have people who can't read code generating it.
Re: Just like humans: How we train them. (Score:2)
It's also the fact that AI pieces together your ask from multiple sources without looking at the big picture. They'll never see business logic exploits and the like.
Re: (Score:3)
A human starting out programming will write code that has all manner of bugs in it, including security bugs. S/he will have to fix such bugs as they are reported -- this is how they learn what does/does-not work and, over time, they get better. The problem with AI is that it does not get this feedback and so does not learn & get better.
Re: (Score:2)
The problem with AI is that it does not get this feedback and so does not learn & get better.
LLMs simply do not work that way. We do update models using human feedback, just not the way people imagine. These things are not science fiction robots. They do not work like biological brains. They do not learn and change as they're used, nor can they be taught through pretend conversations or by having them fix bugs or make other corrections. Remember that they operate strictly on relationships between tokens, not on facts and concepts. All they do is next-token prediction with no internal stat
Re: (Score:2)
Oldest rule in the book:
GIGO
I have noticed more bugs... (Score:1)
Welcome the the future, it's stupid. (Score:2)
Pay no attention to the fact that every new service exists solely to fix the problems created by the last service you subscribed to that solves a problem you didn't know you had until they made it a problem by telling your CEO that everyone is using this new hot thing and if they don't everyone will laugh at them at the country club.
and the other half the time... (Score:1)
Re:and other half doesn't compile (Score:2)
Just use Perl, that problem then goes away ;-)
Re: (Score:1)
I didn't criticize it; I gave it no value judgement.
Always-running-without-syntax-errors can be seen as a benefit because you don't have to stop to figure out the syntax error. (Possible down-sides of this shall not be addressed here.)
Re: (Score:2)
Or AI winter is going to happen hard and fast, and all the companies that pivoted to AI are screwed. Could go either way, but we all know how boom/bust cycles work.
AI coding (Score:4)
Re:AI coding (Score:5, Insightful)
Re: (Score:2)
That's not really a reasonable explanation of how modern LLMs work, that's more the previous generation.
With modern ones it's taking into account all the previous text in the session and comparing that to the training data to generate an output, similar to how you might compose a reply to someone in Chinese based on your knowledge of Chinese.
Re: (Score:2)
With modern ones it's taking into account all the previous text in the session and comparing that to the training data to generate an output, similar to how you might compose a reply to someone in Chinese
That's not at all similar to how people talk in Chinese. When humans speak Chinese, they understand what they are saying. If they say apple, they know what an apple tastes like.
This is about qualia, but qualia are just the beginning of the problems. LLMs are not like the human brain, and the researchers building them do not claim them to be.
Marketing? (Score:4, Interesting)
The link in the article leads to a blog rather than the primary source. Clicking the report link takes you to a "Veracode" website, which offers solutions for securing (AI) code. The report link there takes you to another page with reports by them, but there's still no download option. It seems you must fill out a form with personal information to get the PDF.
In other words: Don't trust anything until you've seen a serious paper about it, not just marketing sites.
If anyone has the PDF, please upload it so we have something we can actually discuss.
GIGO (Score:2)
AI code generators are doing their best to copy (predict) the code they were trained on, so the best case is that the code they generate is as secure as the StackOverflow/github/etc slop they were trained on.
Common (vs best) case is that the AI is generating something somewhat novel (not an exact regurgitation of something it was trained on), in which case you are getting a statistical prediction which is a mashup of multiple training sources and all bets are off.
Don't anthropomorphize and think the AI has some go
So what? (Score:2)
If you look at human-written code from online advice sites, it invariably comes with provisos like "error handling has not been included." If AI is trained on that, and knows no better, then whaddaya expect?
Re: (Score:2)
Right - code samples from places like Stack Overflow are likely to be stripped down examples of "how to do the thing", rather than hardened code copied from production systems (that would get the poster in legal trouble).
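As a throwaway illustration of the gap (the function names and config URL are made up): the first version below is the shape a tutorial snippet usually takes, the second is closer to what production code has to do.

    import json
    import urllib.request

    def fetch_config_tutorial(url):
        # Typical sample-code shape: happy path only, "error handling omitted for brevity".
        return json.load(urllib.request.urlopen(url))

    def fetch_config_hardened(url, timeout=5):
        # Closer to production: bounded timeout, explicit failure modes, context in the error.
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status != 200:
                    raise RuntimeError(f"unexpected HTTP status {resp.status}")
                return json.loads(resp.read().decode("utf-8"))
        except (OSError, ValueError) as exc:   # network failures, bad JSON
            raise RuntimeError(f"failed to fetch config from {url}") from exc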
Nearly half the time? (Score:4, Insightful)
So...better than I thought.
Security people vastly overestimate their import (Score:2)
The fact is, the huge security apparatus is there to be the fall guy for the corp when accidents do happen.
The reason I say this is because most "breaches" are just human errors that involve social interactions that can't be prevented regardless of how draconian your security apparatus is.
It's a classic problem where a small effort on security yields a big improvement, but each subsequent improvement yields smaller and smaller gains.
Shhhh... You're going to ruin it... (Score:1)
Think how productive you are with AI! Now you're a 10x programmer! Keep that code flowing. Don't worry about security, that's a job for someone else; you're just supposed to crank out code.
Question (Score:2)
'Vulnerable' or best practice? (Score:2)