
AI Slop? Not This Time. AI Tools Found 50 Real Bugs In cURL (theregister.com)

The Register reports: Over the past two years, the open source curl project has been flooded with bogus bug reports generated by AI models. The deluge prompted project maintainer Daniel Stenberg to publish several blog posts about the issue in an effort to convince bug bounty hunters to show some restraint and not waste contributors' time with invalid issues. Shoddy AI-generated bug reports have been a problem not just for curl, but also for the Python community, Open Collective, and the Mesa Project.

It turns out the problem is people rather than technology. Last month, the curl project received dozens of potential issues from Joshua Rogers, a security researcher based in Poland. Rogers identified assorted bugs and vulnerabilities with the help of various AI scanning tools. And his reports were not only valid but appreciated. Stenberg in a Mastodon post last month remarked, "Actually truly awesome findings." In his mailing list update last week, Stenberg said, "most of them were tiny mistakes and nits in ordinary static code analyzer style, but they were still mistakes that we are better off having addressed. Several of the found issues were quite impressive findings...."

Stenberg told The Register that about 50 bugfixes based on Rogers' reports have been merged. "In my view, this list of issues achieved with the help of AI tooling shows that AI can be used for good," he said in an email. "Powerful tools in the hand of a clever human is certainly a good combination. It always was...!" Rogers wrote up a summary of the AI vulnerability scanning tools he tested. He concluded that these tools — Almanax, Corgea, ZeroPath, Gecko, and Amplify — are capable of finding real vulnerabilities in complex code.

The Register's conclusion? AI tools "when applied with human intelligence by someone with meaningful domain experience, can be quite helpful."

jantangring (Slashdot reader #79,804) has published an article on Stenberg's revised stance, including recently published comments from Stenberg that "It really looks like these new tools are finding problems that none of the old, established tools detect."


Comments:
  • The real question is how many bogus reports those 50 had to be filtered out from. If that is a much larger number, then this is still a fail and unusable.

    • by backslashdot ( 95548 ) on Sunday October 12, 2025 @11:40AM (#65719856)

      Huh? It's clearly usable, since they obtained this list of 50 valid bugs. How is that a fail? Without the AI it might have taken a decade to find these bugs, by which time North Korea might have found one or two that we'd have missed.

      • by Shaiku ( 1045292 ) on Sunday October 12, 2025 @01:45PM (#65720068)

        If it reported 1500 bugs but only 50 of those were valid, then the signal to noise ratio is too low for a real person with budgeted time to be able to filter the output.

        • by gweihir ( 88907 )

          Indeed. It may have worked this once, but it is not sustainable. I have no idea why people do not know or understand this well-established basic fact. It seems the AI fans are operating more like cult members than rational people.

        • If it reported 1500 bugs but only 50 of those were valid, then the signal to noise ratio is too low for a real person with budgeted time to be able to filter the output.

          If a real person with budgeted time only finds a handful otherwise then that is still a vast improvement. Noisy signals can still point you in the right direction better than no signal at all.

        • Of course. Scientific Method accounts for 100% of valid concerns. I give 50:50 odds there is a crappy article about medicine in the next 20 years.
        • Assuming you're the manager, compare the time (or cost) it takes a highly skilled security specialist to identify 50 security issues with the time (or cost) it takes a bug wrangler to filter out 1450 invalid reports and identify the same 50 security issues. I assume a bug-filtering employee is probably less expensive than a security consultant.

          Also, 1500 candidates is a fixed (even if large) number that ensures you can manage the task with your resources; for example, you can cap the time they spend on each so t

        • by SeaFox ( 739806 )

          ...too low for a real person with budgeted time to be able to filter the output.

          I'm waiting with bated breath for someone to suggest an AI filter the reports so the human can work more efficiently.

          • I'm waiting with bated breath for someone to suggest an AI filter the reports so the human can work more efficiently.

            Well, obviously that's your mistake. You should have asked the AI whether it's a good idea to replace the humans with AIs.
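For concreteness, a minimal sketch of the triage arithmetic the sub-thread above is arguing over. The 1500-report figure is the commenter's hypothetical, not a number from the curl project, and the per-report minutes and hourly rates are assumptions for illustration only:

```python
# Hypothetical triage arithmetic for a pile of AI-generated bug reports.
# Every number here is an illustrative assumption, not a curl project figure.

total_reports = 1500        # hypothetical flood of scanner reports
valid_reports = 50          # reports that turn out to be real bugs
minutes_per_triage = 15     # assumed time to judge one report
wrangler_rate = 50.0        # assumed $/hour for a bug wrangler
specialist_rate = 150.0     # assumed $/hour for a security specialist
hours_per_manual_find = 40  # assumed expert hours to find one bug unaided

precision = valid_reports / total_reports
triage_hours = total_reports * minutes_per_triage / 60

wrangler_cost = triage_hours * wrangler_rate
specialist_cost = valid_reports * hours_per_manual_find * specialist_rate

print(f"precision:       {precision:.1%}")           # 3.3%
print(f"triage effort:   {triage_hours:.0f} hours")  # 375 hours
print(f"wrangler cost:   ${wrangler_cost:,.0f}")     # $18,750
print(f"specialist cost: ${specialist_cost:,.0f}")   # $300,000
```

Under these made-up numbers, mining the noisy pile is still far cheaper than unaided expert hunting; with different assumptions (say, an hour per triage, or far more than 1500 reports) the comparison flips, which is exactly what the two sides above are disputing.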

    • by Tailhook ( 98486 )

      If it is a larger number, then this is still a fail and unusable.

      Do you have any rational basis for this claim? If there were 101 reports and 51 were bogus, the discovery of 50 legitimate flaws in a widely used and mature code base is somehow an unworkable process?

      I believe we're witnessing the emergence of AIDS: AI derangement syndrome.

      • by gweihir ( 88907 )

        This is really well established. False positives make any detection system unusable. Sometimes not initially, but always in the longer run. This is _basics_.

        Also, what about vulnerabilities these tools do not find? If there are patterns for those, this helps the attackers.

        • by Tailhook ( 98486 )

          This is really well established.

          [citation needed]

          If there are patterns for those, this helps the attackers.

          Now you're engaging in a regression chain: If false positives are high it's useless. And if it's not useless, it helps attackers. And if it doesn't help attackers...

          • > [citation needed]

            If your test has a high false positive rate, you are spending extra time and resources investigating potential problems that are not actually problems. It also undermines trust in the system. The real-world consequences of this are not hard to spot: if the fire alarm in your office or apartment has a record of going off without there ever being an actual fire, how much more likely are you to delay acting every time it goes off, or to ignore it completely? People die because of this effect.

            • by gweihir ( 88907 )

              Thanks. Good to see that some people are aware of the basic facts of the matter.

            • If the fire alarm in your office or apartment has a record of going off without there ever being an actual fire, how much more likely are you to delay acting every time it goes off, or ignore it completely? People die because of this effect.

              Now apply the same logic to a colonoscopy.

              • > Now apply the same logic to a colonoscopy.

                Okay?

                A colonoscopy is diagnostic. Its analogue in fire-alarm terms is a fire drill, where you practice efficient evacuation and condition yourself to respond to the alarm.

                What's your point? Did you have one?
                =Smidge=
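The fire-alarm point is at bottom a base-rate argument. A minimal sketch, using Bayes' rule with made-up rates, of how a high false-alarm rate drives the probability that any given alarm is real toward zero:

```python
# Bayes' rule with illustrative, made-up rates: how often an alarm means fire.

p_fire = 0.001          # assumed base rate of an actual fire per alarm window
p_alarm_if_fire = 0.99  # assumed true-positive rate of the alarm
p_alarm_if_none = 0.05  # assumed false-alarm rate

p_alarm = p_alarm_if_fire * p_fire + p_alarm_if_none * (1 - p_fire)
p_fire_given_alarm = (p_alarm_if_fire * p_fire) / p_alarm

print(f"P(fire | alarm) = {p_fire_given_alarm:.1%}")  # about 1.9%
```

At those rates roughly 98% of alarms are false, which is the conditioning effect described above; the same arithmetic applies to a code scanner whose findings are rarely real bugs.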

        • Depends on the resources you have to clean up your false positives. I wonder how many false positives you can have per true positive in this case for it to be worth manually looking at all the reports. In other cases it can be automated, and then it is a question of the compute resources needed to verify the first detector. If it has a kind of graduated scoring system, you can put your effort into the most likely candidates first.
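A minimal sketch of the graduated triage that comment suggests; the findings, confidence scores, and time budget are all hypothetical:

```python
# Review AI scanner findings highest-confidence-first under a fixed time budget.
# The data model and all numbers are hypothetical illustrations.

findings = [
    {"id": "F-101", "confidence": 0.92, "summary": "possible use-after-free"},
    {"id": "F-102", "confidence": 0.15, "summary": "style nit"},
    {"id": "F-103", "confidence": 0.71, "summary": "unchecked return value"},
]

budget_minutes = 30       # assumed reviewer time available
minutes_per_review = 15   # assumed cost to manually verify one finding

for finding in sorted(findings, key=lambda f: f["confidence"], reverse=True):
    if budget_minutes >= minutes_per_review:
        budget_minutes -= minutes_per_review
        print(f"review:   {finding['id']} ({finding['summary']})")
    else:
        print(f"deferred: {finding['id']} ({finding['summary']})")
```

The point of the ordering is that when the budget runs out, what is left unreviewed is the low-confidence tail rather than a random sample.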
      • by MrLint ( 519792 )

        Is this what clanker apologetics looks like?

        • by gweihir ( 88907 )

          I think so. Or mindless cult-like fanbois that are mentally incapable of seeing or accepting any problems with their fetish.

        • by flux ( 5274 )

          Would you describe your rebuttal as being based on logic and facts, or rather on an appeal to emotion?

    • by Kisai ( 213879 )

      My guess is that there were probably thousands. It took someone who actually knows what the heck is being reported to weed those out.

      Like, one of the problems for just about everyone who operates a website is getting thousands of bug reports from "bug hunters" in China and India who ask for bug bounties while disclosing absolutely nothing. These all go straight to the trash. If you're not willing to give a two-sentence explanation of what "bug" you found, I'm assuming you just thought pressing F12 and seeing the s

      • by gweihir ( 88907 )

        Indeed. And then there is the problem of bugs it does not find. Creating a false sense of security is worse than knowing your code is insecure.

  • Need:
    - Number of possible bugs found by the AI
    - Cost or runtime of the AI to scan the source code
    - Total human hours needed to analyze them
    - Total human hours needed to set up and run the AI-based source code scans (assuming multiple AI tools were used)
    - Breakdown of the findings into:
      - not a bug
      - a style issue
      - a bug that is handled elsewhere (a null check on the parameter is already done in another function)
      - a minor bug
      - an invalid OS library call
      - a post-API or other call which does not check for the right return value or m

      • by gweihir ( 88907 )

        Indeed. Without the full picture, this is just mindless "It did find bugs!" hype that has no real-world meaning. This hype is really stupid, but it seems to have become the standard for AI "success" reporting.

        I can trivially find all bugs in cURL or any other software, after all: simply report all of the code. That is obviously a completely useless approach, even though it really does find all the bugs.

        • The person who made the report is a professional penetration tester. His usual method is to look for anything that could be wrong and then test whether it actually is. What he found is that the AI tools came up with potential issues he hadn't thought of, and they weren't all wrong, so it's a valuable tool to him because he normally runs out of ideas rather than running out of time to test them. He complained about the UI making it hard to go through large lists of reported issues exhaustively, and he only u

    • If a real expert did the work of validating the bug reports before submitting them, then I don't see the problem here. At least for the time being. The AI would have served as a useful tool, even though the bugs need to and should be validated a couple of times. Just part of the process of fixing the software without introducing new bugs or regression bugs. (But of course the worst bugs are going to involve interfaces with other software...)

      So from that relatively optimistic perspective, you could argue thi

      • by gweihir ( 88907 )

        You do not see the problem that filtering out the false positives and validating the rest may have taken more effort than can be sustained longer-term? Or more effort than a manual bug hunt would have taken? There are indications that using AI coding assistants reduces productivity by about 20%. I would not be surprised if we see something like that here as well.

        Sure, these bugs have to be fixed, because attackers will use AI as well to find them, but overall it is quite possible AI use makes the situation worse.

        T

        • by shanen ( 462549 )

          Currently reading Nexus and feeling increasingly bleak about the future...

          • by gweihir ( 88907 )

            Well, whenever things start to look up, some assholes with power and money start being destructive and turning things back to shit.

  • I'll quote TFS just in case:

    Rogers identified assorted bugs and vulnerabilities

    • by LindleyF ( 9395567 ) on Sunday October 12, 2025 @11:56AM (#65719880)
      Don't expect AI to be autonomous. It's not there yet. But that doesn't mean the human/AI pair programmer isn't better than the human alone.
      • Don't expect AI to be autonomous

        Why not? You must have missed all the hype that's telling me how humanity will be out of things to do because of the "AI".

        In fact, there is no "AI", and the human programmers have always "paired" with tools to do their jobs.

        • That's overstating things. There are absolutely new tools that do things tools couldn't do a few years ago. We used to need to review pages of documentation to find what we needed; now we can just ask, and it will point to the relevant section of the documentation. Usually. But that's just a search engine! Well, yes, but also no. It's qualitatively different.

          These tools are commonly called AI. Disagree with the name if you like, but not the reality.
          • It's qualitatively different.

            No, it isn't.

            Good luck using a complex piece of software for a non-trivial task just by "asking AI" instead of understanding the documentation.

            • That's not at all what I said. There is a wrong way to use AI. There is also a right way. We, collectively, have been figuring out the difference between the two. "Ignore it" isn't the right answer.
      • by JBMcB ( 73720 )
        Copilot doesn't produce good code all the time. But it does often enough that it's worth trying it out first.
  • Conclusion (Score:5, Insightful)

    by kmoser ( 1469707 ) on Sunday October 12, 2025 @11:46AM (#65719866)

    The Register's conclusion? AI tools "when applied with human intelligence by someone with meaningful domain experience, can be quite helpful."

    Experts using tools can actually get work done? That's some fine reporting there, Lou.

    • The point is that I can find a bunch of potential bugs and vulnerabilities and then somebody can check them.

      It's much harder to find a bug that you don't know is there than it is to validate whether something you think is a bug really is one.
    • by ebunga ( 95613 )

      This is like that person who self-posted their own blog to HN with their remarkable insight: things that fit in your CPU's cache are faster than things that sit way out in main memory. And yes, they did indeed ask ChatGPT to give them numbers; how did you guess?

  • There was a point in the development of chess engines where a human and chess engine combined would perform better than either alone. This is no longer true; the human does not add value. I assume that coding bots are going to go the same way.
    • by Kisai ( 213879 )

      No, the thing is, ever since the '90s it has been possible to search ahead through a huge number of possible moves. In a lot of early chess games on the 8088/8086/80286, the "computer" AI players had to be nerfed to think ahead only X turns, because otherwise they would be unbeatable.

      Like I remember playing Battlechess, and the computer would sometimes take a whole minute to "think" before making a move. Computers are on the whole 1000 times faster in clock speed alone since then, never mind anything else. I'd assume that anyone playing ch

  • by Gravis Zero ( 934156 ) on Sunday October 12, 2025 @12:39PM (#65719950)

    "In my view, this list of issues achieved with the help of AI tooling shows that AI can be used for good," he said in an email. "Powerful tools in the hand of a clever human is certainly a good combination. It always was!"

    * This is how AI should be used, not the view from corporate executives and MBAs that it should be used to create machine slaves to replace people.
    * Powerful tools in the hands of foolish humans results in bullshit bug reports and "vibe coded" applications with a total lack of critical analysis.

    AI has real uses but it is far from being the wish-granting genie that a significant portion of the population seems to believe it is.

  • by EmperorOfCanada ( 1332175 ) on Sunday October 12, 2025 @12:41PM (#65719952)
    I use various AI tools not only to identify bugs I am presently hunting, but also to give my code a general review for performance issues and bugs.

    The tools I use are fantastic at this. But there is a massive caveat. I can look at the bug identified, and I can then proceed to fix it. Great. But if I use the AI tool to provide me the "fixed" code, it is often very broken, to the point of not compiling or leaving out major functionality. Along the way it may very well introduce major bugs of its own.

    One of my favourite examples was where I was using threading very correctly. It then yanked out everything which was there to prevent obvious race conditions and other critical aspects of threading. It was hot garbage. But, the original bug I had been hunting was correctly identified.

    AI is a very useful tool, but it is not a programmer. I'm sick of seeing people claim it is a programmer and "prove" it with apps of about the complexity of a TODO app.
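As a hypothetical, minimal illustration of the failure mode described above (not the poster's actual code): the correct version guards a shared counter with a lock, and the kind of over-eager "fix" being complained about is one that deletes the guard, reintroducing a lost-update race.

```python
# Illustrative race condition: what a "fix" that strips locking breaks.
# Hypothetical minimal example, not the commenter's actual code.
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:       # the guard an over-eager "fix" might delete
            counter += 1

# Without the lock, two threads can read the same old value, both add 1,
# and write back, silently losing updates.
threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000 with the lock; may come up short without it
```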
  • "Joshua Rogers, a security researcher based in Poland."

    I know that "based" doesn't mean he was born there, but did anyone else do a double take at that name + country combination?

    That's right up there with "Dmitri Peskov, a cattle breeder working out of Texarkana, TX"...

  • "A fool with a tool is still a fool"

    You have to know how to use the tool...
