Programming

Stack Overflow Data Reveals the Hidden Productivity Tax of 'Almost Right' AI Code (venturebeat.com) 55

Developers are growing increasingly frustrated with AI coding tools that produce deceptively flawed solutions, according to Stack Overflow's latest survey of over 49,000 programmers worldwide. The 2025 survey exposes a widening gap between AI adoption and satisfaction: while 84% of developers now use or plan to use AI tools, their trust has cratered.

Only 33% trust AI accuracy today, down from 43% last year. The core problem isn't broken code that developers can easily spot and discard. Instead, two-thirds report wrestling with AI solutions that appear correct but contain subtle errors requiring significant debugging time. Nearly half say fixing AI-generated code takes longer than expected, undermining the productivity gains these tools promise to deliver.


  • by gurps_npc ( 621217 ) on Thursday July 31, 2025 @09:21AM (#65557556) Homepage

    But it does everything else horribly.

    Think of it as a partly trained intern. Tell it to do something it has done before or something really simple and it does a good job. Then you start to trust it and think it is smart, so you give it more and more.

    When it fails, it does not come forward and ask for help. Instead it panics, lies, makes up crap, and covers up its failures.

    • I would call AI a knowledgeable but clumsy assistant that should check its code by running it.

      I literally today got code for a parser from Copilot that did "if lines begins with '%' ... else if line begins with '%%' ...".
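      A minimal sketch of why that ordering never reaches the second branch (hypothetical parser fragment, not the actual Copilot output; any line starting with "%%" also starts with "%"):

```python
# Hypothetical parser fragment illustrating the branch-shadowing bug:
# the '%' test matches first, so the '%%' branch is dead code.
def classify(line: str) -> str:
    if line.startswith("%"):       # also true for lines starting with "%%"
        return "directive"
    elif line.startswith("%%"):    # unreachable: shadowed by the test above
        return "section"
    return "text"
```

      Swapping the two tests so "%%" is checked first is the obvious fix.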
      • Ship It! (Score:4, Funny)

        by SlashbotAgent ( 6477336 ) on Thursday July 31, 2025 @09:58AM (#65557666)

        I literally today got code for a parser from Copilot that did "if lines begins with '%' ... else if line begins with '%%' ...".

        It compiles. Ship it!

      • Well, if you use something like Cursor or its competitors, you can perfectly well include comprehensive test writing, compilation, linting and loops until the code passes your every requirement.

        It can still get stuck in some deeply worrying loops, go tumbling down a rabbit hole, or make a hundred additional classes and functions to solve a simple problem, but if your prompts and rules are good enough, you'll get your grunt work done exceedingly quickly.

        TDD is king with AI tooling. Ensure you read and unders

        • Also: it is very important to leverage instructions. Ensure that every chat/agent session starts with the AI reading a file containing all your instructions on how it should behave. It makes a world of difference. Write Once, Read Many.

      • I think people just expect too much from AI. Ask it to make a full UI email client and yes it may fail. But I have asked copilot things like "I need a python script that uses selenium to scrape all the images from this page" then "only take the images with the 'cats' directory in the link", then "click on the next page link and download these images until there are no more pages".

        I ask it to show all the code after each query and paste the code and try to run it between of course. It will give many hel
        • It entirely depends on what language you are working in, which is expected, since its "intelligence" comes from scraping the internet. It seems to be very good at Python.

      • by leptons ( 891340 ) on Thursday July 31, 2025 @12:35PM (#65558048)
        I asked the "AI" to write a function to get all objects in an S3 bucket.

        It delivered a working function. It gets all files in an S3 bucket, but only if the bucket contains fewer than 1000 objects. Fortunately, I already knew how this API works, and this was my first test of the AI to see what it would deliver. It failed.

        The S3 object-listing API is paged. The function the "AI" wrote only delivers the first 1000 objects in the bucket, so if the bucket has more than 1000 objects, this is a pretty bad bug. And it would totally ship unless the human in charge knows the specifics of the API. A junior using the "AI" isn't going to realize this. They will test it with a bucket that has a few objects, and it will appear to work. They'll submit the PR, and someone may approve it, because the reviewer probably isn't going to read the API docs and learn that the API is paged. Then, when there finally are more than 1000 items in the bucket in a production environment, a customer will probably find the bug.

        Sure, if I ask the "AI" to write a function that uses paging and is recursive, the "AI" will deliver that result, because someone has already written it and the "AI" has vacuumed up that code to spit it out again. But you have to be so specific about what you want that you could have just written it yourself, with certainty that you are getting the result you need without creating a subtle time bomb of a bug.
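        A hedged sketch of that failure mode, using a stand-in for the paged listing call (the 1000-per-page cap mirrors S3's ListObjectsV2, but the fake API below is illustrative, not boto3):

```python
# Fake paged API standing in for S3's list-objects call, which caps
# each response at 1000 keys and hands back a continuation token.
PAGE_SIZE = 1000

def fake_list_objects(keys, continuation=0):
    """One page per call, like the real S3 listing API."""
    page = keys[continuation:continuation + PAGE_SIZE]
    more = continuation + PAGE_SIZE
    nxt = more if more < len(keys) else None
    return {"Contents": page, "NextContinuationToken": nxt}

def list_naive(keys):
    # What the AI wrote: first page only, silently truncated.
    return fake_list_objects(keys)["Contents"]

def list_all(keys):
    # The fix: follow the continuation token until exhausted.
    out, token = [], 0
    while token is not None:
        resp = fake_list_objects(keys, token)
        out.extend(resp["Contents"])
        token = resp["NextContinuationToken"]
    return out
```

        With a 2500-object bucket, the naive version returns 1000 keys and the loop returns all 2500, which is exactly the bug that a small test bucket would never surface.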

        How is "AI" really helping anyone with outputs like this? It's hurting more than helping in many situations.

        Just like everything else in the world, now programming is in a race to the bottom.
        • by Guignol ( 159087 )
          If you are an expert in the field of whatever it is you want to make happen with AI, then AI is fantastic: you can ask it to do something you would have done faster (I mean, counting the debugging retry sessions) and better yourself, and sometimes you might even get what you asked for.
          Compare that with having to ask some young human trying to learn and grow; who wants to deal with those creatures? It's gross, I suppose.
          So anyway, as an expert, totally capable of writing the code yourself, you can be
          • by leptons ( 891340 )
            I write detailed Jira tickets for my team to work on. I am very specific about everything, providing the APIs to use and everything else. While writing them, I think to myself, "What's the chance that 'AI' would give a working result, vs. a junior?" Every time, I am glad I have actual humans reading the instructions and producing results. AI is just way too frustrating to deal with, constantly giving wrong answers and not what I wanted, and it's not because the instructions are lacking. With "AI" it's always
    • by Nuitari The Wiz ( 1123889 ) on Thursday July 31, 2025 @09:53AM (#65557654)

      A partly trained intern would learn and get better. An AI model keeps making the same mistakes, again and again and again.

      • by sjames ( 1099 )

        The intern also likely burns a lot less resources.

      • It won't if you add your specification to the .md file.
      • by Alinabi ( 464689 ) on Thursday July 31, 2025 @01:47PM (#65558220)
        As America's favorite Nazi (Wernher von Braun) once said, "the human brain is still the best supercomputer that can be mass produced using only unskilled labor."
    • Yup, a first-semester intern that needs absolutely explicit instructions. The only immediate benefit is that they type really, really fast.

    • by dvice ( 6309704 )

      I would not say that AI does scut work well. I have a case that is trivial; you could even hire a first grader to do it. But AI does it with 90% accuracy, when 100% accuracy would be needed.

      Instead, what AI does really well is work where accuracy does not matter. AI is a good solution when 90% accuracy is good enough for you, but if you don't want any mistakes in your data, you should not use AI to make it. A good example of such work is writing proof-of-concept code. Something that you use to test your i

    • "Think of it as a partly trained intern." Think of it as a partly trained hamster. Fixed it for you
  • Such a surprise (Score:5, Insightful)

    by gweihir ( 88907 ) on Thursday July 31, 2025 @09:27AM (#65557572)

    Nobody saw that one coming...

    • Re:Such a surprise (Score:4, Informative)

      by Anonymous Coward on Thursday July 31, 2025 @09:29AM (#65557574)

      Programmers have been saying it for years: it takes far more time to review and debug code than it does to write it in the first place.

      Why this would be a surprise to anyone, I can't even imagine.

      • by BeerCat ( 685972 )

        It's a variation on the old truism about project management:

        80% of the project takes 80% of the time. The remaining 20% also takes 80% of the time.

      • by Creepy ( 93888 )

        Well, yeah - I've seen AI actually write decent code... without error conditions, or bounds checking, but following the happy path, it really does work. Still needs to learn how people with bad intentions get in, but, you know, the happy path code is serviceable. Can save me a few hours, but I still need to program in sad paths. Don't know if that is more time or less time, really, but it kind of negates itself, I still need to code review and fix shit.
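        A tiny illustration of that happy-path gap (the function and its sad paths are made up for the example, not taken from any AI output):

```python
# Happy-path-only version vs. one with the "sad paths" hand-added.
def parse_port_happy(s):
    return int(s)                  # fine for "8080"; blows up on "abc"

def parse_port(s):
    # sad paths: non-string input, non-numeric text, out-of-range values
    try:
        port = int(s)
    except (TypeError, ValueError):
        return None
    return port if 0 < port < 65536 else None
```

        The happy-path version genuinely works right up until the first malformed input arrives.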

        • Try telling it to add tests for specific conditions you want to check. The good ones will fix the code as necessary.
        • by gweihir ( 88907 )

          Look at the CrowdStrike disaster for what "following the happy path" does. Incidentally, the happy path is 10% of the work, and it is the easiest part.

      • Code review is slow UNLESS you are pair programming, e.g. the reviewer knows exactly what you are trying to do and how you're trying to do it. This explains the intuition that enhanced autocomplete is a good thing, but we get nervous using AI for more.
      • Re:Such a surprise (Score:4, Insightful)

        by sjames ( 1099 ) on Thursday July 31, 2025 @12:08PM (#65557982) Homepage Journal

        It plays into the leisure class's lifelong dream of being able to jettison the unwashed masses and keep the money for themselves without having to resort to learning how to do things or (God forbid) doing things themselves.

        Most of us "unwashed masses" understand that things that sound too good to be true probably are, but that's because we haven't grown up in a world where we give orders, shuffle a couple of decimal points, and then sign our names to take credit for the hard work of thousands of people.

    • The real question is what happened to those other 33%.
      • by sjames ( 1099 )

        We'll never find out. Tech magazines don't interview people living under a bridge fighting raccoons for food scraps behind the McDonalds...

    • I guess we can put off those unemployment applications a little while longer.

  • I told ChatGPT I wanted it to implement a particular open source Java interface using a specific major version of a dependency. It mixed imports from the previous and the current major versions. Obviously... that's a problem when the major release is a major rewrite of the public API.

    I asked it specifically "restrict to version X.Y.Z," it confirmed it was going to do that, then went right back to generating mixed major release code.

    Wasn't a problem for me. Took 5 minutes to debug with IntelliJ's decompiler

  • You mean I can't get something for nothing? Gee, what good is it then?

  • by theodp ( 442580 ) on Thursday July 31, 2025 @09:58AM (#65557664)

    "Claude Code is a Slot Machine" [rgoldfinger.com]: "I'm guessing that part of why AI coding tools are so popular is the slot machine effect. Intermittent rewards, lots of waiting that fractures your attention, and inherent laziness keeping you trying with yet another prompt in hopes that you don't have to actually turn on your brain after so many hours of being told not to. The exhilarating power of creation. Just insert a few more cents, and you'll get another shot at making your dreams a reality."

  • by The-Ixian ( 168184 ) on Thursday July 31, 2025 @10:03AM (#65557676)

    I am not a developer but I do write Perl and PowerShell scripts when the need arises.

    I usually learn as I go and enjoy the process of figuring out how best to turn my problem into a well-functioning automation.

    Sometimes, though, I would just like to get the LLM to output something really easy, like getting a list of all users in my company and running a simple process on them. It's something I could figure out and write in maybe an hour of web searching and document reading.

    What I get, though, is broken with invalid cmdlet parameters, or not optimized, such as relying on client-side filtering instead of built-in filtering.

    I am spending more time figuring out what is broken and doing the research to do things properly anyway. So, in the end, I haven't saved any time.

    Maybe I am just not good at writing the prompts in the first place.

    • by allo ( 1728082 )

      For such scripts it is less about prompt skill (heck, that whole "prompting skills" thing is overrated) and more about giving the spec correctly and completely, and about using a model that knows well the programming language you want to use. Your script sounds like something ChatGPT could do a year ago without too much work on the prompts, at least in Python and Bash; no idea about PowerShell.

  • What's the issue? (Score:5, Insightful)

    by smooth wombat ( 796938 ) on Thursday July 31, 2025 @10:23AM (#65557720) Journal

    This is how humans operate. They can produce code, but there are subtle flaws which are revealed only through debugging.

    AI is being trained on stuff produced by humans. Why would you expect it to be any different?

    • I have wondered if we should be building a "thoroughly debugged code" repository, where lots of programmers vet the code, and then see what happens with an LLM trained only on that.

    • by Anonymous Coward

      Because billion/trillion-dollar companies are telling us it is different. The news cycles are constantly banging on about how AI is going to revolutionize everything, and put half of society, or more, out of a job. We've been hearing it constantly for over a year now. You're telling me it's my fault for believing any of it?

    • AI code is worse than human dumbness. I'll tell Copilot to refactor a function in a specific way, and it does it, but then embeds the new function in the old one, rather than replacing it. That's not stupidity that a human would produce.

      But "the issue" is that a lot of people bought the hype that AI was the new master of the universe, and would take over all our jobs in short order. I think we still have some breathing room.

  • As expected (Score:4, Interesting)

    by MpVpRb ( 1423381 ) on Thursday July 31, 2025 @11:13AM (#65557812)

    There is a big difference between creating a cool, simple demo for YouTube and writing solid, bulletproof code.
    I use AI tools to guide me through complex and confusing documentation, but I always check the docs to make sure.
    I use AI tools to create sample code that I study, and then I write my own version once I understand the sample.
    The fiction that non-programmers can effortlessly "vibe code" complex systems is dangerous.

  • Vetting and maintenance have always been the bottlenecks of software development, not code creation. RAD pushers keep selling clueless bosses on the creation part. RAD pushers have been around for more than 5 decades.

    (RAD can be done right, and reasonably flexible, but one has to accept certain conventions. They may be good conventions, but people are spoiled and want it their way.)

  • Stack Overflow doesn't want you using LLMs, they want you to use their service ... which involves sifting through endless inapplicable/ill-informed/obsolete responses that may or may not work--while viewing ads the whole time.

    My experience with LLMs is spectacular, but I work within their capabilities and don't expect them to do my job for me.
  • Ya, but I'm guessing it's better than Really Right AI code from MechaHitler [rollingstone.com]. :-)

    Progress... Two steps forward, one goosestep back. :-)

  • So far it seems that for every computer needed to make one person's job a bit easier, two more people are needed to maintain those computers, and four more are needed to recover from the inevitable problems those computers quietly create.

    But sure, AI will save the world - just like computers turn a 100 person company into a 10000 person company.... to do the same tasks, but those initial 100 people now have it easy.
  • by Anonymous Coward

    Developers WHO USE STACKOVERFLOW are growing increasingly frustrated with AI coding tools

    Maybe the ones who are proficient using AI tools don't visit StackOverflow anymore.

  • If you don't give a developer sufficient context, you will get bad code; that is just as true for a human as it is for an LLM. AI spec files and automated testing are mandatory if you want to be successful at this; otherwise you're just as bad as the product manager who gives you a half-baked request for a feature and doesn't like what you delivered. If the code has bugs, well, then you need to improve your spec file and add more tests. You have to remember that AI isn't deterministic, it is probabilistic.
  • by Gilmoure ( 18428 ) on Thursday July 31, 2025 @04:12PM (#65558566) Journal

    DOGE To Rewrite SSA Codebase In 'Months' (wired.com)

    https://developers.slashdot.or... [slashdot.org]
