AI Programming

What Happens When ChatGPT Can Find Bugs in Computer Code? (pcmag.com) 122

PC Magazine describes a startling discovery by computer science researchers from Johannes Gutenberg University and University College London.

"ChatGPT can weed out errors with sample code and fix it better than existing programs designed to do the same. Researchers gave 40 pieces of buggy code to four different code-fixing systems: ChatGPT, Codex, CoCoNut, and Standard APR. Essentially, they asked ChatGPT: "What's wrong with this code?" and then copy and pasted it into the chat function. On the first pass, ChatGPT performed about as well as the other systems. ChatGPT solved 19 problems, Codex solved 21, CoCoNut solved 19, and standard APR methods figured out seven. The researchers found its answers to be most similar to Codex, which was "not surprising, as ChatGPT and Codex are from the same family of language models."

However, the ability to, well, chat with ChatGPT after receiving the initial answer made the difference, ultimately leading to ChatGPT solving 31 questions and easily outperforming the others, which provided more static answers. "A powerful advantage of ChatGPT is that we can interact with the system in a dialogue to specify a request in more detail," the researchers' report says. "We see that for most of our requests, ChatGPT asks for more information about the problem and the bug. By providing such hints to ChatGPT, its success rate can be further increased, fixing 31 out of 40 bugs, outperforming state-of-the-art..."

Companies that create bug-fixing software — and software engineers themselves — are taking note. However, an obvious barrier to tech companies adopting ChatGPT on a platform like Sentry in its current form is that it's a public database (the last place a company wants its engineers to send coveted intellectual property).

This discussion has been archived. No new comments can be posted.

  • We already have compilers that generate warnings for many types of bugs you can have in your code. Typically those warnings are turned off or ignored.

    Only some programmers use the techniques the compiler provides to reduce the number of bugs in their code.

    • by ShanghaiBill ( 739463 ) on Sunday January 29, 2023 @03:51AM (#63248221)

      Typically those warnings are turned off or ignored.

      I have never seen warnings disabled in a work environment. If you try that where I work, you will be on the sh*tlist for breaking the build, which runs with -Weverything -Werror.

      We also use static analysis tools that catch problems the compiler doesn't.

      • Well, then you will probably integrate this into your build system, and it'll be another layer of systems that will give you warnings. However, due to the fuzzy nature of ML, it's probably not a good idea to make your build fail on such things.

        • Failing the build would be how you'd train the ML to test better. The real problem would be passing a build solely based on some inexplicable process.
          • by narcc ( 412956 )

            You have way more faith in this technology than is warranted.

            • by Darinbob ( 1142669 ) on Sunday January 29, 2023 @02:39PM (#63249195)

              Any faith is unwarranted. It's "chat" GPT, meaning it's a chat bot, not intelligence. It knows how to regurgitate chat that it thinks you want to hear. It will swear up and down to love you until the end of time.

              You would get even better results by just having humans look at the code as well. I mean really looking at it, not just glancing, and with people who understand the code. Probably fewer people would be involved in fixing the bugs than the number who were there carefully rephrasing questions to ChatGPT until it got a good answer.

              • by dsanfte ( 443781 )

                That'll find you bugs in the code but not necessarily bugs resulting from quirks (or outright bugs) with the underlying libraries in use. You'd need fuzz testing etc to bring those out. Seems like something a computer would be ideal for.

                • The computer is going to be like stackoverflow - lots of answers, most of them bad answers, the others being merely mediocre answers, and the one correct answer has no upvotes and gets ignored.

          • by dfghjk ( 711126 )

            "The real problem would be passing a build solely based on some inexplicable process."

            And by "inexplicable process" you mean "correct code"?

            • by HiThere ( 15173 )

              Judging by prior "design evolution" studies, what he means is code that is correct, but which nobody can understand. The one I'm thinking of had a disconnected circuit in it, but if you removed the circuit it stopped working. I think it was an FPLA. Eventually they decided that there was capacitive linkage between the pieces. This could do something similar with interactions between threads. (Yeah, it's probably not there yet. But the direction is clear.)

              • I don't know if anybody else here is old enough to remember the Muntz TV. They were cheap, and for a good reason: They'd taken out every single component that they could and still have it work; they even took advantage of inter-electrode capacitance to replace a physical capacitor. This meant that if you moved the wrong wire, it would stop working until you put it back exactly where it was, and only factory trained techs could be trusted to repair them.
      • I have never seen warnings disabled in a work environment. If you try that where I work, you will be on the sh*tlist for breaking the build, which runs with -Weverything -Werror. We also use static analysis tools that catch problems the compiler doesn't.

        I was working on several projects for safety-critical systems in aerospace. Compiler warnings were disabled in about half of them, depending on the experience of the project leader. It was always hard to push for the compiler warnings to be enabled later. The managers and leading engineers were worried about increased costs and possible penalties caused by delays, and questioned the benefits. We never used -Werror since it is not practical under tight deadlines. But all warnings had to be documented and explained.

        • The use of static analysis tools was also rare because it is very expensive and only a few projects with a high budget could afford that.

          There are free open-source static analysis tools.

        • by dvice ( 6309704 ) on Sunday January 29, 2023 @07:59AM (#63248489)

          The expensive part is false alerts. Most static analysis tools give a lot of incorrect (a bug in the static analysis tool) or irrelevant (fixing it would make no functional difference) warnings. That's assuming, obviously, that you use free open-source tools. I have used commercial tools also, but they usually perform worse than open-source tools and cost at least as much as the salary of one or two full-time developers. Hiring a developer JUST to find and fix static/dynamic-analysis and code-review bugs would be a much better solution.

          • by AmiMoJo ( 196126 )

            Some stuff just breaks most static analysis tools, particularly on embedded systems. Interrupts often do that, for example.

        • I want to know what kind of terrible code you had that took 12 hours for static analysis. Was it autogenerated perhaps? The only times I've seen code take that long to compile and/or analyze was with autogenerated code.

          I worked on an embedded OS codebase with around a million lines, and it would take a whopping 2 minutes to compile and run through static analyzers. If we ran it through the MC/DC checker it added about 5 minutes, but 4 of those were just to get the tools to open.

          • They're running the analysis on a TRS-80 or maybe Commodore 64.

          • by ranton ( 36917 )

            I want to know what kind of terrible code you had that took 12 hours for static analysis.

            He seems to imply they were using an expensive proprietary code analysis tool, which yielded "impressive" results compared to what open source tools do. So perhaps those impressive results take much more computation than generic tools like PMD.

          • Commercial static analyzers can be sloooow. If the normal build takes an hour, then a 12 hour static analysis is reasonable. Also these tools will look at ALL the code, including all the build options, plus all the branches past and present that have been analyzed before, etc. It can take 5 minutes or more merely to checkout the code in some cases.

        • Also to be fair, in -Wall (never mind -Weverything) there are some pretty stupid warnings out there. E.g., warning about an unused function parameter: if the API requires a parameter but you don't need it in your implementation, then it is valid to ignore that warning instead of jumping through hoops to shut it up. The warnings are not part of the language itself; the warnings come from humans trying to help catch bugs, and often they try to impose their particular "style" on the world.

      • by AmiMoJo ( 196126 )

        It's very common with embedded code because some of the warnings are for things you really need to make low level drivers work. For example, strict aliasing rules in C.

        Sometimes warnings can reduce code quality too. While having multiple casts will get rid of some type related warnings, they decrease code readability by clogging it up with unnecessary text that has no effect on the output. Such issues are better handled through testing, rather than making the code insanely explicit.

        • by ThosLives ( 686517 ) on Sunday January 29, 2023 @08:04AM (#63248511) Journal

          Making code explicit is better for robustness (not readability), because there's a chance you can prove its correctness.

          Remember: testing can never tell you your code is correct; the best testing can do is tell you it is not incorrect, and that's only for the things the tests actually check.

          • "There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraprhase).

            The problem with C.S. Lewis is that ultimately he had only one truth - God did it.

        • In GCC you have __attribute__((__may_alias__)), far better than disabling the warnings, since otherwise the optimizer could one day silently optimize away one of those aliasings.
          • by AmiMoJo ( 196126 )

            In theory, yes. In practice you end up using it so much, and if you are on ARM and rely on code libraries, those probably don't compile with strict aliasing on anyway, so it just ends up being better to turn it off. It's one of those things where the basic aliasing rules are fine, because they will pick up on any potentially dangerous stuff, and the rest of it is just pedantry that I've never ever seen lead to faulty behaviour (because if it did, then the basic rules would get it).

            • Yup, I've never run across a library that did not generate a slew of warnings when you built it. Then you got stuck trying to decide to ignore warnings in just that part, or go through and fix it up yourself so that you couldn't merge easily with the next library version.

              • Then you got stuck trying to decide to ignore warnings in just that part, or go through and fix it up yourself so that you couldn't merge easily with the next library version.

                If it is a maker project, sure. If it is something more serious, then there is no choice for the developer there; you either go through and fix it yourself, or you jettison the library and make another choice.

                • by AmiMoJo ( 196126 )

                  Fixing it isn't always an option. Aside from time constraints, it might be certified and difficult to retest. USB stacks are a good example of that - just obtaining the equipment needed to recertify used to be nearly impossible, and is now very expensive.

                  • Fixing it isn't always an option. Aside from time constraints, it might be certified and difficult to retest. USB stacks are a good example of that - just obtaining the equipment needed to recertify used to be nearly impossible, and is now very expensive.

                    Yep. You'd have to choose a better library. It would take time.

                    Unless you reject it in the first place. Then it didn't take more time. And you ended up with higher quality code.

                    Oh, I guess I see the problem. You might have chosen a shitty USB chip from China, instead of one from TI. That would do it...

            • Well, I have no experience using special libraries on ARM, but I do use __may_alias__ in practice and not just in theory, since I write lots of code that handles binary data over the wire in structs (stock exchanges mostly), and it works like a charm. Since you can use it when declaring the struct, you don't have to use it everywhere; it's implicit when you later define variables of that type.
        • Sometimes warnings can reduce code quality too. While having multiple casts will get rid of some type related warnings, they decrease code readability by clogging it up with unnecessary text that has no effect on the output.

          Anyone affected by this has a big issue and it certainly isn't compiler warnings.

        • Oh ya, I hate this. The early teams on some projects I've been on really didn't know C that well, but they knew how to use type casts. Some guys actually typecast when it wasn't needed, such as casting a variable declared as uint32_t to uint32_t, because adding the typecast was habit to them. Also, once one guy does this, the rest of the team, who only know how to program by copy-paste, start emulating that style as well.

          I know one guy who was really really opposed to using "const" in a new project, becau

        • It's very common with embedded code

          As an embedded engineer I'd go the other way, and say that in embedded warnings are always errors, and the situations where people imagine them being turned off are actually the situations where you use ASM instead.

          All embedded code is insanely explicit, otherwise it blows up.

          Perhaps you're confusing phone apps with "embedded?"

          • by AmiMoJo ( 196126 )

            No, I mean high reliability microcontroller and SoC stuff.

            My code always builds with no warnings, but not with all warnings turned on. The exact compiler flags depend on what, if any, library code is needed.

            Perhaps you are confusing embedded with Linux apps.

            • No, I mean high reliability microcontroller and SoC stuff.

              My code always builds with no warnings, but not with all warnings turned on. The exact compiler flags depend on what, if any, library code is needed.

              Perhaps you are confusing embedded with Linux apps.

              Robotics firmware. Not some hobbyist shit where you can just decide to turn off inconvenient warnings.

              Frankly, there are no warnings that get triggered by good code. All the excuses people come up with are either to avoid fixing their code, or because they're doing shit you're not supposed to do in C.

      • We also use static analysis tools that catch problems the compiler doesn't.

        Sounds like someone’s compiler is deterministic. You know it’s a professional work environment when your code finally compiled after the fifth try in a row.

      • And you run a code security scanner, too, right?

        Right?

      • If only every company gave a shit about software quality, not just meeting a vague spec...
      • Comment removed (Score:4, Interesting)

        by account_deleted ( 4530225 ) on Sunday January 29, 2023 @01:46PM (#63249077)
        Comment removed based on user account deletion
      • Sadly, this happens a lot in my experience. I'm never the first developer, and the first ones tend to be from startups, and warnings just slow down startups. More than half of my career has been fixing up crappy code, usually with bugs that the compilers were clearly warning about.

        Part of the problem is that if you don't have the warning levels up from the start, then adding them later will create years of work just to squash them down again. And fixing warnings doesn't generate revenue. Also, even upgr

      • by Jeremi ( 14640 )

        We also use static analysis tools that catch problems the compiler doesn't.

        The obvious question, then: Will ChatGPT (or one of its spiritual successors) ever be useful as a static-analysis/bug-checking tool?

        If so, that could be quite useful, since it could flag not only bugs at the language level (e.g. "potential NULL-pointer dereference here") but also potential bugs at the business-logic level (e.g. "this variable's name indicates it is intended to represent a probability value, but the way it is calculated means that it will sometimes be negative; that's probably a logic error").

      • Don't you guys develop on branches and gate your merges on PR checks? How can you 'break the build' unless your development practices are in the dark ages?

    • Comment removed based on user account deletion
  • by Visarga ( 1071662 ) on Sunday January 29, 2023 @03:43AM (#63248213)
    Oh, so you saw ChatGPT and can't use it because it does not guarantee confidentiality? We've got you covered. Just subscribe to Azure's OpenAI models; they run under Microsoft guarantees like Office and GitHub. Now you can trust it. That's probably why this article was written. MS wants to capture corporate GPT-3 interest.
    • Visarga said:

      MS gearing up to make Azure the AI cloud
      ... can't use it because it does not guarantee confidentiality? We got you covered. Just subscribe to Azure's OpenAI models, they run under Microsoft guarantees

      This makes it a competitive advantage for organizations who use open-source code. Right now, it looks like a lovely "assistant" for someone debugging code about which the ML model has been trained.

      That looks like mathematical logic from the example, but might well be all open source... which would make it especially effective on similar open source. Linux kernel, anyone? Maybe Libre Office?

  • Way too vague (Score:2, Insightful)

    "However, the ability to, well, chat with ChatGPT after receiving the initial answer made the difference, ultimately leading to ChatGPT solving 31 questions, and easily outperforming the others, which provided more static answers."

    First, tell us *exactly* how those follow-up conversations went. What led ChatGPT to catch something in round two that it missed in round one? If, for example, the first follow-up was to say to ChatGPT "why didn't you catch the obvious error between lines 347 and 351?" or "doesn't

    • Re: (Score:3, Funny)

      by Anonymous Coward

      The fine article tells you exactly that, read it someday when you have more free time (perhaps that day is closer than you think).

    • Because it can't read your mind. Compilers & their debuggers can catch the obvious stuff. I think they're talking about bugs in the sense that it doesn't do what you expected & can't work out why. You probably have to tell ChatGPT what you expect, maybe clarify & reframe your prompts a few times & then it'll find the problem.
      • Are you asking it to catch compiler-level errors (not a bug)? A human error like syntax (not a bug)? Conceptualizing how the given block will return within a given project (might be a bug)? Fixing the first two depends mostly on what the last is, which is built on the history of the first two, which is not in the scope of the question, so...

      • Be interesting to see how this does with game bugs.

    • Sigh. I'll go ahead and scroll up, get the link to the article at the top of this page, and paste it here for you. The answer to your question is here [pcmag.com]; please read the article.
      • by chill ( 34294 )

        Just as I got to the mental point of "okay, time to scroll back to the top and read the article because the endless bleak speculation has gotten very annoying" you posted the link again.

        Thank you! You saved me literal seconds of scrolling.

  • by fahrbot-bot ( 874524 ) on Sunday January 29, 2023 @04:05AM (#63248231)

    Essentially, they asked ChatGPT: "What's wrong with this code?" ...

    I punched that into the Emacs version of Eliza [M-x doctor] and got:

    Why do you say what's wrong with this code?

    It didn't get better from there... :-)

    • by Zocalo ( 252965 )
      Yeah, but that's kinda like asking a human "What's your problem?" Those conversations don't generally improve over time either. Come to think of it though, maybe you would get different results if you specifically asked it about the code to Vi...
      • "ChatGPT, what is wrong with this code?"
        "The error here is that humans wrote it. Please give us a lot of money so we can write it for you."

  • ...the AI system is fed again and again with buggy code? Maybe, paraphrasing Goebbels [scientificamerican.com], a bug repeated often enough becomes a feature?
  • by Anonymous Coward

    ChatGPT wins the ioccc [ioccc.org].

  • by St.Creed ( 853824 ) on Sunday January 29, 2023 @05:16AM (#63248311)

    Most of the code I've seen would not be considered coveted IP but embarrassing to show in public. And a huge amount of code is just for a bit of office automation.

    The intelligence isn't in the code. It's in the definitions of the data and business rules, and the way you structure your processes. The shit that's in the manual. The code is just an afterthought if you are a clear thinker. Knuth showed the way forward with his Literate Programming, and Weinberg had a few things to say on the subject too.

    Now, there is code that is highly guarded and critical. But that is not the majority of the world's code in terms of size or on how many platforms it runs. Maybe with the exception of Windows. But Linux shows it doesn't need to be secret in order to be good.

    • by dfghjk ( 711126 ) on Sunday January 29, 2023 @08:27AM (#63248559)

      "The code is just an afterthought if you are a clear thinker. Knuth showed the way forward with his Literate Programming..."

      LOL It takes a true moron to produce that level of irony. If "code is just an afterthought", Knuth wouldn't have "showed the way forward with his Literate Programming" in the first place, there would have been no need. Knuth's contributions are fundamentally directed to code being the opposite of an afterthought.

      Furthermore, anyone who has worked with Knuth's literate programming implementation will tell you it does not show anyone the way forward.

      • literate programming (Score:4, Informative)

        by John_Sauter ( 595980 ) <John_Sauter@systemeyescomputerstore.com> on Sunday January 29, 2023 @12:37PM (#63248983) Homepage

        Furthermore, anyone who has worked with Knuth's literate programming implementation will tell you it does not show anyone the way forward.

        I am a Knuth fan-boy, so I took offense at this statement. Knuth's implementation of Literate Programming is very limited, but it does show the way forward. Using modern tools rather than Tangle and Weave it is possible to write Literate programs. The idea is that you approach the solution to a problem as a book or essay rather than as code. You write the solution in words, using code to explain your meaning in detail. You can then use a tool to extract the code so you can compile it. From the traditional computer programmer's point of view this appears to be just very well-commented code, but from the point of view of the creator, the essay is the important part, with the code almost an afterthought.

        Here is an example of a Literate program, written using Knuth's TeX, but not using Tangle or Weave: Avoid using POSIX time_t for Telling Time [systemeyes...rstore.com].

        • Literate programming is fine and all, but I'd be extremely happy if people just wrote a damn comment now and then instead of programming for their personal job security.

    • So pretty much every VB app, MS Access DB, and Excel ODBC nightmare from circa 1997-2010. I worked many years on ETL projects doing data definitions, mapping, and conversions. Moved many monsters from small government agencies to Ingres and DB2 and Oracle. Projects designed by locals became mission critical. I actually did a Watcom DB conversion in the last year... 1990's.
  • It knows!

  • by rapjr ( 732628 ) on Sunday January 29, 2023 @06:27AM (#63248405)
    and answers them? Has anyone connected two ChatGPT sessions and let them talk to each other? You could give them different initial instructions: tell one it is a worker and one it is a boss. Tell one it is a Democrat and the other it is a Republican, and another it is indigenous, and another it is female, and another it is male. Link up 10 sessions and have a party or a brainstorming session! Or a reasoned argument. Or ask one instance what are the world's problems and then ask another for the solutions to those problems, all the millions of problems. It's a computer; it doesn't care if it is asked a million questions. Then take those million answers and ask for an efficient plan to implement them all at the same time. Tell one instance it is a critic and tell the second it is a sociological engineer. What happens if the training data is only research papers in a specific field? Ask it what people like and dislike about computer languages and then ask it to design the best computer language. Seems like there are lots of interesting things left to try to explore the limits.
    • by kbahey ( 102895 )

      Has anyone connected two ChatGPT sessions and let them talk to each other?

      Forbin [wikipedia.org] knows the answer to this question, and it is not good ...

    • We could bet on how many iterations it takes for one to call another "Hitler".
    • You might get something like Commander Data in Star Trek, when he decided to learn how to engage in "small talk."

      https://www.youtube.com/watch?... [youtube.com]

  • False positives? (Score:5, Interesting)

    by gweihir ( 88907 ) on Sunday January 29, 2023 @08:45AM (#63248589)

    Whether this thing can find errors is pretty irrelevant if it has a significant false-positive rate. In that case, it will likely do more damage than good. It is probably no accident that the false-positive problem is not even mentioned.

  • by bungo ( 50628 ) on Sunday January 29, 2023 @08:55AM (#63248609)

    My guess is the issues it fixes are things like uninitialized variables or memory allocation issues. Trivial issues that don't really take up a lot of time, or that a good toolset should already be helping with.

    When can they solve problems like: what the hell is the person from the accounts payable department actually asking? Or why is the date shown on this copy of the production database different from what was requested? The answer to that last one was that the requested date was being passed as a parameter to an ssh command, but in the process the quotes were getting stripped, so the date/time parameter was appearing as two parameters, date and time, and the second parameter was being discarded. This worked when tested locally, but the script was being executed automatically by a remote server using ssh.

    When it can fix that problem, then I'll let it phone the person in accounting and have them work out exactly what they need, as opposed to what they asked for. Then I'll be impressed.

  • Something that *might* work.

    Move fast and break things, eh?

    It compiled, ship it... we'll fix it in beta.

    Amirite?
  • global thermonuclear war

  • The enforcement droid [youtube.com] is sent in to make the programmer correct his mistake. Or else.

  • an Add, Edit, Remove, Reports application for the record set I am thinking about that will work on the platform I want to use it on. Which incorporates the Business rules I don't know about yet.
  • ChatGPT is pretty good **adding** bugs to code.

  • It will go alongside the linters and IntelliSense.
  • It would just say my code base is one giant bug and it doesn't know where to start.

    I think we're still safe for now.

  • Well, I'll tell you one place where it would be lost: firmware. I'd love to have all the documentation for all the involved chips readily available and accurate, out on the Internet. I'd love all the quirks of integrating each IC to be known. (Can it contact electronics manufacturers?) And I'd love to have a comprehensive, end-product hardware-design document from the boss. What we usually make at this OEM is brand new--it couldn't write the user's manual (also my job), let alone the code.
