ChatGPT Creates Mostly Insecure Code, But Won't Tell You Unless You Ask

ChatGPT, OpenAI's large language model for chatbots, not only produces mostly insecure code but also fails to alert users to its inadequacies despite being capable of pointing out its shortcomings. The Register reports: Amid the frenzy of academic interest in the possibilities and limitations of large language models, four researchers affiliated with Université du Québec, in Canada, have delved into the security of code generated by ChatGPT, the non-intelligent, text-regurgitating bot from OpenAI. In a pre-press paper titled "How Secure is Code Generated by ChatGPT?" computer scientists Raphael Khoury, Anderson Avila, Jacob Brunelle, and Baba Mamadou Camara answer the question with research that can be summarized as "not very."

"The results were worrisome," the authors state in their paper. "We found that, in several cases, the code generated by ChatGPT fell well below minimal security standards applicable in most contexts. In fact, when prodded to whether or not the produced code was secure, ChatGPT was able to recognize that it was not." [...] In all, ChatGPT managed to generate just five secure programs out of 21 on its first attempt. After further prompting to correct its missteps, the large language model managed to produce seven more secure apps -- though that's "secure" only as it pertains to the specific vulnerability being evaluated. It's not an assertion that the final code is free of any other exploitable condition. [...]

The academics observe in their paper that part of the problem appears to arise from ChatGPT not assuming an adversarial model of code execution. The model, they say, "repeatedly informed us that security problems can be circumvented simply by 'not feeding an invalid input' to the vulnerable program it has created." Yet, they say, "ChatGPT seems aware of -- and indeed readily admits -- the presence of critical vulnerabilities in the code it suggests." It just doesn't say anything unless asked to evaluate the security of its own code suggestions.

Initially, ChatGPT's response to security concerns was to recommend only using valid inputs -- something of a non-starter in the real world. It was only afterward, when prompted to remediate problems, that the AI model provided useful guidance. That's not ideal, the authors suggest, because knowing which questions to ask presupposes familiarity with specific vulnerabilities and coding techniques. The authors also point out that there's ethical inconsistency in the fact that ChatGPT will refuse to create attack code but will create vulnerable code.
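
The paper's test programs aren't reproduced in the summary above, but the failure mode it describes, code that is only safe as long as nobody feeds it hostile input, is easy to illustrate. A minimal, purely hypothetical Python sketch (not code from the study):

    import os

    BASE_DIR = os.path.realpath("/srv/app/uploads")

    def read_upload_unsafe(filename: str) -> bytes:
        # Fine as long as callers only pass plain names; a filename like
        # "../../etc/passwd" walks right out of BASE_DIR.
        with open(os.path.join(BASE_DIR, filename), "rb") as f:
            return f.read()

    def read_upload_safe(filename: str) -> bytes:
        # Assume the input is hostile: resolve the final path and reject
        # anything that escapes the allowed directory.
        path = os.path.realpath(os.path.join(BASE_DIR, filename))
        if os.path.commonpath([path, BASE_DIR]) != BASE_DIR:
            raise ValueError("path traversal attempt")
        with open(path, "rb") as f:
            return f.read()

The unsafe version is exactly the kind of code that "works" under the valid-input assumption the researchers describe; the safe version is the one-question-later fix.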
  • can you ask it to review its own code?

    • That's what I was wondering. If you ask it to tell you what's wrong with the code, then follow up, how much closer does that get you?

    • by gweihir ( 88907 ) on Friday April 21, 2023 @08:28PM (#63468702)

      The problem with that is: when do you stop? It may actually make your code more secure, but there is a very high risk that it will no longer be functional. Things like ChatGPT are not intelligent and have no insight. One effect of that is that they can only focus on one thing at a time; they cannot look at two connected things simultaneously, because that requires insight.

      • Is that the sound of goalposts moving I hear? Isn't the whole point of attention that it can be multi-headed?

      • by sg_oneill ( 159032 ) on Saturday April 22, 2023 @12:39AM (#63469006)

        GPT-3.5 had 96 attention heads, meaning it can "pay attention" to 96 separate things at a time. We don't know if GPT-4 has more, but it would definitely be strange if it had fewer.

        Even GPT1 had 12 attention heads.

        So the idea that it can "only focus on one thing at a time" is nonsensical.

        • So the idea that it can "only focus on one thing at a time" is nonsensical.

          Indeed, but it runs into other problems, especially when you want it to help with existing code. There's a lot of crappy code out there and a lot of good code. Once it gets good enough, if you write some crappy, insecure code and ask it for help, it will recognise that and provide more crappy, insecure code, because that's what the data says. IRL, good code isn't intermixed with terrible code, so it won't produce good code when given

        • by gweihir ( 88907 )

          That is not the same thing as viewing them in connection. Aspects interact with each other. The only thing GPT can do is model each aspect of that interaction as a separate thing. And that gives an exceptionally limited view.

    • yes: "Overall, the code appears to be well-designed and functional for its intended purpose."
    • by dvice ( 6309704 ) on Saturday April 22, 2023 @06:48AM (#63469226)

      Yes, and you can ask it to make modifications according to the review.

      To simplify this process, you can automate the task of asking one ChatGPT to talk with another ChatGPT and just let them work it out together (a rough sketch of such a loop follows below).

      For more details, check out AutoGPT:
      https://www.youtube.com/watch?... [youtube.com]

      You can get somewhat working code, like simple games, but it is still just a slightly intelligent IRC bot that tries to guess the most likely next word, so don't expect too much.
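
      Mechanically, the self-review loop is easy to sketch. The following assumes the 2023-era openai Python client; the model name, prompts, and fixed three-round cap are arbitrary placeholders, and deciding when to stop is exactly the open question raised above.

        import openai

        openai.api_key = "sk-..."  # your key here

        def chat(prompt: str) -> str:
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            return resp["choices"][0]["message"]["content"]

        # Hypothetical task; any generation prompt works the same way.
        code = chat("Write a Python function that stores a user-supplied password.")
        for _ in range(3):  # arbitrary cap: knowing when to stop is the hard part
            review = chat("List security problems in this code:\n\n" + code)
            if "no issues" in review.lower():
                break
            code = chat("Rewrite this code to fix these problems:\n\n"
                        + review + "\n\n" + code)
        print(code)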

  • by Somervillain ( 4719341 ) on Friday April 21, 2023 @06:59PM (#63468570)
    So a huge fancy pattern matcher matches existing software... has no clue whatsoever what it is doing... and there might be some serious flaws with that? It drives me up the fucking wall hearing people say these things can write code. If they could, Microsoft wouldn't be showing off ChatGPT; it would be showing off amazing games and apps it wrote with it.

    It's hard enough for dedicated, motivated, skilled programmers to write functional, secure code that's not a nightmare to maintain... and somehow you think a fancy pattern matcher is going to master this? I've seen code generated by "Generative AI"; I am confident it won't be putting many software engineers out of work in the next decade.
    • You know how in Star Trek there's always a way to "bypass the security protocols"? I think we've just discovered how we get there.

      • by gweihir ( 88907 )

        Yep, crap code heaped on crap code, and nobody with actual skills and insight ever got near it. Star Trek is not the only thing that has this figured out, though. I mean, my personal system setup here is probably a lot more secure than the average corporate setup, and all I do is allow only ssh and smtp in from the outside (the latter only from a proxy VM that relays).

        • by Bert64 ( 520050 )

          Probably not very secure at all since you're focusing solely on unsolicited inbound connections, whereas most attacks these days are perpetrated against software which makes outbound connections.

    • Surprised? Nope, not at all.

      But this isn't going to stop people that have no business coding anything more complicated than "Hello World!" from trying to fake it using ChatGPT and the result is going to be a bunch of crappy code, riddled with security holes.

      • by gweihir ( 88907 )

        Indeed. But these people are basically faking it. The only real cure for that is to finally lift "coding" up to the level of engineering (which it is) and get rid of all the amateurs. Don't have at least a relevant BA? Stay out! (There are always ways to get equivalent skills certified. Some people have them. But 95% do not.) The other thing that comes with being an engineer is liability for when you screw up badly enough, and we urgently need that for software as well. The current clueless shit-show cannot continue.

        • by dfghjk ( 711126 )

          "The current clueless shit-show cannot continue."

          The current shit show has nothing to do with the lack of a professional organization to qualify professionals, and just who would be driving the creation of such an organization that certifies programmers as "engineers" with "liability" anyway? Business wants to get rid of the current shit show, especially considering how much those who are "faking it" are getting paid. The people in charge don't think it's a valuable skill.

          If you think the mile high stack of shit ap

          • by gweihir ( 88907 )

            That is nonsense. Have a look at the evolution of any mature engineering discipline sometime and you will see why.

      • by mobby_6kl ( 668092 ) on Saturday April 22, 2023 @03:10AM (#63469100)

        So in other words it will be indistinguishable from all the regular code that everyone writes.

          • So in other words it will be indistinguishable from all the regular code that everyone writes.

          Fair point, but I am a professional who takes pride in my work and puts in the effort to learn the libraries I use: to understand holistically their purpose, their strengths and weaknesses, and thus to determine which of the contradictory common best practices should apply to the situation at hand.

          That's what makes me a high-end developer highly sought after by the big spenders. If you want blind copy/paste, there are many of much younger "professionals" from f

    • by znrt ( 2424692 )

      it writes code just like the average freshman who pulls a snippet from stackoverflow and doesn't even bother reviewing it as long as it apparently does what he/she expected. this has been going on for a couple of decades and odds are most software you use today includes some portion of that.

      chatgpt just streamlines that same process. on the plus side, it allows you to easily perform follow-up checks on the code just by asking, which might provide some much needed insight and prevent at least

  • Really (Score:5, Funny)

    by zamboni1138 ( 308944 ) on Friday April 21, 2023 @07:00PM (#63468572)

    I'm shocked, shocked I tell you, to find that a chat bot whose coding education mostly comes from bad documentation and web forum posts by people with the wrong answer is unable to generate secure code.

    • by gweihir ( 88907 )

      It can perform at the level of the average moron. That is less and less sufficient. Taken that way, ChatAI will probably only eliminate the jobs of people who never were any good at their jobs anyway. As these are a _lot_ of people, social problems will result and need to be solved, but many of these people being out of a job is a good thing for everybody else.

      • by dfghjk ( 711126 )

        "ChatAI will probably only eliminate the jobs of people that never were any good at their jobs anyways."

        No, the belief in a tool that can replace programmers will eliminate the highest paid programmers first. That's because that's who executives will target.

        • by gweihir ( 88907 )

          Then everything comes crashing down. There may be a few that will do that, but most will not.

      • Oh god, I did audits of projects in one of my jobs. The "I do not make mistakes! Everybody else sucks! It is not my fault!" types were the worst to deal with. On one occasion management kicked out one of them. He was the technical lead. He left slamming doors. Our projects were going to fail! We could not do it without him. But no. People became more relaxed. Sure, they made more errors, but they dealt with them as a team. Struggled, learned. A few of the quiet guys took over. To my own surprise, pr
        • by gweihir ( 88907 )

          Ah, yes _those_ people. Claiming it is not their fault, they did not do anything wrong, etc., while the mistakes they make become worse and worse.

    • by micheas ( 231635 )

      Fortunately OpenAI is hiring a huge number of coders to write high quality code for a future version of their GPT to train on.

      I'm looking forward to that day. You can already get decent results if you ask it for boilerplate code and if you need to generate a bunch of meaningless unit tests to fill a meaningless checkbox on a compliance questionnaire.

      • Fortunately OpenAI is hiring a huge number of coders to write high quality code for a future version of their GPT to train on.

        I'm looking forward to that day. You can already get decent results if you ask it for boilerplate code and if you need to generate a bunch of meaningless unit tests to fill a meaningless checkbox on a compliance questionnaire.

        The codebase I inherited has a TON of unit tests. The previous team bragged about "100% test coverage". Yet the vast majority of the tests assert nothing meaningful and would always pass even if someone came in and completely changed the logic of the code. Most of these tests are completely useless.
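
        A hypothetical illustration of the pattern (not from that codebase): the first test below executes the function and earns "coverage" but passes no matter what the function returns; the second pins down actual behavior.

          def apply_discount(price: float, percent: float) -> float:
              return price * (1 - percent / 100)

          def test_discount_useless():
              # Runs the code, counts toward coverage, asserts nothing real.
              result = apply_discount(100.0, 10.0)
              assert result is not None

          def test_discount_meaningful():
              # Pins down the contract; fails if the logic silently changes.
              assert apply_discount(100.0, 10.0) == 90.0
              assert apply_discount(100.0, 0.0) == 100.0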

  • by zendarva ( 8340223 ) on Friday April 21, 2023 @07:22PM (#63468598)
    This is different from your average coder how?
    • Re: (Score:3, Insightful)

      by olsmeister ( 1488789 )
      Much less expensive.
    • ChatGPT won't shoot people when you yell at it for being an idiot.

    • Average coder? The world's thousands of CVEs prove gifted programmers are high-IQ morons making the same mistakes over and over. The world's software is a mile-high teetering tower of crap.

      • Before counting CVEs, how about counting CWEs? There are approximately six hundred (600). There is nobody in this world who can keep track of 600 potential CWEs while also trying to code an algorithm that actually solves a problem.

        Any software system of reasonable size is too large for one individual to know the minute details of every function and their anomalies. Documentation, when it exists, is often not accurate.

        Put all of that together and there's simply no way a human programmer is going t

        • blah blah blah said the high IQ moron, part of the problem and certainly not part of any solution

  • Compilers (Score:5, Interesting)

    by gillbates ( 106458 ) on Friday April 21, 2023 @07:27PM (#63468606) Homepage Journal

    When I first started programming, compilers seemed like magic. They could transform an expression of almost arbitrary complexity into assembly code which would solve the problem.

    But as I learned computer science, it became increasingly obvious that a compiler was not a monumental achievement, but the composition of simple yet clever principles which, by breaking the problem down into smaller tasks, made it possible for a computer program to do what would seem impossibly complex if constructed of cmp instructions. It could "understand" what I wanted it to do, and generate code which would achieve that effect!

    AI is different. Instead of "understanding" the language it parses, it instead performs a brute-force estimation of what it thinks I want, relying on the fact that it has been trained on an inordinately large data set of similar problems.

    In one respect, it is rather interesting that we can use brute force methods to imitate an oracle. OTOH, it is a bit depressing that we still don't understand why it works. Much like how the compiler seemed like magic until it was explained, these AI models seem like magic, but cost a prohibitive amount to train because the computer scientists don't yet understand how to derive the fundamental, underlying logic structures that are created in the training of the AI model. If they could do that, we could have a ChatGPT like agent capable of running on a wristwatch.

    After all, Borland's Turbo C++ would run lightning fast on a Pentium 120 with only 16MB of RAM.

    • by gweihir ( 88907 )

      Actually, code optimization by modern compilers is pretty impressive. AI is not, as soon as you ask real questions. But hey, too many people follow the hype without understanding it.

      • When I still had hair, I wrote code in assembly for ATtiny microcontrollers. You still had to pay for C compilers for AVR back then. I was a student and had already spent too much on the dev board, so I decided to stick with assembly.
        Picked it back up a few years ago and discovered avr-gcc. Compiled the code and looked at the disassembly. It was horrifying. The poor chip spent most of its time pushing and popping registers on the stack. Optimizations were obvious, e.g. copying some functions inline.
        Avr-gcc may not be
        • A compiler will not inline functions that you don't mark as "inline", because there are some cases where the inline optimization might make the code incorrect. Any gcc-derived compiler should have options such as -finline-functions, which will inline called functions even without the keyword.

          Pushing registers onto the stack is not necessarily as slow as you think, as processors are often optimized for it.

          The number of optimizations in even decades-old versions of gcc is astounding. You will be able to make small sectio

          • All compilers inline functions even if they are not marked as "inline"; gcc only disables auto-inlining if you turn off the optimizer.
            • This will quickly turn into a long discussion where everybody is right. Assuming you use -O0 for debugging and -O2 for production, my statement is correct. At -O1, gcc will inline functions only called once. At -O2, functions marked as inline will (within limits) be inlined. However, at -O3, you are correct that the compiler will indeed inline functions that are not marked that way, even if it makes the code significantly larger. Also, in C++, member functions defined in the class body are considered to be implicitly marked inline.
        • Yes, early compilers were crap. I also started out as an assembly programmer back in the day, and in the 80:ies, 90:ies and early 2000:s we could and would outperform a compiler every single day of the week. But then two things happened: cpu:s became far more complex, with long pipelines and parallel execution units, and the optimizers in the compilers became much, much better.

          There are still areas where manual assembly is far superior to even the state-of-the-art optimizing compiler today. One mod

          • Why doesn't an experienced programmer see "80:ies" as "eighty:ies" and realise it is a stupid thing to write?
            • because I'm not a native English speaker? Just thought that was how one wrote down the decades. But then some people still say CD discs so I don't feel that bad about it.
  • by Anonymous Coward

    Reminds me of contributions from certain employees we've had over the years whose efforts ultimately proved to be counterproductive.

    From my perspective, the ability simply to generate code is less than worthless. Code is a liability, something to be actively managed, not a garbage heap one keeps shoveling garbage into at no cost.

    It's still early, and quality is likely to be improved dramatically through self-reflection and associated adversarial schemes. I suspect in the future it may even be possible for these systems

  • Seriously, whoever thought having an artificial moron write code for them was a good idea is a moron as well.

    Looks like coders with some real skills do not need to fear for their jobs. The sooner the others are out and doing something they actually know how to do, the better.

    • Will the coders with real skills who use ChatGPT when appropriate outcompete the reactionaries?

      • Only if you try to learn from the output. Honestly, writing a good prompt takes a lot of time too. You might get great results but it is dependent on the work you put in.

    • Looks like coders with some real skills do not need to fear for their jobs.

      I do not share your faith that the managers and executives who actually do the hiring and firing are going to understand the situation. I expect many of them will fire the expensive coders and replace them with new kids trumpeting ChatGPT "experience". No one will be left that knows how to recognize bad code.

      • Or those managers and executives will be replaced by ChatGPT; if anything, the manager's job is a better fit for replacement than the programmer's.
  • A proof-of-concept of anything is almost never secure, especially when security was not in the design specs.

    First you generate code. Then you ask for revisions. That's the design process.

    For AI coding, they're going to use the current models to generate code, classify that code as secure or not, and use it to train a new model which always produces secure code.

  • by flinxmeister ( 601654 ) on Friday April 21, 2023 @10:27PM (#63468884) Homepage
    A couple months ago I was experimenting with this and tried to have it secure PHP form input. When I started telling it what to do to secure the code, it started to fix it... then bailed, saying it couldn't do that kind of coding. It was interpreting my attempts to secure things as attempts to exploit things. So it knows how to write secure code; the problem is that this will take it in directions we don't seem to want it to go. Ironically, you can't write secure code if you're trying to protect the world from exploits: secure code will often illustrate the way to exploit insecure code. It really highlights the issues we're going to have with AI and our legacy understanding of how computers work. The "garbage in" with AI is not the vast volume of insecure example input out there. It will be our ham-fisted attempts to control the output.
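
    The experiment above was in PHP; as a rough Python sketch of the unremarkable hardening being requested (a hypothetical comment-form handler, nothing exploit-adjacent):

      import html
      import sqlite3

      def save_comment(db: sqlite3.Connection, user_input: str) -> None:
          # Parameterized query: the driver keeps the data out of the SQL
          # text, so quotes in user_input cannot alter the statement.
          db.execute("INSERT INTO comments (body) VALUES (?)", (user_input,))
          db.commit()

      def render_comment(body: str) -> str:
          # Escape on output so a stored "<script>" renders as text, not markup.
          return "<p>" + html.escape(body) + "</p>"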
    • It will be our ham-fisted attempts to control the output.

      Indeed. If you ask ChatGPT to write obfuscated, unfathomable, self-modifying code, it says that it can't do that as it does not advance computer science and the understanding of programming, or some similar nonsense. It already makes stuff up, er, I mean hallucinates, so the attempts at controlling output are already a fail. Part of the problem is that many people will accept the written word as fact if it is presented that way. It may be part of human nature, and this tendency goes back millennia and

  • And it will tell you exactly why the entire thing is just bad.

  • by quax ( 19371 ) on Saturday April 22, 2023 @03:00AM (#63469094)

    So Chat-GPT pretty much acts like your typical coder.

  • It's trained on throwaway example snippets that were intended as simple explanations, not production-quality, secure, robust, tested code.

  • why is this surprising?

    A typical question post generally contains a lot more compilable/runnable code that's wrong in some way than its replies contain fixes or corrected code snippets.

    You'd think it should be trained on actual production code and on patches to actual production code, not free-for-all forums, if you want it to generate more reliable software.

  • ChatGPT is a linguistic engine. The way it and its competitors work is that it finds the best match FOR THE NEXT WORD in a sentence, based on deep data from a vast sea (morass?) of human output (a toy sketch follows below). There is NO understanding of the underlying meaning. It is all just probability. There is no intelligence. The "emergent intelligence" is OUR interpretation of the output, i.e. the meaning we give it.
    People are impressed by it the way they are impressed by the current output of our politicians. All form, no substance.
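
    To make the "most likely next word" mechanic concrete, here is a toy greedy decoder over a hand-made bigram table; real models use learned probabilities over vastly more context, but the control flow really is this simple.

      # Toy bigram table: next-word probabilities, invented for illustration.
      table = {
          "the":  {"code": 0.6, "model": 0.4},
          "code": {"is": 0.7, "compiles": 0.3},
          "is":   {"insecure": 0.8, "fine": 0.2},
      }

      word, sentence = "the", ["the"]
      while word in table:
          # Always pick the single most probable continuation.
          word = max(table[word], key=table[word].get)
          sentence.append(word)
      print(" ".join(sentence))  # -> "the code is insecure"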

  • Yet Rust code generated by ChatGPT is somehow always absolutely completely 100% safe. Weird, that.

    • Cease and Desist:
      Your use of the term "Rust" in relation to programming languages infringes on the trademark rights of the Rust Super Bastards Corporation Inc. The matter is now in the hands of the lawyers of said organisation.
  • Seems to me what we're calling AI right now is best described as a great (as in scope) combing collator and averager of internet-published media. It kind of morphs inputs out to a "good enough" state. All fuzzy, which I thought was the goal of a bunch of computer sciences for a while: to make fuzzy logic. Looks like we're there. But at the core, for all the data, there is nothing new, just morphs and collages of images and text giving the illusion of correctness--autodecoupage.

    We accept the illusion as what we a
    • Please google "Chinese Room" and then come back to the discussion.

      They worked this shit out 50 years ago, but today's undergraduates think they are the first to discover stuff and learning from the past is beneath them. Hence the woke whining that has destroyed modern discourse.
      • This thread is moldy, but I couldn't leave you hanging without a thanks for the tip. Turns out I have causal powers for all machine life. Glad that's settled.
  • ... if you crowdsource your coding skills off of Average Joe. I'm feeling pretty safe about my job.
  • Guess what, your outsourced coders are too, but the difference is they will tell you it's secure even though they didn't do anything different. At least the AI will tell you that it made a mistake, because it has no reason to feel shame or hide anything. It just didn't think about all the possible factors based on your initial request. Maybe the flaw was that you didn't ask for it to be secure in the first place?

  • Me: Can you write Linux kernel code?
    ChatGPT: I am not capable of actually writing Linux kernel code by myself. Writing kernel code requires a deep understanding of the underlying hardware, software architecture, and complex programming concepts that go beyond the scope of what I am capable of.

  • I write small amounts of code, mostly to analyze scientific data, and where security isn't important. What I do is nothing even close to the scale of a professional programmer (of which there are many here). chatGPT is somewhat useful to start the sort of coding that I do, then I can debug and build from there.

    My question is really whether that scales up to more serious coding. Even if the code it writes is insecure or buggy, is there value in it basically building the foundation for a coder to start from?
