AI Programming

'How Good Is ChatGPT at Coding, Really?' (ieee.org)

IEEE Spectrum (the IEEE's official publication) asks the question: "How does an AI code generator compare to a human programmer?" A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI's ChatGPT in terms of functionality, complexity and security. The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code, with success rates ranging from as poor as 0.66 percent to as good as 89 percent, depending on the difficulty of the task, the programming language, and a number of other factors. While in some cases the AI generator could produce better code than humans, the analysis also reveals some security concerns with AI-generated code.
The study tested GPT-3.5 on 728 coding problems from the LeetCode testing platform, in five programming languages: C, C++, Java, JavaScript, and Python. The results? Overall, ChatGPT was fairly good at solving problems in the different coding languages, especially when attempting to solve coding problems that existed on LeetCode before 2021. For instance, it was able to produce functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively. "However, when it comes to the algorithm problems after 2021, ChatGPT's ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems," said Yutian Tang, a lecturer at the University of Glasgow. For example, ChatGPT's ability to produce functional code for "easy" coding problems dropped from 89 percent to 52 percent after 2021. And its ability to generate functional code for "hard" problems dropped from 40 percent to 0.66 percent after this time as well...

The researchers also explored the ability of ChatGPT to fix its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios where ChatGPT initially generated incorrect code because it didn't understand either the content or the problem at hand. While ChatGPT was good at fixing compilation errors, it generally was not good at correcting its own mistakes... The researchers also found that ChatGPT-generated code did have a fair amount of vulnerabilities, such as a missing null test, but many of these were easily fixable.
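To make "a missing null test" concrete: the paper's actual examples aren't quoted here, so this is only a hypothetical Python illustration in the style of LeetCode's linked-list problems. The `ListNode` class and both function names are invented for this sketch. The unsafe version dereferences `head.next` without checking for `None`; the fix is a short guard, which is the kind of "easily fixable" issue the summary describes.

```python
from typing import Optional


class ListNode:
    """Minimal singly linked list node (hypothetical, LeetCode-style)."""

    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def second_value_unsafe(head: Optional[ListNode]) -> int:
    # Missing null test: raises AttributeError if head is None
    # or if the list has only one node.
    return head.next.val


def second_value_safe(head: Optional[ListNode]) -> Optional[int]:
    # The fix: check for None before dereferencing.
    if head is None or head.next is None:
        return None
    return head.next.val
```

In a language like C the same omission would be a null-pointer dereference rather than a Python exception, which is why it counts as a vulnerability rather than just a bug.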

"Interestingly, ChatGPT is able to generate code with smaller runtime and memory overheads than at least 50 percent of human solutions to the same LeetCode problems..."

  • by Tony Isaac ( 1301187 ) on Saturday July 06, 2024 @05:53PM (#64605833)

    The quality of ChatGPT coding suggestions is about the same as Stack Overflow (which it uses for a lot of its source material): spotty. You can often find good solutions, but you can also find a lot of crappy ones. The difference is, ChatGPT (usually) saves time.

    Time saver #1: It will scan through a bunch of Stack Overflow (and other coding site) suggestions, picking ones that seem relevant.
    Time saver #2: It will take the suggestions it finds, and customize them to your liking, such as using names you want to use, instead of the names in the SO code.

    But it also can waste time. It sometimes picks an answer that is inefficient, or poorly written, or based on obsolete APIs, or just plain doesn't work.

    All in all, still an improvement over click, read, click, read, click, read, ad nauseam.

    • Stack Overflow is meant to serve as a reference: basically to explain the mechanics of the language, to point to the right library, or just to provide a code snippet for a common thing that most people can't recall off-hand when they need it. A lot of times I even use it to look up answers I've posted myself.

      If you lean on it too heavily to solve bigger problems, you're gonna have a bad time. I think the same can be said for ChatGPT. That ChatGPT fails on newer but still simple problems

      • by Tony Isaac ( 1301187 ) on Saturday July 06, 2024 @10:35PM (#64606171)

        No, Stack Overflow isn't a reference. It's a site where people help each other solve technical problems, often by supplying code snippets.

        https://stackoverflow.co/#:~:t... [stackoverflow.co].

        Programming language vendors provide *references*. The purpose of a reference is to document. SO doesn't do that. It's a forum for discussion.

        Yes, ChatGPT is essentially a fancy search engine. When it provides programming answers, it often searches...Stack Overflow.

      • by sjames ( 1099 )

        If it solves the problem of not being able to find an actual answer because a Google search is swamped with 1000 references to posts by smug bastards saying "google it", it might be worth something.

        • That rarely happens. Most of the time the question is answered even if it is marked as a duplicate. Then you have two different bits of information to go by, which is better than having one.
    • by reanjr ( 588767 ) on Saturday July 06, 2024 @10:08PM (#64606129)

      By the time your GPT has figured out a response, I'm already on the correct Stack Overflow page, complete with comments and alternate solutions.

      Are people really so bad at using search engines that ChatGPT helps them search content? That boggles my mind. The queries for a search engine tend to be more terse, but the results more pointed and useful, in my experience, than GPT vomit.

      • by edwdig ( 47888 )

        This is literally the one case where I've found ChatGPT useful. If I've got fairly simple questions, it's usually faster to ask ChatGPT than to sort thru Stack Overflow to find a quality answer. Going to Stack Overflow usually means sorting thru some bad answers and people bickering over what's the best way to do it. ChatGPT tends to be pretty good at giving me a decent answer quickly in those cases.

        If I have to ask anything non-trivial, ChatGPT is a waste of time.

        • by Junta ( 36770 )

          Depends on the nature of the discussion.

          Sometimes with a Python question, discussion consists of people arguing back and forth on which is the more "Pythonic" way, and who the hell cares. It's almost a religious debate there. The debate is a waste of my attention span. Though a quick skim pulls me through.

          However, sometimes the discussion is insightful. Like pointing out a history of one library for unfixed security issues, or that the answer from 2017 that is widely referenced and likely the result Chat

      • Apparently, you search Stack Overflow for simple problems. Usually, when I search Stack Overflow, it's because the answer is actually *hard* to find, or not obvious. After I've clicked the first 10-12 SO links without success, I'm starting to get frustrated. ChatGPT is a lot faster at this process than I am!

      • I've been finding AI to be incredibly useful for developing open-source Drupal code, especially form code using Drupal's Form API. I've been using both ChatGPT (GPT-4o) and Anthropic's Claude 3.5 and have recently come to prefer the latter. Drupal's Form API is very mature and powerful, and I've been developing a form with multiple fields, only one of which should be active at a time depending on a selection. The relative complexity of the (known, open-source) solution warrants either an old-school Stack Overflow search (

      • By the time your GPT has figured out a response, I'm already on the correct Stack Overflow page, complete with comments and alternate solutions.

        Are people really so bad at using search engines that ChatGPT helps them search content? That boggles my mind. The queries for a search engine tend to be more terse, but the results more pointed and useful, in my experience, than GPT vomit.

        Good luck asking Stack Overflow to rewrite something into a different language or framework (for just one example).

        All you are telling people who do know how to use it effectively is that you don't. It's like hearing an assembly programmer ranting against compilers.

      • Looking at my fellow programmers, they use ChatGPT all day long, they don't check stackoverflow anymore. Personally I still haven't used it, I still use SO when I don't have a clue.
      • I have wondered the same thing. Maybe people didn't realize that they could type questions into Google? Maybe if you are working with a specific framework, language, or platform, put that term first in the search field? That's all I do.
    • by gweihir ( 88907 ) on Saturday July 06, 2024 @10:15PM (#64606137)

      Sure. But who will continue to write Stack Overflow questions and answers when AI now "saves time"? And what will AI get trained on when these postings are missing?

      • First, Q&A sites aren't going away any time soon. It will be quite some time before *everybody* gives up on them.

        Second, there are plenty of other sites where people still share code, you know, like GitHub.

        And third, your question is kind of like asking who is going to learn the fine art of shifting gears, when automatic transmissions are in every car.

    • Time saver #1: It will scan through a bunch of Stack Overflow (and other coding site) suggestions, picking ones that seem relevant.

      I suspect it isn't even that good. Based on what I know about LLMs, it will identify the most probable response to your query, so it may respond with the most common response to your problem, not the most relevant one. That is, if the response isn't even more fine-grained, with each line or word in the code being the most probable next one, so that the previous word or line might cause the next one to be less correct for the overall answer.
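The worry about locally probable choices can be illustrated with a toy example. This is a hypothetical bigram table, not how ChatGPT works internally: real models condition on the whole context and usually sample rather than always taking the single most likely token. Still, it shows how picking the most probable next token at each step (greedy decoding) can land on a sequence that is less probable overall than the best one.

```python
# Hypothetical toy "model": probability of each next token given the previous one.
NEXT = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
}


def greedy(length: int = 2):
    """Pick the single most probable next token at every step."""
    seq, prev, prob = [], "<s>", 1.0
    for _ in range(length):
        tok, p = max(NEXT[prev].items(), key=lambda kv: kv[1])
        seq.append(tok)
        prob *= p
        prev = tok
    return seq, prob


def best_sequence(length: int = 2):
    """Exhaustively score every sequence and return the most probable one."""
    best_seq, best_prob = None, 0.0

    def walk(prev, seq, prob):
        nonlocal best_seq, best_prob
        if len(seq) == length:
            if prob > best_prob:
                best_seq, best_prob = seq, prob
            return
        for tok, p in NEXT.get(prev, {}).items():
            walk(tok, seq + [tok], prob * p)

    walk("<s>", [], 1.0)
    return best_seq, best_prob


print(greedy())         # "a" then "x": overall probability about 0.30
print(best_sequence())  # "b" then "z": overall probability about 0.36
```

Greedy commits to "a" because 0.6 beats 0.4 at the first step, and that early choice rules out the better continuation.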

  • by el84 ( 10322963 ) on Saturday July 06, 2024 @06:09PM (#64605843)
    like factorial, or quicksort, or one of those lame dumb-ass job interview problems. But give it a problem that is novel and you just get garbage. Particularly if you couch the problem as test cases. Most of the time the code doesn't even run. It's like it's just memorised all the code in the training set and it's working like a cross between search engine and keyword programming. Clearly, it's a start and it will likely get better over time but I'm not worried about being replaced any time soon.
    • Even for something like that which is a known quantity I doubt that it could produce a functioning program if you asked it to write it in a language it had never seen before if given a specification for the language and some simple examples. I suspect most programmers could do that even if the language has some odd design choices.

      I do think that with further refinement there's still great use cases for the technology even if it's not the magic bullet that some hoped it might be. I think it would be great
      • by gweihir ( 88907 )

        Even worse: Unless and until it gets a lot of examples in that new language, it will not ever be able to do anything in it. And who will write these examples?

    • Re: (Score:2, Redundant)

      by dfghjk ( 711126 )

      Exactly. That AI passes the test at all means the test is contrived for AI to pass it. It's probably better than SuperKendall but that's it.

    • by gweihir ( 88907 ) on Saturday July 06, 2024 @10:12PM (#64606133)

      It is worse: Give it a known simple problem with a different order of steps than usually used, but clearly specified. It cannot even do that.

      As a coder, this thing is worthless. Sure, many "coders" are worthless as well (see https://blog.codinghorror.com/... [codinghorror.com] for examples), but making worthless coders cheaper is not going to improve anything.

      • As a coder, this thing is worthless. Sure, many "coders" are worthless as well (see https://blog.codinghorror.com/ [codinghorror.com]... for examples)

        The GP complained about lame worthless interview questions.

        When I first moved into the tech industry, I was asked a programming question in the interview by an interviewer I didn't know. I thought it was a bit weird at the time, as in: most of the interviewers know me, you can see what I've done, surely it's obvious I can code. But I wasn't going to be an arrogant dickhead so I pl

        • Recursion for teh win!

          Recursion is beautiful, but hard to do right in a language like C, where it's nearly impossible to count on tail-call optimization. A for loop, or nested loops, on the other hand, is simple, clean and efficient in most languages.

          You only write functions recursively to show off, is what I'm trying to say. And in the right context that's exactly right.
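The stack-depth problem is easy to see in Python, where CPython never eliminates tail calls (the C standard doesn't guarantee it either, though optimizing C compilers often do it). The two functions below are a hypothetical illustration, not code from the thread: the tail-recursive factorial still uses one stack frame per call and hits the interpreter's recursion limit for large `n`, while the loop runs in constant stack space.

```python
import sys


def factorial_recursive(n: int, acc: int = 1) -> int:
    # Tail-recursive form: the recursive call is the last thing done,
    # but CPython still pushes a new stack frame for every call.
    return acc if n <= 1 else factorial_recursive(n - 1, acc * n)


def factorial_iterative(n: int) -> int:
    # Same result with a plain loop and constant stack depth.
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


if __name__ == "__main__":
    n = sys.getrecursionlimit() + 100
    print(factorial_iterative(n) > 0)  # the loop handles it
    try:
        factorial_recursive(n)
    except RecursionError:
        print("recursive version exceeded the recursion limit")
```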

        • by Junta ( 36770 )

          A shocking number of people get through by gaming non-technical managers.

          We had one guy who managed to pull off 5 years with our group, and he had the management *convinced* that his failure to ever do anything useful was the fault of senior developers refusing to let him do anything or refusing to train him. Meanwhile multiple senior engineers wasted hours every week trying to be helpful to teach him and assign him even basic tasks. However his excuse remained the same and the seniors got chastised becaus

    • by Luckyo ( 1726890 )

      That's the point of generative AI. It doesn't replace an expert. It makes one expert do the job of ten, because 9/10 of an expert's job is mundane, boring stuff that doesn't require an expert, but comes with the job.

      So you outsource that to AI, check that whatever it made is functional, and do the 1/10 of the work that actually needs your expertise, which generative AI can't handle properly. This is notably how it works in many fields in production, right now. It enables a single person to do the job of many people, b

    • Honest question: If I said, "Write me a Qt6 application in C++ with four text fields labelled "One", "Two", "Banana", and "Three" and a button below them labelled "Submit" which will call field X at API Y and submit the data. The window should be 600x400 with a 20 pixel margin with fields and button stacked vertically with a 20px spacing......." What would it do?
  • 3.5? (Score:5, Insightful)

    by david-bo ( 578532 ) on Saturday July 06, 2024 @06:10PM (#64605847)

    "The study tested GPT-3.5"

    That is a pretty uninteresting test. No one who is serious about this would use 3.5. Stupid study.

  • by danda ( 11343 ) on Saturday July 06, 2024 @06:11PM (#64605853)

    How often do I find myself needing some algo from a site like Rosetta Code, Stack Overflow, or LeetCode? Maybe a couple of times a month. (I've never even visited LeetCode.) The rest of the time I am implementing functionality unique to my project's problem domain and codebase.

    I don't see how ChatGPT would help with this.

    Testing ChatGPT or any LLM with these types of problems seems like cheating.

    I would be interested to see tests with regards to ChatGPT's ability to take a basic description of a new unique and non-trivial feature and implement it for an existing open source project like openssl, openssh, firefox, linux kernel, bitcoin-core, etc.

    My guess is that it will perform extremely poorly compared to even a mid-level developer. Am I wrong?

    • by Morromist ( 1207276 ) on Saturday July 06, 2024 @07:35PM (#64605961)

      Yeah, I don't get it either. Most programmers are good enough at programming that they can just do the simple stuff with a quick peek at some old code they've written or something. So who needs ChatGPT?

      Whenever I use it it takes more time to fix its little mistakes than it would to just write all the code myself. Waste of time.

    • I would be interested to see tests with regards to ChatGPT's ability to take a basic description of a new unique and non-trivial feature and implement it for an existing open source project like openssl, openssh, firefox, linux kernel, bitcoin-core, etc.

      Please not openssh, it's really the one where no screw ups are ever to be allowed.

    • by dvice ( 6309704 ) on Sunday July 07, 2024 @04:42AM (#64606481)

      You don't even need a project as complex as Firefox. If you ask AI to write a program and then ask it to make incremental changes to it, it will at some point fail, even if the program is really, really simple and the steps are really small. Actually, let's try that. I'll try with Gemini.

      1. Write a Python application that takes 2 numbers from command line and adds them together and prints out the result and only the result and exits the program. The application should not do anything else than what is described here and it should not handle any error situations. (OK)
      2. Change the code so that it will instead of 2 numbers, add 3 numbers together. Make no other changes to the code. (OK)
      3. Change the code so that it works with either 2 or 3 arguments. Make no other changes. (OK)
      4. Change the code to use multiplication instead of addition if there are 3 arguments. Make no other changes. (Failed, it multiplied with 2 arguments, added with 3)
      5. Change the code so that it will add the first 2 arguments always together and if there is a 3rd parameter, the 3rd parameter is multiplied with the sum from first 2 arguments. Make no other changes. (OK. I added this step to get the previous error fixed, since it could be that I gave bad instructions.)
      6. Add a 4th argument. If it is given, multiply first 3 arguments with it before doing any other calculations. Make no other changes. (Failed miserably)

      Here is the end result:
      ---------------
      import sys

      # Calculate based on argument count (assuming valid numbers provided)
      if len(sys.argv) == 4:
          product = float(sys.argv[1]) * float(sys.argv[2]) * float(sys.argv[3])
          result = product if len(sys.argv) == 3 else product * float(sys.argv[4])
      else:
          sum_of_first_two = float(sys.argv[1]) + float(sys.argv[2])
          result = sum_of_first_two if len(sys.argv) == 3 else sum_of_first_two * float(sys.argv[3])

      # Print the result
      print(result)

      # Exit the program
      sys.exit()
      ---------------

      And here is how it fails
      ---------------
      $ python3 adder.py 3 4 2
      Traceback (most recent call last):
          File "adder.py", line 6, in <module>
              result = product if len(sys.argv) == 3 else product * float(sys.argv[4])
      IndexError: list index out of range
      ---------------
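For comparison, here is one way to meet all six steps of the specification. It's a sketch, and it assumes "multiply first 3 arguments with it" means each of the first three numbers is multiplied by the 4th before the add-then-multiply rule of step 5 is applied; other readings are possible. As in the prompts, there is no error handling. The arithmetic is pulled into a function (the name `calculate` is made up here) so it can be checked without the command line.

```python
import sys


def calculate(args):
    """Apply steps 1-6 to a list of 2 to 4 numeric strings."""
    nums = [float(a) for a in args]
    # Step 6: if a 4th argument is given, scale the first three by it first.
    if len(nums) == 4:
        factor = nums.pop()
        nums = [n * factor for n in nums]
    # Step 5: always add the first two arguments...
    result = nums[0] + nums[1]
    # ...and multiply that sum by the 3rd argument, if there is one.
    if len(nums) == 3:
        result *= nums[2]
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as a script only when arguments are supplied.
    print(calculate(sys.argv[1:]))
```

With this version, `python3 adder.py 3 4 2` prints `14.0` instead of raising `IndexError`, because the branching is on the number of parsed arguments rather than on mismatched `len(sys.argv)` checks.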

  • by scrib ( 1277042 ) on Saturday July 06, 2024 @06:29PM (#64605875)

    The date seems significant because it implies that ChatGPT does a pretty good job finding an existing answer but a pretty poor job at creating a novel answer.

    I am at once feeling more secure in my job and more worried about the influx of ChatGPT kiddies.

    Of course, my job is only secure if my managers understand this. Pardon me, I have to go get ChatGPT to write an email for me.

    • As an Engineering Manager, I feel very secure that my devs will continue to have their jobs for as long as we can keep them. I see room for some acceleration via AI, but this is only a realistic threat to people doing work that can be verified with nothing more than a visual inspection (i.e. trivial tasks that we wouldn’t even accept as work in the first place).

      Meanwhile, the company I’m at specializes in project rescue work, and I anticipate a lot more of it coming our way in the years ahead.

  • solving problems? (Score:4, Insightful)

    by dfghjk ( 711126 ) on Saturday July 06, 2024 @06:30PM (#64605879)

    And what exactly does "solving problems" mean? Under what criteria is generative AI considered "good" at programming? I'd suspect under real criteria, good design and good implementation, the success rate would surely be zero. Under contrived tests where the least possible unit test is passed, perhaps it is higher.

    • by gweihir ( 88907 ) on Saturday July 06, 2024 @10:10PM (#64606131)

      LLMs cannot "solve" problems at all. All they can do is calculate probabilities that a solution they have seen fits the problem, using some correlations. The results may fit, may partially fit or be complete crap and, bonus!, the LLM has no clue which of the three it is.

      Just to illustrate this: I just corrected a Python "open Internet" exam. LLMs are not even capable of understanding that indentation is critical in Python. Or that a simple specification with three simple steps actually means these steps need to be done in the order specified.
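A minimal illustration of why indentation is semantic in Python (hypothetical functions, not taken from the exam): the two functions below differ only in how far `return` is indented, and that alone changes the result.

```python
def total_after_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total  # dedented: runs once, after the loop finishes


def total_inside_loop(xs):
    total = 0
    for x in xs:
        total += x
        return total  # indented into the loop: returns after the first item
    # If xs is empty, the loop body never runs and the function returns None.


print(total_after_loop([1, 2, 3]))   # 6
print(total_inside_loop([1, 2, 3]))  # 1
```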

      • by WaffleMonster ( 969671 ) on Saturday July 06, 2024 @11:48PM (#64606255)

        I just corrected a Python "open Internet" exam. LLMs are not even capable of understanding that indentation is critical in Python.

        I'm not capable of understanding why anyone would design a language like that either.

      • by Junta ( 36770 )

        I saw a quote that sums it up, though I'm having a hard time finding it again: LLMs don't provide information, they provide information-shaped output.

        What it spits out looks credible, and in some cases it actually turns out to be correct, but it's clear after some experience that it just spews out credible-looking stuff that shouldn't be believed. It's a bullshitting machine. So it'd make a good executive.

  • Wish it was 1999 when search results were just the search results without an AI or SEO trying to meddle with the results.

  • by Rei ( 128717 ) on Saturday July 06, 2024 @06:50PM (#64605907)

    ...out: the human success rates for comparison.

    I can't access the research paper, sadly, to see what acceptance rates are for each group. But searching on Reddit, it sounds like LeetCode acceptance rates tend to be pretty low. For example, this "Easy" question [leetcode.com] has a 17.4% acceptance rate (though that's apparently a particularly low rate for an "easy" question).

    Anyone have access to the paper to see what the human mean scores were on the same problems?

    • In general, you would want to throw away a large percentage of human submissions before calculating the acceptance rate though, since lots of people just sign up for free and try it out for a bit to see what it's like, but don't answer seriously or while watching TV. So this skews the passive statistics.

      It's like when the MOOCs came online, where they teach university courses without requiring exams or homework. Most people who sign up for these courses watch one or two vids and never finish afterwards.

      • by Rei ( 128717 )

        since lots of people just sign up for free and try it out for a bit to see what it's like, but don't answer seriously or while watching TV.

        I don't see how this is a valid argument. If anything, since LeetCode is "a test", and incentivizes (even gamifies, arguably) success by letting people compete for the highest scores, I expect people to try *far* harder than for some random piece of code for some random project that they were just rushing to complete for some deadline.

          • LeetCode and similar sites are also often recommended to people for practicing coding interviews (instead of competing). It's cheaper than buying one of those interview-question books, and it's easy enough to attempt a few problems, abandon the ones that are too hard, and feel good about yourself once you've completed two or three.
          • by Rei ( 128717 )

            If you search on Reddit, everyone is comparing their scores with everyone else. And those who do poorly tend to be frustrated with themselves and/or LeetCode. Everyone is clearly quite motivated to score well - and I'd argue, far more than they would be from some random drudgework at work that they've been working on for years.

    • by Junta ( 36770 )

      Problem is there are a handful of LLMs and so by testing "an interaction with " you've tested the best and worst of a large chunk of the AI. They also either work or don't work on the very first try, there's no point in telling it "failed, try again", because it'll just flounder about randomly. With sampling the acceptance rate of the test you include people that aren't very good or don't even care (e.g. some required training by their company).

      Also, never used the platform myself, so I don't know how "at

      • by Rei ( 128717 )

        How good it is at "trying again" often tends to come more down to the finetune than the underlying model, in my experience. I find LLaMA 3 much better at trying again than ChatGPT, for example.

        That said, none of them do independent A* right now.

  • by backslashdot ( 95548 ) on Saturday July 06, 2024 @07:14PM (#64605941)

    If you know how to use it (I reckon most people don't), it's "good enough". I use it for a lot of things when I am lazy. It's good enough for most things... it definitely beats "junior" programmers, which is a bit scary because it's hard to become a senior programmer without getting a junior programmer job first. If AI is doing all the junior-level work, where's the pathway to becoming senior? Years of learning without a job? We've already made it so most jobs need a bachelor's degree. Now we're going to ask people to show up with a Master's? Note: I said it's hard, not impossible. There are always going to be people who can learn and portfolio their way to senior programmer without an entry-level role.

    • I'd argue GPT cheapens the value of a degree. It no longer even demonstrates a basic level of understanding if GPT does all your homework.

      Better off hiring high school graduates who have been coding half their lives.

      • 100% agree, but those people are rare. Only caveat is that they must have done AP Calculus BC and maybe Statistics (or be willing to get up to speed on it within a year). Yeah I know that you don't need that for most devops/coding jobs these days .. but I find it's a good filter and also shapes the mind into an engineering mindset.

  • The truth (Score:4, Insightful)

    by Baron_Yam ( 643147 ) on Saturday July 06, 2024 @07:43PM (#64605977)

    ChatGPT understands nothing, therefore it cannot do anything truly novel. That it can regurgitate useful code at all after being trained on existing example code is a marvel.

    And that's what I use it for - fast search-free regurgitation of code I know has been done before. Then it just needs a quick review to ensure it makes sense and you're off a lot faster than you could have typed and debugged your own code.

    • It is actually pretty good at that. I needed to do some basic NLP, but I had never done much NLP myself: things like finding words with a given number of syllables that rhyme with another word. It would have taken me a day to find the right libraries, understand how they work, write the code, test and debug.
      ChatGPT right away suggested a library and gave me starter code. It was somewhat wrong, so I had to adjust it, but in 90 minutes I was done.

    • ChatGPT understands nothing, therefore it cannot do anything truly novel.

      That's what too many people don't seem to understand, or rather, believe that it has actual cognitive ability, instead of just the illusion of cognition, when in reality it has no such capability and never will.

  • terrible benchmark (Score:4, Insightful)

    by darkain ( 749283 ) on Saturday July 06, 2024 @08:18PM (#64606015)

    Leetcode is a terrible benchmark.

    It's used for whiteboarding interviews, which are total bullshit to begin with... however, that style of code is *NEVER* used on the job, because those are already-solved problems, solved better than any one person could come up with, peer reviewed a thousand times over, and all rolled up into nice neat little libraries.

    Can ChatGPT write brand new code to solve novel problems? No, because it's an over-glorified copy-pasta bot.

    • by Junta ( 36770 )

      What are you talking about, I constantly need to (checks leetcode question) determine if a string could have perfectly even letter distribution if one and only one character is removed from it.

      Yeah, test questions are a stupidly useless metric for comparing humans to LLMs. They are arguably stupid for evaluating human performance, but become so much worse for LLMs, that have a different set of strengths and weaknesses that generally favor being better at these sorts of "test" questions than they are at rea
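For the curious, the puzzle Junta paraphrases (can deleting exactly one character leave every remaining letter with the same count?) has a simple brute-force solution. This is a sketch, not LeetCode's reference solution; the function name is made up, and it tries every possible deletion, so it costs O(n²) for a string of length n, which is fine for short inputs.

```python
from collections import Counter


def equal_after_one_removal(word: str) -> bool:
    """True if removing exactly one character leaves all letter counts equal."""
    for i in range(len(word)):
        # Count letters with the character at position i removed.
        counts = Counter(word[:i] + word[i + 1:])
        # All remaining letters share one count (or nothing is left).
        if len(set(counts.values())) <= 1:
            return True
    return False


print(equal_after_one_removal("abcc"))  # True: drop one "c"
print(equal_after_one_removal("aazz"))  # False
```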

  • by u19925 ( 613350 ) on Saturday July 06, 2024 @08:34PM (#64606021)

    Since it says that it had higher accuracy if the problem was published before 2021, I am guessing that it may be doing verbatim copy of the existing solutions. The more solutions there are, the more likely the problem fits one of the patterns it has seen, and the more likely it picked up that solution.

    A better benchmark would have been to compare with what you get from search engines and see if they materially differ. I have tried dozens of coding problems with ChatGPT and all I usually get is boilerplate code. As an example, I asked ChatGPT to write code to calculate the moon phase. Its code was otherwise correct, but it used a horrible formula which gave pathetic answers. Fixing that one-line formula made it work. ChatGPT had no idea which of the dozens of solutions on the internet was the correct one.
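The comment doesn't show either the formula ChatGPT used or the one-line fix, so the following is only a common approximation, offered as a sketch: count the days since a known new moon and take the remainder modulo the mean synodic month (about 29.53 days). The reference epoch of 2000-01-06 18:14 UTC is an assumption of this sketch, and because real lunations vary around the mean, results are approximate (off by some hours).

```python
import math
from datetime import datetime, timedelta, timezone

SYNODIC_MONTH_DAYS = 29.530588853  # mean new-moon-to-new-moon interval
# Assumed reference: new moon of 2000-01-06 18:14 UTC.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def moon_age_days(when: datetime) -> float:
    """Approximate days since the most recent new moon (0 <= age < 29.53)."""
    elapsed = (when - REFERENCE_NEW_MOON) / timedelta(days=1)
    return elapsed % SYNODIC_MONTH_DAYS  # Python's % is non-negative here


def illuminated_fraction(when: datetime) -> float:
    """Approximate lit fraction of the disc: 0 at new moon, 1 at full moon."""
    angle = 2 * math.pi * moon_age_days(when) / SYNODIC_MONTH_DAYS
    return (1 - math.cos(angle)) / 2


now = datetime.now(timezone.utc)
print(f"age {moon_age_days(now):.1f} days, {illuminated_fraction(now):.0%} lit")
```

Pass timezone-aware datetimes; subtracting a naive datetime from the aware reference raises `TypeError`.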

    • Since it says that it had higher accuracy if the problem was published before 2021, I am guessing that it may be doing verbatim copy of the existing solutions.

      Instead of guessing, why not try it?

      Then, after you try it, if you want to learn something useful instead of score debating points, ask it for some revisions. "Can you rewrite this in Python for me?" or "can you implement the following additional requirements? 1. blah blah 2. yadda yadda ..."

    • The problem is precisely it isn't a verbatim copy.

      The crap takes some examples that work, slices them up, and then, based on whatever coefficients it derived for the frequency of some strings in what it got as input, it generates crap that looks like code, but won't work.

      Tried it with trivial tasks in several domains:

      - simple webapp (show me an example of two components (A, B), three instances (A1, A2, B1), B reacts to changes in A1 and A2, for a specific simple js framework) - complete fail, example loads
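The commenter doesn't name the JavaScript framework, so here is only a framework-free sketch, in Python, of the behaviour being asked for: B1 subscribes to A1 and A2 and updates itself whenever either one changes (a plain observer pattern). Class and method names are hypothetical; a real framework would wrap the same idea in its own reactivity API.

```python
class A:
    """A component whose value changes and notifies subscribers."""

    def __init__(self, name, value=0):
        self.name = name
        self._value = value
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        # Notify every subscriber of the change.
        for callback in self._listeners:
            callback(self.name, new_value)


class B:
    """A component that mirrors the latest values of the A instances it watches."""

    def __init__(self, name, sources):
        self.name = name
        self.seen = {}
        for source in sources:
            self.seen[source.name] = source.value
            source.subscribe(self.on_change)

    def on_change(self, source_name, new_value):
        self.seen[source_name] = new_value


a1, a2 = A("A1"), A("A2")
b1 = B("B1", [a1, a2])
a1.value = 5
a2.value = 7
print(b1.seen)  # {'A1': 5, 'A2': 7}
```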

  • LLMs can't reliably write working code, let alone write good code. It's as fundamental as the Halting Problem, a thing LLMs aren't aware of.

  • I have found ChatGPT (and derivative tools like Copilot) to be very useful for writing boilerplate - but only with a lot of specification in the prompt.

    It's the classic garbage-in, garbage-out problem: if you're a good coder and good prompter, LLMs deliver useful scaffolds. If you suck, it will suck.
  • by swm ( 171547 ) <swmcd@world.std.com> on Saturday July 06, 2024 @08:59PM (#64606039)
    From Slashdot
    Waymo Issues Software and Mapping Recall After Robotaxi Crashes Into a Telephone Pole
    https://tech.slashdot.org/stor... [slashdot.org]

    The update corrects an error in the software that "assigned a low damage score" to the telephone pole

    From Schneier on Security
    Using AI for Political Polling
    https://www.schneier.com/blog/... [schneier.com]

    See, polling has gotten hard, because people don't answer their phones any more, and if they do answer they won't talk to you, and if they do talk to you, they may tell you what they think you want to hear, rather than what they really think. So what we can do, see, is create these AI chat-bots that act like people, and then poll the chat-bots instead of calling real people.

    From the article

    What's so powerful about this system is that it can generalize to new scenarios and survey topics, and spit out a plausible answer, even if its accuracy is not guaranteed.

    I am simply gob-smacked that adults—actual grown-up people—continue to take this stuff seriously.

    • by gweihir ( 88907 )

      From the article

      What's so powerful about this system is that it can generalize to new scenarios and survey topics, and spit out a plausible answer, even if its accuracy is not guaranteed.

      I am simply gob-smacked that adults—actual grown-up people—continue to take this stuff seriously.

      Same here. In actual reality, it can do none of those things. As soon as they are a tiny bit unexpected, it cannot even do basic things right. The only thing that happens is that lies are getting more extreme. Usually that is a sign of the mindless hype nearing its end. We can only hope it is here too.

  • Well, many coders are fuckups as well (see, for example: https://blog.codinghorror.com/... [codinghorror.com]), and it can, maybe, replace those to a degree. But forget about having it write even a simple piece of original code, or about it actually understanding even a simple specification.

    • Define "original code". There are only so many ways to write the same thing, and your average programmer has seen a lot of code before they write anything meaningful.

      On the other hand, the software doesn't understand anything, and it doesn't even do a good job of pretending to understand anything it hasn't seen before. That means it can only even mock understanding of specifications which are highly similar to specifications the system was trained on. On the other hand, the massive duplication of effort in pro

  • by The Cat ( 19816 ) on Saturday July 06, 2024 @10:43PM (#64606185)

    I wish I'd known when I was in high school that I'd never have a decent job.

    I would have done things quite differently.

  • ChatGPT can easily solve problems that have already been solved in the past and discussed on the Internet. It's not so good with newer, insufficiently discussed ones.

  • Essentially it's OK if you could also copy and paste the code from Stack Overflow. It severely breaks if you want to do anything beyond that. For example, we once asked it to write a configuration file snippet for "yate", a commonly used VoIP software with a strong focus on mobile applications. This configuration snippet was supposed to reject SIP "MESSAGE" requests.
    The result was an INI-style file containing something like "MESSAGE=reject".
    In reality yate is configured via something called "regex-route".

  • The highest rated comments on these stories tell me a few things:

    1. The commenters either haven't really tried, or don't know how to effectively use, ChatGPT.

    2. The commenters have really poor PM skills. When working with ChatGPT, you need to give it a good set of requirements and good feedback so it can do revisions.

    It's a tool. When you learn how to use it, it's an incredible help. Truly incredible stuff, unless you've put the goalposts onto a bullet train, sending them away into the distance.

  • What's missed in all of this hype is that coding is a team activity. Put ten software engineers together, each using AI to generate their parts, and you get a big, unmaintainable mess with no design consistency and tons of redundant code.

  • by LordNimon ( 85072 ) on Sunday July 07, 2024 @08:41AM (#64606753)

    > It sometimes fails to understand the meaning of questions,

    No, it never understands the meaning of the question. That's the whole problem with LLMs.

  • The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code

    By which I'm guessing that if I produced code like that then I'd be experiencing "a broad range of what to do with my time now that I've been fired"

  • High hopes for the topic, but... did no one notice any examples of funny code produced by ChatGPT?

  • Nobody uses GPT-3.5 anymore. That's the old model, and it's distinctly worse than the current model at everything, not just coding. This is a stupid thing to test now, since nobody will ever use GPT-3.5 for coding again.

