'How Good Is ChatGPT at Coding, Really?' (ieee.org)
IEEE Spectrum (the IEEE's official publication) asks the question. "How does an AI code generator compare to a human programmer?"
A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI's ChatGPT in terms of functionality, complexity, and security. The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code, with success rates ranging from as poor as 0.66 percent to as good as 89 percent, depending on the difficulty of the task, the programming language, and a number of other factors. While in some cases the AI generator could produce better code than humans, the analysis also reveals some security concerns with AI-generated code.
The study tested GPT-3.5 on 728 coding problems from the LeetCode testing platform — and in five programming languages: C, C++, Java, JavaScript, and Python. The results? Overall, ChatGPT was fairly good at solving problems in the different coding languages — but especially when attempting to solve coding problems that existed on LeetCode before 2021. For instance, it was able to produce functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively. "However, when it comes to the algorithm problems after 2021, ChatGPT's ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems," said Yutian Tang, a lecturer at the University of Glasgow. For example, ChatGPT's ability to produce functional code for "easy" coding problems dropped from 89 percent to 52 percent after 2021. And its ability to generate functional code for "hard" problems dropped from 40 percent to 0.66 percent after this time as well...
The researchers also explored ChatGPT's ability to fix its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios where ChatGPT initially generated incorrect code, either because it didn't understand the content or the problem at hand. While ChatGPT was good at fixing compiling errors, it generally was not good at correcting its own mistakes... The researchers also found that ChatGPT-generated code had a fair number of vulnerabilities, such as a missing null test, but many of these were easily fixable.
"Interestingly, ChatGPT is able to generate code with smaller runtime and memory overheads than at least 50 percent of human solutions to the same LeetCode problems..."
About the same as Stack Overflow (Score:3)
The quality of ChatGPT's coding suggestions is about the same as Stack Overflow's (which it uses for a lot of its source material): spotty. You can often find good solutions, but you can also find a lot of crappy ones. The difference is, ChatGPT (usually) saves time.
Time saver #1: It will scan through a bunch of Stack Overflow (and other coding site) suggestions, picking ones that seem relevant.
Time saver #2: It will take the suggestions it finds and customize them to your liking, such as using the names you want instead of the names in the SO code.
But it also can waste time. It sometimes picks an answer that is inefficient, or poorly written, or based on obsolete APIs, or just plain doesn't work.
All in all, still an improvement over click, read, click, read, click, read, ad nauseam.
Re: About the same as Stack Overflow (Score:3)
Stack Overflow is meant to serve as a reference, basically to explain the mechanics of the language, to point to the right library, or just to provide a code snippet for a common thing that most people don't remember well enough to recall off-hand when they need it. A lot of the time I even use it to reference answers I've posted myself.
If you lean on it too heavily to solve bigger problems, you're gonna have a bad time. I think the same can be said for ChatGPT. That ChatGPT fails on newer but still simple problems...
Re: About the same as Stack Overflow (Score:4, Insightful)
No, Stack Overflow isn't a reference. It's a site where people help each other solve technical problems, often by supplying code snippets.
https://stackoverflow.co/#:~:t... [stackoverflow.co].
Programming language vendors provide *references*. The purpose of a reference is to document. SO doesn't do that. It's a forum for discussion.
Yes, ChatGPT is essentially a fancy search engine. When it provides programming answers, it often searches...Stack Overflow.
Re: (Score:2)
If it solves the problem of not being able to find an actual answer because a Google search is swamped with 1,000 references to posts by smug bastards saying "Google it," it might be worth something.
Re: (Score:2)
Re: About the same as Stack Overflow (Score:4, Informative)
Re: (Score:3)
Indeed. In all engineering, context is everything. Standard solutions, you could find in a book 100 years ago. But understanding whether a standard solution cuts it or whether and how it needs to be adapted is everything. And LLMs cannot do that even in the most simple cases.
Re: (Score:2)
A lot of experience and data shows it does fail on simpler problems, like this very study. In this case even education fodder, where it's absolutely the case that the problem and context are fully described to let a human solve them without further clarification. I can see reasonable defenses of the utility of LLMs, but this is utterly absurd. It is nowhere near better than the average person at "any random task"; it has things it is good at, but it is bad at a lot of things, and even when "good" requires...
Re: About the same as Stack Overflow (Score:3)
Re: About the same as Stack Overflow (Score:5, Insightful)
By the time your GPT has figured out a response, I'm already on the correct Stack Overflow page, complete with comments and alternate solutions.
Are people really so bad at using search engines that ChatGPT helps them search content? That boggles my mind. The queries for a search engine tend to be more terse, but the results more pointed and useful, in my experience, than GPT vomit.
Re: (Score:3)
This is literally the one case where I've found ChatGPT useful. If I've got fairly simple questions, it's usually faster to ask ChatGPT than to sort through Stack Overflow to find a quality answer. Going to Stack Overflow usually means sorting through some bad answers and people bickering over what's the best way to do it. ChatGPT tends to be pretty good at giving me a decent answer quickly in those cases.
If I have to ask anything non-trivial, ChatGPT is a waste of time.
Re: (Score:2)
Depends on the nature of the discussion.
Sometimes with a Python question, the discussion consists of people arguing back and forth about which is the more "Pythonic" way, and who the hell cares. It's almost a religious debate there, and a waste of my attention span, though a quick skim pulls me through.
However, sometimes the discussion is insightful. Like pointing out one library's history of unfixed security issues, or that the answer from 2017 that is widely referenced and likely the result Chat...
Re: (Score:3)
Apparently, you search Stack Overflow for simple problems. Usually, when I search Stack Overflow, it's because the answer is actually *hard* to find, or not obvious. After I've clicked the first 10-12 SO links without success, I'm starting to get frustrated. ChatGPT is a lot faster at this process than I am!
Re: About the same as Stack Overflow (Score:2)
Not at all. But my searches are pointed. If it's not in the first two hits, it's probably not on the Internet.
Re: (Score:2)
open-source code searching, with examples (Score:3)
I've been finding AI to be incredibly useful for developing open-source Drupal code, especially form code using Drupal's Form API. I've been using both ChatGPT-4o and Anthropic's Claude 3.5 and have recently come to prefer the latter. Drupal's Form API is very mature and powerful, and I've been developing a form with multiple form fields but only one should be active at a time given a selection. The relative complexity of the (known, open-source) solution warrants either an old-school Stack Overflow search...
Re: (Score:2)
By the time your GPT has figured out a response, I'm already on the correct Stack Overflow page, complete with comments and alternate solutions.
Are people really so bad at using search engines that ChatGPT helps them search content? That boggles my mind. The queries for a search engine tend to be more terse, but the results more pointed and useful, in my experience, than GPT vomit.
Good luck asking Stack Overflow to rewrite something into a different language or framework (for just one example).
All you are telling people who do know how to use it effectively is that you don't. It's like hearing an assembly programmer ranting against compilers.
Re: About the same as Stack Overflow (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:About the same as Stack Overflow (Score:5, Insightful)
Sure. But who will continue to write Stack Overflow questions and answers when AI now "saves time"? And what will AI get trained on when these postings are missing?
Re: (Score:2)
First, Q&A sites aren't going away any time soon. It will be quite some time before *everybody* gives up on them.
Second, there are plenty of other sites where people still share code, you know, like GitHub.
And third, your question is kind of like asking who is going to learn the fine art of shifting gears, when automatic transmissions are in every car.
Re: (Score:2)
If they get massively fewer hits, they will go away. It is actually a pretty cold financial question.
Re: (Score:2)
And that's precisely why Stack Overflow is making deals with the AI companies. https://openai.com/index/api-p... [openai.com]
Re: (Score:2)
Sure. Stack Overflow is. But what about the users that create the content?
Re: (Score:2)
Time saver #1: It will scan through a bunch of Stack Overflow (and other coding site) suggestions, picking ones that seem relevant.
I suspect it isn't even that good. Based on what I know about LLMs, it will identify the most probable response to your query, so it may respond with the most common response to your problem, not the most relevant one. That is, if the generation isn't even more fine-grained, such that each line or word in the code is just the most probable next one, so the previous word or line might cause the next one to be less correct with respect to the overall answer.
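A toy illustration of that point (nothing like a real LLM's internals, which use learned token probabilities rather than lookup tables, but it shows why "most common" is not "most relevant"):
---------------
from collections import Counter

# Pretend training data: prompt/continuation pairs scraped from the web.
corpus = [
    ("read a file", "f = open(path)"),
    ("read a file", "with open(path) as f: data = f.read()"),
    ("read a file", "with open(path) as f: data = f.read()"),
]

def most_probable_completion(prompt):
    continuations = Counter(c for p, c in corpus if p == prompt)
    return continuations.most_common(1)[0][0]  # most frequent, not most apt

print(most_probable_completion("read a file"))
# -> "with open(path) as f: data = f.read()"
---------------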
It's great if it's a problem it's seen before ... (Score:4, Insightful)
Re: (Score:3)
I do think that with further refinement there are still great use cases for the technology, even if it's not the magic bullet that some hoped it might be. I think it would be great...
Re: (Score:3)
Even worse: Unless and until it gets a lot of examples in that new language, it will not ever be able to do anything in it. And who will write these examples?
Re: (Score:2, Redundant)
Exactly. That AI passes the test at all means the test is contrived for AI to pass it. It's probably better than SuperKendall but that's it.
Re:It's great if it's a problem it's seen before . (Score:5, Insightful)
It is worse: Give it a known simple problem with a different order of steps than usually used, but clearly specified. It cannot even do that.
As a coder, this thing is worthless. Sure, many "coders" are worthless as well (see https://blog.codinghorror.com/... [codinghorror.com] for examples), but making worthless coders cheaper is not going to improve anything.
Re: (Score:2)
As a coder, this thing is worthless. Sure, many "coders" are worthless as well (see https://blog.codinghorror.com/ [codinghorror.com]... for examples)
The GP complained about lame worthless interview questions.
When I first moved into the tech industry, I was asked a programming question in the interview by an interviewer I didn't know. I thought it was a bit weird at the time, as most of the interviewers knew me; you can see what I've done, surely it's obvious I can code. But I wasn't going to be an arrogant dickhead, so I pl...
Re: (Score:2)
Recursion for teh win!
Recursion is beautiful, but hard to do right in a language like C; it's nearly impossible to get tail-call optimization. A for loop, or nested ones, on the other hand, is simple, clean, and efficient in most languages.
You only write functions recursively to show off, is what I'm trying to say. And in the right context that's exactly right.
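A minimal Python sketch of that trade-off (Python, like C without compiler help, does no tail-call optimization, so the elegant version dies on deep input while the boring loop does not):
---------------
def fact_recursive(n):
    return 1 if n <= 1 else n * fact_recursive(n - 1)

def fact_loop(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(len(str(fact_loop(5000))))   # fine: a ~16,000-digit number
# fact_recursive(5000)             # RecursionError: maximum recursion depth exceeded
---------------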
Re: (Score:2)
A shocking number of people get through by gaming non-technical managers.
We had one guy who managed to pull off 5 years with our group, and he had the management *convinced* that his failure to ever do anything useful was the fault of senior developers refusing to let him do anything or refusing to train him. Meanwhile multiple senior engineers wasted hours every week trying to be helpful, to teach him and assign him even basic tasks. However his excuse remained the same, and the seniors got chastised because...
Re: (Score:3)
I've noticed this kind of thing too. One theory I have is that once people have started their career, they become less willing to prepare for a technical interview going forward.
Nothing wrong with that. I'm not going to go into a job where I need to bone up on leetcode before attending the interview. Nor will I do take-home interview questions. And I would absolutely not expect anyone to have to do prep for an interview.
That's why the initial screen is language of your choice, no APIs, no algorithm tricks...
Re: (Score:2)
To be clear, there is indeed nothing wrong with that. I'm just saying I've noticed it, and for us it wastes the candidate's time as well as ours, since they don't advance to the next stage. There's no shortage of candidates though, so we don't have an incentive to drop the higher barrier to entry (and I usually tell HR to warn the candidate that they will be grilled on the relevant subjects)
Re: (Score:3)
That's the point of generative AI. It doesn't replace an expert. It makes one expert do the job of ten, because 9/10 of an expert's job is mundane, boring stuff that doesn't require an expert, but comes with the job.
So you outsource that to AI and check that whatever it made is functional, and do the 1/10 that actually needs your expertise, which generative AI can't handle properly. This is notably how it works in many fields in production, right now. It enables a single person to do the job of many people...
Re: (Score:2)
Re: It's great if it's a problem it's seen before (Score:2)
3.5? (Score:5, Insightful)
"The study tested GPT-3.5"
That is a pretty uninteresting test. No one who is serious about this would use 3.5. Stupid study.
Re: 3.5? (Score:2)
"I really can't imagine what kind of stuff are people "programming" if they find it useful."
I suspect it's people who see boilerplate as a template rather than a target for abstraction and encapsulation. I've never understood this mindset, but I've often encountered it.
Re: (Score:2)
Yep, woe to those poor souls, who get to maintain that kind of "software".
Re: (Score:2)
Re: (Score:2)
I was specifically talking about the complaint, which appeared strictly ignorant of the fact that studies take a while to be properly formulated after results are collected.
Which demonstrated staggering ignorance of the reality of how studies are formulated and the effort put into them (outside the social sciences).
Re: (Score:2)
Stupid remark. 3.5 is like two years old. This test can't have taken more than a few weeks. Duh!
Re: (Score:2)
> By that logic any thorough report that took more than a few days to complete can be dismissed as "uninteresting" because it was testing the "old" version.
It has nothing to do with Tesla or anything at that point. If you take too long in a fast moving environment, or just plain don't do updates to latest code bases, many shortcomings you find will be obsolete and useless.
It is uninteresting at that point. All it does is point out something that has already been pointed out and fixed. Do you also think b...
Re: (Score:2)
It's been clear since the first releases of GPT that it isn't designed for a super deep understanding of, well, ANYTHING. The shit you are supposed to read through and agree to before using it even says it isn't designed to be perfect at anything, and answers it puts out may be just as incorrect as asking the village idiot the same thing.
And the latest GPT has shit that you agree to that says it is designed to be perfect?
If you take too long in a fast moving environment, or just plain don't do updates to latest code bases, many shortcomings you find will be obsolete and useless.
So any software that releases every week should never be studied for more than a day or two, so that results can be released before the next release, which would make the results uninteresting? I think you found a perfect way to dismiss the results of any testing: release a new version of your software every day, and dismiss any criticism as "uninteresting, we probably fixed it already." Oh wait, that is pretty much what Elon does...
testing the wrong thing (Score:3)
How often do I find myself needing some algo from a site like rosettacode, stackoverflow, or leetcode? maybe a couple times a month. (I've never even visited leetcode.) The rest of the time I am implementing functionality unique to my project's problem domain and codebase.
I don't see how ChatGPT would help with this.
Testing ChatGPT or any LLM with these types of problems seems like cheating.
I would be interested to see tests with regards to ChatGPT's ability to take a basic description of a new unique and non-trivial feature and implement it for an existing open source project like openssl, openssh, firefox, linux kernel, bitcoin-core, etc.
My guess is that it will perform extremely poorly compared to even a mid-level developer. am I wrong?
Re:testing the wrong thing (Score:5, Insightful)
Yeah, I too don't get it. Most programmers are good enough at programming that they can just do the simple stuff with a quick peek at some old code they've written or something. So who needs ChatGPT?
Whenever I use it it takes more time to fix its little mistakes than it would to just write all the code myself. Waste of time.
Re: (Score:2)
I would be interested to see tests with regards to ChatGPT's ability to take a basic description of a new unique and non-trivial feature and implement it for an existing open source project like openssl, openssh, firefox, linux kernel, bitcoin-core, etc.
Please not openssh, it's really the one where no screw ups are ever to be allowed.
Re: (Score:2)
Amen!
Re:testing the wrong thing (Score:4, Interesting)
You don't even need to use as complex a project as Firefox. If you ask AI to write software and then ask it to make incremental changes, it will at some point fail, even if the program is really, really simple and the steps really small. Actually, let's try that. I'll try with Gemini.
1. Write a Python application that takes 2 numbers from the command line, adds them together, prints out the result and only the result, and exits the program. The application should not do anything else than what is described here, and it should not handle any error situations. (OK)
2. Change the code so that it will add 3 numbers together instead of 2. Make no other changes to the code. (OK)
3. Change the code so that it works with either 2 or 3 arguments. Make no other changes. (OK)
4. Change the code to use multiplication instead of addition if there are 3 arguments. Make no other changes. (Failed: it multiplied with 2 arguments and added with 3)
5. Change the code so that it always adds the first 2 arguments together, and if there is a 3rd argument, the 3rd is multiplied with the sum of the first 2. Make no other changes. (OK; I added this step to get the previous error fixed, in case my earlier instructions were bad)
6. Add a 4th argument. If it is given, multiply the first 3 arguments with it before doing any other calculations. Make no other changes. (Failed miserably)
Here is the end result:
---------------
import sys

# Calculate based on argument count (assuming valid numbers provided)
if len(sys.argv) == 4:
    product = float(sys.argv[1]) * float(sys.argv[2]) * float(sys.argv[3])
    result = product if len(sys.argv) == 3 else product * float(sys.argv[4])
else:
    sum_of_first_two = float(sys.argv[1]) + float(sys.argv[2])
    result = sum_of_first_two if len(sys.argv) == 3 else sum_of_first_two * float(sys.argv[3])

# Print the result
print(result)

# Exit the program
sys.exit()
---------------
And here is how it fails
---------------
$ python3 adder.py 3 4 2
Traceback (most recent call last):
  File "adder.py", line 6, in <module>
    result = product if len(sys.argv) == 3 else product * float(sys.argv[4])
IndexError: list index out of range
---------------
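For contrast, here is a hand-written version of what step 6 asked for, under my reading of the same instructions (a sketch, not the model's output):
---------------
import sys

# Parse all command-line arguments as numbers.
args = [float(a) for a in sys.argv[1:]]

# Step 6: if a 4th argument is given, multiply the first three by it first.
if len(args) == 4:
    args = [x * args[3] for x in args[:3]]

# Step 5: always add the first two; multiply by the 3rd if present.
result = args[0] + args[1]
if len(args) == 3:
    result *= args[2]

print(result)
---------------
$ python3 adder.py 3 4 2
14.0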
It helps with homework but not jobs (Score:4)
The date seems significant because it implies that ChatGPT does a pretty good job finding an existing answer but a pretty poor job of creating a novel one.
I am at once feeling more secure in my job and more worried about the influx of ChatGPT-kiddies.
Of course, my job is only secure if my managers understand this. Pardon me, I have to go get ChatGPT to write an email for me.
Re: (Score:3)
As an Engineering Manager, I feel very secure that my devs will continue to have their jobs for as long as we can keep them. I see room for some acceleration via AI, but this is only a realistic threat to people doing work that can be verified with nothing more than a visual inspection (i.e. trivial tasks that we wouldn’t even accept as work in the first place).
Meanwhile, the company I’m at specializes in project rescue work, and I anticipate a lot more of it coming our way in the years ahead.
solving problems? (Score:4, Insightful)
And what exactly does "solving problems" mean? Under what criteria is generative AI considered "good" at programming? I'd suspect under real criteria, good design and good implementation, the success rate would surely be zero. Under contrived tests where the least possible unit test is passed, perhaps it is higher.
Re:solving problems? (Score:4, Insightful)
LLMs cannot "solve" problems at all. All they can do is calculate probabilities that a solution they have seen fits the problem, using some correlations. The results may fit, may partially fit or be complete crap and, bonus!, the LLM has no clue which if the three it is.
Just to illustrate this: I just have corrected a Python "open Internet" exam. LLMs are not even capable of understanding that indention is critical in Python. Or that a simple specification with three simple steps actually means these steps need to be done in the order specified.
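To illustrate for non-Python readers why that matters, indentation is semantic in Python: moving one line changes what the program does, not just how it looks. A minimal example:
---------------
# Version A: the print runs once, after the loop has finished.
for i in range(3):
    total = i
print(total)        # -> 2

# Version B: indent the print one level and it runs on every iteration.
for i in range(3):
    total = i
    print(total)    # -> 0, 1, 2
---------------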
Re:solving problems? (Score:5, Insightful)
I have just corrected a Python "open Internet" exam. LLMs are not even capable of understanding that indentation is critical in Python.
I'm not capable of understanding why anyone would design a language like that either.
Re: (Score:2)
I'm inclined to agree. It's part of the macho cult of C.
I'm currently doing Python day to day. I don't love it, but it's fine. You get used to the lack of close braces, though I do find it a persistent but minor irritation when refactoring. I like that in vim I can easily select code by block, or smoosh code around quickly and then hit = to reindent correctly.
But this is a very minor gripe. I'm not refactoring all day every day. Occasionally I need a few more keystrokes in my favourite editor, THEREFORE IT S...
Re: (Score:3)
I must oppose this: I too think whitespace should not decide how code runs.
And you are a bit harsh on the previous poster; they never claimed to have difficulty understanding or coding it, just that they disliked the design choice.
My biggest gripe with Python is how it makes some coders write totally unreadable one- or two-liners to solve a problem. It's like some of them think they are competing in an obfuscated-code contest.
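A made-up example of the kind of one-liner in question, next to its readable equivalent:
---------------
data = {"b": [3, 4], "a": [1, 2], "c": []}

# The contest entry: correct, but you have to unpack it in your head.
odds = [x for k in sorted(data) for x in data[k] if x % 2]

# The same logic, spelled out.
odds_readable = []
for key in sorted(data):
    for x in data[key]:
        if x % 2:                  # keep odd values only
            odds_readable.append(x)

assert odds == odds_readable == [1, 3]
---------------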
Re: (Score:2)
My biggest gripe with Python is how it makes some coders write totally unreadable one- or two-liners to solve a problem
Now this I've never seen. I've certainly seen this with perl and c, but with python if anything I've seen the language actually try to prevent things happening in a one liner. To the point where I've seen regret expressed about the existence of the lambda capability, which is generally pretty readable.
Re: (Score:2)
Indeed. Perl and C? Unreadable code is really easy and many "coders" seem to believe it is actually a goal. Python? Possible, but you have to work for it.
Re: (Score:2)
Requiring minimal actual skills from people claiming to be competent is not "bullshit toxic macho type flexing". But calling that "bullshit toxic macho type flexing" is bullshit toxic macho type flexing. You are calling out yourself.
And since when did proper indentation make things "more obscure and more difficult"? You must be one of those: https://blog.codinghorror.com/... [codinghorror.com]
Re: (Score:2)
I saw a quote that I'm having a hard time finding that sums it up: LLMs don't provide information, they provide information-shaped output.
What it spits out looks credible, and in some cases that ends up being credible, but it's clear after some experience that it just spews out credible looking stuff that shouldn't be credible. It's a bullshitting machine. So it'd make a good executive.
Re: (Score:2)
That is actually a very good description.
Same useless AI suggestion shit as on Google (Score:3)
Wish it were 1999, when search results were just the search results, without AI or SEO trying to meddle with them.
Re: (Score:2)
SEO totally sucks.
Of course the summary leaves the important data (Score:4, Interesting)
...out: the human success rates for comparison.
I can't access the research paper, sadly, to see what acceptance rates are for each group. But searching on Reddit, it sounds like LeetCode acceptance rates tend to be pretty low. For example, this "Easy" question [leetcode.com] has a 17.4% acceptance rate (though that's apparently a particularly low rate for an "easy" question).
Anyone have access to the paper to see what the human mean scores were on the same problems?
Re: (Score:3)
It's like when the MOOCs came online, teaching university courses without requiring exams or homework. Most people who sign up for these courses watch one or two vids and never finish.
Re: (Score:2)
I don't see how this is a valid argument. If anything, since LeetCode is "a test" and incentivizes (arguably even gamifies) success by letting people compete for the highest scores, I'd expect people to try *far* harder than on some random piece of code for some random project that they were just rushing to complete for some deadline.
Re: (Score:2)
Re: (Score:2)
If you search on Reddit, everyone is comparing their scores with everyone else. And those who do poorly tend to be frustrated with themselves and/or LeetCode. Everyone is clearly quite motivated to score well - and I'd argue, far more than they would be from some random drudgework at work that they've been working on for years.
Re: (Score:2)
Problem is, there are only a handful of LLMs, so by testing "an interaction with ..." you've tested the best and worst of a large chunk of the AI. They also either work or don't work on the very first try; there's no point in telling it "failed, try again," because it'll just flounder about randomly. With sampling the acceptance rate of the test, you include people who aren't very good or don't even care (e.g., some required training by their company).
Also, I've never used the platform myself, so I don't know how "at...
Re: (Score:2)
How good it is at "trying again" often tends to come more down to the finetune than the underlying model, in my experience. I find LLaMA 3 much better at trying again than ChatGPT, for example.
That said, none of them do independent A* right now.
good enough (Score:3)
If you know how to use it (I reckon most people don't), it's "good enough". I use it for a lot of things when I am lazy. It's good enough for most things... definitely beats "junior" programmers, which is a bit scary, because it's hard to become a senior programmer without getting a junior programmer job first. If AI is doing all the junior-skilled work, where's the pathway to becoming senior? Years of learning without a job? We've already made it so most jobs need a bachelor's degree. Now we're going to ask people to show up with a Masters? Note: I said it's hard, not impossible. There will always be people who can learn and portfolio their way to senior programmer without an entry-level role.
Re: good enough (Score:2)
I'd argue GPT cheapens the value of a degree. It no longer even demonstrates a basic level of understanding if GPT does all your homework.
Better off hiring high school graduates who have been coding half their lives.
Re: (Score:2)
100% agree, but those people are rare. Only caveat is that they must have done AP Calculus BC and maybe Statistics (or be willing to get up to speed on it within a year). Yeah I know that you don't need that for most devops/coding jobs these days .. but I find it's a good filter and also shapes the mind into an engineering mindset.
Re: good enough (Score:2)
Well, IEEE is an organization; IEEE Spectrum is a journal (well, a magazine really, but they often republish journal articles).
The truth (Score:4, Insightful)
ChatGPT understands nothing, therefore it cannot do anything truly novel. That it can regurgitate useful code at all after being trained on existing example code is a marvel.
And that's what I use it for: fast, search-free regurgitation of code I know has been done before. Then it just needs a quick review to ensure it makes sense, and you're off a lot faster than if you had typed and debugged your own code.
Re: The truth (Score:3)
It is actually pretty good at that. I needed to do some basic NLP, but I never did much NLP myself. Things like finding words of so many syllables that rhyme with some other word. It would have taken me a day to find the right libraries, understand how they work, write the code, test, and debug.
With ChatGPT, it right away suggested a library and gave me starter code. It was somewhat wrong, so I had to adjust, but in 90 minutes I was done.
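The comment doesn't say which library was suggested, but this kind of starter code can be sketched with the CMU Pronouncing Dictionary via Python's pronouncing package (my assumption, not necessarily what ChatGPT picked; pip install pronouncing):
---------------
import pronouncing  # thin wrapper around the CMU Pronouncing Dictionary

def rhymes_with_syllables(word, syllables):
    """Words that rhyme with `word` and have the given syllable count."""
    matches = []
    for candidate in pronouncing.rhymes(word):
        phones = pronouncing.phones_for_word(candidate)
        if phones and pronouncing.syllable_count(phones[0]) == syllables:
            matches.append(candidate)
    return matches

print(rhymes_with_syllables("coding", 2)[:5])
---------------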
Re: (Score:2)
ChatGPT understands nothing, therefore it cannot do anything truly novel.
That's what too many people don't seem to understand; or rather, they believe it has actual cognitive ability, instead of just the illusion of cognition, when in reality it has no such capability and never will.
terrible benchmark (Score:4, Insightful)
Leetcode is a terrible benchmark.
It's used for whiteboarding interviews, which are total bullshit to begin with... however, that style of code is *NEVER* used on the job, because these are already-solved problems, solved better than any one person can come up with, peer reviewed a thousand times over, and all rolled up into nice, neat little libraries.
Can ChatGPT write brand new code to solve novel problems? No, because it's an over-glorified copy-pasta bot.
Re: (Score:2)
What are you talking about? I constantly need to (checks leetcode question) determine if a string could have a perfectly even letter distribution if one and only one character is removed from it.
Yeah, test questions are a stupidly useless metric for comparing humans to LLMs. They are arguably stupid for evaluating human performance, but become so much worse for LLMs, which have a different set of strengths and weaknesses that generally favor being better at these sorts of "test" questions than they are at rea...
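For what it's worth, the quoted puzzle really is only a few lines; a brute-force sketch in Python (my own, not an official solution):
---------------
from collections import Counter

def balanced_after_one_removal(s):
    """True if removing exactly one character leaves every remaining
    letter occurring the same number of times."""
    counts = Counter(s)
    for ch in counts:
        counts[ch] -= 1                  # tentatively drop one occurrence
        remaining = [n for n in counts.values() if n > 0]
        if len(set(remaining)) <= 1:     # all equal (or nothing left)
            return True
        counts[ch] += 1                  # put it back, try the next letter
    return False

print(balanced_after_one_removal("aabbccc"))   # True: drop one 'c'
print(balanced_after_one_removal("aabbbcccc")) # False
---------------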
what fraction is verbatim copy? (Score:4, Insightful)
Since it says that it had higher accuracy if the problem was published before 2021, I am guessing that it may be doing verbatim copy of the existing solutions. The more solutions there are, the more likely the problem fits one of the patterns it has seen, and the more likely it picked up that solution.
A better benchmark would have been to compare with what you get from search engines and see if they materially differ. I have tried dozens of coding problems with ChatGPT, and all I usually get is boilerplate code. As an example, I asked ChatGPT to write code for the moon phase. Its code was correct, but it used a horrible formula which gave pathetic answers. Fixing the one-line formula made it work. ChatGPT had no idea which of the dozens of solutions on the internet was the correct one.
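For reference, the usual correct approach is simple modular arithmetic against a known new moon; a sketch (the reference epoch and synodic constant are standard astronomical values, not the commenter's fixed code):
---------------
from datetime import datetime, timezone

SYNODIC_MONTH = 29.530588853   # mean length of a lunation, in days
NEW_MOON_EPOCH = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)

def moon_age_days(when):
    """Days since the last new moon: 0 is new, about 14.77 is full."""
    elapsed = (when - NEW_MOON_EPOCH).total_seconds() / 86400.0
    return elapsed % SYNODIC_MONTH

print(moon_age_days(datetime.now(timezone.utc)))
---------------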
Re: (Score:2)
Since it says that it had higher accuracy if the problem was published before 2021, I am guessing that it may be doing verbatim copy of the existing solutions.
Instead of guessing, why not try it?
Then, after you try it, if you want to learn something useful instead of score debating points, ask it for some revisions. "Can you rewrite this in Python for me?" or "can you implement the following additional requirements? 1. blah blah 2. yadda yadda ..."
Re: (Score:2)
The problem is precisely that it isn't a verbatim copy.
The crap takes some examples that work, slices them up, and then, based on whatever coefficients it derived for the frequency of some strings in its input, it generates crap that looks like code but won't work.
Tried it with trivial tasks in several domains:
- simple webapp (show me an example of two components (A, B), three instances (A1, A2, B1), B reacts on A1/A2 change, for a specific simple JS framework): complete fail, example loads...
LLMs are always bad at programming (Score:2)
LLMs can't reliably write working code, let alone write good code. It is as fundamental as the Halting Problem, a thing LLMs aren't aware of.
Pretty good for boilerplate...if you are competent (Score:3)
It's the classic garbage-in, garbage-out problem: if you're a good coder and good prompter, LLMs deliver useful scaffolds. If you suck, it will suck.
This week in AI (Score:3)
Waymo Issues Software and Mapping Recall After Robotaxi Crashes Into a Telephone Pole
https://tech.slashdot.org/stor... [slashdot.org]
From Schneier on Security
Using AI for Political Polling
https://www.schneier.com/blog/... [schneier.com]
See, polling has gotten hard, because people don't answer their phones any more, and if they do answer they won't talk to you, and if they do talk to you, they may tell you what they think you want to hear, rather than what they really think. So what we can do, see, is create these AI chat-bots that act like people, and then poll the chat-bots instead of calling real people.
From the article
I am simply gob-smacked that adults—actual grown-up people—continue to take this stuff seriously.
Re: (Score:2)
From the article
I am simply gob-smacked that adults—actual grown-up people—continue to take this stuff seriously.
Same here. In actual reality, it can do none of those things. As soon as they are a tiny bit unexpected, it cannot even do basic things right. The only thing that happens is that lies are getting more extreme. Usually that is a sign of the mindless hype nearing its end. We can only hope it is here too.
It is a fuckup (Score:2)
Well, many coders are fuckups as well (see, for example: https://blog.codinghorror.com/... [codinghorror.com]) and these it can, maybe, replace to a degree. But forget about having it write even a simple piece of original code or about it actually understanding even a simple specification.
Re: (Score:2)
Define "original code". There's only so many ways to write the same thing and your average programmer has seen a lot of code before they write anything meaningful.
On the other hand, the software doesn't understand anything, and it doesn't even do a good job of pretending to understand anything it hasn't seen before. That means it can only even mock understanding of specifications which are highly similar to specifications the system was trained on. On the other hand, the massive duplication of effort in pro
You Know What? (Score:3)
I wish I'd known when I was in high school that I'd never have a decent job.
I would have done things quite differently.
Super cool! (Score:2)
ChatGPT can easily solve problems that have already been solved in the past and discussed over the Internet. It's not so good with newer insufficiently discussed ones.
Some of my colleagues try to use ChatGPT (Score:2)
Essentially it's OK if you could also copy and paste the code from Stack Overflow. It severely breaks if you want to do anything beyond that. For example, we once asked it to write a configuration file snippet for "yate", a commonly used VoIP software with a strong focus on mobile applications. This configuration snippet was supposed to reject SIP "MESSAGE" requests.
The result was an ini-file style file which was something like "MESSAGE=reject".
In reality yate is configured via something called "regex-route".
Whistling in the dark (Score:2)
The highest rated comments on these stories tell me a few things:
1. The commenters either haven't really tried, or don't know how to effectively use, ChatGPT.
2. The commenters have really poor PM skills. When working with ChatGPT, you need to give it a good set of requirements and good feedback so it can do revisions.
It's a tool. When you learn how to use it, it's an incredible help. Truly incredible stuff, unless you've put the goalposts onto a bullet train, sending them away into the distance.
Coding is not just writing independent modules (Score:2)
What's missed in all of this hype is that coding is a team activity. Put ten software engineers together, each using AI to generate their parts, and you get a big, unmaintainable mess with no design consistency and tons of redundant code.
It sometimes fails to understand ... (Score:4, Insightful)
> It sometimes fails to understand the meaning of questions,
No, it never understands the meaning of the question. That's the whole problem with LLMs.
"extremely broad range of success". Uhhhh. Yea. (Score:2)
The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code
By which I'm guessing that if I produced code like that then I'd be experiencing "a broad range of what to do with my time now that I've been fired"
No funny bone (Score:2)
High hopes for the topic, but... No one noticed any examples of funny code passed by ChatGPT?
Well that was useless (Score:2)