AI Coding Competition Pits GPT-4 Against Bard, GitHub Co-Pilot, Bing, and Claude+ (hackernoon.com) 39
HackerNoon tested five AI bots on coding problems from Leetcode.com: GPT-4, GitHub Copilot, Bard, Bing, and Claude+.
There's some interesting commentary on the strengths and weaknesses of each one -- and of course, the code that they ultimately output. The final results?
"[GPT-4's submission] passes all tests. It beat 47% of submissions on runtime and 8% on memory. GPT-4 is highly versatile in generating code for various programming languages and applications. Some of the caveats are that it takes much longer to get a response. API usage is also a lot more expensive and costs could ramp up quickly. Overall it got the answer right and passed the test."
"[Bing's submission] passed all the tests. It beat 47% of submissions on runtime and 37% on memory. This code looks a lot simpler than what GPT-4 generated. It beat GPT-4 on memory and it used less code! Bing seems to have the most efficient code so far, however, it gave a very short explanation of how it solved it. Nonetheless, best so far."
But both Bard and Claude+ failed the submission test (badly), while GitHub Copilot "passes all the tests. It scored better than 30% of submissions on runtime and 37% on memory."
Fixed, short, known problems. (Score:5, Informative)
They took a standard problem from a public programming competition site. One where people have answers in their public repos.
Color me unimpressed.
Re: (Score:3)
What's more, they took problems that were, in all likelihood, part of the AI's original training set.
Yeah, this isn't a very interesting "contest".
Re: (Score:3)
All the ones that passed are based on ChatGPT. All the ones that didn't are not.
Re: Fixed, short, known problems. (Score:5, Funny)
Google isn't great at returning non-sponsored results.
Re: (Score:3)
I’ve played with ChatGPT. In some ways it’s impressive. But my understanding is that it’s basicall
Re:Fixed, short, known problems. (Score:4, Insightful)
Re: (Score:2)
The vast majority of ALL problem solving is pattern recognition of previous experience. No matter what your field, how much is original thinking, and how much is based upon past experience? I guess the point is that even if AI solves the 99% that is pattern recognition, and you need people for the remaining 1%, then that is a win. Not only that, but if it is able to bring together all posted code from the internet, then it would bring together the collected intelligence/experience of everyone posting on the internet. For coding, that would be very powerful by itself. For Slashdot forums though, not so much.
Sure, but I think that 99% is a very charitable assessment of current code AIs.
They're a valuable tool for sure, but I find they tend to have trouble understanding exactly which sub-problem needs to be solved within the scope of the wider application. And they tend to struggle with using APIs properly.
I find the best process for using them is to write detailed, specific comments and use those as the prompt for the AI. When it works you get useful code and useful comments!
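One way to picture that comment-first workflow, as a rough sketch only: the comment block is the kind of detailed spec you might hand to the assistant, and the function body is the sort of code you would hope to get back. The name `parse_duration` and the duration format are hypothetical, made up for illustration and not taken from any of the tested tools.

```python
import re

# Prompt-style comment, written before any code exists:
# Return the total number of seconds in a duration string such as "1h30m15s".
# Units are h (hours), m (minutes), s (seconds), in that order; each is optional.
# Raise ValueError for an empty string or anything that does not match.
def parse_duration(text: str) -> int:
    # Every group is optional, so the empty string is rejected separately.
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

The spec spells out the edge cases (empty input, bad input), which is exactly the detail a vague one-line prompt tends to leave out.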
Re: (Score:1)
I'm pretty impressed with ChatGPT for everyday problem solving. For example - I asked:
Me: I want to fence a 10 meter by 20 meter paddock with a barb wire fence that has a top wire and a bottom wire. How much wire do I need?
ChatGPT: To calculate the length of wire needed to fence a 10 meter by 20 meter paddock with a top and bottom wire, you'll need to calculate the perimeter of the paddock first.
The perimeter is the total distance around the paddock, which is equal to twice the length plus twice the width.
S
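For anyone who wants to check the arithmetic themselves, here is a minimal sketch of the calculation described above (perimeter times number of strands). It assumes a plain rectangle and ignores gates, overlap at the posts, and tie-off slack; `wire_length_m` is just an illustrative name.

```python
def wire_length_m(length_m: float, width_m: float, strands: int = 2) -> float:
    """Total wire for a rectangular paddock with the given number of strands."""
    # Perimeter is twice the length plus twice the width.
    perimeter_m = 2 * length_m + 2 * width_m
    # One full lap of the perimeter per strand (top wire and bottom wire).
    return perimeter_m * strands

print(wire_length_m(10, 20))  # 60 m perimeter x 2 strands = 120
```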
I just had kind of a scary thought (Score:2)
You've all heard of SEO (Search Engine Optimization) I assume.
What would you call SEO where people are creating Open Source software, not so that people can use it (if they do, fine, whatever) but so the AI can be tainted with bad practices that make systems easier to exploit?
We've all seen junk sites where they just frame content, purchase McStories from various places and break pages into 5 frames so you have to look at more ads. What if we start seeing a crapflood of subtly altered programming "tutor
Re: (Score:2)
Agreed -- it seems that a Google search (or even an Introduction to Programming textbook) could also pass this test; no AI required.
A better challenge would be asking all of the AIs to come up with a solution to a problem that hasn't been posed anywhere before.
Re: (Score:1)
A better challenge would be asking all of the AIs to come up with a solution to a problem that hasn't been posed anywhere before.
I just did that.
My question: I want to build a new drywall and doorway to divide a large 10 x 5 meter room into two 5 x 5 meter rooms. The room is 2.4 meters high and the door will be a standard sized 90 cm x 200 cm doorway. Please generate a list of materials that are needed, excluding tools. Use Australian building standards.
ChatGPT
Sure, here's an updated list of materials for building a new drywall and doorway to divide a large 10 x 5 meter room into two 5x5 meter rooms, based on Australian building standa
Re: (Score:2)
So half was known, but a hard problem, and the other half was a "surprise".
Re: (Score:2)
Indeed. Looks good to those that do not think about it, is basically meaningless to those with a clue.
What sort of AI did they code? (Score:1)
Re: (Score:1)
Needs some work.
Coding problems? (Score:2)
TFS mentions (multiple) "coding problems" while TFA only seems to describe a single one. And that's some palindrome toy function, which is pretty pointless as far as real-world usage goes.
It would have been nice to see a couple of tasks that are representative of day-to-day developer work.
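For a sense of scale, a basic palindrome check fits in a couple of lines. This is an illustrative sketch only: the summary doesn't name the exact Leetcode problem, so this is not the task the bots were given, and `is_palindrome` is a hypothetical name.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards."""
    # Compare the string with its reverse; case and punctuation are not normalized.
    return text == text[::-1]
```

Compare that with a typical bug ticket, where most of the work is figuring out which code is even relevant.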
Re:Coding problems? (Score:5, Insightful)
Go to Bugzilla for Firefox, pick 10 random bugs, and tell it to fix them. ;)
I look forward to the results.
Re:Coding problems? (Score:4, Interesting)
The idea that it can be done at all was fantasy just a year ago.
It would not surprise me one bit if LLMs have reached their useful limits already. The most significant advancement I've seen with LLMs is the human-machine language interface. That part is a resounding success. I've been completely unimpressed with everything else they do, and I would be quite surprised if they got much better.
Is real coding like this at all? (Score:2, Interesting)
ObEliza (Score:2)
Who cares? (Score:1)
Let them duke it out, then shoot the winner.
Re: (Score:1)
I wouldn't use Bing even if it won, let's hope it shoots itself.
I hope my code performs well (Score:1)
After all, they trained on it.
Would have liked to see CodeWhisperer as well (Score:2)
Google bard is NOT where it is at (Score:3)
> But both Bard and Claude+ failed the submission test (badly),
My personal experience was similar. ChatGPT-4 is an effective tool as a software development assistant. Most people complaining about ChatGPT-4 haven't used it, or have not learned how to use it properly. If they are developers, these luddites will eventually learn the hard way, or be replaced by someone who is not reluctant to use the best tools for the job.