AI Coding Competition Pits GPT-4 Against Bard, GitHub Co-Pilot, Bing, and Claude+ (hackernoon.com) 39

Posted by EditorDavid on Sunday April 30, 2023 @07:34AM from the battling-bots dept.

HackerNoon tested five AI bots on coding problems from Leetcode.com: GPT-4, GitHub Copilot, Bard, Bing, and Claude+.

There's some interesting commentary on the strengths and weaknesses of each one -- and of course, the code that they ultimately output. The final results?

> [GPT-4's submission] passes all tests. It beat 47% of submissions on runtime and 8% on memory. GPT-4 is highly versatile in generating code for various programming languages and applications. Some of the caveats are that it takes much longer to get a response. API usage is also a lot more expensive and costs could ramp up quickly. Overall it got the answer right and passed the test.

> [Bing's submission] passed all the tests. It beat 47% of submissions on runtime and 37% on memory. This code looks a lot simpler than what GPT-4 generated. It beat GPT-4 on memory and it used less code! Bing seems to have the most efficient code so far, however, it gave a very short explanation of how it solved it. Nonetheless, best so far.

But both Bard and Claude+ failed the submission test (badly), while GitHub Copilot "passes all the tests. It scored better than 30% of submissions on runtime and 37% on memory."


Fixed, short, known problems. (Score:5, Informative)

by bradley13 ( 1118935 ) writes: on Sunday April 30, 2023 @07:54AM (#63486574) Homepage

They took a standard problem from a public programming competition site. One where people have answers in their public repos.
Color me unimpressed.

- Re: (Score:3)
  
  by Sun ( 104778 ) writes:
  
  What's more, they took problems that were, in all likelihood, part of the AI's original training set.
  Yeah, this isn't a very interesting "contest".
  - Re: (Score:1)
    
    by coopertempleclause ( 7262286 ) writes:
    
    Makes you wonder about the ones that still managed to fail...
    - Re: (Score:3)
      
      by Sun ( 104778 ) writes:
      
      All the ones that passed are based on ChatGPT. All the ones that didn't are not.
      - Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        So OpenAI managed to get this solutions page into their dataset [leetcode.com].
- Re: Fixed, short, known problems. (Score:1)
  
  by sixminuteabs ( 1452973 ) writes:
  
  Then I guess it is particularly embarrassing for Google that Bard could not even do that.
  - Re: Fixed, short, known problems. (Score:5, Funny)
    
    by Rosco P. Coltrane ( 209368 ) writes: on Sunday April 30, 2023 @09:52AM (#63486658)
    
    Google isn't great at returning non-sponsored results.
    
- Re: (Score:3)
  
  by hdyoung ( 5182939 ) writes:
  
  Thanks. I was going to post here and ask exactly this. My coding experience is sparse, but those problems look like minor variants of “hey, freshman, write a snippet of nested loop code and don’t f*&k it up”. Stuff that’s been posted a billion times on a billion coding forums for the past billion years. (OK, that’s hyperbole, but you get the drift.)
  
  I’ve played with chatGPT. In some ways it’s impressive. But my understanding is that’s it’s basicall
  - Re:Fixed, short, known problems. (Score:4, Insightful)
    
    by rattaroaz ( 1491445 ) writes: on Sunday April 30, 2023 @12:00PM (#63486798)
    
    The vast majority of ALL problem solving is pattern recognition of previous experience. No matter what your field, how much is original thinking, and how much is based upon past experience? I guess the point is that even if AI solves the 99% of pattern recognition, and you need people for the remaining 1%, then that is a win. Not only that, but if it is able to bring together all posted code from the internet, then it would bring together the collected intelligence/experience of everyone posting on the internet. For coding, that would be very powerful by itself. For Slashdot forums though, not so much.
    
    - Re: (Score:2)
      
      by Visarga ( 1071662 ) writes:
      
      But you have to check the 99% AI solves, can't just run it blindly. Checking takes long. The boost is small.
    - Re: (Score:2)
      
      by quantaman ( 517394 ) writes:
      
      > The vast majority of ALL problem solving is pattern recognition of previous experience. No matter what your field, how much is original thinking, and how much is based upon past experience? I guess the point is that even if AI solves the 99% of pattern recognition, and you need people for the remaining 1%, then that is a win. Not only that, but if it is able to bring together all posted code from the internet, then it would bring together the collected intelligence/experience of everyone posting on the internet. For coding, that would be very powerful by itself. For Slashdot forums though, not so much.
      Sure, but I think that 99% is a very charitable assessment of current code AIs.
      They're a valuable tool for sure, but I find they tend to have trouble understanding exactly what sub-problem needs to be solved in the scope of the wider application. And they tend to struggle with using APIs properly.
      I find the best process for using them is trying to write detailed specific comments, and using that as a prompt for the AI. When it works you get useful code and useful comments!
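A minimal sketch of that comment-first workflow, in Python. The function name `is_palindrome` and the palindrome task itself are illustrative assumptions, not the article's problem or any tool's actual output: the detailed comment is what you would type as the prompt, and the body is the kind of completion you would then review.

```python
# Return True if `text` reads the same forwards and backwards.
# Ignore case and skip every character that is not a letter or digit,
# so "A man, a plan, a canal: Panama" counts as a palindrome.
# An empty string counts as a palindrome.
def is_palindrome(text: str) -> bool:
    # Keep only letters and digits, lowercased, as the comment above specifies.
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    # A sequence is a palindrome when it equals its own reverse.
    return cleaned == cleaned[::-1]
```

The more precisely the comment pins down edge cases (case, punctuation, empty input), the less there is for the tool to guess, and the comment stays behind as documentation either way.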
  - Re: (Score:2)
    
    by Visarga ( 1071662 ) writes:
    
    It will still write 50% of your code for you - the boilerplate and standalone pieces, but debugging its code takes just as much as debugging your own and typing doesn't take so long. So it's maybe a 20% boost. Nice, makes coding more pleasant. Won't replace devs yet.
  - Re: (Score:1)
    
    by vivian ( 156520 ) writes:
    
    I'm pretty impressed with ChatGPT for everyday problem solving. For example - I asked:
    Me: I want to fence a 10 meter by 20 meter paddock with a barb wire fence that has a top wire and a bottom wire. How much wire do I need?
    ChatGPT: To calculate the length of wire needed to fence a 10 meter by 20 meter paddock with a top and bottom wire, you'll need to calculate the perimeter of the paddock first.
    The perimeter is the total distance around the paddock, which is equal to twice the length plus twice the width.
    S
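ChatGPT's reply is cut off above, so the following is not its answer, only a minimal sketch of the arithmetic the quoted steps set up: perimeter is twice the length plus twice the width, multiplied by the number of wire strands. The function name `fence_wire_length` is hypothetical, and the sketch assumes no allowance for gates, tie-offs, or wastage.

```python
def fence_wire_length(length_m: float, width_m: float, strands: int) -> float:
    """Total wire for a rectangular paddock: perimeter times number of strands."""
    # Perimeter of a rectangle: twice the length plus twice the width.
    perimeter_m = 2 * (length_m + width_m)
    # Each strand (top wire, bottom wire, ...) runs the full perimeter once.
    return perimeter_m * strands


# 10 m by 20 m paddock with a top wire and a bottom wire.
print(fence_wire_length(10, 20, 2))  # 120
```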
- I just had kind of a scary thought (Score:2)
  
  by istartedi ( 132515 ) writes:
  
  You've all heard of SEO (Search Engine Optimization) I assume.
  What would you call SEO where people are creating Open Source software, not so that people can use it (if they do, fine, whatever) but so the AI can be tainted with bad practices that make systems easier to exploit?
  We've all seen junk sites where they just frame content, purchase McStories from various places and break pages into 5 frames so you have to look at more ads. What if we start seeing a crapflood of subtly altered programming "tutor
  - Re: (Score:2)
    
    by Visarga ( 1071662 ) writes:
    
    A nice cat and mouse game. AI will win in the end; it will be like fake news detection: use a bunch of journalists in the first line and then reify their judgements.
- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  You can see the solutions posted in forums right next to the problem [leetcode.com]. Apparently Bard and Claude+ didn't have that web page in their data set.
- Re: (Score:2)
  
  by Jeremi ( 14640 ) writes:
  
  Agreed -- it seems that a Google search (or even an Introduction to Programming textbook) could also pass this test; no AI required.
  A better challenge would be asking all of the AIs to come up with a solution to a problem that hasn't been posed anywhere before.
  - Re: (Score:1)
    
    by vivian ( 156520 ) writes:
    
    > A better challenge would be asking all of the AIs to come up with a solution to a problem that hasn't been posed anywhere before.
    I just did that.
    My question: I want to build a new drywall and doorway to divide a large 10 x 5 meter room into two 5x5 meter rooms. The room is 2.4 meters high and the door will be a standard sized 90 cm x 200cm doorway. please generate a list of materials that are needed, excluding tools. Use australian building standards.
    ChatGPT
    Sure, here's an updated list of materials for building a new drywall and doorway to divide a large 10 x 5 meter room into two 5x5 meter rooms, based on Australian building standa
- Re: (Score:2)
  
  by Visarga ( 1071662 ) writes:
  
  > To start off we are going to test the AI on a hard Leetcode question, after all, we want to be able to solve complex coding problems. We also wanted to test it on a less well-known question.
  
  So half was known, but a hard problem, and the other half was a "surprise".
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Indeed. Looks good to those that do not think about it, is basically meaningless to those with a clue.
What sort of AI did they code? (Score:1)

by greytree ( 7124971 ) writes:

Better than all these fake AI programs, that are all dumb pattern matchers with a big database?
- Re: (Score:1)
  
  by greytree ( 7124971 ) writes:
  
  Damn. My pattern matchers with a big database got stuck in a loop on the "my pattern matchers with a big database" key.
  
  Needs some work.
Coding problems? (Score:2)

by roskakori ( 447739 ) writes:

TFS mentions (multiple) "coding problems" while TFA only seems to describe a single one. And that's some palindrome toy function which is pretty pointless concerning real world usage.
It would have been nice to have seen a couple of tasks that are representative of day-to-day developer work.
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Indeed. Someone previously posted code on /. showing how GPT-3 wanted to solve a file reading problem in plain old C. It was a festering bug-ridden mess with buffer overruns, use-after-frees, etc., but was still probably equivalent to what an unsupervised CS graduate would write.
- Re:Coding problems? (Score:5, Insightful)
  
  by vux984 ( 928602 ) writes: on Sunday April 30, 2023 @09:30AM (#63486648)
  
  Go to bugzilla for firefox. pick 10 random bugs, and tell it to fix them. ;)
  I look forward to the results.
  
  - Re: (Score:2)
    
    by rattaroaz ( 1491445 ) writes:
    
    We may have to wait for GPT5. You can't ask a programmer who just started 1 year ago to do the same either. This may be another real life story of John Henry, who beat the machine, but in the end, the march of progress beats the man. We will have to see, as the story is not over yet.
    - Re: (Score:2)
      
      by phantomfive ( 622387 ) writes:
      
      Or, it could be an example of a computer that copied another person's solution, without understanding what it does. In fact, that's exactly what it is.
- Re: (Score:2)
  
  by rattaroaz ( 1491445 ) writes:
  
  Remember that this is just the start. Beginning programmers start with stupid useless problems to solve, and work their way to complicated systems. The AI models are still at the beginning. The idea that it can be done at all, was fantasy just a year ago.
  - Re:Coding problems? (Score:4, Interesting)
    
    by StormReaver ( 59959 ) writes: on Sunday April 30, 2023 @04:32PM (#63487208)
    
    > The idea that it can be done at all, was fantasy just a year ago.
    It would not surprise me one bit if LLMs have reached their useful limits already. The most significant advancement I've seen with LLMs is the Human-Machine language interface. That part is a resounding success. I've been completely unimpressed with everything else they do, and I would be quite surprised if they got much better.
    
- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  It is almost certain that GPT had this page in its database [leetcode.com].
Is real coding like this at all? (Score:2, Interesting)

by joe_frisch ( 1366229 ) writes:

From what I've seen, most coding is not writing some well defined algorithm, but more like "we have this badly documented and buggy tool we need to interface with this other buggy and badly documented tool."
- ObEliza (Score:2)
  
  by greytree ( 7124971 ) writes:
  
  And what makes you think most coding is not writing some well defined algorithm, but more like "we have this badly documented and buggy tool we need to interface with this other buggy and badly documented tool"?
- Re: (Score:3)
  
  by dargaud ( 518470 ) writes:
  
  Yeah, the biggest program I worked on started with half a page of specifications "for some quick tests". After 3 years and 30000 lines of code it now controls a nuclear reactor and a particle accelerator and hasn't had a bug found in it in almost 10 years of production! I'm not even kidding.
Who cares? (Score:1)

by Opportunist ( 166417 ) writes:

Let them duke it out, then shoot the winner.
- Re: (Score:1)
  
  by Austerity Empowers ( 669817 ) writes:
  
  I wouldn't use Bing even if it won. Let's hope it shoots itself.
I hope my code performs well (Score:1)

by El_Muerte_TDS ( 592157 ) writes:

After all, they trained on it.
Would have liked to see CodeWhisperer as well (Score:2)

by djo26 ( 1466671 ) writes:

Maybe it wasn't released in time for this article, but it would have been great to see this tech in the mix as well: https://aws.amazon.com/codewhi... [amazon.com]
Google bard is NOT where it is at (Score:3)

by linuxguy ( 98493 ) writes: on Monday May 01, 2023 @03:28PM (#63489484) Homepage

> But both Bard and Claude+ failed the submission test (badly),
My personal experience was similar. ChatGPT-4 is an effective tool as a software development assistant. Most people complaining about ChatGPT-4 haven't used it, or have not learned how to use it properly. If they are developers, these luddites will eventually learn the hard way, or be replaced by someone who is not reluctant to use the best tools for the job.
