Exhausted Man Defeats AI Model In World Coding Championship

Posted by BeauHD on Friday July 18, 2025 @04:30PM from the humanity-has-prevailed dept.

An anonymous reader quotes a report from Ars Technica: A Polish programmer running on fumes recently accomplished what may soon become impossible: beating an advanced AI model from OpenAI in a head-to-head coding competition. The 10-hour marathon left him "completely exhausted." On Wednesday, programmer Przemysław Debiak (known as "Psyho"), a former OpenAI employee, narrowly defeated the custom AI model in the AtCoder World Tour Finals 2025 Heuristic contest in Tokyo. AtCoder, a Japanese platform that hosts competitive programming contests and maintains global rankings, held what may be the first contest where an AI model competed directly against top human programmers in a major onsite world championship. During the event, the maker of ChatGPT participated as a sponsor and entered an AI model in a special exhibition match titled "Humans vs AI." Despite the tireless nature of silicon, the company walked away with second place.

The competition required contestants to solve a single complex optimization problem over 600 minutes. The contest echoes the American folk tale of John Henry, the steel-driving man who raced against a steam-powered drilling machine in the 1870s. Like Henry's legendary battle against industrial automation, Debiak's victory represents a human expert pushing themselves to their physical limits to prove that human skill still matters in an age of advancing AI. Both stories feature exhausting endurance contests -- Henry drove steel spikes for hours until his heart gave out, while Debiak coded for 10 hours on minimal sleep. The parallel extends to the bittersweet nature of both victories: Henry won his race but died from the effort, symbolizing the inevitable march of automation, while Debiak's acknowledgment that humanity prevailed "for now" suggests he recognizes this may be a temporary triumph against increasingly capable machines. While Debiak won 500,000 yen and survived his ordeal better than the legendary steel driver, the AtCoder World Tour Finals pushes humans and AI models to their limits through complex optimization challenges that have no perfect solution -- only incrementally better ones. "Humanity has prevailed (for now!)," wrote Debiak on X, noting he had little sleep while competing in several competitions across three days. "I'm completely exhausted. ... I'm barely alive."

This discussion has been archived. No new comments can be posted.

Sleep (Score:1, Interesting)

by Anonymous Coward writes:

> while Debiak coded for 10 hours on minimal sleep
Typical highly paid person who needs to sleep after 10hrs of work
- Re: (Score:2)
  
  by Rinnon ( 1474161 ) writes:
  
  [...] noting he had little sleep while competing in several competitions across three days.
It depends on the challenge (Score:4, Insightful)

by fabiomb ( 5315421 ) writes: on Friday July 18, 2025 @04:39PM (#65530138)

I am fighting with Claude 4 and ChatGPT 4.1, using them in VSCode with Copilot as an Agent (Gemini 2.5 Pro fails every time). Everything is fine, but they can't figure out a simple PHP application with everything explained in a nice file: every URL is wrong, it uses different design patterns in every page, breaks stuff that worked, iterates on useless testing code that never works, and more. So you probably shouldn't test an AI on a precise exercise but on a real-world application, a useful one. That's the place where it belongs, and it works awfully badly at it. I need the AI to follow the instructions and not take every damn wrong path nobody is telling it to take :P

- Re:It depends on the challenge (Score:4, Interesting)
  
  by Brain-Fu ( 1274756 ) writes: on Friday July 18, 2025 @04:49PM (#65530154) Homepage Journal
  
  From the summary: "The competition required contestants to solve a single complex optimization problem over 600 minutes."
  And there you have it! It's a sufficiently-specific problem based on a knowledge base that is sufficiently-well-established in sources that can be used as training materials for AI. So, that's why it did so well.
  Once you start putting the AI into real world situations, like the ones described in the parent post, it performs much worse. I literally just spent time fiddling with permissions because an AI blatantly lied to me about what could cause the error message I was seeing, when the problem had nothing to do with permissions at all. I doubt it ever would have been able to figure the problem out on its own.
  This is why all the hype about replacing human programmers with AI is still just hype. It is true that the AI can do some amazing things, but it is not true that it can replace human software developers.
  Not yet, anyway.
  
  - Re: (Score:3)
    
    by Wolfling1 ( 1808594 ) writes:
    
    There are links on the AtCoder website for the challenge that take you to the exact problem to be solved. It's an interesting puzzle to read. At first, I was like 'there's no UI involved, of course that's why the AI did well', but then I read some more and realized that this was exactly the same kind of real world problem I worked on in the 90s to do with hard drive optimisation.
    
    The problem they offered was not a common problem, but it was definitely mappable to some real world problems.
    
    It demonstrates
    - Re: (Score:3, Informative)
      
      by 0123456 ( 636235 ) writes:
      
      Exactly. In the 90s we still used to try to optimize C code by using register variables and complex function structure that happened to suit the way the processor worked.
      Then we stopped doing that because we realized the new compilers could optimize it a heck of a lot better.
      Now we typically don't even write programs that generate machine code any more but feed everything into a VM that generates code on the fly.
      I don't remember having to do any serious optimization for years, and it was mostly stuff like b
    - Re: (Score:2)
      
      by phantomfive ( 622387 ) writes:
      
      It seems the problem was an NP hard problem, and the goal was to build a heuristic that gets as close as possible to a good answer without being perfect.
      
      In other words, in this contest, the goal was literally to write buggy code.
  - Re: It depends on the challenge (Score:2)
    
    by LindleyF ( 9395567 ) writes:
    
    I tried asking Gemini about an error in a screenshot. Somehow, it convinced itself the error said something different than what it said. That was bizarre.
Reminds me of old John Henry (Score:3)

by bittmann ( 118697 ) writes: on Friday July 18, 2025 @04:48PM (#65530152) Journal

John Henry told his captain
'A man ain't nothin' but a man
But before I let your steam drill beat me down
I'd die with a hammer in my hand, Lord, Lord
I'd die with a hammer in my hand.'

- Re: (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  You beat me to it. This guy is the modern John Henry!
  - Re: (Score:2)
    
    by phantomfive ( 622387 ) writes:
    
    The summary beat you to it.
Competitive coding is not... (Score:3)

by MpVpRb ( 1423381 ) writes: on Friday July 18, 2025 @04:51PM (#65530162)

...the same as solving real and complex problems, especially if they are not precisely defined

- Re: (Score:3)
  
  by Luthair ( 847766 ) writes:
  
  Yeah, I'd even question the notion that 'top coders' go to these events. I've had the opportunity to work with a lot of people who pioneered modern open source developer tools over the past 20 years, and I can't recall ever talking about coding competitions.
  - Re: (Score:2)
    
    by Rinnon ( 1474161 ) writes:
    
    The average person doesn't understand the difference between a mid coder and a top coder unless it's explained to them in tournament format.
Humans always beat computers.. until we don't (Score:2)

by LodCrappo ( 705968 ) writes:

If computing has taught us anything, it's that they tend to improve in capability
- Re: (Score:2)
  
  by zephvark ( 1812804 ) writes:
  
  It's not really going to help if "AI" just makes mistakes faster than before. This is a technology that doesn't look like it improves significantly.
  - Re: (Score:2)
    
    by LodCrappo ( 705968 ) writes:
    
    What is your basis for thinking LLMs don't improve? Genuinely, where does that come from? All the charts and benchmarks and records I see are going up every time there is a new one.
    - Re: Humans always beat computers.. until we don't (Score:2)
      
      by reanjr ( 588767 ) writes:
      
      Shit like this probably: https://garymarcus.substack.co... [substack.com]
      - Re: Humans always beat computers.. until we don't (Score:2)
        
        by macmurph ( 622189 ) writes:
        
        I just took the time to read most of his 2022 post and it didn't hold up very well. He illustrates why GPT-3 is a dead end, but I just tried the exact example on GPT-4o and it passed with flying colors.
        The premise is that scaling won't work and that the world is overly focused on the LLM approach.
        He claims tagging images won't work well enough for radiology. Fine, but companies like Surona medical seem to be still around years later.
        He claims that image tagging won't work but an iPhone does an amazing job
        
        Re: (Score:2)
        
        by djinn6 ( 1868030 ) writes:
        
        Finally computer coding is obviously working quite well.
        You should try to use it for a real world problem, not a simple problem with solutions that people wrote down ages ago and that the "AI" can recall from their training data.
        
        Re: Humans always beat computers.. until we don't (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        I spent an hour trying to get Gemini to create an empty Android project folder. I gave up.
        
        Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Fascinating.
        
        I just did exactly that using Aider 0.51 and Qwen3+Qwen2.5 locally.
        I'm not going to bother asking Gemini, but I'm guessing you're either full of shit, or didn't really want to succeed.
        
        Re: Humans always beat computers.. until we don't (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        If you can figure out where I went wrong, I'd love to hear it: https://youtu.be/U05JrrtVBuk [youtu.be]
        
        Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        I got like 20 minutes in, and it looked like this person was still fighting to get their toolchain installed.
        After that, it looks like they were fighting with the LLM regarding version differences in gradle.
        
        They will have better luck if they download the Android SDK, create a project from there, and edit from that.
        
        Also, I'd clarify that with which you were fighting. Gemini is the LLM.
        Gemini CLI, the coding assistant they were using is an agentic interface to it.
        Agentic CLI coding interfaces are very
        
        Re: (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        That was essentially going to be my next step. Except that I installed Android Studio rather than just the SDK, and worked through several issues getting the Firebase library installed and configured. I didn't even try to use the LLM to do this, since I assumed if it couldn't create an empty project folder, it also wasn't going to be able to properly add dependencies to said project. My intention is to try again now that I have the basic project skeleton setup.
        But just to be clear: the solution for Gemin
        
        Re: (Score:2)
        
        by ceoyoyo ( 59147 ) writes:
        
        To be fair, "tagging images", i.e. supervised learning, is pretty widely supplemented by unsupervised techniques today because it usually works a lot better. He also says things like just scaling up won't work forever, which seems to be true; the current LLM models are not just scaled up versions of previous ones, and the current LLM craze itself is the result of architectural innovation.
        He also says things like "I argued that 'deep learning is hitting a wall'" which suggests he doesn't really understand th
publicity stunt (Score:2)

by nothinginparticular ( 6181282 ) writes:

Well done to OpenAI for scraping all available code from the internet and amalgamating bits of it to produce an apparently working program. We should also give credit to the many thousands of real people who manually curate their LLMs day after day. Let us not forget that there is minimal intelligence in current LLMs. Their only intelligence is that they can interpolate between datapoints to give outputs that are somewhere in between them. They have no ability to operate outside of their training data and ther
John Henry died (Score:2)

by rsilvergun ( 571051 ) writes:

Just a reminder that the steam machine won in the end.
Today's John Henry (Score:2)

by spiritplumber ( 1944222 ) writes:

"Captain said to big old John Henry, That old drill keeps a-coming around. Take that steam drill out and start it on that job Let it whop, let it whop that steel on down Let it whop, let it whop that steel on down." – Traditional, "Datalinks" https://paeantosmac.wordpress.... [wordpress.com]
600 minutes WTF (Score:2)

by robi5 ( 1261542 ) writes:

Who the hell comes up with numbers like 600 minutes? Why not 36 000 seconds then?
I can compute without AI that it's 10 hours.
Meaningless stunt is meaningless (Score:2)

by gweihir ( 88907 ) writes:

Real-world software creation follows other rules.
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  Cope harder.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Well, if you are intent on proving some more that you have really no clue, just continue doing this.
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      Your insecurity is palpable.
      There is no amount of intellectual honesty you will not sacrifice to try to keep the bar just out of reach of LLMs, at least in your own mind.
      
      Just this week, statistics aren't math, and now coding competitions are meaningless stunts that "have different rules" than "real world software creation"
      As a Software Engineer with decades of experience, that one particularly made me LOL.
      
      What happened to you- were you diddled by an LLM?
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Your insecurity is palpable.
        You are projecting. Your inept attempts at "stalking" me here show a very clear picture. To anybody but you, that is.
        
        Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Do you think the "I know you are, but what am I" retort is clever? lol
        
        You literally shitpost over every single topic that involves AI. It doesn't take stalking for us to run into each other.
        Yet another example of just how fucking stupid you are.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Yes, dig yourself deeper. Good job. That way you will never grow.
        
        Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Lol- you just keep expanding your delusion- I love it.
        I have a feeling when you put your fingers in your ears, they touch in the middle.
How good was each solution? (Score:2)

by aegl ( 1041528 ) writes:

Next step would be for someone (or some AI?) to evaluate the solutions on metrics:
1) Performance
2) Maintainability
Without sleep? (Score:2)

by Rosco P. Coltrane ( 209368 ) writes:

Debiak coded for 10 hours on minimal sleep
Is that guy a cat who needs to nap every 2 hours?
FWIW, I once participated in a coding contest at my university in the early 90's that lasted 72 hours (the first prize was a full scholarship, which I didn't get :)) I ran on coffee and speed for the full 72 hours, then collapsed on a couch and slept until someone woke me up to come get my third prize (a Solaris license).
10 hours non-stop coding sounds like a normal day at the office trying to wrap up a project.
- Re: (Score:3)
  
  by DamnOregonian ( 963763 ) writes:
  
  Is that guy a cat who needs to nap every 2 hours?
  It was a 3 day tournament, and he competed every day.
  He estimates he got ~10h of sleep across the 72h.
  I've done 0 in 72 before. To call it unhealthy is an understatement- it's dangerous.
Did the AI really compete? (Score:2)

by Pinky's Brain ( 1158667 ) writes:

I'd like to see a tech report. Unless it was a model with command line tools access, working from the problem description (firewalled so it can only access the contest server for safety of course) then it wasn't actually competing. If some human had to build a bespoke framework around it first, the human doing so was competing.
