


China's Moonshot Launches Free AI Model Kimi K2 That Outperforms GPT-4 In Key Benchmarks
Chinese AI startup Moonshot AI has released Kimi K2, a trillion-parameter open-source language model that outperforms GPT-4 in key benchmarks, with particularly strong performance on coding and autonomous agent tasks. VentureBeat reports: The new model, called Kimi K2, features 1 trillion total parameters with 32 billion activated parameters in a mixture-of-experts architecture. The company is releasing two versions: a foundation model for researchers and developers, and an instruction-tuned variant optimized for chat and autonomous agent applications. "Kimi K2 does not just answer; it acts," the company stated in its announcement blog. "With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can't wait to see what you build."
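For readers wondering what "1 trillion total parameters with 32 billion activated" means: in a mixture-of-experts model, a router sends each token through only a few expert sub-networks, so most of the weights sit idle on any given token. Below is a minimal sketch of generic top-k gating in plain Python, assuming scalar inputs and toy experts. The expert count, router scores, and k=2 are hypothetical illustrations, not Kimi K2's actual configuration or Moonshot's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k selected experts' outputs; the others are never evaluated."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy setup (not Kimi K2's): 8 "experts", each scaling its input by a constant.
experts = [lambda x, s=s: s * x for s in range(8)]
router_logits = [0.1, 2.0, -1.0, 3.0, 0.0, 0.5, -2.0, 1.0]

print(top_k_route(router_logits, k=2))   # experts 3 and 1 are selected
print(moe_forward(1.0, experts, router_logits, k=2))

# Sparse activation using the summary's numbers: 32B of 1T parameters per token.
active_fraction = 32e9 / 1e12
print(f"{active_fraction:.1%} of parameters active per token")
```

Renormalizing the softmax over only the selected experts is one common design choice; real MoE models often add extras such as always-on shared experts or load-balancing losses, which this sketch leaves out.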
The model's standout feature is its optimization for "agentic" capabilities -- the ability to autonomously use tools, write and execute code, and complete complex multi-step tasks without human intervention. In benchmark tests, Kimi K2 achieved 65.8% accuracy on SWE-bench Verified, a challenging software engineering benchmark, outperforming most open-source alternatives and matching some proprietary models. [...] On LiveCodeBench, arguably the most realistic coding benchmark available, Kimi K2 achieved 53.7% accuracy, decisively beating DeepSeek-V3's 46.9% and GPT-4.1's 44.7%. More striking still: it scored 97.4% on MATH-500 compared to GPT-4.1's 92.4%, suggesting Moonshot has cracked something fundamental about mathematical reasoning that has eluded larger, better-funded competitors.
But here's what the benchmarks don't capture: Moonshot is achieving these results with a model that costs a fraction of what incumbents spend on training and inference. While OpenAI burns through hundreds of millions on compute for incremental improvements, Moonshot appears to have found a more efficient path to the same destination. It's a classic innovator's dilemma playing out in real time -- the scrappy outsider isn't just matching the incumbent's performance, they're doing it better, faster, and cheaper.
Re:China (Score:4, Funny)
The USA STILL steals intellectual property under the guise of "national security" needs.
So Pot, meet Kettle.
Re: (Score:2, Troll)
They don't have millions of expats in China and a totalitarian state to easily squeeze their families, though.
Re: (Score:2, Insightful)
Re: (Score:1)
The US citizens who were falsely arrested recently because of zealous ICE enforcement will have their day in court and already have an official paper trail. The billionaires who get taken off the streets in China for some re-education are simply taken, with no explanation before or after... and when it happens to a nobody in China, no one even hears about it.
Hyperbole vs. actual.
Re: (Score:2)
Educate yourself, you fucking dullard. [wikipedia.org]
Re: (Score:1)
Most first-world countries have "socialist" policies such as universal healthcare, universal education, etc., but they are also far more democratic, healthier, safer, and happier, with better life expectancies than the USA. Trump is rapidly moving further to the right WHILE also becoming totalitarian.
Re: (Score:2)
The US has a problem with its poorest folks that many other first world countries do not have. However, its middle and well-off match and exceed the rest of the world, respectively.
Judging by your word selection, I'm guessing you're a francophone.
Here's France's income distributed life expectancy. [niussp.org]
Which is better? I suppose that's up to the beholder. If you're going for the maximum number of likely years lived, the US is.
If you're looking for a better place to be poor? Well, that's actuall
Re: (Score:2)
Indeed, however, communism never existed anywhere, not even in China. At best, they are totalitarian, and hopefully leaning benevolent today. But I see that since the USofA is leaving the stage, there is indeed a vacuum.
Chinese engineers and scientists are smart (Score:4, Insightful)
Attempting to prevent them from acquiring tech is futile and counterproductive
Politicians like to see everything as a race
Warmongers and defense contractors see everything as a threat that requires more military spending
Cooperation would be a better strategy
Re: (Score:2)
There's this obnoxious myth, which a lot of people seem to hold at least subconsciously, that innovation only comes from Americans, Europeans, Australians and... well, you can probably figure out the commonality, and it ain't English.
We used to accuse the Japanese of only ever stealing tech; we now know better. The Japanese were phenomenal innovators until the arse fell out of their economy.
The Chinese were great innovators long before the West's industrial revolution. It's in the cultural DNA of the people. Ye
You pay for it later... (Score:2)
Re: (Score:2)
Re: (Score:2)
No one knows what GPT-4 really costs to run (Score:2)
Much of their cost could be in developing a lot of the basics, which continually diffuse to other companies, and to China, through ex-employees. That would require far more spending on salaries and on exploratory training than the competition needs.
Wonder how much of this is distillation... (Score:4, Insightful)
Not that I care if they are using other companies' models to ease costs. You can't inhale the internet, wave your hands about copyright, and then complain "IP" when somebody uses your stuff in a way you don't like.
If it takes more air out of the AI bubble, all the better, I say.
Re: (Score:2)
One wonders if it identifies as ChatGPT.
Just don't ask about the Tiananmen Square massacre (Score:3, Informative)
I'm sure it's been well trained to ensure you get the correct party-approved information.
But seriously, I am curious how these Chinese models react to questions about things the CCP does not want people to talk about. The CCP has a long history of attempting to apply censorship all across the world.
OpenAI (Score:1)
I'd like to take this time to once again laugh at the "open," "non-profit" OpenAI, that took the anti-human route and is now rapidly sinking. Good riddance.
Re: (Score:2)
Every new model is better. That's because they fine-tune them to be better at the benchmarks, and the benchmarks keep adjusting to the new SOTA.
Also, out of curiosity, how are we defining "rapidly sinking?"
I mean, I'm with you on criticizing the bullshit of OpenAI being completely non-open, but they are otherwise still basically top of the pack.