

Anthropic Launches the World's First 'Hybrid Reasoning' AI Model (wired.com)
An anonymous reader quotes a report from Wired: Anthropic, an artificial intelligence company founded by exiles from OpenAI, has introduced the first AI model that can produce either conventional output or a controllable amount of "reasoning" needed to solve more grueling problems. Anthropic says the new hybrid model, called Claude 3.7, will make it easier for users and developers to tackle problems that require a mix of instinctive output and step-by-step cogitation. "The [user] has a lot of control over the behavior -- how long it thinks, and can trade reasoning and intelligence with time and budget," says Michael Gerstenhaber, product lead, AI platform at Anthropic.
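For a rough sense of what that control looks like in practice, here is a minimal sketch using the Anthropic Python SDK's Messages API; the exact model id and the extended-thinking parameter names are assumptions based on the SDK at the time of writing and may differ in your version.

# Minimal sketch: dialing the "thinking" budget up or down.
# Assumptions: the Anthropic Python SDK is installed, ANTHROPIC_API_KEY is set,
# and the extended-thinking fields ("thinking", "budget_tokens") match your SDK version.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",        # assumed Claude 3.7 model id
    max_tokens=16000,
    # Cap how many tokens the model may spend on step-by-step reasoning;
    # raise the budget for harder problems, lower it for quick answers.
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Walk through this bug step by step..."}],
)

Lowering budget_tokens (or disabling thinking entirely) trades depth of reasoning for latency and cost, which is the knob Gerstenhaber describes.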
Claude 3.7 also features a new "scratchpad" that reveals the model's reasoning process. A similar feature proved popular with the Chinese AI model DeepSeek. It can help a user understand how a model is working through a problem in order to modify or refine prompts. Dianne Penn, product lead of research at Anthropic, says the scratchpad is even more helpful when combined with the ability to ratchet a model's "reasoning" up and down. If, for example, the model struggles to break down a problem correctly, a user can ask it to spend more time working on it. [...]
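Continuing the hypothetical response object from the sketch above, the "scratchpad" corresponds to thinking blocks returned alongside the normal answer; separating the two might look roughly like this (block type and attribute names are assumptions about the SDK's output format).

# Sketch: split the visible reasoning (the "scratchpad") from the final answer.
for block in response.content:
    if block.type == "thinking":
        print("[scratchpad]", block.thinking)   # step-by-step reasoning trace
    elif block.type == "text":
        print("[answer]", block.text)           # the conventional output

Reading the scratchpad this way is what lets a user spot where the model broke a problem down badly, then refine the prompt or raise the thinking budget.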
Penn says that Claude's reasoning mode received additional data on business applications including writing and fixing code, using computers, and answering complex legal questions. "The things that we made improvements on are ... technical subjects or subjects which require long reasoning," Penn says. "What we have from our customers is a lot of interest in deploying our models into their actual workloads." Anthropic says that Claude 3.7 is especially good at solving coding problems that require step-by-step reasoning, outscoring OpenAI's o1 on some benchmarks like SWE-bench. The company is today releasing a new tool, called Claude Code, specifically designed for this kind of AI-assisted coding. "The model is already good at coding," Penn says. But "additional thinking would be good for cases that might require very complex planning -- say you're looking at an extremely large code base for a company."
Sounds Great! (Score:1)
Like all of them, it sounds great. Does it live up to the hype though?
People are starting to distrust the hype.
Re: (Score:2)
Re: (Score:3)
Indeed. The fact of the matter is that this is a "constant delivery scam", where the next version is always promised to finally make it worthwhile, but it never does.
Re: (Score:2)
It's great. I've been using it a lot today.
Re: (Score:2)
I was a bit disappointed. The extra reasoning steps are nice, and it has Claude's strengths of taking the time to lay out what existing code is doing before attacking the problem (and now it does more work on that) - Claude is usually a very strong model. But it still has Claude's finetune's flaws of occasionally "wallpapering over problems". Last night I had an issue where one custom logging function wasn't being imported from one python file of mine, and Claude REALLY wanted to just put it in a try-catch
Re: (Score:2)
And, based on experience from these sorts of threads, before anyone who hasn't done much coding in AI IDEs like Cursor chimes in, let me preemptively correct a few things:
1) This is a discussion of failure cases. The normal case is superb. It'll often do tasks that would take me hours or even days in a matter of minutes. I'm currently training a font-recognition ViT, for example, and I'll paste in reams of debugging info, and it'll figure out what's going on with the internal variables to cause poor convergence, trace through
Re: (Score:2, Insightful)
People are starting to distrust the hype.
While true, smart people (a small minority) have been distrusting it all along.
Did they just say their model is garbage? (Score:1)
Because I think that's what they said.
Perplexity (Score:2)
Funny thing... (Score:3, Interesting)
LLMs can do neither "instinct" nor "cognition" nor "reasoning". All they can do is statistics. While applying statistics several times can cut down on errors, it is not assured to do so. And it certainly does not lead to insight.
Re: (Score:2)
That is not how LLMs work. You are thinking of Markov models. LLMs are not Markov models. Not even by the most pedantic description that would rope in even humans if not for quantum uncertainty (they're non-deterministic from a given starting state, thanks to Flash Attention). They do not work by "statistics". Randomness doesn't even come into the picture until the softmax function, which serves as a way to round from a conceptual state partially converged to linguistic
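The claim that randomness only enters at the sampling step can be made concrete with a toy example: a forward pass produces deterministic logits, softmax turns them into a probability distribution, and only the final (optionally temperature-scaled) draw is stochastic. This is a generic sketch, not code from any particular model.

# Toy illustration: randomness enters only at the sampling step.
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0):
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                        # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum() # softmax
    return rng.choice(len(probs), p=probs)        # the only stochastic step

logits = [2.0, 1.0, 0.1]                          # hypothetical scores for 3 tokens
print(sample_next_token(logits, temperature=0.7))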
Re: (Score:1)
You are thinking of Markov models.
Nope. That is not even in the rough general area of LLMs. I know a bit more about this tech.
Re: (Score:2)
Ah, the deranged LLM fanbois, modding anybody down that does not cheer for their fetish.
Well, here is news for you: Use LLMs all you like, you will still suck. In fact, you will probably suck more.
Re: (Score:2)
Your description absolutely was of Markov models.
Turns out capturing statistics of human knowledge (Score:2)
captures and represents the semantics of the entities and relationships (including situations) which the humans are writing about.
LLMs are actually learning some kind of amalgamated and averaged version of the collective human knowledge base about the world. In particular, the aspects of the world that we think are important to express in communications to others.
LLMs are thus "borrowing" our disc
Re: (Score:2)
I am not into "belief". That seems to be more your thing.
I, on the other hand, have followed AI development for about 35 years now and I am a PhD level CS type. I, unlike you, can see what is going on.
Whereas "Reasoning" = Random(x); (Score:2)