Google

Google is Adding More AI Overviews and a New 'AI Mode' To Search (theverge.com) 33

Google announced Wednesday it is expanding its AI Overviews to more query types and users worldwide, including those not logged into Google accounts, while introducing a new "AI Mode" chatbot feature. AI Mode, which resembles competitors like Perplexity or ChatGPT Search, will initially be limited to Google One AI Premium subscribers who enable it through the Labs section of Search.

The feature delivers AI-generated answers with supporting links interspersed throughout, powered by Google's search index. "What we're finding from people who are using AI Overviews is that they're really bringing different kinds of questions to Google," said Robby Stein, VP of product on the Search team. "They're more complex questions, that may have been a little bit harder before." Google is also upgrading AI Overviews with its Gemini 2.0 model, which Stein says will improve responses for math, coding and reasoning-based queries.
Math

India's 'Human Calculator Kid' Shatters 6 World Records In a Single Day (gizmodo.com) 39

An anonymous reader quotes a report from Gizmodo: Fourteen-year-old Aaryan Shukla cruised through six mental math calculation world records in a single day, according to a Guinness World Records statement published on February 12, earning the well-deserved nickname, "human calculator kid." Specifically, it took Shukla:

- 30.9 seconds to mentally add 100 four-digit numbers
- One minute and 9.68 seconds to mentally add 200 four-digit numbers
- 18.71 seconds to mentally add 50 five-digit numbers
- Five minutes and 42 seconds to mentally divide a 20-digit number by a ten-digit number ten times
- 51.69 seconds to mentally multiply two five-digit numbers ten times
- Two minutes and 35.41 seconds to mentally multiply two eight-digit numbers ten times

According to the statement, these are among the most difficult mental calculation world records ever attempted. Shukla's frankly mind-boggling achievement also comes in the wake of another world record he broke in April 2024 at the age of 13: fastest time to mentally add 50 five-digit numbers. It took him just 25.19 seconds. That's an addition every half a second. I wouldn't be surprised if students seeking "shortcuts" in their math homework started phoning up Shukla instead of reaching for their ChatGPT browser tab.
Guinness World Records published a video about Shukla's accomplishments on YouTube.
AI

OpenAI Cancels Its o3 AI Model In Favor of a 'Unified' Next-Gen Release 10

OpenAI has canceled the release of o3 in favor of a "simplified" product lineup. CEO Sam Altman said in a post on X that, in the coming months, OpenAI will release a model called GPT-5 that "integrates a lot of [OpenAI's] technology," including o3. TechCrunch reports: The company originally said in December that it planned to launch o3 sometime early this year. Just a few weeks ago, Kevin Weil, OpenAI's chief product officer, said in an interview that o3 was on track for a "February-March" launch. "We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings," Altman wrote in the post. "We want AI to 'just work' for you; we realize how complicated our model and product offerings have gotten. We hate the model picker [in ChatGPT] as much as you do and want to return to magic unified intelligence."

Altman also announced that OpenAI plans to offer unlimited chat access to GPT-5 at the "standard intelligence setting," subject to "abuse thresholds," once the model is generally available. (Altman declined to provide more detail on what this setting -- and these abuse thresholds -- entail.) Subscribers to ChatGPT Plus will be able to run GPT-5 at a "higher level of intelligence," Altman said, while ChatGPT Pro subscribers will be able to run GPT-5 at an "even higher level of intelligence."

"These models will incorporate voice, canvas, search, deep research, and more," Altman said, referring to a range of features OpenAI has launched in ChatGPT over the past few months. "[A] top goal for us is to unify [our] models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks." Before GPT-5 launches, OpenAI plans to release its GPT-4.5 model, code-named "Orion," in the next several weeks, according to Altman's post on X. Altman says this will be the company's last "non-chain-of-thought model." Unlike o3 and OpenAI's other so-called reasoning models, non-chain-of-thought models tend to be less reliable in domains like math and physics.
Math

Children's Arithmetic Skills Do Not Transfer Between Applied and Academic Mathematics (nature.com) 100

Children working in India's fruit and vegetable markets can perform complex mental calculations with ease, yet struggle with basic written math tests that determine their academic future, according to new research that raises troubling questions about mathematics education worldwide.

The study, published in Nature, reveals how traditional education systems are failing to tap into the mathematical talents of students who develop practical skills outside the classroom, particularly those from lower-income families. MIT economist Abhijit Banerjee, who grew up watching young market vendors deftly handle complicated transactions, led the research. His team found that while these children could rapidly perform mental arithmetic, they performed poorly on standard written assessments like long division problems.

The findings come at a critical moment when mathematics education must evolve to meet modern demands, incorporating data literacy and computational skills alongside traditional mathematics. The research points to systemic issues, including a global shortage of trained mathematics teachers and assessment systems that reward memorization over reasoning. Without addressing these challenges, researchers warn, naturally talented students from disadvantaged backgrounds may never reach their potential in fields like research, entrepreneurship, or teaching.
AI

Researchers Created an Open Rival To OpenAI's o1 'Reasoning' Model for Under $50 23

AI researchers at Stanford and the University of Washington were able to train an AI "reasoning" model for under $50 in cloud compute credits, according to a research paper. From a report: The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI's o1 and DeepSeek's R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.

The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process to extract the "reasoning" capabilities from another AI model by training on its answers. The researchers said s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.
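
For readers unfamiliar with the technique, distillation in this setting is essentially supervised fine-tuning of a small "student" model on prompts paired with a stronger "teacher" model's answers. Below is a minimal sketch assuming a Hugging Face-style training setup; the base model name, the example data, and the hyperparameters are illustrative stand-ins, not the actual s1 recipe.

```python
# Minimal distillation sketch: fine-tune a small "student" model on
# (prompt, teacher_answer) pairs generated by a stronger "reasoning" model.
# Assumes the `transformers` and `datasets` libraries; all names are illustrative.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"  # hypothetical choice of base model

# In practice these pairs come from prompting the teacher (a Gemini "thinking"
# model, in s1's case) and saving its reasoning traces and final answers.
pairs = [
    {"prompt": "What is 17 * 23?",
     "answer": "Compute 17 * 20 = 340 and 17 * 3 = 51, so 17 * 23 = 391."},
    # ... on the order of a thousand curated examples
]

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def to_features(example):
    # Plain causal-LM fine-tuning: the student learns to reproduce the
    # teacher's answer (including its reasoning) token by token.
    text = example["prompt"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

dataset = Dataset.from_list(pairs).map(to_features,
                                       remove_columns=["prompt", "answer"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="s1-style-distill", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
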
The Almighty Buck

'Magical' Efficient-Market Theory Rebuked in Era of Passive Investing (yahoo.com) 57

An anonymous reader shares a report: At first blush, stock trading this week is hardly a paragon of the market-efficiency theory, an oft-romanticized idea in Economics 101. After all, big equity gauges plunged on Monday, spurred by fears over an AI model released a week earlier, before swiftly rebounding. A fresh academic paper suggests the rise of passive investing may be fueling these kinds of fragile market moves.

According to a study to be published in the prestigious American Economic Review, evidence is building that active managers are slow to scoop up stocks en masse when prices move away from their intrinsic worth. Thanks to this lethargic trading behavior and the relentless boom in benchmark-tracking index funds, the impact of each trade on prices gets amplified, explaining how sell orders, like on Monday perhaps, can induce broader equity gyrations. As a result, the financial landscape is proving less dynamic and more volatile in the era of Big Passive, according to authors at the UCLA Anderson School of Management, the Stockholm School of Economics and the University of Minnesota Carlson School of Management.

Power

Could New Linux Code Cut Data Center Energy Use By 30%? (datacenterdynamics.com) 65

Two computer scientists at the University of Waterloo in Canada believe changing 30 lines of code in Linux "could cut energy use at some data centers by up to 30 percent," according to the site Data Centre Dynamics.

It's the code that processes packets of network traffic, and Linux "is the most widely used OS for data center servers," according to the article: The team tested their solution's effectiveness and submitted it to Linux for consideration, and the code was published this month as part of Linux's newest kernel, release version 6.13. "All these big companies — Amazon, Google, Meta — use Linux in some capacity, but they're very picky about how they decide to use it," said Martin Karsten [professor of Computer Science in the Waterloo's Math Faculty]. "If they choose to 'switch on' our method in their data centers, it could save gigawatt hours of energy worldwide. Almost every single service request that happens on the Internet could be positively affected by this."

The University of Waterloo is building a green computer server room as part of its new mathematics building, and Karsten believes sustainability research must be a priority for computer scientists. "We all have a part to play in building a greener future," he said. The Linux Foundation, which oversees the development of the Linux OS, is a founder member of the Green Software Foundation, an organization set up to look at ways of developing "green software" — code that reduces energy consumption.

Karsten "teamed up with Joe Damato, distinguished engineer at Fastly" to develop the 30 lines of code, according to an announcement from the university. "The Linux kernel code addition developed by Karsten and Damato was based on research published in ACM SIGMETRICS Performance Evaluation Review" (by Karsten and grad student Peter Cai).

Their paper "reviews the performance characteristics of network stack processing for communication-heavy server applications," devising an "indirect methodology" to "identify and quantify the direct and indirect costs of asynchronous hardware interrupt requests (IRQ) as a major source of overhead...

"Based on these findings, a small modification of a vanilla Linux system is devised that improves the efficiency and performance of traditional kernel-based networking significantly, resulting in up to 45% increased throughput..."
AI

Cutting-Edge Chinese 'Reasoning' Model Rivals OpenAI o1 55

An anonymous reader quotes a report from Ars Technica: On Monday, Chinese AI lab DeepSeek released its new R1 model family under an open MIT license, with its largest version containing 671 billion parameters. The company claims the model performs at levels comparable to OpenAI's o1 simulated reasoning (SR) model on several math and coding benchmarks. Alongside the release of the main DeepSeek-R1-Zero and DeepSeek-R1 models, DeepSeek published six smaller "DeepSeek-R1-Distill" versions ranging from 1.5 billion to 70 billion parameters. These distilled models are based on existing open source architectures like Qwen and Llama, trained using data generated from the full R1 model. The smallest version can run on a laptop, while the full model requires far more substantial computing resources.

The releases immediately caught the attention of the AI community because most existing open-weights models -- which can often be run and fine-tuned on local hardware -- have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models. "They are SO much fun to run, watching them think is hilarious," independent AI researcher Simon Willison told Ars in a text message. Willison tested one of the smaller models and described his experience in a post on his blog: "Each response starts with a ... pseudo-XML tag containing the chain of thought used to help generate the response," noting that even for simple prompts, the model produces extensive internal reasoning before output.
Although the benchmarks have yet to be independently verified, DeepSeek reports that R1 outperformed OpenAI's o1 on AIME (a mathematical reasoning test), MATH-500 (a collection of word problems), and SWE-bench Verified (a programming assessment tool).

TechCrunch notes that three Chinese labs -- DeepSeek, Alibaba, and Moonshot AI's Kimi -- have released models that match o1's capabilities.
AI

AI Benchmarking Organization Criticized For Waiting To Disclose Funding from OpenAI (techcrunch.com) 6

An anonymous reader shares a report: An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public. "The communication about this has been non-transparent," Meemi wrote. "In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark."

AI

OpenAI's AI Reasoning Model 'Thinks' In Chinese Sometimes, No One Really Knows Why 104

OpenAI's "reasoning" AI model, o1, has exhibited a puzzling behavior of "thinking" in Chinese, Persian, or some other language -- "even when asked a question in English," reports TechCrunch. While the exact cause remains unclear, as OpenAI has yet to provide an explanation, AI experts have proposed a few theories. From the report: Several on X, including Hugging Face CEO Clement Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing a lot of Chinese characters. Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of "Chinese linguistic influence on reasoning."

"[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding," Xiao wrote in a post on X. "[F]or expert labor availability and cost reasons, many of these data providers are based in China." [...] Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution. Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating). "The model doesn't know what language is, or that languages are different," Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. "It's all just text to it."

Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models' language inconsistencies may be explained by associations the models made during training. "By embracing every linguistic nuance, we expand the model's worldview and allow it to learn from the full spectrum of human knowledge," Wang wrote in a post on X. "For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that's where I first learned and absorbed those ideas."

[...] Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can't know for certain. "This type of observation on a deployed AI system is impossible to back up due to how opaque these models are," they told TechCrunch. "It's one of the many cases for why transparency in how AI systems are built is fundamental."
Math

Rational or Not? This Basic Math Question Took Decades To Answer. (quantamagazine.org) 49

Three mathematicians have developed a breakthrough method for proving whether numbers can be written as fractions, solving a problem that has puzzled researchers for decades. Frank Calegari, Vesselin Dimitrov and Yunqing Tang proved the irrationality of an infinite collection of numbers related to the Riemann zeta function, building on Roger Apéry's landmark 1978 proof about a single such number.

The new approach, which relies on 19th-century mathematical techniques, has already helped settle a 50-year-old conjecture about modular forms and could lead to more advances in number theory.
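
For context, the single number in Apéry's landmark 1978 proof is the zeta value ζ(3); in the notation of the Riemann zeta function mentioned above:

```latex
% Apéry (1978): the Riemann zeta value at 3 cannot be written as a fraction.
\[
  \zeta(3) \;=\; \sum_{n=1}^{\infty} \frac{1}{n^{3}}
          \;=\; 1 + \frac{1}{2^{3}} + \frac{1}{3^{3}} + \cdots
          \;\notin\; \mathbb{Q}.
\]
```
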
AI

OpenAI's Next Big AI Effort GPT-5 is Behind Schedule and Crazy Expensive (msn.com) 120

"From the moment GPT-4 came out in March 2023, OpenAI has been working on GPT-5..." reports the Wall Street Journal. [Alternate URL here.] But "OpenAI's new artificial-intelligence project is behind schedule and running up huge bills. It isn't clear when — or if — it'll work."

"There may not be enough data in the world to make it smart enough." OpenAI's closest partner and largest investor, Microsoft, had expected to see the new model around mid-2024, say people with knowledge of the matter. OpenAI has conducted at least two large training runs, each of which entails months of crunching huge amounts of data, with the goal of making Orion smarter. Each time, new problems arose and the software fell short of the results researchers were hoping for, people close to the project say... [And each one costs around half a billion dollars in computing costs.]

The $157 billion valuation investors gave OpenAI in October is premised in large part on [CEO Sam] Altman's prediction that GPT-5 will represent a "significant leap forward" in all kinds of subjects and tasks.... It's up to company executives to decide whether the model is smart enough to be called GPT-5 based in large part on gut feelings or, as many technologists say, "vibes."

So far, the vibes are off...

OpenAI wants to use its new model to generate high-quality synthetic data for training, according to the article. But OpenAI's researchers also "concluded they needed more diverse, high-quality data," since "The public internet didn't have enough, they felt." OpenAI's solution was to create data from scratch. It is hiring people to write fresh software code or solve math problems for Orion to learn from. [And also theoretical physics experts.] The workers, some of whom are software engineers and mathematicians, also share explanations for their work with Orion... Having people explain their thinking deepens the value of the newly created data. It's more language for the LLM to absorb; it's also a map for how the model might solve similar problems in the future... The process is painfully slow. GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words a day would take months to produce a billion tokens.
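
That last claim is easy to sanity-check. In the rough calculation below, the tokens-per-word ratio is an assumption (about 1.3 tokens per English word); the other figures come from the article.

```python
# Rough sanity check of the claim that a thousand people writing 5,000 words a
# day would take months to produce a billion tokens. The tokens-per-word ratio
# is an assumption; the other numbers are from the article.
PEOPLE = 1_000
WORDS_PER_PERSON_PER_DAY = 5_000
TOKENS_PER_WORD = 1.3                      # assumed average for English text
TARGET_TOKENS = 1_000_000_000              # one billion tokens
GPT4_TRAINING_TOKENS = 13_000_000_000_000  # ~13 trillion, per the article

tokens_per_day = PEOPLE * WORDS_PER_PERSON_PER_DAY * TOKENS_PER_WORD
days = TARGET_TOKENS / tokens_per_day
print(f"~{days:.0f} days (~{days / 30:.1f} months) to write one billion tokens")
print(f"and that billion is only {TARGET_TOKENS / GPT4_TRAINING_TOKENS:.5%} "
      f"of GPT-4's estimated training data")
```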

OpenAI's already-difficult task has been complicated by internal turmoil and near-constant attempts by rivals to poach its top researchers, sometimes by offering them millions of dollars... More than two dozen key executives, researchers and longtime employees have left OpenAI this year, including co-founder and Chief Scientist Ilya Sutskever and Chief Technology Officer Mira Murati. This past Thursday, Alec Radford, a widely admired researcher who served as lead author on several of OpenAI's scientific papers, announced his departure after about eight years at the company...

OpenAI isn't the only company worrying that progress has hit a wall. Across the industry, a debate is raging over whether improvement in AIs is starting to plateau. Sutskever, who recently co-founded a new AI firm called Safe Superintelligence or SSI, declared at a recent AI conference that the age of maximum data is over. "Data is not growing because we have but one internet," he told a crowd of researchers, policy experts and scientists. "You can even go as far as to say that data is the fossil fuel of AI."

And that fuel was starting to run out.

AI

OpenAI Unveils o3, a Smarter AI Model With Improved Reasoning Skills (openai.com) 27

OpenAI has unveiled a new AI model that it says takes longer to solve problems but gets better results, following Google's similar announcement a day earlier. The model, called o3, replaces o1 from September and spends extra time working through questions that need step-by-step reasoning.

It scores three times higher than o1 on ARC-AGI, a test measuring how well AI handles complex math and logic problems it hasn't seen before. "This is the beginning of the next phase of AI," CEO Sam Altman said during a livestream Friday.

The Microsoft-backed startup is keeping o3 under wraps for now but plans to let outside researchers test it.
AI

Google Releases Its Own 'Reasoning' AI Model (techcrunch.com) 5

An anonymous reader quotes a report from TechCrunch: Google has released what it's calling a new "reasoning" AI model -- but it's in the experimental stages, and from our brief testing, there's certainly room for improvement. The new model, called Gemini 2.0 Flash Thinking Experimental (a mouthful, to be sure), is available in AI Studio, Google's AI prototyping platform. A model card describes it as "best for multimodal understanding, reasoning, and coding," with the ability to "reason over the most complex problems" in fields such as programming, math, and physics. [...]

Built on Google's recently announced Gemini 2.0 Flash model, Gemini 2.0 Flash Thinking Experimental appears to be similar in design to OpenAI's o1 and other so-called reasoning models. Unlike most AI, reasoning models effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip up AI models. As a drawback, reasoning models often take longer -- usually seconds to minutes longer -- to arrive at solutions. Given a prompt, Gemini 2.0 Flash Thinking Experimental pauses before responding, considering a number of related prompts and "explaining" its reasoning along the way. After a while, the model summarizes what it considers to be the most accurate answer.
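
As a rough illustration of what "fact-check themselves" means in practice, here is a generic draft-critique-revise loop. It is not Google's implementation; call_model is a hypothetical stand-in for whatever chat-completion API is in use, and the extra round trips are where the seconds-to-minutes of added latency come from.

```python
# Generic "draft, critique, revise" loop -- the kind of self-checking that
# reasoning models automate internally. Illustrative only; not Google's code.
def call_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real chat-completion API call.
    raise NotImplementedError("wire this up to an LLM provider")

def answer_with_self_check(question: str, rounds: int = 2) -> str:
    draft = call_model(f"Think step by step, then answer:\n{question}")
    for _ in range(rounds):
        critique = call_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any factual or logical errors in the draft, or reply 'none'."
        )
        if critique.strip().lower() == "none":
            break
        draft = call_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            f"Critique: {critique}\nRewrite the answer fixing these issues."
        )
    return call_model(f"Summarize this answer concisely:\n{draft}")
```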

Science

Journal That Published Faulty Black Plastic Study Removed From Science Index (arstechnica.com) 29

The journal that published a high-profile, now-corrected study on black plastics has been removed from a critical index of academic journals amid questions about quality criteria, according to a report by Retraction Watch. From a report: On December 16, Clarivate -- a scholarly publication analytics company -- removed the journal Chemosphere from its platform, the Web of Science, which is a key index for academic journals. The indexing platform tracks citations and calculates journal "impact factors," a proxy for relevance in its field. It's a critical metric not only for the journals but for the academic authors of the journal's articles, who use the score in their pursuit of promotions and research funding.

To be included in the Web of Science, Clarivate requires journals to follow editorial quality criteria. According to Retraction Watch, Chemosphere has retracted eight articles this month and published 60 expressions of concern since April. In a December 12 news release, Chemosphere acknowledged the quality concerns and laid out steps it will take to improve its editorial process. Those include improvements to article vetting and peer review, along with assurances that articles will be retracted if there's evidence of policy breaches. "We believe that these measures will help us regain the standard of research integrity that has always been so important to us," the news release stated.

Math

Huge Math Error Corrected In Black Plastic Study (arstechnica.com) 105

Ars Technica's Beth Mole reports: Editors of the environmental chemistry journal Chemosphere have posted an eye-catching correction to a study reporting toxic flame retardants from electronics wind up in some household products made of black plastic, including kitchen utensils. The study sparked a flurry of media reports a few weeks ago that urgently implored people to ditch their kitchen spatulas and spoons. Wirecutter even offered a buying guide for what to replace them with. The correction, posted Sunday, will likely take some heat off the beleaguered utensils. The authors made a math error that put the estimated risk from kitchen utensils off by an order of magnitude.

Specifically, the authors estimated that if a kitchen utensil contained middling levels of a key toxic flame retardant (BDE-209), the utensil would transfer 34,700 nanograms of the contaminant a day based on regular use while cooking and serving hot food. The authors then compared that estimate to a reference level of BDE-209 considered safe by the Environmental Protection Agency. The EPA's safe level is 7,000 ng -- per kilogram of body weight -- per day, and the authors used 60 kg as the adult weight (about 132 pounds) for their estimate. So, the safe EPA limit would be 7,000 multiplied by 60, yielding 420,000 ng per day. That's 12 times more than the estimated exposure of 34,700 ng per day. However, the authors missed a zero and reported the EPA's safe limit as 42,000 ng per day for a 60 kg adult. The error made it seem like the estimated exposure was nearly at the safe limit, even though it was actually less than a tenth of the limit.
"We regret this error and have updated it in our manuscript," the authors said in a correction.

"This calculation error does not affect the overall conclusion of the paper," the correction reads. The study maintains that flame retardants "significantly contaminate" the plastic products, which have "high exposure potential."
AI

Microsoft Announces Phi-4 AI Model Optimized for Accuracy and Complex Reasoning (computerworld.com) 31

An anonymous reader shared this report from Computerworld: Microsoft has announced Phi-4 — a new AI model with 14 billion parameters — designed for complex reasoning tasks, including mathematics. Phi-4 excels in areas such as STEM question-answering and advanced problem-solving, surpassing similar models in performance. Phi-4, part of the Phi small language models (SLMs), is currently available on Azure AI Foundry under the Microsoft Research License Agreement and will launch on Hugging Face [this] week, the company said in a blog post.

The company emphasized that Phi-4's design focuses on improving accuracy through enhanced training and data curation.... "Phi-4 outperforms comparable and even larger models on tasks like mathematical reasoning, thanks to a training process that combines synthetic datasets, curated organic data, and innovative post-training techniques," Microsoft said in its announcement. The model leverages a new training approach that integrates multi-agent prompting workflows and data-driven innovations to enhance its reasoning efficiency. The accompanying report highlights that Phi-4 balances size and performance, challenging the industry norm of prioritizing larger models... Phi-4 achieved a score of 80.4 on the MATH benchmark and has surpassed other systems in problem-solving and reasoning evaluations, according to the technical report accompanying the release. This makes it particularly appealing for domain-specific applications requiring precision, like scientific computation or advanced STEM problem-solving.

Microsoft emphasized its commitment to ethical AI development, integrating advanced safety measures into Phi-4. The model benefits from Azure AI Content Safety features such as prompt shields, protected material detection, and real-time application monitoring. These features, Microsoft explained, help users address risks like adversarial prompts and data security threats during AI deployment. The company also reiterated that Azure AI Foundry, the platform hosting Phi-4, offers tools to measure and mitigate AI risks. Developers using the platform can evaluate and improve their models through built-in metrics and custom safety evaluations, Microsoft added... With Phi-4, Microsoft continues to evolve its AI offerings while promoting responsible use through robust safeguards. Industry watchers will observe how this approach shapes adoption in critical fields where reasoning and security are paramount.

AI

Are People Starting to Love Self-Driving Robotaxis? (marketplace.org) 106

"In a tiny handful of places..." Wired wrote last month, "you can find yourself flanked by taxis with no one in the drivers' seats." But they added that "Granted, practically everyone has been numbed by the hype cycle."

Wired's response? "[P]ile a few of us into an old-fashioned, human-piloted hired car, then follow a single Waymo robotaxi wherever it goes for a whole workday" to "study its movements, its relationship to life on the streets, its whole self-driving gestalt. We'll interview as many of its passengers as will speak to us, and observe it through the eyes of the kind of human driver it's designed to replace."

This week Wired senior editor John Gravios discussed the experience on the business-news radio show Marketplace (with Marketplace host Kai Ryssdal): Ryssdal: What kinds of reactions did you get from people once you track them down, what did they say about their experience in this driverless car?

Gravios: It was pretty uniform and impressive how much people just love it. They just like the experience of the drive, I guess it's a little bit less herky-jerky than a human driver, but I think a lot of it just comes down to people are just kind of relieved not to have to talk to somebody else, as sad as that is...

Ryssdal: Tell me about Gabe, your Uber driver, and his thoughts on this whole thing, because that was super interesting.

Gravios: So Gabe, this is a guy whose labor is directly at stake. You know, he's a guy whose labor is going to be replaced by a Waymo. He's had 30 years of experience as a professional driver, first as a taxi driver. He even organized a taxi driver strike in the days before Uber. His first, I think his prejudice with Waymo is having shared the road with them sort of sporadically, he thought of them as kind of dopey, rule-following, frustrating vehicles to share the road with. But over the course of the day, he started to recognize that the Waymo was driving a lot like a taxi driver. The Waymo was doing things that were aggressive, that are exactly the kinds of things that a taxi driver is trained to be aggressive with and doing things that were cautious that are exactly the kinds of things that taxi drivers are trained to be cautious with.

Ryssdal: Can we talk unit economics here? According to the math from a study you guys cite, Waymo is not making a whole lot of money per vehicle, right? And eventually they're going to scale, and it's going to work out, but for the moment, even though they've gotten 11 billion-something-dollars, they're not turning a whole lot of profit here.

Gravios: Yeah, that's a big question, and the math is, even that study, based on a lot of guesswork. It's really hard to say what the unit economics are. What we can say is that the ridership rates are going up so fast that that study is already well out of date. When we were doing our chase, I think the monthly ridership for Waymo was 100,000 rides a month. By October, it was already 150,000 rides a month. So, the economics are just shifting under our feet a lot.

AI

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft (wired.com) 27

Harvard University announced Thursday it's releasing a high-quality dataset of nearly one million public-domain books that could be used by anyone to train large language models and other AI tools. From a report: The dataset was created by Harvard's newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta's Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to "level the playing field" by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly-refined and curated content repositories that normally only established tech giants have the resources to assemble. "It's gone through rigorous review," he says.

Leppert believes the new public domain database could be used in conjunction with other licensed materials to build artificial intelligence models. "I think about it a bit like the way that Linux has become a foundational operating system for so much of the world," he says, noting that companies would still need to use additional training data to differentiate their models from those of their competitors.

AI

OpenAI Releases 'Smarter, Faster' ChatGPT - Plus $200-a-Month Subscriptions for 'Even-Smarter Mode' (venturebeat.com) 64

Wednesday OpenAI CEO Sam Altman announced "12 Days of OpenAI," promising that "Each weekday, we will have a livestream with a launch or demo..." And sure enough, today he announced the launch of two things:

- "o1, the smartest model in the world. Smarter, faster, and more features (e.g. multimodality) than o1-preview. Live in ChatGPT now, coming to API soon."

- "ChatGPT Pro. $200/month. Unlimited usage and even-smarter mode for using o1. More benefits to come!"

Altman added this update later: For extra clarity: o1 is available in our plus tier, for $20/month. With the new pro tier ($200/month), it can think even harder for the hardest problems. Most users will be very happy with o1 in the plus tier!
VentureBeat points out that subscribers "also gain access to GPT-4o, known for its advanced natural language generation capabilities, and the Advanced Voice feature for speech-based interactions."

And even for non-subscribers, ChatGPT can now also analyze images, points out VentureBeat, "a hugely helpful feature upgrade as it enables users to upload photos and have the AI chatbot respond to them, giving them detailed plans on how to build a birdhouse entirely from a single candid photo of one, for one fun example." In another, potentially more serious and impressive example, it is now capable of helping design data centers from sketches... o1 represents a significant evolution in reasoning model capabilities, including better handling of complex tasks, image-based reasoning, and enhanced accuracy. Enterprise and Education users will gain access to the model next week... OpenAI's updates also include safety enhancements, with the o1-preview scoring 84 on a rigorous safety test, compared to 22 for its predecessor...

To encourage the use of AI in societal-benefit fields, OpenAI has announced the ChatGPT Pro Grant Program. The initiative will initially award 10 grants to leading medical researchers, providing free access to ChatGPT Pro tools.

In a video Altman displays graphs showing o1 dramatically outperforms gpt4o on math questions, on competition coding at CodeForces, and on PhD-level science questions.
