Google

Google is Adding More AI Overviews and a New 'AI Mode' To Search (theverge.com) 33

Google announced Wednesday it is expanding its AI Overviews to more query types and users worldwide, including those not logged into Google accounts, while introducing a new "AI Mode" chatbot feature. AI Mode, which resembles competitors like Perplexity or ChatGPT Search, will initially be limited to Google One AI Premium subscribers who enable it through the Labs section of Search.

The feature delivers AI-generated answers with supporting links interspersed throughout, powered by Google's search index. "What we're finding from people who are using AI Overviews is that they're really bringing different kinds of questions to Google," said Robby Stein, VP of product on the Search team. "They're more complex questions, that may have been a little bit harder before." Google is also upgrading AI Overviews with its Gemini 2.0 model, which Stein says will improve responses for math, coding and reasoning-based queries.
Math

India's 'Human Calculator Kid' Shatters 6 World Records In a Single Day (gizmodo.com) 39

An anonymous reader quotes a report from Gizmodo: Fourteen-year-old Aaryan Shukla cruised through six mental math calculation world records in a single day, according to a Guinness World Records statement published on February 12, earning the well-deserved nickname, "human calculator kid." Specifically, it took Shukla:

- 30.9 seconds to mentally add 100 four-digit numbers
- One minute and 9.68 seconds to mentally add 200 four-digit numbers
- 18.71 seconds to mentally add 50 five-digit numbers
- Five minutes and 42 seconds to mentally divide a 20-digit number by a ten-digit number ten times
- 51.69 seconds to mentally multiply two five-digit numbers ten times
- Two minutes and 35.41 seconds to mentally multiply two eight-digit numbers ten times

According to the statement, these are among the most difficult mental calculation world records ever attempted. Shukla's frankly mind-boggling achievement also comes in the wake of another world record he broke in April 2024 at the age of 13: fastest time to mentally add 50 five-digit numbers. It took him just 25.19 seconds. That's an addition every half a second. I wouldn't be surprised if students seeking "shortcuts" in their math homework started phoning up Shukla instead of reaching for their ChatGPT browser tab.
Guinness World Records published a video about Shukla's accomplishments on YouTube.
AI

OpenAI Cancels Its o3 AI Model In Favor of a 'Unified' Next-Gen Release 10

OpenAI has canceled the release of o3 in favor of a "simplified" product lineup. CEO Sam Altman said in a post on X that, in the coming months, OpenAI will release a model called GPT-5 that "integrates a lot of [OpenAI's] technology," including o3. TechCrunch reports: The company originally said in December that it planned to launch o3 sometime early this year. Just a few weeks ago, Kevin Weil, OpenAI's chief product officer, said in an interview that o3 was on track for a "February-March" launch. "We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings," Altman wrote in the post. "We want AI to 'just work' for you; we realize how complicated our model and product offerings have gotten. We hate the model picker [in ChatGPT] as much as you do and want to return to magic unified intelligence."

Altman also announced that OpenAI plans to offer unlimited chat access to GPT-5 at the "standard intelligence setting," subject to "abuse thresholds," once the model is generally available. (Altman declined to provide more detail on what this setting -- and these abuse thresholds -- entail.) Subscribers to ChatGPT Plus will be able to run GPT-5 at a "higher level of intelligence," Altman said, while ChatGPT Pro subscribers will be able to run GPT-5 at an "even higher level of intelligence."

"These models will incorporate voice, canvas, search, deep research, and more," Altman said, referring to a range of features OpenAI has launched in ChatGPT over the past few months. "[A] top goal for us is to unify [our] models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks." Before GPT-5 launches, OpenAI plans to release its GPT-4.5 model, code-named "Orion," in the next several weeks, according to Altman's post on X. Altman says this will be the company's last "non-chain-of-thought model." Unlike o3 and OpenAI's other so-called reasoning models, non-chain-of-thought models tend to be less reliable in domains like math and physics.
Math

Children's Arithmetic Skills Do Not Transfer Between Applied and Academic Mathematics (nature.com) 100

Children working in India's fruit and vegetable markets can perform complex mental calculations with ease, yet struggle with basic written math tests that determine their academic future, according to new research that raises troubling questions about mathematics education worldwide.

The study, published in Nature, reveals how traditional education systems are failing to tap into the mathematical talents of students who develop practical skills outside the classroom, particularly those from lower-income families. MIT economist Abhijit Banerjee, who grew up watching young market vendors deftly handle complicated transactions, led the research. His team found that while these children could rapidly perform mental arithmetic, they performed poorly on standard written assessments like long division problems.

The findings come at a critical moment when mathematics education must evolve to meet modern demands, incorporating data literacy and computational skills alongside traditional mathematics. The research points to systemic issues, including a global shortage of trained mathematics teachers and assessment systems that reward memorization over reasoning. Without addressing these challenges, researchers warn, naturally talented students from disadvantaged backgrounds may never reach their potential in fields like research, entrepreneurship, or teaching.
AI

Researchers Created an Open Rival To OpenAI's o1 'Reasoning' Model for Under $50 23

AI researchers at Stanford and the University of Washington were able to train an AI "reasoning" model for under $50 in cloud compute credits, according to a research paper. From a report: The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI's o1 and DeepSeek's R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.

The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process to extract the "reasoning" capabilities from another AI model by training on its answers. The researchers said s1 is distilled from one of Google's reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.
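
For readers unfamiliar with the technique, distillation in this setting is essentially supervised fine-tuning of a small "student" model on prompts paired with a stronger "teacher" model's answers. Below is a minimal sketch assuming a Hugging Face-style training setup; the base model name, the example data, and the hyperparameters are illustrative stand-ins, not the actual s1 recipe.

```python
# Minimal distillation sketch: fine-tune a small "student" model on
# (prompt, teacher_answer) pairs generated by a stronger "reasoning" model.
# Assumes the `transformers` and `datasets` libraries; all names are illustrative.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"  # hypothetical choice of base model

# In practice these pairs come from prompting the teacher (a Gemini "thinking"
# model, in s1's case) and saving its reasoning traces and final answers.
pairs = [
    {"prompt": "What is 17 * 23?",
     "answer": "Compute 17 * 20 = 340 and 17 * 3 = 51, so 17 * 23 = 391."},
    # ... on the order of a thousand curated examples
]

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def to_features(example):
    # Plain causal-LM fine-tuning: the student learns to reproduce the
    # teacher's answer (including its reasoning) token by token.
    text = example["prompt"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

dataset = Dataset.from_list(pairs).map(to_features,
                                       remove_columns=["prompt", "answer"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="s1-style-distill", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
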
The Almighty Buck

'Magical' Efficient-Market Theory Rebuked in Era of Passive Investing (yahoo.com) 57

An anonymous reader shares a report: At first blush, stock trading this week is hardly a paragon of the market-efficiency theory, an oft-romanticized idea in Economics 101. After all, big equity gauges plunged on Monday, spurred by fears over an AI model released a week earlier, before swiftly rebounding. A fresh academic paper suggests the rise of passive investing may be fueling these kinds of fragile market moves.

According to a study to be published in the prestigious American Economic Review, evidence is building that active managers are slow to scoop up stocks en masse when prices move away from their intrinsic worth. Thanks to this lethargic trading behavior and the relentless boom in benchmark-tracking index funds, the impact of each trade on prices gets amplified, explaining how sell orders, like on Monday perhaps, can induce broader equity gyrations. As a result, the financial landscape is proving less dynamic and more volatile in the era of Big Passive, according to authors at the UCLA Anderson School of Management, the Stockholm School of Economics and the University of Minnesota Carlson School of Management.

Power

Could New Linux Code Cut Data Center Energy Use By 30%? (datacenterdynamics.com) 65

Two computer scientists at the University of Waterloo in Canada believe changing 30 lines of code in Linux "could cut energy use at some data centers by up to 30 percent," according to the site Data Centre Dynamics.

It's the code that processes packets of network traffic, and Linux "is the most widely used OS for data center servers," according to the article: The team tested their solution's effectiveness and submitted it to Linux for consideration, and the code was published this month as part of Linux's newest kernel, release version 6.13. "All these big companies — Amazon, Google, Meta — use Linux in some capacity, but they're very picky about how they decide to use it," said Martin Karsten [professor of Computer Science in the Waterloo's Math Faculty]. "If they choose to 'switch on' our method in their data centers, it could save gigawatt hours of energy worldwide. Almost every single service request that happens on the Internet could be positively affected by this."

The University of Waterloo is building a green computer server room as part of its new mathematics building, and Karsten believes sustainability research must be a priority for computer scientists. "We all have a part to play in building a greener future," he said. The Linux Foundation, which oversees the development of the Linux OS, is a founder member of the Green Software Foundation, an organization set up to look at ways of developing "green software" — code that reduces energy consumption.

Karsten "teamed up with Joe Damato, distinguished engineer at Fastly" to develop the 30 lines of code, according to an announcement from the university. "The Linux kernel code addition developed by Karsten and Damato was based on research published in ACM SIGMETRICS Performance Evaluation Review" (by Karsten and grad student Peter Cai).

Their paper "reviews the performance characteristics of network stack processing for communication-heavy server applications," devising an "indirect methodology" to "identify and quantify the direct and indirect costs of asynchronous hardware interrupt requests (IRQ) as a major source of overhead...

"Based on these findings, a small modification of a vanilla Linux system is devised that improves the efficiency and performance of traditional kernel-based networking significantly, resulting in up to 45% increased throughput..."
AI

Cutting-Edge Chinese 'Reasoning' Model Rivals OpenAI o1 55

An anonymous reader quotes a report from Ars Technica: On Monday, Chinese AI lab DeepSeek released its new R1 model family under an open MIT license, with its largest version containing 671 billion parameters. The company claims the model performs at levels comparable to OpenAI's o1 simulated reasoning (SR) model on several math and coding benchmarks. Alongside the release of the main DeepSeek-R1-Zero and DeepSeek-R1 models, DeepSeek published six smaller "DeepSeek-R1-Distill" versions ranging from 1.5 billion to 70 billion parameters. These distilled models are based on existing open source architectures like Qwen and Llama, trained using data generated from the full R1 model. The smallest version can run on a laptop, while the full model requires far more substantial computing resources.

The releases immediately caught the attention of the AI community because most existing open-weights models -- which can often be run and fine-tuned on local hardware -- have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models. "They are SO much fun to run, watching them think is hilarious," independent AI researcher Simon Willison told Ars in a text message. Willison tested one of the smaller models and described his experience in a post on his blog: "Each response starts with a ... pseudo-XML tag containing the chain of thought used to help generate the response," noting that even for simple prompts, the model produces extensive internal reasoning before output.
Although the benchmarks have yet to be independently verified, DeepSeek reports that R1 outperformed OpenAI's o1 on AIME (a mathematical reasoning test), MATH-500 (a collection of word problems), and SWE-bench Verified (a programming assessment tool).

TechCrunch notes that three Chinese labs -- DeepSeek, Alibaba, and Moonshot AI's Kimi -- have released models that match o1's capabilities.
AI

AI Benchmarking Organization Criticized For Waiting To Disclose Funding from OpenAI (techcrunch.com) 6

An anonymous reader shares a report: An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public. "The communication about this has been non-transparent," Meemi wrote. "In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark."

AI

OpenAI's AI Reasoning Model 'Thinks' In Chinese Sometimes, No One Really Knows Why 104

OpenAI's "reasoning" AI model, o1, has exhibited a puzzling behavior of "thinking" in Chinese, Persian, or some other language -- "even when asked a question in English," reports TechCrunch. While the exact cause remains unclear, as OpenAI has yet to provide an explanation, AI experts have proposed a few theories. From the report: Several on X, including Hugging Face CEO Clement Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing a lot of Chinese characters. Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of "Chinese linguistic influence on reasoning."

"[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding," Xiao wrote in a post on X. "[F]or expert labor availability and cost reasons, many of these data providers are based in China." [...] Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution. Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating). "The model doesn't know what language is, or that languages are different," Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. "It's all just text to it."

Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models' language inconsistencies may be explained by associations the models made during training. "By embracing every linguistic nuance, we expand the model's worldview and allow it to learn from the full spectrum of human knowledge," Wang wrote in a post on X. "For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that's where I first learned and absorbed those ideas."

[...] Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can't know for certain. "This type of observation on a deployed AI system is impossible to back up due to how opaque these models are," they told TechCrunch. "It's one of the many cases for why transparency in how AI systems are built is fundamental."
Math

Rational or Not? This Basic Math Question Took Decades To Answer. (quantamagazine.org) 49

Three mathematicians have developed a breakthrough method for proving whether numbers can be written as fractions, solving a problem that has puzzled researchers for decades. Frank Calegari, Vesselin Dimitrov and Yunqing Tang proved the irrationality of an infinite collection of numbers related to the Riemann zeta function, building on Roger Apéry's landmark 1978 proof about a single such number.

The new approach, which relies on 19th-century mathematical techniques, has already helped settle a 50-year-old conjecture about modular forms and could lead to more advances in number theory.
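
For context, the single number in Apéry's landmark 1978 proof is the zeta value ζ(3); in the notation of the Riemann zeta function mentioned above:

```latex
% Apéry (1978): the Riemann zeta value at 3 cannot be written as a fraction.
\[
  \zeta(3) \;=\; \sum_{n=1}^{\infty} \frac{1}{n^{3}}
          \;=\; 1 + \frac{1}{2^{3}} + \frac{1}{3^{3}} + \cdots
          \;\notin\; \mathbb{Q}.
\]
```
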
AI

OpenAI's Next Big AI Effort GPT-5 is Behind Schedule and Crazy Expensive (msn.com) 120

"From the moment GPT-4 came out in March 2023, OpenAI has been working on GPT-5..." reports the Wall Street Journal. [Alternate URL here.] But "OpenAI's new artificial-intelligence project is behind schedule and running up huge bills. It isn't clear when — or if — it'll work."

"There may not be enough data in the world to make it smart enough." OpenAI's closest partner and largest investor, Microsoft, had expected to see the new model around mid-2024, say people with knowledge of the matter. OpenAI has conducted at least two large training runs, each of which entails months of crunching huge amounts of data, with the goal of making Orion smarter. Each time, new problems arose and the software fell short of the results researchers were hoping for, people close to the project say... [And each one costs around half a billion dollars in computing costs.]

The $157 billion valuation investors gave OpenAI in October is premised in large part on [CEO Sam] Altman's prediction that GPT-5 will represent a "significant leap forward" in all kinds of subjects and tasks.... It's up to company executives to decide whether the model is smart enough to be called GPT-5 based in large part on gut feelings or, as many technologists say, "vibes."

So far, the vibes are off...

OpenAI wants to use its new model to generate high-quality synthetic data for training, according to the article. But OpenAI's researchers also "concluded they needed more diverse, high-quality data," since "The public internet didn't have enough, they felt." OpenAI's solution was to create data from scratch. It is hiring people to write fresh software code or solve math problems for Orion to learn from. [And also theoretical physics experts.] The workers, some of whom are software engineers and mathematicians, also share explanations for their work with Orion... Having people explain their thinking deepens the value of the newly created data. It's more language for the LLM to absorb; it's also a map for how the model might solve similar problems in the future... The process is painfully slow. GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words a day would take months to produce a billion tokens.
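
That last claim is easy to sanity-check. In the rough calculation below, the tokens-per-word ratio is an assumption (about 1.3 tokens per English word); the other figures come from the article.

```python
# Rough sanity check of the claim that a thousand people writing 5,000 words a
# day would take months to produce a billion tokens. The tokens-per-word ratio
# is an assumption; the other numbers are from the article.
PEOPLE = 1_000
WORDS_PER_PERSON_PER_DAY = 5_000
TOKENS_PER_WORD = 1.3                      # assumed average for English text
TARGET_TOKENS = 1_000_000_000              # one billion tokens
GPT4_TRAINING_TOKENS = 13_000_000_000_000  # ~13 trillion, per the article

tokens_per_day = PEOPLE * WORDS_PER_PERSON_PER_DAY * TOKENS_PER_WORD
days = TARGET_TOKENS / tokens_per_day
print(f"~{days:.0f} days (~{days / 30:.1f} months) to write one billion tokens")
print(f"and that billion is only {TARGET_TOKENS / GPT4_TRAINING_TOKENS:.5%} "
      f"of GPT-4's estimated training data")
```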

OpenAI's already-difficult task has been complicated by internal turmoil and near-constant attempts by rivals to poach its top researchers, sometimes by offering them millions of dollars... More than two dozen key executives, researchers and longtime employees have left OpenAI this year, including co-founder and Chief Scientist Ilya Sutskever and Chief Technology Officer Mira Murati. This past Thursday, Alec Radford, a widely admired researcher who served as lead author on several of OpenAI's scientific papers, announced his departure after about eight years at the company...

OpenAI isn't the only company worrying that progress has hit a wall. Across the industry, a debate is raging over whether improvement in AIs is starting to plateau. Sutskever, who recently co-founded a new AI firm called Safe Superintelligence or SSI, declared at a recent AI conference that the age of maximum data is over. "Data is not growing because we have but one internet," he told a crowd of researchers, policy experts and scientists. "You can even go as far as to say that data is the fossil fuel of AI."

And that fuel was starting to run out.

AI

OpenAI Unveils o3, a Smarter AI Model With Improved Reasoning Skills (openai.com) 27

OpenAI has unveiled a new AI model that it says takes longer to solve problems but gets better results, following Google's similar announcement a day earlier. The model, called o3, replaces o1 from September and spends extra time working through questions that need step-by-step reasoning.

It scores three times higher than o1 on ARC-AGI, a test measuring how well AI handles complex math and logic problems it hasn't seen before. "This is the beginning of the next phase of AI," CEO Sam Altman said during a livestream Friday.

The Microsoft-backed startup is keeping o3 under wraps for now but plans to let outside researchers test it.
AI

Google Releases Its Own 'Reasoning' AI Model (techcrunch.com) 5

An anonymous reader quotes a report from TechCrunch: Google has released what it's calling a new "reasoning" AI model -- but it's in the experimental stages, and from our brief testing, there's certainly room for improvement. The new model, called Gemini 2.0 Flash Thinking Experimental (a mouthful, to be sure), is available in AI Studio, Google's AI prototyping platform. A model card describes it as "best for multimodal understanding, reasoning, and coding," with the ability to "reason over the most complex problems" in fields such as programming, math, and physics. [...]

Built on Google's recently announced Gemini 2.0 Flash model, Gemini 2.0 Flash Thinking Experimental appears to be similar in design to OpenAI's o1 and other so-called reasoning models. Unlike most AI, reasoning models effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip up AI models. As a drawback, reasoning models often take longer -- usually seconds to minutes longer -- to arrive at solutions. Given a prompt, Gemini 2.0 Flash Thinking Experimental pauses before responding, considering a number of related prompts and "explaining" its reasoning along the way. After a while, the model summarizes what it considers to be the most accurate answer.
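
As a rough illustration of what "fact-check themselves" means in practice, here is a generic draft-critique-revise loop. It is not Google's implementation; call_model is a hypothetical stand-in for whatever chat-completion API is in use, and the extra round trips are where the seconds-to-minutes of added latency come from.

```python
# Generic "draft, critique, revise" loop -- the kind of self-checking that
# reasoning models automate internally. Illustrative only; not Google's code.
def call_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real chat-completion API call.
    raise NotImplementedError("wire this up to an LLM provider")

def answer_with_self_check(question: str, rounds: int = 2) -> str:
    draft = call_model(f"Think step by step, then answer:\n{question}")
    for _ in range(rounds):
        critique = call_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any factual or logical errors in the draft, or reply 'none'."
        )
        if critique.strip().lower() == "none":
            break
        draft = call_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            f"Critique: {critique}\nRewrite the answer fixing these issues."
        )
    return call_model(f"Summarize this answer concisely:\n{draft}")
```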

Science

Journal That Published Faulty Black Plastic Study Removed From Science Index (arstechnica.com) 29

The journal that published a high-profile, now-corrected study on black plastics has been removed from a critical index of academic journals amid questions about quality criteria, according to a report by Retraction Watch. From a report: On December 16, Clarivate -- a scholarly publication analytics company -- removed the journal Chemosphere from its platform, the Web of Science, which is a key index for academic journals. The indexing platform tracks citations and calculates journal "impact factors," a proxy for relevance in its field. It's a critical metric not only for the journals but for the academic authors of the journal's articles, who use the score in their pursuit of promotions and research funding.

To be included in the Web of Science, Clarivate requires journals to follow editorial quality criteria. According to Retraction Watch, Chemosphere has retracted eight articles this month and published 60 expressions of concern since April. In a December 12 news release, Chemosphere acknowledged the quality concerns and laid out steps it will take to improve its editorial process. Those include improvements to article vetting and peer review, along with assurances that articles will be retracted if there's evidence of policy breaches. "We believe that these measures will help us regain the standard of research integrity that has always been so important to us," the news release stated.

Math

Huge Math Error Corrected In Black Plastic Study (arstechnica.com) 105

Ars Technica's Beth Mole reports: Editors of the environmental chemistry journal Chemosphere have posted an eye-catching correction to a study reporting toxic flame retardants from electronics wind up in some household products made of black plastic, including kitchen utensils. The study sparked a flurry of media reports a few weeks ago that urgently implored people to ditch their kitchen spatulas and spoons. Wirecutter even offered a buying guide for what to replace them with. The correction, posted Sunday, will likely take some heat off the beleaguered utensils. The authors made a math error that put the estimated risk from kitchen utensils off by an order of magnitude.

Specifically, the authors estimated that if a kitchen utensil contained middling levels of a key toxic flame retardant (BDE-209), the utensil would transfer 34,700 nanograms of the contaminant a day based on regular use while cooking and serving hot food. The authors then compared that estimate to a reference level of BDE-209 considered safe by the Environmental Protection Agency. The EPA's safe level is 7,000 ng -- per kilogram of body weight -- per day, and the authors used 60 kg as the adult weight (about 132 pounds) for their estimate. So, the safe EPA limit would be 7,000 multiplied by 60, yielding 420,000 ng per day. That's 12 times more than the estimated exposure of 34,700 ng per day. However, the authors missed a zero and reported the EPA's safe limit as 42,000 ng per day for a 60 kg adult. The error made it seem like the estimated exposure was nearly at the safe limit, even though it was actually less than a tenth of the limit.
"We regret this error and have updated it in our manuscript," the authors said in a correction.

"This calculation error does not affect the overall conclusion of the paper," the correction reads. The study maintains that flame retardants "significantly contaminate" the plastic products, which have "high exposure potential."
AI

Microsoft Announces Phi-4 AI Model Optimized for Accuracy and Complex Reasoning (computerworld.com) 31

An anonymous reader shared this report from Computerworld: Microsoft has announced Phi-4 — a new AI model with 14 billion parameters — designed for complex reasoning tasks, including mathematics. Phi-4 excels in areas such as STEM question-answering and advanced problem-solving, surpassing similar models in performance. Phi-4, part of the Phi small language models (SLMs), is currently available on Azure AI Foundry under the Microsoft Research License Agreement and will launch on Hugging Face [this] week, the company said in a blog post.

The company emphasized that Phi-4's design focuses on improving accuracy through enhanced training and data curation.... "Phi-4 outperforms comparable and even larger models on tasks like mathematical reasoning, thanks to a training process that combines synthetic datasets, curated organic data, and innovative post-training techniques," Microsoft said in its announcement. The model leverages a new training approach that integrates multi-agent prompting workflows and data-driven innovations to enhance its reasoning efficiency. The accompanying report highlights that Phi-4 balances size and performance, challenging the industry norm of prioritizing larger models... Phi-4 achieved a score of 80.4 on the MATH benchmark and has surpassed other systems in problem-solving and reasoning evaluations, according to the technical report accompanying the release. This makes it particularly appealing for domain-specific applications requiring precision, like scientific computation or advanced STEM problem-solving.

Microsoft emphasized its commitment to ethical AI development, integrating advanced safety measures into Phi-4. The model benefits from Azure AI Content Safety features such as prompt shields, protected material detection, and real-time application monitoring. These features, Microsoft explained, help users address risks like adversarial prompts and data security threats during AI deployment. The company also reiterated that Azure AI Foundry, the platform hosting Phi-4, offers tools to measure and mitigate AI risks. Developers using the platform can evaluate and improve their models through built-in metrics and custom safety evaluations, Microsoft added... With Phi-4, Microsoft continues to evolve its AI offerings while promoting responsible use through robust safeguards. Industry watchers will observe how this approach shapes adoption in critical fields where reasoning and security are paramount.

AI

Are People Starting to Love Self-Driving Robotaxis? (marketplace.org) 106

"In a tiny handful of places..." Wired wrote last month, "you can find yourself flanked by taxis with no one in the drivers' seats." But they added that "Granted, practically everyone has been numbed by the hype cycle."

Wired's response? "[P]ile a few of us into an old-fashioned, human-piloted hired car, then follow a single Waymo robotaxi wherever it goes for a whole workday" to "study its movements, its relationship to life on the streets, its whole self-driving gestalt. We'll interview as many of its passengers as will speak to us, and observe it through the eyes of the kind of human driver it's designed to replace."

This week Wired senior editor John Gravios discussed the experience on the business-news radio show Marketplace (with Marketplace host Kai Ryssdal): Ryssdal: What kinds of reactions did you get from people once you track them down, what did they say about their experience in this driverless car?

Gravios: It was pretty uniform and impressive how much people just love it. They just like the experience of the drive, I guess it's a little bit less herky-jerky than a human driver, but I think a lot of it just comes down to people are just kind of relieved not to have to talk to somebody else, as sad as that is...

Ryssdal: Tell me about Gabe, your Uber driver, and his thoughts on this whole thing, because that was super interesting.

Gravios: So Gabe, this is a guy whose labor is directly at stake. You know, he's a guy whose labor is going to be replaced by a Waymo. He's had 30 years of experience as a professional driver, first as a taxi driver. He even organized a taxi driver strike in the days before Uber. His first, I think his prejudice with Waymo is having shared the road with them sort of sporadically, he thought of them as kind of dopey, rule-following, frustrating vehicles to share the road with. But over the course of the day, he started to recognize that the Waymo was driving a lot like a taxi driver. The Waymo was doing things that were aggressive, that are exactly the kinds of things that a taxi driver is trained to be aggressive with and doing things that were cautious that are exactly the kinds of things that taxi drivers are trained to be cautious with.

Ryssdal: Can we talk unit economics here? According to the math from a study you guys cite, Waymo is not making a whole lot of money per vehicle, right? And eventually they're going to scale, and it's going to work out, but for the moment, even though they've gotten 11 billion-something-dollars, they're not turning a whole lot of profit here.

Gravios: Yeah, that's a big question, and the math is, even that study, based on a lot of guesswork. It's really hard to say what the unit economics are. What we can say is that the ridership rates are going up so fast that that study is already well out of date. When we were doing our chase, I think the monthly ridership for Waymo was 100,000 rides a month. By October, it was already 150,000 rides a month. So, the economics are just shifting under our feet a lot.

AI

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft (wired.com) 27

Harvard University announced Thursday it's releasing a high-quality dataset of nearly one million public-domain books that could be used by anyone to train large language models and other AI tools. From a report: The dataset was created by Harvard's newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta's Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to "level the playing field" by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly-refined and curated content repositories that normally only established tech giants have the resources to assemble. "It's gone through rigorous review," he says.

Leppert believes the new public domain database could be used in conjunction with other licensed materials to build artificial intelligence models. "I think about it a bit like the way that Linux has become a foundational operating system for so much of the world," he says, noting that companies would still need to use additional training data to differentiate their models from those of their competitors.

AI

OpenAI Releases 'Smarter, Faster' ChatGPT - Plus $200-a-Month Subscriptions for 'Even-Smarter Mode' (venturebeat.com) 64

Wednesday OpenAI CEO Sam Altman announced "12 Days of OpenAI," promising that "Each weekday, we will have a livestream with a launch or demo..." And sure enough, today he announced the launch of two things:

- "o1, the smartest model in the world. Smarter, faster, and more features (e.g. multimodality) than o1-preview. Live in ChatGPT now, coming to API soon."

- "ChatGPT Pro. $200/month. Unlimited usage and even-smarter mode for using o1. More benefits to come!"

Altman added this update later: For extra clarity: o1 is available in our plus tier, for $20/month. With the new pro tier ($200/month), it can think even harder for the hardest problems. Most users will be very happy with o1 in the plus tier!
VentureBeat points out that subscribers "also gain access to GPT-4o, known for its advanced natural language generation capabilities, and the Advanced Voice feature for speech-based interactions."

And even for non-subscribers, ChatGPT can now also analyze images, points out VentureBeat, "a hugely helpful feature upgrade as it enables users to upload photos and have the AI chatbot respond to them, giving them detailed plans on how to build a birdhouse entirely from a single candid photo of one, for one fun example." In another, potentially more serious and impressive example, it is now capable of helping design data centers from sketches... o1 represents a significant evolution in reasoning model capabilities, including better handling of complex tasks, image-based reasoning, and enhanced accuracy. Enterprise and Education users will gain access to the model next week... OpenAI's updates also include safety enhancements, with the o1-preview scoring 84 on a rigorous safety test, compared to 22 for its predecessor...

To encourage the use of AI in societal-benefit fields, OpenAI has announced the ChatGPT Pro Grant Program. The initiative will initially award 10 grants to leading medical researchers, providing free access to ChatGPT Pro tools.

In a video Altman displays graphs showing o1 dramatically outperforms gpt4o on math questions, on competition coding at CodeForces, and on PhD-level science questions.
