AI

Anthropic Launches Improved Version of Its Entry-Level LLM (techcrunch.com) 5

Anthropic, the AI startup co-founded by ex-OpenAI execs, has released Claude Instant 1.2, an updated version of its faster, cheaper text-generating model, available through an API. TechCrunch reports: The updated Claude Instant, Claude Instant 1.2, incorporates the strengths of Anthropic's recently announced flagship model, Claude 2, showing "significant" gains in areas such as math, coding, reasoning and safety, according to Anthropic. In internal testing, Claude Instant 1.2 scored 58.7% on a coding benchmark compared to Claude Instant 1.1, which scored 52.8%, and 86.7% on a set of math questions versus 80.9% for Claude Instant 1.1. "Claude Instant generates longer, more structured responses and follows formatting instructions better," Anthropic writes in a blog post. "Instant 1.2 also shows improvements in quote extraction, multilingual capabilities and question answering."

Claude Instant 1.2 is also less likely to hallucinate and more resistant to jailbreaking attempts, Anthropic claims. In the context of large language models like Claude, "hallucination" is where a model generates text that's incorrect or nonsensical, while jailbreaking is a technique that uses cleverly-written prompts to bypass the safety features placed on large language models by their creators. And Claude Instant 1.2 features a context window that's the same size of Claude 2's -- 100,000 tokens. Context window refers to the text the model considers before generating additional text, while tokens represent raw text (e.g. the word "fantastic" would be split into the tokens "fan," "tas" and "tic"). Claude Instant 1.2 and Claude 2 can analyze roughly 75,000 words, about the length of "The Great Gatsby." Generally speaking, models with large context windows are less likely to "forget" the content of recent conversations.
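
The context-window arithmetic above is easy to sanity-check. The 0.75 words-per-token ratio below is just the rough rule of thumb implied by the article's 100,000-token / 75,000-word figures, not an exact rate:

```python
# Back-of-the-envelope: relating a context window in tokens to words.
# Tokens are sub-word chunks (e.g. "fantastic" -> "fan", "tas", "tic"),
# so one English word is typically a bit more than one token.
CONTEXT_TOKENS = 100_000
WORDS_PER_TOKEN = 0.75  # rough rule of thumb, not an exact conversion

def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

print(approx_words(CONTEXT_TOKENS))  # 75000, about the length of "The Great Gatsby"
```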

Math

ChatGPT Is Getting Dumber at Basic Math 91

Recently released research reveals a fundamental challenge of developing artificial intelligence: ChatGPT has become worse at performing certain basic math operations. From a report: The researchers at Stanford University and the University of California, Berkeley said the deterioration is an example of a phenomenon known to AI developers as drift, where attempts to improve one part of the enormously complex AI models make other parts of the models perform worse.

[...] Thus far, they have tested two versions of ChatGPT: version 3.5, available free online to anyone, and version 4.0, available via a premium subscription. The results aren't entirely promising. They gave the chatbot a basic task: identify whether a particular number is a prime number. This is the sort of math problem that is complicated for people but simple for computers.

Is 17,077 prime? Is 17,947 prime? Unless you are a savant you can't work this out in your head, but it is easy for computers to evaluate. A computer can just brute force the problem -- try dividing by two, three, five, etc., and see if anything works. To track performance, the researchers fed ChatGPT 1,000 different numbers. In March, the premium GPT-4 correctly identified whether 84% of the numbers were prime or not. (Pretty mediocre performance for a computer, frankly.) By June its success rate had dropped to 51%. Across eight different tasks, GPT-4 became worse at six of them. GPT-3.5 improved on six measures, but remained worse than its advanced sibling at most of the tasks.
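
The brute-force check the report describes really is a few lines of code. A minimal trial-division sketch:

```python
import math

def is_prime(n: int) -> bool:
    """Trial division: try dividing by 2, then odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

# The two numbers from the article: one is prime, the other is 131 x 137.
print(is_prime(17077), is_prime(17947))  # True False
```
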
AI

Is ChatGPT Getting Worse? (fortune.com) 93

A new study (PDF) from Stanford found that ChatGPT performed worse on certain tasks in June than its March version. The paper supports a widely held, though unproven, notion that the AI language model's performance in coding and compositional tasks has deteriorated in recent months. Fortune reports: The study compared the performance of the chatbot, created by OpenAI, over several months at four "diverse" tasks: solving math problems, answering sensitive questions, generating software code, and visual reasoning. Researchers found wild fluctuations -- called drift -- in the technology's ability to perform certain tasks. The study looked at two versions of OpenAI's technology over the time period: a version called GPT-3.5 and another known as GPT-4. The most notable results came from research into GPT-4's ability to solve math problems.

Over the course of the study researchers found that in March GPT-4 was able to correctly identify that the number 17,077 is a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted to a lowly 2.4%. Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version got the answer to the same question right just 7.4% of the time -- while the June version was consistently right, answering correctly 86.8% of the time. Similarly varying results happened when the researchers asked the models to write code and to do a visual reasoning test that asked the technology to predict the next figure in a pattern.

James Zou, a Stanford computer science professor who was one of the study's authors, says the "magnitude of the change" was unexpected from the "sophisticated ChatGPT." The vastly different results from March to June and between the two models reflect not so much the model's accuracy in performing specific tasks, but rather the unpredictable effects of changes in one part of the model on others. [...] The exact nature of these unintended side effects is still poorly understood because researchers and the public alike have no visibility into the models powering ChatGPT. It's a reality that has only become more acute since OpenAI decided to backtrack on plans to make its code open source in March. "These are black-box models," Zou says. "So we don't actually know how the model itself, the neural architectures, or the training data have changed."

The Almighty Buck

Twitter Starts Sharing Ad Revenue With Verified Creators (techcrunch.com) 62

Twitter has started sending out the first payouts to creators on the platform who are part of the company's revenue sharing program. The largest payout reported thus far was to Billy Markus, the co-creator of the Dogecoin cryptocurrency, which amounted to a whopping $37,050. TechCrunch reports: Users who subscribe to Twitter Blue and have earned more than 5 million tweet impressions each month for the last 3 months are eligible to join. According to owner Elon Musk, the first round of creator payouts will total $5 million, and will be cumulative from the month of February onward. These payouts will be delivered via Stripe. [...] Twitter's payouts are determined by tweet impressions. Babylon Bee writer Ashley St. Clair (710,000 followers) said that she earned $7,153, and according to her "napkin math," she had around 840 million impressions from February through July. That would make her rate about $0.0085 CPM (cost per mille), or $8.52 per million impressions. It's not clear whether or not individual CPMs change from user to user.
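
St. Clair's "napkin math" is easy to reproduce from the figures in the report:

```python
# Re-running the article's payout arithmetic: $7,153 earned on
# roughly 840 million impressions from February through July.
payout = 7153.0
impressions = 840_000_000

cpm = payout / impressions * 1_000            # dollars per thousand impressions
per_million = payout / impressions * 1_000_000  # dollars per million impressions

print(f"CPM: ${cpm:.4f}, per million: ${per_million:.2f}")
```
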
AI

Anthropic Releases a New Version of Its ChatGPT Rival, Claude (bloomberg.com) 23

Anthropic, an artificial intelligence startup positioning itself as the builder of a safer kind of chatbot, has released a new version of its AI bot, named Claude. From a report: Anthropic said that Claude 2 is available to anyone in the US or UK online at claude.ai, and businesses can access it via an application programming interface. The new release on Tuesday comes several months after Anthropic began offering an earlier version of Claude to businesses that wanted to add it to their products. Previously, the bot was tested by a handful of companies including Quora, which built it into an app called Poe that lets users ask questions.

Like its predecessor, Claude 2 is built atop a large language model and can be used for written tasks like summarizing, searching, answering questions and coding. Both models can currently take in large chunks of text -- a user can ask it to summarize a book, for instance -- though Claude 2 can generate longer responses than its predecessor. Responses can reach up to about 3,000 words, according to data provided by the company. Claude 2 will also offer more accurate responses on some topics, such as coding and grade-school-level math, the company said. Anthropic's goal has been for Claude to be less susceptible than other chatbots to manipulation.

Television

TV's Golden Era Proved Costly To Streamers (wsj.com) 111

Consumers are winning from the streaming revolution but across most of Hollywood, the businesses churning out TV and movies are losing. From a report: Services such as Netflix, Disney+, Paramount+ and Max have become the default entertainment options for homes across America rather than cable, saving many consumers money. For the titans of Hollywood, that shift has been costly. Traditional media and entertainment companies have reported losses of more than $20 billion combined since early 2020 on their direct-to-consumer streaming businesses. Netflix, which brings in profits, is an exception, but the rest of the industry is wondering: While consumers love streaming, is it actually a good business?

Investors now care about profitability rather than growth, a change that makes finding new revenue streams and retaining customers critical. Studios that for years were able to splurge on content to feed viewers' insatiable appetite for new shows and films now must pull back to make the math work. The ad market is weakening, many companies have laid off staff to save money and Hollywood writers are on strike. Market values for Paramount Global, Comcast, Walt Disney and Netflix are down more than $280 billion combined since the end of 2020. Warner Bros. Discovery is worth about half of its total value since its 2022 trading debut as a combined company. The declines have come after many of the stocks rose during the early part of the pandemic, when consumers were stuck at home and hungry for entertainment.

Microsoft

Microsoft's Light-Based, Transistor-less Computer Solves Complex Optimization Problems at the Speed of Light (techspot.com) 65

"Picture a world where computing is not limited by the binary confines of zeros and ones, but instead, is free to explore the vast possibilities of continuous value data." That's Microsoft's research blog, describing its newly-developed Analog Iterative Machine, an analog optical computer designed for solving difficult optimization problems.

"For a multidisciplinary group of researchers at the Microsoft Research Lab in Cambridge, U.K., the mission was to build a new kind of computer that would transcend the limitations of the binary systems," says a Microsoft blog post.

Neowin describes it as a computer "that uses photons and electrons, rather than transistors, to process data." Light "passes through several layers, making impressions on each part of what's known as a 'modular array'," writes PC Gamer. "It's this process of projecting light through the array that replaces the function of a standard transistor."

Microsoft says it can "solve practical problems at the speed of light." And "it's already shown potential for surpassing state-of-the-art digital (silicon-based) technology," adds TechSpot, "or even the most powerful quantum computers being designed right now." The AIM machine is built using commodity opto-electronic technologies that are low-cost and scalable, Microsoft says, and is based on an "asynchronous data flow architecture" which doesn't require data exchange between storage units and "compute locations."

AIM isn't designed for general purpose computing tasks, though. The analog optical computer is useful to solve difficult "optimization problems" like the well-known travelling salesman riddle, Microsoft says, which are at the heart of many math-intensive industries including finance, logistics, transportation, energy, healthcare, and manufacturing. When it comes to crunching all the possible combinations of an exponentially growing problem, traditional, digital computers struggle to provide a solution in a "timely, energy-efficient and cost-effective manner."
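
The combinatorial blow-up the article alludes to is visible even in a tiny brute-force travelling-salesman sketch (the distance matrix below is made up for illustration): with n cities there are (n-1)! candidate tours to check, which is exactly the growth that overwhelms digital machines as n rises.

```python
from itertools import permutations

# Brute-force travelling salesman on a tiny, made-up distance matrix.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]

def shortest_tour(dist):
    """Try every tour starting and ending at city 0; return (length, tour)."""
    n = len(dist)
    best = None
    for perm in permutations(range(1, n)):  # (n-1)! orderings of the other cities
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or length < best[0]:
            best = (length, tour)
    return best

print(shortest_tour(dist))  # (23, (0, 1, 3, 2, 0))
```
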

AIM was conceived to address two simultaneous trends, Microsoft explains, which are sidestepping the unraveling of Moore's Law and overcoming the limitations of specialized machines designed for solving optimization problems... AIM works at the speed of light, and it seemingly provides a 100x increase in performance compared to the most advanced digital approaches available today. For now, AIM is still a research project with limited access for potential customers. The machine, however, is already being tested by UK financial company Barclays, which is using it to track transactions of money into stock purchases.

Microsoft says it's now releasing its "AIM simulator as a service, allowing selected users to get first-hand experience. The initial users are the team's collaborators at Princeton University and at Cambridge University."
Math

Here's How We Could Begin Decoding an Alien Message Using Math (sciencenews.org) 64

Slashdot reader silverjacket writes: Researchers at Oxford and elsewhere developed a method that figures out the most likely number and size of dimensions in which to format a string of bits, with applications to interpreting messages from extraterrestrial intelligence (METI), if we were to receive them.
The new method "looks at every possible combination of dimension number and size," according to Science News: The researchers also measure each possible configuration's global order by seeing how much an image compression algorithm can shrink it without losing information — mathematically, randomness is less compressible than regular patterns...
Hector Zenil [one of the creators of this method] notes that in Carl Sagan's sci-fi novel Contact, the characters spend a lot of time figuring out that a message received from aliens is in three dimensions (specifically a video). "If you have our tools, you would solve that problem in seconds and with no human intervention." An algorithm that pieces together smaller algorithmic components in order to explain or predict data — this new method is just one way to do it — may also help us one day achieve artificial general intelligence, Zenil says. Such automated approaches don't depend on human assumptions about the signal. That opens the door to discovering forms of intelligence that might think differently from our own.
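
A toy reduction of the compression idea might look like this. It assumes a 2-D message and only varies the width, whereas the actual method also searches over the number of dimensions, and it uses zlib as a stand-in for the compression algorithm:

```python
import zlib

def orderliness(bits: str, width: int) -> int:
    """Compressed size of the message read out column-major at this width;
    mathematically, randomness is less compressible than regular patterns."""
    columns = "".join(bits[i::width] for i in range(width))
    return len(zlib.compress(columns.encode()))

def best_width(bits: str) -> int:
    """Pick the candidate 2-D width whose layout compresses best."""
    candidates = [w for w in range(2, len(bits)) if len(bits) % w == 0]
    return min(candidates, key=lambda w: orderliness(bits, w))

# A toy "message": a 61-bit row repeated 16 times, so the true geometry is
# 16 rows of width 61, and every column of that layout is constant.
row = "10110011100011110101" * 3 + "1"   # 61 bits
message = row * 16
print(best_width(message))
```
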
Education

US Reading and Math Scores Drop To Lowest Level In Decades (npr.org) 248

The average test scores for 13-year-old students in the U.S. have decreased in reading and math since 2020, reaching the lowest levels in decades, with more significant declines in math. NPR reports: The average scores, from tests given last fall, declined 4 points in reading and 9 points in math, compared with tests given in the 2019-2020 school year, and are the lowest in decades. The declines in reading were more pronounced for lower performing students, but dropped across all percentiles. The math scores were even more disappointing. On a scale of 500 points, the declines ranged from 6 to 8 points for middle and high performing students, to 12 to 14 points for low performing students.

The math results also showed widening gaps based on gender and race. Scores decreased by 11 points for female students over 2020 results, compared with a 7-point decrease for male students. Among Black students, math scores declined 13 points, while white students had a 6-point drop. Compared with the 35-point gap between Black and white students in 2020, the disparity widened to 42 points.

While the scores show a drop from the pre-pandemic years, the results also show that there are other factors at work. The decline is even more substantial when compared with scores of a decade ago: The average scores declined 7 points in reading and 14 points in mathematics. The Education Department says plans are underway to address the learning loss. [...] The latest results are from the NAEP Long-Term Trend Assessment, traditionally administered every four years by the National Center for Education Statistics.

Social Networks

Reddit CEO Steve Huffman: Reddit 'Was Never Designed To Support Third-Party Apps' (theverge.com) 224

Reddit CEO Steve Huffman says he is refusing to undo the company's decision to increase prices for third-party app developers, despite thousands of subreddits pledging to keep their subreddits private or restricted in protest. "It's a startling change for many members of the Reddit community, but it's one that Reddit CEO Steve Huffman tells The Verge that he's fine with making," writes The Verge's Jay Peters. "Those third-party apps, in his eyes, aren't adding much value to the platform." From the report: "So the vast majority of the uses of the API -- not [third-party apps like Apollo for Reddit] -- the other 98 percent of them, make tools, bots, enhancements to Reddit. That's what the API is for," Huffman says. "It was never designed to support third-party apps." According to Huffman, he "let it exist," and "I should take the blame for that because I was the guy arguing for that for a long time." Huffman now takes issue with the third-party apps that are building a business on top of his own. "I didn't know -- and this is my fault -- the extent that they were profiting off of our API. That these were not charities."

I asked him if he felt that Apollo, rif for Reddit, and Sync, which all plan to shut down as a result of the pricing changes, don't add value to Reddit. "Not as much as they take," he says. "No way." "They need to pay for this. That is fair. What our peers have done is banned them entirely. And we said no, you know what, we believe in free markets. You need to cover your costs," he says. Apollo developer Christian Selig recently did the math for us on The Vergecast, though, and suggested that covering Reddit's asking price with only 30 days' notice would have been nigh-impossible.

Huffman didn't have an answer for why the deadline was so short, beyond wanting there to be a deadline. "We're perfectly willing to work with the folks who want to work with us, including figuring out what the transition period will look like. But I think a deadline forces people, us included, to negotiate that." I also asked if Huffman truly believes that the blackouts haven't impacted his decision-making around the API pricing changes at all. "In this case? That's true," says Huffman. "That's our business decision, and we're not undoing that business decision."

Programming

Google's Bard AI Can Now Write and Execute Code To Answer a Question 19

In a blog post on Wednesday, Google said Bard is getting better at logic and reasoning. "Google says that now when you ask Bard a 'computational' task like math or string manipulation, instead of showing the output of the language model, that language model will instead write a program, execute that program, and then show the output of that program to the user as an answer," reports Ars Technica. From the report: Google's blog post provides the example input of "Reverse the word 'Lollipop' for me." ChatGPT flubs this question and provides the incorrect answer "pillopoL," because language models see the world in chunks of words, or "tokens," and they just aren't good at this. Bard gets the output correct as "popilloL," but more interesting is that it also includes the python code it wrote to answer the question. That's neat for programming-minded people to see under the hood, but wow, is that probably the scariest output ever for regular people. It's also not particularly relevant. Imagine if Gmail showed you a block of code when you just asked it to fetch email. It's weird. Just do the job you were asked to do, Bard.

Google likens an AI model writing a program to humans doing long division in that it's a different mode of thinking [...]. Google says this "writing code on the fly" method will also be used for questions like: "What are the prime factors of 15683615?" and "Calculate the growth rate of my savings." The company says, "So far, we've seen this method improve the accuracy of Bard's responses to computation-based word and math problems in our internal challenge datasets by approximately 30%." As usual, Google warns Bard "might not get it right" due to interpreting your question wrong or just, like all of us, writing code that doesn't work the first time. Bard is coding up answers on the fly right now if you want to give it a shot at bard.google.com.
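
For the two example questions in the post, the "write a program and run it" approach amounts to something like the sketch below. This is illustrative only, not Bard's actual generated code:

```python
def reverse_word(word: str) -> str:
    """Trivial for a program, hard for a model that sees multi-character tokens."""
    return word[::-1]

def prime_factors(n: int) -> list:
    """Trial division, for questions like 'What are the prime factors of 15683615?'"""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(reverse_word("Lollipop"))  # popilloL
print(prime_factors(15683615))
```
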
Television

The Binge Purge 156

TV's streaming model is broken. It's also not going away. For Hollywood, figuring that out will be a horror show. From a report: Across the town, there's despair and creative destruction and all sorts of countervailing indicators. Certain shows that were enthusiastically green-lit two years ago probably wouldn't be made now. Yet there are still streamers burning mountains of cash to entertain audiences that already have too much to watch. Netflix has tightened the screws and recovered somewhat, but the inarguable consensus is that there is still a great deal of pain to come as the industry cuts back, consolidates, and fumbles toward a more functional economic framework. The high-stakes Writers Guild of America strike has focused attention on Hollywood's labor unrest, but the really systemic issue is streaming's busted math. There may be no problem more foundational than the way the system monetizes its biggest hits: It doesn't.

Just ask Shawn Ryan. In April, the veteran TV producer's latest show, the spy thriller The Night Agent, became the fifth-most-watched English-language original series in Netflix's history, generating 627 million viewing hours in its first four weeks. As it climbed to the heights of such platform-defining smashes as Stranger Things and Bridgerton, Ryan wondered how The Night Agent's success might be reflected in his compensation. "I had done the calculations. Half a billion hours is the equivalent of over 61 million people watching all ten episodes in 18 days. Those shows that air after the Super Bowl -- it's like having five or ten of them. So I asked my lawyer, 'What does that mean?'" recalls Ryan. As it turns out, not much. "In my case, it means that I got paid what I got paid. I'll get a little bonus when season two gets picked up and a nominal royalty fee for each additional episode that gets made. But if you think I'm going out and buying a private jet, you're way, way off."

Ryan says he'll probably make less money from The Night Agent than he did from The Shield, the cop drama he created in 2002, even though the latter ran on the then-nascent cable channel FX and never delivered Super Bowl numbers. "The promise was that if you made the company billions, you were going to get a lot of millions," he says. "That promise has gone away." Nobody is crying for Ryan, of course, and he wouldn't want them to. ("I'm not complaining!" he says. "I'm not unaware of my position relative to most people financially.") But he has a point. Once, in a more rational time, there was a direct relationship between the number of people who watched a show and the number of jets its creator could buy. More viewers meant higher ad rates, and the biggest hits could be sold to syndication and international markets. The people behind those hits got a cut, which is why the duo who invented Friends probably haven't flown commercial since the 1990s. Streaming shows, in contrast, have fewer ads (or none at all) and are typically confined to their original platforms forever. For the people who make TV, the connection between ratings and reward has been severed.
AI

Google IO To Feature AI Updates, Showing Off PaLM 2 LLM (cnbc.com) 10

At its annual Google I/O developers conference on Wednesday, Google is planning to announce a number of generative AI updates, including launching a general-use large language model (LLM) called PaLM 2. CNBC reports: According to internal documents about Google I/O viewed by CNBC, the company will unveil PaLM 2, its most recent and advanced LLM. PaLM 2 includes more than 100 languages and has been operating under the internal codename "Unified Language Model." It's also performed a broad range of coding and math tests as well as creative writing tests and analysis. At the event, Google will make announcements on the theme of how AI is "helping people reach their full potential," including "generative experiences" to Bard and Search, the documents show. Pichai will be speaking to a live crowd of developers as he pitches his company's AI advancements.

Google first announced the PaLM language model in April of 2022. In March of this year, the company launched an API for PaLM alongside a number of AI enterprise tools it says will help businesses "generate text, images, code, videos, audio, and more from simple natural language prompts." Last month, Google said its medical LLM called "Med-PaLM 2" can answer medical exam questions at an "expert doctor level" and is accurate 85% of the time.

Education

Khan Academy Piloting a Version of GPT Called Khanmigo (fastcompany.com) 36

Sal Khan, founder and CEO of online learning nonprofit Khan Academy, wants to turn GPT into a tutor. From a report: Khan Academy is testing a carefully managed version of OpenAI's GPT that can help guide students in their studies, not enable them to cheat. A pilot is currently running with a handful of schools and districts to test the software, and Khan hopes to open a wider beta this summer. "I strive to be at the cutting edge of how AI, especially large language models, can be integrated to actually solve real problems in education," Khan says.

Many students are already using ChatGPT and other generative AI tools to assist with their homework -- sometimes against their teachers' wishes. Khan Academy's approach stands out because it's designed to answer students' questions without giving away the answers, and to integrate with the organization's existing videos and exercises. In a demonstration for Fast Company, Khan showed how the chatbot, dubbed Khanmigo, can guide students through math problems, help debug code, serve as a debate partner, and even engage in conversation in the voices of literary characters like Hamlet and Jay Gatsby. The project began last June, when Khan received an introductory email from Sam Altman and Greg Brockman, OpenAI's CEO and president, respectively. The two offered a private demo of the AI software, and Khan was impressed with the program's ability to answer questions intelligently about various academic materials.

Math

Scientists Finally Solved the Mystery of How the Mayan Calendar Works (popularmechanics.com) 97

An anonymous reader quotes a report from Popular Mechanics: The Mayan calendar's 819-day cycle has confounded scholars for decades, but new research shows how it matches up to planetary cycles over a 45-year span. That's a much broader view of the tricky calendar than anyone previously tried to take. In a study published in the journal Ancient Mesoamerica, two Tulane University scholars highlighted how researchers never could quite explain the 819-day count calendar until they broadened their view.

"Although prior research has sought to show planetary connections for the 819-day count, its four-part, color-directional scheme is too short to fit well with the synodic periods of visible planets," the study authors write. "By increasing the calendar length to 20 periods of 819-days a pattern emerges in which the synodic periods of all the visible planets commensurate with station points in the larger 819-day calendar." That means the Mayans took a 45-year view of planetary alignment and coded it into a calendar that has left modern scholars scratching their heads in wonder.

Mercury was always the starting point for the tricky timeline because its synodic period -- 117 days -- matches nicely into 819. From there, though, we need to start extrapolating out the 819 number, and if you chart 20 cycles of 819, you can fit every key planet into the mix. And Mars may be the kicker for the overall length. With a 780-day synodic period, 21 periods match exactly to 16,380, or 20 cycles of 819. Venus needs seven periods to match five 819-day counts, Saturn has 13 periods to fit with six 819-day counts, and Jupiter 39 periods to hit 19 819-counts.

"Rather than limit their focus to any one planet," the authors write, "the Maya astronomers who created the 819-day count envisioned it as a larger calendar system that could be used for predictions of all the visible planet's synod periods, as well as commensuration points with their cycles in the Tzolk'in and Calendar Round."
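
The commensurations described above are easy to verify numerically. The Mercury (117 days) and Mars (780 days) periods come from the article; the Venus, Saturn, and Jupiter values below (585, 378, and 399 days) are my assumption, being the round figures that make the article's ratios exact — the true mean synodic periods are close but not identical (about 583.9, 378.1, and 398.9 days).

```python
# Checking the commensuration claims against 20 cycles of 819 days.
CALENDAR = 20 * 819  # 16,380 days, the proposed ~45-year span

claims = {
    # planet: (synodic period in days, number of periods, number of 819-day counts)
    "Mercury": (117, 7, 1),    # 7 x 117 = 819 exactly
    "Mars":    (780, 21, 20),  # 21 x 780 = 16,380 = 20 x 819
    "Venus":   (585, 7, 5),    # 7 x 585 = 4,095 = 5 x 819
    "Saturn":  (378, 13, 6),   # 13 x 378 = 4,914 = 6 x 819
    "Jupiter": (399, 39, 19),  # 39 x 399 = 15,561 = 19 x 819
}

for planet, (period, n_periods, n_counts) in claims.items():
    assert period * n_periods == 819 * n_counts, planet
    print(f"{planet}: {n_periods} x {period} = {n_counts} x 819 = {period * n_periods}")
```
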
Earth

Utah's Record Snowfall 'Buys Us Time' for Drying Great Salt Lake (cnn.com) 56

Utah's Great Salt Lake had shrunk by two-thirds of its original size, the New York Times reported last June. And "It was only three months ago that nearly three dozen scientists and conservationists sounded the alarm that the Great Salt Lake in Utah faces 'unprecedented danger'," CNN reports. "Unless the state's lawmakers fast-tracked 'emergency measures' to dramatically increase the lake's inflow by 2024, it would likely disappear in the next five years." Now, after an incredible winter full of rain and snow, there is a glimmer of hope on North America's largest terminal lake, where water levels had fallen to a record-low last fall amid a historic, climate change-fueled drought across the West. As of Thursday, the snowpack in the Great Salt Lake basin was more than double the average for this time of year. All of this winter's rain and snow that fell directly into the Great Salt Lake increased the water level there by three feet...

In reality, the precipitation only made up for what was lost to last year's drought and evaporation... To reverse the decline, the Great Salt Lake needs an additional 1 million acre-feet of water — roughly 326 billion gallons — per year, according to the January assessment. Bonnie Baxter, the director of the Great Salt Lake Institute at Westminster College and one of the authors of the January report, said the state would "need another five years like this in order to get the system healthy again."
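
The gallon figure checks out against the standard US conversion of 1 acre-foot to about 325,851 gallons:

```python
# Converting the assessment's 1 million acre-feet per year into gallons.
GALLONS_PER_ACRE_FOOT = 325_851  # standard US conversion
needed_acre_feet = 1_000_000

gallons = needed_acre_feet * GALLONS_PER_ACRE_FOOT
print(f"{gallons / 1e9:.0f} billion gallons")  # 326 billion gallons
```
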

"If I do the math, we got about three feet of direct precipitation that fell into the lake this year, that is fantastic," Baxter told CNN. "But the last two years, we also lost 2.8 feet in the summer, and we expect to lose that three feet in the desiccating summer. So now, we're pretty much even, and that's not a good place to be."

Baxter says the rainfall "buys us some time" to work on long-term issues like water rights and metering the water used in agriculture — maybe a year or two — but "We're not going to be bailed out by excess snow."

There's hope melting snow could add more water, but Baxter warns that it might not. "If it melts really quickly, which is probably going to happen because we have these late snows and now we're right up against warm temperatures, then you get the water just rushing over the land and not taking time to charge the aquifers and just evaporating off the surface."
Math

NYT Debuts Digits, the Math Version of Wordle (gamespot.com) 17

The New York Times added a new daily puzzle game to its library in the form of Digits. GameSpot reports: This collection of math conundrums tasks you with reaching a designated number by using six numbers that you're free to multiply, divide, subtract, or add up to reach the final result, so long as your process doesn't create any fractions or negative numbers.

Currently in beta and only available for this week, there'll be five of these math puzzles to solve every day. These aren't one-and-done puzzles like Wordle, and depending on the path you choose to solve one of these math mysteries, you'll be awarded 1-3 star ratings. If Digits proves to be popular with its readers, the New York Times will then start work on the further development of the game.
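
A brute-force solver for a Digits-style puzzle makes a nice exercise. This is my own sketch, not the Times's implementation, and the numbers and target below are invented for illustration; it searches pairs of remaining numbers while rejecting any step that would produce a fraction or a negative number, per the rules above.

```python
from itertools import combinations

def solve(numbers, target):
    """Return a list of operation strings reaching target, or None.
    Division is allowed only when exact; subtraction only when non-negative."""
    if target in numbers:
        return []
    for a, b in combinations(numbers, 2):
        rest = list(numbers)
        rest.remove(a)
        rest.remove(b)
        hi, lo = max(a, b), min(a, b)
        candidates = [(a + b, f"{a}+{b}={a + b}"),
                      (a * b, f"{a}*{b}={a * b}"),
                      (hi - lo, f"{hi}-{lo}={hi - lo}")]  # never negative
        if lo and hi % lo == 0:                           # never a fraction
            candidates.append((hi // lo, f"{hi}/{lo}={hi // lo}"))
        for value, step in candidates:
            sub = solve(rest + [value], target)
            if sub is not None:
                return [step] + sub
    return None

print(solve([1, 2, 3, 4, 5, 25], 150))  # one valid path, e.g. via 25 * 6
```
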

AI

Khan Academy Chief Says GPT-4 is Ready To Be a Tutor (axios.com) 58

For all the high-profile examples of ChatGPT getting facts and even basic math wrong, Khan Academy founder Sal Khan says the latest version of the generative AI engine makes a pretty good tutor. From a report: "This technology is very powerful," Khan told Axios in a recent interview. "It's getting better." Khan Academy was among the early users of GPT-4 that OpenAI touted when it released the updated engine. This week, two more school districts (Newark, New Jersey, and Hobart, Indiana) are joining the pilot of Khanmigo, the AI-assisted tutor. With the two new districts, a total of 425 teachers and students are testing Khanmigo.

The chatbot works much like a real-life or online tutor, looking at students' work and helping them when they get stuck. In a math problem, for example, Khanmigo can detect not just whether a student got an answer right or wrong, but also where they may have gone astray in their reasoning. ChatGPT and its brethren have been highly controversial -- especially in education, where some schools are banning the use of the technology. Concerns range from the engines' propensity to be confidently wrong (or "hallucinate") to worries about students using the systems to write their papers. Khan said he understands these fears, but also notes that many of those criticizing the technology are also using it themselves and even letting their kids make use of it. And, for all its flaws, he says today's AI offers the opportunity for more kids -- in both rich and developing countries -- to get personalized learning. "The time you need tutoring is right when you are doing the work, often when you are in class," Khan said.

Math

Mathematicians Invent New 'Einstein' Shape (theguardian.com) 50

One of mathematics' most intriguing visual mysteries has finally been solved -- thanks to a hobbyist in England. From a report: The conundrum: is there a shape that can be arranged in a tile formation, interlocking with itself ad infinitum, without the resulting pattern repeating over and over again? In nature and on our bathroom walls, we typically see tile patterns that repeat in "a very predictable, regular way," says Dr Craig Kaplan, an associate professor of computer science at the University of Waterloo in Ontario. What mathematicians were interested in were shapes that "guaranteed non-periodicity" -- in other words, there was no way to tile them so that the overall pattern created a repeating grid. Such a shape would be known as an aperiodic monotile, or "einstein" shape, from the German "ein Stein," meaning roughly "one stone" (and conveniently echoing the name of a certain theoretical physicist).

"There's been a thread of beautiful mathematics over the last 60 years or so searching for ever smaller sets of shapes that do this," Kaplan says. "The first example of an aperiodic set of shapes had over 20,000 shapes in it. And of course, mathematicians worked to get that number down over time. And the furthest we got was in the 1970s," when the Nobel prize-winning physicist Roger Penrose found pairs of shapes that fit the bill. Now, mathematicians appear to have found what they were looking for: a 13-sided shape they call "the hat." The discovery was largely the work of David Smith of the East Riding of Yorkshire, who had a longstanding interest in the question and investigated the problem using an online geometry platform. Once he'd found an intriguing shape, he told the New York Times, he would cut it out of cardstock and see how he could fit the first 32 pieces together. "I am quite persistent but I suppose I did have a bit of luck," Smith told the Guardian in an email.

Math

A Geometric Shape That Does Not Repeat Itself When Tiled (phys.org) 72

IHTFISP shares a report from Phys.Org: A quartet of mathematicians, including a hobbyist from Yorkshire, England, and researchers from the University of Cambridge, the University of Waterloo and the University of Arkansas, has discovered a 2D geometric shape that does not repeat itself when tiled. David Smith, Joseph Samuel Myers, Craig Kaplan and Chaim Goodman-Strauss have written a paper describing how they discovered the unique shape and possible uses for it. Their full paper is available on the arXiv preprint server. [...]

The shape has 13 sides and the team refers to it simply as "the hat." They found it by first paring down possibilities using a computer and then by studying the resulting smaller sets by hand. Once they had what they believed was a good possibility, they tested it using a combinatorial software program -- and followed that up by proving the shape was aperiodic using a geometric incommensurability argument. The researchers close by suggesting that the most likely application of the hat is in the arts.
