New AI Math Exam Tests Machines on Unsolved Problems


Top mathematicians have launched “First Proof,” a challenge designed to rigorously test the mathematical capabilities of artificial intelligence. The exam presents AI systems with actual problems drawn directly from current research, none of which has a published solution, and gives them one week to find answers. This marks a significant step beyond existing tests, which often rely on pre-existing datasets or competition problems.

The Problem with Current AI Math Tests

Previous attempts to gauge AI’s mathematical prowess have been flawed. While models like Google’s Gemini Deep Think have achieved high scores on the International Mathematical Olympiad, these tests use standardized problems that don’t mirror real research. Furthermore, some AI-generated “solutions” have turned out to be rediscoveries of obscure, previously published proofs: essentially sophisticated literature searches masquerading as original work. As Yale professor Daniel Spielman notes, many reported breakthroughs are announced by the very companies developing the AI, raising questions about objectivity.

First Proof: A Controlled Experiment

The First Proof initiative aims to correct these issues. Eleven leading mathematicians, including a Fields Medal winner, crafted original problems that have not been published and so should not appear in any AI training data. The solutions are encrypted and will be revealed on February 13, which keeps the answers out of reach during the test.

The problems aren’t designed to be groundbreaking theorems, but rather “lemmas”: small, essential steps in larger proofs. These are the kinds of tedious yet crucial calculations that consume mathematicians’ time. Solving them would demonstrate AI’s potential to accelerate research by automating these foundational tasks.

Why This Matters: The Future of AI in Mathematics

The focus on practical utility over flashy results is key. Mathematician Andrew Sutherland suggests that AI’s near-term impact will be felt not in solving grand unsolved problems, but in becoming an indispensable tool for working mathematicians. If AI can reliably handle the “grunt work” of theorem proving, it could free up researchers to focus on more creative and conceptual tasks.

“This may be the year when a lot more people start paying attention”
– Andrew Sutherland, MIT

First Proof isn’t just a test; it’s a benchmark for the future of AI-assisted mathematics, with the potential to reshape how research is conducted.
