Can AI Solve Real Math Proofs? Researchers Put Generative Models to the Test


In 2026, the question isn’t whether machines can outthink us, but where. When Deep Blue defeated chess champion Garry Kasparov in 1997, the real question was never about raw processing power. Today, generative AI is forcing a similar reckoning in a field far more abstract: mathematics. Researchers are probing whether these models can actually advance the field, not just solve textbook problems.

The Difference Between Calculation and Discovery

Most people associate math with numbers and formulas. But at the research level, mathematics is about proving statements true or false – often statements about concepts too complex to visualize. Unlike homework, where the answer is a single value, research mathematicians work with abstract objects in many dimensions, establishing their properties through rigorous argument. This is not a matter of computation, but of conceptual understanding.

AI has already performed impressively on competitions like the International Mathematical Olympiad and has even “solved” certain Erdős problems. However, these benchmarks are misleading: they resemble homework more than cutting-edge research. Just as a calculator is not a mathematician, passing a test doesn’t equate to genuine mathematical insight. The core question is whether AI can fundamentally change how math is done, not just speed up existing processes.

The First Proof Challenge: A Rigorous Test

To probe AI’s true capabilities, a team of 11 mathematicians launched the “First Proof” challenge. They posed actual unsolved research problems, drawn from their own upcoming papers and broken down into smaller “lemmas” (intermediate sub-proofs). Because the results were unpublished, the questions couldn’t appear in AI training data, ruling out simple regurgitation. The goal was simple: could AI contribute to original mathematical discovery?

Early results are mixed. Initial tests with publicly available chatbots yielded only two correct answers out of ten. However, major AI companies, using proprietary models with human oversight, achieved significantly better scores: OpenAI claimed six correct solutions, and Google Gemini reported similar success. A community of math enthusiasts also contributed, pushing the boundaries of what’s possible with LLMs.

The Rise of AI Collaboration: Scaffolding and Iteration

The most striking finding was the gap between public and private AI performance: in-house models vastly outperformed openly accessible ones. But another trend emerged as well, often called “scaffolding.” Rather than relying on a single LLM, researchers orchestrate multiple AI interactions, using models to interrogate and refine each other’s work. This iterative process boosts accuracy but blurs the line between AI and human contribution.

19th-Century Math: A Style Problem?

Even when AI arrives at correct proofs, mathematicians notice a difference in style. AI solutions often resemble 19th-century methods – laborious, roundabout, and lacking elegance. True mathematical discovery involves creating new concepts that streamline understanding, a process AI has yet to master. Still, some AI-generated proofs have surprised researchers with their creativity, hinting at the potential for genuine breakthroughs.

The Future of AI in Mathematics

The First Proof team plans to continue the challenge with stricter controls, providing clearer insights into AI capabilities. The goal isn’t to replace mathematicians but to understand whether AI is a powerful tool or a revolutionary force. If AI can consistently produce original, elegant proofs, it could reshape the field. For now, the question remains open. The next rounds of testing will reveal whether AI can truly advance mathematics or simply accelerate existing methods.
