An AI from Google DeepMind can solve some International Mathematical Olympiad (IMO) questions on geometry almost as well as the best human contestants.
“The results of AlphaGeometry are stunning and breathtaking,” says Gregor Dolinar, the IMO president. “It seems that AI will win the IMO gold medal much sooner than was thought even a few months ago.”
The IMO, aimed at secondary school students, is one of the most difficult maths competitions in the world. Answering questions correctly requires mathematical creativity that AI systems have long struggled with. GPT-4, for instance, which has shown remarkable reasoning ability in other domains, scores 0 per cent on IMO geometry questions, while even specialised AIs struggle to answer as well as average contestants.
This is partly down to the difficulty of the problems, but it is also because of a lack of training data. The competition has been run annually since 1959, and each edition consists of just six questions. Some of the most successful AI systems, however, require millions or billions of data points. Geometrical problems in particular, which make up one or two of the six questions and involve proving facts about angles or lines in complicated shapes, are particularly difficult to translate to a computer-friendly format.
Thang Luong at Google DeepMind and his colleagues have bypassed this problem by creating a tool that can generate hundreds of millions of machine-readable geometrical proofs. When they trained an AI called AlphaGeometry using this data and tested it on 30 IMO geometry questions, it answered 25 of them correctly, compared with an estimated score of 25.9 for an IMO gold medallist based on their scores in the contest.
“Our [current] AI systems are still struggling with the ability to do things like deep reasoning, where we need to plan ahead for many, many steps and also see the big picture, which is why mathematics is such an important benchmark and test set for us on our quest to artificial general intelligence,” Luong told a press conference.
AlphaGeometry consists of two parts, which Luong compares to different thinking systems in the brain: a fast, intuitive system and a slower, more analytical one. The first, intuitive part is a language model, similar to the technology behind ChatGPT, called GPT-f. It has been trained on the millions of generated proofs and suggests which theorems and arguments to try next for a problem. Once it suggests a next step, a slower but more careful “symbolic reasoning” engine uses logical and mathematical rules to fully construct the argument that GPT-f has suggested. The two systems then work in tandem, switching between one another until a problem has been solved.
While this method is remarkably successful at solving IMO geometry problems, the answers it constructs tend to be longer and less “beautiful” than human proofs, says Luong. However, it can also spot things that humans miss. For example, it discovered a better and more general solution to a question from the 2004 IMO than was listed in the official answers.
Solving IMO geometry problems in this way is impressive, says Yang-Hui He at the London Institute for Mathematical Sciences, but the system is inherently limited in the mathematics it can use because IMO problems should be solvable using theorems taught below undergraduate level. Expanding the amount of mathematical knowledge AlphaGeometry has access to might improve the system or even help it make new mathematical discoveries, he says.
It would also be interesting to see how AlphaGeometry copes with not knowing what it needs to prove, as mathematical insight can often come from exploring theorems with no set proof, says He. “If you don’t know what your endpoint is, can you find within the set of all [mathematical] paths whether there is a theorem that is actually interesting and new?”
Last year, algorithmic trading company XTX Markets announced a $10 million prize fund for AI maths models, with a $5 million grand prize for the first publicly shared AI model that can win an IMO gold medal, as well as smaller progress prizes for key milestones.
“Solving an IMO geometry problem is one of the planned progress prizes supported by the $10 million AIMO challenge fund,” says Alex Gerko at XTX Markets. “It’s exciting to see progress towards this goal, even before we have announced all the details of this progress prize, which would include making the model and data openly available, as well as solving an actual geometry problem during a live IMO contest.”
DeepMind declined to say whether it plans to enter AlphaGeometry in a live IMO contest or whether it is expanding the system to solve other IMO problems not based on geometry. However, DeepMind has previously entered public competitions for protein folding prediction to test its AlphaFold system.