In Feb 2022, Paul Christiano wrote: Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025.... My final prediction (after significantly revising my guesses after looking up IMO questions and medal thresholds) was:
I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)
Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
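For readers parsing the committed procedure quoted above, here is a minimal illustrative sketch (not part of the market; the function name and subject labels are just hypothetical stand-ins) of how the "hardest problem" selection rule reads:

```python
# Illustrative sketch of the committed "hardest problem" rule quoted above.
# Not part of the market; subject labels are hypothetical stand-ins.
def hardest_problem(p3_subject: str, p6_subject: str) -> int:
    """Return which problem number (3 or 6) the prediction treats as 'hardest'."""
    if p6_subject == "geometry":
        return 3  # case (i): problem 6 is geo
    if p3_subject == "combinatorics" and p6_subject == "algebra":
        return 3  # case (ii): problem 3 is combinatorics and problem 6 is algebra
    return 6      # default: problem 6 is the hardest

# e.g. hardest_problem("combinatorics", "algebra") -> 3
# e.g. hardest_problem("number theory", "combinatorics") -> 6
```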
Eliezer spent less time revising his prediction, but said (earlier in the discussion):
My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?
EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probability of the technical capability existing by end of 2025
So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.
Resolves to YES if either Eliezer or Paul acknowledge that an AI has succeeded at this task.
Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b
Update: As noted by Paul, the qualifying years for the IMO competition are 2023, 2024, and 2025.
Update 2024-06-21: Description formatting
Update 2024-07-25: Changed title from "by 2025" to "by the end of 2025" for clarity
Leading LLMs get <5% scores on USAMO (which selects participants for the IMO): https://arxiv.org/abs/2503.21934
I keep coming back to this market and wanting to bet but ultimately deciding against it. IMO takes place in ~July, and as anyone who's been in that milieu knows, solutions go up on various websites pretty much the next day, including many superficial variants where the key idea is the same.
Therefore, the deadline of EoY 2025 is nuts. It's way too long; models available by then will definitely have seen the questions. So this market will come down to whether two random people think the eval was "fair", and there's no way to be >80% confident in that.
I believe this market is much closer to the spirit of "can AI do IMO?": https://manifold.markets/jack/will-an-ai-win-a-gold-medal-on-imo
@pietrokc I don't think Yudkowsky and Christiano are "two random people"
To me, the biggest uncertainties (and the reason why I don't bet) are whether the IMO organizers will select anti-AI problems this year, and whether top AI orgs will invest a lot into this specific benchmark
@Lorenzo I don't know much about either of them so to me they're just random people.
I'm leaning pretty heavily NO on the substance of this question. The scenario I envision is that in Dec 2025 models will definitely be able to gold IMO 2025, because they will have been trained on it. Then, some AI company might allege they "filtered out" IMO 2025 from training (as if this was possible to do reliably). At that point, I don't know how Yudkowsky or Christiano will react. Even worse, this market is biased to resolve YES, because it only requires one of them to say YES.
I'm definitely not >20% sure they'll both give the correct answer of NO in the scenario I described above.
@pietrokc The difference between Eliezer's and Paul's statements is significant here, yeah. "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" is very different to "the technical capability existing by end of 2025".
Buuut, the market criteria outside of the quotes are much clearer, and only include AIs built before the IMO:
"So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025. Resolves to YES if either Eliezer or Paul acknowledge that an AI has succeeded at this task."
I'm assuming the market maker will only count Eliezer or Paul saying the AI achieved that statement, not them saying an AI achieved the IMO by EOY. Still some wiggle room there, though.
@pietrokc the market description says "AI made before the IMO", which I think precludes the it's-in-the-training-data condition
@JamesBaker3 @Grizzimo What you say is all fair, but I think there is a lot of fuzziness around when an LLM was "made". It's a very long pipeline, and companies are incentivized to fudge things and claim models were finished earlier than they really were if you account for RLHF etc. Like, I think it's very possible that some large company may release a model in October that it claims "has training data cutoff May 2025" (say), and it has still somehow seen the IMO questions.
Surely the AI must earn the gold medal under normal time constraints, right? Otherwise an "AI" that just enumerates theorems of ZFC will eventually solve all the problems.
The AlphaProof blog post (for which I'm unable to find a corresponding paper) describes an AI system that solved 4 of 6 problems; it claims some problems were solved in "minutes" and some took "up to three days".
https://garymarcus.substack.com/p/alphageometry2-impressive-accomplishment
Impressive indeed. However, does this count for this market if the problems are translated before being solved by the AI, and the result is not clearly human-readable?
@Zardoru Paul said that formal proofs count. https://manifold.markets/Austin/will-an-ai-get-gold-on-any-internat#QuXuluE8CGMjalzyqOx4
Why there is so much progress this year (and it is just the beginning): https://benjamintodd.substack.com/p/teaching-ai-to-reason-this-years
DeepMind claims its AI performs better than International Mathematical Olympiad gold medalists
https://techcrunch.com/2025/02/07/deepmind-claims-its-ai-performs-better-than-international-mathematical-olympiad-gold-medalists/?guccounter=1
https://arxiv.org/pdf/2502.03544
AlphaGeometry 2 solved 84% of the International Math Olympiad (IMO) problems from 2000-2024 (update: this is only the geometry problems)
@travelling_salesman oh yeah but that's just geometry, right? It's still really cool tho!
AlphaGeometry2 solves 42 out of 50 of all 2000-2024 IMO geometry problems, thus surpassing an average gold medallist for the first time
@jim it was already just barely short of gold last time. So I guess this improvement and the recent push on o1-style reasoning should be enough. I'll be pleasantly surprised if it doesn't.

@jim I've heard geometry is easy compared to e.g. combinatorics, which is the real beast that AIs couldn't even come close to solving last year
@travelling_salesman Note, this was announced back in July. See https://x.com/GoogleDeepMind/status/1816498082860667086
@JeremiahEngland ~they published an updated version recently. Confusingly it's also named AlphaGeometry2~
Update: this was the same system used for the 2024 IMO silver, which apparently gets gold-level results on geometry problems. I guess they didn't release a separate paper on AG2 earlier.
https://arxiv.org/abs/2502.03544
