In what year will AI achieve a score of 95% or higher on the LiveCodeBench leaderboard?

Ṁ25

2033

Background

LiveCodeBench is a holistic, contamination-free benchmark that continuously harvests fresh coding problems from LeetCode, AtCoder and Codeforces contests, then replays them inside a deterministic harness to prevent data leakage and measure real generalisation. Each evaluation window fixes a start and end date (the current one spans 454 problems released 1 Aug 2024 → 1 May 2025) and scores models by Pass@1: the share of tasks whose very first generated solution compiles and passes hidden tests. The benchmark also tags every problem easy / medium / hard and reports per-tier accuracy, revealing where models still stumble.
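As a rough illustration of the scoring described above, the sketch below computes overall and per-tier Pass@1 from a list of per-problem outcomes. The function name `pass_at_1`, the `(tier, passed)` input shape, and the toy data are assumptions made for this example; they are not taken from the LiveCodeBench codebase.

```python
from collections import defaultdict


def pass_at_1(results):
    """Return (overall, per_tier) Pass@1 as fractions in [0, 1].

    `results` is a list of (tier, passed) pairs, one per problem, where
    `passed` says whether the first generated solution passed all tests.
    This input format is hypothetical, chosen only for illustration.
    """
    if not results:
        raise ValueError("no results to score")
    solved = defaultdict(int)
    total = defaultdict(int)
    for tier, passed in results:
        total[tier] += 1
        solved[tier] += int(passed)
    overall = sum(solved.values()) / len(results)
    per_tier = {tier: solved[tier] / total[tier] for tier in total}
    return overall, per_tier


# Hypothetical toy data, not real leaderboard results.
sample = [
    ("easy", True), ("easy", True),
    ("medium", True), ("medium", False),
    ("hard", False),
]
overall, per_tier = pass_at_1(sample)
print(f"overall={overall:.1%}", per_tier)
```

Only the first attempt per problem counts here, which is what separates Pass@1 from metrics that allow several samples per task.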

State of play (July 2025):

O4-Mini-High: 80.2% [Pass@1]

Why 95% matters

True zero-shot coding mastery. Pass@1 at 95% means the agent almost never needs retries, tool chains, or human edits, mirroring the reliability expected of senior engineers.
Contamination guard. Because tasks are time-filtered, near-perfect accuracy demonstrates genuine problem-solving, not memorisation of training-set snippets.
Broader skill coverage. LiveCodeBench evaluates code generation, self-repair and test-output prediction; a model that aces code generation Pass@1 is likely strong on the other tracks too, hinting at near-general software autonomy.

Resolution Criteria

The market resolves to the first calendar year in which ALL of the following conditions are satisfied:

Leaderboard evidence – The public LiveCodeBench leaderboard lists a run with Pass@1 ≥ 95%, computed over all problems in the active evaluation window (currently 454 tasks).
Independent verification – The claim is confirmed by either
- (a) a peer-reviewed or widely-cited paper (e.g. arXiv, NeurIPS, ICSE) that releases evaluation logs, or
- (b) acceptance by the LiveCodeBench maintainers as an official leaderboard entry.
Autonomy – After evaluation starts, no human may alter code; unlimited compute, retrieval or tool use is allowed only if invoked automatically by the agent.
Expiry – If no qualifying run is verified by Jan 1, 2033, the market resolves “Not Applicable.”
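The conditions above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions: the `Run` record and its field names are hypothetical, "by Jan 1, 2033" is taken to include that date, and the resolving year is taken to be the year in which verification happened. None of this is an official resolution script.

```python
from dataclasses import dataclass
from datetime import date

EXPIRY = date(2033, 1, 1)  # assumed inclusive ("verified by Jan 1, 2033")


@dataclass
class Run:
    """Hypothetical record of one leaderboard run."""
    solved: int                    # problems passed on the first attempt
    total: int                     # problems in the active evaluation window
    verified_on: date              # paper release or maintainer acceptance
    independently_verified: bool   # criterion (a) or (b) is met
    human_edits_after_start: bool  # True violates the autonomy condition


def min_solved(total):
    """Smallest solved count with solved / total >= 95% (integer arithmetic)."""
    return -(-95 * total // 100)  # ceiling of 0.95 * total


def qualifies(run):
    return (
        run.solved >= min_solved(run.total)
        and run.independently_verified
        and not run.human_edits_after_start
        and run.verified_on <= EXPIRY
    )


def resolve(runs):
    """Return the first qualifying calendar year, or "Not Applicable"."""
    years = [run.verified_on.year for run in runs if qualifies(run)]
    return min(years) if years else "Not Applicable"


print(min_solved(454))  # threshold for the current 454-problem window
```

For the current 454-problem window, 95% corresponds to at least 432 first-attempt passes, since 431 / 454 is just under 95%.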

This question is managed and resolved by Manifold.

#Technology

#AI

#Technical AI Timelines

#OpenAI

#AI Impacts
