In what year will AI achieve a score of 95% or higher on the LiveCodeBench leaderboard?

Background

LiveCodeBench is a holistic, contamination-free benchmark that continuously collects fresh coding problems from LeetCode, AtCoder and Codeforces contests, then replays them inside a deterministic harness to prevent data leakage and measure real generalisation. Each evaluation window fixes a start and end date (the current one spans 454 problems released 1 Aug 2024 → 1 May 2025) and scores models by Pass@1: the share of tasks whose very first generated solution compiles and passes the hidden tests. The benchmark also tags every problem easy / medium / hard and reports per-tier accuracy, revealing where models still stumble.
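As a rough illustration of the scoring described above, the sketch below computes Pass@1 and per-tier accuracy from a list of per-task outcomes. It is not the LiveCodeBench harness: the record layout (`task`, `tier`, `passed`) and the toy data are assumptions made up for this example.

```python
from collections import defaultdict

def pass_at_1(results):
    """Share of tasks whose first generated solution passed all hidden tests."""
    return sum(1 for r in results if r["passed"]) / len(results)

def per_tier_accuracy(results):
    """Pass@1 broken down by difficulty tier (easy / medium / hard)."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for r in results:
        totals[r["tier"]] += 1
        passed[r["tier"]] += int(r["passed"])
    return {tier: passed[tier] / totals[tier] for tier in totals}

# Hypothetical toy outcomes: one record per task, first attempt only.
results = [
    {"task": "a", "tier": "easy", "passed": True},
    {"task": "b", "tier": "easy", "passed": True},
    {"task": "c", "tier": "medium", "passed": True},
    {"task": "d", "tier": "hard", "passed": False},
]

overall = pass_at_1(results)
by_tier = per_tier_accuracy(results)
print(overall, by_tier)
```

Only the first attempt per task is counted, so a later retry that succeeds would not change the score.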

State of play (July 2025):

O4-Mini-High: 80.2% [Pass@1]

Why 95% matters

  • True zero-shot coding mastery. Pass@1 at 95% means the agent almost never needs retries, tool chains, or human edits, mirroring the reliability expected of senior engineers.

  • Contamination guard. Because tasks are time-filtered, near-perfect accuracy indicates genuine problem-solving, not memorisation of training-set snippets.

  • Broader skill coverage. LiveCodeBench evaluates code generation, self-repair and test-output prediction; a model that aces code generation Pass@1 is likely strong on the other tracks too, hinting at near-general software autonomy.

Resolution Criteria

The market resolves to the first calendar year in which ALL of the following conditions are satisfied:

  1. Leaderboard evidence – The public LiveCodeBench leaderboard lists a run with Pass@1 ≥ 95% on all problems in the active evaluation window (currently 454 tasks).

  2. Independent verification – The claim is confirmed by either

    • (a) a peer-reviewed or widely-cited paper (e.g. arXiv, NeurIPS, ICSE) that releases evaluation logs, or

    • (b) acceptance by the LiveCodeBench maintainers as an official leaderboard entry.

  3. Autonomy – After evaluation starts, no human may alter code; unlimited compute, retrieval or tool use is allowed only if invoked automatically by the agent.

  4. Expiry – If no qualifying run is verified by Jan 1, 2033, the market resolves “Not Applicable.”
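The criteria above can be sketched as a small resolution check. This is an illustrative reading of the rules, not an official resolver: the `Run` fields are hypothetical, and the expiry rule is simplified to "a qualifying run dated before 2033". On the current 454-task window, a 95% score requires at least 432 solved tasks, since 431/454 is about 94.9%.

```python
from dataclasses import dataclass

@dataclass
class Run:
    # All fields are hypothetical names chosen for this sketch.
    year: int                       # calendar year the run was verified
    solved: int                     # tasks passed on the first attempt
    total: int                      # tasks in the active evaluation window
    independently_verified: bool    # criterion 2 (a) or (b)
    human_edited_after_start: bool  # a violation of criterion 3

def min_solved(total, threshold_pct=95):
    """Smallest solved count with solved / total >= threshold_pct / 100."""
    return -(-threshold_pct * total // 100)  # integer ceiling division

def qualifies(run):
    return (
        run.solved >= min_solved(run.total)
        and run.independently_verified
        and not run.human_edited_after_start
    )

def resolve(runs, expiry_year=2033):
    """First year with a qualifying run, else 'Not Applicable' (simplified)."""
    years = [r.year for r in runs if qualifies(r) and r.year < expiry_year]
    return str(min(years)) if years else "Not Applicable"

# Hypothetical runs, invented purely to exercise the rules.
runs = [
    Run(2026, 431, 454, True, False),   # just under 95%
    Run(2027, 440, 454, False, False),  # not independently verified
    Run(2028, 432, 454, True, False),   # qualifies
]
print(min_solved(454), resolve(runs))
```

Integer arithmetic is used for the threshold so that the 95% boundary is not affected by floating-point rounding.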
