Will Gemini outperform GPT-4 at mathematical theorem-proving? | Manifold

20 · Ṁ428 · Jan 1 · 62% chance

Based on speculation from https://youtu.be/tkqD9W5U9F4?t=468

To operationalize this, the question will resolve based on the LeanDojo benchmark (https://leandojo.org/), specifically its Pass@1 metric, where "The prover is given only one attempt and must find the proof within a wall time limit of 10 minutes."

GPT-4 is reported to achieve an accuracy of 28.8% on the "random" split of the test data in Table 2 of the LeanDojo paper (https://arxiv.org/pdf/2306.15626.pdf).
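
As a rough illustration of the comparison described above, here is a minimal Python sketch of a Pass@1 computation and a check against the 28.8% GPT-4 figure. The `Attempt` record, the function names, and the reading of "outperform" as a strictly higher score are all assumptions made for illustration; none of this is LeanDojo's actual evaluation code, and the market description does not spell out a tie-breaking rule.

```python
from __future__ import annotations

from dataclasses import dataclass

GPT4_PASS_AT_1 = 28.8        # percent, "random" split, Table 2 of the LeanDojo paper
WALL_TIME_LIMIT_S = 10 * 60  # the 10-minute wall time limit quoted above


@dataclass
class Attempt:
    """One proof attempt on one theorem (hypothetical record format)."""
    theorem: str
    proved: bool
    wall_time_s: float


def pass_at_1(attempts: list[Attempt]) -> float:
    """Percentage of theorems proved in a single attempt within the time limit.

    Assumes `attempts` holds exactly one attempt per theorem.
    """
    if not attempts:
        raise ValueError("no attempts to score")
    solved = sum(
        1 for a in attempts
        if a.proved and a.wall_time_s <= WALL_TIME_LIMIT_S
    )
    return 100.0 * solved / len(attempts)


def outperforms_gpt4(gemini_pass_at_1: float) -> bool:
    # Assumption: "outperform" means strictly higher Pass@1 than GPT-4's figure.
    return gemini_pass_at_1 > GPT4_PASS_AT_1
```

For example, with four made-up attempts in which only one proof finishes inside the limit, `pass_at_1` returns 25.0 and `outperforms_gpt4(25.0)` is `False`. These numbers are illustrative only and are not results for any real model.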

This question closes when an evaluation of Gemini's performance on this task is brought to my attention.

This question is managed and resolved by Manifold.

#Technical AI Timelines

This has happened I think?

Or maybe no one's applied it to that benchmark yet

Which Gemini version?

Related questions

Will "Gemini [Ultra, 1.0] smash GPT-4 by 5x"?

Will GPT-5 perform better than o1 (not preview) at AIME 2024, Codeforces elo, GPQA, or the 2024 ioi?

Will any open source LLM with <20 billion parameters outperform GPT-4 on most language benchmarks by the end of 2024?

Will Google Gemini do as well as GPT-4 on Sparks of AGI tasks?

Will an open-source LLM beat or match GPT-4 by the end of 2024?

Which, if any, GPT-n will outperform AlphaGeometry merely via prompting, by 2030?
