How well will OpenAI's o1 (not o1-preview) do on the ARC prize when it's released if tested? | Manifold

Premium

8

Ṁ1989

Jan 1

33.76 expected

The creators of the ARC prize have already tested OpenAI's new o1-preview and o1-mini models on it. The non-preview version of o1 performed substantially better on OpenAI's math benchmarks (see below) and will apparently be released before the end of the year. Assuming it's tested on the ARC prize, how well will the full version of o1 perform?

Note 1: I usually don't participate in my own markets, but in this case I am participating since the resolution criteria are especially clear.

Note 2: Ideally, the ARC prize team will test o1 under the same conditions. If they don't, I'll try to make a fair call on whether any unofficial testing matches those conditions closely enough to count. If there's uncertainty, I'll err on the side of resolving N/A.

This question is managed and resolved by Manifold.

#Technology

#Technical AI Timelines

The preview got like 25, IIRC?

@MartinVlach 21% actually, 12 percentage points more than 4o

Which set? Public eval or semi private?

@Usaar33

> OpenAI o1-preview and o1-mini both outperform GPT-4o on the ARC-AGI public evaluation dataset. o1-preview is about on par with Anthropic's Claude 3.5 Sonnet in terms of accuracy but takes about 10X longer to achieve similar results to Sonnet.

Public, same as the evaluation I linked.

Related questions

On which day will OpenAI’s AI o1 be available to the public? (exact day)

Will Anthropic, Google, xAI or Meta release a model that thinks before it responds like o1 from OpenAI by EOY 2024?

33% chance (+6% 1d)

Will OpenAI o1 (or any direct iteration) get gold on any International Math Olympiad by the end of 2025?

Will OpenAI have the most accurate LLM across most benchmarks by EOY 2024?

Will OpenAI make o1 pro mode available on the API before 2026?

Will OpenAI release o2 as part of the 12 days of Christmas?

Will anyone be able to get OpenAI’s new model o1 to leak its system message by EOY 2024?

Which of these companies will release a model that thinks before it responds like o1 from OpenAI by EOY 2024?

Will Anthropic release a model that thinks before it responds like o1 from OpenAI by EOY 2024?

Will OpenAI release o2 before 2026?
