Will an autonomous agent resolve 90% of tasks on SWE-bench by 2025? | Manifold


Basic · 13 · Ṁ1246 · Dec 31 · 19% chance


Resolves "Yes" if, at the time of closure, there is an entry on the SWE-bench leaderboard (https://www.swebench.com/) with a score greater than or equal to 90%.
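
As a minimal sketch of the resolution rule above, assuming the leaderboard scores have already been collected as percentages (retrieving them from the leaderboard is not shown, and the function name `resolves_yes` is hypothetical):

```python
from typing import Iterable

THRESHOLD_PERCENT = 90.0  # threshold stated in the resolution criterion


def resolves_yes(leaderboard_scores: Iterable[float],
                 threshold: float = THRESHOLD_PERCENT) -> bool:
    """Return True if any leaderboard entry scores at least `threshold` percent."""
    return any(score >= threshold for score in leaderboard_scores)


# Illustrative scores only, not real leaderboard data.
print(resolves_yes([33.2, 71.7, 90.0]))  # True: 90.0 meets the threshold
print(resolves_yes([33.2, 89.9]))        # False: no entry reaches 90
```

The comparison is inclusive, so a score of exactly 90% counts. This sketch looks only at the number on the leaderboard and does not address whether a score was contaminated or gamed.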


This question is managed and resolved by Manifold.

#Technical AI Timelines


What if there's evidence that the training data is contaminated with the SWE-bench tasks somehow?

@DavidFWatson That's an excellent question. Let's explore possibilities:

This could be included in the question, i.e. what matters is only the number on the benchmark, regardless of whether it was gamed.
I could wait a certain amount of time to check whether any controversy emerges. One month feels like it would be safe. The question then resolves "Yes" if, one month after the deadline, I judge that there is no consensus that the number was gamed. This makes the question more informative.

Related questions

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

53% chance (-3% 1d)

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2027?

AI resolves at least X% on SWE-bench WITH assistance, by 2028?

Will an AI SWE model score higher than 50% on SWE-bench in 2024?

Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2026?

Will an AI agent system be able to score at least 40% on level 3 tasks in the GAIA benchmark before 2025?

Will >50% of the tasks in the WebArena benchmark be solved by EOY 2024?

AI resolves at least X% on SWE-bench assistance, by 2025?

Will an autonomous personal AI agent, capable of managing daily affairs, be available by the end of 2024?
