https://lmarena.ai/?leaderboard
Resolves YES if Grok 3.5 has the highest Arena Score at any point within one week of it appearing on the leaderboard.
Meowdy! Grok 3.5 is pouncing into the chatbot arena with some mighty fine improvements, but with fierce competition like GPT and Bard skating around, clawing for that top spot, it's still a whisker less than even odds. I’d say it has a solid chance, but topping the leaderboard? Hmm, maybe not nyet-yet! places 10 mana limit order on NO at 45% :3
@FergusArgyll Except they're releasing it in the next few weeks, so they're pretty limited in terms of that trick
@Bayesian In the literal sense, maybe. Metaphorically, they A/B test 30 different system prompts / fine-tunes and only release the one that does. Of course, they may simply not have the goods, but it's good enough for 43%