By 2028 will we be able to identify distinct submodules/algorithms within LLMs?
76% chance

Roughly: will we be able to examine an LLM and extract some identifiable submodule accomplishing an understandable task (e.g. "addition", "inference on some decision tree", or "quicksort")? For instance, it could be some set of neurons from layers L_1, ..., L_k that, when run on its own, executes the specified algorithm.

It must also be demonstrated that the LLM actually uses the submodule in some interpretable way. For example, if the module implements quicksort, a demonstration might be that modifying the module to implement reversed quicksort causes the LLM to produce reverse-sorted data when asked for sorted data.
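To make the intervention criterion concrete, here is a minimal sketch of the kind of test described above. It is only a toy: the "model" is a hypothetical pipeline of named Python functions standing in for an LLM, and the names `build_toy_model`, `run`, and `patch` are invented for illustration. The point is the shape of the check: swap the candidate submodule for a modified one and confirm the end-to-end output changes in the predicted way.

```python
from typing import Callable, Dict, List

Module = Callable[[List[int]], List[int]]

def build_toy_model() -> Dict[str, Module]:
    # Hypothetical stand-in for an LLM: a pipeline of named components.
    return {
        "parse": lambda xs: list(xs),
        "sort": lambda xs: sorted(xs),  # the candidate submodule
        "format": lambda xs: list(xs),
    }

def run(model: Dict[str, Module], xs: List[int]) -> List[int]:
    # Apply each component in a fixed order.
    out = xs
    for name in ("parse", "sort", "format"):
        out = model[name](out)
    return out

def patch(model: Dict[str, Module], name: str, replacement: Module) -> Dict[str, Module]:
    # Return a copy of the model with one component swapped out.
    patched = dict(model)
    patched[name] = replacement
    return patched

model = build_toy_model()
data = [3, 1, 2]

baseline = run(model, data)
reversed_model = patch(model, "sort", lambda xs: sorted(xs, reverse=True))
patched = run(reversed_model, data)

print("baseline:", baseline)
print("patched: ", patched)
```

In a real LLM the "component" would be a set of neurons or directions rather than a named function, and the hard part is showing that the edit changes only the targeted behavior.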

The work must be done for an LLM at least as capable as OPT-66B.

The work must identify at least 10 submodules, or identify at least one while proving that no others exist.

If it turns out that the question is ill-posed in a way that can't be fixed with some minor tweaks, I'll resolve N/A.

Up until 2026, I may refine the criteria here, either in response to feedback from predictors or because future research gives me a better way to ask the question.

1y

How well-defined do the submodules need to be?

I am sure that it's possible to find subnetworks that are activated more for certain types of tasks, but I don't expect these to be cleanly demarcated. I expect there to be a lot of nodes and edges that partially contribute, where if you exclude all of these partial-contributions, the network can't do the task, but if you include all of them, you're including most of the network.
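A toy illustration of that tradeoff, with entirely hypothetical numbers: suppose each node has a scalar "contribution" to the task, and a candidate submodule is every node above some threshold. The function name `circuit_tradeoff` and the contribution values are made up for this sketch, not taken from any real model.

```python
def circuit_tradeoff(contributions, threshold):
    """Return (fraction of nodes kept, fraction of total effect kept)."""
    kept = [c for c in contributions if c >= threshold]
    return len(kept) / len(contributions), sum(kept) / sum(contributions)

# Hypothetical per-node contributions: 2 strong nodes and 98 weak ones.
contributions = [10.0, 10.0] + [0.5] * 98

for threshold in (5.0, 0.1):
    nodes, effect = circuit_tradeoff(contributions, threshold)
    print(f"threshold={threshold}: {nodes:.0%} of nodes, {effect:.0%} of effect")
```

With numbers like these, a strict threshold gives a small module that captures only a minority of the effect, while a loose one captures everything but is just the whole network.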

1y

@jonsimon Most likely that would resolve NO, but it would depend on exactly how tangled up everything is.

1y

@vluzko If it doesn't work in the neuron basis, but does work in a learned basis discovered by e.g. using sparse autoencoders, does that count?

Also, how complex does the "module" have to be? Would a "module" that does date math (e.g. "today is January 19th, in exactly 2 weeks it will be") count?
