In early 2028, will an AI be able to generate a full high-quality movie to a prompt?
Premium · 3.5k traders · Ṁ8.1m · 2028 · 38% chance

EG "make me a 120 minute Star Trek / Star Wars crossover". It should be more or less comparable to a big-budget studio film, although it doesn't have to pass a full Turing Test as long as it's pretty good. The AI doesn't have to be available to the public, as long as it's confirmed to exist.


They say "consistent characters" and then show 4 different cartoon skunks. They show a clip with dialogue, then don't even mention it.

Don't mind me, I gotta pay rent 😔🙏 https://manifold.markets/NeoMalthusian/sp-500-drops-8-by-april-2025-postli

I think 10 million dollars of inference compute in early 2028 can produce a movie that would make more than 10 million dollars. But... this is not going to be true if there are many movies produced and released like this at the same time.

And for that amount of money you may as well include human authorship at some level of the process. So if someone spends 10 million dollars on a "fully automated" movie around that time, they're probably doing it as a stunt more than as a good business bet

10 million dollars of inference compute in early 2028 with SOTA models is going to be something very capable. I think a lot of people are not taking into account the plausibility of compute scaling in the entertainment industry, as weird as that sounds right now.

Most of the people commenting on this question seem to be thinking in terms of the same amount of inference compute used to generate a 10 second Kling video in 2025. The max dollar amount spent on video generation of single videos is going to go up by a lot in the next three years.

Like what is the maximum amount of inference compute, in dollars, that someone has spent on generating a single video with AI right now? 5000 dollars? 10k?

What is the doubling time for that number? 5 months? 3 months? 1 month?

Because if it's doubling every 3 months then that graph goes way past 10 million dollars by Jan 2028
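For concreteness, here is a minimal sketch of that extrapolation; the $10k starting point and the mid-2025 start date are assumptions, not data:

```python
# Rough projection of max inference spend on a single AI-generated video,
# assuming a ~$10k starting point around mid-2025 (both numbers are guesses).
start_spend = 10_000
months_until_jan_2028 = 31  # roughly mid-2025 -> Jan 2028

for doubling_months in (5, 3, 1):
    doublings = months_until_jan_2028 / doubling_months
    projected = start_spend * 2 ** doublings
    print(f"doubling every {doubling_months} months -> ~${projected:,.0f} by Jan 2028")

# With a 3-month doubling time that is ~10 doublings, so the projection
# passes $10M per video before Jan 2028, matching the claim above.
```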

Think people, think!

I think it is obvious that this question is going to resolve as YES

How long until most of the 30-second commercials you see are AI-generated? 18 months?

@MalachiteEagle ChatGPT says 80-90% of a TV commercial's budget is on purchasing air time, not production of the commercial itself. So a free AI would still need to be 80-90% as good as humans (as measured in conversions/airtime) in order to be economically viable. If the AI commercial costs half as much to produce, it needs to be 90-95% as good.
I don't know if we'll get there in 18 months, but we're certainly not there now.
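That break-even arithmetic can be written out explicitly; a minimal sketch, taking the 80-90% airtime share as given and ignoring everything else:

```python
# How effective (relative to a human-made ad, in conversions) does an AI-made
# commercial need to be to match the human ad's conversions per dollar?
def required_effectiveness(airtime_share: float, ai_production_ratio: float) -> float:
    """airtime_share: fraction of the total budget spent on air time.
    ai_production_ratio: AI production cost as a fraction of human production cost."""
    production_share = 1.0 - airtime_share
    ai_total_cost = airtime_share + production_share * ai_production_ratio
    return ai_total_cost  # fraction of human-ad conversions needed to break even

for share in (0.80, 0.90):
    free_ai = required_effectiveness(share, 0.0)
    half_cost_ai = required_effectiveness(share, 0.5)
    print(f"airtime {share:.0%}: free AI needs {free_ai:.0%}, half-cost AI needs {half_cost_ai:.0%}")
# airtime 80%: free AI needs 80%, half-cost AI needs 90%
# airtime 90%: free AI needs 90%, half-cost AI needs 95%
```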

@GG you're pointing to the most expensive commercials though: the ones on TV. Think of all those localized and super targeted commercials on YouTube. Quite a lot of that will be AI-generated very soon.

@GG most people don't know how to use adblock. Think of all the crappy 30-second ad videos they're seeing on their phone etc

@GG Disruption starts at the lower end, not the higher end. If in 18 months' time the majority of these are AI-generated, even if these are the lowest-tier ads, that still suggests awesome amounts of money flowing into AI video gen.

The 3-month doubling model starting from 10k dollars suggests someone will have spent 1 million dollars on a single AI-generated video by November 2026. I think this is highly plausible.

Bear in mind this video was from November 2024: https://www.youtube.com/watch?v=4RSTupbfGog

@MalachiteEagle Your model suggests that an AI video cost $3,900 in November 2024. The Real Magic Coca-Cola ad, with a typical production budget, would have cost $500,000 to $2 million.
So either
A) Coca-Cola spent a huge amount of money on an AI ad, 2 OOMs more than your model suggests.
B) The vast majority of the Real Magic production budget was spent on humans and traditional video editing expenses.

We haven't seen another big budget AI commercial. I suspect the Real Magic commercial was done for the novelty.
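For what it's worth, the $3,900 figure looks like the same 3-month doubling model run backwards; a minimal sketch, assuming the $10k "max spend today" figure refers to roughly March 2025 (that date is my guess):

```python
spend_now = 10_000       # assumed max single-video inference spend, ~March 2025
months_back = 4          # back to November 2024
doubling_months = 3

implied_spend = spend_now / 2 ** (months_back / doubling_months)
print(f"~${implied_spend:,.0f}")  # ~$3,969, close to the $3,900 quoted above
```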

@GG I suspect they just did multiple calls to whatever their "Real Magic AI" thing was and selected the best versions. I think 4k dollars of inference compute sounds pretty close. The models they're using are too low-quality to motivate spending more than that.

Remember I am talking about inference compute. It's likely that Coca-Cola sent much more money than that to Bain & Company, but the actual expenditure on inference compute was tiny.

@MalachiteEagle

> Like what is the maximum amount of inference compute in dollar money someone has spent on generating a single video right now with AI?

For reference, this 3 minute video cost $16 (a full 1.5 hour movie would cost around $200). Given the quality, I don't think anyone would pay even $1 for such a movie.

Costs need to come down by 2 OOM or quality needs to dramatically improve. Most likely both will happen simultaneously, since as cost comes down it's easier to experiment leading to higher quality.

If we imagine a curve like this:

Where we ask "what is the longest video clip that AI can generate such that a human would enjoy watching it at a 50% rate?", I would say that we are currently around the 30-second mark.

Suppose the doubling rate is exactly the same (1 doubling every 7 months), then we will have 5 doublings = a factor of 32 between now and early 2028.

So (take this with a huge grain of salt), I would expect that in 2028 we will be able to generate a video about 15 minutes long, and we wouldn't be able to generate a "movie" (2 hours) until 1.5 years after that (late 2029).
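A minimal sketch of that extrapolation, with the 30-second starting point, the mid-2025 start date, and the 7-month doubling time all taken as rough assumptions:

```python
from math import log2

current_length_s = 30        # assumed "enjoyable at a 50% rate" clip length today
doubling_months = 7
months_to_early_2028 = 35    # roughly mid-2025 -> early 2028

doublings = months_to_early_2028 / doubling_months          # 5 doublings
length_2028_min = current_length_s * 2 ** doublings / 60
print(f"early 2028: ~{length_2028_min:.0f} minute videos")  # ~16 minutes

extra_doublings = log2(120 / length_2028_min)               # ~2.9 more doublings for a 2-hour film
print(f"2-hour film: ~{extra_doublings * doubling_months:.0f} months after early 2028")  # ~20 months -> late 2029
```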

Of course, that curve could bend in either direction. For one thing, movie generation hasn't been a primary focus of most AI labs. (Runs tend to be in the millions of dollars for video models vs billions for the largest LLM runs). Movie generation will also get a huge boost from "fully multimodal out" models (similar to the 4o model that started the studio Ghibli craze). It's hard to know exactly when someone will train a fully multimodal video+images+text+sound input+output model, but I strongly suspect it won't happen until the inference cost for generating a video drops from a few dollars a minute (currently) to a few cents.

If I fully believed the curve above, I should probably sell my "yes" shares. But I think there is significant room for surprise to the upside.

The most likely reasons the curve would bend down instead of up would be: the AI boom fizzles and people stop investing billions of dollars in training models; or China invades Taiwan, TSMC is destroyed as a result, and Moore's law is set back by 2-4 years (Intel is behind, but they aren't that far behind).

Personally I have a hard time believing the AI boom will fizzle in the next 3 years. We still have a lot of headroom in how much we can improve reasoning-style models in ways that are clearly economically valuable, and that's not even touching on adjacent areas like robotics.

Manifold is also relatively optimistic about Taiwan.

I should also emphasize that $200 is how much it would cost to generate a 2-hour film using a python script that I wrote in 1 weekend. If for some reason you wanted to spend much more, you could get a significant boost in quality.

Here are some things you could do (in order of how practical they are):
* You could use a better video model (say Veo 2 instead of Wan; this would double the price)
* You could use a more expensive LLM (I would recommend GPT 4.5) for tasks like generating a script and converting it to a sequence of shots. (This would cost a few dollars per minute, so add a few hundred dollars for the entire film)

* You could fine-tune a LoRA on each character in your film in order to guarantee near-perfect subject coherence throughout the film. (This costs $6/character, so for a film with dozens of characters that appear in more than one shot this could be hundreds of dollars)

* You could use a video-reasoning model (such as Gemini) to rate video clips and choose the "best out of 10" or something like this; a rough sketch of this is included after the list below. (Assume we simply 10x the price of generating the film from $200 to $2000)

* You could fine-tune a LLM on the 2 tasks: writing a script and converting this script into a sequence of shots for a video-generation model. (The expensive part of this would be collecting the data. Fine tuning an LLM is comparatively cheap. Realistically, you're talking about hiring a team of data scientists, annotators, etc. which is $$$ or you can convince Furries to do it for free).
* You could train a fully multi-modal model for video+text+audio in/out. (This will cost between $6 million if you are DeepSeek and $30 billion if you are Meta AI)

* You could lobby Congress, arguing that we need a Manhattan Project-level effort to create AGI (price tag $500B - $7 trillion)

So, yeah, if anyone wants to do some insider trading: for a mere $7 trillion I can promise you this question resolves positive.
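To make the "best out of 10" option above concrete, here is a minimal sketch; `generate_clip` and `score_clip` are hypothetical stand-ins for a video-generation API and a video-reasoning model (nothing here is tied to a specific provider):

```python
import random

def generate_clip(shot_description: str) -> str:
    """Hypothetical stand-in for a call to a video-generation API (one clip per shot)."""
    return f"clip({shot_description}, seed={random.randint(0, 1_000_000)})"

def score_clip(clip: str, shot_description: str) -> float:
    """Hypothetical stand-in for a video-reasoning model rating how well a clip fits the shot."""
    return random.random()

def best_of_n(shot_description: str, n: int = 10) -> str:
    """Generate n candidate clips for one shot and keep the highest-scoring one.
    This multiplies generation cost by roughly n (e.g. $200 -> $2000 for a full film)."""
    candidates = [generate_clip(shot_description) for _ in range(n)]
    return max(candidates, key=lambda c: score_clip(c, shot_description))

shots = ["Establishing shot: a starship bridge at red alert", "Close-up: the captain reacts"]
film = [best_of_n(shot) for shot in shots]
print(film)
```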

@LoganZoellner Video length keeps coming up as a metric. I mostly agree with your points, except for this.

It's not like student films are all 5 minutes, and young professionals make 22 minute TV episodes, and only senior filmmakers make 90 minutes films. Most people who have directed a successful TV show I'd expect could also direct a feature length film without additional skills or training.

The difference in stringing together 6 shots or 600 is mostly context length of the script. Solving the consistency issues for characters, environments, and art style is about the same. Maybe there's some work at each order of magnitude (e.g. 6 > 60 > 600, we know getting beyond 1 is hard), but does anyone here expect a model that can chain 54 clips together well but not 55?

It seems like the wrong metric.

@robm

> It's not like student films are all 5 minutes, and young professionals make 22 minute TV episodes, and only senior filmmakers make 90 minutes films. Most people who have directed a successful TV show I'd expect could also direct a feature length film without additional skills or training.

How long a task takes is much more important for p(success) for current AI models than it is for humans. This is empirically observable, but also relates to the fact that LLMs have a finite "attention window", meaning they literally cannot do things once they get past a certain length.

Humans, by contrast, are capable of long-term planning. Meaning that we can break a task into smaller chunks and (as you have noted) complete a large task if it is composed of small tasks we can do.

It would be a huge breakthrough in AI if someone solved long-term planning. The fact that this is a theoretical possibility is one of several reasons why I think this curve has a better chance of bending upwards than downwards.


> does anyone here expect a model that can chain 54 clips together well but not 55?

Notice that "task duration" on the graph is on a Log-scale. This means the difference between 1 minute and 2 minutes is the same as the difference between 54 clips and 108 clips, not 55. I can easily believe someone might train a multimodal LLM with a big enough attention window to produce a 54 minute movie but not a 108 minute movie. Sora, for example, can produce 60 seconds of continuous video but not 120 seconds.

There are obviously "hacks" you could do to get around this (for example you could structure your film as a series of 10 minute episodes), but I think that the graph is morally true more so than literally true. That is to say, tasks which generally require about an hour to do require a certain level of intelligence (which AI models have only just reached). This could relate to things like: how well can you break tasks down into parts, how many things can you keep in your memory at the same time, how hard is it to keep track of how the parts relate to one-another...

These markets from @SamuelKnoche should get more attention IMO; it seems to me that this question factors pretty cleanly into script generation and the generation of the movie from that script.

/SamuelKnoche/in-early-2028-will-an-ai-be-able-to-lezeyhikb7

/SamuelKnoche/in-early-2028-will-an-ai-be-able-to-2x8ud6ld4f
