Will any image model be able to draw a pentagon before 2025?
294 traders · Ṁ100k · Jan 2 · 7% chance

Current image models are terrible at this. (That was tested on DALL-E 2, but DALL-E 3 is no better.)

The image model must get the correct number of sides on at least 95% of tries per prompt. Other details do not have to be correct. Any reasonable prompt that the average mathematically-literate human would easily understand as straightforwardly asking it to draw a pentagon must be responded to correctly. I will exclude prompts that are specifically trying to be confusing to a neural network but a human would get. Anything like "draw a pentagon", "draw a 5-sided shape", "draw a 5-gon", etc. must be successful. Basically I want it to be clear that the AI "understands" what a pentagon looks like, similar to how I can say DALL-E understands what a chair looks like; it can correctly draw a chair in many different contexts and styles, even if it misunderstands related instructions like "draw a cow sitting in the chair".

If the input is fed through an LLM or some other system before going into the image model, this pre-processing will be avoided if I can easily do so, and otherwise it will not. If the image model is not publicly available, I must be confident that its answers are not being cherry-picked.

Pretty much any neural network counts, even if it's multimodal and can output things other than images. A video model also counts, since video is just a sequence of images. I will ignore any special-purpose image model, like one trained only to generate simple polygons. The model must draw the image itself, not find it online or write code to generate it. File formats that are effectively code, like SVG, don't count either; it has to be "drawing the pixels" itself.


Haven't tested this thoroughly, but I think the new Aurora model in Grok knows the 2D shape.

edit: oops, lots of people already tried this below.

Did anyone try a pentagonal cage fight? It's very hard to get it to draw anything but 8 sides, but sometimes I can get something slightly different.

Asking for a square cage gave me either 4 or 6 sides; it's hard to say:

And I'm not sure if this is a 2 on 1 fight or a free for all.

Gemini Advanced now has access to Imagen3, which created this for me. Here's the entire conversation:

Me: Do you have access to Imagen3?

G: Yes, I do! I can generate images with Imagen 3 now. Would you like me to create an image for you? I can generate images from text descriptions, or even make variations on an image you provide.

Me: Yes, please create an image of the geometrical shape called a pentagon.
G: Sure, here is an image of a pentagon:

Me: Can you recreate that without any color?
G: Of course! Here you go!

Please let me know if there's anything else you need to resolve this market.

@WilliamGunn Read the description and comments again and you’ll see why this is insufficient to resolve

Here’s my first imagen3 attempt

So that suggests a ballpark of 50%, which is not close to the 95% threshold needed (of course more than two generations would be needed to resolve)
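As an illustrative aside (my own calculation, not part of the market's criteria): even a model that never failed would need dozens of trials before one could be statistically confident its true rate clears 95%. A minimal Python sketch, assuming an all-success run and an exact one-sided 95% confidence level:

```python
from math import ceil, log

# If a model succeeds on all n tries, the exact one-sided 95% lower
# confidence bound on its success rate is the p solving p**n = 0.05.
# Smallest all-success run for which that bound exceeds 0.95:
n = ceil(log(0.05) / log(0.95))
print(n)  # 59

# Sanity check: a true-95% model goes 59-for-59 less than 5% of the time.
print(0.95 ** n)  # just under 0.05
```

So two generations tell us very little; on the order of sixty clean successes per prompt would be needed before a 95%+ rate is statistically defensible.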

@JimHays I wouldn't want to accuse you of a failure of reading comprehension (such rudeness can be left to social media interactions), but since you didn't mention it and I'm quite sure it's important, did you explicitly invoke Imagen3? I had to do that; it didn't happen by default.

@WilliamGunn I didn’t specifically invoke it, but it did say it was using the model

Here’s my second attempt

@JimHays Try the following prompt: "Can you use imagen3 to create an image of the geometrical shape called a pentagon?" This is what I got, which makes it 3/3 for me. I don't feel like it should be me doing all the prompting though. @IsaacKing are you planning to do some more testing before resolving this?

@WilliamGunn I've had good success with that prompt, but not with other prompts that would need to succeed as well for a YES resolution here

@chrisjbillington I regret ever engaging in this market. The question should be renamed to "Is it possible to find a prompt that will cause a model capable of drawing pentagons to fail to do so?"

@WilliamGunn I agree that the question does not match the description, which is why I was recommending that you further investigate the description and the clarifications in the comments below. I would maybe have recommended something like “Before 2025, will there be an image model that very reliably draws pentagons for all reasonable prompts?”

@WilliamGunn @IsaacKing Could you at least resolve this as NA and refund us given that you seem to have meant something very different from "will any image model be able to draw a pentagon before 2025"?

@WilliamGunn all "Can AI do X" markets have to define how strict they're being, from "arbitrary prompt engineering allowed, any success resolves YES" through to "high success rate on a range of prompts required for YES". You have to read the description to know where any given market is in that space. It's not reasonable to guess where on the continuum a market is from the title alone - there is simply not enough information.

For this market it is clear in the opening sentences of the description that the bar is high for this market.

I'm not concerned this market might resolve NA (it won't), so this isn't an argument directed at the creator - it's an argument to you that wanting this to NA is unreasonable. Some titles have an obvious interpretation and barely need a description at all. Others are obviously inadequate by themselves and cry out that specific criteria are needed. For those, you need to look at the description. Pretty much all AI capabilities markets are in this category.

@chrisjbillington The issue is that the question is titled "will any image model be able to draw a pentagon before 2025?" and not "Will every possible way of asking an image model repeatedly succeed in generating a pentagon?" Those two are so different that I think this question really should be closed NA and asked in a less misleading way.

opened a Ṁ250 NO limit order at 55%

I've put some NO limit orders up if anyone is interested.

Google's new Imagen gets pretty close, asking for the geometric shape called a pentagon. It just embellishes it a bit.

@WilliamGunn Is it the model Gemini currently uses? Didn't work through Gemini for me.

@ProjectVictory I used https://aitestkitchen.withgoogle.com/tools/image-fx

It should be available via Gemini, but anyway, the question just said any image model. Fine if you don't want to accept the embellished pentagon, but we're clearly pretty close!

@WilliamGunn I agree that seems to satisfy the criteria. The results are ornate, but it's consistently giving me a pentagon for "pentagon shape" (otherwise it gets confused with The Pentagon, which is fair). It seems to know how to draw a pentagon.

@WilliamGunn Tried your link and got very pentagonal results!

@WilliamGunn I guess the issue is that 95% is a very high threshold. I tried it and got like 17 pentagons out of 22 images, which I think is pretty good but isn’t enough for this market.
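To put the 17-out-of-22 figure in context (my own illustrative calculation, not from the market description), an exact binomial tail computed with only the standard library shows that result would be very unlikely if the true per-prompt success rate were 95%:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# 17 pentagons out of 22 images, tested against the 95% threshold:
# how often would a true-95% model do this badly or worse?
p_value = binom_cdf(17, 22, 0.95)
print(f"{p_value:.4f}")  # about 0.004
```

In other words, 17/22 is not just "below threshold"; it's strong evidence the model's actual rate on that prompt is well under 95%.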

@jbca The bar is also pretty high compared to a lot of other "will AI do blah" markets in that the model must respond correctly to a broad range of prompts, including e.g. "draw a 5-gon" as mentioned in the criteria. Definitely progress for a model to be approaching the required accuracy rate for a specific prompt, though.

I clicked the link and wasn't able to get pentagons after a few tries. Does one have to do anything to select a specific model or something?

@chrisjbillington My first attempt, something like “draw a five-sided polygon” only got 1/4 that might have been considered a pentagon. The others were a hexagon, a 5-pointed star, and some harder to describe 3D shape

@jbca In addition to the 95% threshold, there are very difficult-for-AI prompts that fall under the category, "Any reasonable prompt that the average mathematically-literate human would easily understand as straightforwardly asking it to draw a pentagon..."

E.g.,

- "a street sign with a red pentagon on top of it"
- "a red, upside-down five-sided figure"
- "a pentagon next to a hexagon"
- "two yellow pentagons filled with honey, next to a bee"
- "a polygon with two fewer than seven sides"

(And I would argue that even much harder stuff than that should be included.)

@Jacy From this last bit “…it can correctly draw a chair in many different contexts and styles, even if it misunderstands related instructions like "draw a cow sitting in the chair"”

I’m not sure if we should be imagining that the “cow sitting in the chair”-style prompts don’t have to work at all, or that “cow sitting in the chair”-style prompts should at least produce a pentagon, regardless of whether any of the other details are correct?

@Jacy I think "a street sign with a red pentagon on top of it" is anything but straightforward. I can easily imagine a human absentmindedly drawing a stop sign with that prompt.

Same with "two fewer than seven sides". I would have thought we're testing knowledge of what a pentagon looks like, not multi-stage reasoning

@JimHays yeah, I was assuming the latter, but it would be nice to have clarification. However, my sense is that current systems can easily understand a cow and a chair, but the sitting relationship is challenging, and that's a different problem than something like "a pentagon next to a hexagon," where neither pentagons, hexagons, nor the "next to" relationship are challenging—only the mental process of avoiding crossing one's wires as all current systems do with such prompts.

@aashiq I would bet a huge amount at short odds that the average person (e.g., Prolific survey participant) would have no trouble at all with such prompts, and I'd be quite surprised if anyone would bet against that at, say, even odds. So I still think "straightforward" is a very reasonable description. shrug

@cadca Might be worth trying again, now that Imagen3 is part of Gemini. Try prompting with "Can you use imagen3 to create an image of the geometrical shape called a pentagon?"
