Friday Fun Thread for December 8, 2023

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

AI-generated images are here - and they're awful.

Dall-E 3 seems to have crossed into "good-enough" territory, to the point that many smaller websites are now using AI-generated images. It seems, unfortunately, that Dall-E 3 has a style. And that style is awful. The scenes are always far too busy, everything is cartoony, and the colors are oversaturated.

It's the new Corporate Memphis.

It also makes me feel super racist that I have to constantly tell it to draw white people since it has an even looser grip on historical accuracy than Bridgerton.

Still way better than Corporate Memphis, which signifies defiance of reality as opposed to being cheap. Deliberately inhuman proportions as opposed to accidentally missing fingers.

AI has much more variety anyway: you can just dump 3 or 4 artists' names in and get a very different result, and likewise with different models.

Try running Stable Diffusion. It's open source and you can find a specifically-trained model for whatever you want. If you can't run it locally, you can find a million services online that can do it for you. You don't need to deal with content restrictions or it adding diversity to all of your prompts. I use a site called NovelAI, which runs its own version of SD trained off of an anime booru. They recently put out a new version of the image gen and it's really good. The progress made since just the last year is astounding.

We could have easily been in the timeline where the big three image diffusion models (Dall E, Stable Diffusion, and Midjourney) all kept it locked down, but since one of them broke ranks and made it open source, it's opened up a world of potential for everyone to use image generation how they want.

Yeah, they're kind of shit and samey. You start to recognise everything about them, the way they're composed, how humans and animals look, the lighting, the contrast, which facial expressions they use etc.

That said, Corporate Memphis and generic stock photos were at least as bad or worse, and this tech seems to be rapidly improving. Maybe in a few years we'll actually have something spectacular and truly democratise visual art generation. We're almost there.

Maybe in a few years we'll actually have something spectacular and truly democratise visual art generation. We're almost there.

I think so. Better speed will handle much of it. Perhaps the image gen will make 1000 images and then an adversarial AI will choose the best 10. Then the human can pick one and use text prompting to refine it easily.

"Move the teddy bear a little to the left"

"Change the font size to be a little bigger"

"Can you make it a little more punk rock".

Etc...

I think we're about 2-3 years out.
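
The "generate a lot, let a second model filter, then let a human pick" loop described above can be sketched in a few lines of Python. This is only a rough illustration under loud assumptions: `fake_generate` and `fake_score` are hypothetical stand-ins, not calls to any real image generator or ranking model, and the numbers 1000 and 10 simply mirror the ones in the comment.

```python
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for a real image object.
Image = Dict[str, float]


def fake_generate(prompt: str, seed: int) -> Image:
    """Placeholder generator (assumption): no model is called.

    It returns a fake "image" carrying a random quality value so the
    selection step has something to rank.
    """
    rng = random.Random(seed)
    return {"seed": float(seed), "quality": rng.random()}


def fake_score(image: Image) -> float:
    """Placeholder critic (assumption): a real one would be a learned ranker."""
    return image["quality"]


def shortlist(
    prompt: str,
    generate: Callable[[str, int], Image],
    score: Callable[[Image], float],
    n: int = 1000,
    k: int = 10,
) -> List[Image]:
    """Generate n candidates and keep the k that the critic scores highest."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return sorted(candidates, key=score, reverse=True)[:k]


# The human would then choose one of these and refine it with follow-up prompts.
top10 = shortlist("teddy bear at a punk rock show", fake_generate, fake_score)
for image in top10:
    print(int(image["seed"]), round(image["quality"], 3))
```

The generator and the critic are passed in as plain functions, so either could be swapped for a real model without touching the selection logic. The human-in-the-loop refinement step ("move the teddy bear a little to the left") is left out because it depends entirely on the editing interface.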

So you're using them enough that you have to "constantly" correct historical ethnic choices, but also calling them "awful?" If an artist was awful, I would stop asking them for new art and go to someone else. Is your job forcing you to use Dall-e 3 specifically or something? What kind of image/material are you using it for?

I don't have a subscription to GPT-4, so am unable to test this, but the previous iteration allowed users to mention styles that they want it to emulate, and there isn't necessarily an advantage to just leaving it at its default style. If it's oversaturated, you can probably request a limited palette? I tried asking Dall-E 2 for a painting with a Zorn palette, and it used too much blue (Zorn replaces blue with black as a primary), but maybe GPT could help interpret that kind of thing (or I could try spelling out what I mean more clearly?).

I had heard that people have been making add-ons for Stable Diffusion that point it toward specific styles, so that might be worth looking into as well.

Can't beat the cost and convenience for a good-enough image!

I'm sure we'll get to a spot in 2 or 3 years where this gets a lot better. I do have Stable Diffusion but it's slow and hard to wrestle with. I do use Dall-E 3 for work but it's not a large part of what I do. Let's say I generate 2-3 images a day.

My whine here is specifically about the stylistic awfulness of Dall-E 3 images which I now see cropping up everywhere. Prompt-hacking doesn't work. I try stuff like this: "Simple, not complex, no extra characters, restrained, not saturated", but it doesn't seem to really give me what I want.

which I now see cropping up everywhere

I haven't noticed it -- do you have an example?

"Simple, not complex, no extra characters, restrained, not saturated"

Maybe it has trouble with negatives? I wonder if it would respond to directions about specific color palettes (yellow ochre, Payne's grey, cadmium red?), where to place the focal point, or name-dropping Rembrandt?

I haven't noticed it -- do you have an example?

Sure. Here are some examples from a blog that was posted to the slatestarcodex subreddit.

1, 2, 3

Once you recognize the "style" you see it everywhere. The main thing is that they are just way too busy.

Huh. I could see how you wouldn't prefer that style, but also feel like the main problem is not so much that the image generator did a bad job, as that the concepts simply aren't great, and hardly anyone would do much better. And those who could do better are engaged in more upscale projects to begin with.

There seems to be a rationalist market, and a market by definition has a lot of booths at it, so it drew a lot of booths, and put in a vanishing point that really emphasizes how large the market is. Makes sense, given the concept, I'm unsure what an excellent graphic designer would do with it. Doesn't look oversaturated? Markets are known for having a lot of bright colors to entice customers, but maybe the sun shouldn't be that low? Is sunset part of the concept, like the sun is setting on the free exchange of ideas or something? It's clearly still not great at making signs with words on them, but is visibly improving from last year.

A guy with a lot of books and papers. I assume the room cluttered with papers and the clock are part of the concept, and that they asked for pen and ink? Clearly not oversaturated. If the clock isn't important, it doesn't belong there. If the prompt didn't include "an office cluttered with papers," then that's weird.

Comic. Weird feet and flags in the last frame. It looks like its becoming increasingly chaotic and cluttered is, again, part of the concept? If not, that's an odd progression. It looks like print comics were included as a style reference, so the coloring is to be expected. There are some distracting splashes of red in the background, especially in the second panel. I doubt a human would do that, or imply in the second panel that there's now another floor desk under the man's desk. The first panel has a visible dot gradient, like a metal plate where the gradient was burned in with resin and acid -- or more like a cross between that and a fine hatch. It's kind of funny that it's trying to emulate plate-printed comics in that one instance but otherwise looks more like a vector graphic. Eh, I guess I don't expect it to have a model of what physical processes cause what effects. The hands and facial expressions are pretty good. But, also, the concept itself looks even more clichéd than the art.

I invite you to show me anything that makes all these images I've generated samey.

You're prompting it wrong.

The common link is that they all have far too many unnecessary elements that detract from the image. I will grant that only image #2 looks like a 100% match for the Dall-E 3 archetype.

What do you mean by "unnecessary things"?

They're precisely what I asked for, within the limits of my prompting and the model. Without knowing the prompts, I have no idea what you think they're missing.

At the very least the last one is a minimal brutalist logo for a PMC, I can hardly imagine what could be less so.

Isn't it amazing that DALL-E 3 has prompts? Those little text input boxes where you can specify the styles and content, be it in the style of video game concept art, minimalism, expressionism, and just about anything you can think of?

PEBKAC right here. I'm not going to defend their approach to diversity being prompt injection of random diverse ethnicities and genders.

At any rate, they've been here for more than a year, welcome to 2023, just about in time to meet 2024.

Was the sarcasm really necessary?

Not strictly, but I am still immensely frustrated by claims that AIs, be they LLMs or image generators, have intransigent styles that can't be modified by something as simple as telling it to.

Sometimes the problem does exist between the keyboard and chair, and here it certainly does.

At any rate, if a new piece of technology comes out and doesn't do exactly what I want, my first instinct is to look for a solution, and it clearly hasn't been done to any reasonable degree, because otherwise there wouldn't remain a problem in need of solving.

Not strictly, but I am still immensely frustrated by claims that AIs, be they LLMs or image generators, have intransigent styles that can't be modified by something as simple as telling it to.

You're not wrong, but really, throwing "PEBKAC" at someone is not helpful and is unnecessarily antagonistic.

Have you been living in a cave? These "generative AI sucks" posts are so tiring.

AI-generated images have been "here" for a year and more, since DALL-E 2 and then Stable Diffusion.

There are also many open source Stable Diffusion checkpoints out there that produce photorealistic images.

Also they might "suck" now, but just extrapolate the trend and honestly ask yourself for how much longer it will be that way.

I will also be underwater if the tide keeps coming in...

And we will all be Chinese Indian Nigerian by 2050. What's your point?

"just extrapolate the trend" is not the slightest bit convincing.