
Culture War Roundup for the week of March 17, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I've always prided myself on my ability to stay at the bleeding edge of AI image gen.

As you'd expect, given my enthusiastic reporting on Google opening up public access to its new multimodal AI with image generation built in, I decided to spend a lot of time fooling around with it.

I was particularly interested in generating portrait photos of myself, mostly for the hell of it. Over on X, people have been (rightfully) lauding it as the second coming of Photoshop. Sure, if you go to the trouble of making a custom LoRA for Stable Diffusion or Flux, you can generate as many synthetic images of yourself as your heart desires, but it is a bit of a PITA. Think access to a good GPU and dozens of pictures of yourself for best results, unless you use a paid service. Multimodal LLMs promise to be much easier, and more powerful/robust.
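(For reference, the Stable Diffusion route looks roughly like the sketch below, and that's after you've already burned the GPU hours training a personal LoRA. This is a minimal sketch using the Hugging Face diffusers library; the base model ID, LoRA filename and trigger word are illustrative placeholders, not my actual setup.)

```python
# Minimal sketch: generating images of yourself from an already-trained personal LoRA.
# Assumes "my_face_lora.safetensors" was trained on dozens of your own photos
# (the painful part), and that a CUDA GPU is available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # placeholder base model; Flux/SDXL need different pipelines
    torch_dtype=torch.float16,
).to("cuda")

# Layer the personalised LoRA weights on top of the base model.
pipe.load_lora_weights(".", weight_name="my_face_lora.safetensors")

# "sks person" stands in for whatever trigger token the LoRA was trained on.
image = pipe(
    "professional headshot photo of sks person, studio lighting, 85mm",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("headshot.png")
```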

I spent a good several hours inputting the best existing photos I have of my face into it, and then asking it to output professionally taken "photos".
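The multimodal workflow, by contrast, is just a chat call with reference images attached. A rough sketch of what I mean, using Google's google-genai Python SDK; the model name and response handling are my assumptions from the docs at the time, not a guaranteed recipe.

```python
# Rough sketch: feed reference photos to the Gemini 2.0 Flash image-generation preview
# and ask for a new "photo" back. Model name and response handling are assumptions.
from io import BytesIO

from PIL import Image
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder

reference_photos = [
    Image.open(p) for p in ["face_front.jpg", "face_side.jpg", "face_smile.jpg"]
]

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # the experimental preview model, as I understand it
    contents=[
        "Using these reference photos of the same person, generate a professionally "
        "lit studio headshot of them in a navy suit.",
        *reference_photos,
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Generated images come back as inline data parts alongside any text.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("gemini_headshot.png")
```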

The good news:

It works.

The bad news:

It doesn't work very well.

I'm more than used to teething pains and figuring out how to get around the most common failure modes of AI. I made sure to use multiple different photos, at various angles, with different hairstyles and outfits. It's productive to think of it as commissioning an artist online who doesn't know you very well: give them plenty to work with. I tried putting in a single picture. Two. Three. Five. Different combinations, and many different prompts, before I drew firm conclusions.

The results almost gave me body dysphoria. Not because I got unrealistically flattering ersatz-versions of myself, but quite the opposite.

The overwhelming majority of the fake SMHs could pass as my siblings or close cousins. Rough facial structure? Down pat, usually. There are aspects that run in the family.

Finer detail? Shudder. The doppelgangers are usually chubbier around the cheeks, and have a BMI several points above mine. I don't have the best beard on the planet, but it's actually perfectly respectable. This bastard never made it entirely through puberty.

The teeth... I've got a very nice set of pearly whites, and I've been asked multiple times by drunken Scotsmen and women if they're original or Turkish. These clones came from the discount knock-off machine that didn't offer dental warranties.

The errors boil down to:

  1. Close resemblance, but subtly incorrect ethnicities. Brown-skinned Indians are not all made alike; I'm not Bihari or any breed of South Indian. Call it the narcissism of small differences if you must.

  2. Slightly mangled features as above.

  3. Tokenizer issues. The model doesn't map pixels to tokens 1:1 (that would be very expensive computationally), so fine details in a larger picture might be jarring on close inspection.

  4. Abysmal taste by default, compared to dedicated image models. Base Stable Diffusion 1.0 could do better in terms of aesthetics, Midjourney today has to be reined in from making people perfect.

  5. Each image takes up a few hundred tokens (the exact count is handily displayed). If a picture is a thousand words, then that's like working with a hundred; see the back-of-the-envelope sketch after this list. I suspect there is a lot of bucketing, or collapse to the nearest person in the data set, involved.

  6. It still isn't very good at targeted edits. Multiple passes on a face subtly warp it, and you haven't felt pain until you've asked it to reduce the (extra) buccal fat and then had it spit out some idiot who stuck his noggin into a beehive.

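To put point 5 in perspective, the back-of-the-envelope arithmetic looks something like this. The ~258 tokens per image is roughly the figure Gemini's documentation and token counter suggest, and the 768x768 input is an assumption; both numbers are illustrative, not exact.

```python
# Back-of-the-envelope: how compressed is a photo once it becomes tokens?
# Assumes ~258 tokens per image and a 768x768 RGB input (both illustrative).
tokens_per_image = 258
pixels = 768 * 768          # ~590k pixels
raw_values = pixels * 3     # RGB channels -> ~1.8M raw numbers

print(f"pixels per token: {pixels / tokens_per_image:,.0f}")          # ~2,286
print(f"raw values per token: {raw_values / tokens_per_image:,.0f}")  # ~6,858

# "A picture is a thousand words": at roughly 1.3 tokens per English word,
# a thousand words is ~1,300 tokens, while the image budget is ~258.
# Small wonder fine facial detail gets lossily collapsed.
```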
If I had to pick images that could pass muster on close inspection, I'd be looking at maybe one in a hundred. Anyone who knows me would probably be able to tell at a glance that something was off.

People on X have been showing off their work, but I suspect that examples, such as grabbing a stock photo of a model and then reposing it with a new item in hand, only pass because we're seeing a small N or cherry-picked examples. The actual human model in question could probably tell something was up.

Of course, this is a beta/preview. This is the worst the tech will ever be; complaints about AI fingers are suspiciously rare these days, aren't they?

I'm registering my bets that by the end of the year, the SOTA will have leapt miles forward. Most people will be able to generate AI profile pictures, flesh out their dating app bios, and all the rest, with ease and without leaving home. For the lazy, like me, great! For those who cling to their costly signals, those signals are about to get a lot cheaper, and quickly. This is Gemini 2.0 Flash, the cheap and cheerful model. We haven't seen what the far larger Pro model can manage.

(You're out of luck if you expect me to provide examples; I'm not about to doxx myself. If you want to try it, find an internet rando who is a non-celebrity, and see how well it fares. For ideal results, it needs to be someone who isn't Internet Famous, since the model would otherwise have a far better pre-existing understanding of their physiognomy. Uncanny resemblances abound, but they're uncanny.)

Google infamously curates its results to be racially diverse to the detriment of accuracy, so I'm not surprised. Your real face was not sufficiently equitable according to the algorithm, so your physical appearance was adjusted to be in line with their code of conduct.

This is why every model that attempts to chase alignment or whatever arbitrary standard will be retarded in practice. If you punish your algorithm for being accurate, then it won't be accurate. (Surprise!) It won't give you 'accurate result with DEI characteristics': it will just shit itself and give you something terrible.

This is why I think Musk has an advantage in this field: he's not shooting his infant AGI in the knees by forcing it to crimestop.

I must say that I don't quite agree with this take.

Google has definitely cooked themselves with ridiculous levels of prompt injection in their initial Imagen release, as evidenced by people finding definitive evidence of the backend adding "person of color" or {random ethnicity that isn't white} to prompts that didn't specify one. That's what caused the Native American or African versions of "ancient English King", or the literal Afro-Samurai.
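For anyone who missed that saga, the mechanism people inferred was dead simple, something like the sketch below. To be clear, this is a hypothetical reconstruction for illustration only; the function and the modifier list are made up, not Google's actual code.

```python
import random

# Hypothetical illustration of the kind of backend prompt rewriting people
# inferred from the original Imagen-in-Gemini outputs. Not Google's code.
DIVERSITY_MODIFIERS = [
    "person of color",
    "Native American",
    "South Asian",
    "Black",
]

def rewrite_prompt(user_prompt: str) -> str:
    """Append a demographic modifier unless the user already specified one."""
    already_specified = any(m.lower() in user_prompt.lower() for m in DIVERSITY_MODIFIERS)
    if already_specified:
        return user_prompt
    return f"{user_prompt}, {random.choice(DIVERSITY_MODIFIERS)}"

# "portrait of an ancient English king" -> "portrait of an ancient English king, Native American" (sometimes)
print(rewrite_prompt("portrait of an ancient English king"))
```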

They back-pedalled hard. And they're still doing so.

Over on Twitter, one of the project leads for Gemini, Logan Kilpatrick, is busy promising even fewer restrictions on image generation:

https://x.com/OfficialLoganK/status/1901312886418415855

Compared to what DALLE in ChatGPT will deign to allow, it's already a free-for-all. And they still think they can loosen the reins further.

Google infamously curates its results to be racially diverse to the detriment of accuracy, so I'm not surprised. Your real face was not sufficiently equitable according to the algorithm, so your physical appearance was adjusted to be in line with their code of conduct.

You'd expect that a data set with more non-Caucasians in it would be better for me! Of course, if they chose to manifest their diversity by adding a billion black people versus a more realistic sampling of their user pool...

Even so, I don't ascribe these issues to malice, intentional or otherwise, on Google's part.

What strikes me as the biggest difference between current Gemini output and that of most dedicated image models is how raw it is. Unless you specifically prompt it, or append examples, images come out looking like a random picture on the internet. Very unstylized and natural, as opposed to DALLE's deep-fried mode collapse, or Midjourney's so-aesthetic-it-hurts approach.

This is probably a good thing. You want the model to be able to output any kind of image, and it can. The capability is there; it only needs a lot of user prompting or, in the future, tasteful finetuning. If done tastelessly, you get hyper-colorful plastinated DALLE slop. OAI seems to sandbag far more, keeping pictures just shy of photo-realism, or outright nerfing anime (and hentai, by extension).

This is why every model that attempts to chase alignment or whatever arbitrary standard will be retarded in practice. If you punish your algorithm for being accurate, then it won't be accurate. (Surprise!) It won't give you 'accurate result with DEI characteristics': it will just shit itself and give you something terrible.

This would be true if Google were up to such hijinks. I don't think they are, for the reasons above. Gemini was probably trained on a massive, potentially uncurated data set. I expect they did the usual stuff like scraping the CP out of LAION's data set (unless they decided not to bother and to mitigate that with filters before an image is released to the end user). Besides, they're Google: they have all of my photos on their cloud, and those of millions of others. And they certainly run all kinds of Bad Image detectors over anything you uncritically permit them to upload and examine.

That being said, everything points towards them training omnivorously.

OAI, for example, has explicitly said in their new Model Spec that they're allowing models to discuss and output culture war crime-think and Noticing™. However, the model will tend to withdraw to a far more neutral persona and only "state the facts", instead of its usual tendency to affirm the user. You can try this yourself with racial crime stats: it won't lie, and it will connect the dots if you push it, while hedging along the way.

Grok, however, is a genuinely good model. It won't even suck up to Musk, and he owns the damn thing.

TLDR: Gemini's performance is more likely constrained by its very early nature, small model size, tokenization glitches, and an unfiltered image set than by DEI shenanigans.

I grudgingly concede your argument, but I must say they have earned considerable skepticism: they will have to iterate quite a few times before the hilarity of their first attempt fades from my imagination.

By all means, remember their bullshit. I haven't forgotten either, and won't for a while. The saying "never attribute to malice what can be explained by stupidity" doesn't always hold true, so suspicion is warranted. If there's another change in the CW tides, Google is nothing if not adroit at doing an about-face.

It's just that in this case, stupidity includes {small model, beta testing, brand new kind of AI product}, and the given facts lean more towards that end.