@SnapDragon comments on "Culture War Roundup for the week of March 10, 2025"

Culture War Roundup for the week of March 10, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi 7mo ago · Edited 7mo ago

Moderately interesting news in AI image gen:

It's been a good while since we've had AI chat assistants able to generate images on user request. Unfortunately, for about as long, we've had people being peeved at the disconnect between what they asked for, and what they actually got. Particularly annoying was the tendency for the assistants to often claim to have generated what you desired, or that they edited an image to change it, without actually doing that.

This was an unfortunate consequence of the LLM (the assistant persona you speak to) and the actual image generator (the thing that spits out images from prompts) being two entirely separate entities. The LLM doesn't have any more control over the image model than you do when running something like Midjourney or Stable Diffusion. It's sending a prompt through a function call, getting an image in response, and then trying to modify prompts to meet user needs. Depending on how lazy the devs are, it might not even be 'looking' at the final output at all.
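To make the two-entity setup concrete, here's a minimal sketch of it in Python. Everything in it is hypothetical: `image_model`, `assistant_edit` and the `Image` class are stand-ins I made up for illustration, not any vendor's actual API. The point is only that the "assistant" passes a string to a separate generator and can report success without ever inspecting what came back.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Image:
    # Hypothetical stand-in for pixel data: the stub only records the prompt it was made from.
    prompt: str


def image_model(prompt: str) -> Image:
    """Stand-in for a separate text-to-image model. It sees only the prompt string."""
    return Image(prompt=prompt)


def assistant_edit(previous_prompt: str, user_request: str) -> tuple[str, Image]:
    """Stand-in for the chat LLM: rewrite the prompt, then call the image model again."""
    new_prompt = f"{previous_prompt}, {user_request}"
    image = image_model(new_prompt)  # a fresh generation, not an edit of the old image
    reply = "Done, I've updated the image."  # claimed without looking at `image`
    return reply, image


first = image_model("a doctor with a cybernetic arm")
reply, second = assistant_edit(first.prompt, "remove the greebles")
print(reply)
print(second.prompt)
```

Note that the "edit" is really a second roll of the dice from a longer prompt, which is why the result can drift from the original image.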

The image models, on the other hand, are a fundamentally different architecture, usually being diffusion-based (Google a better explanation, but the gist of it is that they hallucinate iteratively from a sample of random noise till it resembles the desired image) whereas LLMs use the Transformer architecture. The image models do have some understanding of semantics, but they're far stupider than LLMs when it comes to understanding finer meaning in prompts.
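For the "iterate from random noise" gist, here's a toy sketch. It is not a real diffusion model: there's no neural network, and the function `toy_denoise` (a name I made up) cheats by being handed the target directly, where a real model would have to predict the clean image at each step. It only shows the shape of the loop: start from noise, nudge toward the desired result a bit at a time.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy illustration only: walk from random noise to `target` in small steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure Gaussian noise
    for t in range(steps):
        remaining = steps - t
        # A real model would *predict* the clean image from x here.
        # This toy uses the known target instead, closing 1/remaining of the gap per step.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
    return x


target = [0.0, 0.5, 1.0, 0.5]  # pretend this is a 4-pixel image
result = toy_denoise(target)
print(result)
```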

This has now changed.

Almost half a year back, OpenAI teased the ability of their then unreleased GPT-4o to generate images natively. It was the LLM (more of a misnomer now than ever) actually making the image, in the same manner it could output text or audio.

The LLM doesn’t just “talk” to the image generator - it is the image generator, processing everything as tokens, much like it handles text or audio.

Unfortunately, we had nothing but radio silence since then, barring a few leaks of front-end code suggesting OAI would finally switch from DALLE-3 for image generation to using GPT-4o, as well as Altman's assurances that they hadn't canned the project on the grounds of safety.

Unfortunately for him, Google has beaten them to the punch. Gemini 2.0 Flash Experimental (don't ask) has now been blessed with the ability to directly generate images. I'm not sure if this has rolled out to the consumer Gemini app, but it's readily accessible on their developer preview.

First impressions: It's good.

You can generate an image, and then ask it to edit a feature. It will then edit the original image and present the version modified to your taste, unlike all other competitors, who would basically just re-prompt and hope for better luck on the second roll.

Image generation just got way better, at least in the realm of semantic understanding. Most of the usual give-aways of AI generated imagery, such as butchered text, are largely solved. It isn't perfect, but you're looking at a failure rate of 5-10% as opposed to >80% when using DALLE or Flux. It doesn't beat Midjourney on aesthetics, but we'll get there.

You can imagine the scope for chicanery, especially if you're looking to generate images with large amounts of verbiage or numbers involved. I'd expect the usual censoring in consumer applications, especially since the LLM has finer control over things. But it certainly massively expands the mundane utility of image generation, and is something I've been looking forward to ever since I saw the capabilities demoed.

Flash 2.0 Experimental is also a model that's dirt cheap on the API, and while image gen definitely burns more tokens, it's a trivial expense. I'd strongly expect Google to make this free just to steal OAI's thunder.


phailyoor self_made_human 7mo ago

I am eating crow right now.

I'm very interested to know if this model is also better at spatial reasoning compared to other models. I'm gonna see if I can get access and try it out.

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi phailyoor 7mo ago

It's easy being an AI advocate, I just have to wait a few weeks or months for the people doubting them to be proven wrong haha.

Jokes aside, I have tinkered with it quite a bit, and it is obviously much smarter than any dedicated image model I've used.

It took me about half a dozen prompts (additions and corrections for the original image, instead of brand new ones) to go from the first image I've attached to the second.

It followed instructions like:

Can you edit that to remove the greebles coming out of his head?

And

Please add the text "USSRI" in the background, similar to the font and typography used in Soviet propaganda posters. The stethoscope has two bells, please fix that too.

I did notice that there was some off-target editing: when I asked for the color of the cybernetic arm to be more like carbon fiber, it also changed his helmet. The text in the background could move around or degrade with edits to the foreground. That's not a big deal, because I can iteratively approach the image I'm envisioning.

/images/17418103732556858.webp

/images/17418103735760682.webp

SnapDragon self_made_human 7mo ago

Unfortunately, even on this board, being "proven wrong" doesn't stop them. See, e.g., this argument I had with someone who actually claimed that LLMs "suck at writing code", despite the existence of objective benchmarks like SWE-bench that LLMs have been doing very well on. (Not to mention o3's crazy high rating on Codeforces.) AI is moving so fast, I think some people don't understand that they need to update from that one time in 2023 they asked ChatGPT3 for help and its code didn't compile.
