This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

That's fair. There are some models that allow more specific prompt-only control of multicharacter composition, like Whisk, Nano Banana, and Qwen, but they have tradeoffs and tend to give 'worse' output quality if used as the only or final part of a workflow. In-painting can give phenomenal amounts of control for very complex character layouts (or background layouts), but at the cost of a lot of tedious work (cw: 9mb video file). There have been similar efforts using related technologies for comics, loresheets, game environments, and ultra-complex characters (in the furry fandom, usually things like cyborgs and complex hybrids).
Which does give more space for self-expression, but it's not going to have the volume to be visible in a DeviantArt firehose view.
I've done my time with Stable Diffusion, from the closed alpha to a local instance running on my pc.
Dedicated image models, or at least pure diffusion ones, are dead. Nano Banana does just about everything I need. If I was anal about the drop in resolution, I'd find a pirate copy of Photoshop and stitch it together myself, though I'm sure you can work around it by feeding crops into NB and trusting they'll align.
All of the fancy pose tools like ControlNet are obsolete. You can just throw style and pose references at the LLM and it'll figure it out.
I suppose they might have niche utility when creating a large, highly detailed composition, but the pain is genuinely not worth it unless you absolutely must have that.
Yeah, Nano Banana (and Whisk) are stupidly powerful, and don't really seem to have a local or open-source competitor yet. Qwen Image / Image Edit can kinda work on similar principles, and can do some level of scene composition or pose transfer, but it's limited and gets pretty ugly. A number of furry diffusion users start from Nano Banana prompting, then do the final work with a local image model (whether for upscaling, changes in content, or NSFW).
I dunno that I'd call ControlNet obsolete, but that may reflect my own unfamiliarity with Nano Banana (and not using the paid version) as much as anything deeper.
This is actually a very heartening video! It shows that you can make a complex scene that doesn't have this PonyXL house style. How do AI artists deal with preserving character details from image to image? It seems to me this is even more important for furry art (various fur patterns must be harder to reproduce correctly than "black hair, pixie cut").
[cw: lots of furry images. nothing involving nudity in any sense but the Donald Duck or swimsuit sense, but probably not something you'd want to explain to your boss]
It depends a lot on what you're aiming for. It's possible to get text-only prompts that retain fairly good consistency of a character. Some of that's because the character itself is pretty 'standard', although they also have a number of potential faults (eg, border collie with a floppy ear and a spot around one eye seems easy, but a lot of models struggle with the "my left or your left" problem). And these can require pretty serious levels of detail and description, much of which wouldn't be obvious to non-artists.
If you've already got a single piece with the character and want a second one in an entirely different context, tools with more semantic understanding focused around transfer like Qwen Image Edit, Nano Banana, and Whisk can do that surprisingly well (albeit generally on the cloud and censored: afaik, only Qwen Image Edit has a local mode). I'd expect some multimodal LLMs could do something similar, but I've only really tried GLM-4.6V for local multimode and never got anything particularly exciting from it.
For one-offs with more specific or complex markings or fur patterns, especially around the face or hands/paws, you're usually going to see a lot of inpainting. The threshold where that becomes necessary can be surprisingly low: this guy seems trivial at first glance, but since it's not supposed to have a few tells from real maned wolves, that's often something he had to tweak aggressively, and the four markings on the forehead are really not something most AIgen wants to do as part of a facial structure, so he'd often be loading up Krita to help do inpainting. It's still not 'real' artwork, but it can get fuzzier on the edges.
If you plan to reuse the character, doing a few works with inpainting, traditional media, 3d modeling software, or some combination of the above, then building a LoRA tends to be the most effective. A good LoRA takes a lot of effort, but it can be done with a surprisingly small number of reference images and maintain a lot of detail or handle very strange layouts.
For an example, I'll use uverage. He's an avali-wolf hybrid, so he's got a lot of unusual features (the four ears are intentional, the ring marks around the ears and thighs are not standard, and his tail is probably derived from another VRchat species). While avali are popular enough as fictional species go (6k e621 images) that he's probably not the first avali-wolf, there's not exactly a surfeit of non-AI training data that matches the design this particular aiGenner came up with. Yet the LoRA can carry markings and physical characteristics across styles, perceived 'medium', or even transfer markings to another gender or to other species.
It's far from perfect. Notably, the arm feathers and crest tend to come and go randomly, and the LoRA seems to be messing with the finger-and-toe count. That might be an intentional stylistic decision, but probably not. And LoRAs do have costs: a poorly trained LoRA can degrade image quality, and they seldom scale above three or four LoRAs in one generation (either text2img or inpainting) before the models tend to just go nuts. But it's the sort of thing that's practically doable at small scales by individuals without too autistic a level of focus.
That said, I will caveat that plenty of furries are fairly faceblind, or otherwise tend to identify characters more by mood, dress (as little as that might be), and large high-contrast markings. I don't know how well the same approach would transfer to realistic or even anime-like humans, especially for an audience with better perception of microexpressions or sensitivity to smaller errors; the few examples I'm familiar with tend to be side characters in content I'm not gonna link here.
Nano Banana or GPT Image are perfectly capable of ingesting reference images of entirely novel characters, and then just placing them idiomatically in an entirely new context. It's as simple as uploading the image(s) and asking it to transfer the character over. In the old days of 2023, you'd have to futz around fine-tuning Stable Diffusion to get far worse results.
I think by and large they are terrible at it and don't. There are a few different techniques that claim to achieve this, but as someone who follows this closely, it's all still fairly bad. It's by far one of the biggest remaining hurdles to mass commercial use.
Matching eye colour, hair colour, clothes, etc. is doable with stuff like retraining the model, a reference, or prompting with a well-known actor/figure.
God forbid you try to recreate a character that passes the filter of someone who's not faceblind.
Want to bet? I’ll wager up to US $500 that I can produce a 30 second video with a consistent, recognizable character using Veo (either Flow interface or API, your choice). Max Veo length is 8 seconds so that’s keeping consistency across 4 generations. We can do cuts to scenes within one gen if you want.
Want to agree on details? This offer is open to anyone.
The bar for me is not that it's recognizably consistent. It's actual consistency. For something like this to cross the commercial viability threshold, stuff needs to stay on model.
The character needs to stay consistent in different lighting conditions, angles and FOVs.
Finally, it needs to be able to handle unique appearances, not average pretty faces and clothes.
The issue isn't that it's impossible to make a video of a character from an image be consistent with that image. Although in my opinion we're still not there. The difficulty arises from the fact that such a video will inevitably have to conjure up new details in the process. Keeping the newly created information consistent with the next generated clip gets exponentially harder with each new clip and required context. Similar to how LLMs fail if the context is long enough.
I doubt you can make something like Bill Gates wearing tiger face paint and a floppy sleeping cap from a flat front shot, to an over the shoulder partial view, to a side view without messing up the direction of the flop of the cap or the position/amount of tiger stripes in the make-up.
Not going to bet money on it because I'm sure with enough tries it's doable, I'm just illustrating a point that the amount of stripes and flops or whatever is essentially the same as subtle facial features like the angle of the jaw or the tilt of the eyes.
The technology is fundamentally just not designed for this sort of thing. There's tons of workarounds and it will still be very impactful, you can work within the constraint to achieve amazing stuff, but the constraints are still there.
Bruh, you have absolutely no idea what you're talking about.
Elaborate, or refrain from comments that are nothing more than "Nuh uh!"
Respectfully, I am the one who has offered to provide evidence.
The comment I was responding to was a longer, word-slop version of "Nuh uh!". Assertions without any evidence, confident claims of what it "fundamentally" can or cannot do without evidence.
Is that sufficient elaboration? Or do I need to wrap it in a great big: "No way, I'm totally right for reasons that I cannot demonstrate and I will not agree to an objective test because I'm totally right because models fundamentally can't do the things I say they can't even though I'm not going to provide any reason for why that's true. The vibes are correct as everyone simply knows."?