Friday Fun Thread for September 30, 2022

Be advised: this thread is not for serious in-depth discussion of weighty topics, and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

There was some [previous discussion], more culture war focused, on AI art and specifically AI pornography. 17 days ago, furry-specialized models were "currently a WIP and will be available soon".

The Yiffy specialized model reached Epoch18 last night, following hot on the tail (hur hur) of Epoch13 on September 25th and Epoch15 a couple days later. While it's not quite up to my test case yet (or I'm not a good enough promptomancer to get it there), it's made a huge amount of progress toward it. And while I can't speak for Primaprimaprima's test of "a single high-quality AI image of two people having sex", there's absolutely the opportunity to generate images of two furries having sex now. While it does PoV shots more easily, people have already found a few prompts that pretty consistently get common positions or even some kinks like exhibitionism going.

((Separately, we have Doe Biden and Buck Breaking jokes from Trace Woodgrains. Not sure if Trace was using Yiffy or the less-porn-trained Furry model.))

Some somewhat surprising revelations:

  • Furry models seem to be doing better with anatomy like hands than conventional StableDiffusion. Which is kinda funny when someone wants paws, but potentially useful. Still not great, though, and probably only because the source images cover such a restricted range of poses.

  • It's actually somewhat useful to train and tag for things you don't want. The Automatic1111 WebUI has the option of negative prompting (there's a short sketch of this after the list). For adult content use, that can be useful for avoiding orientations or genders or other content you're not interested in. But it's also useful if you don't want adult content at all; not only can you find the opposite of dicks, you can find the opposite of "bad anatomy". Which isn't necessarily going to make an output good, but it does point to some interesting options.

  • Albeit at the cost that you've probably trained on things you don't like. Both the Furry and Yiffy checkpoints were trained against datasets filtered not just on quality (albeit by simple upvote counts) but also on content, for a variety of very good reasons.

  • Hit rates are either not great or outstanding, depending on your perspective. Some simpler 'pinup'-style prompts have gotten 30%+ of outputs to what their creators consider 'acceptable', but more complicated prompts can sit around ~10%, or even fail to produce a single good result on the first pass.

  • Furries have, perhaps understandably, focused on the use of furry artists for style prompting, but you can get somewhat surprising results looking at things in unexpected ways. Furry porn by DaVinci ends up looking pretty cool! I've had better luck getting SFW noodly cartoon people from prompts involving braeburned (cw: gay) and zachary911 (cw: gaaaaaay) than prompts involving Rick Griffin, who's pretty much the king of that field. A number of non-furry artstation artists (eg Greg Rutkowski, Michael & Inessa Garmash, Ruan Jia, Pino Daeni) can augment the style of prompt that's already got furry artists included.

  • Prompt and name collision seems to be an issue. Perhaps more so in the furry fandom than elsewhere, but I do think it's going to point to some general issues with the tokenizer (see the tokenizer sketch after this list). I'm not sure if this is an issue from the scale of the data, or if it's one of many wider problems in the CLIP tokenizer.

  • This isn't very advanced. There's some fantastic work happening in the field, but the Automatic1111 webui is also missing unit tests and breaks functionality every other commit. The Yiffy model was trained on the word 'explict', because typos. It's not unusual to develop a prompt's settings by dartboard.

  • It's still very involved. At the extreme end, there are people who have entire workflows of inpainting and outpainting to correct defects like hands and eyes, followed by resolution enhancements, followed by a photoshop touchup pass. But even well before that, it's tricky to dial in the right denoising settings. With the exception of that last photoshop touchup phase, though, it's far from clear that these steps couldn't themselves be automated, or even that much of that automation would require new technology rather than slapping together existing bits. Indeed, automating the underlying 'does this image look /right/' step was a major part of the filtering of the LAION dataset used to train StableDiffusion to begin with.

  • It's also a surprisingly small training dataset. Yiffy trained starting on 150k images, moving to 200k for later epochs. A different model was separately trained on latex, rubber, and 'goo' with 100k images and, while I've not experimented with it and am not part of its audience, it seems to be fairly successful. Many of the very useful tagged styles have fewer than 500 pieces in the training data: this (cw: topless fox guy in a loincloth, probably nsfw, but nothing 'showing') compares the relative effects of artists with 1400 (ruaidri), 230 (snowskau), and 47 (garnetto) works in the training data. That doesn't necessarily say anything about training size floors, and it's possible the terms are carrying over from the base model's original LAION training, but it does suggest ceilings.
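
For the negative-prompting bullet above, here's a minimal sketch of the same idea through the Hugging Face diffusers library rather than the WebUI; the model ID, prompts, and settings are placeholders, not the actual Yiffy checkpoint or anyone's real workflow.

```python
# Minimal sketch of negative prompting with diffusers (the Automatic1111
# WebUI exposes the same idea as a "Negative prompt" text box).
# Model ID and prompts are placeholders, not the Yiffy checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in a locally fine-tuned checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="portrait of an anthro fox, digital painting, detailed fur",
    negative_prompt="bad anatomy, extra limbs, blurry, text, watermark",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("sample.png")
```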

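For the name-collision bullet, part of the problem is visible just from how the CLIP tokenizer chops unfamiliar handles into generic sub-word pieces; this is my own rough illustration, not anything from the trainers' code.

```python
# Rough illustration of how the CLIP tokenizer used by StableDiffusion
# splits artist handles into sub-word pieces, which is part of why
# distinct names can collide or bleed into each other in prompts.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for name in ["greg rutkowski", "ruaidri", "snowskau", "garnetto"]:
    print(f"{name!r} -> {tokenizer.tokenize(name)}")

# Handles that get broken into shared pieces end up leaning on the same
# token embeddings, so a prompt for one can pull in traits of another.
```
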
What are they training it on? Only a few hundred k, so presumably they're not throwing the entire e621 catalog at it yet.

From my understanding, the major trainers have largely downloaded a subset of e621 data, filtered first by upvote score and then by content. Furry and Yiffy both take SFW and less-extreme-kink NSFW, with different thresholds and limits; Zach3d takes the 'texture' fetish and a few specific species, mixed with a small subset of general pictures at a higher upvote score. I think most have also filtered out material they think is likely to cause artifacting, whether technical stuff like severe jpg compression or many-panel comic pieces.
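
For a concrete sense of what that kind of filtered pull looks like, here's my own rough sketch against e621's public JSON API; the score threshold, tag filters, and page count are made-up placeholders, not any trainer's actual pipeline.

```python
# Rough sketch of pulling a score- and content-filtered subset of e621
# metadata. Placeholder numbers throughout; not anyone's real scraper.
import time
import requests

HEADERS = {"User-Agent": "dataset-sketch/0.1 (example only)"}  # e621 wants a descriptive UA
QUERY = "score:>=150 -comic -animated rating:s"  # filter before download

posts = []
for page in range(1, 6):  # a handful of pages, up to 320 posts each
    resp = requests.get(
        "https://e621.net/posts.json",
        params={"tags": QUERY, "limit": 320, "page": page},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    posts.extend(resp.json()["posts"])
    time.sleep(1)  # stay politely under the rate limit

# Each post record carries its tags and file URL, so both captioning and
# the actual image download can be driven from this metadata.
print(len(posts), "candidate posts")
```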

So it's relatively easy to grab sets and train a model, to the point that groups of amateurs can do it? That's good news: I had a vision of all art production being censored by the kind of political commissars who inevitably take over large projects.

The idea of being able to restrict artists at the canvas level must be making some of them drool.

As far as I know, each of these datasets has been curated by one person, to their respective tastes: Hasuwoof for Yiffy, DirtyApples for Furry, and Zach for Zach3d. e621 is well-enough tagged for high-score posts that it seems fairly automatable, and as long as you're not abusing the download process, it's hard to tell a normal user from an archiver, especially if you filter before download. And the code itself is... not fun, since it's poorly documented Python in most parts, but it's nothing ridiculous.

((There's a My Little Pony-specific one that's supposed to have been released recently, but I know less about that.))

There's been some discussion of setting up teams for the difficult heavy lifting (e.g., improving tagging, building and parsing datasets with more eyes-on curation), but the big issues for now are cost and technical accessibility. The core model is expensive because it took literally millions of steps over a large dataset, but further tuning is relatively cheap, with most epochs taking less than a day on a single (beefy) cloud GPU server. But getting the data together and onto that machine quickly enough is complex to do right, and it's easy to end up with a staggering AWS bill if done wrong.
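
For a rough sense of why the tuning side is cheap next to the base model, a back-of-envelope with assumed (not measured) numbers:

```python
# Back-of-envelope for one fine-tuning epoch vs. base-model training.
# Every number here is an assumption for illustration only.
dataset_images = 200_000      # roughly the later Yiffy epochs
batch_size = 4                # per-GPU batch on one beefy cloud GPU
seconds_per_step = 1.5        # assumed step time at 512x512

steps_per_epoch = dataset_images / batch_size
hours_per_epoch = steps_per_epoch * seconds_per_step / 3600
print(f"~{steps_per_epoch:,.0f} steps, ~{hours_per_epoch:.0f} hours per epoch")
# -> ~50,000 steps, ~21 hours: under a day, consistent with the claim above.

base_model_steps = 1_000_000  # "literally millions of steps" for the base model
print(f"base training is ~{base_model_steps / steps_per_epoch:.0f}x more steps")
```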

That'll be less of an issue if newer GPU generations continue to bulk up on VRAM; done at home, it's mostly an energy (and/or cooling) bill thing. And that might be coming as soon as this winter for people willing to splurge on the higher-VRAM versions of the 4090.