Porean

3 followers   follows 1 user   joined 2022 September 04 23:18:26 UTC

No bio...

User ID: 266

Seeing as no one else has discussed this, I'll try to give a brief overview of the Drama that has taken place in the Stable Diffusion community over the last few days.


On 7th Oct, most of the source code, private models, and proprietary technology of NovelAI is leaked on /sdg/.

NovelAI's anime art generation stack was (is) significantly better than the open-source tech available, so tooling to work with the leaked models is quickly made available. More specifically, AUTOMATIC1111, the developer of the most popular offline Stable Diffusion tool (stable-diffusion-webui), immediately implements features to work with NovelAI's "Hypernetworks".

Within a day, AUTOMATIC1111 is accused of code theft and banned from the Stable Diffusion community, a decision explicitly backed by Emad. This causes a lot of Drama:

  • Because the stable-diffusion-webui project is extremely popular; no other open source tool comes close in functionality,

  • Because it's ambiguous whether any code was truly stolen or not,

  • Because NovelAI was discovered to have illegally copied open source code in their leaked repositories, an error that was admittedly quickly reverted by their devs,

  • Because of the optics of the situation -- Stability AI backing a closed-source company over a popular open source figure?

The drama expands further when links to stable-diffusion-webui are scrubbed from /r/stablediffusion, prompting the former moderators of that subreddit to reveal that its moderation team had been quietly taken over by representatives of Stability AI. It is additionally claimed that a similar process occurred for the SD Discord server.

And as an extra bonus, the coomers using SD have gone on high alert after Emad remarked in an interview that the release of SD V1.5 would be delayed to "handle edge cases" & (de)bias the model against "topless women".


Insofar as I have a take on all of this, it's going to be some blend of "holy shit Emad please stop doing bad PR" and the Seven Zillion Witches problem. I find it horrific that the mutually agreeable agenda of "let's keep AI open and available" is doing its best to self-destruct via the usual processes of internet catfights and the amplification of minor differences within what ought to be a united ingroup.

Leave the rest of the internet at the door.

Or could you at least have something more substantial to talk about than, "redditors upvote dumb shit, news at 11"?

A few followups to last week's post on the shifting political alignment of artists:

HN: Online art communities begin banning AI-generated images

The AI Unbundling

Vox: What AI Art means for human artists

FurAffinity was, predictably, not the only site to ban AI content. Digital artists online are in crisis mode, and you can hardly blame them -- their primary income source is about to disappear. A few names for anyone here still paying for commissions: PornPen, Waifu Diffusion, Unstable Diffusion.

But what I really want to focus on is the Vox video. I watched it (and its accompanying layman explanation of diffusion models) with the expectation that it'd be some polemic against the dangers of amoral tech nerds bringing grievous harm to marginalised communities. Instead, what I got was this:

There's hundreds of millions of years of evolution that go into making the human body move through three-dimensional space gracefully and respond to rapidly changing situations. Language -- not hundreds of millions of years of evolution behind that, actually. It's pretty recent. And the same thing is true for creating images. So our idea that like, creative symbolic work will be really hard to automate and that physical labor will be really easy to automate, is based on social distinctions that we draw between different kinds of people. Not based on a really good understanding of actually what's hard.

So, although artists are organising a reactionary/protectionist front against AI art, the media seems to be siding with the techbros for the moment. And I kind of hate this. I'm mostly an AI maximalist, and I'm fully expecting whoever sides with Team AI to gain power in the coming years. To that end, I was hoping the media would make a mistake...

To which tribe shall the gift of AI fall?

In a not particularly surprising move, FurAffinity has banned AI content from their website. The ostensible justification is the presence of copied artist signatures in AI art pieces, indicating a lack of authenticity. Ilforte has skinned the «soul-of-the-artist» argument enough and I do not wish to dwell on it.

What's more important, in my view, is what this rejection means for the political future of AI. Previous discussions on TheMotte have demonstrated the polarizing effects of AI generated content — some are deathly afraid of it, others are practically AI-supremacists. Extrapolating outwards from this admittedly-selective community, I expect the use of AI-tools to become a hotly debated culture war topic within the next 5 years.

If you agree on this much, then I have one question: which party ends up as the Party of AI?

My kneejerk answer to this was, "The Left, of course." Left-wingers dominate the technological sector. AI development is getting pushed forward by a mix of grey/blue tribers, and the null hypothesis is that things keep going this way. But the artists and the musicians and the writers and so on are all vaguely left-aligned as well, and they are currently the main reactionary force against AI.

The Motte is no stranger to the JQ. You do not need to waste your time pointing out the basic statistics: we've seen them, we've discussed them, and we've had far longer and more nuanced discussions on the topic.

Or, to put it in a way you'll understand: Lurk more. The "hey I just noticed this thing about Jewish overrepresentation..." skit makes you stick out like a lamppost.

For the people that might get duped like me: no, this article isn't about figuring out the true, statistically rigorous answer for what a medium-sized breast looks like.

Unironically just read every LessWrong article about Definitions. In a sane world, we'd just create new words to distinguish between transwomen and women and call it a day.

it also means that the online community forms strong bonds and is only associated with positive emotions.

I can't speak for the rest of the population, but the lack of hugboxing on the Right was exactly why I turned right-wing back in Ye Olden Days of approximately a decade ago. The consistent hugs and kisses and emotions of left-dominated spaces in the early 2010s were exactly why I, and I suspect most online right-wingers, didn't like them. They were incredibly easy to bully and seemed to have not a shred of a spine, and that kind of behaviour is just innately appalling to politically agitated young males.

Nowadays, the online left is a lot more vicious and willing to persecute its enemies to the bitter end. And as much as I disagree with their values, I can at least respect that they've transformed themselves from limp-wristed victimhood to arguably successful political agitators.

Whenever someone brings up the photography analogy, I always think they're completely missing the point. It's almost like you're Seeing Like a State -- artists exist now, revolution happens, artists exist after.

What you're neglecting to mention is that the artists that exist in the present will not be the artists of the future. We had photorealistic painters, and later we had photographers; the latter did not come from the ranks of the former. People will suffer, perish, anguish, and all of this stuff is important for understanding how things play out in the near future.

If you're of what might be referred to as the "pro-HBD" persuasion around here, how would the world look different if there were not meaningful cognitive/behavioral differences between ethnic groups?

We would not exist. God would be real. Cartesianism would be accurate. At the most basic level, it is extremely difficult to put "HBD is fake" and "evolution and scientific materialism is real" into the same boxes of reality.

Premise #2: Within the unit of people we care about, we care about everyone equally.

I think this premise specifically is inherently anti-utilitarian. How can you assign the same utility to each individual when there's so much variance? When the actions and roles and beliefs and experiences of two people can differ so greatly?

GDB is

  1. not easy to learn

  2. even less easy to learn if you are a part of the modern GUI/webapp/the-fuck-is-a-shell generation (so, the problem statement at hand)

  3. not something that even scales to larger projects, so you can hardly say you'll use it in a real job

Compare it with, let's say, the chrome debug console. Or the vscode debugger for python. They're far more intuitive than `x/10g`, `info all-regs`, `b 0x1234`, `ni` ×100, etc.

AI art continues to be terrible at generating pornographic images where a lot of freelance artists' requests come from.

My dude, I listed three services that provide what I believe to be good quality AI pornography. I have personally been making use of these services and I suspect I will not be using my old collection anymore, going forwards.

It also has trouble maintaining a coherent style across multiple images,

This is just a prompt engineering problem, or more specifically: cranking up the scale factor for whichever art style you're aping && avoiding samplers that end with _A (the ancestral samplers, which don't converge).

Remember that people were extremely gung ho about the future of stuff like motion controls and VR in gaming

And I can assure you I was not one of these people. Neither was I a web3 advocate, or self-driving car optimist, or any other spell of "cool tech demo cons people into believing the impossible".

For Stable Diffusion, there is no demo. The product is already here. You can already get your art featured / sold by putting it up on the sites that permit it. I know with 100% certainty that I am never going to pay an old-school artist* for a piece of digital art again, because any ideas I had were created by me with a few prompt rolls an hour ago.

*I might pay for a promptmancer if I get lazy. But that will be orders of magnitude cheaper, and most likely done by people who used to not be artists.

For example - France has a fertility rate around 1.8. 1.7 for the US. Germany 1.4.

In the east with more strict gender norms the rich societies however have far more abysmal fertility rates - Japan 1.3, South Korea 0.8, Taiwan 1.1, Singapore 1.2.

Note that the nations you've listed in the second category, despite having "strict gender norms", generally have both males and females employed at similar rates.

The link between feminism and fertility is simple -- working women have less time for child-raising, and society never restructured itself to account for this loss. The Asian nations you've listed (and also Germany!) all have far more workaholic cultures than the Western nations you've listed, and this is likely a significant cause of the reduced fertility in the former.

Whether 21st century feminism is doing well or not misses the point; the bulk of the damage was done when the norm of both parents working became established.

What's the point?

No, really -- let's say you win. You've convinced the entirety of the western public that COVID-19 was made in a Chinese biolab. Okay, now what?

I have 180°'d on my opinions, thanks.

TLDR: running the T5-XXL text encoder should be possible for any chump with 12GB of ordinary RAM, or some combination of offloaded RAM+vRAM that sums to 9GB, because running encoder-only is fast enough. Tests and stats are mostly extrapolated from T5-3B because of personal hardware constraints (converting models costs much more memory than loading them).

There are markdown tables in this comment that do not display correctly on the site, despite appearing correctly on the comment preview. You may wish to paste the source for this comment into a markdown preview site.


To start, T5-XXL's encoder is actually 4.6B parameters, not 5.5B. I do not know why the parameters aren't evenly split between the encoder & decoder, but they aren't.

Additionally, it's likely that int8 quantisation will perform well enough for most users. `load_in_8bit` was recently patched to work with T5-like models, so that brings the memory requirements for loading the model down to ∼5GB.

What about vram spikes during inference? Well, unlike SD, the memory use of T5 is not going to blow significantly beyond what its parameter count would imply, assuming the prompts remain short. Running T5-3B from huggingface [0], I get small jumps of:

| dtype | vram to load | .encode(11 tokens) | .encode(75 tokens) |
|-|-|-|-|
| 3B-int8 | 3.6GB | 4.00GB | 4.35GB |
| 3B-bf16 | 6.78GB | | 7.16GB |

Note that the bump in memory for bf16 is smaller than for int8 because int8 does on-the-fly type promotion shenanigans.

Extrapolating these values to T5-XXL, we can expect bumps of (0.4∼0.8) * 11/3 = 1.5∼3GB of memory use for an int8 T5-XXL encoder, or <1.5GB for a bf16 encoder. We should also expect the model to take 10∼20% more vram to load than its parameter count alone would imply.
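(Spelling that extrapolation out with the measured T5-3B numbers above -- rough estimates, nothing more:)

```python
# back-of-envelope scaling of the int8 encode-time vram bumps from T5-3B to T5-XXL
bump_3b = (0.4, 0.8)        # measured .encode() bumps for 3B-int8, in GB
scale = 11 / 3              # T5-XXL is ~11B parameters vs T5-3B's 3B
bump_xxl = tuple(round(b * scale, 1) for b in bump_3b)
print(bump_xxl)             # (1.5, 2.9) -> the "1.5~3GB" figure above
```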

So, an ideal int8 T5-XXL encoder would take up to (4.6*1.15+3)GB, or slightly more than 8GB of vram during runtime. That still locks out a substantial number of SD users -- not to mention the 10xx series users who lack int8 tensor cores to begin with. Are they fucked, then?


Short answer: no, we can get away with CPU inference via ONNX.

I first came across the idea below a Gwern comment. Given that prompts are limited to 77 tokens, would it be possible to run the encoder in a reasonable amount of wall time? Say, <60s.

Huggingface's default settings are atrociously slow, so I installed the ONNX runtime for HF Optimum and built ONNX models for T5-3B [1]. Results:

| quantized? | model size on disk | python RAM after loading (encoder+decoder) | `model.encoder(**input)` duration | full seq2seq pass |
|-|-|-|-|-|
| no | 4.7+6.3GB | 17.5GB | 0.27s | 42s |
| yes | 1.3+1.7GB | 8.6GB | 0.37s | 28s |

I'm not sure whether I failed to use the encoder correctly here, considering how blazing fast the numbers I got were. Even if they're wrong, an encoder pass on T5-XXL is still likely to fall below 60s.
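For reference, the encoder timing is measured roughly like this (a sketch rather than the exact script I ran; it assumes the unquantized export from [1] is sitting in ./t5-3b-ort):

```python
import time
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-3b")
model = ORTModelForSeq2SeqLM.from_pretrained("./t5-3b-ort")  # unquantized ONNX export from [1]

inputs = tokenizer("a prompt of up to 77 tokens goes here", return_tensors="pt")
start = time.perf_counter()
out = model.encoder(**inputs)  # encoder-only pass; no autoregressive decoding
print(f"encoder pass: {time.perf_counter() - start:.2f}s", out.last_hidden_state.shape)
```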

But regardless, the tougher problem here is RAM use. Assuming it is possible to load the text encoder standalone in 8bit (I have not done so here due to incompetency, but the model filesizes are indicative), the T5-XXL text encoder would still be too large for users with merely 8GB of RAM to use. An offloading scheme with DeepSpeed would probably only marginally help there.
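(If you want to avoid holding the decoder in RAM at all, it should be possible to point onnxruntime at just the quantized encoder file. I haven't tested this, and the filename is guessed from ORTQuantizer's default `*_quantized.onnx` naming, so treat it as a sketch:)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-3b")
# filename assumed from ORTQuantizer's default naming; adjust to whatever [1] actually wrote out
sess = ort.InferenceSession("./t5-3b-ort-quantized/encoder_model_quantized.onnx",
                            providers=["CPUExecutionProvider"])

enc = tokenizer("some prompt", return_tensors="np")
hidden = sess.run(None, {"input_ids": enc["input_ids"].astype(np.int64),
                         "attention_mask": enc["attention_mask"].astype(np.int64)})[0]
print(hidden.shape)  # (batch, seq_len, d_model)
```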


[0] - example code to reproduce:


from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "t5-3b"
PROMPT = "..."

tokenizer = AutoTokenizer.from_pretrained(model_name)
# add torch_dtype=torch.bfloat16 OR load_in_8bit=True here (the former needs `import torch`)
model = T5ForConditionalGeneration.from_pretrained(model_name, device_map='auto', low_cpu_mem_usage=True)

inputs = tokenizer(PROMPT, return_tensors='pt')
output = model.encoder(**inputs)  # encoder-only forward pass; the decoder is never invoked

[1] - example code for ONNX model creation:


from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_name = "t5-3b"
model_name_local = "./t5-3b-ort"
model_name_quantized = "./t5-3b-ort-quantized"


def create_ORT_base():
    # export the HF checkpoint to ONNX (encoder, decoder, decoder-with-past) and save it locally
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name, from_transformers=True)
    model.save_pretrained(model_name_local)


def create_ORT_quantized():
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name_local)
    model_dir = model.model_save_dir
    # one quantizer per exported ONNX file
    encoder_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="encoder_model.onnx")
    decoder_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="decoder_model.onnx")
    decoder_wp_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="decoder_with_past_model.onnx")
    quantizers = [encoder_quantizer, decoder_quantizer, decoder_wp_quantizer]
    # dynamic int8 quantization config for AVX512-VNNI CPUs
    dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
    for q in quantizers:
        q.quantize(save_dir=model_name_quantized, quantization_config=dqconfig)

I didn't have any good place to add this in my post, but it's worth noting that caching of text embeddings will help a lot with using T5-XXL. Workflows that involve large batch sizes/counts || repeated inpaintings on the same prompt do not need to keep the text encoder loaded permanently. Similar to the --lowvram mechanism implemented now, the text encoder can be loaded on demand, only when the prompt changes, saving memory costs.
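A minimal sketch of that caching idea (the load/unload helpers here are hypothetical stand-ins, not functions from any existing webui):

```python
# keep the last prompt's embedding around so the text encoder only needs to be
# resident while the prompt actually changes (--lowvram style on-demand loading)
_cache = {"prompt": None, "embedding": None}

def get_text_embedding(prompt, load_encoder, unload_encoder):
    if prompt != _cache["prompt"]:
        encoder = load_encoder()        # hypothetical: pull the T5 encoder into (v)RAM
        _cache["embedding"] = encoder(prompt)
        _cache["prompt"] = prompt
        unload_encoder(encoder)         # hypothetical: free it again immediately
    return _cache["embedding"]
```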

Misogynist (in the feminist sense) would be more accurate. There is zero mention of anything related to getting laid.

I hate how much coverage the AI/rat community is giving to "Loab". It seems abundantly clear to me that it's a social hoax (or at least just a funny art exhibition) rather than something revealing any real insight into the latent space of diffusion models.

This is pretty clearly a woman.

We have differences in lived experiences, then. I've said this before, but I really think Hanania nailed it by hypothesising that the anti-trans side cannot be understood without acknowledging how some people are simply innately disgusted by what they perceive as abnormal physical features. Or, to simplify: too many people have a disgust reflex against non-passing transsexuals for the movement to succeed.

You can talk about how we should all apply Bayesian reasoning to deduce that an odd looking person is likely to prefer she/her, but that's a tall order for someone experiencing literal transphobia (as in: an instinctive, uncontrollable fear/repulsion) as they look at the person.

As for your commentary on how "passing is transphobic", I think it has been independently suggested a thousand times by some of the more radical trans activists.

Your post made sense to me, but I think that's a result of me agreeing with 90% of it. It might help if you broke up your stream of consciousness into proper paragraphs and subpoints.

In all likelihood, you'd need something like 8x3090, but that's about as hard to trace as a stealth weed growbox in a basement. Inference, I expect, also won't be feasible on normal consumer machines, so it'll incentivize some stealthy cloud computing, maybe very small-scale.

I'll bet against that. It's supposed to be an Imagen-like model leveraging T5-XXL's encoder with a small series of 3 unets. Given that each unet is <1B, this is no worse than trying to run Muse-3B locally.

Sizzle50's various posts on BLM were really great, but I think everyone here has discussed that to death.

Instead, I'll link SayingAndUnsaying's longpost on Hawaiian Racial Dynamics, which will be new & novel for a lot more readers.

I miss the past.