
Culture War Roundup for the week of January 16, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Some more heating up in the AI image generation culture wars, with stock image company Getty Images suing Stability AI over alleged copyright violations. Here's Getty Images' full press release:

This week Getty Images commenced legal proceedings in the High Court of Justice in London against Stability AI claiming Stability AI infringed intellectual property rights including copyright in content owned or represented by Getty Images. It is Getty Images’ position that Stability AI unlawfully copied and processed millions of images protected by copyright and the associated metadata owned or represented by Getty Images absent a license to benefit Stability AI’s commercial interests and to the detriment of the content creators.

Getty Images believes artificial intelligence has the potential to stimulate creative endeavors. Accordingly, Getty Images provided licenses to leading technology innovators for purposes related to training artificial intelligence systems in a manner that respects personal and intellectual property rights. Stability AI did not seek any such license from Getty Images and instead, we believe, chose to ignore viable licensing options and long‑standing legal protections in pursuit of their stand‑alone commercial interests.

This follows a separate class action lawsuit filed in California by 3 artists against multiple image generation AI companies including Stability AI, Midjourney, and DeviantArt (which is an art sharing site, but which seems to be working on building its own image creation model). According to Artnews, "The plaintiffs claim that these companies have infringed on 17 U.S. Code § 106, exclusive rights in copyrighted works, the Digital Millennium Copyright Act, and are in violation of the Unfair Competition law." It seems to me that these 2 lawsuits are complaining about basically the same thing.

IANAL, and I have little idea of how the courts are likely to rule on this, especially English courts versus American ones. I know there's precedent for data scraping being legal, but those precedents are highly context-dependent; e.g. the Google Books case was contingent on the product not being a meaningful competitor to the books being scanned, which is a harder argument to make about an AI image generator with respect to a stock image service. In my subjective opinion, anything published on the public internet is fair game for AI training, since others learning from viewing your work is one of the things you necessarily accept when you publish it for public view on the internet. This includes watermarked sample versions of proprietary images that one could buy. However, there's a strong argument for the other side: that a human using an AI to offload the process of learning from images is qualitatively different from a human learning from those images directly, and that the social contract of publishing for public consumption, as it exists, doesn't account for this and must be amended to include an exception for AI training.

Over the past half year or so, I'm guessing AI image generation is second only to ChatGPT in the mainstream attention directed at AI-related stuff - maybe third after self-driving cars - so it's unsurprising to me that a culture war has formed around it. But having paid attention to some AI image generation subreddits, I've noticed that the battle lines still don't really match existing culture war lines. There are signs of the left coalescing against AI image generation, with much of the pushback coming from illustrators who are on the left, such as the comic artist Sarah C Andersen, one of the 3 artists in that class action lawsuit, along with a sort of leftist desire to protect the jobs of lowly paid illustrators by preventing competition. But that's muddled by the fact that, on Reddit, most people are on the left to begin with; the folks who are fine with AI image generation tools (by which I mean the current models trained on publicly available but sometimes copyrighted images) are also heavily on the left, and there are leftist arguments in favor of the tech for opening up high quality image generation to people with disabilities like aphantasia. Gun to my head, I would guess that this trend will continue until, within 2 years, it's basically considered Nazism to use "unethically trained AI" to create images, but my confidence in that guess is close to nil.

From a practical perspective, there's no legislation that can stop people from continuing to use the models that are already built, but depending on the results of these lawsuits, we could see further development in this field slow down quite a bit. I imagine restrictions on training data can and will be worked around and would only delay the technology by a few years, which means that what I see as the true complaint from stock image websites and illustrators - unfair competition - wouldn't be addressed. So I would expect this culture war to remain fairly heated for the foreseeable future.

At least in America, copyright laws are (ostensibly) primarily for the purpose of promoting and incentivizing the creation of original works. Copying someone else's work is bad not because it increases the availability of that work (which is a good thing) but because it decreases the rightful monetary gain of the original creator since people buying or pirating the copied work are not paying them when they ought to. As a result, fewer artists are financially incentivized to make things, and thus fewer original works are created (which is bad).

As such, there is no legitimate interest in banning AI art or its access to training data, provided the AI is creating new original works and not blatant ripoffs. Yes, AI art may compete with human artists and thus reduce their financial incentive to make art, but it does this indirectly, by creating new original art that competes with theirs in the marketplace, the same as any human artist who competes with them. As a result, more original art is created, and thus the copyright laws, as intended, should allow it to exist.

You make a good "spherical cows" argument which shows how we would want things to play out. I think there's a big gap between that toy model and real courts, though. Copyright laws might ostensibly primarily be for the purpose of promoting and incentivizing the creation of original works, but I think they're actually primarily for the purpose of appeasing entities who financially benefit off of being able to own intellectual property. Now, there are many such entities on both sides of this culture war, so I think which side wins in the courts will come down largely to who can grease more of the right palms than the opposition rather than any principles.

Although I agree that there is some ideological corruption in the courts, especially where the culture war is involved, they are less than 100% corrupt. Certainly less than the legislators. Which means that the actual intent of the law sends a nontrivial signal, and if there is money and advocates on both sides of the issue the actual intent can serve as a tiebreaker even if the other side has slightly more money.

The risk is that there’s simply a blanket ban on copyrighting anything that contains AI art, with a full provenance pipeline required if any suspicions arise. Can’t use it for a logo, for corporate art, for media that you intend to sell, in the workplace etc.

I don't think there's a particularly high risk of that. A full provenance pipeline requirement would essentially make the entire creative industry non-viable, and creating a new "infectious public domain" category just for AI-generated images - where not only are the images in the public domain, but any media that uses such images also becomes public domain - is sufficiently new and disruptive that whatever political will can be scrounged up to push for such a system seems likely to be crushed by the might of entrenched interests like Disney or Microsoft, who would essentially see their entire film, TV, and video game divisions destroyed if they allowed it to happen.

For the record, if that outcome ("AI art can't be copyrighted but remains legal") comes to pass, I'd be very surprised and shift my model of reality by a significant amount. My current model says that copyright disproportionately benefits those who have the means to enforce it, which are also entities close to the reigning powers in most countries, at the expense of the vast majority of people, and so such a combination of measures would amount to a massive voluntary relinquishment of power.

I'd bet on something close to the opposite: AI art can be copyrighted just fine, but the means of its production are subject to regulatory capture so that most people can't legally produce it.

What really gets in my gears over this is how pointlessly these people are going to throw away the west's advantages to squeeze out just a couple more years of employment. Some country is not going to tell these sites to pound sand and unless you're going to stop them with guns and tanks your job is going to be outsourced to that country eventually. We are wasting very precious seconds trying to put a pin back into a grenade with our teeth.

What really gets in my gears over this is how pointlessly these people are going to throw away the west's advantages

Detractors of AI technology don't view this as an "advantage" in the way you're thinking of it.

Being the leader in AI is a bit like being the leader in bioweapons. Tactically prudent, so you don't get blindsided by your adversaries, but it's not something that gives you the warm fuzzies. It's the sort of thing that you wish wasn't necessary in the first place. And it's certainly not the sort of thing that you want floating around unaccounted for in private hands. You want it managed by the public sector, strictly regulated, under lock and key.

Some country is not going to tell these sites to pound sand

"It's legal over there so it should be legal here" is hardly a convincing argument. The issue here is that Stability is unjustly profiting off the work of artists without proper compensation. So what if other countries would let them do it with impunity? People do unjust things all the time, but that's no excuse for you to do the same.

Anyway, if it is a foregone conclusion that artists are all going to be out of a job and nothing they do matters either way, then that's just an even stronger argument for them to go after Stability now, just out of spite.

Note how they aren't suing OpenAI (and by extension, Microsoft). It's not just a matter of training data but also of tactical savviness; of course Dall-E 2 (and the next model that's allegedly called Flow and will be integrated with GPT-4) can be artistic enough to compete with human artists, even if it won't be mimicking their particular styles. Indeed, it'd be more interesting to push those models to develop novel styles of their own. I also suspect some LW inspiration in this attack on minor players: Stability, for example, doesn't plan to restrict itself to the pretty picture generator business, and is therefore a problem in the paranoid world model where the fewer actors there are, the less risk.

The question is how hard it'll be to train new foundation models if these people succeed and legitimate centralized business entities like Stability go under.

The coolest new text to image model, seemingly far superior to Stable Diffusion, Midjourney V4 and even Imagen/Parti in comprehension, is Google's Muse, a pure Transformer (with T5-XXL text encoder, but that's par for the course). It's much faster at inference than its predecessors (Imagen, Parti, even non-distilled SD if you have enough VRAM) and more naturally lends itself to image modification.

We train a number of base Transformer models at different parameter sizes, ranging from 600M to 3B parameters. Each of these models is fed in the output embeddings from a T5-XXL model, which is pre-trained and frozen and consists of 4.6B parameters.

We train on the Imagen dataset consisting of 460M text-image pairs (Saharia et al., 2022). Training is performed for 1M steps, with a batch size of 512 on 512-core TPU-v4 chips (Jouppi et al., 2020). This takes about 1 week of training time.

So that's ~86,000 TPU-v4-hours, which, given the track record of optimizations for SD, I'll take as the budget for getting a good-enough image generator we'd be able to use in perpetuity. Naively, that's something like $120K in compute using A100s, maybe less depending on the provider's terms. Of course, if hostility towards independent AI becomes prevalent and supported by law, it'll be hard to rent a node of 512 A100s (and harder to buy: that's like $10M in hardware if you're lucky) for such a purpose. On the other hand, if you're working in the shadows, training time and efficiency don't matter as much, and you can try to shard the workload and use cheaper hardware (as in: 3090s from bankrupt Ethereum miners, connected by Infiniband cards from the same eBay, or rather darknet, vendor)...
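Spelling that arithmetic out (the A100 hourly rate below is my own assumption, picked to land near that $120K figure):

chips, days = 512, 7
device_hours = chips * 24 * days                    # = 86,016, the "~86,000 TPU-v4-hours" above
assumed_a100_rate = 1.40                            # $/A100-hour; bulk/spot cloud pricing varies a lot
print(device_hours, round(device_hours * assumed_a100_rate))  # naively treating one A100-hour as one TPUv4-hour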

But before the censors even get to enforce such regulations, we'll get DeepFloyd IF, which already seems to be on par with Google's top models. It's a model developed by Alex Shonenkov, ex-head of image generation AI at Sberbank (yes, Sberbank was in the business of AI art); they are backed by Stability. It isn't trained on contemporary digital art, but, in the tradition of StableDiffusion, will trivially allow finetuning. In all likelihood, you'd need something like 8x3090, but that's about as hard to trace as a stealth weed growbox in a basement. Inference, I expect, also won't be feasible on normal consumer machines, so it'll incentivize some stealthy cloud computing, maybe very small-scale. Would be nice if the powers that be didn't succeed in crashing crypto, so we could build a p2p on-demand opensource AI inference and training economy.

All of the arithmetic above is only relevant to static images, and is therefore of limited use. What is more interesting is video+audio. What is vital is text, especially code; the best opensourced LLMs are gimmicks in comparison to GPT 3.5, and GPT 3.5 is still only barely useful in a professional context; we need something better to achieve escape velocity. There are lawsuits brewing in this domain too; the noose is tightening, the surveillance escape velocity approaches as well. I do hope that enough competent people realize the end result of the ongoing multipronged attack on ML-oriented hardware availability, unsupervised internet connectivity, gratuitous electricity expenditure, AI legality, and distributed secure ledgers before the timeline's stable state is determined. For now, it feels like pretty much everyone tech-savvy is still deriving status points from mocking cryptobros and pooh-poohing ChatGPT over its inability to write decent poetry or count to 10.

of course Dall-E 2 (and the next model that's allegedly called Flow and will be integrated with GPT-4) can be artistic enough to compete with human artists, even if it won't be mimicking their particular styles. Indeed, it'd be more interesting to push those models to develop novel styles of their own.

I don't know much about Dall-E 2 or Flow, but this is the sort of thing I'm particularly excited about in this space in the near future. I'm reminded of how AIs learning to play chess or Go produced novel, unexpected behavior that caught human experts off guard, showing us just how limited our understanding of the space of possible chess/Go moves was. In the case of AI-generated images, an AI can explore the space of possible pixel arrangements in a grid far more flexibly than a human can, and I wonder just how much beauty there is in this space that was previously impossible or extremely difficult to access due to our human constraints.

In all likelihood, you'd need something like 8x3090, but that's about as hard to trace as a stealth weed growbox in a basement. Inference, I expect, also won't be feasible on normal consumer machines, so it'll incentivize some stealthy cloud computing, maybe very small-scale.

I'll bet against that. It's supposed to be an Imagen-like model leveraging T5-XXL's encoder with a small series of 3 unets. Given that each unet is <1B, this is no worse than trying to run Muse-3B locally.

Well, I think Muse-3B won't run locally either.

How do you suppose T5-XXL's encoder is to be used, in practice? It's 5.5B, so 11GB in bf16. And StableDiffusion is 860M, but in practice it takes multiple GBs.

TLDR: it should be possible for any chump with 12GB of ordinary RAM, or some combination of offloaded RAM+vRAM that sums to 9GB, because running encoder-only is fast enough. Tests and stats mostly extrapolated from T5-3B because of personal hardware constraints (converting models costs much more memory than loading them)



To start, T5-XXL's encoder is actually 4.6B, not 5.5. I do not know why the parameters aren't evenly split between the encoder & decoder, but they aren't.

Additionally, it's likely that int8 quantisation will perform well enough for most users. load_in_8bit was recently patched to work with T5-like models, so that brings the memory requirements for loading the model down to ∼5GB.

What about vram spikes during inference? Well, unlike SD, the memory use of T5 is not going to blow significantly beyond what its parameter count would imply, assuming the prompts remain short. Running T5-3B from huggingface [0], I get small jumps of:

| dtype | vram to load | .encode(11 tokens) | .encode(75 tokens) |
|-|-|-|-|
| 3B-int8 | 3.6GB | 4.00GB | 4.35GB |
| 3B-bf16 | 6.78GB | | 7.16GB |

Note that the bump in memory for bf16 is smaller than for int8 because int8 does on-the-fly type promotion shenanigans.
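For anyone wanting to reproduce those bumps, here's a minimal sketch of the measurement idea (peak allocation around the encode call; not necessarily the exact harness behind the table):

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-3b")
# bf16 row; for the int8 row swap in load_in_8bit=True with device_map="auto" instead
model = T5ForConditionalGeneration.from_pretrained("t5-3b", torch_dtype=torch.bfloat16).to("cuda")

loaded = torch.cuda.memory_allocated() / 2**30
torch.cuda.reset_peak_memory_stats()
inputs = tokenizer("a short test prompt", return_tensors="pt").to("cuda")
with torch.no_grad():
    model.encoder(**inputs)
peak = torch.cuda.max_memory_allocated() / 2**30
print(f"vram to load: {loaded:.2f}GB | peak during encode: {peak:.2f}GB | bump: {peak - loaded:.2f}GB")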

Extrapolating these values to T5-XXL, we can expect bumps of (0.4∼0.8) * 11/3 = 1.5∼3GB of memory use for an int8 T5-XXL encoder, or <1.5GB for a bf16 encoder. We should also expect the model to take 10∼20% extra vram to load than what its parameters should imply.

So, an ideal int8 T5-XXL encoder would take up to (4.6*1.15+3)GB, or slightly more than 8GB of vram during runtime. That still locks out a substantial number of SD users -- not to mention the 10xx series users who lack int8 tensor cores to begin with. Are they fucked, then?


Short answer: no, we can get away with CPU inference via ONNX.

I first came across the idea below a Gwern comment. Given that prompts are limited to 77 tokens, would it be possible to run the encoder in a reasonable amount of wall time? Say, <60s.

Huggingface's default settings are atrociously slow, so I installed the ONNX runtime for HF Optimum and built ONNX models for T5-3B [1]. Results:

| quantized? | model size on disk | python RAM after loading (encoder+decoder) | model.encoder(**input) duration | full seq2seq pass |
|-|-|-|-|-|
| no | 4.7+6.3GB | 17.5GB | 0.27s | 42s |
| yes | 1.3+1.7GB | 8.6GB | 0.37s | 28s |

I'm not sure whether I failed to use the encoder correctly here, considering how blazing fast the numbers I got were. Even if they're wrong, an encoder pass on T5-XXL is still likely to fall below 60s.

But regardless, the tougher problem here is RAM use. Assuming it is possible to load the text encoder standalone in 8bit (I have not done so here due to incompetency, but the model filesizes are indicative), the T5-XXL text encoder would still be too large for users with merely 8GB of RAM to use. An offloading scheme with DeepSpeed would probably only marginally help there.
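(The "offloaded RAM+vRAM that sums to 9GB" case from the TLDR is roughly the stock accelerate-style split sketched below; the max_memory numbers are illustrative, not benchmarked, and the 3B checkpoint stands in for the XXL one.)

import torch
from transformers import T5ForConditionalGeneration

# whatever doesn't fit within the GPU budget spills over into ordinary RAM
model = T5ForConditionalGeneration.from_pretrained(
    "t5-3b",                                  # stand-in; the real target would be the T5-XXL checkpoint
    device_map="auto",
    torch_dtype=torch.bfloat16,
    max_memory={0: "4GiB", "cpu": "5GiB"},
)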


[0] - example code to reproduce:


from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "t5-3b"
PROMPT = "..."

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name, device_map='auto', low_cpu_mem_usage=True)  # add torch_dtype=torch.bfloat16 OR load_in_8bit=True here
inputs = tokenizer(PROMPT, return_tensors='pt')
output = model.encoder(**inputs)

[1] - example code for ONNX model creation:


from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_name = "t5-3b"
model_name_local = "./t5-3b-ort"
model_name_quantized = "./t5-3b-ort-quantized"

def create_ORT_base():
    # export the huggingface checkpoint to ONNX and save it locally
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name, from_transformers=True)
    model.save_pretrained(model_name_local)

def create_ORT_quantized():
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name_local)
    model_dir = model.model_save_dir
    # one quantizer per exported ONNX graph (encoder, decoder, decoder-with-past)
    encoder_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="encoder_model.onnx")
    decoder_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="decoder_model.onnx")
    decoder_wp_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="decoder_with_past_model.onnx")
    quantizers = [encoder_quantizer, decoder_quantizer, decoder_wp_quantizer]
    # dynamic (runtime) int8 quantization targeting AVX512-VNNI CPUs
    dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
    for q in quantizers:
        q.quantize(save_dir=model_name_quantized, quantization_config=dqconfig)

create_ORT_base()
create_ORT_quantized()
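And a minimal sketch of loading the resulting export and timing an encoder-only pass (this is the shape of the measurement behind the table above, not a verbatim copy of my harness):

import time
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-3b")
ort_model = ORTModelForSeq2SeqLM.from_pretrained("./t5-3b-ort")  # the un-quantized export from [1]; the quantized copy needs its file names passed explicitly

inputs = tokenizer("a corgi wearing a top hat, studio lighting", return_tensors="pt")
start = time.perf_counter()
text_embeddings = ort_model.encoder(**inputs).last_hidden_state  # encoder-only pass; this is all an image model consumes
print(f"encoder pass: {time.perf_counter() - start:.2f}s, embedding shape {tuple(text_embeddings.shape)}")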

I didn't have any good place to add this in my post, but it's worth noting that caching of text embeddings will help a lot with using T5-XXL. Workflows that involve large batch sizes/counts or repeated inpaintings on the same prompt do not need to keep the text encoder loaded permanently. Similar to the --lowvram mechanism implemented now, the text encoder can be loaded on demand, only when the prompt changes, saving memory costs.
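A minimal sketch of what that on-demand scheme could look like (the names are made up; this isn't code from any existing UI):

import gc
import torch

_embedding_cache = {}   # prompt -> text embeddings
_text_encoder = None    # only held in memory while a new prompt actually needs encoding

def get_text_embeddings(prompt):
    global _text_encoder
    if prompt not in _embedding_cache:
        if _text_encoder is None:
            _text_encoder = load_t5_encoder()        # hypothetical loader: int8 / ONNX T5-XXL encoder
        _embedding_cache[prompt] = _text_encoder(prompt)
        # drop the encoder immediately so the image model never competes with it for memory
        _text_encoder = None
        gc.collect()
        torch.cuda.empty_cache()
    return _embedding_cache[prompt]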

Idk about the UK, but in America they’re going to lose. “I had no idea allowing free access to my content would allow someone to create a computer-legible map of abstract conceptspace that renders my entire business obsolete,” isn’t a legal argument.

Show me where, in the stability AI software, the getty photos are saved. Show me how to get one of the getty images out of stability AI.

You can't.

Folks are workin' on it. We'll get there.

As far as the device knows, that’s an abstract picture element as meaningful as the letter T or Donald Trump’s hair.

As far as the device knows, that’s an abstract picture element as meaningful as the letter T or Donald Trump’s hair.

Good luck getting a judge or jury to believe you.

The facts of the matter are almost irrelevant to the legal case; the tech-savviness of Some Boomer in a black robe is the decisive factor, which makes me think that the artists are probably going to win.

Not out of place when discussing court cases, mind you, but when I see these rather strictly legalistic arguments used in the context of unprecedented, transformative tools, they seem lacking. Surely the circumstances call for more than pointing to such-and-such paragraph? From my shallow understanding of how society handled previous such changes, it's a vain hope; losers will simply have to cope.

This is where the legislator is supposed to actually be of any use to people.

He's supposed to get all the people in a room, hash out some compromise that doesn't completely destroy norms or completely stifle innovation and drill down some new norms so the people getting made redundant can merrily go into irrelevance without destitution.

But that only works if you have competent elites, so I guess legal fights it will be.

Overfitting is a thing, and it's certainly possible to use the base Stable Diffusion model to recreate some famous artworks, such as the Mona Lisa, closely enough to count as infringement. Dunno if there are any non-public-domain artworks to which this applies, though, and in the general case these models certainly don't particularly enable someone to create copies of copyrighted images.

I'm not sure if such details matter, though. The lawsuits seem to be claiming that the very act of training on the copyrighted images is infringement, and I don't know if that argument relies fully on the false notion that the models allow for reliable reproduction of training images outside of extreme edge cases.

What you could maybe do is prove that prompts such as "in the style of artist X" derive distinctive features from that artist's corpus, and maybe you could likewise build a more complex case showing that Getty-related prompts yield images sharing features of Getty stock pictures. Sounds like a difficult task, though, and one of uncertain legal status even if possible.

Meanwhile, I am sure the damage to Getty's business will be immense. Just from my own experience, advertising clients right now are rejecting high-priced images and time-intensive comps left, right, and centre in favour of AI visuals they can create themselves in seconds, and this is happening even with some quite deep-pocketed clients.