Culture War Roundup for the week of May 1, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

The Writers Guild of America (WGA) is on strike as of May 2nd, after negotiations with the Alliance of Motion Picture and Television Producers (AMPTP) broke down. While most of their demands deal with how pay and compensation are structured in the streaming era, on the second page towards the bottom is:

ARTIFICIAL INTELLIGENCE

  • WGA PROPOSAL: Regulate use of artificial intelligence on MBA-covered projects: AI can’t write or rewrite literary material; can’t be used as source material; and MBA-covered material can’t be used to train AI.
  • AMPTP OFFER: Rejected our proposal. Countered by offering annual meetings to discuss advancements in technology.

I think this is an interesting first salvo in the fight over AI in creative professions. While this is just where both parties are starting for strike negotiations, and either could shift towards a compromise, I still can't help but see a hint that the AMPTP isn't super interested in forgoing the use of AI in the future.

When the WGA went on strike for about three months in 2007-08, it had a huge effect on television at the time. There was a shift to unscripted programming, like reality television, and some shows with completed scripts that had been on the back burner got fast-tracked to production. Part of me doubts that generative AI is really at the point where this could happen, but it would be fascinating if the AMPTP companies didn't just use traditional scabs during this strike, but supplemented them with generative AI in some way. Maybe instead of a shift to reality television, we'll look back on this as the first time AI became a significant factor in the production of scripted television and movies. Imagine seeing a "prompt engineer" credit at the end of every show you watch in the future.

It'll be interesting to see how this all plays out.

and MBA-covered material can’t be used to train AI

Is this even legal? AFAICT there’s no abstract ownership of concepts or ideas that copyright holders can claim, only claims against produced works. So a copyright holder can sue someone who uses AI to generate similar content to what is copyrighted, but not for using a work as training data per se. Sounds like the writers should be picketing Congress too.

I'd argue that a neural net is a derivative work of its training data, so its mere creation is a copyright violation.

But you could make a similar argument that a human brain is a derivative work of its training data. Obviously there are huge differences, but are those differences relevant to the core argument? A neural net takes a bunch of stuff it's seen before and then combines ideas and concepts from them in a new form. A human takes a bunch of stuff they've seen before and then combines ideas and concepts from them in a new form. Copyright laws typically allow for borrowing concepts and ideas from other things as long as the new work is transformative and different enough that it isn't just a blatant ripoff. Otherwise you couldn't even have such a thing as a "genre", whose works all share a bunch of features that they copy from each other.

So it seems to me that, if a neural net creates content which is substantially different from any of its inputs, then it isn't copying them in a legal or moral sense, beyond the degree to which a normal human creator who had seen the same training data and been inspired by it would be copying them.

Who is the ‘human’ in this example?

That's an entirely different question. Obviously the LLM is not itself a human, but neither is a typewriter or a computer which a human uses as a tool to write something. So the author, for copyright purposes, would probably be the person who prompts the LLM and then takes its output and tries to publish it, especially if they are responsible for editing its text and don't just copy-paste it unchanged. You could make an argument that the LLM creator is the copyright holder, or that the LLM is responsible for its own output, which is then uncopyrightable since it wasn't produced by a human.

But regardless of how you address the above question, it doesn't change my main point: the AI does not violate the copyrights of the humans whose work it takes as input any differently than a human doing the same things would. Copyright law is complicated, but there's a long history and a lot of precedent, and individual issues tend to get worked out. For this purpose, the LLM, or a human using an LLM as an assistant, should be subject to the same constraints that human creators already are. They're not "stealing" any more or less than humans already do by consuming each other's work. You don't need special laws or rules or restrictions that don't already exist.

You can't reason by analogy with what humans do, because LLMs are not human. They are devices, which contain data stored on media. If that data encodes copyrighted works, they are quite possibly copyright violations. If I memorize the "I have a dream" speech, the King estate can do nothing to me. They can bust me for reciting it in public, but I can recite it in private all I want (though I could get in trouble for writing it down). If I can ask an LLM for the "I have a dream" speech and it produces it, I have proven that the LLM contains a copy of the "I have a dream" speech and is therefore a copyright violation. And that's just the reproduction right; the derivative-work right is even wider.

If I can ask an LLM for the "I have a dream" speech and it produces it, I have proven that the LLM contains a copy of the "I have a dream" speech and is therefore a copyright violation.

Except that LLMs don't explicitly memorize any text; they generate it. It's the difference between storing an explicit list of the numbers 1 to 100, {1, 2, 3, ..., 100}, and storing a rule, f(n) = n for n in [1, 100], that can be used to generate the list. The model has a complicated set of relationships between words that it understands, refined to the point that if it sees the words "Recite the 'I have a dream' speech verbatim", it has a very good probability of saying each of the words correctly. At least I think the better versions do; many of them would not actually get it word for word, because none of them have it memorized - they're generating it anew.
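
To make that distinction concrete, here's a toy sketch in Python (numbers rather than text, and nothing like an actual transformer; the names are just for illustration): the first object stores the data itself, while the second stores only a rule that regenerates it on demand.

    # Storing the data itself vs. storing a rule that can regenerate it.
    explicit_copy = list(range(1, 101))    # the list is literally held in memory

    def rule(n):
        return n                           # f(n) = n for n in [1, 100]; no list stored

    regenerated = [rule(n) for n in range(1, 101)]
    assert regenerated == explicit_copy    # same output, but only the rule was kept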

Now granted, you can strongly argue, and I would tend to agree, that a word-for-word recitation by an LLM of a copyrighted work is a copyright violation; but this is analogous to being busted for reciting it in public. The LLM learning from copyrighted works is not a violation, because during training it doesn't copy them. It learns from them and changes its own internal structure in ways that improve its generating function, making it more capable of producing works similar to them, but it does not actually copy them or remember them directly. And it doesn't create an actual verbatim copy unless specifically asked to (and even then it is likely to fail, because it doesn't have a copy stored and has to generate one from its function).
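
To give a sense of what "changes its internal structure to improve its generating function" means without storing the text, here's a toy character-level bigram model in Python - nowhere near an actual LLM, and the names are made up for this sketch. What survives training is a table of transition counts, not the training sentence, yet sampling from that table can sometimes come close to reproducing it.

    from collections import defaultdict
    import random

    def train(text):
        # "Training" only updates transition counts; the text itself is discarded.
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
        return counts

    def generate(model, start, length):
        # Sample one character at a time from the learned transition counts.
        out = start
        for _ in range(length):
            nxt = model.get(out[-1])
            if not nxt:
                break
            chars, weights = zip(*nxt.items())
            out += random.choices(chars, weights)[0]
        return out

    model = train("i have a dream that one day this nation will rise up")
    print(generate(model, "i", 60))    # statistically similar, usually not verbatim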

Imagine I create some wacky compression format that will encode multiple copyrighted images into a single file, returning one of them when you hit it with an encryption key corresponding to the name of the image -- the file "contains" nothing but line noise, but if you run it through the decompressor with the key "Mickey Mouse" it will return a copyright violation.

Is passing this file around on Napster (or whatever the kids are doing these days) a copyright violation?
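
For concreteness, here's a minimal sketch of that kind of file in Python (a toy scheme invented just for this comment, XORing each image with a stream derived from its name - not any real format): on its own the blob is indistinguishable from noise, but "decompressing" it with the right name returns the original bytes.

    import hashlib
    from itertools import count

    def keystream(name, length):
        # Derive `length` pseudo-random bytes from the image's name.
        out = b""
        for i in count():
            if len(out) >= length:
                return out[:length]
            out += hashlib.sha256(f"{name}:{i}".encode()).digest()

    def pack(images):
        # The "file": each entry is XORed with a name-derived stream, so it reads as line noise.
        return {hashlib.sha256(name.encode()).hexdigest():
                    bytes(b ^ k for b, k in zip(data, keystream(name, len(data))))
                for name, data in images.items()}

    def unpack(blob, name):
        # Only the right name recovers the corresponding image.
        entry = blob.get(hashlib.sha256(name.encode()).hexdigest())
        if entry is None:
            return None
        return bytes(b ^ k for b, k in zip(entry, keystream(name, len(entry))))

    blob = pack({"Mickey Mouse": b"<mouse pixels>", "Donald Duck": b"<duck pixels>"})
    assert unpack(blob, "Mickey Mouse") == b"<mouse pixels>"
    assert unpack(blob, "Steamboat Willie") is None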
