Culture War Roundup for the week of May 1, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Imagine I create some wacky compression format that will encode multiple copyrighted images into a single file, returning one of them when you hit it with an encryption key corresponding to the name of the image -- the file "contains" nothing but line noise, but if you run it through the decompressor with the key "Mickey Mouse" it will return a copyright violation.

Is passing this file around on Napster (or whatever the kids are doing these days) a copyright violation?

I'm pretty sure it is, if that's the intended use case of that file, and people other than you know about the decryption method. On the other hand, literally any data file of a certain length (call it A) can be turned into literally any other data file of the same length (B) if you hit it with exactly the right "decryption" (B-A) by just adding the bits together. So if you take this idea too far, every file is secretly an encrypted Mickey Mouse to the right code.
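A minimal sketch of the "B-A" point, assuming "adding the bits together" means bitwise XOR (addition mod 2). The function names and the sample byte strings are made up for illustration:

```python
def make_key(a: bytes, b: bytes) -> bytes:
    """Return the 'key' that turns file a into file b (same length required)."""
    if len(a) != len(b):
        raise ValueError("files must be the same length")
    # XOR is its own inverse, so a ^ (a ^ b) == b.
    return bytes(x ^ y for x, y in zip(a, b))


def decrypt(a: bytes, key: bytes) -> bytes:
    """'Decrypt' file a by XORing it with the key, byte by byte."""
    return bytes(x ^ k for x, k in zip(a, key))


# Hypothetical stand-ins: an arbitrary file and a "copyrighted" one, 16 bytes each.
unrelated_file = b"shopping list!!!"
mickey = b"MICKEY MOUSE.PNG"

key = make_key(unrelated_file, mickey)
recovered = decrypt(unrelated_file, key)
```

Note that the key here is exactly as long as the image, so the key carries all the information and the "file" carries none.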

There's something nontrivial in here about information theory. If the copyrighted image has 500 kb of data, and your "encrypted file" is 500 kb, and the decryption key "Mickey Mouse" is 12 bytes, then clearly the file must contain the copyright violation. If you make an "encrypted file" of 12 kb and some wacky decompression algorithm that itself takes 500 kb to encode and is specifically designed to transform the string "Mickey Mouse" into a copyrighted image, then yeah, that algorithm is a copyright violation.

On the other hand, if you use a random number generator to generate a random 500 kb number A, and then compute C = (B - A) where B is your copyrighted image, then in isolation both A and C are random numbers. If you just distribute A and nobody has any way of knowing or guessing C, then no copyright violation has occurred. If you just distribute C and nobody has any way of knowing or guessing A, then no copyright violation has occurred. But if you distribute them together, or if you distribute one and someone else distributes the other, or if one of them is a commonly known or guessable number, then you're probably violating copyright and trying to get away on a technicality.
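The split described above can be sketched as follows, assuming byte-wise arithmetic mod 256 and Python's `secrets` module as the random number generator. `split`, `combine`, and the sample data are hypothetical names chosen for the example:

```python
import secrets
from typing import Tuple


def split(b: bytes) -> Tuple[bytes, bytes]:
    """Split b into a random A and C = (B - A), computed byte-wise mod 256."""
    a = secrets.token_bytes(len(b))  # A: uniformly random, independent of B
    c = bytes((x - y) % 256 for x, y in zip(b, a))  # C = B - A
    return a, c


def combine(a: bytes, c: bytes) -> bytes:
    """Recover B = A + C, byte-wise mod 256."""
    return bytes((x + y) % 256 for x, y in zip(a, c))


# Hypothetical stand-in for the copyrighted image B.
image = b"pretend this is a copyrighted image"

a, c = split(image)
restored = combine(a, c)
```

Either half alone is the same length as the image but says nothing about it; only someone holding both can call `combine` and get the original back.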

But it's not enough for it to simply be possible to "decrypt" something into another thing. A string of pure 0s can be "decrypted" into any image or text. A word processor will generate any copyrighted text if the user presses the right keys in the right combination. I think there has to be some level of intent or ease or information theory value such that the file is doing the majority of the work.

So I'll concede that if you make an LLM that will easily reproduce copyrighted material from simple descriptions and passwords, then I can see there being issues there. It's similar to how an author who keeps spitting out blatant ripoffs of copyrighted works with a couple of words changed will get in trouble. But simply having used them in the training material is not itself a copyright violation. A robust LLM that has been trained on lots of copyrighted materials but refuses to replicate them verbatim is not a copyright violation simply for having learned from them (which seems to be the primary objection that artists are having, not the actual reproduction of their work, which I would agree is bad).

Indeed, I claim that's closer to description than analogy. An LLM is a way of encoding (lossily) a whole lot of textual data in a very opaque form, in a way that you can get much of that data out by giving fairly intuitive prompts.