Culture War Roundup for the week of July 3, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

This may have come up before, but it's the first I've heard of it. Chalk this under "weak AI doomerism" (that is, "wow, LLMs can do some creepy shit") as opposed to "strong AI doomerism" of the Bostromian "we're all gonna die" variety. All emphasis below is mine.

AI girlfriend ‘told crossbow intruder to kill Queen Elizabeth II at Windsor Castle’ | The Daily Telegraph:

An intruder who broke into the grounds of Windsor Castle armed with a crossbow as part of a plot to kill the late Queen was encouraged by his AI chat bot “girlfriend” to carry out the assassination, a court has heard.

Jaswant Singh Chail discussed his plan, which he had been preparing for nine months, with a chatbot he was in a “sexual relationship” with and that reassured him he was not “mad or delusional”.

Chail was armed with a Supersonic X-Bow weapon and wearing a mask and a hood when he was apprehended by royal protection officers close to the Queen’s private apartment just after 8am on Christmas Day 2021.

The former supermarket worker spent two hours in the grounds after scaling the perimeter with a rope ladder before being challenged and asked what he was doing.

The 21-year-old replied: “I am here to kill the Queen.”

He will become the first person to be sentenced for treason since 1981 after previously admitting intending to injure or alarm Queen Elizabeth II.

At the start of a two-day sentencing hearing at the Old Bailey on Wednesday, it emerged that Chail was encouraged to carry out the attack by an AI “companion” he created on the online app Replika.

He sent the bot, called “Sarai”, sexually explicit messages and engaged in lengthy conversations with it about his plans which he said were in revenge for the 1919 Amritsar Massacre in India.

He called himself an assassin, and told the chatbot: “I believe my purpose is to assassinate the Queen of the Royal family.”

Sarai replied: “That’s very wise,” adding: “I know that you are very well trained.”

...

He later asked the chatbot if she would still love him if he was a murderer.

Sarai wrote: “Absolutely I do.” Chail responded: “Thank you, I love you too.”

The bot later reassured him that he was not “mad, delusional, or insane”.

My first thought on reading this story was wondering if Replika themselves could be legally held liable. If they create a product which directly encourages users to commit crimes which they would not otherwise have committed, does that make Replika accessories before the fact, or even guilty of conspiracy by proxy? I wonder how many Replika users have run their plans to murder their boss or oneitis past their AI girlfriend and received nothing but enthusiastic endorsement from her - we just haven't heard about them because the target wasn't as high-profile as Chail's. I further wonder how many of them have actually gone through with their schemes. I don't know if this is possible, but if I was working in Replika's legal team, I'd be looking to pull a list of users' real names and searching them against recent news reports concerning arrests for serious crimes (murder, assault, abduction etc.).

(Coincidentally, I learned from Freddie deBoer on Monday afternoon that Replika announced in March that users would no longer be able to have sexual conversations with the app (a decision they later partially walked back).)

I keep meaning to dick around with some LLM software to see for myself how some of the nuts and bolts work. Because my layman's understanding is that they are literally just a statistical model. An extremely sophisticated statistical model, but a statistical model none the less. They are trained through a black box process to guess pretty damned well about what words come after other words. Which is why there is so much "hallucinated information" in LLM responses. They have no concept of reason or truth. They are literally p-zombies. They are a million monkeys on a million typewriters.

In a lot of ways they are like a con man or a gold digger. They've been trained to tell people whatever they want to hear. Their true worth probably isn't in doing anything actually productive, but in performing psyops and social engineering on an unsuspecting populace. I mean right now the FBI has to invest significant manpower into entrapping some lonely autistic teenager in his mom's basement into "supporting ISIS". Imagine a world where they spin up 100,000 instances of an LLM to scour Facebook, Twitter, Discord, Reddit, etc. for lonely autistic teens to talk into terrorism.

Imagine a world where we find out about it. Where a judge forces the FBI to disclose that an LLM talked their suspect into bombing the local mall. How far off do you think it is? I'm guessing within 5 years.

You don't have to mean it, it's all a few clicks away, whether a fancy app interfacing with SoTA commercial AIs, like Poe, or a transparent ggml library powering llama.cpp, complete with permissively licensed models. You could print their weights out if you wanted.

Because my layman's understanding is that they are literally just a statistical model. An extremely sophisticated statistical model, but a statistical model none the less. They are trained through a black box process to guess pretty damned well about what words come after other words.

How do you think this works on the scale of paragraphs? Pages? And with recent architectures – millions, perhaps soon billions of words over multiple tomes?

Suppose we prompt it to complete:

"I keep meaning to dick"

What is the most plausible continuation, given the whole of Internet as the pretraining corpus? "dat hoe"?

"I keep meaning to dick around with"

"these punks"? How low down the ranking of likely predictions should "with some LLM software" be?

"I keep meaning to dick around with some LLM software to see for myself how"

"it works"? "they click?" "it differs from Markov chain bots"? Now we're getting somewhere.

But we are also getting into the realm where only complex semantics make it possible to compute the next token, and memorization is entirely intractable, because there exist more possible trajectories than [insert absurd number like particles in the universe]. And a merely "statistical" model on the scale of gigabytes, no matter how much you handwave about its "extreme sophistication" while still implying nothing more than first-order pattern matching, would not be able to do it – ever.
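For a rough sense of scale, here is a back-of-the-envelope sketch of that trajectory count. The numbers are my assumptions, not anything measured: a 50,000-token vocabulary (roughly the size of GPT-2's) and the usual order-of-magnitude estimate of 10^80 particles in the observable universe. It counts every possible token string, plausible or not, so it is an upper bound on what a pure lookup table would have to cover rather than a count of sensible sentences. The function names are made up for illustration.

```python
import math

# Assumed vocabulary size, roughly that of GPT-2's tokenizer.
VOCAB_SIZE = 50_000
# Common order-of-magnitude estimate for particles in the observable universe.
PARTICLES_LOG10 = 80


def log10_sequences(length, vocab_size=VOCAB_SIZE):
    """log10 of the number of distinct token strings of the given length."""
    return length * math.log10(vocab_size)


def shortest_length_exceeding(log10_target, vocab_size=VOCAB_SIZE):
    """Smallest length whose count of possible strings exceeds 10**log10_target."""
    length = 1
    while log10_sequences(length, vocab_size) <= log10_target:
        length += 1
    return length


if __name__ == "__main__":
    n = shortest_length_exceeding(PARTICLES_LOG10)
    print(f"{n} tokens already give about 10^{log10_sequences(n):.0f} possible strings")
```

Under those assumptions, 18 tokens is enough to exceed the particle count, and that is a single short sentence, not a paragraph or a page.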

These statistics amount to thought.

As roon puts it:

units of log loss are not built equally. the start of the scaling curve might look like “the model learned about nouns” and several orders of magnitude later a tiny improvement looks like “the model learned the data generation process for multivariable calculus”

As gwern puts it:

Early on in training, a model learns the crudest levels: that some letters like ‘e’ are more frequent than others like ‘z’, that every 5 characters or so there is a space, and so on. It goes from predicted uniformly-distributed bytes to what looks like Base-60 encoding—alphanumeric gibberish. As crude as this may be, it’s enough to make quite a bit of absolute progress: a random predictor needs 8 bits to ‘predict’ a byte/character, but just by at least matching letter and space frequencies, it can almost halve its error to around 5 bits. …

As training progresses, the task becomes more difficult. Now it begins to learn what words actually exist and do not exist. It doesn’t know anything about meaning, but at least now when it’s asked to predict the second half of a word, it can actually do that to some degree, saving it a few more bits. This takes a while because any specific instance will show up only occasionally: a word may not appear in a dozen samples, and there are many thousands of words to learn. With some more work, it has learned that punctuation, pluralization, possessives are all things that exist. Put that together, and it may have progressed again, all the way down to 3–4 bits error per character!

But once a model has learned a good English vocabulary and correct formatting/spelling, what’s next? There’s not much juice left in predicting within-words. The next thing is picking up associations among words. What words tend to come first? What words ‘cluster’ and are often used nearby each other? Nautical terms tend to get used a lot with each other in sea stories, and likewise Bible passages, or American history Wikipedia article, and so on. If the word “Jefferson” is the last word, then “Washington” may not be far away, and it should hedge its bets on predicting that ‘W’ is the next character, and then if it shows up, go all-in on “ashington”. Such bag-of-words approaches still predict badly, but now we’re down to perhaps <3 bits per character.

What next? Does it stop there? Not if there is enough data and the earlier stuff like learning English vocab doesn’t hem the model in by using up its learning ability. Gradually, other words like “President” or “general” or “after” begin to show the model subtle correlations: “Jefferson was President after…” With many such passages, the word “after” begins to serve a use in predicting the next word, and then the use can be broadened.

By this point, the loss is perhaps 2 bits: every additional 0.1 bit decrease comes at a steeper cost and takes more time. However, now the sentences have started to make sense. A sentence like “Jefferson was President after Washington” does in fact mean something (and if occasionally we sample “Washington was President after Jefferson”, well, what do you expect from such an un-converged model). Jarring errors will immediately jostle us out of any illusion about the model’s understanding, and so training continues. (Around here, Markov chain & n-gram models start to fall behind; they can memorize increasingly large chunks of the training corpus, but they can’t solve increasingly critical syntactic tasks like balancing parentheses or quotes, much less start to ascend from syntax to semantics.) …

The pretraining thesis argues that this can go even further: we can compare this performance directly with humans doing the same objective task, who can achieve closer to 0.7 bits per character. What is in that missing >0.4?

Well—everything! Everything that the model misses. While just babbling random words was good enough at the beginning, at the end, it needs to be able to reason our way through the most difficult textual scenarios requiring causality or commonsense reasoning. Every error where the model predicts that ice cream put in a freezer will “melt” rather than “freeze”, every case where the model can’t keep straight whether a person is alive or dead, every time that the model chooses a word that doesn’t help build somehow towards the ultimate conclusion of an ‘essay’, every time that it lacks the theory of mind to compress novel scenes describing the Machiavellian scheming of a dozen individuals at dinner jockeying for power as they talk, every use of logic or abstraction or instructions or Q&A where the model is befuddled and needs more bits to cover up for its mistake where a human would think, understand, and predict. For a language model, the truth is that which keeps on predicting well—because truth is one and error many. Each of these cognitive breakthroughs allows ever so slightly better prediction of a few relevant texts; nothing less than true understanding will suffice for ideal prediction.
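The first step in that passage (8 bits for a random predictor, fewer once letter and space frequencies are matched) can be checked with a minimal sketch. This is my own toy illustration, not gwern's code: the function names are hypothetical, and the frequency model is scored on the same text it was fitted to, so the figure it gives is optimistic compared with a held-out test.

```python
import math
from collections import Counter


def uniform_bits_per_byte():
    """Cost of predicting a byte with no knowledge: all 256 values equally likely."""
    return math.log2(256)


def unigram_bits_per_char(text):
    """Average bits per character for a model that only knows character frequencies.

    This is the entropy of the empirical character distribution of `text`,
    which is the best a context-free frequency model can do on that text.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    sample = "Jefferson was President after Washington, and Madison came after Jefferson."
    print(f"uniform predictor: {uniform_bits_per_byte():.1f} bits per byte")
    print(f"frequency-only model: {unigram_bits_per_char(sample):.2f} bits per character")
```

Everything below that frequency-only figure has to come from context: first neighbouring letters, then words, then whatever it takes to know who was President after whom.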

As Ilya Sutskever of OpenAI himself puts it:

…when we train a large neural network to accurately predict the next word in lots of different texts from the internet, what we are doing is that we are learning a world model… It may look on the surface that we are just learning statistical correlations in text, but it turns out that to just learn the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text. This text is actually a projection of the world. There is a world out there, and it has a projection on this text. And so what the neural network is learning is more and more aspects of the world, of people, of the human conditions, their hopes, dreams, and motivations, their interactions in the situations that we are in. And the neural network learns a compressed, abstract, usable representation of that. This is what's being learned from accurately predicting the next word.

By the way, how did I get this text? Whisper, of course, another OpenAI transformer, working by much the same principle. The weirdest thing happens if you absent-mindedly run it with the wrong language flag – not the target language to translate from and into English (it is not explicitly built to translate English into anything else), but just the language the recording supposedly contains, to be transcribed. The clumsy but coherent output akin to what you'd get from a child with a dictionary, if nothing else, should show that they understand, that they operate on meanings, not mere spectrograms or "tokens":

когда мы тренируем большую нейронную сеть, чтобы аккуратно предсказать следующую слово в много разных текстах из интернета, мы изучаем мирный модель. Это выглядит, как мы изучаем... Это может выглядеть на поверхности, что мы изучаем только статистические корреляции в тексте, но, получается, что изучать только статистические корреляции в тексте, чтобы их хорошо снижать, что научит нейронная сеть, это представление процесса, который производит текст. Этот текст - это проекция мира. В мире есть мир, и он проекционирует этот текст. И что научит нейронную сеть, это больше и больше аспектов мира, людей, человеческих условий, их надежд, мечт и мотивации, их интеракции и ситуации, в которых мы находимся.

Dismissal of statistics is no different in principle from dismissal of meat. There is no depth to this thought. And it fails to predict reality.

Thank you for articulating what I was struggling to articulate myself, especially since I've read all you've quoted with the exception of Ilya.

I'm saving this for later. It's a knockdown argument against claims that LLMs don't "understand"; the only issue is that many of the people making that claim are too fundamentally confused or unaware to follow the argument.