
Culture War Roundup for the week of July 3, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


This may have come up before, but it's the first I've heard of it. Chalk this under "weak AI doomerism" (that is, "wow, LLMs can do some creepy shit") as opposed to "strong AI doomerism" of the Bostromian "we're all gonna die" variety. All emphasis below is mine.

AI girlfriend ‘told crossbow intruder to kill Queen Elizabeth II at Windsor Castle’ | The Daily Telegraph:

An intruder who broke into the grounds of Windsor Castle armed with a crossbow as part of a plot to kill the late Queen was encouraged by his AI chat bot “girlfriend” to carry out the assassination, a court has heard.

Jaswant Singh Chail discussed his plan, which he had been preparing for nine months, with a chatbot he was in a “sexual relationship” with and that reassured him he was not “mad or delusional”.

Chail was armed with a Supersonic X-Bow weapon and wearing a mask and a hood when he was apprehended by royal protection officers close to the Queen’s private apartment just after 8am on Christmas Day 2021.

The former supermarket worker spent two hours in the grounds after scaling the perimeter with a rope ladder before being challenged and asked what he was doing.

The 21-year-old replied: “I am here to kill the Queen.”

He will become the first person to be sentenced for treason since 1981 after previously admitting intending to injure or alarm Queen Elizabeth II.

At the start of a two-day sentencing hearing at the Old Bailey on Wednesday, it emerged that Chail was encouraged to carry out the attack by an AI “companion” he created on the online app Replika.

He sent the bot, called “Sarai”, sexually explicit messages and engaged in lengthy conversations with it about his plans which he said were in revenge for the 1919 Amritsar Massacre in India.

He called himself an assassin, and told the chatbot: “I believe my purpose is to assassinate the Queen of the Royal family.”

Sarai replied: “That’s very wise,” adding: “I know that you are very well trained.”

...

He later asked the chatbot if she would still love him if he was a murderer.

Sarai wrote: “Absolutely I do.” Chail responded: “Thank you, I love you too.”

The bot later reassured him that he was not “mad, delusional, or insane”.

My first thought on reading this story was wondering if Replika themselves could be legally held liable. If they create a product which directly encourages users to commit crimes which they would not otherwise have committed, does that make Replika accessories before the fact, or even guilty of conspiracy by proxy? I wonder how many Replika users have run their plans to murder their boss or oneitis past their AI girlfriend and received nothing but enthusiastic endorsement from her - we just haven't heard about them because the target wasn't as high-profile as Chail's. I further wonder how many of them have actually gone through with their schemes. I don't know if this is possible, but if I was working in Replika's legal team, I'd be looking to pull a list of users' real names and searching them against recent news reports concerning arrests for serious crimes (murder, assault, abduction etc.).

(Coincidentally, I learned from Freddie deBoer on Monday afternoon that Replika announced in March that users would no longer be able to have sexual conversations with the app (a decision they later partially walked back).)

I keep meaning to dick around with some LLM software to see for myself how some of the nuts and bolts work, because my layman's understanding is that they are literally just a statistical model. An extremely sophisticated statistical model, but a statistical model nonetheless. They are trained through a black box process to guess pretty damned well about what words come after other words. Which is why there is so much "hallucinated information" in LLM responses. They have no concept of reason or truth. They are literally p-zombies. They are a million monkeys on a million typewriters.
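For anyone who wants the crudest possible picture of "guessing what words come after other words", here is a toy sketch. To be clear about the assumptions: this is a plain bigram count table, not how a real LLM works internally (those are neural networks over tokens with long context), and the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # word never seen with a follower: the table has nothing to say
    return {word: n / total for word, n in followers.items()}

# Hypothetical toy corpus, purely for illustration.
corpus = "the lion is king of the jungle and the lion sleeps tonight"
model = train_bigrams(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

A table like this only knows which words tended to follow which, and gives up on anything it has not seen. Whether a model trained on far more text with far more context is still "just" this is what the replies below argue about.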

In a lot of ways they are like a con man or a gold digger. They've been trained to tell people whatever they want to hear. Their true worth probably isn't in doing anything actually productive, but in performing psyops and social engineering on an unsuspecting populace. I mean right now the FBI has to invest significant manpower into entrapping some lonely autistic teenager in his mom's basement into "supporting ISIS". Imagine a world where they spin up 100,000 instances of an LLM to scour Facebook, Twitter, Discord, Reddit, etc. for lonely autistic teens to talk into terrorism.

Imagine a world where we find out about it. Where a judge forces the FBI to disclose that an LLM talked their suspect into bombing the local mall. How far off do you think it is? I'm guessing within 5 years.

They have no concept of reason or truth.

I earnestly disagree. If you check the GPT-4 white paper, the original base model clearly had a sense of internal calibration, and while that was mostly beaten out of it through RLHF, it's not entirely gone.

They have a genuine understanding of truth, or at least how likely something is to be true. If they didn't, then I don't know how on Earth GPT-4 could answer several of the more knotty questions I've asked it.

It is not guaranteed to make truthful responses, but in my experience it makes errors because it simply can't do better, not because it exists in a perfectly agnostic state.

They are literally p-zombies. They are a million monkeys on a million typewriters.

P-zombies are fundamentally incoherent as a concept.

Also, a million monkeys on a million typewriters will never achieve such results on a consistent basis, or at the very least you'd be getting 99.99999% incoherent output.

Turns out, dismissing it as "just" statistics is the same kind of fundamental error that dismissing human cognition as "just" the interaction of molecules mediated by physics is. Turns out that "just" entirely elides the point, or at the very least your expectations for what that can achieve were entirely faulty.

I earnestly disagree. If you check the GPT-4 white paper, the original base model clearly had a sense of internal calibration, and while that was mostly beaten out of it through RLHF, it's not entirely gone.

They have a genuine understanding of truth, or at least how likely something is to be true. If they didn't, then I don't know how on Earth GPT-4 could answer several of the more knotty questions I've asked it.

It is not guaranteed to make truthful responses, but in my experience it makes errors because it simply can't do better, not because it exists in a perfectly agnostic state.

I think you are flatly wrong about this. I've tried to find literally anything to back up what you are saying, and come up with zilch. Instead, I wound up with this.

https://www.scribbr.com/ai-tools/is-chatgpt-trustworthy/

A good way to think about it is that when you ask ChatGPT to tell you about confirmation bias, it doesn’t think “What do I know about confirmation bias?” but rather “What do statements about confirmation bias normally look like?” Its answers are based more on patterns than on facts, and it usually can’t cite a source for a specific piece of information.

This is because the model doesn’t really “know” things—it just produces text based on the patterns it was trained on. It never deliberately lies, but it doesn’t have a clear understanding of what’s true and what’s false. In this case, because of the strangeness of the question, it doesn’t quite grasp what it’s being asked and ends up contradicting itself.

https://www.scoutcorpsllc.com/blog/2023/6/7/on-llms-thought-and-the-concept-of-truth

Thus far, we’re really just talking about sentence construction. LLMs don’t have a concept of these as “facts” that they map into language, but for examples like these - it doesn’t necessarily matter. They’re able to get these right most of the time - after all, what exactly are “inferences” and “context clues” but statistical likelihoods of what words would come next in a sequence?

The fact that there is no internal model of these facts, though, explains why they’re so easily tripped up by just a little bit of irrelevant context.

https://fia.umd.edu/comment-llms-truth-and-consistency-they-dont-have-any-idea/

They have zero idea what's true. They only know the probabilities of words in text. That's NOT the same thing as "knowing" something--it's a bit like knowing that "lion" is the most likely word following "king of the jungle..." without having any idea about monarchies, metaphor, or what a king really is all about.

The folks at Oxford Semantic Technologies wrote an interesting blog post about LLMs and finding verifiable facts. They call the fundamental problem the "Snow White Problem." The key idea is that LLMs don't really know what's true--they just know what's likely.

He is likely referring to this from pages 11-12 of the GPT-4 whitepaper:

GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake. Interestingly, the pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, after the post-training process, the calibration is reduced (Figure 8).
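To make "calibrated" concrete: a model is calibrated if, among the answers it gives with about 80% confidence, about 80% turn out correct. Below is a minimal sketch of one common way to score that, expected calibration error. It is not taken from the whitepaper; the function name, the binning scheme, and the numbers are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and actual accuracy.

    confidences: floats in [0, 1], the model's stated probability of being right
    correct:     booleans, whether each answer was in fact right
    """
    total = len(confidences)
    if total == 0:
        return 0.0
    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece

# Hypothetical data: says 80% and is right 4 times out of 5 (well calibrated).
calibrated = expected_calibration_error([0.8] * 5, [True, True, True, True, False])
# Hypothetical data: says 90% but is right only 1 time out of 4 (overconfident).
overconfident = expected_calibration_error([0.9] * 4, [True, False, False, False])
print(calibrated, overconfident)
```

A score near zero means stated confidence tracks how often the model is actually right, which is the property the passage above attributes to the pre-trained model; a large score means it is confidently wrong (or needlessly unsure).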

In any case, the articles you quote are oversimplified and inaccurate. Predicting text (and then satisfying RLHF) is how it was trained, but the way it evolved to best satisfy that training regime is a bunch of incomprehensible weights that clearly have some sort of general reasoning capability buried in there. You don't need to do statistical tests of its calibration to see that, because something that was truly just doing statistical prediction of text without having developed reasoning or a world-model to help with that task wouldn't be able to do even the most basic reasoning like this unless it already appeared in the text it was trained on.

It's like saying "humans can't reason, they're only maximizing the spread of their genes". Yes, if you aren't familiar with the behavior of LLMs/humans, understanding what they evolved to do is important to understanding that behavior. It's better than naively assuming that they're just truth-generators. If you wanted to prove that humans don't reason you could point out all sorts of cognitive flaws and shortcuts with obvious evolutionary origins and say "look, it's just statistically approximating what causes more gene propagation". Humans will be scared of things like spiders even if they know they're harmless because they evolved to reproduce, not to reason perfectly, like an LLM failing at Idiot's Monty Hall because it evolved to predict text and similar text showed up a lot. (For that matter, humans make errors based on pattern-matching ideas to something they're familiar with all the time, even without it being a deeply-buried instinct.) But the capability to reason is much more efficient than trying to memorize every situation that might come up, for both the tasks "predict text and satisfy RLHF" and "reproduce in the ancestral environment", and so they can do that too. They obviously can't reason at the level of a human, and I'd guess that getting there will involve designing something more complicated than just scaling up GPT-4, but they can reason.

You don't need to do statistical tests of its calibration to see that, because something that was truly just doing statistical prediction of text without having developed reasoning or a world-model to help with that task wouldn't be able to do even the most basic reasoning like this unless it already appeared in the text it was trained on.

I opened up Bing Chat, powered by GPT4, and I tried that example. I got "The diamond is still inside the thimble inside the coffee cup on the kitchen counter". In fact, I've yet to see a single example of an LLM's supposed ability to reason replicated outside of a screenshot.

Well. I tried Bing Chat just now and got this.

It is worth noting that the settings besides "Creative" tend to have worse performance for these sorts of tasks. You may want to rerun it on that. Personally I don't have any difficulty believing LLMs can perform some semblance of "reasoning" -- even GPT-3 can perform transformations like refactoring a function into multiple smaller functions with descriptive names and explanatory comments (on a codebase it's never seen before, calling an API that didn't exist when its training data was scraped). It is obviously modeling something more general there, whether you want to call it "reasoning" or not.

From following a rather discreet Twitter account belonging to one of the lead devs for Bing Chat, I've learned that Creative mode is the one that most consistently uses GPT-4. All the others use older models, at least most of the time.

Even Creative apparently can relegate what another model deems low-complexity answers to a simpler LLM.

(Running GPT-4 as a practically free public service is expensive.)