This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

You used to get this sorta thing on ratsphere tumblr, where "rapture of the nerds" was so common as to be a cliché. I kinda wonder if deBoer's "imminent AI rupture" follows from that and he edited it, or if it's just a coincidence. There's a fun Bulverist analysis of why religion was the focus there and 'the primacy of material conditions' from deBoer, but that's even more of a distraction from the actual matter under discussion.
There's a boring sense in which it's kinda funny how bad deBoer is at this. I'll overlook the typos, because lord knows I make enough of those myself, but look at the central example he actually opens his story around:
There's a steelman of deBoer's argument here. But the one he actually presented isn't engaging, in the very slightest, with what Scott is trying to bring up, or even with a strawman of it. What, exactly, does deBoer believe a cure for aging (or even just a treatment for diabetes, if we don't want to go full tech-hyper-optimism) would look like, if not new medical technology? What, exactly, does deBoer think of the actual problem of long-term commitment strategies in a rapidly changing environment?
Okay, deBoer doesn't care, and/or doesn't even recognize those things as questions. It's really just a springboard for I Hate Advocates For This Technology. To whatever extent he's engaging with the specific claims, it's just a tool to get to that point. Does he actually do his chores or eat his broccoli?
Well, no.
Ah, nobody makes that claim, r-
Okay, so 'nobody' includes the very person making this story.
This isn't even a good technical understanding of how ChatGPT, as opposed to just the underlying LLM, works, and even if I'm not willing to go as far as self_made_human does on people raising the parrots critique here, I'm still pretty critical of it. But the more damning bit is where deBoer is either unfamiliar with or choosing to ignore the many domains where LLMs do well, in favor of One Rando With A Chess Game. Will he change his mind if someone presents a chess-focused LLM with a high Elo rating?
I could dig into his examples and values a lot deeper -- the hallucination problem is actually a lot more interesting and complicated; questions of bias are usually just smuggling in 'doesn't agree with the writer's politics', though there are some genuine technical questions -- but if you locked the two of us in a room and only allowed escape once we agreed, I still don't think either of us would find discussing it with each other more interesting than talking to the walls. It's not just that we have different understandings of what we're debating; it's whether we're even trying to debate something that can be changed by actual changes in the real world.
Okay, deBoer isn't debating honestly. His claim about the New York Times fact-checking everything is hilarious, but better: he links to a special issue of which he literally claims "not a single line of real skepticism appears", an issue whose first headline is "Everyone is Using AI for Everything. Is That Bad?" and which includes the phrase "The mental model I sometimes have of these chatbots is as a very smart assistant who has a dozen Ph.D.s but is also high on ketamine like 30 percent of the time". He tries to portray Mounk as outraged by the "indifference of people like Tolentino (and me) to the LLM “revolution.”" But look at Mounk's or Tolentino's actual pieces, and there are actual factual claims being made, not just vague vibes being bounced off each other; Mounk's central criticism is whether Tolentino's piece and its siblings are actually engaging with what LLMs can change, rather than complaining about a litany of lizardman evils. (At least deBoer's not falsely calling anyone a rapist, this time.)
((Tbf, Mounk, in turn, is just using Tolentino as a springboard; her piece is actually about digital dissociation and the increasing power of the AI-generation technologies she loathes. It's not really the sorta piece that's supposed to talk about how you grapple with things, for better or worse.))
But ultimately, that's just not the point. None of deBoer's readers are going to treat him any less seriously because of ChessLLM (or because many LLMs will, in fact, both say they reason and then quod erat demonstrandum), or because deBoer turns "But in practice, I too find it hard to act on that knowledge." into “I too find it hard to act on that knowledge [of our forthcoming AI-driven species reorganization]” when commenting on an essay that does not use the word "species" at all, and only uses "organization" twice, in the same paragraph, to talk about regulatory changes -- and when "that knowledge" is actually just Mounk's (imo, wrong) claim that AI is under-hyped. That's not what his readers are paying him for, and that's not why anyone who links to him in even the slightest laudatory manner is doing so.
The question of Bulverism versus factual debate is an important one, but it's undermined when the facts don't matter, either.
Huh. I was confident that I had a better writeup about why "stochastic parrots" are a laughable idea, at least as a description for LLMs. But no, after getting a minor headache figuring out the search operators here, it turns out that's all I've written on the topic.
I guess I never bothered because it's a Gary Marcus-tier critique, and anyone using it loses about 20 IQ points in my estimation.
But I guess now is as good a time as any? In short, it is a pithy, evocative critique that makes no sense.
LLMs are not inherently stochastic. They have a setting called temperature (not usually exposed to the end user except via the API). Without going into how that works, suffice it to say that by setting the value to zero, their output becomes deterministic. The exact same prompt gives the exact same output.
The reason temperature isn't just set to zero all the time is that the ability to choose something other than the single most likely next token has benefits when it comes to creativity. At the very least it saves you from getting stuck with the same subpar result.
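For the curious, here's a minimal sketch of what temperature does at the sampling step (pure illustration, not any particular vendor's implementation; real inference stacks add top-k/top-p filtering and other machinery on top):

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float, rng=None) -> int:
    """Pick the next token id from the model's raw logits."""
    if temperature == 0:
        # Greedy decoding: always the single most likely token, hence
        # the same prompt deterministically yields the same output.
        return int(np.argmax(logits))
    rng = rng or np.random.default_rng()
    scaled = logits / temperature           # t < 1 sharpens, t > 1 flattens
    probs = np.exp(scaled - scaled.max())   # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

Nonzero temperature keeps less likely continuations in play, which is where the creativity comes from.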
Alas, this means that LLMs aren't stochastic parrots. Minus the stochasticity, are they just "parrots"? Anyone thinking this is on crack, since Polly won't debug your Python no matter how many crackers you feed her.
If LLMs were merely interpolating between memorized n-grams or "stitching together" text, their performance would be bounded by the literal contents of their training data. They would excel at retrieving facts and mimicking styles present in the corpus, but would fail catastrophically at any task requiring genuine abstraction or generalization to novel domains. This is not what we observe.
Let’s get specific. The “parrot” model implies the following:
1. LLMs can only repeat (paraphrase, interpolate, or permute) what they have seen.
2. They lack generalization, abstraction, or true reasoning.
3. They are, in essence, Markov chains on steroids (see the toy sketch below for what that would actually look like).
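To make claim 3 concrete, here is a toy "Markov chain on steroids" -- a bigram sampler, my own illustration rather than anyone's production system. By construction, every adjacent word pair it emits occurs verbatim in its training text, which is exactly the bound the parrot model asserts:

```python
import random
from collections import defaultdict

def train_bigram(corpus: str) -> dict:
    """Bigram table: word -> list of words observed to follow it."""
    table = defaultdict(list)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table: dict, start: str, length: int = 10) -> str:
    out = [start]
    for _ in range(length):
        followers = table.get(out[-1])
        if not followers:  # dead end: this word never led anywhere in training
            break
        out.append(random.choice(followers))
    return " ".join(out)

table = train_bigram("the parrot repeats what the parrot has seen")
print(generate(table, "the"))  # e.g. "the parrot has seen"
```

A real parrot-model LLM would be this, scaled up. The rest of this post is about why that picture fails.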
To disprove any of those claims, just *gestures angrily* look at the things they can do. If winning gold in the latest IMO is something a "stochastic parrot" can pull off, then, well, the only valid takeaway is that the damn parrot is smarter than we thought. Definitely smarter than the people who use the phrase unironically.
The inventors of the phrase, Bender & Koller, gave two toy “gotchas” that they claimed no pure language model could ever solve: (1) a short vignette about a bear chasing a hiker, and (2) the spelled-out arithmetic prompt “Three plus five equals”. GPT-3 solved both within a year. The response? Crickets, followed by goalpost-shifting: “Well, it must have memorized those exact patterns.” But the bear prompt isn't in any training set at scale, and GPT-3 could generalize the schema to new animals, new hazards, and new resolutions. Memorization is a finite resource; generalization is not.
(I hope everyone here recalls that GPT-3 is ancient now.)
On point 2: Consider the IMO example. Or better yet, come up with a rigorous definition of reasoning by which we can differentiate a human from an LLM. It's all word games, or word salad.
On 3: Just a few weeks back, I was trying to better understand the actual difference between a Markov chain and an LLM, and I asked o3 whether it wasn't possible to approximate the latter with the former. After all, I wondered, if MCs only consider the previous unit (usually a word, or a few words/an n-gram), couldn't we just train the MC to output the next word conditioned on every word that came before? The answer was yes, in principle, but that this is completely computationally intractable: a lookup table over every possible context grows exponentially with context length. The fact that we can run LLMs on something smaller than a Matrioshka brain is down to the brilliance of the transformer architecture/attention mechanism, which computes those conditional probabilities parametrically instead of storing them in a table.
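A quick back-of-the-envelope sketch of that intractability (the vocabulary and context sizes are just illustrative assumptions):

```python
import math

vocab = 50_000   # assumed vocabulary size
context = 1_000  # assumed context window, in tokens

# An order-n Markov chain needs a transition row for every possible
# context: vocab ** context rows. Take log10 to keep it printable.
digits = context * math.log10(vocab)
print(f"~10^{digits:.0f} rows")  # ~10^4699, vs ~10^80 atoms in the observable universe
```

An LLM replaces that table with a few billion shared parameters, which is the whole trick.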
Overall, even the steelman interpretation of the parrot analogy is only as helpful as the meme I have helpfully appended below. It is a bankrupt notion, a thought-terminating cliché at best, and I wouldn't cry if anyone using it met a tiger outside the confines of a cage.
/images/17544215520465958.webp
Computationally, maybe all we are is Markov chains. I'm not sold, but Markov chat bots have been around for a few decades now and used to fool people occasionally even at smaller scales.
LLMs can do pretty impressive things, but I haven't seen convincing evidence that any of them have stepped clearly outside the bounds of their training dataset. That's partly hard to evaluate because we've been training them on everything we can find. Could an LLM trained on purely pre-Einstein sources adequately discuss relativity? A human can be well versed in lots of things with substantially less training material.
I still don't think we have a good model for what intelligence is. Some have recently suggested "compression", which is interesting from an information theory perspective. But I won't be surprised to find that whatever it is, it's actually an NP-hard problem in the perfect case, and everything else is just heuristics and approximations trying to be close. In some ways it'd be amusing if it turns out to be a good application of quantum computing.
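For what it's worth, the compression idea can be made concrete: normalized compression distance, a standard information-theory trick, proxies similarity by how much knowing one string helps compress another. A minimal sketch (the example strings are my own, and gzip is a crude stand-in for a real compressor):

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 = redundant, near 1 = unrelated."""
    cx, cy = len(gzip.compress(x)), len(gzip.compress(y))
    cxy = len(gzip.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the cat sat on the mat " * 4
print(ncd(a, b"the cat sat on the hat"))      # small: shared structure compresses away
print(ncd(a, b"quantum chromodynamics etc"))  # larger: nothing to reuse
```

Whether that notion scales up to "intelligence" is exactly the open question, but it at least gives the hypothesis teeth.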
I don't want to speak on 'intelligence' or genuine reasoning or heuristics and approximations, but when it comes to going outside the bounds of their training data, it's pretty trivially possible to take an LLM and give it a problem related to a video game (or a mod for a video game) that is well outside its knowledge cutoff or training data.
I can't test this right now, it's definitely not an optimal solution (see the uploaded file for comparison), and I think it misinterpreted the Evanition operator, but it's a question that I'm pretty sure didn't have an equivalent anywhere on the public web until today. There's something damning in it getting a trivial computer science problem non-optimal or outright wrong, especially when given the full documentation, but there's also something interesting in it getting this close at all with such a minimum of information.
/images/17544296446888535.webp
That is pretty impressive. Is it allowed to search the web? It looks like it might be. I think the canonical test I'm proposing would disallow that, but it is a useful step in general.
Huh.
Uploading just the Patterns section of the HexBook webpage and disabling web search looks better even on Grok 3, though that's just a quick glance and I won't be able to test it for a bit. EDIT: nope, several hallucinated patterns on Grok 3, including a number that break from the naming convention. And Grok 4 can't have web search turned off. Bah.
Have you tried simply asking it not to search the web? The models usually comply when asked. If they don't, it should be evident from the UI.
That's a fair point, and does seem to work with Grok, as does just giving it only one web page and asking it not to use others. Still struggles, though.
That said, a lot of the 'thinking' steps are things like "The summary suggests list operations exist, but they're not fully listed due to cutoff.", getting confused by how Consideration/Introspection works (as start/end escape characters), or recommending Concat Distillation, which doesn't exist but is a reasonable (indeed, the code) name for Speaker's Distillation. So it's possible I'm running into issues with the way I'm asking the question, such that Grok's research tooling is preventing it from seeing the parts of the puzzle necessary to find the answer.
I tried using o3, but it correctly noted that the file you mentioned isn't available, and its web browsing tool failed when trying to use the website.
I can't do anything about the missing document, but I did manually copy and paste most of the website. This is its answer:
https://chatgpt.com/s/t_6892b68c0c3081919777d514df3ba8c2