This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

...
It's not "naive"; it's generating an average. If your training data is full of extraneous material (or otherwise insufficiently tokenized/vetted), your response will also be full of extraneous material. And again, it's not rationalizing; it's averaging.
...
Again, it's not "naive"; it is generating an average. If the bulk of the tokenized training data related to your prompt is press releases, the response is going to reflect the press releases. Whether those press releases are true or false doesn't enter into the equation. This is expected.
...
Can you elaborate on what you think words like "read", "searches", and "know" mean in this context? I'm not asking just to be pedantic; how you think about this question informs how you approach algorithmic behavior.
Edit: if that is a bit too abstract, instead try explaining why you believe that the algo "knows" which claims are likely spurious, and then explain why you would expect that to have any influence on the algorithm's output.
My experience with AI bots has generally been that they are extremely articulate when it comes to producing correct English text, but they have no awareness or intentionality and therefore no sense of relationship to fact, and no sense of context or meaning. What they do very well is string together words in response to prompts, and despite heroic efforts to get their output to be more fact-sensitive, the fundamental issue has never really been overcome.
I call them nonsense because I think that sense requires some sort of relationship to both fact and context. To be sensible is to be aware of your surroundings. That's not the case with bots.
I would add, at least, that this:
seems to depend on definitions of rationality or intelligence that I don't think I share. I think bots are very efficient at producing English text, even quite complex text. It's trivial enough to show that a bot can produce a better written letter or better poem or what have you than the average man or woman on the street.
But I think that written verbal acuity is, at best, a very restricted kind of 'intelligence'. In human beings we use it as a reasonable proxy for intelligence and make estimations based off it because, in most cases, written expression does correlate well with other measures of intelligence. But those correlations don't apply with machines, and it seems to me that a common mistake today is for people to just apply them. This is the error of the Turing test, isn't it? In humans, yes, expression seems to correlate with intelligence, at least in broad terms. But we made expression machines and because we are so used to expression meaning intelligence, personality, feeling, etc., we fantasise all those things into being, even when the only thing we have is an expression machine.
Bots and LLMs can produce statements that look very polished, and which purport to describe the world. In many cases, those descriptions are even accurate. But they are still, it seems to me, generating nonsense.
The other day I gave Sonnet 7000 lines of code (much of it irrelevant to this specific task) and asked it, in quite general language, to create a feature.
I get out six files that do everything I've asked for and a bunch of random, related, useful things, plus some entirely unnecessary stuff like a word cloud (maybe it thinks I'm one of those people who likes word clouds). There are some weird leap-of-logic hacks, showing imaginary figures in one of the features I didn't even ask for.
But it just works. Oneshot.
How is that not intelligence? What do we even mean by intelligence if not that? Sonnet 4 has to interpret my meaning, formulate a plan, transform my meaning into computer code and then add things it thinks fit in the context of what I asked.
Fact-sensitive? It just works. It's sensitive to facts, if I want it to change something it will do it. I accidentally failed to rename one of the files and got an error. I tell Sonnet about the error, it deduces I don't have the file or misnamed it, tells me to check this and I feel like a fool. You simply can't write working code without connection to 'fact'. It's not 'polished', it just works.
How the hell can an AI write thousands of words of fiction if it doesn't have a relationship with 'context'? We know it can do this. I have seen it myself.
Now if you're talking about spatial intelligence and visual interpretation, then sure. AI is subhuman in spatial reasoning. A blind person is even more subhuman in visual tasks. But a blind person is not necessarily unintelligent because of this, just as modern AI is not unintelligent because of its blind spots in the tokenizer or occasional weaknesses.
The AI-doubter camp seems to be taking extreme liberties with the meaning of 'intelligence', bringing it far beyond the meaning used by reasonable people.
I can't actually tell what you asked a bot to do. You asked a bot to 'create a feature'? What the heck is that? A feature of what? At first I assumed you meant a coding task of some kind, but then you described it as writing 'thousands of words of fiction', which sounds like something else entirely. I have no idea what you had a bot do that you thought was so impressive.
At any rate, I think I've explained myself adequately? To repeat myself:
Yes, a bot can generate 'thousands of words of fiction'. But I already explained why I don't think that's equivalent to intelligence. Generating English sentences is not intelligence. It is one thing that you can do with intelligence, and in humans it correlates sufficiently well with other signs of intelligence that we often safely make assumptions based on it. But an LLM isn't a human, and its ability to generate sentences in no way implies any other ability that we commonly associate with intelligence, much less any general factor of intelligence.
Yes, I made the bot do a programming task.
I ALSO observed it write long-form fiction. This is not an advanced reading comprehension task. It should be obvious that programming and creative writing are two different things.
You said this:
Normal people would think that 'fact' and 'context' would be adequately achieved by writing code that runs and fiction that isn't obviously derpy 'Harry Potter and the cup of ashes that looked like Hermione's parents'. But you have some special, strange definition of intelligence that you never make clear, except to repeat that LLMs do not possess it because they don't have apprehension of fact and context. Yet they do have these qualities, because we can see that they do creative writing and coding tasks and as a result they are intelligent.
I don't buy your appeal to normal people here. I think that most normal people do not think that chatbots are intelligent.
Realistically, I don't think most people can explain why they're not intelligent, because most people don't have definitions of intelligence on-hand. I think for most people it's an I-know-it-when-I-see-it situation. That's why we need to philosophise a bit about it in order to produce more reasonable definitions and criteria for intelligence.
Anyway, I think that intuitions of most normal people would say that bots aren't intelligent, and if we explored that with them, and had a patient, philosophically nuanced conversation about why, we probably would find that most people intuitively think that intelligence involves things like, to quote myself, 'awareness or intentionality'.
It's hard to say what "normal people" think about this (or even what "normal people" are), but in my experience, people I would consider in that category use the label "AI chatbots" to describe things like ChatGPT or Copilot or Deepseek, while also being aware that "AI" is short for "artificial intelligence." This seems fundamentally incompatible with believing that these things aren't "intelligent."
Now, almost every one of these "normal people" I've encountered also believes that these "AI chatbots" lack free will, sentience, consciousness, internal monologue, and often even logical reasoning abilities. "Stochastic parrots" or "autocomplete on steroids" are phrases I've seen used by the more knowledgeable among such people. But given that they're still willing to call these chatbots "AI," I think this indicates that they consider "intelligence" to mean something that doesn't require such things.
Computer scientists call their field computer science despite it being more about mathematics and logic than science, and despite the field having far less to do with computers than one might expect.
Normies have been calling computer opponents in video games "AI" since the 1980s, despite knowing that they clearly aren't "intelligent".
Sure, and when I say that I have a "theory" about who took the cookies from the cookie jar, it doesn't meet the same bar that the "theory of relativity" or "theory of evolution" meet in terms of scientific evidence and consensus. That doesn't make my theory not a theory, it just reflects the squishiness of word definitions. Likewise for "science" and "intelligence."
I disagree. I think people consider, say, the ghosts in Pacman or the imps in 1993's Doom "intelligent." Not sentient, not logical, not conscious, but certainly intelligent. Hence the willingness to use the term "enemy artificial intelligence" to describe them. This willingness reflects a (possibly subconscious) understanding that "intelligence" doesn't indicate sentience, consciousness, logical thinking, etc.
...
Well, I wouldn't use intentionality for bots at all. I think intentionality presupposes consciousness, or that is to say, subjectivity or interiority. Bots have none of those things. I don't think it's possible to get from language manipulation to consciousness.
At any rate, I certainly agree that every ideological person believes untrue things about the world. I'm not sure about the qualification 'for instrumental reasons' - I suspect that's true if you define 'instrumental' broadly enough, but at that point it's becoming trivial. At any rate, if you leave off reasons, I am confident that every person full stop holds some false beliefs.
That doesn't seem like the same thing to me, though. Humans sometimes represent the world falsely to ourselves. That's not what bots do. Bots don't represent the world to themselves at all. We sometimes believe falsely; they don't believe at all. They are not the kinds of things capable of holding beliefs.
Even the best models will confidently spout absolute falsehoods every once in a while without any warning.
Buddy, have you seen humans?
As a math nerd I seriously despise this line of argument as it ultimately reduces to a fully generalized argument against "true", "false", and "accuracy" as meaningful concepts.
Let's try a concrete example. Excerpted from here:
65.8% accuracy isn't that great, but buddy, have you seen humans?
The state of the art for generating accurate medical diagnoses doesn't involve gathering the brightest highschoolers, giving them another decade(-ish) of formal education, then more clinical experience before asking for their opinions. It involves training an LLM.
I don't think so. Those concepts still have pretty clear meaning and can be applied to the output of AI as well as humans. What this line of argument is disputing is the (often unstated) conclusion: "therefore, AI is not valuable." But this doesn't follow. Humans distort information, accidentally or maliciously, make errors, hallucinate, and are generally somewhat unreliable, but their output still has value. An AI can share all of those same characteristics and still be very valuable as an information processing agent.
I invite further clarification.
Imagine a trick abacus where the beads move on their own via some pseudorandom process, or a pocket calculator where digits are guaranteed to a +/- 1 range. I.e., you plug in "253 + 67 =" and more often than not you get the answer "320", but you might just as well get the answer "310", "321" or "420". After all, the difference between all of those numbers is very small. Only one digit, and that digit is only off by one.
Now imagine you work in a field where numbers are important and your life depends on getting this math right. Or maybe you're just doing your taxes, and the Government is going to ruin you if the accounts don't add up.
Are you going to use the trick calculator? If not, why not?
That is not an explanation for:
You're arguing that since LLMs are not perfectly reliable, therefore they're unreliable. There are different degrees of reliability necessary to do useful things with them. It is a false dichotomy to divide them so. I contend that they've crossed the threshold for many important, once well-paying lines of cognitive labor.
Besides, your thought experiment is obviously flawed. If you're sampling from a noisy distribution, what's stopping you from doing so multiple times, to reduce the error bars involved? I'd expect a "math nerd" to be aware of such techniques, or did your interest end before statistics?
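The repeated-sampling idea can be sketched in a few lines. This is a toy model, not a claim about how any real LLM fails: `trick_add`, `p_correct`, and the choice to nudge digits modulo 10 are all assumptions made up for illustration, and the sketch assumes each query's errors are independent, which is the condition under which voting helps.

```python
import random
from collections import Counter


def trick_add(a, b, p_correct=0.8, rng=random):
    """Hypothetical trick calculator: each digit of a + b is right with
    probability p_correct, otherwise nudged by +1 or -1 (modulo 10)."""
    noisy_digits = []
    for ch in str(a + b):
        digit = int(ch)
        if rng.random() >= p_correct:
            digit = (digit + rng.choice((-1, 1))) % 10
        noisy_digits.append(str(digit))
    return int("".join(noisy_digits))


def majority_answer(a, b, n_samples, rng):
    """Query the trick calculator n_samples times and return the most common answer."""
    counts = Counter(trick_add(a, b, rng=rng) for _ in range(n_samples))
    return counts.most_common(1)[0][0]


rng = random.Random(0)  # seeded only so the demo is repeatable
print("single query:", trick_add(253, 67, rng=rng))
print("majority of 101 queries:", majority_answer(253, 67, 101, rng))
```

With these assumed numbers a single query gets all three digits right only about half the time (0.8 cubed is roughly 0.51), while the most common answer across many queries is overwhelmingly likely to be the true sum. If the errors were correlated rather than independent, voting would help much less.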
If I had to rely on an LLM for truly high-stakes work, I'd be working double time to personally verify the information provided, while also using techniques like running multiple instances of the same prompt, self-critique or debate between multiple models.
Fortunately, that's a largely academic exercise, since very few issues of such consequences should be decided by even modern LLMs. I give it a generation or two before you can fire and forget.
I have no objections to my own doctor using an LLM, and I use them personally. All I ask is that they have the courtesy and common sense to use o3 instead of 4o.
Besides, the contraption you describe is quite similar to how quantum computing works. You get an answer which is sampled from a probability distribution. You are not guaranteed to get a single correct answer. Yet quantum computers are at least theoretically useful.
Hell, as a maths nerd, you should be aware that the overwhelming majority of numbers cannot be physically represented. If you also happen to be a CS nerd on the side, you might also be aware of the vagaries of floating point arithmetic. Digital computers are not perfect, but they're close enough for government work. LLMs are probably close enough for government work too, given the quality of the average bureaucrat.
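For anyone who hasn't run into the floating point issue, here is a minimal illustration using only the Python standard library. It shows the usual workaround of comparing within a tolerance instead of demanding exact equality.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails: 0.1, 0.2 and 0.3 have no exact binary representation.
print(total == 0.3)

# Comparing within a small relative tolerance succeeds.
print(math.isclose(total, 0.3))
```

The hardware is doing exactly what it was designed to do; the answer is simply "close enough" rather than exact, and ordinary software is written with that in mind.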
Humans are fallible. LLMs are fallible, but they're becoming less so. The level of reliability needed for a commercially viable self-driving vehicle is far higher than that for a useful Roomba. And yet, Waymos are now safer than humans.
I rest my case.
You did not say "no". As such, I find it disingenuous of you to suddenly back-pedal and claim to care about reliability after the fact.
Buddy, have you seen humans?
Humans are unreliable. You are a human, are you not? You have not given any indication that you care about accuracy or reliability and instead (by choosing to use the trick calculator over doing the math yourself) have strongly implied that you do not care about such things.
Now if you feel that I've been unfairly dismissive, antagonistic, or uncharitable in my response towards you, then perhaps you might begin to grasp why I hate the whole "bUt HuMaNs ArE FaLaBlE ToO UwU" argument with such a passion. I'm not claiming that LLMs are unreliable because they are "less than perfect"; I am claiming that they are not only unreliable, but unreliable by design. I know it's long, but seriously, watch the video essay on Badness = 0 I posted upthread. It is highly relevant to this conversation.
Why would anyone answer a thought experiment with a direct factual analysis? I wouldn't use the trick calculator because I would use a normal one, or possibly specialized software that has error-checking that goes beyond faithfully calculating my button presses. Wow, I'm so insightful.
I notice that you haven't answered the question either: Have you seen humans? I personally see dozens of humans on an average day, but I wouldn't want to assume anything about your answer.
Where's the relevance? Was it "Using an LLM to answer your questions will cut your workload by 99% but not 99.99% because you have to follow one link to confirm its response"?
0:00-6:00 Detail orientation!
6:00-9:00 Instead of watching >100 videos each about 10-30 minutes long and assessing them himself (or using any other research strategy), the author used a (now) old model with 5% the parameters of GPT4, and it confused a video about error correction algorithms with a video about admitting to and correcting your errors. He got his answer within minutes.
9:00-12:00 Intro to LLMs and his toy example.
12:00-19:00 BoVeX, which is a typesetting software he made that rewrites text to eliminate "bad" breaks in text (e.g. hyphens, overspacing).
19:00-22:00 Conclusion/credits.
You're putting far too much into your interpretation of what I initially said. That's the polite way to put it, because it's a lot of putting words in my mouth that I never said.
In the context of:
My point is clearly that humans, even the "best" humans, aren't immune to the same accusation.
What are you on about? If my only option was that faulty calculator, then I would use it, after making every attempt to mitigate its shortcomings. If it was worth my time to do the calculation by hand, I'd do that instead. Yet for anything more complicated than 5 digit sums, I'd be better off working around the faulty calculator. That is the same approach I use with LLMs, to excellent effect. Verify everything that is worth the effort of verifying.
Why would you assume that I don't care about reliability? A perfect calculator beats a faulty calculator. Multiple faulty calculators beat a single faulty calculator. A faulty calculator beats no calculator at all.
Once again, your insistence on dividing the world into "reliable" vs "unreliable" is a choice you're making, and not one of mine. If you, instead, assume that I'm the one making such a claim, you're off by light-years.
Humans are not perfectly reliable, and we have entire systems meant to address that. That's a significant purpose behind the whole civilization thing.
Are human pilots perfectly reliable? No, hence we have copilots, flight computers, and check-lists.
Are human mathematicians perfectly reliable, even working within the rigorous confines of mathematics? Nope. That's why we invented calculators, theorem provers like Coq, and so on.
Am I perfectly reliable? I wish. That's why I make sure to fact-check my own claims and use Google, and yes, LLMs, because I expect the combination to be more robust as well as faster than figuring out everything from first principles myself.
Our entire civilization is a human-fallibility-management-system. So when I say "Buddy, have you seen humans?", I'm not making a "fully generalized argument against 'true' and 'false'". I'm making the opposite point: The pursuit of truth and accuracy is so important that we've spent millennia developing robust, multi-agent, error-correcting systems to compensate for the fact that our base hardware (a single human brain) is unreliable.
Cost and speed are factors too, and ones that can be meaningfully traded off against reliability if you can't have it all.
Hardly. If, for some reason, normal calculators weren't an option, then I offered ways to mitigate the failures of even the faulty ones you conjecture. That step adds extra time and headache, but if you really cared to, you could get indistinguishable results.
Even if I were to grant your framing of LLMs as less than perfectly reliable oracles, then I obviously endorse working around those failures. I also point to the fact that humans are less than perfectly reliable.
Besides, you're the one who made the entirely unfounded claim that:
What does you being a math nerd have to do with anything? Without further justification, it's an argument from authority, and authority you then didn't demonstrate. You have yet to remotely demonstrate that I am making a "fully generalized argument" against those concepts. Everything you said afterwards is, at bare minimum, tangential to that point.
Without quantifying "reliability", or even quantifying one's willingness to trade off reliability for other things, such an argument is pointless.
Modern electronics are some of the most robust and error-resistant physical devices to ever exist, with more sigmas of accuracy than I care to count. Yet, they're still at risk of failure or inaccuracy, if some random cosmic ray were to hit them during an operation. In situations where you absolutely need to reduce this to the bare minimum, you can pay for ECC memory or run computations in parallel. This still doesn't entirely mitigate the risk, but it reduces it to levels that aren't a concern except over periods of billions of years.
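To make the "run computations in parallel" option concrete, here is a sketch of the simplest version I know of, triple redundancy with a bitwise majority vote. It is an illustration only: real ECC memory uses more compact error-correcting codes than this, and the function name `majority_vote` is just a label picked for the example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant results: each output bit
    takes the value that at least two of the three inputs agree on."""
    return (a & b) | (a & c) | (b & c)


good = 0b1011
flipped = good ^ 0b1000  # simulate a single bit flip in one of the copies

# The two intact copies outvote the corrupted one.
print(bin(majority_vote(good, good, flipped)))
```

A single corrupted copy gets outvoted. Two copies corrupted in the same bit would not be, which is why this reduces the risk rather than eliminating it.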
Does this mean that modern computers are "unreliable by design"? Absolutely not. It means that some unreliability is, unfortunately, unavoidable, but can be reduced to tolerable levels. They were designed, in the human-intent sense, for reliability.
You claim LLMs are "unreliable by design". This is a misunderstanding of what they are. LLMs are stochastic by design. This is a feature, not a bug. It allows them to produce a diverse range of outputs from the same prompt, which is essential for creative and exploratory tasks. This stochasticity is controllable via sampling parameters like temperature. If one requires deterministic output for a given state, one can simply set temperature=0. The resulting output will be the single most probable completion. It may still be factually incorrect, but it will not be randomly incorrect in the way your trick abacus analogy suggests. The unreliability is an emergent property of imperfect modeling of the data distribution, not a deliberate design choice in the sense you imply.
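For readers who haven't seen how temperature enters the picture, a toy sampler makes the point. This is a sketch under stated assumptions, not any vendor's API: `sample_token` and the three made-up logits are hypothetical, and a real model produces its logits from a neural network rather than a hard-coded list.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from logits. temperature == 0 means greedy argmax;
    higher temperatures flatten the distribution before sampling."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
rng = random.Random(0)    # seeded only so the demo is repeatable

greedy = {sample_token(logits, 0, rng) for _ in range(50)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(500)}
print("temperature 0 picks:", greedy)
print("temperature 1 picks:", sampled)
```

At temperature 0 the same index comes back every time, whether or not it happens to be the right one. At temperature 1 the lower-scored candidates show up as well.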
The argument "humans are fallible too" is not a "fully generalized argument against 'true' and 'false'". It is the establishment of the relevant baseline for performance. To hold a new technology to a standard of flawless perfection that no existing system (especially its human predecessors) can meet is not a rigorous critique; it is simply moving the goalposts.
I don't understand why anyone would hate that argument. Humans are also unreliable... not by design, perhaps, but intrinsically due to the realities of biology. The point of the argument is that, even though humans are intrinsically and inescapably unreliable, we still manage to make reliable systems based around relying on them, and as such, the intrinsic, inescapable unreliability of LLMs doesn't make them incapable of being used as the basis of reliable systems.
There are good arguments to be made against this. It's possible that we can't get LLMs' unreliability to be lower than humans at the same cost. It's possible that even if that were possible, the nature of the unreliability of LLMs will always remain less predictable than that of humans, in such a way as to make making reliable systems based on them impossible. The fact that LLMs can't be shamed or punished based on failing in their reliability could be a fatal flaw for creating reliable systems based on them. And there are probably a myriad of other better reasons I haven't even thought of.
But I'd like to see those arguments actually being made. Maybe that video you say you linked makes them, but I'm one of the users of a text-based forum like this who has neither the interest nor the ability to view long-form videos during normal usage of this forum.
o3 is definitely more capable, but it also has a remarkable ability to hallucinate more believable things, and to communicate ideas in highly technical ways that are hard to understand — and thus fact-check — if you’re not a domain-specific expert. I don’t ask ChatGPT questions about personal medical problems, but when I ask dumb shower thoughts about medical research (“what do researchers think causes Alzheimer’s?” etc) it starts going on about highly technical detail with no introduction or explanation. If it’s right, wow is it smart. But if it’s wrong… I’m not smart enough to know how.
With 4o, I know I’m going to get something overly emotive and excessively buttkissing, but at least I can understand what it’s giving me.
That's fair, o3 has a conversational style that is rather unique, even when considering other SOTA reasoning models. It's like a bright zoomer intern with ADHD who will try just about anything.
I would hope that a doctor using o3 would be able to parse the jargon! If not, they have bigger issues than merely using an LLM. 4o might be more conversational, but for knotty problems, I'd rather use o3 itself to explain arcane terminology or have another model break it down for me.
I'm always right (except when I'm wrong). I'm in fact many times more accurate than even the best AI models, and I'm just an ordinary person.
I wonder how well you'd do if asked to opine accurately on the range of topics that people demand of their humble chatbots. Better yet, how would you fare if you didn't have access to Google? Search is a relatively new feature for LLMs, and they do better with it enabled.
I doubt you could accurately answer questions regarding astrophysics, botany, niche psychological theories, Color Revolutions, the sexual habits of Australian Indigenes and Ska music.
You would definitely not fare better when it came to specifics like dates and names.
LLMs have grossly superhuman world-knowledge, but not crystalline intelligence. I don't care who you are, not even Gwern could match them.
LLMs do worse with search enabled, because LLM search is garbage in garbage out.
An LLM without search has many advantages over a human without search. But an LLM with search is absolute worthless dogshit garbage compared to a human with search.
I might know much less off the top of my head, but my confidence calibration will be through the roof. Those topics are just begging for hallucinations.
If knowledge isn't a concern and all we care about is a Brier score, I must regretfully inform you that a rock saying "nothing ever happens" has you beat.
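The rock's advantage is easy to check numerically. This is a sketch with made-up data: the 100 questions and the 5% base rate are assumptions chosen for illustration, and the point only holds when the events being forecast are rare.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.
    Lower is better; 0 is a perfect score."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical question set: 100 yes/no questions, only 5 of which resolve "yes".
outcomes = [1] * 5 + [0] * 95

rock = [0.0] * 100    # "nothing ever happens"
hedger = [0.5] * 100  # shrugs at every question

print("rock:", brier_score(rock, outcomes))
print("hedger:", brier_score(hedger, outcomes))
```

The rock is penalized only on the five questions where something did happen, so it scores far better than the forecaster who hedges everything at 50%, despite knowing nothing.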
A rock generates less bullshit than an LLM. Of course an LLM is much more useful than a rock, but the characterization of LLMs as bullshit generators is accurate.
Sure, but so does everybody else.
I don't. (Not as much as AI at least)
How do you know?
I catch AI spouting falsehoods far more often than AI catches me 🙃
...