This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Even the best models will confidently spout absolute falsehoods every once in a while without any warning.
Buddy, have you seen humans?
As a math nerd I seriously despise this line of argument as it ultimately reduces to a fully generalized argument against "true", "false", and "accuracy" as meaningful concepts.
I invite further clarification.
Imagine a trick abacus where the beads move on their own via some pseudorandom process, or a pocket calculator where digits are guaranteed to a +/- 1 range. I.e., you plug in "243 + 67 =" and more often than not you get the answer "320" but you might just as well get the answer "310", "321" or "420". After all, the difference between all of those numbers is very small. Only one digit, and that digit is only off by one.
Now imagine you work in a field where numbers are important, where lives depend on getting this math right. Or maybe you're just doing your taxes, and the Government is going to ruin you if the accounts don't add up.
Are you going to use the trick calculator? If not, why not?
That is not an explanation for:
You're arguing that since LLMs are not perfectly reliable, therefore they're unreliable. There are different degrees of reliability necessary to do useful things with them. It is a false dichotomy to divide them so. I contend that they've crossed the threshold for many important, once well-paying lines of cognitive labor.
Besides, your thought experiment is obviously flawed. If you're sampling from a noisy distribution, what's stopping you from doing so multiple times, to reduce the error bars involved? I'd expect a "math nerd" to be aware of such techniques, or did your interest end before statistics?
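To make the repeated-sampling point concrete, here is a minimal sketch. It assumes each digit of the trick calculator's answer is nudged independently, and the names `trick_add`, `majority_vote` and the `p_correct=0.6` error rate are made up for illustration; nothing upthread specifies them. Under those assumptions the most common answer over many runs recovers the true sum, even though any single run is usually wrong.

```python
import random
from collections import Counter


def trick_add(a, b, rng, p_correct=0.6):
    """Add a and b, then nudge each digit of the result by -1, 0 or +1.

    p_correct is an assumed per-digit chance of being left alone.
    """
    noisy_digits = []
    for ch in str(a + b):
        d = int(ch)
        if rng.random() >= p_correct:
            # Off by exactly one, wrapping around so the digit stays 0-9.
            d = (d + rng.choice([-1, 1])) % 10
        noisy_digits.append(str(d))
    return int("".join(noisy_digits))


def majority_vote(a, b, n_samples, rng):
    """Run the trick calculator n_samples times and return the most common answer."""
    counts = Counter(trick_add(a, b, rng) for _ in range(n_samples))
    return counts.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the run is repeatable
single_run = trick_add(243, 67, rng)
voted = majority_vote(243, 67, 2001, rng)
print("one run:", single_run, "| majority of 2001 runs:", voted)
```

This only helps when the errors are independent from run to run. If the calculator were wrong the same way every time, voting would just repeat the same mistake.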
If I had to rely on an LLM for truly high-stakes work, I'd be working double time to personally verify the information provided, while also using techniques like running multiple instances of the same prompt, self-critique or debate between multiple models.
Fortunately, that's a largely academic exercise, since very few issues of such consequences should be decided by even modern LLMs. I give it a generation or two before you can fire and forget.
I have no objections to my own doctor using an LLM, and I use them personally. All I ask is that they have the courtesy and common sense to use o3 instead of 4o.
Besides, the contraption you describe is quite similar to how quantum computing works. You get an answer which is sampled from a probability distribution. You are not guaranteed to get a single correct answer. Yet quantum computers are at least theoretically useful.
Hell, as a maths nerd, you should be aware that the overwhelming majority of numbers cannot be physically represented. If you also happen to be a CS nerd on the side, you might also be aware of the vagaries of floating point arithmetic. Digital computers are not perfect, but they're close enough for government work. LLMs are probably close enough for government work too, given the quality of the average bureaucrat.
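For anyone who hasn't run into the floating point issue, a small standard-library illustration: the decimal values 0.1 and 0.2 have no exact binary representation, so ordinary arithmetic on them is already slightly off, and the usual workaround is to compare within a tolerance.

```python
import math

total = 0.1 + 0.2

# Exact equality fails because of binary rounding error.
exactly_equal = (total == 0.3)

# Comparing within a tolerance is the usual workaround.
close_enough = math.isclose(total, 0.3)

print(repr(total), exactly_equal, close_enough)
```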
Humans are fallible. LLMs are fallible, but they're becoming less so. The level of reliability needed for a commercially viable self-driving vehicle is far higher than that for a useful Roomba. And yet, Waymos are now safer than humans.
I rest my case.
You did not say "no"; as such, I find it disingenuous of you to suddenly back-pedal and claim to care about reliability after the fact.
Buddy, have you seen humans?
Humans are unreliable. You are a human, are you not? You have not given any indication that you care about accuracy or reliability and instead (by choosing to use the trick calculator over doing the math yourself) have strongly implied that you do not care about such things.
Now if you feel that I've been unfairly dismissive, antagonistic, or uncharitable in my response towards you, then perhaps you might begin to grasp why I hate the whole "bUt HuMaNs ArE FaLaBlE ToO UwU" argument with such a passion. I'm not claiming that LLMs are unreliable because they are "less than perfect"; I am claiming that they are unreliable because they are not only unreliable, but unreliable by design. I know it's long, but seriously, watch the video essay on Badness = 0 I posted up thread. It is highly relevant to this conversation.
You're putting far too much into your interpretation of what I initially said. That's the polite way to put it, because it's a lot of putting words in my mouth that I never said.
In the context of:
My point is clearly that humans, even the "best" humans, aren't immune to the same accusation.
What are you on about? If my only option was that faulty calculator, then I would use it, after making every attempt to mitigate its shortcomings. If it was worth my time to do the calculation by hand, I'd do that instead. Yet for anything more complicated than 5 digit sums, I'd be better off working around the faulty calculator. That is the same approach I use with LLMs, to excellent effect. Verify everything that is worth the effort of verifying.
Why would you assume that I don't care about reliability? A perfect calculator beats a faulty calculator. Multiple faulty calculators beat a single faulty calculator. A faulty calculator beats no calculator at all.
Once again, your insistence on dividing the world into "reliable" vs "unreliable" is a choice you're making, and not one of mine. If you, instead, assume that I'm the one making such a claim, you're off by light-years.
Humans are not perfectly reliable, and we have entire systems meant to address that. That's a significant purpose behind the whole civilization thing.
Are human pilots perfectly reliable? No, hence we have copilots, flight computers, and check-lists.
Are human mathematicians perfectly reliable, even working within the rigorous confines of mathematics? Nope. That's why we invented calculators, theorem provers like Coq, and so on.
Am I perfectly reliable? I wish. That's why I make sure to fact-check my own claims and use Google, and yes, LLMs, because I expect the combination to be more robust as well as faster than figuring out everything from first principles myself.
Our entire civilization is a human-fallibility-management-system. So when I say "Buddy, have you seen humans?", I'm not making a "fully generalized argument against 'true' and 'false'". I'm making the opposite point: The pursuit of truth and accuracy is so important that we've spent millennia developing robust, multi-agent, error-correcting systems to compensate for the fact that our base hardware (a single human brain) is unreliable.
Cost and speed are factors too, and ones that can be meaningfully traded off against reliability if you can't have it all.
Hardly. If, for some reason, normal calculators weren't an option, then I offered ways to mitigate the failures of even the faulty ones you conjecture. That step adds extra time and headache, but if you really cared to, you could get indistinguishable results.
Even if I were to grant your framing of LLMs as less than perfectly reliable oracles, then I obviously endorse working around those failures. I also point to the fact that humans are less than perfectly reliable.
Besides, you're the one who made the entirely unfounded claim that:
What does you being a math nerd have to do with anything? Without further justification, it's an argument from authority, and authority you then didn't demonstrate. You have yet to remotely demonstrate that I am making a "fully generalized argument" against those concepts. Everything you said afterwards is, at bare minimum, tangential to that point.
Without quantifying "reliability", or even quantifying one's willingness to trade off reliability for other things, such an argument is pointless.
Modern electronics are some of the most robust and error-resistant physical devices to ever exist, with more sigmas of accuracy than I care to count. Yet, they're still at risk of failure or inaccuracy, if some random cosmic ray were to hit them during an operation. In situations where you absolutely need to reduce this to the bare minimum, you can pay for ECC memory or run computations in parallel. This still doesn't entirely mitigate the risk, but it reduces it to levels that aren't a concern except over periods of billions of years.
Does this mean that modern computers are "unreliable by design"? Absolutely not. It means that some unreliability is, unfortunately, unavoidable, but can be reduced to tolerable levels. They were designed, in the human-intent sense, for reliability.
You claim LLMs are "unreliable by design". This is a misunderstanding of what they are. LLMs are stochastic by design. This is a feature, not a bug. It allows them to produce a diverse range of outputs from the same prompt, which is essential for creative and exploratory tasks. This stochasticity is controllable via sampling parameters like temperature. If one requires deterministic output for a given state, one can simply set temperature=0. The resulting output will be the single most probable completion. It may still be factually incorrect, but it will not be randomly incorrect in the way your trick abacus analogy suggests. The unreliability is an emergent property of imperfect modeling of the data distribution, not a deliberate design choice in the sense you imply.
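A toy sketch of what the temperature knob does, for readers who haven't seen it. This is not any particular model's API: `sample_token` and the three logit values are made up for illustration. With `temperature=0` the sampler always returns the highest-scoring option; with a positive temperature it draws from the softmax distribution, so repeated calls can differ.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from a list of scores (logits).

    temperature == 0 is treated as greedy decoding: always take the argmax.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.1]  # made-up scores for three candidate tokens
rng = random.Random(0)

greedy_picks = {sample_token(logits, 0, rng) for _ in range(100)}
sampled_picks = {sample_token(logits, 1.0, rng) for _ in range(100)}
print("temperature 0:", greedy_picks, "| temperature 1:", sampled_picks)
```

Greedy decoding is repeatable, but the top-scoring option is only as good as the scores themselves, which is the distinction drawn above between being stochastic and being wrong.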
The argument "humans are fallible too" is not a "fully generalized argument against 'true' and 'false'". It is the establishment of the relevant baseline for performance. To hold a new technology to a standard of flawless perfection that no existing system (especially its human predecessors) can meet is not a rigorous critique; it is simply moving the goalposts.
Again, if you feel that I have been uncharitable, perhaps you should take a moment, because all I did was volley your own argument (almost word for word) right back at you.
And this is supposed to be an argument for trusting AI over human judgment? It seems to me that you are doing the inverse of what you accused me of doing: arguing that because humans are less than 100% reliable, they must be useless.
Because it means being prone to a certain sort of thought-process where you examine every assumption and follow every assertion to its conclusion.
This claim is simply false. I've worked with legacy electronics and there is no comparison. Modern electronics are nowhere near as robust or fault tolerant; they are just light enough and cheap enough that providing multiple redundancy is reasonable by comparison.
No, it is a description of how they work, the essence of the Epsom vs Knuthian approach described in the video essay I was referring to.
Meanwhile, you are still not engaging with my point. You have not given any indication that you care about accuracy or reliability and instead (by choosing to use the trick calculator over doing the math yourself) you have strongly implied that you do not care about such things at all.