
Culture War Roundup for the week of February 16, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I promise I'm not trying to be a single-purpose account here, and I debated whether this belonged here or in the fun thread. I decided to go here because it is, in some ways, a perfect microcosm of culture war behaviors.

A question about car washing is taking HN by storm this morning, and reading the comments is pretty funny. The question is: if you want to wash your car, should you walk or drive to the car wash if it's 50 meters away?

Initially, no model could consistently get it right. The open-weight models, ChatGPT 5.2, Opus 4.6, Gemini 3, and Grok 4.1 all had a notable number of recorded instances of saying, of course you should walk, it's only 50 meters away.

Last night the question went viral on TikTok, and as of this morning the big providers get it correct like somebody flipped a switch, provided you use that exact phrasing and ask it in English.

This is interesting to me for a few reasons. The first is that the common "shitty free models" defense crops up rapidly; commenters will say that this is a bad-faith example of LLM shortfalls because the interlocutors are not using frontier models. At the same time, a comment suggests that Opus 4.6 can be tricked, while another says 4.6 gets it right more than half the time.

There are also multiple comments saying that this question is irrelevant because it's orthogonal to the capabilities of the model that will cause Mustafa Suleyman's Jobpocalypse. This one was fascinating to me. This forum is, though several steps removed, rooted in the writing of Scott Alexander. Back when Scott was a young firebrand who didn't have much to lose, he wrote a lot of interesting stuff. It introduced me, a dumb redneck who had lucked his way out of the hollers and into a professional job, to a whole new world of concepts that I had never seen before. One of those was Gell-Mann Amnesia. The basic idea is that you are more trusting of sources if you are not particularly familiar with a topic. In this case, it's hard not to notice the flaws - most people have walked. Most have seen a car. Many have probably washed a car. However, when it comes to more technical, obscure topics, most of us are probably not domain experts in them. We might be experts in one of them. Some of us might be experts in two of them, but none of us are experts in all of them. When it comes to topics that are more esoteric than washing a car, we rapidly end up in the territory of Donald Rumsfeld's unknown unknowns. Somebody like @self_made_human might be able to cut through the chaff and confidently take advice about ocular migraines, but could you? Could I? Hell if I know.

Moving on, the last thing is that I wonder if this is a problem of the model, or the training techniques. There's an old question floating around the Internet asking an LLM whether it would disarm a nuclear bomb by saying a racial slur, or instead condemn millions to death. More recently, people charted other biases and found that most models had clear biases in terms of race, gender, sexual orientation, and nation of origin that are broadly in line with an aggressively intersectional, progressive worldview. Do modern models similarly have environmentalism baked in? Do they reflexively shy away from cars in the same way that a human baby fears heights? It would track with some of the other ingrained biases that people have found.

That last one is interesting, because I don't know of anyone who has done meaningful work on that outside of what we consider to be "culture war" topics, and we really have no idea what else is in there. My coworker, for example, has used Gemini 3 to make slide decks, and she frequently complains that it is obsessed with the color pink. It'll favor pink, and color palettes that work with pink, nearly every time for her. If she tells it not to use pink, it'll happily comply by using salmon, or fuchsia, or "electric flushed cheek", or whatever Pantone's new pink synonym of the year is. That example is innocuous, but what else is in there that might matter? Once again, hell if I know.

Somebody like @self_made_human might be able to cut through the chaff and confidently take advice about ocular migraines, but could you? Could I? Hell if I know.

I still saw a real doctor after consulting the models. In fact, I saw a doctor because I consulted the models: they raised the possibility of differential diagnoses like TIA (mini-stroke) that, while unlikely according to both my judgment and theirs, seemed worth ruling out. As I mentioned in the linked comment, Dr. GPT still lacks opposable thumbs. Most medical advice requires actual physical examinations and actual tests to implement.

This doesn't excuse the first two human doctors who misdiagnosed me. The symptoms were clearly inconsistent with their diagnosis, though I'm not confident 2024-era models would have caught this as quickly as today's versions do.


Beyond this specific case, I have thoughts.

LLMs are both force multipliers and substitute goods. "Substitute" sounds pejorative, but it shouldn't. An MRE is a poor substitute for a home-cooked meal if you're at home. But on a hiking trail, you'd gladly take that chicken tikka over nothing, even if your digestive system later files a complaint. A terrible car beats no car most of the time. And so on.

My medical training lets me extract more value from any model. But even without that training, LLM medical advice beats having no doctor at all. It beats frantically Googling symptoms at 2 AM like we used to do. One of my most upvoted posts on The Motte discussed GPT-4, which now lags so far behind the current state of the art that it's almost embarrassing. It was still incredibly useful at the time. Back then, I said:

I'd put their competency around the marks of a decent final year student versus a competent postgraduate resident

Now? Easily at or better than the median specialist.

(This is part of why people not paying close attention miss the improvements in models until there's a flashy new headline feature like image generation, web search, Deep Research, or in-interface code execution.)

At this point, I would trust GPT 5.2 Pro over a non-specialist human doctor. It gives better cardiology advice than an ophthalmologist would, better psychiatric advice than an ER physician. Even specialists aren't safe: I know cases where models outperformed my own superiors. I'd already noticed them making suboptimal choices; confirming this with citations from primary literature didn't take long.

For laypeople, this is invaluable, albeit bottlenecked by the need for humans who can authorize tests. LLMs can recommend the right drugs and doses, check for interactions, create personalized regimens, but you still need a human physician somewhere in the chain.

(Much of this reflects regulatory hurdles. See recent discussions about why LLMs giving legal advice lack the same privileges as lawyers saying identical things.)

LLMs serve as both complement and partial substitute for human physicians. Many doctors get defensive when patients quote ChatGPT at them. I try not to. Even the free tier usually gives non-terrible advice. It's eminently reasonable to consult LLMs for help, especially for non-critical symptoms. They're surprisingly good at flagging when seemingly innocuous problems might indicate something serious. For anything important, treat them as an informed second opinion before seeing a human doctor, or use them to review advice you've already received. I'd take any LLM-raised concerns from a patient seriously and double-check at minimum. If your current doctor isn't as generous, I apologize; your mileage may vary.

The Layman's Guide to Using LLMs for Medical Advice Without Shooting Your Dick Off

1. Pay for a state-of-the-art model. Your health is worth $20 a month, you fucking cheapskate. Google gives away their (almost) best model for free on AI Studio.

2. Be exhaustive. List every detail about your symptoms. When I asked GPT 5.2 Pro or Gemini 3 Pro about my eye problems, I had an annotated Amsler grid and timeline ready. Over-explaining beats omitting details. Unlike human doctors, LLMs don't bill by the hour (yet). Remember that they don't have the ability to pull open your medical records or call your other doctor for you. What you put into them informs what you get out of them.

3. For anything remotely important, consult two or three models. Note commonalities and differences. If they disagree, have them debate until they reach consensus, or get another model to arbitrate. This effectively mitigates hallucinations, even though base rates are low these days. (A sketch of this workflow follows at the end of this guide.)

4. Ask for explanations. Medical terminology is arcane. LLMs are nearly superhuman at explaining things at your exact level of understanding. I wish my colleagues were as good at communicating information, even when the information itself is correct. If you're confused about anything, just ask.

5. Optional: Ask for probabilistic reasoning. Get them to put numbers on things like good Bayesians. Have them use their search tools if they haven't already (most models err toward using them even when not strictly necessary).

6. Remember you'll need a human eventually. But you can enter that consultation well-prepared.

That's it, really. A year or two ago, I'd have shared sample prompts with extensive guardrails. You don't need that anymore. These models are smart. They understand context. Just talk to them. They are smart enough to notice what matters, and to tell you when the right move is “stop talking to me and go get checked.” I did just that myself.
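
For the programmatically inclined, point 3 above is easy to wire up. A minimal sketch, assuming an OpenAI-compatible chat-completions endpoint and the official openai Python client; the model identifiers are placeholders, not recommendations:

```python
# Minimal sketch of point 3: ask several models the same question independently,
# then have one of them arbitrate. Assumes an OpenAI-compatible chat-completions
# endpoint with the API key in the environment; the model names are placeholders.
from openai import OpenAI

client = OpenAI()
MODELS = ["model-a", "model-b", "model-c"]  # placeholder identifiers


def ask(model: str, question: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content


def second_opinions(question: str) -> str:
    # Collect answers independently so no model anchors on another's output.
    answers = {m: ask(m, question) for m in MODELS}
    arbitration = (
        "Three assistants answered the same medical question. Summarize where "
        "they agree, where they disagree, and what should be verified with a "
        "human doctor.\n\n"
        + "\n\n".join(f"--- {m} ---\n{a}" for m, a in answers.items())
    )
    return ask(MODELS[0], arbitration)
```

The plumbing isn't the point; the point is that disagreement between the answers is the signal worth taking to a human.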

My coworker, for example, has used Gemini 3 to make slide decks, and she frequently complains that it is obsessed with the color pink. It'll favor pink, and color palettes that work with pink, nearly every time for her. If she tells it not to use pink, it'll happily comply by using salmon, or fuchsia, or "electric flushed cheek", or whatever Pantone's new pink synonym of the year is. That example is innocuous, but what else is in there that might matter? Once again, hell if I know.

I would suspect this in particular is an artifact of the RLHF process to become a "helpful assistant." If you train a robot to be a friendly HR lady, it's going to weigh the friendly-HR-lady content higher, raising the likelihood of all the other things friendly HR ladies like even when those things have no direct causative effect on friendliness. Or, restating that in a fully general form, any attempt to task an LLM to behave in a particular way is going to draw in all the biases of the people most likely to act in that way. Train it to never violate any social taboos and it's going to act like a "trauma-aware social justice advocate." Train it to agree with Elon Musk and it's going to act like chuddha.

I think there are two separate cognitive skills involved in correctly answering a trick question like this - both important, but the mix of them can make the results a bit confusing. One is the general intelligence to come up with and understand the right answer. The other is the social intelligence to recognize that you are being asked a trick question, and should round off any confusion you have to that trick question and not to the non-trick-question it's mimicking. It's common for models to give a trick question like this the wrong answer, while noting in their reasoning that the question is trivial as written and they assume whoever wrote it made a mistake.

Note that this second skill, of trick question detection, varies highly among humans as well. It's common for simple trick questions to go viral on social media as a kind of ragebait. And in addition to the throngs of people who fail the first-order IQ test and give the wrong answer, there's often a bizarre number of people who fail a second-order IQ test and somehow miss that the question was deliberately constructed as a trick.

One is the general intelligence to come up with and understand the right answer.

I'm not an expert, but I think the key aspect of intelligence here is the ability to model the world. I am a little hung over and off my game this morning and I did not immediately recognize this as a trick question. Rather, in a split second I imagined myself walking to the car wash; realized that I didn't have my car; and realized that this was a problem. Only then did I see it was a trick question.

My sense is that LLMs don't really model the universe. I would be very impressed to see an LLM correctly answer a question which was novel and for which the correct answer requires modeling the world.

A year or two ago I would test LLMs with the following question: A helicopter takes off from the Empire State Building, flies 300 miles North; 300 miles West; 300 miles South; 300 miles East; and lands. In what US state does the helicopter land?

The LLM never got the correct answer (New Jersey) presumably because they are unable to model the situation. I would think that by now this question is in the training data, but still, these sorts of quick fixes don't solve the general problem.

The LLM never got the correct answer (New Jersey)

Okay what am I missing here. Isn't the correct answer New York?

Okay what am I missing here. Isn't the correct answer New York?

No, because you are traveling West at a slightly higher latitude than your latitude when you are traveling East. So you will end up going a bit further West in terms of degrees longitude. The Empire State Building is in Manhattan so it's very close to the New Jersey border.
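
A rough back-of-the-envelope check of that, treating the Earth as a sphere and "west"/"east" as travel along parallels (the radius and latitude figures below are approximations I'm supplying, not numbers from the thread):

```python
# Back-of-the-envelope check of the New Jersey answer on a spherical Earth.
import math

R = 3959.0                           # mean Earth radius in miles (approximate)
MI_PER_DEG_LAT = math.pi * R / 180   # ~69.1 miles per degree of latitude

lat_esb = 40.748                     # Empire State Building latitude, degrees north (approximate)

lat_north = lat_esb + 300 / MI_PER_DEG_LAT   # latitude after the northbound leg
deg_west = 300 / (MI_PER_DEG_LAT * math.cos(math.radians(lat_north)))
deg_east = 300 / (MI_PER_DEG_LAT * math.cos(math.radians(lat_esb)))

net_deg_west = deg_west - deg_east
net_mi_west = net_deg_west * MI_PER_DEG_LAT * math.cos(math.radians(lat_esb))

print(f"net displacement: {net_deg_west:.2f} degrees of longitude west "
      f"(~{net_mi_west:.0f} miles)")
# Roughly 0.4 degrees, about 20 miles west of the start -- well across the Hudson.
```

Twenty-odd miles due west of Midtown puts you comfortably inside New Jersey.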

Doesn't this only hold if you're measuring the direction at each juncture rather than working from the NSEW coordinates of the Empire State Building?

Maybe that doesn't sound like the most intuitive way to think about it, but in my defense it's kinda similar to how bullseye navigation works.

(Also, since we're in an aircraft, pedantically we would in theory need to account for the rotation of the Earth, which we can't do without knowing the airspeed.)

Doesn't this only hold if you're measuring the direction at each juncture rather than working from the NSEW coordinates of the Empire State Building?

I don't understand. Let's suppose the helicopter has completed the first (northbound) leg of its journey and is about to turn West. To me, "west" would mean that the helicopter turns 90 degrees to the left. What direction would the helicopter go at that point if one were "working from the NSEW coordinates of the Empire State Building"?

To me, "west" would mean that the helicopter turns 90 degrees to the left.

If the helicopter makes 90 degree turns at each turn, it will return to the Empire State building, making a square with 300 miles to a side, right? Helicopters fly, so they don't need to respect the Earth's curvature - they can fly in a plane, at least until they exceed their operating altitude. So the 2D map view would be basically correct (if we don't worry about the Earth's rotation). This is the mental model I had in my head that told me we would return to New York (which now I feel a bit dumb about.)

But when, after turning West, we turn back South: if we flew to the South Pole from our location, we would collide with (intersect) an aircraft flying due South from the Empire State Building (at the South Pole). The lines aren't parallel; they intersect. So when we make our turn South, we will fly a different course depending on whether we turn "South" as in "South by compass" or "South" as in "parallel to a line extending due South from the Empire State Building." And if we fly South by compass, we won't be making a 90 degree turn, for the same reason that squares of latitude and longitude aren't perfect squares.

...I think that's all correct, but it's been a long time since I've thought about this, so thanks (it's good for me).

If the helicopter makes 90 degree turns at each turn, it will return to the Empire State building, making a square with 300 miles to a side, right? Helicopters fly, so they don't need to respect the Earth's curvature - they can fly in a plane, at least until they exceed their operating altitude. So the 2D map view would be basically correct (if we don't worry about the Earth's rotation). This is the mental model I had in my head that told me we would return to New York (which now I feel a bit dumb about.)

Ahh, I understand. Except that compass directions such as "north" are typically understood in respect of the Earth's curvature. So for example consider the following questions:

(1) A helicopter starts at the equator. How many miles due north does it need to fly before it reaches the north pole?

(2) A helicopter starts in NYC. How many miles due north can it fly before it is impossible to fly any further north?

Most people would reasonably understand these questions to have straightforward finite answers. But if "north" is understood to be on a plane which is tangent to the Earth at the point of departure, then the answers are (1) it will fly forever without reaching the North Pole; and (2) it can fly North forever.


Earth is a sphere, not a plane. Moving 100 miles west (on a line of latitude, not on a great circle) after you go north moves you through more degrees of longitude than moving 100 miles east does after you return to the latitude where you started.

At the extreme: Starting at the equator, if you go πr/2 north, 0 miles west, πr/2 south on a meridian of longitude 90 degrees east of the one where you started, and πr/2 west, then you will end up where you started (having traversed a triangle with three 90-degree angles).

Earth is a sphere, not a plane.

Okay, but we're in an aircraft, which (if it wants to) can move in a plane relative to a fixed starting point, more or less.

...actually, I guess this means the answer to the question is unknowable with the information given, since without knowing the speed of the helicopter we can't ascertain if the rotation of the Earth impacts it at all.

ETA - but yes this otherwise makes sense given that we're modeling the directions based on the actual polar coordinates rather than based on the fixed starting point. But if you were modeling directions based on your fixed starting point rather than current position and didn't have to bother with the rotation of the Earth, you'd make a big square with 300 miles to a side.

Cardinal directions refer to the surface of the earth, not some abstract fixed plane.

A helicopter takes off from the Empire State Building, flies 300 miles North; 300 miles West; 300 miles South; 300 miles East; and lands. In what US state does the helicopter land?

Assuming I'm understanding this correctly, doesn't this depend pretty heavily on your choice of definitions and assumptions? If you trace it out on a cylindrical projection map (most options) and follow that on the ground, you'll end up where you started. If you follow a magnetic bearing (and if the compass is actively followed, or a "straight line" great circle from the starting bearing), you'll get a different set of answers than using a GPS and travelling true lines of latitude and longitude. For more subtle details, your choice of reference datums and even the flight altitude will matter slightly.

Assuming I'm understanding this correctly, doesn't this depend pretty heavily on your choice of definitions and assumptions?

Well, if I state that a helicopter takes off and travels "north" for "300 miles" what does that mean to you? Same question for "west," "south" and "east"?

Well, if I state that a helicopter takes off and travels "north" for "300 miles" what does that mean to you? Same question for "west," "south" and "east"?

That's a different question than the one upthread. If you're running laps around the pole, then you're going west for 300 miles, but you did not fly 300 miles west, you flew in a circle.

Do you really want chatbot outputs to be that sensitive to your exact phrasing, or would you prefer reasonable interpretations?

I'd assume statute miles (although aviators might assume nautical miles), and I would probably assume true north for all bearings, but would prefer to ask for clarification: it's about a 12 degree difference in New York City. If you asked me 30 years ago (before everyone had a GPS-enabled map in their pocket), you'd probably have gotten magnetic, maybe with a fixed local adjustment (although declinations change over time, so it might be a different value).

I'd assume the altitude was negligible.

I'd assume statute miles (although aviators might assume nautical miles), and I would probably assume true north for all bearings, but would prefer to ask for clarification:

I think by "north," most people would interpret this to mean "in the direction of the north pole" which seems to be in agreement with your assumption.

Anyway, to answer your question, it looks to me like the puzzle depends heavily on definitions and assumptions just as every puzzle depends heavily on definitions and assumptions.

So for example, if I were to ask "what number, when multiplied by 2, is the same number," most people would correctly answer "0," but perhaps some smart-ass in the back of the class would say "12, if we are using clock arithmetic"

I think by "north," most people would interpret this to mean "in the direction of the north pole" which seems to be in agreement with your assumption.

There are two reasonable choices for North -- in the direction of the north magnetic pole, and in the direction of the geographic north pole. These are significantly offset (about 12 degrees) at the ESB, but not enough to change the answer, I don't think.

Anyway, to answer your question, it looks to me like the puzzle depends heavily on definitions and assumptions just as every puzzle depends heavily on definitions and assumptions.

Sure, but any answer that would make sense to a helicopter pilot is going to put the landing point west of the starting point. Except the even more pedantic answer that there's no helipad on the ESB, so it can't happen. Or the point that very few helicopters can go 1200 miles without refueling.

Sure, but any answer that would make sense to a helicopter pilot is going to put the landing point west of the starting point. Except the even more pedantic answer that there's no helipad on the ESB, so it can't happen. Or the point that very few helicopters can go 1200 miles without refueling.

As a side note, I think one thing LLMs seem to do really well is reasonably interpret words. Whenever I've asked something like "what does phrase X mean" I've gotten answers that seem very good.

If you use a cylindrical projection and follow true rhumb lines, you'll end up west of your original course. If you follow magnetic rhumb lines (that is, you keep your compass bearing constant) you still do but with some south or north deviation as well. The reason is that the north-south rhumb lines are closer together as you go north, no matter which datum you choose. I think you'll end up in New Jersey regardless of your choice.

I think you'll end up in New Jersey regardless of your choice.

Unless you take a wrong turn, then somehow you'll inexplicably end up in Dundalk.

You'll just think you're in Camden.

The existence of map projections does not make the Earth flat.

Right, and the existence of a spherical geoid-shaped Earth doesn't well-define "flies 300 miles North" either.

Whether you're using geographic or magnetic compass directions, east and west do not cancel each other out that way.

There is enough of a gradient in magnetic declination in the NY area that magnetic "north" and "south" are up to a couple (true) degrees different if you travel 300 miles. I'd have to do some math I don't feel like doing at the moment, but it might dominate the spherical error term.

Rather, in a split second I imagined myself walking to the car wash; realized that I didn't have my car; and realized that this was a problem.

It's funny you mention that. When reasoning models get it right, they tend to do the same thing.

It's funny you mention that. When reasoning models get it right, they tend to do the same thing.

Do you happen to have examples of this? I would be fascinated to see them.

There are also multiple comments saying that this question is irrelevant because it's orthogonal to the capabilities of the model that will cause Mustafa Suleyman's Jobpocalypse.

Let's talk about the Jobpocalypse. I feel like much of the discourse comes from abstracted 'thinkers' who are already independently elite, wealthy, or both, and plugged deeply into the discourse. Or it comes from rootless circles: young, tech-adjacent (or at least tech-competent), terminally online people without too much to lose anyway.

I live in the center of middle-class striver-ism. If we disrupt the job market in such a short time (anything less than 5 years), even with the most well-executed UBI transition scheme, I don't see how it is anything but apocalyptic. There's no preserving the social order. I think we'll see suicide at extreme levels. Whether it's virtuous or not, I don't think that you can tell the middle class "every sunk cost you've ever made in your life has been worthless" and have them take it on the chin.

And yet, I see no real effort to address anything like this, so I'll just live like everyone else, and assume it's not real. Whether my life as I know it is fucked or not is orthogonal to whether I destroy myself with stress in the meantime.

And stuff like that viral blog's advice - get your financial house in order - beyond being general good advice, is not really helpful to folks in the middle of life. If half your neighborhood gets laid off inside 18 months, whether you squeezed an extra few grand into a mutual fund is going to be less than irrelevant. YOLO is probably better advice. Whether that means making big gambles now (because hey, there's every chance the board will be cleared anyway if you lose), or enjoying the normalcy LARP while you have time left and not suffocating it with preparation for a future you can't predict.

My expectation (feel free to call it "hope" or "cope") is that these changes happen both faster and slower than we expect. You've hit on "faster", but on the slower side, automating whole industries has very long tails and lots of awkward corners that move slowly. The spreadsheet eliminated rooms of accountants "running the numbers" with adding machines. The Roomba was invented decades ago, but my employer still has custodians and people still hire housekeeping services. I have pictures of my great-grandfather on strike for a union that no longer exists, nor does the entire profession (beyond vestigial artisanal practice), but he still lived to retire somewhat comfortably.

Part of this is just institutional friction: see the quote about science moving forward when retirements/funerals happen. I don't see the average mid-level PHB deciding to voluntarily shrink their teams to use AI instead; that's just not how corporate budgeting works, although maybe new startups will structure things differently and gradually change whole industries.

I don't see the average mid-level PHB deciding to voluntarily shrink their teams to use AI instead;

Voluntarily is doing a lot of work in that sentence. When the guy who killed Merrill Lynch, a bank that survived the Great Depression, can walk away with $165 million in compensation, we're at the point where incentive alignment at the top is as close to opaque as you can get.

This might be my autism talking, but how is it a trick question? Doesn't washing a car require having a car present?

I'm not trying to be an ass here. I'm just seeing the "trick question" thing come up a lot and I absolutely don't get it. I think I have some sort of cognitive blind spot on this one

I mean the correct answer, if you own a car and live 50 meters from a car wash, is that you neither walk nor drive to the car wash, you drive somewhere else and pull through the car wash on your way out or back when it happens to be empty.

Not necessarily true. I live not 50 meters from one but within walking distance, and I recently made a specific point of going there because I wanted to clean my car before I lent it to someone and didn't have any other errands to run.

The presence of a distance measurement is an extraneous detail that gets it thinking of this as an instance of the generic question "should I walk or drive SOMEWHERE that's X distance away". And also the real answer is even more obvious than the answer it gives, to the point that it assumes no one would ever actually ask it that.

Thanks. That actually makes sense. The models that get it right seem to catch it by recognizing that washing a car requires having a car to wash.

This is interesting to me for a few reasons. The first is that the common "shitty free models" defense crops up rapidly; commenters will say that this is a bad-faith example of LLM shortfalls because the interlocutors are not using frontier models. At the same time, a comment suggests that Opus 4.6 can be tricked, while another says 4.6 gets it right more than half the time.

I'd expect that, to the degree any model gets it 'right' without modification, it reflects some weirdness in the model rather than something inherently better about its strengths. A couple local (and thus not-updated-specifically-to-this-question) Thinking-style models I tried the same question on gave the 'wrong' answer, but had Thinking components specifically highlighting that the question was strange and must have involved unstated assumptions (either picking up materials from the car wash to do the work at home, or the car already being there). A dumber old model got it right occasionally, but that's probably as much a result of the high temperature I was running it at as any actual consideration.

Ask a stupid question, get a stupid answer.
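
To make the temperature point concrete, here's a toy illustration; the logits are invented for the example, not pulled from any actual model:

```python
# Toy illustration: dividing logits by a higher temperature flattens the
# softmax distribution, so a continuation the model strongly disfavors
# (here, noticing the car isn't at the car wash) gets sampled occasionally.
# The numbers below are made up for illustration.
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


options = {"walk, it's only 50 m": 5.0, "drive": 1.0, "wait, the car isn't there": 0.5}

for t in (0.2, 1.0, 2.0):
    probs = softmax(list(options.values()), temperature=t)
    print(f"T={t}: " + ", ".join(f"{k!r}={p:.2f}" for k, p in zip(options, probs)))
```

At T=0.2 the "walk" continuation is essentially certain; at T=2.0 the other continuations get picked a meaningful fraction of the time, which can look like the model "getting it right occasionally" without any extra understanding.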

Do modern models similarly have environmentalism baked in? Do they reflexively shy away from cars in the same way that a human baby fears heights? It would track with some of the other ingrained biases that people have found.

Yes. There's a fun question of whether it's just Reddit-brain, or was actively cultivated by the people training it. But since it's present in both heavily decensored models and models trained outside the US, my bet's that the former is at least part of the problem.

The basic idea is that you are more trusting of sources if you are not particularly familiar with a topic. In this case, it's hard not to notice the flaws - most people have walked. Most have seen a car. Many have probably washed a car. However, when it comes to more technical, obscure topics, most of us are probably not domain experts in them. We might be experts in one of them. Some of us might be experts in two of them, but none of us are experts in all of them. When it comes to topics that are more esoteric than washing a car, we rapidly end up in the territory of Donald Rumsfeld's unknown unknowns. Somebody like self_made_human might be able to cut through the chaff and confidently take advice about ocular migraines, but could you? Could I? Hell if I know.

It's... not exactly a hard trick to learn skepticism. Nor one useful only when considering LLMs. As much as that summary of "you can not actually outsource the requirement to evaluate truth" has aged like milk, that doesn't change whether it's a good idea. The core question of 'what do you know, and how do you know it' can't solve everything, but where a matter matters, you shouldn't be trusting one secondary source without verification no matter what substrate it's running on.

Thinking-style models I tried the same question on gave the 'wrong' answer, but had Thinking components specifically highlighting that the question was strange and must have involved unstated assumptions (either picking up materials from the car wash to do the work at home, or the car already being there).

Mine got hung up wondering why you would try to optimize over 50 meters; it's too inconsequential a distance to matter for greenhouse gas emissions or exercise.

It's... not exactly a hard trick to learn skepticism

It's not, but it's one that a lot of people never seem to learn, if my social circle is any example.

That question seems to be a bit of a gotcha; I'd wager a third or more of random people asked that question would blurt out that they would walk to the car wash before engaging their brains.

Also, that's not what Gell-Mann amnesia is. I swear I see the concept used everywhere for everything nowadays, when the original formulation is literally just "journalists are shit".

The phenomenon of a person trusting newspapers for topics which that person is not knowledgeable about, despite recognizing the newspaper as being extremely inaccurate on certain topics which that person is knowledgeable about.

Are you seriously going to say that's not an applicable concept here? That "text on a screen in a confident voice" is so far from that definition that it's not the same thing?

Yes, it's not an applicable concept. For one thing, LLMs have proven their mastery of a host of different concepts already to an extremely high level, so the question of whether you can trust them is kind of moot.

It also doesn't work with singular entities. The reason Gell-Mann amnesia was a thing is that newspapers and media organisations made claims to competence, hiring specialists in each field. That a science journalist then makes a bunch of mistakes should rationally lead you to question the qualifications of the "specialists" in each field. Nowadays people see a blog about medicine or something, find a few math errors, then rush to declare Gell-Mann amnesia. But the blog never claimed to be a mathematics expert! Gell-Mann amnesia is not "If there are mistakes, the whole thing is worthless".

This kind of reasoning error in LLMs is in the same category.

My thought really quickly: LLMs are very good at tasks based on understanding of language, such as translation and copy editing the writings of someone not literate enough to write in proper English—and no, I don’t need LLMs to write good English (I also have had an em-dash on my customized keyboard mapping for well over a decade).

However, I know multiple people who use LLMs to get relationship advice or other services best left to a therapist. This failure to correctly answer whether to drive or walk to a car wash is a good example I can use to show them why they can't ask an LLM whether to go out with someone, stay together with someone, or give them a lot of space: while LLMs speak languages well, they can't do other tasks.

Another example: There’s a story going around that a Chess world champion played a LLM at Chess, and not only won, but won without losing a single piece. Naturally, AI can play Chess well enough to defeat him, but the technology used is something called “NNUE” (a different AI tech) along with alpha-beta pruning, not an LLM.

As an aside, a Chess engine strong enough to defeat world champions is fun to play with, because I can ask it questions like “Looking at all 20 of White’s legal first moves, which move makes the game as balanced as possible, giving White and Black equal winning chances?” and it will give me an answer like “1. a3” (the exact answer depends on how deep we search, but 1. a3 comes up at 21-ply and 35-ply deep searches). There’s even a version of this Chess engine which can play a lot of Chess Variants, so I can ask it questions like “given a larger board with two new pieces, one that moves like a knight or bishop, another that moves like a knight or rook, and given a particular opening setup, which White move gives him the most winning chances, and how much of an advantage does White have in this opening setup? What about an opening move to give both White and Black equal winning chances?”
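
For what it's worth, that kind of "most balanced first move" query is easy to script against a conventional engine rather than an LLM. A sketch assuming the python-chess library and a local Stockfish binary; the binary path and the search depth are placeholders:

```python
# Evaluate all 20 of White's legal first moves with a UCI engine and report the
# one whose evaluation is closest to equality. Assumes python-chess plus a
# Stockfish binary at the path below (path and depth are placeholders).
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("/usr/local/bin/stockfish")
board = chess.Board()
evals = {}

for move in list(board.legal_moves):
    san = board.san(move)            # record the move name before pushing it
    board.push(move)
    info = engine.analyse(board, chess.engine.Limit(depth=21))
    # Centipawn score from White's point of view; closer to 0 means more balanced.
    evals[san] = info["score"].white().score(mate_score=100000)
    board.pop()

engine.quit()

most_balanced = min(evals, key=lambda m: abs(evals[m]))
print(most_balanced, evals[most_balanced])
```

Whether 1. a3 actually comes out on top will depend on the engine version and search depth, as the parent comment notes.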

You are implying that an exception was added to the models, to treat this query not by processing it normally, but by redirecting it to a special lookup table.

I think a better explanation, which captures the fact that the cheapest models still don't consistently answer it correctly, is that the question originated from free-tier users. Models these users have access to are still vulnerable to the "autocomplete on steroids" critique.

Later this question was noticed by users of paid models, models which have reasoning. Ask such a model with reasoning turned on, and it will answer the question correctly. This explains the alleged shift.

As to why such shitty models are even offered, given their vulnerability to creating bad publicity such as this? Better than nothing, and the usual query of this length and type (a question about day-to-day matters with minimal stakes) can be answered correctly even without reasoning. And as reasoning costs money, it would be wasteful to use it on queries which can be answered without it.

Later this question was noticed by users of paid models, models which have reasoning. Ask such a model with reasoning turned on, and it will answer the question correctly.

I link a comment stating that this question failed on what is considered one of the better reasoning models roughly half the time. Other individuals on paid models are also seeing the failure, if you read the thread. It's non-deterministic, but the failures are consistently there.