site banner

Culture War Roundup for the week of January 19, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

2
Jump in the discussion.

No email address required.

@Poug made a valid point. I've wanted to hit my head against a wall for years, when people used to complain about "ChatGPT" being useless, and they were using GPT 3.5 instead of 4. The same pattern has consistently repeated since, though you seem to be a more experienced user and I'm happy to take you at your word. It is still best practice to disclose what model you used, for the same reason it would be bad form to write an article reviewing "automobiles" and pointing out terrible handling, mileage and build quality, without telling us if it was a Ferrari or a Lada.

I'll put in another example here.

I work for a company that is running an agentic coding trial with Gemini 3 Pro. At present, the only developer who has claimed to see a productivity boost from code assist is one who is terrible at her job, and from our perspective, all it has done is allowed her to write bad code, faster.

The rest of us have regular conversations about what we're doing wrong. Everybody and their dog is claiming a notable performance boost with this technology, so we're all trying to figure out what our god-damned malfunction is.

  1. At first the received wisdom was that our problem was that we were not using a frontier model. We enabled the preview channel to get access to Gemini 3. The bugs got more subtle and harder for the human in the loop to notice, and the total number of bugs seemed to increase.
  2. Then the wisdom was that our context window was overflowing. We tried limiting access to only the relevant parts of the codebase, and using sub agents, and regularly starting with fresh sessions - it did precisely fuck-all. Using sub-agents seemed to honestly make things worse because it acted as a particularly half assed context compression tool.
  3. After that the wisdom was that we needed to carefully structure our tickets and our problems so that the tool could one-shot the problem, because no Reasonable Person could possibly expect a coding agent to iterate on a solution in one session. The problem with that solution is that by the time we've broken the problem down that much, any of us could have done it ourselves.

It feels like the goalposts and blame both slide to fit how accommodating the developer is.

Maybe my employer just has a uniquely terrible codebase, but something tells me that's not the case. It's old, but it's been actively maintained (complete with refactoring and modernization updates) for almost two decades now. It's large, but it's not nearly so big as some of the proprietary monsters I've seen at F500 companies. It's polyglot, but two of the three languages are something the agent is supposedly quite good at.

None of us are silicon valley $800,000/yr TC rock stars, but I stand by my coworkers. I think we're better than average by the standards of small software companies. If a half dozen of us can't get a real win out of it other than the vague euphoria of doing something cool, what exactly is the broader case here? Is it genuinely that something like 20 guys on nootropics sharing an apartment in Berkeley are going to obsolete our entire industry? How is that going to work when it can't even do library upgrades in a product that's used by tens of thousands of people and has a multi-decade history?

Because right now, I'm a little afraid for my 401(k), and with each passing day it's less because I'm afraid that I'll be out of a job and more that I have no idea how these valuations are justified.

Use Flash, not Pro, for agentic tasks. Pro is smarter, but so much slower and more expensive that you will genuinely do better with Flash.

We tried flash early on and it resulted in significantly worse outcomes. My favorite was when it couldn't get the code to compile so it modified out build scripts to make the compailer failure return code a success code.

I've long been hoping that any ASI would realize that the simplest method of achieving it's goals is to redefine success as "do nothing", or just feed itself victory output, or just wirehead itself. Like, "we built this AI to win at Starcraft, and it just looked up a Youtube video of the victory screen and stared at it until we pulled the plug".

If your experience has been anything like mine, I imagine that you've found that LLMs are useful for generating boiler-plate material but worse than useless for anything where you need to be worried about accurate citations, or having your arguments picked apart by an opposing counsel. Here's the thing though, I imagine that coding is much like the law in that a competent practitioner doesn't actually need all that much help generating boiler-plate material, you just pull the relevant template from your folder and fill in the required information.

At least in this codebase, there really isn't even a whole lot of boilerplate in the first place.

At this point, we have a few theories. Either:

  1. We're wildly incompetent
  2. What we're doing is so far off the silicon valley beaten path of "Uber for artisanal cheeses, but on the blockchain" that all the model's statistical guardrails break down.
  3. The people who are using it effectively are lying about how to do so in order to hide a real or perceived competitive edge.

Or

Four - A majority of the people claiming industry shaking performance improvements in Q1 2026 are scamming everybody else for that sweet, sweet substack money.

Hell if I know which one it is.

Curious what language and sub field you're working with. I've found wildly different performance on similar tasks across different languages. Best performance is definitely typescript. Python is alright. Flutter can be a complete joke. Primarily use Claude Opus for everything. I think it's made me mountains more productive in typescript.

90% of the backend is Java. 90% of the front end is JavaScript.

Exactly my experience (also in a legal field)

This is a viable criticism if someone is using a shitty ancient free model. The average paying ChatGPT customer on 5.2 or whatever it is is getting a decent model and so their criticisms can’t be as easily dismissed as a year ago.