Culture War Roundup for the week of March 24, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

When will the AI penny drop?

I returned from lunch to find that a gray morning had given way to a beautiful spring afternoon in the City, the sun shining on courtyard flowers and through the pints of the insurance men standing outside the pub, who still start drinking at midday. I walked into the office, past the receptionists and security staff, then went up to our floor, passed the back office, the HR team who sit near us, our friendly sysadmin, my analysts, associate, my own boss. I sent some emails to a client, to our lawyers, to theirs, called our small graphics team who design graphics for pitchbooks and prospectuses for roadshows in Adobe whatever. I spoke to our team secretary about some flights and a hotel meeting room in a few weeks. I reviewed a bad model and fired off some pls fixes. I called our health insurance provider and spoke to a surprisingly nice woman about some extra information they need for a claim.

And I thought to myself: can it really be that all this is about to end, not in the steady process envisioned by a prescient few a decade ago but in an all-encompassing crescendo that will soon overwhelm us all? I walk around now like a tourist in the world I have lived in my whole life, appreciating every strange interaction with another worker, the hum of commerce, the flow of labor. Even the commute has taken on a strange new meaning to me, because I know it might be over so soon.

All of these jobs, including my own, can be automated with current-generation AI agents and some relatively minor additional work (much of which can itself be done by AI). Next-generation agents (already in testing at leading labs) will be able to take screen and keystroke recordings (plus audio from calls, if applicable) of, say, 20 people performing a niche white-collar role over a few weeks and learn almost immediately how to do it as well or better. This job destruction is only part of the puzzle, though, because as these roles go, so do tens of millions of other middlemen, from recruiters and consultants and HR and accountants to the millions employed at SaaS providers that build tools - like Salesforce, Trello, even Microsoft with Office - that will soon be largely or entirely redundant because whole workflows will be replaced by AI. The friction facilitators of technical modernity, from CRMs to emails to dashboards to spreadsheets to cloud document storage, will be mostly valueless. Adobe alone, which those coworkers use to photoshop cute little cover images for M&A pitchbooks, is worth $173bn and yet has surely been rendered worthless, in the last couple of weeks alone, by new multimodal LLMs that allow for precise image generation and editing by prompt¹. With all of this will come an almighty economic crash that will affect every business from residential property management to plumbing, automobiles to restaurants. Like the old cartoon trope, it feels like we have run off a cliff but have yet to speak gravity into existence.

It was announced yesterday that employment in the securities industry on Wall Street hit a 30-year high (I suspect that that is ‘since records began’, but if not I suppose it coincides with the final end of open outcry trading). I wonder what that figure will be just a few years from now. This was a great bonus season (albeit mostly in trading), perhaps the last great one. My coworker spent the evening speaking to students at his old high school about careers in finance; students are being prepared for jobs that will not exist, a world that will not exist, by the time they graduate.

Walking through the city I feel a strange sense of foreboding, of a liminal time. Perhaps it is self-induced; I have spent much of the past six months obsessed with the years 1911 to 1914, the final years of the long 19th century, with Mann and Zweig and Proust. The German writer Florian Illies wrote a work of pop-history about 1913 called “The Year Before the Storm”. Most of it has nothing to do with the coming war or the arms race; it is a portrait (in many ways) of peace and mundanity, of quiet progress, of sports tournaments and scientific advancement and banal artistic introspection, of what felt like a rational and evolutionary march toward modernity tempered by a faint dread, the kind you feel when you see flowers on their last good day. You know what will happen and yet are no less able to stop it than those who are comfortably oblivious.

In recent months I have spoken to almost all of the smartest people I know about the coming crisis. Most are still largely oblivious: “new jobs will be created”, “this will just make humans more productive”, “people said the same thing about the internet in the 90s”, and - of course - “it’s not real creativity”. A few - some quants, the smarter portfolio managers, a couple of VCs who realize that every pitch is from a company that wants to automate one business while relying for revenue on every other industry, which will supposedly need just as many people, and therefore middleman SaaS contracts, as it does today - realize what is coming and can talk about little else.

Many who never before expressed any fear or doubts about the future of capitalism have begun what can only be described as prepping: buying land in remote corners of Europe and North America where they have family connections (or sometimes none at all), buying crypto as a hedge rather than an investment, investigating residency in Switzerland, and researching which countries are likely to adapt best and fastest to an automated age in which service industry exports are liable to collapse (wealthy, with domestic manufacturing, energy resources or nuclear power, reasonably low population density, mostly domestic food production, some natural resources, and a political system capable of quick adaptation). America is blessed with many of these, but its size, political divisions and regional, ethnic and cultural tensions, plus an ingrained, highly individualistic culture, mean it will struggle, at least for a time. A gay Japanese friend who previously swore he would never return to his homeland on account of the homophobia he had experienced there has started pouring huge sums into his family’s ancestral village and told me directly that he expects some kind of large-scale economic and social collapse caused by AI to force him to return home soon.

Unfortunately Britain, where manufacturing has been largely outsourced, where most food and much fuel have to be imported, and which is heavily reliant on exactly the professional services that will be automated first, seems likely to go through one of the harshest transitions. A Scottish portfolio manager, probably in his 40s, told me of the compound he is building on one of the remote islands off Scotland’s west coast. He grew up in Edinburgh, but was considering contributing a large amount of money towards some church repairs and the renovation of a beloved local store or pub of some kind to endear himself to the community in case he needed it. I presume that among big tech money, where I know far fewer people than others here do, similar preparations are being made. I have made a few smaller preparations of my own, although what started as ‘just in case’ now occupies an ever greater place in my imagination.

For almost ten years we have discussed politics and society on this forum. Now events, at last, seem about to overwhelm us. It is unclear whether AGI will entrench, reshape or collapse existing power structures, and whether it will freeze or accelerate the culture war. Much depends on who exactly is in power when things happen, and on whether tools that create chaos (like those causing mass unemployment) arrive much before those that create order (mass autonomous police drone fleets, ubiquitous VR dopamine at negligible cost). It is also a twist of fate that so many involved in AI research were themselves loosely involved in the Silicon Valley circles that spawned the rationalist movement and, eventually, through that and Scott, this place. For a long time there was truth in the old internet adage that “nothing ever happens”. I think it will be hard to say the same five years from now.

¹ Some part of me wants to resign and short the big SaaS firms that are going to crash first, but I’ve always been a bad gambler (and am lucky enough, mostly, to know it).

Confusion sets in when you spend most of your life not doing anything real. Metrics and statistics were supposed to be tools that aid in the interpretation of reality, not supersede it. Just because a salesman with some metrics claims that these models are better than butter does not make it true. Even if they manage to convince every single human alive.

I just tried out GPT 4.5, asking some questions about the game Old School Runescape (because every metric like math has been gamed to hell and back). This game has the best wiki ever created, effectively documenting everything there is to know about the game in unnecessary detail. Spoiler: the answer is completely incoherent. It makes up item names and locations, and misunderstands basic concepts like what type of gear is useful where. Asking it for a gear setup for a specific boss gives horrible results, despite the fact that it could have literally just copied the wiki (which has some faults, like overdoing min-maxing, but is generally coherent). The net utility of this answer was negative given the incorrect answer, the time it took me to read it, and the cost of generating it (which is quite high; I wonder what happens when these companies want to make money).

The same thing happens when asking questions about programming that are not student-level (student-level questions just return an answer copied from a textbook. Did you know you can solve a leetcode question in 10 seconds by just copying the answer someone else wrote down? Holy shit!). The idea that these models will soon (especially given the plateau they seem to be hitting) replace real work is absurd. They will make programming faster, which means we'll build more shit (that's probably not needed, but that's another argument). They currently make me about 50% faster and make programming more fun, since they're a fantastic search tool as long as they're used with care. But they're also destroying the knowledge of students. Test scores are going up, but understanding is dropping like a stone.

I'm sure it will keep getting better; eventually a large enough model with a gigantic dataset will get better at fooling people, but it's still just a poor image of reality. Of course, this is how humans also function to some degree, copying other people more competent than us. However, most of us combine that with real knowledge (building a coherent model of something that manages to predict something new accurately). Without that part it's just a race to the bottom.

But a lot of people are like you, so these models will start to get used everywhere, destroying quality like never before. For example, I tried contacting a company about a missing order a few weeks ago. Their first-line support had been replaced by an AI. Completely useless. It kept responding to the question it thought I asked instead of the one I actually asked, then asked me to double-check things I had told it I had already checked. The funny thing is that a better automated support system could have been built 20 years ago with some basic scripting (looking for an order number and responding with the order details if one was included), or by having an intern spend 30 seconds copy-pasting data into an email. But here we are, at the AI revolution, doing things we have always been able to do, now in a shittier and more costly way. With some added pictures to make it seem useful. Fits right in in the finance world, I guess?
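For anyone curious what that "basic scripting" alternative might look like, here is a minimal Python sketch. It assumes six-digit order numbers and uses a hypothetical in-memory `ORDERS` table standing in for a real order database; `auto_reply` is an illustrative name, not any actual support product's API.

```python
import re

# Hypothetical order table; a real shop would query its order database instead.
ORDERS = {
    "123456": {"status": "shipped", "carrier": "DHL", "eta": "2025-03-28"},
}

# Assumption: order numbers are exactly six digits (may be written as "#123456").
ORDER_RE = re.compile(r"\b(\d{6})\b")


def auto_reply(message: str) -> str:
    """Reply with order details if the message contains a known order number."""
    match = ORDER_RE.search(message)
    if match is None:
        # No order number found: ask for it instead of guessing at the question.
        return "Please include your six-digit order number so we can look it up."
    order_id = match.group(1)
    order = ORDERS.get(order_id)
    if order is None:
        return f"We couldn't find order {order_id}; a human agent will follow up."
    return (f"Order {order_id} is {order['status']} via {order['carrier']}, "
            f"expected {order['eta']}.")


print(auto_reply("Where is my order #123456?"))
```

The point of the design is that it never pretends to understand the message: it either finds a number it can act on or hands off to a person.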

I can, however, imagine a future workflow where these models do basic tasks (answering emails, business operations, programming tickets) overseen by someone who can intervene if they mess up. But this won't end capitalism. If you stopped LARPing on this forum/Twitter, you would barely even notice it. Though it is a shame that graphic design and similar fields will be hurt more than they should be.

If I had a dollar for every time someone used the current state of AI as their primary justification for claiming it won't get noticeably better, I wouldn't need UBI.

I just tried out GPT 4.5, asking some questions about the game Old School Runescape (because every metric like math has been gamed to hell and back). This game has the best wiki ever created, effectively documenting everything there is to know about the game in unnecessary detail. Spoiler: the answer is completely incoherent. It makes up item names and locations, and misunderstands basic concepts like what type of gear is useful where. Asking it for a gear setup for a specific boss gives horrible results, despite the fact that it could have literally just copied the wiki (which has some faults, like overdoing min-maxing, but is generally coherent). The net utility of this answer was negative given the incorrect answer, the time it took me to read it, and the cost of generating it (which is quite high; I wonder what happens when these companies want to make money).

I just used Gemini 2.5 to reproduce, from memory, the NICE CKS guidance for the diagnosis and management of dementia. I explicitly told it to use its own knowledge, and made sure it didn't have grounding with Google search enabled. I then spot-checked it with reference to the official website.

It was bang-on. I'd call it a 9.5/10 reproduction, only falling short of perfection through minor sins of omission (it didn't mention all the validated screening tests by name, and skipped a few alternative drugs that I wasn't even aware of before). It wasn't a word-for-word reproduction, but it covered all the essentials and even most of the fine detail.

The net utility of this answer is rather high to say the least, and I don't expect even senior clinicians who haven't explicitly tried to memorize the entire page to be able to do better from memory. If you want to argue that I could have just googled this, well, you could have just googled the Runescape build too.

I think it's fair to say that this makes your Runescape example seem like an inconsequential failing. It's about the same magnitude of error as saying that a world-class surgeon is incompetent because he sometimes forgets how to lace his shoes.

You didn't even use the best model for the job; for a query like that you'd want a reasoning model. 4.5 is a relic of a different regime, too weird to live, too rare to die. OAI pushed it out because people were clamoring for it. I expect that with the same prompt, o3 or o1, which I presume you have access to as a paying user, would fare much better.

The idea that these models will soon (especially given the plateau they seem to be hitting) replace real work is absurd

Man, there's plateaus, and there's plateaus. Anyone who thinks this is an AI winter probably packs a fur coat to the Bahamas.

The rate of iteration in AI development has ramped up massively, which contributes to the impression that there aren't massive gaps between successive models. That impression is accurate as far as it goes: jumps of the same magnitude as, say, GPT-3.5 to GPT-4 are rare. But that's mostly because the race is so hot that companies release new versions the moment they have even the slightest performance justification. It's not like back when OAI could leisurely dole out releases; their competitors have caught up with or even beaten them in some respects.

In the last year, we had a paradigm shift with reasoning models like o1 or R1. We just got public access to native image gen.

Even as the old scaling paradigms leveled off, we've already found new ones. Brand new steep slopes of the sigmoidal curve to ascend.

METR finds that the length of tasks (measured by how long they take humans) that AIs can reliably perform doubles roughly every 7 months.

On a diverse set of multi-step software and reasoning tasks, we record the time needed to complete the task for humans with appropriate expertise. We find that the time taken by human experts is strongly predictive of model success on a given task: current models have almost 100% success rate on tasks taking humans less than 4 minutes, but succeed <10% of the time on tasks taking more than around 4 hours. This allows us to characterize the abilities of a given model by “the length (for humans) of tasks that the model can successfully complete with x% probability”.

We think these results help resolve the apparent contradiction between superhuman performance on many benchmarks and the common empirical observations that models do not seem to be robustly helpful in automating parts of people’s day-to-day work: the best current models—such as Claude 3.7 Sonnet—are capable of some tasks that take even expert humans hours, but can only reliably complete tasks of up to a few minutes long
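To make the 7-month doubling concrete, here is a short Python sketch of the arithmetic, assuming the trend simply continues (an extrapolation, not something the quoted METR text claims). The horizon after t months is h0 · 2^(t/7), so growing from h0 to a target takes 7 · log2(target / h0) months. The starting horizon `start_minutes` is a free, hypothetical parameter, not a figure from the post.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling time cited in the post


def horizon_after(months: float, start_minutes: float) -> float:
    """Task length (in human-minutes) after `months`, if doubling continues."""
    return start_minutes * 2 ** (months / DOUBLING_MONTHS)


def months_to_reach(target_minutes: float, start_minutes: float) -> float:
    """Months for the horizon to grow from start_minutes to target_minutes."""
    return DOUBLING_MONTHS * math.log2(target_minutes / start_minutes)


# Hypothetical one-hour starting horizon.
print(horizon_after(14, 60))          # two doublings: 60 -> 240 minutes
print(months_to_reach(40 * 60, 60))   # one-hour tasks -> 40-hour work week
```

With a hypothetical one-hour starting point, two doublings (14 months) give four-hour tasks, and a 40-hour work week would arrive after roughly 37 months if nothing about the trend changed.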

At any rate, what does it matter? I expect reality to smack you in the face, and that's always more convincing than random people on the internet asking why you can't look ahead and account for even modest, iterative improvement.

My main gripe with current-day models is their lack of consistency. On the one hand, they can do very impressive things that save me hours of work; on the other, they can fuck things up in simple ways that cost me hours of work to fix. I was using Claude to program a scene in Godot and the file was showing a parsing error at line 1. I let him try to fix it multiple times, started new chats, etc. Then I just looked at the file, noticed a comment on line 1 starting with # and thought "maybe that's not allowed". I removed the comment and the file was fixed. It's insanely frustrating when the AI fucks up such a simple thing. The main benefit of AI is that you can just let it rip and create something without knowing what you are doing. If I have to check the code for fuckups all the time, it really drags down productivity.

I'm not a programmer, the best I can say about myself is that I once did a Leetcode medium successfully, in Python, with an abysmal score because it wasn't remotely optimized. At that level, everything from GPT-4 onwards is clearly superior to what I can do unaided.

I think the utility varies based on the user's skill in the domain. A beginner programmer? Even if they hit frustrating issues, I find it hard to imagine they aren't immensely better off. The other end of the spectrum? You have people like Karpathy and Carmack singing their praises, while Linus says they're not nearly good enough. There are a dozen different programmers here saying different things.

There's also skill when it comes to using them, and that's an acquired ability. In your situation, it would likely have been better to give up on that conversation and try again, or to copy and paste the code into a different instance or a different model and ask it to find the issue. I expect this would have worked well. With too much gunking up the context, LLMs can still fall into ruts or miss obvious problems. When in doubt, retry.

I'm a mid-level software dev who mostly spent the age of the LLMs doing Java Spring Boot backend development at two different companies. I've tried using the various chatbots provided to me, and so far they've been useless in 100% of all cases. It's entirely possible that I'm doing it wrong.

I've already mentioned Karpathy and Co. Even in this subreddit, you've got people like @DaseindustriesLtd or @faul_sname (are you a programmer? Well, you know your ML, so close enough for government work) who get clear utility out of them.

You recognize you might be using them wrong (and what are the specifics of how you attempted to use them? Which model? What kind of prompt? Which interface?), but I'm certainly not the best person to tell you how to go about it better. I could still try, if you want me to.