This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
When will the AI penny drop?
I returned from lunch to find that a gray morning had given way to a beautiful spring afternoon in the City, the sun shining on courtyard flowers and through the pints of the insurance men standing outside the pub, who still start drinking at midday. I walked into the office, past the receptionists and security staff, then went up to our floor, passed the back office, the HR team who sit near us, our friendly sysadmin, my analysts, associate, my own boss. I sent some emails to a client, to our lawyers, to theirs, called our small graphics team who design graphics for pitchbooks and prospectuses for roadshows in Adobe whatever. I spoke to our team secretary about some flights and a hotel meeting room in a few weeks. I reviewed a bad model and fired off some pls fixes. I called our health insurance provider and spoke to a surprisingly nice woman about some extra information they need for a claim.
And I thought to myself can it really be that all this is about to end, not in the steady process envisioned by a prescient few a decade ago but in an all-encompassing crescendo that will soon overwhelm us all? I walk around now like a tourist in the world I have lived in my whole life, appreciating every strange interaction with another worker, the hum of commerce, the flow of labor. Even the commute has taken on a strange new meaning to me, because I know it might be over so soon.
All of these jobs, including my own, can be automated with current generation AI agents and some relatively minor additional work (much of which can itself be done by AI). Next generation agents (already in testing at leading labs) will be able to take screen and keystroke recordings (plus audio from calls if applicable) of, say, 20 people performing a niche white collar role over a few weeks and learn pretty much immediately how to do it as well or better. This job destruction is only part of the puzzle, though, because as these roles go so do tens of millions of other middlemen, from recruiters and consultants and HR and accountants to millions employed at SaaS providers that build tools - like Salesforce, Trello, even Microsoft with Office - that will soon be largely or entirely redundant because whole workflows will be replaced by AI. The friction facilitators of technical modernity, from CRMs to emails to dashboards to spreadsheets to cloud document storage, will be mostly valueless. Adobe alone, which those coworkers use to photoshop cute little cover images for M&A pitchbooks, is worth $173bn and yet has surely been rendered worthless, in the last couple of weeks alone, by new multimodal LLMs that allow for precise image generation and editing by prompt1. With them will come an almighty economic crash that will affect every business from residential property management to plumbing, automobiles to restaurants. Like the old cartoon trope, it feels like we have run off a cliff but have yet to speak gravity into existence.
It was announced yesterday that employment in the securities industry on Wall Street hit a 30-year high (I suspect that that is ‘since records began’, but if not I suppose it coincides with the final end of open outcry trading). I wonder what that figure will be just a few years from now. This was a great bonus season (albeit mostly in trading), perhaps the last great one. My coworker spent the evening speaking to students at his old high school about careers in finance; students are being prepared for jobs that will not exist, a world that will not exist, by the time they graduate.
Walking through the city I feel a strange sense of foreboding, of a liminal time. Perhaps it is self-induced; I have spent much of the past six months obsessed by 1911 to 1914, the final years of the long 19th century, by Mann and Zweig and Proust. The German writer Florian Illies wrote a work of pop-history about 1913 called “The Year Before the Storm”. Most of it has nothing to do with the coming war or the arms race; it is a portrait (in many ways) of peace and mundanity, of quiet progress, of sports tournaments and scientific advancement and banal artistic introspection, of what felt like a rational and evolutionary march toward modernity tempered by a faint dread, the kind you feel when you see flowers on their last good day. You know what will happen and yet are no less able to stop it than those who are comfortably oblivious.
In recent months I have spoken to almost all the smartest people I know about the coming crisis. Most are still largely oblivious; “new jobs will be created”, “this will just make humans more productive”, “people said the same thing about the internet in the 90s”, and - of course - “it’s not real creativity”. A few - some quants, the smarter portfolio managers, a couple of VCs who realize that every pitch is from a company that wants to automate one business while relying for revenue on every other industry that will supposedly have just the same need for people (and therefore middlemen SaaS contracts) as it does today - realize what is coming, and can talk about little else.
Many who never before expressed any fear or doubts about the future of capitalism have begun what can only be described as prepping: buying land in remote corners of Europe and North America where they have family connections (or sometimes none at all), buying crypto as a hedge rather than an investment, investigating residency in Switzerland, and researching which countries are likely to adapt most quickly to an automated age in which service industry exports are liable to collapse (wealthy, domestic manufacturing, energy resources or nuclear power, reasonably low population density, most food produced domestically, some natural resources, a political system capable of quick adaptation). America is blessed with many of these, but its size, political divisions and regional, ethnic and cultural tensions, plus an ingrained, highly individualistic culture, mean it will struggle, at least for a time. A gay Japanese friend who previously swore he would never return to his homeland on account of the homophobia he had experienced there has started pouring huge sums into his family’s ancestral village and told me directly that he expects some kind of large-scale economic and social collapse caused by AI to force him to return home soon.
Unfortunately Britain, where manufacturing has been largely outsourced, where most food and much fuel must be imported, and which is heavily reliant on exactly the professional services that will be automated first, seems likely to go through one of the harshest transitions. A Scottish portfolio manager, probably in his 40s, told me of the compound he is building on one of the remote islands off Scotland’s west coast. He grew up in Edinburgh, but was considering contributing a large amount of money towards some church repairs and the renovation of a beloved local store or pub of some kind to endear himself to the community in case he needed it. I presume that in big tech, where I know far fewer people than others here do, similar preparations are being made. I have made a few smaller preparations of my own, although what started as ‘just in case’ now occupies an ever greater place in my imagination.
For almost ten years we have discussed politics and society on this forum. Now events, at last, seem about to overwhelm us. It is unclear whether AGI will entrench, reshape or collapse existing power structures, or whether it will freeze or accelerate the culture war. Much depends on who exactly is in power when things happen, and on whether the tools that create chaos (like those causing mass unemployment) arrive much before those that create order (mass autonomous police drone fleets, ubiquitous VR dopamine at negligible cost). It is also a twist of fate that so many involved in AI research were themselves loosely involved in the Silicon Valley circles that spawned the rationalist movement, and eventually, through that and Scott, this place. For a long time there was truth in the old internet adage that “nothing ever happens”. I think it will be hard to say the same five years from now.
1 Some part of me wants to resign and short the big SaaS firms that are going to crash first, but I’ve always been a bad gambler (and am lucky enough, mostly, to know it).
Confusion sets in when you spend most of your life not doing anything real. Metrics and statistics were supposed to be a tool that would aid in the interpretation of reality, not supersede it. Just because a salesman with some metrics claims that these models are better than butter does not make it true. Even if they manage to convince every single human alive.
I just tried out GPT-4.5, asking some questions about the game Old School Runescape (because every metric like math has been gamed to hell and back). This game has the best wiki ever created, effectively documenting everything there is to know about the game in unnecessary detail. Spoiler: the answer is completely incoherent. It makes up item names and locations, and misunderstands basic concepts like what type of gear is useful where. Asking it for a gear setup for a specific boss yields horrible results, despite the fact that it could literally just have copied the wiki (which has some faults, like overdoing min-maxing, but it's generally coherent). The net utility of this answer was negative given the incorrect answer, the time it took for me to read it, and the cost of generating it (which is quite high; I wonder what happens when these companies want to make money).
The same thing happens when asking questions about programming that are not student-level (student-level questions just return an answer copied from a textbook. Did you know you can solve a leetcode question in 10 seconds by just copying the answer someone else wrote down? Holy shit!). The idea that these models will soon (especially given the plateau they seem to be hitting) replace real work is absurd. They will make programming faster, which means we'll build more shit (that's probably not needed, but that's another argument). They currently make me about 50% faster and make programming more fun, since they're a fantastic search tool as long as they're used with care. But they're also destroying the knowledge of students. Test scores are going up, but understanding is dropping like a stone.
I'm sure it will keep getting better; eventually a large enough model with a gigantic dataset will get better at fooling people, but it's still just a poor image of reality. Of course this is how humans also function to some degree, copying other people more competent than us. However, most of us combine that with real knowledge (creating a coherent model of something that manages to predict something new accurately). Without that part it's just a race to the bottom.
But a lot of people are like you, so these models will start to get used everywhere, destroying quality like never before. For example, I tried contacting a company regarding a missing order a few weeks ago. Their first-line support had been replaced by an AI. Completely useless. It kept responding to the question it thought I asked, instead of the one I actually asked, then asked me to double-check things I had already told it I had checked. The funny thing is that a better automated support could have been created 20 years ago with some basic scripting (looking for an order number and responding with the details if one was included). Or by having an intern spend 30 seconds copy-pasting data into an email. But here we are, at the AI revolution, doing things we have always been able to do, now in a shittier and more costly way. With some added pictures to make it seem useful. Fits right in with the finance world, I guess?
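Something like this minimal sketch is all I mean by basic scripting; the order numbers, format and reply text are made up purely for illustration:

```python
import re

# Hypothetical order store; in reality this would be a lookup against the shop's backend.
ORDERS = {
    "A12345": "Shipped on Monday via courier; tracking number emailed separately.",
}

def auto_reply(message: str) -> str:
    """Answer a support email if it contains a recognizable order number, otherwise hand off."""
    match = re.search(r"\b[A-Z]\d{5}\b", message)  # assumed order-number format
    if match and match.group(0) in ORDERS:
        return f"Order {match.group(0)}: {ORDERS[match.group(0)]}"
    # No order number found: hand off to a human rather than guessing.
    return "Thanks for your message - a member of staff will get back to you."

print(auto_reply("Hi, where is my order A12345?"))
```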
I can, however, imagine a future workflow where these models do basic tasks (answering emails, business operations, programming tickets) overseen by someone who can intervene if they mess up. But this won't end capitalism. If you stopped LARPing on this forum/Twitter you would barely even notice it. Though it is a shame that graphic design and similar fields will be hurt more than they should be.
If I had a dollar for every time someone used the current state of AI as their primary justification for claiming it won't get noticeably better, I wouldn't need UBI.
I just used Gemini 2.5 to reproduce, from memory, the NICE CKS guidance for the diagnosis and management of dementia. I explicitly told it to use its own knowledge, and made sure it didn't have grounding with Google search enabled. I then spot-checked it with reference to the official website.
It was bang on. I'd call it a 9.5/10 reproduction, falling short of perfection only through minor sins of omission (it didn't mention all the validated screening tests by name, and skipped a few alternative drugs that I wasn't even aware of before). It wasn't a word-for-word reproduction, but it covered all the essentials and even most of the fine detail.
The net utility of this answer is rather high to say the least, and I don't expect even senior clinicians who haven't explicitly tried to memorize the entire page to be able to do better from memory. If you want to argue that I could have just googled this, well, you could have just googled the Runescape build too.
I think it's fair to say that this makes your Runescape example seem like an inconsequential failing. It's about the same magnitude of error as saying that a world-class surgeon is incompetent because he sometimes forgets how to lace his shoes.
You didn't even use the best model for the job; for a query like that you'd want a reasoning model. 4.5 is a relic of a different regime, too weird to live, too rare to die. OAI pushed it out because people were clamoring for it. I expect that with the same prompt, o3 or o1, which I presume you have access to as a paying user, would fare much better.
Man, there's plateaus, and there's plateaus. Anyone who thinks this is an AI winter probably packs a fur coat to the Bahamas.
The rate of iteration in AI development has ramped up massively, which contributes to the impression that there aren't massive gaps between successive models. It's true that jumps of the same magnitude as, say, GPT-3.5 to 4 are rare, but that's mostly because the race is so hot that companies release new versions the moment they have even the slightest justification in performance. It's not like back when OAI could leisurely dole out releases; their competitors have caught up or even beaten them in some aspects.
In the last year, we had a paradigm shift with reasoning models like o1 or R1. We just got public access to native image gen.
Even as the old scaling paradigms leveled off, we've already found new ones. Brand new steep slopes of the sigmoidal curve to ascend.
METR finds that the duration of tasks (based on how long humans take to do them) that AIs can reliably perform doubles every 7 months.
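To put a rough number on what that doubling means if it simply holds (the starting horizon and the assumption that the trend continues are mine, not METR's):

```python
# Naive extrapolation of the reported trend: AI task horizon doubling every ~7 months.
# All figures here are illustrative assumptions, not METR's own projections.
current_horizon_hours = 1.0      # assume models reliably handle ~1-hour human tasks today
doubling_period_months = 7

for months_ahead in (12, 24, 36, 60):
    horizon = current_horizon_hours * 2 ** (months_ahead / doubling_period_months)
    print(f"{months_ahead} months out: tasks of roughly {horizon:.0f} human-hours")
```

With those assumptions, five years out you are looking at tasks on the order of hundreds of human-hours; whether the curve actually holds that long is exactly what's in dispute.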
At any rate, what does it matter? I expect reality to smack you in the face, and that's always more convincing than random people on the internet asking why you can't even look ahead while considering even modest and iterative improvement.
I've tried the reasoning models. They fail just as much (I just tried Gemini 2.5 too and it did even worse). The purpose was to illustrate an example of how they fail, to showcase their poor reliability. I did not say they won't get better. They will, just not as much as you think. You can't just take 2 data points and extrapolate forever.
And I don't get your example; wouldn't the NICE CKS be in the dataset many times over? Maybe my point wasn't clear. These tools are amazing as search engines as long as the user is responsible and able to validate the responses. That does not mean they are thinking very well, which means they will have a hard time doing things not in the dataset. These models are not a pathway to AGI. They might be a part of it, but it's going to need something else. And that part (or those parts) might be discovered tomorrow, or in 50 years.
And I don't see why reality will smack me in the face. I'm already using these as much as possible since they are great tools. But I don't expect my work to look very different in 2030 compared to now, since programming does not feel very different today compared to 2015. The main problem has always been keeping the program from collapsing under its own weight by simplifying it as much as possible; typing the code has never been the hard part. Thanks for the comment, btw; it made me try out programming with Gemini 2.5 and it's pretty good.
I mean, I assume both of us are operating on far more than 2 data points. I just think that if you open with an example of a model failing at a rather inconsequential task, I'm eligible to respond with an example of it succeeding at a task that could be more important.
My impression of LLMs is that in the domains I personally care about:
They've been great at 1 and 3 for a while, since GPT-4. 2? It's only circa Claude 3.5 Sonnet that I've been reasonably happy with their creative output, occasionally very impressed.
Number 3 encompasses a whole heap of topics. Back in the day I'd spot-check far more frequently; these days, if something looks iffy, I'll shop around with different SOTA models and see if they've got a consensus or a critique that makes sense to me. This almost never fails me.
Almost certainly. But does that really matter to the end user? I don't know if the RS wiki has anti-scraping measures, but there are tons of random nuggets of RS build and item guides all over the internet. Memorization isn't the only reason that models are good; they think, or do something so indistinguishable from the output of human thought that it doesn't matter.
If you met a person who was secretly GPT-4.5 in disguise, you would be rather unlikely to tell that they weren't a normal human, not unless you went in suspicious from the start. (Don't ask me how this thought experiment would work; assume a human who just reads lines off AR lenses, I guess.)
This is a far more reasonable take in my opinion, if you'd said this at the start I'd have been far more agreeable.
I have minor disagreements nonetheless:
Well, if you're using the tools regularly and paying for them, you'll note improvements if and when they come. I expect reality to smack me in the face too, in the sense that even if I expect all kinds of AI related shenanigans, seeing a brick wall coming at my car doesn't matter all that much when I don't control the brakes.
For a short span of time, I was seriously considering switching careers from medicine to ML. I did MIT OCW programs, managed to solve one Leetcode medium, and then realized that AI was getting better at coding faster than I would. (And that there are a million Indian coders already, that was a factor). I'm not saying I'm a programmer, but I have at least a superficial understanding.
I distinctly remember what a difference GPT-4 made. GPT-3.5 was tripped up by even simple problems and hallucinated all the time. 4 was usually reliable, and I would wonder how I'd ever learned to code before it.
I have little reason to write code these days, but I can see myself vibe-coding. Despite your claim that you don't feel programming has changed since 2015, there is no end of talented programmers like Karpathy or Carmack who would disagree.
You're welcome. It's probably the best LLM for code at the moment. That title changes hands every other week, but it's true for now.
Okay, can we get people to start using 'delusions' or 'confabulations' instead of 'hallucinations'? This always irks me.
I know we've bickered about this in the past but I think you have to be very cautious about what decision support tools and LLMs are doing in practical medicine at this time - fact recall is not most of the problem or difficulty.
The average person here could use UpToDate to answer many types of clinical questions, even without the clinical context that you, I, and ChatGPT have.
That's not the hard part of medicine. The hard part is managing volume (which AI tools can do better than people) and vagary (which they are shit at). Patients reporting symptoms incorrectly, complex comorbidity, a Physical Exam, these sorts of things are HARD.
Furthermore the research base in medicine is ass, and deciding if you want a decision support tool to use the research base or not is not a simple question.
On the topic of hallucinations/confabulations from LLMs in medicine:
https://x.com/emollick/status/1899562684405670394
This should scare you. It certainly scares me. The paper in question has no end of big names in it. Sigh, what happened to loyalty to your professional brethren? I might praise LLMs, but I'm not conducting the studies that put us out of work.
I expect that without medical education, and only googling things, the average person might get by fine for the majority of complaints, but the moment it gets complex (as in the medical presentation isn't textbook), they have a rate of error that mostly justifies deferring to a medical professional.
I don't think this is true when LLMs are involved. When presented with the same data as a human clinician, they're good enough to be the kind of doctor who wouldn't lose their license. The primary obstacles, as I see them, lie in legality, collecting the data, and the fact that the system is not set up for a user that has no arms and legs.
I expect that when compared to a telemedicine setup, an LLM would do just as well, or too close to call.
I disagree that they can't handle vagary. They seem epistemically well calibrated, consider horses before zebras, and are perfectly capable of asking clarifying questions. If a user lies, human doctors are often shit out of luck. In a psych setting, I'd be forced to go off previous records and seek collateral histories.
Complex comorbidities? I haven't run into a scenario where an LLM gave me a grossly incorrect answer. It's been a while since I was an ICU doc, that was GPT-3 days, but I don't think they'd have bungled the management of any case that comes to mind.
Physical exams? Big issue, but if existing medical systems often use non-doctor AHPs to triage, then LLMs can often slot into the position of the senior clinician. I wouldn't trust the average psych consultant to find anything but the rather obvious physical abnormalities. They spend blissful decades avoiding PRs or palpating livers. In other specialities, such as for internists, that's certainly different.
I don't think an LLM could replace me out of the box. I think a system that included an LLM, with additional human support, could, and for significant cost-savings.
Where I currently work, we're more bed-constrained than anything, and that's true for a lot of in-patient psych work. My workload is 90% paperwork versus interacting with patients. My boss, probably 50%. He's actually doing more real work, at least in terms of care provided.
Current setup:
- 3-4 resident or intern doctors
- 1 in-patient cons and 1 outpatient cons
- 4 nurses per ward and 4-5 HCAs per ward
- Two wards total, with about 16-20 patients
- An unknown number of AHPs, like mental health nurses and social workers, triaging out in the community
- 2 ward clerks, a secretary or two, and a bunch of people whose roles are still inscrutable to me
Today, if you gave me the money and computers that weren't locked down, I could probably get rid of half the doctors, and one of the clerks. I could probably knock off a consultant, but at significant risk of degrading service to unacceptable levels.
We're rather underemployed as-is, and this is a sleepy district hospital, so I'm considering the case where it's not.
You would need at least one trainee or intern doctor who remembered clinical medicine. A trainee 2 years ahead of me would be effectively autonomous, and could replace a cons barring the legal authority the latter holds. If you need token human oversight for prescribing and authorizing detention, then keep a cons and have him see the truly difficult cases.
I don't think even the ridiculous amount of electronic paperwork we have would rack up more than $20 a day for LLM queries.
I estimate this would represent about £292,910 in savings from not needing to employ those people, without degrading service. I think I'm grossly over-estimating LLM query costs; asking one (how kind of it) suggests a more realistic $5 a day.
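For transparency, here's the kind of back-of-the-envelope sketch that figure comes from; every token count and price below is my own assumption, not measured usage or a quoted tariff:

```python
# Back-of-the-envelope daily LLM cost for ward paperwork.
# All figures are assumptions for illustration, not measured usage or quoted prices.
documents_per_day = 40         # assumed notes, letters and forms across both wards
tokens_in_per_doc = 2_000      # assumed prompt/context size per document
tokens_out_per_doc = 1_000     # assumed generated length per document
price_in_per_million = 3.00    # assumed $ per million input tokens
price_out_per_million = 15.00  # assumed $ per million output tokens

daily_cost = documents_per_day * (
    tokens_in_per_doc / 1_000_000 * price_in_per_million
    + tokens_out_per_doc / 1_000_000 * price_out_per_million
)
print(f"~${daily_cost:.2f} per day")  # under a dollar with these assumptions; scales linearly with volume
```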
This is far from a hyperoptimized setup. A lot of the social workers spend a good fraction of their time doing paperwork and admin. Easy savings there, have the rest go out and glad-hand.
I re-iterate that this is something I'm quite sure could be done today. At a certain point, it would stop making sense to train new psychiatrists at all, and that day might be now (not a 100% confidence claim). In 2 years? 5?
Do keep in mind how terrible most medical research is, and that includes research into our replacements. This isn't from lack of effort but from the various systems, pressures, and ethics at play.
How do you simulate a real patient encounter when testing an LLM? Well, maybe you write a vignette (okay, that's artificial and not a good example). Maybe you sanitize the data inputs and have a physician translate them for the LLM. Well, shit, that's not good either.
Do you have the patient directly talk to the LLM and have someone else feed in lab results? Okay maybe getting closer but let's see evidence they are actually doing that.
All in the setting of people very motivated to show that the tool works well, and therefore biased in research publication (not to mention all the people who run similar experiments, find that it doesn't work, and can't get published!).
You see this all the time in microdosing, weed, and psychedelic research. The quality is ass.
Also keep in mind that a good physician is a manager also - you are picking up the slack on everyone else's job, calling family, coordinating communication for a variety of people, and doing things like actually convincing the patient to follow recommendations.
I haven't seen any papers on an LLM's attempts to get someone to take their 'beetus medication vs a living, breathing person.
Also, Psych will be up there with the proceduralists among the last to be replaced.
Also also other white collar jobs will go first.
I expect this would work. You could have the AI be something like GPT-4o Advanced Voice for the audio communication. You could record video and feed it into the LLM; this is something you can do now with Gemini, though I'm not sure about ChatGPT.
You could, alternatively, have a human (cheaper than the doctor) handle the fussy bits. Ask the questions the AI wants asked, while there's a continuous processing loop in the background.
No promises, but I could try recording a video of myself pretending to be a patient and see how it fares.
I mean, quite a few of the authors are doctors, and I presume they'd also have a stake in us being gainfully employed.
I'd take orders from an LLM, if I was being paid to. This doesn't represent the bulk of a doctor's work, so you'd only need to keep a fraction of them around. People are already being conditioned to take what LLMs say seriously. They can be convinced to take them more seriously, especially if vouched for.
That specific topic? Me neither. But there are plenty of studies of the ability of LLMs to persuade humans, and the very short answer is that they're not bad.