Contact Us
Sign In
Sign Up
Rules Admins Moderation Log Random Post Random User
What is this place?

This website is a place for people who want to move past shady thinking and test their ideas in a court of people who don't all share the same biases. Our goal is to optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.

The weekly Culture War threads host the most controversial topics and are the most visible aspect of The Motte. However, many other topics are appropriate here. We encourage people to post anything related to science, politics, or philosophy; if in doubt, post!

Check out The Vault for an archive of old quality posts. You are encouraged to crosspost these elsewhere.

Why are you called The Motte?

A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently, it's an element in a rhetorical move called a "Motte-and-Bailey", originally identified by philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial but high value claim to a defensible but less exciting one upon any resistance to the former. He likens this to the medieval fortification, where a desirable land (the bailey) is abandoned when in danger for the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired propositions to which one retreats when hard pressed."

On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.

New post guidelines

If you're posting something that isn't related to the culture war, we encourage you to post a thread for it. A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a submission statement. A submission statement is required for non-text sources (videos, podcasts, images).

Culture war posts go in the culture war thread; all links must either include a submission statement or significant commentary. Bare links without those will be removed.

If in doubt, please post it!

Rules
Recommended Posts And Communities
Recommended Realtime Chats
- Quokka's Den Telegram
- Astral Codex Ten Discord

PaperclipPerfector 6mo ago (text post) 35960 thread views

Culture War Roundup for the week of June 9, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

2124

2124
5

Jump in the discussion.

No email address required.

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi 6mo ago · Edited 6mo ago

At this point, I don't even know what an AGI is. The word has just been semantically saturated for me.

What I do know, based on having followed the field since before GPT-2 days, and personally fucked around since GPT-3, is that for at least a year or so, SOTA LLMs have been smarter and more useful than the average person. Perhaps one might consider even the ancient GPT 3.5 to have met this (low) bar.

They can't write? Have you seen the quality of the average /r/WritingPrompts post?

They can't code? Have you seen the average code monkey?

They can't do medicine/math/..? Have you tried?

The average human, when confronted with a problem outside their core domain of expertise, is dumb as rocks compared to an LLM.

I don't even know how I managed before LLMs were a thing. It hasn't been that long, I've spent the overwhelming majority of my life without them. If cheap and easy access to them were to magically vanish, my willingness to pay to get back access would be rather high.

Ah, it's all too easy to forget how goddamn useful it can be to have access to an alien intelligence in one's pocket. Even if it's a spiky, inhuman form of intelligence.

On the topic of them being cheap/free, it's a damn shame that AI Studio is moving to API access only. Google was very flustered by the rise of ChatGPT and the failure of Bard, it was practically begging people to give Gemini a try instead. I was pleasantly surprised and impressed since the 1.5 Pro days, and I'm annoyed that their gambit has paid off, that demand even among normies and casual /r/ChatGPT users increased to the point that even a niche website meant for powerusers got saturated.

Context

TheLoser self_made_human 6mo ago

They can't write? Have you seen the quality of the average /r/WritingPrompts post?

I'm sorry but being a better writer than literal redditors on /r/WritingPrompts is not a high bar to pass.

Context

erwgv3g34 TheLoser 6mo ago · Edited 6mo ago

And yet it is a bar that most humans cannot pass. We know this because redditors are humans (and, in fact, since they are selected for being literate and interested in creative writing, they must be above average human writing ability). That's the point of the grandparent; ChatGPT blew right past the Turing Test, and people didn't notice because they redefined it from "can pass for the average human at a given task" to "can pass for the top human at a given task".

Context

SnapDragon erwgv3g34 6mo ago

There are plenty of tasks (e.g. speaking multiple languages) where ChatGPT exceeds the top human, too. Given how much cherrypicking the "AI is overhyped" people do, it really seems like we've actually redefined AGI to "can exceed the top human at EVERY task", which is kind of ridiculous. There's a reasonable argument that even lowly ChatGPT 3.0 was our first encounter with "general" AI, after all. You can have "general" intelligence and still, you know, fail at things. See: humans.

Context

Jiro SnapDragon 6mo ago

If you say "it's okay for the AI to do as poorly as a poorly performing human", you'll end up concluding that even an Eliza program can do better than a drunk human who can barely type out words on a keyboard. And if you say "the AI only needs to exceed a top human at a few tasks", then a C64, which can run a simple calculator or chess program, would count as a general AI.

People are not cherrypicking. What they are doing is like the Turing test itself, but testing for intelligence instead of for "is like a human". People asking questions in a Turing test can't tell you in advance which questions would prove the target is a computer, but they have implicit knowledge that lets them dynamically change their questions to whatever is appropriate. Likewise, we don't know in advance exactly what things ChatGPT would have to do to prove it's a general intelligence, but we can use our implicit knowledge to dynamically impose new requirements based on how it succeeds at the previous requirements.

Saying "well, it can write, but can it code" is ultimately no different from saying "well, it can tell me its favorite food, but can it tell me something about recipes, and its favorite book, and what it did on Halloween". We don't complain that when someone does a Turing test and suddenly asks the computer what it did on Halloween, that he's cherrypicking criteria because he didn't write down that question ahead of time.

Context

SnapDragon Jiro 6mo ago · Edited 6mo ago

Well, I don't think your analogy of the Turing Test to a test for general intelligence is a good one. The reason the Turing Test is so popular is that it's a nice, objective, pass-or-fail test. Which makes it easy to apply - even if it's understood that it isn't perfectly correlated with AGI. (If you take HAL and force it to output a modem sound after every sentence it speaks, it fails the Turing Test every time, but that has nothing to do with its intelligence.)

Unfortunately we just don't have any simple definition or test for "general intelligence". You can't just ask questions across a variety of fields and declare "not intelligent" as soon as it fails one (or else humans would fail as soon as you asked them to rotate an 8-dimensional object in their head). I do agree that a proper test requires that we dynamically change the questions (so you can't just fit the AI to the test). But I think that, unavoidably, the test is going to boil down to a wishy-washy preponderance-of-evidence kind of thing. Hence everyone has their own vague definition of what "AGI" means to them; honestly, I'm fine with saying we're not there yet, but I'm also fine arguing that ChatGPT already satisfies it.

There are plenty of dynamic, "general", never-before-seen questions you can ask where ChatGPT does just fine! I do it all the time. The cherrypicking I'm referring to is, for example, the "how many Rs in strawberry" question, which is easy for us and hard for LLMs because of how they see tokens (and, also, I think humans are better at subitizing than LLMs). The fact that LLMs often get this wrong is a mark against them, but it's not iron-clad "proof" that they're not generally intelligent. (The channel AI Explained has a "Simple Bench" that I also don't really consider a proper test of AGI, because it's full of questions that are easy if you have embodied experience as a human. LLMs obviously do not.)

In the movie Phenomenon, rapidly listing mammals from A-Z is considered a sign of extreme intelligence. I can't do it without serious thought. ChatGPT does it instantly. In Bizarro ChatGPT world, somebody could write a cherrypicked blog post about how I do not have general intelligence.

Context

kky SnapDragon 6mo ago

The Turing Test ain’t simple pass/fail. It doesn’t specify an amount of time for the interaction, for instance, or whether it iterates, or whether people know the characteristics of the AI. I’d say that current LLMs could fool Turing himself, on the first go, but given a few iterations and enough time he’d notice something was up. Look at how our mods play spot the LLM. This would be a blanket yes/no if the Turing Test were pass/fail, but in reality it’s an evolving thing.

Context

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi TheLoser 6mo ago

Perhaps one might consider even the ancient GPT 3.5 to have met this (low) bar.

Context

SubstantialFrivolity I'm not even supposed to be here today self_made_human 6mo ago

Why do you consistently assume that people who don't share your views of LLM capabilities just haven't seen what they can do/what humans can do? For example:

They can't code? Have you seen the average code monkey?

Yes I have (and of course, I've used LLMs as well). That's why I say LLMs suck at code. I'm not some ignorant caricature like you seem to think, who is judging things without having proper frame of reference for them. I actually know what I'm talking about. I don't gainsay you when you say that an LLM is good at medical diagnoses, because that's not my field of expertise. But programming is, and they simply are not good at programming in my opinion. Obviously reasonable people can disagree on that evaluation, but it really irks me that you are writing like anyone who disagrees with your take is too inexperienced to give a proper evaluation.

Context

aqouta SubstantialFrivolity 6mo ago

I join the chorus of people who don't quite understand what your problem is with LLMs. What kind of code do you write? The tools are at the point where I can give them a picture of a screen I want along with some API endpoints and it reliably spits out immediately functioning react code. I can then ask it to write the middleware code for those endpoints and finally ask it to create a sproc for the database component. It's possible you hover high above us react monkeys and barely even consider that programming but surely you understand that's the level like at least half of all programmers operate on? I had copilot do all these things today, I know that it can do these things. So where is the disconnect? It's truly possible there is some higher plane of coding us uninspired 9-5 paycheck Andy's can only obliquely perceive and this is your standard for being able to program but it'd be nice if you could just say that to resolve the confusion.

Context

kky aqouta 6mo ago

I’ll give a description of what I do.

I manage servers. Or rather, I write code to do this, in accordance with some rather specific customer contracts. The times we take action, and the actions we take, are highly constrained. Even the basic concept of updates is not especially simple. I’m sure you remember Crowdstrike taking most of the Windows world down in a day. What I do is not so apocalyptic on the world scale, but our customers would find a similar event devastating. So most of my time is spent figuring out every possible path through server states and ensuring that they all lead back to places where faults can be cheaply recovered. These properties lie above the code. You can’t understand them, for the most part, just by reading the code. But they are incredibly important and must be thoroughly safeguarded, and even highly intelligent humans who just happen to be ignorant of the problem space or are a little careless have made really, really bad mistakes here. The code compiled, the tests passed, and it even seemed to work for a little in our integration environments - but it was horrifically flawed and came within an ace of causing material customer damage. So I don’t much trust an LLM which has a much more constrained sort of awareness, and in practice, they don’t much deliver.

I realize that’s a little vague, but I hope it explains a little about a more backend perspective on these problems. If I were more clever I’d give a clear example which was not real, but barring that, I hope a characterization helps.

Context

TequilaMockingbird Brown-skinned Fascist MAGA boot-licker aqouta 6mo ago · Edited 6mo ago

As somone who's been working in the field of machine learning since 2012 and generally agrees with @SubstantialFrivolity's assesment, I think that what we are looking here is a bifurcation in opinion between people looking for "bouba" solutions and those looking for "kiki" solutions.

If you're a high-school student or literature major with zero background in computer science looking to build a website or develop baby's first mobile app LLM generated code is a complete game changer. Literally the best thing since sliced bread. (The OP, and @self_made_human's comments reflect this)

If you're a decently competent programmer at a big tech firm, LLMs are at best a mild productivity booster. (See @kky's comments below)

If you are decently competent programmer working in an industry where things like accuracy, precision, and security are core concerns, LLMs start to look anti-productive as in the time you spent messing around with prompts, checking the LLM's work, and correcting it's errors, you could've easily done the work yourself.

Finally if you're one of those dark wizards working in FORTRAN or some proprietary machine language because this is ~~Sparta~~ IBM/Nvidia/TMSC and the compute must flow, you're skeptical of the claim that an LLM can write code that would compile at all.

Context

SubstantialFrivolity I'm not even supposed to be here today TequilaMockingbird 6mo ago

If you are decently competent programmer working in an industry where things like accuracy, precision, and security are core concerns, LLMs start to look anti-productive as in the time you spent messing around with prompts, checking the LLM's work, and correcting it's errors, you could've easily done the work yourself.

I think this fairly nicely summarizes how I feel. Not that I do work in one of those industries to be fair, but it's part of my personal work ethic I guess you might say. I want computers (and programs) to be correct first and foremost. Speed or ease of development don't mean much to me if the result can't be relied upon. Not only that, I want my tools to be correct first and foremost. I wouldn't accept a hammer where the head randomly fell off the handle 10% of the time or even 1% of the time. So I similarly have very little patience for an LLM which is inherently going to make mistakes in non-deterministic ways.

Context

kky SubstantialFrivolity 6mo ago

Preach, brother. Software is made to be clear and predictable. Learning to make it that way, one line at a time, is our craft. You can always tell the brilliant programmer apart because 99% of that code is simple as can be and 1% is commented like a formal proof. Worse than LLMs, reliance on LLMs risks undermining this skill. Who can say if something is correct if the justification is just that it came from the machine? There needs to be an external standard by which code is validated, and it must be internalized by humans so they can judge.

Context

aqouta TequilaMockingbird 6mo ago · Edited 6mo ago

If you're a high-school student or literature major with zero background in computer science looking to build a website or develop baby's first mobile app LLM generated code is a complete game changer. Literally the best thing since sliced bread.

You have to contend with the fact that like 95+% of employed programmers are at this level for this whole thing to click into place. It can write full stack CRUD code easily and consistently. five years ago you could have walked into any bank in any of the top 20 major cities in the united states with the coding ability of o3 and some basic soft skills and be earning six figures within 5 years. I know this to be the case, I've trained and hired these people.

If you are decently competent programmer working in an industry where things like accuracy, precision, and security are core concerns, LLMs start to look anti-productive as in the time you spent messing around with prompts, checking the LLM's work, and correcting it's errors, you could've easily done the work yourself.

I did allude that there might be a level of programming where one needs to see through the matrix to do but in SF's post and in most situations I've heard the critique in it's not really the case. They're just using it for writing config files that are annoying because they pull together a bunch of confusing contexts and interface with proprietary systems that you need to basically learn from institutional knowledge. The thing LLMs are worst at. Infrastructure and configuration are the two things most programmers hate the most because it's not really the more fulfilling code parts. But AI is good at the fulfilling code parts for the same reason people like doing them.

In time LLMs will be baked into the infrastructure parts too because it really is just a matter of context and standardization. It's not a capabilities problem, just a situation where context is splined between different systems.

Finally if you're one of those dark wizards working in FORTRAN or some proprietary machine language because this is Sparta IBM/Nvidia/TMSC and the compute must flow, you're skeptical of the claim that an LLM can write code that would compile at all.

If anything this is reversed, it can write FORTRAN fine, it probably can't do it in the proprietary hacked together nonsense installations put together in the 80s by people working in a time where patterns came on printed paper and might collaborate on standards once a year at a conference if they were all stars. but that's not the bot's fault. This is the kind of thinking that is impressed by calculators because it doesn't properly understand what's hard about some things.

I feel like I'm taking crazy pills here. No one's examples about how it can't write code are about it writing code. It's all config files and vague evals. No one is talking about it's ability to write code. It's all devops stuff.

Context

TequilaMockingbird Brown-skinned Fascist MAGA boot-licker aqouta 6mo ago · Edited 6mo ago

This is the kind of thinking that is impressed by calculators because it doesn't properly understand what's hard about some things.

Ironically I considered saying almost this exact thing in my above comment, but scratched it out as too antagonistic.

The high-school students and literature majors are impressed by LLMs ability to write code because they do not know enough about coding to know what parts are easy and what parts are hard.

Writing something that looks like netcode and maybe even compiles/runs is easy. (All you need is a socket, a for loop, a few if statements, a return case, and you're done) Writing netcode that is stable, functional, and secure enough to pass muster in the banking industry is hard. This is what i was gesturing towards with "Bouba" vs "Kiki" distinction. Banks are notoriously "prickly" about thier code because banking (unlike most of what Facebook, Amazon, and Google do) is one of those industries where the accuracy and security of information are core concerns.

Finally which LLM are you using to write FORTRAN? because after some brief experimentation niether Gemini nor Claude are anywhere close.

Context

aqouta TequilaMockingbird 6mo ago

What do you imagine is the ratio just at banks between people writing performant net code and people writing crud apps? If you want to be an elitist about it then be my guest, but it's a completely insane standard. Honestly the people rolling out the internal llm tooling almost certainly outnumber the people doing the work you're describing.

Context

TequilaMockingbird Brown-skinned Fascist MAGA boot-licker aqouta 6mo ago · Edited 6mo ago

I do not think that expecting basic competency is an "insane standard" or even that elitist. Stop making excuses for sub-par work and answer the question.

Which LLM are you using to write FORTRAN?

What sort of problem did you ask it to solve?

Context

aqouta TequilaMockingbird 6mo ago

In your effort to declare LLMs as incapable programmers you're excluding 95%+ of the profession, not literature majors. not high school students. Professional programers with CS and SE degrees. All I've been asking is for you to acknowledge that. If your standard is quant on a hft desk then great for you. I'm sure you're an excellent programmer. You'll probably have a job for six months longer than me.

Context

More comments

kky TequilaMockingbird 6mo ago

I mean, my full opinion and experience with LLMs is much harsher than my comment suggested, but I don’t want to start fights with enjoyers on the net. (At least, not this time.) Chances are their circumstances are different. But I would be seriously offended if someone sent me AI-generated code in my main area of expertise because it would be subtly or blatantly wrong and be a serious waste of my time trying to figure out all the errors of logic which only become apparent if you understand the implicit contracts involved in the domain. Goodness knows it’s bad enough when merely inexperienced programmers ask for review without first asking advice on how to approach the problem, or even without serious testing…

Context

TequilaMockingbird Brown-skinned Fascist MAGA boot-licker kky 6mo ago

Goodness knows it’s bad enough when merely inexperienced programmers ask for review without first asking advice on how to approach the problem, or even without serious testing…

I know that pain.

Context

SubstantialFrivolity I'm not even supposed to be here today aqouta 6mo ago

Oh for heaven's sake, dude. When did I ever say I consider myself better than anyone else, that I would deserve such a litany of sarcasm directed at me? I don't think that and certainly haven't said it. I am just an ordinary programmer - I doubt very much that I'm better at programming than anyone here except the non-programmers, and I'm sure I'm worse than more than a few. Not only did I say "hey I'm not trying to litigate this right now" and that got ignored, now I get people dogpiling me saying I'm a troll or think I'm better than everyone else or whatever.

But fine, since you and @SnapDragon are insistent on pressing me on the topic (and since I apparently didn't say to him what my experience was, my bad on that, but I know I have posted this in a previous thread before), I will reiterate the things that I personally have seen LLMs fall flat on their face with. This is of course in addition to the various embarrassments that are public, like Microsoft's ill-conceived attempt to let Copilot loose on PRs.

Tried to get ChatGPT to help me generate a fluentd config file that would process logs in a way I needed to do for work. It gave me a config file that not only didn't do the thing, it didn't conform to the schema and caused the software to crash
Tried to get it to help me order CloudFormation resource delete-and-recreate in a different way than the default order. It first gave me options that didn't even exist, then it gave me options that did exist but didn't do what I asked for. I had a similar issue with the AWS-trained model they provide, which also suggested options that don't do what I asked for (and are documented as such).
A coworker used ChatGPT (a custom one trained on our API docs) to generate a script to run against our API. Again it hallucinated methods that simply do not exist.

These were all within the last year, though I couldn't tell you exactly when or what model or anything. And I've been honest that sometimes it has done good work for me, namely in generating short snippets of code in a language (or using an API) that I know well enough to recognize as correct when I see it, but not well enough to produce without laborious reading of docs. I've never claimed that LLMs work 0% of the time (if people have taken that away, I've done a poor job communicating), but the failure rate is much too high for them to be considered viable tools in my book. Most frustratingly, the things that I actually need help on, the ones where I don't know really anything about the topic and a workable AI assistant would actually save me a ton of time, are precisely the cases where it fails hard (as in my examples where stuff doesn't even work at all).

So those are again my experiences with LLMs that have caused me to conclude that they are hype without substance. Disagree if you like, I don't mind if you find it useful and like I have tried to say I'm not actually trying to convince people of my views on this topic any more. Like I tried to say earlier, the only reason I posted in this thread was to push back on the idea that one simply must be ignorant if they don't think LLMs are good at coding (and other things). That idea is neither true, necessary, or kind (as the rules allude to) and I felt that it deserved some sort of rebuttal. Though heaven knows I wish I had just left it alone and had peace and quiet rather than multiple people jumping down my throat.

Context

aqouta SubstantialFrivolity 6mo ago

Apologies if I came on too hard, it's just you've been expressing this opinion for a while and had gone down several reply chains without bringing the thing to the object level. It's emblematic of the whole question, AI is "spikey", as in it's very good at some things and inexplicably bad at some other things. I don't think a lot of people would take so much offense if you just said it still seems bad at some tasks, that's broadly a consensus. But when you just say it "sucks at code" it's perplexing to the people watching it effortlessly do wide swaths of what used to be core programming work.

I could definitely see it struggle with highly context dependent config files but something seems strange about it not producing at least a valid file, did you try different prompts and giving it different contexts? I find giving it an example of valid output helps but I'm not familiar with fluentd and it's possible giving it enough context is unreasonable.

Context

SubstantialFrivolity I'm not even supposed to be here today aqouta 6mo ago

I have not tried that, but it also seems like kind of a failure of the tool if I have to, you know? The whole point of a tool that can understand natural language is that you can just talk to it normally. If one has to figure out how to word the incantations just right to get a useful result... I'm not sure how that's better than just figuring out the code myself at that point.

Context

aqouta SubstantialFrivolity 6mo ago

Prompting is a skill like any other. Sending it off without context is like telling an underling to fix your config file without explaining or letting them look at the system they're writing it for. It's often a mistake to assume the prompt needs to be something a human would understand. You can and should just dump unformatted logs, barely related examples of working config files, anything you can imagine an underline with infinite time in a locked room might find useful in solving your problem.

Context

SnapDragon SubstantialFrivolity 6mo ago

FWIW, I appreciate this reply, and I'm sorry for persistently dogpiling you. We disagree (and I wrongly thought you weren't arguing in good faith), but I definitely could have done a better job of keeping it friendly. Thank you for your perspective.

Most frustratingly, the things that I actually need help on, the ones where I don't know really anything about the topic and a workable AI assistant would actually save me a ton of time, are precisely the cases where it fails hard (as in my examples where stuff doesn't even work at all).

That does sound like a real Catch-22. My queries are typically in C++/Rust/Python, which the models know backwards, forwards, and sideways. I can believe that there's still a real limit to how much an LLM can "learn" a new language/schema/API just by dumping docs into the prompt. (And I don't know anything about OpenAI's custom models, but I suspect they're just manipulating the prompt, not using RL.) And when an LLM doesn't know how to do something, there's a risk it will fake it (hallucinate). We're agreed there.

Maybe using the best models would help. Or maybe, given the speed things are improving, just try again next year. :)

Context

SubstantialFrivolity I'm not even supposed to be here today SnapDragon 6mo ago

Thanks. And for my part I'm sorry that I blew you off unjustly; I really thought I had explained myself in detail but I was wrong.

And yeah, the tech might improve. I imagine you can see why I'm skeptical of the strong predictions that it'll do so (given that I don't agree it's as good as people say it is today), but I try to keep an open mind. It is possible, so we'll see.

Context

TheAntipopulist Formerly Ben___Garrison SubstantialFrivolity 6mo ago

and they simply are not good at programming

At @self_made_human's request, I'm answering this. I strongly believe LLMs to be a powerful force-multiplier for SWEs and programmers. I'm relatively new in my latest position, and most of the devs there were pessimistic about AI until I started showing them what I was doing with it, and how to use it properly. Some notes:

LLMs will be best where you know the least. If you're working on a 100k codebase that you've been dealing with for 10+ years in a language you've known for 20+ years, then the alpha on LLMs might be genuinely small. But if you have to deal with a new framework or language that's at least somewhat popular, then LLMs will speed you up massively. At the very least it will be able to rapidly generate discrete chunks of code to build a toolbelt like a Super StackOverflow.
Using LLMs are a skill, and if you don't prompt it correctly then it can veer towards garbage. You'll want to learn things like setting up a system prompt and initial messages, chaining queries from higher level design decisions down to smaller tasks, and especially managing context are all important. One of the devs at my workplace tried to raw-dog the LLM by dumping in a massive codebase with no further instruction while asking for like 10 different things simultaneously, and claimed AI was worthless when the result didn't compile after one attempt. Stuff like that is just a skill issue.
Use recent models, not stuff like 4o-mini. A lot of the devs at my current workplace tried experimenting with LLMs when they first blew up in early 2023, but those models were quite rudimentary compared to what we have today. Yet a lot of tools like Roo Cline or whatever have defaulted to old, crappy models to keep costs down, but that just results in bad code. You should be using one of 1) Claude Opus, 2) ChatGPT o3, or 3) Google Gemini 2.5 pro.

Context

kky TheAntipopulist 6mo ago

Speaking from my own experience with literal top-of-class LLMs.

LLMs are good for getting overviews of public, popular, highly documented technical systems. They can meaningfully reduce ramp-up time there. But it’s not too significant for the overall job, for most jobs. I’d estimate ramp-up time to be a modest fixed cost that is already effectively ameliorated by existing resources like Stack Overflow. So maybe a 2x speed up on 2% of overall working time.

They are also good for writing repetitive boilerplate. Copy/paste features are cool and helpful. This takes maybe 1% of my overall working time. I just don’t wind up repeating myself that much.

They can be good for getting code coverage, but that does not equate to good testing. I can elaborate if needed, but figuring out which system properties are most likely to need explicit coverage is an art that requires a high-level perspective that an LLM will not have for the majority of serious projects. This is around 10% of my job.

For lesser-known or internal APIs (common at larger companies), the LLM will hallucinate at extraordinary rates. This is around 5% of my job.

For anything technical, like refactoring class hierarchies, the LLM will get way out of its depth and is likely to produce gibberish. This is around 4% of my job.

It simply will not understand the larger requirements of a project, and what would make one solution valid and another invalid. This is about 15% of my job as it relates to code, and maybe 8% as it relates to design specifications, and 20% as it relates to talking with other people about said requirements.

The rest of my job is code review and progress updates, which maybe could be automated but which feels a little cheap to do. So I stand to save about 2% of my working time with AI, which is pretty marginal. And on my team, you can’t tell any meaningful difference in output between the people who use AI and the ones who don’t, which ties into my general assertion that it’s just not that helpful.

Then again, I’m a backend engineer in a pretty gritty ecosystem, so maybe this isn’t true for other software roles.

Context

TheAntipopulist Formerly Ben___Garrison kky 6mo ago

If there's one place I doubt AI will improve much in the near future, it's stakeholder management. That's why I think even if AI becomes an astronomically better coder than the average SWE, that SWE's could just rebrand as AI whisperers and translate the nuances of a manager's human-speak into AI prompts. Maybe it'll get there eventually, but we're still a good ways off from non-technical people being able to use AI to get any software they want without massive issues arising. The higher up in the org you are, the bigger a % of your job that stakeholder management becomes. I think we agree on this point overall.

On less well-known systems and APIs, I think the hallucination issue is more of a skill issue (within reason, I'm not making an accusation here). I'm translating a bunch of SQR (a niche language you've probably never heard of) queries to an antiquated version of TSQL right now, and the AI indeed hallucinates every now and then, but it's in predictable ways that can be solved with the right system prompts. E.g. sometimes it will put semicolons at the end of every line thinking its in a more modern version of SQL, and I have to tell it not to do that which is somewhat annoying, but simply writing a system prompt that has that information cuts down that issue by 99%. It's similar for unknown APIs -- if the AI is struggling, giving it a bit of context usually resolves those problems from what I've seen. Perhaps if you're working in a large org with mountains of bespoke stuff then the giving an AI all that context would just overwhelm it, but aside from that issue I've still found AI to be very helpful even in more niche topics.

On the time saved, you might want to be on the lookout for the dark leisure theory for some folks, while for others the time savings of using AI might be eaten up somewhat by learning to use the AI in the first place. I agree that the productivity boost hasn't been astronomical like some people claim, but I think it will increase over time as models improve, people become more skilled at AI, and people using AI to slack off get found out.

Context

kky TheAntipopulist 6mo ago

Haha, I really, really don’t think there’s any dark leisure here. None of the best performers rest much at all, and I talk with them pretty openly about their habits. Plus, our direct manager is bullish on AI and got the most enthusiastic guy on the team to do an AI demo a few weeks back. Using AI as a force multiplier would get you a raise, not more work.

The more I have to babysit the LLM, the less time-efficient it is for me. I don’t know what everyone’s experience is, but typing out code (even SQL) is just not that time consuming. I know, logically, what I want to happen, and so I write the statements that correspond to that behavior. Reading code for validity, rewriting it to make it more elegant and obviously correct, that takes more of my time, and LLM output is (like a junior dev) unreliable enough that I have to read deeply for (unlike a junior dev) no chance of it improving future output. Plus, the code I write tends to be different enough that the prospect of reprompting the LLM repeatedly is pretty unpleasant.

That said, I absolutely use it for Bash, which is arcane and unfamiliar to me. I still have to go through the slow process of validating its suggestions and rewriting pieces to make them more proper, but the way you perform simple logical actions in Bash is so far outside my wheelhouse that getting pointed in the right direction is valuable. So if you’re in a position where you’re doing more regular and rote work with particularly obnoxious but well-documented languages, it makes sense we’d have different opinions and experiences.

Context

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi SubstantialFrivolity 6mo ago

Or even consider a comment from your fellow programmer, @TheAntipopulist:

https://www.themotte.org/post/2154/culture-war-roundup-for-the-week/333796?context=8#context

They're generating billions in revenue already -- it's not nearly enough to sustain their current burn rates, but there's lots of genuine value there. I'm a professional software engineer, and AI is extremely helpful for my job; anyone who says it isn't is probably just using it wrong (skill issue).

Context

Ioper self_made_human 6mo ago · Edited 6mo ago

Notice how he didn't say that they're good at coding? He said that they're useful for his job.

LLMs are useful for SWEs, at least for some types some of the time. There is value here but they're poor programmers and to use them effectively you have to be relatively competent.

Its also very easy to fool yourself into thinking that they're much more valuable than they really are, likely due to how eloquently and verbosely they answer queries and requests.

Context

TheAntipopulist Formerly Ben___Garrison Ioper 6mo ago

I'd like to think I'm reasonably good at coding considering it's my job. However, it's somewhat hard to measure how effective a programmer or SWE is (Leetcode style questions are broadly known to be awful at this, yet it's what most interviewers ask for and judge candidates by).

Code is pretty easy to evaluate at a baseline. The biggest questions are "does it compile", and "does it give you the result you want" can be evaluated in like 10 seconds for most prompts, and that's like 90% of programming done right there. There's not a lot of room for BS'ing. There are of course other questions that take longer to answer, like "will this be prone to breaking due to weird edge cases", "is this reasonably performant", and "is this well documented". However, those have always been tougher questions to answer, even for things that are 100% done by professional devs.

Context

Ioper TheAntipopulist 6mo ago · Edited 6mo ago

While I'd say the only thing easy to answer is "does it compile", reading your other list I'd say I largely agree with your assesment.

LLMs can be a force multiplier for SWEs, but that doesn't mean they're good programmers. They're not programmers at all.

Looking at the points you made in your other post I'd argue that the biggest force multiplier is your first point and that this is a pretty big deal and bigger than people might first realise, especially non-engineers.

The second one is the issue I'm having with claims about LLM usability. Its kind of like dealing with mediocre Indian resources. You have break down and define the problem to such a degree that you've "almost" written the code yourself. This can still be useful and depending on your role very useful, but it isn't effectively replacing local resources either. Its not a method for solving problems but more of an advanced auto complete.

How useful is this? It depends on the situation and indivual and I'd rate it as moderately useful. Having managed developers, it also seems like something that (for some people) can feel like more of a productivity boost than it is due to time being spent differently (I'm not saying you're doing this).

Context

TheAntipopulist Formerly Ben___Garrison Ioper 6mo ago

it also seems like something that (for some people) can feel like more of a productivity boost than it is due to time being spent differently

I also wonder about this. I think in particularly bad cases it can be true, since if something doesn't work it becomes very tempting to just reprompt the AI with the error and see what comes back. Sometimes that works on a second attempt, and in other times I'll go back and forth for a dozen prompts or so. Whoops, there went an entire hour of my time! I'm trying to explicitly not fall into that habit more than I already have.

Overall I'd say it's a moderate productivity boost overall even factoring that in, and it's getting slowly better as both AI models improve and my skill in using them also improves.

Context

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi Ioper 6mo ago

@TheAntipopulist I'll let you speak for yourself instead of us reading the tea leaves.

Context

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi SubstantialFrivolity 6mo ago · Edited 6mo ago

Hang on. You're assuming I'm implying something in this comment that I don't think is a point I'm making. Notice I said average.

The average person who writes code. Not an UMC programmer who works for FAANG.

I strongly disagree that LLMs "suck at code". The proof of the pudding is in the eating; and for code, if it compiles and has the desired functionality.

More importantly, even from my perspective of not being able to exhaustively evaluate talent at coding (whereas I can usually tell if someone is giving out legitimate medical advice), there are dozens of talented, famous programmers who state the precise opposite of what you are saying. I don't have an exhaustive list handy, but at the very least, John Carmack? Andrej Karpathy? Less illustrious, but still a fan, Simon Willison?

Why should I privilege your claims over theirs?

Even the companies creating LLMs are use >10% of LLM written code for their own internal code bases. Google and Nvidia have papers about them being superhumanly good at things like writing optimized GPU kernels. Here's an example from Stanford:

https://crfm.stanford.edu/2025/05/28/fast-kernels.html

Or here's an example of someone finding 0day vulnerabilities in Linux using o3.

I (barely) know how to write code. I can't do it. I doubt even the average, competent programmer can find zero-days in Linux.

Of course, I'm just a humble doctor, and not an actual employable programmer. Tell me, are the examples I provided not about LLMs writing code? If they are, then I'm not sure you've got a leg to stand on.

TLDR: Other programmers, respected ones to boot, disagree strongly with you. Some of them even write up papers and research articles proving their point.

-1

Context

SubstantialFrivolity I'm not even supposed to be here today self_made_human 6mo ago

The average person who writes code. Not an UMC programmer who works for FAANG.

Yes, that is indeed what I meant as well.

The proof of the pudding is in the eating; and for code, if it compiles and has the desired functionality.

I agree. And it doesn't. Code generated by LLMs routinely hallucinates APIs that simply don't exist, has grievous security flaws, or doesn't achieve the desired objective. Which is not to say humans never make such mistakes (well, they never make up non-existent APIs in my experience but the other two happen), but they can learn and improve. LLMs can't do that, at least not yet, so they are doing worse than humans.

Why should I privilege your claims over [famous programmers]?

I'm not saying you should! I'm not telling you that mine is the only valid opinion; I did after all say that reasonable people can disagree on this. My issue is solely that your comment comes off as dismissing anyone who disagrees with you as too inexperienced to have an informed opinion. When you say "They can't code? Have you seen the average code monkey?", it implies "because if you had, you wouldn't say that LLMs are worse". That is what I object to, not your choice to listen to other programmers who have different opinions than mine.

Context

SnapDragon SubstantialFrivolity 6mo ago

Please post an example of what you claim is a "routine" failure by a modern model (2.5 Pro, o3, Claude 3.7 Sonnet). This should be easy! I want to understand how you could possibly know how to program and still believe what you're writing (unless you're just a troll, sigh).

Context

SubstantialFrivolity I'm not even supposed to be here today SnapDragon 6mo ago

I've tried to have this debate with you in the past and I'm not doing it again, as nothing has changed. I'm not even trying to debate it with self_made_human really - I certainly wouldn't believe me over Carmack if I was in his shoes. My point here is that one should not attribute "this person disagrees with my take" to "they don't know what they're talking about".

Context

SnapDragon SubstantialFrivolity 6mo ago

Right, and I asked you for evidence last time too. Is that an unreasonable request? This isn't some ephemeral value judgement we're debating; your factual claims are in direct contradiction to my experience.

Context

SubstantialFrivolity I'm not even supposed to be here today SnapDragon 6mo ago

Right, and I gave it then. Which is why I am not going to bother doing it this time. Like I said, nothing has changed.

Context

SnapDragon SubstantialFrivolity 6mo ago

What the hell? You most definitely did NOT give any evidence then. Nor in our first argument. I'm not asking so I can nitpick. I would genuinely like to see a somewhat-compact example of a modern LLM failing at code in a way that we both, as programmers, can agree "sucks".

Context

dr_analog top 1% of underdog fetishists self_made_human 6mo ago

They can't do medicine/math/..? Have you tried?

Yes. The number of times I've gotten a better differential diagnosis from an LLM than in an ER is too damn high.

Context

Corvos dr_analog 6mo ago

Are you an actual doctor? (I’m not.) I’ve found LLMs good at coming up with plausible hypotheses but bad at blocking them off.

Context

dr_analog top 1% of underdog fetishists Corvos 6mo ago

No. Just a person who has taken my kids to the ER too many times.

Context

Corvos dr_analog 6mo ago

I remember (will never forget) that awful story about the tick.

Context

George_E_Hale insufferable blowhard dr_analog 6mo ago

Allergies? Not my business, but that was always my fear as my boys were coming up. A bite of a piece of chocolate that was apparently near a peanut sent my one son to a hospital. Just hives, but I am happy to say they did the right thing and kept him overnight. Bi/multiphasic anaphylaxis precaution. The horror stories are usually because the epipen is treated as a one and done.

I just wrote a lot about allergies if you're talking about something completely different.

Context

dr_analog top 1% of underdog fetishists George_E_Hale 6mo ago · Edited 6mo ago

A tick, actually :/

https://www.themotte.org/post/1986/culture-war-roundup-for-the-week/331290?context=8#context

When he woke up paralyzed I was about to start the usual techbro thing of asking ChatGPT but said no, don't be that guy, lets just take him to the ER.

But then after we found the tick through no thanks to the ER, I plugged his symptoms and circumstances, exactly what we told the ER people, into ChatGPT4 classic and it listed ticks as the second thing to check for.

Context

What is this place?

This website is a place for people who want to move past shady thinking and test their ideas in a court of people who don't all share the same biases. Our goal is to optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.

The weekly Culture War threads host the most controversial topics and are the most visible aspect of The Motte. However, many other topics are appropriate here. We encourage people to post anything related to science, politics, or philosophy; if in doubt, post!

Check out The Vault for an archive of old quality posts. You are encouraged to crosspost these elsewhere.

Why are you called The Motte?

A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently, it's an element in a rhetorical move called a "Motte-and-Bailey", originally identified by philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial but high value claim to a defensible but less exciting one upon any resistance to the former. He likens this to the medieval fortification, where a desirable land (the bailey) is abandoned when in danger for the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired propositions to which one retreats when hard pressed."

On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.

New post guidelines

If you're posting something that isn't related to the culture war, we encourage you to post a thread for it. A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a submission statement. A submission statement is required for non-text sources (videos, podcasts, images).

Culture war posts go in the culture war thread; all links must either include a submission statement or significant commentary. Bare links without those will be removed.

If in doubt, please post it!

Rules

Recommended Realtime Chats

Link copied to clipboard

Action successful!

Error, please try again later.

Culture War Roundup for the week of June 9, 2025

Jump in the discussion.

What is this place?

Why are you called The Motte?

New post guidelines

Rules

Recommended Posts And Communities

Recommended Realtime Chats