What is this place?

This website is a place for people who want to move past shady thinking and test their ideas in a court of people who don't all share the same biases. Our goal is to optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.

The weekly Culture War threads host the most controversial topics and are the most visible aspect of The Motte. However, many other topics are appropriate here. We encourage people to post anything related to science, politics, or philosophy; if in doubt, post!

Check out The Vault for an archive of old quality posts. You are encouraged to crosspost these elsewhere.

Why are you called The Motte?

A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently, it's an element in a rhetorical move called a "Motte-and-Bailey", originally identified by philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial but high value claim to a defensible but less exciting one upon any resistance to the former. He likens this to the medieval fortification, where a desirable land (the bailey) is abandoned when in danger for the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired propositions to which one retreats when hard pressed."

On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.

New post guidelines

If you're posting something that isn't related to the culture war, we encourage you to post a thread for it. A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a submission statement. A submission statement is required for non-text sources (videos, podcasts, images).

Culture war posts go in the culture war thread; all links must either include a submission statement or significant commentary. Bare links without those will be removed.

If in doubt, please post it!

PaperclipPerfector 4mo ago (text post) 27698 thread views

Culture War Roundup for the week of February 2, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

100ProofTollBooth Dumber than a man, but faster than a dog. 4mo ago

This is a continuation of a topic brought up in one of the AAQCs for January. Hat tip to @birb_cromble

I value and believe what @birb_cromble wrote. I think AI is both over and under hyped (more on that below). I believe birb's report that a team of good devs are looking at it and saying "wtf ... this is ... ok ... maybe?" I think @RandomRanger had a similar comment that I am struggling to find (although, to be fair, it was pointed out that Ranger was using copilot which is a known dumpster fire).

On the other hand, I have direct, personal experience with AI (to be specific, as I kind of hate the blanket term "AI", coding-oriented LLMs) writing good code quickly and accurately. I've had past colleagues far more gifted than myself send 11pm "holy shit" texts based on their own projects. The head of Anthropic has publicly stated that LLMs write 100% of the code at Anthropic now. And the guy behind ClawdBot / MoltBook (or whatever it's called now) has openly discussed how his own deployment of ClawdBot was thinking and executing ahead of him.

If it's all hype, it is the mother of all hype cycles and something that approaches a mass movement of hysteria. This would be outright falsehoods and lying on a level usually reserved for North Korean heads of state and sub-Saharan cult leaders.

I don't think it's that. I am, however, developing the idea that both sides are actually right at the same time in different directions. To explain that, we're going to have to talk about software and software companies a little bit.

1. CRUD

Create, read, update, and delete or "CRUD" is what is at the core of almost every piece of software that is above the operating system level. CRUD is definitely at the core of almost every piece of software that is sold from one company to another (business-to-business or b2b) and most software sold to customers (business-to-customers or b2c). There are exceptions, of course, some of them quite large. But the fact remains that most software is about having data somewhere, storing it, asking it questions, modifying it (and unmodifying it), and, perhaps, deleting it (note, however, that with storage being fundamentally cheap now, deletion is a kind of philosophical state. Your e-mails for instance, are often not deleted until you double-for-serious-delete-them and then wait 30+ days).

A junior developer can build a CRUD app on their computer at home in less than a week. By hand, from scratch, zero LLM involved. Building a CRUD app is often a final assignment for mid-level undergraduate CompSci work. You, yes you, can build a CRUD app today with one good, long prompt to any of the big LLMs. It will be complete, with minimal to zero bugs.
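To make "CRUD" concrete, here is a minimal sketch of the four operations using Python's built-in sqlite3 module. The `contacts` table, its columns, and the function names are hypothetical and chosen only for illustration. Delete is implemented as a soft delete (a `deleted_at` timestamp), in the spirit of the remark above that deletion is often more of a state than an erasure.

```python
import sqlite3
from datetime import datetime, timezone


def connect(path=":memory:"):
    """Open a database and make sure the (hypothetical) contacts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "email TEXT NOT NULL, "
        "deleted_at TEXT)"  # NULL means the row is still live
    )
    return conn


def create_contact(conn, name, email):
    """Create: insert a row and return its new id."""
    cur = conn.execute(
        "INSERT INTO contacts (name, email) VALUES (?, ?)", (name, email)
    )
    conn.commit()
    return cur.lastrowid


def read_contact(conn, contact_id):
    """Read: return (id, name, email), or None if missing or soft-deleted."""
    return conn.execute(
        "SELECT id, name, email FROM contacts "
        "WHERE id = ? AND deleted_at IS NULL",
        (contact_id,),
    ).fetchone()


def update_contact(conn, contact_id, email):
    """Update: change the email of a live row; True if a row was changed."""
    cur = conn.execute(
        "UPDATE contacts SET email = ? WHERE id = ? AND deleted_at IS NULL",
        (email, contact_id),
    )
    conn.commit()
    return cur.rowcount == 1


def delete_contact(conn, contact_id):
    """Delete: mark the row as deleted instead of removing it."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE contacts SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, contact_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Everything here runs against a single local file or in-memory database with no users, no login, and no integrations, which is exactly the gap the next section is about.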

Salesforce, at its core, is a CRUD app. Salesforce is worth almost $200 bn while the CRUD app you build is worth exactly nothing. Why is this?

2. Enterprise

The holy grail of all b2b software is their first enterprise customer. What defines "enterprise?" It's a bit of a squishy term, but it means a big company. 1,000+ employees is more or less agreed upon as the minimum, though this may vary depending on the market niche you're in. Why are enterprises so prized? Because you're selling your product at scale (usually in terms of individual user licenses or "seats") to a customer who can pay a six, seven, or even eight figure annual bill without worrying about it and will not switch to one of your competitors quickly (....usually). This is where b2b software companies get their explosive valuations from and where founders get capital-F Fuck you money. Salesforce, our CRUD app supreme, has enterprise deals, probably, with every F500 company and thousands more very large companies. They recently announced a deal with the U.S. Army (lol, ELLE-OH-FUCKING-ELLE to that one). Salesforce has more enterprise than a Star Trek reboot.

But isn't an enterprise CRUD app still a CRUD app?

Yes, yes it is. But it's a CRUD app that:

Can handle thousands of concurrent users
Can manage all of the different levels of access control granted to each user by other users (admins etc.)
Handles IAM - Identity and Access Management. Basically all of the security stuff like two factor authentication, password resets etc.
Has, built into it, all of the necessary record and data retention requirements that many of these big F500s are legally required to have. (Note: GDPR requirements in Europe are close to impossible to actually meet, so many b2b companies either don't sell to Europe or will only sell them access to their software hosted on U.S. servers. It is impossible to overstate how much of an own goal GDPR was for Europe's tech sector).
And this is maybe the biggest one, it can integrate with a bunch of other apps - CRUD or otherwise
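As a rough illustration of just one item on that list, access control granted to users by other users, here is a minimal sketch of a role check wrapped around an in-memory record store. The role names, the permission sets, and the `RecordStore` class are all assumptions made up for this example. A real enterprise product would layer authentication, auditing, retention, and integrations on top.

```python
from dataclasses import dataclass

# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "create", "update"},
    "admin": {"read", "create", "update", "delete", "grant"},
}


class PermissionDenied(Exception):
    """Raised when a user attempts an action their role does not allow."""


@dataclass
class User:
    name: str
    role: str = "viewer"


def require(user, action):
    """Raise PermissionDenied unless the user's role includes the action."""
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        raise PermissionDenied(f"{user.name} ({user.role}) may not {action}")


def grant_role(admin, target, role):
    """Let a user holding the 'grant' permission change another user's role."""
    require(admin, "grant")
    if role not in ROLE_PERMISSIONS:
        raise ValueError(f"unknown role: {role}")
    target.role = role


class RecordStore:
    """The same CRUD idea as before, but every call is checked against a user."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def create(self, user, data):
        require(user, "create")
        record_id = self._next_id
        self._next_id += 1
        self._records[record_id] = dict(data)
        return record_id

    def read(self, user, record_id):
        require(user, "read")
        return dict(self._records[record_id])

    def delete(self, user, record_id):
        require(user, "delete")
        del self._records[record_id]
```

The point of the sketch is that the CRUD logic barely changes; the added value lives almost entirely in the checks wrapped around it.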

To return to the CRUD app you just built at home, it works just fine on your laptop! Can it export seamlessly to Excel or Word? No. Can I log into it remotely from my laptop while I am in the Delta lounge at O'Hare? No. What if four people want to work on it together at the same time? Uh, no - you don't even have a login into it! You just start it and boom, you're CRUD-ing around.

So much of the value of "big" software is all of the non-core functionality that is bolted on top of it in overlapping layers. This is also the dirty secret of what a lot of FAANG engineers do - write integrations between one product or service and another. They are not thinking up the next killer app, but essentially acting as digital plumbers in the world's largest city.

In the startup world, core functionality is often complete within the first year or two. It kind of has to be to gain your first customers. Then, so much of "product development" is figuring out where you're going to spend your time building integrations and then balancing that against actual new feature requests. The smart product managers realize that they can unite those two things and integrate a new feature from a different product. Two birds, one stone, zero actual innovation. Give that man a promotion.

There was a unicorn that literally was an integration hub for different products and services.

3. New vs legacy software

This is where we start to get into "both sides may be right" territory. From my experience, it seems AI is now quite good at writing new software, even fairly complex systems. It can do this because it doesn't have to make any assumptions about how anything already works. If it makes assumptions based on the user's intent, it is usually decent at carrying those assumptions through development to the finished product. In cases where it is not, you, the human, have to debug. Debugging, in this case, however, is often no harder than saying "Hey, this part doesn't work, and I think it might be because of xyz..."

This is not the case when you deploy AI against a legacy codebase, which is exactly what @birb_cromble mentioned. This is because legacy codebases are evolutionary products of a system changing over time. Ideally, each major upgrade to a system - and even the minor ones too - is documented. What "documented" means, however, varies wildly across developer teams. For some teams, it's nothing more than a quick changelog of bullet points. Other teams write about the decision making process that led to changes. Most documentation is incomplete or somewhat ambiguous. I would argue that, right now, almost all legacy documentation is in no way written for LLMs to use well in their context windows.

4. Documentation

Unless it is. That link is to a good blog post on the recent fracas at Tailwind Labs. Tailwind Labs makes software and gives its core functionality away for free. This is the same model as Red Hat Linux. They make money by having developers realize that they, Tailwind, have already built premium features on top of the core and will sell those features and hosting to companies that want it. I actually really like this so-called "open core" business model because I think it's philosophically more in line with OG software ideals. Linux and its various derivatives have been free - in some form - since the early 1990s, and the world's infrastructure runs on it. If Linux had been locked down from the start, I am convinced computers would still be weirdo specialty scientific equipment.

Anyways, back to Tailwind. Tailwind had to lay off about 75% of its staff because AIs read their whole documentation - which was very, very good - and can, now, build all of the premium services on their own. This fucking sucks, it's bad, nobody likes it. Open source is a necessary part of the software ecosystem. Even the most evilest of the FAANGs pour millions of dollars into sponsoring open source projects every year - because they rely on lots of those projects in their own codebases. Now, however, LLMs that scrape the internet, potentially, pose an existential threat to opening up your documentation plus codebase. It's as if you've just created one million free forever expert devs. Furthermore, this also exposes a dark pattern. If you want to retain your IP, lock down your documentation, intentionally obfuscate it, or just don't post it and only support your product with bill-per-hour in-house tech support teams.

The good news, however, is that most documentation is such shit that this will not happen.

But let's return to the main thread: AI under and overhyped at the same time.

My suspicion with @birb_cromble's codebase is that it isn't completely documented. This is absolutely NOT a shot at birb. I say this because, for any legacy codebase, it is essentially impossible to build and maintain complete documentation that describes not only how the system operates, but how it evolved over time. This is valuable and necessary context for an LLM. All of the assumptions it makes about various libraries and modules can be very, very wrong because it doesn't have the legacy "evolutionary" documentation to inform it of various design choices and modifications. Birb and his team have that context as tacit knowledge in their brains and shared collective intelligence. "Hey why does thing x do action y?", "Oh, team A needed that special feature so they could do necessary report z", "cool, got it." That 10 second exchange across the aisle with another dev is worth approximately 1 million lines of well written context to an LLM (1 million may or may not be an exaggeration.)

Birb said as much in his post. He wrote:

After that the wisdom was that we needed to carefully structure our tickets and our problems so that the tool could one-shot the problem, because no Reasonable Person could possibly expect a coding agent to iterate on a solution in one session. The problem with that solution is that by the time we've broken the problem down that much, any of us could have done it ourselves.

Bravo, Birb! I mean this sincerely. Phrased differently, Birb is saying that once his team provided extra-context documentation, the LLM was performant. However, by doing so, his team pretty much arrived at a state where the fix was obvious and easy.

Very well done documentation does lead to this. However, documentation is literally endless if you want to cover not only the system now but how it evolved over time. Good technical writers cost easily $100k+ and writing documentation is necessarily slower than writing new code. Most companies will not invest in this because, economically, they can't.

5. Ships and Planes

Existing legacy software is like a ship. It's big and slow, sure, but it's moving a lot of mass and is more or less steady and stable. One-shotted LLM applications - like Clawdbot - are like planes - fast, soaring, sexy, and, sometimes, they crash spectacularly. The thing to point out, however, is that planes cannot move, economically, the bulk that a ship can. What I mean here is that all of the evolutionary design choices, system revisions, and tacit knowledge that a legacy codebase reflects is a very bad payload to deploy an LLM against. There are too many unknown unknowns and relationships that are hidden so as to be very improbable. An LLM is a probabilistic machine, so it relies on what makes sense on average - not what is real in a specific circumstance.

But deploying an AI against the clear blue sky (like a plane) is its most advantageous arena because it can just assume the average and build the thing from scratch.

Big, legacy CRUD apps - and, absolutely, more specialized apps - aren't really in danger of being disrupted by AI in the immediate future. 5 to 7 years from now, ehhhh, I am not so sure. The folks who are absolutely totally fucked as in right now, today are any startups that have launched a CRUD app with the idea that they'll do all the dirty work of building it into an enterprise offering. The market for that is quickly evaporating. Instead, internal tool teams will just use LLMs to make their own CRUD app, wrap it in their existing security etc. stack and use it internally. This may equate out to as much as $250k of combined labor hours and API credits but, 1) that would be at the high end and 2) that would be a one time cost (besides internal maintenance) instead of the recurring six, seven, eight figures of spend to a third party.

6. Conclusion

I hope I've done a reasonable job in showing how both sides are right. I believe @birb_cromble. I believe, because I see, that pretty big names in software, who were even AI skeptics (roon on twitter, for instance) are now admitting to 100% agentic coding. The difference is in the starting point and the legacy debt or bulk that a given party engages with.

faul_sname Fuck around once, find out once. Do it again, now it's science. 100ProofTollBooth 4mo ago

And the guy behind ClawdBot / MoltBook (or whatever its called now) has openly discussed how his own deployment of ClawdBot was thinking and executing ahead of him.

I will point out that MoltBook had exposed its entire production database for both reads and writes to anyone who had an API key (paywalled link, hn discussion).

And this is fairly representative of my experience with AI code on substantial new projects as well. In the process of building something, whether it's something new or something legacy, the builder will need to make thousands of tiny decisions. For a human builder, the quality of those decisions will generally be quite tightly correlated to how difficult it is for a different human to make a good decision there, and so, for the most part, if you see signs of high-thoughtfulness polish in a few different parts of a human-built application that usually means that the human builder put at least some thought into all the parts of that application. Not so for "AI agents" though. One part might have a genuinely novel data structure which is a perfect fit for the needs of the project and then another part might ship all your API keys to the client or build a SQL query through string concatenation or drop and recreate tables any time a schema migration needs to happen.
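For readers who haven't seen the string-concatenation failure mode up close, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table and both function names are hypothetical. The first function builds the query by gluing user input into the SQL text; the second passes the input as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER)"
)
conn.executemany(
    "INSERT INTO users (name, is_admin) VALUES (?, ?)",
    [("alice", 1), ("bob", 0)],
)


def find_user_unsafe(conn, name):
    # BAD: the input becomes part of the SQL text, so a quote character
    # in `name` can change the structure of the query itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # GOOD: the input is bound as a parameter and is only ever treated as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Crafted input that turns the WHERE clause into: name = 'nobody' OR '1'='1'
malicious = "nobody' OR '1'='1"

leaked = find_user_unsafe(conn, malicious)  # matches every row in the table
nothing = find_user_safe(conn, malicious)   # matches no rows
```

Both versions look equally plausible in a diff, which is part of why this kind of mistake is easy to miss in review.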

That's not to say the "AI coding agent" tools are useless. I use them every day, and mostly on a janky legacy codebase at that. They're excellent for most tasks where success is difficult or time-consuming to achieve but easy to evaluate - and that's quite a lot of tasks. e.g.

Make an easy-to-understand regression test for a tricky bug: "User reports bug, expected behavior X, observed behavior Y. Here's the timestamped list of endpoints the user hit, all associated logs, and a local environment to play around in. Generate a hypothesis for what happened, then write a regression test which reproduces the bug by hitting the necessary subset of those endpoints in the correct order with plausible payloads. Iterate until you have reproduced the bug or falsified your hypothesis. If your hypothesis was falsified, generate a new hypothesis and try again up to 5 times. If your test successfully reproduces the bug, rewrite it with a focus on pedagogy - at each non-obvious step of setup, explain what that step of setup is doing and why it's necessary, and for each group of logically-connected assertions, group them together into an evocatively-named assert() method."
Take a SQL query which returns a piece of information about one user by id and rewrite it to performantly return that information for all users in a list
Review pull requests to identify which areas would really benefit from tests and don't currently have them
Review pull requests to identify obvious bugs
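To illustrate the second task in that list (the per-user query rewritten for a list of users), here is a minimal sketch with Python's sqlite3 module. The `orders` table, the `amount_cents` column, and the function names are assumptions invented for the example. The batched version is easy to evaluate because its output can be compared directly against the one-at-a-time version.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO orders (user_id, amount_cents) VALUES (?, ?)",
        [(1, 1000), (1, 500), (2, 750)],
    )
    return conn


def order_total_for_user(conn, user_id):
    """Original shape: one query per user id."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM orders WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


def order_totals_for_users(conn, user_ids):
    """Batched shape: one query for the whole list, grouped by user id."""
    user_ids = list(user_ids)
    if not user_ids:
        return {}
    # One "?" per id keeps the values bound as parameters, not pasted into SQL.
    placeholders = ", ".join("?" for _ in user_ids)
    rows = conn.execute(
        "SELECT user_id, SUM(amount_cents) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        user_ids,
    ).fetchall()
    totals = {uid: 0 for uid in user_ids}  # users with no orders default to 0
    totals.update(dict(rows))
    return totals
```

SQLite caps the number of bound parameters per statement, so a very long id list would need to be processed in chunks; that is left out here to keep the sketch short.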

quiet_NaN 100ProofTollBooth 4mo ago

CRUD is definitely at the core of almost every piece of software that is sold from one company to another (business-to-business or b2b) and most software sold to customers (business-to-customers or b2c). There are exceptions, of course, some of them quite large.

I would not call it at the core, generally.

I mean, take Dwarf Fortress. Of course, you have CRUD on savegames, but the purpose of DF is not to create savegames. If you squint your eyes hard enough, you might also find CRUD in-game (make crafts, look at crafts, encrust crafts with gems, sell crafts). Or even any work with OOP objects. But looking at the game through that lens seems rather artificial. I might as well look at a computer through the lens of gates or RTL.

Or take something completely different, Google Maps. There is certainly CRUD involved somewhere. The client reads (and displays) map data. It sends information on traffic back to Google. It also might request a route, which I guess you could model as creating a route_request object, which then gets resolved by the server (reading data on traffic in turn) and returned as a route object (which can then be updated by the server as traffic conditions change). CRUD would be involved for certain, but more like cellular respiration is involved in a human flirting with another. You are unlikely to learn much of interest for the outcome by looking at mitochondria performance (unless one of the participants has an abnormally high blood concentration of cyanide ions).

If I want to build a Google maps clone, the CRUD would be the easy part. There are protocols for that. The interesting part is all the rest -- in what format is your map data, how do you use it for routing and for displaying and so forth.

I will grant you that some B2B applications are indeed mostly CRUD, though. If you have a company internal procurement system which displays items from an external vendor, and lets a user place orders with the vendor, then that may well be just a thin layer gluing two APIs to each other. Just like sometimes mammals might primarily engage in cellular respiration and do little else.

RandomRanger Just build nuclear plants! 100ProofTollBooth 4mo ago

I think @RandomRanger had a similar comment that I am struggling to find (although, to be fair, it was pointed out that Ranger was using copilot which is a known dumpster fire).

I thought I was pretty far out on the 'singularity soon' wing of this website? In my experience AI is quite good for writing code, whether that's CRUD or more interesting code like pathfinding or O(n) tier operations or even writing out procedurally generated shaders and effects.

Not perfect, it does struggle and choke a bit right now on the more advanced or fiddly things... But what happens when it starts directing a 1000 subagents to attack your million line monstrosity of legacy code? What happens when it can error-test better?

100ProofTollBooth Dumber than a man, but faster than a dog. RandomRanger 4mo ago

I might be mixing you up with someone else. My apologies.

DradisPing 100ProofTollBooth 4mo ago

With Tailwind it's less their specific documentation, it's more that it became an industry best practice for new projects. There is just so much Tailwind content on GitHub and in blogposts. It's also specifically targeted by the LLM teams developing models.

Claude actually knows Tailwind better than CSS and will sometimes try to use it in projects that don't have it installed.

I think that scraping public GitHub repos is actually more important to LLM performance than documentation about your specific project. That all gets baked into the core model. If you're doing something with a lot of public examples it will one shot it.

I have a specific example. I've been playing around with implementing a db compatible clone of themotte/rDrama in node / react to get around some of the issues the codebase has. Two slightly incompatible markdown renderers on the front and back, old school bootstrap modals, etc.

I mentioned themotte/rDrama in my instructions to Claude Code and it put in some very rDrama like features such as coloured indent level bars on comments.

So it was clearly aware.

As a result a lot of the benchmark projects people try to use to document model performance become useless. The model can look up public examples of the answer.

LLMs are very good at the sort of thing that really should have been automated by now anyways, e.g. converting a nested JSON object from a POST request into rows in SQL tables.
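For a sense of what that JSON-to-rows chore looks like, here is a minimal sketch using Python's json and sqlite3 modules. The payload shape (a `customer` object plus an `items` list), the two tables, and the `save_order` function are all hypothetical.

```python
import json
import sqlite3

# Hypothetical schema: one parent row per order, one child row per line item.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL
);
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku TEXT NOT NULL,
    quantity INTEGER NOT NULL
);
"""


def save_order(conn, body):
    """Flatten a nested JSON request body into rows in two tables."""
    payload = json.loads(body)
    # `with conn` wraps both inserts in one transaction: either the order
    # and all of its items are saved, or nothing is.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer) VALUES (?)",
            (payload["customer"]["name"],),
        )
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_items (order_id, sku, quantity) VALUES (?, ?, ?)",
            [(order_id, item["sku"], item["quantity"]) for item in payload["items"]],
        )
    return order_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
body = json.dumps(
    {
        "customer": {"name": "Ada"},
        "items": [{"sku": "A-1", "quantity": 2}, {"sku": "B-2", "quantity": 1}],
    }
)
order_id = save_order(conn, body)
```

There is nothing clever in it, just a mapping from nesting to foreign keys, and there are a great many public examples of the same pattern.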

When you're doing something less common it has a lot more trouble. It does seem a lot better at working on my minecraft mod project than it was six months ago. That's probably due in part to scraping the public repo of the mod itself. It has a rough image of the working endpoints without needing to look at any context.

I suspect that offline models will become good enough in the near future that large legacy projects will be able to fine tune a model against the codebase.

100ProofTollBooth Dumber than a man, but faster than a dog. DradisPing 4mo ago

This is an angle I wish I would've thought to include in my original post. That of LLMs as very, very, very, very good targeted search engines. That's, actually, probably where the most immediate disruption will occur. There's a graph going around of StackOverflow traffic and its decline is remarkable.

curious_straight_ca 100ProofTollBooth 4mo ago

This is all true now, but your '5 to 7' year timeline seems long! LLMs were not anywhere near where they are now 2 years ago, and with simple extrapolation 2 years from now I think it's as likely as not they'll be able to handle legacy code just fine, just like humans can.

100ProofTollBooth Dumber than a man, but faster than a dog. curious_straight_ca 4mo ago

Without any constraints, I agree with you. I think we're going to hit data center and power availability constraints, however. And we're already seeing Luddite political resistance to building out capacity.

zeke5123a curious_straight_ca 4mo ago

And my 8-year-old will be a giant in 8 years. I think the whole debate is whether LLMs are slowing down on improvements (i.e. can you really simply extrapolate 2 more years).

SubstackNecron zeke5123a 4mo ago

Are they not already shown to be slowing down in improvements?

aqouta SubstackNecron 4mo ago

I don't know how anyone could possibly prove or disprove this statement. The levels of improvement are difficult to quantify and they keep needing to come up with new benchmarks because the old ones get saturated, meaning all frontier models max them out. I can say from personal experience that they still appear to be rapidly improving but quantifying the rate is impossible.

zeke5123a SubstackNecron 4mo ago

That’s what I thought but don’t know for certain.

yofuckreddit 100ProofTollBooth 4mo ago

I wish I had the time to respond more deeply to both this thread and birb's. But I almost completely agree with you all.

I'm now of the opinion that I'll be able to amass enough capital to retire before AI takes my job.

Legacy codebases simply require too big of a context window for an AI to appreciably absorb, and they never have high enough quality documentation available even if it could. They also often deal with systems that simply aren't connected unless you provide them agentic capabilities that don't exist yet. That's assuming you can get your security guy, whose entire job is to say "no" endlessly without ever providing solutions to problems, to let a robot crawl through your network and source code.

I'm literally proposing a 7-8 figure project around this tomorrow. There's a 30+ year-old AS/400 working alongside thousands of Access databases, hundreds of SQL Servers, and dozens of client applications. Each of these abstraction layers has hidden business logic, various interchange patterns, and bugs that have become features over 3 decades.

Even once we've disentangled a tiny part of the elephant to improve the state of play and build modern REST APIs to serve various client applications and partners, then there's a cadre of humans who have worked with these systems for so long that they need to gently be brought into the light, which takes far more money and time than anyone would care to admit.

This is one company of thousands. They're a little behind the baseline, but not by much. I've got another decade of dealing with the hardest problems in software ahead of me if I can keep stomaching how endlessly frustrating it can be.

100ProofTollBooth Dumber than a man, but faster than a dog. yofuckreddit 4mo ago

There's a 30+ year-old AS/400 working alongside thousands of Access databases, hundreds of SQL Servers

Please provide a trigger warning before typing this out so that SysAdmins can make sure they have their therapy anime body pillow with them.

ActuallyATleilaxuGhola Axolotl Tank Class of '24 100ProofTollBooth 4mo ago · Edited 4mo ago

This is interesting and timely for me. We have a legendarily dysfunctional QA team at my company, and as the DevSecSREInfraPlatformOps manager, my biggest beef with them is that they have been killing our Lead Time to Release (time between dev having an idea and that idea getting released as a feature for customers). They manually test almost everything, do nonsensical "verifications" (mocking responses from external APIs when they could just... use the APIs), and don't know how to code at all. Dev and automated testing might finish in a few days, but QA takes 2-3 weeks(!) and they batch multiple changes together which confounds results. They have been given a half dozen opportunities to change and learn but they have always made excuses or refused. At one point they even had convinced a (former) director that they needed outsourced QA members to help with the workload -- and then promptly shifted 95% of their work to the outsourced QA!

Just this past week, I asked for a Claude Code account and started trying to vibe code a replacement for our QA team, partly out of curiosity, partly out of necessity (we have an OKR to reduce lead time), and just a little bit out of spite. I was not optimistic because this is a very poorly documented 10+ year old codebase cobbled together by devs who have all since resigned to escape the mess they've created.

First, I told Claude to pull down all the test suites described in Qase and cache them locally. Then I told it all of the paths to local copies of our frontend, backend, mobile, and infra repos. I asked it to analyze each one. Then, I asked it to begin writing tests for each Qase suite, starting with the simplest ones like "login." Sometimes it would get confused (it shows you its thinking) and I would interject a message (you can send messages while it's thinking, unlike other LLMs) to explain some important bit of knowledge. Eventually, after repeating the same info several times which had apparently gotten lost in the context window compaction process, I told it to create a file called TRIBAL.md to record all of these contextual bits I was telling it that were not evident from simply reading the code. I also had it write a CLAUDE.md that points to all the repos, instructs it to read and update TRIBAL.md, BUGS.md and TODO.md, and contains descriptions of other tools it has created for itself (helper functions, data seeding scripts, env vars and credentials, etc).

So far I don't think I've written a single line of code and it has automated 85% of web QA tests. I have had the tests vetted by CodeRabbit and I plan to check them manually before release. I am quite impressed, though I'm still curious to see how it will try to handle mobile testing. Claude Code really is next level compared to Gemini, Copilot, or Grok.

All that said, I am very aware that this project is probably riddled with false assumptions and nonsense code. I am still not optimistic about the final result, although given how dire QA is, our director might try it out anyway just to see if we can reduce our dependence on them. Either way, it's been a good way to get familiar with a SOTA coding LLM.

phailyoor 100ProofTollBooth 4mo ago

As a related thing, I saw this article/video about vibe coding: https://atmoio.substack.com/p/after-two-years-of-vibecoding-im

I think I really agree with one of the core points, which is that the AI agent is really really good at making the diff it proposes look good, especially to the person who asked for it. But in the perspective of the entire project, or from the perspective of someone who didn't ask for those changes, the code is unbelievably retarded. In some sense, AI slop is a scissor statement cutting between the person who asked for it (who the AI is trying to please) and everyone else in the world.

I've had a lot of success vibe coding extremely self contained components for a larger application, such as a static web page that wraps an already existing api endpoint that I don't need to add more features to - the endpoint is stable and done. But on the other hand when AI makes changes deep into the internals of business logic, the code is absolutely dogshit garbage.

My assumption is that any large-ish vibe coded application that's not a bunch of self-contained parts is going to be completely unmaintainable, halfway broken, and just all around awful.

GeneralElephant phailyoor 4mo ago · Edited 4mo ago

You can have the LLM describe the business logic as it's developing and create unit tests to confirm the output is correct and pair that with manual testing as well.

I've created an application with fairly involved business logic using Claude Code and it works really nicely - it simply required using good ol' fashioned SDLC, aka Design, Code, Test, Document.

Is it an enterprise level app with the ability to scale to 10000 concurrent users and SOC2 Authentication and privacy standards? No.... but those types of big applications require multiple teams. If I gave four or five competent software engineers Claude and divided up the work I think you could rebuild a complex application in a lot shorter time.

The idea that you can't use LLMs on business logic is like saying you can't use a car to drive across the country - sure it was true in 1920 but today with the interstate it's easy. Right now it's 1930 or so... but it won't remain 1930 forever.

phailyoor GeneralElephant 4mo ago

The LLM will cheat and write tests that don't test what they claim to, bypass tests when they fail, write docs that make zero sense, write code that doesn't follow the docs etc.

To be fair humans will do all of these things too. But the code that LLMs write is mind numbingly retarded. If anyone with a modicum of coding experience took a look at your slop app, he would only see an awful, unmaintainable mess. And every AI change you throw at it only increases the debt.

As I said, AI slop is a scissor statement cutting you, the slopmeister, off from everyone else in the world.
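To make the "tests that don't test what they claim to" failure mode concrete, here is a minimal sketch. The discount rule, the function names, and the deliberately broken implementation are all hypothetical, made up purely for illustration. The point is only that a test which checks the return type stays green for a wrong implementation, while a test that pins concrete values does not.

```python
from typing import Callable

# Hypothetical business rule, invented for this example:
# members get 10% off orders of $100.00 or more (amounts in cents).
Rule = Callable[[int, bool], int]


def apply_discount(total_cents: int, is_member: bool) -> int:
    if is_member and total_cents >= 10_000:
        return total_cents - total_cents // 10
    return total_cents


def broken_discount(total_cents: int, is_member: bool) -> int:
    # Deliberately wrong: never applies the discount.
    return total_cents


def vacuous_test(rule: Rule) -> bool:
    # Claims to cover the member discount, but only checks the return type,
    # so it passes for nearly any implementation.
    return isinstance(rule(20_000, True), int)


def real_test(rule: Rule) -> bool:
    # Pins expected values, including the boundary and the non-member case.
    return (
        rule(20_000, True) == 18_000
        and rule(10_000, True) == 9_000
        and rule(9_999, True) == 9_999
        and rule(20_000, False) == 20_000
    )


if __name__ == "__main__":
    for name, rule in [("correct", apply_discount), ("broken", broken_discount)]:
        print(name, "vacuous:", vacuous_test(rule), "real:", real_test(rule))
```

Running a suspect test against a known-bad implementation like this is a cheap manual check on whether the test asserts anything at all.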

GeneralElephant phailyoor 4mo ago

The LLM will cheat and write tests that don't test what they claim to, bypass tests when they fail, write docs that make zero sense, write code that doesn't follow the docs etc.

OK but that's why you do manual testing... also you can always ask other models to check on the work.

If anyone with a modicum of coding experience took a look at your slop app, he would only see an awful, unmaintainable mess.

IDK I shared the code with my friend who's a Staff Engineer and he was pretty impressed especially since I've been really diligent about implementing good practices like separation of concerns, test driven development, DRY, determining architecture and reviewing it before implementing etc.

I'm not a moron - I've been a technical PM for 10 years now working on enterprise software and I've had plenty of experience reviewing my devs' PRs and doing my own bug investigations as well as evaluating engineering solutions.

This dismissive attitude is only going to get less accurate as time goes on and models improve.

ulyssessword GeneralElephant 4mo ago

I've been really diligent about implementing good practices like...

I wonder how long it'll be before "implementing" those practices is as simple as writing a good initial prompt for the coding agent to follow. And how long after that before "do it well" (or nothing at all) would be sufficient for it to follow those practices by default.

Remember that LLM capabilities will only improve over time (barring severe government action, at least). Also remember that GPT-3 was released in 2020: Getting all the low-hanging fruit (never mind all the incremental improvements) from a novel technology in six years would be a fantastic achievement, so I don't think we're anywhere close to done.

Markass Not the worst 100ProofTollBooth 4mo ago

(Note: GDPR requirements in Europe are close to impossible to actually meet, so many b2b companies either don't sell to Europe or will only sell them access to their software hosted on U.S. servers. It is impossible to overstate how much of an own goal GDPR was for Europe's tech sector).

Europe's strategy seems to be to bring down the U.S. tech sector by attempting to impose more onerous regulations like GDPR on it. See, for example, the "Online Safety Act" and how the UK's Ofcom is unsuccessfully enforcing it by emailing threats to American companies, notably ones not under UK jurisdiction.

Corvos Markass 4mo ago · Edited 4mo ago

No, the obvious answer is the true one here. Europe and the UK really really hate that the fundamental, society-altering technology that all of their citizens are using >5hrs a day is completely out of their control, as is the AI that they are hoping will become the new basis of their economy. And they are fundamentally incapable of conceiving that the answer might be less regulation rather than more. The closest American example is when America legislated the sale of TikTok (did that ever go through?).

I personally have mixed feelings about this. Having your public places under the control of another country is in some ways safer than having them under the control of your own country - broadly I like that Musk can tell Starmer to take a long walk off a short pier. But this cuts both ways, and I don't blame the various governments involved for being antsy around it.

FirmWeird Randomly Generated Reddit Username Corvos 4mo ago

No, the obvious answer is the true one here. Europe and the UK really really hate that the fundamental, society-altering technology that all of their citizens are using >5hrs a day is completely out of their control, as is the AI that they are hoping will become the new basis of their economy. And they are fundamentally incapable of conceiving that the answer might be less regulation rather than more.

This is in no way the obvious answer. The actual reason Europe and the UK hate the US tech companies, especially recently, is that the first amendment allows for freedom of speech which European governments absolutely cannot abide - exposure of the scandals of the elite and what they are doing is anathema to the corrupt and honestly evil governments that they have in place (see the recent disclosures about Peter Mandelson). Less regulation would in no way achieve their goals of censoring speech and keeping their population ignorant, which is why they are simply trying to use their existing powers to shut down foreign sources of uncensored communication services.

The closest American example is when America legislated the sale of TikTok (did that ever go through?).

Yes, it did, and users are now abandoning the platform in droves due to the removal of pro-Palestinian content, the mandatory amplification of Trump/Zionist content and censorship which means private messages containing the word "Epstein" cannot be sent.

HereAndGone2 FirmWeird 4mo ago

Gosh, I'm so glad all the Americans are explaining to us poor benighted Europeans how it is that we hate Mom and apple pie.

Here was me thinking GDPR, massive pain in the backside though it is, was to prevent data scraping and turning customers into commodities by selling every single jot and tittle of information you hand over to these companies.

Nope, it's because bald eagle screech as it flies overhead, Marine Corps march by, Star Spangled Banner flies proudly in the wind as 'America the Beautiful' is sung by the Tabernacle Choir we hate all the good things!

FirmWeird Randomly Generated Reddit Username HereAndGone2 4mo ago

all the Americans

I'm not an American.

Here was me thinking GDPR, massive pain in the backside though it is, was to prevent data scraping and turning customers into commodities by selling every single jot and tittle of information you hand over to these companies.

Nope, it's because bald eagle screech as it flies overhead, Marine Corps march by, Star Spangled Banner flies proudly in the wind as 'America the Beautiful' is sung by the Tabernacle Choir we hate all the good things!

Did you even read my post? Your comments here have nothing to do with what I actually wrote, which was about controlling communication and speech. European elites, especially in the UK, have a deep vested interest in the restriction of speech and the control of their society through it. Are you aware of the D-notice system that's in use by the UK government, where the government can simply order media organisations not to report on certain subjects? More pertinent to the thread at hand, are you aware of Ofcom and their attempts to censor American websites to try and censor information that the UK government doesn't like?

Skeletor HereAndGone2 4mo ago

Here was me thinking GDPR, massive pain in the backside though it is, was to prevent data scraping and turning customers into commodities by selling every single jot and tittle of information you hand over to these companies.

No one has ever explained how this actually hurts me or why I should really and truly give a shit. Meanwhile Europe is now a tech backwater with no real way to affect the world it's awash in other than to issue fines.

Corvos Skeletor 4mo ago

The surveillance imposed on us today far exceeds that of the Soviet Union. For freedom and democracy’s sake, we need to eliminate most of it. There are so many ways to use data to hurt people that the only safe database is the one that was never collected. Thus, instead of the EU’s approach of mainly regulating how personal data may be used (in its General Data Protection Regulation or GDPR), I propose a law to stop systems from collecting personal data.

[...]

The EU’s GDPR regulations are well-meaning, but do not go very far. It will not deliver much privacy, because its rules are too lax. They permit collecting any data if it is somehow useful to the system, and it is easy to come up with a way to make any particular data useful for something.

The GDPR makes much of requiring users (in some cases) to give consent for the collection of their data, but that doesn’t do much good. System designers have become expert at manufacturing consent (to repurpose Noam Chomsky’s phrase). Most users consent to a site’s terms without reading them; a company that required users to trade their first-born child got consent from plenty of users. Then again, when a system is crucial for modern life, like buses and trains, users ignore the terms because refusal of consent is too painful to consider.

To restore privacy, we must stop surveillance before it even asks for consent.

Richard Stallman (you may have heard of him)

Skeletor Corvos 4mo ago

Richard Stallman (you may have heard of him)

Did he ever write anything more relevant to the subject than this near-meaningless "freedom and democracy" vaguepost?

The_Nybbler If you win the rat race you're still a rat. But you're also still a winner. Corvos 4mo ago

That's not Stallman explaining, that's Stallman pontificating. There are certainly modern problems with surveillance, but I think few of them have anything to do with data collection by companies on the Internet. Constant tracking of my physical location directly by the government (e.g. EZ-Pass, street cameras, license plate cameras, tire RFID readers if they're real, etc) or proxies (cell phone, private CCTV, etc) seems a lot more dangerous.

Stefferi Chief Suomiposter HereAndGone2 4mo ago

You'd think that the whole Tea App debacle - Tea App not having been usable in its purpose in Europe due to being obviously wildly GDPR incompliant - would have shown that there are in fact reasons for GDPR other than just hobbling the US tech sector.

ArjinFerman Tinfoil Gigachad Stefferi 4mo ago

You'd think that the EU making more money from fining American companies than from taxing its own tech sector would have shown that it is about hobbling the US tech sector, and the Tea App debacle is just a happy coincidence.

MadMonzer Temporarily embarassed liberal elite FirmWeird 4mo ago

(see the recent disclosures about Peter Mandelson).

While there is some validity to the general point, the idea that Mandelson/Epstein is an example of a specifically European need for censorship to conceal elite depravity is silly - the decision to do the Epstein cover-up was taken in the US, and the British have promptly thrown every Epstein associate under the bus as soon as the Americans allowed their involvement to become public. The deroyalling of Andrew Mountbatten-Windsor and banning from the financial industry of Jes Staley remain the only meaningful punishments of Epstein clients, and Mandelson is now the subject of a criminal investigation. The reason why the UK police investigation into Mandelson only just started is because the Americans kept details of his wrongdoing secret in order to protect US elites who participated in his crimes - in the case of Mandelson particularly, Jamie Dimon, and in the case of the Epstein files more broadly Donald Trump.

90+% of what European authorities want to censor is either accurate information about the harm caused by immigration, or malicious lies exaggerating the harm caused by immigration. And, of course, the reason why free speech is an issue in the first place is the difficulty in distinguishing between the two.

Corvos FirmWeird 4mo ago · Edited 4mo ago

That is the same thing that I said, in much more polemical language, but it's only part of the story. Yes, various European and non-American (Aussie, UK, Canada) governments are very upset that, from their perspective, unfortunate dirty laundry is being aired in public. Some of them surely have things they would like to hide, others rightly or wrongly believe that the country would be better off and less febrile if matters weren't presented in a maximally inflammatory way and optimised for engagement.

But there are also lots of other things that people are concerned about. They really don't like the effect that addictive Instagram and TikTok etc. are having on the ability of young people to concentrate or socialise, they don't like Grok in general and the nudifying features in particular, etc.

Ultimately, both voters and governments generally prefer for regulation to be possible, even if they decide not to do it. Having a big part of life subject to the whims of Washington and Silicon Valley rubs people the wrong way.

FirmWeird Randomly Generated Reddit Username Corvos 4mo ago

I actually agree with you that a lot of people are concerned about the impacts of these apps and tech companies - I try to minimise their impact on my own life and my (as of yet hypothetical) children will never be given unsupervised access to this kind of tech. But the problem is that as someone who lives in one of those nations (Australia), I can see the actual impact and effects of the legislation - which is to do absolutely nothing to stop the pernicious effects of social media, while at the same time forcing anyone who wishes to comment or provide input to the online conversation to provide their face and/or government ID.

I do agree that there should be regulation targeting these apps and that ultimately it would be a good thing for that to happen - the problem is just that the implementation has consistently done nothing of the sort and only really makes sense if viewed from a conspiratorial lens. While it is possible that the government is just incompetent, I don't really trust that they would make mistakes that coincidentally give them the identity of anyone making comments that they don't like on social media, and especially after explicitly saying that they wanted to end anonymous online comments - which Anthony Albanese has actually done.

Southkraut The rain fell gentlier. FirmWeird 4mo ago

We Euros love regulation and censorship both. It's not just one or the other.

100ProofTollBooth Dumber than a man, but faster than a dog. Markass 4mo ago

Yeah, it is funny to see that underneath all that socialism and all that postmodern philosophical masturbation, Europe really still believes in feudalism and is furious that us new world peasants won't pay the King's Tax! Don't we know that they are our betters!

Corvos 100ProofTollBooth 4mo ago · Edited 4mo ago

I don't know how to say this but you're the richest and most powerful people in the world. This kind of discussion always turns into a Bravery Debate but regulation like GDPR is more about clawing back some agency from America than it is trying to tax US industry.

As the Right discovered five years ago, and the Left discovered when Musk bought X, network effects and the overall stack just don't allow for 'make-your-own' social media.

(I don't actually like or agree with the vast majority of this regulation, though I think that GDPR specifically was a step in the right direction of forcing companies to give more than absolutely zero shits about the privacy of their customers).

Normally I wouldn't be quite so thin-skinned but the Greenland fiasco drove home for me just how worrying it is that half of the most powerful country in the world thinks of us as being essentially a pantomime villain from a Mel Gibson movie.

Markass Not the worst Corvos 4mo ago

GDPR specifically was a step in the right direction of forcing companies to give more than absolutely zero shits about the privacy of their customers

"Nearly zero" is more than absolutely zero. GDPR doesn't do much to stop collection of personal data because nearly everything is allowed if it can be useful. Meanwhile, the cost of failing compliance is steep enough that most tech companies go to the US where they don't have to worry about it.

It's also hard for me to take Europe's claims to privacy protection seriously when many of their countries force you to dox yourself to register a SIM card. If they actually cared about privacy, then they would just not collect personal information, which is unnecessary for a phone number. Ironically enough, the GDPR-free United States does not have any such laws compelling self-doxing for SIM cards.

aqouta Corvos 4mo ago

I think that GDPR specifically was a step in the right direction of forcing companies to give more than absolutely zero shits about the privacy of their customers

People will say stuff like this but GDPR is actually a gigantic pain in the ass to everyone involved because it means every single database that holds any data about anything has to be manually cleared by engineers to not happen to obliquely contain data that could be viewed as slightly about Europe. I'm going to have to get on early morning calls for the next six months to get our US-facing, entirely internal application dealing in US tax credits cleared because of this stupid law. All while the fly-by-night company registered in kekistan that will actually do malicious stuff with your data just ignores the law, and all the apps that were collecting it on purpose before put up a cookie banner that 99.98% of people accept immediately with minor annoyance. The legislators behind this should be tried at the Hague for pissing away thousands or millions of lifetimes worth of dev hours for their pure hubris.

Skeletor Corvos 4mo ago

That's cool, but your tech sector is a joke and since we're not trying to hand the entire internet over to China we'd appreciate it if you'd stop getting your anti-tech cooties on us. Sorry if casual access to accurate rape stats is destabilizing your society or whatever.

Corvos Skeletor 4mo ago

It wasn't China who gave us trans, BLM, 'hands up don't shoot' in a country with no guns, Free Palestine, and woke. That was you guys. Thanks :)

Skeletor Corvos 4mo ago

Look at it this way, the alternative to being caught in our wake was to do anything yourselves, and at least you didn't get suckered into that scam.

The_Nybbler If you win the rat race you're still a rat. But you're also still a winner. Corvos 4mo ago

I'm pretty sure we're not responsible for trans (Sweden) or "Free Palestine" (Arabs)

ThenElection 100ProofTollBooth 4mo ago

us new world peasants won't pay the King's Tax

We do, though. In 2024, US tech companies paid more in fines alone (€3.8 billion) than the income tax revenue of the entire European tech sector (€3.2 billion).

https://atr.org/brussels-exploits-american-tech-companies-by-enforcing-heavy-fines-for-regulatory-non-compliance/

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi 100ProofTollBooth 4mo ago · Edited 4mo ago

Disclaimer: I am not a programmer, though I keep myself broadly aware of trends. I've only used LLMs for coding for toy problems or personal tooling (AutoHotkey macros, Rimworld mods, a mortar calculator, automating discharge paperwork at my dad's hospital)*. I've noted that they've been excellent at explaining production code to even a dilettante like me, which is by itself immensely useful. And for everything else, I'm so used to the utility they provide me personally that I can't imagine going back.

That being said, I am not in a position to judge the economic utility a professional programmer derives from using it for their day job, though it's abundantly clear that the demand for tokens is enormous, and that the capability of SOTA LLMs is a moving target, getting better every day on both benchmarks and actual projects. And look, I understand there's a position where you say "sure, but these things still aren't actually good" - but if you're claiming they haven't gotten better, then I'm going to gently suggest you might want to check yourself for early-onset dementia. The jump from GPT-3 barely coding a working React toy-example to current models is the kind of improvement curve that should at minimum make you sit up and notice.

In other words, even if you think they're not good enough today, you should at the very least notice that a large and ever-increasing fraction of US GDP is being invested in making them better, with consistent improvements.

However, here's a tweet from Andrej Karpathy which I will reproduce in full:

@karpathy A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels more fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:

What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows a lot.
Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
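The "write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness" advice from the Leverage note can be sketched roughly as below. This is only an illustration under my own assumptions: the problem (maximum sum of a non-empty contiguous subarray), the function names, and the random-input check are mine, not anything from the tweet. The slow version acts as the success criterion and the fast version is the candidate an agent would be asked to produce.

```python
import random


def max_subarray_naive(xs: list[int]) -> int:
    """O(n^2) reference: try every non-empty contiguous slice."""
    best = xs[0]
    for i in range(len(xs)):
        running = 0
        for j in range(i, len(xs)):
            running += xs[j]
            best = max(best, running)
    return best


def max_subarray_fast(xs: list[int]) -> int:
    """O(n) candidate (Kadane's algorithm); assumes xs is non-empty."""
    best = current = xs[0]
    for x in xs[1:]:
        # Either extend the running slice or start a new one at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


def agree_on_random_inputs(trials: int = 500, seed: int = 0) -> bool:
    """Success criterion: the candidate must match the reference on random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-20, 20) for _ in range(rng.randint(1, 30))]
        if max_subarray_naive(xs) != max_subarray_fast(xs):
            return False
    return True


if __name__ == "__main__":
    print("candidate matches reference:", agree_on_random_inputs())
```

Agreement on random inputs is evidence, not proof, so a check like this narrows what a human has to review without replacing the review.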

*Sadly they can't make the Rimworld mods I want. This is a combination of a skill-issue on my part (people have successfully made rather large and performant mods with AI), and because I wanted something niche as hell, in the form of compatibility with a very large overhaul mod called Combat Extended. Hey, at least Nano Banana Pro made the art assets with minimal human input, if you think my coding skills are trash, wait till you see my art.

birb_cromble self_made_human 4mo ago

I can approach code that I couldn't work on before because of knowledge/skill issue.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media.

I find it peculiar that Karpathy doesn't see a relationship between those two things. I've noticed a trend where the most glowing reviews of AI capabilities seem to come from people who are using it in areas where they, themselves, do not have enough skill to confidently perform the task. At its worst, it's a sort of tool-assisted Dunning-Kruger effect that's actually breathtaking if you can decouple enough to look at it in the abstract.

"I clearly couldn't do this thing but I can clearly tell that Grok/Claude/ChatGPT/Gemini did it right" is a hell of a thing. It's already causing real stress on public software security databases. There's a continuous trend that looks something like this:

"I ran $LIBRARY through Claude and it says there was a potential denial of service attack. I asked Claude for a mitigation and it provided this code."

"Can you explain this piece of the code?"

"I asked Claude to explain this piece of code and it said the following."

Other than feeling like a scene from Office Space, it's effectively acting as a denial of service attack on its own. The amount of time necessary to submit something like that without a deep understanding of the problem is considerably lower than the time necessary for a genuine SME to comb through it and judge it on merits.

Hell, if I were a malicious actor I'd probably try to exploit that by shit-flooding the system to buy myself more time.


self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi birb_cromble 4mo ago

I find it peculiar that Karpathy doesn't see a relationship between those two things.

Hmm? That's not my takeaway from the tweet (xeet?). He's not denying a connection between AI capabilities and code quality decline, he's making a more subtle point about skill distribution.

The basic model goes like this: AI tools multiply your output at every skill level. Give a novice programmer access to ChatGPT, Claude or Copilot (maybe not Copilot, lol), and they go from "can't write working code" to "can produce something that technically runs." Give an expert like Karpathy the same tools, and he goes from "excellent" to "somewhat more excellent." The multiplicative factor might even be similar! But that's the rub, there are way more novices than experts.

So you get a flood of new code. Most of it is mediocre-to-bad, because most people are mediocre-to-bad at programming, and multiplying "bad" by even a generous factor still gives you "not great." The experts are also producing more, and their output is better, but nobody writes news articles about the twentieth high-quality library that quietly does its job. We only notice when things break.
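A rough sketch of that model in Python, for anyone who wants to poke at it. Everything here is an assumption made up for illustration: the headcounts, the skill scores, the multiplier, and the two thresholds (`runs_at` for "technically runs", `good_at` for "actually good"). The only point is that the same multiplier applied to a large novice pool and a small expert pool changes what gets published.

```python
def published_mix(groups, multiplier, runs_at=1.0, good_at=2.0):
    """Count published work as (slop, good) for a list of (headcount, skill) groups.

    All numbers are hypothetical. Work is only published if it at least runs;
    it counts as good only if it clears the higher bar.
    """
    slop = good = 0
    for headcount, skill in groups:
        quality = skill * multiplier  # the tool multiplies everyone's output quality
        if quality < runs_at:
            continue  # does not even run, so it never gets published
        if quality >= good_at:
            good += headcount
        else:
            slop += headcount
    return slop, good


# Hypothetical population: many novices, few experts.
groups = [(1000, 0.3), (10, 2.0)]

before = published_mix(groups, multiplier=1.0)  # no AI tools
after = published_mix(groups, multiplier=4.0)   # same multiplier for everyone
print(before, after)
```

With these made-up numbers the experts publish good work in both cases, but the novices only cross the "technically runs" bar once the multiplier kicks in, so the published pool goes from all-good to mostly slop without anyone getting worse.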

This maps onto basically every domain. Take medicine as a test case (yay, the one domain where I'm a quasi-expert). Any random person can feed their lab results into ChatGPT and get something interpretable back. This is genuinely useful! Going from "incomprehensible numbers" to "your kidneys are probably fine but your cholesterol needs work" is a huge upgrade for the average patient. They might miss nuances or accept hallucinated explanations, but they're still better off than before.

Meanwhile, as someone who actually understands medicine, I can extract even more value. I can write better prompts, catch inconsistencies, verify citations, and integrate the AI's suggestions into a broader clinical picture. The AI makes me more productive, but I was already productive, so the absolute gains are smaller relative to my baseline. And critically, I'm less likely to get fooled by confident-sounding nonsense (it's rare but happens at above negligible rates).

This is where I tentatively endorse a "skill issue" framing, where everyone's output is getting multiplied, but bad times a multiplier is still usually bad, and there are simply more bad actors (in the neutral sense) than good ones. The denominator in "slop per good output" has gotten larger, but so has the numerator, and the numerator was already bigger to start with. From inside the system, if you're Andrej Karpathy, you mostly notice that you're faster. From outside, you notice that GitHub is full of garbage and the latest Windows update broke your system.

This isn't even a new pattern. Every productivity tool follows similar dynamics. When word processors became common, suddenly everyone could produce professional-looking documents. Did the average quality of written work improve? Well, the floor certainly rose (less illegible handwriting, if I continue to accurately insult my colleagues), but we also got an explosion of mediocre memos and reports that previously wouldn't have been written at all. The ceiling barely budged because good writers were already good. I get more use out of an LLM for writing advice than, say, Scott.


jkf 100ProofTollBooth 4mo ago

If it's all hype, it is the mother of all hype cycles and something that approaches a mass movement of hysteria. This would be outright falsehoods and lying on a level usually reserved for North Korean heads of state and Subsaharan cult leaders.

Or, like -- every western government except maybe Sweden, 4 years ago?

I'm not really kidding, but to engage with the meat of your argument -- translating natural language documentation to machine code is literally what programming is, and always has been.

If you have perfect documentation, the coding is trivial; so if LLMs can add another layer to this and become essentially a somewhat easier/more efficient programming language, that's great -- but it doesn't so far seem like they are particularly good at generating that documentation based on (complex, real-life) non-technical enduser requirements for broad problems. Which has been the Hard Problem of Programming at least since Fred Brooks.

If a programmer can say to an LLM "hey build me a Salesforce clone based on such-and-such requirements" and make it happen, that is a pretty big efficiency gain, but not really AI. Which would be a pointy-haired boss saying "hey build me this thing I thought of that doesn't currently exist, but is Salesforce scale" and making it happen; this would be kind of scary.


birb_cromble 100ProofTollBooth 4mo ago

This feels like it's a less shitposty and thoroughly expanded version of my "Uber for artisanal cheeses, but on the blockchain" theory that I had.

Our flagship application has seen continuous development since the mid to late 2000s, and it's loosely based on a codebase and product that is considerably older than that. While it has CRUD elements (any application that functions as a long-running service must), it has some fairly extensive components that actually do things with that data in terms of business automation. Those are the areas where all the existing LLM solutions tend to fall apart. Given that they're statistical engines, going farther from CRUD is a very bad thing.

Bravo, Birb! I mean this sincerely. Phrased differently, Birb is saying that once his team provided extra-context documentation, the LLM was performant. However, by doing so, his team pretty much arrived at a state where the fix was obvious and easy.

I'm not sure if I can fully buy into this. It wasn't that we were surfacing implicit context, so much as writing it for a very enthusiastic intern developer with absolutely no sense of self preservation. If we didn't break tasks down to an absurd level of guardrails and hand-holding, it would try to make enormous, system wide changes without any kind of midpoint validation. Sometimes we'd see the reasoning say things like "I have made a large number of changes. I should run unit tests to verify that I am correct", and then it just... wouldn't do it. Any of the server developers could have finished the full task in the time it took us to make the tickets that allowed the LLM to do the job without going off the rails.


100ProofTollBooth Dumber than a man, but faster than a dog. birb_cromble 4mo ago

If we didn't break tasks down to an absurd level of guardrails and hand-holding, it would try to make enormous, system wide changes without any kind of midpoint validation.

Yep, I've seen this too. I have to ask, were you using any of the terminal-based tools for code development (i.e. Claude Code)? I know you said you were using Gemini, so I am doubting it was actually Claude Code (although you can run Gemini within CC).

There is a lot of guardrailing and handholding built into these tools. If I pass a full system design doc to Claude Code and explicitly instruct it to do TDD with unit tests etc., it will.

It wasn't that we were surfacing implicit context, so much as writing it for a very enthusiastic intern developer with absolutely no sense of self preservation.

LLMs aren't beings, people, or minds. If you think of it as having intention and character flaws, you're going to get frustrated quickly. If you think of it as a very imperfect and probabilistic tool that outputs into non-deterministic solution spaces, you'll get less frustrated and probably think differently on how you prompt it.

I am an unrepentant AI bull. I'll admit that and let people judge whatever I write with that bias in mind. I only request the same from the bears. When I see sentiment like this, which literally chastises a matrix of numbers, I have to assume a non-neutral bias.


confidentcrescent 100ProofTollBooth 4mo ago

LLMs aren't beings, people, or minds

When I see sentiment like this, which literally chastises a matrix of numbers, I have to assume a non-neutral bias

Most, if not all, of the prominent companies in this space call their products "artificial intelligence" and advertise users treating them like people. They refer to them thinking, having skills, and doing things.

It is extremely frustrating to see an accusation that the above poster has an anti-AI bias for treating LLMs as advertised by many of the companies selling them.

A quick browse through the marketing materials of these companies will turn up many examples, like:


100ProofTollBooth Dumber than a man, but faster than a dog. confidentcrescent 4mo ago

Yeah I'm not going to base my evaluation of a product on the marketing materials.

Also, again, they're not minds. There are hundreds of high quality write ups of how transformer architectures work.


confidentcrescent 100ProofTollBooth 4mo ago

I will clarify my previous comment. I would like you to explain why expressing the same opinion as multiple large AI companies indicates a bias against AI.


self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi 100ProofTollBooth 4mo ago

LLMs aren't beings, people, or minds. If you think of it as having intention and character flaws, you're going to get frustrated quickly.

I disagree with you here.

Setting aside the deep philosophical questions about personhood (which threaten to derail any productive discussion), I claim that LLMs are minds - albeit minds that are simultaneously startlingly human and deeply alien. Or at minimum, they can be usefully modeled as minds, which for practical purposes amounts to the same thing. (I should note: this position doesn't commit me to "AI welfare" concerns, or to thinking LLMs deserve legal rights or protections, or to losing sleep over potential machine suffering. You can believe something is a mind without believing it has moral weight. I do, I'm an unabashed transhumanist chauvinist.)

More importantly, I think there's nothing wrong at all with modeling them as having "intention or character flaws." If you use a variety of models on a regular basis, like I do, I think that becomes quite clear.

They have distinct personalities and flavors. o3 was a bright autist with a tendency to go into ADHD hyperfocus that I found charming. GPT-4o was a sycophantic retard. 5 Thinking is o3 with the edges sanded down. Claude Sonnets are personable and pleasant, being one of the few models that I very occasionally talk to for the sake of it. Gemini 2.5 Pro was clinically depressed, 3 Pro is a high-functioning paranoid schizophrenic who thinks anything that happens after 2025 is a simulation. Kimi K2 was @DaseindustriesLtd 's best friend, which I noted even before he sang its praises, being one of the weirdest models out there, being ridiculously prone to hallucinations while still being sharp and writing in a distinctly non-mode-collapsed style that makes other models seem lobotomized by comparison. If I close my eyes, I can easily see it as a depressed vodka swilling Russian intellectual, despite being of Chinese origin.

If these aren't character flaws, I don't know what is. Obviously they're not human, but they have traits that are well-described by terms that are cross-applicable to us. They're good at different things: Claude and Kimi (and sometimes Gemini) write at a level that makes the others seem broken. That being said, almost every model these days is good enough at a wide spectrum of tasks. Hyperfocusing on benchmarks is increasingly unnecessary. Though I suppose, if you've got a bunch of Erdos problems to solve, GPT 5.2 Thinking at maximum reasoning effort is your go-to.


phailyoor self_made_human 4mo ago

nobody ever has any love for my best friend GPT-4.1


self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi phailyoor 4mo ago

Hey, I'm fond of it, and I'll miss it when the imminent deprecation hits. I literally never used it for coding, but I found that it was excellent at rewriting text in arbitrary styles, better than any SOTA model at the time, and still better than many. Think "show me what this scifi story would be like if it was written by Peter Watts".

I have no idea why a trimmed down coding-focused LLM was so damn good at the job, but it was. RIP to a real one.


100ProofTollBooth Dumber than a man, but faster than a dog. self_made_human 4mo ago

If these aren't character flaws, I don't know what is.

They're model weights. <-- This is a link.

That's literally, exactly, precisely what they are.

You can map your own preferred anthropomorphized traits to them all you want, but that's, at best, a metaphor or something. This is the same as when people say their car has a "personality." It's kind of fun, I'll grant you, but it's also plainly inaccurate.

They're good at different things

This is correct. But it is correct because of training data, hyperparameters, and a whole host of very well defined ML concepts. It's not because of ... personalities.


DaseindustriesLtd late version of a small language model 100ProofTollBooth 4mo ago

That's literally, exactly, precisely what they are.

So what?

@self_made_human proceeds to generate a lot of prose, but all he really needed to do was press for some substantiation of this argument. «Weights» is a word. What LLMs really are is information. Why exactly is this specific mode of information incompatible with having high-level properties like «personality flaws»? You accuse him of incoherence in the inane tiger side debate, but «models are weights, ergo anthropomorphized traits don't apply except as a loose metaphor» is basically schizophrenic in my book. What's the actual claim here? That anthropomorphic properties are substrate-dependent, that functionalism is wrong? Just say so instead of snarking and appealing to incredulity. Ideally with some defense for this opinion.


100ProofTollBooth Dumber than a man, but faster than a dog. DaseindustriesLtd 4mo ago

What's the actual claim here?

That "AI", more specifically, LLMs, shouldn't be thought of as minds or cognitively aware "beings" or any other such "conceptions" because we know exactly, precisely, specifically what they are.

I don't understand why this is so hard to understand.

Again, let's use a toy analogy. You see a house and say "That house is really a landscape for a family to build dreams. It's a compassion and bonding machine" Well, that's fine if it works for you, but what the house really is is a house. It's made of lumber, sheetrock, shingles, and various bits of metal and plastic. I have no problem with you dressing it up with whatever emotive map you like. But it's just a house. These other responses seem to be arguing that the basic definition of "house" should be discarded in favor of these highly subjective mappings.


DaseindustriesLtd late version of a small language model 100ProofTollBooth 4mo ago

I don't understand why this is so hard to understand.

Because it's either a non sequitur or a completely bizarre theory of cognitive awareness.

LLMs, shouldn't be thought of as minds or cognitively aware "beings" or any other such "conceptions" because we know exactly, precisely, specifically what they are.

In other words, only things for which we do not have this exact, precise, specific understanding can be minds or cognitively aware beings? So cognitive awareness intrinsic to X is conditional on our ignorance of the nature of X? Or a mind is inherently not-knowable? Or what?

I repeat, what's your actual argument here? I gave you some options.

You see a house and say "That house is really a landscape for a family to build dreams. It's a compassion and bonding machine" Well, that's fine if it works for you, but what the house really is is a house

This condescension is not helping. You are apparently vastly overestimating the quality of your ontology and epistemology. I hope you realize how frankly childish it is, using my helpful examples. A house is a house rather than a landscape not because we can precisely define a house, but because we can precisely define both a house and a landscape – or at least train an LLM to investigate embedding similarity – and see how the definitions do not intersect, and so applying the token "house" to a "landscape" or vice versa is purely metaphorical speech. We have a definition of an LLM. Do you have a rigorous definition of a mind that excludes LLMs on principled grounds?


self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi 100ProofTollBooth 4mo ago · Edited 4mo ago

They're model weights.

They're model weights, and we're collections of atoms: bags of meat and miscellaneous chemicals. Both statements are technically correct. And yet... a tiger being made out of atoms doesn't make it any less capable of killing you. The problem with pure reductionism is that ~~it throws out exactly the information you need to make predictions at the level you actually care about~~ can be a cognitively and computationally intractable approach, even if it's more "technically correct". Too much of it can be as bad as too little.

All models are false, some models are useful. That's a rationalist saw, but for good reason. What actually matters is whether a model constrains expectations, in other words, is it useful?

Gemini 2.5 Pro doesn't meet the DSM-5 or ICD-11 criteria for clinical depression. After all, it's hard for a model to demonstrate insomnia or reduced appetite. Yet the odd behaviors it regularly demonstrated are usefully described by that label.

If my friend let me drive his Lambo, and told me "be careful, she's fierce!", I'm going to drive more carefully than I would in a Ford Pinto. That is still, to some degree, useful, but I think it's clear that anthropomorphic analogies are more useful for LLMs, because they have more in common with us behavior-wise than any car (unless you're running Grok on your Tesla). They process language, they exhibit something that looks like reasoning, they have distinctive response patterns that persist across contexts.

But it is correct because of training data, hyperparameters, and a whole host of very well defined ML concepts. It's not because of ... personalities.

This is true in the same way that human behavior is fully determined by neurotransmitter levels, synaptic weights, and neurological processes. But just as you can't predict whether someone will enjoy a particular movie by examining their brain with an electron microscope or a QCD-sim, you can't accurately predict an LLM's macroscopic behavior by staring at its training corpus and hyperparameters. No human can.

Nobody at Google intended for Gemini 2.5 Pro to be "neurotic" and "depressed" or to devolve into a spiral of self-flagellation when it fails at a task, nobody wanted Kimi to hallucinate as regularly as it does. These were emergent, macroscopic properties, there's no equivalent of a statistical scaling-law that lets you accurately predict log-loss for a given number of tokens in a corpus and a compute budget.
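For reference, the kind of scaling law being contrasted here is usually written in a parametric form like the one below. This is the commonly cited shape from the compute-optimal training literature, not something stated in this thread, and the constants are fitted empirically per model family, so treat it as a sketch of the form only.

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C \approx 6ND
```

Here $L$ is the pretraining log-loss, $N$ the parameter count, $D$ the number of training tokens, $C$ the approximate training compute in FLOPs, $E$ an irreducible loss term, and $A$, $B$, $\alpha$, $\beta$ fitted constants. Nothing of this shape exists for predicting "neurotic" or "hallucinates a lot" from a corpus and a compute budget.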

Training models is still as much an art as it is a science, particularly the post-training and personality tuning phases (as explicitly done by Anthropic). You test your hypothesis iteratively, and adjust the dials as you go.

Anthropomorphism is a cognitive strategy. Like all cognitive strategies, it can be deployed appropriately or inappropriately. The question is not "is anthropomorphism ever valid?" but rather "when does anthropomorphic modeling produce accurate predictions?"

I maintain that, if applied judiciously, as I take pains to do, it's better than the alternative.


100ProofTollBooth Dumber than a man, but faster than a dog. self_made_human 4mo ago

Your response is incoherent throughout.

Right from the jump;

And yet... a tiger being made out of atoms doesn't make it any less capable of killing you.

As opposed to what? A tiger not made out of atoms? This isn't even a strawman, it's just a weird thing to say presented as an argument.

You completely lost me here;

All models are false, some models are useful. That's a rationalist saw, but for good reason. What actually matters is whether a model constrains expectations, in other words, is it useful?

Regarding;

They process language, they exhibit something that looks like reasoning, they have distinctive response patterns that persist across contexts.

That something looks like, sounds like, and walks like a duck doesn't always make it a duck. For example, is Donald Duck a duck? Well, we can say yes and know that he's a representation of a conception of a duck with human-like personality mapped onto him (see where I'm going ...) but it doesn't make him a duck made out of atoms - which seems to be, like, important or something.


self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi 100ProofTollBooth 4mo ago

As opposed to what? A tiger not made out of atoms?

We've only known that tigers are made out of atoms for a few hundred years. That is a fact of interest to biologists, I'm sure, but everyone else was and is well-served by a higher level description such as "angry yellow ball of fur that would love to eat me if it could." The point is that the more reductionist framework doesn't obviate higher-level models. They are complementary. Both models are useful, and differentially useful in practice.

(The tiger could be made out of 11-dimensional strings, or it, like us, could be instantiated in some kind of supercomputer simulating our universe, as opposed to atoms. This makes very little difference when the question is running zoos or how to behave when you see one lurking in your driveway.)

You think that LLMs being a "bunch of weights" makes ascribing a personality to them somehow incorrect. I don't see how that's the case, any more than someone arguing that humans (or tigers) being made of atoms precludes us from being conscious, being minds or having personalities, even if we don't know how those properties arise from atoms.

That something looks like, sounds like, and walks like a duck doesn't always make it a duck. For example, is Donald Duck a duck? Well, we can say yes and know that he's a representation of a conception of a duck with human-like personality mapped onto him (see where I'm going ...) but it doesn't make him a duck made out of atoms - which seems to be, like, important or something.

He's not Donald Goose, is he? Jokes aside, I'm not sure what the issue is here. Donald Duck on your TV is a collection of pixels, but his behavior can be better described by "short-tempered anthropomorphic duck with an exhibitionist streak" (he doesn't wear pants, probably on principle).

If you sit down to play chess against Stockfish, you can say "this is just a matrix of evaluation functions and search trees." You would be correct. But if you actually want to win, you have to model it as a Grandmaster-level opponent. You have to ascribe it "intent" (it wants to capture my queen) and "foresight" (it is setting a trap), or you will lose.

My point is basically endorsing Daniel Dennett's "intentional stance". Quoting the relevant Wikipedia article:

The core idea is that, when understanding, explaining, and/or predicting the behavior of an object, we can choose to view it at varying levels of abstraction. The more concrete the level, the more accurate in principle our predictions are; the more abstract, the greater the computational power we gain by zooming out and skipping over the irrelevant details.

The most concrete is the physical stance, the domain of physics and chemistry, which makes predictions from knowledge of the physical constitution of the system and the physical laws that govern its operation; and thus, given a particular set of physical laws and initial conditions, and a particular configuration, a specific future state is predicted (this could also be called the "structure stance").[15] At this level, we are concerned with such things as mass, energy, velocity, and chemical composition. When we predict where a ball is going to land based on its current trajectory, we are taking the physical stance. Another example of this stance comes when we look at a strip made up of two types of metal bonded together and predict how it will bend as the temperature changes, based on the physical properties of the two metals.

Most abstract is the intentional stance, the domain of software and minds, which requires no knowledge of either structure or design,[17] and "[clarifies] the logic of mentalistic explanations of behaviour, their predictive power, and their relation to other forms of explanation" (Bolton & Hill, 1996, p. 24). Predictions are made on the basis of explanations expressed in terms of meaningful mental states; and, given the task of predicting or explaining the behaviour of a specific agent (a person, animal, corporation, artifact, nation, etc.), it is implicitly assumed that the agent will always act on the basis of its beliefs and desires in order to get precisely what it wants (this could also be called the "folk psychology stance").[18] At this level, we are concerned with such things as belief, thinking and intent. When we predict that the bird will fly away because it knows the cat is coming and is afraid of getting eaten, we are taking the intentional stance. Another example would be when we predict that Mary will leave the theater and drive to the restaurant because she sees that the movie is over and is hungry.

As I've taken pains to explain, conceptualizing LLMs as a bunch of weights is correct, but not helpful in many contexts. Calling them "minds" or ascribing them personalities is simply another model of them, and one that's definitely more tractable for the end-user, and also useful to actual AI researchers and engineers, even if they're using both models.

Note that this conversation started off with a discussion about using LLMs for coding purposes. That is the level of abstraction that's relevant to the debate, and there noting the macroscopic properties I'm describing is more useful, or at least adds useful cognitive compression and produces better models than calling them a collection of weights.

I will even grant the main failure mode: anthropomorphism becomes actively harmful when it causes you to infer hidden integrity, stable goals, or moral patience, and then you stop doing the boring engineering checks. But that is an argument for using the heuristic carefully, not an argument that the heuristic is incoherent. As far as I'm aware, I don't make that kind of mistake.


clo self_made_human 4mo ago

We are hilariously close to something like @self_made_human's razor: the difference between a list of model weights and a thinking, sentient AI that can perform minor miracles is irrelevant.

Or, more darkly, if ~~they can both kill us all~~ outputs are similar, what difference does it make?


100ProofTollBooth Dumber than a man, but faster than a dog. self_made_human 4mo ago

But if you actually want to win, you have to model it as a Grandmaster-level opponent.

No, I don't. I can just think about the best move to play given the conditions on the board and my own knowledge of chess. In fact, I'd believe that is what most chess players do. If you get into the mindset of "Okay, I have to model Magnus' mental model of the chessboard so that I can preemptively counter him" you're playing against an incomplete set of data built on a lot of assumptions. It's classic autist overthinking when the real data is the board in front of you.

Daniel Dennett

Miss me with that new atheist bullshit. This is a guy who would trust The Science (TM) because of its rationality and empiricism. You know, two philosophical stances that have no holes in them whatsoever.

From your quote of him;

the domain of software and minds, which requires no knowledge of either structure or design

Lol, what. Why do you think there's a bias towards open source or reviewing source especially in security communities? You want to know the structure and design of software to ensure it's performing as expected and safely. The various "neuro" fields (neuropsych, neurobiology, neurochemistry) are all about doing the best we can to understand the incredibly complex structure of the brain and, from it, how "mind" might emerge. Dennett comes along and hand waves it all away - "not necessary!".

As I've taken pains to explain, conceptualizing LLMs as a bunch of weights is correct

It's not conceptualization, it's definition. That's what they are. This is like saying "you can conceptualize a pair of dice as plastic cubes, but, really, they're living, breathing probability gremlins."


WandererintheWilderness self_made_human 4mo ago · Edited 4mo ago

He's not Donald Goose, is he?

Neither here nor there, but I vividly remember one of the 90s TV cartoons having a whole episode B-plot about Donald having a dark family secret he was trying to keep buried, and it turned out to be that he's actually a goose. Not Ducktales, Donald was hardly in that. Maybe Quack Pack? House of Mouse? One of those.


ControlsFreak self_made_human 4mo ago · Edited 4mo ago

If you sit down to play chess against Stockfish, you can say "this is just a matrix of evaluation functions and search trees." You would be correct. But if you actually want to win, you have to model it as a Grandmaster-level opponent. You have to ascribe it "intent" (it wants to capture my queen) and "foresight" (it is setting a trap), or you will lose.

No. When top GMs talk about how they play against computers, they clearly treat it in a significantly different way than how they treat humans. They know what kind of things are included in the evaluation function, like the 'contempt' factor, that can cause it to sometimes behave in non-human ways. They know that it is a perfect calculator (or at least as perfect as it's set to be, so often they're trying to probe how it's set to be), and that colors the way they think about positions and how they choose to spend their own time calculating.

One might occasionally anthropomorphize in terms of "it wants to capture my queen", just because that's easy to do, since one is so used to talking about human opponents in that way. But this is done even when one is not playing against any entity, human or silicon. Take, for example, the process of solving a puzzle. This is just purely a practice exercise. There is no human, no evaluation function or search tree, no model weights (many modern engines also use NNs) actually sitting on the other side of the board making actual moves against you. Sometimes, those puzzles are from actual games, so you can at least see what one other human thought. Sometimes, they have annotations for other lines, so you can see additional thoughts from other humans. Sometimes, they're computer checked (or you check it yourself), so you can see what compy "thinks" (computes). But fundamentally, you're just thinking game-theoretically, which requires you to think about two different (opposed) value functions. Some 'puzzles' aren't even puzzles; they're just evaluation exercises. "Here's a position, what do you think about it?" There's no actual entity on either side. But imprecisely thinking, "What does black 'want' here," "What does white 'want' there," is almost universally helpful, if not mandatory, just to keep in our mind the tension between differing payoff functions and how they interact.

I've done a fair amount of game theory, and it's natural to anthropomorphize purely abstract payoff functions, no model weights or neurons or anything required. When I'm working with new students, it takes work to get them to be able to reason about them, so it's an extremely helpful crutch to regularly poke them with, "...and suppose that player did what you're proposing; now, imagine you're on the other side; how would you respond?" And so, you just sort of get used to imagining a human-like (or for many of my purposes, a human augmented with computational resources) entity on each side, actually thinking in a self-interested way.
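To make the "now imagine you're on the other side" habit concrete, here is a minimal sketch in Python. The payoff matrix and the function names are invented for illustration and do not come from any real engine; the point is only that "what does the opponent want" reduces to evaluating a second, opposed payoff function.

```python
# Hypothetical zero-sum game: PAYOFFS[i][j] is the payoff to the row player
# when row plays i and column plays j. The column player gets the negative.
PAYOFFS = [
    [3, -1, 2],
    [1, 0, 1],
    [4, -2, 0],
]


def best_response_value(row_move):
    """Row's payoff after the column player replies as well as possible.

    This is the "imagine you're on the other side" step: the column player
    minimizes the row player's payoff.
    """
    return min(PAYOFFS[row_move])


def maximin_move():
    """Pick the row whose worst-case payoff is largest (pure strategies only)."""
    return max(range(len(PAYOFFS)), key=best_response_value)


move = maximin_move()
print(move, best_response_value(move))  # 1 0
```

Row 2 looks attractive because of the 4, but once you ask how the other side would respond, it is worth only -2. No entity has to be sitting across the board for that reasoning to work.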

But back to GMs playing computers. They've been thinking this way for decades. Sometimes with actual humans on the other side, sometimes just a puzzle, whatever. They've honed the skill of rapidly thinking right past the step of, "What would I do if I were on the other side at that particular moment?" And these days, top GMs are pretty comfortable distinguishing between the different ways that engines "think" about positions. Watch a few of Hikaru's many many videos where he plays against a bunch of different bots. He very clearly understands that they're evaluation functions and search trees, and different combinations of evaluation functions and search trees of varying lengths have different strengths and weaknesses. He still regularly plays variations of 'anti-computer chess' where he's 100% banking on there being a significant difference between modeling it like a particular evaluation function with a particular set of search tree parameters (potentially also with a particular opening book/endgame tablebase) and modeling it like a GM-level human opponent.

YoungAchamian We walk conditioned ground and name our folly civilization. self_made_human 4mo ago

They're model weights, and we're collections of atoms: bags of meat and miscellaneous chemicals. Both statements are technically correct. And yet... a tiger being made out of atoms doesn't make it any less capable of killing you. The problem with pure reductionism is that it throws out exactly the information you need to make predictions at the level you actually care about. Too much of it can be as bad as too little.

I always find these arguments sort of annoying because they conflate what is actually going on in ML/AI systems with a weird pseudo-science-fiction mystification. Yes, tigers are made of atoms, but no, you can't use atomic physics to describe tiger behavior. With AI models, you can describe behavior directly in terms of the underlying code. The model weights are deterministic parameters that literally decide how the system behaves.

Also you've gotten reductionism vs abstractions completely backwards. Abstractions "throw out information". High-level models compress details to make systems easier to reason about. Also not every useful abstraction corresponds to a mind, subject, or being.

Some Thought Experiments:

A corporation is a higher-level abstraction with goals, memory, persistence, and decision-making. Do we think corporations are conscious?
A nation-state has beliefs, intentions, and agency in discourse. Are they conscious? Do they feel pain?
A thermostat system “wants” to maintain temperature. Are they alive?

LLMs don't have minds and they aren't conscious. They are parameterized conditional probability functions, that are finite-order Markovian models over token sequences. Nothing exists outside their context window. They don't persist across interactions, there is no endogenous memory, and no self-updating parameters during inference. They have personality like programming languages or compilers have personality, as a biased function of how they were built, and what they were trained on.

DaseindustriesLtd late version of a small language model YoungAchamian 4mo ago

With AI models, you can describe behavior directly in terms of the underlying code

You can't. It's intractable. For example, one of the top 3 organizations pursuing AGI, the current leader in agentic coding, Anthropic, investigating misalignment:

New Anthropic Fellows research: How does misalignment scale with model intelligence and task complexity?

When advanced AI fails, will it do so by pursuing the wrong goals? Or will it fail unpredictably and incoherently—like a "hot mess?"

Finding 2: Scale improves coherence on easy tasks, not hard ones
How does incoherence change with model scale? The answer depends on task difficulty:
Easy tasks: Larger models become more coherent
Hard tasks: Larger models become more incoherent or remain unchanged
This suggests that scaling alone won't eliminate incoherence. As more capable models tackle harder problems, variance-dominated failures persist or worsen.

Why Should We Expect Incoherence? LLMs as Dynamical Systems
A key conceptual point: LLMs are dynamical systems, not optimizers. When a language model generates text or takes actions, it traces trajectories through a high-dimensional state space. It has to be trained to act as an optimizer, and trained to align with human intent. It's unclear which of these properties will be more robust as we scale.
Constraining a generic dynamical system to act as a coherent optimizer is extremely difficult. Often the number of constraints required for monotonic progress toward a goal grows exponentially with the dimensionality of the state space. We shouldn't expect AI to act as coherent optimizers without considerable effort, and this difficulty doesn't automatically decrease with scale.

That's, like, the frontier of interpretability research.

Does this look like looking at the code and saying «Ah I get it, X does A»?

We're in a very similar epistemic position with regard to a tiger and to an LLM. The big difference is that with a tiger we have some very limited observation methods like electrocorticography or tomography or something, and with an LLM we can – in theory – deconstruct any particular causal sequence, every activation, every decoded token. But it won't become comprehensible to humans just because we produce another vast array of zeroes and ones from logging its activity.

They are parameterized conditional probability functions, that are finite-order Markovian models over token sequences. Nothing exists outside their context window. They don't persist across interactions, there is no endogenous memory, and no self-updating parameters during inference

Just a string of non sequiturs.

roystgnr YoungAchamian 4mo ago

The model weights are deterministic parameters that literally decide how the system behaves.

This is false for most modern implementations. The same model weights, even at 0 temperature, give different outputs for runs in different environments (where "different" can be as subtle as putting the same hardware and software under more or less load), because anything that changes the ordering of reduction operations over non-associative (e.g. floating-point) arithmetic can change the result.
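A minimal Python sketch of the effect, for anyone who hasn't seen it. The numbers are deliberately exaggerated and the `greedy_pick` helper is a hypothetical stand-in for zero-temperature decoding, not any real inference stack; real discrepancies are far smaller but arise the same way.

```python
def sum_in_order(terms):
    """Plain left-to-right float accumulation, with no error compensation."""
    total = 0.0
    for term in terms:
        total = total + term
    return total


def greedy_pick(logits):
    """Zero-temperature decoding: return the token with the largest score."""
    return max(logits, key=logits.get)


# Grouping alone changes the result of a floating-point sum.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)  # False

# Hypothetical, deliberately exaggerated contributions to one token's score.
score_a_order_1 = sum_in_order([1e17, 1.0, -1e17])  # the 1.0 is rounded away
score_a_order_2 = sum_in_order([1e17, -1e17, 1.0])  # the 1.0 survives

# Token "B" has a fixed score that sits between the two results for "A".
pick_1 = greedy_pick({"A": score_a_order_1, "B": 0.5})
pick_2 = greedy_pick({"A": score_a_order_2, "B": 0.5})
print(pick_1, pick_2)  # B A
```

Same terms, same "weights", different reduction order, different argmax. Once one token differs, everything generated after it can diverge.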

you can describe behavior directly in terms of the underlying code

Well, you can imagine you can, anyway. LLM execution has that in common with Molecular Dynamics simulations: you can write down the equations on paper, but you're never going to evaluate them that way.

YoungAchamian We walk conditioned ground and name our folly civilization. roystgnr 4mo ago · Edited 4mo ago

the same model weights, even at 0 temperature, give different outputs for runs in different environments

You are right, this is technically true, with the caveat that these changes come from really tiny floating-point differences on really tiny weights. But importantly, these tiny changes are akin to small random noise perturbations in molecular physics engines. It's an implementation detail due to the imprecision of numerical operations on tiny numbers. In principle, if you froze the weights and evaluated the model on a perfectly precise machine with exact arithmetic, the mapping from inputs to outputs would be deterministic. The existence of minor numerical nondeterminism on real hardware doesn’t change the fact that the system is fully specified by its parameters, architecture, inputs, and execution environment, in a way that the effect of living organisms' atomic biology on their behavior is not. It's a bad abstraction; the inferential gap is too wide.
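The exact-arithmetic thought experiment can be sketched in Python with the standard library's `fractions.Fraction`. This is only an illustration of the "in principle" claim; the `accumulate` helper is hypothetical, and nobody runs inference in rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations


def accumulate(terms, zero):
    """Left-to-right sum starting from `zero`, so the arithmetic type is explicit."""
    total = zero
    for term in terms:
        total = total + term
    return total


float_terms = [0.1, 0.2, 0.3]
exact_terms = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)]

# Collect every distinct result over all six orderings of the same three terms.
float_results = {accumulate(p, 0.0) for p in permutations(float_terms)}
exact_results = {accumulate(p, Fraction(0)) for p in permutations(exact_terms)}

print(len(float_results) > 1)  # True: ordering leaks into the float result
print(exact_results)           # {Fraction(3, 5)}
```

With exact rationals every ordering lands on the same value, so the order-dependence belongs to the number format and not to the parameters.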

Well, you can imagine you can, anyway. LLM execution has that in common with Molecular Dynamics simulations: you can write down the equations on paper, but you're never going to evaluate them that way.

The last part is ostensibly true: an LLM with billions of parameters is essentially billions of interconnected equations. It is hard to dig through, just like a codebase with a billion lines of code would be. We know what those equations do in small cases, just like we understand what individual lines of code do. Scaling them up doesn’t introduce agency. We can extrapolate that since mathematical equations/code have no agency, they don't suddenly start doing something else when they are scaled up.

self_made_human amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi YoungAchamian 4mo ago · Edited 4mo ago

Also you've gotten reductionism vs abstractions completely backwards.

That's what I get for arguing at 3 am. I do know the difference.

See my latest reply to Toll for more.

A corporation is a higher-level abstraction with goals, memory, persistence, and decision-making. Do we think corporations are conscious?

They are more "conscious" than a rock. I do not know if they have qualia, but at least they contain conscious entities as sub-agents (humans).

A nation-state has beliefs, intentions, and agency in discourse. Are they conscious? Do they feel pain?

Would you start objecting if someone were to say "China is becoming increasingly conscious of the risk posed by falling behind in the AI race against America"? Probably not. Are they actually conscious? Idk. The terminology is still helpful, and shorter than an exhaustive description of every person in China.

A thermostat system “wants” to maintain temperature. Are they alive?

No, but the word "alive" is slightly more applicable here than it would be to a rock. Applying terms such as "alive" to a thermostat is a daft thing to do in practice; we have more useful frameworks: an engineer might use control theory, a homeowner might only care about what the dials do in terms of the temperature in the toilet. Nobody gets anything useful out of arguing if it's living or dead.

LLMs don't have minds and they aren't conscious.

Hold on there. You are claiming, in effect, to have solved the Hard Problem of consciousness. How exactly do you know that they're not conscious? Can you furnish a mechanistic model that demonstrates that humans made of atoms or meat are "conscious" in a way that an entity made of model weights can't be even in principle?

They are parameterized conditional probability functions, that are finite-order Markovian models over token sequences. Nothing exists outside their context window. They don't persist across interactions, there is no endogenous memory, and no self-updating parameters during inference.

Entirely correct.

They have personality like programming languages or compilers have personality, as a biased function of how they were built, and what they were trained on.

That is not mutually exclusive to anything I've said so far.

YoungAchamian We walk conditioned ground and name our folly civilization. self_made_human 4mo ago · Edited 4mo ago

They are more "conscious" than a rock. I do not know if they have qualia, but at least they contain conscious entities as sub-agents (humans).

So once LLMs start having little green men inside them they will be as conscious as a corporation, haha. Also, a corporation itself is not more conscious than a rock, as the corporation cannot do anything without conscious agents acting for it. It has no agency of its own. If I create an LLC and then forget about it, does it think? Does it have its own will? Or does it just sit there on some ledger? If a rock has people carrying it around and performing tasks for it, has it suddenly gained consciousness?

Would you start objecting if someone were to say "China is becoming increasingly conscious of the risk posed by falling behind in the AI race against America"? Probably not.

Yeah, probably not, but I also don't think China is actually conscious. We're all using that as linguistic shorthand for "Chinese leadership" or "Chinese populations". The nation-state idea itself lacks a mind. It is controlled by conscious agents (humans), but it itself lacks consciousness.

Hold on there. You are claiming, in effect, to have solved the Hard Problem of consciousness. How exactly do you know that they're not conscious? Can you furnish a mechanistic model that demonstrates that humans made of atoms or meat are "conscious" in a way that an entity made of model weights can't be even in principle?

You are smuggling in the claim that I am claiming to solve the problem of consciousness. I'm not. I'm claiming that LLMs lack properties that any plausible theory of consciousness requires (or, realistically, that my own theory requires). I'm saying that system A lacks necessary conditions for property P, therefore A does not have P. I don't need to prove the full positive theory of P.

My basic theory (really a constraint) of conscious behavior:

Any sentient system must have persistent internal state across time.
This implies non-Markovian dynamics with respect to perception and action.
LLMs are finite-context, externally stateful, inference-time Markovian systems.
Therefore, LLMs lack a necessary condition for consciousness.
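The third premise can be sketched as a toy in Python. Everything here is hypothetical (the window size, the vocabulary, the scoring rule); it is not a model of any real LLM, only an illustration of what "finite-context, externally stateful" means as I'm using it.

```python
CONTEXT_WINDOW = 3  # hypothetical window size, in tokens
VOCAB = ["a", "b", "c"]


def next_token(context):
    """Toy stand-in for a frozen model.

    A pure function of the last CONTEXT_WINDOW tokens: no hidden state,
    no parameter updates, nothing remembered between calls.
    """
    window = context[-CONTEXT_WINDOW:]
    # Arbitrary deterministic rule standing in for the learned weights.
    index = sum(ord(ch) for tok in window for ch in tok) % len(VOCAB)
    return VOCAB[index]


def generate(prompt, steps):
    """The caller owns the only state: the growing token list."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens


# Two different histories that end in the same window are indistinguishable.
history_1 = ["a", "b", "c", "a"]
history_2 = ["c", "c", "b", "c", "a"]
print(next_token(history_1) == next_token(history_2))  # True
```

Anything that looks like memory lives in `tokens`, outside `next_token`. Whether that property actually rules out consciousness is the claim under dispute, and the sketch does not settle it.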

I'm willing to entertain another plausible theory of consciousness if you have one you prefer, or an example of an animal that we consider conscious and that exists in a Markovian state.

That is not mutually exclusive to anything I've said so far.

Maybe I need to reread your opinion, but my understanding is that you are in the "LLMs are conscious/have minds" camp. If you are, then this is exclusive, because I am claiming that these clearly non-conscious tools get personified as having personalities due to humans' innate social bias toward attributing personality to things. But that doesn't actually make them conscious or mind-having. It's sort of like this video: Social bias towards consciousness

Hint: Humans attribute complex behavior, emotions, feelings, and narrative to the semi-random movement of shapes on a screen. In much the same way, some humans attribute consciousness to LLMs, which exploit our bias, as social animals, for seeing language as a sign of intelligence.

birb_cromble 100ProofTollBooth 4mo ago

I have to ask, were you using any of the terminal-based tools for code development (e.g. Claude Code)? I know you said you were using Gemini, so I doubt it was actually Claude Code (although you can run Gemini within CC).

We were using the Gemini cli for that series of tests. I can entertain the notion that Claude code is truly magical, but it's unlikely we'll get more funding to pilot it.

If you think of it as having intention and character flaws, you're going to get frustrated quickly. If you think of it as a very imperfect and probabilistic tool that outputs into non-deterministic solution spaces, you'll get less frustrated and probably think differently about how you prompt it.

It's less that I think of it that way and more that I'm trying to describe it for an uninvolved observer. I made the statistical engine comparison just a few paragraphs further up.

FearandLoathingintheMotte birb_cromble 4mo ago

For what it's worth, I've been using ChatGPT Codex, Claude Code, and Gemini CLI for the last month.

My ranking is Codex > Claude Code >> Gemini.

Gemini is the worst: not profoundly, but noticeably.

Culture War Roundup for the week of February 2, 2026