
Culture War Roundup for the week of April 6, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Alright, AI bros, follow-up from last week. I was able to secure access to Claude Opus 4.6 at my job, and I gave it the same prompt that I had given to Sonnet. This time it completely overlooked the authentication part of the HTTP client library in what it generated. In a follow-up I asked it specifically to extract out the common logic for the authentication portions. It didn't do that; instead it generated a class with two helper methods.

The first helper method was just a thin wrapper around System.Text.Json for deserializing the response. There's an optional flag to pass in for when case-insensitive deserialization is needed, and nothing else.

The second helper method was something for actually making the HTTP calls. The strangest part of this one is that it takes two delegates as parameters: one for deserializing successful responses, the other for handling (but not deserializing) error responses. It did nothing at all to split out handling of the two different ways to authenticate.
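For anyone who hasn't seen this pattern, here's a rough sketch of the shape of that second helper. This is Python for illustration, not the actual generated C# (which I can't share); `send_request`, `Response`, and the callback names are all hypothetical stand-ins. It also happens to demonstrate issue #1 below: success is hard-coded to a single status code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

@dataclass
class Response:
    status: int
    body: str

def send_request(
    do_http: Callable[[], Response],       # stand-in for the actual HTTP call
    deserialize: Callable[[str], T],       # delegate #1: invoked only on success
    on_error: Callable[[Response], None],  # delegate #2: error body handed over raw
) -> Optional[T]:
    """Roughly the shape of the generated helper: one generic request method
    parameterized by two callbacks, with no authentication logic anywhere."""
    resp = do_http()
    if resp.status == 200:      # only 200 counts as success; 202/207 etc. fall through
        return deserialize(resp.body)
    on_error(resp)              # error responses are never deserialized
    return None
```

Note how a 202 Accepted would silently take the error path here, which is exactly the first issue in the list below.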

The issues with what was generated (for both the API client as a whole and for the authentication code specifically) are numerous; here is a small handful that I identified:

  1. It assumes that HTTP 200 is the only successful response code, even though some endpoints return 202, 207, and more.

  2. It assumes that all endpoints return plaintext or JSON content, even though several return binary data, CSV data, etc.

  3. It didn't do null checking in several places. I assume it was mostly trained on C# code that either didn't do null checks correctly or didn't use the nullable reference type feature that was added in C# 8 (back in 2019). Regardless, the null checks are missing or wrong whether nullable reference types are enabled or not. It also always checks for null with == or !=. This works 99% of the time, but best practice is to use "is null" and "is not null" for the rare cases where the equality operator is overloaded. Once again, I assume this is because most of the training data uses == and !=.

  4. It doesn't handle URL query parameters (or path parameters); it assumes everything is going to use a JSON body for the request.

  5. It uses the wrong logging templates for several of the logging calls. For example, the logs for an error response use the log template for logging the requests that are sent. Even more troubling is that it removed all the logic for stripping user secrets out of these logs.
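A quick illustration of the operator-overloading pitfall from item 3, for anyone who hasn't hit it. Python is used here purely for convenience (the toy `Wildcard` class is mine, not from the generated code); the mechanics are directly analogous to C#, where `x == null` dispatches to an overloaded `operator ==` while `x is null` always does a true reference check:

```python
class Wildcard:
    """Toy class whose overloaded equality claims to equal everything,
    mirroring a C# type with a badly overloaded == operator."""
    def __eq__(self, other):
        return True  # matches anything, including None

w = Wildcard()

# The operator-based check is hijacked by the overload:
print(w == None)   # True, even though w is a real object

# The identity check (the analogue of C#'s 'is null') cannot be fooled:
print(w is None)   # False
```

This is why `== null` "works 99% of the time" but `is null` is the safe habit.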

There are quite a few more issues, but overall my experience with Opus was, if anything, even worse than my experience with Sonnet. AI bros still in shambles. I definitely have zero fears that AI will replace me, though I'm still definitely fearful that retarded C-suite execs will think it can replace me.

My post from last week about using Claude Sonnet: https://www.themotte.org/post/3654/culture-war-roundup-for-the-week/426666?context=8#context

Edit: Just saw a very relevant post over on Orange Reddit about this very topic: https://news.ycombinator.com/item?id=47660925

Yeah, I've been having some issues with Opus too recently as compared to Sonnet. Probably they downgraded everyone since a new model is coming out soonish and they want more compute for that.

But I'm still getting good work done with it, working on a 100K LOC strategy game, which is not exactly boilerplate webdev slop. It is eminently possible to do pretty complex things with AI coding.

AI bros still in shambles. I definitely have zero fears that AI will replace me

Why don't you tell it what exactly you've been finding that's wrong, update the memory file, have a correction pass go over it or do something a little more sophisticated in your orchestration instead of bitching about it online?

This 'I was able to secure access to Claude Opus 4.6 at my job, and I gave it the same prompt that I had given to Sonnet' sounds like you're just trying to one-shot it. You try, review, adjust, try again, have it go in from another angle and then it works.

You don't see me complaining about the bugs it gives me, like units chasing each other such that they exit the map. I see a bug, I note it down, I fix it, and I try to work out a way to avoid similar things happening again.

There's a cavernous gulf between 'lived experiences' here; ironically, this is what the Motte is kind of for. It's self-evident to me that AI coding is great and effective, whereas it's self-evident to you that it isn't.

If you're willing to iterate one more time can you try giving it this series of prompts?

  1. @agent "Please create a standard set of agents, skills and prompt files for this project. I want specifically for there to be an orchestrator that I can give a complex query to that will walk through a planning stage, asking me relevant questions, create a plan md file and then manage subagents to execute on that plan. Two agent definitions that I definitely want are a security specialist that will audit changes for best practice and a reviewer agent that will audit to make sure updates do not break previous functionality. The orchestrator should know to invoke these agents for these tasks"

  2. @orchestrator "Please scan through and document this project using standard claude.md files to aid agents in navigating and understanding the project. Update agent definitions with relevant information."

  3. @orchestrator "[Insert your prompt]"

You can tinker to make this much better but doing this should greatly improve your results alone.

I'll try that later this week when I get a chance. Maybe next time I'm stuck in an awful meeting for two hours.

One thing I'll pre-register ahead of time: I'm sure that if this doesn't get satisfactory results you'll declare the whole process pointless. But in the case that the harness fails, I've had very good results from explaining to the agent why what it did was wrong and having it update the documentation to avoid that failure mode; over time the harness improves and stops making the same mistakes. Our harness has progressed to the point that I can pretty reliably drop a user's feature request email straight into a Jira ticket, give the orchestrator the ticket number, maybe answer two or three questions, and have a working feature branch building on our CI pipeline in under an hour. Any failure is an opportunity for improvement.

I'm not a programmer, so what you just said is all Greek to me, but I'll take your word for it that what you described represents a significant departure from the expectations that the AI hype would lead one to have concerning the capabilities of the product. But they can always respond that these are problems that are solvable, and with the technology in a constant state of flux we can expect that in the coming years things will only continue to improve, since it was only very recently that even that level of functionality wasn't possible. My concerns with AI go beyond that, though, to problems that don't seem to be solvable in the short term and that have only gotten worse in recent years. These are more business-related than technology-related (though the limitations of the technology do factor in), and threaten the entire viability of AI as an industry.

I use Photoshop quite a bit. During the pandemic, though, my graphics card crapped out, and since they were in short supply, I replaced it with an old one from 2014 I had lying around. Since I don't play games or anything this was a perfectly acceptable solution, except that at some point newer versions of Photoshop started offloading some of the workload to the graphics card, for which mine was hilariously out of date. While the newer versions technically worked, there was a certain wonkiness that prevented me from adopting them full-time, and I continued using an install of Photoshop 2018, which was more than adequate for my purposes. In the meantime, I noticed that a newer version I had installed had incorporated "neural filters" aka AI into the program, which of course it did, and I fooled around with this a bit. Some functions were fun, if limited, while others, like upscaling and automatic scratch removal, didn't seem to do anything useful. But whatever. A few weeks ago I finally got a new graphics card after the old one gave up the ghost, and I looked into Photoshop 2026 to see what had changed since 2025. The answer was that the updates were basically all AI-driven, and not in a good way.

Adobe has been a convenient punching bag for the enshittification trend as of late, and the purpose of this post isn't to pile on, but to illustrate how it's representative of a greater rot in the software business and how AI only seems to accelerate that rot. Like previous iterations, some of these AI features are impressive, and some are stupid, but all of them cost extra. The way it works is that you get a certain amount of credits depending on your subscription (and as a long-time customer of the Photoshop-only plan I get a generous number of credits), and each time you use one of these features it costs a certain number of credits. And if you run out you can't just buy more; you have to upgrade your subscription, and I already get the most credits you can with an out-of-the-box subscription that doesn't involve going through their sales department. To make matters worse, determining how many credits a given action will cost isn't based on a set rate but depends on 900 different factors, and is so complicated that the software can't even tell you how much an action will cost before it's run. And as a final blow, they don't even provide a way of telling you how many credits you have remaining; you eventually just get a message that you've run out.

The latter problem is obviously part of Adobe's slimy sales tactics where they want users to be unable to plan ahead so that they unexpectedly run out of credits in the middle of a time-sensitive project and are forced to upgrade, so I can chalk that up to normal corporate bullshit. The former problem is due to the fact that there is simply no way of predicting how much compute an AI system is going to use until it's already used it. The real kicker is that, due to the inherently unpredictable nature of generative AI, you don't even know if the command is going to achieve the desired result, or how many attempts and tweaks it will take to get the desired result, and it may take multiple, expensive generations just to get something usable. The result is that the function is inherently self-defeating. There are lots of Photoshop functions that may require tweaking or not work at all, but they're integral parts of the software and aren't costing the user anything but time if they don't get things right on the first try, and the individual user will get more proficient with experience. The AI features are simply a black box that requires you to throw an unknown amount of money at it and hope it does what you want it to. I, as a user, am thus disincentivized to bother learning how to use these features because my access to them is liable to be cut off at any moment, whereas my existing workflow works fine as it is.

This is basically the problem with the whole "AI as a service" model these companies all seem to be banking on. If the response to Photoshop 2026 is any indication, customers want cost predictability and function predictability. If Microsoft Word cut you off after 1 million words per month it would seem less like you were buying software and more like a free trial. It would be even worse if the number of words you were allowed to type depended on font, font size, formatting, etc., and you didn't know how many credits each action would cost and were liable to be cut off while in the middle of writing something important. Luckily, I can use Word to my heart's content without it costing Microsoft any extra, so they have no reason to impose such a restriction. With generative AI, on the other hand, every action costs the company money, whether it benefits the customer or not, and the company can't predict in advance how much money that's going to be. So there's no way an AI company can realistically charge based on use without pissing off their customer base, who will cancel after getting that first $75,000 bill in the mail that no, they aren't paying.

Charging a flat monthly fee for unlimited usage doesn't solve this problem so much as stick the provider with the bill instead of the customer, so most of the AI services have resorted to a deceptive hybrid model where it looks like you're getting unlimited usage but has asterisks stating that it's subject to a cap, which caps are never explicitly defined. Some charge a monthly fee for access to a certain number of credits, which don't roll over at the end of the month. I'd find a lot to criticize about these models, which wouldn't fly in any normal business sales situation and would be relegated to the scummy end of the consumer pool in any other context, except that they still manage to lose money for the big players. Third-party agent developers may be profitable, but it's only because they're already buying their compute at a discount.

The only conclusion I can draw from all this is that software as a service, while loathed by customers, isn't really beneficial to companies either, other than as a cheap way of temporarily boosting numbers. And that's indicative of a deeper problem in the tech industry as a whole, a problem of their own making. From the 1980s through the 2000s, the computer industry grew exponentially. In the 1970s computers were things that large corporations and government agencies used to manage large databases. In the 1980s they became productivity tools that every employee had on his desk. By the mid-90s, home adoption had started in earnest, and by the end of the decade practically everyone had one. In ten years the internet went from being a hyped curiosity to an essential utility. The technology was also changing quickly, and the improvements were massive. In 1994, a typical home PC had a 486 processor clocked at 66 MHz, 8 MB of RAM, and a 500 MB hard drive. It would run Windows 3.1, which would be replaced a year later with Windows 95, a huge upgrade. Five years later that computer would be hopelessly obsolete; in 1999 a comparable build would have a 450 MHz Pentium II, 128 MB of RAM, and a 13 GB hard drive. It would run Windows 98, which would be replaced two years later with Windows XP, an even bigger upgrade that eliminated the finickiness of DOS once and for all.

By 2010 CPUs would be clocked in the gigahertz and run multiple cores, RAM would be measured in gigabytes, and external hard drives of more than 1 TB would be affordable. Windows 7 was released the year prior to great acclaim. To put all that in perspective, I'm currently writing this on a Lenovo Thinkpad from 2024 that has the same amount of RAM as the currently available model, which has the same amount of RAM as my home PC build from 2019. Or 2018; I can't remember the year I last did a major upgrade, but I haven't done any since before the pandemic, aside from the aforementioned graphics card. I haven't needed to upgrade it either, as there hasn't been any decline in performance in the tasks I actually use it for. And even that upgrade didn't appreciably improve performance over the 2014 gear I was running before that. Windows 7 was the last Windows release that was universally loved; every one since then has been met with varying degrees of derision. There had been flops before: Vista was too far ahead of its time to be usable, and ME was a half-assed stopgap that never should have been released. The only mistake in this vein since then was 8, which completely misread the future of computing. Every other new Windows has been an unexciting incremental upgrade that would probably have worked just as well as a security patch for 7.

I don't want to overstate my case here and suggest that computers haven't improved in the last 15 years; I'm sure my 2014 build would be woefully inadequate by today's standards. The point is that the advances aren't coming as fast as they did in years previous, and when they do come the improvements are more subtle. It feels like 2010 was the year that computer technology reached a mature phase where all adults, even your grandparents, knew how to use it, and good technology was as cheap as it was going to get. This wasn't clear at the time, but in a few years it was apparent that things had stagnated. In the early 2010s I listened to TWIT semi-regularly, and it didn't seem like there was much to get excited about. The two big things the industry was pushing as the next frontier at the time were wearables and IoT devices. The former flopped spectacularly. The latter had better market penetration, though some of the implementations were ridiculous, and the whole concept has since become a metaphor for how technology has gone too far, trading simplicity and security for dubious functionality. As hardware stagnated, software quickly followed suit. Improvements in software follow improvements in hardware, and with hardware capability virtually unlimited, there was nowhere left to go. Sure, there would always be new features, support for new devices, and better security, but the game-changing upgrades seemed like a thing of the past.

So take a program like Photoshop that was first released in 1990 and had improved leaps and bounds by the time CS6 was released in 2012. A lot of users contend that this was peak Photoshop and that everything since then has been unnecessary bloat. I am not one of those people; the current software is significantly better. But CS6 was also the last version to be sold as a standalone product. Adobe had good reasons for doing this at the time—Photoshop was an incredibly expensive professional grade product that also had broad-based appeal. This meant that it was particularly susceptible to piracy, and lost more money to piracy than more modestly-priced products. They had tried to combat this in the past by releasing less expensive consumer-grade versions like Elements, but these never really took off, as consumers felt like they were missing something (most notably, Elements did not provide access to curves, which every photography book agreed was an essential tool). The decision to go subscription would give consumers access to an always-up-to-date full version of the product for less than it would cost to upgrade every other release.

The crowd who insists that CS6 is better is dwindling now, but even in its heyday it was mostly composed of people who had never actually paid for Photoshop and were mad that it was more difficult to pirate. But when Creative Cloud was first released in 2013, much of the criticism came from professionals and actual customers who were concerned about the new model. Sure, it was cheap now, but what was stopping them from jacking up the price in the future? Creative professionals aren't exactly the most highly paid. In the past one could upgrade whenever he could afford to and, if necessary, stick with a legacy version until things improved. But making one's continued access to software they needed for their job dependent on paying a ransom that they might not be able to afford was a different story. The reaction may have been better if CC offered a significant upgrade over CS6, but rather than wait a few years and offer a significantly improved version, CC came out earlier than one would expect and didn't offer much of an upgrade. Accordingly, the new subscription model was the only noteworthy thing about it. To Adobe's credit, the subscription price didn't change at all for over a decade, but in hindsight, there weren't any game-changing upgrades, only incremental improvements. If the company had simply relied on customers paying full price to upgrade whenever they felt it was worth it, they may have been waiting a long time.

As SaaS has matured from those early days, it has become less about preventing piracy and more about anxiety that newer products won't differentiate themselves enough from the old to justify upgrading. Better instead to lock in that revenue stream with a user subscription that's impossible to cancel short of telling the bank to stop paying. Unfortunately, as a business move it's a one-time thing; the number goes up as all the old customers switch to subscriptions, but once they're aboard, the line flattens out again. In normal industries, this isn't a problem. In the computer industry, 30 years of exponential growth being not only welcomed but expected meant that the situation was unacceptable. Since there was nowhere left to go technologically, the industry had to resort to cheap gimmicks to keep the numbers up. SaaS was one. The aforementioned IoT was another; nothing better than announcing huge deals with appliance manufacturers who will be integrating your products. The problem with gimmicks like this is that, while they can increase revenue, they have a shelf life. A deal with Whirlpool to make a smart fridge may make both of your numbers go up, but once you have computers in every fridge sold, exponential growth is no longer possible. By the 2020s, the tech industry was running out of gimmicks. I think the reason Apple became the top dog during this period is because they were the only tech company that didn't seem to be peddling bullshit. I had a friend who was in and out of tech startups during this period (I even interviewed at one of his companies), and every idea was based on a free service that was really just scaffolding for advertising or data harvesting. A company like Apple that still sold products and services they expected customers to pay for was an outlier indeed.

So AI came to save the day. I'm not denying the fact that the technology is impressive and potentially useful, but it is just about the biggest gimmick one could imagine. Simply being impressive and useful puts it in about the same league as, well, Photoshop, which, even in its first iteration, was a revolution to anyone who had ever worked in a darkroom. Unlike Photoshop, though, AI promises to solve not one particular problem, but all of the problems, including ones that haven't been identified yet. This latter point is particularly salient, because exponential growth in the tech sector was never based on the present, but on the future. If the tech industry in the 2010s looked like it was in danger of stagnating and becoming a normal industry, in the 2020s the sky was the limit. It was now worth it for capital to invest all of the money in AI companies, because if they were successful, then money wouldn't matter anyway.

And if they weren't successful? Well, they never considered that possibility, because the line only moves in one direction. The equation is pretty simple: if AI companies are successful, then your support was worth it and will be repaid. If they aren't successful, then you need to give them more money. But what happens when the money isn't there? How good Photoshop's AI features are is ultimately secondary to how much they cost. Someone has to pay for them, be it the customer or Adobe. Some companies may be willing to subsidize AI, but if Adobe is willing to give the product away for free, they'd do better by dumping CC and charging $500 for CS7, and we know that ain't going to happen. Instead, they've raised subscription prices by 50% in an attempt to get customers to pay for the privilege of having access to functionality they have to pay extra for if they actually want to use it. I doubt it's a coincidence that the first substantial price hike in the history of CC coincides with the introduction of the expensive AI upgrades. I doubt Adobe will suffer much for it, because their business (like Apple's) is actually sound, and their products indispensable, but it's indicative of the perversion that's at the center of the tech world. Eventually, somebody is going to expect to get paid, and the party will be over. And as I write this, I don't see any scenario where the money is going to be there.

It's interesting that you mention C# and null checks.

I also work in C# here and there, as well as a relatively verbose, garbage-collected, class-based, statically typed, single-dispatch, object-oriented language with single implementation inheritance and multiple interface inheritance. Like you, I'm seeing unimpressive results that do not justify the spend necessary for agentic coding.

Every time I've mentioned it here, I'm told the following:

  1. I'm using the wrong model. It does not matter what model I'm using - I'm using the wrong one. If it's not the absolute latest model as of three days ago, I'm speaking in bad faith because I'm using an outdated model (and I should ignore the fact that people were saying the same damned thing about the last version that they're now denigrating). If I am using the latest model, I should be using a different model from a different vendor. At this point I've tried Gemini 3.1 Pro/Thinking/Flash, Opus 4.5/4.6, and GPT 5.4. I'm running out of frontier models.
  2. Next, I'll be told I'm not using plan mode. I can read the manuals. I assure you that I am using plan mode. The fact that the agents frequently do not follow their own plan is apparently a moral failing on my part.
  3. Next, I'll be told I'm writing a bad spec and providing bad prompts. I'm an experienced developer. I'm a published author. I have an English minor from college. I worked as a technical writer for a while. If I can't write a solid prompt, I have to wonder who the ideal candidate is - especially when these things are supposedly so frighteningly powerful that the vendors claim to be half-afraid to release them.
  4. After that, I'll get barraged with vague claims about how the tech is so rapidly improving that my personal tribulations don't matter. Depending on the person, they'll either refer to radiology as a benchmark (ignoring the fact that the models return results even without a film) or something about how the models are only improving, and inference is only getting cheaper.

Nobody seems to want to offer the sane take, which seems to be that there can be real efficiency gains for small, well-specified projects, provided you are already an expert in the domain and are willing to spend a considerable amount of time beating it into submission whenever it so much as coughs.

If you're working on a small (or perhaps exquisitely modularized) codebase, and it's chock full of documentation written in a way that the LLM can comfortably consume it without getting confused, and it's using only the happy-path architecture and library set for your language, and it's in one of the "favored" languages (like Python), and you have a robust set of preexisting end-to-end tests that can help keep the LLM on the rails, then this technology is probably pretty great.

Outside of FAANG and a few startups, however, I'm not sure how often that's the case. Legacy code is real. Enterprise customers can have upgrade cycles that are measured in years. Backwards compatibility is worth more than features. Regulatory compliance issues might end up a court summons instead of a JIRA ticket. That's not a world that does well with disposable code. Unless startups can outcompete every established player in every industry with those characteristics, I'm not sure how that changes. I can't rule out that such a future might happen, but given the moats around those industries, it'll be a tough row to hoe.

In our internal pilots, AI-generated PRs from frontier models make it through our test suite on the first try about 15% of the time. Another 30% never pass at all because they spiral out into schizophrenic fantasy lands, trying to call libraries that don't exist or attempting to rewrite a two million line codebase in "modern python". Of the ones that do make it through, about three quarters of them end up failing code review, even as we update and refine our agent instructions. At this point, dependabot has a better track record, and it doesn't even have Dario Amodei crying at night about how terrifyingly capable it is.

It pisses me off. The technology clearly has some uses, but fuck me if it doesn't feel like it's been wildly oversold. We still use it internally, but the mania is starting to die down. Management thinks it's the best thing ever because it can automatically spam LinkedIn for them. Development uses it as a more accessible StackOverflow. But we've given up on agentic coding for the time being. We'll probably look at it again in six months, assuming nothing bizarre happens between now and then.

I don't really know how to answer your posts because you seem to live in a different universe than me when it comes to AI efficacy. It's like someone checkmating "grass is green" bros by saying they checked and their lawn is brown.

Perhaps there are some unstated assumptions that lead to our differing views on it. Have you read this article about a guy accomplishing a highly nontrivial project with significant AI assistance? It matches my experience pretty well, from the pitfalls you can fall into to the genuinely new possibilities it opens up.

I don’t think anyone is saying it doesn’t have its use cases. It’s a problem of expectations, on both the technical and the business side. If you’re using it to replace workers, you’re making an objectively bad business decision. If you’re using it to automate and assist workflows, then it’s probably good.

I don't understand the people who don't understand how big a difference AI is making to software development. Speaking of software development, this video is great and captures my frustration with a lot of the software developers who I have to work with, and yeah, for people like that I'd understand completely why AI is bad: all it does is 10x the amount of slop they produce, which others then have to review.

this video is great and captures my frustration with a lot of the software developers who I have to work with

Sadly the parody developer in that video would be more competent than most of my Indian coworkers, so I wouldn't be surprised if AI could replace them. But instead it will be one of the slightly more competent Indians generating mountains of barely functional AI slop (instead of the small hills of slop my coworkers currently generate).

That's one of the things that has caused the org I work for to re-think their internal AI push. The blast radius of bad developers is no longer limited by their own incompetence.

These tools don't generate 1-shot perfection - you need to create a feedback loop that will iterate until it reaches the goal. That can be either test coverage or using tool calling to hit a live service with a test API key or something. Even just prompting it to use a linter or a compiler to catch syntax errors makes a huge difference. Claude would fix most of the issues you flagged in a few loops of trying to test the library, failing and getting an error message, adding the error to its context, editing the code, and repeating. Then at the end once you have something that works, instruct it to write some regression tests, clean up the code, and make sure everything still works as intended.
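The loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual agent API: `ask_model` is a hypothetical stand-in for whatever call edits files on your behalf, and `run_checks` is whatever validation command you point it at (pytest, a compiler, a live-service probe).

```python
def fix_loop(ask_model, run_checks, max_iters=5):
    """Iterate until the checks pass or we run out of patience.

    ask_model:  hypothetical agent call; given the failure output, it is
                expected to edit the code on disk.
    run_checks: returns (passed, output), e.g. by shelling out to pytest
                or a compiler and capturing stdout/stderr.
    """
    for _ in range(max_iters):
        passed, output = run_checks()
        if passed:
            return True
        # Feed the raw failure back into the model's context and retry.
        ask_model(f"The checks failed with:\n{output}\nFix the code and try again.")
    return False
```

The cap on iterations matters: without it, a confused model will happily burn tokens forever.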

You're doing the equivalent of handing an intern a sheet of paper, telling them to write down their program based on a vague problem description, and then calling them an idiot when it doesn't work on the first try.

I have no idea why Claude Code is working so badly for you. I work at a FAANG-level company, and a huge amount of our code is written by Claude. Garry Tan is in AI psychosis, but Claude Code is easily the biggest productivity unlock in CS since I started my career.

A few recommendations:

  • What thinking mode are you using? Use at least high or max.
  • For the purpose of this test, give it all permissions and link it to an MCP server like context7
    • This allows it to independently read documentation locally and from remote sources
  • Basic, but update the app. This lapse happened to a very smart coworker of mine.
  • Use plan mode. It allows the model to build an intuition for the problem before it goes off on its own
  • If you want specific behaviors, then ask for that. Something like:
    • State and scrutinize your assumptions explicitly
    • Consider and invalidate counterfactuals.
    • Utilize coding patterns that have already been established in the repo.
    • Ideally, ask it to go write readme.md files for core utility dirs in your repo, so it doesn't cold start
  • Pair it with a type checker / linter and add it as a post-model hook
    • In python land, ruff & based-pyright are the tools of choice.
    • I have used pre-defined open source linting rules, which allows the model to implement best-practice behaviors (eg: opinionated null checks) without human intervention.
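As a concrete version of the post-model hook idea, here's a minimal sketch. The command is configurable because every tool wires hooks up differently; `ruff check --fix` is a real ruff invocation, but everything else here is an assumption about your setup, not any tool's official hook API.

```python
import subprocess

def post_edit_check(paths, cmd=("ruff", "check", "--fix")):
    """Run a linter after each model edit.

    Returns "" on a clean pass; otherwise returns the linter's complaints,
    which can be appended to the model's context so it fixes them itself.
    """
    result = subprocess.run([*cmd, *paths], capture_output=True, text=True)
    return "" if result.returncode == 0 else result.stdout + result.stderr
```

With `--fix`, the trivial issues never even reach the model; only the findings that need real judgment come back as text.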

I've noticed that the quality of the codebase plays a huge role in the model's ability to write effective code.

For ex:

It assumes that all endpoints return plaintext or JSON content, even though several return binary data, CSV data, etc.

Ideally, all endpoints will already be typed. The model should not have to guess the request-response types.
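For illustration, one way to make that contract explicit with nothing but the standard library: a small registry where every endpoint declares its media type up front. The names and shapes here are hypothetical; the point is just that the response type lives somewhere the model can read instead of guess.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EndpointSpec:
    # Explicit contract: path, declared media type, and a handler that
    # returns raw bytes, so nothing about the response is guessed.
    path: str
    media_type: str
    handler: Callable[[], bytes]

ENDPOINTS = [
    EndpointSpec("/users", "application/json", lambda: b'{"users": []}'),
    EndpointSpec("/report", "text/csv", lambda: b"id,name\n1,alice\n"),
    EndpointSpec("/avatar", "image/png", lambda: bytes.fromhex("89504e47")),
]
```

A model reading this file can no longer assume everything is JSON; the CSV and binary endpoints announce themselves.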


Unless there is a specific regression in Claude Code, I don't know why Claude failed at your task. It should have worked.

Also, if you're looking for a model that prioritizes meticulousness, then I'd use Codex. Codex has a tendency to autistically cover all of your bases, which benefits the sort of problem you're working with (again, use it in high or xhigh mode).

and a huge amount of our code is written by Claude. Garry Tan is in AI psychosis, but Claude Code is easily the biggest productivity unlock in CS since I started my career.

That's weird because in my experience, Codex 5.4 is way better than the most recent Sonnet. Haven't tried Opus though.

I have no idea why Claude Code is working so badly for you.

I'm not @ChickenOverlord, but I'm also seeing unimpressive results. Maybe we can get to the bottom of it.

I've tried Claude (via Claude Code), Gemini (via Gemini CLI), and GPT (via codex).

In all of them, I've used their equivalent of Claude.md/Agents.md to lay ground rules of how we expect the agent to behave. Multiple people have taken multiple shots at this.

We always use plan mode first.

Our documentation is markdown in the same repository, so that should be useful and accessible.

We're using Java, which is strongly typed and all our endpoints are annotated with additional openapi annotations that should provide even more metadata.

We're using a pretty basic bitch tech stack, but it's not spring boot. All three models regularly fight us on that fact.

We have four levels of validation, each with its own entry point in the build scripts. These are described in a readme.md in the root of the project. The first is a linter. The second is unit tests and code coverage. The third is a single end-to-end test. The fourth is all end-to-end tests. We have instructed the models to use these validation targets to check their work.
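That ladder of checks can also be driven programmatically for the agent. A hypothetical sketch, where the target names and the `./gradlew` wrapper are stand-ins for whatever your build scripts actually expose:

```python
import subprocess

# Hypothetical targets; substitute whatever your build scripts expose.
VALIDATION_LEVELS = [
    ("lint", ["./gradlew", "lint"]),
    ("unit", ["./gradlew", "test"]),
    ("e2e-smoke", ["./gradlew", "e2eSmoke"]),
    ("e2e-full", ["./gradlew", "e2eAll"]),
]

def validate(runner=subprocess.run):
    """Run the cheapest level first and stop at the first failure, so the
    agent gets fast feedback before paying for the expensive suites."""
    for name, cmd in VALIDATION_LEVELS:
        if runner(cmd).returncode != 0:
            return (False, name)
    return (True, "all levels passed")
```

Ordering cheap checks before expensive ones keeps the feedback loop tight: a lint failure never costs you a full end-to-end run.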

Despite all this, we see common failure modes across all models we've tested.

  1. Bad assumptions about the tech stack. No, we do not use spring boot.
  2. A tendency to add more code, rather than fix code.
  3. An urge to "fix" "bad tests" that exist for very specific reasons. These specific reasons are usually covered with inline developer documentation as well.
  4. Confusion about what capabilities our version of java has available. Yeah, the pattern matching preview was cool. Stop trying to turn it on with experimental feature flags.
  5. Writing tests that don't actually test the thing it's changing.

I'm sure there are more, but these immediately come to mind. There are four of us trying to make these things work, and we all keep running into the same problems again and again. It's not just me - even people with dramatically different writing styles and thought processes are seeing the same thing. I feel like I'm taking crazy pills, because a lot of people I know in real life are experiencing the same pain, but on the Internet it seems like I'm a huge outlier.

What's the disconnect here?

What's the disconnect here?

It works a lot better if you bend to the AI and use a stack it likes. Why this specific Java stack?

Legacy concerns. The amount of custom code that has built up over the last 15 years is too big of a shift to deal with right now. It's on the backlog, but not anywhere near the top priority.

I'm sure there are more, but these immediately come to mind. There are four of us trying to make these things work, and we all keep running into the same problems again and again. It's not just me - even people with dramatically different writing styles and thought processes are seeing the same thing. I feel like I'm taking crazy pills, because a lot of people I know in real life are experiencing the same pain, but on the Internet it seems like I'm a huge outlier.

My most competent co-worker, a Russian guy who got his start writing assembly back in the 80s, was the most AI-enthusiastic and AI-curious person I knew. He was always trying out the latest models from OpenAI, Google, and Anthropic, he ran his own local LLMs and diffusion models, and he even dropped $4-5k on a DGX Spark late last year. Yet even he seems to be getting disillusioned and losing interest; he doesn't think AI is going to achieve anything remotely close to the promises and hype. I will note that the push from our upper management to use AI hasn't pleased him much either. The project we've been working on for the past year (modernizing a giant mess created by our Indian coworkers, who weren't using package management at all; they were literally emailing around zip files full of DLLs for years, and I got pulled into four-hour calls to fix dependency conflicts in prod every two or three months) was very much not aided by AI, but management insisted we find a way to use AI on it regardless.

This AI bro vs (idk what to call the opposition) schism on this site is very funny

I feel like both sides are talking past each other in many ways, and also have no interest in bridging the epistemic gaps.

About me

I'm firmly in the "AI bro" camp, I guess. I do not code, nor do I know how to code aside from simple programming 101 type stuff, which is all I need(ed) to make VBA scripts work in Excel. I will never copy/paste another line of Stack Overflow VBA to jank together a macro again, and that makes me very happy.

Adoption is slow, but it's gradually happening at my employer $MULTI_NATIONAL_FINANCE_CO. It is very clear to me that I will see (and already have seen) large productivity gains, especially as agent scaffolds are made for things other than coding.

LLMs are both extremely powerful and very jagged. I think a huge amount of their "jaggedness" is due to their nature as LLMs, and they are very unlikely to get to ASI/some versions of AGI*. My best guess is they'll be as disruptive as the ~computer (i.e. the information age) was from 19XX-now, perhaps slightly smaller given "AI impact on human civilization" is kind of a subset of "computer impact on human civilization".

*Notwithstanding some kind of paradigm change in algorithm/AI approach. Which is always possible, but we're pretty clearly on the LLM-tech tree path for the next bit.

Vague Predictions

I am sure many white collar jobs will disappear entirely, many will be insulated for any number of reasons (ranging from genuine limits to retarded bureaucracy and everything in between) and will remain unchanged for a while, and some, like mine, will keep their core identity but day to day tasks will shift a lot and who knows what happens to employment (too many factors to guess per job).

Coding

It is clearly revolutionizing coding. This cannot be denied. GitHub commits are now going parabolic, so people are "building things". Much of which is slop. I am one of those people, I now have a small but growing fleet of personal tools. I'm sure they are coded awfully, I've never looked and wouldn't understand if I did. I don't care, they work for me.

There are much more accomplished coders on twitter, etc., who are also reporting massive changes to their lives. Many of them are incentivized to say such things and to exaggerate, but I doubt it's a massive coordinated lie or mass delusion. So there is truth there.

The more sensible ones will even agree that AI code is on average mediocre to bad, and AI can't do high precision high quality specialized code like a cracked human can. AI will even take your amazing high precision high quality specialized code and slop it if you're not careful. Many of them, like Karpathy, have just given up and accepted the slop as a price of doing business. Because they're accomplishing what they want with the code too. It works.

It's assumed that AI performance will improve massively from where it is today. It has so far, and it's a pretty safe assumption right now. It's rumored that the new Claude model beat expectations on performance vs scaling laws. AI model hype is always a large % bullshit, but we'll find out the real capabilities soon, and no matter what, they will be better than they are now.

I don't think LLMs are going to bring us the ASI digital god of Sam's wet dreams/nightmares. I think they are going to profoundly change our service economies regardless.

Your situation

I don't know your codebase or the thing you're getting it to do. I don't know anything about HTTP.

I seriously doubt you're trying to set the AI up for success at all. I can't code and I'm probably using more AI coding best practices than you are, and all my git commits are titled "lol".

It's also very possible that it's not worth the time to set up AI "properly" to fix this. There's a very real possibility it's much faster, if more tedious, to just do it yourself. But this is one task. N=1. There are things AI can do for you today, that's a guarantee.

The bubble

The usual retort to "skill issue" is "well, if I have to set it up and use best practices, then AI is a bubble". I think that's a strawman, because I am not stuck in a reflexive yes/no binary where if you like AI you can't also think it's a bubble. It could be a bubble; I don't know (or care). It's incredibly easy for an asset to be over-financed, and you never know if you've done enough capex until you've done too much (at any scale). What I care about is the AI tools I can access, which are excellent and also flawed.

Maybe AI needs to be that good out of the box to justify the trillions in capex. It probably does. But does that matter here? Neither you nor I control capex spend or can predict how long the scaling laws will hold for.

I don't care if AI is a bubble - we'll all find out and predictions of this scale/magnitude are essentially worthless. If you have alpha and guess right, all power to you, but the bubble conversational branch strikes me as a fool's errand. And it's irrelevant to "can LLMs do things for you?".

Closing thoughts

We have LLMs here right now that are massively changing basically any digital task you point them at. It's not easy, and it doesn't work everywhere, but it's insane when it does.

It's cognitively exhausting. It's a new way of thinking + every time new models/tools come out you change many things you were previously doing. So many assumptions and bottlenecks change. It's genuinely not easy or obvious always how to implement it. We are learning this in real time as a culture.

It's so exciting, and I hope to soon quit my job at $MULTI_NATIONAL_FINANCE_CO to capture more of the value of my labor, which is about to increase a lot (probably lmao, could also go to 0).

If you want to refuse or deny the power of these tools you can. You can set about finding examples of them sucking to point and laugh. But you're letting your bias blind you, and leaving a lot of value on the table. You can tell your computer to do stuff and it can now, it's awesome.

Also noting that in your HN link the inventor of Claude Code is asking ppl for feedback/providing explanations live as I type this.

I've never looked and wouldn't understand if I did. I don't care, they work for me.

This might be a huge part of the divide between doubters and believers.

The code coming back might be ugly, buggy, insecure, and probably completely impossible to scale.

But if it works, how much does the 'average' user care?

Yet for those who care about the quality of the code or product, it might grate when they look and see the inelegance of the solutions and the lack of foresight.

Apply this to the AI art debate, too. Sure a trained eye will notice deficiencies and shortfalls. But the average user notices that they can produce a logo or a cute cartoon portrait in 15 seconds for pennies.

Me, I'm now basically using the LLMs to do final review on any work I don't feel 100% competent on, since its attention to detail is now impeccable and of course it never gets tired or complains.

Sometimes it hits some nitpicks I genuinely find stupid, because in actual practice it's an irrelevant detail for the actual outcome of the matter. But it catches things, so it almost feels like it'd be malpractice not to use the tool.

Anyway, it's broken through to normies. AI agents are going to be huge among small businesses; I see people who are otherwise technologically inept with Grok AND ChatGPT on their phones' lock screens. They are already relying on this tech to a degree that might startle you. Genie ain't going back in the bottle.

Get psychologically (and financially) prepared to adapt, that's the only advice that I can truly offer right now.

It's so exciting, and I hope to soon quit my job at $MULTI_NATIONAL_FINANCE_CO to capture more of the value of my labor, which is about to increase a lot (probably lmao, could also go to 0).

Love this uncertainty. On the one hand, I could 10x my productivity and cut my rates by half and still be making crazy money for myself. Seriously, the number of basic and intermediate tasks that GPT can do for me is freeing up time to engage with the higher leverage tasks that I enjoy and get paid the most for.

But if it gets just a little better then my role as an expert intermediary becomes redundant. I myself become a wrapper for the LLM, I'm just giving the stamp of approval to outputs that are already 99% perfect, and getting paid to eat the blame if something does go wrong 1% of the time. And competition with other humans in this role will drive my marginal profit down to pennies.

I hate this uncertainty.

I hate this uncertainty.

I've always been an anxious person, worried for the future, etc. I've basically given up with AI, the world has gotten so ridiculous it's just funny.

I have no control, everything is going to change. Everything has changed a lot already in my lifetime. I'm just gonna ride it out, I had my friends over for a BBQ last night. Trying to do more of that this year.

hell yeah brother.

The thing about singularity-like situations is that reliable prediction becomes impossible. Although technically I don't have to predict with real accuracy, just better than 90+% of the population. Beat the masses to do alright, provided we aren't all killed. You can fret about this, or you can let go and focus in on the tiny parcel of territory in the vastness of probability-space that you have any influence over.

In my most primal moments, I sometimes think I should literally just locate the most physically enticing female I can attract (and compromise on everything else because what else matters if AGI hits?), liquidate most of my assets except like $100k kept in the S&P, and shack up in my house to have gratuitous amounts of sex, get all my groceries delivered, and just fuck around with AI art generators and see if I can make a bit of money off them before whatever comes next washes over us.

But man, it turns out somebody still has to do the hard work of keeping civilization turning so we can keep the lights on until we can finish the silicon god (or the false idol). Those data centers and nuclear plants won't build themselves. Yet.

I despise people who do that stupid "permanent underclass" posting, specifically to drive anxiety without any actionable outlet.

I had my friends over for a BBQ last night. Trying to do more of that this year.

Strong recommend. I've focused on keeping the friendships I have as strong as possible. Say "yes" to more social invites than you used to. As long as the activities don't kill you before we reach utopia, why spend this exciting time hunched over a desk or lying in bed doomscrolling?

One of my favorite parts of this forum is moments like this, when someone puts my thoughts into words better than I could. I agree with every word.

I have the exact same view on AI art. I have quite low skill in "artistic taste"; it's never a skill I've been good at or sought to develop much (low reward per unit of time vs things I like more). But now I get to make funny images and concept art and express ideas in mediums that were previously locked to me. What fun! Yet there are people crying and screaming on the internet because, like, game developers are using AI agents to help them make games faster+better. I'm just excited for the golden age of AI gameslop. Good dev studios are going to be absolutely cooking.

I myself become a wrapper for the LLM, I'm just giving the stamp of approval to outputs that are already 99% perfect, and getting paid to eat the blame if something does go wrong 1% of the time. And competition with other humans in this role will drive my marginal profit down to pennies.

I'm hoping this window of time lasts a while. I'm adjacent to the legal world and they're going to use every institution they wield (many!) to keep themselves in this state for as long as they can.

I mean, there's no way that the legal profession doesn't outlaw AI use in law the moment it becomes a threat to their jobs, right? Lots of law makers are lawyers, and I don't think they are above using the levers of power to make sure their profession can't be replaced.

I'm not sure how they'll catch attorneys who are careful about the end products they're filing.

You might see attorneys staying suspiciously effective despite juggling large caseloads, making surprisingly adept legal arguments in their briefs while their performance at a live hearing is lacklustre.

But yeah it'll be banned from any client or public-facing roles to large extents.

AI use by attorneys will get lots of attention for job market and ethics reasons, but the courts are 100% unprepared for the day when pro se litigants start filing piles of plausible-sounding briefs in their traffic ticket/misdemeanor/family court cases.

They're already doing it in low-stakes Civil cases.

Ask me how I know.

The code coming back might be ugly, buggy, insecure, and probably completely impossible to scale.

But if it works, how much does the 'average' user care?

In my experience, the average user starts to care right around the same time that their credit card number and mother's maiden name end up for sale to the highest bidder.

No one is going to vibe code their own SaaS to replace Salesforce et al.

Salesforce and other huge boys with giant moats will enjoy higher labor efficiency. They may experience serious pain as increased competition turns into margin pressure, but that's hard to predict.

Mid-cap software will knife fight each other over margins as competitors grow like weeds.

Small-cap/VC/PE idek lol, really excited to watch this space.

I'm super curious to see what happens when a given VC can invest in 5x as many startups per unit of $capital. I assume startups will scale faster. Do VCs stretch themselves thin with more companies in a portfolio? Do funds get bigger or smaller? Are there more or fewer actual VCs? Is it easier or harder to get a VC fund going?

That last bit is the most interesting part to me.

Right now, my understanding is that VC is extremely hard to get because a handful of AI darlings have sucked all the air out of the room. If they IPO soon, VCs should theoretically have freed up capital to deploy as the OpenAIs/Anthropics of the world start to show a return.

If I believe the argument, then it should result in a much larger number of smaller investments, since labor is ostensibly the biggest cost of software startups and that cost should plummet.

I don't think the normies are THAT far along that they'd trust it with their financial information.

But not too far out, either.

They might trust a Vibe-coded website, though.

As I understand it any website taking customers' financial information will usually use a third party's software rather than roll their own.

If Paypal et al. are vibe coding without regard to security we are in for some pain.

Block is vibe coding now

Yeah, personally, I've never bought into the AI hype at all. Everything I've ever tried to use it for, it promptly shits the bed on, so I just dismiss it as worthless.

But even in an alternate universe where I'm the crazy one and everyone else is sane, there are severe problems with trusting this stuff: first, you're de facto ceding control over your technical infrastructure to a third party (run by exactly the sort of people who say stuff like "idk, they trust me. dumb fucks"). Yes, yes, you're supposed to religiously check the output before committing, not let it execute unsafe commands in a privileged environment, yada yada. I've got a bridge in Brooklyn to sell ya. Second, there is existing precedent for tech services being intentionally made worse to increase usage: for example, Google intentionally made Google search worse by doing things like disabling spell check so that users would have to search multiple times to find the result they were looking for, thus "increasing usage" (yes, this is from an actual court document lol). As OP and plenty of other smart people have noted, there is is a trivially obvious incentive and mechanism for this to be done with LLM coding agents. Just make the agent worse so people have to use more tokens!

If they're going to enshittify the AIs, it'll have to happen after one company gets sufficient market dominance that swapping to a different one isn't trivial.

And we're not at that point yet. If anything, the competition is at its fiercest right now. People seem to be willing to drop one product for another if they notice any tiny loss of performance.

I've mostly stuck it out with GPT so far, but I can't see any way they could lock me in hard enough that I wouldn't leave if it was obvious their model was consistently 10% 'stupider' than the alternatives.

I'm surprised (as a mostly non-user for now) at the complaints that engineering performance has degraded over months. Is this rapidly-improving expectations? Honeymoon phase wearing off? AI vendors cranking the screws to reduce costs and looking for pennies? I thought the models were difficult to train and largely static, so I didn't think that sort of scaling was trivially on the table.

Is this rapidly-improving expectations?

Yes

Honeymoon phase wearing off?

Yes + it's fun building the first part of slop-software (slopware?). The last 20% of finalization/polish is much less fun. I have gone from "holy shit AI software development is so neat" to "AI software development is neat but I'm getting really sick of making X specific and complicated thing I have no business building work" (I'm so close though).

AI vendors cranking the screws to reduce costs and looking for pennies?

Yes, this is getting worse too.

I thought the models were difficult to train and largely static, so I didn't think that sort of scaling was trivially on the table.

That's why they're all spending billions.

I think it's largely AI vendors cranking the screws.

On the other hand, Anthropic can't even manage two nines of uptime, so it may just be outright incompetence on their part. Being a PhD in machine learning does not make you an expert at SRE, and the models aren't quite there yet either.

Mostly vendors cranking the screws to squeeze more cash out of us. The worst thing is when providers silently update or change the quantization of the model without making it known. Local models don't have this problem, people (I say this as someone who's managed to get Qwen 3.5 397b-a17b running locally on a server I rented).

Somewhere here is a good reaction meme joke about AI doomers having to cope with AGI dumbing itself down to maximize token usage ($$$), not paperclips.

Not saying it's happening, but it'd be ironic.

I listened to the recent Odd Lots episode with Gina Raimondo (Biden's Secretary of Commerce) and I would echo the sentiment here:

Most Americans when they hear AI, they get afraid, right? The vast, vast majority of Americans, "AI = anxiety", "I am going to lose my job". I get that, you know, people are scared. I think it would be a huge mistake to like retard our AI progress with overregulation. [We] just talked about China. I want to win the AI race, I want America to lead the AI world. And I think when we get to the fifth or sixth inning of this AI revolution, whatever you want to call it, I firmly believe there will be more jobs. I do. I think that there will be new industries, new companies, new products and services. I'm an optimist. [That being said,] I am pretty worried about getting from the first inning to that inning.

I am not dismissive of AI, because it has made me more productive, but I also believe that software engineers will still be around.