
Small-Scale Question Sunday for March 22, 2026

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


How did AI annoy you this week?

I'm pretty pro-AI generally, but I've been frustrated by a huge uptick in vibecoded applications that have some use but are unimpressive and poorly thought out for what they are.

E.g. people in my socials:

  • "I made a tracker website for <some game achievements accessible through the game's API>!"
  • "I made a map of the city to display its trees!"
  • "I made a website to track all the social clubs in the city. Submit yours!"
  • "I made an Android app to calculate your BMI!"

And I have a mess of feelings about it. On the one hand, we live in an age of technical wonders, and I'm glad people are discovering them. On the other, because we live in an age of technical wonders, the bar for quality has gone up so much in the last year or two, and these people seem to lack any self-awareness about it. The default vibecoded design tropes are immediately apparent in these apps, the same way you can sense AI writing from em-dashes or "it's not X, it's Y!", or just its general tone. And like, it's fine. It's okay. The apps work, but they should be so much better.

It's not like I haven't vibecoded some turds. I've made websites and Android apps and tools too. It's just that those are for me alone, or shared in person if someone requested them. To release one to the public, the actual utility of the thing would have to be unimpeachable. Tracker website for <thing that is already tracked> does not meet that bar. Map of <thing that is already mapped> does not meet that bar. Yet another app doing the same thing as a hundred others does not meet that bar.

I'm probably one of the more AI-sceptical people on this board. I don't think the God Machine is going to techno-rapture us to cyber-heaven anytime soon, but I do try to keep an open mind around the idea that it might have domain-specific value, particularly in coding tasks.

I've noted here in the past that I haven't seen much value, even when using frontier models. The responses that I get are:

  • I'm not using a frontier model, even though I mention that I'm using what is nominally a frontier model.
  • I'm using the wrong frontier model.
  • I'm not using the right harness and tooling.
  • Even if I am using the right harness and tooling, my lack of success is a personal failure on my part, because I'm clearly just prompting it wrong.

I had a chance at work to try using Codex with GPT-5.4. This is allegedly a top tier stack, and so far as I can tell, as close to the frontier as you can get.

I targeted a fairly straightforward performance issue in our codebase, where some JPA code was generating an inefficient query when two tables each grew two orders of magnitude larger than we usually see. This is the kind of thing that would normally take me 30 to 40 minutes to write a few automated end-to-end tests, then ten minutes to fix.
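The actual JPA code isn't shown here, so as a purely hypothetical sketch of why a query can be fine at normal sizes and pathological after both tables grow 100x: a nested-loop style plan does O(n*m) row comparisons, while a hash-join style plan does O(n+m) operations. All names and sizes below are made up for illustration; no real database or JPA provider is involved, we just count the work.

```java
import java.util.*;

// Hypothetical illustration: why join cost can explode when two tables
// each grow two orders of magnitude. We count abstract "operations"
// instead of running real SQL.
public class JoinCostSketch {

    // A row pairing an id with a foreign key into the other table.
    record Row(int id, int fk) {}

    static List<Row> table(int size) {
        List<Row> rows = new ArrayList<>(size);
        for (int i = 0; i < size; i++) rows.add(new Row(i, i));
        return rows;
    }

    // Work done by a nested-loop style plan: every pair is compared.
    static long nestedLoopComparisons(List<Row> a, List<Row> b) {
        long comparisons = 0;
        for (Row x : a) {
            for (Row y : b) {
                comparisons++; // match when x.fk == y.id; we only count cost
            }
        }
        return comparisons;
    }

    // Work done by a hash-join style plan: build a map once, probe once.
    static long hashJoinOperations(List<Row> a, List<Row> b) {
        Map<Integer, Row> byId = new HashMap<>();
        long ops = 0;
        for (Row y : b) { byId.put(y.id(), y); ops++; } // build phase
        for (Row x : a) { byId.get(x.fk()); ops++; }    // probe phase
        return ops;
    }

    public static void main(String[] args) {
        int small = 100, big = 10_000; // each table grows 100x
        System.out.println("nested loop, small tables: "
                + nestedLoopComparisons(table(small), table(small)));
        System.out.println("nested loop, big tables:   "
                + nestedLoopComparisons(table(big), table(big)));
        System.out.println("hash join, big tables:     "
                + hashJoinOperations(table(big), table(big)));
    }
}
```

The nested-loop cost goes from 10,000 to 100,000,000 comparisons when both tables grow 100x, while the hash-join cost stays linear, which is the general shape of "fine until the tables grew" performance bugs.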

Since I clearly have a problem with Prompting It Wrong, I spent almost two hours working with the planner describing the problem, and the root cause, and where the failing method was used. I described what might be at risk of breaking, and what tests we would need to write to prove out the fix. I described the architecture of the automated test system, and what the tests would need to verify.

After doing all this, I let Codex churn.

It generated tests that verified the wrong thing.

Then it did the fix wrong, in the name of "efficiency".

Then, rather than fixing the issue correctly, it tried to rewrite all the sites that called the method.

After losing most of a day to this, I fixed it my own damned self. I'm starting to think Dijkstra was on to something.

My job is mostly just routine ports of legacy code into a new framework, with a few minor architectural decisions thrown in here and there. It took me some time to set up sufficiently exact instructions, but by now GPT-5.4 can do it pretty well. The end result is no better or worse than if I had done it myself (which makes sense, since I'm telling the copilot to use my own methodology), and it's only moderately faster, but notably it can work while I'm off doing something else, with a handful of corrections along the way.

I don't like it, but it looks like it can do my job, at least.

Wow, love that Dijkstra essay!