
Culture War Roundup for the week of August 18, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I was browsing through the news today and I found an interesting article about the current state of AI for corporate productivity.

MIT report: 95% of generative AI pilots at companies are failing

Despite the rush to integrate powerful new models, about 5% of AI pilot programs achieve rapid revenue acceleration; the vast majority stall, delivering little to no measurable impact on P&L.

There seems to have been a feeling over the last few years that generative AI was going to gut white collar jobs the same way that offshoring gutted blue collar jobs in the 1980s and 90s, and that it was going to happen any day now.

If this study is trustworthy, the promise of AI appears to be less concrete and less imminent than many would hope or fear.

I've been thinking about why that might be, and I've reached three non-exclusive but somewhat unrelated thoughts.

The first is that the Gartner hype cycle is real. With almost every new technology, investors tend to think that every sigmoid curve is an exponential curve that will asymptotically approach infinity. Few actually are. Are we reaching the point where the practical gains available in each iteration of our current models are beginning to bottom out? I'm not deeply plugged into the industry, nor the research, nor the subculture, but it seems like the substantive value increase per watt is rapidly diminishing. If that's true, and there aren't any efficiency improvements hiding around the next corner, it seems like we may be entering the trough of disillusionment soon.

The other thought that occurs to me is that people seem to be absolutely astounded by the capabilities of LLMs and similar technology.

Caveat: My own experience with LLMs is that it's like talking to a personable schizophrenic from a parallel earth, so take my ramblings with a grain of salt.

It almost seems like LLMs exist in an area similar to very early claims of humanoid automata, like the Mechanical Turk. They can do things that seem human, and as a result, we naturally and unconsciously ascribe other human capabilities to them while downplaying their limits. Eventually, the discrepancy grows too great - usually when somebody notices the cost.

On the third hand, maybe it is a good technology and 95% of companies just don't know how to use it?

Does anyone have any evidence that might lend weight to any of these thoughts, or discredit them?

The write-up is almost completely dishonest in its spin. Here is a copy of the report itself. Don’t consider this any kind of actual statistic; even the authors hide behind “directionally accurate” and contradict themselves within a single paragraph.

“Only 5% of custom enterprise AI tools reach production” becomes a “95% failure rate for enterprise AI systems” a paragraph later, and then there’s the so-called “research note” that says “We define successfully implemented for task-specific GenAI tools as ones users or executives have remarked as causing a marked and sustained productivity and/or P&L impact”.

These three statements are not the same, and they occur within three successive paragraphs. I think if you read the report there is some useful broad-strokes stuff, but any specific claim is methodological trash.

As others have already implied, this study seems to be a vehicle for attracting media attention, rather than a serious attempt at evaluating the impact of LLMs on productivity. "Rapid revenue acceleration"? So we're already excluding anything that is merely cost-saving by replacing employees?

The actual paper is not freely available, so I don't actually know how rigorous their research was. At the very least, it is described as being enterprise only - historically the slowest and least agile when it comes to adopting new technologies. There are basic bitch wrappers that already have billion dollar+ valuations! And if it is focused solely on revenue generation as the benchmark, you will be cutting out a huge swath of projects that involve LLMs.

One might also wonder at timing. While LLMs will seem old news to rats and SSC readers due to familiarity with GPT-2, ChatGPT has only been around since November of 2022: not even 3 years old. And that was GPT-3.5; GPT-4 only came out in March of '23. Any other technology would be incredible if it drove rapid revenue acceleration in ~15 enterprise deployments after such a tiny amount of time. That's not to mention the yuge problem of AI studies becoming out of date simply because the whole thing moves way too quickly for academia. When was this study completed? Autumn of last year, if we're being generous?

Again, without reading the primary source it would be harsh to jump to conclusions, but based on the article linked this just screams "provocative title to get attention" rather than something important to learn about business adoption.

Personally my view on the white collar replacement thing is more that AI can replicate most of the beige, do-nothing-in-particular existence that defines a huge chunk of the white collar economy for far less per head. I don't think that AI's necessarily gonna launch a new frontier, but there's so much essentially complete dead-weight in the white collar economy and that might be a catalyst towards rationalization.

I work for a software consultancy that has recently gone heavily for building an "AI accelerator" for clients.

Full disclosure, I just moved from GPU based image generation (non-AI) to embedded, I'm trying desperately to avoid working on the AI projects. Take my comment with a big grain of salt, as I'm definitely biased.

They are definitely useful. Mostly as a way for executives to summarize and interrogate quarterly reports. They're probably going to replace several data analysis teams whose jobs have been building Power BI dashboards for the past 10 years.

The hype cycle is definitely real, but most clients have wanted the chat bots built as a box checking exercise, and have no idea what they actually want out of it (based on in-depth conversations with people on the AI accelerator teams). I expect the trough of disillusionment to hit hard and cancel most of these projects.

They're probably going to replace several data analysis teams whose jobs have been building Power BI dashboards for the past 10 years.

I consulted for a massive multinational a couple of years ago and they had this massive operation in India that produced those BI dashboards every week, that the regional and national executives immediately threw in the trash.

The issue was that while those dashboards looked good and contained a ton of data they didn't really say anything meaningful and it was too hard to both communicate with and change the workflow of the Indian BI teams so the output became useless.

What people defaulted to instead was just fairly simple KPIs that were relevant for whatever issue at hand and people showing things in excel. The dashboards were occasionally used for official reports and external communication but not for internal decision-making.

I'm not sure which bucket AI would fall into here. Would it enable people to quickly do the work themselves (or some kind of local resource), or will it just be a cheaper way to shit out even more useless graphs and dashboards than the Indian resources did?

I'm not deeply plugged into the industry, nor the research, nor the subculture, but it seems like the substantive value increase per watt is rapidly diminishing. If that's true, and there aren't any efficiency improvements hiding around the next corner, it seems like we may be entering the trough of disillusionment soon.

Well, it seems not enough money is being spent on trying to reduce the power use of inference. A startup working on silicon for inference tried exactly that and couldn't raise enough funding to retain its engineering teams. Something is off if companies that try to solve the concrete problem can't get funded while other companies light stacks of cash on fire to subsidize model usage just to capture market share. The whole thing looks bonkers to me!

If you start with the assumption that the well has run dry and LLMs are never (not any time soon, at least) going to be much better or much different than they are now, then yeah, very little about the market makes sense. Everyone willing to put substantial money into the project disagrees.

Inference costs are exaggerated (and the environmental costs of inference are vastly exaggerated). It's certainly a big number in aggregate, but a single large query (30k tokens in, 5k out) for Google's top model, Gemini 2.5 Pro, costs about $0.09 via the API. And further queries on substantially the same material are cheaper due to caching. If it saves your average $50,000 a year office drone 30 seconds, it's more than worth it.
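To make the arithmetic explicit, here's a quick back-of-envelope sketch; the per-million-token prices are my assumptions based on public list pricing, so adjust to whatever the current rates are:

```python
# Rough cost of one large query via the API (prices are assumptions, in USD per
# million tokens -- check the provider's current list pricing before relying on this).
PRICE_IN_PER_M = 1.25    # input tokens
PRICE_OUT_PER_M = 10.00  # output tokens

tokens_in, tokens_out = 30_000, 5_000
cost = tokens_in / 1e6 * PRICE_IN_PER_M + tokens_out / 1e6 * PRICE_OUT_PER_M
print(f"~${cost:.2f} per query")  # ~$0.09, before any caching discount
```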

Google ends up losing a lot of money on inference not because it's unaffordable, but because they insist on providing inference not only for free, but to search users who didn't even request it. (With a smaller, cheaper model than 2.5 Pro, I'm sure, and I'm sure they do cache output.) Because they think real world feedback and metrics are worth more than their inference spend, because they think that the better models that data will let them build will make it all back and more.

But who knows what those models will even look like? Who wants to blow piles and piles of money on custom silicon that might eventually reduce their inference costs by a bit (though, since they were working with RISC-V, I kind of doubt it'd have ended up being better per-watt; cheaper only after licensing costs are factored in, probably) when a new architecture might render it obsolete at any moment? It's premature optimization.

(Granted, GPUs have remained viable compute platforms since the advent of deep learning, but that's because they're not too specialized. Not sure how much performance per watt they really leave on the table if you want to make something just as flexible. Though I have heard lately that NVidia & AMD have been prioritizing performance over efficiency at the request of their datacenter clients. Which I'd read as evidence they're still in the 'explore' domain rather than 'exploit.')

If you start with the assumption that the well has run dry and LLMs are never (not any time soon, at least) going to be much better or much different than they are now, then yeah, very little about the market makes sense. Everyone willing to put substantial money into the project disagrees.

I'm actually assuming that the dumb money is pumping up a bubble with significant gaps in knowledge about what they are actually investing in, and no realistic way of getting a return. Much like other investment bubbles in the past.

Let's reverse the responses:

Who wants to blow piles and piles of money on custom silicon that might eventually reduce their inference costs by a bit (though, since they were working with RISC-V, I kind of doubt it'd have ended up being better per-watt; cheaper only after licensing costs are factored in, probably) when a new architecture might render it obsolete at any moment?

Didn't Google already do it with TPUs, although not based on RISC-V?

Inference costs are exaggerated (and the environmental costs of inference are vastly exaggerated). It's certainly a big number in aggregate, but a single large query (30k tokens in, 5k out) for Google's top model, Gemini 2.5 Pro, costs about $0.09 via the API. And further queries on substantially the same material are cheaper due to caching. If it saves your average $50,000 a year office drone 30 seconds, it's more than worth it.

Google ends up losing a lot of money on inference not because it's unaffordable, but because they insist on providing inference not only for free, but to search users who didn't even request it. (With a smaller, cheaper model than 2.5 Pro, I'm sure, and I'm sure they do cache output.) Because they think real world feedback and metrics are worth more than their inference spend, because they think that the better models that data will let them build will make it all back and more.

How much of the inference runs on Google TPUs and how much on GPUs?

If you're anticipating capex in the 13 figures, it's still surprising that large companies don't do more research on fundamentally different learning algorithms and hardware. Which isn't to say they don't (e.g. there are a couple researchers at GDM doing excellent work in neuromorphic and more brain-inspired learning), but I'd be surprised if the aggregate research spending among the big three on this (as opposed to tweaks to make transformers perform incrementally better) exceeds $1B/year. Any given research path is likely to not lead to anything, but the potential payoff is enormous.

Having poked at ChatGPT a bit, I'm not particularly surprised. If I think of a job it could potentially do that I understand, like graphic designer, ChatGPT (the only LLM/diffusion router I've personally tried) is about as good as a drunk college student, but much, much faster. There are some use cases for that -- the sort of project that's basically fake and nobody actually cares about or gets any value out of, but someone said it should be done. "I'll have GPT do that" basically means that it's considered meaningless drivel no matter who does it.

I suppose at some point it'll be able to make materials not only quickly, but also well -- but that day is not today.

"I would like an illustration for my fanfiction/roleplaying character. No, I'm not hiring an artist--I'm doing this for free, after all."

Sure. That's in the drunk college student, but way way faster realm. Nice to have, provides consumer surplus at free tier or $20/month, but probably not $200/month.

How about proofreading a long document? You can get LLMs to go through page by page and check for errors like sate instead of state, pubic instead of public, dependent vs dependant...

That has to be the most boring and obvious application. There are heaps more.

Or how about making cartoons? These aren't too bad: https://x.com/emollick/status/1920700991298572682

Word processors already look for typos that are actual words, but don't make sense in the current context, without applying AI. More and better autocorrect is about in line with the original thesis -- they're good at spreadsheet scale tasks, which is useful but not a huge amount of a given person's job. I'm not completely sure what professional editors do, but I think it's probably a bit deeper than looking for typos.

Perhaps I was too flippant with the 'There are heaps more' applications for AI. I get this newsletter from Alexander Kruel almost daily where he gives a tonne of links about what people are using AI for. For example:

Interviewing people in the Philippines (better than humans apparently). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709

62% of coders in this survey are using it: https://survey.stackoverflow.co/2024/ai

76% of doctors are using it: https://www.fiercehealthcare.com/special-reports/some-doctors-are-using-public-generative-ai-tools-chatgpt-clinical-decisions-it

It's thought that the US govt might've decided what tariffs to impose via AI: https://www.newsweek.com/donald-trump-tariffs-chatgpt-2055203

It goes on and on and on...

I personally used it for proofreading and indeed it can't do all of an editor's job. Editors do lots of highly visual tasks managing how words fit on the page in ways that AI isn't so good at. But it can do some of an editor's job. It can do much of a cartoonist's job (Ben Garrison is in the clear for now with his ultra-wordy cartoons?). I think it's more than fast drunk college student and more than meaningless drivel.

How about proofreading a long document? You can get LLMs to go through page by page and check for errors like sate instead of state, pubic instead of public, dependent vs dependant...

Spellcheckers and grammar checkers have been a thing for ages in word processors, without throwing massive amounts of compute at it.

And none of them will fix errors like 'pubic law'. They won't notice when 'losses of profits' should be 'losses or profits'. They won't call out a date of 20008.

It does work, but I think to do proper proofreading on an important document, you're going to need to supervise it, feed it your house style etc., and then check all its suggestions, or have someone competent who understands the subject matter do the same. Then you'll probably need to feed all the changes manually into InDesign (an LLM might be integrated into the Adobe suite to be fair, I haven't used it lately).
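For what it's worth, a minimal sketch of that supervised workflow using the OpenAI Python SDK (the model name and house-style text are placeholders, and a human still has to review every flagged item):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

HOUSE_STYLE = "Use 'dependant' as a noun only; dates as '12 March 2025'; ..."  # placeholder

def proofread_page(page_text: str) -> str:
    """Ask the model to flag likely errors on one page; it suggests, a human decides."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are a proofreader. Follow this house style:\n"
                        f"{HOUSE_STYLE}\n"
                        "List suspected errors with suggested corrections. Do not rewrite the text."},
            {"role": "user", "content": page_text},
        ],
    )
    return response.choices[0].message.content

# for page in pages: print(proofread_page(page))  # then check every suggestion by hand
```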

By the time you've done that, maybe you'll have saved some time but I don't see it as that big a deal.

If you put the sentence "Ohio law states that you may not loiter less than three get away from a pubic building." into Google docs, it will correct "get" to "feet", and "pubic" to "public". This has been the case for around 15 years.

OK, how about losses or profits? Or 20008? I cited pubic law because it's funny, the other two are actually real examples from what I was getting it to do.

the Prime Minister's name is "Morrison" not "Morison"

I highly doubt Google docs could do tasks that require contextual understanding without some kind of LLM.

Sure, it is incrementally better than what we already have. The problem I'm trying to illuminate is: is the compute worth the provided value? It is hardly taking away a job from anyone doing the proofreading; it is an improved version of what we already have.

Software spell check (on early computers at least) required a cute algorithm --- the Bloom filter --- to work reasonably efficiently. Actually checking each typed word against the whole dictionary wasn't (and likely isn't) practical, but a statistical guess of correctness is good enough.
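For anyone who hasn't run into one, here's a minimal Bloom filter sketch in Python (toy sizes; a real spell checker would pick the bit-array size and hash count to match its dictionary and target false-positive rate):

```python
import hashlib

class BloomFilter:
    """Probabilistic set: never false negatives, occasionally false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, word: str):
        # Derive several bit positions from independent hashes of the word.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word))

dictionary = BloomFilter()
for w in ("state", "public", "dependent"):
    dictionary.add(w)
print("state" in dictionary, "pubic" in dictionary)  # True, (almost certainly) False
```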

wasn't (and likely isn't)

Sir, you were in a coma and woke up in the future.

Checking the inclusion of an element in a hashtable is a constant-time operation, or at least constant-ish -- you still need to compare the elements so it's gonna be proportional to the size of the largest one. So the limiting factor here is memory. I suspect keeping a dictionary resident in RAM on a home PC shouldn't have been a big deal for at least 25 years if not more.

I think there should be an even longer period where it would be fine to keep the dictionary on disk and access it for every typed word, because no human could plausibly type fast enough to outpace the throughput of random reads from a hard disk. No idea how long into the past that era would stretch.

I still get surprised at how fast computers can do basic tasks.

A few weeks ago, I had to compare some entries in a .csv list to the filenames that were buried a few layers deep in some subfolders. It went through the thousands of items in an instant. I didn't even bother saving the output because I could regenerate it as fast as doubleclicking a file (or faster if it has to do something silly like opening Microsoft Word).
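For the curious, that kind of check really is only a few lines; a sketch below, where the file names and the "filenames live in the first CSV column" layout are just assumptions for illustration:

```python
import csv
from pathlib import Path

csv_path = Path("inventory.csv")  # hypothetical list of expected filenames
root = Path("archive")            # hypothetical folder tree to scan

with csv_path.open(newline="") as f:
    listed = {row[0] for row in csv.reader(f) if row}  # assume filename is column 0

on_disk = {p.name for p in root.rglob("*") if p.is_file()}

print("listed but missing on disk:", sorted(listed - on_disk))
print("on disk but not listed:", sorted(on_disk - listed))
```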

Heh, I'm old enough to have owned a pocket electronic spell checker at one point. The hash table seems the right way to do it these days, but it will take up more memory (640K shakes fist at cloud). And sometimes you do want to scan faster than the user types, like opening a new large file.

I don't know if you have experience actually working in tech but the "rapid revenue acceleration" is ringing some alarm bells, and even your article doesn't really support the pessimism in these comments; mostly it's saying that if you just give chat bots to your front line workers it isn't driving huge growth, which, I mean, sure.

How companies adopt AI is crucial. Purchasing AI tools from specialized vendors and building partnerships succeed about 67% of the time, while internal builds succeed only one-third as often.

Yeah, especially if you want this thing to be "rapid" that'll be the case. My team is building some AI tooling into workflows and it is a time intensive process. And I can't stress enough that we're not expecting them to hugely scale revenue, we're expecting them to reduce costs through labor savings which your article just isn't about. It's a totally wrong measure.

The headline is almost objectively a lie. It’s completely incompatible with the stat you quoted and so I suspect they are gaming the word “failure” to mean something most people don’t consider it to mean.

I don't know if you have experience actually working in tech but the "rapid revenue acceleration" is ringing some alarm bells

I've been in tech so long that all my alarm bells have blown out. At this point in my career, I assume most things are bullshit until I'm pleasantly surprised.

How much of this is attributable to the extremely low startup cost of an AI project? Having someone spin up a few Claude instances and seeing if that works can be surprisingly cheap.

Previous technologies have required far more initial investment, so this might just be “everyone can have $10K in tokens to try whatever you want if you can write a half-cogent proposal”.

There are two companion articles of late that I'd add to comment on this.

  1. Why LLMs can't actually build software

This one is pretty short and to the point. LLMs, without any companion data management component, are prediction machines. They predict the next n tokens based on the preceding (input) tokens. The context window functions like a very rough analog to a "memory" but it's really better to compare it to priors or biases in the Bayesian sense. (This is why you can gradually prompt an LLM into and out of rabbit holes). Crucially, LLMs don't have nor hold an idea of state. They don't have a mental model of anything because they don't have a mental anything (re-read that twice, slowly).
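A toy illustration of that statelessness (the "model" here is a stand-in, not a real LLM): each call is a pure function of the tokens you pass in, so anything you want "remembered" has to be re-sent as part of the prompt.

```python
import random

class ToyModel:
    """Stand-in for an LLM: the next token depends only on the tokens it is handed."""
    vocab = ["the", "tests", "pass", "now", "trust", "me", "."]

    def predict_next(self, tokens):
        rng = random.Random(" ".join(tokens))  # deterministic in the input; no hidden state
        return rng.choice(self.vocab)

def generate(model, prompt_tokens, max_new_tokens=8):
    """Autoregressive decoding: each step sees only the prompt plus tokens generated so far."""
    out = []
    for _ in range(max_new_tokens):
        out.append(model.predict_next(prompt_tokens + out))
    return out

print(generate(ToyModel(), ["fix", "the", "build"]))
```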

In terms of corporate adoption, companies are seeing that once you get into complex, multi-stage tasks, especially those that might involve multiple teams working together, LLMs break down in hilarious ways. Software devs have been seeing this for months (years?). An LLM can make a nice little toy Python class or method pretty easily, but when you're getting into complex full stack development, all sorts of failure modes pop up (the best is when it nukes its own tests to make everything pass.)

"Complexity is the enemy" may be a cliche but it remains true. For any company above a certain size, any investment has to answer the question "will this reduce or increase complexity?" The answer may not need to be "reduce." There could be a tradeoff there that actually results in more revenue / reduced cost. But still, the question will come up. With LLMs, the answer, right now, is 100% "increase." Again, that's not a show stopper, but it makes the bar for actually going through with the investment higher. And the returns just aren't there at scale. From friends at large corporations in the middle of this, their anec-data is all the same "we realized pretty early that we'd have to build a whole new team of 'LLM watchers' for at least the first version of the rollout. We didn't want to hire and manage all of that."

  2. AWS may have shown what true pricing looks like

TLDR for this one: for LLM providers to actually break even, it might cost $2k/month per user.

There's room to disagree with that figure, but even the pro versions of the big models that cost $200+ per month are probably being heavily subsidized through burning VC cash. A Hacker News comment framed it well - "$24k / yr is 20% of a $120k / yr salary. Do we think that every engineer using LLMs for coding is seeing a 20% overall productivity boost?"

Survey says no (Note: there are more than a few "AI makes devs worse" research papers floating around right now. I haven't fully developed my own evaluation of them - I think a few conflate things - but the early data, such as it is, paints a grim picture)


I'm a believer in LLMs as a transformational technology, but I think our first attempt with them - as a society - is going to be kind of a wet fart. Neither "space-faring giga-civilization" nor "paperclips ate my robot girlfriend." Two topical predictions: 1) One of the Big AI companies is going to go to zero. 2) A Fortune 100 company is going to go nearly bankrupt because of negligent use of AI, but not in a spectacular "it sent all of our money to China" way ... it'll be a 1-2 year slow creep of fucked up internal reporting and management before, all of a sudden, "we've entered a death spiral of declining revenue and rising costs."

An LLM can make a nice little toy Python class or method pretty easily, but when you're getting into complex full stack development, all sorts of failure modes pop up

I'm using it for full stack development on a $20 plan and it works. I guess it depends on what you mean by complex full stack development, how complex is complex? I wouldn't try to make an MMO or code global air traffic control with AI, but it can definitely handle frontend (if supervised by a human with eyes), backend, database, API calls, logging, cybersecurity...

And sure it does fail sometimes with complex requests, once you go above 10K lines in one context window the quality lowers. But you can use it to fix errors it makes and iterate, have it help with troubleshooting, refactor, focus the context length on what's critical... Seems like there are many programmers who expect it to one-shot everything and if it doesn't one-shot a task they just give up on it entirely.

The METR paper is somewhat specialized. It tests only experienced devs working on repositories they're already familiar with, which, as they mention within, are the most favourable conditions for human workers over AI: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Secondly, Claude 3.7 is now obsolete. I recall someone on twitter saying they were one of the devs in that study. He said that modern reasoning models are much more helpful than what they had then + people are getting better at using them.

Given that the general trend in AI is that inference costs are declining while capability increases, since the production frontier is moving outwards, then investment will probably pay off. Usage of Openrouter in terms of tokens has increased 30x within a year. The top 3 users of tokens there are coding tools. People clearly want AI and they're prepared to pay for it, I see no reason why their revealed preference should be disbelieved.

https://openrouter.ai/rankings

Two small notes. First, you are almost certainly being heavily subsidized on that $20 plan. All the evidence points in that direction. You may be paying 1-2 orders of magnitude under cost. Second, the most interesting part of the METR paper was that the devs thought they were being sped up, but the opposite was true. Provably so. Intuitions on AI efficacy cannot be trusted prima facie. Many people find them enjoyable and interesting to use, which of course is their right, but we should not trust their estimates on the actual utility of the tool. Both of these facts seriously undermine the boosters’ case.

If you think you’re being subsidised on a $20/month plan, switch to using the API and see the price difference. Keep in mind that providers make a profit on the API too - if you go on OpenRouter, random companies running Deepseek R1 offer tokens at a 7x cheaper rate than Claude Sonnet 4 despite Deepseek most likely being a large model.

As @RandomRanger said, it would make little sense for ALL companies to be directly subsidising users in terms of the actual cost of running the requests - inference is honestly cheaper than you think at scale. Now, many companies aren’t profitable in terms of revenue vs. R&D expenditure, but that’s a different problem with different causes, in part down to them not actually caring about efficiency and optimisation of training runs; who cares when you have billions in funding and can just buy more GPUs?

But the cat’s out of the bag and with all the open weight models out there, there’s no risk of the bigcos bumping up your $20/mo subscription to $2000/mo, unless the USD experiences hyperinflation at which point we’ll have other worries.

Does anyone seriously think that these tech companies are selling $200+ worth of compute for $20? The natural assumption should be that they're making good margins on inference and all the losses are due to research/training, fixed costs, wages, capital investment. Why would a venture capitalist, whose whole livelihood and fortune depend on prudent investment, hand money to Anthropic or OpenAI so they can just hand that money to NVIDIA and me, the customer?

Anthropic is providing its services for free to the US govt but that's a special case to buy influence/cultivate dependence. If you, a normal person, mega minmax the subscription you might use more than you pay for but not by that much and the average subscriber will use less. Plus you might praise it online and encourage other people to use the product so it's a good investment.

What evidence points in this direction of ultra-benign, pro-consumer capitalism with 10x subsidies? It seems like a pure myth to me. Extraordinary claims require extraordinary evidence.

Take OpenAI. Sam Altman said he was losing money on the $200 subscription. But Sam Altman says a lot of things and he didn't say 'losing 10x more than we gain'.

The company has projected that it would record losses of about $5 billion and revenue of $3.7 billion for 2024, the New York Times reported in September. The company’s biggest cost is due to the computing power used to run ChatGPT. Not only does it require huge investments in data centers, it also demands vast amounts of electricity to run them.

If the company is losing 150% of revenue (and Anthropic is similar), not 1000% or higher, then clearly it's what I'm saying, not what you're saying. Inference/API is profitable. User subscriptions are profitable. Investment is not profitable in the short term, that's why it's called investment. And they have their fixed costs... That's why AI companies are losing money, they're investing heavily and competing for users.

Furthermore, one study of a selected group of coders doing a subset of software tasks with old models does not disprove the general utility of AI; it's not a major, significant fact. I could find studies that show that AI produces productivity gains quite easily. That wouldn't mean that it produces productivity gains in all settings, for all people.

Here's one such study for instance, it finds what you'd expect. Juniors gain more than seniors.

https://mitsloan.mit.edu/ideas-made-to-matter/how-generative-ai-affects-highly-skilled-workers

Or here he lists some more and finds productivity gains with some downsides: https://addyo.substack.com/p/the-reality-of-ai-assisted-software

The METR paper just tells (some) people what they want to hear; it is not conclusive any more than the other papers are conclusive. And a lot of people don't read the METR paper closely. For instance:

Familiarity and inefficiency in use: These devs were relatively new to the specific AI tools. Only one participant had >50 hours experience with Cursor; notably, that one experienced user did see a positive speedup, suggesting a learning curve effect. Others may have used the AI sub-optimally or gotten stuck following it down wrong paths.

A couple of things:

The natural assumption should be that they're making good margins on inference and all the losses are due to research/training, fixed costs, wages, capital investment.

This is a fun way to say "If you don't count up all my costs, my company is totally making money." Secondarily, I don't know why you would call this a "natural" assumption. Why would I naturally assume that they are making money on inference? More to the point, however, it's not that they need a decent or even good margin on inference, it's that they need wildly good margins on inference if they believe they'll never be able to cut the other fixed and variable costs. You say "they aren't selling $200 worth of inference for $20" I say "Are they selling $2 of inference for $20"?

Why would a venture capitalist, whose whole livelihood and fortune depend on prudent investment, hand money to Anthropic or OpenAI so they can just hand that money to NVIDIA and me, the customer?

Because this is literally post 2000s venture capital strategy. You find product-market fit, and then rush to semi-monopolize (totally legal, of course) a nice market using VC dollars to speed that growth. Not only do VCs not care if you burn cash, they want you to because it means there's still more market out there. This only stops once you hit real scale and the market is more or less saturated. Then, real unit economics and things like total customer value and cost of acquisition come into play. This is often when the MBAs come in and you start to see cost reductions - no more team happy hours at that trendy rooftop bar.

This dynamic has been dialed up to 1,000 in the AI wars; everyone thinks this could be a winner-take-all game or, at the very least, a power law distribution. If the forecast total market is well over $1 trillion, then VCs who give you literally 10s of billions of dollars are still making a positive EV bet. This is how these people think. Burning money in the present is, again, not only okay, but the preferred strategy.

Anthropic is providing its services for free to the US govt.

No, they are not. They are getting paid to do it because it is illegal to provide professional services to the government without compensation. Their federal margins are probably worse than commercial - this is always the case because of federal procurement law - but their costs are also almost certainly being fully covered. Look into "cost plus" contracting for more insight.

What evidence points in this direction of ultra-benign, pro-consumer capitalism with 10x subsidies? It seems like a pure myth to me. Extraordinary claims require extraordinary evidence.

See my second point above. This is the VC playbook. Uber didn't turn a profit for years. Amazon's retail business didn't for over 20 years and now still operates with thin margins.

I don't fully buy into the "VCs are lizard people who eat babies" Reddit-style rhetoric. Mostly, I think they're essentially trust fund kids who like to gamble but want to dress it up as "inNovATIon!" But one thing is for sure - VCs aren't interested in building long term sustainable businesses. It's a game of passing the bag and praying for exits (that's literally the handle of a twitter parody account). Your goal is to make sure the startup you invested in has a higher valuation in the next round. If that happens, you can mark your book up. The actual returns come when they get acquired, you sell secondaries, or they go public ... but it all follows the train of "price go up" from funding round to funding round.

What makes a price? A buyer. That's it. All you need is for another investment firm (really, a group of them) to buy into a story that your Uber For Cats play is actually worth more now than when you invested. You don't care beyond that. Margins fucked? Whatever. Even if you literally invested in a cult, or turned a blind eye to a magic box fake product, as long as there is a buyer, it's all fine.

You say "they aren't selling $200 worth of inference for $20" I say "Are they selling $2 of inference for $20"?

Why don't we try and look into this? People have tried to estimate OpenAI margins on inference and they come away with strong margins of 30, 55, 75%. We don't live in a total vacuum of information. When trying to work out their margins on inference, I base my opinion on the general established consensus of their margins.

they need wildly good margins on inference if they believe they'll never be able to cut the other fixed and variable costs

The demand for inference is rising, Openrouter records that demand for tokens rose about 30x in the last year as AI improves. Grow big enough and the margin on inference will outweigh the costs.

They are getting paid to do it

It's effectively free, they're 'selling' it for $1 per agency for a whole year. OpenAI is doing the same thing. Why are you trying to correct me on something you won't even check?

There is a significant difference between making a loss as you expand your business rapidly and try to secure a strong position in an emerging market and 'subsidized by 1-2 orders of magnitude'. No evidence has been supplied for the latter case and it's unbelievable.

Amazon wasn't making a profit because they were continuously expanding and investing in their retail business, not because the actual business was unprofitable. Investors were happy to tolerate them not making profits because they were growing. Uber wasn't making a profit but there were no 10x subsidies. We can see this immediately in how taxis weren't costing $20 while Uber was costing $2 for the same trip.

TLDR for this one: for LLM providers to actually break even, it might cost $2k/month per user.

If the Big AI companies try to actually implement that kind of pricing, they will face significant competition from local models. Right now you can run Qwen3-30B-A3B at ridiculous speeds on medium-end gaming rig or a decent Macbook, or if you're a decently sized company, you could rent a 8xH200 rig 8h/day, every workday, for ~$3.5k/mo, and give 64 engineers simultaneous, unlimited access to Deepseek R1 with comparable speed and performance to the big known models, so like... $55/month per engineer. And I highly doubt they're going to fully saturate it every minute of every workday, so you could probably add even more users, or use a quantized/smaller model.
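Back-of-envelope on that per-seat figure, using the numbers above (which are my assumptions, not any provider's quote):

```python
# Hypothetical figures from the paragraph above: 8xH200 rented 8h/day on workdays.
rig_cost_per_month = 3_500   # USD
engineers = 64
print(f"~${rig_cost_per_month / engineers:.0f} per engineer per month")  # ~$55
```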

you can run Qwen3-30B-A3B at ridiculous speeds on medium-end gaming rig

How are you doing that? Qwen3-30B-A3B-Q5_K_M.gguf is 21.7GB, are you running it at 1it/s slowly swapping off the SSD or is your idea of a medium-end gaming rig a 3/4/5090?

I mostly gave up on local models because they hit such an obvious intelligence barrier compared to the big ones, but would love to give this a shot if you explain what you're doing. I have 16GB VRAM.

The default settings for LM Studio on an RTX 3060, i3-12100, and 48GB DDR5 memory at pretty conservative XMP, using qwen3-30b-a3b-Q_6K, gave >13 tokens/s for this test question. That's not amazing, and I could probably squeak out a lot more performance going to 4bpw or more aggressive tuning, but it's still performant enough to work.

((Although I'd put the code quality pretty low -- in addition to the normal brittleness, there are some stupid bugs and unexpected behaviors with FileSystemWatcher, and most LLMs so far have walked straight into them. But since some of the other LLMs don't even understand the question well enough to use a FileSystemWatcher, there's a bit of a curve here.))

You can get 10-20 tokens/s with CPU only inference as long as you have at least 32GB of RAM. You can offload some layers to your GPU and get probably 30-40 tokens/s? Of course, a 3090 gives you >100t/s but it’s still only $800, I’d consider that mid-range compared to a $2k+ 5090.

Swapping from the SSD is only necessary if you’re running huge 100B+ models without enough RAM.

Yes.

Which is why the Big AI companies are looking to tightly couple with existing enterprise SaaS and/or consumer hardware as fast as possible. And I'm reasonably sure that the large hardware companies may want to aid them. NVIDIA keeps making noise about "AI first" hardware at, I think, a consumer level.

They really do want a version of Skynet.

Most office work is fake email jobs. 10x'ing the productivity of fake is still fake. I do think that AI is going to roar on the margins. But the average office worker is doing very little that's productive in the first place.

A lot of AI is helping write emails on the front end, and then summarizing them on the backend. Nothing of value is being added.

Most office work is fake email jobs.

This is a ridiculous statement. If you think so, you should post your short positions.

Or, you should start up competing white collar companies and undercut all the companies you think have massive amounts of fake email job deadweight. As you'd have a much leaner cost structure.

I'm surprised that this is controversial. I didn't think it was a hot take. Even at my first internship (at a large tech company), during the general onboarding, I was introduced to the concept of the Pareto principle, used to explain that 80% of the work is done by 20% of the employees.

If you think so, you should post your short positions.

Shorts on who?

I'm in much more agreement with "80% of work is done by 20% of employees" (although I think it's a larger share than 20%)

I disagreed strongly with "most office work is fake email jobs"

"Many employees don't work at full capacity" =/= "many employees have fake jobs"

The shorts would be against whatever companies you think are wasting large sums of money paying people to do nothing, as presumably they're very liable to be disrupted by companies with more competitive cost structures

That doesn't follow. There are colossal differences between:

  1. Suspecting that there are many large companies with massive amounts of fake email job deadweight
  2. Being able to reliably identify companies with more (or less) fake email job deadweight
  3. Being able to reliably generate above market returns using the information gleaned from 2)

Where 2) would be quite difficult for professional investors, much less regular individual investors. And even conditional on if one somehow were able to pull off 2), modern financial theory would suggest 3) is still highly unlikely. Since if you could pull off 2), so could others, and thus there leaves no opportunity for arbitrage.

Note that it’s unclear what the directionality of such an investment thesis should be. With or without arbitrage, one could argue that high-bullshit-email-job companies should have higher expected returns, since the market would perceive them as riskier in being more vulnerable to disruption.

The shorts would be against whatever companies you think are wasting large sums of money paying people to do nothing

All mid-large companies do this, there's none to short, as it's built in.

as presumably they're very liable to be disrupted by companies with more competitive cost structures

No, because it's not a solved problem, it's a scaling and coordination problem. You can't easily pick out which jobs are fake, and which parts of which jobs. It's baked into the growth curve. Smaller companies generally are scrappier, and often cheaper as a result, which is how they compete. As they grow, they become less able to run a tight ship.

I don't find the statement so ridiculous, unfortunately. As @ThomasdelVasto and I posted before, the corporate market may be in an irrational but metastable state. Far too much of white-collar work is just "adult daycare", and society has been built around the idea that this is how you keep people occupied. It's possible that, at some point, the whole edifice will collapse. But hey, I don't have a bird's-eye view and I could be wrong. Let's hope so!

Pinging @fmac but what specific jobs are you referring to? I can't say I've ever worked with anyone whose job I would describe as a "fake email job".

I wouldn't discount the entire title, but a bad PM fits the "fake email job"-shaped hole so well that it might as well be made for it.

Some managers, sales reps, and HR workers come to mind (note that I'm not saying there's no need for those roles, but I get the impression there are far too many people in them). Heck, even many coders, despite having a real thing they make, are just skating by and not making a difference to anyone's life. I would possibly include myself in that. And I'm working for a successful company - I'm sure it's a dozen times worse in, say, the government, where even the distant hand of the market can't reach you.

I'm also open to the argument that 95% of jobs are useless but it's humanly impossible to know exactly which those are, so you need to keep everyone employed. I'm not arguing from omniscience here, just from my instincts after decades of code monkeying.

After reading more of your thoughts I'm actually much more in agreement with them than how I interpreted them initially

The true benchmark is GDP. If LLMs truly can boost productivity in most tasks except for pure manual labour, we would see GDP roaring. If we saw a 10-40% increase in productivity among the majority of the labour force, it would be like an industrial revolution on steroids. We are seeing lackluster economic growth; clearly production isn't booming.

Somehow tech has the ability to radically change how people work and provide people with amazing tools without boosting productivity much. We went from typewriters to Word to cloud services that allow us to share documents instantly across continents. We really haven't seen a matching boom in productivity. The number of office workers wasn't slashed with the propagation of email, Excel, Google search or CRM systems.

Somehow tech has the ability to radically change how people work and provide people with amazing tools without boosting productivity much. We went from typewriters to Word to cloud services that allow us to share documents instantly across continents. We really haven't seen a matching boom in productivity. The number of office workers wasn't slashed with the propagation of email, Excel, Google search or CRM systems.

I am willing to entertain claims that official statistics are distorted, but nevertheless someone should make the point: real GDP per capita has more than doubled since the 1980s and typewriters. I never properly understood Total Factor Productivity, but it had respectable growth from the 1980s to the 00s.

The number of office workers wasn't slashed with the propagation of email, Excel, Google search or CRM systems

I am too lazy to look up the statistics, but qualitatively the office work function has dramatically changed. There used to be a significant pool of people whose job function was answering phones, typing documents on paper, managing the paper documents, and managing someone's calendar, and it was needed for the business to function. Secretarial and mail room jobs have practically disappeared.

I am amenable to argument that most of these efficiencies have been wasted (they enable more work to be done, but not all of the new work is productive).

I am willing to entertain claims that official statistics are distorted, but nevertheless someone should make the point: real GDP per capita has more than doubled since the 1980s and typewriters.

Distorted doesn't begin to describe it. It's not like someone made an oopsie when recording data in their Excel spreadsheet.

Citing these statistics makes no sense until we establish we're even using the same definitions. If the bureaucratic sector expands to match the gains brought by technology, that's still "P" for the GDP god, and does little to argue against someone who says they're not seeing much in the way of productivity gains over their lifetime.

Outright dismissing them makes for a boring discussion, because then there is nothing to discuss unless the specific failures can be discussed. The concept of distortions I am ready to accept, because whenever I hear how the calculations are made, they sound heartwarmingly crude. The method for constant-price adjustment even more so (what is the value of a MacBook in 1980?). But any attempt to account for it is a headache, too.

I do believe the numbers are not wholly invented, so they track some kind of signal, which is relevant enough to make the point: LLMs have not (hedge: not yet) caused dramatically increased GDP, but GDP (per capita) and productivity statistics did increase when the office adopted MS Office.

ETA: sent an early draft too early.

We really haven't seen a matching boom in productivity.

Or is the current march of productivity increases throughout the 2000s a result of shit like cloud software tools, and in a counterfactual world where tech stagnated, productivity would be much lower than in our actual world?

If LLMs truly can boost productivity in most tasks except for pure manual labour, we would see GDP roaring.

Ironically, I personally suspect that the current AI revolution may find its best use cases in manual labor fields. Replacing software developers is sexy, but hard. Training a model to operate robotics for conventionally "hard" problems (picking fruit, sewing garments, sorting trash) that require a bit of intelligence due to inconsistent inputs (fruits differ, fabric bunches in non-deterministic ways, trash isn't worth sorting) seems much more viable than grokking a full software stack.

Training a model to operate robotics for conventionally "hard" problems

This is really hard, and if current models could easily be trained to do it, they already would have been.

Silicon Valley VCs seem hugely reluctant to back hardware startups, and even then, those take longer to come to market. It's not my wheelhouse, but I feel like we've seen lots of robotics demos recently that were previously the domain of Boston Dynamics. I've assumed those were driven by machine learning and neural models, as opposed to cheap MEMS sensors that have been around at least a decade now, but I'd be curious to learn more if you have links.

I'm just vibing based on what I read and observe

There's a snappy "X's name law" about this. Basically getting robots to do rehearsed and consistent activities (a backflip) is now "easy", somewhat independent of the complexity of the action itself.

But having them able to handle the profound randomness of life at a 99.999999% accuracy level is really really really hard (see: self driving cars).

Sure, but I have trouble believing that picking vegetables has as much profound randomness as self-driving cars: part of the reason the latter is hard is that it encompasses a huge range of general human knowledge ("Was that a shadow of a bird or an animal running across the road? I should watch out for deer in this sort of stretch"). Even sewing a shirt (I've done it) doesn't have that wide a range of random appearances.

I mean I agree with you, and I'm not involved in the field of robotics.

I have seen interesting videos of robot berry pickers (Dyson has a cool experimental farm), I hope commercially viable ones are invented and scaled

I'm just commenting on what I've picked up as an interested observer of the space

The lack of massive robotic automation indicates to me it's quite hard; if it wasn't, we'd see more of it

I hope LLMs help smooth over the "fuzzy" parts that make this so hard

The existence of LLMs makes me a better doctor (and I am studiously silent on whether you could replace me entirely with one). Perhaps this is an artifact of me being relatively junior in my career, but I had an uncle, who is a consultant psychiatrist with more degrees than a heat wave, ask GPT-4o questions. He begrudgingly admitted that it gave a more satisfactory answer to one of his thorny questions than the overwhelming majority of other consultant shrinks would have.

(Said question was on the finer details of distinguishing schizoaffective disorder from bipolar disorder with psychosis. Why 4o? I used the Advanced Voice Mode for the sake of future shock; o3 could have given an even better answer)

Let's just say that my willingness to pay for SOTA LLMs is far higher than their sticker price. Thank god for market competition, and the fact that I don't need nearly as many tokens as vibe coders. The price I pay is comparable to the delicious pork belly I'm eating at a nice Chinese place, and I know what I'd take in a pinch.

I am studiously silent on whether you could replace me entirely with one

I can't find the paper but I was linked recently to a study illustrating that generative AI performance on medical content drops precipitously when "not one of these" is added to the answer list and used.

We aren't dead yet.

https://pmc.ncbi.nlm.nih.gov/articles/PMC12334947/

This the one? It's mildly ironic I found it using GPT-5T, but it's food for thought nonetheless.

it's food for thought nonetheless

Lol I mean I maintain shit ain't ready yet like I always have - it's very common for diseases to present atypically and even more common for patients to poorly explicate things. Neither of these is well captured in the literature and therefore the data set.

Despite the rush to integrate powerful new models, about 5% of AI pilot programs achieve rapid revenue acceleration; the vast majority stall, delivering little to no measurable impact on P&L.

If this study is trustworthy, the promise of AI appears to be less concrete and less imminent than many would hope or fear.

This seems like an extremely odd metric to support the argument that you are making.

At the very least, to use the 5% success rate to understand AI's revolutionary potential, we need to know what the average value unlocked in those 5% of successes is, and the average cost across the whole dataset. If the costs are minimal, and the returns are 100x costs for the successes, then even if only 5% succeed every single company should be making that bet.

On top of that, what's the timeline function? When were these programs launched? How long have they been going on? Are the older programs more successful than the newer ones? If most of the 5% are over a year old, while most of the 95% are less than a year old, we might be judging unripe tomatoes here.

Then, add to that, there's value in having institutional knowledge and expertise about AI. By having employees who understand AI, even if the pilot programs fail, they'll see opportunities to implement it in the future and understand how to integrate it into their workflow.

It just seems odd to declare AI dead based off this data.

I’d be curious to know what type of businesses that 5% were used at. It might be good for things like writing boilerplate news and bad at ad copy. It might be good at picking up trends in engineering and business to business stuff and not so good at picking the new fashion trends.

It just seems odd to declare AI dead based off this data.

I may have miscommunicated here. I don't think it's dead. I think it'll be useful on a much longer time horizon than was predicted, and not in a way that we expected. The slope of enlightenment is next.

Fair, I probably misinterpreted your post.

But still, I don't even know if that data said it isn't useful! If I published an article telling you that I ran the numbers, and the 40-1 bets on UFC fights hit 5% of the time, that would be a huge gambling tip telling you to bet on the longshots.

What’s the base rate?

If I saw “rapid revenue acceleration” in a mass email from my upper management, I’d expect roughly zero change in my day-to-day experience. And when 95% report “little to no impact,” the 5% claiming success is right there in Lizardman’s Constant territory.

Press releases have the same incentives whether or not a technology (or policy, or reorg, or consent decree, or…) is actually going to benefit me. Companies compete on hype, and so long as AI is a Schelling point, we are basically obligated to mention it. That’s not evidence that the hype is real, or even that management believes it’s real. Just that it’s an accepted signal of agility and awareness.

The article points out a number of stumbling blocks. Centralizing adoption. Funding marketing instead of back-office optimizations. Rolling your own AI. Companies which avoided these were a lot more likely to see actual revenue improvements.

I can say that my company probably stalled out on the second one. I’m in a building full of programmers, but even the most AI-motivated are doing more with Copilot at home than with the company’s GPT wrapper. There’s no pipeline for integrated programming tools. Given industry-specific concerns about data, there might never be!

But that means we haven’t reached the top of an adoption curve. If the state of the art never advanced, we could still get value just from catching up. That leaves me reluctant to wave away the underlying technology.

I’m in a building full of programmers

I'm also in software, and we've seen value in the following areas:

  1. Keeping juniors from completely stalling when unsupervised for a day or so (at the expense of going down a rabbit hole), like a beefed-up search engine.
  2. Toy scripts to show management that we're "AI ready."
  3. Spinning up a lot of boilerplate on a greenfield project that's similar to other pre-existing problems.

They all seem to be absolutely terrible for large legacy codebases. I've lost count of the number of times they've spat out code in the wrong language entirely.

At this point, I don't think AI is a dead end, but I'm starting to think LLMs might be a blind alley for this particular application.

I'm in software too, and my productivity is boosted hugely by ChatGPT. However, there are caveats - I'm an experienced developer using an unfamiliar language (Rust), and my interactions consist of describing my problem, reading the code it generates, and then picking and choosing some of the ideas in the final code I write myself. My experience and judgement are not obsolete yet! If you just treat it as a personalized Stack Overflow, it's amazing.

On the other hand, in my personal time, I do use it to rapidly write one-off scripts for things like math problems and puzzles. If you don't need maintainable code, and the stakes aren't too high, it can be an extremely powerful tool that is much faster than any human. You can see the now-ruined Advent of Code leaderboards for evidence of that.

From my company's perspective, a lot of AI use is limited by policy. We aren't allowed to provide proprietary information. Company policy is to "only enter things that we wouldn't mind going viral on the Internet." This really limits anything I could do with it. At most I can use it as a Miss Manners guide. The coders are able to use it more, which frees them up to play Madden for longer or attend more meetings with the Product Owner.

Would mind, I imagine. Some companies have (supposedly) locally self-contained, non-information-sharing LLMs for this reason.

Although it’d be funnier if it’s indeed “wouldn’t,” that your company wants to hoard employee-created memes for itself, lest they get plagiarized by LLMs.

Indeed. My employer forbids us from using common LLMs. That would involve giving them our proprietary information. We can only use pre-approved entirely internal LLMs. These are on our hardware.

Wouldn't is correct.

"Only enter information you wouldn't mind being leaked"

I edited it on him. He is correct.

@OracleOutlook fixed it after my comment. It was originally a double-negative.

Sorry, you got the gist at least.

Ironically, I think our coders are least likely to use it. Integration is limited to a GPT wrapper, which is all well and good for people revising their emails, but not so much for serious programming. I suspect it’s an export compliance thing.

I know Copilot is used a bit here. They mostly use it to look up things and write tiny scripts.

My experience as a senior software engineer is that I am not worried about AI coming for my job any time soon. My impression (somewhat bolstered by the article) is that AI is most efficient when it is starting from scratch and runs into issues when attempting to integrate into existing workflows. I tell the AI to write unit tests and it fails to do all the mocking required because it doesn't really understand the code flow. I ask it to implement a feature or some flow and it hallucinates symbols that don't exist (enum values, object properties, etc). It will straight up try to lie to me about how certain language features work. It's best utilized where there is some very specific monotonous change I need to make across a variety of files. Even then it sometimes can't resist making a bunch of unrelated changes along the way. I believe that if you are a greenfield-ish startup writing a ton of boilerplate to get your app off the ground, AI is probably great. If you have a mature product that needs very targeted changes requiring domain knowledge, AI is much less helpful.

I can believe people using AI for different things are having very different experiences and each reporting their impressions accurately.

It will straight up try to lie to me about how certain language features work.

I have also had this experience. I was learning a new language and couldn't tell whether my code was right or not. Claude Opus 4 spent an hour saying "oh sorry, my previous code was wrong," and then giving me a slightly rewritten copy of the same thing with the same problem - it crashed on some function. Finally I suggested an environment setup issue and turned on search, and it figured it out (a function it was using was deprecated two years ago). But the number of times it told me "oh you're right, I was wrong, here's a fix," only for the fix to not fix the issue, was incredibly frustrating.

I had a similar loop with GPT-o3 a few weeks later, where it just made up academic references in my new (to me) sub-subfield. I swore at it, and had the chat banned for inappropriateness :)

This is awakening me to a sort of Gell-Mann amnesia effect: if the LLMs are this wrong and this stubborn in areas where I can test their output, where else are they wrong? Can I trust them in the rough analysis of a legal situation? In a summary of the literature on global warming? In pulling crime stats? I'm inclined to think they shouldn't be trusted for anything that isn't either harmless or directly verifiable.

This is awakening me to a sort of Gell-Mann amnesia effect: if the LLMs are this wrong and this stubborn in areas where I can test their output, where else are they wrong? Can I trust them in the rough analysis of a legal situation? In a summary of the literature on global warming? In pulling crime stats? I'm inclined to think they shouldn't be trusted for anything that isn't either harmless or directly verifiable.

Angela Collier has a video about "vibe physics" that talks about this in some detail. In the section I linked to, she discusses how crackpot physics emails have changed since the advent of LLMs. People will add caveats about how they talked to this or that LLM about their theory and the LLM told them it made sense. She'll point out in reply how LLMs will just agree with whatever you say and tend to make stuff up. And then the people sending the email will... agree with her! They'll talk about how the LLM made simple mistakes when talking about the physics the emailer does understand. But obviously, once the discussion has gotten to physics the emailer doesn't understand, the LLM is much more intelligent and accurate! It turns out having the kind of meta-cognition to think "If this process produced incorrect outcomes for things I do understand, maybe it will also produce incorrect outcomes for things I don't understand" is basically a fucking superpower.

I can believe people using AI for different things are having very different experiences and each reporting their impressions accurately.

Partially, but there is also a honeymoon phase and a phenomenon where people feel more productive but have mostly just shifted what they do, not increased their actual productivity.

Perhaps this is something that will pass with increased experience with the tools, but it has not been my experience with the people I manage, nor that of my friends in similar managerial roles. It could of course be a combination of the above as well. Maybe the models just need to get a bit better and people need experience with those models. Who knows?

To me it seems like the cases where AI actually is a meaningful productivity booster for programming are highly specific. It should be clear, though, that for those cases it is very valuable.

I would be more worried for areas where things don't actually have to be "correct" (for quality or legal reasons), like visual art generation. Even there, I imagine the impact will mostly fall on work that is liable to be (or already has been) outsourced.

Have you noticed a difference in quality of analysis of mature code-bases versus its ability to make changes/additions to them? The consensus on our team so far seems to be that its analysis is significantly better than its generation, though how much of that is the quality of the AI versus the quality of our prompting is rather up in the air.

I would agree that its analysis is often better. Even in cases where I ask it to solve a bug and it fails at doing that the description of the code and the problem often point me at a solution.

The big problem for now is some form of data validation. There are a lot of customer support jobs that could be 99.9% done by AI, but aren’t, because of the tail risk that some combination of words will reveal the wrong customer’s information, will allow someone into the account without the right checks, etc., plus general reputational risk, like the fact that countries and states are now accusing Facebook LLMs of flirting with minors or whatever. All the stuff that LLM red-teaming groups or ChatGPT jailbreak communities do, essentially. You can fire a bad employee and pin the legal liability on them, but if it’s your LLM, and the foundation model provider has a big fat liability disclaimer in its contract (which it will), you’re more fucked than you’d be if an employee had just gone rogue.

The eventual solution to this - as with self-driving cars - is to improve accuracy and consistency (by running things through multiple LLMs, including prompt-security ones like those slowly coming online through Amazon Bedrock and other platforms) until the risks are negligible and insurance costs therefore fall below the $50m a year a big corporation is paying for call centers.
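
To make the shape of that concrete, here's a toy sketch of the layering (the two model calls are stand-in delegates I invented for illustration, not any particular vendor's API; the hard part in practice is making the checker reliable enough):

```csharp
// Toy sketch of layered validation: one model drafts a reply, a second pass
// checks the draft against policy, and anything flagged goes to a human queue.
// Both delegates are stand-ins, not a real provider API.
using System;
using System.Threading.Tasks;

public sealed class SupportPipeline
{
    private readonly Func<string, Task<string>> _draftModel;  // generates a candidate reply
    private readonly Func<string, Task<bool>> _policyCheck;   // true = passes the safety check

    public SupportPipeline(Func<string, Task<string>> draftModel,
                           Func<string, Task<bool>> policyCheck)
    {
        _draftModel = draftModel;
        _policyCheck = policyCheck;
    }

    public async Task<(string Reply, bool NeedsHuman)> HandleAsync(string customerMessage)
    {
        string draft = await _draftModel(customerMessage);

        // Independent second pass: does the draft leak another customer's
        // details, skip identity verification, or otherwise break policy?
        if (await _policyCheck(draft))
        {
            return (draft, false);
        }

        // Fail closed: anything the checker flags gets escalated to a person.
        return ("A support agent will follow up with you shortly.", true);
    }
}
```

The economics all hinge on how low you can push the checker's miss rate, because that's what decides whether the insurance math ever beats the call-center budget.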

But it will take a few more months, maybe a couple of years, sure.

A 5% success rate doesn't seem that bad, from the perspective of this being "the latest hyped technology." Did blockchain integration turn out this well? It's true that it's a lot less than you'd expect from the perspective of this being "the power of God created by man," but I don't think we've reached that point (yet).

Most of the time there wasn't even a proposed vector for value with blockchain. AI is very different.

Judging by the report, the main vector is “wouldn’t it be cool if we could achieve 10% more with the same personnel?” That’s more realistic than blockchain, but it’s not explosive. It’s not overcoming a longstanding cliff in the same way as, say, telecom.

Well. Maybe in fields like digital art and voice acting. It’s not a coincidence that those are the fields seeing the most Luddite concerned citizens. But how many companies can directly turn voice synth into revenue?

how many companies can directly turn voice synth

I recently called a plumber that had an AI receptionist pretending to be human, so at least one.

Something about it was so off-putting, though, that I ended up calling somebody else.

But how many companies can directly turn voice synth into revenue?

Not directly, but I've started seeing (or rather hearing) tons of ads that use AI to generate the voice work for the commercial, so that's probably huge savings over hiring a voice actor and booking recording studio time. My favorite is when I hear a voice regularly used for memes being used for ads (I heard this voice in a radio ad about dealing with depression and just busted up laughing):

https://old.reddit.com/r/creepcast/comments/1gg2cjh/try_not_to_get_scared_scariest_stories/

It increasingly feels to me like the Tyler Cowens of the world are right: the impact will be large, huge even, but will take a lot more time to play out than the boosters predict. It will take time not only for the tech to improve, but for people and companies to learn how best to use it and for complementary infrastructure and skills to build up. The parallels to the personal computer or the internet seem increasingly on point, especially the dot com era. People were, rightly, astounded by those. And, for all the jeering pets.com got, and all the (mostly valid!) reasons it wouldn't work, it ended up mostly just ahead of its time. Everyone I know, myself included, buys their pet food through an online subscription or auto-recurring purchase. In 20 years I expect AI will be omnipresent in white collar work.

Another thing that happened in the dot com era was the telecom bubble - massive build-outs of broadband and other internet transmission lines across the country that were supposed to turn a profit any day now as the internet took off. The internet did not take off on that schedule and a number of companies lost their shirts, but the infrastructure was already there and turned out to be highly profitable a decade later. I'm not sure I understand AI well enough to know how much continuing investment the models might need in the future, but you can see a world in which one or more major AI companies go bust, their models get freed up now that the cost of capital is sunk into bankruptcy, and they go on to be widely used.

This makes sense to me. The impact of the spreadsheet has been huge, but it took a long time to settle in everywhere, and the accounting department still exists, even though the guys "running the numbers" don't anymore. There are still plenty of operational systems running on DOS or OS/2: if it isn't broken, don't fix it, and things take time to replace.

My expectation is that LLM/AI will boost productivity in the white collar sector and reduce pink collar jobs, but not totally upend the jobs market in those two sectors.

Why would it decrease pink collar work? Or do you mean the administrative overhang? But why would that hit pink collar stuff more than anything else?

Pink collar: low-skill office work, named because it has customarily been done by women, e.g. secretaries.

And secretaries and call centers and the like will be the jobs actually hardest hit.

Secretaries have barely existed for like 25 years at least and call centers aren't pink collar work. Pink collar work is overwhelmingly face to face service work, like nursing, teaching, childcare and social work.

I think they are thinking Project Manager or Customer Pleaser-type stuff. Which does tilt mostly female from what I have seen but isn't super automatable yet.

I think of AI a lot like I think of my experiences working with H1Bs. LLMs have no concept of truth, no actual work ethic, and basically make whatever mouth sounds get you to leave them alone. With enough supervision they can generate work product, but you can never exactly trust it. If you put them in charge of each other, things go completely off the rails exponentially.

The problem with LLMs will always be supervising them. I think in any area where the truth doesn't matter (fiction, art, chat, summaries of text to a lesser degree) LLMs might crush it. I think for many other automated tasks (data entry from disparate documents), their error rate will probably be in line with a human. But in terms of advanced knowledge work, I expect their output to always have a high variance, and it would be catastrophic for any company to integrate them into their workflow without even more knowledgeable and experienced humans in the loop.

Of course, you then run into the problem of not training up those humans because AI is doing the entry-level jobs, or of letting them go to seed because, instead of keeping their skills sharp, they are doing nothing but double-checking AI output.

I understand your experience. There is something strange about certain versions of H1B culture - zero pride in work, zero interest in getting something done in a final in production sense. It's like the only goal is just to generate more work - good, bad, repetitive, doesn't matter - so that the billable hours stay strong.

I just can't imagine the mentality of this. Zero personal pride, zero interest in personal development, hyper autist levels of emotional disinterest in other people.

I dunno if I'll get dinged for this, but this is exactly what a low-trust society looks like: they are importing their slacker ethic from their own society, and it clashes with the Anglo-Saxon way of doing things. There are exceptions, but on average anything touched or, god forbid, managed by them turns to shit real fast. For a good example of what happens when the middle management gets infiltrated, look at the apocalypse that was Microsoft's Skype.

look at the apocalypse that was Microsoft's Skype.

I don't know any of the details of what went down with management. Can you share? I did, of course, see how a once "category-leading" product turned into an unusable hunk of garbage.

I've had talks with some dev friends, and they said they did everything in their power to get out of the Skype division once enough Indians got into management and middle-management positions. Those managers would get nothing but Indians under them, and there was talk of utter retardation and constant slacking: they'd say "sure, yes, we'll get that done," then turn around and do nothing about it, despite what they'd been saying and nodding along to for the last 5 minutes.

> make whatever mouth sounds get you to leave them alone
> zero pride in work
> zero interest in getting something done in a final in production sense

If your and @WhiningCoil's characterizations of H1Bs are accurate, that makes me think H1Bs are low-key based in being Punch Clock Villains (or should it be Heroes?)—in contrast to Westerners (especially Americans), who will "go beyond the extra mile" or whatever in grinding hard for some self-actualization and to make someone else rich. Although granted, they'd not sound like great coworkers.

It's like the only goal is just to generate more work - good, bad, repetitive, doesn't matter - so that the billable hours stay strong.

Generating billable hours would be a refreshing contrast to many a young Westerner in law/consulting/accounting/etc., who might work 60 hours in a given week but then shave it down to 40 (lest a partner or project/relationship manager tut-tut that he or she had to perform the Emotional Labor of shaving hours off the bill [or presenting the bill as is]). Then said young Westerners in law/consulting/accounting/etc. will fret that they might not have enough “utilization” for a given year, due to their lack of billable hours during downtime between projects.

Chad H1B Billable Hour-Generator vs. Virgin Western Billable Hour-Shaver.

It's all fun and games until they've H1B'd critical infrastructure you didn't even realize existed. Oh the things I've seen...

It's one thing to laugh and joke about Meta or Microsoft enshittifying everything thanks to H1B value extraction. But you have no idea the enshittification that's coming in areas that desperately require a high trust, conscientious workforce.

Nothing you've said is wrong, it just reflects a different value prioritization and worldview.

When people use the phrase "work to make someone else richer", I very much enjoy YesChad.jpeg'ing that hard. I believe in a life of service. I want to do things in life that make other people better off. In a more economic yet abstract sense, I want to create more value and wealth than I consume.

I can hear @Sloot laughing as he pictures me as a doe-eyed whippersnapper who actually feels good about making the Boss more money. Well, maybe? What if the boss is smarter than me and can better allocate the resources of the company? What if I know the boss pretty well and also think he or she has a good set of moral principles as well?

One of the pitfalls of modern individualism is the idea that if you're "serving" or "working for" anyone else in a hierarchical arrangement, you're automatically being exploited. I can tell you for a fact that there are still thousands of Marines who loved the hell out of serving under General Mattis. Elon Musk's reality distortion field is so strong that he has ex-employees on record stating he was pretty much abusive - and they were proud to take it! These are probably bad examples to bring up to defend my case, but my point remains.

Chad H1B Billable Hour-Generator, with his excellent ability to game the system, will enjoy skating ahead while everyone else around him - fuck 'em - is being a naive little wagecuck. But Chad H1B is also importing the, ahem, cultural peculiarities that don't look so good for the West when extrapolated across all of society (the UK and Canada would like to have a word in the alley -- which is where they spend most of their nights now).

Free riding is a problem and the answer isn't to applaud it.

I think it is probably worth thinking about how, in the U.S., we are on the receiving end of a very successful propaganda apparatus arguing that working hard and having pride in your job is stupid and pointless. Sure, it's generally framed as something like "working for the man" or "capitalism sucks," but it is very successful, and past generations with similar views (e.g. hippies) had quite a bit of pride in the endeavors they actually got up to, which helped avoid this.

It's killing what makes America... America (and yes, for the right, excess immigration without cultural assimilation isn't helping).

I remember the days where you were more likely than not to find a helpful worker in a retail store. Those days are gone.

People have no pride in themselves or desire for excellence. It's sad.

In medicine it gets very gross, because doctors still have that vibe but a lot of nurses do not, and the ones who become NPs are often the worst. I've stayed late hundreds of times because I had the right skillset, I didn't want the night team to get swamped, and so on. NPs just walk off.

I can hear @Sloot laughing as he pictures me as a doe-eyed whippersnapper who actually feels good about making the Boss more money.

I wasn’t picturing you at all as I was writing my comment reply, as your comment that I was responding to didn’t discuss your own work experiences, nor did I have a prior mental image on this front.

One of the pitfalls of modern individualism is the idea that if you're "serving" or "working for" anyone else in a hierarchical arrangement, you're automatically being exploited.

I understand you’re speaking generally and not necessarily attributing this to me, but I wouldn’t say so (that working under a hierarchical arrangement = must be exploited). If anything, I’d disagree with the sentiment.

Free riding is a problem and the answer isn't to applaud it.

Nor should pathological altruism—to continuously cooperate in the face of defection—be applauded. All else equal, I loathe freeriders (to say the least).

Whether this is a corporate allegory for immigration/wealth transfers, or vice versa (or both), remains to be seen. As to immigration in general (illegal or legal; I’m somewhat indifferent to The H1B Question), an obvious solution could be to limit the arrival of welfare-state freeriders, and/or those likely to have children who are welfare-state freeriders. Or, domestically, to limit (or at least not subsidize) the proliferation of population segments likely to be such freeriders.

Personal anecdote, we had an order from the higher ups that we must use LLMs, and that they will be tracking how often we use them. I asked Windsurf (which they provided me with a license for) and it generated C# code with the following issues (amongst many others):

  • It wrapped all uses of HttpClient in a using block. Despite HttpClient implementing IDisposable, you aren't actually supposed to dispose it per request, because doing so leads to socket exhaustion under load.
  • All the DB queries it generated were called synchronously. Similarly to the socket exhaustion issue above, this will lead to thread exhaustion (and generally lower capacity for simultaneous connections, throughput, etc.). On the bright side, at least it parameterized them all. (A corrected sketch of both patterns follows below the list.)
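
For reference, the corrected patterns only take a few lines. This is a hand-written sketch rather than anything Windsurf produced; the Microsoft.Data.SqlClient package, the connection string parameter, and the Orders table are placeholders I made up for illustration:

```csharp
// Hand-written sketch of the two fixes above. Table name, connection string,
// and the Microsoft.Data.SqlClient package are illustrative assumptions.
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class ApiAndDbExamples
{
    // Reuse a single HttpClient for the lifetime of the process instead of
    // wrapping each call in `using (var client = new HttpClient())`, which
    // leaves sockets lingering in TIME_WAIT and can exhaust them under load.
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<string> FetchAsync(string url)
    {
        using HttpResponseMessage response = await Http.GetAsync(url);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    // Async, parameterized DB access: ExecuteScalarAsync instead of
    // ExecuteScalar, so request threads aren't blocked waiting on the database.
    public static async Task<int> CountOrdersAsync(string connectionString, int customerId)
    {
        using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();

        using var cmd = new SqlCommand(
            "SELECT COUNT(*) FROM Orders WHERE CustomerId = @customerId", conn);
        cmd.Parameters.AddWithValue("@customerId", customerId);

        return (int)await cmd.ExecuteScalarAsync();
    }
}
```

In an ASP.NET Core codebase you'd more likely get the client from IHttpClientFactory via dependency injection, but the point is the same: no per-request HttpClient, no blocking calls on the hot path.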

I started generating crap to please whatever tracking mechanisms they are using, but have completely ignored the output.

Did you tell it not to do that in the rules?

If I have to tell it to avoid common footguns, then it's faster to just write it myself.

Personal anecdote, we had an order from the higher ups that we must use LLMs, and that they will be tracking how often we use them.

In Europe the push for AI is absolutely bonkers. On top of stories like yours, I've seen academics shilling for their field to adopt it like they were sales reps, the public sector incentivizing its workers to dip their toes in the water and start using LLMs, etc. There was an entire infrastructure of workshop providers ready to go within weeks of when GPT-3 was announced, and it was aimed at some of the most calcified sectors of society.

The mundane theory I have is that this is (another one of) Europe's ill-conceived attempt(s) at overtaking the US in terms of innovation. The conspiracy theory is that they really really want to automate surveillance ASAP. Quite possibly it's both, but either way someone high up had a bright idea, and they'll be damned if they don't see it through.

Also they're just aware we don't have the personnel to make Europe work the old way any more. Even the politicians are increasingly aware that mass immigration isn't a long-term solution, though they can't wean themselves off it until something takes its place.

we had an order from the higher ups that we must use LLMs, and that they will be tracking how often we use them

And absolute dipshittery like this is why 95% of LLM projects (whatever that actually means) fail: not because LLMs are stupid, but because the people using them are.

Recently at my company, all job level descriptions were updated to include use of AI, and all promotions now have to include a bit about how you use AI.

I'm definitely on the bullish side, but it's quite ridiculous. I just have a script that I run every morning to burn up a bunch of tokens so I meet whatever metric they're tracking. (I did use AI to write that script!)

(I did use AI to write that script!)

Self-licking ice cream cone, electric boogaloo. (The previous one was DEI commitments/forced personal pronouns in e-mail signatures.)

I'm definitely on the bullish side, but it's quite ridiculous. I just have a script that I run every morning to burn up a bunch of tokens so I meet whatever metric they're tracking. (I did use AI to write that script!)

Finally, a goal that can Goodhart itself.

Personal anecdote, we had an order from the higher ups that we must use LLMs, and that they will be tracking how often we use them

You're not the first person to tell me that at various companies. Is there some kind of misaligned incentive there, like a KPI from higher up, or a kickback scheme? Or are they true believers?

Often AI deals require a promise to spend X tokens over Y time period. It’s like promises to spend a certain amount of money on a company’s services without specifying the services to be bought. So if the buyer is under the spend count, they encourage people to use more tokens.

Wait, what happens if you don't hit the minimum? Is there some kind of penalty that's worse than just burning tokens to hit the minimum?

Usually, the buyer has to eat the difference. The seller gets to collect the money.

On the individual level, if you're the director who wanted to put a feather in his cap about how he's a forward-thinker pushing the company forward, you end up with egg on your face. So, before it happens, you mandate that all employees have to use AI. Then, once that goal of token consumption is hit, you declare victory and get a sweet bonus before jumping ship to another company to help modernize their processes by integrating AI based on your success in the first company.

Depends on the contract, I guess. But at minimum you’d have to pay the difference. So if you’ve already sunk the money and your devs aren’t even bothering to use something you’ve spent a cool few million on… well, that’s a pretty natural time for a desperate VP to start the mandates.

1st prize: Cadillac. 2nd prize: steak knives. 3rd prize: you're fired.

Since most (successful?) adopters get their tools by licensing GPT or Claude, I would guess it’s an attempt to show a return on that investment.

The popular interpretation is of course something about stupid managers following a hype train, but I imagine there is a more charitable explanation along the lines that AI adoption (/workforce replacement) can be expected to result in an increase in productivity and profits once an initial trough of decreased productivity is overcome by building experience and figuring out the best way to integrate AI. The sort of long-term planning that requires going against local incentive gradients (in this case, forcing workers to use AI even if it is detrimental to their productivity for now) is exactly what upper management is there for; if workers/subdivisions doing what is currently optimal were always a winning strategy, management could easily be replaced by a simple profit-based incentive scheme.

If your charitable interpretation is correct, what kind of timescale would you predict before you hit break-even?

Assuming AI use is kept up (whether by compulsion or voluntarily), 1.5-2.5 years (70% confidence interval), maybe?

Investors want to hear that the company is taking advantage of AI technology.

I don't think it's a kickback thing. I work at a megacorp (over 10k employees worldwide), and the focus on AI came all the way from the C-suite.

Yeah, it screams KPI to me.

We’re techy enough that our investors want to see it, so by God, we’re going to pay someone for their model. Then you’ve got to show that you’re actually using it.