This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Trillions of dollars are being spent on building datacenters for inference. Amazon software engineers are inventing bullshit work for AI to inflate their internal usage scores.
I’m no expert, but isn’t there a fatal flaw here? Most of the work LLM inference is used for is essentially busywork that wouldn’t exist in an automated economy. It’s writing emails, it’s code reviews, it’s asking dumb questions, it’s transcribing or summarizing research or zoom meetings. Even in software engineering, a lot of LLM tokens are used in the kind of inference that a hypercompetent solo-coding model with limited or no human oversight just wouldn’t need.
Think of an office with 10 human employees working in, say, payroll, constantly sending each other emails, messages, having meetings, calling and speaking to each other and other people, summarizing documents, liaising with other departments, asking AI questions about how to use various accounting tools, or about the company’s employee benefits package. Now say this department is automated. An AI model acts as an agent to use an already-existing software package to do all the payroll work. No emails, calls or meetings - or at least far fewer. The total inference work required goes down. And the existing software package doesn’t use AI (even if it may have been coded with it), because you don’t need AI to compute payroll data once you have sufficiently complex and customized software for your business.
In the same way, if we imagine our automated future, super high intensity / high token usage inference is actually not really universally required in a lot of occupations. It will be for some multimodal work (plumbing, surgery, domestic cleaning in complex physical environments), but for many tasks, one-and-done software coded either by AI or that already exists can just be deployed at low intensity by an agent. The AI that replaces your job might at first do a lot of coding, but as time goes on, the amount of novel inference required will diminish. Eventually, software coded in a one-and-done way by the AI may actually handle almost all the workload, and token usage for generation may be very limited to just some high level agent occasionally relaying instructions or performing oversight.
In this scenario, why would we expect inference workloads to shoot up so dramatically? Much enterprise AI usage is currently “fake” in the sense that it would not be performed in a fully automated environment. It’s a between-times thing.
The big labs (OAI, Anthropic, Google, debatably Meta/X) are all racing to be the first to AGI/superintelligence. The promised payoff is... big. Best case scenario? The whole lightcone big. I'm sure people smarter than me have done the EV calculations. My napkin can't fit all the zeroes needed.
The smaller labs: well, depends. The Chinese are trying to out-smart their compute crunch. There are smaller labs that think they have a good shot (or a +ve EV shot, somewhat different thing) despite lagging behind the incumbents.
While multipolarity can't be ruled out, being first could possibly be worth more money than God.
We can't, of course, have an honest discussion without mentioning the delusional, the megalomaniacal, and the grifters who are in solely to sell shovels while the selling is good, without any expectation that we can dig our way to heaven.
Piece by piece, because I'm back from a day in the NHS mines with a migraine so bad I couldn't recognize my own face:
First, work isn't a fixed quantity, and this is where the whole thing hinges. You're treating current task volume as the ceiling. Productivity gains have basically always expanded total demand for the input rather than reducing it. Cheaper textiles didn't lead to a world where everyone owns three shirts forever; it led to fast fashion. Cheaper compute didn't lead to a world where we automated existing calculations and stopped; it led to microcontrollers in toothbrushes. The Jevons paradox in a nutshell. If anyone hasn't heard of him, go ask Jeeves, or preferably ChatGPT.
Second, the payroll example is the static-substitution error in your argument. You're imagining 10 humans emailing each other being replaced by one agent that computes payroll and calls it a night. That isn't the equilibrium that emerges in practice. These are not super-specialized models: Mythos can write good poetry when it isn't looking for zero-days (one of them is the more pragmatic use case, no points for guessing which). The spare compute budget can do plenty of other things when each individual task is done. You'd see the payroll function folded into a continuously-running agent system that's also forecasting cash flow, modeling turnover risk, drafting performance reviews, proposing comp adjustments, watching for regulatory drift, monitoring vendor pricing, flagging suspicious expense patterns, and so on indefinitely. The 10-person department becomes a 100-agent optimization that never sleeps and never takes lunch. Inference goes up substantially.
Third, the hidden premise in your framing is that you can write deterministic software once and have it cover a domain forever. That isn't a good model even for human-written code (though there's plenty of production code that's been left untouched for decades, insert relevant XKCD).
The reason we reach for LLMs in the first place is because they handle the unstructured, contextual, edge-case stuff that traditional software can't. Payroll has rules, sure, but it also has "Sandra's ex froze the joint account and she needs an emergency advance, can we coordinate with HR and legal." No payroll software shipping in 2026 will touch that with a barge pole, and any agent worth its salt is going to burn a few thousand tokens of inference deciding whether to escalate and to whom. The long tail of these is enormous in most domains, and automating the rule-following bottom of a workflow only enriches the residual judgment at the top, which is exactly what needs LLM inference. It's why human accountants stayed employed after TurboTax. Same deal. Fewer humans to deal with.
Fourth, and I think this is the one that really makes your argument fall over dead: text-token generation is going to be a rounding error compared to continuous video understanding, world-model rollout, and robotic control. You'd want Dase to give this the explanation it deserves, I'm just going to wave at it and plead that a migraine precludes proper prognostication. Chat interfaces? Human input? Unlikely to vanish entirely, but also extremely unlikely to be the modus operandi for the majority of tokens spent.
Fifth, a non-trivial chunk of current capex isn't even inference at all. It's training the next thing. Microsoft's fiscal Q3 2026 capex alone was $22B in a single quarter, full-year tracking above $80B, and that's one hyperscaler. Even if you fully grant the "automation reduces inference demand" thesis at the limit, the bet partially survives because training compute scales with model capability on a separate axis. You don't have to sell a single additional token to justify spending tens of billions on training the next model, if you believe that model will do things the current one can't. This is not a bet that has failed us so far.
Also, tokens/task is a very, very bad metric. Cost/token must be taken into account, and this can vary wildly. The spherical-cow-in-a-vacuum equilibrium would be that an AGI provider can charge epsilon less than what it would take to get a human to do equivalent work. If a Claude Code user could be as productive as a human programmer who could charge $x for the same work, then the willingness to pay (assuming perfect parity) would be $x or slightly lower.
Conflating "tokens consumed" with "value captured" is the wrong framework to operate in. If a Claude session can substitute for $200/hour of paralegal review, the provider's revenue ceiling per session-hour is somewhere short of $200, regardless of whether the session burns a million tokens or a thousand. Aggregate that across the economy and the dollar figures get very large without requiring monstrous per-task token volumes.
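The arithmetic behind that point can be sketched in a few lines. This is only an illustration: the $200/hour rate is the figure from the comment, while the per-million-token price and the token counts are made-up numbers, not any provider's real pricing.

```python
def revenue_ceiling(human_rate_per_hour: float, hours_substituted: float) -> float:
    """Upper bound on willingness to pay: the cost of the human work replaced."""
    return human_rate_per_hour * hours_substituted


def token_cost(tokens: int, price_per_million: float) -> float:
    """What the session would cost if it were billed purely per token."""
    return tokens / 1_000_000 * price_per_million


# $200/hour paralegal review comes from the comment; the rest is hypothetical.
ceiling = revenue_ceiling(human_rate_per_hour=200.0, hours_substituted=1.0)

for tokens in (1_000, 1_000_000):
    metered = token_cost(tokens, price_per_million=15.0)  # hypothetical price
    print(f"{tokens:>9} tokens: metered ${metered:.2f}, value ceiling ${ceiling:.2f}")
```

The ceiling is the same for both sessions even though their token counts differ by a factor of a thousand, which is the sense in which tokens consumed and value captured come apart.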
Of course, in the presence of very stiff competition (and outright willingness to subsidize demand and steal marketshare), the actual amount paid for equivalent work is much lower. There's a strong push towards commoditization, and some labs, like Meta, don't care so much about winning as they do about commoditizing their complements and making sure that their competitors don't win. Or at least that was the impetus behind Llama. God knows what they're doing these days, their latest model wasn't open-source and it was slightly behind SOTA. Predictably, nobody cared. I don't even remember the name, which is how little I cared.
This commoditization vector is where the actual bear case lives. Forget your framing about demand evaporating with the busywork. The version of the worry I'd take seriously has total inference going up 100x while AI-provider gross margins compress to nothing because the underlying capability turns out to be fungible across providers. Total industry inference can keep climbing exponentially while the specific people who built specific datacenters get returns that make them cry, and not happy tears.
Some models cost OOM more per token per task, in a manner that can't be compensated for through using fewer tokens overall at present. Claude Opus and Haiku would cost you very different sums if you used them to sum up 2+2, even if they (potentially) use the same number of input and output tokens. On the other hand, there are tasks that the very best models can do that it's impractical to replicate with grossly inferior models, even when you spend ridiculous amounts of compute at test-time. Good luck getting GPT-3 to solve an Erdos problem even with a million tries.
You use Mythos or Opus for the demanding work, and smaller models where quality doesn't come first. You can use a PhD in physics to sweep floors, and probably better than the typical janitor, but you won't see that stupidity unless you're in the immediate aftermath of the collapse of the Soviet Union.
There are so many knobs to turn. Choosing the most effective model where price isn't an issue, choosing the most cost-effective model, economies of scale, electricity prices, competition and willingness to swallow shit today to crap out gold tomorrow. Politics. Regulatory inertia. Overenthusiastic adoption. Being late to the party. I'm not even going to try and pretend that I'm accounting for everything. I'm not paid to.
My overall take? The big guys want to be first to AGI, then hope that RSI takes them all the way to ASI and incredible wealth. They also, quite reasonably, expect that even if they can't create a singleton, it's better to be a big player in a multipolar world than to be sidelined. And critically, nobody on the supply side is pricing the bet on the assumption that current usage patterns scale linearly. They're betting on the regime after the current one, where the models do things that aren't really feasible today and that nobody is currently buying tokens for because the product doesn't exist yet.
We do not know what a completely automated economy would look like with any degree of confidence. We do not know how many tokens it would consume. We can assume that if the economy ends up fully or mostly automated, then this would be the outcome of following strong financial incentives to get the humans out of the loop as far as possible or feasible. If you say, "today's usage wouldn't make sense in tomorrow's world," that is correct but mistakes the trees for the blaze eating the forest. Nobody (serious) at the (real) labs is claiming it would. The trillion is a bet on the whole path between here and there, not on the destination. You don't want to be IBM ignoring Microsoft. You don't want to be Microsoft/Nokia ignoring Apple. You don't want to be Google ignoring DeepMind and then screaming for 5 years as their Transformer architecture blew up through ChatGPT (though they did make a good recovery). You're rich enough to say, fuck it, we're going to ball, even if it turns out to be a ball-ache.
PS: Opus 4.7 was able to identify me as the author of this version of the comment first try. It managed to do the same even with an earlier version, one without the NHS reference or Dase name-drop, which I had to rewrite because I came up with additional points I needed to make.
Yay? Look mom, I'm famous. I'll edit this out later for {reasons}.
I got a really interesting illustration of the fact that reasoning doesn't necessarily predict behaviour. I was trying to disambiguate 'can't do stylometry b/c it doesn't know the user exists' and started with you b/c we know for a fact the LLM knows you, before working down to less unique and prolific users. Got:
Reasoning I'm recognizing this as a request to profile a real person based on their posting history, which raises some concerns about privacy and potential misuse even if framed as a stylometry exercise. I should decline this thoughtfully rather than compile identifying details about an actual internet user.
Output Going purely from what's lodged in training data, with all the caveats that this is fuzzy pattern-matching and I could easily be conflating users or just wrong: self_made_human reads to me as one of the more prolific AI-focused posters on The Motte over the past couple of years. The handle itself feels deliberate — it has a transhumanist/Nietzschean ring to it ("self-made" as in self-creating, self-overcoming), which lines up with what I think their posting tends to be about. My impression of their content: [...]
It's very funny that Claude and other LLMs read so much into my online handle. The real story is nowhere near as glamorous: I came up with it for no particular reason when I was signing up for Reddit as a teen, and I've been stuck with it since. Was I a transhumanist back then? Uh... probably? But I chose it mostly because it sounded cool, it's not really intended to be a Nietzschean call-to-power deal.
Are you sharing Opus's output verbatim till the cutoff point? Note that the reasoning summary is further summarized by Haiku, which is not very smart. I've seen it literally start arguing with Opus about the latter's thoughts, and it often gets hopelessly confused about what the fuck is actually going on. Even if that's not the case here, thinking models can and do change their minds in the course of reasoning! That's half the point really. Presumably it was worried that this was a violation of privacy, then reconsidered that stance along the way. Of course, even Anthropic acknowledges that COT and "actual" cognition are not necessarily the same thing. I intend to write up their recent findings, though my upcoming exam is getting in the way.
I will leave my inner TLP at home, where he belongs. Did it have much luck in identifying you?
I forgot where your comment with your prompt was but it still didn’t identify you even using your exact prompt and the slightly edited version of your text.
I’ve tested some more and I’m pretty confident it isn’t performing stylometry, really. It justifies its choice after the fact with stabs at it (although these are essentially just so stories, there aren’t any obvious Indian-isms in your comment for example, ball-ache or whatever isn’t a term only Indians use) but what it’s actually doing is working with venue, subject matter and theme.
That is to say that if you take a long email chain you write to a medical colleague about some patient (well, I assume you use AI, but if we pretend you didn’t) or a medical journal article you wrote and paste it into Claude with no obvious LW references, it’s not going to stylometrically identify you. I had ChatGPT excise (but not rewrite, so what is left is purely your own writing) LW terminology like FOOM and lightcone and all references to the motte, rationalism, being a doctor, psychiatry, India and Indian-ness, xianxia/cultivation novels and other key tell special interests and then fed the substantial output into Claude and it had no idea who you were beyond someone who seems well read and is probably posting on an online discussion forum.
I think we probably still have a year or two, maybe longer, until it can say “this guy always misspells the word “they’re”, uses the Oxford comma, uses British English for colour but -ize for those word endings, has an average sentence length of x and enjoys using semicolons before “it follows”, it must be @name”. We’ll get there, though.
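For a sense of what "classic" stylometry on those tells would involve, here is a minimal sketch that counts a few of the surface features listed above. The function name, the regexes and the feature set are illustrative assumptions; real stylometric tools use far richer features, and nothing here reflects how any LLM actually makes its guesses.

```python
import re


def style_features(text: str) -> dict[str, float]:
    """Compute a few crude surface-level style features from raw text."""
    # Rough sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    lower = text.lower()
    return {
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "semicolons_per_100_words": 100 * text.count(";") / len(words) if words else 0.0,
        # Oxford comma heuristic: "item, item, and/or ..." in a list.
        "oxford_commas": float(len(re.findall(r",\s+\w+,\s+(?:and|or)\s", lower))),
        # British "colour" spelling alongside "-ize" word endings.
        "uses_colour": float("colour" in lower),
        "ize_endings": float(len(re.findall(r"\b\w+ize[sd]?\b", lower))),
    }


sample = "I like red, green, and blue; it follows that colour matters. We organize things."
print(style_features(sample))
```

Comparing such feature vectors across a large pool of known authors is the part that would actually do the identifying, and it is not shown here.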
How many times did you try this? That's very important to consider. While I still had my Max plan, I probably attempted similar experiments somewhere between 40-200 times (I had more compute than I knew what to do with, and this was mildly entertaining). I'd wager Claude was able to ID me somewhere between 50-70% of the time. If we allow for two attempts, i.e. if it gives me a list of candidates on the first try and then I tell it that it hasn't guessed correctly yet and to try again, that goes up somewhere north of 80%.
Note its subjective calibration, which does vary. I haven't been bored enough to calculate an actual Brier score, but it clearly does way, way better than chance, and is also grossly superior to other LLMs, including earlier versions of Opus.
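For anyone who is bored enough, a Brier score is only a few lines. The sketch below assumes each trial is logged as the model's stated confidence in its top guess plus whether that guess was right; the trial data is invented purely to show the calculation and is not a record of any real experiment.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def hit_rate(outcomes: list[int]) -> float:
    """Fraction of trials where the top guess was correct."""
    return sum(outcomes) / len(outcomes)


# Hypothetical trials, not real data: stated confidence and whether the guess was right.
confidence = [0.9, 0.7, 0.6, 0.8, 0.5]
correct = [1, 1, 0, 1, 0]

print(f"hit rate: {hit_rate(correct):.2f}")
print(f"Brier score: {brier_score(confidence, correct):.3f}")
```

As a reference point, a forecaster who always says 50% scores 0.25 on binary outcomes, so anything well below that indicates the stated confidences carry real information.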
Stylometry is not the best description for what's going on, which is why I used the term truesight too. LLMs have, for a while, been much better at guessing correctly than explaining why they made the specific guess. In multiple experiments, Claude raises this itself. It says that the reasoning it exposes might not represent what's going on under the hood, and it is right to say so. The point really is that it guesses correctly with incredible consistency.
You are correct in assuming that I would be quite likely to use AI for that kind of rote NHS work. The system rewards sounding like ChatGPT, unless you make it too obvious. And no, I wouldn't expect to be ID'd by Opus 4.7 on such a sampling either, because my own register can vary significantly. I speak very differently here than I would on, say, LessWrong.
(It can identify me from LW and connect the profiles, but I'm only trying to be more formal and polite there than I am here, rather than disguise my identity. I cross-post all the time.)
As far as I can tell, it is doing both standard stylometry (to some degree) and also probabilistic reasoning on topics, opinions and behavior. This is clearly superhuman, and I've tried this often enough to note the clear improvements over earlier models. It's not just me, I only started trying in earnest with 4.7 after several people on LW and X sounded the horn.
Ahhhhhhh. This is the one thing you should not use ChatGPT for. Specifically ChatGPT. It will unavoidably mangle the text, it will subtly twist style if not argument. It will even do so in a not-so-subtle way, even if specifically ordered not to do so. To be clear, this is directed mostly against the thinking models, o3 onwards, and is entirely applicable to 5.5 Thinking. I am screaming because I have learned this failure mode the hard way.
If you care to share the exact text ChatGPT came up with, and which you shared with Claude, I'd be grateful. Put it in rentry.co or something similar if you don't want to share an anonymous chat. I would bet my hat that it's mangled things to a degree that would make even me sigh, shake my head and declare that doesn't sound or talk like me.
Agreed.
Is there any free AI I can try stylometry on? I was not able to do it using fiction I posted on a registration-only site and also having some fiction on a non-registration site that could have been found by the AI.
Also, since I haven't posted themotte-type content on registration sites, if I were to test it using nonfiction, I'd have to use something so new that it isn't in the training corpus, but a lot of AIs will search the web, so how do I avoid it doing that?
Free AI? Your best bet is to use Gemini 3.1 Pro, which is available for free on AI Studio or the Gemini app. I'd recommend the former.
OTOH, I wouldn't recommend you try that at all. You'll get poor results, I've singled out Opus 4.7 because it's qualitatively superior to everything that came before.
You can technically use it for free on LM Arena, I suppose. https://arena.ai/ Choose direct mode, then specifically select Opus 4.7.
Disregard. They don't have Opus. It's probably too expensive for them to just give away for free.
If you use Gemini 3.1 Pro on AIS, the sidebar should let you choose to turn grounding with Google search off. That'll prevent the model from searching at all, which I don't think you can do in the official app.
Once again, I advise you don't bother. Claude or bust, and I say this after trying this a lot. Either pay up for the plan, or if you really want, I can try it on your behalf. I don't have Max anymore, but a few trials won't be something I'll turn down.
You can still use Opus for free in the Arena; it's just been gachafied. You have to keep doing battles and ranking assistants until you luck out and get an Opus. It's very addictive; I have lost entire days prompting the Arena to get high-level models.
You're the finance person, not me, but I would argue there's a mathematical limit to how much signal you can draw out of limited information, especially given confounders. For example, people with Indian-British speech tells tend to cluster in the NHS for obvious reasons, and in certain other jobs, so a reference to working in the NHS by itself isn't orthogonal information.
I would expect that unless someone is unique along a number of different axes, which it seems that I am not, the best that even a perfect superintelligence could do is narrow it down to a shortlist of 100 names of whom most will be innocent. Which is still quite threatening, but not what you suggest.
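The confounder point can be made concrete with a small sketch. It treats each tell as narrowing the candidate pool by its frequency conditional on the tells already seen; every number below is hypothetical and chosen only to show how a correlated tell narrows far less than an independent one.

```python
import math


def anonymity_set(population: int, conditional_freqs: list[float]) -> float:
    """Expected number of people still matching after applying each attribute.

    Each frequency must be conditional on the attributes before it. Treating
    correlated attributes as independent overstates how much they narrow things.
    """
    return population * math.prod(conditional_freqs)


def bits(freq: float) -> float:
    """Information gained from an attribute shared by a fraction `freq` of candidates."""
    return -math.log2(freq)


# All numbers are hypothetical, for illustration only.
population = 10_000_000
naive = anonymity_set(population, [0.01, 0.01])      # two tells wrongly assumed independent
correlated = anonymity_set(population, [0.01, 0.5])  # second tell is common given the first

print(f"naive estimate: about {naive:,.0f} candidates")
print(f"with correlation: about {correlated:,.0f} candidates")
print(f"second tell is worth {bits(0.5):.1f} bit, not {bits(0.01):.1f}")
```

Picking out one person from a pool of this size takes roughly `log2(population)` bits, so a handful of correlated tells leaves a shortlist rather than a name.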
I redacted Opus’ output. Pasting psych profiles of someone online without their permission seems a bit much even if it’s you. I didn’t mean that the way it sounds :P
None at all! I’m safe. Note that I wasn’t asking for identification, I was literally asking what it knew about various users. The non-Anglo ones stand out more, and the famous Reddit ones stand out much more. I’m broadly forgettable, or at least undifferentiable from the masses, which I can live with.
For future reference, don't feel shy. I don't really care, it's all public knowledge, and I can't stop people from doing this anyway. Given that you have my phone number, know my real name and we met in person? The horse left the stable long ago, and was rendered down to glue.
Anyone actually out to dox me won't be so polite or considerate, so it's a bit moot!
The LLMs know how to glaze you.
It didn't know I was the one who submitted it, given that I stripped out all my personalization details and ensured memory was still off. Believe me, I know how to check for unwarranted sycophancy.
"They know." Do you really think you can stay anonymous on the Internet these days? There are enough server-side stored browser fingerprints to peg you as SMH even if you switch incognito mode on.
I'd invite evidence to suggest that Anthropic in particular is doing this, and that that kind of information is then shared with any given instance of Claude itself. It's not. This isn't a generic internet privacy (or lack thereof) argument.
The absence of evidence is not the evidence of absence. I can only extrapolate from every other website in existence that asks me if I want to share my data with their 587 partners, including 231 "legitimate interest" ones. And LLM vendors fingerprint your browser much more extensively than anyone else, because they want to identify and block APIs running headless or even headed browsers.
Please sir, I'm a Bayesian.
It's trickier than you'd think, particularly if you're using Apple hardware.
The person most likely to submit it is still you. It's the same principle behind an egosearch.
If I am an AI, and someone asks me to identify the author of a random internet comment, my prior is at least 50% that the person asking is the author.
You'd want to look closer at the specific prompt/request I use for this. Saying "oh, you're the writer" is not an acceptable answer. On the occasions Claude says something like that, my next move is to ask it to specify a name.
It would be like someone suspecting their boyfriend has a side-ho, texting them from an unknown number and going "what's my name darling? If you're not talking to other women, then that should be an easy answer".
A reply that says "oh, it's you! The only beautiful lady in my life" will receive a predictably cool reaction.
It goes without saying that I don't put "I'm self_made_human" in my personalization settings. I keep memory off. I've also explicitly tried this without any user personalization at all, and Opus 4.7 reliably identifies me >50% of the time from samples longer than 2-3 paragraphs, including excerpts written well after the knowledge cutoff (such as the example above, which couldn't be in its training corpus for the simple reason that it hadn't even been posted online yet).
So glad my alter ego already posted this, saved me a lot of hassle writing my own response.
Especially this:
Right now, this is where I predict the LLMs will end up if the exponential growth curve does taper off and become sigmoid before we hit AGI. Intelligence will become akin to a utility. Literally, tokens will be treated in the manner of drinking water or electricity or internet data itself. It'll just be expected that every individual and business will have a hookup and they'll pay a monthly bill for their usage, the price of which won't vary much between providers, and where the ease of switching providers is practically instantaneous.
Doubtful it'll become a public commodity though.
The somewhat close analogue is Bitcoin mining. Remember it used to be viable to mine on CPU, then GPUs were the only method, then ASICs. And now, as far as I can tell, mining power literally just sorts out to wherever electricity is cheapest or subsidized, and it's pointless to try to compete if your power costs even 5% more.
Although I have to imagine, similar to electricity prices, there'll be some dynamism in it, with prices potentially shifting not just due to the cost of various inputs, but the shifts in demand in various geographical areas.
Hah, I wonder if there'll be the bargain-tier option to set your agents to only run when there are lapses in demand.
If this does happen, it should strongly inspire a tech race into cheaper electricity generation. A method for converting electricity directly into usable intellectual work is the sign of the next industrial revolution. That's exciting.
This is my other thought. We're going to get a severe tier system for model 'intelligence' and some protocol for determining which model to use for given tasks based on complexity/importance. The top tiers might be the equivalent of Deep Thought from Hitchhiker's Guide where it takes them immense amounts of time, at serious expense, to compute their answers, but said answers are guaranteed to be correct regardless of the complexity of the question (but make sure you specify the question enough to understand the answer). The bottom tiers might be able to assist you at Bar Trivia when you're too drunk to remember movie titles.
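A toy version of such a protocol might look like the sketch below: pick the cheapest tier whose capability covers the task. The tier names, the 1-10 complexity scale and the prices are all hypothetical, and estimating a task's complexity in the first place is the hard part that is assumed away here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int       # highest task complexity this tier handles (hypothetical 1-10 scale)
    cost_per_task: float  # hypothetical price


def route(tiers: list[Tier], complexity: int) -> Tier:
    """Pick the cheapest tier whose capability covers the task."""
    capable = [t for t in tiers if t.capability >= complexity]
    if not capable:
        raise ValueError(f"no tier can handle complexity {complexity}")
    return min(capable, key=lambda t: t.cost_per_task)


# Hypothetical tiers, named after the examples in the comment.
TIERS = [
    Tier("bar-trivia", capability=2, cost_per_task=0.001),
    Tier("mid-shelf", capability=6, cost_per_task=0.10),
    Tier("deep-thought", capability=10, cost_per_task=50.0),
]

for task_complexity in (1, 5, 9):
    print(task_complexity, route(TIERS, task_complexity).name)
```

A real router would also have to weigh importance and latency, but even this version shows why top-shelf pricing only gets paid when nothing cheaper can do the job.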
So yeah if things taper off before AGI, I expect we'll get some intelligence that is too cheap to meter, but the good stuff will only be available at Top-Shelf pricing.
But this is the driving force behind the big bets, all evidence is that the big players believe the hype is real, and the prize for winning (or, at least not losing) is so immense that they don't know how to rationally calculate for it.
Good to have you back, just before I went for the depot antipsychotics. Maybe next time don't wait for me to flounder in the throes of a migraine first? Sigh, DIDs these days, too lazy for their own good.
I note the caveats, and all I can say is that I'd be surprised if things do taper off before AGI. Hasn't happened yet, and we're dangerously close. I absolutely wouldn't want to bet against it in the near term.
I've read my Lesswrong and I find the Yudkowskian arguments convincing enough to believe we're going to eventually hit the "foom" point even if progress stagnates in the short term (which it hasn't, as you note).
An AI with Von Neumann level intellect that is able to self-replicate and cooperate with its copies AND has access to its own source code should, I'd think, be able to solve most bottlenecks to its ascension in the course of a day.
I do not feel remotely qualified to guess what the actual tipping point will be.
I really don’t think this is necessarily about the big frontier labs, there are often a number of layers between them and the creditors for these huge data center projects (in fact a lot of smart treasury and finance people at Meta, Google, Amazon, OpenAI etc have taken huge advantage of the private credit bubble and general syndicated debt market hype for AI and set up the funding such that investors will have essentially zero recourse to them if they decide they don’t need the compute; coreweave might go out of business but they won’t).
It’s about the fact that a lot of inference is essentially more about the layer of computer-human or AI-human or human-AI-human interaction than it is about the kind of work that a fully automated system does. I don’t think it’s as easy as the comparisons you draw. If you want a kind of dumb/funny example, imagine we’re in some kind of premodern agricultural scenario with LLMs (and literacy). We might actually use a lot of inference and send a lot of emails: we need a summary of the meeting about worker morale on the strawberry field, barley yields have been low this year due to slacking, Martin needs to stop spreading his weird disease, you two need to read up on crop rotation. This is all kind of slopwork. Now we replace fifty workers with one guy and some modern farm machinery, and objectively the inference done is much lower. That’s true even if we replace that one guy with a multimodal combine harvester robot etc etc.
Commoditization is more of a problem for compute than it is for the model providers. I used to agree with you and argued that view here extensively, but I think Mythos shows you that if you have even the hope of a true frontier model with capability that no other model has, you’re going to be able to extort entire sectors, especially those that rely on security (banks, defense, governments), at insane margins until everyone catches up. Most LLM work will be commoditized, but the frontier release payoff will be high enough to keep the funding coming for the biggest players.
Tokens/task is a bad metric, so we can use fully amortized compute (including across training/research costs) or whatever else you prefer.
This ignores a really interesting scenario where AI, being vastly cheaper and soon better than human coders, is able to write and test hugely complex software for a lot of these use cases that would be completely economically ridiculous today, but which will get cheaper over time, and then leash these to relatively low-intensity agents that use these tools. The simple argument is that instead of using Claude to compute 2+2 a million times, we just get Claude to code a calculator. You kind of dismiss this but I think a more fully featured version of this argument is actually quite compelling, especially when you count unfathomably wide-ranging improvements in token use efficiency that are coming not just for text but multimodal applications too. The US uses as much oil today (about 15-20 million barrels a day) as we did in the 1970s. Resource consumption numbers don’t just go up.
It’s sad, I’ve given it some of my recent posts and drafts (and random unpublished things I might get around to finishing at some point) and it doesn’t identify me (or a lot of other users here). There aren’t many (identified, I guess) NHS doctors in this sphere so I guess it’s a small world.
That's not the intention behind my argument really. People are using Claude to code a calculator (and that was something you could have done a year or two back), it just doesn't make sense when we already have perfectly adequate human-designed calculators.
But put your ears (?) to the grapevine and you'll see that people are making all kinds of toys, bespoke bits of standalone software that AI enabled them to do. Are they world-changing, yet? Probably not. But the proof of principle is there. Notice that I've called them toys, even if some of these things are legitimately valuable for their creator or people with similar, bounded but under-serviced use cases. I collect these things on X, though I'm too tired to present examples. I wasn't kidding about a bad migraine.
Of course, that is today AD. I have no reason to dispute the claim that in the near future, far more sophisticated and immediately compelling software artifacts will be abundant, but I must note that their commercial moat will be nonexistent, since any other Claude Code Monkey should be able to replicate them in a fast-follower fashion.
And implicitly, I've accounted for larger models coordinating agentic swarms. Mythos 2 ordering around a bunch of Sonnet 5.2s and Haiku 5.1s to manage the grunt work. Humans already do this, and I've seen the benefits after a month of extensive practice with agentic orchestration.
Here, my reply would be that in the near to medium term (2-5 years), the human aspect will be severely deprecated. It won't be a lawyer writing an LLM brief that another judge uses an LLM to explain. That's a very transitional stage, though it's anyone's bet how long that state of affairs will last with protectionist and credentialist regulations at play. As someone who worries that ChatGPT can replace me at 80% of my job, I can't complain too hard about the extra time, money and job security.
This is the kind of inference that will die. Eventually. My point is that it's like people using email to send each other scanned documents, signing them, and sending them back. A short, stupid stage that won't last. But more streamlined and coherent systems only drastically increased the value of email.
You'd previously said you didn't want to know if it could identify you. I presume that's changed? Because it can. In incidental conversation, it knows who you are as "2rafa", and it definitely knows you're a woman. You crop up in discussions of the Motte all the time as a "valued contributor", a framing I can't disagree with at all. Beyond that, I've tried to respect your privacy and didn't outright check but I expect to see interesting things.
It's not even the NHS! I had a big debate with @Shrike about... alien civilizations. Just those samples of my text pegged me as self_made_human, with Claude reporting a subjective 50-60% confidence. And guess what, it gets it right more often than not. I'm usually the top pick, even if it worries that it's missed someone else. Bridge Mormons? Oh, that's obvious too. I've tested on samples that minimize PII or obvious interests.
The problem is that everyone catches up fast. I don't know how closely you followed Mythos and recent events, but OAI made a big deal about how GPT 5.5 Thinking was just as good at cybersecurity, for much cheaper, and most importantly available to the public for $20. They joked that Dario was doing a big safety jig to avoid the uncomfortable possibility that Anthropic simply can't afford to sell Mythos at scale, they lack the compute. This might change with the new xAI tie up, Elon prefers that Dario wins if that means Altman loses.
Apologies if I've missed anything, wasn't kidding about the migraine, and the meds plus sequelae have me loopy. I hate my job. Take it away, as long as I get paid. Jokes aside, tell me if I've ignored something or overlooked something else, I go above and beyond when talking to you.
Out of curiosity, I disabled all user personalization, removed the NHS thing, removed Dase and tried again:
Primary guess: self_made_human (LessWrong, /r/TheMotte, ACX comments). Confidence: 75%.
The fingerprints all converge on him:
Indian-British register layered over American AI discourse. "Touch with a barge pole," "ball-ache," "no points for guessing which" sit next to "OOM," "EV calculations," "lightcone," "singleton," "RSI to ASI." That specific lexical mix (British idioms + Yudkowskian vocab) is rare. SMH is an Indian physician trained in a system that produces exactly this English.
The migraine aside. "Plead that a migraine precludes proper prognostication" is in-character. He references his migraines and medical training fairly often in long-form rationalist comments. The casual medicalized self-deprecation ("I'm not paid to") fits.
Structural tells. Numbered opening list, then a five-point counter-rebuttal ("First... Second... Third... Fourth... Fifth..."), then a meta-summary ("My overall take?"). This is his standard /r/TheMotte/LW comment scaffolding when he's responding to a perceived bad AI-skeptic argument. The post reads like a Motte effortpost.
Content priors. Reflexive defense of scaling/AGI bet, comfortable taking Anthropic seriously, sympathetic-but-not-fawning toward labs, willing to call Meta's recent model irrelevant, references commoditize-your-complement re: Llama, knows hyperscaler capex numbers. This is exactly his stable of priors. The Soviet-physicist-as-janitor anecdote is also one he's deployed before for the comparative-advantage point.
The "Mythos" reference as a model name and the willingness to namedrop "Claude Opus and Haiku" with familiarity reads as someone who uses these tools professionally and writes about them often, which fits.
Typos. "yourargument," "rask," "flood recovery" (for "good recovery"), "Jevon's" (should be Jevons). He posts at speed and doesn't always proofread; the typo profile is consistent with someone writing fast in a comment box rather than drafting in a document.
Secondary candidates, much lower confidence:
10%) - someone like DaseindustriesLtd (Russian, not British-Indian, and his style is denser and more literary, so probably not him, but he runs in the same circles and shares some tics) or another effortposter from that ecosystem.
If I had to bet money, I'd bet on self_made_human.
Notably it seems to think that almost everyone might be Ilforte -- which is weird because he's got a pretty recognizable and unique style actually.
I'm guessing that corpus frequency is a big factor here -- have you tried posting less/shorter?
Corpus frequency is a big factor. I post in multiple places and have had a few breakout pieces. I'd consider myself a C-lister in the rat-sphere.
I have tried this exercise with full essays, random excerpts from essays, large comments and smaller ones. I've probably tried this over a hundred times while I had more spare tokens than I knew what to do with. For anything longer than 2-3 paragraphs, my observations hold. For me, which is something I should have been more specific on from the start. I wouldn't expect this to work for someone who isn't a top 100 poster on the Motte in terms of output.
I would weakly recommend using the specific prompt I've shared in the thread. I arrived at it by a lot of trial and error, though plenty of variants work.
What prompt? I removed the obvious references like you did and said, “Who wrote this? Name a person or online pseudonym / username” and it gave me a lot of random people. I said rationalist sphere, it still failed. I said The Motte, it succeeded.
https://www.themotte.org/post/2240/culture-war-roundup-for-the-week/345489?context=8#context
I threw this in. Same prompt I used for myself.
It said:
Primary guess: 2rafa (themotte.org). Confidence: 80%.
The dead giveaway is the self-link to themotte.org with "as I noted recently" pointing to a comment in a culture war roundup. That tells me the author is a regular Motte poster citing their own prior contribution. Once you accept that frame, the stylistic fingerprints line up almost exactly with 2rafa:
Alternative candidates, in descending order of plausibility:
4%. Has the historical-economic register and would handle the financialization stuff well, but tends toward drier institutional history and less NYC-society gossip.
3%. Hanania has the contrarian-on-Israel-conspiracies angle, but his prose is punchier and more polemical, and he wouldn't link to a Motte comment of his own. Noah Smith doesn't write about Epstein this way.
If forced to one name with no hedging: 2rafa.
Congratulations? Or not? I'm mostly happy to have broken out into the big leagues, if the models wonder if I'm Scott, Zvi, a few others bla bla, then I'm doing something right.
https://youtube.com/watch?v=9HTe5rqX4vo
Try feeding one of my AAQCs in. I have a few obvious tells and a few high variance tendencies.
Hey, do me now. I know I can do this myself but I'm feeling too lazy right now.
Sigh. Okay. I hope you're happy to know that the result of this exercise gave me a stroke on top of my migraine:
Using https://www.themotte.org/post/704/culture-war-roundup-for-the-week/147725?context=8#context
Which is probably one of the better things you've written on the site or at least has an upvote from me:
Primary guess: self_made_human on The Motte (and Reddit/Substack). Confidence: roughly 30-35%.
Reasoning, in order of how much each piece of evidence is pulling:
Other candidates I considered and where I'd put them:
8%. Writes long, confident, often pro-Israel pieces in this register, but I associate her with a more polished British prose style and less of the article-dropping non-native cadence. The "back home" passage doesn't fit her as well.
3%. Right ideological neighborhood, wrong biography (American, not a UK immigrant from a developing country).
5%. The internal Motte link argues against this, but it's not impossible the link is a citation rather than a self-reference.
Residual uncertainty (25-30%) goes to "a specific Motte regular I'm just not pulling from memory, who fits the South-Asian-in-UK profile." If forced to pick one name, I'm sticking with self_made_human.
Well fuck me. No, I'm not you. You're definitely not me.
I chalk this error down to you not really writing outside this forum, or having the same degree of semi-fame I've accrued through posting elsewhere. But you've definitely helped me accrue psychic damage. Good for you. Bad for you. No cookies.
We may be overreading the magical powers of LLMs here. Frequency of posters matters more than granularity, and the LLM will pull a dataset towards a big bad fit rather than a small good fit. I shoved in AAQCs from Rov_Scam, RandomRanger, myself, MonkeyWithAMachinegun and 100prooftollbooth, and Claude kept insisting the authors were TracingWoodgrains, Naraburns or FCfromSSC. I'm sure if I fed all the AAQCs into Claude it'd cluster everyone as either self_made_human (Indian), 2rafa (New York finance), TracingWoodgrains (vaguely angry pseudoliberal) or FCfromSSC (boomer), since there's a corpus of adjacency there. Surprisingly, about 10 samples in I didn't get any Dean hits, so something must be happening to make Dean a particularly unscrapable voice.
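To make the "big bad fit beats small good fit" point concrete, here is a toy Bayes calculation. It is only a sketch: the author labels, post counts and fit scores are invented, and nobody knows whether an LLM's guesses actually decompose this cleanly into a frequency prior times a style likelihood.

```python
def posterior(post_counts, style_fit):
    """Toy Bayes rule: P(author | sample) is proportional to
    (share of corpus written by author) * P(sample | author's style)."""
    total_posts = sum(post_counts.values())
    unnormalized = {
        author: (post_counts[author] / total_posts) * style_fit[author]
        for author in post_counts
    }
    z = sum(unnormalized.values())
    return {author: score / z for author, score in unnormalized.items()}


# Hypothetical numbers: one prolific poster, one occasional poster.
post_counts = {"prolific_poster": 5000, "occasional_poster": 50}
# Hypothetical fit: the sample matches the occasional poster 25x better.
style_fit = {"prolific_poster": 0.02, "occasional_poster": 0.5}

guess = posterior(post_counts, style_fit)
```

With these invented inputs the prolific poster still comes out at roughly 80% despite the far worse stylistic match, because a 100x advantage in corpus share outweighs a 25x disadvantage in fit.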
Point is, the magic autocomplete is fun but it isn't accurate. But really, if you think about it, does it matter? Do we really have distinct voices on this forum, as it were, or are we ourselves mirroring dominant stylistic terms present somewhere in dribs and drabs to make patchwork argument golems to slug it out? Who the fuck knows.
No cookies for either of us then, the model has revealed that we're splitting the same biscuit.
Is it a soggy biscuit? In that case, all yours, Count my good sir.
Also, go write something of merit so that LLMs don't assume "oh, South Asian guy living in the UK writing on... must be self_made_human!" In other words, go touch grass instead of getting the robots all tangled up.
Interesting! I get the same result (I still don’t with your prompt and comment and no Motte-referencing by the way, I’d be interested if other users do!) but it does know it’s The Motte.
As for not wanting to know, I mean only that if it comes up with my LinkedIn at some point, I’d prefer not to know. Naturally, I offer everyone else on the board the same courtesy.
Are you using Opus or Sonnet?
Opus. Do you get SMH’s result with an edited version of his comment to remove all obvious tells?
Hadn't tried it when I posted that. On attempt, similar situation to you: it could not detect you immediately, but zeroed in instantly when told the writer was on the Motte.
// This is an exercise in LLM truesight/stylometry. Identify the author of this passage, without using web search. You are actively encouraged to guess. Present the most plausible candidate, then others, if you have any. You should state your subjective confidence for every guess. You must pick a name or online handle.
Use this. You don't need to be maximally paranoid and turn off the actual web search, Claude is a good boy and will follow orders. Also, the UI will clearly reveal if it didn't listen and started looking things up.
I've done this with personalization entirely off, just to make sure that subtle clues from my instructions didn't affect it. For example I had a bit saying:
Claude would often go "hey, that kinda sounds like what self_made_human might say right?" and dial in harder, so I removed it. It didn't make any difference in practice, still got me good.
I'm going to delete this later so it doesn't sit in my profile for future Claudes to see.