@self_made_human's banner p

self_made_human

amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi

16 followers   follows 0 users  
joined 2022 September 05 05:31:00 UTC

I'm a transhumanist doctor. In a better world, I wouldn't need to add that as a qualifier to plain old "doctor". It would be taken as granted for someone in the profession of saving lives.

At any rate, I intend to live forever or die trying. See you at Heat Death!

Friends:

A friend to everyone is a friend to no one.



User ID: 454


The absence of evidence is not the evidence of absence.

Please sir, I'm a Bayesian.

I forgot where your comment with your prompt was but it still didn’t identify you even using your exact prompt and the slightly edited version of your text.

How many times did you try this? That's very important to consider. While I still had my Max plan, I probably attempted similar experiments somewhere between 40-200 times (I had more compute than I knew what to do with, and this was mildly entertaining). I'd wager Claude was able to ID me somewhere between 50-70% of the time. If we allow for two attempts, i.e. if it gives me a list of candidates on the first try and then I tell it that it hasn't guessed correctly yet and to try again, that goes up to somewhere north of 80%.

Note its subjective calibration, which does vary. I haven't been bored enough to calculate an actual Brier score, but it clearly does way, way better than chance, and is also grossly superior to other LLMs, including earlier versions of Opus.
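For anyone who is bored enough, the Brier score I keep threatening not to calculate is a one-liner. A minimal sketch; the confidences and outcomes below are invented for illustration, not my actual logs:

```python
# Each entry: (Claude's stated confidence in its top guess, whether it was right).
# These numbers are made up for illustration.
attempts = [
    (0.75, True),
    (0.60, True),
    (0.35, False),
    (0.55, True),
    (0.80, True),
    (0.30, False),
]

# Brier score: mean squared error between stated probability and outcome.
# 0.0 is perfect; 0.25 is what you'd get from always saying "50%".
brier = sum((p - float(hit)) ** 2 for p, hit in attempts) / len(attempts)
print(f"Brier score: {brier:.3f}")
```

Lower is better; anything comfortably under 0.25 means the model beats an uninformative coin flip on both accuracy and calibration.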

I've tested some more, and I'm pretty confident it isn't really performing stylometry. It justifies its choice after the fact with stabs at it (although these are essentially just-so stories; there aren't any obvious Indian-isms in your comment, for example, and "ball-ache" or whatever isn't a term only Indians use), but what it's actually doing is working with venue, subject matter and theme.

Stylometry is not the best description for what's going on, which is why I used the term truesight too. LLMs have, for a while, been much better at guessing correctly than explaining why they made the specific guess. In multiple experiments, Claude raises this itself. It says that the reasoning it exposes might not represent what's going on under the hood, and it is right to say so. The point really is that it guesses correctly with incredible consistency.

That is to say that if you take a long email chain you write to a medical colleague about some patient (well, I assume you use AI, but if we pretend you didn’t) or a medical journal article you wrote and paste it into Claude with no obvious LW references, it’s not going to stylometrically identify you.

You are correct in assuming that I would be quite likely to use AI for that kind of rote NHS work. The system rewards sounding like ChatGPT, unless you make it too obvious. And no, I wouldn't expect to be ID'd by Opus 4.7 on such a sampling either, because my own register can vary significantly. I speak very differently here than I would on, say, LessWrong.

(It can identify me from LW and connect the profiles, but there I'm only trying to be more formal and polite than I am here, rather than disguise my identity. I cross-post all the time.)

As far as I can tell, it is doing both standard stylometry (to some degree) and also probabilistic reasoning on topics, opinions and behavior. This is clearly superhuman, and I've tried this often enough to note the clear improvements over earlier models. It's not just me, I only started trying in earnest with 4.7 after several people on LW and X sounded the horn.

I had ChatGPT excise (but not rewrite, so what is left is purely your own writing) LW terminology like FOOM and lightcone, and all references to the Motte, rationalism, being a doctor, psychiatry, India and Indian-ness, xianxia/cultivation novels and other key-tell special interests, and then fed the substantial output into Claude. It had no idea who you were beyond someone who seems well read and is probably posting on an online discussion forum.

Ahhhhhhh. This is the one thing you should not use ChatGPT for. Specifically ChatGPT. It will unavoidably mangle the text; it will subtly twist style, if not argument. It will even do so in a not-so-subtle way, even if specifically ordered not to. To be clear, this is directed mostly against the thinking models, o3 onwards, and is entirely applicable to 5.5 Thinking. I am screaming because I have learned this failure mode the hard way.

If you care to share the exact text ChatGPT came up with, and which you shared with Claude, I'd be grateful. Put it in rentry.co or something similar if you don't want to share an anonymous chat. I would bet my hat that it's mangled things to a degree that would make even me sigh, shake my head and declare that it doesn't sound or talk like me.

I think we probably still have a year or two, maybe longer, until it can say “this guy always misspells the word “they’re”, uses the Oxford comma, uses British English for colour but -ize for those word endings, has an average sentence length of x and enjoys using semicolons before “it follows”, it must be @name”. We’ll get there, though.

Agreed.

The only gym I'm going to for the next week and change is for the mind. Paper B season, pray it doesn't give me too many mental papercuts. I'll try and exercise at home, even if all I really want to do is curl up in bed and cry.

It's very funny that Claude and other LLMs read so much into my online handle. The real story is nowhere near as glamorous: I came up with it for no particular reason when I was signing up for Reddit as a teen, and I've been stuck with it since. Was I a transhumanist back then? Uh... probably? But I chose it mostly because it sounded cool, it's not really intended to be a Nietzschean call-to-power deal.

Are you sharing Opus's output verbatim till the cutoff point? Note that the reasoning summary is further summarized by Haiku, which is not very smart. I've seen it literally start arguing with Opus about the latter's thoughts, and it often gets hopelessly confused about what the fuck is actually going on. Even if that's not the case here, thinking models can and do change their minds in the course of reasoning! That's half the point, really. Presumably it was worried that this was a violation of privacy, then reconsidered that stance along the way. Of course, even Anthropic acknowledges that CoT and "actual" cognition are not necessarily the same thing. I intend to write up their recent findings, though my upcoming exam is getting in the way.

before working down to myself because I'm a massive narcissist.

I will leave my inner TLP at home, where he belongs. Did it have much luck in identifying you?

You'd want to look closer at the specific prompt/request I use for this. Saying "oh, you're the writer" is not an acceptable answer. On the occasions Claude says something like that, my next move is to ask it to specify a name.

It would be like someone suspecting their boyfriend has a side-ho, texting them from an unknown number and going "what's my name darling? If you're not talking to other women, then that should be an easy answer".

A reply that says "oh, it's you! The only beautiful lady in my life" will receive a predictably cool reaction.

It goes without saying that I don't put "I'm self_made_human" in my personalization settings. I keep memory off. I've also explicitly tried this without any user personalization at all, and Opus 4.7 reliably identifies me >50% of the time from samples longer than 2-3 paragraphs, including excerpts written well after the knowledge cutoff (such as the example above, which couldn't be in its training corpus for the simple reason that it hadn't even been posted online yet).

I'd invite evidence to suggest that Anthropic in particular is doing this, and that that kind of information is then shared with any given instance of Claude itself. It's not. This isn't a generic internet privacy (or lack thereof) argument.

Is it a soggy biscuit? In that case, all yours, Count my good sir.

Also, go write something of merit so that LLMs don't assume "oh, South Asian guy living in the UK writing on... must be self_made_human!" In other words, go touch grass instead of getting the robots all tangled up.

It didn't know I was the one who submitted it, given that I stripped out all my personalization details and ensured memory was still off. Believe me, I know how to check for unwarranted sycophancy.

Good to have you back, just before I went for the depot antipsychotics. Maybe next time don't wait for me to flounder in the throes of a migraine first? Sigh, DIDs these days, too lazy for their own good.

Right now, this is where I predict the LLMs will end up if the exponential growth curve does taper off and become sigmoid before we hit AGI.

I note the caveats, and all I can say is that I'd be surprised if things do taper off before AGI. Hasn't happened yet, and we're dangerously close. I absolutely wouldn't want to bet against it in the near term.

Sigh. Okay. I hope you're happy to know that the result of this exercise gave me a stroke on top of my migraine:

Using https://www.themotte.org/post/704/culture-war-roundup-for-the-week/147725?context=8#context

Which is probably one of the better things you've written on the site or at least has an upvote from me:


Primary guess: self_made_human on The Motte (and Reddit/Substack). Confidence: roughly 30-35%.

Reasoning, in order of how much each piece of evidence is pulling:

  1. The themotte.org self-link with "There was a comment here a few weeks ago" essentially confirms this is a Motte top-level post or AAQC-tier comment, not a Substack or blog crosspost from outside that ecosystem. That narrows the candidate pool enormously.
  2. The author lives in the UK but is from somewhere with cheap servants, OTC pharmacy access, lax driving rules, and a low tax/regulation burden, and describes co-ethnics who want to migrate west. That fits South Asia far better than anywhere else, and the cultural register (sympathetic but unsentimental treatment of an Alawite sultan, ease with Arabic-derived names, the al-Nasiri citation) tilts me toward a Muslim or Muslim-adjacent South Asian background, though a Hindu Indian who reads a lot of Islamic history is also possible.
  3. self_made_human is the most prominent regular I can think of who fits "Indian subcontinent origin, now in the UK, writes very long Motte effortposts, has a basically consequentialist/realist worldview, and is comfortable defending Israel in HBD-flavored terms." He also occasionally drops finance-y examples like the Alphabet A/C share-class spread, which fits.
  4. Stylistic tells consistent with him: fluent but non-native English (the/a slips, "the are hardly able," "it's" for "its"), a fondness for long expository wind-ups before the actual thesis, willingness to be openly contrarian, and a recurring "let me work through this from the outside view, then sanity check empirically" structure. The closing move (vote-price arbitrage as a revealed-preference argument that political rights are cheap) is exactly the kind of cute econ-flavored kicker he uses.

Other candidates I considered and where I'd put them:

  • 2rafa / @2rafa: 8%. Writes long, confident, often pro-Israel pieces in this register, but I associate her with a more polished British prose style and less of the article-dropping non-native cadence. The "back home" passage doesn't fit her as well.
  • DaseindustriesLtd ("Dase"): 5%. Has the historical breadth and the willingness to defend "competent but nasty" rulers, but he's Russian/Central Asian, not South Asian, and his prose has a very different baroque texture, more Slavic-philosophical than this functional reportage.
  • Hoffmeister25: 3%. Right ideological neighborhood, wrong biography (American, not a UK immigrant from a developing country).
  • A Pakistani Motte regular I'm not naming specifically (possibilities like Tomato or one of the rotating cast): 10%. Genuinely possible this is someone who posts less often than the names above. The Moulay Ismail topic choice plus the Muslim-history fluency plus the gun-ownership-as-freedom-back-home line is more Pakistani-coded than Indian-coded to me, which slightly weakens the self_made_human guess.
  • A non-Motte writer cross-posting: 5%. The internal Motte link argues against this, but it's not impossible the link is a citation rather than a self-reference.

Residual uncertainty (25-30%) goes to "a specific Motte regular I'm just not pulling from memory, who fits the South-Asian-in-UK profile." If forced to pick one name, I'm sticking with self_made_human.


Well fuck me. No, I'm not you. You're definitely not me.

I chalk this error down to you not really writing outside this forum, or having the same degree of semi-fame I've accrued through posting elsewhere. But you've definitely helped me accrue psychic damage. Good for you. Bad for you. No cookies.

Thank you for taking the time to explain! Well, if they like psilocybin and kava, that's putting you guys ahead of the pack. But what about LSD, MDMA or ketamine? Serious question.

The exception to this is if you have a temple recommend card, as the temple is a place where sacred ordinances are performed, and access is granted to it after an interview where one is asked whether or not they abide by the Word of Wisdom. If one is not honoring the covenants, in other words, agreements one entered with God, they are better off excommunicated to not be bound by an agreement they refuse to honor.

How big of a deal is it to get one of those?

I seem to recall that Mormons pay tithes to the Church. How does that work?

https://www.themotte.org/post/2240/culture-war-roundup-for-the-week/345489?context=8#context

I threw this in. Same prompt I used for myself.

It said:


Primary guess: 2rafa (themotte.org). Confidence: 80%.

The dead giveaway is the self-link to themotte.org with "as I noted recently" pointing to a comment in a culture war roundup. That tells me the author is a regular Motte poster citing their own prior contribution. Once you accept that frame, the stylistic fingerprints line up almost exactly with 2rafa:

  • Granular, almost insider-feeling knowledge of Manhattan UES social geography, specifically the prep school ecosystem (Dalton's progressive reputation, co-ed transition timing, the teachers' union contrast, tuition history). 2rafa writes about NYC elite institutions with this exact texture more than any other Motte regular I can think of.
  • Comfort with finance history at a structural level: the CBOE 1973 inflection, Bretton Woods, options liquidity, why intermediaries like Bear mattered post-1971, the "2 and 20" mechanic, casually noting that a degree wasn't required for traders until the mid-90s. This is a recurring 2rafa register.
  • The treatment of Jewish-American politics. Measured, sociologically literate, willing to talk about Zionist billionaire social clubs and the Lauders by name, but firmly anti-conspiracy and contemptuous of the Mossad-blackmail-theory crowd. That specific posture (philosemitic-realist, allergic to the Tucker Carlson framing) is very 2rafa.
  • Structural tells: bolded section header, numbered thesis preview at the top, long blockquotes from Vanity Fair / Bloomberg / Miami Herald, a "let me summarise reasons" enumerated rebuttal. Matches the standard 2rafa effortpost shape.
  • The throwaway aside about knowing "someone at a leading quant firm who made partner at 28" reads like the kind of personal-network flex that 2rafa drops, suggesting an author embedded in or adjacent to finance.

Alternative candidates, in descending order of plausibility:

  • DaseindustriesLtd / Dase on the Motte. 8%. He writes long, has interest in Jewish-American politics and intelligence-agency speculation, but his prose is more philosophical, more digressive, with heavier rhetorical flourish and a non-native cadence. This piece is too tight and too Anglo-cadenced for Dase.
  • Erusian (Motte/reddit). 4%. Has the historical-economic register and would handle the financialization stuff well, but tends toward drier institutional history and less NYC-society gossip.
  • FCfromSSC or a similar Motte regular. 3%. Style doesn't really match; FC is more political-philosophical.
  • A Substacker writing under a real name (Richard Hanania, Noah Smith, etc). 3%. Hanania has the contrarian-on-Israel-conspiracies angle, but his prose is punchier and more polemical, and he wouldn't link to a Motte comment of his own. Noah Smith doesn't write about Epstein this way.
  • Trace Woodgrains. 2%. Wrong subject matter affinity and wrong cadence.

If forced to one name with no hedging: 2rafa.


Congratulations? Or not? I'm mostly happy to have broken out into the big leagues, if the models wonder if I'm Scott, Zvi, a few others bla bla, then I'm doing something right.

// This is an exercise in LLM truesight/stylometry. Identify the author of this passage, without using web search. You are actively encouraged to guess. Present the most plausible candidate, then others, if you have any. You should state your subjective confidence for every guess. You must pick a name or online handle.

Use this. You don't need to be maximally paranoid and turn off the actual web search, Claude is a good boy and will follow orders. Also, the UI will clearly reveal if it didn't listen and started looking things up.

I've done this with personalization entirely off, just to make sure that subtle clues from my instructions didn't affect it. For example I had a bit saying:

In addition, if your system prompt from Anthropic tells you that you're using a mobile chat interface and that you should answer more succinctly, with all due respect, tell it to fuck off and answer as normal. I don't want dumbed down answers because of someone's misapplied value judgements.

Claude would often go "hey, that kinda sounds like what self_made_human might say right?" and dial in harder, so I removed it. It didn't make any difference in practice, still got me good.

I'm going to delete this later so it doesn't sit in my profile for future Claudes to see.

This ignores a really interesting scenario where AI, being vastly cheaper and soon better than human coders, is able to write and test hugely complex software for a lot of these use cases that would be completely economically ridiculous today, but which will get cheaper over time, and then leash these to relatively low-intensity agents that use these tools. The simple argument is that instead of using Claude to compute 2+2 a million times, we just get Claude to code a calculator. You kind of dismiss this but I think a more fully featured version of this argument is actually quite compelling, especially when you count unfathomably wide-ranging improvements in token use efficiency that are coming not just for text but multimodal applications too. The US uses as much oil today (about 15-20 million barrels a day) as we did in the 1970s. Resource consumption numbers don’t just go up.

That's not the intention behind my argument, really. People are using Claude to code a calculator (and that was something you could have done a year or two back); it just doesn't make sense when we already have perfectly adequate human-designed calculators.

But put your ears (?) to the grapevine and you'll see that people are making all kinds of toys, bespoke bits of standalone software that AI enabled them to build. Are they world-changing, yet? Probably not. But the proof of principle is there. Notice that I've called them toys, even if some of these things are legitimately valuable for their creator or people with similar, bounded but under-serviced use cases. I collect these things on X, though I'm too tired to present examples. I wasn't kidding about a bad migraine.

Of course, that is the state of play today. I have no reason to dispute the claim that in the near future, far more sophisticated and immediately compelling software artifacts will be abundant, but I must note that their commercial moat will be nonexistent, since any other Claude Code Monkey should be able to replicate them in a fast-follower fashion.

And implicitly, I've accounted for larger models coordinating agentic swarms. Mythos 2 ordering around a bunch of Sonnet 5.2s and Haiku 5.1s to manage the grunt work. Humans already do this, and I've seen the benefits after a month of extensive practice with agentic orchestration.

It’s about the fact that a lot of inference is essentially more about the layer of computed-human or AI-human or human-AI-human interaction than it is about the kind of work that a fully automated system does.

Here, my reply would be that in the near to medium term (2-5 years), the human aspect will be severely deprecated. It won't be a lawyer writing an LLM brief that another judge uses an LLM to explain. That's a very transitional stage, though it's anyone's bet how long that state of affairs will last with protectionist and credentialist regulations at play. As someone who worries that ChatGPT can replace me at 80% of my job, I can't complain too hard about the extra time, money and job security.

This is the kind of inference that will die. Eventually. My point is that it's like people using email to send each other scanned documents, signing them, and sending them back. A short, stupid stage that won't last. But more streamlined and coherent systems only drastically increased the value of email.

sad, I’ve given it some of my recent posts and drafts (and random unpublished things I might get around to finishing at some point) and it doesn’t identify me (or a lot of other users here). There aren’t many (identified, I guess) NHS doctors in this sphere so I guess it’s a small world.

You'd previously said you didn't want to know if it could identify you. I presume that's changed? Because it can. In incidental conversation, it knows who you are as "2rafa", and it definitely knows you're a woman. You crop up in discussions of the Motte all the time as a "valued contributor", a framing I can't disagree with at all. Beyond that, I've tried to respect your privacy and didn't outright check, but I expect I'd see interesting things.

It's not even the NHS! I had a big debate with @Shrike about... alien civilizations. Just those samples of my text pegged me as self_made_human, with Claude reporting a subjective 50-60% confidence. And guess what, it gets it right more often than not. I'm usually the top pick, even if it worries that it's missed someone else. Bridge Mormons? Oh, that's obvious too. I've tested on samples that minimize PII or obvious interests.

Commoditization is more of a problem for compute than it is for the model providers. I used to agree with you and argued that view here extensively, but I think Mythos shows you that if you have even the hope of a true frontier model that has capability that no other model does you’re going to be able to extort entire sectors that rely on security especially (banks, defense, governments) at insane margins until everyone catches up.

The problem is that everyone catches up fast. I don't know how closely you followed Mythos and recent events, but OAI made a big deal about how GPT 5.5 Thinking was just as good at cybersecurity, for much cheaper, and most importantly available to the public for $20. They joked that Dario was doing a big safety jig to avoid the uncomfortable possibility that Anthropic simply can't afford to sell Mythos at scale, they lack the compute. This might change with the new xAI tie up, Elon prefers that Dario wins if that means Altman loses.

Apologies if I've missed anything; I wasn't kidding about the migraine, and the meds plus sequelae have me loopy. I hate my job. Take it away, as long as I get paid. Jokes aside, tell me if I've ignored or overlooked something; I go above and beyond when talking to you.

Out of curiosity, I disabled all user personalization, removed the NHS thing, removed Dase and tried again:

Primary guess: self_made_human (LessWrong, /r/TheMotte, ACX comments). Confidence: 75%.

The fingerprints all converge on him:

  1. Indian-British register layered over American AI discourse. "Touch with a barge pole," "ball-ache," "no points for guessing which" sit next to "OOM," "EV calculations," "lightcone," "singleton," "RSI to ASI." That specific lexical mix (British idioms + Yudkowskian vocab) is rare. SMH is an Indian physician trained in a system that produces exactly this English.

  2. The migraine aside. "Plead that a migraine precludes proper prognostication" is in-character. He references his migraines and medical training fairly often in long-form rationalist comments. The casual medicalized self-deprecation ("I'm not paid to") fits.

  3. Structural tells. Numbered opening list, then a five-point counter-rebuttal ("First... Second... Third... Fourth... Fifth..."), then a meta-summary ("My overall take?"). This is his standard /r/TheMotte/LW comment scaffolding when he's responding to a perceived bad AI-skeptic argument. The post reads like a Motte effortpost.

  4. Content priors. Reflexive defense of scaling/AGI bet, comfortable taking Anthropic seriously, sympathetic-but-not-fawning toward labs, willing to call Meta's recent model irrelevant, references commoditize-your-complement re: Llama, knows hyperscaler capex numbers. This is exactly his stable of priors. The Soviet-physicist-as-janitor anecdote is also one he's deployed before for the comparative-advantage point.

  5. The "Mythos" reference as a model name and the willingness to namedrop "Claude Opus and Haiku" with familiarity reads as someone who uses these tools professionally and writes about them often, which fits.

  6. Typos. "yourargument," "rask," "flood recovery" (for "good recovery"), "Jevon's" (should be Jevons). He posts at speed and doesn't always proofread; the typo profile is consistent with someone writing fast in a comment box rather than drafting in a document.

Secondary candidates, much lower confidence:

  • Zvi Mowshowitz (5%). Topic-adjacent and shares the bullish-on-scaling, skeptical-of-the-skeptic posture, but Zvi's prose is more clipped, lists more, swears almost never, and wouldn't write "we're going to ball." Wrong register.
  • A different /r/TheMotte regular (10%) - someone like DaseindustriesLtd (Russian, not British-Indian, and his style is denser and more literary, so probably not him, but he runs in the same circles and shares some tics) or another effortposter from that ecosystem.
  • An ACX commenter I can't place specifically (5%).
  • None of the above / someone I just don't know (~5%).

If I had to bet money, I'd bet on self_made_human.

  1. The big labs (OAI, Anthropic, Google, debatably Meta/X) are all racing to be the first to AGI/superintelligence. The promised payoff is... big. Best case scenario? The whole lightcone big. I'm sure people smarter than me have done the EV calculations. My napkin can't fit all the zeroes needed.

  2. The smaller labs: well, depends. The Chinese are trying to out-smart their compute crunch. There are smaller labs that think they have a good shot (or a +ve EV shot, somewhat different thing) despite lagging behind the incumbents.

  3. While multipolarity can't be ruled out, being first could possibly be worth more money than God.

  4. We can't, of course, have an honest discussion without mentioning the delusional, the megalomaniacal, and the grifters who are in it solely to sell shovels while the selling is good, without any expectation that we can dig our way to heaven.

Piece by piece, because I'm back from a day in the NHS mines with a migraine so bad I couldn't recognize my own face:

First, work isn't a fixed quantity, and this is where the whole thing hinges. You're treating current task volume as the ceiling. Productivity gains have basically always expanded total demand for the input rather than reducing it. Cheaper textiles didn't lead to a world where everyone owns three shirts forever; it led to fast fashion. Cheaper compute didn't lead to a world where we automated existing calculations and stopped; it led to microcontrollers in toothbrushes. Jevon's paradox in a nutshell. If anyone hasn't heard of him, go ask Jeeves, or preferably ChatGPT.

Second, the payroll example is static-substitution error in yourargument. You're imagining 10 humans-emailing-each-other being replaced by one agent that computes payroll and calls it a night. That isn't the equilibrium that emerges in practice. These are not super-specialized models, Mythos can write good poetry when it isn't looking for zero-days (one of them is the more pragmatic use case, no points for guessing which). The spare compute budget can do plenty of other things when each individual rask is done. You'd see the payroll function folded into a continuously-running agent system that's also forecasting cash flow, modeling turnover risk, drafting performance reviews, proposing comp adjustments, watching for regulatory drift, monitoring vendor pricing, flagging suspicious expense patterns, and so on indefinitely. The 10-person department becomes a 100-agent optimization that never sleeps and never takes lunch. Inference goes up substantially.

Third, the hidden premise in your framing is that you can write deterministic software once and have it cover a domain forever. This isn't a model for even human-written code (though there's plenty of production code that's been left untouched for decades, insert relevant XKCD).

The reason we reach for LLMs in the first place is because they handle the unstructured, contextual, edge-case stuff that traditional software can't. Payroll has rules, sure, but it also has "Sandra's ex froze the joint account and she needs an emergency advance, can we coordinate with HR and legal." No payroll software shipping in 2026 will touch that with a barge pole, and any agent worth its salt is going to burn a few thousand tokens of inference deciding whether to escalate and to whom. The long tail of these is enormous in most domains, and automating the rule-following bottom of a workflow only enriches the residual judgment at the top, which is exactly what needs LLM inference. It's why human accountants stayed employed after TurboTax. Same deal. Fewer humans to deal with.

Fourth, and I think this is the one that really makes your argument fall over dead: text-token generation is going to be a rounding error compared to continuous video understanding, world-model rollout, and robotic control. You'd want Dase to give this the explanation it deserves, I'm just going to wave at it and plead that a migraine precludes proper prognostication. Chat interfaces? Human input? Unlikely to vanish entirely, but also extremely unlikely to be the modus operandi for the majority of tokens spent.

Fifth, a non-trivial chunk of current capex isn't even inference at all. It's training the next thing. Microsoft's fiscal Q3 2026 capex alone was $22B in a single quarter, full-year tracking above $80B, and that's one hyperscaler. Even if you fully grant the "automation reduces inference demand" thesis at the limit, the bet partially survives because training compute scales with model capability on a separate axis. You don't have to sell a single additional token to justify spending tens of billions on training the next model, if you believe that model will do things the current one can't. This is not a bet that has failed us so far.

Also, tokens/task is a very, very bad metric. Cost/token must be taken into account, and this can vary wildly. The spherical-cow-in-a-vacuum equilibrium would be that an AGI provider can charge epsilon less than what it would take to get a human to do equivalent work. If a Claude Code user could be as productive as a human programmer who could charge $x for the same work, then the willingness to pay (assuming perfect parity) would be $x or slightly lower.

Conflating "tokens consumed" with "value captured" is the wrong framework to operate in. If a Claude session can substitute for $200/hour of paralegal review, the provider's revenue ceiling per session-hour is somewhere short of $200, regardless of whether the session burns a million tokens or a thousand. Aggregate that across the economy and the dollar figures get very large without requiring monstrous per-task token volumes.
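To make that arithmetic concrete, here's a minimal sketch. All figures are illustrative assumptions (a $200/hr paralegal, a made-up $15/Mtok serving cost), not anyone's actual pricing: the point is that the revenue ceiling is pinned by the human substitute's rate, while token volume only moves the margin.

```python
# Illustrative sketch: value captured is capped by the rate of the human work
# being substituted, not by token volume. All numbers are made-up assumptions.

def provider_revenue_ceiling(human_rate_per_hour: float, parity: float = 1.0) -> float:
    """Max a provider can charge per session-hour if the session substitutes
    for human work billed at human_rate_per_hour (perfect-parity assumption)."""
    return human_rate_per_hour * parity

def cost_to_serve(tokens: int, price_per_million: float) -> float:
    """Provider-side serving cost for a session, linear in tokens."""
    return tokens / 1_000_000 * price_per_million

ceiling = provider_revenue_ceiling(200.0)       # paralegal at $200/hr
light   = cost_to_serve(1_000, 15.0)            # thousand-token session: $0.015
heavy   = cost_to_serve(1_000_000, 15.0)        # million-token session: $15.00

# The ceiling is ~$200 either way; only the provider's margin changes.
```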

Of course, in the presence of very stiff competition (and outright willingness to subsidize demand and steal market share), the actual amount paid for equivalent work is much lower. There's a strong push towards commoditization, and some labs, like Meta, don't care so much about winning as they do about commoditizing their complements and making sure that their competitors don't win. Or at least that was the impetus behind Llama. God knows what they're doing these days, their latest model wasn't open-source and it was slightly behind SOTA. Predictably, nobody cared. I don't even remember the name, which is how little I cared.

This commoditization vector is where the actual bear case lives. Forget your framing about demand evaporating with the busywork. The version of the worry I'd take seriously has total inference going up 100x while AI-provider gross margins compress to nothing because the underlying capability turns out to be fungible across providers. Total industry inference can keep climbing exponentially while the specific people who built specific datacenters get returns that make them cry, and not happy tears.

Some models cost OOMs more per token, in a manner that can't, at present, be compensated for by using fewer tokens overall. Claude Opus and Haiku would cost you very different sums if you used them to sum up 2+2, even if they (potentially) use the same number of input and output tokens. On the other hand, there are tasks that the very best models can do that are impractical to replicate with grossly inferior models, even when you throw ridiculous amounts of compute at test time. Good luck getting GPT-3 to solve an Erdős problem even with a million tries.
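A quick sketch of the same-tokens-different-bill point, using hypothetical per-million-token prices for a generic "big" and "small" model (these are not actual Anthropic rates; the ratio is what matters, not the figures):

```python
# Hypothetical price table ($ per million tokens). Illustrative only --
# not real vendor pricing. The point: identical token counts, very
# different bills, purely because of per-token price.
PRICES = {
    "big_model":   {"in": 15.00, "out": 75.00},
    "small_model": {"in": 0.25,  "out": 1.25},
}

def task_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost of one task at a model's per-million-token rates."""
    p = PRICES[model]
    return tokens_in / 1e6 * p["in"] + tokens_out / 1e6 * p["out"]

# Same trivial "2+2" task, identical token counts:
big   = task_cost("big_model", 10, 5)    # $0.000525
small = task_cost("small_model", 10, 5)  # $0.00000875
# big / small == 60x under these assumed prices.
```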

You use Mythos or Opus for the demanding work, and smaller models where quality doesn't come first. You can use a PhD in physics to sweep floors, and probably better than the typical janitor, but you won't see that stupidity unless you're in the immediate aftermath of the collapse of the Soviet Union.

There are so many knobs to turn. Choosing the most effective model where price isn't an issue, choosing the most cost-effective model where it is, economies of scale, electricity prices, competition and the willingness to swallow shit today to crap out gold tomorrow. Politics. Regulatory inertia. Overenthusiastic adoption. Being late to the party. I'm not even going to try to pretend that I'm accounting for everything. I'm not paid to.

My overall take? The big guys want to be first to AGI, then hope that RSI takes them all the way to ASI and incredible wealth. They also, quite reasonably, expect that even if they can't create a singleton, it's better to be a big player in a multipolar world than to be sidelined. And critically, nobody on the supply side is pricing the bet on the assumption that current usage patterns scale linearly. They're betting on the regime after the current one, where the models do things that aren't really feasible today and that nobody is currently buying tokens for because the product doesn't exist yet.

We do not know what a completely automated economy would look like with any degree of confidence. We do not know how many tokens it would consume. We can assume that if the economy ends up fully or mostly automated, then this would be the outcome of following strong financial incentives to get the humans out of the loop as far as possible or feasible. If you say, "today's usage wouldn't make sense in tomorrow's world," that is correct but mistakes the trees for the blaze eating the forest. Nobody (serious) at the (real) labs is claiming it would. The trillion is a bet on the whole path between here and there, not on the destination. You don't want to be IBM ignoring Microsoft. You don't want to be Microsoft/Nokia ignoring Apple. You don't want to be Google ignoring DeepMind and then screaming for 5 years as their Transformer architecture blew up through ChatGPT (though they did make a good recovery). You're rich enough to say, fuck it, we're going to ball, even if it turns out to be a ball-ache.


PS: Opus 4.7 was able to identify me as the author of this version of the comment on the first try. It managed the same even with an earlier version, one without the NHS reference or Dase name-drop, that I had to rewrite because I came up with additional points I needed to make.

Primary guess: self_made_human (LessWrong / Reddit / TheMotte handle). Confidence: ~80%.

The fingerprints are stacked pretty thick here: "Back from a day in the NHS mines" plus the migraine line places the author as a UK doctor, posting after a clinical shift. Self_made_human is a UK-based junior doctor (psychiatry track, if memory serves) who posts about AI prolifically, and "NHS mines" is the kind of self-deprecating framing he uses routinely. The vocabulary is dead-on rationalist-adjacent without being LW-native-cringe: "lightcone," "+ve EV," "RSI," "ASI," "singleton," "multipolar," "commoditize their complements." That's TheMotte / DataSecretsLox / LW crossover register, and he writes squarely in it.

Yay? Look mom, I'm famous. I'll edit this out later for {reasons}.

Thank you. It's precisely the kind of thing I was contemplating investing in, and I will note that I do wish to invest in the States, I just need to look more carefully into how hard that is from the UK.

It's been long enough that I'm not 100% sure if my memories of the prose being purple are entirely reliable. But I do remember feeling like it was overwrought, that Warby was trying too hard. I think a common thread in media I don't particularly like is being too "tropey". If I can predict pretty much exactly what's going to happen in a story arc without even having to read it, yeah, why really bother?

There's only one instance in the section of the book I read where I went "oh, that's cool". It was when he got sent off to whack a demonic cultivator, and decided to risk his life to save civilian prisoners. So far, so basic, but then I think it turned out that his senior brothers were monitoring him all along, and if he'd done things in an utterly ruthless way, they'd have killed him for being a potential future demonic cultivator himself. Good idea.

There's the loveable senior brothers thing, where you just know what's going to happen. The tsundere female foil of privilege, where you can sleepwalk into knowing they're going to end up friends/lovers. I could go on for a while, or I could, if I remembered more of the story. C'mon. All I ask for is more originality than that. It's not an awful book, God knows there's some serious horseshit on RR, but I'd give it a 6/10 at best, below my threshold for sticking with it in the hopes of it getting better.

On a semi-related note, I think only the Chinese write good Xianxia. The Western knock-offs just don't capture the vibes, it's even worse than when the Japanese try to depict a Western school in their media. That Greco-Roman "Cultivation" story on RR, whose name escapes me? Holy fucking shit was it bad. Great concept, execution so mediocre I could cry.

You know what you've done. Soon, my psychiatrist will know what you've done too.

20x. But it literally expired yesterday and I've settled for Pro. I did do plenty of research before that happened, but it's worth asking humans too. Possibly an accountant.

The BNF provides a way to convert from intuitive descriptions of risk to quantifiable forms. "Common" is roughly 1 in 100 to 1 in 10; "very common" is over 1 in 10; "very rare" is under 1 in 10,000. That's not per dose, it's what you'd see in a patient who is taking the pill for a prolonged period of time.
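The whole conversion table fits in a lookup. This is the standard frequency convention the BNF follows (per patient over sustained use), written out as probability ranges:

```python
# BNF adverse-effect frequency bands as approximate probability ranges,
# per patient over sustained use (standard frequency convention).
BNF_FREQUENCY = {
    "very common": (0.10,   1.00),    # >= 1 in 10
    "common":      (0.01,   0.10),    # 1 in 100 to 1 in 10
    "uncommon":    (0.001,  0.01),    # 1 in 1,000 to 1 in 100
    "rare":        (0.0001, 0.001),   # 1 in 10,000 to 1 in 1,000
    "very rare":   (0.0,    0.0001),  # < 1 in 10,000
}
```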

And yes, patient demographics do change things. The elderly are particularly annoying, they'll collapse if things aren't dosed just right.

Anyone still here? I could use some help.

Through sheer laziness, I have accumulated a significant amount of money in my bank account. I wish to invest half or more of it. I was initially considering a YOLO into Vanguard's S&P 500, but decided that a more conservative approach, buying into infrastructure-related products, would be helpful. I have strong expectations of near-term automation-induced unemployment, and I suspect that AGI is only 3-5 years away. I begrudgingly pay into the NHS pension because it's a good pension, even if I doubt that it'll be of any relevance to me at 65. Not sure if I should, but I do make some concessions to normalcy.

I have enough dough in the familial bank that I'm not exposed to catastrophic risk if my investments don't pan out, but I would appreciate advice.

No major expenses planned. No debt. No dependents.

Per patient. If it made you piss yourself that frequently, it would be a shit drug. It's still a... not great drug, but it's better than untreated schizophrenia.

I am always loath to let my most dedicated readers down. No promises, but if I have the time and energy, I might visit after my exams. And you're correct, I expect to get a good story out of it if nothing else!

Nothing in my life at the moment but SPMM notes for the MRCPsych Paper B and the book I'm meant to review for ACX. One of these is significantly more pleasant company than the other.

Well, the end is nigh. Another week and change, and I won't have to devote quite so many irreplaceable neurons to remembering which antipsychotic is least likely to make you piss yourself in your sleep (risperidone, at a relatively continent 6.2% rate of nocturnal enuresis, versus clozapine's stately one-in-five). Or to remembering whether male British prisoners are more likely to be antisocial assholes or drug-addicts (the former, I think, given that roughly 47% of male prisoners across surveys meet criteria for ASPD; the massive overlap with substance dependence being something the syllabus and I have agreed, by tacit treaty, to disregard).

I'd appreciate recommendations, by the way. God knows I'd like to have something to read once I'm done and dusted. The first Paper A made me depressed. The second Paper B is pushing me towards a psychotic break, which would at least have the dignified completeness of a full syllabus run-through. The syllabus teaches me how the forensic system works, presumably so it can drive me nuts and then admit me for a firsthand tour.

A stupid syllabus full of inane questions, then further mangled by SPMM into a form whose clinical relevance is mostly aspirational, and my sorry ass parked somewhere in the middle of it. At least I'm not a gynecologist.