Culture War Roundup for the week of April 13, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Another indicator that AI is a bubble. Anthropic just released Claude Opus 4.7, and users are reporting significantly higher token burn rates (and therefore costs) for what appears to be a minor improvement over Opus 4.6. Discussion on Orange Reddit is here: https://news.ycombinator.com/item?id=47816960 and a tracker of the increased token burn rate is here: https://tokens.billchambers.me/leaderboard

The token tracker is based on user reporting, and the reported increase has been fluctuating between 37% and 45%.
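
For concreteness, here's roughly how a self-reported tracker would arrive at such a figure (a toy sketch with invented report values, not the tracker's actual code or data):

```python
# Toy sketch of a crowd-sourced burn-rate comparison. The report
# values are invented; this is not the tracker's actual code or data.
from statistics import median

# Self-reported tokens consumed for a comparable task on each version
reports_46 = [12_000, 15_500, 11_200, 14_800, 13_000]  # Opus 4.6
reports_47 = [17_500, 21_000, 16_000, 20_300, 18_900]  # Opus 4.7

increase = median(reports_47) / median(reports_46) - 1
print(f"median token burn increase: {increase:.0%}")  # ~45% here
```

The figure shifts as new reports arrive, which would explain the fluctuation.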

Even if AGI is actually possible with LLMs (or at all, but I'm not trying to start a discussion on metaphysics here), it looks like the capital needed to achieve it is drying up before it can be reached. Anthropic's move here (combined with them handicapping Opus 4.6 a few weeks ago) seems to clearly be an attempt to achieve profitability. The free/subsidized rate train for end users has pulled into the station, and now you have to pay more for the same (or worse) capabilities you were enjoying before.

I normally don't care much for the median Hacker News commenter (if me calling it Orange Reddit didn't already give that away), but I do find them to be a useful barometer for general sentiment in the tech industry. And a few months ago I would have said roughly 60% of HN users were AI believers/enthusiasts, 20% neutral or unsure, and 20% anti/negative. Anthropic's antics over the last few months (and Sam Altman's antics for his entire life) seem to have soured their views significantly, and I see this as a big sign of a sea change in sentiment about AI in the tech industry.

At least for me personally, I just hope this leads to less retarded mandates from my higher-ups about using AI X times a month etc. (we're literally tracked on usage and it can affect our raises/bonuses).

For everyone here, but perhaps especially the AGI believers, have your feelings changed at all over the last few months?

Whether a random patch release is a significant upgrade is hardly strong evidence in either direction on whether AI is a bubble. Did you ever try my suggestions under your last FUD post?

Did you ever try my suggestions under your last FUD post?

Came down with a cold, missed work for several days, and forgot. Sorry! I'll try to remember this week.

It's not about profitability; it's that they got a giant wave of users but not enough compute to meet that demand. So it's pretty obvious what must happen next: some mix of increased mandatory token efficiency (adaptive reasoning) and stricter limits (across the board, free and paid, but mostly targeting the super-user hogs who will theoretically pay for extra API usage after limits run out).
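
To be concrete about what "stricter limits" usually means mechanically, here's a minimal sketch of a rolling-window usage cap of the sort providers use to throttle the hogs (the class, budget, and window here are hypothetical, not Anthropic's actual system):

```python
import time
from collections import deque

class RollingTokenCap:
    """Hypothetical rolling-window cap: spend tokens freely until the
    window budget is exhausted, then get cut off until old usage ages
    out of the window."""

    def __init__(self, budget_tokens=500_000, window_seconds=5 * 3600):
        self.budget = budget_tokens
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens) pairs

    def _prune(self, now):
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()

    def try_spend(self, tokens, now=None):
        now = time.time() if now is None else now
        self._prune(now)
        used = sum(t for _, t in self.events)
        if used + tokens > self.budget:
            return False  # over the cap: block, or route to paid API overflow
        self.events.append((now, tokens))
        return True
```

Tighten budget_tokens and the super-users hit the wall first, which is exactly the targeting described above.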

I will say, though, that this probably bodes poorly for Claude in the near to medium term, because more or less the same thing happened to ChatGPT with their 5.0 launch (forced adaptive model selection for mandatory token efficiency), and it definitely took the wind out of their sails for at least 4-5 months.

At any rate, I strongly, strongly disagree about this empowering the skeptics (or being evidence of a shift against AI adoption). The fact that people are whining about problems with their tools is selection bias. It's like the classic survivorship-bias story of armoring the spots on returning airplanes that didn't have holes (the planes hit there never came back to be examined): people wouldn't complain so vociferously if they weren't so needy for the tool in the first place. The complaints are, to me, evidence of generalized latent enthusiasm, not pessimism. In the grand scheme of things, it's far, far better for a company to field complaints that users can't get enough of its product than for the product to be simply ignored. In the near term, I expect a decent chunk of users to swing back toward the OpenAI offering, Codex (which is undergoing a PR blitz of sorts right now).

I’ve found Opus 4.7 to generate better and more human-like text than Opus 4.6 for my purposes, but I can’t say whether it’s any better at coding. I use a mix of LLMs for various things, and my feeling is that ChatGPT is more bland and LLM-y in its output, but much more generous with usage limits. In the limited coding I’ve done, I haven’t seen much of a difference between them. ChatGPT’s image generation model is also nice, as far as my amateur impression goes.

But it’s a constant fight with the usage limits on Claude, whereas ChatGPT feels like it flows freely. My current pattern is to default to Chat for most informational and coding purposes and bring out Claude Opus for when I want a more thoughtful analysis of something. I don’t know how Sonnet compares to ChatGPT.

Gemini feels massively behind in both usability and tooling, and its integrations are only good with Google's own products.

TracingWoodgrains has been a fan of Opus, and seems a little frustrated by 4.6. That said, it may depend on your use case.

I'm generally not that surprised if there are occasional stinkers. I've given similar caveats around other vendors: it's just too easy to benchmax, or to land in a bad local maximum such that some minor revisions either have no benefit or only a backend benefit. Repeated problems or broader-scale issues would say more, but there have been a number of surprisingly good models from other vendors recently, including small-parameter and open-model approaches.

I'm skeptical that LLMs are themselves enough to get to AGI, but I'm also skeptical that they're going to stop at exactly last month's level of capability, and last month's capabilities included solving some Erdős problems. There's a lot of low-hanging fruit just in terms of UI and process tooling, never mind areas where we haven't applied existing tools.

That said, I recognize that a lot of the major AI vendors have ranged from scumbags to scammers. Altman's ridiculous behaviors, especially in relation to RAM, have made the most enemies (maybe even more than Musk's more conventional culture war), but the best PR the whole faction has got has come from anti-AI people, so that's a whole big mess.

Somewhat an aside, but I consider that first link a first-degree chart crime. Radar plots are inherently iffy to begin with, since we pay close attention to the "area", and the area is highly dependent on how the categories are ordered (a "spiky" radar plot has much less total area than a "lopsided" one where the axes are sorted, despite showing the same information). That's a little defensible when the adjacency of the categories is obvious and inherent, but here it frequently isn't: for example, "Occupational: Writing Literature and Language" is not next to "Text: Creative Writing", for no apparent reason.

And what is the scale of the chart? It's "Arena rank"... which is NOT equally spaced. The chart implies that the difference between #1 and #2 is the same as (or even slightly bigger than, considering how the radar chart "expands") the difference between #3 and #4, but this is plainly not the case. They should be using some kind of actual score instead, perhaps a scaled one. Sure, rank allows consistency across axes, but if we are comparing a model to its successor, the scale definitely shouldn't implicitly include other models the way it does now (in one spot it drops from rank 2 to rank 5: does that mean some other model class does abnormally well in that category, or did Claude truly degrade?). Even worse, the center of the plot, usually a natural "zero", is not a zero at all; it's rank 6. There are, as you know, dozens and dozens of models in the rankings, so rank 6 as the zero score is totally nonsensical.
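
You can check the ordering effect directly: the polygon a radar chart draws has area 0.5 · Σ rᵢ·rᵢ₊₁·sin(2π/n), so shuffling the same values across the axes changes the "size" the eye reads. A quick demonstration with made-up scores:

```python
import math

def radar_area(values):
    """Area of the polygon a radar chart draws for the given axis values."""
    n = len(values)
    step = 2 * math.pi / n
    return 0.5 * sum(values[i] * values[(i + 1) % n] * math.sin(step)
                     for i in range(n))

scores = [9, 2, 8, 3, 7, 1]            # six made-up category scores
spiky = radar_area(scores)             # highs and lows alternating
lopsided = radar_area(sorted(scores))  # same values, sorted around the circle
print(f"spiky: {spiky:.1f}, lopsided: {lopsided:.1f}")  # ~41 vs ~72
```

Identical data, nearly double the area, purely from the axis ordering.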

Anthropic's move here (combined with them handicapping Opus 4.6 a few weeks ago) seems to clearly be an attempt to achieve profitability. The free/subsidized rate train for end users has pulled into the station, and now you have to pay more for the same (or worse) capabilities you were enjoying before.

I guess it depends whether you think this is a forced move due to running out of money or if they have run their internal numbers and think people are willing to pay the increased prices. VC money is a runway, it's not intended to be a permanent subsidy. If they reduce the amount of money they are burning on subsidized inference, that's money they can put into R&D, more GPUs, etc.

It's hard to speculate without knowing more about their internal metrics, but based on the complaints I have heard about Claude being slow, laggy, etc, it sounds like they are quite oversubscribed. If the demand exceeds the supply, increasing prices is the logical move.

The way these Orange Reddit people use AI is revealing to me. I tried Opus 4.6 and got no benefit over Codex 5.3, but it made me run out of tokens very quickly. I use Codex 5.3 for my day job and several side projects. I think I got no benefit because I have expertise in what I'm doing, so I give pointed, well-written prompts. These people must be completely out of their depth, and therefore reliant on extremely costly extra layers of prompt refinement to get the same performance I can get with Codex 5.3.

Opus made you burn tokens quickly so you switched, but when these people also use Opus and burn tokens quickly, it's because they're using it wrong?

They're not using Opus wrong, but being reliant on Opus means they're bad at AI.

You seem very eager to jump on any negative AI news out of some desire to prove the “AI bros” wrong. What’s your motivation? Annoyance at AI mandates from above? At insufferable people shoving AI slop in your face at every opportunity? Just disliking the concept in general?

I don’t know if I’m an “AI believer” (what exactly do you mean by that?). I dislike OpenAI and Anthropic for the shenanigans they keep pulling, and I’ll jump ship to whichever AI service provides the best value for money. The tech industry hype cycle goes on and on, at some point people went crazy over Java of all things, now it’s just a boring programming language and you don’t have to be a “Java believer” to use it.

Annoyance at AI mandates from above? At insufferable people shoving AI slop in your face at every opportunity? Just disliking the concept in general?

All of the above, honestly? But the biggest would be annoyance at mandates from above, combined with a complete reversal in what people consider quality engineering in software that magically coincides with the rise in popularity of AI tools. See Lines of Code suddenly becoming a positive metric for a lot of people, versus the old Bill Gates quote: "Measuring programming progress by lines of code is like measuring aircraft building progress by weight."

The tech industry hype cycle goes on and on, at some point people went crazy over Java of all things, now it’s just a boring programming language and you don’t have to be a “Java believer” to use it.

Sure, but despite Java's warts it's still used to this day to make a lot of the important software that keeps the modern world running. The AI hype bubble is much more reminiscent of the crypto bubble. No matter how many times you tried to make it clear that crypto is only useful where you need a distributed, immutable, trustless ledger (and even then it's questionable), crypto bros kept proposing uses in situations where trust was still required and other existing tools already did an infinitely better job for far less computing power. Similarly, I see retarded things like "I had AI generate a thing, and then I had another AI review it and tell me it looked great! What, review it myself? No, of course not, why would I do that?"

Except crypto was almost always purely in the realm of theoretical applications.

With AI, right now, I can do things like generate custom flashcards for subjects I'm learning (job interview prep). I can get more detailed answers to random questions without spending hours on Google piecing things together (just yesterday, asking for details about how stomachs process different macronutrient profiles). I can generate custom mini-apps for a wide variety of tasks (recently I made a custom task-selection spinner for my todo list that weights the important tasks more heavily than the small ones, while occasionally mandating a break; see the sketch below). It can make sure an email I send to a recruiter doesn't have obvious mistakes or commit a faux pas. I can get personal advice of at least middling quality, without friction, on a wide variety of topics. Obviously, it can code really well, and that touches my field very directly in a lot of ways. There are plenty of other use cases, too. These aren't "lines of code" type accomplishments; they are concrete deliverables of various scopes, some of which were previously high-friction or even impossible.
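
(For the curious, the spinner boils down to a weighted random choice plus an occasional forced break. A toy sketch of the idea, with made-up tasks and weights rather than the actual app:)

```python
import random

# Toy sketch of the task-spinner idea: a weighted random pick that
# favors important tasks and occasionally mandates a break.
tasks = {
    "finish interview prep deck": 5,  # important -> heavier weight
    "reply to recruiter email": 3,
    "tidy project backlog": 1,
}
BREAK_CHANCE = 0.15  # roughly one spin in seven orders a break

def spin(tasks):
    if random.random() < BREAK_CHANCE:
        return "take a break"
    names = list(tasks)
    return random.choices(names, weights=[tasks[n] for n in names], k=1)[0]

print(spin(tasks))
```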

Sure, some of these are gratuitous or busywork, but they are all real. Crypto stuff was like, "what if the government keeps track of property listings on the blockchain" which is a) something the government already does mostly just fine and b) obviously never happened and c) would have required very significant network effects. And currently, crypto is extremely useful for pretty much exactly two types of people: those who treat it like digital gold (it does OK at that) and criminals who can move money around that's difficult to track. Nothing else. So sure, in that sense it was real, but AI plainly can do more than two things and will continue to do more than two things even as hype dies out.

And sure, my IRL friend will give me better advice than Claude will, but some things are so low-stakes that it would be disrespectful of their time to ask. Paradigms like that are all over the place, because of the speed and cost AI offers. In that sense, it's more like the Industrial Revolution, where speed and cost enabled things that were previously functionally impossible at scale. In fact, most of the Industrial Revolution was about things that were already feasible to do but were cost-prohibitive (or took too long), which in turn generated new industries that were previously only theory. Now, I don't think AI will have that level of impact on society, and I'm also not sold on it 'creating new industries' at all, but it's probably at least somewhere on the level of the invention of Google?

At least for me personally, I just hope this leads to less retarded mandates from my higher-ups about using AI X times a month etc. (we're literally tracked on usage and it can affect our raises/bonuses).

I work at a dinosaur of a company, so I can't speak to this directly, but a friend of mine that I recently mentioned gave me an update the other day. They've gone from "you must burn as many tokens as possible to maximize your performance review" to "we must use our token budget wisely."

The timing is interesting. It happened right around the same time Anthropic started putting the screws on its customer base with increased token usage and tighter rate limits.

I really feel like the company that sits on a "good enough" model and aggressively cost cuts is going to win this particular war.

I really feel like the company that sits on a "good enough" model and aggressively cost cuts is going to win this particular war.

That would be OpenAI. Claude is a 2026 fad and will be over soon.

There are some Chinese contenders within striking distance as well. GLM-5.1 is open weight and seems to perform somewhere between Opus 4.5 and 4.6. It's pretty incredible that there is open weight competition that's less than six months behind frontier state of the art.

and a tracker of the increased token burn rate is here:

The tracker is slop so I don't trust it one bit.

But in general, for thinking models, more thinking effort = higher scores on pretty much all benchmarks. So they could easily have just tweaked a setting so that 4.7 medium = 4.6 high. Voila, number goes up. Of course you're paying for those tokens anyway, but the scale is fundamentally arbitrary: there's no real definition of what "low" or "high" thinking actually means.
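
Something like this, hypothetically (the budgets are invented; no vendor publishes what the labels actually buy you):

```python
# Hypothetical illustration of remapping "effort" labels between
# versions. All budget numbers here are invented.
THINKING_BUDGET = {
    "4.6": {"low": 2_000, "medium": 8_000, "high": 32_000},
    "4.7": {"low": 8_000, "medium": 32_000, "high": 128_000},
}

# 4.7 "medium" now spends what 4.6 "high" did: benchmark scores at a
# given label go up, and so does the bill.
assert THINKING_BUDGET["4.7"]["medium"] == THINKING_BUDGET["4.6"]["high"]
```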

I'd be more worried about the fact that the reception to 4.7 has been extremely lukewarm to say the least. Ain't nobody on twitter singing the praises of that model.

I don't really see what this is supposed to prove one way or the other. You are still stuck in the timescale framing of the most fervent AI bros. Opus 4.6 came out in February, two months ago. So what if Opus 4.7 is not a revolutionary upgrade? If AI were truly stagnant, we wouldn't really find out until someone posts in 2028 that Opus 6.7 is only a marginal upgrade over Opus 4.7.

I think you misunderstand my argument. I'm not arguing that AGI is impossible based on this (though I don't believe it's possible). I'm arguing that this is a strong sign that VC money is drying up before they could ever conceivably achieve AGI (even if it is possible).

Anthropic raised $30 billion two months ago; their problem isn’t lack of money. All the VC money in the world won’t solve a bad engineering culture.

Sure, but they're on track to burn $11 billion this year in expenses, and more in the future, so that's not going to last too long.

$11 billion this year in expenses

...and $14 billion in revenue assuming zero growth. Or closer to $35B if their 10x/yr trajectory continues.

If that were the end of the story it wouldn't be an issue. It's that it evidently uses significantly more computing power than the performance improvement would suggest, raising the spectre of rapidly diminishing returns.

It seems to me this also has financial implications. If you are paying per token, and the model's benchmark performance increases slightly, but its token cost to reach those higher benchmarks increases tremendously, suddenly you're paying a lot more to do, at best, slightly more.
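
Back-of-envelope, with invented numbers just to show the shape of it:

```python
# Suppose the new version scores 2 points higher on a benchmark but
# burns 40% more tokens per task at the same per-token price.
# All numbers below are invented for illustration.
old_score, new_score = 0.80, 0.82        # benchmark pass rate
old_tokens, new_tokens = 10_000, 14_000  # tokens per task
price_per_mtok = 15.0                    # dollars per million tokens

old_cost = old_tokens / 1e6 * price_per_mtok / old_score  # $ per solved task
new_cost = new_tokens / 1e6 * price_per_mtok / new_score
print(f"cost per solved task: ${old_cost:.3f} -> ${new_cost:.3f} "
      f"({new_cost / old_cost - 1:+.0%})")  # about +37%
```

You end up paying roughly a third more per solved task for a two-point score bump.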

If Anthropic is making margin on the token cost, then this is an improvement from their financial point of view, right?