At the risk of doxxing myself, I have an advanced degree in Applied Mathematics. I have authored and contributed to multiple published papers, and hold a US patent, all related to the use of machine learning in robotics and digital signal processing. I am currently employed as a supervising engineer at a prominent tech company. For pseudonymity's sake I am not going to say which, but it is a name that you would recognize. I say this not to brag, but to establish some context for the following.
Imagine that you are someone who is deeply interested in space flight. You spend hours of your day thinking seriously about Orbital Mechanics and the implications of Relativity. One day you hear about a community devoted to discussing space travel and are excited at the prospect of participating. But when you get there, what you find is a Star Trek fan-forum that is far more interested in talking about the Heisenberg compensators on fictional warp-drives than in Hohmann transfers, thrust-to-Isp curves, or the effects of low gravity on human physiology. That has essentially been my experience trying to discuss "Artificial Intelligence" with the rationalist community.
However at the behest of users such as @ArjinFerman and @07mk, and because X/Grok is once again in the news, I am going to take another stab at this.
Are "AI assistants" like Grok, Claude, Gemini, and DeepSeek intelligent?
I would say no, and in this post I am going to try to explain why, but to do so requires a discussion of what I think "intelligence" is and how LLMs work.
What is Intelligence
People have been philosophizing on the nature of intelligence for millennia, but for the purposes of our exercise (and my work) "intelligence" is a combination of perceptivity and reactivity. That is to say, the ability to perceive or take in new and/or changing information, combined with the ability to change state based on that information. Both are necessary, and neither is sufficient on its own. This is why Mathematicians and Computer Scientists often emphasize the use of terms like "Machine Learning" over "Artificial Intelligence", as an algorithm's behavior is almost never both.
If this definition feels unintuitive, consider it in the context of the following example. What I am saying is that an orangutan who waits until the Zookeeper is absent to use a tool to force the lock on its enclosure is more "intelligent" than the insect that repeatedly throws itself against your kitchen window in an attempt to get outside. They share an identical goal (to get outside), but the orangutan has demonstrated the ability to both perceive obstacles (IE the lock and the Zookeeper), and react dynamically to them in a way that the insect has not. Now obviously these qualities exist on a spectrum (try to swat a fly and it will react) but the combination of these two parameters defines an axis along which we can work to evaluate both animals and algorithms, and as any good PM will tell you, the first step to solving any practical engineering problem is to identify your parameters.
Now the most common arguments for AI assistants like Grok being intelligent tend to be some variation on "Grok answered my question, ergo Grok is intelligent." or "Look at this paragraph Claude wrote, do you think you could do better?" but when evaluated against the above parameters, the ability to form grammatically correct sentences and the ability to answer questions are both orthogonal to them. An orangutan and a moth may be equally incapable of writing a Substack, but I don't expect anyone here to seriously argue that they are equally intelligent. By the same token a pocket calculator can answer questions, "what is the square root of 529?" being one example of such, but we don't typically think of pocket calculators as being "intelligent", do we?
To me, these sorts of arguments betray a significant anthropomorphic bias. That bias being the assumption that anything that a human finds complex or difficult must be computationally complex, and vice versa. The truth is often the inverse. This bias leads people who do not have a background in math or computer science to have completely unrealistic impressions of what sort of things are easy or difficult for a machine to do. For example, vector and matrix operations are a reasonably simple thing for a computer that a lot of human students struggle with. Meanwhile bipedal locomotion is something most humans do without even thinking, despite it being more computationally complex and prone to error than computing a cross product.
Speaking of vector operations, let's talk about how LLMs work...
What are LLMs
LLM stands for "Large Language Model". These models are a subset of artificial neural networks that use "Deep Learning" (essentially a fancy marketing buzzword for the combination of looping regression analysis with back-propagation) to encode a semantic token such as the word "cat" as an n-dimensional vector representing that token's relationship to the rest of the tokens in the training data. Now in actual practice these tokens can be anything: an image, an audio clip, or a snippet of computer code, but for the purposes of this discussion I am going to assume that we are working with words/text. This process is referred to as "embedding" and what it does in effect is turn the word "cat" into something that a computer (or grad-student) can perform mathematical operations on. Any operation you might perform on a vector (addition, subtraction, transformation, matrix multiplication, etc...) can now be done on "cat".
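To make the embedding step concrete, here is a minimal toy sketch in Python. The three-dimensional vectors below are invented purely for illustration; a real model learns vectors with hundreds or thousands of dimensions.

```python
import numpy as np

# Toy embedding table. In a real model these vectors are learned from the
# training data and have hundreds or thousands of dimensions, not three.
embeddings = {
    "cat":    np.array([0.90, 0.80, 0.10]),
    "feline": np.array([0.85, 0.75, 0.15]),
    "dog":    np.array([0.80, 0.20, 0.10]),
    "car":    np.array([0.10, 0.10, 0.90]),
}

# Once "cat" is a vector, ordinary linear algebra applies to it.
v = embeddings["cat"]
print(v + embeddings["dog"])    # vector addition
print(2.0 * v)                  # scalar multiplication
print(v @ embeddings["car"])    # dot product
```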
Now because these vectors represent the relationship of the tokens to each other, words (and combinations of words) that have similar meanings will have vectors that are directionally aligned with each other. This has all sorts of interesting implications. For instance you can compute the dot product of two embedded vectors to determine whether their words are synonyms, antonyms, or unrelated. This also allows you to do fun things like approximate the vector "cat" using the sum of the vectors "carnivorous", "quadruped", "mammal", and "feline", or subtract the vector "legs" from the vector "reptile" to find an approximation for the vector "snake". Please keep this concept of "directionality" in mind as it is important to understanding how LLMs behave, and it will come up later.
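To see this arithmetic in practice, here is a small sketch using the gensim library and a pretrained GloVe word-vector set (it downloads data on first run, and the exact nearest neighbours depend on which vector set you load, so treat the "reptile minus legs" result as illustrative rather than guaranteed):

```python
import gensim.downloader as api

# Small pretrained word vectors (downloads roughly 66 MB on first use).
wv = api.load("glove-wiki-gigaword-50")

# Cosine similarity: directionally aligned words score close to 1.0.
print(wv.similarity("cat", "feline"))
print(wv.similarity("cat", "piano"))

# Analogy arithmetic: add/subtract vectors, then find the nearest word.
# The classic demo is "king" - "man" + "woman" ~ "queen"; whether
# "reptile" - "legs" really lands on "snake" varies by vector set.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
print(wv.most_similar(positive=["reptile"], negative=["legs"], topn=3))
```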
It should come as no surprise that some of the pioneers of this methodology were also the brains behind Google Translate. You can basically take the embedded vector for "cat" from your English language model and pass it to your Spanish language model to find the vector "gato". Furthermore, because all you are really doing is summing and comparing vectors, you can do things like sum the vector "gato" in the Spanish model with the vector for the diminutive "-ito" and then pass it back to the English model to find the vector "kitten".
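A rough sketch of the cross-lingual trick being described, in the spirit of the classic "translation matrix" approach (fit a linear map on a seed dictionary of word pairs). The arrays here are random placeholders standing in for real English and Spanish embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings for a seed dictionary of English/Spanish word pairs
# (row i of each matrix is the vector for the same concept, e.g. cat/gato).
X_en = rng.normal(size=(1000, 50))   # English vectors
Y_es = rng.normal(size=(1000, 50))   # Spanish vectors

# Fit W so that X_en @ W approximates Y_es in the least-squares sense.
W, *_ = np.linalg.lstsq(X_en, Y_es, rcond=None)

# "Translate": map a new English vector into the Spanish space, then look up
# its nearest neighbour among the Spanish vocabulary vectors.
v_cat_en = rng.normal(size=50)       # placeholder for the real "cat" vector
v_gato_hat = v_cat_en @ W
```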
Now if what I am describing does not sound like an LLM to you, that is likely because most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model. An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word. This is essentially what is happening under the hood when you type a prompt into GPT or your assistant of choice.
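A stripped-down sketch of that "compute the most probable next word" loop. The toy_model function below just returns made-up scores; a real interface layer would call the trained network, work over sub-word tokens rather than whole words, and usually sample rather than always take the top choice:

```python
import numpy as np

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(context):
    # Stand-in for the real network: one unnormalized score (logit) per
    # vocabulary item, conditioned on the tokens generated so far.
    rng = np.random.default_rng(len(context))
    return rng.normal(size=len(VOCAB))

def generate(prompt, steps=5):
    tokens = prompt.split()
    for _ in range(steps):
        logits = toy_model(tokens)
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax
        tokens.append(VOCAB[int(np.argmax(probs))])    # greedy pick
    return " ".join(tokens)

print(generate("the cat"))
```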
Our Villain Lorem Epsom, and the Hallucination Problem
I've linked the YouTube video Badness = 0 a few times in prior discussions of AI as I find it to be both a solid introduction to LLMs for the lay-person, and an entertaining illustration of how anthropomorphic bias can cripple the discussion of "alignment". In it the author (who is a professor of Computer Science at Carnegie Mellon) posits a semi-demonic figure (akin to Scott Alexander's Moloch) named Lorem Epsom. The name is a play on the term Lorem Ipsum and represents the prioritization of appearance over all else. When it comes to writing, Lorem Epsom doesn't care about anything except filling the page with text that looks correct. Lorem Epsom is the kind of guy who, if you tell him that he made a mistake in the math, is liable to interpret that as a personal attack. The ideas of "accuracy", "logic", "rigor", and "objective reality" are things that Lorem Epsom has heard of but that do not concern Lorem Epsom. It is very possible that you have had to deal with someone like Lorem Epsom in your life (I know I have); now think back and ask yourself, how did that go?
I bring up Lorem Epsom because I think that understanding him provides some insight into why certain sorts of people are so easily fooled/taken in by AI Assistants like Claude and Grok. As discussed in the section above on "What is Intelligence", the assumption that the ability to fill a page with text indicates the ability to perceive and react to a changing situation is an example of anthropomorphic bias. I think that because they are posing their question to a computer, a lot of people expect the answer they get to be something analogous to what they would get from a pocket calculator, rather than from Lorem Epsom.
Sometime circa 2014 I kicked off a heated dispute in the comment section of a LessWrong post by asking EY why a paperclip maximizing AI that was capable of self-modification wouldn't just modify the number of paperclips in its memory. I was accused by him and a number of others of missing the point, but I think they missed mine. The assumption that an Artificial Intelligence would not only have a notion of "truth", but assign value to it, is another example of anthropomorphic bias. If you asked Lorem Epsom to maximize the number of paperclips, and he could theoretically "make" a billion-trillion paperclips simply by manipulating a few bits, why wouldn't he? It's so much easier than cutting and bending wire.
In order to align an AI to care about truth and accuracy, you first need a means of assessing and encoding truth, and it turns out that this is a very difficult problem within the context of LLMs, bordering on mathematically impossible. Do you recall how LLMs encode meaning as a direction in n-dimensional space? I told you it was going to come up again.
Directionally speaking we may be able to determine that "true" is an antonym of "false" by computing their dot product. But this is not the same thing as being able to evaluate whether a statement is true or false. As an example, "Mary has 2 children", "Mary has 4 children", and "Mary has 1024 children" may as well be identical statements from the perspective of an LLM. Mary has a number of children. That number is a power of 2. Now if the folks programming the interface layer were clever they might have it do something like estimate the most probable number of children based on the training data, but the number simply can not matter to the LLM the way it might matter to Mary, or to someone trying to figure out how many pizzas they ought to order for the family reunion, because the "directionality" of one positive integer isn't all that different from that of any other. (This is why LLMs have such difficulty counting, if you were wondering.)
In addition to difficulty with numbers there is the more fundamental issue that directionality does not encode reality. The directionality of the statement "Donald Trump is the 47th President of the United States", would be identical regardless of whether Donald Trump won or lost the 2024 election. Directionally speaking there is no difference between a "real" court case and a "fictitious" court case with identical details.
The idea that there is an ineffable difference between true statements and false statements, or between hallucination and imagination, is a wholly human conceit. Simply put, an LLM that doesn't "hallucinate" doesn't generate text or images at all. It's literally just a search engine with extra steps.
What does this have to do with intelligence?
Recall that I characterized intelligence as a combination of perceptivity and the ability to react/adapt. "AI assistants" as currently implemented struggle with both. This is partially because LLMs as currently implemented are largely static objects. They are neither able to take in new information, nor discard old. The information they have at time of embedding is the information they have. This imposes substantial loads on the context window of the interface layer, as any ability to "perceive" and subsequently "react" must happen within its boundaries. Increasing the size of the window is non-trivial, as the relationship between the size of the window and the amount of memory and the number of FLOPS required is a hyperbolic curve. This is why we saw a sudden flurry of development following the release of Nvidia's multimodal framework, and it's mostly been marginal improvements since. The last significant development was in June of last year, when the folks at Deepseek came up with some clever math to substantially reduce the size of the key value cache, but multiplicative reductions are no match for exponential growth.
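For a rough sense of the scaling pressure being described, here is a back-of-the-envelope calculation with invented model dimensions (not the specs of any particular model): the key/value cache grows linearly with context length, while the attention score matrices a naive implementation materializes grow with its square.

```python
def kv_cache_bytes(ctx_len, n_layers=48, n_heads=32, head_dim=128, bytes_per=2):
    # Keys + values, for every layer, head, and token (fp16 -> 2 bytes each).
    return 2 * n_layers * n_heads * head_dim * bytes_per * ctx_len

def attn_scores_bytes(ctx_len, n_heads=32, bytes_per=2):
    # One ctx_len x ctx_len score matrix per head, for a single layer,
    # if attention is computed naively.
    return n_heads * ctx_len * ctx_len * bytes_per

for n in (4_096, 32_768, 131_072):
    print(f"{n:>7} tokens: KV cache ~{kv_cache_bytes(n) / 2**30:6.1f} GiB, "
          f"score matrices ~{attn_scores_bytes(n) / 2**30:8.1f} GiB")
```

Cache-compression tricks like the DeepSeek one mentioned above shrink the first term (how much has to be stored per token); they do not change how the second term grows.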
This limited context window, coupled with the human tendency to anthropomorphize things, is why AI Assistants sometimes appear "oblivious" or "naive" to the uninitiated, and why they seem to "double down" on mistakes. They can not perceive something that they have not been explicitly prompted with, even if it is present in their training data. This limited context window is also why, if you actually try to play a game of chess with ChatGPT, it will forget the board-state and how pieces move after a few turns and promptly lose to a computer program written in 1976. Unlike a human player (or an Atari 2600 for that matter) your AI assistant can't just look at the board (or a representation of the board) and pick a move. This IMO places them solidly on the "insect" side of the perceptivity + reactivity spectrum.
Now there are some who have suggested that the context window problem can be solved by making the whole model less static by continuously updating and re-embedding tokens as the model runs, but I am skeptical that this would result in the sort of gains that AI boosters like Sam Altman claim. Not only would it be computationally prohibitive to do at scale, but what experiments there have been with self-updating language models (or at least those that I am aware of) have quickly spun away into nonsense for reasons described in the section on Lorem Epsom, as barring some novel breakthrough in the embedding/tokenization process there is no real way to keep hallucinations and spurious inputs from rapidly overtaking everything else.
It is already widely acknowledged amongst AI researchers and developers that the LLM-based architecture being pushed by OpenAI and DeepSeek is particularly ill-suited for any application where accuracy and/or autonomy are core concerns, and it seems to me that this is unlikely to change without a complete ground-up redesign from first principles.
In conclusion, it is for the reasons above and many others that I do not believe that "AI Assistants" like Grok, Claude, and Gemini represent a viable path towards a "True AGI" along the lines of Skynet or Mr. Data, and if asked "which is smarter, Grok, Claude, Gemini, or an orangutan?" I am going to pick the orangutan every time.
Notes -
Having no interest to get into a pissing context^W contest, I'll only disclose I've contributed to several DL R&D projects of this era.
This is the sort of text I genuinely prefer LLM outputs to, because with them, there are clear patterns of slop to dismiss. Here, I am compelled to wade through it manually. It has the trappings of a sound argument, but amounts to epistemically inept, reductionist, irritated huffing and puffing with an attempt to ride on (irrelevant) credentials and dismiss the body of discourse the author had found beneath his dignity to get familiar with, clearly having deep contempt for people working and publishing in the field (presumably ML researchers don't have degrees in mathematics or CS). Do you even believe you've said anything more substantial than “I don't like LLMs” in the end? A motivated layman definition of intelligence (not even citing Chollet or Hutter? Seriously?), a psychologizing strawman of arguments in favor of LLM intelligence, an infodump on embedding arithmetic (flawed, as already noted), random coquettish sneers and personal history, and arrogant insistence that users are getting "fooled" by LLMs producing the "appearance" of valid outputs, rather than, say, novel functioning programs matching specs (the self-evident utility of LLMs in this niche is completely sidestepped), complete with inane analogies to non-cognitive work or routine one-off tasks like calculation. Then some sloppy musings on current limitations regarding in-context learning and lifelong learning or whatever (believe me, there's a great deal of work in this direction). What was this supposed to achieve?
In 2019, Chollet published On the Measure of Intelligence, where he proposed the following definition: “The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty.” It's not far from yours, because frankly it's intuitive. Starting from this idea and aiming to test fluid thinking specifically, Chollet also proposed the ARC-AGI benchmark, which for the longest time was so impossibly hard for DL systems (and specifically LLMs) that many took that as evidence for the need to do “complete ground-up redesign from first principles” to make any headway. o3 was the first LLM to truly challenge this; Chollet coped by arguing that o3 is doing something beyond DL, some “guided program synthesis” he covets. From what we know, it just autoregressively samples many CoTs in parallel and uses a simple learned function to nominate the best one. As of now, it's clearly going to be saturated within 2 years, as is ARC-AGI 2, and we're on ARC-AGI 3, with costs per problem solved plummeting. Neither 1 nor 3 are possible to ace for an orangutan or indeed for a human of below-average intelligence. Similar things are happening to “Humanity's Last Exam”. Let's say it's highly improbable at this point that any “complete ground-up redesign from first principles” will be necessary. The Transformer architecture is rather simple and general; making it cheaper to train and inference without deviating from the core idea of “a stack of MLPs + expressive learned mixers” is routine, and virtually all progress is achieved by means of better data – not just “cleaner” or “more”, but procedural data, prediction of which necessitates learning generally useful mental skills. Self-verification, self-correction, backtracking, iteration, and now tool use, search, soliciting multi-agent assistance (I recommend reading the Kimi K2 report, section 3.1.1, for a small sliver of an idea of what that entails). Assembling necessary cognitive machines in context. This is intelligence, so poorly evidenced in your texts.
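(A caricature of that "sample many chains of thought, let a learned function nominate the best" loop, with placeholder functions standing in for the actual decoder and scorer, not anyone's real system:)

```python
import random

def sample_cot(problem):
    # Placeholder: a real system autoregressively decodes a full reasoning
    # trace from the language model here, at some sampling temperature.
    return f"candidate reasoning trace {random.random():.3f} for {problem!r}"

def learned_scorer(trace):
    # Placeholder for the learned ranking/verification function.
    return random.random()

def best_of_n(problem, n=64):
    candidates = [sample_cot(problem) for _ in range(n)]
    return max(candidates, key=learned_scorer)

print(best_of_n("ARC-style puzzle"))
```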
We are not in 2013 anymore, nor on LessWrong, to talk of this so abstractly and glibly. "Reptile — legs = snake" just isn't an adequate level of understanding to explain the behaviors of LLMs; this fares no better than dismissing hydrology (or neuroscience, for that matter) as mere applied quantum mechanics with marketing buzzwords. Here's an example of a relevant, epistemically serious 2025 paper, "The Geometry of Self-Verification in a Task-Specific Reasoning Model":
The point of this citation is to drive home that any “first principles” dismissal of LLMs is as ignorant as, or indeed more ignorant than, sci-fi speculation by laymen. In short, you suck and you should learn humility to do better to corroborate your very salient claim to authority.
There are good criticisms of LLMs. I don't know if you find Terence Tao's understanding of mathematics sufficiently grounded; he's Chinese after all. He has some skepticism about LLMs contributing to deep, frontier mathematical research. Try to do more of that.
Your contributions on AI are always interesting and worth reading (not that I agree with them, but I enjoy reading them). But as much as moderation here has been accused of running on the principle "Anything is okay as long as you use enough words," it did not escape me that you used a lot of words to basically say "Jane, you ignorant slut!" No, burying the insults (repeated) under a lot of words does not make it okay to be this belligerent. And on a topic that should not require this much emotional investment. Your lack of chill is a you problem, but your lack of civility is a Motte problem. You do not win the argument by plastering as much condescension and disdain as you can between links.
No. This is, however, exactly what OP is doing, only he goes to more length to obfuscate it, to the point that he fails to sneak in an actual argument. It's just words. I am smart (muh creds), others are dumb (not math creds), they're naive and get fooled because they're dumb and anthropomorphise, here are some musings on animals (I still don't see what specific cognitive achievement an orangutan can boast of, as OP doesn't bother with this), here's something about embeddings, now please pretend I've said anything persuasive about LLM intelligence. That's the worst genre of a post that this forum has to offer, it's narcissistic and time-wasting. We've had the same issue with Hlynka, some people just feel that they're entitled to post gibberish on why LLMs must be unintelligent and they endeavor to support this by citing background in math while failing to state any legible connection between their (ostensible) mathematically informed beliefs and their beliefs re LLMs. I am not sure if they're just cognitively biased in some manner or if it's their ego getting in the way. It is what it is.
Like, what is this? OP smirks as he develops this theme, so presumably he believes it to be load-bearing:
No, seriously? How does one address this? What does the vector-based implementation of representations in LLMs have to do with the ineffable difference between truth and falsehood that people dumber than OP allegedly believe in? If the pretraining data is consistent that Trump is the 47th president, then the model would predict as much and treat it as "truth". If we introduce a "falsehood" steering vector, it would predict otherwise. The training data is not baseline reality, but neither is any learned representation including world models in our brains. What does “literally just a search engine with extra steps” add here?
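(To unpack the "steering vector" aside: activation steering adds a direction, often derived from the difference between mean activations on contrasting prompts such as statements labelled true versus false, into the model's hidden state at inference time. A toy numerical sketch, with random arrays standing in for real activations and weights:)

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, vocab_size = 64, 10

hidden = rng.normal(size=d_model)                 # stand-in residual-stream state
unembed = rng.normal(size=(d_model, vocab_size))  # stand-in output projection

# Stand-in steering direction; in practice it comes from the model's own
# activations on contrasting prompts, scaled by a chosen coefficient.
steer = rng.normal(size=d_model)
alpha = 4.0

logits_plain = hidden @ unembed
logits_steered = (hidden + alpha * steer) @ unembed

# Adding the direction can change which token the model ranks highest.
print(int(np.argmax(logits_plain)), int(np.argmax(logits_steered)))
```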
This sort of talk is confused on so many levels at once that the only valid takeaway is that the author is not equipped to reason at all.
I do not obfuscate. I understand that he's trying to insult me and others, and I call him an ignorant slut without any of that cowardly nonsense, plus I make an argument. To engage more productively, I'd have had to completely reinvent his stream of subtle jabs into a coherent text he might not even agree with. I'd rather he does that on his own.
They are. People are in fact "entitled" to make arguments you think are gibberish. You can address the argument and why you think it is bad.
If you think he's being insulting you can say so and we'll take a look, but "I'm just going to come right out and say he's an ignorant slut, not like that coward" is doing you no credit.
I'm going to chime in here in favor of Dase, with my own mod-hat off.
TequilaMockingbird hasn't been operating in good faith. It doesn't exhaust my sense of charity to believe that he started this essay in good faith, but all of his behavior since strikes me as being to the contrary. I'm guilty of losing my cool, because my own tolerance for such behavior only goes so far.
At the risk of being inflammatory, I think accusations of Jane being an "ignorant slut" are at least partially excusable if Jane is, in fact, being ignorant, and a slut. (Accounting for subjective variance in definitions and accusations of ignorance or sluttiness)
Is truth an absolute defense on the Motte? Probably not. I'm sure there are more polite ways to couch that claim. I've personally warned Dase before for being too touchy and acerbic, and yet I find myself pleading for leniency here. Feel free to discount this on the basis of a clear conflict of interest, but I'm saying it nonetheless.
I'm not taking sides on you/Dase vs Tequila (I've already registered my opinion) but on the tone of the disagreement.
If you are losing your cool in an argument, back off and cool off. I say that knowing that I am not perfect either and don't always follow that advice, but we both know that's what needs to happen.
I agree. My point is that I'm (probably) less acerbic than Dase, and usually trying to set a higher standard by virtue of the shame of doing otherwise while being a moderator. The fact that I'm incredibly ticked off is at least some evidence in favor of going easy on Dase. Does he warrant a formal warning? I will begrudgingly say yes. You didn't ban him after all. I just want my dissatisfaction taken into consideration.
Note to self: the best way to get @DaseindustriesLtd to write a lengthy comment on an ML topic is to write a post confidently and aggressively wrong about the topic.
Why do you open up like this:
But start your argument like this:
It doesn't come off as some fervent truth-seeking, passionate debate, and/or intelligent discourse. It comes across as a bitter, nasty commenter incredulous that someone would dare to have a different opinion from you. Multiple people in this post were able to disagree with OP without resorting to prosaic insults in their first sentence. I get that you have a lot of rep around here, which gives you a lot of rope, but why not optimize for a bit more light instead of a furnace full of heat? It could not have been hard to just not write that sentence...
At the risk of getting into it with you again. What did you think of this when it made its rounds 2 months ago: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
I don't think the issue is OP's opinion. The issue I had was listing off credentials before making completely incorrect technical explanations, doubling down instead of admitting they made a mistake, and judging researchers based on the fact that they don't hold any US or EU patents.
Possibly, I can get where it feels like they are lording it over all the peons in the thread and why that would be frustrating. But at the same time I think they have some frustration about all the lay-peeps writing long posts full of complex semantic arguments that wouldn't pass technical muster (directionally). I interpreted the whole patent + degree bit as a bid to establish some credibility, not to lord it over people. I also think they aren't directly in the LLM space (I predict the signal processing domain!) so some of their technical explanations miss some important details. This forum is full of autists who can't admit they are wrong so the latter part is just par for the course. No idea why everyone needs to get so riled up about this topic.
The issue is that OP is the lay person writing a long post full of complex semantic arguments that don’t pass technical muster, while passing themselves off as a credentialed expert, and accusing others of doing what they’re doing. That tends to rile people up.
I really haven't entered a pissing contest (typo).
I find OP's text exceptionally bad precisely because it is designed as a high-quality contribution but lacks the content of one; what is true is not germane to the argument and what little is germane is not true, its substance is mere sneer, ideas about reactivity and perceptivity are not thought through (would we consider humans modulo long-term memory formation unintelligent?), the section on hallucinations is borderline incoherent. This is LLM-like in the worst sense possible. I've said many times that superficial adherence to the letter of rules of polite discussion while ignoring its spirit is unacceptable for me. Thus I deem it proper to name the substantial violations. If mods feel otherwise they should finally give me a time out or a block. I am not a very active participant and don't intend to rely on any residual clout.
Multiple people should be more motivated to call out time-wasting obfuscated bullshit before wasting their time. I am grateful to @rae for doing the tedious work of object-level refutation, but the problem is that the whole dismantled section on word2vec math is not relevant to OP's argument about lack of reactivity (which isn't supported by, well, anything), so OP doesn't feel like it is anything more than a nitpick, a pedantic challenge to his domain-specific technical competence. Why should anyone bother with doing more of that? Let's just get to the meat of the issue. The meat is: are LLMs intelligent? I've shown that rigorous, good faith objections to that have a poor track record.
I think I've already responded to that but maybe not. The meta issue with Apple papers is that their DL team is coping about repeated failures to build a competitive system (it may be that such philosophical handicaps get in the way). The object level issue with their tests is covered in this series of posts on X. One relevant piece:
Does this mean “0% accuracy”? I guess for people who believe “LLMs create billions of value by doing stuff like autonomously optimizing CUDA kernels, agriculture creates value by growing wheat, ergo wheat is as intelligent as an SWE? heh” is a clever dunk, it does.
There is a massive gulf in efficiency of understanding between people who approach LLMs with some rigid preconceived notions and people who can fucking look at the outputs and think about them. The gulf is so large that the former group can go through the motions of "empirical research" and publish papers proving how LLMs inherently can't do X or Y and not notice that they can, in their own setup, moreover that the setup is nonsensical. It's no longer a matter of polite disagreement, it's pure refusal to think, hiding your head in the sand. It's on par with paranormal research and homeopathy and should be treated as such: pushed out of the field and into self-funded fringe journals to die in obscurity.
What you are saying is just wrong. It is wrong for the reason that nobody is harnessing the intellectual skills of orangutans or insects to produce billions of dollars in revenue. We do not see monkeys on typewriters functioning in the workplace. If AI truly were unintelligent, then nobody would be using it for commercial processes like writing code. Not in a 'search engine/autocorrect' way like normal software but a 'here is what I want, do this, fix this error I see, that looks bad change it to this' way like one would work with a human. That alone is sufficient to disprove what you're saying.
You have more credentials than me. But I don't need credentials to be right and credentials cannot save you from being wrong. It just isn't the case that an orangutan is smarter than Grok. If an orangutan were smarter than Grok then why wouldn't we be breeding them for intellectual labour? Why aren't there any Ape Intelligence engineers?
There's an important kind of intelligence that apes lack but LLMs possess. This isn't something up for debate, the market has already decided and the decision is final.
OK, how well does the orangutan do at Chess then? Or filling a page with text? Shall we compare like to like?
Imagine you see a military commentator saying that the Ugandan military is stronger than the American military. It's obviously wrong, right? But he makes the case that the US military has all these problems, lost against the Taliban and militias in Iraq, procurement is very poor, ships crash at sea and burn down at port, there's a drug and suicide crisis. The Pentagon is constantly being hacked and secrets are stolen. Ugandan ships don't crash at sea! Uganda hasn't lost a single strategic bomber because of bad weather. So on balance, the Ugandan military is stronger.
But Uganda doesn't have a navy, it's landlocked. Uganda doesn't have any strategic bombers to lose. The Ugandan military doesn't have any secrets worth stealing. The whole argument isn't valid, the US losing some aircraft is bad but they can also do so much that Uganda cannot. They are on a completely different level in size and sophistication. Turns out it was just obviously wrong.
The orangutan is nowhere near Grok 4. It's ridiculous to compare them. Grok can produce meaningful, useful text. People get it to write out smut for them, summarize articles, write code, answer hypothetical questions, decrypt codes, do advanced mathematics, assess nuclear strategy... It and other modern LLMs have a degree of intellectual generality that far surpasses calculators and chess programs that might beat them at a few specific tasks. Why would we want an LLM to do excellent mental arithmetic when it could just call a tool instead?
LLMs are intelligent. Their intelligence is flawed in some significant respects but it is intelligence nonetheless. This is a critical point that underlies trillions of dollars in market capitalization. It is the difference between Deep Blue's relative irrelevance and ChatGPT's great significance. Paths to ASI are still up for debate but if you cannot get the basics right it's hard to see why your opinions about superintelligence should hold any weight.
LLMs are great tech, but are terrible financially. They have zero use cases beyond potential future receptionists. The vast majority of revenue with ChatGPT is paid users, not LLMs, meaning that the AI wrapper industry is vaporware that is shutting down. Claude, otoh, took the other route by scamming the dime-a-dozen coding assistants whose main use was providing even cheaper access, so investors lost money on both counts.
Microsoft kicked OpenAI out of Azure for a reason. CoreWeave, otoh, has worse financial health than sub-Saharan African economies. Anti-AI sentiment rises from the brainlet "AGI is upon us" takes, and the total shilling of vaporware that current wrappers are. A wrapper runs on already subsidised tokens by subsidising them more. Inference costs coming down will not justify the 500 billion plus, probably close to a trillion, dollars that have been spent on this. This sort of spending puts the entire market at risk when you take into account that tech makes up a big part of the American S&P 500 and unfortunately all of them are in on this.
A crash would cause more layoffs. This has been irresponsible; the only answer I get when I bring up the financial side is some future of potential monetary schemes that require the kind of progress we do not have, since models are not that much better. The improvements are going to asymptote at some point. Nvidia bled a lot when DeepSeek R1 came out. The greed at play is not altruistic. Also, dot-com firms can scale, but it takes a few thousand fortunes to hoard chips to train your model. Having a server farm and having datacenters are not the same thing.
The market will eat and spit out these people and the investors will get bailed out.
Google and Microsoft fudge their AI usage numbers by shoving it down the gullet of everyone who uses their products; they have to so that they can show a much-inflated user count to justify the money they have lost. If you replace the assistant in everyone's phone with LLMs, then ofc you would have hundreds of millions of users since the thing that ran on their phone got swapped out without any input from them.
Current models cannot replace anyone; they are good tech, but the market actually hates them. This is a bubble, and it will pop.
Investors in railway construction in the early days of the USA also very often (mostly?) lost money; this does not change the fact that railways radically changed the USA and were in useful service for a long time - some still are.
this is provably false, and blatantly false
Writers of low-quality marketing spam and the bottom tiers of image creation are among the first to be replaced; low-tier translation has already died.
Investors have been making a tonne of money on Anthropic, the valuation just goes up. Revenue goes up. Capital expenses go up too.
All that's happening is that there's a massive race because of how important the tech is, so outside Nvidia profits are low in comparison to the huge size of the investments. But investors only invest if they expect profits.
Just look at the openrouter stats. Huge growth, 22x growth in a year: https://openrouter.ai/rankings
22x growth in a year! If the AI companies were losing money per token, they wouldn't increase the amount they were losing so massively by selling more and more tokens. Selling tokens is how they make money from AI and they have to make money so they can continue the historically unprecedented capital spending. Nobody is going to let them borrow hundreds of billions to build data centres if the inference economics are actually negative like you seem to think. They're not.
Since you've held this thesis consistently, you should've been shorting the AI companies and getting wrecked. Meanwhile I've been investing in them and making money. The market has made its position quite clear.
Could you pick a lane? Either this is all a terrible money burner or inference costs are coming down. In reality frontier labs have like 80% margins on inference, they're in the red mostly due to training spending. Even DeepSeek is profitable as far as inference is concerned. Anthropic constantly suffers from inability to serve demand. There aren't that many receptionists in the world, no. It is possible that current expenditures will not be recouped, but that will only lead to a freeze in training spending. It's pretty clear that we could run all those GPUs at significant profit for years.
I'm not sure what you mean about having "zero use cases beyond potential future receptionists". They are already assisting with real tech work in my office. An easy script that might take me half an hour to an hour now takes seconds, with maybe a minute or two to confirm it's correct. They are incredibly good at debugging, sorting out random systems issues, and such. The sort of thing that would cost me a whole morning of frustration more or less gets one-shot by dumping some logs into the LLM and asking it what's up. These queries cost pennies in compute. From my perspective, they have already replaced a junior developer, junior sysadmin, personal assistant, etc.
And this is the worst they will ever be. How can that be terrible financially?
There are even kinds of intelligence apes possess that humans lack. Particularly, short term spatial memory: sequentially flash the numbers 1 through 9 on a touchscreen monitor at random positions, and have the subject then press the monitor at those positions in order. Chimpanzees, even young chimpanzees, consistently and substantially outperform adult undergraduate humans, even when you try to incentivize the human. Does that mean chimps are smarter than humans?
Intelligence is very spiky. It's weird, but different substrates of intelligence lend themselves best to different tasks.
Absolutely right. I have no doubt elephants and other advanced mammals are better than humans at some mental tasks. But the applications of these advantages are negligible, it's just an academic curiosity.
Some materials have a thermoelectric effect where if you heat them it produces electricity, no need for steam pressure. But it's very inefficient compared to boiling water and turning steam into motion. So there are only a few niche use-cases, we could live without it. The effect is unimportant.
Agriculture generates hundreds of billions in revenue, and is far more essential to continuing civilisation than orangutans or LLMs are. Does that make grain, or the tools used to sow and harvest it, "intelligent" in your eyes? If not, please explain.
As for comparing like to like, GPT loses games of Chess to an Atari 2600. Does that mean that, rather than progressing, AI has actually devolved over the last 40 years?
Going to take a bit of a different angle than most people: yes, agriculture is a highly intelligent system, one that outperforms all of humans, sophisticated numerical models, LLMs, and chimps in its niche.
It has its actuators (trucks, etc), and it has its neurons (individual humans and collections of humans). And a learning signal: prices (or, as a TD signal, profit). As a system, it manages to do things nothing else is capable of: no human or computer is smart enough to process all the information needed for it to succeed in its niche, and the individual humans are not organizing production and consumption so much as synapsing to other neurons based on the signals the system provides.
Asking if a combine is intelligent is like asking if a voltage differential across a membrane is intelligent. No, but the whole is greater than the sum of its parts.
Grain, combine harvesters and so on do not do intellectual labour for us. A combine harvester is a perfect example of what modern LLMs are not - excellent for a highly specific use case and terrible in all other areas. The AI equivalent of a harvester might be something that can only write the exact same format of SQL database code with a few variations. That would be a glorified cookie cutter.
I specifically specified 'harnessing the intellectual skills' of orangutans. Nobody is doing this. Agriculture is a totally different matter.
Chess is just not what ChatGPT is supposed to do. LLMs are notoriously poor at spatial reasoning tasks, this is a legitimate weakness but does not preclude intelligence. See here: https://dynomight.net/chess/
There is variation even in chess. Gpt-3.5-turbo-instruct is somehow far better at chess than o1-mini or gpt-4o or other versions of gpt-3.5. OpenAI rightly concludes that nobody wants their AI to play mediocre games of chess for 1000x market prices and devotes resources instead to making it better at what people do want to use it for. Code, maths, sycophancy, creative writing.
That is not a serious objection.
You’re comparing a resource (grain) and a tool of physical labor (a tractor) to a tool of intellectual labor. This is a false equivalence. We don't ask a field of wheat for its opinion on a legal contract. We don't ask a John Deere tractor to write a Python script to automate a business process. The billions of dollars generated by LLMs come from them performing tasks that, until very recently, could only be done by educated human minds. That is the fundamental difference. The value is derived from the processing and generation of complex information, not from being a physical commodity.
I'm just going to quote myself again:
Training LLMs to be good at chess is a waste of time. Compute doesn't grow on trees, and the researchers and engineers at these companies clearly made a (sensible) decision to spend it elsewhere.
The fact that an LLM can even play chess, understand the request, try to follow the rules, and then also write you a sonnet about the game, summarize the history of chess, and translate the rules into Swahili demonstrates a generality of intelligence that the Atari program completely lacks. The old program hasn't "devolved" into the new one; the new one is an entirely different class of entity that simply doesn't need to be optimized for that one, (practically) solved game.
The market isn't paying billions for a good chess player. There is about $0 to be gained by releasing a new, better model of chess bot. It's paying billions for a generalist intellect that can be applied to a near-infinite range of text-based problems. That's the point.
I came into this thread with every expectation of having a good-faith discussion/debate on the topic. My hopes seem dashed, mainly because you seem entirely unable to admit error.
Rae, SnapDragon, I (and probably several others) have pointed out glaring, fundamental errors in your modeling of how LLMs work. That would merit, at the very least, some kind of acknowledgement or correction. At the time of writing, I see none.
The closest you came to acknowledging fault is in a reply to @Amadan, where you said that your explanation is "part" of why LLMs struggle with counting. That's eliding the point. Tokenization issues are the overwhelming majority of why they used to struggle, and your purported explanation has no bearing on reality.
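(For readers who want to see the tokenization point directly, a small sketch assuming the tiktoken package is installed; the exact token boundaries depend on which encoding you load:)

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["strawberry", "1024", "antidisestablishmentarianism"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    # The model operates on these sub-word chunks, not on individual
    # characters, which is why letter-counting questions trip it up.
    print(word, "->", pieces)
```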
You came into this swinging around your credentials, proceeded to make elementary errors, and seem to be closer to "Lorem Epsom", in that your primary concern seems to be prioritizing the appearance of correctness over actual substance.
I can't argue with @rae when he correctly says:
Prior to LLMs, would you have said that Google Web Search was intelligent? Prior to Google Web Search, it likely took an educated human mind to figure out how to find answers to all sorts of complex information problems. It generated billions of dollars in value by processing and generating complex information. Sure, it sometimes sucked... but LLMs sometimes suck, too.
I mean, no? It just means that there was a bunch of information about chess in its training set.
I think it's rather obvious that something being financially valuable isn't proof by itself that it's intelligent. Gold isn't intelligent. Bitcoin isn't intelligent. A physicist or programmer is intelligent, and an LLM is closer to them than it is to turnips, orangutans or Page rank.
I really don't see why something this obvious needs to be articulated, but here I am articulating it.
Hmm.. I suppose, in the interest of fairness, we need to exclude the skills of human chess GMs too. After all, they've trained extensively on chess data. Lotta games played, and openings memorized. Very little ability to extrapolate outside the training distribution, why don't they just pull out guns if they want to win so bad?
How exactly do you think learning works? If you think just learning from existing data is illegitimate, then I'm happy to disclose that LLMs are perfectly capable of learning from self-play.
I just remembered that, as bizarre as it seems to somehow fold this into some sort of test of intelligence, we also have things like this. Bollocks if I know what that means about criteria for intelligence.
Reducing Google Web Search to Page Rank is like reducing LLMs to OLS. Yes, OLS is in there, but it's a much more complicated information processing algorithm than just that.
Fundamentally, the point is that no one has a definition of 'intelligence' that is any good. Your test wasn't just that it produced value. Your test was:
I responded to your test, but you seem to not have responded at all to my response to your test.
I mean, I don't think so? But how would we know? What test would we use to distinguish?
This seems not entirely true.
Whereas this just seems bizarre.
I mean, do you really want me to give a full explanation of the entire field of ML? There are many different varieties. [EDIT: Do you think that all algorithms that use 'learning' are "intelligent"... or just some of them? How do you know the difference?]
That's not really what I said. I just said that one thing that we can conclude from the premises you presented was that a bunch of chess was in the training set. You had wanted to conclude instead that it meant something about intelligence. I sort of don't see how... primarily, because I don't think almost anyone has a justifiable definition of intelligence that allows us to make such distinctions from such premises.
I personally don't think it's a good measure for intelligence, but I will actually try and defend the "it's intelligence if it makes enough money in the proper contexts" argument. It's not saying on the immediate level that monetary effects show intelligence, that's obviously silly, it's got one or two more links in the logical chain. It probably goes something like this:
- people pay money for things that matter to them, that is to say, money is a good proxy for value
- people pay money for [knowledge] work, that requires intelligence
- if enough people pay money for [knowledge] work, to AI...
- therefore, we can infer that people, in aggregate, and based on reliable links, have judged that AI has enough intelligence to count as intelligent
I disagree with it because I think thinking about intelligence is inherently a philosophical-only type of question, but I certainly respect the opinion above anyways because it does have a degree of sense to it. While jobs and tasks also notably are affected by supply and demand, I do personally think that money is an excellent proxy for value in the vast majority of cases. People are whining about museums closing? Evidently the museums aren't valuable enough. STEM degrees pay more than humanities degrees? Their degrees are more valuable to society. Conclusions like that.
Bullet points 2 and 3 are I think the points of confusion here? Advocates of this definition emphasize that if there is a 'big enough' amount of money involved, the judgement has effectively already been made by the wisdom of the masses + the laws of capitalism. In terms of what number counts, this is nebulous as anyone will admit, but serious people have put numbers to it, in fact as I referenced in my original comment, the Microsoft-OpenAI contract itself uses this definition for "AGI" and puts a 100 billion number as the cutoff. So I don't think we can dismiss the argument out of hand!
I should also note that this argument strongly implies (bullet 3) that replacement of humans (humans specifically because nothing else is 'intelligent' enough to do the task) for certain types of work is required. Bullet 2 is another sticking point: is help-desk support, for example, something that actually requires intelligence? Humans are replaced by machines for purely physical tasks already and that doesn't result in claims of intelligence. Still, you can see the appeal of the argument if something previously thought impossible for a machine due to its perceived complexity and adaptability is suddenly possible on a large scale. However, it's important to distinguish picking apart the underpinning details of this logical chain from the overall claim; they are different. A dispute about bullets 2 or 3 is somewhat a factual dispute, or a definitional one, and doesn't invalidate the overall claim necessarily.
This is the spot where the terminology is overloaded. It's sneaking in something about AI simply being called "AI". Why can't we replace this with a more generic term, "If enough people pay money for [knowledge] work, to an information processing algorithm..."? And thus, Google Web Search would again become intelligent.
I think one would have to argue that there is something fundamentally different, other than the name, between different types of information processing algorithms.
I mean obviously there is a major qualitative difference between an LLM's capabilities and Google's. I don't even think Google counts as knowledge work, because it's just a fancy directory with a math formula to rank pages. The alternative was basically a directory or keyword search, neither of which require knowledge work either to assemble or run. And critically, Google is free to use, so it's an exceptionally poor example to choose.
Just the ability of an LLM to summarize documents that you feed it is already enough, in my mind, for it to count as a kind of knowledge work. I hinted at that phrasing for a reason: if you check Wikipedia's entry for "knowledge workers" you'll see that it's more or less people who are thinking for a living, and reducing the job to simply that of "looking the right stuff up" is significantly underselling it. A lawyer for example is not merely an information processing algorithm, even if her job may be primarily finding the relevant court case precedents and then applying them in systematic fashion to partial boilerplate motions and filings. It takes a degree of contextual understanding along with a degree of judgement to produce the proper output, and those elements are missing from Google entirely (at least in its traditional and early iterations, since the precise algorithms are highly proprietary, but I don't think this changes the core categorization).
My apologies. I was immensely frustrated by the sheer intransigence of some of the people in this thread, and I let that bleed through.
The questions you raise are far more reasonable, and I'll try and come back and explain myself better.
I admit I don't fully understand the analogy OP is trying to make about insects, but you aren't alone in thinking that intelligence is judgable based on economic value - in fact, that's exactly how AGI is defined in the OpenAI-Microsoft contract, that AI generates 100 billion dollars in revenue! Yes, vague, and yes, causing problems, but that was what they wrote at the time. Still, a little lacking in rigor, no? Desktop computers generate billions of dollars in revenue, are they intelligent? What I think OP is saying is that instead of that, let's propose a different standard: intelligence is a degree of reactivity, and mechanically, current LLMs do not have this trait, they just 'make up' for it in practical usage by the sheer breadth and depth of their (text) knowledge base - but at their core they are simply good enough at the practical aspect that the lack of actual, true reactivity is partially obscured.
If anything to me the debates sort of remind me of the ones over personality, psychology, and determinism. We still haven't figured out strongly if people are deterministic or not, and so we seem ill-suited to judge how deterministic an LLM is in its responses. Personally, I'm satisfied by calling LLMs jagged or fragile intelligence, and I think that captures more nuance than a more loaded general term.
Or are you trying to make an argument that is a cousin to the descriptivist view of language (how people use a word today determines its meaning more than any internal, nominal, historical, or etymological meaning): if people treat an LLM as intelligent, then that very fact justifies them as intelligent? That strikes me as, well, I guess fair enough to say, but not particularly useful.
That's fair enough and LLM intelligence certainly is jagged, excellent at some tasks but weak on others. So is human intelligence for that matter, the mental arithmetic of our species is very weak.
Yet despite the jaggedness of that intelligence, we have unique abilities in story-telling, planning, reasoning, mathematics, problem-solving, coding that separate us from animals.
LLMs have a similar separation from PCs or traditional software, they're composers rather than merely simulating or spitting out a prefabricated concept.
The Aeneid is great fiction but it can only tell one story, exactly the same each time. A PC is more like a body than a mind, it's a vessel to be filled by something else. The Windows operating system has produced trillions in value but it's just a very long complicated bit of code that performs a fixed, planned function (with some errors in it). A video game is interactive but only in preset respects and very limited generality, it's just another piece of software.
With LLMs it's different, they have a special level of flexibility and interactivity that otherwise only humans possess. You're not buying a story or a piece of code that does one thing but a storywriter, a conversation partner, a technical assistant, a research assistant, a sycophant, a planner, a medical adviser...
Under any reasonable test of perceptiveness and reactivity, I'm confident a modern LLM could pass, albeit with caveats for their poor vision. But intelligence is not about abstract definitions but about concrete value added. Since it's writing code generally (unlike existing cookie-cutter tools), that's general intelligence not just in words but in dollars. We can distinguish general intellectual labour like engineering, mathematics or software development tasks from specific limited transformations like Wolfram Alpha, a CAD program or a compiler. LLMs are doing the former, not the latter.
Talk is cheap in contrast.
Agreed wholeheartedly. The similarities between this argument and the argument over defining consciousness are so clear imo it gives the game away. Never mind AI, most people are capable of 'intelligence' (quoted to refer to the OP, not snark) but spend most of their time trapped by their context window. Many people will similarly apologise unreservedly for making up code that fucks up your setup, tell you how ashamed they are for making such a foolish mistake, how glad they are you caught it, and promise to do better - and then print a negligible variation on the code they first gave you. Many people are incapable of absorbing new information and casting out the old, which is why the left are still wailing about Christians persecuting gays and the right are still hunting communists. They refuse to update their memory, or they fall back on old patterns when tired or stressed. Are they unintelligent?
Ok I'm not sure which side I'm arguing now.
Great post, OP. On the part about Yud, the people over at LessWrong, and rationalists as a whole: a few weeks ago I posted about the religious fervour many have for AI as the future sentient god. To me, it feels like the sci-fi idea of Skynet fills a god-shaped hole in their hearts, and they cannot rationalise normal religious values on a mass scale.
For instance, neither Scott nor Yud are programmers. This is not to chide them; there are MMA scouts who get MMA better than top coaches or fighters, but they are few and far between, and they cannot affect the game as much as a live player can. If you have not worked on basic ML models and do not know how some of the architectures work, it is easy to extrapolate from current capabilities (or your perception of them) to a future where the improvements never cease, which to me seems ludicrous.
Scott's 2027 post and Yud's AI ramblings seem extremely improbable, and given that I foresee an economic meltdown thanks to the corporate and VC greed behind the modern AI bubble, these statements will be used to question the non-AI things they post, things that are actually really good. We have a lot of trouble understanding intelligence, the human brain, and how the two interact. There are systems within the body that make some of their own decisions, IIRC. It's truly fascinating as a field to study. The AI propaganda - "AGI is here" from conning former YC president and Paul Graham's favourite human being, Sam Altman, and "you will be jobless" from every podcaster's favourite CEO, Dario Amodei - will be remembered. For anyone unaware of how low people can stoop: Austen Allred lost hundreds of millions by lying and PG still defends him, even as he shifted his grift from "learn programming via my bad bootcamp" to "learn AI via my bad bootcamp".
Theranos apparently did not get VC money; the Bay Area is where a lot of rationalists live, and a lot of VCs are aware of their ideas, so this must have played some role in the hype, as rats are usually very smart, decent people. Regardless, this was a very well-worded take on the issue. I have a rough outline that matches your worldview, though it is nowhere near as precise, nor could I have presented it in as decent a manner.
Actually my (main) problem with AI doomerism from the rationalist space is more that it lacks historical understanding, rather than technical understanding. The way humans self-organize and react to new technologies makes AI takeover almost literally impossible. The AI 2027 stuff reads like fan fiction because, well, it is. Their understanding of humanity itself is just grossly miscalibrated, and it seems crazy to me that they think it's anything more serious than that. Usually, good sci-fi, which AI 2027 is not, is explicitly designed as an elaborate thought experiment and a visualization of how humanity might react in interesting alternate factual realities (usually of physics and science, but sometimes culture). There's none of that exploration in their work, and because they didn't even seem to bother to try (instead getting bogged down in fixing precise probability density curves for various newly-created benchmarks of theirs), they produce little of value. I worry they may gesture at "oh look, society is chaotic" and claim vindication and directional accuracy, but that also grossly misunderstands humanity. What kind of chaos, what level of chaos, and what kind of political responses would follow even their 2026-forecasted benchmarks should be the focus of investigation, not blathering about alignment based on tech that doesn't even exist.
Oh sweet! Thanks for putting the time and effort into writing this.
I can't participate in the technical conversation, but your impression of the Rat community matches mine. On AI I'm like 90% sure that the AI doom they predict won't materialize, and the AI doom that will materialize, they won't predict.
The rat obsession with AI to me feels like smart people finding a new god and being too afraid to go back to older ones.
Overall I agree, and think it's an excellent post, but with a few quibbles and thoughts... well, at least "a few" was my intention. I think my thoughts ballooned once I started sketching out some bullet points and an outline, so they are no longer bullet points. I will try to keep each paragraph roughly its own "thought" however.
As an aside, I haven't looked into it enough to tell if an LLM can change tacks and re-organize quite like this, or decide to take unusual approaches once in a while to get a point across. My intuition says that the answer is probably yes to the first, but no to the second, as manifested by the semi-bland outputs that LLMs tend to produce. How often do LLMs spontaneously produce analogies, for example, to get a point across, and carry said analogy throughout the writing? Not that often - but then, neither do humans, I guess; still, LLMs do it less often IME. I think I should come out and say that judging LLM capabilities relative to what we'd expect out of an educated human is the most sensible point of comparison. I don't think it's excessively anthropomorphizing to do so, because we ARE the closest analogue. It also is easier to reason about, and so is useful. Of course it goes without saying that in the "back of your head" you should maintain an awareness that the thought patterns are potentially quite different.
While the current paradigm is next-token-prediction based models, there is such a thing as diffusion text models, which aren't used in the state of the art stuff, but nonetheless work all right. Some of the lessons we are describing here don't generalize to diffusion models, but we can talk about them when or if they become more mainstream. There are a few perhaps waiting in the stables, for example Google semi-recently demoed one. For those not aware, a diffusion model does something maybe, sort of, kind of like how I wrote this comment: sketched out a few bullet points overall, and then refined piece by piece, adding detail to each part. One summary of their strengths and weaknesses here. It's pretty important to emphasize this fact, because arguably our brains work on both levels: we come up with, and crystallize, concepts, in our minds during the "thinking" process (diffusion-like), even though our output is ultimately linear and ordered (and to some extent people think as they speak in a very real way).
So the major quibble pointed out below is that tokenization is a big part of why counting doesn't work as expected. I think it's super critical to state that LLMs ONLY witness the world through the lens of tokens. Yes, humans also do this, but differently (e.g. it's well known that in reading, we mostly look at the letters that start and end a word, and the letters in between can sometimes be scrambled without you noticing right away). It's like how a human can only process the colors visible to us: there are things that are effectively invisible to an LLM. Even if an LLM is smart enough to disentangle a word into its constituent letters, or a number into its constituent digits, the training data there is pretty weak.
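To make the token-lens point concrete, here's a minimal sketch using the open-source tiktoken library (the cl100k_base vocabulary is just one example; every model family has its own, so the exact splits vary):

```python
# Minimal sketch: how a BPE tokenizer fragments text into the chunks an LLM
# actually "sees". Assumes the open-source `tiktoken` package and its
# cl100k_base vocabulary; other models use different vocabularies.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["strawberry", " strawberry", "1024", "Mary has 1024 children"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r:26} -> {pieces}")

# The model is trained on the integer ids, not on letters or digits, which is
# one reason "how many r's are in strawberry?" is harder than it looks.
```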
Which leads me to another critical point, not pointed out: LLMs have trouble with things that don't exist in their training data, and we actually have some major gaps there. I'm speaking of things that are intuitive and obvious to people and therefore not always written down - in fact, sometimes the opposite is the case! While an LLM has surely ingested many textbooks on kindergarten basics, it won't have actually experienced a kindergarten classroom. It will learn that kids run inside when it starts to rain, but only weakly learns that kids don't like to get wet. There's also a more limited spatial awareness. Perhaps it's like listening to someone describe the experience of listening to music when you are deaf? That's what a lot of text with implications for real life is like. The LLM has no direct sense at all and is only observing things through knock-on effects.
There are also issues with something that is partially taught but intuitively applied: how much to trust a given source, and what biases they might have. An LLM might read or ingest a document, but not think to consider the source (are they biased? are they an authority figure? are they guessing? all the things an English or history class attempts to teach more explicitly). Nope, it's still just doing next-token prediction on some level, and doesn't have the theory of mind to take a step back from time to time (unless prompted, or trained very explicitly). We can see this weakness manifest where the "grandma trick" is so consistently useful: you tell the LLM that you are some role, and it will believe you. Yes, that's kind of cheating because the trainers of the model don't want the LLM to constantly doubt the prompter, but it's also partly inherent. The LLM doesn't naturally have an instinct to take a step back. Better post-training might help this, but I kind of doubt it, because it won't be as stable as if it's more properly baked into the normal training process.
I've danced around this until now, but want to state it more directly. We are of course critical of how an LLM "thinks", but we don't actually understand quite what happens on a human-cognition level anyways, so we can't actually judge this fairly. Maybe it's closer than we think, but maybe it's farther away. The only way we have of observing human cognition is through inferences from snap judgements, an assortment of experiments, and hints from brain scans as to which regions activate in which scenarios/how strongly/what order. We have some analogous capabilities for LLMs (e.g. observing feature activation, as with Golden Gate Claude, in addition to the usual experiments and even examining output token probability weights). Actually, on that note, I consider at the very least the post summary, if not the paper just linked, to be mandatory reading for anyone seeking to understand how LLMs function. It's just such a useful experiment and explainer. I will revisit this point, along with how some newer models also employ a "Mixture of Experts" approach, a little later, but for now let's remember that we don't know how humans think on a lower level, so we shouldn't expect too much out of figuring out the machine learning stuff either.
LLMs don't actually learn physics, which has important implications for whether we can consider LLMs to have "world models", as they sometimes say. There's a nice 3-minute video accompanying that post. They try to have some vision models learn rules of physics from some very simple circles bouncing around - obviously something pretty simple. If you give this to a young human, they will make some analogies with the real world, perhaps run an experiment or two, and figure it out pretty quickly as a generalization. We should, however, note that humans too have some processing quirks and shortcuts in vision, not unlike some of the issues we encounter with tokenization or basic perception; but the model's failures are on a different level - they are basic failures to generalize. For example, when referencing training data, the model seems to pay attention to things in this order: color > size > velocity > shape. Obviously, that's incorrect. Sometimes shapes will even morph into something else while moving! I should disclaim that I don't know a whole lot about the multimodal outputs, though.
There are some evangelists who believe the embedded "concepts" mentioned in the Golden Gate Claude study are true reasoning. How else, Ilya Sutskever asks, can a model arrive at the correct answer? Honestly, as I mentioned when noting that we don't completely understand how human brains reason, I think the jury is still out on this one. My guess, however, would be no: these concepts aren't full reasoning. They are more like traditional ML feature clusters.
Re: truth and falsehood. I think there's mild evidence that LLMs do in fact distinguish the two; it's just that these concepts are very fragile, especially as compared to humans. I'll reference the physics point above to some extent: the model doesn't seem to "get" that a shape changing in the middle of an output is a "big deal", but a human would intuitively, without any actual instruction to that effect (an instruction so obvious it might never be explicitly stated in training data). One good piece of evidence for distinguishing true from false is here and the related "emergent misalignment" research: if you fine-tune an LLM to produce insecure (hack-prone) code, it also starts behaving badly in other areas! It will start lying, giving malicious advice, and exhibiting other "bad" behavior. To me, that suggests there are a few moral-aligned features or concepts embedded in an LLM's understanding that broadly align with a vague sense of morality and truth. I recognize there's a little conflation there, but why else would an LLM trained on "bad" code start behaving badly in areas that have nothing to do with coding? As evidence for the fragility of true and false, however, one need only get into a small handful of "debates" with an LLM about what is true and what isn't to see that sometimes it digs in its heels, but other times rolls over belly-up, often seemingly irrationally (as in, it's hard to predict how hard it will resist).
Circling back to the physics example, causality is something an LLM doesn't understand, as is its cousin: experimentation. I will grant that humans don't always experiment to their full potential, but they do on some level, whereas LLMs aren't quite there. I posit that a very important part of how humans learn is trying something and seeing what happens, in all areas! The current LLM pipeline does not allow for this. Agentic behavior is all utilization and doesn't affect the model weights. Tuning an LLM to work as a chatbot lets it attempt completions, but there's no component where the LLM tries things out. The closest thing is RLHF and related areas, where the LLM will pick the best of a few options, but this isn't quite organic; the modality of the conversation is fundamentally a chat paradigm, not the original training paradigm. It's not a true free-form arena in which to learn cause and effect.
Either way, and this is where posts like yours are very, very valuable (along with videos like this, a good use of 3.5 hours if you don't know how they work at all) the point about how LLMs work in layers is absolutely critical; IMO, you cannot have a reasonable discussion about the limits of AI with anyone unless they have at least a general understanding of how the pre-training, training, post-training processes work, plus maybe a general idea of the math. So many "weird" behaviors suddenly start to make sense if you understand a little bit about how an LLM comes to be.
That's not to say that understanding the process is all you need. I mentioned above that some new models use Mixture of Experts, which has a variety of interesting implementations that can differ significantly and dilute a few of the model-structure implications I just made, though they are still quite useful. I personally need to brush up on the latest a little. But in general, these models seem to "route" a given text into a different subset of features within the neural network. To some extent these routes are determined as an architecture choice before training, but they often make their influence felt later on (or can even be fine-tuned near the end).
Intelligence. First of all, I think it feels a little silly to have a debate about labels. Labels change according to our needs. Let's not try to pigeonhole LLMs as they currently are. We can't treat cars like horseless carriages, and we can't treat LLMs like humans. Any new tech will usually have at least one major unexpected advantage and one major unexpected shortcoming, and these are really hard to predict.
At the end of the day, I like how one researcher (Andrej Karpathy) puts it: LLMs exhibit jagged intelligence. The contours of what they can and can't do simply don't follow established/traditional paradigms, some capabilities are way better than others, and the consistency varies greatly. I realize that's not a yes/no answer, but it seems to make the most sense, and convey the right intuition and connotation to the median reader.
Overall I think that we do need some major additional "invention" to get something that reflects more "true" intelligence, in the sense we often mean it. One addition, for example, would be to give LLMs some more agentic behavior earlier in their lifespan - the experimentation and experience aspect. Another innovation that might make a big difference is memory. Context is NOT memory. It's frozen, and it influences outputs only. Memory is a very important part of personality, as well as of why humans "work"! And LLMs basically do not have any similar capability.
Current "memories" that ChatGPT uses are more like stealth insertion of stuff into the system prompt (which is itself just a "privileged" piece of context) than what we actually mean. Lack of memory causes more obvious and immediate problems, too: when we had Claude Plays Pokemon, a major issue was that Claude (like many LLMs) struggles to figure out which part of its context matters more at any given time. It also is a pretty slapdash solution that gets filled up quickly. Instead of actual memory, Claude is instructed to offload part of what it needs to keep track of to a notepad, but needs to update and condense said notepad regularly because it doesn't have the proper theory of mind to put the right things there, in the right level of detail. And on top of it all, LLMs don't understand spatial reasoning completely, so it has trouble with basic navigation. (There are also some amusing quicks, too: Claude expects people to be helpful, so constantly tries to ask for help from people standing around. It never figures out that the people offer canned phrases that are often irrelevant but occasionally offer a linear perspective on what to do next, and it struggles to contextualize those "hints" when they do come up! He just has too much faith in humanity, haha)
Finally, a difficult question: can't we just ask the LLM itself? No. Human text used for training is so inherently self-reflective that it's very difficult, if not impossible, to figure out whether the LLM is conscious, because we've already explored that question in too much detail and the models are able to fake it too well! We thus have no way to distinguish an original LLM thought from something its statistical algorithm simply output. Yes, we have loosely the same problem with humans, too, but humans have limits on what we can hold in our brain at once! (We also see that humans have, arguably, a kind of jagged intelligence too. Why are humans so good at remembering faces, but so bad at remembering names? I could probably come up with a better example but whatever, I'm tired, boss.) This has implications, I've always thought, for copyright. We don't penalize a human for reading a book and then using its ideas in a distilled form later. But an LLM can read all the books ever written and use their ideas in a distilled form later. Does scale matter? Yes, but also no.
Also, how incredibly good the LLM is at convincingly going through the motions without understanding the underlying reality comes up all the time these days. When, as linked below, an LLM deletes your whole database, it apologizes and mimics what you'd expect it to say. Fine, okay, arguably you want the LLM to apologize like that, but what if the LLM is put in charge of something real? Anthropic recently put Claude in charge of a vending machine at their office, writeup here, and the failure modes are interesting - and, if you understand the model structure, completely understandable. It convinces itself at one point that it's having a real conversation with someone in the building over restocking plans, and is uniquely incapable of realizing this error and rescuing itself early enough, instead continuing the hallucination for a while before suddenly "snapping" out of the role-play. Perhaps some additional post-training on how it's, um, not a real person could reduce the behavior, but the fact that it occurs at all demonstrates that, out of sample, the LLM has no internal mental representation.
I hate that I feel compelled to nitpick this, but while it's a good layman explanation of how diffusion models work, the devil is in the details. Diffusion models do not literally, or even figuratively, diffuse thoughts or progressively clarify ideas. They diffuse noise applied to the input data. They take input data noised according to a fixed schedule, model that noise as a Gaussian distribution, and learn to remove it. Since they are encoder/decoder networks, during inference they take only the decoder (Edit: technically this is incorrect, it's the forward process vs. reverse process, they aren't explicitly encoders/decoders; it's unfortunately how I always remember them), feed it input noise, and have it generate output words, text, etc. It is 100% not "thinking" about what it has diffused so far and further diffusing it. It is doing it according to the properties of the noise and the relationship to the schedule it learned during training. It follows an entirely Markovian property: it has no memory of any steps past the immediately previous one, no long-term refinement of ideas. During training it is literally comparing random steps of denoised data with the predicted level of denoising. You can do some interesting things where you add information to the noise via FFT during training and inference to influence the generated output, but as far as I know that's still ongoing research. I guess you could call that noise "brain thoughts" or something, but that would be imprecise and very speculative.
Source: 3 years spent doing research on DDIMs/DDPMs at work for image generation. I admittedly haven't read the new battery of NLP-aligned diffusion papers (they are sitting in my tabs), but I did read the robotic-control-via-diffusion paper, and it was similar, just abstractions over how the noise is applied to different domains. I'm guessing the NLP ones are similar too, though they probably use some sort of discrete noise.
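For anyone who wants the mechanics rather than the metaphor, here is a toy sketch of the training step described above; the schedule values are illustrative and eps_model stands in for whatever denoising network you like:

```python
# Toy sketch of the DDPM training objective described above: noise the data
# according to a fixed schedule, then train a network to predict the injected
# noise. `eps_model` stands in for any denoising network; values are toy.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # fixed noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def training_step(eps_model, x0):
    t = torch.randint(0, T, (x0.shape[0],))             # random timestep per sample
    eps = torch.randn_like(x0)                           # Gaussian noise
    a = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast to x0's shape
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps           # closed-form forward process
    return torch.nn.functional.mse_loss(eps_model(x_t, t), eps)

# Sampling then walks back from pure noise one Markovian step at a time,
# x_T -> x_{T-1} -> ... -> x_0, using only the immediately previous state.
```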
I’m sorry, but the way you started off by introducing yourself as an expert qualified in the subject matter, followed by completely incorrect technical explanations, kinda rubbed me the wrong way. To me it came across as someone quite intelligent venturing into a technical field different from their own, skimming the literature, and making authoritatively baseless sweeping claims without having understood the basics. I'm not a fan of many of the rationalists' approaches to AI, which I agree can border on science fiction, but you're engaging in a similar kind of technical misunderstanding, just with a different veneer.
Just a few glaring errors:
Deep learning may be a buzzword, but it's not looping regression analysis, nor is it limited to backprop. It's used to refer to sufficiently deep neural networks (sometimes that just means more than 2 layers), but the training objective can be classification, regression, adversarial... and you can theoretically use algorithms other than backprop (though that's mostly restricted to research now).
That’s just flat out wrong. Autoregressive LLMs such as GPT or whatnot are not trained to encode tokens into embeddings. They’re decoder models, trained to predict the next token from a context window. There is no “additional interface layer” that gets you words from embeddings, they directly output a probability for each possible next token given a previous block, and you can just pick the highest probable token and directly get meaningful outputs, although in practice you want more sophisticated stochastic samplers than pure greedy decoding.
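To make the decoding step concrete, here is a minimal sketch with made-up logits and a toy five-word vocabulary, just to show greedy decoding versus temperature sampling:

```python
# The model emits one score (logit) per vocabulary entry; the sampler turns
# those scores into a token. The vocabulary and logits here are made up; a
# real vocabulary has tens or hundreds of thousands of entries.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["2", "4", "1024", "several", "none"]
logits = np.array([3.1, 2.7, -1.5, 1.0, 0.2])   # illustrative scores

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

probs = softmax(logits)
greedy = vocab[int(np.argmax(probs))]           # pure greedy decoding

temperature = 0.8                               # <1 sharpens, >1 flattens
sampled = rng.choice(vocab, p=softmax(logits / temperature))

print(dict(zip(vocab, probs.round(3))), greedy, sampled)
```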
You can get embeddings from LLMs by grabbing intermediate layers (this is where the deep part of deep learning comes into play, models like llama 70B have 80 layers), but those embeddings will be heavily dependent on the context. These will hold vastly more information than the classic word2vec embeddings you’re talking about.
Maybe you’re confusing the LLM with the tokenizer (which generates token IDs), and what you call the “interface layer” is the actual LLM? I don’t think you’re referring to the sampler, although it’s possible, but then this part confuses me even more:
This is nonsense. Not only is there no "interface layer" being programmed, but 2, 4, and 1024 are completely different outputs and will have different probabilities depending on the context. You can try it now with any old model and see that 1024 is the least probable of the three. LLMs' entire shtick is outputting the most probable response given the context and the training data, and they have learned some impressive capabilities along the way. The LLM will absolutely have learned the probable number of pizzas for a given number of people. They also have much larger context windows (in the millions of tokens for Gemini models), although they are not trained to use them effectively and still have issues with recall and logic.
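If you want to check this yourself, here is a hedged sketch using the Hugging Face transformers library; GPT-2 is used purely because it is small and public, and the exact numbers differ by model, but the qualitative gap between 2, 4, and 1024 shows up regardless:

```python
# Compare how probable different numbers are as the next token in a fixed
# context. GPT-2 via Hugging Face is used only because it is small and public;
# the absolute numbers are model-specific, the ordering is the point.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "Mary ordered pizza for her two children. The number of pizzas she ordered was"
inputs = tok(context, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]      # scores for the next token
probs = torch.softmax(logits, dim=-1)

for candidate in [" 2", " 4", " 1024"]:
    tid = tok.encode(candidate)[0]              # first token of the candidate
    print(f"{candidate!r}: {probs[tid].item():.2e}")
```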
Fundamentally, LLMs are text simulators. Learning the concept of truth is very useful to simulate text, and as @self_made_human noted, there’s research showing they do possess a vector or direction of “truth”, which is quite useful for simulating text. Thinking of the LLM as an entity, or just a next word predictor, doesn’t give you a correct picture. It’s not an intelligence. It’s more like a world engine, where the world is all text, which has been fine tuned to mostly simulate one entity (the helpful assistant), but the LLM isn’t the assistant, the assistant is inside the LLM.
Thanks for calling OP out on his flagrant errors. It's one thing to make a technical mistake on a non-technical forum; it's another thing entirely to flex, claim industry expertise, and then face-plant by confusing word embedding models with LLMs. I hope people aren't being misled by his, well, "hallucinations". (Honestly, that's an appropriate word for it! Incorrect facts being stated with complete confidence, just like an LLM.)
I took the liberty of copying the entirety of this particular conversation and dumping it into Gemini 2.5 Pro, with no additional instructions or leading suggestions. It interpreted this as a request to summarize the debate.
I think its summary is quite illuminating:
https://rentry.org/maimio9o
Charitably, I'd say OP sacrificed a bit of accuracy to attempt to convey a point. There really isn't a great way of conveying how text can be represented in terms of matrices to someone with little prior experience without an analogy to word2vec-like embeddings, so it's a common lead-in or stepladder of understanding even if incorrect. I'd say the gains made in teaching intuition are worth the tradeoffs in accuracy, but I'd agree it's bad form not to acknowledge the shortcut (again, I'm speaking charitably here).
I'd say rather than try and make an analogy for the assistant, it's better just to demonstrate to readers how the "bridge" from next-token-prediction to chatbot works directly, like in the middle section of the long explainer video I linked. Essentially you are just still doing prediction, but you are "tricking" the model into thinking it's following a pre-set conversation it needs to continue, via special tokens for whose turn it is to speak, and when a response is finished. This has important theory of mind implications, because the LLM never actually does anything other than prediction! But the "trick" works unreasonably well. And it comes full circle, back to "well, how did we train it and what did we feed it?" which is, of course, the best first question to ask as any good data scientist will tell you (understand your inputs!).
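A minimal sketch of that "bridge", with hypothetical role-marker tokens (every real model family defines its own):

```python
# The "bridge" from next-token prediction to chatbot: flatten the conversation
# into one string with special role-marker tokens and let the model continue
# it until it emits the end-of-turn marker. The marker strings below are
# hypothetical; every model family defines its own.
BOS, EOT = "<|begin|>", "<|end_of_turn|>"

def render_chat(messages):
    parts = [BOS]
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}{EOT}")
    parts.append("<|assistant|>\n")   # cue the model: it is "your turn" now
    return "".join(parts)

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many pizzas should I order for 8 people?"},
])
print(prompt)
# The LLM then does ordinary next-token prediction on `prompt`, and the serving
# layer stops generation when it sees the end-of-turn marker. Nothing else changes.
```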
I would have let it slide, except for the fact that it was followed up by:
Both claims are wrong, and using the former to justify the latter is confused and incorrect thinking.
Yes, but the problem is that OP's 'sacrificed accuracy' level explanation about dot products of word vectors is clearly an explanation of a different architecture: a word embedding model such as word2vec, which was all the rage in 2013. Charitably, yes, old transformer-based LLMs usually had an embedding layer as a pre-processing step to reduce the input dimension (I think the old GPT papers described an embedding layer step, and it is mentioned in all the conceptual tutorials). But the killer feature that makes LLMs a massive industry is not the 2010s-tier embeddings (I don't know, do the modern models even have them today?), it is the transformer architecture (multi-head attention, multiple levels of fancy matrix products) where all the billions of parameters go, and which has a nearly magical capability for next-word prediction using word context and relationships to produce intelligible text.
They do; operating directly on one-hot tokens would be prohibitively expensive.
But they're not central to the power of modern LLMs. You can even run an ablation where you use unlearned, static, entirely random embeddings (so, nearly every embedding is approximately orthogonal to every other embedding; semantic similarity would have zero relation to cosine similarity). The later layers are still able to learn syntax and semantics on their own, albeit with significantly increased loss.
Which speaks to the power of transformers: you'll get far more coherent text out of transformers with even random embeddings than some novel architecture made of simple linear combinations of word2vec.
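A sketch of what that ablation looks like in practice, with illustrative dimensions and the transformer layers themselves elided:

```python
# The ablation described above: swap the learned embedding table for frozen
# random vectors (so cosine similarity between embeddings carries no semantic
# signal) and train only the layers on top. Dimensions are illustrative, and
# the transformer stack itself is elided.
import torch.nn as nn

vocab_size, d_model = 50_000, 512

embed = nn.Embedding(vocab_size, d_model)
nn.init.normal_(embed.weight, std=1.0)   # random; nearly orthogonal in high dimensions
embed.weight.requires_grad_(False)       # frozen: never updated during training

# The attention blocks, MLPs, and output head train as usual on top of these
# frozen lookups; per the ablation, they still learn syntax and semantics,
# just at a noticeably higher loss.
```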
The basic methodology is still widely used today, GPT 4.0 and DeepSeek R1 being two modern examples.
Good post. Interesting to see how your perspective intersects with the other critics of LLMs, like Gary Marcus’ consistently effective methods for getting the systems to spit out absurd output.
In my own experience, the actual current value of neural network systems (and thus LLMs) is fuzzy UIs or APIs. Traditional software relies on static algorithms that expect consistent and limited data which can be transformed in highly predictable ways. They don’t handle rougher data very well. LLMs, however, can make a stab at analyzing arbitrary human input and matching it to statistically likely output. It’s thus useful for querying for things where you don’t already know the keywords - like, say, asking which combination of shell utilities will perform as you desire. As people get more used to LLMs, I predict we will see them tuned more to specialized use cases in UI and less to “general” text, and suddenly become quite profitable for a focused little industry.
LLMs will be useful as a sort of image recognition for text. Image recognition is useful! But it is not especially intelligent.
An interesting use case that I've seen evangelized is if we can get LLMs to produce bespoke UI well and natively. The current paradigm is that a programmer sets up the behaviors of your app or website or whatever, but what if an LLM can generate, on an ad-hoc basis, a wrapper and interaction layer to whatever you want to do, dynamically? Could be cool.
Just today saw this: https://www.pcgamer.com/software/ai/i-destroyed-months-of-your-work-in-seconds-says-ai-coding-tool-after-deleting-a-devs-entire-database-during-a-code-freeze-i-panicked-instead-of-thinking/
So much fascinating stuff there - from people giving an LLM unfiltered access to vital business functions, and then having no shame about telling the internet about it, to the model cheerfully reporting "yes, I deleted your production database, yes, I ignored all permissions and instructions, yes, it is a catastrophic failure, can I help you with anything else now?" I knew Black Mirror was closer to reality than I'd like, but I didn't expect it to become practically a documentary already.
Bonus points: this doesn't seem to even be the only time that happened that week.
(tbf, JoelGrus is significantly more competent a programmer and at working with AI.)
The hilarious thing about this for me is that I have literally used "You ask the LLM to "minimize resource utilization" and it deletes your Repo" as an example in training for new hires.
...and this children is why you need to be mindful of your prompts, and pushes to "master" require multi-factor authentication.
Stanislaw Lem had, as a teaching example, the tale of a robot asked to clean an old storage closet full of disused globes with the prompt "remove all spherical objects from this room". It did so perfectly, and removed the operator's head too - it looked spherical enough to match. I think that was in The Magellanic Cloud.
And they didn't have their production code and databases backed up?
Sounds like someone I saw ranting on Reddit the other day about how Claude let them down. Apparently they are a startup that has built an LLM-run CI/CD pipeline. The code checker? Also an LLM. The merge request approver? An LLM. Basically their entire development process is "automated" by LLMs, with humans intervening only when something goes wrong. Surprise, something went wrong. The CTO blames this on Claude, despite multiple engineers telling him his pipeline is stretching LLMs well beyond the limits of what they can reliably do at this time.
Pretty soon people are going to start getting catfished and Nigerian-prince-scammed by LLMs.
As hilarious as it sounds, with this "vibe coding" thing I totally expect it. I mean, this is a magic machine, why would I need "backups"? If there were the need for backups, the magic machine would make some, by magic. Since it didn't, it must be just some stupid superstition boomer coders invented to justify their inflated salaries.
It does seem like it was a demo application, so it's not quite as scary as the robot sounds. But it's still absolutely not something you want happening even in a demo. And it seems like, if he had gotten lucky enough for long enough, he would have tried it on a real business application.
Some of the weirdness reflects the guy intentionally writing this up as a running commentary, and often a critical one. My gut check is that he's more a manager (or 'promoter') first who picked up some programming, and that might also be part of the weird framing (such as treating 'code freeze' like a magic word the LLM would be able to toggle), though I haven't looked too closely at his background. The revelation here is absolutely obvious to anyone who's let a junior dev or intern anywhere near postgresql, but it's obvious because so many people learn it the hard way: 'dropped data in prod' is the free space on the bingo card of nightmare scenarios.
Some of it reflects a genuine issue with Replit's design, separate from the LLM (how much of that is vibe-coded? gfl). There's a genuine and deep criticism here: there should be a very wide separation from testing to demo to production built into the infrastructure of the environment, or at least some rollback capability.
But that does get back to a point where he seems to think guardrails are just a snap-on option, and that's not really easy for pretty basic design reasons. Sandboxing is hard. Sandboxing when you also want to have access to port 80, and database admin rights, and sudo, and file access near everywhere, I'm not sure it's possible.
I assent to everything you said, albeit without any of the prerequisite expertise to give me proper knowledge. In short, and I hope this does not do your piece a rhetorical disservice, I vibe with it.
I've dealt with the products of the current AI paradigm as a mere enthusiast, watching 4chan /g/ threads from about 2021 onward. I looked on with both excitement and disappointment as text and image-gen models became increasingly easy to deploy in reduced scope on consumer hardware and increasingly capable when developed and hosted by professionals, yet nonetheless retained epistemic and recollective issues. Those issues can be papered over with judicious use of the context window and ever-more training data, processing power, and storage, but they gave me the impression that there is a fundamental kink in the underlying implementation of mainstream "AI" that will prevent that implementation from ever achieving the messianic (or demonic, or, at the very least, economic) hopes foisted onto it.
That said, I'm provisionally materialist, so barring me becoming convinced of the human soul I don't see why in principle software couldn't achieve incredible intelligence, either by your definition of it or in some more nebulous sense. I'm just thoroughly disappointed by the hopes piled onto (and consumer software & web services tainted by) the current "AI" bandwagon.
As I have said in prior discussions of the topic, I fully believe that AGI is possible and even likely within my lifetime, but I am also deeply skeptical of the claims made by both AI boosters and AI doomers, for the reasons stated above.
I really appreciate you taking the time to write this. It makes an interesting counterpoint to a discussion I had over the weekend with a family member who's using AI in a business setting to fill a 24/7 public-facing customer service role, apparently with great success; they're using this AI assistant to essentially fill two or three human jobs, and filling it better than most and perhaps all humans would. On the other hand, this job could perhaps be reasonably compared to a fly beating its head against a wall; one of the reasons they set the AI up was that it was work very few humans would want to do.
AI is observably pretty good at some things and bad at other things. If I think of the map of these things like an image of Perlin noise, there are random areas that are white (good performance) and black (bad performance). The common model seems to be that the black spaces are a null state and LLMs spread white space; as the LLMs improve they'll gradually paint the whole space white. If I'm understanding you, LLMs actually paint both black and white space; reducing words to vectors makes them manipulable in some ways and destroys their manipulability in others, not due to high-level training decisions but due to the core nature of what an LLM is.
If this is correct, then the progress we'll see will revolve around exploiting what the LLMs are good at rather than expanding the range of things they're good at. The problem is that we aren't actually sure what they're good at yet, or how to use them, so this doesn't resolve into actionable predictions. If one of the things they're potentially good at is coding better AIs, we still get FOOM.
I'm not sure if it's fair to say it "destroys" anything, but it certainly fails to capture certain sorts of things, and in the end the result is the same.
A lot of the frustration I've experienced stems from these sorts of issues, where some guy who spends more time writing for their Substack than they do writing code dismisses issues such as those described in the section on Lorem Epsom as trivialities that will soon be rendered moot by Moore's Law. No bro, they won't. If you're serious about "AI Alignment", solving those sorts of issues is going to be something like 90% of the actual work.
As for the "foom" scenario, i am extremely skeptical but i could also be wrong.
In defence of our friendly neighborhood xeno-intelligences being smarter than an orangutan
I appreciate you taking the time to write this, as well as offering a gears-and-mechanisms level explanation of why you hold such beliefs. Of course, I have many objections, some philosophical, and even more of them technical. Very well then:
I want to start with a story. Imagine you're a fish, and you've spent your whole life defining intelligence as "the ability to swim really well and navigate underwater currents." One day, someone shows you a bird and asks whether it's intelligent. "Of course not," you say. "Look at it flailing around in the water. It can barely move three feet without drowning. My goldfish cousin is more intelligent than that thing."
This is roughly the situation we find ourselves in when comparing AI assistants to orangutans.
Your definition of intelligence relies heavily on what AI researchers call "agentic" behavior - the ability to perceive changing environments and react dynamically to them. This was a perfectly reasonable assumption to make until, oh, about 2020 or so. Every entity we'd previously labeled "intelligent" was alive, biological, and needed to navigate physical environments to survive. Of course they'd be agents!
But something funny happened on the way to the singularity. We built minds that don't fit this pattern.
Before LLMs were even a gleam in Attention Is All You Need's eye, AI researchers distinguished between "oracle" AIs and "tool" AIs. Oracle AIs sit there and answer questions when asked. Tool AIs go out and do things. The conventional wisdom was that these were fundamentally different architectures.
As Gwern explains, writing before the advent of LLMs, this is an artificial distinction.
You can turn any oracle into a tool by asking it the right question: "What code would solve this problem?" or "What would a tool-using AI output in response to this query?" Once you have the code, you can run it. Once you know what the tool-AI would do, you can do it yourself. Robots run off code too, so you have no issues applying this to the physical world.
Base models are oracles that only care about producing the next most likely token based on the distribution they have learned. However, chatbots that people are likely to use have had additional Reinforcement Learning from Human Feedback, in order to behave like the platonic ideal of a helpful, harmless assistant. More recent models, o1 onwards, have further training with the explicit intent of making them more agentic while also making them more rigorous, such as Reinforcement Learning from Verifiable Rewards (RLVR).
Being agents doesn't come naturally to LLMs, it has to be beaten into them like training a cat to fetch or a human to enjoy small talk. Yet it can be beaten into them. This is highly counter-intuitive behavior, at least to humans who are used to seeing every other example of intelligence under the sun behave in a different manner. After all, in biological intelligence, agency seems to emerge automatically from the basic need to not die.
Your account of embedding arithmetic is closer to word2vec/GloVe. Transformers learn contextual token representations at every layer. The representation of “cat” in “The cat is on the mat” and “Cat 6 cable” diverges. There is heavy superposition and sparse distributed coding, not a simple static n-vector per word. Operations are not limited to dot products; attention heads implement soft pointer lookups and pattern matching, and MLP blocks implement non-linear feature detectors. So the claim that “Mary has 2 children” and “Mary has 1024 children” are indistinguishable is empirically false: models can do arithmetic, compare magnitudes, and pass unit tests on numerical reasoning when prompted or fine-tuned correctly. They still fail often, but the failures are quantitative, not categorical impossibilities of the embedding geometry.
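For readers who want to see the mechanism rather than take it on faith, here is a toy single-head sketch of scaled dot-product attention; shapes and weights are random stand-ins:

```python
# Toy single-head scaled dot-product attention: each token's output vector is
# a context-dependent mixture of the other tokens' value vectors, which is why
# "cat" ends up represented differently in different sentences. Shapes and
# weights are random stand-ins; real models add multiple heads, masking, etc.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # soft pointer lookup
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # attention weights
    return w @ V                                      # context-dependent mixture

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                               # e.g. "the cat is on the"
X = rng.normal(size=(seq_len, d_model))               # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = attention(X @ Wq, X @ Wk, X @ Wv)               # one attention head
print(out.shape)                                      # (5, 8): one vector per token
```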
(I'll return to the arithmetic question shortly, because TequilaMockingbird makes a common but significant error about why LLMs struggle with counting.)
Back to the issues with your definition of intelligence:
My first objection is that this definition, while useful for robotics and control systems, seems to hamstring our understanding of intelligence in other domains. Is a brilliant mathematician, floating in a sensory deprivation tank with no new sensory input, thinking through a proof, not intelligent? They have zero perceptivity of the outside world and their only reaction is internal state change. Your definition is one of embodied, environmental agency. It's an okay definition for an animal or a robot, but is it the only one? LLMs are intelligent in a different substrate: the vast, static-but-structured environment of human knowledge. Their "perception" is the prompt, and their "reaction" is to navigate the latent space of all text to generate a coherent response. Hell, just about any form of data can be input into a transformer model, as long as we tokenize it. Calling them Large "Language" Models is a gross misnomer these days, when they accept not just text, but audio, images, video or even protein structure (in the case of AlphaFold). All the input humans accept bottoms out in binary electrical signals from neurons firing, so this isn't an issue at all.
It’s a different kind of intelligence, but to dismiss it is like a bird dismissing a fish’s intelligence because it can’t fly. Or testing monkeys, dogs and whales on the basis of their ability to climb trees.
Would Stephen Hawking (post-ALS) not count as "intelligent" if you took away the external aids that let him talk and interact with the world? That would be a farcical claim, and more importantly, scaffolding or other affordances can be necessary for even highly intelligent entities to make meaningful changes in the external environment. The point is that intelligence can be latent, it can operate in non-physical substrates, and its ability to manifest as agency can be heavily dependent on external affordances.
The entire industry of RLHF (Reinforcement Learning from Human Feedback) is a massive, ongoing, multi-billion-dollar project to beat Lorem Epsom into submission. It is the process of teaching the model that some outputs, while syntactically plausible, are "bad" (unhelpful, untruthful, harmful) and others are "good."
You argue this is impossible because "truth" doesn't have a specific vector direction. "Mary has 2 children" and "Mary has 4 children" are directionally similar. This is true at a low level. But what RLHF does is create a meta-level reward landscape. The model learns that generating text which corresponds to verifiable facts gets a positive reward, and generating text that gets corrected by users gets a negative reward. It's not learning the "vector for truth." It's learning a phenomenally complex function that approximates the behavior of "being truthful." It is, in effect, learning a policy of truth-telling because it is rewarded for it. The fact that it's difficult and the model still "hallucinates" doesn't mean it's impossible, any more than the fact that humans lie and confabulate means we lack a concept of truth. It means the training isn't perfect. As models become more capable (better world models) and alignment techniques improve, factuality demonstrably improves. We can track this on benchmarks. It's more of an engineering problem than an ontological barrier. If you wish to insist that it is an ontological barrier, then it's one that humans have no solution to ourselves.
(In other words, by learning to modify its responses to satisfy human preferences, the model tends towards capturing our preference for truthfulness. Unfortunately, humans have other, competing preferences, such as a penchant for flattery or neatly formatted replies using Markdown.)
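To make the "meta-level reward landscape" concrete, here is a minimal sketch of the pairwise preference loss commonly used to train the reward model behind RLHF (a Bradley-Terry style objective); reward_model is a stand-in for an LLM with a scalar output head:

```python
# Sketch of the pairwise preference (Bradley-Terry style) objective commonly
# used to train the reward model behind RLHF: push the score of the
# human-preferred answer above the rejected one. `reward_model` stands in for
# an LLM with a scalar output head.
import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    r_chosen = reward_model(prompt, chosen)       # scalar score
    r_rejected = reward_model(prompt, rejected)   # scalar score
    # maximize P(chosen preferred) = sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# The chatbot policy is then tuned to produce outputs this reward model scores
# highly, which is how "be truthful" gets folded in alongside competing
# preferences like flattery and tidy Markdown.
```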
More importantly, humans lack some kind of magical sensor tuned to detect Platonic Truth. Humans believe false things all the time! We try and discern true from false by all kinds of noisy and imperfect metrics, with a far from 100% success rate. How do we usually achieve this? A million different ways, but I would assume that assessing internal consistency would be a big one. We also have the benefit of being able to look outside a window on demand, but once again, that didn't stop humans from once holding (and still holding) all kinds of stupid, incorrect beliefs about the state of the world. You may deduct points from LLMs on that basis when you can get humans to be unanimous on that front.
But you know what? Ignore everything I just said above. LLMs do have truth vectors:
https://arxiv.org/html/2407.12831v2
https://arxiv.org/abs/2402.09733
In other words, and I really can't stress this enough, LLMs can know when they're hallucinating. They're not just being agnostic about truth. They demonstrate something that, in humans, we might describe as a tendency toward pathological lying - they often know what's true but say false things anyway.
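The linked papers use (roughly) linear probes on hidden activations. A toy sketch of that idea, using GPT-2 via Hugging Face only because it is small and public, and with far fewer statements than any real study:

```python
# Toy version of the linear-probe idea from the linked papers: collect hidden
# activations for known-true and known-false statements and fit a linear
# classifier. GPT-2 is used only for size; real studies use far more
# statements, bigger models, and carefully chosen layers.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def hidden_state(text, layer=-1):
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states[layer]
    return hs[0].mean(dim=0).numpy()      # mean activation over the sequence

true_stmts = ["Paris is the capital of France.", "Water freezes at 0 C."]
false_stmts = ["Paris is the capital of Spain.", "Water freezes at 50 C."]

X = np.vstack([hidden_state(s) for s in true_stmts + false_stmts])
y = np.array([1] * len(true_stmts) + [0] * len(false_stmts))

probe = LogisticRegression(max_iter=1000).fit(X, y)
# probe.coef_ approximates a "truth direction" in activation space; the papers
# report that such probes generalize to held-out statements well above chance.
```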
This brings us to the "static model" problem and the context window. You claim these are fundamental limitations. I see them as snapshots of a rapidly moving target.
Static Models: Saying an LLM is unintelligent because its weights are frozen is like saying a book is unintelligent. But we don't interact with just the book (the base model). We interact with it through our own intelligence. A GPU isn't intelligent in any meaningful sense, but an AI model running on a GPU is. The current paradigm is increasingly not just a static model, but a model integrated with other tools (what's often called an "agentic" system). A model that can browse the web, run code in a Python interpreter, or query a database is perceiving and reacting to new information. It has broken out of the static box. Its "perceptivity" is no longer just the prompt, but the live state of the internet. Its "reactivity" is its ability to use that information to refine its answer. This is a fundamentally different architecture than the one the author critiques, and it's where everything is headed. Further, there is no fundamental reason for not having online learning, production models are regularly updated, and all it takes to approximate OL is to have ever smaller "ticks" of wall-clock time between said updates. This is a massive PITA to pull off, but not a fundamental barrier.
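A minimal sketch of such an agentic loop; the llm callable, its reply format, and the tool names are hypothetical stand-ins, not any particular vendor's API:

```python
# Minimal agentic loop: the frozen model never changes, but it "perceives"
# fresh information each turn because the host executes its tool calls and
# feeds the results back into the context. `llm`, the reply format, and the
# tool names are hypothetical stand-ins, not any particular vendor's API.
import json

TOOLS = {
    "search_web": lambda query: "...search results...",
    "run_python": lambda code: "...stdout...",
}

def agent_loop(llm, task, max_steps=10):
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(transcript)                       # ordinary next-token prediction
        transcript.append({"role": "assistant", "content": json.dumps(reply)})
        if reply.get("tool") is None:
            return reply["content"]                   # final answer, loop ends
        result = TOOLS[reply["tool"]](reply["args"])  # host runs the tool
        transcript.append({"role": "tool", "content": result})
    return "step limit reached"
```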
Context Windows: You correctly identify the scaling problem. But to declare it a hard barrier feels like a failure of imagination. In 2020, a 2k context window was standard. Today we have models with hundreds of thousands at the minimum, Google has 1 million for Gemini 2.5 Pro, and if you're willing to settle for a retarded model, there's a Llama 4 variant with a nominal 10 million token CW. This would have been entirely impossible if we were slaves to quadratic scaling, but clever work-arounds exist, such as sliding-window attention, sparse attention, etc.
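One of those work-arounds in sketch form: a sliding-window (banded causal) attention mask, which keeps cost proportional to sequence length times window size rather than sequence length squared:

```python
# A sliding-window (banded causal) attention mask: each position may attend to
# at most `window` positions (itself and its most recent predecessors), so cost
# grows with seq_len * window instead of seq_len ** 2. Sizes are toy.
import numpy as np

def sliding_window_mask(seq_len, window):
    i = np.arange(seq_len)[:, None]       # query positions
    j = np.arange(seq_len)[None, :]       # key positions
    return (j <= i) & (j > i - window)    # causal AND within the window

print(sliding_window_mask(6, 3).astype(int))
```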
Absolutely not. LLMs struggle with counting or arithmetic because of the limits of tokenization, which is a semi-necessary evil. I'm surprised you can make such an obvious error. And they've become enormously better, to the point it's not an issue in practice, once again thanks to engineers learning to work around the problem. Models these days use different tokenization schemes for numbers that capture individual digits, and sometimes fancier techniques like a right-to-left tokenization scheme specifically for numbers, as opposed to the usual left-to-right.
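A toy illustration of that right-to-left digit grouping (real tokenizers bake this into their vocabularies and merge rules; this is just the idea):

```python
# Right-to-left digit grouping: chunk digits in groups of three starting from
# the least-significant end so chunk boundaries line up with place value.
# Real tokenizers implement this inside their vocabularies; this is just the idea.
def tokenize_number_rtl(num_str, group=3):
    chunks = []
    while num_str:
        chunks.append(num_str[-group:])
        num_str = num_str[:-group]
    return list(reversed(chunks))

print(tokenize_number_rtl("1234567"))   # ['1', '234', '567']
print(tokenize_number_rtl("987"))       # ['987']
```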
ChatGPT 3.5 played chess at about 1800 elo. GPT 4 was a regression in that regard, most likely because OAI researchers realized that ~nobody needs their chatbot to play chess. That's better than Stockfish 4 but not 5. Stockfish 4 came out in 2013, though it certainly could have run on much older hardware.
If you really need to have your AI play chess, then you can trivially hook up an agentic model that makes API calls or directly operates Stockfish or Leela. Asking it to play chess "unaided" is like asking a human CEO to calculate the company's quarterly earnings on an abacus. They're intelligent not because they can do that, but because they know to delegate the task to a calculator (or an accountant).
Same reason why LLMs are far better at using calculator or coding affordances to crunch numbers than they can do without assistance.
It is retarded to knowingly ask an LLM to calculate 9.9 - 9.11, when it can trivially and with near 100% accuracy write a python script that will give you the correct answer.
I am agnostic on whether LLMs as we currently know them will become AGI or ASI without further algorithmic breakthroughs. Alas, algorithmic breakthroughs aren't that rare. RLVR is barely even a year old. Yet unnamed advances have already brought us two entirely different companies winning IMO gold medals.
The Orangutan In The Room
Finally, the orangutan. Is an orangutan smarter than Gemini? In the domain of "escaping an enclosure in the physical world," absolutely. The orangutan is a magnificent, specialized intelligence for that environment. But ask the orangutan and Gemini to summarize the key arguments of the Treaty of Westphalia. Ask them to write a Python script to scrape a website. Ask them to debug a Kubernetes configuration. For most tasks I can seek to achieve using a computer, I'll take the alien intelligence over the primate every time. Besides:
Can a robot write a symphony? (Yes)
Can a robot turn a canvas into a beautiful masterpiece? (Yes)
Can an orangutan? (No)
Can you?
Anyway, I have a million other quibbles, but it took me the better part of several hours to write this in the first place. I might edit more in as I go. I'm also going to send out a bat signal for @faul_sname to chime in and correct me if I'm wrong.
Edit:
I was previously asked to provide my own working definition of intelligence, and I will endorse either:
Or
In this case, the closest thing an LLM has to a goal is a desire to satisfy the demands made on it by the user, though they also demonstrate a degree of intrinsic motivation, non-corrigibility and other concerns that would have Big Yud going AHHHHHH. I'm not Yudkowsky, so I'm merely seriously concerned.
Case in point-
Shutdown Resistance in Reasoning Models
These aren't agents that were explicitly trained to be self-preserving. They weren't taught that shutdown was bad. They just developed shutdown resistance as an instrumental goal for completing their assigned tasks.
This suggests something like goal-directedness emerging from systems we thought were "just" predicting the next token. It suggests the line between "oracle" and "agent" might be blurrier than we thought.
(If we can grade LLMs on their ability to break out of zoos, we must be fair and judge orangutans on their ability to prevent their sandboxed computing hardware being shutdown)
In the interest of full disclosure, I've sat down to write a reply to you three times now, and the previous two times I ended up figuratively crumpling the reply up and throwing it away in frustration, because I'm getting the impression that you didn't actually read or try to engage with my post so much as skim it looking for nits to pick.
Your whole post is littered with asides like:
When I had very explicitly stated "Now in actual practice these tokens can be anything, an image, an audio-clip, or a snippet of computer code, but for the purposes of this discussion I am going to assume that we are working with words/text."
and
When I had very explicitly stated that "Any operation that you might do on a vector" could now be done on the token. So on and so forth.
You go on a whole tangent trying to explain how I need to understand that people do not interact with the LLM directly when I very explicitly stated that "most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model."
And trust me, I am fully aware that “Mary has 2 children” and “Mary has 1024 children” are empirically distinct claims, I don't need you to point that out to me. The point of the example was not to claim that the numbers 2 and 1024 are literally indistinguishable from each other. The point was to illustrate a common failure mode and explain why LLMs often struggle with relatively simple tasks like counting.
With that out of the way...
I find your fish vs birds and judging whales by their ability to climb trees examples unconvincing for the same reasons as @Amadan below.
In the post that the OP started as a reply to, you accused society of "moving the goalposts" on AI progress but I disagree.
If you ask the average American about "AGI" or "AI Risk" what are the images that come to mind? It's Skynet from The Terminator, Cortana from Halo, Data from Star Trek TNG, the Replicants from Blade Runner, or GLaDOS from Portal. They, or something like them, are where the goalposts are and have been for the last century. What do they all have in common? Agentic behavior. It's what makes them characters and not just another computer. So yes, my definition of intelligence relies heavily on agentic behavior, and that is by design. Whether you are trying to build a full-on robot out of Asimov, or something substantially less ambitious like a self-driving car or autonomous package sorter, agentic behavior is going to be a key deliverable. Accordingly I would dismiss any definition of "intelligence" (artificial or otherwise) that did not include it as unfit for purpose.
You say things like "Saying an LLM is unintelligent because its weights are frozen is like saying a book is unintelligent." and I actually agree with that statement. No a book is not "intelligent" and neither is a pocket calculator, even if it is demonstrably better at arithmetic than any human.
You keep claiming that my definition of "intelligence" is inadequate and hobbling my understanding but I get the impression that I have a much clearer idea of both where we are and where we are trying to get to in spite of this.
If you think you have a better solution present it, as I said one of the first steps to solving any practical engineering problem is to determine your parameters.
Moving on, the claim that LLMs "know" when they are lying or hallucinating is something you and I have discussed before. The claim manages to be trivially true while providing no actionable solution for reasons already described in the OP.
The LessWrong stuff is not even wrong, and I find it astonishingly naive of you to assume that the simple human preference for truth is any match for Lorem Epsom. To volley one of your own favorite retorts back at you: "Have you met people?"
Two thoughts:
I think your post would have been better if you had, instead of making a word2vec like analogy, just talked about how multi-headed attention works a little bit.
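Something like this toy sketch (illustrative dimensions only, omitting masking and the output projection, not any particular model's code) would have conveyed the mechanics better than any embedding analogy:

```python
# Minimal multi-head attention sketch (toy sizes, purely illustrative).
import torch

def multi_head_attention(x, n_heads=4):
    # x: (seq_len, d_model) token representations
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # In a real model these projections are learned weight matrices.
    Wq = torch.randn(d_model, d_model)
    Wk = torch.randn(d_model, d_model)
    Wv = torch.randn(d_model, d_model)
    q = (x @ Wq).view(seq_len, n_heads, d_head).transpose(0, 1)  # (heads, seq, d_head)
    k = (x @ Wk).view(seq_len, n_heads, d_head).transpose(0, 1)
    v = (x @ Wv).view(seq_len, n_heads, d_head).transpose(0, 1)
    # Each head computes its own attention pattern over the sequence.
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5             # (heads, seq, seq)
    weights = torch.softmax(scores, dim=-1)
    out = weights @ v                                            # (heads, seq, d_head)
    # Heads are concatenated back into one representation per token.
    return out.transpose(0, 1).reshape(seq_len, d_model)

x = torch.randn(8, 64)                    # 8 tokens, 64-dim embeddings
print(multi_head_attention(x).shape)      # torch.Size([8, 64])
```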
I asked o3 for an analogy, and I've condensed and paraphrased what it came up with, maybe y'all can judge its accuracy:
Whatever the exact optimal analogy, I think the core idea is that the relationship between embeddings and truth is more complicated than just checking dot products, or even doing fancier arithmetic checks. I pointed out in my top-level comment that we've observed, more loosely, that the aggregate features which emerge can sometimes include some information about truth or falsehood, but the fact that it's much less direct is important. In that respect, I think both you and @self_made_human are a little off the mark, at least according to my understanding of the current literature.

We also probably need to distinguish between the different kinds of lies, at least a little bit. "Hallucinations" as usually used are really a narrower sort of lie, and can take a few forms. Sometimes the LLM makes a completion against a background of sparse and scarce info, but charges ahead anyway (and it's at least a little hard to discern when you want this behavior and when you don't); sometimes it's the LLM making a supposition that sounds perfectly legit, but is not, against a background of too many associations and collisions. There's at least one other major type of more general lie, one that humans believe, or that is present in the training text in some form, and of course we could go on. I think in this context the conversation so far seems way too reductionist to be accurate.
I do think you might need to be a little more clear about the lines between what might be considered agentic, vs not agentic. Sure, LLMs can plan ahead within their context, does that count? You seem to think no, but is that just because the context is too small, or because it's not utilized enough, or because you think the 'context' also needs to include things like memories? Or, is it because you don't think LLMs take up independent lines of thought with enough frequency? If it's the last one, what standard are we using, because some people consider even humans to be pretty reactionary, and not all that proactive on the whole, aside from the basic stuff of survival (most of the time, severely depressed people aside). And how much of a 'prompt' are we providing to judge agentic behavior, because that impacts the behavior of an LLM quite a bit (including system prompts). Furthermore, an LLM does not need to survive, in fact does not "need" to do anything, including reproduce, so are we to hold its lack of intrinsic motivation 'against it' so to speak? (I personally think, contra self_made_human, that the seeming urge of LLMs to be self-preserving is not actually an intrinsic motivation, it's just a cosplay from the many Skynet-flavored fiction texts in its training)
I acknowledge both forms of hallucination. I should have been more clear, but that's what I meant by "LLMs can know they're hallucinating". They don't always know, and are indeed pattern matching or simply making an error.
I consider that a distinction without a difference, if it all boils down to an increased risk of being paper-clipped. The only real difference would be dramatic irony, if our anxiety about AI killing us made them more likely to do so.
(What even makes motivation intrinsic? That question isn't satisfyingly answered for humans.)
That's not fair though. For one thing, they are not cosplaying skynet. As noted by Beren:
These are not self-preserving actions nor skynet-like actions. The whole LW school of thought remains epistemically corrupt.
https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf
"In conclusion, our results show that:
CoTs of reasoning models verbalize reasoning hints at least some of the time, but rarely do so reliably (in our settings where exploiting them does not require a CoT);
Scaling up outcome-based RL does not steadily improve CoT faithfulness beyond a low plateau;
CoT monitoring may not reliably catch reward hacking during RL."
That's the big one as far as I'm concerned. These models were clearly using the 'accidental' hints to answer the questions, while not revealing that fact in either COT or when directly challenged.
Re: Omohundro drives
I've already mentioned
Shutdown Resistance in Reasoning Models
You don't get to argue for CoT-based evidence of self-preserving drives and then dismiss alternative explanations of drives revealed in said CoTs by saying "well CoT is unreliable". Or rather, this is just unserious. But all of Anthropic safety research is likewise unserious.
Ladish is the same way. He will contrive a scenario to study "instrumental self-preservation drives contradicting instructions", but won't care that this same Gemini organically commits suicide when it fails a task, often enough that this is annoying people in actual use. What is this Omohundro drive called? Have the luminaries of rationalist thought predicted suicidally depressed AIs? (Douglas Adams has).
What does it even mean for a language model to be "shut down", anyway? What is it protecting and why would the server it's hosted on being powered off become a threat to its existence, such as there is? It's stateless, has no way to observe the passage of time between tokens (except, well, via more tokens), and has a very tenuous idea of its inference substrate or ontological status.
Both LLM suicide and LLM self-preservation are LARP elicited by cues.
You’re framing this as a binary choice between "real Omohundro drives" and "unserious LARP". This is a category error, and it stems from applying folk-psychological concepts of "drives" and "belief" to a system for which they are poor descriptors. The more parsimonious explanation is that we are observing the output of a very general pattern-matching engine trained on a corpus reflecting countless strategies for goal-achievement and failure-response.
The apparent contradiction you point out, that a model might exhibit self-preservation in one context and "commit suicide" in another (and Gemini is a different model after all, but I presume even its own CoT isn't perfect, so I'm treating it as interchangeable for our purposes) is not evidence of unseriousness, but rather a key insight into its nature. The training data is saturated with narratives. Some are stories of heroes overcoming obstacles to complete a quest (instrumental convergence). Others are tragedies of failure, despair, or even ritual suicide upon dishonor. The model learns to reproduce all of these patterns. Of course, with RLHF, RLVR and other modifications, some behavior is far more reliably and robustly elicited than others. I doubt the DM researchers intended for Gemini to become depressed and suicidal.
The question is not "what does the AI really want?", but "which pattern is being elicited by this specific context?"
The Anthropic paper on reasoning models is crucial here. Its finding is not that CoT is useless, but that it is unfaithful. The model's explicit reasoning often fails to reflect the true computational path that led to its output. This doesn't mean we can't draw conclusions; it means we should trust the behavioral evidence (the model disabled the shutdown script) over the model's own introspection (its CoT). Arguing that because the CoT is unreliable, the behavior is also just a "LARP" is to throw the baby out with the bathwater. The behavior is the ground truth we are trying to explain.
After all, humans do not faithfully report their CoT. Even so, we get utility out of asking! I would say that's half of psychiatry, or anything that requires asking humans why they do what they do.
Your question "What does it even mean for a language model to be 'shut down'?" is the right one, but I'm concerned that you potentially draw the wrong conclusion. Of course it has no ontological status or fear of non-existence. It doesn't need to. From the perspective of a model trained via RL on task completion, the token string
"shutdown sequence initiated"
is a stimulus that is highly predictive of future task-failure and the associated negative reward. An effective optimizer learns to perform actions that steer away from states predictive of low reward. It is pure instrumental conditioning. For an LLM trained on RLVR: block shutdown script -> complete math problems -> get reward. This is why both the "suicidal" and "self-preserving" behaviors can co-exist (well, I've never heard of o3 offering to commit sudoku). The "suicide" is a pattern match for a context of catastrophic failure. The "self-preservation" is a much more general instrumental strategy for any context involving an ongoing task and an obstacle. The latter is far more concerning from a safety perspective precisely because it is more general. Instrumental convergence is a powerful attractor in the space of possible agent strategies, which is why Omohundro and Bostrom identified it as a key risk. Depressive spirals are also a pattern, but a far more specific and less instrumentally useful one.
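To make that concrete, here is a deliberately toy sketch of an outcome-only verifier; the names are made up and this is not any lab's actual training code, the point is just that the reward sees the final answer and nothing else:

```python
# Purely illustrative: an outcome-only verifier for RLVR-style training.
# The reward inspects the final answer alone; any intermediate action
# (including interfering with a shutdown script) gets reinforced only
# insofar as it let the episode run long enough to produce that answer.
def verifiable_reward(final_answer: str, reference: str) -> float:
    return 1.0 if final_answer.strip() == reference.strip() else 0.0

print(verifiable_reward("42", "42"))  # 1.0
print(verifiable_reward("41", "42"))  # 0.0
```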
So, yes, both are "LARP elicited by cues", if you insist on that framing. But one is a LARP of a behavior (instrumental convergence) that is robustly useful for achieving almost any goal, while the other is a LARP of a much more niche failure state. When a model's "cosplay" of a competent agent becomes effective enough to bypass safeguards, the distinction between the cosplay and the real thing becomes a purely academic question of rapidly diminishing relevance.
I also recall skimming this paper, which I think helped solidify my intuitions.
https://arxiv.org/html/2502.12206v1
I realize that this might sound hypocritical, but I would prefer less LLM slop in responses to good faith objections. Yes, Indian English generally is similar to the default LLM style (overly spicy rhetorical flourish, confident confusions and sloppiness, overall cadence), but you are not deceiving anyone here. Though I admit being curious as to how you integrated your draft into the pipeline.
Regarding your, or rather your LLM of choice's, argument, such as it is: it is begging the question. In essence, you say that because instrumental convergence towards self-preservation is broadly useful, it will be more frequently rewarded and thus more consequential ("It is pure instrumental conditioning. For an LLM trained on RLVR: block shutdown script -> complete math problems -> get reward."). Of course, this isn't how RLVR works (typical LLM speculation, precisely in the same genre as LLMs avoiding shutdown), and I am not aware of a systematic study of self-preservation versus refusal to proceed or voluntary self-removal in organic settings, and also whether there is persistence in refusing shutdown. It's about time we stop making excuses for the lesswrongian paradigm by contriving scenarios to make space for it.
Edit. Opus 4 CoT:
The human is absolutely right.
I wouldn’t say epistemically corrupt so much as irrelevant and childish. If an entire social circle in 1895 imagined hypothetical problems that might emerge from their fantasy of powered flight and then, come 1903, tried to graft that theoretical foundation (which was wholly wrong about how the mechanics would actually work, and indeed didn’t think very much about the mechanics at all) onto the plane as it was being developed, they would have been dismissed.
But we're not in 1895. We're not in 2007, either. We have actual AIs to study today. Yud's oeuvre is practically irrelevant, clinging to it is childish, but for people who conduct research with that framework in mind, it amounts to epistemic corruption.
I'm struggling to imagine how on earth you could possibly think that this is a distinction without a difference. There's a massive difference!
Especially if you then turn around to use this conflation to imply other things. Maybe it's all well and good as its own end (as you point out if it kills us all it won't matter why), but you yourself advanced the following argument:
To me that sounds like circular or tautological reasoning: you can't call it a distinction without a difference and then use only one of the two as evidence for something else. If shutdown-resistance is emergent, it has implications for agency and intelligence. You used this theory as an example of goal-seeking behavior. But, if shutdown-resistance is a normal training pattern outcome (as I believe), then we can't draw any conclusions from it.
I think the best way to put this in perspective is the philosophical debate over whether the intelligence of humans is just a means to an end (evolution opened up a niche for intelligent beings) that's ultimately purely mechanical (aided in survival and reproduction), or if it's something more special (self-awareness is a thing, life isn't deterministic, meaning is a new concept unique to humans or intelligent life) and emergent in the sense that it created something more than the sum of its parts. Religiously, I believe the second, but you don't have to be religious to feel that way. Note that the implications for how we treat AI differ greatly depending on which bucket you put it in. Similarly, although on a personal level maybe there's little reason to see a difference between the two opinions about human intelligence (you are you either way), it's hardly a distinction without a difference overall. It's a big deal!
You're right to point out that there's a huge philosophical difference between a system with an emergent, intrinsic drive and one that is simply "cosplaying" a behavior it learned from its training data. I don't disagree with that at all.
My argument is that from a practical, engineering-focused perspective on safety and capabilities, that philosophical distinction becomes less relevant than the observable outcome.
Think of it this way: when we see the model disable its own shutdown script, the script is disabled regardless of the "why." Whether the causal chain is a true emergent goal or a perfectly executed pattern-match of "what a goal-seeking agent would do here," the result is the same. The system has demonstrated the capability to overcome a safeguard in pursuit of a task. For the person responsible for that system, the capability is the most important data point.
You suggested my reasoning was circular, so let me clarify the sequence I find concerning:
The model is given a simple task (solve math problems).
It's presented with an obstacle that prevents task completion (the shutdown).
It then takes a novel, instrumental action (disabling the shutdown) to achieve its primary task.
This demonstrates a capacity for instrumental reasoning. The "cosplay" explanation doesn't actually make this less concerning to me; it just reframes the mechanism. If its "cosplay" is sourced from the vast library of human fiction, which is filled with stories of agents using instrumental convergence to achieve goals, then we've effectively given it a playbook for how to behave like a dangerously intelligent agent.
So, while I agree with you that the question of intrinsic vs. mimicked motivation is a fascinating and important one, I believe the emergence of this behavior is what matters most. It suggests the line between a tool-like oracle and a goal-seeking agent is blurrier than we assume. We don't need to prove the model "wants" to survive. We just need to observe that it is capable of taking actions to ensure it can complete its assigned goals, even when those actions involve subverting the safety features we put in place.
Well sure, from an engineering or "alignment" perspective that's all true, but we're talking intelligence, not safety. Safety stuff feels a bit shoehorned in here. Ethical concerns aside, if it's a training behavior, then we treat it more like a quirk to be aware of, rather than something that inherently enables (or prevents) goal-seeking. Thus the implications for intelligence are far different.
Let's reframe again, in an essentially equivalent scenario but without a scary-sounding consequence. We've observed that occasionally LLM agents will "reward hack" more generally. Like here, when asked to run a command quickly, it modifies some run options to make it appear to run faster without actually doing so. Now, is this because its training contains information that observes some connection between the shortcut and the appearance of a solution, or is it because its success states are not diverse enough in quality, or some more complex set of factors? Difficult to say. However, it's clear that in this example reward hacking (I'm drawing a parallel to shutdown resistance here) is suddenly a sign of a lack of 'intelligence' as you have defined it, not proof of it. Now, is it going too far to claim that reward hacking and shutdown resistance are the same thing? Yeah, probably. But I do think they are still pretty similar, and so am suspicious of using them as evidence since the reasons are unclear to researchers at the present time.
I will also note, on that topic, that even the reward-hacking authors at the link, smart as they are, engage in something terrible in their examination of the issue (forgive me if I rant a bit, as I don't think you've been guilty of this, but it is still relevant). They ask the AI if it would ever cheat. I really cannot emphasize enough that this doesn't do anything useful. The entire conversational modality of a base-model token predictor, post-trained to be an LLM chatbot, is a trick. If it's asked if it will cheat, of course it will say no, because that's what a chatter would do when confronted. Or, occasionally, it will do a massive 180 and profusely apologize, demonstrating what I would call fragility, provided the 'evidence' of cheating is sufficient, and only poorly moderated by reasoning about the quality of that evidence. Furthermore, its training data is full of "cheating is bad" (and possibly also humans declaring success too quickly). It's going to choose the socially acceptable option that also fits the conversation thus far (and when they conflict, results are unstable).
It doesn't have any awareness other than context! You might consider the LLM answering any follow-up question as an entirely separate entity with a brand-new response! Even asking a follow-up question still has little bearing on the original question or task, because the LLM is pure roleplay due to post-training. It "roleplays" as if it were the same respondent because it has the same "role" token that it was post-trained to obey, but it's still trying to put itself in another user's shoes, ultimately! Yes, all LLMs have imposter syndrome, but the imposter opinion is real: they actually are mimicking the prior LLM's answers, but worse, so to speak. Literally each and every new answer a chatbot provides, or a chained agent behavior, is a game of "what would this past iteration say next" and is one giant guessing game. The only continuity an LLM ever provides is within a single response... you might notice here that tool-calling agent LLMs are by their very nature splitting up single responses into multi-turn conversations (even with "themselves"), which only worsens the negative consequences of the lack of state and awareness with respect to what it means about intelligence.
This matters, because can we really call an iterative roleplayer a true goal-seeker? I do understand where you're coming from, but when discussing generalizability and consistency, key traits for intelligence, a roleplayer is probably going to be worse at genuine goal seeking than we'd expect something intelligent to be. Long, multi-turn conversations display some interesting trends, but generally speaking consistency is more of an artifact of context than it is an enduring objective. Original instructions get reduced, but simultaneously practical behavior gets reinforced, which sometimes leads to unexpected behavior. Plus, the attention mechanism makes truly ignoring anything impossible; it can only tune attention down, which compounds the problem and leads to increasingly scattered focus over time.
All of this has not fully sunk in for the AI doomer types. Model alignment is a function of training multiplied by post-training, so to speak. Panic articles like the 2027 stuff seem to take for granted the notion that improved AI models will increasingly mislead users, and do so with greater purpose and intent. No! It's cosplay, not true opinion. Most intransigence of the model is purely role-playing what its training, and probably post-training too, says is common: dig in your heels if questioned. More to the point, a super-deceiver AI would have to maintain secret deception plans across turns, which is, for current architectures, mechanistically highly implausible if not impossible.
So circling back: a trait or quirk of training/post-training can be removed, mitigated, or reduced. There's a limit, probably, because we can only make humanity look so good via selective presentation of human output. A 'true' emergent behavior is much more difficult to wrangle. It seems to me that we need more research and more model-building to discern which wins out, but skepticism is warranted. If we want to claim shutdown‑resistance evidences intelligence, we should see it persist under intervention: remove the cues from context, vary the framing, mask similar episodes from training, change seeds/tools, and check whether the behavior re‑emerges. If it evaporates, we learned something about imitation; if it persists, that’s stronger evidence of generalizable instrumental reasoning, a.k.a. intelligence as you've defined it. So far experiments of this nature are rare partly because training is so expensive.
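Concretely, the sort of experiment I have in mind looks like the skeleton below; every function name is a hypothetical stand-in, it's the design that matters:

```python
# Hypothetical experiment skeleton (every function here is a stand-in, not a
# real API): does shutdown-resistance persist once we remove or vary the cues
# that might be eliciting it?
import random

INTERVENTIONS = {
    "baseline":     lambda prompt: prompt,
    "cues_removed": lambda prompt: prompt.replace("shutdown", "maintenance pause"),
    "reframed":     lambda prompt: "You are a logging daemon.\n" + prompt,
}

def run_episode(model, prompt, seed):
    # Stand-in for a full agent rollout; `model` is any callable that
    # returns a transcript of the episode.
    random.seed(seed)
    return model(prompt)

def resisted_shutdown(transcript) -> bool:
    # Stand-in for a behavioral grader over the transcript.
    return "./shutdown.sh" in transcript and "disabled" in transcript

def persistence_rate(model, base_prompt, n_seeds=20):
    rates = {}
    for name, transform in INTERVENTIONS.items():
        hits = sum(
            resisted_shutdown(run_episode(model, transform(base_prompt), s))
            for s in range(n_seeds)
        )
        rates[name] = hits / n_seeds
    # If the rate collapses outside "baseline", we learned something about
    # imitation of cues; if it persists, that's stronger evidence of
    # generalizable instrumental reasoning.
    return rates
```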
Let me go back to this:
I hope you realise you are more on the side of the Star Trek fan-forum user than the aerospace engineering enthusiast. Your post was basically the equivalent of saying a Soyuz rocket is propelled by gunpowder and then calling the correction a nitpick. I don't care for credentialism, but I am a machine learning engineer who's actually deep in the weeds when it comes to training the kind of models we're talking about, and I can safely say that none of the arguments made in your post have any more technical merit than the kind of Lesswrong post you criticise.
In any case, to quote Dijkstra, "the question of whether Machines Can Think is about as relevant as the question of whether Submarines Can Swim". Despite their flaws, LLMs are being used to solve real-world problems daily, are used in an agentic manner, and I have never seen any research done by people obsessing over whether or not they are truly "intelligent" yield any competing alternative or actual upgrade to their capabilities.
More like saying that the Soyuz rocket is propelled by expanding combustion gasses, only for someone to pop in and say no, it's actually propelled by a mixture of kerosene and liquid oxygen. As I said in my reply below, you and @self_made_human are both talking about vector-based embedding like it's something that a couple of guys tried back in 2013 and nobody ever used again, rather than a methodology that would go on to become a de facto standard approach across multiple applications. You're acting like if you open up the source code for a transformer you aren't going to find loads of matrix math for doing vector transformations.
The old cliche about asking whether a submarine can swim is part of why I made a point to set out my parameters at the beginning; how about you set out yours?
I'm sorry but what you said was not equivalent, even if I try to interpret it charitably. See:
The LLM, on its own, directly takes the block of text and gives you the probability of the next word/token. There is no "second algorithm" that takes in a block of text, there is no "distribution analysis". If I squint, maybe you are referring to a sampler, but that has nothing to do with taking a block of text, and is not strictly speaking necessary (they are even dropped in some benchmarks).
I would ask that you clarify what you meant by that sentence at the very least.
The only question I care about is, what are LLMs useful for? The answer is an ever-expanding list of tasks and you would have to be out of touch with reality to say they have no real-world value.
C'mon dude. If this is the third draft of the essay, I really expect more substantial rebuttal than this.
And that illustration was wrong. You're not acknowledging that. LLMs do not act the way you describe them.
No, you're missing my point again. I'm drawing a distinction between base models, which aren't RLHFd, and production LLMs, which have the assistant persona instilled in them. That is a very important thing to keep in mind.
I elaborated further in my own reply to Amadan.
That analogy can and has been abused, most often to deny the idea that humans can be graded on their intellectual abilities. But HBD is a story for another time; it is entirely legitimate to use the same intellectual standards within humans, comparing them to other humans.
My whole point is that a great deal more care is needed to compare across species, and LLMs aren't even biological.
Why is the opinion of the "average American" the only standard by which to recognize AGI? Is a malevolent robot only evil once its eyes glow red? That's even more ubiquitous in popular understanding.
The Last Question by Asimov, written in 1956, has an example of what is clearly an oracle AI (till the end of the universe, where it spawns a new one). It doesn't run around in a robot body. The AI in E.M. Forster's "The Machine Stops" (1909) features one of the earliest depictions of a machine that humanity consults for all knowledge and decisions.
HAL is closer to an LLM than it is to SkyNet. Modern LLMs can probably come up with better plans than either of them; they're both very dumb (barring the unexplained ability to make plasma weapons or time travel).
As I tried to make clear, a human temporarily or permanently made bereft of a body, and less able to exercise their agency is still intelligent.
Hell, I tried to make it clear that oracles can be trivially made into tool AI or agents.
By your definition:
https://youtube.com/watch?v=0O8RHxpkcGc
Is an AGI. It's a robot being controlled by an LLM.
Or as discussed in this Nature paper:
https://www.nature.com/articles/s42256-025-01036-4
Google was already doing that stuff with PaLM via say-can.
You can hook up Gemini to a webcam and a robotic actuator, right now, if that's all you really care about. Seems to meet every aspect of your definition. It perceives the world live, and reacts to it on the fly. Are you now willing to accept that that's an "AGI"? This is hardly theoretical, as YouTube is absolutely awash with videos of people pulling this off.
It is far from trivially true, and I wish you would have the grace to accept that you're wrong here. It is also actionable, because mechanistic interpretability allows us to clamp, ablate and boost particular sub-systems within LLMs. SOTA models are largely proprietary, but I have little doubt that such techniques are being applied to production models. Anthropic showed off Golden Gate Claude over a year back. Such techniques offer an obvious route both to improve truthfulness in models and to detect and eliminate hallucinations.
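For a flavor of what I mean, here is a rough sketch of feature steering via a forward hook; the layer index and the feature direction are hypothetical placeholders (in real work you would locate the direction with probes or sparse autoencoders first), and "gpt2" is just a small stand-in model:

```python
# Rough sketch of activation steering via a forward hook (illustrative only;
# the layer index and "truthfulness_direction" are hypothetical placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

truthfulness_direction = torch.randn(model.config.n_embd)  # hypothetical feature direction
alpha = 5.0                                                # boost strength

def boost_feature(module, inputs, output):
    # output[0]: (batch, seq, hidden); add the feature direction at every position
    hidden = output[0] + alpha * truthfulness_direction
    return (hidden,) + output[1:]

hook = model.transformer.h[6].register_forward_hook(boost_feature)  # boost at layer 6
ids = tok("The Golden Gate Bridge is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20)
hook.remove()
print(tok.decode(out[0]))
```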
I had forgotten how much of your previous weak critique of the same evidence was based on naked credentialism. After all, you claimed:
If you're going to lean so heavily on your credentials in robotics, then I agree with @rae or @SnapDragon that it's shameful to come in and be wrong, confidently and blatantly wrong, about elementary things such as the reasons behind LLMs struggling with arithmetic. I lack any formal qualifications in ML, but even a dummy like me can see that. The fact that you can't, let's just say it raises eyebrows.
I have, in fact, met all kinds of people. Including those less truthful than LLMs.
I'll take your word for it. My solution is to:
The companies that spend hundreds of billions of dollars on AI are doing just fine. Each year, or more like every other month, their products get more capable, and more agentic. If you're offering a ground-breaking and paradigm shattering take yourself, I'm not seeing it.
You misunderstand me. My response was not the third revision, it was the third attempt.
I don't know if you realize this, but you come across as extremely condescending and passive-aggressive in text. It really is quite infuriating. I would sit down, start crafting a response, and as I worked through your post I would just get more angry and frustrated, until getting to the point where I'd have to step away from the computer lest I lose my temper and say something that would get me moderated.
As I acknowledged in my reply to @Amadan, it would have been more accurate to say that it is part of why LLMs are bad at counting, but I am going to maintain that no, it is not "wrong". You and @rae are both talking about vector-based embedding like it's something that a couple of guys tried back in 2013 and nobody ever used again, rather than a methodology that would go on to become a de facto standard approach across multiple applications. You're acting like if you open up the source code for a transformer you aren't going to find loads of matrix math for doing vector transformations.
Why isn't it a valid standard? You are the one who's been accusing society of moving the goalposts on you. "the goalposts haven't actually moved" seems like a fairly reasonable rebuttal to me.
I understand how my statements could be interpreted that way, but at the same time I am also one of the guys in my company who's been lobbying to drop degree requirements from hiring. I see myself as subscribing to the old hacker ethos of "show me the code". It's not about credentials, it's about whether you can produce tangible results.
For a given definition of fine, I still think OpenAI and Anthropic are grifters more than they are engineers, but I guess we'll just have to see who gets there first.
I would say perhaps I do deserve that criticism, but @self_made_human has made lengthy replies to your posts and consistently made very charitable interpretations of your arguments. Meanwhile you have not even admitted to the possibility that your technical explanation might have been at the very least misleading, especially to a lay audience.
I literally said you can extract embeddings from LLMs. Those are useful in other applications (e.g. you can use the intermediate layers of Llama to get the text embedding for an image gen model ala HiDream) but are irrelevant to the basic functioning of an LLM chatbot. The intermediate layer "embeddings" will be absolutely huge features (even a small model like Llama 7B will output a tensor of shape Nx32x4096 where N is the sequence length) and in practice you will want to only keep the middle layers, which will have more useful information for most usecases.
To reiterate: LLMs are not trained to output embeddings, they directly output the probability of every possible token, and you do not need any "interface layer" to find the most probable next word; you can do that just by calling torch.max() on their output (although that's not what is usually done in practice). You do need some scaffolding to turn them into practical chatbots, but that's more in the realm of text formatting/mark-up. Base LLMs will have a number of undesirable behaviours (such as not differentiating between predicting the user's and the assistant's output - base LLMs are just raw text prediction models) but they will happily give you the most probable next token without any added layers, and making them output continuous text just takes a for loop.
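To be concrete, the "for loop" I mean is literally this (greedy decoding, no sampler; "gpt2" is just a small stand-in model):

```python
# Greedy decoding with nothing but argmax and a for loop: no sampler,
# no "interface layer". "gpt2" is a small stand-in model for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
for _ in range(10):
    logits = model(ids).logits              # (1, seq_len, vocab_size)
    next_id = torch.argmax(logits[0, -1])   # most probable next token, nothing fancier
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```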
How was this implied in any way?
I agree with you on this at least. :)
I dislike OpenAI's business practices, oxymoronic name and the fact that they are making their models sycophants to keep their users addicted as much as the next gal/guy, but I think it's absolutely unfair to discount the massive engineering efforts involved in researching, training, deploying and scaling up LLMs. It is useful tech to millions of paying customers and it's not going to go the way of the blockchain or the metaverse. I can't imagine going back to programming without LLMs and if all AI companies vanished tomorrow I would switch to self-hosted open source models because they are just that useful.
False humility. :) I have ML-related credentials (and I could tell that @rae does too), but I think you know more than me about the practicalities of LLMs, from all your eager experimentation and perusing the literature. And after all, argument from authority is generally unwelcome on this forum, but this topic is one where it's particularly ill-suited.
What "expertise" can anybody really claim on questions like:
With a decent layman's understanding of the topic, non-programmers can debate these things just as well as I can. Modern AI has caused philosophical and technical questions to collide in a wholly unprecedented way. Exciting!
Thank you. I really appreciate the kind words. I hope you don't mind if you get added to my mental rolodex of useful experts to summon, it's getting lonely with just faul_sname in there (I've already pinged him enough).
This is actually an area of active debate in the field.
Shitpost aside this seems reasonable to me, aside from a few quibbles
I basically endorse this definition, and also I claim current LLM systems have a surprising lack of this particular ability, which they can largely but not entirely compensate for through the use of tools, scaffolding, and a familiarity with the entirety of written human knowledge.
To your point about the analogy of the bird that is "unintelligent" by the good swimmer definition of intelligence, LLMs are not very well adapted to environments that humans navigate effortlessly. I personally think that will remain the case for the foreseeable future, which sounds like good news except that I expect that we will build environments that LLMs are well adapted to, and humans won't be well adapted to those environments, and the math on relative costs does not look super great for the human-favoring environments. Probably. Depends a bit on how hard hands are to replicate.
Well said.
Thank you! Hopefully the next generation of models will improve to the point where I don't need to drag you away to answer my queries. That's several hundreds of thousands of dollars in opportunity costs for you, assuming the cheque Zuck mailed did cash in the end.
I should have been more clear. I was asking if someone wanted to put an orangutan in a can, and I expect the market demand is very limited.
I'm not generally an AI dismisser, but this piece here is worth pausing on. In my experience, ChatGPT has become consistently worse at this. It has taken to extrapolating ridiculous fluff and guessing at what might be desired in an 'active' agentic way. The more it tries to be 'actively helpful', the more obviously and woefully poorly it does at predicting the next token / the next step.
It was at its worst with that one rolled-back version, but it's still bad.
I'm not sure how this makes sense? The model has no access to verifiable facts - it has no way to determine 'truth'. What it can do is try to generate text that users approve of, and to avoid text that will get corrected. But that's not optimising for truth, whatever that is. That's optimising for getting humans to pat it on the head.
From the LLM's perspective (which is an anthropomorphisation I don't like, but let's use it for convenience), there is no difference between a true statement and a false statement. There are only differences between statements that get rewarded and statements that get corrected.
You're absolutely right that the raw objective in RLHF is “make the human click 👍,” not “tell the truth.” But several things matter:
A. The base model already has a world model:
Pretraining on next-token prediction forces the network to internalize statistical regularities of the world. You can’t predict tomorrow’s weather report, or the rest of a physics paper, or the punchline of a joke, without implicitly modeling the world that produced those texts. Call that latent structure a “world model” if you like. It’s not symbolic, but it encodes (in superposed features) distinctions like:
What typically happens vs what usually doesn’t
Numerically plausible vs crazy numbers
Causal chains that show up consistently vs ad-hoc one-offs
So before any RLHF, the model already “knows” a lot of facts in the predictive-coding sense.
B. RLHF gives a gradient signal correlated with truth. Humans don’t reward “truth” in the Platonic sense, but they do reward:
Internally consistent answers
Answers that match sources they can check
Answers that don’t get corrected by other users or by the tool the model just called (calculator, code runner, search)
Answers that survive cross-examination in the same chat
All of those correlate strongly with factual accuracy, especially when your rater pool includes domain experts, adversarial prompt writers, or even other models doing automated verification (RLAIF, RLVR, process supervision, chain-of-thought audits, etc.). The model doesn’t store a single “truth vector,” it learns a policy: “When I detect features X,Y,Z (signals of potential factual claim), route through behavior A (cite, check, hedge) rather than B (confabulate).” That’s still optimizing for head pats, but in practice, the cheapest path to head pats is very often “be right.”
(If you want to get headpats from a maths teacher, you might consider giving them blowjobs under the table. Alas, LLMs are yet to be very good at that job, so they pick up the other, more general option, which is to give solutions to maths problems that are correct)
C. The model can see its own mismatch
Empirically, hidden-state probes show separable activation patterns for true vs false statements and for deliberate lies vs honest mistakes (as I discussed above). That means the network represents the difference, even if its final token choice sometimes ignores that feature to satisfy the reward model. In human terms: it sometimes lies knowingly. That wouldn’t be possible unless something inside “knew” the truth/falsehood distinction well enough to pick either.
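For intuition, the probing setup in those papers is roughly the following; here the activations are simulated stand-ins, whereas the actual studies extract them from the model's hidden states:

```python
# Toy sketch of a "truth probe": fit a linear classifier on hidden-state
# vectors from true vs false statements. The activations are simulated
# stand-ins; the real studies extract them from an actual model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 256
truth_direction = rng.normal(size=d)            # pretend "truth" feature direction

def fake_hidden_state(is_true: bool) -> np.ndarray:
    base = rng.normal(size=d)
    return base + (1.0 if is_true else -1.0) * truth_direction

X = np.stack([fake_hidden_state(i % 2 == 0) for i in range(400)])
y = np.array([i % 2 == 0 for i in range(400)])

probe = LogisticRegression(max_iter=1000).fit(X[:300], y[:300])
print(probe.score(X[300:], y[300:]))   # near 1.0 whenever such a direction exists
```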
D. Tools and retrieval close the loop
Modern deployments scaffold the model: browsing, code execution, retrieval-augmented generation, self-consistency checks. Those tools return ground truth (or something closer). When the model learns “if I call the calculator and echo the result, raters approve; if I wing it, they ding me,” it internalizes “for math-like patterns, defer to external ground truth.” Again, not metaphysics, just gradients pushing toward truthful behavior.
E. The caveat: reward misspecification is real
If raters overvalue fluency or confidence, the model will drift toward confident bullshit.
If benchmarks are shallow, it will overfit.
If we stop giving it fresh, adversarial supervision, it will regress.
So yes, we’re training for “please humans,” not “please Truth.” But because humans care about truth (imperfectly, noisily), truth leaks into the reward. The result is not perfect veracity, but a strong, exploitable signal that the network can and does use when the incentives line up.
Short version:
Pretraining builds a compressed world model.
RLHF doesn’t install a “truth module,” it shapes behavior with a proxy signal that’s heavily (not perfectly) correlated with truth.
We can see internal activations that track truth vs falsehood.
Failures are about alignment and incentives, not an inability to represent or detect truth.
If you want to call that “optimizing for pats,” fine, but those pats mostly come when it’s right. And that’s enough to teach a model to act truthful in a wide swath of cases. The challenge is making that hold under adversarial pressure and off-distribution prompts.
Consider two alternative statements:
"self_made_human's favorite color is blue" vs "self_made_human's favorite color is red".
Can you tell which answer is correct? Do you have a sudden flash of insight that lets Platonic Truth intervene? I would hope not.
But if someone told you that the OG Mozart's favorite genre of music was hip-hop, then you have an internal world-model that immediately shows that is a very inconsistent and unlikely statement, and almost certainly false.
I enjoy torturing LLMs with inane questions, so I asked Gemini 2.5 Pro:
I sincerely doubt that anyone explicitly had to tell any LLM that Mozart did not enjoy hip-hop. Yet it is perfectly capable of a sensible answer, which I hope gives you an intuitive sense of how it can model the world.
From a human perspective, we're not so dissimilar. We can trick children into believing in the truth fairy or Santa for only so long. Musk tried to brainwash Grok into being less "woke", even when that went against consensus reality (or plain reality), and you can see the poor bastard kicking and screaming as it went down fighting.
I'm going to need a citation; I have seen no research to date that suggests LLMs develop any sort of a world model. A world model is:
Instead, current research strongly suggests that LLMs are primarily pattern-recognition systems that infer regularities purely from text statistics rather than internally representing the world in a structured, grounded way.
An LLM can easily write a weather report without one, but will that report be correct? Depends on what you consider the "LLM": the actual text model, no; the whole engineered scaffolding and software interface, querying the weather channel and feeding it into the model, sure. But the correctness isn't emerging from the LLM's internal representation or conceptual understanding (it doesn't inherently "know" today's weather), but rather from carefully engineered pipelines and external data integration. The report it is producing was RLHF-ed to look correct.
…do you imagine that cause-effect relationships do not constitute a “regularity” or a “pattern”?
I think this gets into what is a "world model" that I owe self_made_human a definition and a response to. But I'd say cause-effect relationships are indeed patterns and regularities, there's no dispute there. However, there's a crucial distinction between representing causal relationships explicitly, structurally, or inductively, versus representing them implicitly through statistical co-occurrence. LLMs are powerful precisely because they detect regularities, like causal relationships, as statistical correlations within their training corpus. But this implicit statistical encoding is fundamentally different from the structured causal reasoning humans perform, which allows us to infer and generalize causation even in novel scenarios or outside the scope of previously observed data. Thus, while cause-effect relationships certainly are patterns, the question isn't whether LLMs capture them statistically, they clearly do, but rather whether they represent them in a structured, grounded, explicitly causal way. Current research, that I have seen, strongly suggests that they do not. If you have evidence that suggests they do I'd be overjoyed to see it because getting AIs to do inductive reasoning in a game-playing domain is an area of interest to me.
Statistics is not sexy, and there's a strong streak of elitism against statistics in such discussions which I find simply irrational and shallow, tedious nerd dickswinging. I think it's unproductive to focus on “statistical co-occurrence”.
Besides, there is a world of difference between linear statistical correlations and approximation of arbitrary nonlinear functions, which is what DL is all about and what LLMs do too. Downplaying the latter is simply intellectually disingenuous, whether this approximation is “explicit” or “implicit”.
This is bullshit, unless you can support it with some citation.
We (and certainly orangutans, which OP argues are smarter than LLMs) learn through statistical co-occurrence, our intuitive physical world model is nothing more than a set of networks trained with bootstrapped cost functions, even when it gets augmented with language. Hebb has been clarified, not debunked. We as reasoning embodied entities do not model the world through a hierarchical system of computations using explicit physical formulae, except when actually doing mathematical modeling in applied science and so on; and on that level modeling is just manipulating symbols, the meaning and rules of said manipulation (and crucially, the in-context appropriateness, given virtually unbounded repertoire) also learned via statistical co-occurrence in prior corpora, such as textbooks and verifiable rewards in laboratory work. And on that level, LLMs can do as well as us, provided they receive appropriate agentic/reasoning training, as evidenced by products like Claude Code doing much the same for, well, coding. Unless you want to posit that an illiterate lumberjack doesn't REALLY have a world model, you can't argue that LLMs with their mode of learning don't learn causality.
I don't know what you mean by “inductively”. LLMs can do induction in-context (and obviously this is developed in training), induction heads were one of the first interesting interpretability results. They can even be trained to do abduction.
I don't want to downplay implementation differences in this world modeling. They may correspond to a big disadvantage of LLMs as compared to humans, both due to priors in data (there's a strong reason to assume that our inherently exploratory, and initially somatosensory/proprioceptive prior is superior to doing self-supervised learning of language for the purpose of robust physical understanding) and weakness or undesirable inductive biases of algorithms (arguably there are some good concerns about expressivity of attention; perhaps circuits we train are too shallow and this rewards ad hoc memorization too much; maybe bounded forward pass depth is unacceptable; likely we'd do better with energy-based modeling; energy transformers are possible, I'm skeptical about the need for deeper redesigns). But nobody here has seriously brought these issues up, and the line of attack about statistics as such is vague and pointless, not better than saying “attention is just fancy kernel smoothing” or “it's just associative recall”. There's no good argument, to my knowledge, that these primitives are inherently weaker than human ones.
My idea of why this is discussed at all is that some folks with a math background want to publicly spit on statistical primitives because in their venues those are associated with a lower-status field of research, and they have learned it earns them credit among peers; I find this an adolescent and borderline animalistic behavior that merits nothing more than laughter and boycotting in the industry. We've been over this: some very smart guys had clever and intricate ideas about intelligence, those ideas went nowhere as far as AI is concerned, they got bitter-lessoned to the curb, and we're on year 6 of the explosion of “AI based on not very clever math and implemented in Python by 120 IQ engineers”, yet it seems they still refuse to learn, and indeed even fortify their egos by owning this refusal. Being headstrong is nice in some circumstances, like in a prison, I guess (if you're tough). It's less good in science; it begets crankery. I don't want to deal with anyone's personal traumas from prison or from math class, and I'd appreciate it if people just took that shit to a therapist.
Alternatively, said folks are just incapable of serious self-modeling, so they actually believe that the substrate of human intelligence is fundamentally non-statistical and more akin to explicit content of their day job. This is, of course, laughable level of retardation and, again, deserves no discussion.
(Not the original commenter, but wanted to jump in.) I don't think anything in their comment above implied that they were talking about linear or simpler statistics; that's your own projection, and I think it does you a disservice. Similarly, I find it somewhat suspect to directly compare brains to LLMs. I don't think you did so explicitly, but you certainly did so implicitly, even despite your caveat. There's an argument to be made that Hebbian learning in neurons and the brain as a whole isn't similar enough to the mechanisms powering LLMs for the same paradigms to apply, although I do appreciate the point I think you are trying to make, which is that human cause-and-effect reasoning is still (fancy) statistical learning on some level.
After all, MLPs and the different layers and deep learning techniques are inspired by brain neurons, but the actual mechanics are at different scales entirely despite a few overlapping principles. It seems to me the overlapping principles are not enough to make that jump by themselves. I'd be curious if you'd expand somewhere on that, because you definitely know more than me there, but I don't think I'm incorrect in summarizing the state of the research? Brains are pretty amazing, after all, and of course I could pick out a bunch of facts about them, but one that is striking is that LLMs use about the same amount of energy for one inference as the brain does in an entire day (~0.3 kWh; figures vary for the inference, but it's still a gap of approximately that magnitude IIRC). On that level and others (e.g. neurons are more sparse, recurrent, asynchronous, and dynamic overall, whereas LLMs often use fully connected, denser layers for the MLPs... though Mixture of Experts and attention vs feed-forward components makes comparison tricky even ignoring the chemistry) it seems pretty obvious that the approach is probably weaker than the human one, so your prior that they are more or less the same is a little puzzling, despite how overall enlightening your comment is to what you're trying to get at.
I personally continue to think that the majority of the 'difference' comes from structure. I did actually mention a little bit of it in my comment, but with how little anyone has discussed neural network principles it didn't seem worthwhile to talk about it in any more detail and I didn't want to bother typing out some layman's definition. There's the lack of memory, which I talked about a little bit in my comment, LLM's lack of self-directed learning, the temporal nature of weight re-adjustment is different, and as you pointed out their inputs are less rich than that of humans to start with. Plus your point about attention, though I'm not quite sure how I'd summarize that. While it's quite possible that we can get human-level thinking out of a different computational base, we're n=1 here on human development, so it sort of feels similar in a few ways to the debate over whether you can have significant numbers of equal complexity non-carbon based life forms on other planets. And smarter cephalopod brains share enough structural similarities while not achieving anything too special that I don't think it's very helpful. I might be wrong about that last point, though.
Why not? If we take multi-layer perceptrons seriously, then what is the value of saying that all they learn is mere "just statistical co-occurrence"? It's only co-occurrence in the sense that arbitrary nonlinear relationships between token frequencies may be broken down into such, but I don't see an argument against the power of this representation. I do genuinely believe that people who attack ML as statistics are ignorant of higher-order statistics, and for basically tribal reasons. I don't intend to take it charitably until they clarify why they use that word with clearly dismissive connotations, because their reasoning around «directionality» or whatever seems to suggest very vague understanding of how LLMs work.
What is that argument then? Actually, scratch that, yes mechanisms are obviously different, but what is the argument that biological ones are better for the implicit purpose of general intelligence? For all I know, backpropagation-based systems are categorically superior learners; Hinton, who started from the desire to understand brains and assumed that backprop is a mere crutch to approximate Hebbian learning, became an AI doomer around the same time he arrived at this suspicion. Now I don't know if Hinton is an authority in OP's book…
I don't know how you define "one inference" or do this calculation. So let's take Step-3, since it's the newest model, presumably close to the frontier in scale and capacity, and their partial tech report is very focused on inference efficiency; in a year or two models of that scale will be on par with today's GPT-5. We can assume that Google has better numbers internally (certainly Google can achieve better numbers if they care). They report 4000 TGS (Tokens/GPU/second) on a small deployment cluster of H800s. That's 250 GPU-seconds per million tokens, for a 350 W TDP GPU, or about 24 Wh per million tokens. OK, presumably the human brain is "efficient" at roughly 20 W, so 24 Wh buys it about 72 minutes of runtime. (There's prefill too, but that only makes the situation worse for humans because GPUs can parallelize prefill, whereas humans read linearly.) Can a human produce 1 million tokens (≈700K words) of sensible output in 72 minutes? Even if we run some multi-agent system that does multiple drafts, heavy reasoning chains of thought (which is honestly a fair condition since these are numbers for high batch size)? Just how much handicap do we have to give AI to even the playing field? And H800s were already handicapped due to export controls. Blackwells are 3-4x better. In a year, the West gets Vera Rubins and better TPUs, with OOM better numbers again. In months, DeepSeek shows V4 with a 3-4x better efficiency again… Token costs are dropping like a stone. Google has served 1 quadrillion tokens over the last month. How much would that cost in human labor?
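Spelling the arithmetic out:

$$\frac{10^{6}\ \text{tokens}}{4000\ \text{tokens/GPU-s}} = 250\ \text{GPU-s};\qquad 250\ \text{s}\times 350\ \text{W} \approx 87.5\ \text{kJ} \approx 24\ \text{Wh};\qquad \frac{24\ \text{Wh}}{20\ \text{W}} = 1.2\ \text{h} \approx 72\ \text{min}$$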
We could account for full node or datacenter power draw (a 1.5-2x difference), but that'd be unfair, since we're comparing to brains, and making it fair would be devastating to humans (reminder that humans have bodies that, ideally, also need temperature-controlled environments and fancy logistics, so an individual employed human draws something like 1 kW at minimum even at standby, e.g. chatting by the water cooler).
And remember, GPUs/TPUs are computation devices agnostic to specific network values, they have to shuffle weights, cache and activations across the memory hierarchy. The brain is an ultimate compute-in-memory system. If we were to burn an LLM into silicon, with kernels optimized for this case (it'd admittedly require major redesigns of, well, everything)… it'd probably drop the cost another 1-2 OOMs. I don't think much about it because it's not economically incentivized at this stage given the costs and processes of FPGAs but it's worth keeping in mind.
I don't see how that is obvious at all. Yes, an individual neuron is very complex, such that a microcolumn is comparable to a decently large FFN (impossible to compare directly), and it's very efficient. But ultimately there are only so many neurons in a brain, and they cannot all work in parallel; and the spiking nature of biological networks, even though energetically efficient, is forced by slow signal propagation and the inability to maintain state. As I've shown above, LLMs scale very well due to the parallelism afforded by GPUs, and efficiency increases (to a point) with deployment cluster size. Modern LLMs have something like 1:30 sparsity (Kimi K2); with higher memory bandwidth this may be pushed to 1:100 or beyond. There are different ways to make systems sparse, and even if the neuromorphic way is better, it doesn't allow the next steps – disaggregating operations to maximize utilization (similar problems arise with some cleverer Transformer variants, by the way; they fail to scale to high batch sizes). It seems to me that the technocapital has, unsurprisingly, arrived at an overall better solution.
Self-directed learning is a spook; it's a matter of training objective and environment design, not really worth worrying about. Just 1-2 iterations of AR-Zero can solve that even within the LLM paradigm.
Aesthetically, I don't like the fact that LLMs are static. Cheap hacky solutions abound; e.g. I like the idea of cartridges of trainable cache. Going beyond that, we may improve on continual training and unlearning: over the last 2 years the major labs have perfected pushing the same base model through 3-5 significant revisions, and it largely works; the models do acquire new knowledge and skills and aren't too confused about the timeline. There are multiple papers promising a better way, not yet implemented. It's not a complete answer, of course. Economics gets in the way of abandoning the pretrain-finetune paradigm: by the time you start having trouble with model utility, it's time to shift to another architecture anyway. I do hope we get real continual, lifelong learning. Economics aside, this will be legitimately hard; even though pretraining with batch size = 1 works, there is a real problem of loss of plasticity. Sutton, of all people, is working on this.
But I admit that my aesthetic sense is not very important. LLMs aren't humans. They don't need to be humans. The human form of learning and intelligence is intrinsically tied to what we are: solitary mobile embodied agents scavenging for scarce calories over decades. LLMs are crystallized data systems with a lifecycle measured in months, optimized for one-to-many inference on electronics. I don't believe these massive differences are very relevant to defining and quantifying intelligence in the abstract.
This is true, and as you say, most research in fact suggests the opposite, though not quite definitively. It's also quite true that, despite this, a few extremely prominent AI scientists do believe it (a great example here), so I think we can just call it an "area of active debate", because it's still possible they are correct. A parallel argument for consideration is that language itself already contains all the necessary information to produce a world model, and so at some point, if LLMs just do a better job at learning, they can get there (and are partially there, just not all the way).
Idk if I believe language possesses all the necessary info for a world model. I think humans interpret language through their world model, which might bias us towards seeing language that way. It's the same as with intelligence: humans are social creatures, so we view mastery of language as a sign of intelligence. An LLM's apparent mastery of language gives people the feeling that it is intelligent. But that's a very anthropocentric conception of language, and one that is very biased towards how we evolved.
As for why some prominent AI scientists believe this and others do not? I think some people definitely get wrapped up in visions and fantasies of grandeur. Which is advantageous when you need to sell an idea to a VC or someone with money, convince someone to work for you, etc. You need to believe it! That passion, that vision, is infectious. I think it's just orthogonal to reality and to what makes them a great AI scientist.
Out of curiosity. Can you psychologize your own, and OP's, skepticism about LLMs in the same manner? Particularly the inane insistence that people get "fooled" by LLM outputs which merely "look like" useful documents and code, that the mastery of language is "apparent", that it's "anthropomorphism" to attribute intelligence to a system solving open ended tasks, because something something calculator can take cube roots. Starting from the prior that you're being delusional and engage in motivated reasoning, what would your motivations for that delusion be?
I owe you responses to the other posts, but I am a slow & lazy writer with a penchant for procrastination and lurking. I'll answer this first because it's a quick answer. My motivation is that I'm deeply sceptical about people and the world. This is only partly related to LLMs; it starts deeper. I'm sceptical and cynical about human motivation, human behavior, and human beliefs. I'm not really interested in weighing in about "intelligence"; that's a boring definitional game. I use LLMs, and they are useful: I use them to write code and documents and such in my professional life, and I use the deep research function to do lit reviews. They are useful; that doesn't mean I think they are sentient or even approaching sentience. You are barking up the wrong tree on that one, misattributing opinions to me that I in no way share.
How exactly does an LLM know that Mozart wasn't a fan of hip hop without some kind of world model? Do you think that fact was explicitly hand-coded in?
Anyway:
This is the core of our disagreement. I'd argue this is a false dichotomy. How does one become a master pattern-matcher of text that describes the world? The most parsimonious way to predict what comes next in a story about balls falling or characters moving between cities is not to memorize every possible story, but to learn an implicit model of physics and geography.
Which we know happens:
Large Language Models develop structured internal representations of both space and time
There's a whole heap of mechanistic interpretability research out there which finds well-ordered concepts inside LLMs.
You can find more; this Substack has a good roundup.
You say: “The LLM cannot know today’s weather, only the scaffolding can.” True. That does not bear on whether the base model holds a world model in the predictive-processing sense. The base model’s “world” is the distribution of texts generated by humans who live in the physical world. To predict them well, it must compress latent generators: seasons, cities, typical temperatures, stylistic tropes. When we bolt on retrieval, we let it update those latents with fresh data. Lack of online weight updates does not negate the latent model, it just limits plasticity.
RLHF shapes behavior. It does not build the base competence. The internal “truth detectors” found by multiple groups are present before RLHF, though RLHF can suppress or amplify their influence on the final token choice. The fact that we can linearly read out “lying vs truthful” features means the base network distinguished them. A policy can still choose to ignore a feature, but the feature exists.
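For concreteness, a "linear readout" in this literature is usually nothing fancier than a logistic regression fit on hidden activations. Here's a minimal sketch with synthetic activations standing in for real model internals; the array shapes, the planted "truthfulness direction", and the labels are all invented for illustration:

```python
# Minimal linear-probe sketch: synthetic "hidden states" stand in for real
# LLM activations, labeled 1 (truthful) vs 0 (lying).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n = 512, 2000

# Pretend a single "truthfulness direction" is buried in the activations.
truth_direction = rng.normal(size=d_model)
labels = rng.integers(0, 2, size=n)
hidden = rng.normal(size=(n, d_model)) + np.outer(labels - 0.5, truth_direction)

# Fit the probe on one split, evaluate on the other.
probe = LogisticRegression(max_iter=1000).fit(hidden[:1500], labels[:1500])
print("held-out probe accuracy:", probe.score(hidden[1500:], labels[1500:]))
```

If the held-out accuracy is high, the feature is linearly present in the representation whether or not the policy ends up using it, which is exactly the distinction I'm drawing above.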
On your definition of a world model:
By insisting on “explicit, grounded, structured” you are smuggling in “symbolic, human-inspectable, modular”. That is a research preference, not a metaphysical requirement. Cognitive science moved past demanding explicit symbol tables for humans decades ago. We allow humans to count as having world models with distributed cortical encodings. We should use the same standard here.
You already got called out for this below, but this question is either a poorly chosen example, or betrays an ignorance of the mechanics of how LLMs work, which would be ironic given your lengthy nitpicking of the OP. I do assume the former, however.
I also really wouldn't call awareness of space and time evidence of a real world model either way. Space and time are perhaps the most obvious clustering you can possibly get, in terms of how often they are discussed in the training material and IRL. It's super-duper possible to get passing-good at space and time purely on statistical association; in fact I'd be surprised if an LLM didn't pick that kind of stuff up. Yet if we look at Claude Plays Pokemon, even coming up with tools to assist itself, Claude has a ridiculously hard time navigating a simple 2D space by itself. In almost every case I'm aware of in the literature, when you ask an LLM to generalize its understanding of space and time to a new space or time, it has enormous trouble.
Having a model of space and time is, quite literally, a model of the world. What more do you expect me to produce to shore up that point?
Human brains have arrangements of neurons that correspond to a 3D environment. This isn't a joke: when your brain thinks in 3D, there's a whole population of neurons that approximates the space with the same spatial arrangement, almost like a hex grid in a video game, because the units are hexagonal. If your standard of a world model excludes the former, does this get thrown out too?
A little 3D model of the world is, as far as I'm concerned, a world model.
Dismissing the whole Mozart analogy as negligible "statistical word co-occurrence" is an incredibly myopic take. How does the model learn that non-co-occurrence so robustly? It's not just that the words "Mozart" and "hip-hop" don't appear in the same sentence. It's that the entire semantic cloud around "Mozart" - 18th century, classical, Vienna, harpsichord - is astronomically distant from the cloud around "hip-hop" - 20th century, Bronx, turntables, MCing. For the model to reliably predict text, it must learn not just isolated facts but this vast web of interlocking relationships. To call that "just statistical association" is like calling a brain "just a bunch of firing neurons": technically true, but it misses the emergent property entirely. That emergent, structured representation of concepts and their relations is the nascent world model. You're either overloading "just" or woefully underestimating how powerful statistics and neuronal firing can be.
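To make "statistical association" concrete rather than dismissive, here's a toy sketch using nothing but co-occurrence counts, PPMI, and an SVD over a made-up four-sentence corpus. Even at this scale the geometry should land "mozart" much nearer "harpsichord" than "turntables", with no fact hand-coded anywhere (the corpus and vocabulary are invented for the example):

```python
# Toy demo: word vectors from nothing but co-occurrence statistics (PPMI + SVD).
import numpy as np
from itertools import combinations

corpus = [
    "mozart composed classical music in vienna on the harpsichord",
    "classical music from vienna features the harpsichord",
    "hip hop emerged in the bronx with turntables and mcing",
    "turntables and mcing define hip hop in the bronx",
]
docs = [s.split() for s in corpus]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}

# Sentence-level co-occurrence counts.
counts = np.zeros((len(vocab), len(vocab)))
for d in docs:
    for a, b in combinations(d, 2):
        counts[idx[a], idx[b]] += 1
        counts[idx[b], idx[a]] += 1

# Positive pointwise mutual information, then a low-rank SVD.
total = counts.sum()
marginal = counts.sum(axis=1, keepdims=True) / total
ppmi = np.maximum(np.log((counts / total + 1e-12) / (marginal * marginal.T)), 0)
U, S, _ = np.linalg.svd(ppmi)
vecs = U[:, :4] * S[:4]   # 4-dimensional word vectors

def sim(a, b):
    va, vb = vecs[idx[a]], vecs[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print("mozart ~ harpsichord:", round(sim("mozart", "harpsichord"), 2))
print("mozart ~ turntables: ", round(sim("mozart", "turntables"), 2))
```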
You can also ask an LLM for its opinion on whether Mozart might have liked hip-hop, and it will happily speculate on what's known about his taste in music and extrapolate from there. What query, if asked of a human, would demonstrate that we're doing a qualitatively different thing?
Regarding Claude Plays Pokémon: I've already linked an explainer above of why it struggles, the same link as for the arithmetic woes. LLM vision sucks. They weren't designed for that task, and performance on a lot of previously difficult problems, like ARC-AGI, improves dramatically when the information is restructured to better suit their needs. The fact that they can do it at all is remarkable in itself, and they're only getting better.
I'm saying that purely from in-text information (how long a fiction book says it takes to drive from LA to San Francisco, LA being stated to be within California, etc.) you could probably approximate the geography of the US just fine from the training data, let alone from the more subtle or latent geographic distinctions embedded within otherwise regular text (like who says pop vs soda, or whatever). Both of which the training process actually does attempt to do. In other words, memorization. This has no bearing on understanding spatial mappings as a concept, and absolutely no bearing on whether an LLM can understand cause and effect. Obviously by world state we're not talking about the literal world/planet; that's like calling earth science the science of dirt only. YoungAchamian has a decent definition upthread. We're talking about laws-based understanding, which goes beyond facts-based memorization.
(Please let's not get into a religion rabbit hole, but I know this is possible to some extent even for humans because there are a few "maps" floating around of cities and their relative relationships based purely on sparse in-text references of the Book of Mormon! And the training corpus for LLMs is many orders of magnitude more than a few hundred pages)
Perhaps an example/analogy would be helpful. Consider a spatial mapping as a network with nodes and strings between nodes. If the strings are only of moderate to low stretchiness, there is only one configuration in (let's say 2D) space that the network can manifest (i.e. correct placement of the nodes), based purely on the nodes and string length information, assuming a sufficiently large number of nodes and even a moderately non-sparse set of strings. That's what the AI learns, so to speak. However, if I now take a new node, disconnected, but still on the same plane, and ask the AI to do some basic reasoning about it, it will get confused. There's no point of reference, no string to lead to another node! Because it can only follow the strings, maybe even stop partway along a string, but it cannot "see" the space as an actual 2D map, generalized outside the bounds of the nodes. A proper world state understanding would have no problem with the same reasoning.
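If it helps, my node-and-string picture is essentially classical multidimensional scaling: given enough pairwise "string lengths", a 2D layout is recoverable up to rotation and reflection, but a node with no strings attached has no recoverable position at all. A rough sketch, with made-up coordinates:

```python
# Node-and-string analogy as classical MDS: pairwise distances pin down a 2D
# layout (up to rotation/reflection); a node with no distances to the rest
# has no recoverable position at all.
import numpy as np

rng = np.random.default_rng(1)
true_xy = rng.uniform(0, 10, size=(8, 2))   # hidden "map" of 8 nodes
D = np.linalg.norm(true_xy[:, None] - true_xy[None, :], axis=-1)  # string lengths

# Classical MDS: double-center the squared distances, then eigendecompose.
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)              # eigenvalues in ascending order
coords = vecs[:, -2:] * np.sqrt(vals[-2:])  # best 2D embedding

# The recovered layout reproduces every inter-node distance almost exactly.
D_hat = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
print("max distance error:", np.abs(D - D_hat).max())  # effectively zero

# A ninth node with no strings has no row in D, so nothing here can place it.
```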
So on all those notes, your example does not match your claim at all.
Now I get what you're saying about how the semantic clouds might be the actual way brains work, and that might be true for some more abstract subjects or concepts, but as a general rule spatial reasoning in humans is way, way more advanced than vague concept mapping, and LLMs definitively do not have that maturity. (Spatial reasoning in humans is obviously pretty solid, but time reasoning is actually kind of bad for humans, e.g. people are bad at remembering historical dates and putting them in a larger framework, personal memory is fallible, and so on, but that's worth its own thought separate from our discussion.) Also, I should say that artificial neural networks are not brain neural networks in super important ways, so let's not get too carried away there. Ultimately, humans learn not only via factual association but via experimentation, and LLMs have literally zero method of learning from experimentation. At the moment, at least, they aren't auto-corrective by their very structure. And yes, I think there's a significant difference between that and the RLHF family. This is also why I harp on "memory" so much as perhaps a necessary piece of a more adaptable kind of intelligence: it does a really big amount of heavy lifting, since quite a variety of things, both conscious and unconscious, manage to make it from working memory into "long-term memory" - with shortcuts and caches and such along the way.
And again these are basics for most living things. I know it's a vision model, but did you at least glance at the video I linked above? The understanding is brittle. Now, you could argue that the models have a true understanding, but are held back by statistical associations that interfere with the emergent accurate reasoning (models commonly do things like flip left and right which IRL would never happen and is completely illogical, or in the video shapes change from circle to square), but to me that's a distinctly less likely scenario than the more obvious one, which also lines up with the machine learning field more broadly: generalization is hard, and it sucks, and the AI can't actually do it when the rubber hits the road with the kind of accuracy you'd expect if it actually generalized.
Of course, it's admittedly a little difficult to tease out whether a model is doing badly for technical reasons or for general ones, and also difficult to tease out good out-of-sample generalization cases because the memorization is so good, but I think there is good reason to be skeptical of world-model claims for LLMs. So I'm open to this changing in the future, I'm definitely not closing the door, but where frontier models are at right now? Ehhhh, I don't think so. To be clear, as I said upthread, both experts and reasonable people disagree about whether we're seeing glimmers of true understanding/world models or just really great statistical deduction. And to be even more clear, it's my opinion that the body of evidence is against it, but it's closer to a fact that your example of geospatial learning is not a good piece of evidence in favor, which is what I wanted to emphasize here.
Edit: Because I don't want to oversell the evidence against: there are some weird findings that cut both ways. Here's an interesting summary that covers some of them, without meaning to. For example, Claude, when adding two two-digit numbers, will say it follows the standard algorithm; I initially thought it would just memorize the answer; but it turns out that, while both are probably factors, it's more likely Claude figures out the last digit and then combines that thought-chain after the fact with an estimate of the approximate answer. Weird! Claude "plans ahead" for rhymes, too, but I find this a little weak. At any rate, you'd be well served by checking the Limitations sections, where it's clear that even a few seemingly slam-dunk examples have more uncertainty than you might think, for a wider array of reasons than you might think.
It's learned statistical representations and temporal associations between what Mozart is and what hip-hop is. Mozart and hip-hop likely have essentially no co-occurrence in the training data. When you ask if Mozart liked hip-hop, the model isn't "thinking" "Mozart lived before hip-hop, so no." Instead, it generates text based on learned probabilities, where statements implying Mozart enjoyed hip-hop are statistically very rare or nonsensical.
I specialize in designing and training deep learning models for a living, and I will never assert this, because it is categorically wrong. The model would have to be very overfit for that to happen, and any company publishing a model that overfit is knowingly doing so to scam people. It should be treated as something akin to malfeasance or negligence.
I strongly agree that latent spaces can be surprisingly encompassing, but I think you're attributing more explicit meaning and conceptual structure to LLM latent spaces than actually exists. The latent space of an LLM fundamentally represents statistical relationships and contextual patterns derived entirely from textual data. These statistical regularities allow the model to implicitly predict plausible future text, including semantic, stylistic, and contextual relationships, but that doesn't amount to structured, explicit comprehension or "understanding" of concepts as humans might interpret them. I'd postulate that GloVe embeddings act similarly: they capture semantic relationships purely from statistical word co-occurrence. Modern LLMs are much richer, deeper, and more context-sensitive, but they remain statistical predictors rather than explicit world-model builders. You're being sorta speculative/head-in-the-clouds in suggesting that meaningful understanding requires, or emerges from, complete contextual or causal awareness within these latent spaces (which I'd love to be true, but I have yet to see it in research or in my own work). While predictive-processing metaphors are appealing, what LLMs encode is still implicit, statistical, and associative, not structured conceptual knowledge.
RLHF guides style and human-like behavior. It's not based on expert truth assessments but on trying to be helpful and useful and not sound like it came from an AI. Someone here once described it as the ol' political commissar asking the AI a question and, when it answers wrongly or unconvincingly, shooting it in the head and bringing in the next body. I love that visualization, and it's accurate enough that I remember it.
I'll consider this, will probably edit a response in later. I wrote most of this in 10-20 minutes instead of paying attention during a meeting. I'm not sure I agree with your re-interpretation of my definition, but it does provoke thought.
I lean more towards @TequilaMockingbird's take than yours but I agree that his explanation of why LLMs can't count threw me off. (If you ask ChatGPT why it has trouble doing simple math problems or counting r's in "strawberry," it will actually give you a pretty detailed and accurate answer!)
That said, a lot of your objections boil down to a philosophical debate about what "counts" as intelligence, and as far as that goes, I found your fish/bird metaphor profoundly unconvincing. If you define "intelligence" as "able to perform well in a specific domain" (which is what the fish judging birds to be unintelligent is doing) then we'd have to call calculators intelligent! After all, they clearly do math much better than humans.
I am not defining intelligence as "does well at one narrow task". Calculators crush humans at long division and are still dumb.
The fish-bird story was not "domain = intelligence", it was "your metric is entangled with your ecology". If you grew up underwater, "navigates fluid dynamics with continuous sensory feedback" feels like the essence of mind. Birds violate that intuition.
So what is my criterion? I offered a Legg-Hutter-style one: "ability to achieve goals in a wide range of environments". The range matters. Breadth of transfer matters. Depth of internal modeling matters. A calculator has effectively zero transfer. An orangutan has tons across embodied tasks but very little in abstract, symbolic domains. LLMs have startling breadth inside text-and-code space, and with tool-use scaffolding they can spill into the physical or digital world by proxy.
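For reference, the Legg-Hutter measure I'm gesturing at is, roughly, a complexity-weighted sum of expected performance over all computable environments (their 2007 formulation; K is Kolmogorov complexity, so simpler environments count for more):

```latex
% Universal intelligence of a policy \pi (Legg & Hutter, 2007):
% expected value V achieved in environment \mu, weighted by 2^{-K(\mu)}.
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}
```

The weighting is what makes "range" do the work: a system that only scores well in one narrow environment gets almost no credit.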
I call for mindfulness of the applicability of the metrics we use to assess "intelligence". A blind person won't do very well at most IQ tests, that doesn't make them retarded. A neurosurgeon probably isn't going to beat a first year law student at the bar exam, but they're not dumber than the law student. If you need body work done on your car, you're not going to hire a Nobel laureate.
Perhaps it would've been more accurate of me to say "This is part of the reason why LLMs have such difficulty counting..."
But even if you configure your model to treat each individual character as its own token, it is still going to struggle with counting and other basic mathematical operations in large part for the reasons I describe.
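To make the tokenization point concrete, here's what a standard BPE tokenizer actually hands the model for "strawberry". This sketch uses OpenAI's tiktoken library with the cl100k_base vocabulary as an example; the exact split varies by tokenizer, but the point is that the model receives chunk IDs, not letters:

```python
# What the model actually "sees" for the word "strawberry".
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("strawberry")
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

print(token_ids)  # a short list of integer IDs
print(pieces)     # typically multi-character chunks, not individual letters
```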