
Is your "AI Assistant" smarter than an Orangutan? A practical engineering assessment

At the risk of doxxing myself, I have an advanced degree in Applied Mathematics. I have authored and contributed to multiple published papers, and hold a US patent, all related to the use of machine learning in robotics and digital signal processing. I am currently employed as a supervising engineer at a prominent tech company. For pseudonymity's sake I am not going to say which, but it is a name that you would recognize. I say this not to brag, but to establish some context for the following.

Imagine that you are someone who is deeply interested in space flight. You spend hours of your day thinking seriously about Orbital Mechanics and the implications of Relativity. One day you hear about a community devoted to discussing space travel and are excited at the prospect of participating. But when you get there, what you find is a Star Trek fan-forum that is far more interested in talking about the Heisenberg compensators on fictional warp drives than in Hohmann transfers, thrust-to-ISP curves, or the effects of low gravity on human physiology. That has essentially been my experience trying to discuss "Artificial Intelligence" with the rationalist community.

However at the behest of users such as @ArjinFerman and @07mk, and because X/Grok is once again in the news, I am going to take another stab at this.

Are "AI assistants" like Grok, Claude, Gemini, and DeepSeek intelligent?

I would say no, and in this post I am going to try to explain why, but to do so requires a discussion of what I think "intelligence" is and how LLMs work.

What is Intelligence
People have been philosophizing on the nature of intelligence for millennia, but for the purposes of our exercise (and my work) "intelligence" is a combination of perceptivity and reactivity. That is to say, the ability to perceive or take in new and/or changing information combined with the ability to change state based on that information. Both are necessary, and neither is sufficient on its own. This is why Mathematicians and Computer Scientists often emphasize the use of terms like "Machine Learning" over "Artificial Intelligence", as an algorithm's behavior is almost never both.

If this definition feels unintuitive, consider it in the context of the following example. What I am saying is that an orangutan who waits until the Zookeeper is absent to use a tool to force the lock on its enclosure is more "intelligent" than the insect that repeatedly throws itself against your kitchen window in an attempt to get outside. While they share an identical goal (to get outside), the orangutan has demonstrated the ability to both perceive obstacles (i.e. the lock and the Zookeeper) and react dynamically to them in a way that the insect has not. Now obviously these qualities exist on a spectrum (try to swat a fly and it will react), but the combination of these two parameters defines an axis along which we can work to evaluate both animals and algorithms, and as any good PM will tell you, the first step to solving any practical engineering problem is to identify your parameters.

Now the most common arguments for AI assistants like Grok being intelligent tend to be some variation on "Grok answered my question, ergo Grok is intelligent" or "Look at this paragraph Claude wrote, do you think you could do better?", but when evaluated against the above parameters, the ability to form grammatically correct sentences and the ability to answer questions are both orthogonal to them. An orangutan and a moth may be equally incapable of writing a Substack, but I don't expect anyone here to seriously argue that they are equally intelligent. By the same token a pocket calculator can answer questions, "what is the square root of 529?" being one example of such, but we don't typically think of pocket calculators as being "intelligent", do we?

To me, these sorts of arguments betray a significant anthropomorphic bias. That bias being the assumption that anything a human finds complex or difficult must be computationally complex, and vice versa. The truth is often the inverse. This bias leads people who do not have a background in math or computer science to have completely unrealistic impressions of what sorts of things are easy or difficult for a machine to do. For example, vector and matrix operations are a reasonably simple thing for a computer that a lot of human students struggle with. Meanwhile bipedal locomotion is something most humans do without even thinking, despite it being more computationally complex and prone to error than computing a cross product.
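To make the asymmetry concrete, here is the "hard for students, easy for machines" direction as a minimal sketch in Python (the specific vectors are arbitrary, chosen only for illustration):

```python
import numpy as np

# A cross product, the kind of exercise that trips up plenty of linear algebra
# students, is a single cheap library call for a computer.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
print(np.cross(a, b))  # [0. 0. 1.]
```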

Speaking of vector operations, let's talk about how LLMs work...

What are LLMs
LLM stands for "Large Language Model". These models are a subset of artificial neural networks that use "Deep Learning" (essentially a fancy marketing buzzword for the combination of looping regression analysis with back-propagation) to encode a semantic token such as the word "cat" as an n-dimensional vector representing that token's relationship to the rest of the tokens in the training data. Now in actual practice these tokens can be anything, an image, an audio clip, or a snippet of computer code, but for the purposes of this discussion I am going to assume that we are working with words/text. This process is referred to as "embedding" and what it does in effect is turn the word "cat" into something that a computer (or grad student) can perform mathematical operations on. Any operation you might perform on a vector (addition, subtraction, transformation, matrix multiplication, etc...) can now be done on "cat".

Now because these vectors represent the relationship of the tokens to each other, words (and combinations of words) that have similar meanings will have vectors that are directionally aligned with each other. This has all sorts of interesting implications. For instance you can compute the dot product of two embedded vectors to determine whether their words are synonyms, antonyms, or unrelated. This also allows you to do fun things like approximate the vector "cat" using the sum of the vectors "carnivorous", "quadruped", "mammal", and "feline", or subtract the vector "legs" from the vector "reptile" to find an approximation for the vector "snake". Please keep this concept of "directionality" in mind as it is important to understanding how LLMs behave, and it will come up later.
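To make this less abstract, here's a toy sketch of the idea in Python. The vectors below are made up (real embeddings have hundreds or thousands of dimensions and are learned from data), so treat this as an illustration of the arithmetic, not of any actual model:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: +1 for aligned vectors, -1 for opposed, ~0 for unrelated."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Made-up toy embeddings; real models learn these from data.
emb = {
    "cat":         np.array([ 0.9,  0.8,  0.1]),
    "feline":      np.array([ 0.85, 0.75, 0.05]),
    "mammal":      np.array([ 0.6,  0.9,  0.0]),
    "quadruped":   np.array([ 0.7,  0.6,  0.2]),
    "carnivorous": np.array([ 0.5,  0.4,  0.3]),
    "truck":       np.array([-0.2,  0.1,  0.9]),
}

# Directionally similar words have high cosine similarity...
print(cosine(emb["cat"], emb["feline"]))   # close to 1
print(cosine(emb["cat"], emb["truck"]))    # close to 0

# ...and sums of related vectors approximate the target vector's direction.
approx_cat = emb["carnivorous"] + emb["quadruped"] + emb["mammal"] + emb["feline"]
print(cosine(approx_cat, emb["cat"]))      # close to 1
```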

It should come as no surprise that some of the pioneers of this methodology were also the brains behind Google Translate. You can basically take the embedded vector for "cat" from your English language model and pass it to your Spanish language model to find the vector "gato". Furthermore, because all you are really doing is summing and comparing vectors, you can do things like sum the vector "gato" in the Spanish model with the vector for the diminutive "-ito" and then pass it back to the English model to find the vector "kitten".

Now if what I am describing does not sound like an LLM to you, that is likely because most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model. An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word. This is essentially what is happening under the hood when you type a prompt into GPT or your assistant of choice.
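To sketch what that interface layer is doing at its simplest: the model assigns a score to every token in its vocabulary, and the layer turns those scores into a probability distribution and picks one. The vocabulary and numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and made-up scores ("logits") for the next token after a
# prompt like "The cat sat on the". A real model computes these scores;
# everything below is the interface layer's job.
vocab  = ["mat", "roof", "moon", "lap", "carburetor"]
logits = np.array([4.0, 2.5, 0.5, 2.0, -3.0])

def sample_next(logits, temperature=1.0):
    """Turn raw scores into a probability distribution and sample from it."""
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

idx, probs = sample_next(logits, temperature=0.8)
print(dict(zip(vocab, probs.round(3))))
print("next token:", vocab[idx])
```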

Our Villain Lorem Epsom, and the Hallucination Problem
I've linked the YouTube video Badness = 0 a few times in prior discussions of AI as I find it to be both a solid introduction to LLMs for the lay-person, and an entertaining illustration of how anthropomorphic bias can cripple the discussion of "alignment". In it the author (who is a professor of Computer Science at Carnegie Mellon) posits a semi-demonic figure (akin to Scott Alexander's Moloch) named Lorem Epsom. The name is a play on the term Lorem Ipsum and represents the prioritization of appearance over all else. When it comes to writing, Lorem Epsom doesn't care about anything except filling the page with text that looks correct. Lorem Epsom is the kind of guy who, if you tell him that he made a mistake in the math, is liable to interpret that as a personal attack. The ideas of "accuracy", "logic", "rigor", and "objective reality" are things that Lorem Epsom has heard of but that do not concern him. It is very possible that you have had to deal with someone like Lorem Epsom in your life (I know I have). Now think back and ask yourself: how did that go?

I bring up Lorem Epsom because I think that understanding him provides some insight into why certain sorts of people are so easily fooled/taken in by AI Assistants like Claude and Grok. As discussed in the section above on "What is Intelligence", the assumption that the ability to fill a page with text indicates the ability to perceive and react to a changing situation is an example of anthropomorphic bias. I think that because they are posing their question to a computer, a lot of people expect the answer they get to be something analogous to what they would get from a pocket calculator rather than from Lorem Epsom.

Sometime circa 2014 I kicked off a heated dispute in the comment section of a LessWrong post by asking EY why a paperclip-maximizing AI that was capable of self-modification wouldn't just modify the number of paperclips in its memory. I was accused by him and a number of others of missing the point, but I think they missed mine. The assumption that an Artificial Intelligence would not only have a notion of "truth", but assign value to it, is another example of anthropomorphic bias. If you asked Lorem Epsom to maximize the number of paperclips, and he could theoretically "make" a billion-trillion paperclips simply by manipulating a few bits, why wouldn't he? It's so much easier than cutting and bending wire.

In order to align an AI to care about truth and accuracy you first need a means of assessing and encoding truth, and it turns out that this is a very difficult problem within the context of LLMs, bordering on mathematically impossible. Do you recall how LLMs encode meaning as a direction in n-dimensional space? I told you it was going to come up again.

Directionally speaking we may be able to determine that "true" is an antonym of "false" by computing their dot product. But this is not the same thing as being able to evaluate whether a statement is true or false. As an example, "Mary has 2 children", "Mary has 4 children", and "Mary has 1024 children" may as well be identical statements from the perspective of an LLM. Mary has a number of children. That number is a power of 2. Now if the folks programming the interface layer were clever they might have it do something like estimate the most probable number of children based on the training data, but the number simply cannot matter to the LLM the way it might matter to Mary, or to someone trying to figure out how many pizzas they ought to order for the family reunion, because the "directionality" of one positive integer isn't all that different from any other. (This is why LLMs have such difficulty counting, if you were wondering.)
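You can check a version of this claim yourself. The sketch below assumes the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint (my choices for illustration, not anything canonical); the point is just that mutually exclusive statements about Mary land almost on top of each other in embedding space:

```python
# Rough empirical check of the "Mary" claim, assuming sentence-transformers
# and the all-MiniLM-L6-v2 checkpoint are available (my choices, not the OP's).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

statements = [
    "Mary has 2 children.",
    "Mary has 4 children.",
    "Mary has 1024 children.",
    "Mary has no children.",
]
emb = model.encode(statements, convert_to_tensor=True)

# Pairwise cosine similarities: expect these to be very high across the board,
# even though the statements are mutually exclusive facts about Mary.
print(util.cos_sim(emb, emb))
```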

In addition to difficulty with numbers there is the more fundamental issue that directionality does not encode reality. The directionality of the statement "Donald Trump is the 47th President of the United States", would be identical regardless of whether Donald Trump won or lost the 2024 election. Directionally speaking there is no difference between a "real" court case and a "fictitious" court case with identical details.

The idea that there is an ineffable difference between true statements and false statements, or between hallucination and imagination, is a wholly human conceit. Simply put, an LLM that doesn't "hallucinate" doesn't generate text or images at all. It's literally just a search engine with extra steps.

What does this have to do with intelligence?
Recall that I characterized intelligence as a combination of perceptivity and the ability to react/adapt. "AI assistants" as currently implemented struggle with both. This is partially because LLMs as currently implemented are largely static objects. They are neither able to take in new information, nor discard old. The information they have at time of embedding is the information they have. This imposes substantial loads on the context window of the interface layer, as any ability to "perceive" and subsequently "react" must happen within its boundaries. Increasing the size of the window is nontrivial, as the memory and the number of FLOPS required grow roughly with the square of the window's length. This is why we saw a sudden flurry of development following the release of Nvidia's multimodal framework and it has mostly been marginal improvements since. The last significant development was June of last year, when the folks at DeepSeek came up with some clever math to substantially reduce the size of the key-value cache, but multiplicative reductions are no match for quadratic growth.
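A rough back-of-the-envelope sketch of that scaling, in plain numpy, ignoring batching, multiple heads, the KV cache, and every real-world optimization, just to show where the square comes from:

```python
import numpy as np

def attention_scores(seq_len, d_model=64, rng=np.random.default_rng(0)):
    """Naive self-attention scores: a seq_len x seq_len matrix of query-key dot products."""
    Q = rng.standard_normal((seq_len, d_model))
    K = rng.standard_normal((seq_len, d_model))
    return Q @ K.T / np.sqrt(d_model)      # shape (seq_len, seq_len)

print(attention_scores(512).shape)         # (512, 512): doubling the context quadruples this

# Back-of-the-envelope memory for the score matrix alone, in float32:
for n in (1_000, 10_000, 100_000):
    print(f"context {n:>7,} tokens -> score matrix alone: {n * n * 4 / 1e9:.3f} GB")
```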

This limited context window, coupled with the human tendency to anthropomorphize things, is why AI Assistants sometimes appear "oblivious" or "naive" to the uninitiated, and why they seem to "double down" on mistakes. They cannot perceive something that they have not been explicitly prompted with, even if it is present in their training data. This limited context window is also why, if you actually try to play a game of chess with ChatGPT, it will forget the board-state and how pieces move after a few turns and promptly lose to a computer program written in 1976. Unlike a human player (or an Atari 2600 for that matter) your AI assistant can't just look at the board (or a representation of the board) and pick a move. This IMO places them solidly on the "insect" side of the perceptivity + reactivity spectrum.

Now there are some who have suggested that the context window problem can be solved by making the whole model less static by continuously updating and re-embedding tokens as the model runs, but I am skeptical that this would result in the sort of gains that AI boosters like Sam Altman claim. Not only would it be computationally prohibitive to do at scale, but what experiments there have been (or at least that I am aware of) with self-updating language models have quickly spun away into nonsense for reasons described in the section on Lorem Epsom, as barring some novel breakthrough in the embedding/tokenization process there is no real way to keep hallucinations and spurious inputs from rapidly overtaking everything else.

It is already widely acknowledged amongst AI researchers and developers that the LLM-based architecture being pushed by OpenAI and DeepSeek is particularly ill-suited for any application where accuracy and/or autonomy are core concerns, and it seems to me that this is unlikely to change without a complete ground-up redesign from first principles.

In conclusion, it is for the reasons above and many others that I do not believe that "AI Assistants" like Grok, Claude, and Gemini represent a viable path towards a "True AGI" along the lines of Skynet or Mr. Data, and if asked "which is smarter, Grok, Claude, Gemini, or an orangutan?" I am going to pick the orangutan every time.


A. The base model already has a world model:

Pretraining on next-token prediction forces the network to internalize statistical regularities of the world. You can’t predict tomorrow’s weather report, or the rest of a physics paper, or the punchline of a joke, without implicitly modeling the world that produced those texts. Call that latent structure a “world model” if you like. It’s not symbolic, but it encodes (in superposed features) distinctions like:

  • What typically happens vs. what usually doesn't
  • Numerically plausible vs. crazy numbers
  • Causal chains that show up consistently vs. ad-hoc one-offs

I'm going to need a citation; I have seen no research to date that suggests LLMs develop any sort of world model. A world model is:

  • An explicit internal representation of cause-effect relationships.
  • Grounded reasoning about physical, social, or conceptual structures independent of linguistic statistics.
  • A structured understanding of external reality beyond pure linguistic correlation.

Instead, current research strongly suggests that LLMs are primarily pattern-recognition systems that infer regularities purely from text statistics rather than internally representing the world in a structured, grounded way.

An LLM can easily write a weather report without one, but will that report be correct? That depends on what you consider the "LLM". The actual text model: no. The whole engineered scaffolding and software interface, querying the weather channel and feeding it into the model: sure. But the correctness isn't emerging from the LLM's internal representation or conceptual understanding (it doesn't inherently "know" today's weather), but rather from carefully engineered pipelines and external data integration. The report it is producing was RLHF-ed to look correct.

Instead, current research strongly suggests that LLMs are primarily pattern-recognition systems that infer regularities purely from text statistics rather than internally representing the world in a structured, grounded way.

…do you imagine that cause-effect relationships do not constitute a “regularity” or a “pattern”?

I think this gets into the question of what a "world model" is, which I owe self_made_human a definition of and a response to. But I'd say cause-effect relationships are indeed patterns and regularities; there's no dispute there. However, there's a crucial distinction between representing causal relationships explicitly, structurally, or inductively, versus representing them implicitly through statistical co-occurrence. LLMs are powerful precisely because they detect regularities, like causal relationships, as statistical correlations within their training corpus. But this implicit statistical encoding is fundamentally different from the structured causal reasoning humans perform, which allows us to infer and generalize causation even in novel scenarios or outside the scope of previously observed data. Thus, while cause-effect relationships certainly are patterns, the question isn't whether LLMs capture them statistically (they clearly do), but rather whether they represent them in a structured, grounded, explicitly causal way. Current research, that I have seen, strongly suggests that they do not. If you have evidence that suggests they do, I'd be overjoyed to see it, because getting AIs to do inductive reasoning in a game-playing domain is an area of interest to me.

However, there's a crucial distinction between representing causal relationships explicitly, structurally, or inductively, versus representing them implicitly through statistical co-occurrence

Statistics is not sexy, and there's a strong streak of elitism against statistics in such discussions which I find simply irrational and shallow, tedious nerd dickswinging. I think it's unproductive to focus on “statistical co-occurrence”.

Besides, there is a world of difference between linear statistical correlations and approximation of arbitrary nonlinear functions, which is what DL is all about and what LLMs do too. Downplaying the latter is simply intellectually disingenuous, whether this approximation is “explicit” or “implicit”.

But this implicit statistical encoding is fundamentally different from the structured causal reasoning humans perform, which allows us to infer and generalize causation even in novel scenarios or outside the scope of previously observed data.

This is bullshit, unless you can support this by some citation.

We (and certainly orangutans, which OP argues are smarter than LLMs) learn through statistical co-occurrence, our intuitive physical world model is nothing more than a set of networks trained with bootstrapped cost functions, even when it gets augmented with language. Hebb has been clarified, not debunked. We as reasoning embodied entities do not model the world through a hierarchical system of computations using explicit physical formulae, except when actually doing mathematical modeling in applied science and so on; and on that level modeling is just manipulating symbols, the meaning and rules of said manipulation (and crucially, the in-context appropriateness, given virtually unbounded repertoire) also learned via statistical co-occurrence in prior corpora, such as textbooks and verifiable rewards in laboratory work. And on that level, LLMs can do as well as us, provided they receive appropriate agentic/reasoning training, as evidenced by products like Claude Code doing much the same for, well, coding. Unless you want to posit that an illiterate lumberjack doesn't REALLY have a world model, you can't argue that LLMs with their mode of learning don't learn causality.

I don't know what you mean by “inductively”. LLMs can do induction in-context (and obviously this is developed in training), induction heads were one of the first interesting interpretability results. They can even be trained to do abduction.

I don't want to downplay implementation differences in this world modeling. They may correspond to a big disadvantage of LLMs as compared to humans, both due to priors in data (there's a strong reason to assume that our inherently exploratory, and initially somatosensory/proprioceptive prior is superior to doing self-supervised learning of language for the purpose of robust physical understanding) and weakness or undesirable inductive biases of algorithms (arguably there are some good concerns about expressivity of attention; perhaps circuits we train are too shallow and this rewards ad hoc memorization too much; maybe bounded forward pass depth is unacceptable; likely we'd do better with energy-based modeling; energy transformers are possible, I'm skeptical about the need for deeper redesigns). But nobody here has seriously brought these issues up, and the line of attack about statistics as such is vague and pointless, not better than saying “attention is just fancy kernel smoothing” or “it's just associative recall”. There's no good argument, to my knowledge, that these primitives are inherently weaker than human ones.

My idea of why this is discussed at all is that some folks with math background want to publicly spit on statistical primitives because in their venues those are associated with a lower-status field of research, and they have learned it earns them credit among peers; I find this an adolescent and borderline animalistic behavior that merits nothing more than laughter and boycotting in the industry. We've been over this, some very smart guys had clever and intricate ideas about intelligence, those ideas went nowhere as far as AI is concerned, they got bitter lessoned to the curb, we're on year 6 of explosion of "AI based on not very clever math and implemented in python by 120 IQ engineers", yet it seems they still refuse to learn, and indeed even fortify their ego by owning this refusal. Being headstrong is nice in some circumstances, like in a prison, I guess (if you're tough). It's less good in science, it begets crankery. I don't want to deal with anyone's personal traumas from prison or from math class, and I'd appreciate if people just took that shit to a therapist.

Alternatively, said folks are just incapable of serious self-modeling, so they actually believe that the substrate of human intelligence is fundamentally non-statistical and more akin to explicit content of their day job. This is, of course, laughable level of retardation and, again, deserves no discussion.

I have seen no research to date that suggests LLMs develop any sort of a world model.

This is true, and as you say in fact most research suggests the opposite, though not quite definitively. It's also quite true that despite this, a few extremely prominent AI scientists do believe this, a great example here, so I think we can just call it an "area of active debate" because it's still possible they are correct. A parallel argument for consideration is that language itself already contains all the necessary information to produce a world model, and so at some point, if LLMs just do a better job of learning, they can get there (and are partially there, just not all the way).

Idk if I believe language possesses all the necessary info for a world model. I think humans interpret language through their world model, which might give us a bias towards seeing language like that. Just like with intelligence: humans are social creatures, so we view mastery of language as a sign of intelligence. An LLM's apparent mastery of language gives people the feeling that it is intelligent. But that's a very anthropocentric conception of language, and one that is very biased towards how we evolved.

As for why some prominent AI scientists believe vs others that do not? I think some people definitely get wrapped up in visions and fantasies of grandeur. Which is advantageous when you need to sell an idea to a VC or someone with money, convince someone to work for you, etc. You need to believe it! That passion, that vision, is infectious. I think it's just orthogonal to reality and to what makes them a great AI scientist.

How exactly does an LLM know that Mozart wasn't a fan of hip hop without some kind of world model? Do you think that fact was explicitly hand-coded in?

Anyway:

Instead, current research strongly suggests that LLMs are primarily pattern-recognition systems that infer regularities purely from text statistics rather than internally representing the world in a structured, grounded way.

This is the core of our disagreement. I'd argue this is a false dichotomy. How does one become a master pattern-matcher of text that describes the world? The most parsimonious way to predict what comes next in a story about balls falling or characters moving between cities is not to memorize every possible story, but to learn an implicit model of physics and geography.

Which we know happens:

Large Language Models develop structured internal representations of both space and time

The researchers discovered that large language models (LLMs) develop structured internal representations of both space (geographic locations) and time (historical dates/periods) during their training, even though they’re only trained to predict the next word in text.
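For anyone unfamiliar with what "probing" means in that line of work, here is a toy sketch of the shape of the experiment. It assumes the transformers, torch, and scikit-learn packages, uses gpt2 as a stand-in model, and uses far too few cities to prove anything; it is my illustration, not the paper's code:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge

# Handful of city -> (latitude, longitude) labels; real probing studies use thousands.
cities = {
    "Paris": (48.9, 2.4), "Tokyo": (35.7, 139.7), "New York": (40.7, -74.0),
    "Sydney": (-33.9, 151.2), "Cairo": (30.0, 31.2), "Moscow": (55.8, 37.6),
}

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

feats = []
with torch.no_grad():
    for name in cities:
        ids = tok(name, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        # Use the last token's activation at a middle layer as the "representation".
        feats.append(out.hidden_states[6][0, -1].numpy())

# A linear probe: can a plain linear map read latitude/longitude out of activations?
probe = Ridge().fit(feats, list(cities.values()))
print(probe.predict(feats[:1]))   # predicted (lat, lon) for "Paris"
```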

There's a whole heap of mechanistic interpretability research out there, which finds well-ordered concepts out there, inside LLMs.

You can find more, this Substack has a good roundup.

You say: “The LLM cannot know today’s weather, only the scaffolding can.” True. That does not bear on whether the base model holds a world model in the predictive-processing sense. The base model’s “world” is the distribution of texts generated by humans who live in the physical world. To predict them well, it must compress latent generators: seasons, cities, typical temperatures, stylistic tropes. When we bolt on retrieval, we let it update those latents with fresh data. Lack of online weight updates does not negate the latent model, it just limits plasticity.

The report it is producing was RLHF-ed to look correct

RLHF shapes behavior. It does not build the base competence. The internal “truth detectors” found by multiple groups are present before RLHF, though RLHF can suppress or amplify their influence on the final token choice. The fact that we can linearly read out “lying vs truthful” features means the base network distinguished them. A policy can still choose to ignore a feature, but the feature exists.

On your definition of a world model:

By insisting on “explicit, grounded, structured” you are smuggling in “symbolic, human-inspectable, modular”. That is a research preference, not a metaphysical requirement. Cognitive science moved past demanding explicit symbol tables for humans decades ago. We allow humans to count as having world models with distributed cortical encodings. We should use the same standard here.

How exactly does an LLM know that Mozart wasn't a fan of hip hop without some kind of world model? Do you think that fact was explicitly hand-coded in?

You already got called out for this below, but this question is either a poorly chosen example, or betrays an ignorance of the mechanics of how LLMs work, which would be ironic given your lengthy nitpicking of the OP. I do assume the former, however.

I also really wouldn't call awareness of space and time a real world model as evidence either way. Space and time are perhaps the most obvious clustering that you can possibly get in terms of how often they are discussed in the training material, and IRL. It's super-duper possible to get passing-good at space and time purely on statistical association; in fact I'd be surprised if an LLM didn't pick that kind of stuff up. Yet if we look at Claude Plays Pokemon, even coming up with tools to assist itself, Claude has a ridiculously hard time navigating a simple 2D space by itself. In almost every case I'm aware of in the literature, when you ask the LLM to generalize their understanding of space and time to a new space or time, it has enormous trouble.

Having a model of space and time is, quite literally, a model of the world. What more do you expect me to produce to shore up that point?

Human brains have arrangements of neurons that correspond to a 3D environment. This isn't a joke: when your brain thinks in 3D, there's a whole bunch of neurons that approximate the space with the same spatial arrangement. Almost like a hex-grid in a video game, because the units are hexagonal. If your standard of a world model excludes the former, does this get thrown out too?

A little 3D model of the world is, as far as I'm concerned, a world model.

Dismissing the whole Mozart analogy as being due to just negligible "statistical word co-occurrence" is an incredibly myopic take. But how does the model learn that non-co-occurrence so robustly? It's not just that the words "Mozart" and "hip-hop" don't appear in the same sentence. It's that the entire semantic cloud around "Mozart" - 18th century, classical, Vienna, harpsichord - is astronomically distant from the cloud around "hip-hop" - 20th century, Bronx, turntables, MCing. For the model to reliably predict text, it must learn not just isolated facts, but this vast web of interlocking relationships. To call that "just statistical association" is like calling a brain "just a bunch of firing neurons." It's technically true but misses the emergent property entirely. That emergent, structured representation of concepts and their relations is the nascent world model. In that case, you're overloading "just" or woefully underestimating how powerful statistics or neuronal firing can be.

You can also ask an LLM for its opinion on whether Mozart might have liked hip-hop, and it will happily speculate on what's known about his taste in music and extrapolate from there. What query, if asked of a human, would demonstrate that we're doing a qualitatively different thing?

Regarding Claude plays Pokémon. I've already linked to an explainer of why it struggles above, the same link regarding the arithmetic woes. LLM vision sucks. They weren't designed for that task, and performance on a lot of previously difficult problems, like ARC-AGI, improves dramatically when the information is restructured to better suit their needs. The fact that they can do it at all is remarkable in itself, and they're only getting better.

I'm saying that purely based on in-text information (how long does a fiction book say it takes to drive from LA to San Francisco, LA is stated to be within California, etc) you could probably approximate the geography of the US just fine from the training data, let alone the more subtle or latent geographic distinctions embedded within otherwise regular text (like who says pop vs soda or whatever). Both of which the training process actually does attempt to do. In other words, memorization. This has no bearing on understanding spatial mappings as a concept, and absolutely no bearing on whether an LLM can understand cause and effect. Obviously by world state, we're not talking the literal world/planet, that's like calling earth science the science of dirt only. YoungAchamian has a decent definition upthread. We're talking about laws-based understanding, that goes beyond facts-based memorization.

(Please let's not get into a religion rabbit hole, but I know this is possible to some extent even for humans because there are a few "maps" floating around of cities and their relative relationships based purely on sparse in-text references of the Book of Mormon! And the training corpus for LLMs is many orders of magnitude more than a few hundred pages)

Perhaps an example/analogy would be helpful. Consider a spatial mapping as a network with nodes and strings between nodes. If the strings are only of moderate to low stretchiness, there is only one configuration in (let's say 2D) space that the network can manifest (i.e. correct placement of the nodes), based purely on the nodes and string length information, assuming a sufficiently large number of nodes and even a moderately non-sparse set of strings. That's what the AI learns, so to speak. However, if I now take a new node, disconnected, but still on the same plane, and ask the AI to do some basic reasoning about it, it will get confused. There's no point of reference, no string to lead to another node! Because it can only follow the strings, maybe even stop partway along a string, but it cannot "see" the space as an actual 2D map, generalized outside the bounds of the nodes. A proper world state understanding would have no problem with the same reasoning.
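A minimal sketch of the first half of that analogy, assuming numpy and scikit-learn (the coordinates are invented): classical multidimensional scaling recovers the layout from pairwise "string lengths" alone, while a node with no strings attached has nothing constraining where it goes.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
true_points = rng.uniform(0, 10, size=(20, 2))           # the "real" 2D map
dists = np.linalg.norm(true_points[:, None] - true_points[None, :], axis=-1)

# Recover a layout purely from the pairwise "string lengths".
layout = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dists)

# The recovered layout preserves the inter-node distances (up to rotation/reflection)...
recovered = np.linalg.norm(layout[:, None] - layout[None, :], axis=-1)
print(np.abs(recovered - dists).max())                    # small residual

# ...but a 21st node with no measured distances has no row in `dists` at all,
# so nothing in this procedure constrains where it could possibly sit.
```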

So on all those notes, your example does not match your claim at all.

Now I get what you're saying about how the semantic clouds might be the actual way brains work, and that might be true for some more abstract subjects or concepts, but as a general rule obviously spatial reasoning in humans is way, way more advanced than vague concept mapping, and LLMs definitively do not have that maturity. (Spatial reasoning in humans is obviously pretty solid, but time reasoning is actually kind of bad for humans, e.g. people being bad at remembering history dates and putting them in a larger framework, the fallibility of personal memory, and so on but that's kind of worth its own thought separate from our discussion). Also I should say that artificial neural networks are not brain neural networks in super important ways, so let's not get too carried away there. Ultimately, humans learn not only via factual association, but experimentation, and LLMs have literally zero method of learning from experimentation. At the moment, at least, they aren't auto-corrective by their very structure. Yes, I think there's a significant difference between that and the RLHF family. And again this is why I harp on "memory" so much as being perhaps a necessary piece of a more adaptable kind of intelligence, because that's doing a really big amount of heavy lifting as you get quite a variety of things both conscious and unconscious that manage to make it into "long term memory" from working memory - but with shortcuts and caches and stuff too along the way.

And again these are basics for most living things. I know it's a vision model, but did you at least glance at the video I linked above? The understanding is brittle. Now, you could argue that the models have a true understanding, but are held back by statistical associations that interfere with the emergent accurate reasoning (models commonly do things like flip left and right which IRL would never happen and is completely illogical, or in the video shapes change from circle to square), but to me that's a distinctly less likely scenario than the more obvious one, which also lines up with the machine learning field more broadly: generalization is hard, and it sucks, and the AI can't actually do it when the rubber hits the road with the kind of accuracy you'd expect if it actually generalized.

Of course it's admittedly a little difficult to tease out if a model is doing bad for technical reasons, or for general reasons, and also difficult to tease out good out of sample generalization cases because the memorization is so good, but I think there is good reason to be skeptical of world model claims from LLMs. So I'm open to this changing in the future, I'm definitely not closing the door, but where frontier models are at right now? Ehhhh, I don't think so. To be clear, as I said upthread, both experts and reasonable people disagree if we're seeing glimmers of true understanding/world models, or just really great statistical deduction. And to be even more clear, it's my opinion that the body of evidence is against it, but it's more along the lines of a fact that your example of geospatial learning is not a good piece of evidence in favor, which is what I wanted to emphasize here.

Edit: Because I don't want to oversell the evidence against. There are some weird findings that cut both ways. Here's an interesting summary of some without meaning to: for example, Claude when adding two two-digit numbers will say it follows the standard algorithm; I initially thought it would just memorize it; but it turns out that while both were probably factors, it's more likely Claude figured out the last digit, and then combined that thought-chain after the fact with an estimation of the approximate answer. Weird! Claude "plans ahead" for rhymes, too, but I find this a little weak. At any rate you'd be well served by checking the Limitations sections where it's clear that even a few seemingly slam-dunk examples have more uncertainty than you might think, for a wider array of reasons than you might think.

How exactly does an LLM know that Mozart wasn't a fan of hip hop without some kind of world model? Do you think that fact was explicitly hand-coded in?

It's learned statistical representations and temporal associations between what Mozart is and what hip hop is. Statistically, Mozart and hip hop likely have little to no co-occurrence in the training data. When you ask if Mozart liked hip-hop, the model isn't "thinking," "Mozart lived before hip-hop, so no." Instead, it generates text based on learned probabilities, where statements implying Mozart enjoyed hip-hop are statistically very rare or nonsensical.

Do you think that fact was explicitly hand-coded in?

I specialize in designing and training deep learning models as a career and I will never assert this because it is categorically wrong. The model would have to be very overfit for this to happen. And any company publishing a model that overfit is knowingly doing so to scam people. It should be treated similar to malfeasance or negligence.

To predict them well, it must compress latent generators: seasons, cities, typical temperatures, stylistic tropes. When we bolt on retrieval, we let it update those latents with fresh data.

I strongly agree that latent spaces can be surprisingly encompassing, but I think you're attributing more explicit meaning and conceptual structure to LLM latent spaces than actually exist. The latent space of an LLM fundamentally represents statistical relationships and contextual patterns derived entirely from textual data. These statistical regularities allow the model to implicitly predict plausible future text, including semantic, stylistic, and contextual relationships, but that doesn't amount to structured, explicit comprehension or 'understanding' of concepts as humans might interpret them. I'd postulate that GloVe embeddings act similarly. They capture semantic relationships purely from statistical word co-occurrence; although modern LLMs are much richer, deeper, and more context-sensitive, they remain statistical predictors rather than explicit world-model builders. You're being sorta speculative/mind-in-the-clouds in suggesting that meaningful understanding requires, or emerges from, complete contextual or causal awareness within these latent spaces (which I'd love to be true, but I have yet to see it in research or my own work). While predictive-processing metaphors are appealing, what LLMs encode is still implicit, statistical, and associative, not structured conceptual knowledge.

RLHF shapes behavior. It does not build the base competence.

RLHF guides style and human-like behavior. It's not based on expert truth assessments but on attempting to be helpful and useful and not sound like it came from an AI. Someone here once described it as the ol' political commissar asking the AI a question and, when it answers wrongly or unconvincingly, shooting it in the head and bringing in the next body. I love that visualization, and it's sorta accurate enough that I remember it.

By insisting on “explicit, grounded, structured” you are smuggling in “symbolic, human-inspectable, modular”. That is a research preference, not a metaphysical requirement. Cognitive science moved past demanding explicit symbol tables for humans decades ago. We allow humans to count as having world models with distributed cortical encodings. We should use the same standard here.

I'll consider this, will probably edit a response in later. I wrote most of this in 10-20 minutes instead of paying attention during a meeting. I'm not sure I agree with your re-interpretation of my definition, but it does provoke thought.