I found this insightful, particularly in bringing LLMs back into the Yudkowsky / Sequences fold, whereas many have claimed that the rise of LLMs has shown decades of Yudkowskian AI speculation to be way off base. I don't have enough technical knowledge to evaluate the accuracy of this post, but I am hopeful that large parts of it are true.
The brute-force training process naturally sculpts Transformers into inference engines. They don’t just approximate the math; they build a physical geometry — orthogonal hypothesis frames and entropy-ordered manifolds — that implements Bayesian updating as a mechanical process.
They aren’t Bayesian by design; they are Bayesian by geometry.
To the extent the article has merit, it does seem to explain why CoT and Reasoning models are able to "outperform". The 20 questions model, where we are not merely bisecting the information space, but looking to maximize rejection or filtering, offers a lot of insight into the nature of the problem. When a fixed number of layers gets exhausted, is this where normal models hallucinate? With CoT or reasoning, we can feed the smaller space back into the first layer, and continue filtering down.
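To make the 20 questions picture concrete, here is a toy sketch in Python. It is only an illustration of the analogy above, not a claim about how Transformers actually work: `one_pass`, `solve`, `depth`, and `max_passes` are names made up for this example, with `depth` standing in for a fixed number of layers and each extra pass standing in for feeding the narrowed space back in.

```python
def one_pass(candidates, secret, depth):
    """Ask at most `depth` yes/no questions, each splitting the candidates in half."""
    candidates = sorted(candidates)
    for _ in range(depth):
        if len(candidates) <= 1:
            break
        mid = len(candidates) // 2
        lower, upper = candidates[:mid], candidates[mid:]
        # "Is the secret in the lower half?" Keep whichever half is consistent.
        candidates = lower if secret in lower else upper
    return candidates


def solve(space, secret, depth, max_passes):
    """Feed the surviving candidates back in for further passes (the CoT analogy)."""
    candidates = list(space)
    passes = 0
    while len(candidates) > 1 and passes < max_passes:
        candidates = one_pass(candidates, secret, depth)
        passes += 1
    return candidates, passes


if __name__ == "__main__":
    # One pass of depth 4 cannot resolve 1024 possibilities; several candidates remain.
    remaining, _ = solve(range(1024), secret=700, depth=4, max_passes=1)
    print(len(remaining))
    # Allowing more passes keeps filtering until a single candidate is left.
    final, passes = solve(range(1024), secret=700, depth=4, max_passes=3)
    print(final, passes)
```

With a single pass the filter runs out of questions while many candidates are still live, so any answer at that point is a guess. Extra passes let the same fixed-depth filter finish the job.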

It was meant to be a comment on his: "If you can't explain it to a 5 year old, you don't really understand it" (the irony is not lost on me). Here (this Medium article and hundreds of others like it) I feel like people use deliberately obscuring or jargon-rich language because it's not about the conversation, it's about the social-intellectual signaling.
And what is honestly the worst part is that if you don't partake, you are the odd one out, signaling that you are either too autistic, truth-driven, or asocial to really get with the program of speaking AI gobbledygook to the peons in an attempt to sell more, network more, chest bump more, etc. It really is a weird and annoying af failure mode.