I found this insightful, particularly the way it brings LLMs back into the Yudkowsky / Sequences fold, whereas many have claimed the rise of LLMs has shown decades of Yudkowskian AI speculation to be way off base. I don't have enough technical knowledge to evaluate the accuracy of this post, but I am hopeful that large parts of it are true.
The brute-force training process naturally sculpts Transformers into inference engines. They don’t just approximate the math; they build a physical geometry — orthogonal hypothesis frames and entropy-ordered manifolds — that implements Bayesian updating as a mechanical process.
They aren’t Bayesian by design; they are Bayesian by geometry.
To the extent the article has merit, it does seem to explain why CoT and reasoning models are able to "outperform". The 20-questions model, where we are not merely bisecting the information space but trying to maximize how much each question rejects or filters out, offers a lot of insight into the nature of the problem. When a fixed number of layers gets exhausted, is this where normal models hallucinate? With CoT or reasoning, we can feed the smaller space back into the first layer and continue filtering down.
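Here is a toy sketch of that framing (my own illustration, not code from the article): each question is chosen to split the remaining hypotheses as evenly as possible, a fixed question budget stands in for a fixed layer count, and when the budget runs out the reduced hypothesis set is fed back in for another round, which is the CoT analogy.

```python
import math
import random

def best_question(hypotheses, questions):
    """Pick the question whose yes/no split over the remaining hypotheses
    is closest to 50/50, i.e. the highest-entropy (most filtering) split."""
    def split_entropy(q):
        yes = sum(1 for h in hypotheses if q(h))
        p = yes / len(hypotheses)
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return max(questions, key=split_entropy)

def filter_round(hypotheses, questions, target, budget):
    """One 'forward pass': ask up to `budget` questions, keeping only the
    hypotheses consistent with the answers."""
    for _ in range(budget):
        if len(hypotheses) <= 1:
            break
        q = best_question(hypotheses, questions)
        answer = q(target)
        hypotheses = [h for h in hypotheses if q(h) == answer]
    return hypotheses

# Hypotheses are the integers 0..999; questions test individual bits.
hypotheses = list(range(1000))
target = random.choice(hypotheses)
questions = [lambda h, b=b: (h >> b) & 1 for b in range(10)]

# CoT analogy: a budget of 3 questions (the "fixed layers") is not enough
# on its own, so we feed the reduced hypothesis set back in for more rounds.
remaining = hypotheses
while len(remaining) > 1:
    remaining = filter_round(remaining, questions, target, budget=3)
print(remaining == [target])  # True
```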

Notes -
Here is some more on CoT that I find related and interesting, but revolving around deception:
https://nickandresen.substack.com/p/how-ai-is-learning-to-think-in-secret
Thanks, this was very interesting, and I loved the way he touched on something I've noticed. There are people out there who require that exact same "narrated internal monologue" to produce and coordinate more complex ideas and tasks, as opposed to my personal experience, where thought is instant and arrives almost fully formed, and I later have to squeeze all that rich, highly abstract information into communicable words. I first noticed this back in high school, in foreign language classes. My classmates, who were slowly and clumsily wading through that other language (here omitted for opsec reasons), would explain when asked that they first read the assignment fragments, translated what they read from the language into English, understood it, formed a reply in English, translated that in their head, and then spoke the answer. I, by contrast, read the text directly in that other language, instantly understood it and formed a thought, and then converted that into words.
Admittedly I skimmed it, but I didn't find it all that enlightening. Maybe this is just my personal pet peeve or I am getting cantankerous in my old age, but I really hate the sci-fi / mathematical gobbledygook that a lot of AI/ML discourse becomes. It becomes increasingly hard for my midwit engineering brain to parse what is actually being talked about, as opposed to people wanting to be seen "to have deep AI understanding". It's like a deliberate failing of the Feynman Technique in order to sound impressive and smart. If I have to spend 10 minutes per paragraph and a bunch of Wikipedia tabs trying to understand what you are conveying, that's bad.
I'm actually curious what the semantic distance is between this and how I talk about AI/ML to my non-MLE colleagues in my professional life. Maybe I sound closer to this than I think, which terrifies me. My heuristic is: the more unparsable and jargon-filled someone sounds, the more scammer-adjacent the speaker.
It doesn't seem to be a technique at all, just... trying to explain it, then trying again while addressing any issues you had, etc. It's the basic process of learning/thinking (with others critiquing), which is... what conversation is. I suppose this is a very interesting failure mode (of overall human networking), where too many people engaging with a topic before really understanding it reduces overall knowledge!
It was meant to be a comment on his "If you can't explain it to a 5 year old, you don't really understand it" (the irony is not lost on me). Here (this Medium article and hundreds of others like it) I feel like people use deliberately obscuring or jargon-rich language because it's not about the conversation, it's about the social-intellectual signaling.
And honestly the worst part is that if you don't partake, you are the odd one out, signaling you are either too autistic, truth-driven, or asocial to really get with the program of speaking AI gobbledygook to the peons in an attempt to sell more, network more, chest-bump more, etc. It really is a weird and annoying af failure mode.
As a machine learning researcher, I don't find anything in the article that is outright wrong, but I also don't see any insights that are new or useful to me.
One of the fundamental results of machine learning theory states (very informally) that every learning algorithm has both a Bayesian and a frequentist interpretation. So while the quote "[LLMs] aren’t Bayesian by design; they are Bayesian by geometry" is certainly true, it is also true of every other possible learning algorithm. Basically everything else in the article strikes me as the same sort of tautology.
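To make that concrete with a standard textbook example (my own sketch, not something from the article): ridge regression, the canonical frequentist penalized estimator, is numerically identical to the MAP / posterior-mean estimate of a linear model with a Gaussian prior, with the ridge penalty playing the role of the noise-to-prior variance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
sigma2 = 0.25                              # observation noise variance
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=50)

tau2 = 1.0                                 # prior variance on the weights
lam = sigma2 / tau2                        # equivalent ridge penalty

# Frequentist view: argmin_w ||y - Xw||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Bayesian view: posterior mean under w ~ N(0, tau2 I), y | X, w ~ N(Xw, sigma2 I)
posterior_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(3) / tau2)
w_map = posterior_cov @ (X.T @ y / sigma2)

print(np.allclose(w_ridge, w_map))         # True: same estimator, two readings
```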
I find this sort of thing to be rampant, especially in my tech and research circles: reframing an existing thing as though that in and of itself accomplishes something. I worked at a startup once that wanted to write their own data processing pipeline "using category theory," somehow. The only thing more heretical than asking them to say precisely what they meant was to ask why in god's name you'd do such a thing.