Yeah that's what I'm saying. All the RL methods applied to LLMs just scale the gradients in fancy ways when computing next token loss. Nothing about it changes the nature of LLMs as "next token predictors".
Perhaps it's two separate critiques of LLMs expressed via the same language?
Objection 1: "all they do is simply maximize the probability of their pretraining data". This is essentially a critique of maximum likelihood estimation, but is not true of RL stages of training.
Objection 2: "architecturally, they're systems that simply sample tokens one by one conditioned on the existing context. And this is not the kind of entity that can be truly smart/conscious/whatever". RL doesn't really have any bearing on this point, though. RL doesn't architecturally change what an LLM is; it's a strategy to make its ability to sample tokens smarter.
Double akchually - even during reinforcement learning, you're optimizing a form of next token prediction. It's just with RL you're trying to next-token-predict in a way that optimizes for cumulative reward, instead of optimizing for similarity to the training set (as with the pretraining or supervised finetuning stages). So it's a matter of MLE objective vs RL objective, but it's still next token prediction either way.
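To make the "MLE objective vs RL objective" point concrete, here's a rough toy sketch for a single token position. It assumes plain REINFORCE (real pipelines like PPO add clipping, KL penalties, etc.), and the function names and the `advantage` parameter are just made up for illustration. Both losses push on the same quantity, the gradient of log p(token); RL just multiplies it by a reward-derived scalar.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_logprob_grad(logits, token):
    # Gradient of log p(token) w.r.t. the logits: onehot(token) - softmax(logits).
    probs = softmax(logits)
    return [(1.0 if i == token else 0.0) - p for i, p in enumerate(probs)]

def mle_grad(logits, token):
    # Pretraining / SFT: loss = -log p(token), where token comes from the dataset.
    return [-g for g in token_logprob_grad(logits, token)]

def reinforce_grad(logits, token, advantage):
    # Assumed vanilla REINFORCE surrogate: loss = -advantage * log p(token),
    # where token was sampled from the model and advantage comes from the reward.
    return [-advantage * g for g in token_logprob_grad(logits, token)]
```

With `advantage = 1.0` the two gradients are identical; a bigger advantage scales the same direction up, and a negative one flips it (i.e. makes the sampled token less likely). The other real difference is where the token comes from: the training set for MLE, the model's own samples for RL.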
Iirc it's « Les Gryffonchads »
The French version renames a lot of stuff too. Hogwarts -> Poudlard, Snape -> Rogue, Tom Riddle -> Tom Elvis Jedusor, Slytherin -> Serpentard, Muggle -> Moldu, etc.

I wasn't actually making either of those objections myself btw, I was trying to clarify what I think LLM skeptics are usually trying to say when they criticize LLMs as "stochastic parrots" etc.
That being said, for various reasons, I don't think the architecture of LLMs is the type of thing that can produce consciousness. But I'm still fairly bullish on LLMs anyway.