Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update to some Mottizens.
They are not good at that yet. But there are already indicators that they could become so.
So to say that machine learning can't synthesise information from two fields in ways that have not been done before needs more qualification, to be defensible.
I was talking about (transformer-based generative) LLMs specifically. I am not a sufficiently good mathematician to feel confident in this answer, but LLMs and diffusion models are very different in structure and training, and I don't think that you can generalise from one to the other. Midjourney is basically a diffusion model, unscrambling random noise to 'denoise' the image that it thinks is there. The boy with spiky hair seems like the model alternately interpreting the same blurry patch of pixels as 'spikes' because 'hedgehog' and 'hair' because 'boy'. Which I think is very different from a predictive LLM realising that concept A has implications when combined with concept B that generates previously unknown information C.
I haven't kept up to date on RL, but I don't think this is relevant. Firstly because the concept of self-play is not really relevant to text generation, and secondly because I don't suppose the ability to play chess is being applied to go. Indeed, I don't really see how it could be, because the state and action spaces are different for each game. It seems more likely to me that the same huge set of parameters can store state-action-reward correlations for multiple games simultaneously without that information interacting in any significant way.
I'm not aware of this. Can you give some more info?
Diffusion models work for text too. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10909201/
The blending of concepts that we see in MidJourney probably has less to do with diffusion per se than with CLIP - a building block within diffusion. CLIP aligns a language model with an image model. Moving concepts between different representations helps with concept generation. There's a lot being done with 'MultiModal models' to make the integration between different modalities work better.
'Self play' is relevant for text generation. There is a substantial cottage industry in using LLMs to evaluate the output of LLMs and learn from the feedback. It can be easier to evaluate whether text 'is good' than it is to generate good text. So multiple attempts and variations can lead to feedback and improvement. Mostly, self play to improve LLMs is done at the level of optimising prompts. However, the outputs improved by that method can be used as training examples, and so can be used to update the underlying weights.
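To make the generate-then-evaluate loop concrete, here is a rough sketch of the idea, not anyone's actual pipeline. Both `generate_candidates` and `judge` are hypothetical stand-ins I made up: in a real setup each would be a call to an LLM, and the selected outputs would feed a fine-tuning step that is not shown here.

```python
import random


def generate_candidates(prompt, n, rng):
    """Hypothetical stand-in for sampling n completions from an LLM."""
    fillers = ["good", "bad", "fine", "great", "poor"]
    return [f"{prompt} {rng.choice(fillers)} {rng.choice(fillers)}" for _ in range(n)]


def judge(text):
    """Hypothetical stand-in for an LLM grader: counts 'positive' words."""
    positive = {"good", "great", "fine"}
    return sum(word in positive for word in text.split())


def best_of_n(prompt, n=8, seed=0):
    """Sample n candidates and keep the one the judge scores highest."""
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n, rng)
    best = max(candidates, key=judge)
    return best, candidates


def build_training_set(prompts, n=8, seed=0):
    """Pair each prompt with its best-scoring candidate.

    These (prompt, output) pairs are what could later be used as
    training examples to update the underlying weights.
    """
    return [(p, best_of_n(p, n, seed)[0]) for p in prompts]
```

The point of the toy is only the shape of the loop: scoring is cheaper than generating well, so you sample several times, keep what the grader likes, and recycle it as data.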
https://topologychat.com is a commercial example of using LLMs in a way inspired by chess programming (Leela, Stockfish). It does a form of self play on inputs that have been given to it, building up and prioritising different lines. It then uses these results to update weights in a mixture of experts model.
Here's the quote from Geoffrey Hinton:
From transcript at https://medium.com/@jalilnkh/geoffrey-hinton-will-digital-intelligence-replace-biological-intelligence-fc23feb83cfb of the video.
Last I checked, diffusion models work at all for text but they don't work very well. More specifically, text diffusion models remind me quite strongly of the classic-style Markov chain text generators that used to be popular for generating amusing locally-coherent-globally-word-salad text. Here's the best concrete example of this I can point to (italicized text is the prompt, normal text is continuation, examples sourced from this JDP tweet, whole tweet is excellent but somewhat OT here):
Diffusion model:
GPT-2:
Now obviously in the limit as computational power and training data volume go to infinity, diffusion models and transformer models will generate the same text, since in the limit they're pulling from the same distribution with minimal error. But in the very finite regime we find ourselves in, diffusion models "spend" their accuracy on making the text locally coherent (so if you take a random 10 token sequence, it looks very typical of 10 token sequences within the training set), while transformer LLMs "spend" their accuracy on global coherence (so if you take two 10 token sequences a few hundred tokens apart in the same generated output, you would say that those two sequences look like they came from the same document in the training set).
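For anyone who hasn't played with one, the classic-style Markov chain text generator mentioned above can be sketched in a few lines. This is a minimal word-level version under my own assumptions (the function names and the `order` parameter are mine, not from any particular library). Each next word depends only on the previous `order` words, which is exactly why the output is locally coherent and globally word salad.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=20, seed=0):
    """Random walk over the chain, producing at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted for reproducibility
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this context was never continued
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)
```

Nothing in `generate` looks further back than `order` words, so two stretches of output a few hundred tokens apart have no reason to look like they came from the same document.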
Agreed. Obvious once you point it out but I hadn't thought about it that way before, so thanks.
Notably, Anthropic's Constitutional AI process (i.e. the process by which Anthropic turned a base LLM into the "helpful, honest, harmless" Claude) used RLAIF, which is self play by another name. And that's one big cottage.