This is the Quality Contributions Roundup. It showcases interesting and well-written comments and posts from the period covered. If you want to get an idea of what this community is about or how we want you to participate, look no further (except the rules maybe--those might be important too).
As a reminder, you can nominate Quality Contributions by hitting the report button and selecting the "Actually A Quality Contribution!" option. Additionally, links to all of the roundups are collected in the wiki of /r/theThread, here. For a list of other great community content, see here.
These are mostly chronologically ordered, but in some cases I have tried to cluster comments by topic, so if there is something you are looking for (or trying to avoid), this might be helpful.
Notes -
I am trying my best to be charitable here, but I literally explained why that paragraph was wrong, over and over, and you... just repeated that same paragraph?
I will say it for the last time. That paragraph is pure fiction on your part. There is no interface layer, there is no second algorithm like the one you described, and you have completely misinterpreted how LLMs work. Ironically, that paragraph reads like an LLM hallucination.
Am I out of bounds in saying that this constitutes trolling at this point? This is genuinely upsetting.
Dude, look, here's code for the core functionality of a GPT-2 model, taken from the most simplified but still functional source I could find: https://jaykmody.com/blog/gpt-from-scratch/
This is the ENTIRE code you need to run a basic LLM (save for loading it).
Let me walk you through the important parts:
First, the prompt is encoded with a byte-pair encoding tokenizer. This groups characters into sub-word chunks and maps each chunk to an integer id. Once the vocabulary is fixed, this is just a look-up table.
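To make the look-up-table point concrete, here is a minimal sketch (the toy vocabulary is invented for the example, and a real BPE tokenizer learns its vocabulary from data, but encoding and decoding still reduce to dictionary lookups):

```python
# Toy vocabulary, invented for illustration; real BPE vocabularies
# are learned from data, but the lookups work the same way.
vocab = {"Hello": 0, " world": 1, "!": 2}
inverse_vocab = {i: tok for tok, i in vocab.items()}

def encode(tokens):
    """Map each string chunk to its integer id."""
    return [vocab[t] for t in tokens]

def decode(ids):
    """Map each integer id back to its string chunk."""
    return "".join(inverse_vocab[i] for i in ids)

ids = encode(["Hello", " world", "!"])
print(ids, "->", decode(ids))  # [0, 1, 2] -> Hello world!
```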
The generate loop gets logits directly from running the LLM. What are logits? A raw score assigned to each possible token id; running them through a softmax turns them into probabilities.
With that, you just need to take the highest value, and that gives you the next token.
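Put together, the loop looks roughly like this (a sketch, not the post's exact code; the `gpt2` stub here stands in for the real transformer forward pass from the linked article):

```python
import numpy as np

def gpt2(ids, params):
    # Stand-in for the real forward pass from the linked post;
    # random numbers here just so the sketch runs end to end.
    rng = np.random.default_rng(len(ids))
    return rng.standard_normal((len(ids), params["vocab_size"]))

def generate(ids, params, n_tokens):
    """Greedy decoding: repeatedly pick the single most likely next token."""
    for _ in range(n_tokens):
        logits = gpt2(ids, params)            # one row of logits per position
        next_id = int(np.argmax(logits[-1]))  # highest-scoring next token
        ids = ids + [next_id]                 # feed it back in and repeat
    return ids

print(generate([0, 1, 2], {"vocab_size": 50257}, n_tokens=5))
```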
See how the LLM directly output the most probable word (or token, rather)? Where is the "interface layer"? Where is the second algorithm? No such thing.
And yes, this is pretty much how ALL modern LLMs work. It's extremely simple. They just predict the next token, by themselves. All the sophisticated outputs you see arise purely out of that. THAT is the miracle that no one could believe for a while.
When put like that, it gives the sense that one Mark Zuckerberg is seriously overpaying some recent hires.
I think he’s arguing that the argmax you run over the logits is not technically part of the LLM neural network, so the LLM is just ‘an algorithm that produces math’ (i.e. produces a probability distribution), but that seems tendentious, and also kind of weirdly put, because it sounds like a description of a tokenizer.
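For what it's worth, the distinction does no work: softmax is strictly increasing, so the argmax over the raw logits and the argmax over the resulting probability distribution always pick the same token. A quick check (assuming nothing beyond numpy):

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5, 3.2])       # raw model outputs
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: logits -> probabilities

# Softmax preserves ordering, so both argmaxes agree (index 3 here).
assert np.argmax(logits) == np.argmax(probs)
print(np.argmax(logits), probs.round(3))
```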