Quality Contributions Report for July 2025

This is the Quality Contributions Roundup. It showcases interesting and well-written comments and posts from the period covered. If you want to get an idea of what this community is about or how we want you to participate, look no further (except the rules maybe--those might be important too).

As a reminder, you can nominate Quality Contributions by hitting the report button and selecting the "Actually A Quality Contribution!" option. Additionally, links to all of the roundups can be found in the wiki of /r/theThread which can be found here. For a list of other great community content, see here.

These are mostly chronologically ordered, but I have in some cases tried to cluster comments by topic so if there is something you are looking for (or trying to avoid), this might be helpful.


Quality Contributions to the Main Motte

@Rov_Scam:

@gattsuru:

@wemptronics:

@Dean:

Automatic Cognition Engines

@DaseindustriesLtd:

@TequilaMockingbird:

Big Eyes, Small Mouth

@raakaa:

@self_made_human:

Contributions for the week of June 30, 2025

@Rov_Scam:

@FCfromSSC:

@StJohnOfPatmos:

@CrispyFriedBarnacles:

@urquan:

Contributions for the week of July 7, 2025

@grendel-khan:

@4bpp:

@Dean:

Building a History

@naraburns:

@Hieronymus:

@MathWizard:

Critical Self-Reflection

@Clementine:

@Southkraut:

Contributions for the week of July 14, 2025

@netstack:

@OliveTapenade:

@CrispyFriedBarnacles:

@WhiningCoil:

@FiveHourMarathon:

@Sunshine:

Identity (?) Politics

@Primaprimaprima:

@CrispyFriedBarnacles:

@Southkraut:

@Hoffmeister25:

@urquan:

@WhiningCoil:

@cjet79:

@Iconochasm:

Contributions for the week of July 21, 2025

@Dean:

@quiet_NaN:

Contributions for the week of July 28, 2025

@self_made_human:

@P-Necromancer:

@ThisIsSin:

@SSCReader:

@faceh:


I am trying my best to be charitable here, but I literally explained why that paragraph was wrong, over and over, and you... just repeated that same paragraph?

I will say it for the last time. That paragraph is pure fiction on your part. There is no interface layer, there is no second algorithm like the one you described, and you have completely misinterpreted how LLMs work. Ironically, that paragraph sounds like an LLM hallucination.

Am I out of bounds in saying that this constitutes trolling at this point? This is genuinely upsetting.

Dude, look, here's the code for the core functionality of a GPT-2 model, taken from the most simplified but still functional source I could find: https://jaykmody.com/blog/gpt-from-scratch/

This is the ENTIRE code you need to run a basic LLM (save for the code that loads the tokenizer and the weights).

import numpy as np

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

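# softmax turns a vector of raw scores into a probability distribution (max subtracted for numerical stability)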
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def layer_norm(x, g, b, eps: float = 1e-5):
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(variance + eps) + b

def linear(x, w, b):
    return x @ w + b

def ffn(x, c_fc, c_proj):
    return linear(gelu(linear(x, **c_fc)), **c_proj)

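# scaled dot-product attention; the additive mask blocks attention to future positions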
def attention(q, k, v, mask):
    return softmax(q @ k.T / np.sqrt(q.shape[-1]) + mask) @ v

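# multi-head attention: one linear projection to q/k/v, split into heads, attend with a causal mask, merge, project back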
def mha(x, c_attn, c_proj, n_head):
    x = linear(x, **c_attn)
    qkv_heads = list(map(lambda x: np.split(x, n_head, axis=-1), np.split(x, 3, axis=-1)))
    causal_mask = (1 - np.tri(x.shape[0])) * -1e10
    out_heads = [attention(q, k, v, causal_mask) for q, k, v in zip(*qkv_heads)]
    x = linear(np.hstack(out_heads), **c_proj)
    return x

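# one transformer block: pre-layer-norm attention and feed-forward sublayers, each with a residual connection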
def transformer_block(x, mlp, attn, ln_1, ln_2, n_head):
    x = x + mha(layer_norm(x, **ln_1), **attn, n_head=n_head)
    x = x + ffn(layer_norm(x, **ln_2), **mlp)
    return x

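# the full forward pass: token + position embeddings, every transformer block, then a projection back onto the
# vocabulary (via the tied embedding matrix), giving one row of logits per input position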
def gpt2(inputs, wte, wpe, blocks, ln_f, n_head):
    x = wte[inputs] + wpe[range(len(inputs))]
    for block in blocks:
        x = transformer_block(x, **block, n_head=n_head)
    return layer_norm(x, **ln_f) @ wte.T

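# greedy decoding: run the model, take the argmax of the last position's logits, append that id, repeat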
def generate(inputs, params, n_head, n_tokens_to_generate):
    from tqdm import tqdm
    for _ in tqdm(range(n_tokens_to_generate), "generating"):
        logits = gpt2(inputs, **params, n_head=n_head)
        next_id = np.argmax(logits[-1])
        inputs = np.append(inputs, [next_id])
    return list(inputs[len(inputs) - n_tokens_to_generate :])

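# glue code: load the tokenizer and weights, encode the prompt, generate ids, decode them back into text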
def main(prompt: str, n_tokens_to_generate: int = 40, model_size: str = "124M", models_dir: str = "models"):
    from utils import load_encoder_hparams_and_params
    encoder, hparams, params = load_encoder_hparams_and_params(model_size, models_dir)
    input_ids = encoder.encode(prompt)
    assert len(input_ids) + n_tokens_to_generate < hparams["n_ctx"]
    output_ids = generate(input_ids, params, hparams["n_head"], n_tokens_to_generate)
    output_text = encoder.decode(output_ids)
    return output_text

if __name__ == "__main__":
    import fire
    fire.Fire(main)

Let me walk you through the important parts:

First, the prompt is encoded with a byte-pair encoding tokenizer. This groups characters into frequently occurring chunks and maps each chunk to an integer id. The vocabulary itself is just a look-up table.
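
If you want to poke at that step yourself, here's a tiny sketch using the tiktoken library (my own illustration, not part of the blog's code, but it ships the same GPT-2 byte-pair encoding):

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")       # the GPT-2 BPE vocabulary, ~50k entries
ids = enc.encode("The quick brown fox")   # text -> a short list of integer token ids
print(ids)
print(enc.decode(ids))                    # and straight back to the original text

Every id is just an index into that fixed vocabulary; no model has run yet.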

The generate loop gets logits directly from running the LLM. What are logits? They're the scores the model assigns to every possible token id: the higher the logit, the more probable the model considers that token (a softmax turns them into actual probabilities, but it doesn't change which one is biggest).

With that, you just take the highest-scoring one (the argmax) and that's your next token.

See how the LLM directly output the most probable word (or token, rather)? Where is the "interface layer"? Where is the second algorithm? No such thing.
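
To make that last step concrete, here's a toy version of what happens after the forward pass (the four-token vocabulary and the logit values are made up for illustration):

import numpy as np

vocab = ["cat", "dog", "the", "ran"]      # made-up vocabulary
logits = np.array([1.2, 3.5, 0.3, 2.0])   # made-up scores for the final position

probs = np.exp(logits - logits.max())     # softmax: scores -> probabilities
probs /= probs.sum()

next_id = int(np.argmax(logits))          # same winner whether you argmax the logits or the probs
print(vocab[next_id], probs.round(3))     # picks "dog"

The argmax (or, in real deployments, sampling from that distribution) is one line of bookkeeping tacked onto the network's output, not a second algorithm.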

And yes, this is pretty much how ALL modern LLMs work. It's extremely simple. They just predict the next token, by themselves. All the sophisticated outputs you see arise purely out of that. THAT is the miracle that no-one could believe for a while.

When put like that, it gives the sense that one Mark Zuckerberg is seriously overpaying some recent hires.

I think he’s arguing that the argmax you run over the logits is not technically part of the LLM neural network, so the LLM is just ‘an algorithm that produces math’ (i.e. it produces a probability distribution), but that seems tendentious and also kind of weirdly put, because it sounds like describing a tokenizer.