
Quality Contributions Report for July 2025

This is the Quality Contributions Roundup. It showcases interesting and well-written comments and posts from the period covered. If you want to get an idea of what this community is about or how we want you to participate, look no further (except the rules maybe--those might be important too).

As a reminder, you can nominate Quality Contributions by hitting the report button and selecting the "Actually A Quality Contribution!" option. Additionally, links to all of the roundups can be found in the wiki of /r/theThread which can be found here. For a list of other great community content, see here.

These are mostly chronologically ordered, but I have in some cases tried to cluster comments by topic so if there is something you are looking for (or trying to avoid), this might be helpful.


Quality Contributions to the Main Motte

@Rov_Scam:

@gattsuru:

@wemptronics:

@Dean:

Automatic Cognition Engines

@DaseindustriesLtd:

@TequilaMockingbird:

Big Eyes, Small Mouth

@raakaa:

@self_made_human:

Contributions for the week of June 30, 2025

@Rov_Scam:

@FCfromSSC:

@StJohnOfPatmos:

@CrispyFriedBarnacles:

@urquan:

Contributions for the week of July 7, 2025

@grendel-khan:

@4bpp:

@Dean:

Building a History

@naraburns:

@Hieronymus:

@MathWizard:

Critical Self-Reflection

@Clementine:

@Southkraut:

Contributions for the week of July 14, 2025

@netstack:

@OliveTapenade:

@CrispyFriedBarnacles:

@WhiningCoil:

@FiveHourMarathon:

@Sunshine:

Identity (?) Politics

@Primaprimaprima:

@CrispyFriedBarnacles:

@Southkraut:

@Hoffmeister25:

@urquan:

@WhiningCoil:

@cjet79:

@Iconochasm:

Contributions for the week of July 21, 2025

@Dean:

@quiet_NaN:

Contributions for the week of July 28, 2025

@self_made_human:

@P-Necromancer:

@ThisIsSin:

@SSCReader:

@faceh:


"Is your 'AI Assistant' smarter than an Orangutan? A practical engineering assessment"

I'm disappointed this was selected as a quality contribution due to the litany of easily verifiable falsehoods from the author and his refusal to correct or acknowledge them. Strangely enough, I am more upset by this than by any hot-button culture war issue I've read on here. I suppose if someone's political opinion differs from mine, I can dismiss it as a matter of opinion, but when someone tells complete falsehoods about the area you work in, doubles down, and is highlighted as a quality contributor, it feels worse.

Yeah. There's been a long-standing tension over AAQCs not needing to be correct so long as they're positive contributions to the community. This at least looks like a serious if flawed attempt to discuss a complicated topic rather than active trolling, so it's far from the worst version of that issue, but the lack of engagement with even the most overt criticism of the most central claims makes it really frustrating.

I feel like I addressed @rae's objections about structure, and about LLMs just being token predictors, within the body of the text itself. E.g.:

most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model. An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word...

@self_made_human disagreed with my definition of intelligence and approach to assessing it, which is interesting from a philosophical standpoint but also kind of irrelevant in practical terms. The fact is that adaptability and agentic behavior are key things to consider when discussing whether a robot can replace a human worker, or whether we're going to wake up tomorrow to find that Claude or Grok has suddenly gone "FOOM" and turned into Skynet, and I don't think it's "hamstringing" my (or anyone else's) understanding to point that out.

@daseindustries just seems to be angry that someone would break from the rationalist consensus.

Though admittedly, taking the week of the 28th off to go on vacation probably didn't help.

I am trying my best to be charitable here, but I literally explained why that paragraph was wrong, over and over, and you... just repeated that same paragraph?

I will say it for the last time. That paragraph is pure fiction on your part. There is no interface layer, there is no second algorithm like you described, and you have completely misinterpreted how LLMs work. Ironically, that paragraph reads like an LLM hallucination.

Am I out of bounds in saying that this constitutes trolling at this point? This is genuinely upsetting.

Dude, look, here's code for the core functionality of a GPT-2 model, taken from the most simplified but still functional source I could find: https://jaykmody.com/blog/gpt-from-scratch/

This is the ENTIRE code you need to run a basic LLM (save for loading it).

import numpy as np

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def layer_norm(x, g, b, eps: float = 1e-5):
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(variance + eps) + b

def linear(x, w, b):
    return x @ w + b

def ffn(x, c_fc, c_proj):
    return linear(gelu(linear(x, **c_fc)), **c_proj)

def attention(q, k, v, mask):
    return softmax(q @ k.T / np.sqrt(q.shape[-1]) + mask) @ v

def mha(x, c_attn, c_proj, n_head):
    x = linear(x, **c_attn)  # one projection produces q, k, v for all heads at once
    qkv_heads = list(map(lambda x: np.split(x, n_head, axis=-1), np.split(x, 3, axis=-1)))
    causal_mask = (1 - np.tri(x.shape[0])) * -1e10  # hide future positions from attention
    out_heads = [attention(q, k, v, causal_mask) for q, k, v in zip(*qkv_heads)]
    x = linear(np.hstack(out_heads), **c_proj)
    return x

def transformer_block(x, mlp, attn, ln_1, ln_2, n_head):
    x = x + mha(layer_norm(x, **ln_1), **attn, n_head=n_head)
    x = x + ffn(layer_norm(x, **ln_2), **mlp)
    return x

def gpt2(inputs, wte, wpe, blocks, ln_f, n_head):
    x = wte[inputs] + wpe[range(len(inputs))]  # token embeddings + positional embeddings
    for block in blocks:
        x = transformer_block(x, **block, n_head=n_head)
    return layer_norm(x, **ln_f) @ wte.T  # project back onto the vocabulary -> logits per position

def generate(inputs, params, n_head, n_tokens_to_generate):
    from tqdm import tqdm
    for _ in tqdm(range(n_tokens_to_generate), "generating"):
        logits = gpt2(inputs, **params, n_head=n_head)
        next_id = np.argmax(logits[-1])  # greedy decoding: take the highest-scoring token
        inputs = np.append(inputs, [next_id])  # append it and feed the whole sequence back in
    return list(inputs[len(inputs) - n_tokens_to_generate :])

def main(prompt: str, n_tokens_to_generate: int = 40, model_size: str = "124M", models_dir: str = "models"):
    from utils import load_encoder_hparams_and_params
    encoder, hparams, params = load_encoder_hparams_and_params(model_size, models_dir)
    input_ids = encoder.encode(prompt)
    assert len(input_ids) + n_tokens_to_generate < hparams["n_ctx"]
    output_ids = generate(input_ids, params, hparams["n_head"], n_tokens_to_generate)
    output_text = encoder.decode(output_ids)
    return output_text

if __name__ == "__main__":
    import fire
    fire.Fire(main)

Let me walk you through the important parts:

First, the prompt is encoded with a byte-pair encoding (BPE) tokenizer. This groups characters into frequently occurring chunks and turns them into integer ids. This is just a look-up table.
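If you want to see that step in isolation, here's a small sketch using OpenAI's tiktoken library as a stand-in (the snippet above loads an equivalent GPT-2 BPE encoder through its own utils helper; the example string is arbitrary):

import tiktoken

enc = tiktoken.get_encoding("gpt2")     # GPT-2's byte-pair encoding vocabulary
ids = enc.encode("Not all heroes wear capes.")
print(ids)                              # a short list of integer token ids, one per BPE chunk
print(enc.decode(ids))                  # decoding the ids reproduces the original string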

The generate loop gets logits directly from running the LLM. What are logits? A score assigned to each possible token id; pushing them through softmax would turn them into a probability distribution over the vocabulary, but since softmax preserves the ordering you can skip it when all you want is the single most likely token.

With that, you just need to take the highest value (np.argmax) and that gives you the next token.
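To make that concrete, here's a toy example (the five-entry logit vector is made up; a real GPT-2 vocabulary has roughly 50,000 entries):

import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)

logits = np.array([1.2, -0.3, 4.1, 0.0, 2.5])   # hypothetical scores for a 5-token vocabulary
probs = softmax(logits)                          # optional: turn scores into probabilities
next_id = int(np.argmax(logits))                 # greedy decoding picks index 2 here
assert next_id == int(np.argmax(probs))          # softmax doesn't change which token wins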

See how the LLM directly output the most probable word (or token, rather)? Where is the "interface layer"? Where is the second algorithm? No such thing.

And yes, this is pretty much how ALL modern LLMs work. It's extremely simple. They just predict the next token, by themselves. All the sophisticated outputs you see arise purely out of that. THAT is the miracle that no one could believe for a while.

When put like that, it gives the sense that one Mark Zuckerberg is seriously overpaying some recent hires.

I think he's arguing that the argmax you run over the logits is not technically part of the LLM neural network, so the LLM is just 'an algorithm that produces math' (i.e., produces a probability distribution), but that seems tendentious and also kind of weirdly put, because it sounds like describing a tokenizer.