Quality Contributions Report for July 2025

This is the Quality Contributions Roundup. It showcases interesting and well-written comments and posts from the period covered. If you want to get an idea of what this community is about or how we want you to participate, look no further (except the rules maybe--those might be important too).

As a reminder, you can nominate Quality Contributions by hitting the report button and selecting the "Actually A Quality Contribution!" option. Additionally, links to all of the roundups can be found in the wiki of /r/theThread which can be found here. For a list of other great community content, see here.

These are mostly chronologically ordered, but I have in some cases tried to cluster comments by topic so if there is something you are looking for (or trying to avoid), this might be helpful.


Quality Contributions to the Main Motte

@Rov_Scam:

@gattsuru:

@wemptronics:

@Dean:

Automatic Cognition Engines

@DaseindustriesLtd:

@TequilaMockingbird:

Big Eyes, Small Mouth

@raakaa:

@self_made_human:

Contributions for the week of June 30, 2025

@Rov_Scam:

@FCfromSSC:

@StJohnOfPatmos:

@CrispyFriedBarnacles:

@urquan:

Contributions for the week of July 7, 2025

@grendel-khan:

@4bpp:

@Dean:

Building a History

@naraburns:

@Hieronymus:

@MathWizard:

Critical Self-Reflection

@Clementine:

@Southkraut:

Contributions for the week of July 14, 2025

@netstack:

@OliveTapenade:

@CrispyFriedBarnacles:

@WhiningCoil:

@FiveHourMarathon:

@Sunshine:

Identity (?) Politics

@Primaprimaprima:

@CrispyFriedBarnacles:

@Southkraut:

@Hoffmeister25:

@urquan:

@WhiningCoil:

@cjet79:

@Iconochasm:

Contributions for the week of July 21, 2025

@Dean:

@quiet_NaN:

Contributions for the week of July 28, 2025

@self_made_human:

@P-Necromancer:

@ThisIsSin:

@SSCReader:

@faceh:


I am trying my best to be charitable here, but I literally explained why that paragraph was wrong, over and over, and you... just repeated that same paragraph?

I will say it for the last time. That paragraph is pure fiction on your part. There is no interface layer, there is no second algorithm like the one you described, and you have completely misinterpreted how LLMs work. Ironically, that paragraph sounds like an LLM hallucination.

Am I out of bounds in saying that this constitutes trolling at this point? This is genuinely upsetting.

Dude, look, here's the code for the core functionality of a GPT-2 model, taken from the most simplified but still functional source I could find: https://jaykmody.com/blog/gpt-from-scratch/

This is the ENTIRE code you need to run a basic LLM (save for the code that loads the tokenizer and the weights).

import numpy as np

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

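# softmax turns a vector of raw scores into a probability distribution (max subtracted for numerical stability)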
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def layer_norm(x, g, b, eps: float = 1e-5):
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(variance + eps) + b

def linear(x, w, b):
    return x @ w + b

def ffn(x, c_fc, c_proj):
    return linear(gelu(linear(x, **c_fc)), **c_proj)

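# scaled dot-product attention; the additive mask blocks attention to future positions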
def attention(q, k, v, mask):
    return softmax(q @ k.T / np.sqrt(q.shape[-1]) + mask) @ v

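# multi-head attention: one linear projection to q/k/v, split into heads, attend with a causal mask, merge, project back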
def mha(x, c_attn, c_proj, n_head):
    x = linear(x, **c_attn)
    qkv_heads = list(map(lambda x: np.split(x, n_head, axis=-1), np.split(x, 3, axis=-1)))
    causal_mask = (1 - np.tri(x.shape[0])) * -1e10
    out_heads = [attention(q, k, v, causal_mask) for q, k, v in zip(*qkv_heads)]
    x = linear(np.hstack(out_heads), **c_proj)
    return x

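# one transformer block: pre-layer-norm attention and feed-forward sublayers, each with a residual connection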
def transformer_block(x, mlp, attn, ln_1, ln_2, n_head):
    x = x + mha(layer_norm(x, **ln_1), **attn, n_head=n_head)
    x = x + ffn(layer_norm(x, **ln_2), **mlp)
    return x

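# the full forward pass: token + position embeddings, every transformer block, then a projection back onto the
# vocabulary (via the tied embedding matrix), giving one row of logits per input position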
def gpt2(inputs, wte, wpe, blocks, ln_f, n_head):
    x = wte[inputs] + wpe[range(len(inputs))]
    for block in blocks:
        x = transformer_block(x, **block, n_head=n_head)
    return layer_norm(x, **ln_f) @ wte.T

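# greedy decoding: run the model, take the argmax of the last position's logits, append that id, repeat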
def generate(inputs, params, n_head, n_tokens_to_generate):
    from tqdm import tqdm
    for _ in tqdm(range(n_tokens_to_generate), "generating"):
        logits = gpt2(inputs, **params, n_head=n_head)
        next_id = np.argmax(logits[-1])
        inputs = np.append(inputs, [next_id])
    return list(inputs[len(inputs) - n_tokens_to_generate :])

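# glue code: load the tokenizer and weights, encode the prompt, generate ids, decode them back into text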
def main(prompt: str, n_tokens_to_generate: int = 40, model_size: str = "124M", models_dir: str = "models"):
    from utils import load_encoder_hparams_and_params
    encoder, hparams, params = load_encoder_hparams_and_params(model_size, models_dir)
    input_ids = encoder.encode(prompt)
    assert len(input_ids) + n_tokens_to_generate < hparams["n_ctx"]
    output_ids = generate(input_ids, params, hparams["n_head"], n_tokens_to_generate)
    output_text = encoder.decode(output_ids)
    return output_text

if __name__ == "__main__":
    import fire
    fire.Fire(main)

Let me walk you through the important parts:

First, the prompt is encoded with a byte-pair encoding tokenizer. This groups characters into frequently occurring chunks and maps each chunk to an integer id. The vocabulary itself is just a look-up table.
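
If you want to poke at that step yourself, here's a tiny sketch using the tiktoken library (my own illustration, not part of the blog's code, but it ships the same GPT-2 byte-pair encoding):

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")       # the GPT-2 BPE vocabulary, ~50k entries
ids = enc.encode("The quick brown fox")   # text -> a short list of integer token ids
print(ids)
print(enc.decode(ids))                    # and straight back to the original text

Every id is just an index into that fixed vocabulary; no model has run yet.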

The generate loop gets logits directly from running the LLM. What are logits? They're the scores the model assigns to every possible token id: the higher the logit, the more probable the model considers that token (a softmax turns them into actual probabilities, but it doesn't change which one is biggest).

With that, you just take the highest-scoring one (the argmax) and that's your next token.

See how the LLM directly output the most probable word (or token, rather)? Where is the "interface layer"? Where is the second algorithm? No such thing.
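
To make that last step concrete, here's a toy version of what happens after the forward pass (the four-token vocabulary and the logit values are made up for illustration):

import numpy as np

vocab = ["cat", "dog", "the", "ran"]      # made-up vocabulary
logits = np.array([1.2, 3.5, 0.3, 2.0])   # made-up scores for the final position

probs = np.exp(logits - logits.max())     # softmax: scores -> probabilities
probs /= probs.sum()

next_id = int(np.argmax(logits))          # same winner whether you argmax the logits or the probs
print(vocab[next_id], probs.round(3))     # picks "dog"

The argmax (or, in real deployments, sampling from that distribution) is one line of bookkeeping tacked onto the network's output, not a second algorithm.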

And yes, this is pretty much how ALL modern LLMs work. It's extremely simple. They just predict the next token, by themselves. All the sophisticated outputs you see arise purely out of that. THAT is the miracle that no-one could believe for a while.

When put like that, it gives the sense that one Mark Zuckerberg is seriously overpaying some recent hires.

I think he’s arguing that the argmax you run over the logits is not technically part of the LLM neural network, so the LLM is just ‘an algorithm that produces math’ (i.e. it produces a probability distribution), but that seems tendentious and also kind of weirdly put, because it sounds like describing a tokenizer.