
Friday Fun Thread for April 17, 2026

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


Do you see the average human seeing a random reddit comment from 5 years ago and then pinning it on the right person, and associating it with their other work?

If that person had a searchable database? Sure.

It's possible I'm just entirely misunderstanding how they work down in the guts, but I interpreted the task as something like "I searched my database of training data, found the exact post, and replied with the linked username", which is powerful and "superhuman" in an objective sense, but the sort of thing I would have expected from Google search 15 years ago, pre-enshittification.

Identifying you by new writing would be much more impressive and alarming, and it sounds like they can actually do that for people like Scott, from some of the other posts people have made.

Identifying you by new writing would be much more impressive and alarming, and it sounds like they can actually do that for people like Scott, from some of the other posts people have made.

It got me from old posts in a forum that probably wasn't in the training data, and (admittedly with four other guesses) from old drafts that I never published anywhere before today, and a quick test with a post from five days ago (admittedly, a pretty easy one... for someone who knows a lot about TheMotte) shows success, too.

I could probably write a long-post later this week and try again, but I don't expect to have time before Thursday.

It knew it was from The Motte, which reduces the potential author count from a billion down to (realistically) fewer than fifty regular posters. I think that’s slightly burying the lede here.

Fair. That said, I took this post, removed any links to TheMotte, my personal blog, or non-mainstream sources, and it still guessed me at TheMotte. And while Monika's a little closer to my interests than abortion law, it's not one of my mainstays.

I am still unable to exclude bleed from one context to the next or a configuration error, or to be confident of its compliance with the 'don't search the web' toggle, though.

EDIT: a repeat without the spoiler marks didn't get to TheMotte specifically (and misidentified me as David Hunt, whom I don't even recognize). But it still got my screenname as a most likely candidate, albeit along with some hilariously wrong ones.

Models don't have access to training data at inference time.

Then, if it doesn't have access to internet search, how is it looking things up?

If I gave you a snippet of Shakespeare and asked you to guess who wrote it, I expect the Bard would be one of your top choices. How are you doing that if you don't consult Google or your Shakespeare box set?

Each token that the model sees in training updates its view of what sort of things are associated and in what way. Elements of style or topics may be clustered somewhere in the high dimensional latent space with the corresponding authors.
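
For anyone who wants the "style clusters near its author" idea made concrete, here is a toy sketch. To be clear about what it is not: it is not how an LLM works internally. It swaps in old-fashioned stylometry (character trigram counts compared by cosine similarity) as a stand-in for a learned latent space, and the author names and sample texts are made up for illustration. The point is only that nothing gets looked up at guess time; the known texts have already been boiled down to a per-author profile, and the snippet is compared against those.

```python
from collections import Counter
import math


def trigram_profile(text):
    """Count overlapping character trigrams in the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_authors(snippet, corpus):
    """Rank authors by how close the snippet is to their pooled profile.

    corpus maps an author name to a list of texts known to be theirs.
    Returns (author, score) pairs, best match first.
    """
    query = trigram_profile(snippet)
    scores = {}
    for author, texts in corpus.items():
        profile = Counter()
        for t in texts:
            profile.update(trigram_profile(t))
        scores[author] = cosine(query, profile)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical authors and texts, for illustration only.
    corpus = {
        "bard": ["shall i compare thee to a summer's day thou art more lovely"],
        "engineer": ["the build failed because the config file was missing a key"],
    }
    for author, score in rank_authors("thou art more lovely and more temperate", corpus):
        print(f"{author}: {score:.3f}")
```

With only a few dozen regular posters to choose from, even something this crude has a decent shot, which is the same point made above about knowing the forum first.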