Small-Scale Question Sunday for August 6, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

You will probably Light Yagami yourself with information you gave away about your personal life long before they can fingerprint your text.

Sure, but the point is that these methods overlap: you can use a powerful LLM to parse high-likelihood text samples for shared details (or even things like shared interests, obscure facts, specific jargon), narrowing down your list of a thousand matches. Plus the passwords/emails thing is really important. Most people reuse them (at least sometimes) and there are tons of leaked lists online, so with those you can chain together pseudonymous identities automatically. Right now this is still extremely labor-intensive, so it only happens with high-profile doxxings where suspicions already exist.

And I think writing styles are more unique than you think. Specific repeated spelling mistakes, specific repeated phrases, uncommon analogies or metaphors, weird punctuation quirks. And the size of the dataset for a regular user here (many hundreds of thousands of words, in quite a few cases) is likely enough for a model tuned on it to be really good at identifying the unique writing patterns of such a user.
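To make the "writing quirks as a fingerprint" idea concrete, here is a minimal sketch of one classic stylometric baseline: comparing character trigram counts with cosine similarity. It is a toy under stated assumptions, not the tuned model described above. The function names (`char_ngrams`, `cosine_similarity`, `rank_candidates`) and the sample texts are made up for illustration, and real attribution would need far more text and a proper evaluation.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (lowercased, whitespace collapsed)."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(unknown: str, known: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Rank known authors by similarity to the unknown sample, most similar first."""
    target = char_ngrams(unknown, n)
    scores = [
        (name, cosine_similarity(target, char_ngrams(sample, n)))
        for name, sample in known.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical samples: author "a" has a repeated misspelling and punctuation habit.
    known = {
        "a": "I definately think, tbh, that its definately fine...",
        "b": "The quarterly report indicates robust growth.",
    }
    for name, score in rank_candidates("tbh I definately agree...", known):
        print(f"{name}: {score:.3f}")
```

Character n-grams are used here because they pick up spelling mistakes and punctuation habits without any language-specific preprocessing. With only a sentence or two per author the scores are noisy, which is why the amount of text per user matters so much.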

Okay, but where is the literature? Just show me that it's theoretically possible. I would do the math that supports my side of the argument, but you know... burden of proof.

The reasons discussed in the two comments above apply even with your new scenario. I don't think you understood the core of the arguments.

Also, what you are describing can already be done today, without a "powerful LLM". And no, it can't be done automatically anytime soon, because HTTP requests are not going to have a "fast takeoff".