Small-Scale Question Sunday for February 26, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


This comment I made about how hard/easy it is to find pseudonymous users' online identities got me thinking about infosec.

So I spent the rest of the evening researching OSINT tools and other methods to do bad things (this has the added benefit of implicitly letting you know how to not have bad things done to you, but knowing how to be safe won't necessarily teach you how to be dangerous).

I am not sure how feasible this is, but I checked some emails of people I know using https://haveibeenpwned.com/ and it tells you which data breach the email/password combination was found in. So isn't all that remains to acquire the breached data and hope they don't use 2FA? Or am I missing something?

Anyways, back on the topic of infosec/osint, what are your favorite tools that you totally use for security reasons? I am interested in knowing any clever techniques you have heard being used or used yourself as well for/against all things infosec.

I am interested in knowing any clever techniques you have heard being used or used yourself as well for/against all things infosec.

For a very broad definition of 'clever,' I basically keep all my internet activity 'siloed' between different username/password combos that generally NEVER interact. I apply somewhat different levels of information hygiene to each.

My pseudonym on reddit was different from my youtube username is different from my motte username is different from my twitter handle is different from my Steam username. Okay, I copied my motte username over from reddit.

Sure, you could probably tie them all to a particular IP address, and then use that to make a connection to a location in realspace that roughly corresponds with my physical location, but fundamentally that is shaky evidence if you want to reasonably prove that a given comment was physically typed out by a given person. If the account in question doesn't include dozens of photographs and videos that really only could have been uploaded by me, it requires a sizeable leap to say that the comments in question couldn't have been generated by someone else.

Most of the time if someone is pilloried IRL for any comments they posted to a pseudonymous forum it means they:

A) Admitted to typing them.

B) Used the pseudo to communicate directly with another party and exposed their identity to said party (maybe even meeting up in person) such that the other person could reliably ID them as the account-holder.

C) Directly connected the pseudo to some account that was directly tied to their IRL identity.

That is, I don't think it is easy at all to tie a purely text-based interaction on a given forum via pseudonyms to a real person in a way that can't be plausibly denied, unless you're a major government or corporation that controls the person in question's account.

Here's a genuine thought I've had: if all they really have on you are internet comments associated with a pseudonym, and they have no further direct proof that you, the real person, actually typed those comments, then one can probably invoke the Shaggy Defense.

Wasn't me!

"Oh but all those personal details divulged in the comments match your profile."

And? Those details are all things that someone else could emulate with minimal effort. Why do you assume those details were true about the person typing the comments?

"You saying you were framed?"

I'm saying I'm falsely accused, I never typed those comments.

"The typing style even matches posts publicly associated with you!"

Yeah, so it wouldn't be difficult to mimic that style. Unless you're saying you saw me physically type the text, I'm not sure how you're so certain you found the right person.

"Show me your browser history and I'll maybe believe you."

Nice try, but no. I think we're done here.


This defense has only gotten more plausible in an era of GPT and deepfakes, where generating absolutely gobs of convincing text would require much less effort by a savvy actor.

Of course, if you've pissed off someone online enough to come and actually try to harm you or your reputation, that denial might not be enough.

I have been tracked down a few times actually – though all instances but the one mentioned in the last thread have taken place before I started to pay any real attention to privacy. Virtually all of those people who have succeeded became my friends, at least for a while. Guess I was more likable back then. One of them was a fledgling teenage hacker who later started working for FSB in basically the same capacity, and knows a lot about this stuff, but we haven't been in touch in a while.

I don't believe there are very clever things one can do to ensure anonymity. (Maybe LLM instances to populate correlated but misleading online identities? Style transfer? I'll use this as soon as possible though my style is... subjectively not really a writing style in the sense of some superficial gimmicks, more like the natural shape of my thought, and I can only reliably alter it by reducing complexity and quality, as opposed to any lateral change). @gattsuru gives decent advice but, aside from those technological attack surfaces, you should just understand the threat model and not share data that meaningfully narrows down one's identity. Speaking of elder gods, Terry Tao has written on this topic:

Anonymity on the internet is a very fragile thing; every anonymous online identity on this planet is only about 31 bits of information away from being completely exposed. This is because the total number of internet users on this planet is about 2 billion, or approximately 2^{31}. Initially, all one knows about an anonymous internet user is that he or she is a member of this large population, which has a Shannon entropy of about 31 bits. But each piece of new information about this identity will reduce this entropy. For instance, knowing the gender of the user will cut down the size of the population of possible candidates for the user’s identity by a factor of approximately two, thus stripping away one bit of entropy. (Actually, one loses a little less than a whole bit here, because the gender distribution of internet users is not perfectly balanced.) Similarly, any tidbit of information about the nationality, profession, marital status, location (e.g. timezone or IP address), hobbies, age, ethnicity, education level, socio-economic status, languages known, birthplace, appearance, political leaning, etc. of the user will reduce the entropy further.

One can reveal quite a few bits of information about oneself without any serious loss to one’s anonymity; for instance, if one has revealed a net of 20 independent bits of information over the lifetime of one’s online identity, this still leaves one in a crowd of about 2^{11} \sim 2000 other people, enough to still enjoy some reasonable level of anonymity. But as one approaches the threshold of 31 bits, the level of anonymity drops exponentially fast. Once one has revealed more than 31 bits, it becomes theoretically possible to deduce one’s identity, given a sufficiently comprehensive set of databases about the population of internet users and their characteristics.

Thus, in today’s online world, a crowd of billions of other people is considerably less protection for one’s anonymity than one may initially think, and just because the first 20 or 30 bits of information you reveal about yourself leads to no apparent loss of anonymity, this does not mean that the next 20 or 30 bits revealed will do so also.

Restricting access to online databases may recover a handful of bits of anonymity, but one will not return to anything close to pre-internet levels of anonymity without extremely draconian information controls. Completely discarding a previous online identity and starting afresh can reset one’s level of anonymity to near-maximum levels, but one has to be careful never to link the new identity to the old one, or else the protection gained by switching will be lost, and the information revealed by the two online identities, when combined together, may cumulatively be enough to destroy the anonymity of both.

...one additional way to gain more anonymity is through deliberate disinformation. For instance, suppose that one reveals 100 independent bits of information about oneself. Ordinarily, this would cost 100 bits of anonymity (assuming that each bit was a priori equally likely to be true or false), by cutting the number of possibilities down by a factor of 2^{100}; but if 5 of these 100 bits (chosen randomly and not revealed in advance) are deliberately falsified, then the number of possibilities increases again by a factor of \binom{100}{5} \approx 2^{26}, recovering about 26 bits of anonymity. In practice one gains even more anonymity than this, because to dispel the disinformation one needs to solve a satisfiability problem, which can be notoriously intractable computationally, although this additional protection may dissipate with time as algorithms improve (e.g. by incorporating ideas from compressed sensing).
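To make the arithmetic in the quoted passage concrete, here is a rough sketch of the same numbers in Python. The function names are mine, not Tao's, and the 2 billion figure is simply the one used in the quote; treat it as a back-of-the-envelope check rather than a model of real deanonymization.

```python
import math

# Population figure taken from the quoted post (about 2 billion internet users).
INTERNET_USERS = 2_000_000_000


def bits_of_anonymity(population: int) -> float:
    """Entropy, in bits, of being one unknown member of a population."""
    return math.log2(population)


def crowd_after(bits_total: float, bits_revealed: float) -> float:
    """Size of the remaining crowd after revealing independent bits."""
    return 2 ** max(bits_total - bits_revealed, 0.0)


def bits_recovered_by_lies(revealed: int, falsified: int) -> float:
    """Bits regained when `falsified` of `revealed` bits are secretly false.

    An attacker has to consider every way of choosing which bits are lies,
    which is the binomial coefficient C(revealed, falsified).
    """
    return math.log2(math.comb(revealed, falsified))


print(round(bits_of_anonymity(INTERNET_USERS), 1))   # about 30.9
print(round(crowd_after(31, 20)))                    # 2048
print(round(bits_recovered_by_lies(100, 5), 1))      # about 26.2
```

The middle line is the "crowd of about 2000" from the quote, and the last line is the "about 26 bits" recovered by falsifying 5 of 100 revealed bits.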

We've moved past compressed sensing, of course.

I wonder if this was a purely theoretical musing, good-faith advice, or a hint that Fields laureates, too, have opinions to hide and shitposts to send. «On the internet, nobody knows you're a Tao».

I don't believe there are very clever things one can do to ensure anonymity. (Maybe LLM instances to populate correlated but misleading online identities? Style transfer? I'll use this as soon as possible though my style is... subjectively not really a writing style in the sense of some superficial gimmicks, more like the natural shape of my thought, and I can only reliably alter it by reducing complexity and quality, as opposed to any lateral change).

Reminds me of that joke about a janitor who looked exactly like Vladimir Lenin. When someone from the Competent Organs suggested that it's kinda untoward, maybe he should at least shave his beard, the guy responded that of course he could shave the beard, but what to do with the towering intellect?

This is precisely why one should regularly nuke one's internet identity. Online handles are like underwear in this regard. Have multiple, switch them often, and don't use them past their expiration date.

This goes very much against common sentiment on basically all internet communities except for the chans. Mainly because the people who will have the largest sway over the community will be those who have built a reputation for themselves. We are thus amplifying the voices of the least OPSEC conscious.

If your attacker is particularly skilled/motivated (or maybe this has changed with new tools, too lazy to duck it now), stylometry is also a hard-to-work-around threat. It isn't as easy to use at scale (queries of the type: sort all users on Twitter whose writing most resembles this sample, descending, a la perceptual hashing), but if you can narrow down with communities that a person is likely to be a part of, it can be a pretty fast iterative search.

People particularly intent on segregating online identities often either take on affected styles (harder than it might seem at first, especially with 100% consistency!) or use a scrambling tool (rudimentary form of this used to be roundtripping translation).
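For a feel of what the "sort users by resemblance to this sample" query looks like, here is a toy sketch using character trigram counts and cosine similarity. This is an assumption on my part about one simple way to do it, not a description of any real stylometry tool; serious attacks use far richer features, and the account names below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and
    collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 to 1.0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(sample: str, candidates: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs, most similar writing first."""
    target = char_ngrams(sample)
    scored = [
        (name, cosine_similarity(target, char_ngrams(text)))
        for name, text in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical accounts and texts, purely for illustration.
    corpus = {
        "account_a": "tbh i reckon the onions near the village are overpriced",
        "account_b": "One must observe that the regime's policy is untenable.",
    }
    for name, score in rank_candidates("tbh i reckon the regime is overpriced", corpus):
        print(name, round(score, 3))
```

Narrowing the candidate set to a handful of communities first is what makes even something this crude usable; run against all of Twitter it would mostly return noise.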

If your attacker is particularly skilled/motivated (or maybe this has changed with new tools, too lazy to duck it now), stylometry is also a hard-to-work-around threat.

If stylometry is all they have, though, surely one can simply deny being the person who posted the offending comments?

After all, in order to get a sample to compare the offending comment's style, you had to pull info that was publicly available, which would probably be available to anyone else who wanted to mimic that style.

Sure, it is more plausible that the same person produced comments of the same style. And one can always attempt to track down corroborating evidence to bolster the claim.

But by itself it has to be considered pretty weak evidence that merely because the style matches, the same person must have typed it.

It's not interesting as a proof of identity, more as an extra powerful correlation/fingerprinting attack. Consider the following scenario: you perfectly segregate two identities (separate devices, connection locations, posting times, interests) online. For some piquancy, let's assume you have aboveboard beliefs/communications (posts that are kosher for your local authoritarian government) and below-board/seditious ones. Your aboveboard ones often leak your identity or location, because why practice aggressive OPSEC when you're asking where's the best place to buy fresh onions near your village (even worse, aggressive OPSEC in these cases could tip off the authorities that someone buying onions around that area is up to no good!)? However, because you don't randomize your writing style, your government eventually is led to suspect that FuckTheGovernment93 is actually the same person as LocalFarmer82. You are arrested by the secret police, tortured and shipped off to a black site.

Even worse, consider a more aggressive scenario that's actually plausible in the modern age: you only have one identity that's completely distinct from your day-to-day activities. There is no other public content to compare to. However, because your government has access to your online schooling records/past essays/whatever writing you performed during mandatory schooling, they still manage to figure out FuckTheGovernment93 is you. Same outcome as above.

I am not sure how feasible this is, but I checked some emails of people I know using https://haveibeenpwned.com/ and it tells you which data breach the email/password combination was found in. So isn't all that remains to acquire the breached data and hope they don't use 2FA? Or am I missing something?

Depends on the data breach. I've been pwned on gravatar, for example, but that meant that it only linked my gravatar to my e-mail. Even if a breach includes password information, it has long been considered best practice to hash stored password data, and while some hashes are effectively broken (MD5), others are expensive enough to break that uncommon passwords have not yet been broken.
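A minimal sketch of the difference, using only Python's standard library. PBKDF2 is my pick for the "expensive" hash here, and the iteration count is an assumed work factor, not something any particular breached site used; the point is only that one guess against the slow, salted hash costs vastly more than one guess against bare MD5.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for illustration; real deployments tune this to
# current guidance and their own hardware.
ITERATIONS = 200_000


def fast_unsalted_md5(password: str) -> str:
    """The weak case: one cheap hash, no salt. Identical passwords give
    identical hashes, so precomputed tables and bulk guessing work well."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def slow_salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """The stronger case: a random per-user salt plus a deliberately slow
    key-derivation function. Returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = slow_salted_hash(password, salt)
    return hmac.compare_digest(digest, expected)
```

An attacker holding a dump of the second kind still cracks "password123" quickly, because it is near the top of every guess list; it is the uncommon passwords that the per-guess cost protects.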

And some more important sites will block even correct passwords if the login is coming from a strange enough location (usually banks, some e-mail providers). And some people will change (or be forced to change) their passwords.

That said, it's usually a good practice to change passwords after a breach rather than praying.

Anyways, back on the topic of infosec/osint, what are your favorite tools that you totally use for security reasons? I am interested in knowing any clever techniques you have heard being used or used yourself as well for/against all things infosec.

Answering the explicit question:

Absolutely most vital: KeePass is an offline locally-stored password manager. People find cloud options like 1Password and LastPass more attractive because they're a little more convenient across multiple devices, but there's a reason that they have long incident reports. KeePass can't protect you from a pwned host computer, but it reduces your threat model to that. More importantly, it makes it possible to avoid or reduce password reuse across multiple sites.

Some non-SMS 2FA. I like Authy, but they're a dime a dozen. Not every site supports this, and even on sites that do support it, it's sometimes too obnoxious to be worth the effort, but for banks and e-mails you really should default to it on. Do make sure to save your emergency tokens, however; in addition to the risk of losing a cell phone, time desyncs can cause Problems.

(Open)SSH. Yes, you could do a self-hosted VPN of some kind if absolutely necessary, but it's obnoxious. SSH can quickly get ports on one machine to ports on another machine, where and when you need them, whether that's to redirect your web browser requests or something more esoteric.

Some web-scraping tools. I use a home-mixed C# abomination, some fossils just use wget, some madmen use javascript. Being able to bulk pull files down from the web and parse them locally has a ton of applications, and is really something you need to understand. More advanced options exist -- SpiderFoot seems pretty popular for specifically OSINT work -- but ultimately it's just an extrapolation of existing tools.

Related, Inspection Mode for your web browser of choice. This is pretty useful for anything from bypassing paywalls to finding the underlying sources for specific media to tracing javascript. The UIs are universally bad, but you still should learn them.
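On the web-scraping item above, the core of "pull pages down and parse them locally" fits in a few lines of standard-library Python. This is only a sketch of the idea, not the C# tool mentioned and not how SpiderFoot works; `LinkCollector`, `extract_links` and `fetch` are names I made up for the example.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    """Parse an HTML string locally and return the absolute link targets."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page as text. Be polite: rate-limit and respect the
    site's terms before pointing this at anything in bulk."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Typical use would be `extract_links(fetch(url), url)` in a loop over a queue of pages, saving whatever you care about to disk so the parsing can be redone offline.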

The implicit one:

Shodan or masscan. It's so damned hard to lock down this stuff that a surprising number of places are vulnerable across a wide range of matters. You can use nmap if you're going after one or two targets, but as either black or white hat, you absolutely need to understand that these are running everywhere all the time.

Remote Access Tools. This is an ugly one because they're absolutely vital to even small-scale IT management and support. TeamViewer's an easy and relatively legit one, but SplashTop, LogMeIn, yada yada are all valid mainstream services focused on it; Guacamole and the various VNC descendants are more self-hosted options. For Virtue of Silence reasons I'm not gonna list the scuzzy ones. But they're also core to a lot of really scummy stuff, ranging from tech support scams to outright owning someone's home machine.

People find cloud options like 1Password and LastPass more attractive because they're a little more convenient across multiple devices, but there's a reason that they have long incident reports.

It's unfair to lump 1Password and LastPass together; 1Password's security record is much better.

There are key differences that make 1Password much more secure.

Yeah, of the two I've been more impressed by 1Password's model and record. If your use case makes online a requirement, it's probably better than self-hosted file transfer, if you trust 1Password.

Passwords are hard. Pwned host computer is game over for almost everyone, barring some Qubes-type VM segregation setup. The passwords need to be entered in plaintext somehow. You can limit the extent of a breach by keeping your entire password db on an offline machine and lazily QR code'ing it across to the live machine whenever it needs a refresh. Password db encrypted with a gpg smartcard is also pretty good (though not as good as the offline setup, unless you need to tap per decryption like with a Yubikey, in which case I'd rate it as only slightly inferior).

I think you forgot the most important tip however: the more secure your setup, the higher the risk of you locking yourself out of your accounts/backups/encrypted storage. Find a way to dump your secrets in plaintext that fits your threat model (all of them, including TOTP secrets, i.e. what generates your 2FA codes). This might be a box in your apartment with a backup at your office, or a safety deposit box, for instance. On the other end of the paranoid spectrum, an engraved titanium plate inside a waterproof container encased inside a block of concrete dumped in the middle of a remote lake works as well.
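To see why the TOTP secret is the thing worth backing up, here is a small sketch of the standard algorithm (RFC 6238 on top of RFC 4226) in plain Python. The helper names are mine, and the secret is the published RFC test value, not anything you should use. Anyone holding the secret and a roughly correct clock can produce the codes, which is both why a backup saves you and why the backup needs guarding.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP where the counter is the current 30-second window."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def secret_from_base32(encoded: str) -> bytes:
    """Decode the base32 string that sites show next to the QR code."""
    cleaned = encoded.replace(" ", "").upper()
    padding = "=" * (-len(cleaned) % 8)
    return base64.b32decode(cleaned + padding)


if __name__ == "__main__":
    rfc_secret = b"12345678901234567890"  # test secret from the RFCs
    print(totp(rfc_secret, at=59))        # 287082
    print(totp(rfc_secret))               # whatever the current code is
```

Since the code depends only on the secret and the time window, a clock that is off by more than a step or so yields codes the server will not accept, which is the desync problem mentioned upthread.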

For what it's worth, I could probably dox 3-5 regular posters with overlap on here/reddit/twitter, given say a week or two's work. If you have read 70-90% of someone's comments over the years, you can build up quite a reasonable profile on someone. For example, if you have:

-Age range

-Industry (narrowed to a few places of work)

-Location

-Interests

-Social background (schools etc)

-Ethnicity/Religion

-Sex/gender/sexuality

And at least 2 of their social media accounts, how much harder could it be to dox someone from that, without even having to use data breaches? If you were a PI I imagine you'd begin by trawling sites like LinkedIn (probably the most useful due to the breadth of information and easy access) and quite quickly finding some obvious candidates. I've always assumed I'd be relatively easy to dox and I tend toward the lurker side of reddit/blogposts/twitter.

I remember there being a website that would scrape your Reddit comments and catch all the little details you let slip about you. It would quite reliably guess your age, marital status, place of residence, etc.

Strong argument for salting your comments with the occasional absolute fabrication.

I don't use any type of warez to cover myself (besides a VPN where good hygiene is advisable), due in part to sloth and also a lack of wherewithal. My justification for this is probably cope, but my layman's view is that, these days, unless you have a pretty comprehensive suite of software and are unrelentingly fastidious with your choice of hardware/setup, any government entity or motivated individual/group who wants to find you will.

As a result I try to keep myself clean with burner emails, use a new handle (and password, which should go without saying; that said, I know an embarrassing number of people who use the same pass for everything, up to and including using their ATM card's PIN for their phone) for every new platform I find myself on, and semi-regularly (every 1-3 years) change up the primary screen names and PFPs I use. My last trick, and the one I use the least because it discomfits me some, is lying consistently about minor identifying details. The consistency is important, as the purpose is to generate a false positive that'll show up in the kinds of datasets you were demonstrating in your previous post.

Gwern's incredible analysis on Death Note was the primary inspiration for these practices, not that I'm familiar with opsec/digital fingerprints or anywhere near important enough for someone to look for me. The idea is to just throw enough obstacles in the way that, contingent upon an amateur getting ants in the pants over my presence online, I have time to scrub what I can from the 'net (not much, in practice more than you think, so long as you aren't notorious or prolific, as you said). Then I can move my daily business over to a set of cutouts made a while back that I keep the credentials for in my safe.

I'm not anonymized from serious players but I can't play at that level anyway, so fuck it. If things get that far I'll have bigger fish to fry than that time my teenaged self wrote the n-word on a BBS for a kids show fifteen years ago. Would love it for someone with actual expertise in this field to chime in, maybe let me know if my prophylactics are stupid or not.

I believe lying is quite effective, and there's no need to be consistent. On one of my longer-running Reddit alts, I've claimed to live in at least a dozen different cities, none of which I've ever actually been to. And made up a bunch of careers, family situations, hobbies, etc. Let any attacker read them all and try to guess which if any are real and which are fake. The bonus of being highly inconsistent is if you slip up once, there's no way for any attacker to know that that detail is real versus all the others.

I think just lying is the most effective and information-theoretically robust infosec measure. I lie about minor details as well. Not egregious lies, but shifting the dates of events by a month or two, or claiming I went to neighborhood X when I actually went to the identical neighborhood 5 miles away from neighborhood X. I don't do it much on the Motte because I am confident anyone I will ever come across will not be on the Motte, but it's extremely useful when I'm posting on my city subreddit.

After reading that post I'd honestly pay one of you guys to pentest dox me, because better a friend than an attacker. Hadn't realized there were so many automated tools.

Admittedly I am terrible at doing this. My heuristic to gauge how easy someone would be to dox is:

  1. Obviously directly proportional to how many details they share about their personal life.

  2. Directly proportional to the volume of content they have online as well. It's really hard to have thousands of comments online without giving away at least some identifiable information. Combining little bits of information over a long period of time can effectively nullify not sharing detailed information.

But the above is just basic applied information theory/ deduction. I am looking to learn more but I am getting some pisstake useless stuff anywhere I look online, it's as if no one wants you to learn how to potentially do bad things, lol. Also those who do might not want to give away their tricks.

Nonetheless, using the heuristic above, I'd wager 2rafa's doxxer was just a standard-issue doxxer, but Daseindustries' doxxer must have been an elder god. (Assuming both of them started from 0 information like I would have to, which often isn't the case if they are 2-3 layers of separation away from you.)

Yeah, I found a relative's Reddit account just because something he wrote sounded like something he would say, and when I checked his posting history, everything matched: the city he lived in, his career, his other opinions, his family.

A few others were opportunistic. With Scott Alexander, it was because someone said he was easy to dox, so I went looking and quickly found out who he was. A regular here I figured out because he said something about himself that was extremely specific and Googleable. He probably isn't trying to be very anonymous though. The hardest were some reddit mods of a certain subreddit where I was told that some of them were involved in political parties, but eventually I found some of them describing their jobs, and combining that with the cities they lived in, I was able to find their LinkedIn accounts.

So, I don't think I've figured out who anyone was completely from scratch without having some reason to think I'd be able to figure it out. I've never used any fancy tools though.

It may very well have been a legit (above) average neurodivergent slav who had a bone to pick with our Russian friend. As a group, they seem to be quite handy with computers.