not-guilty is not the same as innocent

felipec.substack.com

In many discussions I'm pulled back to the distinction between not-guilty and innocent as a way to demonstrate how the burden of proof works and what the true default position should be in any given argument. A lot of people have no problem seeing the distinction, but many intelligent people, for some reason, don't.

In this article I explain why the distinction exists and why it matters, particularly in real-life scenarios where people try to shift the burden of proof.

Essentially, in my view the universe we are talking about is {uncertain, guilty, innocent}, so not-guilty is the complement of guilty (guilty′), which is {uncertain, innocent}. Therefore innocent ⇒ not-guilty, but not-guilty ⇏ innocent.
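
A minimal sketch of that claim, modeling the three positions as plain sets (the labels are mine, purely for illustration):

```python
# Hypothetical universe of positions, as described above.
universe = {"uncertain", "guilty", "innocent"}

# not-guilty is the complement of guilty within that universe.
not_guilty = universe - {"guilty"}       # {"uncertain", "innocent"}

assert "innocent" in not_guilty          # innocent ⇒ not-guilty
assert not_guilty != {"innocent"}        # not-guilty ⇏ innocent: "uncertain" is also there
```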

When O. J. Simpson was acquitted, that doesn't mean he was found innocent; it means the prosecution could not prove his guilt beyond a reasonable doubt. He was found not-guilty, which is not the same as innocent. It very well could be that the jury found the truth of the matter uncertain.

This notion has implications in many real-life scenarios where people try to shift the burden of proof onto you for rejecting an unsubstantiated claim. They wrongly assume you are claiming their claim is false (equivalent to innocent), when in truth all you are doing is staying in the default position (uncertain).

Rejecting the claim that a god exists is not the same as claiming a god doesn't exist: it carries no burden of proof because it's the default position. Agnosticism is the default position. The burden of proof is on the people making the claim.

I don't understand what would make you think I believe that.

It's the straightforward interpretation of

they don't encode uncertainty at all.

If you wanted to say "they don't encode uncertainty-about-uncertainty in the number 0.5", and not falsely imply that they don't encode uncertainty at all (0.5 is aleatory uncertainty integrated over epistemic uncertainty!) or that they don't encode all their uncertainty anywhere, you should have made that true claim instead.
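
To make that parenthetical concrete (the coin setup here is my own illustration, not something from the thread): for a coin with unknown bias θ and a symmetric epistemic distribution p(θ) over that bias, the single number 0.5 is the per-flip aleatory probability integrated over the epistemic one:

P(heads) = ∫₀¹ θ·p(θ) dθ = 1/2 whenever p(θ) = p(1−θ).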

You said of "They use many numbers to compute",

They don't.

This is flatly false. I just gave you two examples, still at the "toy problem" level: the first discretizes an infinite-dimensional probability measure and computes with 101 numbers; the second computes with polynomials from an infinite-dimensional subspace!
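
A hedged sketch of what the first kind of example can look like (the grid, prior, and data below are my own stand-ins, not necessarily the ones in the linked post): the distribution over a coin's bias is discretized on a 101-point grid and updated with Bayes' rule, so the computation carries 101 numbers rather than one.

```python
import numpy as np

# 101 candidate values for the coin's bias theta, with a flat prior over the grid.
theta = np.linspace(0.0, 1.0, 101)
prior = np.full_like(theta, 1.0 / len(theta))

def update(prior, heads, tails):
    """Bayes' rule on the grid: posterior ∝ likelihood × prior."""
    likelihood = theta**heads * (1.0 - theta)**tails
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()

posterior = update(prior, heads=3, tails=3)

print(np.sum(theta * posterior))   # ≈ 0.5, the single number people quote
print(posterior.size)              # 101 numbers retained for any further update
```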

You said,

Whatever uncertainty they had at the beginning is encoded in the number 0.5.

Which is false, because it ignores that the uncertainty is retained for further updates. That uncertainty is also typically published; that posterior probability measure is found in all sorts of papers, not just those graphs you ignored in the link I gave you. I'm sorry if not everybody calling themselves "Bayesian" always does that (though since you just ignored a link which did do that, you'll have to forgive me for not taking your word about it in other cases).

You said,

My conclusion is the same: p=0.5 is useless.

This is false. p=0.5 is exactly what you need to combine with utilities to make an optimal decision without further data. If you have one binary outcome (like the coin flip case), a single scalar probability does it; you're done. If you have a finite set of outcomes S, you need |S|-1 scalars, and if you have an infinite set of outcomes (and/or conditions, if you're allowed to affect the outcome), you need a probability measure. These are not things that Bayesians never do; they're things that Bayesians invented centuries ago.
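
A minimal sketch of that use of a scalar probability (the utilities below are made-up numbers, purely illustrative):

```python
# One binary outcome, one scalar probability, and hypothetical utilities
# utilities[decision][outcome]; the optimal decision maximizes expected utility.
p_heads = 0.5
utilities = {
    "bet_on_heads": {"heads": 10.0, "tails": -12.0},
    "do_not_bet":   {"heads":  0.0, "tails":   0.0},
}

def expected_utility(decision):
    u = utilities[decision]
    return p_heads * u["heads"] + (1.0 - p_heads) * u["tails"]

best = max(utilities, key=expected_utility)
print(best, expected_utility(best))   # "do_not_bet", 0.0 (beats -1.0 for betting)
```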

the result is a single value.

This is trivially true in the end with any decision-making system over a finite set of possible decisions. You eventually get to "Do X1" or "Do X2". If you couldn't come up with that index as your result then you didn't make a decision!

If you're maximizing expected utility, you get that value by multiplying marginalized probabilities by utilities and finding the maximum, so you need those probabilities as scalar values. Scalar values are therefore usually what you publish for decision-makers, in the common case where you're only estimating uncertainties and you expect others to come up with their own estimates of utilities. If you expect to get further data, and not just make one decision with the data you have, you incorporate that data via a Bayesian update, so you need to retain probabilities as values over a full measure space; what you publish for people doing further research is then some representation of a probability distribution.

I was not the one having trouble, they were.

Your title was literally "2 + 2 is not what you think", and as an example you used [2]+[2]=[4] in ℤ/4ℤ (with lazier notation), except you didn't know that [0]=[4] there, so you just assumed the answer was definitively "0", and then you wasted a bunch of people's time arguing against undergrad group theory.
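
For reference, the ℤ/4ℤ point is a one-liner to check (a sketch, representing each class [k] by an integer and reducing mod 4):

```python
n = 4
assert (2 + 2) % n == 0 % n    # [2] + [2] = [4] = [0] in ℤ/4ℤ
assert 4 % n == 0 % n          # [4] and [0] are the same class
```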

Or do you disagree that in computing you can't put infinite information in a variable of a specific type?

What I disagreed with was

one bit can only fit one bit of information. Period.

This is the foundation of information theory.

And I disagreed with it because it was flatly false! The foundation of information theory is I = -log₂(P); this is only I = 1 (bit) if P = 1/2, i.e. equal probabilities in the case of 1 bit of data. I gave you a case where I = 1/6, and a more complicated case where I = 0.58 or I = 1.58, and you flatly rejected the latter case with "it cannot be more". It can: -log₂(P) does exceed 1 for P < 1/2. If I ask you "are you 26 years old", and it's a lucky guess so you say "yes", you've just given me one bit of data encoding about 6 bits of information. The expected information in 1 bit of data can't exceed 1 bit (you're probably going to say "no" and give me something like .006 bits of information), but that's not the same claim; you can't even calculate the expected information without a weighted sum that includes the greater-than-1 term(s) in the potential information.
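
A hedged sketch of that surprisal arithmetic (the 1/64 prior for a correct age guess is my own stand-in, chosen only so that -log₂ lands near the 6 bits mentioned above; the exact numbers aren't the point):

```python
import math

def surprisal_bits(p):
    """Self-information of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

p_yes = 1 / 64            # assumed probability the age guess is right
p_no = 1 - p_yes

print(surprisal_bits(p_yes))   # 6.0 bits delivered by a single "yes" bit of data
print(surprisal_bits(p_no))    # ≈ 0.023 bits delivered by the far more likely "no"

# Expected information (entropy) is the weighted sum; it needs the >1-bit term
# above, yet the expectation itself never exceeds 1 bit for 1 bit of data.
entropy = p_yes * surprisal_bits(p_yes) + p_no * surprisal_bits(p_no)
print(entropy)                 # ≈ 0.12 bits
```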

Distinctions are important! If you want to talk like a mathematician because you want to say accurate things then you need to say them accurately; if you want to be hand-wavy then just wave your hands and stop trying to rob mathematics for more credible-sounding phrasing. The credibility does not rub off, especially if instead of understanding the corrections you come up with a bunch of specious rationalizations for why they have "zero justification".

It's the straightforward interpretation of

It's not.

I just gave you two examples,

No, you didn't. You showed how they could do it; that doesn't necessarily imply it's what they actually do in the real world.

That uncertainty is also typically published

Wrong. I'm not talking about mathematics or statistics; I'm talking about epistemology.

You don't seem to know what subject we are even talking about: Bayesian epistemology. How one particular rationalist makes a decision is not published.

Even more proof that people don't even listen to what is actually being said.