
not-guilty is not the same as innocent

felipec.substack.com

In many discussions I'm pulled back to the distinction between not-guilty and innocent as a way to demonstrate how the burden of proof works and what the true default position should be in any given argument. Many people have no trouble seeing the distinction, but for some reason many intelligent people don't.

In this article I explain why the distinction exists and why it matters, particularly in real-life scenarios where people try to shift the burden of proof.

Essentially, in my view the universe we are talking about is {uncertain, guilty, innocent}, so not-guilty is guilty′, the complement of guilty, which is {uncertain, innocent}. Therefore innocent ⇒ not-guilty, but not-guilty ⇏ innocent.
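
A minimal sketch of that set logic in Python (my illustration of the complement described above, not something from the article):

```python
universe = {"uncertain", "guilty", "innocent"}

guilty = {"guilty"}
not_guilty = universe - guilty  # guilty', the complement of guilty: {"uncertain", "innocent"}

assert {"innocent"} <= not_guilty   # innocent ⇒ not-guilty
assert not_guilty != {"innocent"}   # not-guilty ⇏ innocent: it also contains "uncertain"
```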

When O. J. Simpson was acquitted, that doesn't mean he was found innocent; it means the prosecution could not prove his guilt beyond reasonable doubt. He was found not-guilty, which is not the same as innocent. It very well could be that the jury found the truth of the matter uncertain.

This notion has implications in many real-life scenarios where people try to shift the burden of proof after you reject an unsubstantiated claim. They wrongly assume you are claiming their claim is false (equivalent to innocent), when in truth all you are doing is staying in the default position (uncertain).

Rejecting the claim that a god exists is not the same as claiming a god doesn't exist: it doesn't require a burden of proof because it's the default position. Agnosticism is the default position. The burden of proof is on the people making the claim.

A Bayesian would say that beliefs have continuous degrees, expressible on a scale from 0% to 100%.

I'm not overly familiar with the Bayesian way of thinking. I have seen it expressed very often in The Motte and similar circles, but I don't see why anyone would conclude that this is a valid way of reasoning, especially when it comes to beliefs. I do understand Bayes' theorem, and I understand the concept of updating a probability; what I don't understand is why anyone would jump to conclusions based on that probability.

Let's say that through a process of Bayesian updating I arrive at an 83% probability of success. Should I pull the trigger? That to me is not nearly enough information.

Now let's say that if I "win" I get $100, and if I "lose" I pay $100. Well, now I have a bit more information, and I would say this bet is in my favor. But if we calculate the odds (treating the 83% as 5/6) and adjust the numbers so that if I lose I pay $500, it turns out that I gain nothing by participating in this bet; the expected gain exactly cancels the expected loss: ((5 / 6) * 100) / ((1 / 6) * 500) = 1.
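
A minimal sketch of that expected-value arithmetic (my illustration, treating the 83% as 5/6 exactly as the calculation above does):

```python
p_win = 5 / 6  # roughly the 83% probability of success from the example

def expected_gain(win_amount, loss_amount):
    """Expected profit of taking the bet: win `win_amount` with p_win, else lose `loss_amount`."""
    return p_win * win_amount - (1 - p_win) * loss_amount

print(expected_gain(100, 100))  # ~ +66.7: clearly worth taking
print(expected_gain(100, 500))  # ~ 0: same 83%, nothing to gain on average
```

The 83% is identical in both calls; only the stakes change whether the bet is worth taking.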

Even worse: let's say that if I win I get $100, but if I lose I get a bullet in my brain. I'm literally playing Russian roulette.

83% tells me absolutely nothing.

Real actions in real life are not percentages. They come down to: do you do it or not? And: how much are you willing to risk?

You can't say "I'm 60% certain my wife is faithful, so I'm going to 40% divorce her." Either you believe something, or you don't. Period.

Even worse is the concept of the default position in Bayesian thinking, which, as far as I understand, is 50%.

Edit: I mean the probability that the next coin toss is going to land heads is 50%.

So starting off, if I don't know whether a coin is fair or not, I would assume it is. If I throw the coin 100 times and it lands heads 50 of those times, the final percentage is 50%. If I throw the coin 1,000,000 times and it lands heads 500,000 of those times, it's still 50%, so I have gained zero information. This does not map to the reality I live in at all.

My pants require at least two numbers to be measured properly; surely I can manage two numbers for a belief. So let's say that before I have any evidence I believe the coin is fair at 50%±50 (no idea), and after throwing it a million times I would guess it's about 50%±0.01 (I'm pretty sure it's fair).
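
One way to put a number on that shrinking "±", reading it as the standard error of the observed proportion (my gloss, not necessarily how the author would formalize it):

```python
import math

def estimate(heads, flips):
    """Observed proportion of heads and its standard error, both in percentage points."""
    p = heads / flips
    std_err = math.sqrt(p * (1 - p) / flips)
    return 100 * p, 100 * std_err

print(estimate(50, 100))             # (50.0, 5.0)  -> about 50% ± 5
print(estimate(500_000, 1_000_000))  # (50.0, 0.05) -> about 50% ± 0.05
```

The point estimate stays at 50% either way, but the second number shrinks by a factor of 100, which matches the intuition that information has been gained.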

So no, I'm not sold on this Bayesian idea of a continuous belief. I can't divorce my wife 40%, or blow my brains out 17%. In the real world I have to decide whether I roll the dice or not.

surely I can manage two numbers for a belief.

Does this mean that what you said right after is how you would see the coin case?

If so, well, a Bayesian wouldn't use just one number here either. And it would indeed lead to having more information after throwing the coin a million times. If this doesn't go contrary to what you meant to say, ignore. If it does and you don't see why, feel free to ask a follow-up.

If so, well, a Bayesian wouldn't use just one number here either.

Do you have any source? Everyone I've debated says it's a single number: 50%.

This article in the Stanford Encyclopedia of Philosophy goes to great lengths to explain why the standard view of degree of belief is limited and proposes alternative views using imprecise probabilities: Imprecise Probabilities. It seems to confirm my belief that Bayesians consider only a single probability.

Both things are true. But it's like height: they'd say height is a single number, but if they don't know your height precisely, they'd assign it a probability distribution, which is not just a number.

So yes, Bayesians give probabilities of events as a single number. The coin would have a single number P representing its probability of the event "will land heads" when flipped.

But if the Bayesian isn't certain about the value of that P, and since that value has to be a single precise number, he would add a previous step to the experiment, before the coin flips happen, to encode his uncertainty. That step answers only the question: what world are we living in, one where the coin is unbiased, somewhat biased, or very biased? i.e. what's the exact value of P? Here, P does not represent a probability but a random variable, with its whole probability distribution, parameterized by as many numbers as you need (typically just a beta distribution with 2 parameters, which sort of map to your two numbers in "50%±50").

Then as coins are flipped and you get results, this distribution of P gets updated and sooner or later it gets narrower around the real coin bias, just like you said it should happen.
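
A minimal sketch of that updating, under the assumptions the commenter describes (a flat Beta(1, 1) prior over the coin bias, with each heads incrementing one parameter and each tails the other); the arithmetic below just reports the mean and spread of the distribution:

```python
import math

def beta_mean_std(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution over the coin bias P."""
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, math.sqrt(var)

# Flat prior: "the bias could be anything between 0 and 1", roughly the 50%±50 feeling.
print(beta_mean_std(1, 1))                       # (0.5, ~0.289)

# Posterior after 1,000,000 flips, half of them heads: add heads to alpha, tails to beta.
print(beta_mean_std(1 + 500_000, 1 + 500_000))   # (0.5, ~0.0005): far narrower
```

A different or more opinionated prior would change the exact numbers, but the narrowing around the true bias happens either way.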

This is not like stretching Bayesianism into a pretzel. It's a very canonical formulation.

What's true is that this view forces you to encode your uncertainty about P through a full distribution when all you probably have is some fuzzy feeling like "50%±50". That's what iiuc those imprecise probabilities try to improve upon.

Then as coins are flipped and you get results, this distribution of P gets updated and sooner or later it gets narrower around the real coin bias, just like you said it should happen.

Are you sure about that? Maybe you consider the distribution, and maybe some Bayesians do consider the distribution, but I've debated Scott Alexander, and I'm pretty sure he used a single number to arrive at the conclusion that doing something was rational.

I've been writing about uncertainty on my Substack and I've gotten a substantial amount of pushback regarding established concepts such as the burden of proof, that not-guilty is not the same as innocent, and that the null hypothesis implies uncertainty. Even ChatGPT seems to be confused about this.

I'm pretty certain that most people, even rationalists, do not factor in uncertainty by default, which is why I don't think Bayesians thoroughly consider the difference between 0/0, 50/50, or 500/500.

Are you sure about that?

Now this one is a question that a Bayesian would typically answer with a single number. In reality it is either true or false, but due to uncertainty I'd use a single number and say something like "I'm 98% sure". This works for any event, or yes-no question. Will Russia detonate a nuclear weapon in 2023? Etc. You could say you're also giving a distribution here, but since you can only have two outcomes, once I say I'm 6% sure Russia will throw a nuke, I'm also saying 94% they won't, so the full distribution is defined by a single number.

But the coin bias in reality is not true or false. It's 45%, 50.2%, or any number 0-100, so you need a full distribution.

I've been writing about uncertainty on my Substack and I've gotten a substantial amount of pushback regarding established concepts such as the burden of proof, that not-guilty is not the same as innocent, and that the null hypothesis implies uncertainty. Even ChatGPT seems to be confused about this.

I don't know what to say, I haven't read it. I'll take a guess that there was talking past each other. It seems to me you think Bayesians around here don't like to consider the uncertainty behind those three concepts. But it's the other way around. Bayesians want a finer description of the uncertainty than those three concepts allow. It's like with the coin example, where you suggested two numbers but a Bayesian would use a full distribution.

So, when there's a binary event, like the Russia nuke question, a Bayesian says 6% probability, but a "burden-of-proofer" may say "I think the people that claim Russia will throw a nuke have the burden of proof", a null-hypothesis-er would say "the null hypothesis is that Russia will not throw a nuke", etc. These concepts don't give the uncertainty with enough resolution for a Bayesian. They only give you 2 or 3 options: burden on one side vs the other, guilty vs innocent vs not-guilty.

But the coin bias in reality is not true or false.

I'm not asking if the coin is biased, I'm asking if the next coin flip will land heads. It's a yes-or-no question that Bayesians would use a single number to answer.

So, when there's a binary event, like the Russia nuke question, a Bayesian says 6% probability, but a "burden-of-proofer" may say "I think the people that claim Russia will throw a nuke have the burden of proof"

No, I say "I don't know" (uncertain), which cannot be represented with a single probability number.

I'm not asking if the coin is biased, I'm asking if the next coin flip will land heads. It's a yes-or-no question that Bayesians would use a single number to answer.

Yeah

But it seems to me that at first you were talking about the bias and what you can learn about it from repeated tosses (and were confused in thinking Bayesians wouldn't learn).

If I throw the coin 100 times and it lands heads 50 of those times, the final percentage is 50%. If I throw the coin 1,000,000 times and it lands heads 500,000 of those times, it's still 50%, so I have gained zero information.

So, like we've talked about, they'd use many numbers to compute the probability of the yes-no question; they just give the final answer as one number. Bayesians do consider uncertainty, to whatever level they feel they need. What they don't do is give uncertainties about uncertainties in their answers. And they see the probability of the next toss being heads as equivalent to "how certain am I that it's going to be heads?" (to a Bayesian, probabilities are also uncertainties in their minds, not just facts about the world). Iiuc, you would be happy saying you believe the next toss has a 50%±20 chance of being heads. Why not add uncertainty to the 20% too, since you are not sure it should be exactly 20%, as in 50%±(20±5)%? If that feels redundant in some sense, that's how a Bayesian feels about saying "the coin will come up heads, I'm 50% sure, but I'm only 30% sure of how sure I am". If it doesn't feel redundant, add another layer until it does :P

No, I say "I don't know" (uncertain), which cannot be represented with a single probability number.

Still, I think I see your point in part. There is clearly some relevant information that's not being given in the answer if the answer to "will this fair coin land heads?", 50%, is the same as the answer given to "plc ashetn ðßh sst?" (a well-posed question in a language I just invented), now a lame 50% meaning "the whaat huuhhh?".
