 # Dismantling pseudo-skepticism


The level N skeptic achieves true enlightenment by admitting that all he knows is that he knows nothing.

So then is the level 3 skeptic the one who points out that the answer to the coin problem depends on your prior? The answer given, where the probability of success follows Beta(successes + 1, failures + 1), is more rigorously derived by taking Beta(1,1) as the prior (which is the same as the uniform distribution); see the mathematical details at https://en.wikipedia.org/wiki/Rule_of_succession. The upshot is that if you have seen many coins before, and they resemble the coin you have right now, then that is evidence that the probability of heads is similar to the probability of heads for those other coins, so the prior is different.
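For readers who want the rule of succession spelled out, here is a short sketch of the derivation. The symbols $s$ (heads seen) and $f$ (tails seen) are labels introduced here, and the count of 2 tails is inferred from the Beta(9, 3) answer quoted later in the thread rather than stated outright.

$$
p \sim \mathrm{Beta}(1, 1)
\quad\Longrightarrow\quad
p \mid \text{data} \sim \mathrm{Beta}(s + 1,\ f + 1),
\qquad
P(\text{next flip is heads} \mid \text{data}) = \frac{s + 1}{s + f + 2}.
$$

With $s = 8$ and $f = 2$ the posterior is $\mathrm{Beta}(9, 3)$ and the predictive probability is $9/12 = 0.75$. A different prior $\mathrm{Beta}(a, b)$ would give $(s + a)/(s + f + a + b)$ instead, which is the sense in which the answer depends on the prior.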

Yes, but to point out what the true answer depends on, a level-3 skeptic has to first doubt the problem. A level-3 skeptic might not know the answer, but it's better to say "I don't know" than to say what some confident people would automatically say: 50%.

It's reasonable to express uncertainty, but for a case like this with a very limited set of possible outcomes then "I don't know" should still convert to a number. In fact, with maximum uncertainty, 50% is correct: If your distribution over the true probabilities is uniform, then integrating over that distribution gives your subjective probability of heads as 1/2. On the other hand, if you've flipped a lot of coins and you know that most coins are fair, then seeing 8 heads shouldn't move the needle much, so the answer might not be exactly 50% but it would be quite close.
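A minimal Python sketch of the two cases in this comment, assuming Beta priors throughout. The function name `predictive_heads`, the 2 tails, and the specific "most coins are fair" priors (Beta(50, 50) and Beta(500, 500)) are illustrative assumptions, not anything stated in the thread.

```python
from fractions import Fraction

def predictive_heads(prior_a, prior_b, heads, tails):
    """Probability that the next flip is heads under a Beta(prior_a, prior_b) prior.

    The posterior is Beta(prior_a + heads, prior_b + tails); the predictive
    probability of heads is the mean of that posterior.
    """
    a = prior_a + heads
    b = prior_b + tails
    return Fraction(a, a + b)

# Maximum uncertainty: uniform prior, no flips observed yet.
no_data = predictive_heads(1, 1, 0, 0)

# Hypothetical "most coins are fair" priors of increasing strength,
# each updated on 8 heads and 2 tails (the tails count is assumed).
mild_fair = predictive_heads(50, 50, 8, 2)
strong_fair = predictive_heads(500, 500, 8, 2)

print(no_data, float(mild_fair), float(strong_fair))
```

The stronger the prior belief in fair coins, the less 8 heads moves the result away from 1/2, which seems to be the point being made above.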

It's reasonable to express uncertainty, but for a case like this with a very limited set of possible outcomes then "I don't know" should still convert to a number.

No, it's a function, not a single number.

In fact, with maximum uncertainty, 50% is correct: If your distribution over the true probabilities is uniform, then integrating over that distribution gives your subjective probability of heads as 1/2.

No, if it's a uniform distribution you can calculate the probability that the actual probability is between 45% and 55%: 10%. For me 10% is very unlikely.

But the probability that the actual probability is between 90% and 100% is equally likely: 10%.

On the other hand, if you've flipped a lot of coins and you know that most coins are fair, then seeing 8 heads shouldn't move the needle much, so the answer might not be exactly 50% but it would be quite close.

You are confusing the most likely probability with "the answer". The most likely probability is close to 50%, yeah, but that's not the answer. The answer is a function. Given that function you can calculate the probability that the actual probability is between 45% and 55%, and given that the most likely probability is in this range, the likelihood is going to be high, but there's a non-zero probability that the true probability lies outside that range.

Probabilities of probabilities should make anyone question their own certainty on "the answer".

If you have a distribution over a probability of an outcome, it's entirely valid to integrate over that density and get a single number for the probability of the outcome. This is done all the time in probability. In fact, this works for any parameter: If you have a probability distribution Y for the mean of a random variable X with standard deviation 1, for example, then you can compute the average value of X. Specifically, the average of each of the possible normal distributions, weighted by how likely that distribution is according to Y. (The exact interpretation of what this process means depends on your interpretation of probability; for the first case, a frequentist would say something about flipping many coins, where the probability of heads for each is selected from the distribution, while a Bayesian would say something about your subjective belief. But the validity of this process can be confirmed by doing some calculus, or by running simulations if you're better at programming than math).
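The "running simulations" route can be sketched roughly as follows. This assumes the Beta(9, 3) density discussed in this thread as the distribution over the probability of heads; the function name, seed, and sample size are arbitrary choices made for illustration.

```python
import random

def simulate_heads_frequency(alpha, beta, n_coins, seed=0):
    """Flip n_coins coins once each, where every coin's bias is drawn from Beta(alpha, beta)."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(n_coins):
        p = rng.betavariate(alpha, beta)  # this coin's own probability of heads
        if rng.random() < p:              # one flip of this coin
            heads += 1
    return heads / n_coins

freq = simulate_heads_frequency(9, 3, 200_000)
exact = 9 / (9 + 3)  # mean of Beta(9, 3)

print(freq, exact)
```

The simulated frequency should land close to the mean of the distribution, which is the single number obtained by integrating over the density.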

If you have a distribution over a probability of an outcome, it's entirely valid to integrate over that density and get a single number for the probability of the outcome.

You get the probability that the actual probability is on that region, but it's never 100%.

In fact, this works for any parameter: If you have a probability distribution Y for the mean of a random variable X with standard deviation 1, for example, then you can compute the average value of X.

But the average value is not necessarily "the answer".

You get the probability that the actual probability is on that region, but it's never 100%.

I have no idea what you're trying to say here. If you have a distribution for the probability of heads, you can calculate the probability of getting heads. For any symmetrical distribution, it will be 50%, reflecting the fact that you have no reason to favor heads over tails.

Think about it this way: Suppose that you have a much simpler distribution over p, the probability of heads, where it's 0.4 with probability 0.3, otherwise 0.7. Then by the law of total probability, the probability of heads is (probability of heads given p = 0.4) * (probability p = 0.4) + (probability of heads given p = 0.7) * (probability p = 0.7), which is clearly 0.12 + 0.49 = 0.61. You might note this is also the expected value of p; in the continuous case, we would use the formula $\int_0^1 x f(x)\,dx$, where f is the PDF. For your solution, Beta(9, 3), this is just 9/12 = 0.75. This is basically the same example as at the top of https://en.wikipedia.org/wiki/Law_of_total_expectation#Example
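Both calculations in this comment can be checked with a few lines of Python. This is only a sketch: the helper names are made up here, and the continuous case uses a plain midpoint-rule integration rather than any particular library routine.

```python
import math

def discrete_heads_probability(belief):
    """Law of total probability: sum of P(heads | p) * P(p) over the possible values of p."""
    return sum(p * weight for p, weight in belief.items())

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def expected_p(a, b, steps=10_000):
    """Approximate the integral of x * f(x) over [0, 1] with the midpoint rule."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width  # midpoints never hit 0 or 1, so the logs are safe
        total += x * beta_pdf(x, a, b) * width
    return total

# p = 0.4 with probability 0.3, otherwise p = 0.7 (the example from the comment).
discrete = discrete_heads_probability({0.4: 0.3, 0.7: 0.7})
continuous = expected_p(9, 3)

print(round(discrete, 4), round(continuous, 4))
```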

But the average value is not necessarily "the answer".

I never said it was? It was just another example where you can compute a specific property of the underlying random variable, given a distribution on one of its parameters.

If you have a distribution for the probability of heads, you can calculate the probability of getting heads.

Actually you can't. I don't think you quite understand the point. I can program an `f()` function that returns heads `p` percent of the time. How many results do you need to accurately "calculate the probability of getting heads"?

Suppose that you have a much simpler distribution over p, the probability of heads, where it's 0.4 with probability 0.3, otherwise 0.7

OK.

You might note this is also the expected value of p

Yes, but the "expected value" is not "the answer".

I programmed your example of `0.3*0.4/0.7*0.7` as `g(0.3)`. Let's say that the threshold `t` in this case is 0.3, but I chose a different threshold for comparison and ran the function 10 times. Can you guess which results are which?

1. `[0.7, 0.4, 0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4]`

2. `[0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4, 0.7, 0.7]`

Which is `g(0.3)`, which is `g(t)`, and what do you guess is the value of `t` I chose?
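The commenter's actual code is not shown, so the following is only a guess at what `g` might look like: a function that returns 0.4 with probability `t` and 0.7 otherwise. The `rng` parameter and the seed are additions made here so the sketch is reproducible.

```python
import random

def g(t, rng=random):
    """Hypothetical reconstruction: return 0.4 with probability t, otherwise 0.7."""
    return 0.4 if rng.random() < t else 0.7

rng = random.Random(42)  # arbitrary seed, only for reproducibility

# Ten draws, as in the comment; any single run of ten is quite noisy.
sample = [g(0.3, rng) for _ in range(10)]

# Over many draws the share of 0.4 results settles near t.
long_run = [g(0.3, rng) for _ in range(100_000)]
share_of_04 = long_run.count(0.4) / len(long_run)

print(sample)
print(share_of_04)
```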

How many results do you need to accurately "calculate the probability of getting heads"?

What is "accurately"? The method I described will give the correct probability given all of the information available. As you gather more information, the probability changes. Are you getting confused between probability and statistics?

Actually you can't.

Yes, you can. I just gave you a complete worked example.

Yes, but the "expected value" is not "the answer".

In this case, it is. In fact, in any case where you have a binary outcome and independent events, the expected number of successes is equal to p times the number of trials. In the special case of n = 1 coin flip, we have E(number of heads) = p. See https://en.wikipedia.org/wiki/Binomial_distribution

Can you guess which results are which?

List 1 is more likely to be `g(0.3)` because the proportion of 0.4 outcomes is 0.3, and in list 2 it's 0.2, so the estimate for `t` is 0.2. Those are the choices which maximize the likelihood of the data. But I'm not going to do a bunch of math to figure out the exact probabilities (and there might not even be a closed form solution); what's the point of all this?
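The maximum-likelihood reasoning here can be sketched in Python, assuming each result is an independent draw that equals 0.4 with probability `t`. The helper names are invented for this example.

```python
results_1 = [0.7, 0.4, 0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4]
results_2 = [0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4, 0.7, 0.7]

def mle_threshold(results):
    """Maximum-likelihood estimate of t: the observed share of 0.4 outcomes."""
    return results.count(0.4) / len(results)

def likelihood(results, t):
    """Probability of this exact sequence if each draw is 0.4 with probability t."""
    k = results.count(0.4)
    return t ** k * (1 - t) ** (len(results) - k)

t_hat_1 = mle_threshold(results_1)
t_hat_2 = mle_threshold(results_2)

# How much better does t = 0.3 explain list 1 than t = 0.2 does?
ratio = likelihood(results_1, 0.3) / likelihood(results_1, 0.2)

print(t_hat_1, t_hat_2, ratio)
```

The likelihood ratio comes out only modestly above 1, so ten draws favor one threshold over the other without separating them decisively.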

Thanks for writing this. I found it accessible, despite being fairly weak on stats (though I do remember what a beta distribution is).

Your piece has a vibe of a warning for young rationalists that goes something like, "Beware, for not all who claim to be skeptics are ones." Would you say this is a correct interpretation?

"Beware, for not all who claim to be skeptics are ones."

Yes. Many people who claim to be skeptics really are skeptical about many claims, but the point of calling yourself a "skeptic" is that you are skeptical about all of them (or close to 100%). You can't call yourself a "peaceful" person if you've reacted violently often enough.

Very good, although I would quibble with the wording in a few places, e.g. -

"A meta-skeptic should doubt everything"

I would put it as Hume did when discussing miracles: "A wise man proportions his belief to his evidence." Evidence is never conclusive, but it can be stronger or weaker. The coin toss observations favour the hypothesis that the coin is biased towards heads, but not to an extent that can't be easily dismissed as random error.

Professional skeptics tend to focus on easy cases where credulity goes wrong, which encourages the conflation of skepticism and denial that you describe.

I would put it as Hume did when discussing miracles: "A wise man proportions his belief to his evidence." Evidence is never conclusive, but it can be stronger or weaker.

Indeed. This is a point I often emphasize in debates. The quote "absence of evidence is not evidence of absence" is wrong because it is evidence, but people often confuse evidence with proof.

But I don't see evidence as a continuum, I see certainty as a continuum. I would say for example "I believe the coin is biased with 95% certainty". 50% certainty means no belief one way or the other. This is a matter of semantics of course.

In the end, what "true skeptics" should agree on is that 100% certainty is not characteristic of skepticism.

Yes, and some Bayesians would even distinguish between e.g. 50% certainty in the coin landing heads on the next toss after 50 heads and 50 tails from your rational beliefs before testing the coin at all. They would model the latter with a convex set of different beta distribution priors (some very biased to heads, some very biased to tails) and the former as the beta posteriors after using your observations of the 100 coin tosses to do Bayesian updating on each element in that set. I'm not persuaded by this "Imprecise Bayesianism," but I agree that it's a useful distinction.
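A rough sketch of the distinction, with loud caveats: a real imprecise-probability model works with a convex set of priors, whereas this example stands in for that set with three hand-picked Beta priors (one biased to tails, one uniform, one biased to heads). The priors and function name are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical stand-ins for the set of priors: (a, b) parameters of Beta(a, b).
priors = [(1, 10), (1, 1), (10, 1)]

def predictive_range(priors, heads, tails):
    """Lowest and highest predictive probability of heads across the set of priors.

    Each prior Beta(a, b) is updated to Beta(a + heads, b + tails), and its
    predictive probability of heads is the posterior mean.
    """
    means = [Fraction(a + heads, a + b + heads + tails) for a, b in priors]
    return min(means), max(means)

before = predictive_range(priors, 0, 0)    # no tosses yet
after = predictive_range(priors, 50, 50)   # 50 heads and 50 tails

print([float(x) for x in before])
print([float(x) for x in after])
```

Before any tosses the priors disagree widely about the next flip; after 100 tosses their predictions cluster near 1/2, even though a single-number summary would read "about 50%" in both situations.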

https://plato.stanford.edu/entries/imprecise-probabilities/

Yes, and some Bayesians would even distinguish between e.g. 50% certainty in the coin landing heads on the next toss after 50 heads and 50 tails from your rational beliefs before testing the coin at all.

You can use the beta distribution to calculate the probability that the actual probability is between 45% and 55% given 50H/50T, and it's around 70%. So in that case I would say I believe the coin is fair with 70% certainty. With 0H/0T it's around 10%.
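These two figures can be reproduced with a short numerical integration. The sketch assumes a uniform Beta(1, 1) prior, so 50H/50T gives Beta(51, 51) and 0H/0T gives Beta(1, 1); the function names and the midpoint-rule integration are choices made here, not the commenter's method.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def interval_probability(a, b, lo, hi, steps=20_000):
    """Approximate P(lo < p < hi) under Beta(a, b) with the midpoint rule."""
    width = (hi - lo) / steps
    total = sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width

no_tosses = interval_probability(1, 1, 0.45, 0.55)       # 0H/0T
after_100 = interval_probability(51, 51, 0.45, 0.55)     # 50H/50T

print(round(no_tosses, 3), round(after_100, 3))
```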

The more tosses, the more likely it is that the actual probability lies within a given range, so the more "precise" it should be.

https://plato.stanford.edu/entries/imprecise-probabilities/

Articles from Stanford Encyclopedia of Philosophy are very interesting, but way too complicated for me. This article is no exception, very interesting, but my point is much more general.

By using probability I'm not trying to find an accurate value of belief; what I'm trying to do is show that even in simple questions people have an unwarranted level of certainty, even people who call themselves "skeptics".

Sorry, wasn't meant as a critique: just something else that is interesting to think about.

Yes. I didn't consider it a critique. I think we are talking about the same thing except at different levels, like those Wired videos of explaining one concept "in 5 levels of difficulty".