So then is the level-3 skeptic the one who points out that the answer to the coin problem depends on your prior? The answer given, where the probability of success follows Beta(successes + 1, failures + 1), is more rigorously derived by taking Beta(1,1) as the prior (which is the same as the uniform distribution); see the mathematical details at https://en.wikipedia.org/wiki/Rule_of_succession. The upshot is that if you have seen many coins before, and they resemble the coin you have right now, that is evidence that this coin's probability of heads is similar to theirs, so the prior is different.
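
For concreteness, here's a quick scratch calculation, assuming the problem was 8 heads in 10 flips (which is what the Beta(9, 3) posterior mentioned further down corresponds to); the specific numbers are my assumption, not from the original post:

    heads, tails = 8, 2                        # assumed: 8 heads in 10 flips
    a, b = 1 + heads, 1 + tails                # Beta(1,1) prior -> Beta(9, 3) posterior
    print(a / (a + b))                         # posterior mean = 0.75
    print((heads + 1) / (heads + tails + 2))   # rule of succession gives the same number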

Yes, but to point out what the true answer depends on, a level-3 skeptic has to first doubt the problem. A level-3 skeptic might not know the answer, but it's better to say "I don't know" than what some confident people would automatically say: 50%.

It's reasonable to express uncertainty, but for a case like this, with a very limited set of possible outcomes, "I don't know" should still convert to a number. In fact, with maximum uncertainty, 50% is correct: If your distribution over the true probabilities is uniform, then integrating over that distribution gives your subjective probability of heads as 1/2. On the other hand, if you've flipped a lot of coins and you know that most coins are fair, then seeing 8 heads shouldn't move the needle much, so the answer might not be exactly 50% but it would be quite close.
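
A quick way to see the 1/2 claim (a throwaway simulation of my own, not something from the original post): draw the true probability of heads uniformly, flip once, and look at the overall fraction of heads.

    import random

    N = 1_000_000
    heads = 0
    for _ in range(N):
        p = random.random()      # true P(heads) drawn from Uniform(0, 1)
        if random.random() < p:  # one flip of that coin
            heads += 1
    print(heads / N)             # ~0.5: the integral of p over Uniform(0, 1)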

It's reasonable to express uncertainty, but for a case like this, with a very limited set of possible outcomes, "I don't know" should still convert to a number.

No, it's a function, not a single number.

In fact, with maximum uncertainty, 50% is correct: If your distribution over the true probabilities is uniform, then integrating over that distribution gives your subjective probability of heads as 1/2.

No, if it's a uniform distribution you can calculate the probability that the actual probability is between 45% and 55%: 10%. To me, 10% is very unlikely.

But the probability that the actual probability is between 90% and 100% is the same: 10%.

On the other hand, if you've flipped a lot of coins and you know that most coins are fair, then seeing 8 heads shouldn't move the needle much, so the answer might not be exactly 50% but it would be quite close.

You are confusing the most likely probability with "the answer". The most likely probability is close to 50%, yeah, but that's not the answer. The answer is a function. Given that function you can calculate the probability that the actual probability is between 45% and 55%; since the most likely probability is in that range, that probability will be high, but there's still a non-zero probability that the true probability lies outside it.

Probabilities of probabilities should make anyone question their own certainty on "the answer".

If you have a distribution over the probability of an outcome, it's entirely valid to integrate over that density and get a single number for the probability of the outcome. This is done all the time in probability. In fact, this works for any parameter: If you have a probability distribution Y for the mean of a normally distributed random variable X with standard deviation 1, for example, then you can compute the average value of X: specifically, the average of each of the possible normal distributions, weighted by how likely that distribution is according to Y. (The exact interpretation of what this process means depends on your interpretation of probability; for the first case, a frequentist would say something about flipping many coins, where the probability of heads for each is selected from the distribution, while a Bayesian would say something about your subjective belief. But the validity of this process can be confirmed by doing some calculus, or by running simulations if you're better at programming than math.)
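
As a sketch of the "run simulations" option (the distribution I use for Y below is just an example I picked, not anything from the thread): if the mean of a unit-variance normal X is itself drawn from Y, the overall average of X comes out to the mean of Y.

    import random

    N = 200_000
    total = 0.0
    for _ in range(N):
        mu = random.gauss(0.3, 2.0)     # mean of X drawn from Y = Normal(0.3, 2), chosen arbitrarily
        total += random.gauss(mu, 1.0)  # X | mu ~ Normal(mu, 1)
    print(total / N)                    # ~0.3, the mean of Y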

If you have a distribution over the probability of an outcome, it's entirely valid to integrate over that density and get a single number for the probability of the outcome.

You get the probability that the actual probability is in that region, but it's never 100%.

In fact, this works for any parameter: If you have a probability distribution Y for the mean of a normally distributed random variable X with standard deviation 1, for example, then you can compute the average value of X.

But the average value is not necessarily "the answer".

You get the probability that the actual probability is in that region, but it's never 100%.

I have no idea what you're trying to say here. If you have a distribution for the probability of heads, you can calculate the probability of getting heads. For any distribution that's symmetric around 50%, it will be 50%, reflecting the fact that you have no reason to favor heads over tails.

Think about it this way: Suppose that you have a much simpler distribution over p, the probability of heads, where it's 0.4 with probability 0.3, otherwise 0.7. Then by the law of total probability, the probability of heads is (probability of heads given p = 0.4)*(probability p = 0.4) + (probability of heads given p = 0.7)*(probability p = 0.7), which is 0.4*0.3 + 0.7*0.7 = 0.12 + 0.49 = 0.61. You might note this is also the expected value of p; in the continuous case, we would use the formula integral_0^1 x f(x) dx, where f is the PDF. For your solution, Beta(9, 3), this is just 9/12 = 0.75. This is basically the same example as at the top of https://en.wikipedia.org/wiki/Law_of_total_expectation#Example
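
A two-line check of those numbers (just a scratch calculation, nothing beyond what's already stated above):

    # Law of total probability for the discrete mixture, and the Beta(9, 3) mean.
    print(0.4 * 0.3 + 0.7 * 0.7)   # 0.61
    a, b = 9, 3
    print(a / (a + b))             # 0.75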

But the average value is not necessarily "the answer".

I never said it was? It was just another example where you can compute a specific property of the underlying random variable, given a distribution on one of its parameters.

If you have a distribution for the probability of heads, you can calculate the probability of getting heads.

Actually you can't. I don't think you quite understand the point. I can program a function f() that returns heads p percent of the time. How many results do you need to accurately "calculate the probability of getting heads"?
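
Concretely, something like this (a rough sketch, treating p as a probability rather than a literal percentage):

    import random

    def f(p):
        # return "heads" with probability p, otherwise "tails"
        return "heads" if random.random() < p else "tails"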

Suppose that you have a much simpler distribution over p, the probability of heads, where it's 0.4 with probability 0.3, otherwise 0.7.

OK.

You might note this is also the expected value of p

Yes, but the "expected value" is not "the answer".

I programmed your example (0.4 with probability 0.3, otherwise 0.7) as g(0.3), so the threshold t in this case is 0.3, but I also chose a different threshold for comparison and ran each function 10 times (a sketch of such a g is below the lists). Can you guess which results are which?

  1. [0.7, 0.4, 0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4]

  2. [0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4, 0.7, 0.7]

Which is g(0.3), which is g(t), and what do you guess is the value of t I chose?
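
For reference, such a g could be written along these lines (an illustrative sketch of the setup described above, not the exact code):

    import random

    def g(t):
        # return 0.4 with probability t, otherwise 0.7
        return 0.4 if random.random() < t else 0.7

    print([g(0.3) for _ in range(10)])   # one run of 10 draws, like the lists above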

How many results do you need to accurately "calculate the probability of getting heads"?

What is "accurately"? The method I described will give the correct probability given all of the information available. As you gather more information, the probability changes. Are you getting confused between probability and statistics?

Actually you can't.

Yes, you can. I just gave you a complete worked example.

Yes, but the "expected value" is not "the answer".

In this case, it is. In fact, in any case where you have a binary outcome and independent events, the expected number of successes is equal to p times the number of trials. In the special case of n = 1 coin flip, we have E(number of heads) = p. See https://en.wikipedia.org/wiki/Binomial_distribution
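
A quick numerical illustration of that identity, using the 0.61 from the earlier example (again, just a scratch check of my own):

    import random

    p, N = 0.61, 1_000_000
    heads = sum(1 for _ in range(N) if random.random() < p)
    print(heads / N)   # ~0.61: for a single flip, E[number of heads] = p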

Can you guess which results are which?

List 1 is more likely to be g(0.3) because its proportion of 0.4 outcomes is 0.3; in list 2 it's 0.2, so the estimate for that t is 0.2. Those are the choices that maximize the likelihood of the data. But I'm not going to do a bunch of math to figure out the exact probabilities (and there might not even be a closed-form solution); what's the point of all this?
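
For what it's worth, that guess is easy to check mechanically (the lists are copied from above; the maximum-likelihood estimate of t is just the sample proportion of 0.4s):

    run1 = [0.7, 0.4, 0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4]
    run2 = [0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4, 0.7, 0.7]

    print(run1.count(0.4) / len(run1))   # 0.3 -> consistent with g(0.3)
    print(run2.count(0.4) / len(run2))   # 0.2 -> the ML estimate for the other t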