
It's reasonable to express uncertainty, but for a case like this, with a very limited set of possible outcomes, "I don't know" should still convert to a number.

No, it's a function, not a single number.

In fact, with maximum uncertainty, 50% is correct: If your distribution over the true probabilities is uniform, then integrating over that distribution gives your subjective probability of heads as 1/2.
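A minimal simulation sketch of that claim (the function name and trial count here are arbitrary choices, not anything specified above): draw p uniformly, flip one coin with bias p, and the overall heads rate comes out near 1/2.

    # Draw a bias p uniformly at random, flip one coin with that bias,
    # and check that the overall fraction of heads comes out near 1/2.
    import random

    def heads_fraction(trials=1_000_000):
        heads = 0
        for _ in range(trials):
            p = random.random()          # p ~ Uniform(0, 1)
            if random.random() < p:      # one flip with bias p
                heads += 1
        return heads / trials

    print(heads_fraction())  # typically prints something close to 0.5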

No, if it's a uniform distribution you can calculate the probability that the actual probability is between 45% and 55%: 10%. For me, 10% is very unlikely.

But the probability that the actual probability is between 90% and 100% is exactly the same: 10%.

On the other hand, if you've flipped a lot of coins and you know that most coins are fair, then seeing 8 heads shouldn't move the needle much, so the answer might not be exactly 50% but it would be quite close.
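As a rough sketch of that intuition (the Beta(50, 50) prior below is an assumed stand-in for "you know that most coins are fair", not anything specified above): after seeing 8 heads and 2 tails, the posterior mean barely moves from 0.5.

    # Assumed Beta(50, 50) prior standing in for "most coins are fair".
    # After observing 8 heads and 2 tails, the posterior is Beta(58, 52),
    # and its mean (the probability of heads on the next flip) stays close to 0.5.
    prior_heads, prior_tails = 50, 50    # assumed prior, not from the example
    heads, tails = 8, 2                  # the flips observed in the example

    post_heads = prior_heads + heads
    post_tails = prior_tails + tails
    print(post_heads / (post_heads + post_tails))   # 58/110 ≈ 0.527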

You are confusing the most likely probability with "the answer". The most likely probability is close to 50%, yeah, but that's not the answer. The answer is a function. Given that function, you can calculate the probability that the actual probability is between 45% and 55%; since the most likely probability is in this range, that probability will be high, but there's a non-zero probability that the true probability lies outside that range.

Probabilities of probabilities should make anyone question their own certainty on "the answer".

If you have a distribution over a probability of an outcome, it's entirely valid to integrate over that density and get a single number for the probability of the outcome. This is done all the time in probability. In fact, this works for any parameter: If you have a probability distribution Y for the mean of a random variable X with standard deviation 1, for example, then you can compute the average value of X. Specifically, it is the average of the means of each of the possible normal distributions, weighted by how likely each distribution is according to Y. (The exact interpretation of what this process means depends on your interpretation of probability; for the first case, a frequentist would say something about flipping many coins, where the probability of heads for each is selected from the distribution, while a Bayesian would say something about your subjective belief. But the validity of this process can be confirmed by doing some calculus, or by running simulations if you're better at programming than math.)
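Here is one such simulation sketch for the second example (Y chosen arbitrarily as Normal(2, 0.5)): the mean of X is drawn from Y, then X is drawn from a normal distribution with that mean and standard deviation 1, and the long-run average of X matches the mean of Y.

    # The mean of X is itself drawn from Y (here Normal(2, 0.5), an arbitrary
    # choice); then X ~ Normal(mean, 1). The long-run average of X matches
    # the mean of Y.
    import random

    def sample_x():
        mu = random.gauss(2.0, 0.5)      # mean of X drawn from Y
        return random.gauss(mu, 1.0)     # X ~ Normal(mu, 1)

    n = 200_000
    print(sum(sample_x() for _ in range(n)) / n)   # close to 2.0, the mean of Y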

If you have a distribution over a probability of an outcome, it's entirely valid to integrate over that density and get a single number for the probability of the outcome.

You get the probability that the actual probability is in that region, but it's never 100%.

In fact, this works for any parameter: If you have a probability distribution Y for the mean of a random variable X with standard deviation 1, for example, then you can compute the average value of X.

But the average value is not necessarily "the answer".

You get the probability that the actual probability is in that region, but it's never 100%.

I have no idea what you're trying to say here. If you have a distribution for the probability of heads, you can calculate the probability of getting heads. For any distribution that's symmetric around 0.5, it will be 50%, reflecting the fact that you have no reason to favor heads over tails.
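A quick sketch of that point (the two example distributions are arbitrary choices, not anything specified above): since the probability of heads is the expected value of p, any distribution on p that's symmetric around 0.5 gives 0.5.

    # P(heads) = E[p], so any distribution on p symmetric around 0.5 gives 0.5.
    # Two arbitrary examples: Beta(3, 3), and a 50/50 mixture of 0.2 and 0.8.
    import random

    def p_heads(sample_p, trials=500_000):
        return sum(sample_p() for _ in range(trials)) / trials

    print(p_heads(lambda: random.betavariate(3, 3)))               # ≈ 0.5
    print(p_heads(lambda: 0.2 if random.random() < 0.5 else 0.8))  # ≈ 0.5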

Think about it this way: Suppose that you have a much simpler distribution over p, the probability of heads, where it's 0.4 with probability 0.3, otherwise 0.7. Then by the law of total probability, the probability of heads is (probability of heads given p = 0.4)*(probability p = 0.4) + (probability of heads given p = 0.7)*(probability p = 0.7), which is clearly 0.12 + 0.49 = 0.61. You might note this is also the expected value of p; in the continuous case, we would use the formula integral_0^1 x f(x) dx, where f is the PDF. For your solution, Beta(9, 3), this is just 9/12 = 0.75. This is basically the same example as at the top of https://en.wikipedia.org/wiki/Law_of_total_expectation#Example
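If you'd rather check it by simulation than trust the algebra, here's a quick sketch with the same numbers:

    # p is 0.4 with probability 0.3, otherwise 0.7; count how often a flip
    # lands heads.
    import random

    trials, heads = 1_000_000, 0
    for _ in range(trials):
        p = 0.4 if random.random() < 0.3 else 0.7
        if random.random() < p:
            heads += 1
    print(heads / trials)   # close to 0.4*0.3 + 0.7*0.7 = 0.61

    # Continuous case: the mean of Beta(9, 3) is 9 / (9 + 3) = 0.75.
    print(sum(random.betavariate(9, 3) for _ in range(trials)) / trials)  # ≈ 0.75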

But the average value is not necessarily "the answer".

I never said it was? It was just another example where you can compute a specific property of the underlying random variable, given a distribution on one of its parameters.

If you have a distribution for the probability of heads, you can calculate the probability of getting heads.

Actually you can't. I don't think you quite understand the point. I can program an f() function that returns heads p percent of the time. How many results do you need to accurately "calculate the probability of getting heads"?

Suppose that you have a much simpler distribution over p, the probability of heads, where it's 0.4 with probability 0.3, otherwise 0.7

OK.

You might note this is also the expected value of p

Yes, but the "expected value" is not "the answer".

I programmed your example (0.4 with probability 0.3, otherwise 0.7) as g(0.3); let's say the threshold t in this case is 0.3, but I chose a different threshold for comparison and ran the function 10 times each. Can you guess which results are which?

  1. [0.7, 0.4, 0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4]

  2. [0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4, 0.7, 0.7]

Which is g(0.3), which is g(t), and what do you guess is the value of t I chose?
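(For concreteness, here is a minimal sketch of what such a g(t) might look like, assuming it returns 0.4 with probability t and 0.7 otherwise; the actual implementation wasn't posted.)

    # Assumed implementation of g(t): each draw is 0.4 with probability t, else 0.7.
    import random

    def g(t, n=10):
        return [0.4 if random.random() < t else 0.7 for _ in range(n)]

    print(g(0.3))   # a list of ten values like the two shown above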

How many results do you need to accurately "calculate the probability of getting heads"?

What is "accurately"? The method I described will give the correct probability given all of the information available. As you gather more information, the probability changes. Are you getting confused between probability and statistics?

Actually you can't.

Yes, you can. I just gave you a complete worked example.

Yes, but the "expected value" is not "the answer".

In this case, it is. In fact, in any case where you have a binary outcome and independent events, the expected number of successes is equal to p times the number of trials. In the special case of n = 1 coin flip, we have E(number of heads) = p. See https://en.wikipedia.org/wiki/Binomial_distribution
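A small simulation sketch of that identity (the numbers below are arbitrary): the average number of successes over many runs comes out to p times the number of trials, and n = 1 reduces to E(heads) = p.

    # Verify E(number of heads) = n * p for independent flips; n = 1 gives E(heads) = p.
    import random

    def mean_heads(p, n, trials=200_000):
        total = sum(sum(random.random() < p for _ in range(n)) for _ in range(trials))
        return total / trials

    print(mean_heads(0.61, 1))    # ≈ 0.61
    print(mean_heads(0.61, 10))   # ≈ 6.1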

Can you guess which results are which?

Sequence 1 is more likely to be g(0.3), because its proportion of 0.4 outcomes is 0.3; in sequence 2 it's 0.2, so the estimate for t is 0.2. Those are the choices which maximize the likelihood of the data. But I'm not going to do a bunch of math to figure out the exact probabilities (and there might not even be a closed-form solution); what's the point of all this?
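The likelihood comparison itself is easy to check with a few lines, though; a sketch (the grid of candidate t values is arbitrary):

    # Likelihood of a sequence under threshold t: t^(count of 0.4s) * (1-t)^(count of 0.7s).
    # It is maximized when t equals the observed fraction of 0.4s.
    def likelihood(seq, t):
        k = seq.count(0.4)
        return t ** k * (1 - t) ** (len(seq) - k)

    seq1 = [0.7, 0.4, 0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4]
    seq2 = [0.7, 0.7, 0.7, 0.7, 0.4, 0.7, 0.7, 0.4, 0.7, 0.7]

    for t in (0.1, 0.2, 0.3, 0.4):
        print(t, likelihood(seq1, t), likelihood(seq2, t))
    # seq1's likelihood peaks at t = 0.3 on this grid, seq2's at t = 0.2.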

The method I described will give the correct probability given all of the information available.

It won't.

In this case, it is.

It's not.

is more likely to be g(0.3)

Yes, but it is not. You got it wrong.

so the estimate for t is 0.2

But it is not 0.2.

This is the whole point of the article: to raise doubt. But you are not even considering the possibility that you might be wrong. I bet that even when I'm telling you the values of t in those examples are not the ones you guessed, you will still not consider the possibility that you are wrong, even when the answers are objectively incorrect.

It won't.

It will.

It's not.

It is.

It seems like you don't know enough probability to really have this discussion. "The expected number of heads from 1 flip is equal to the probability of heads" is a trivial calculation.

Yes, but it is not. You got it wrong.

That's not how probability works. I said it was more likely, and that statement is correct. Just as in your original coin flip example: it is more likely that p is 0.8 than 0.3, but if it really was 0.3 and you just got unlucky, your answer would not have been "wrong", because that's not how probabilistic statements are judged.

But it is not 0.2.

Do you not know what an estimate is?

But you are not even considering the possibility that you might be wrong. I bet that even when I'm telling you the values of t in those examples are not the ones you guessed, you will still not consider the possibility that you are wrong, even when the answers are objectively incorrect.

This blatant strawmanning isn't helping your case. The statements I made above about probability are correct; they are fairly basic mathematical ideas that get used all the time. If you think they're wrong, provide an actual argument. I have never said anything like "t is definitely 0.2", nor do I care what its value is, because it's an irrelevant exercise. Have you considered that you might be wrong?